Abstract
A major problem with most speaker adaptation schemes is that they
rely on the speaker providing at least one example of each acoustic
unit (word, phone, triphone, etc.) in the vocabulary in order to
adapt the corresponding model. As a result, rapid adaptation is difficult
to achieve, and the models for sounds that are never heard are never adapted.
This paper presents a technique for adapting all of the speech models
to a new speaker's voice when the speaker has
provided examples of only part of the vocabulary.
The technique uses the training set to obtain
estimates of the correlations between sounds.
Given some sounds from a new speaker at
recognition time, these correlations are used to estimate the unheard
sounds, and the estimates are then used to adapt the speech models.
The technique was applied to a database of 104 speakers speaking the English
alphabet. When speakers spoke half of the vocabulary for enrollment prior to
recognition, the technique gave a 78% decrease in error.
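
The idea of predicting unheard sounds from heard ones can be illustrated with a minimal sketch. The abstract does not specify the estimator, so this sketch assumes a regularized linear least-squares predictor built from the training-set mean and covariance, with each sound reduced to a single scalar feature. The function name `estimate_unheard`, the `ridge` parameter, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def estimate_unheard(train, heard_idx, heard_vals, ridge=1e-3):
    """Fill in a new speaker's unheard sounds from the heard ones.

    train      : (speakers, sounds) array, one scalar feature per sound (assumption)
    heard_idx  : indices of the sounds the new speaker has provided
    heard_vals : the new speaker's observed values for those sounds
    ridge      : small diagonal term for numerical stability (assumption)
    """
    train = np.asarray(train, dtype=float)
    heard_idx = np.asarray(heard_idx)
    heard_vals = np.asarray(heard_vals, dtype=float)

    mu = train.mean(axis=0)             # training-set mean for each sound
    cov = np.cov(train, rowvar=False)   # between-sound covariance across speakers
    unheard_idx = np.setdiff1d(np.arange(train.shape[1]), heard_idx)

    s_hh = cov[np.ix_(heard_idx, heard_idx)] + ridge * np.eye(len(heard_idx))
    s_uh = cov[np.ix_(unheard_idx, heard_idx)]

    est = mu.copy()
    est[heard_idx] = heard_vals
    # Linear prediction: shift the mean by the correlated deviation of the heard sounds.
    est[unheard_idx] = mu[unheard_idx] + s_uh @ np.linalg.solve(s_hh, heard_vals - mu[heard_idx])
    return est

# Synthetic illustration only: 104 training speakers, 26 sounds, and one
# hidden speaker factor that shifts every sound (not the paper's data).
rng = np.random.default_rng(0)
base = rng.normal(size=26)
weight = rng.uniform(0.5, 1.5, size=26)
factors = rng.normal(size=(104, 1))
train = base + factors * weight + 0.1 * rng.normal(size=(104, 26))

new_speaker = base + 2.0 * weight + 0.1 * rng.normal(size=26)
heard = np.arange(13)       # the speaker enrolls with half of the vocabulary
unheard = np.arange(13, 26)

est = estimate_unheard(train, heard, new_speaker[heard])
err_est = np.abs(est[unheard] - new_speaker[unheard]).mean()
err_mean = np.abs(train.mean(axis=0)[unheard] - new_speaker[unheard]).mean()
```

Here `err_est` measures how far the predicted unheard sounds are from the speaker's true values, and `err_mean` is the same measure when the unadapted training mean is used instead. In a real recognizer the estimates would update model parameters rather than scalar features.
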
Original language | English |
---|---|
Pages (from-to) | 1-17 |
Number of pages | 17 |
Journal | Computer Speech and Language |
Volume | 9 |
Issue number | 1 |
Publication status | Published - Jan 1995 |