This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
|Title of host publication|Proceedings of Interspeech 2014|
|Place of Publication|Singapore|
|Publication status|Published - Sep 2014|