Supervised and Unsupervised Adaptation to Regional Accented Speech using Limited Data for Automatic Speech Recognition

Maryam Najafian, Andrea de Marco, Stephen Cox, Martin Russell

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper is concerned with automatic speech recognition (ASR) for accented speech. Given a small amount of speech from a new speaker, is it better to apply speaker adaptation to the baseline, or to use accent identification (AID) to identify the speaker’s accent and select an accent-dependent acoustic model? Three accent-based model selection methods are investigated: using the ‘true’ accent model, and unsupervised model selection using i-Vector and phonotactic-based AID. All three methods outperform the unadapted baseline. Most significantly, AID-based model selection using 43 s of speech performs better than unsupervised speaker adaptation, even if the latter uses five times more adaptation data. Combining unsupervised AID-based model selection and speaker adaptation gives an average relative reduction in ASR error rate of up to 47%.
Original language: English
Title of host publication: Proceedings of Interspeech 2014
Place of publication: Singapore
Publication status: Published - Sep 2014
