This work compares the accuracy of fundamental frequency and formant frequency estimation methods with that of maximum a posteriori (MAP) prediction from MFCC vectors, measured against hand-corrected references. Five fundamental frequency estimation methods are compared with fundamental frequency prediction from MFCC vectors on both clean and noisy speech. Similarly, three formant frequency estimation and prediction methods are compared. An analysis of estimation and prediction accuracy shows that prediction from MFCCs provides the most accurate voicing classification on both clean and noisy speech. On clean speech, fundamental frequency estimation outperforms prediction from MFCCs, but as noise increases, prediction proves significantly more robust than estimation. Formant frequency prediction is found to be more accurate than estimation on both clean and noisy speech. A subjective analysis of the estimation and prediction methods is also made by reconstructing speech from the acoustic features.
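As a rough illustration of what MAP prediction of fundamental frequency from an MFCC vector involves, the sketch below assumes a single joint Gaussian over an MFCC vector x and the fundamental frequency f. Under that assumption the MAP estimate equals the conditional mean, f̂ = μ_f + Σ_fx Σ_xx⁻¹ (x - μ_x). The function names and the single-Gaussian model are hypothetical simplifications for illustration only; the work summarized above may use a different model (for example a mixture of Gaussians), and voicing classification is not handled here.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def map_predict_f0(x: Vector, mu_x: Vector, mu_f: float,
                   sigma_xx: Matrix, sigma_fx: Vector) -> float:
    """Hypothetical MAP prediction of f0 from an MFCC vector x.

    Assumes (x, f0) is jointly Gaussian, so the MAP estimate is the
    conditional mean mu_f + sigma_fx . sigma_xx^{-1} (x - mu_x).
    """
    diff = [xi - mi for xi, mi in zip(x, mu_x)]
    w = solve(sigma_xx, diff)  # sigma_xx^{-1} (x - mu_x)
    return mu_f + sum(s * wi for s, wi in zip(sigma_fx, w))
```

For example, with a two-coefficient "MFCC" vector, `map_predict_f0([3.0, 3.0], [0.0, 0.0], 100.0, [[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0])` shifts the prior mean of 100 Hz by the cross-covariance-weighted deviation of x from its mean, giving 102 Hz. A real system would estimate the means and covariances from training data with time-aligned reference f0 values.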
Published: 2007
8th Annual Conference of the International Speech Communication Association (Interspeech 2007), Antwerp, Belgium
Duration: 27 Aug 2007 → 31 Aug 2007