This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficient (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCCs and formant frequencies with a Gaussian mixture model (GMM). Given this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predicts formants from the single cluster that is, in some sense, closest to the input MFCC vector, while the second method takes a weighted contribution of the formants predicted from all clusters. Experimental results on the ETSI Aurora connected digit database show that predicted formant frequencies are within 3.2% of reference formant frequencies.
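The two prediction schemes described in the abstract can be illustrated with a minimal sketch. It assumes a joint GMM that has already been trained on stacked [MFCC, formant] vectors, and it uses the standard conditional-mean formula for jointly Gaussian variables as the per-cluster predictor. The abstract does not give the exact estimator or the measure of "closest", so the posterior-based cluster selection, the function names, and the parameter layout (`weights`, `means`, `covs`, `n_mfcc`) are assumptions made for illustration only.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)


def predict_formants(x, weights, means, covs, n_mfcc):
    """Predict formants from one MFCC vector x using a joint GMM.

    Assumed layout: each mean is [mfcc part, formant part] and each
    covariance is the matching full joint covariance matrix.
    Returns (hard, soft): the prediction from the most probable cluster
    and the posterior-weighted prediction over all clusters.
    """
    n_clusters = len(weights)
    log_post = np.empty(n_clusters)
    preds = []
    for k in range(n_clusters):
        mu_x, mu_f = means[k][:n_mfcc], means[k][n_mfcc:]
        s_xx = covs[k][:n_mfcc, :n_mfcc]   # MFCC covariance
        s_fx = covs[k][n_mfcc:, :n_mfcc]   # formant/MFCC cross-covariance
        # Unnormalised log posterior of cluster k given the MFCC vector.
        log_post[k] = np.log(weights[k]) + log_gauss(x, mu_x, s_xx)
        # Conditional mean of the formants given x within cluster k.
        preds.append(mu_f + s_fx @ np.linalg.solve(s_xx, x - mu_x))
    preds = np.array(preds)
    # Normalise the posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    hard = preds[np.argmax(post)]   # single "closest" cluster (assumed criterion)
    soft = post @ preds             # weighted contribution of all clusters
    return hard, soft
```

In this sketch the hard prediction corresponds to the first method (one cluster) and the soft prediction to the second (all clusters, weighted by posterior probability). With well-separated clusters the two outputs nearly coincide; they differ most when an MFCC vector lies between clusters.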
|Number of pages||4|
|Publication status||Published - 2005|
|Event||IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) - Philadelphia, United States|
Duration: 18 Mar 2005 → 23 Mar 2005