This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficients (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCC vectors and formant vectors using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predicts formants from the closest, in some sense, cluster to the input MFCC vector, while the second method takes a weighted contribution of formants from all clusters. Experimental results are presented using the ETSI Aurora connected digit database and show that the predicted formant frequency is within 3.25% of the reference formant frequency, as measured from hand-corrected formant tracks.
|Publication status||Published - Aug 2004|
|Event||COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction (Robust2004) - University of East Anglia, Norwich, United Kingdom|
Duration: 30 Aug 2004 → 31 Aug 2004
|Conference||COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction (Robust2004)|
|Period||30/08/04 → 31/08/04|