Abstract
This work proposes a novel method of predicting formant frequencies from a stream of mel-frequency cepstral coefficient (MFCC) feature vectors. Prediction is based on modelling the joint density of MFCC vectors and formant vectors using a Gaussian mixture model (GMM). Using this GMM and an input MFCC vector, two maximum a posteriori (MAP) prediction methods are developed. The first method predicts formants from the cluster that is, in some sense, closest to the input MFCC vector, while the second method takes a weighted contribution of formants from all clusters. Experimental results are presented using the ETSI Aurora connected digit database and show that the predicted formant frequency is within 3.25% of the reference formant frequency, as measured from hand-corrected formant tracks.
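The two prediction strategies described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "closest" means the cluster with the highest posterior probability given the MFCC vector, that each cluster predicts formants through its conditional mean, and that the MFCC block of each cluster covariance is diagonal. All names (`clusters`, `mu_x`, `var_x`, `mu_f`, `cov_fx`, `predict_closest`, `predict_weighted`) and all numbers are hypothetical, chosen only for illustration.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x, clusters):
    """Posterior probability of each cluster given the MFCC vector x."""
    logs = [
        math.log(c["weight"]) + log_gauss_diag(x, c["mu_x"], c["var_x"])
        for c in clusters
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_prediction(x, c):
    """Conditional mean of the formants given x within one cluster.

    With a diagonal MFCC covariance (an assumption of this sketch), the
    regression term Sigma_fx * inv(Sigma_xx) * (x - mu_x) reduces to a
    per-dimension sum.
    """
    return [
        mu_f + sum(
            cov / v * (xi - m)
            for cov, v, xi, m in zip(cov_row, c["var_x"], x, c["mu_x"])
        )
        for mu_f, cov_row in zip(c["mu_f"], c["cov_fx"])
    ]


def predict_closest(x, clusters):
    """Method 1 (as assumed here): use only the most probable cluster."""
    post = posteriors(x, clusters)
    best = max(range(len(clusters)), key=post.__getitem__)
    return cluster_prediction(x, clusters[best])


def predict_weighted(x, clusters):
    """Method 2 (as assumed here): posterior-weighted sum over all clusters."""
    post = posteriors(x, clusters)
    preds = [cluster_prediction(x, c) for c in clusters]
    return [
        sum(p * pred[j] for p, pred in zip(post, preds))
        for j in range(len(preds[0]))
    ]


# Hypothetical two-cluster model: 2-D "MFCC" input, two formants (F1, F2) in Hz.
clusters = [
    {"weight": 0.5, "mu_x": [0.0, 0.0], "var_x": [1.0, 1.0],
     "mu_f": [300.0, 2300.0], "cov_fx": [[0.0, 0.0], [0.0, 0.0]]},
    {"weight": 0.5, "mu_x": [4.0, 4.0], "var_x": [1.0, 1.0],
     "mu_f": [700.0, 1200.0], "cov_fx": [[10.0, 0.0], [0.0, -20.0]]},
]

if __name__ == "__main__":
    x = [2.0, 2.0]
    print(predict_closest(x, clusters))
    print(predict_weighted(x, clusters))
```

In this sketch the weighted method varies smoothly as the input moves between clusters, whereas the closest-cluster method switches abruptly at the boundary; whether that matters in practice depends on the trained model.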
Original language | English |
---|---|
Publication status | Published - Aug 2004 |
Event | COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction (Robust2004), University of East Anglia, Norwich, United Kingdom; duration: 30 Aug 2004 → 31 Aug 2004 |
Conference
Conference | COST278 and ISCA Tutorial and Research Workshop (ITRW) on Robustness Issues in Conversational Interaction (Robust2004) |
---|---|
Country/Territory | United Kingdom |
City | Norwich |
Period | 30/08/04 → 31/08/04 |