Abstract
This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This is particularly useful in distributed speech recognition systems, where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. The results show a percentage fundamental frequency prediction error of 3.1% on the speaker-dependent task, rising to 8.3% on the speaker-independent task.
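As an illustration of predicting fundamental frequency from a joint density, the following is a minimal sketch of a minimum mean-square error estimate from a GMM over the stacked vector [MFCC, f0]. It is a sketch under stated assumptions, not necessarily the estimator used in the paper: it covers only the GMM part for a single voiced frame (no HMM state sequence, no voicing decision), and the function names `log_gaussian` and `predict_f0` and the parameter layout are hypothetical.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def predict_f0(x, weights, means, covs):
    """MMSE estimate of f0 given an MFCC vector x.

    Assumed layout (hypothetical): each component mean is [mu_x, mu_f]
    and each covariance is the full joint matrix over [x, f0], with f0
    stored as the last dimension.
    """
    d = x.shape[0]
    n = len(weights)
    log_post = np.empty(n)
    cond_means = np.empty(n)
    for k in range(n):
        mu_x, mu_f = means[k][:d], means[k][d]
        s_xx = covs[k][:d, :d]   # MFCC covariance block
        s_fx = covs[k][d, :d]    # cross-covariance between f0 and MFCC
        # Unnormalised log posterior of component k given x
        log_post[k] = np.log(weights[k]) + log_gaussian(x, mu_x, s_xx)
        # Conditional mean of f0 given x under component k
        cond_means[k] = mu_f + s_fx @ np.linalg.solve(s_xx, x - mu_x)
    # Normalise posteriors in a numerically stable way
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ cond_means)
```

The estimate is the posterior-weighted sum of the per-component conditional means, so with a single component it reduces to ordinary linear regression of f0 on the MFCC vector.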
| Original language | English |
|---|---|
| Pages | 321-324 |
| Number of pages | 4 |
| Publication status | Published - Sept 2005 |
| Event | 9th European Conference on Speech Communication and Technology - Lisbon, Portugal. Duration: 4 Sept 2005 → 8 Sept 2005 |
Conference
| Conference | 9th European Conference on Speech Communication and Technology |
|---|---|
| Abbreviated title | INTERSPEECH-2005 |
| Country/Territory | Portugal |
| City | Lisbon |
| Period | 4/09/05 → 8/09/05 |