Pitch prediction from MFCC vectors for speech reconstruction

X. Shao, B. P. Milner

Research output: Contribution to conference (Other)

33 Citations (Scopus)


The paper proposes a technique for reconstructing an acoustic speech signal solely from a stream of Mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element, but this work proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first method is based on a Gaussian mixture model (GMM), while the second utilises the temporal correlation available from a hidden Markov model (HMM) framework. A formal measurement of both frame classification accuracy and RMS pitch error shows that an HMM-based scheme with 5 clusters per state correctly classifies over 94% of frames and has an RMS pitch error of 3.1 Hz relative to a reference pitch. Informal listening tests and analysis of spectrograms reveal that speech reconstructed solely from the MFCC vectors is almost indistinguishable from speech reconstructed using the reference pitch.
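The abstract does not give the model equations, so the following is only a minimal sketch of how a GMM-based MAP pitch prediction of this general kind could look, not the authors' implementation. It assumes a joint GMM trained over augmented vectors [MFCC, pitch], selects the cluster with the highest posterior given the MFCC vector, and returns the conditional mean of pitch within that cluster. The function names (`predict_pitch`, `rms_error`), the hard cluster selection, and the parameter layout are all assumptions made for illustration; voiced/unvoiced frame classification and the HMM variant are omitted.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def predict_pitch(x, weights, means, covs):
    """Predict pitch (Hz) from one MFCC vector x.

    Assumed layout (hypothetical): each cluster k models the joint
    vector [x, f], so means[k] has length d + 1 and covs[k] has shape
    (d + 1, d + 1), with pitch f as the last element.
    """
    d = x.shape[0]
    # Log posterior of each cluster given x only (up to a shared constant),
    # using the marginal Gaussian over the MFCC part of the joint model.
    log_post = np.array([
        np.log(w) + log_gauss(x, m[:d], c[:d, :d])
        for w, m, c in zip(weights, means, covs)
    ])
    k = int(np.argmax(log_post))  # hard choice of the most probable cluster

    mu_x, mu_f = means[k][:d], means[k][d]
    cov_xx, cov_fx = covs[k][:d, :d], covs[k][d, :d]
    # Conditional mean of pitch given x inside the chosen cluster.
    return mu_f + cov_fx @ np.linalg.solve(cov_xx, x - mu_x)


def rms_error(predicted, reference):
    """Root-mean-square error between predicted and reference pitch."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))
```

In this sketch the RMS pitch error quoted in the abstract would correspond to `rms_error` evaluated over frames where both predicted and reference pitch are defined; how the paper selects those frames is not stated here.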
Original language: English
Publication status: Published - May 2004
Event: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) - Philadelphia, United States
Duration: 18 Mar 2005 to 23 Mar 2005


Conference: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
Country/Territory: United States
