Pitch prediction from MFCC vectors for speech reconstruction

X. Shao, B. P. Milner

Research output: Contribution to conference (Other)

34 Citations (Scopus)

Abstract

The paper proposes a technique for reconstructing an acoustic speech signal solely from a stream of Mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element, but this work proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first method is based on a Gaussian mixture model (GMM), while the second utilises the temporal correlation available from a hidden Markov model (HMM) framework. A formal measurement of both frame classification accuracy and RMS pitch error shows that an HMM-based scheme with 5 clusters per state is able to classify correctly over 94% of frames and has an RMS pitch error of 3.1 Hz in comparison to a reference pitch. Informal listening tests and analysis of spectrograms reveal that speech reconstructed solely from the MFCC vectors is almost indistinguishable from speech reconstructed using the reference pitch.
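
As an illustration of the GMM-based idea, the following is a minimal sketch, not the authors' implementation. It assumes a GMM has already been trained on joint vectors [MFCC, pitch] with pitch as the last dimension. For one MFCC vector it picks the most probable cluster and returns that cluster's conditional mean of pitch. The function names, parameter layout, and the choice of full covariances are assumptions made for this example.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + maha)

def predict_pitch(x, weights, means, covs):
    """Predict pitch (Hz) from one MFCC vector x.

    Assumed layout (hypothetical): means[k] has length D+1 and covs[k]
    has shape (D+1, D+1), where the first D dimensions are the MFCCs
    and the last dimension is pitch.
    """
    D = len(x)
    # Score each cluster by prior weight times MFCC likelihood.
    scores = [
        np.log(w) + log_gaussian(x, m[:D], c[:D, :D])
        for w, m, c in zip(weights, means, covs)
    ]
    k = int(np.argmax(scores))
    m, c = means[k], covs[k]
    # Conditional mean of pitch given the MFCCs within cluster k.
    return m[D] + c[D, :D] @ np.linalg.solve(c[:D, :D], x - m[:D])
```

In this sketch the temporal modelling of the HMM-based scheme is not represented; each frame is predicted independently.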
Original language: English
Pages: I-97-100
Publication status: Published - May 2004
Event: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) - Philadelphia, United States
Duration: 18 Mar 2005 - 23 Mar 2005

Conference

Conference: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP)
Country/Territory: United States
City: Philadelphia
Period: 18/03/05 - 23/03/05
