Clean speech reconstruction from MFCC vectors and fundamental frequency using an integrated front-end

Ben P. Milner, Xu Shao

Research output: Contribution to journalArticlepeer-review

28 Citations (Scopus)


The aim of this work is to enable a noise-free time-domain speech signal to be reconstructed from a stream of MFCC vectors and fundamental frequency and voicing estimates, such as may be received in a distributed speech recognition system. To facilitate reconstruction, both a sinusoidal model and a source-filter model of speech are compared by listening tests and spectrogram analysis, with the result that the former provides higher quality speech reconstruction. Analysis of the sinusoidal model shows that for clean speech reconstruction, both a noise-free spectral envelope and a robust estimate of the fundamental frequency and voicing are necessary. Investigation into fundamental frequency estimation reveals that an auditory model based approach gives superior performance over other methods of estimation. This leads to the proposal of an integrated front-end which uses the auditory model for both fundamental frequency and voicing estimation, and as the filterbank stage in MFCC extraction, and thereby reduces computation. Applying spectral subtraction to the auditory model parameters improves the spectral envelope estimates needed for clean speech reconstruction. Experiments on the Aurora connected digits database show that the auditory model-based MFCCs give comparable performance to that attained with conventional MFCCs. Speech reconstruction tests reveal that the combination of robust fundamental frequency and voicing estimation with spectral subtraction in the integrated front-end leads to intelligible and relatively noise-free speech.
Original languageEnglish
Pages (from-to)697-715
Number of pages19
JournalSpeech Communication
Issue number6
Publication statusPublished - Jun 2006

Cite this