Integrated Pitch and MFCC Extraction for Speech Reconstruction and Speech Recognition Applications

Research output: Contribution to conference › Paper

6 Citations (Scopus)

Abstract

This paper proposes an integrated speech front-end for both speech recognition and speech reconstruction applications. Speech is first decomposed into a set of frequency bands by an auditory model, whose output is then used to extract both robust pitch estimates and MFCC vectors. Initial tests used a 128-channel auditory model, but results show that this can be reduced significantly, to between 23 and 32 channels. A detailed analysis of pitch classification accuracy and RMS pitch error shows the system to be more robust than both comb-function and LPC-based pitch extraction. Speech recognition results show that the auditory-based cepstral coefficients give very similar performance to conventional MFCCs. Spectrograms and informal listening tests also reveal that speech reconstructed from the auditory-based cepstral coefficients and pitch has similar quality to that reconstructed from conventional MFCCs and pitch.
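To illustrate the kind of front-end the abstract describes, the sketch below computes cepstral coefficients and a pitch estimate from a gammatone-style auditory filterbank. It is a minimal approximation, not the authors' implementation: the gammatone filter shape, ERB-spaced centre frequencies, frame sizes, pitch search range, and voicing threshold are all illustrative assumptions, and the 26-channel count is simply a value within the 23-32 range reported in the abstract.

```python
# Minimal sketch (not the authors' code) of an auditory-filterbank front-end
# producing cepstral coefficients and a per-frame pitch estimate. All
# parameter values below are illustrative assumptions.
import numpy as np


def erb_space(low_hz, high_hz, n_channels):
    """ERB-rate spaced centre frequencies between low_hz and high_hz."""
    ear_q, min_bw = 9.26449, 24.7
    low = np.log(low_hz + ear_q * min_bw)
    high = np.log(high_hz + ear_q * min_bw)
    return np.exp(np.linspace(low, high, n_channels)) - ear_q * min_bw


def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Finite-length gammatone impulse response at centre frequency fc."""
    t = np.arange(0, duration, 1.0 / fs)
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)        # equivalent rectangular bandwidth
    env = t ** (order - 1) * np.exp(-2 * np.pi * 1.019 * erb * t)
    ir = env * np.cos(2 * np.pi * fc * t)
    return ir / (np.sqrt(np.sum(ir ** 2)) + 1e-12)  # unit-energy normalisation


def auditory_frontend(x, fs, n_channels=26, frame_len=0.025, hop=0.010, n_ceps=13):
    """Return (cepstra, pitch_hz) per frame from a gammatone filterbank."""
    fcs = erb_space(100.0, 0.9 * fs / 2.0, n_channels)
    bands = np.stack([np.convolve(x, gammatone_ir(fc, fs), mode="same") for fc in fcs])

    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + max(0, (x.size - flen) // fhop)
    lag_min, lag_max = int(fs / 400), int(fs / 60)   # ~60-400 Hz pitch search range
    cepstra, pitch = [], []
    for i in range(n_frames):
        seg = bands[:, i * fhop: i * fhop + flen]
        # Cepstral coefficients: DCT-II of the per-channel log energies,
        # computed explicitly to keep the sketch dependency-free.
        log_e = np.log(np.sum(seg ** 2, axis=1) + 1e-10)
        k = np.arange(n_ceps)[:, None]
        n = np.arange(n_channels)[None, :]
        dct_basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_channels))
        cepstra.append(dct_basis @ log_e)
        # Pitch: peak of the summary autocorrelation pooled across channels.
        acs = np.array([np.correlate(c, c, mode="full")[c.size - 1:] for c in seg])
        summary = acs.sum(axis=0)
        lag = lag_min + np.argmax(summary[lag_min:lag_max])
        voiced = summary[lag] > 0.3 * summary[0]      # crude voicing decision
        pitch.append(fs / lag if voiced else 0.0)
    return np.array(cepstra), np.array(pitch)


if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 0.5, 1.0 / fs)
    x = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)  # voiced-like test tone
    ceps, f0 = auditory_frontend(x, fs)
    print(ceps.shape, np.median(f0[f0 > 0]))          # expect a pitch estimate near 150 Hz
```

Pooling autocorrelation evidence across auditory channels before picking the pitch lag is one plausible reading of how such a front-end can be more robust than single-signal comb-function or LPC-based pitch extraction, as the abstract reports.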
Original language: English
Pages: 1725-1728
Number of pages: 4
Publication status: Published - Sep 2003
Event: Eurospeech-2003 — 8th European Conference on Speech Communication and Technology - Geneva, Switzerland
Duration: 1 Sep 2003 - 4 Sep 2003

Conference

Conference: Eurospeech-2003 — 8th European Conference on Speech Communication and Technology
Country/Territory: Switzerland
City: Geneva
Period: 1/09/03 - 4/09/03
