Abstract
This paper proposes an integrated speech front-end for both speech recognition and speech reconstruction applications. Speech is first decomposed into a set of frequency bands by an auditory model. The model's output is then used to extract both robust pitch estimates and MFCC vectors. Initial tests used a 128-channel auditory model, but results show that the number of channels can be reduced significantly, to between 23 and 32. A detailed analysis of the pitch classification accuracy and the RMS pitch error shows the system to be more robust than both comb-function and LPC-based pitch extraction. Speech recognition results show that the auditory-based cepstral coefficients give very similar performance to conventional MFCCs. Spectrograms and informal listening tests also reveal that speech reconstructed from the auditory-based cepstral coefficients and pitch has similar quality to speech reconstructed from conventional MFCCs and pitch.
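As a rough illustration of the cepstral step only, the sketch below maps per-frame channel energies from a filterbank to cepstral vectors by log compression followed by a DCT-II, as is done for conventional MFCCs. The abstract does not specify how the auditory model's output is converted to cepstra, so the function names, the 13-coefficient default, the energy floor, and the random input are all assumptions made for illustration, not the paper's method.

```python
import numpy as np


def dct_ii_matrix(n_channels, n_ceps):
    """Unnormalised DCT-II basis of shape (n_ceps, n_channels)."""
    k = np.arange(n_ceps)[:, None]       # cepstral index
    n = np.arange(n_channels)[None, :]   # channel index
    return np.cos(np.pi * k * (n + 0.5) / n_channels)


def channel_energies_to_cepstra(energies, n_ceps=13, floor=1e-10):
    """Map channel energies (frames x channels) to cepstral vectors.

    Assumed pipeline (hypothetical, MFCC-style): floor, log, DCT-II.
    """
    energies = np.asarray(energies, dtype=float)
    log_e = np.log(np.maximum(energies, floor))  # avoid log(0)
    basis = dct_ii_matrix(energies.shape[1], n_ceps)
    return log_e @ basis.T


# Hypothetical input: 5 frames from a 23-channel model (random stand-in data).
rng = np.random.default_rng(0)
frames = rng.uniform(0.1, 1.0, size=(5, 23))
cepstra = channel_energies_to_cepstra(frames)
print(cepstra.shape)  # (5, 13)
```

Because the basis is built from the number of input channels, the same function accepts a 128-channel or a 23-channel front-end without changes; only the width of the input array differs.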
Original language | English |
---|---|
Pages | 1725-1728 |
Number of pages | 4 |
Publication status | Published - Sep 2003 |
Event | Eurospeech-2003 — 8th European Conference on Speech Communication and Technology, Geneva, Switzerland. Duration: 1 Sep 2003 → 4 Sep 2003 |
Conference
Conference | Eurospeech-2003 — 8th European Conference on Speech Communication and Technology |
---|---|
Country/Territory | Switzerland |
City | Geneva |
Period | 1/09/03 → 4/09/03 |