Speech feature extraction and reconstruction

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

This chapter is concerned with feature extraction and back-end speech reconstruction, and is particularly aimed at distributed speech recognition (DSR) and the work carried out by the ETSI Aurora group. Feature extraction is examined first, beginning with a basic implementation of mel-frequency cepstral coefficients (MFCCs). Additional processing, in the form of noise and channel compensation, is then explained, with the aim of increasing speech recognition accuracy in real-world environments. Source and channel coding issues relevant to DSR are also briefly discussed. Back-end speech reconstruction using a sinusoidal model is explained, and it is shown how this is made possible by transmitting additional source information (voicing and fundamental frequency) from the terminal device. An alternative method of back-end speech reconstruction is then explained, in which the voicing and fundamental frequency are predicted from the received MFCC vectors. This enables speech to be reconstructed solely from the MFCC vector stream and requires no explicit transmission of voicing and fundamental frequency.
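
For orientation, the following is a minimal sketch of the kind of basic MFCC extraction the abstract refers to, not the chapter's or the ETSI Aurora standard's exact front-end; the sample rate, frame length, filterbank size, and number of cepstral coefficients are assumptions chosen for illustration.

    # Illustrative MFCC sketch: windowed power spectrum -> mel filterbank
    # -> log -> DCT-II. Parameter values (8 kHz, 23 filters, 13 coefficients)
    # are assumptions, not taken from the chapter.
    import numpy as np

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_filterbank(n_filters, n_fft, sample_rate):
        # Triangular filters spaced evenly on the mel scale.
        mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                                 n_filters + 2)
        bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
        fbank = np.zeros((n_filters, n_fft // 2 + 1))
        for i in range(1, n_filters + 1):
            left, centre, right = bins[i - 1], bins[i], bins[i + 1]
            for k in range(left, centre):
                fbank[i - 1, k] = (k - left) / max(centre - left, 1)
            for k in range(centre, right):
                fbank[i - 1, k] = (right - k) / max(right - centre, 1)
        return fbank

    def mfcc(frame, sample_rate=8000, n_fft=256, n_filters=23, n_ceps=13):
        # Hamming window, power spectrum of one short frame of speech.
        windowed = frame * np.hamming(len(frame))
        spectrum = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
        # Mel filterbank energies, floored before taking the logarithm.
        energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
        log_energies = np.log(np.maximum(energies, 1e-10))
        # DCT-II decorrelates the log energies; keep the first n_ceps terms.
        n = np.arange(n_filters)
        dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
        return dct @ log_energies

    # Example: one 25 ms frame (200 samples at 8 kHz) of a synthetic signal.
    frame = np.random.randn(200)
    print(mfcc(frame))

Noise and channel compensation, source and channel coding, and the sinusoidal-model reconstruction from the MFCC stream (with transmitted or predicted voicing and fundamental frequency) are treated in the chapter itself and are not represented in this sketch.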
Original language: English
Title of host publication: Automatic Speech Recognition on Mobile Devices and over Communication Networks
Publisher: Springer
Pages: 107–130
Number of pages: 24
Volume: Chapter 6
DOIs
Publication status: Published - 2008

Publication series

Name: Advances in Pattern Recognition