Transform-based feature vector compression for distributed speech recognition

Ben P. Milner, Xu Shao

Research output: Contribution to conference (paper)

Abstract

Distributed speech recognition (DSR) has recently become an active area of research. One of the main issues in DSR is the need to compress the feature vector stream, produced on the terminal device, to a bit rate low enough for it to be sent across low-bandwidth channels. This work proposes a compression technique that first transforms a block of feature vectors into a more compact matrix representation. Columns of the resulting matrix that correspond to faster temporal variation can be removed without loss in recognition performance. The number of bits allocated to each remaining coefficient in the matrix is determined automatically, based on a measure of the information present. Experiments show that the transform-based compression gives good recognition accuracy at bit rates of 4800, 2400 and 1200 bps. For example, at 1200 bps the recognition performance is 98.03%, compared to 98.57% with uncompressed speech.
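
The pipeline described in the abstract (block transform, removal of fast-varying columns, automatic bit allocation) can be illustrated with a minimal sketch. The abstract does not name the transform, the information measure, the block length or the frame rate, so everything specific here is an assumption made for illustration: a DCT-II along the time axis stands in for the transform, coefficient variance with a greedy one-bit-at-a-time rule stands in for the information measure, and all function names are hypothetical. It is not the authors' implementation.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th temporal basis function.

    Assumption: the abstract does not say which transform is used.
    """
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n)
                     for t in range(n)])
    return rows


def transform_block(block, keep):
    """Transform a block of T feature vectors (each of length D).

    Returns a D x keep matrix. Row d holds the `keep` lowest
    temporal-frequency coefficients of feature d across the block;
    the columns for faster temporal variation are simply not computed.
    """
    n = len(block)
    dims = len(block[0])
    basis = dct_matrix(n)
    return [[sum(basis[k][t] * block[t][d] for t in range(n))
             for k in range(keep)]
            for d in range(dims)]


def allocate_bits(variances, total_bits):
    """Greedy bit allocation over a flat list of coefficient variances.

    Assumption: variance is used as the information measure. Each bit
    goes to the coefficient with the largest remaining variance, which
    is then divided by 4 (roughly 6 dB less quantization error per bit).
    """
    bits = [0] * len(variances)
    remaining = list(variances)
    for _ in range(total_bits):
        i = max(range(len(remaining)), key=remaining.__getitem__)
        bits[i] += 1
        remaining[i] /= 4.0
    return bits


def bits_per_block(bit_rate_bps, frames_per_block, frames_per_second):
    """Bit budget available for one block at a given channel bit rate."""
    return bit_rate_bps * frames_per_block // frames_per_second
```

As a usage example under purely illustrative settings (100 feature vectors per second and blocks of 8 vectors, neither taken from the paper), `bits_per_block(1200, 8, 100)` gives the bit budget for one block at 1200 bps, and that budget would be passed to `allocate_bits` together with variances estimated from the outputs of `transform_block` on training data.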
Original language: English
Pages: 2233-2236
Number of pages: 4
Publication status: Published - Sep 2002
Event: 7th International Conference on Spoken Language Processing (ICSLP-2002) - Denver, Colorado, United States
Duration: 16 Sep 2002 to 20 Sep 2002

