Distributed speech recognition (DSR) has recently become an active area of research. One of the main issues in DSR is the need to compress the feature vector stream, produced on the terminal device, to a bit rate low enough for transmission across low-bandwidth channels. This work proposes a compression technique that first transforms a block of feature vectors into a more compact matrix representation. Columns of the resulting matrix that correspond to faster temporal variation can be removed without loss in recognition performance. The number of bits allocated to the remaining coefficients in the matrix is determined automatically, based on a measure of the information present. Experiments show that the transform-based compression gives good recognition accuracy at bit rates of 4800, 2400 and 1200 bps. For example, at 1200 bps the recognition performance is 98.03%, compared to 98.57% with uncompressed speech.
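The pipeline described in the abstract (transform a block of feature vectors, drop the columns for fast temporal variation, then share out bits among the remaining coefficients) can be illustrated with a minimal sketch. The abstract does not name the transform or the information measure, so the choices here are assumptions: an orthonormal DCT-II along the time axis stands in for the transform, and a greedy variance-based rule stands in for the automatic bit allocation. The function names `dct_rows`, `truncate_columns`, `inverse_dct_rows` and `allocate_bits` are hypothetical and are not taken from the paper.

```python
import math


def dct_rows(block):
    """Orthonormal DCT-II along time (assumed transform, not confirmed by the paper).

    block: list of T feature vectors, each of length D.
    Returns a D x T matrix; column k holds temporal frequency k.
    """
    T, D = len(block), len(block[0])
    matrix = []
    for d in range(D):
        row = []
        for k in range(T):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / T)
            row.append(scale * sum(
                block[t][d] * math.cos(math.pi * (2 * t + 1) * k / (2 * T))
                for t in range(T)))
        matrix.append(row)
    return matrix


def truncate_columns(matrix, keep):
    """Keep the first `keep` columns, i.e. the slowest temporal variation."""
    return [row[:keep] for row in matrix]


def inverse_dct_rows(matrix, T):
    """Rebuild T feature vectors; dropped columns are treated as zero."""
    block = []
    for t in range(T):
        vec = []
        for row in matrix:
            vec.append(sum(
                math.sqrt((1.0 if k == 0 else 2.0) / T) * c
                * math.cos(math.pi * (2 * t + 1) * k / (2 * T))
                for k, c in enumerate(row)))
        block.append(vec)
    return block


def allocate_bits(variances, total_bits):
    """Greedy allocation (assumed rule): each bit goes to the coefficient
    with the largest remaining variance."""
    remaining = list(variances)
    bits = [0] * len(remaining)
    for _ in range(total_bits):
        i = max(range(len(remaining)), key=remaining.__getitem__)
        bits[i] += 1
        remaining[i] /= 4.0  # one extra bit cuts quantisation noise power by about 4
    return bits


if __name__ == "__main__":
    # Toy block: T = 4 frames of D = 2 features (made-up values).
    block = [[1.0, 0.5], [2.0, -0.5], [4.0, 0.0], [3.0, 1.0]]
    kept = truncate_columns(dct_rows(block), keep=2)
    print(inverse_dct_rows(kept, T=4))
```

In a real system the variances passed to `allocate_bits` would be estimated per coefficient from training data, and the number of kept columns would be tuned against recognition accuracy; both are left open here because the abstract does not specify them.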
|Number of pages||4|
|Publication status||Published - Sep 2002|
|Event||7th International Conference on Spoken Language Processing (ICSLP-2002) - Denver, Colorado, United States|
Duration: 16 Sep 2002 → 20 Sep 2002