The paper presents a novel method for the low bit-rate compression of a feature vector stream with particular application to distributed speech recognition. The scheme operates by grouping feature vectors into non-overlapping blocks and applying a transformation to give a more compact matrix representation. Both Karhunen-Loeve and discrete cosine transforms are considered. Following transformation, higher-order columns of the matrix can be removed without loss in recognition performance. The number of bits allocated to the remaining elements in the matrix is determined automatically using a measure of their relative information content. Analysis of the amplitude distribution of the elements indicates that non-linear quantisation is more appropriate than linear quantisation. Comparative results, based on both spectral distortion and speech recognition accuracy, confirm this. Speech recognition tests using the ETSI Aurora database demonstrate that compression to bit rates of 2400 bps, 1200 bps and 800 bps has very little effect on recognition accuracy. For example, at a bit rate of 1200 bps, recognition accuracy is 98.0% compared to 98.6% with no compression.
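The block-transform idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the transform is a DCT applied along the time axis of each block, it uses coefficient variance as a stand-in for the "relative information content" measure (which the abstract does not define), and every function name, the block length of 8 and the 100 frames/s rate are hypothetical. Quantisation itself is not shown.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis, returned as n rows of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n))
                     for t in range(n)])
    return rows

def transform_block(block, keep):
    """Transform a block of n feature vectors (each of dimension d).

    Assumption: the DCT runs along time, so row i of the result holds the
    temporal coefficients of feature i. Only the first `keep` columns are
    retained; the higher-order columns are dropped.
    """
    n, d = len(block), len(block[0])
    basis = dct_matrix(n)
    return [[sum(basis[k][t] * block[t][i] for t in range(n))
             for k in range(keep)]
            for i in range(d)]

def inverse_block(coeffs, n):
    """Rebuild an n-frame block, treating the dropped columns as zero."""
    basis = dct_matrix(n)
    d, keep = len(coeffs), len(coeffs[0])
    return [[sum(basis[k][t] * coeffs[i][k] for k in range(keep))
             for i in range(d)]
            for t in range(n)]

def allocate_bits(variances, total_bits):
    """Greedy bit allocation (an assumed stand-in for the paper's measure).

    Each bit goes to the element with the largest remaining variance, which
    is then divided by 4 (roughly 6 dB of quantisation noise per bit).
    """
    bits = [0] * len(variances)
    remaining = list(variances)
    for _ in range(total_bits):
        j = max(range(len(remaining)), key=remaining.__getitem__)
        bits[j] += 1
        remaining[j] /= 4.0
    return bits

def bit_rate(bits_per_block, block_len, frame_rate_hz):
    """Bits per second when one block covers `block_len` frames."""
    return bits_per_block * frame_rate_hz / block_len

# Hypothetical figures: 96 bits per 8-frame block at 100 frames/s.
rate = bit_rate(96, 8, 100)
```

Under these assumed figures, a 1200 bps budget corresponds to 96 bits shared among the retained matrix elements of each block, and `allocate_bits` decides how those bits are distributed.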
Published: Apr 2003
Conference: IEEE International Conference on Acoustics Speech and Signal Processing, Hong Kong, China
Duration: 6 Apr 2003 → 10 Apr 2003