This paper addresses the problem of achieving robust distributed speech recognition in the presence of burst-like packet loss. To compensate for packet loss a number of techniques are investigated to provide estimates of lost vectors. Experimental results on both a connected digits task and a large vocabulary continuous speech recognition task show that simple methods, such as repetition, are not as effective as interpolation methods which are better able to preserve the dynamics of the feature vector stream. Best performance is given by maximum a-posteriori (MAP) estimation of lost vectors which utilizes statistics of the feature vector stream. At longer burst lengths the performance of these compensation techniques deteriorates as the temporal correlation in the received feature vector stream reduces. To compensate for this interleaving is proposed which aims to disperse bursts of loss into a series of unconnected smaller bursts. Results show substantial gains in accuracy, to almost that of the no loss condition, when interleaving is combined with estimation techniques, although this is at the expense of introducing delay. This leads to the proposal that, for a distributed speech recognition application, it is more beneficial to trade delay for accuracy rather than trading bit-rate for accuracy as in forward error correction schemes.
|Number of pages||9|
|Journal||IEEE Transactions on Audio, Speech, and Language Processing|
|Publication status||Published - 2006|