Abstract
There are many circumstances in which it is useful or necessary to recognise phones rather than words, but phone recognition is inherently less accurate than word recognition. We describe here a post-recognition method for "translating" an errorful phone string output by a speech recogniser into a string that more closely matches the transcription. The technique owes something to Kohonen's idea of "dynamically expanding context" in that it learns from the errors made by the recogniser in a particular context, but it uses many contexts rather than a single context to estimate the "translation" of a recognised phone. The weights given to the different contexts in estimating the translation are determined discriminatively. On the WSJCAM0 database, the technique gives a 19.2% relative improvement in phone errors (including insertions) over the baseline, compared with a 6.2% improvement obtained using dynamically expanding context.
Original language | English |
---|---|
Pages | 2061-2064 |
Number of pages | 4 |
Publication status | Published - 2004 |
Event | 8th International Conference on Spoken Language Processing (Interspeech 2004), Jeju Island, South Korea. Duration: 4 Oct 2004 → 8 Oct 2004 |
Conference
Conference | 8th International Conference on Spoken Language Processing (Interspeech 2004) |
---|---|
Country/Territory | South Korea |
City | Jeju Island |
Period | 4/10/04 → 8/10/04 |