Abstract
In this paper we investigate the limits of automated lip-reading systems and consider the improvement that could be gained were additional information from other (non-visible) speech articulators available to the recogniser. Hidden Markov model (HMM) speech recognisers are trained using electromagnetic articulography (EMA) data drawn from the MOCHA-TIMIT data set. Articulatory information is systematically withheld from the recogniser, and the performance is tested and compared with that of a typical state-of-the-art lip-reading system. We find that, as expected, the performance of the recogniser degrades as articulatory information is lost, and that a typical lip-reading system achieves a level of performance similar to an EMA-based recogniser that uses information from only the front of the tongue forwards. Our results show that there is significant information in the articulator positions towards the back of the mouth that could be exploited were it available, but even this is insufficient to achieve the same level of performance as an acoustic speech recogniser.
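The abstract's core manipulation, withholding articulatory channels from the recogniser, can be illustrated with a toy sketch. The code below is not the paper's HMM/MOCHA-TIMIT setup: it uses invented channel names, synthetic two-class "phone" data, and a simple nearest-centroid classifier, purely to show how masking back-of-mouth channels removes discriminative information.

```python
import numpy as np

# Toy illustration (assumed setup, not the paper's method): we mimic
# "systematically withholding articulatory information" by masking
# feature columns before training a nearest-centroid classifier on
# synthetic EMA-like data. Channel layout and class structure are invented.

rng = np.random.default_rng(0)

CHANNELS = ["upper_lip", "lower_lip", "tongue_tip",
            "tongue_body", "tongue_dorsum", "velum"]
FRONT = [0, 1, 2]              # visible / front-of-mouth channels only
ALL = list(range(len(CHANNELS)))

def make_data(n=200):
    """Two synthetic phone classes that differ mainly in back-of-mouth channels."""
    X0 = rng.normal(0.0, 1.0, size=(n, 6))
    X1 = rng.normal(0.0, 1.0, size=(n, 6))
    X1[:, 3:] += 4.0           # classes separate in tongue_body/dorsum/velum
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

def nearest_centroid_accuracy(Xtr, ytr, Xte, yte, channels):
    """Train and test using only the selected articulatory channels."""
    c0 = Xtr[ytr == 0][:, channels].mean(axis=0)
    c1 = Xtr[ytr == 1][:, channels].mean(axis=0)
    d0 = np.linalg.norm(Xte[:, channels] - c0, axis=1)
    d1 = np.linalg.norm(Xte[:, channels] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == yte).mean())

Xtr, ytr = make_data()
Xte, yte = make_data()
acc_all = nearest_centroid_accuracy(Xtr, ytr, Xte, yte, ALL)
acc_front = nearest_centroid_accuracy(Xtr, ytr, Xte, yte, FRONT)
print(f"all channels: {acc_all:.2f}")
print(f"front-only:   {acc_front:.2f}")
```

Because the synthetic classes are constructed to separate only in the back-of-mouth channels, accuracy collapses to near chance when those channels are withheld, a caricature of the degradation the paper measures with proper HMM recognisers.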
| Original language | English |
| --- | --- |
| Publication status | Published - 2010 |
| Event | International Conference on Auditory-Visual Speech Processing, Hakone, Kanagawa, Japan; 1 Jan 2010 → … |