Resolution limits on visual speech recognition

Helen L. Bear, Richard Harvey, Yuxuan Lan

Research output: Contribution to conference, Paper, peer-review

17 Citations (Scopus)

Abstract

Visual-only speech recognition depends on a number of factors that can be difficult to control, such as lighting, identity, motion, emotion and expression. But some factors, such as video resolution, are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the effect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However, it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.
Original language: English
Pages: 1371-1375
Publication status: Published - Oct 2014
Event: International Conference on Image Processing - San Diego, United States
Duration: 12 Oct 2008 to 15 Oct 2008

Conference

Conference: International Conference on Image Processing
Country/Territory: United States
City: San Diego
Period: 12/10/08 to 15/10/08
