In machine lip-reading, the aim is to recognise speech from a visual signal alone. Current work often uses viseme classification supported by language models, with varying degrees of success. A few recent works suggest that phoneme classification, in the right circumstances, can outperform viseme classification. In this work we present a novel two-pass method for training phoneme classifiers which uses previously trained viseme classifiers in the first pass. With our new training algorithm, we show classification performance which significantly improves on previous lip-reading results.
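The two-pass idea can be sketched, very loosely, as: train a viseme classifier first, then train phoneme classifiers within the partition that the viseme classifier induces. Everything below is invented for illustration — the phoneme-to-viseme map, the synthetic lip features, and the nearest-centroid models are stand-ins, not the paper's actual classifiers or data.

```python
# Hypothetical two-pass sketch: pass 1 trains a viseme classifier,
# pass 2 trains per-viseme phoneme classifiers on top of its output.
import random

random.seed(0)

# Invented many-to-one phoneme-to-viseme map (phonemes that look alike
# on the lips share one viseme class).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar",
}
VISEMES = sorted(set(PHONEME_TO_VISEME.values()))
# Invented per-phoneme offsets standing in for residual visual cues.
OFFSET = {"p": 0.0, "b": 0.3, "m": 0.6, "f": 0.0, "v": 0.4, "t": 0.0, "d": 0.4}

def sample(phoneme):
    """Synthetic lip feature: one noisy dimension per viseme plus a cue dim."""
    x = [random.gauss(0.0, 0.1) for _ in VISEMES]
    x[VISEMES.index(PHONEME_TO_VISEME[phoneme])] += 1.0
    x.append(OFFSET[phoneme] + random.gauss(0.0, 0.05))
    return x

def train_centroids(features, labels):
    """Nearest-centroid model: mean feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        counts[y] = counts.get(y, 0) + 1
        sums[y] = [s + v for s, v in zip(sums.get(y, [0.0] * len(x)), x)]
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))

train = [(p, sample(p)) for p in PHONEME_TO_VISEME for _ in range(30)]

# Pass 1: train the viseme classifier on viseme labels.
viseme_model = train_centroids([x for _, x in train],
                               [PHONEME_TO_VISEME[p] for p, _ in train])

# Pass 2: partition the data by predicted viseme and train phoneme
# centroids inside each partition, so pass 1 constrains pass 2.
groups = {}
for p, x in train:
    groups.setdefault(predict(viseme_model, x), []).append((p, x))
phoneme_models = {v: train_centroids([x for _, x in g], [p for p, _ in g])
                  for v, g in groups.items()}

def classify(x):
    return predict(phoneme_models[predict(viseme_model, x)], x)

accuracy = sum(classify(x) == p for p, x in train) / len(train)
```

The design point this toy version captures is only the staging: the phoneme classifiers never compete across viseme boundaries, because the first pass has already resolved the coarse visual class.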
Publication status: Published - 2016
Event: International Conference on Acoustics, Speech, and Signal Processing - Shanghai, China
Duration: 21 Mar 2016 → 25 Mar 2016
- weak learning
- visual speech