Abstract
We describe experiments in visual-only language identification (VLID), in which only lip shape, appearance and motion are used to determine the language of a spoken utterance. In previous work, we showed that this is possible in speaker-dependent mode, i.e. identifying the language spoken by a multilingual speaker. Here, by appropriately modifying techniques that have been successful in audio language identification, we extend the work to discriminating two languages in speaker-independent mode. Our results indicate that reasonable discrimination can be obtained even with viseme accuracy as low as about 34%. A simulation of degraded viseme recognition performance indicates that high VLID accuracy should be achievable with viseme recognition error rates of the order of 50%.
| Original language | English |
|---|---|
| Pages | 5026-5029 |
| Number of pages | 4 |
| Publication status | Published - Mar 2010 |
| Event | IEEE International Conference on Acoustics, Speech, and Signal Processing, Dallas, United States. Duration: 14 Mar 2010 → 19 Mar 2010 |
Conference
| Conference | IEEE International Conference on Acoustics, Speech, and Signal Processing |
|---|---|
| Country/Territory | United States |
| City | Dallas |
| Period | 14/03/10 → 19/03/10 |