Abstract
A study is presented to determine the relative importance of different visual features for speech recognition, including pixel-based, model-based, contour-based and physical features. The discriminability of the features is analysed using F-ratio and J-measures for both the static features and their temporal derivatives, and the results were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, and the resulting basis functions are analysed further. An optimal feature vector is obtained that outperforms the best individual feature (active appearance model, AAM), with 93.5% word accuracy.
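The abstract names the two discriminability measures but does not define them. The sketch below assumes common textbook forms, which may differ from the paper's exact formulation. The F-ratio of a scalar feature dimension is taken as the variance of the class means divided by the average within-class variance. The J-measure is taken as trace(S_w^-1 S_b), with class priors proportional to sample counts. Temporal derivatives use an assumed regression window of ±2 frames, with the edge frames repeated. The function names, class labels and toy data are hypothetical; this is an illustration, not the authors' implementation.

```python
from statistics import mean, pvariance

Vector = list[float]


def deltas(frames: list[Vector], n: int = 2) -> list[Vector]:
    """Regression-based temporal derivatives over +/-n frames (assumed window),
    repeating the first/last frame at the sequence edges."""
    denom = 2 * sum(k * k for k in range(1, n + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            acc = sum(
                k * (frames[min(t + k, last)][d] - frames[max(t - k, 0)][d])
                for k in range(1, n + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def f_ratio(classes: dict[str, list[float]]) -> float:
    """F-ratio of one feature dimension: variance of the class means
    divided by the average within-class variance (assumed definition)."""
    means = [mean(values) for values in classes.values()]
    within = mean(pvariance(values) for values in classes.values())
    return pvariance(means) / within


def _mean_cov(vectors: list[Vector]) -> tuple[Vector, list[Vector]]:
    """Sample mean and biased (population) covariance matrix."""
    dim = len(vectors[0])
    mu = [mean(v[i] for v in vectors) for i in range(dim)]
    cov = [
        [mean((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) for j in range(dim)]
        for i in range(dim)
    ]
    return mu, cov


def _solve(a: list[Vector], b: list[Vector]) -> list[Vector]:
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [x / scale for x in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def j_measure(classes: dict[str, list[Vector]]) -> float:
    """Multivariate separability J = trace(Sw^-1 Sb), with class priors
    proportional to the number of vectors in each class (assumed)."""
    all_vecs = [v for vecs in classes.values() for v in vecs]
    total, dim = len(all_vecs), len(all_vecs[0])
    mu, _ = _mean_cov(all_vecs)
    sw = [[0.0] * dim for _ in range(dim)]  # within-class scatter
    sb = [[0.0] * dim for _ in range(dim)]  # between-class scatter
    for vecs in classes.values():
        p = len(vecs) / total
        mu_k, cov_k = _mean_cov(vecs)
        for i in range(dim):
            for j in range(dim):
                sw[i][j] += p * cov_k[i][j]
                sb[i][j] += p * (mu_k[i] - mu[i]) * (mu_k[j] - mu[j])
    x = _solve(sw, sb)  # X = Sw^-1 Sb
    return sum(x[i][i] for i in range(dim))
```

With a single feature dimension and equally sized classes, these definitions make J equal to the F-ratio. For example, `f_ratio({"a": [1.0, 2.0, 3.0], "b": [5.0, 6.0, 7.0]})` and `j_measure({"a": [[1.0], [2.0], [3.0]], "b": [[5.0], [6.0], [7.0]]})` both give 6.0, which provides a quick consistency check. To score the temporal derivatives, apply `deltas` to each utterance's frame sequence before pooling the vectors by class.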
Original language | English |
---|---|
Publication status | Published - 2015 |
Event | FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing - Vienna, Austria. Duration: 11 Sep 2015 → 13 Sep 2015. http://www.isca-speech.org/archive/avsp15/av15_127.html |
Conference
Conference | FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing |
---|---|
Abbreviated title | FAAVSP 2015 |
Country/Territory | Austria |
City | Vienna |
Period | 11/09/15 → 13/09/15 |
Profiles
- Ben Milner
  - School of Computing Sciences - Senior Lecturer
  - Interactive Graphics and Audio - Member
  - Smart Emerging Technologies - Member