This study determines the relative importance of different visual features for speech recognition, including pixel-based, model-based, contour-based and physical features. The discriminability of the features is analysed using F-ratio and J-measures for both static features and temporal derivatives, and the results were found to correlate highly with speech recognition accuracy (r = 0.97). Principal component analysis is then used to combine all visual features into a single feature vector, and the resulting basis functions are analysed further. An optimal feature vector is obtained which outperforms the best individual feature (AAM) with 93.5% word accuracy.
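As a rough illustration of the discriminability analysis mentioned in the abstract, the sketch below computes an F-ratio for a single scalar feature. The abstract does not give the exact formula used in the study, so this assumes one common definition: the variance of the class means divided by the mean within-class variance. The function name `f_ratio` and the toy values are hypothetical and are not data from the paper.

```python
from statistics import mean, pvariance


def f_ratio(samples_by_class):
    """F-ratio for one scalar feature (assumed definition, see lead-in):
    variance of the class means divided by the mean within-class variance."""
    groups = list(samples_by_class.values())
    class_means = [mean(g) for g in groups]
    between = pvariance(class_means)            # spread of the class means
    within = mean([pvariance(g) for g in groups])  # average spread inside a class
    return between / within


# Toy values invented for illustration only (not from the study).
well_separated = {"class_a": [1.0, 1.2, 0.8], "class_b": [5.0, 5.2, 4.8]}
overlapping = {"class_a": [1.0, 3.0, 5.0], "class_b": [2.0, 4.0, 6.0]}

# A feature whose classes are far apart relative to their spread scores higher.
print(f_ratio(well_separated) > f_ratio(overlapping))
```

Under this definition, a larger F-ratio indicates that the classes are better separated along that feature, which is the sense in which such a measure could be compared against recognition accuracy.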
|Publication status||Published - 2015|
|Event||FAAVSP - The 1st Joint Conference on Facial Analysis, Animation and Auditory-Visual Speech Processing - Vienna, Austria|
Duration: 11 Sep 2015 → 13 Sep 2015
|Abbreviated title||FAAVSP 2015|