A Real-Time Speech-Driven Talking Head using Active Appearance Models

Barry-John Theobald, Nicholas Wilkinson

Research output: Contribution to conference › Paper

Abstract

In this paper we describe a real-time speech-driven method for synthesising realistic video sequences of a subject enunciating arbitrary phrases. In an offline training phase, an active appearance model (AAM) is constructed from hand-labelled images and used to encode the face of a subject reciting a few training sentences. Canonical correlation analysis (CCA) coupled with linear regression is then used to model the relationship between auditory and visual features, which is later used to predict visual features from auditory features for novel utterances. We present results from experiments conducted to determine: 1) the suitability of several auditory features for use in an AAM-based speech-driven talking head, 2) the effect of the size of the training set on the correlation between the auditory and visual features, 3) the influence of context on the degree of correlation, and 4) the appropriate window size over which the auditory features should be calculated. This approach shows promise, and a longer-term goal is to develop a fully expressive, three-dimensional talking head.
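
As an illustration of the audio-to-visual mapping the abstract outlines, the sketch below fits CCA between auditory features and AAM visual parameters, then regresses from the audio-side canonical variates to the visual parameters. This is a minimal sketch with placeholder data, not the authors' implementation: the feature choices (e.g. 13 MFCC-like coefficients), dimensions, and the exact coupling of CCA with regression are assumptions for illustration only.

```python
# A minimal sketch (not the authors' code) of the audio-to-visual mapping the
# abstract describes: CCA coupled with linear regression, predicting AAM
# parameters from auditory features. All dimensions and data are placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Placeholder training data: one row per video frame.
n_frames, n_audio, n_visual = 500, 13, 20   # e.g. 13 MFCCs, 20 AAM parameters
X_audio = rng.normal(size=(n_frames, n_audio))    # auditory features
Y_visual = rng.normal(size=(n_frames, n_visual))  # AAM appearance parameters

# 1) CCA finds maximally correlated linear projections of the two spaces.
cca = CCA(n_components=8)
U, V = cca.fit_transform(X_audio, Y_visual)       # canonical variates

# 2) A linear regression maps the audio-side canonical variates to the AAM
#    parameters (one straightforward reading of "CCA coupled with regression").
reg = LinearRegression().fit(U, Y_visual)

# Synthesis for a novel utterance: project its auditory features into the
# canonical space, then regress to visual (AAM) parameters frame by frame.
X_new = rng.normal(size=(10, n_audio))
Y_pred = reg.predict(cca.transform(X_new))
print(Y_pred.shape)  # (10, 20): predicted AAM parameters per frame
```

In a full pipeline, the auditory features would be computed over the window sizes investigated in the paper, and the predicted AAM parameter trajectories rendered back to video through the appearance model.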
Original language: English
Publication status: Published - 2007
Event: International Conference on Audio-Visual Speech Processing (AVSP) - Kasteel Groenendaal, Hilvarenbeek, Netherlands
Duration: 31 Aug 2007 - 3 Sep 2007

Conference

Conference: International Conference on Audio-Visual Speech Processing (AVSP)
Country/Territory: Netherlands
City: Hilvarenbeek
Period: 31/08/07 - 03/09/07