Two-dimensional (2D) shape and appearance models are applied to the problem of creating a near-videorealistic talking head. A speech corpus of a talker uttering a set of phonetically balanced training sentences is analysed using a generative model of the human face. Segments of original parameter trajectories, corresponding to the synthesis unit (e.g. a triphone), are extracted from a codebook, then normalised, blended, concatenated and smoothed before being applied to the model to give natural, realistic animations of novel utterances. The system provides a 2D image sequence corresponding to the face of a talker. It is also used to animate the face of a 3D avatar by displacing the mesh according to movements of points in the shape model and dynamically texturing the face polygons using the appearance model.
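The synthesis pipeline summarised in the abstract (extract, normalise, blend, concatenate, smooth, apply to the model) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the linear resampling used for normalisation, the linear cross-fade, the moving-average smoother and the linear shape model `mean_shape + modes @ b` are all assumptions chosen for illustration, and the abstract does not specify which techniques the system actually uses.

```python
import numpy as np


def resample(segment, n_frames):
    """Duration normalisation (assumed): linearly resample a
    (frames, params) trajectory to n_frames."""
    segment = np.asarray(segment, dtype=float)
    src = np.linspace(0.0, 1.0, len(segment))
    dst = np.linspace(0.0, 1.0, n_frames)
    columns = [np.interp(dst, src, segment[:, k]) for k in range(segment.shape[1])]
    return np.column_stack(columns)


def concatenate_with_blend(segments, overlap):
    """Join segments, cross-fading `overlap` frames (>= 1) at each boundary.
    A linear cross-fade is an illustrative stand-in for the blending step."""
    out = np.asarray(segments[0], dtype=float)
    # Weights strictly between 0 and 1, one per overlapping frame.
    w = np.linspace(0.0, 1.0, overlap + 2)[1:-1, None]
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=float)
        blended = (1.0 - w) * out[-overlap:] + w * seg[:overlap]
        out = np.vstack([out[:-overlap], blended, seg[overlap:]])
    return out


def smooth(trajectory, window=3):
    """Moving-average smoothing along time (odd window), edge-padded so
    the number of frames is unchanged. The smoother is an assumption."""
    trajectory = np.asarray(trajectory, dtype=float)
    pad = window // 2
    padded = np.pad(trajectory, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    columns = [np.convolve(padded[:, k], kernel, mode="valid")
               for k in range(trajectory.shape[1])]
    return np.column_stack(columns)


def synthesise_shapes(params, mean_shape, modes):
    """Assumed linear shape model: one row of landmark coordinates per
    frame, computed as mean_shape + modes @ b for each parameter vector b."""
    return mean_shape + np.asarray(params, dtype=float) @ modes.T


# Hypothetical codebook: synthesis unit -> stored parameter trajectory.
codebook = {
    "unit_a": np.zeros((5, 2)),
    "unit_b": np.ones((6, 2)),
}
units = [resample(codebook[name], 4) for name in ("unit_a", "unit_b")]
trajectory = smooth(concatenate_with_blend(units, overlap=2))

rng = np.random.default_rng(0)
mean_shape = np.zeros(8)            # four 2D landmarks, flattened
modes = rng.normal(size=(8, 2))     # two modes of shape variation
shapes = synthesise_shapes(trajectory, mean_shape, modes)
```

Here each row of `shapes` holds the landmark coordinates for one output frame. In a full system the same parameter trajectory would also drive an appearance model to produce the image for each frame.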
|Number of pages||10|
|Publication status||Published - 2003|
|Event||British Machine Vision Conference - Oxford Brookes University, Oxford, United Kingdom|
Duration: 5 Sep 2005 → 8 Sep 2005