Two dimensional (2D) shape and appearance models are applied to the problem of creating a near-videorealistic talking head. A speech corpus of a talker uttering a set of phonetically balanced training sentences is analysed using a generative model of the human face. Segments of original parameter trajectories, corresponding to the synthesis unit (e.g.~triphone), are extracted from a codebook, then normalised, blended, concatenated and smoothed before being applied to the model to give natural, realistic animations of novel utterances. The system provides a 2D image sequence corresponding to the face of a talker. It is also used to animate the face of a 3D avatar by displacing the mesh according to movements of points in the shape model and dynamically texturing the face polygons using the appearance model.
|Number of pages
|Published - 2003
|British Machine Vision Conference - Oxford Brookes University, Oxford, United Kingdom
Duration: 5 Sep 2005 → 8 Sep 2005
|British Machine Vision Conference
|5/09/05 → 8/09/05