Abstract
In this paper we present initial work towards a video-realistic visual speech synthesiser based on statistical models of shape and appearance. A synthesised image sequence corresponding to an utterance is formed by concatenating synthesis units (in this case phonemes) drawn from a pre-recorded corpus of training data. A smoothing spline is applied to the concatenated parameters to ensure smooth transitions between frames, and the resultant parameters are applied to the model. Early results look promising.
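The concatenate-then-smooth step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the per-phoneme parameter arrays, the model dimensionality, and the smoothing factor are assumptions made for the example, and a standard univariate smoothing spline (scipy.interpolate.UnivariateSpline) stands in for whichever spline the authors used.

```python
# Minimal sketch: concatenate per-phoneme model parameters, then smooth
# each shape/appearance dimension with a smoothing spline.
# All array shapes and the smoothing factor are illustrative assumptions.
import numpy as np
from scipy.interpolate import UnivariateSpline

def synthesise_trajectory(units, smoothing=0.5):
    """units: list of arrays, each (n_frames_i, n_params);
    returns a smoothed array of shape (total_frames, n_params)."""
    params = np.vstack(units)             # concatenate units frame-wise
    frames = np.arange(params.shape[0])   # frame index as the spline abscissa
    smoothed = np.empty_like(params)
    for d in range(params.shape[1]):      # smooth each model parameter independently
        spline = UnivariateSpline(frames, params[:, d],
                                  s=smoothing * len(frames))
        smoothed[:, d] = spline(frames)
    return smoothed

# Usage with three hypothetical phoneme units of 10-dimensional parameters.
units = [np.random.randn(8, 10), np.random.randn(12, 10), np.random.randn(6, 10)]
trajectory = synthesise_trajectory(units)
```

The smoothed trajectory would then be fed back through the statistical shape-and-appearance model to render each frame of the image sequence.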
| Original language | English |
| --- | --- |
| Pages | 3892-3895 |
| Number of pages | 4 |
| DOIs | |
| Publication status | Published - May 2002 |
| Event | IEEE International Conference on Acoustics, Speech and Signal Processing, Orlando, United States, 13 May 2002 → 17 May 2002 |
Conference
| Conference | IEEE International Conference on Acoustics, Speech and Signal Processing |
| --- | --- |
| Abbreviated title | ICASSP 2002 |
| Country/Territory | United States |
| City | Orlando |
| Period | 13/05/02 → 17/05/02 |