Abstract
We present work towards videorealistic synthetic visual speech using non-rigid appearance models. These models are used to track a talking face enunciating a set of training sentences. The resultant parameter trajectories are used in a concatenative synthesis scheme, where samples of original data are extracted from a corpus and concatenated to form new unseen sequences. Here we explore the effect on the synthesiser output of blending several synthesis units considered similar to the desired unit. We present preliminary subjective and objective results that assess the realism of the system.
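The abstract gives no implementation detail for the blending step, but it can be sketched as a weighted average of time-normalised candidate trajectories in the appearance-model parameter space. The sketch below is a minimal illustration under assumptions introduced here, not the authors' method: `blend_units`, `resample_trajectory`, the inverse-distance weighting, and the linear time-normalisation are all hypothetical choices made for this example.

```python
import numpy as np

def resample_trajectory(traj, length):
    """Linearly resample a (frames, params) trajectory to `length` frames.

    Assumption: linear interpolation over a normalised time axis stands in
    for whatever time-alignment the real system uses.
    """
    old_t = np.linspace(0.0, 1.0, traj.shape[0])
    new_t = np.linspace(0.0, 1.0, length)
    # Interpolate each appearance-model parameter independently.
    return np.column_stack(
        [np.interp(new_t, old_t, traj[:, p]) for p in range(traj.shape[1])]
    )

def blend_units(candidates, distances, length):
    """Blend several candidate units into one parameter trajectory.

    candidates -- list of (frames_i, params) arrays drawn from the corpus
    distances  -- match cost of each candidate against the desired unit
                  (smaller means more similar); weighting scheme is assumed
    """
    # Inverse-distance weights, normalised to sum to one, so closer
    # candidates contribute more to the blend.
    w = 1.0 / (np.asarray(distances, dtype=float) + 1e-8)
    w /= w.sum()
    # Time-normalise every candidate to a common length before averaging.
    resampled = [resample_trajectory(c, length) for c in candidates]
    return np.sum([wi * r for wi, r in zip(w, resampled)], axis=0)

# Toy usage: three candidate units of differing durations, 20 parameters.
rng = np.random.default_rng(0)
units = [rng.normal(size=(n, 20)) for n in (12, 15, 10)]
costs = [0.4, 0.9, 1.3]
blended = blend_units(units, costs, length=12)
print(blended.shape)  # (12, 20)
```

Inverse-distance weighting is only one plausible choice; a real system might instead weight by rank or learn the weights, and the blended trajectory would then be rendered back to pixels through the appearance model.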
| Original language | English |
|---|---|
| Pages | 800-803 |
| Number of pages | 4 |
| Publication status | Published - 2003 |
| Event | IEEE International Conference on Acoustics, Speech and Signal Processing, Hong Kong, China, 6 Apr 2003 → 10 Apr 2003 |
Conference
| Conference | IEEE International Conference on Acoustics, Speech and Signal Processing |
|---|---|
| Abbreviated title | ICASSP '03 |
| Country/Territory | China |
| City | Hong Kong |
| Period | 6 Apr 2003 → 10 Apr 2003 |