Abstract
Automatic control of conversational agents has applications ranging from animation, through human-computer interaction, to robotics. In interactive communication, an agent must move to express its own discourse and also react naturally to incoming speech. In this paper we propose a Flow Variational Autoencoder (Flow-VAE) deep learning architecture for transforming conversational speech into body gesture, during both speaking and listening. The model uses a normalising flow to perform variational inference in an autoencoder framework, yielding a more expressive approximate posterior than the Gaussian approximation of conventional variational autoencoders. Our model is non-deterministic, so it can produce varied yet plausible gestures for the same speech. Our evaluation demonstrates that our approach produces expressive body motion close to the ground truth while using a fraction of the trainable parameters of the previous state of the art.
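The abstract describes using a normalising flow to perform variational inference, giving a richer approximate posterior than a plain Gaussian. The sketch below illustrates that general technique, not the authors' architecture: a planar-flow posterior (Rezende & Mohamed, 2015) in PyTorch, where the encoder, layer sizes, and number of flow steps are illustrative assumptions.

```python
# Minimal sketch of a flow-based variational posterior (assumptions throughout;
# this is NOT the paper's Flow-VAE, just the generic technique it builds on).
import torch
import torch.nn as nn


class PlanarFlow(nn.Module):
    """One planar transform f(z) = z + u * tanh(w.z + b)."""

    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.01)
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        # z: (batch, dim)
        lin = z @ self.w + self.b                              # (batch,)
        f = z + self.u * torch.tanh(lin).unsqueeze(-1)         # (batch, dim)
        # log|det df/dz| = log|1 + u.psi|, psi = (1 - tanh^2) * w
        psi = (1 - torch.tanh(lin) ** 2).unsqueeze(-1) * self.w
        log_det = torch.log((1 + psi @ self.u).abs() + 1e-8)   # (batch,)
        return f, log_det


class FlowPosterior(nn.Module):
    """Encoder -> Gaussian base sample -> K planar-flow steps."""

    def __init__(self, in_dim, z_dim, n_flows=4):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 2 * z_dim))
        self.flows = nn.ModuleList(PlanarFlow(z_dim) for _ in range(n_flows))

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        # Reparameterised sample from the Gaussian base distribution.
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        log_det_sum = torch.zeros(x.shape[0], device=x.device)
        for flow in self.flows:
            z, log_det = flow(z)
            # Each transform lowers log q(z_K); the sum is added to the ELBO.
            log_det_sum = log_det_sum + log_det
        return z, mu, log_var, log_det_sum


# Usage with an assumed speech-feature dimension of 30 and a 16-d latent:
x = torch.randn(8, 30)
z, mu, log_var, log_det = FlowPosterior(in_dim=30, z_dim=16)(x)
```

Sampling the base Gaussian more than once for the same input yields different latent codes, which is the non-determinism the abstract relies on to generate varied gestures for identical speech.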
Original language | English
---|---
Title of host publication | CVMP '21: European Conference on Visual Media Production
Pages | 1-9
Number of pages | 9
Publication status | Published - 6 Dec 2021
Keywords
- speech animation
- normalising flows
- conversational agents
- variational autoencoders
Projects
- Dynamically Accurate Avatars (finished)
  Taylor, S.
  Engineering and Physical Sciences Research Council
  25/06/18 → 28/06/22
  Project: Fellowship