Data-driven synthesis of human motion during conversational speech is an active research area with applications in character animation, computer gaming and conversational agents. Natural-looking motion is key to both the perceived realism and the understanding of any synthesised animation. Multi-modal speech and body-motion data are scarce, so it is common to augment real motion data by mirroring the body pose, which doubles the number of training samples. This augmentation rests on the assumption that a person’s gesturing is not affected by handedness and that the reflected pose is plausible. In this study, we test the validity of this assumption by evaluating the reflective symmetry of a speaker’s arms during conversational exchanges. We analyse the left and right arm motion of 36 subjects during dyadic conversation and present the per-frame symmetry of their arm gestures. To identify temporal offsets caused by the presence of a leading hand, we compute the time lag between movements of the left and right arms. We perform a nearest neighbour search to test whether the mirrored poses are plausible. We also use information theory to quantify the information gained by mirroring the data. Finally, we implement a speech-to-gesture generative model to evaluate the effectiveness of lateral mirroring for data augmentation. Our findings suggest that both positional symmetry and left–right motion offsets vary from speaker to speaker. We conclude that data augmentation by mirroring is valid in certain cases when the mirrored pose is treated as a new virtual identity, but that its use as a generic approach should be carefully considered if the gesturing style and handedness of the original speaker are to be maintained.
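Three of the analyses described above (lateral mirroring, the left–right time lag and the nearest neighbour plausibility check) can be sketched in a few lines. This is a minimal, hypothetical illustration, not the study's implementation. It assumes each pose is a dictionary of 3D joint positions in a body-centred frame with x as the lateral axis. The joint names in `LR_PAIRS` and the functions `mirror_pose`, `best_lag` and `nearest_neighbour` are illustrative. The lag is estimated by plain cross-correlation of mean-centred signals, which may differ from the measure the authors used.

```python
import math
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Pose = Dict[str, Vec3]

# Hypothetical left/right joint pairs; the study's skeleton layout is not given.
LR_PAIRS = [("l_shoulder", "r_shoulder"), ("l_elbow", "r_elbow"), ("l_wrist", "r_wrist")]


def mirror_pose(pose: Pose) -> Pose:
    """Reflect a pose across the sagittal plane.

    Assumes x is the lateral axis and the body is centred at x = 0.
    """
    reflected = {name: (-x, y, z) for name, (x, y, z) in pose.items()}
    # After reflection the former left arm lies on the right side, so swap labels.
    for left, right in LR_PAIRS:
        if left in reflected and right in reflected:
            reflected[left], reflected[right] = reflected[right], reflected[left]
    return reflected


def best_lag(a: List[float], b: List[float], max_lag: int) -> int:
    """Return the lag k in [-max_lag, max_lag] that maximises the mean product
    of mean-centred a[t] and b[t + k]. A positive k means b follows a by k frames.
    """
    if len(a) != len(b):
        raise ValueError("signals must have equal length")
    n = len(a)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be in [0, len(a))")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    ca = [v - mean_a for v in a]
    cb = [v - mean_b for v in b]
    best_k, best_score = 0, float("-inf")
    for k in range(-max_lag, max_lag + 1):
        # Only pair indices where both a[t] and b[t + k] exist.
        lo, hi = max(0, -k), min(n, n - k)
        score = sum(ca[t] * cb[t + k] for t in range(lo, hi)) / (hi - lo)
        if score > best_score:
            best_k, best_score = k, score
    return best_k


def pose_distance(p: Pose, q: Pose) -> float:
    """Euclidean distance over the joints present in both poses."""
    joints = sorted(p.keys() & q.keys())
    return math.sqrt(sum((u - v) ** 2 for j in joints for u, v in zip(p[j], q[j])))


def nearest_neighbour(query: Pose, dataset: List[Pose]) -> Tuple[int, float]:
    """Brute-force nearest neighbour: index and distance of the closest pose."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    dists = [pose_distance(query, q) for q in dataset]
    i = min(range(len(dists)), key=dists.__getitem__)
    return i, dists[i]
```

In this sketch, `best_lag` would be applied to per-frame signals such as left and right wrist speeds. A large `nearest_neighbour` distance for a mirrored pose would suggest it has no close counterpart among the captured poses; what distance counts as implausible is left open here.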
- Speech-driven conversational agents
- Motion symmetry
- Conversational gesture analysis