Noisy audio speech enhancement using Wiener filters derived from visual speech

Ben Milner, Ibrahim Almajai

Research output: Contribution to conference › Other


The aim of this paper is to use visual speech information to create Wiener filters for audio speech enhancement. Wiener filters require estimates of both clean speech statistics and noisy speech statistics. Noisy speech statistics are obtained from the noisy input audio, while obtaining clean speech statistics is more difficult and is a major problem in the creation of Wiener filters for speech enhancement. In this work, the clean speech statistics are estimated from frames of visual speech that are extracted in synchrony with the audio. The estimation procedure begins by modelling the joint density of clean audio and visual speech features using a Gaussian mixture model (GMM). Using the GMM and an input visual speech vector, a maximum a posteriori (MAP) estimate of the audio feature is made. The effectiveness of speech enhancement using the visually-derived Wiener filter has been compared to a conventional audio-based Wiener filter implementation using a perceptual evaluation of speech quality (PESQ) analysis. PESQ scores in train noise at different signal-to-noise ratios (SNRs) show that the visually-derived Wiener filter significantly outperforms the audio-based Wiener filter at lower SNRs.
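The estimation step described in the abstract can be sketched in code. This is a minimal illustration, not the authors' implementation: it uses the closed-form conditional-expectation estimator from a joint audio-visual GMM (closely related to the MAP estimate used in the paper), followed by the standard Wiener gain computed from clean and noisy power spectra. All parameter shapes and names (`weights`, `means`, `covs`, `d_a`) are assumptions for illustration.

```python
import numpy as np

def estimate_audio_from_visual(v, weights, means, covs, d_a):
    """Estimate clean audio features from a visual feature vector.

    Each GMM component k models the joint vector [audio; visual] with
    mixture weight weights[k], mean means[k] and covariance covs[k].
    Returns the posterior-weighted conditional mean of the audio part
    given v (illustrative stand-in for the paper's MAP estimate).
    """
    K = len(weights)
    d_v = means.shape[1] - d_a
    post = np.empty(K)            # responsibility of each component for v
    cond = np.empty((K, d_a))     # per-component conditional audio mean
    for k in range(K):
        mu_a, mu_v = means[k, :d_a], means[k, d_a:]
        S = covs[k]
        S_av, S_vv = S[:d_a, d_a:], S[d_a:, d_a:]
        diff = v - mu_v
        inv_vv = np.linalg.inv(S_vv)
        # marginal Gaussian likelihood of v under component k
        norm = 1.0 / np.sqrt((2 * np.pi) ** d_v * np.linalg.det(S_vv))
        post[k] = weights[k] * norm * np.exp(-0.5 * diff @ inv_vv @ diff)
        # conditional mean of the audio features given v
        cond[k] = mu_a + S_av @ inv_vv @ diff
    post /= post.sum()
    return post @ cond

def wiener_gain(clean_psd, noisy_psd):
    """Wiener filter gain W(f) = S_ss(f) / S_xx(f), clipped to [0, 1]."""
    return np.clip(clean_psd / np.maximum(noisy_psd, 1e-12), 0.0, 1.0)
```

For a single-component GMM the estimator reduces to linear regression of the audio features on the visual features; with more components it becomes a piecewise-linear mapping weighted by the posterior responsibilities, which is what makes the visually-derived clean speech estimate possible.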
Original language: English
Publication status: Published - 2007
Event: Auditory-Visual Speech Processing 2007 (AVSP2007) - Kasteel Groenendaal, Hilvarenbeek, Netherlands
Duration: 31 Aug 2007 - 3 Sep 2007


Conference: Auditory-Visual Speech Processing 2007 (AVSP2007)
