Fundamental Frequency and Voicing Prediction from MFCCs for Speech Reconstruction from Unconstrained Speech

Ben P. Milner, Xu Shao, Jonathan Darch

Research output: Contribution to conference › Paper

Abstract

This work proposes a method to predict the fundamental frequency and voicing of a frame of speech from its MFCC representation. This is particularly useful in distributed speech recognition systems, where the ability to predict fundamental frequency and voicing allows a time-domain speech signal to be reconstructed solely from the MFCC vectors. Prediction is achieved by modeling the joint density of MFCCs and fundamental frequency with a combined hidden Markov model-Gaussian mixture model (HMM-GMM) framework. Prediction results are presented on unconstrained speech using both a speaker-dependent database and a speaker-independent database. Spectrogram comparisons of the reconstructed and original speech are also made. For the speaker-dependent task, the percentage fundamental frequency prediction error is 3.1%; for the speaker-independent task, it rises to 8.3%.
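
The idea of predicting fundamental frequency from a joint density can be illustrated with a minimal sketch. It assumes a plain joint GMM over the stacked vector [MFCC; f0] and uses the standard conditional-mean (minimum mean-square error) estimate. The abstract does not state which estimator the authors use, and the HMM state modeling and the voicing decision are omitted here. The function names, argument layout, and toy parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def predict_f0(x, weights, means, covs):
    """Conditional-mean estimate of f0 given an MFCC vector x.

    Assumed layout (hypothetical): each joint mean is [mu_x; mu_f] and each
    joint covariance is [[S_xx, S_xf], [S_fx, S_ff]], with f0 as the last
    dimension of the stacked vector.
    """
    d = x.shape[0]
    n_comp = len(weights)
    log_post = np.empty(n_comp)
    cond_means = np.empty(n_comp)
    for k in range(n_comp):
        mu_x, mu_f = means[k][:d], means[k][d]
        S_xx, S_fx = covs[k][:d, :d], covs[k][d, :d]
        # Unnormalized log posterior of component k given x alone.
        log_post[k] = np.log(weights[k]) + gaussian_logpdf(x, mu_x, S_xx)
        # Per-component regression of f0 on x.
        cond_means[k] = mu_f + S_fx @ np.linalg.solve(S_xx, x - mu_x)
    # Normalize posteriors in a numerically stable way.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post @ cond_means)

# Toy single-component model with 2-D "MFCCs" (illustrative values only).
weights = np.array([1.0])
means = [np.array([0.0, 0.0, 100.0])]
covs = [np.array([[1.0, 0.0, 0.5],
                  [0.0, 1.0, 0.0],
                  [0.5, 0.0, 4.0]])]
f0_hat = predict_f0(np.array([2.0, 0.0]), weights, means, covs)
```

With one component the estimate reduces to a linear regression of f0 on the MFCC vector; with several components, each component's regression is weighted by its posterior probability given the MFCCs.
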
Original language: English
Pages: 321-324
Number of pages: 4
Publication status: Published - Sep 2005
Event: Interspeech 2005 - Lisbon, Portugal
Duration: 4 Sep 2005 - 8 Sep 2005

Conference

Conference: Interspeech 2005
Country: Portugal
City: Lisbon
Period: 4/09/05 - 8/09/05
