Towards speech recogniser assessment using a human reference standard

Stephen J. Cox, Paul W. Linford, William B. Hill, R. Denis Johnston

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)

Abstract

The measurement of the word error rate (WER) of a speech recognizer is valuable for the development of new algorithms but provides only the most limited information about the performance of the recognizer. We propose the use of a human reference standard to assess the performance of speech recognizers, so that the performance of a recognizer could be quoted as being equivalent to the performance of a human hearing speech that is subject to X dB of degradation. This approach should have the major advantage of being independent of the database and speakers used for testing. Furthermore, it would allow factors beyond the word error rate to be measured, such as the performance within an interactive speech system. In this paper, we report on preliminary work to explore the viability of this approach. This work consisted of recording a suitable database for experimentation, devising a method of degrading the speech in a controlled way, and conducting two sets of experiments on listeners to measure their responses to degraded speech and so establish a reference. Results from these experiments raise several questions about the technique but encourage us to experiment with comparisons with automatic recognizers.
Original language: English
Pages (from-to): 375-391
Number of pages: 17
Journal: Computer Speech and Language
Volume: 12
Issue number: 4
Publication status: Published - Oct 1998
