Is the Speaker Done Yet? Faster and More Accurate End-of-Utterance Detection Using Prosody in Human-Computer Dialog

Citation

Ferrer, L., Shriberg, E., & Stolcke, A. (2002). Is the speaker done yet? Faster and more accurate end-of-utterance detection using prosody. In Seventh international conference on spoken language processing.

Abstract

We examine the problem of end-of-utterance (EOU) detection for real-time speech recognition, particularly in the context of a human-computer dialog system. Current EOU detection algorithms use only a simple pause threshold for making this decision, leading to two problems. First, users often pause inside utterances, resulting in a premature cut off by the system. Second, when users really are done, the minimum system wait is always the threshold value, needlessly adding time to the interaction. We have developed a new approach to EOU detection that uses prosodic features to address both of these problems. Prosodic features are modeled by decision trees and combined with an event N-gram language model to obtain a score that measures the likelihood that any nonspeech region is an EOU. We find that this approach dramatically improves both the accuracy and speed of online EOU detection.


Read more from SRI

  • A photo of Mary Wagner

    Recognizing the life and work of Mary Wagner 

    A cherished SRI colleague and globally respected leader in education research, Mary Wagner leaves behind an extraordinary legacy of groundbreaking work supporting children and youth with disabilities and their families.

  • Testing XRGo in a robotics laboratory

    Robots in the cleanroom

    A global health leader is exploring how SRI’s robotic telemanipulation technology can enhance pharmaceutical manufacturing.

  • SRI research aims to make generative AI more trustworthy

    Researchers have developed a new framework that reduces generative AI hallucinations by up to 32%.