Speech & natural language publications
-
Is the Speaker Done Yet? Faster and More Accurate End-of-Utterance Detection Using Prosody in Human-Computer Dialog
We examine the problem of end-of-utterance (EOU) detection for real-time speech recognition, particularly in the context of a human-computer dialog system.
-
Using Prosodic and Lexical Information for Speaker Identification
We investigate the incorporation of larger time-scale information, such as prosody, into standard speaker ID systems. Our study is based on the Extended Data Task of the NIST 2001 Speaker…
-
DynaSpeak: SRI’s Scalable Speech Recognizer for Embedded and Mobile Systems
We introduce SRI's new speech recognition engine, DynaSpeak(TM), which is characterized by its scalability and flexibility, high recognition accuracy, memory and speed efficiency, adaptation capability, efficient grammar optimization, support for…
-
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
In this paper, we report on our research and progress on the DARPA-sponsored Hub-4 continuous speech recognition evaluations, with an emphasis on efficient modeling.
-
Building an ASR System for Noisy Environments: SRI’s 2001 SPINE Evaluation System
We describe SRI's recognition system as used in the 2001 DARPA Speech in Noisy Environments (SPINE) evaluation. The SPINE task involves recognition of speech in simulated military environments.
-
Prosody Modeling for Automatic Speech Recognition and Understanding
This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech.
-
Multispeaker Speech Activity Detection for the ICSI Meeting Recorder
We have developed a more sophisticated approach for multichannel speech activity detection using a simple hidden Markov model (HMM).
-
Can Prosody Aid the Automatic Processing of Multi-Party Meetings? Evidence from Predicting Punctuation, Disfluencies, and Overlapping Speech
We investigate whether probabilistic modeling of prosody can aid various automatic labeling tasks essential for processing of multi-party meetings.
-
Modeling Word Durations
We describe a new method of modeling duration at word level. These duration models are easily trained from the acoustic training data and can be used to rescore N-best lists…
-
Prosody Modeling for Automatic Speech Understanding: An Overview of Recent Research at SRI
In this paper, we summarize recent work at SRI International in the area of computational prosody modeling, and results from several recognition tasks where prosodic knowledge proved to be of…
-
Observations on Overlap: Findings and Implications for Automatic Processing of Multi-Party Conversation
We examine the distribution of overlapping speech in different corpora of natural multi-party conversations, including two types of meetings, and two corpora of telephone conversations.
-
Improved Maximum Mutual Information Estimation Training of Continuous Density HMMs
We derive a new set of equations for MMIE based on a quasi-Newton algorithm, without relying on EBW. We find that by adopting a generalized form of the MMIE criterion,…