Speech & natural language publications
-
Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing
We describe a direct modeling approach to using prosody in various speech technology tasks. Prosodic features are extracted directly from the speech signal and from the output of an automatic…
-
A Fine-Grained Evaluation Method for Speech-to-Speech Machine Translation Using Concept Annotations
This paper describes the development of a concept annotation method for evaluating a narrow-domain speech-to-speech translation system and discusses how the scores produced by that method relate to naïve…
-
Prosody Modeling for Speech Recognition and Understanding
This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction,…
-
Deductive Question Answering from Multiple Resources
Questions in natural language are answered by consulting multiple sources and inferring answers from information they provide. An automated deduction system, equipped with an axiomatic application-domain theory, serves as the…
-
Limited-Domain Speech-to-Speech Translation between English and Pashto
This paper describes a prototype system for near-real-time spontaneous, bidirectional translation between spoken English and Pashto.
-
Speaker Recognition using Prosodic and Lexical Features
We investigate the contribution of modeling prosodic and lexical patterns to performance in the NIST 2003 Speaker Recognition Evaluation extended data task.
-
The Relationship Between Dialogue Acts and Hot Spots in Meetings
We examine the relationship between hot spots and dialogue acts in roughly 32 hours of speech data from naturally occurring meetings. Results reveal that four independently motivated involvement categories (non-involved, disagreeing, amused,…
-
Automatic Disfluency Identification in Conversational Speech Using Multiple Knowledge Sources
This work investigates a number of knowledge sources for disfluency detection, including acoustic-prosodic features, a language model (LM) to account for repetition patterns, a part-of-speech (POS) based LM, and rule-based…
-
Modeling Duration Patterns for Speaker Recognition
We present a method for speaker recognition that uses the duration patterns of speech units to aid speaker classification. The approach represents each word and/or phone by a feature vector…
-
Development of Phrase Translation Systems for Handheld Computers: from Concept to Field
We describe the development and conceptual evolution of handheld spoken phrase translation systems, beginning with an initial unidirectional system for translation of English phrases, and later extending to a limited…
-
Spotting “Hot Spots” in Meetings: Human Judgments and Prosodic Cues
Recent interest in the automatic processing of meetings is motivated by a desire to summarize, browse, and retrieve important information from lengthy archives of spoken data. One of the most…
-
Iterative Statistical Language Model Generation for Use with an Agent-Oriented Natural Language Interface
We describe a method for developing a statistical language model (SLM) with high keyword spotting accuracy for a natural language interface (NLI). The NLI is based on the Adaptive Agent…