Speech & natural language publications
Can Prosody Aid the Automatic Classification of Dialog Acts in Conversational Speech?
This study asks whether current approaches, which use mainly word information, could be improved by adding prosodic information. The study is based on more than 1000 conversations from the Switchboard…
MVIEWS: Multimodal Tools for the Video Analyst
SRI has developed MVIEWS, a system for annotating, indexing, extracting, and disseminating information from video streams for surveillance and intelligence applications. MVIEWS is implemented within the Open Agent Architecture, a…
Discriminative Training of Minimum Cost Speaker Verification Systems
This paper presents a new training procedure for speaker verification systems. Results are presented from the 1997 NIST Speaker Recognition Evaluation corpus indicating that the DCF performance can be improved…
Automatic Detection of Discourse Structure for Speech Recognition and Understanding
We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 'Dialog Acts' (DAs), (question, answer, backchannel, agreement,…
A Prosody-Only Decision-Tree Model for Disfluency Detection
We have developed a disfluency detection method using decision tree classifiers that use only local and automatically extracted prosodic features. Because the model doesn't rely on lexical information, it is…
Mixture Input Transformations for Adaptation of Hybrid Connectionist Speech Recognizers
In this paper, we propose a new algorithm to train mixtures of transformation networks (MTNs) in the hybrid connectionist recognition framework. We apply the new algorithm to nonnative speaker adaptation,…
Explicit Word Error Minimization in N-best List Rescoring
We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric…
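The abstract contrasts maximum-likelihood hypothesis scoring with the word error rate that is actually being measured. The following is a minimal sketch of the general idea, not the paper's exact procedure. It assumes each N-best hypothesis has a combined log score that can be normalized into a posterior over the list. It then picks the hypothesis with the lowest expected word-level edit distance to the other hypotheses, instead of the one with the highest posterior. The function names, the `scale` parameter, and the toy N-best list are hypothetical illustrations.

```python
import math
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if words match)
        prev = cur
    return prev[-1]


def posteriors(log_scores: Sequence[float], scale: float = 1.0) -> List[float]:
    """Normalize scaled log scores into posteriors over the N-best list.

    `scale` is a hypothetical score-weighting knob (assumption)."""
    scaled = [s * scale for s in log_scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def min_expected_wer(hyps: Sequence[Sequence[str]], log_scores: Sequence[float],
                     scale: float = 1.0) -> Tuple[Sequence[str], float]:
    """Return the hypothesis minimizing expected word errors, and that expected cost."""
    post = posteriors(log_scores, scale)
    best, best_cost = hyps[0], math.inf
    for h in hyps:
        # Expected errors of h if the true transcript is drawn from the N-best posterior.
        cost = sum(p * edit_distance(other, h) for other, p in zip(hyps, post))
        if cost < best_cost:
            best, best_cost = h, cost
    return best, best_cost


# Toy N-best list (illustrative data only).
hyps = [["the", "cat", "sat"], ["the", "hat", "sat"], ["the", "hat", "sad"]]
log_scores = [math.log(0.4), math.log(0.3), math.log(0.3)]
best, cost = min_expected_wer(hyps, log_scores)
print(" ".join(best), round(cost, 3))
```

In this toy list the highest-posterior hypothesis is "the cat sat" (0.4). The expected-error criterion instead selects "the hat sat", because the two remaining hypotheses, holding 0.6 of the mass between them, agree on "hat".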
Automatic Pronunciation Scoring of Specific Phone Segments for Language Instruction
The aim of the work described in this paper is to develop methods for automatically assessing the pronunciation quality of specific phone segments uttered by students learning a foreign language.
A Lognormal Tied Mixture Model of Pitch for Prosody-Based Speaker Recognition
In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks which are subject to doubling and/or halving.
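Pitch halving and doubling shift log-pitch by exactly −log 2 or +log 2. This suggests a tied mixture in the log domain. The sketch below assumes the "lognormal tied mixture" consists of three Gaussians over log F0. Their means are tied at μ − log 2, μ and μ + log 2 (halved, correct, doubled), they share one variance, and they are fit with EM. This is an illustrative assumption, not necessarily the paper's exact formulation. `fit_ltm`, `var_floor`, and the synthetic pitch track are hypothetical.

```python
import math
import random
from statistics import median, pstdev

LOG2 = math.log(2.0)
OFFSETS = (-LOG2, 0.0, LOG2)  # log-domain offsets: halved, correct, doubled


def _gauss_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_ltm(f0_hz, iters=100, var_floor=1e-4):
    """EM for a 3-component log-pitch mixture with tied means and a shared variance.

    Returns (mu, sigma, weights); mu and sigma are in the log-Hz domain."""
    x = [math.log(f) for f in f0_hz if f > 0]  # keep voiced (positive) frames only
    n = len(x)
    mu = median(x)                             # robust start near the true median
    var = max(pstdev(x) ** 2, var_floor)
    w = [0.1, 0.8, 0.1]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for xi in x:
            lik = [wk * _gauss_pdf(xi, mu + ck, var) for wk, ck in zip(w, OFFSETS)]
            tot = sum(lik) or 1e-300
            resp.append([l / tot for l in lik])
        # M-step: weights, then the single tied mean and the shared variance,
        # each frame "corrected" by its component's offset.
        w = [sum(r[k] for r in resp) / n for k in range(3)]
        mu = sum(r[k] * (xi - OFFSETS[k])
                 for xi, r in zip(x, resp) for k in range(3)) / n
        var = max(sum(r[k] * (xi - OFFSETS[k] - mu) ** 2
                      for xi, r in zip(x, resp) for k in range(3)) / n, var_floor)
    return mu, math.sqrt(var), w


# Synthetic track (illustrative): lognormal pitch around 120 Hz, every 7th frame doubled.
rng = random.Random(0)
true_f0 = [math.exp(rng.gauss(math.log(120.0), 0.1)) for _ in range(1000)]
tracked = [f * 2 if i % 7 == 0 else f for i, f in enumerate(true_f0)]
mu, sigma, w = fit_ltm(tracked)
naive = math.exp(sum(math.log(f) for f in tracked) / len(tracked))
print(f"LTM estimate: {math.exp(mu):.1f} Hz, naive log-mean: {naive:.1f} Hz")
```

Because each frame's contribution to μ is shifted back by its component's offset, doubled frames do not pull the estimate upward. A plain log-domain mean of the same track does not have this property.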
Structure and Performance of a Dependency Language Model
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.
A Study of Multilingual Speech Recognition
This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian…
Acoustic Clustering and Adaptation for Robust Speech Recognition
We describe an algorithm based on acoustic clustering and acoustic adaptation to significantly improve speech recognition performance. The method is particularly useful when speech from multiple speakers is to be…