Publications
-
A Lognormal Tied Mixture Model of Pitch for Prosody-Based Speaker Recognition
In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks which are subject to doubling and/or halving.
-
Modeling Linguistic Segment and Turn Boundaries for N-best Rescoring of Spontaneous Speech
We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary…
-
Explicit Word Error Minimization in N-best List Rescoring
We show that the standard hypothesis scoring paradigm used in maximum-likelihood-based speech recognition systems is not optimal with regard to minimizing the word error rate, the commonly used performance metric…
-
Mixture Input Transformations for Adaptation of Hybrid Connectionist Speech Recognizers
In this paper, we propose a new algorithm to train mixtures of transformation networks (MTNs) in the hybrid connectionist recognition framework. We apply the new algorithm to nonnative speaker adaptation,…
-
Acoustic Clustering and Adaptation for Robust Speech Recognition
We describe an algorithm based on acoustic clustering and acoustic adaptation to significantly improve speech recognition performance. The method is particularly useful when speech from multiple speakers is to be…
-
Structure and Performance of a Dependency Language Model
We present a maximum entropy language model that incorporates both syntax and semantics via a dependency grammar.
-
Speech: A Privileged Modality
In this article, we use our interaction model to demonstrate that during multimodal fusion, speech should be a privileged modality, driving the interpretation of a query, and that in certain…
-
HMM State Clustering Across Allophone Class Boundaries
We present a novel approach to hidden Markov model (HMM) state clustering based on the use of broad phone classes and an allophone class entropy measure. Our algorithm allows clustering…
-
A Study of Multilingual Speech Recognition
This paper describes our work in developing multilingual (Swedish and English) speech recognition systems in the ATIS domain. The acoustic component of the multilingual systems is realized through sharing Gaussian…
-
Multimodal Interfaces for Internet
In this paper, we present a Java-enabled application with a multimodal (pen and voice) interface over the web. Our implementation approach was to add Java to the set of languages…
-
Using Differential Constraints to Reconstruct Complex Surfaces from Stereo
Stereo reconstruction algorithms often fail to properly deal with complex surfaces, because there is not enough image information. We propose to guide the reconstruction process using a priori information about…
-
Model Transformation for Robust Speaker Recognition from Telephone Data
In the context of automatic speaker recognition, we propose a model transformation technique that renders speaker models more robust to acoustic mismatches and to data scarcity by appropriately increasing their…