Speech & natural language publications
-
Speech recognition in unseen and noisy channel conditions
This work investigates robust features, feature-space maximum likelihood linear regression (fMLLR) transforms, and deep convolutional nets to address the problem of unseen channel and noise conditions in speech recognition.
-
Analysis and prediction of heart rate using speech features from natural speech
We predict heart rate (HR) from speech using the SRI BioFrustration Corpus. In contrast to previous studies, we use continuous spontaneous speech as input.
-
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend
This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge and its extension.
-
Conversational In-Vehicle Dialog Systems: The past, present, and future
We review research and development activities for in-vehicle dialog systems, examine findings, discuss key challenges, and share our visions for voice-enabled interaction and intelligent assistance for smart vehicles over the…
-
Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets
This work investigates learning acoustic units in an unsupervised manner from real-world speech data by using a cascade of an autoencoder and a Kohonen net.
-
Privacy-preserving speech analytics for automatic assessment of student collaboration
This work investigates whether nonlexical information from speech can automatically predict the quality of small-group collaborations. Audio was collected from students as they collaborated in groups of three to solve…
-
The Speakers in the Wild (SITW) Speaker Recognition Database
The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology.
-
Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
This work investigates the performance of traditional deep neural networks under varying acoustic conditions and evaluates their performance with speech recorded under realistic background conditions that are mismatched with respect…
-
Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems
This paper focuses on the problem of selecting the best-possible subset of available audio data given a budgeted time for annotation.
-
The SRI CLEO Speaker-State Corpus
We introduce the SRI CLEO (Conversational Language about Everyday Objects) Speaker-State Corpus of speech, video, and biosignals.
-
On the Issue of Calibration in DNN-Based Speaker Recognition Systems
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. We propose a hybrid alignment framework, which stems…
-
The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation
In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created…