Speech & natural language publications
-
Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
This work investigates how traditional deep neural networks perform under varying acoustic conditions, evaluating them on speech recorded under realistic background conditions that are mismatched with respect…
-
Minimizing Annotation Effort for Adaptation of Speech-Activity Detection Systems
This paper focuses on the problem of selecting the best possible subset of available audio data given a budgeted time for annotation.
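The selection problem lends itself to a simple illustration: given per-segment durations and some informativeness score, greedily pick segments until the annotation budget is exhausted. The greedy score-per-second ranking below is an illustrative assumption, not the selection criterion proposed in the paper.

```python
# Hypothetical sketch: greedy selection of audio segments under a time budget.
# The (duration, score) inputs and the score-per-second ranking are assumptions.
def select_for_annotation(segments, budget_seconds):
    """segments: list of (segment_id, duration_seconds, informativeness_score)."""
    ranked = sorted(segments, key=lambda s: s[2] / s[1], reverse=True)
    chosen, used = [], 0.0
    for seg_id, duration, score in ranked:
        if used + duration <= budget_seconds:
            chosen.append(seg_id)
            used += duration
    return chosen, used

segments = [("utt1", 30.0, 0.9), ("utt2", 120.0, 0.8), ("utt3", 45.0, 0.2)]
print(select_for_annotation(segments, budget_seconds=90.0))
# (['utt1', 'utt3'], 75.0)
```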
-
The SRI CLEO Speaker-State Corpus
We introduce the SRI CLEO (Conversational Language about Everyday Objects) Speaker-State Corpus of speech, video, and biosignals.
-
Exploring the role of phonetic bottleneck features for speaker and language recognition
Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification.
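The general recipe can be sketched as follows: train a DNN to classify senones, then discard the classifier head and keep the activations of a narrow hidden (bottleneck) layer as frame-level features for the speaker or language recognizer. The layer sizes, 40-dimensional input frames, and 2000 senone classes below are assumed values, not those of the system in the paper.

```python
# Minimal sketch (PyTorch): a DNN with a narrow bottleneck layer trained on
# senone targets; after training, the bottleneck activations serve as features.
import torch
import torch.nn as nn

class SenoneDNN(nn.Module):
    def __init__(self, n_input=40, n_hidden=1024, n_bottleneck=80, n_senones=2000):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(n_input, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(n_hidden, n_bottleneck)  # narrow layer
        self.back = nn.Sequential(
            nn.ReLU(),
            nn.Linear(n_bottleneck, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_senones),                  # senone logits
        )

    def forward(self, x):
        return self.back(self.bottleneck(self.front(x)))

    def extract_bottleneck(self, x):
        # Keep only the low-dimensional activations for downstream systems.
        with torch.no_grad():
            return self.bottleneck(self.front(x))

model = SenoneDNN()
frames = torch.randn(100, 40)                  # 100 acoustic frames, 40-dim each
print(model.extract_bottleneck(frames).shape)  # torch.Size([100, 80])
```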
-
A Phonetically Aware System for Speech Activity Detection
In this paper, we focus on a dataset of highly degraded signals, developed under the DARPA Robust Automatic Transcription of Speech (RATS) program.
-
Analyzing the effect of channel mismatch on the SRI language recognition evaluation 2015 system
We present the work done by our group for the 2015 language recognition evaluation (LRE) organized by the National Institute of Standards and Technology (NIST).
-
Noise and reverberation effects on depression detection from speech
This study compares the effect of noise and reverberation on depression prediction using standard mel-frequency cepstral coefficients and, as a noise-robust alternative, damped oscillator cepstral coefficients.
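As a concrete illustration of the baseline feature pipeline, the sketch below computes standard MFCCs from a clean signal and from the same signal corrupted by additive noise; the use of librosa, the white-noise model, and the 10 dB SNR are assumptions, and damped oscillator cepstral coefficients are omitted because they are not available in standard libraries.

```python
# Sketch: standard MFCCs from clean vs. noise-corrupted audio (librosa assumed
# installed; librosa.example downloads a small sample clip). The white noise
# and 10 dB SNR are illustrative choices only.
import numpy as np
import librosa

def add_noise(signal, snr_db):
    noise = np.random.randn(len(signal))
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + noise * np.sqrt(noise_power / np.mean(noise ** 2))

y, sr = librosa.load(librosa.example("trumpet"), sr=16000)    # stand-in audio
mfcc_clean = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
mfcc_noisy = librosa.feature.mfcc(y=add_noise(y, snr_db=10.0), sr=sr, n_mfcc=13)
print(mfcc_clean.shape, mfcc_noisy.shape)                     # (13, n_frames) each
```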
-
The MERL/SRI System for the 3rd CHiME challenge using beamforming, robust feature extraction and advanced speech recognition
This paper introduces the MERL/SRI system designed for the 3rd CHiME speech separation and recognition challenge (CHiME-3).
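The snippet above does not describe the front end itself; as a generic illustration of multichannel enhancement, the sketch below implements plain delay-and-sum beamforming with assumed integer sample delays. The CHiME-3 system's actual beamformer is not reproduced here.

```python
# Generic delay-and-sum beamforming sketch (NumPy). The 6-channel layout and
# the integer sample delays are assumptions for illustration only.
import numpy as np

def delay_and_sum(channels, delays):
    """channels: (n_mics, n_samples) array; delays: per-mic sample shifts."""
    aligned = [np.roll(mic, -delay) for mic, delay in zip(channels, delays)]
    return np.mean(aligned, axis=0)        # align each channel, then average

rng = np.random.default_rng(0)
mics = rng.standard_normal((6, 16000))     # 1 s of 6-channel audio at 16 kHz
enhanced = delay_and_sum(mics, delays=[0, 2, 4, 1, 3, 5])
print(enhanced.shape)                      # (16000,)
```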
-
Improving robustness against reverberation for automatic speech recognition
In this work, we explore the role of robust acoustic features, motivated by human speech perception studies, in building ASR systems robust to reverberation effects.
-
Time-frequency convolutional networks for robust speech recognition
This work presents a modified convolutional deep neural network (CDNN) architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel layers of convolution are performed on the input feature space: convolution…
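A minimal sketch of the parallel-convolution idea is shown below, assuming the two branches convolve along the time and frequency axes of a time-frequency input patch and are concatenated before the fully connected layers; the kernel sizes, filter counts, and 40-band by 11-frame input patch are illustrative, not the configuration used in the paper.

```python
# Sketch (PyTorch): two parallel convolutions over a time-frequency input, one
# along the time axis and one along the frequency axis, concatenated before a
# fully connected classifier. All sizes below are assumed values.
import torch
import torch.nn as nn

class TFConvSketch(nn.Module):
    def __init__(self, n_freq=40, n_time=11, n_filters=32, n_out=2000):
        super().__init__()
        # Branch convolving (and pooling) along the time axis.
        self.time_conv = nn.Sequential(
            nn.Conv2d(1, n_filters, kernel_size=(1, 3), padding=(0, 1)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),
        )
        # Branch convolving (and pooling) along the frequency axis.
        self.freq_conv = nn.Sequential(
            nn.Conv2d(1, n_filters, kernel_size=(3, 1), padding=(1, 0)), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        t_feats = n_filters * n_freq * (n_time // 2)
        f_feats = n_filters * (n_freq // 2) * n_time
        self.classifier = nn.Sequential(
            nn.Linear(t_feats + f_feats, 1024), nn.ReLU(),
            nn.Linear(1024, n_out),
        )

    def forward(self, x):                    # x: (batch, 1, n_freq, n_time)
        t = self.time_conv(x).flatten(1)
        f = self.freq_conv(x).flatten(1)
        return self.classifier(torch.cat([t, f], dim=1))

net = TFConvSketch()
patch = torch.randn(8, 1, 40, 11)            # 8 patches: 40 mel bands x 11 frames
print(net(patch).shape)                      # torch.Size([8, 2000])
```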
-
Study of senone-based deep neural network approaches for spoken language recognition
This paper compares different approaches for using deep neural networks (DNNs) trained to predict senone posteriors for the task of spoken language recognition (SLR).
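One simple way to use such a DNN for language recognition is to pool its frame-level senone posteriors into an utterance-level statistic and score it with a lightweight classifier; the mean pooling, synthetic posteriors, and logistic-regression back end below are illustrative assumptions rather than the specific approaches compared in the paper.

```python
# Sketch: pooling frame-level senone posteriors into utterance-level features
# for a language classifier. The synthetic posteriors, mean pooling, and
# logistic-regression back end are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def utterance_embedding(senone_posteriors):
    """senone_posteriors: (n_frames, n_senones) DNN outputs for one utterance."""
    return senone_posteriors.mean(axis=0)    # average posterior per senone

rng = np.random.default_rng(0)
n_senones, n_utts = 200, 40
X = np.stack([utterance_embedding(rng.dirichlet(np.ones(n_senones), size=50))
              for _ in range(n_utts)])       # 40 utterances, 50 frames each
y = rng.integers(0, 3, size=n_utts)          # 3 hypothetical target languages
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:5]))
```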
-
Speech-based assessment of PTSD in a military population using diverse feature classes
We analyzed recordings of the Clinician-Administered PTSD Scale (CAPS) interview from military personnel diagnosed as PTSD-positive versus PTSD-negative.