Speech & natural language publications
-
Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system
This study examines two questions: how do undesirable system responses affect people physiologically, and to what extent can we predict physiological changes from the speech signal alone?
-
Mitigating the effects of non-stationary unseen noises on language recognition performance
We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance.
-
Improved speaker recognition using DCT coefficients as features
We recently proposed the use of coefficients extracted from the 2D discrete cosine transform (DCT) of log Mel filter bank energies to improve speaker recognition over the traditional Mel frequency…
-
Effects of feature type, learning algorithm and speaking style for depression detection from speech
We systematically study the effects of feature type, machine learning approach, and speaking style (read versus spontaneous) on depression prediction in the AVEC-2014 evaluation corpus.
-
Cross-corpus depression prediction from speech
We study a new corpus of patient-clinician interactions recorded when patients are admitted to a hospital for suicide risk and again when they are released.
-
Detection of Demographics and Identity in Spontaneous Speech and Writing
This chapter focuses on the automatic identification of demographic traits and identity in both speech and writing.
-
SoftSAD: Integrated frame-based speech confidence for speaker recognition
In this paper we propose softSAD: the direct integration of speech posteriors into a speaker recognition system instead of using speech activity detection (SAD).
-
Advances in deep neural network approaches to speaker recognition
In this work, we report the same achievement in DNN-based SID performance on microphone speech. We consider two approaches to DNN-based SID: one that uses the DNN to extract features,…
-
Enhanced end-of-turn detection for speech to a personal assistant
We collected and compared two elicitation corpora differing in naturalness and task complexity.
-
The SRI BioFrustration corpus: Audio, video and physiological signals for continuous user modeling
We describe the SRI BioFrustration Corpus, an in-progress corpus of time-aligned audio, video, and autonomic nervous system signals recorded while users interact with a dialog system to make returns of…
-
Bilingual Recurrent Neural Networks for Improved Statistical Machine Translation
In SMT, we investigate using bilingual word-aligned sentences to train a bilingual recurrent neural network model.
-
Deep convolutional nets and robust features for reverberation-robust speech recognition
In this work, we present robust acoustic features motivated by human speech perception for use in a convolutional deep neural network-based acoustic model for recognizing continuous speech in a reverberant…