Author: Andreas Kathol

March 1, 2017

Toward human-assisted lexical unit discovery without text resources

This work addresses lexical unit discovery for languages without (usable) written resources.
March 1, 2017

Analysis and prediction of heart rate using speech features from natural speech

We predict heart rate (HR) from speech using the SRI BioFrustration Corpus. In contrast to previous studies, we use continuous spontaneous speech as input.
September 1, 2016

Automatic Speech Transcription for Low-Resource Languages — The Case of Yoloxóchitl Mixtec (Mexico)

In the present study, we focus exclusively on progress in developing speech recognition for the language of interest, Yoloxóchitl Mixtec (YM), an Oto-Manguean language spoken by fewer than 5000 speakers on the Pacific coast of Guerrero, Mexico.
September 1, 2016

The SRI CLEO Speaker-State Corpus

We introduce the SRI CLEO (Conversational Language about Everyday Objects) Speaker-State Corpus of speech, video, and biosignals.
September 1, 2015

Prediction of heart rate changes from speech features during interaction with a misbehaving dialog system

This study examines two questions: how do undesirable system responses affect people physiologically, and to what extent can we predict physiological changes from the speech signal alone?
March 1, 2015

The SRI biofrustration corpus: Audio, video and physiological signals for continuous user modeling

We describe the SRI BioFrustration Corpus, an in-progress corpus of time-aligned audio, video, and autonomic nervous system signals recorded while users interact with a dialog system to make returns of faulty consumer items.
November 1, 2014

The SRI AVEC-2014 Evaluation System

We explore a diverse set of features based only on spoken audio to understand which features correlate with self-reported depression scores according to the Beck depression rating scale.
May 1, 2014

Robust Features and System Fusion for Reverberation-robust Speech Recognition

In this work, we present robust acoustic features motivated by the knowledge gained from human speech perception and production, and demonstrate that these features provide reasonable robustness to reverberation effects compared to traditional mel-filterbank-based features.
August 1, 2013

Strategies for high accuracy keyword detection in noisy channels

We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.
May 1, 2013

“Can You Give Me Another Word for Hyperbaric?”: Improving Speech Translation Using Targeted Clarification Questions

We present a novel approach for improving communication success between users of speech-to-speech translation systems by automatically detecting errors in the output of automatic speech recognition (ASR) and statistical machine translation (SMT) systems.
May 1, 2011

Acoustic data sharing for Afghan and Persian languages

In this work, we compare several known approaches for multilingual acoustic modeling for three languages, Dari, Farsi and Pashto, which are of recent geo-political interest.
April 1, 2009

Recent advances in SRI’s IraqComm Iraqi Arabic-English speech-to-speech translation system

We summarize recent progress on SRI’s IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system.