Author: Martin Graciarena
-
Recent Improvements in SRI’s Keyword Detection System for Noisy Audio
We present improvements to a keyword spotting (KWS) system that operates in highly adverse channel conditions with very low signal-to-noise ratio levels.
-
Feature Fusion for High-Accuracy Keyword Spotting
This paper assesses the role of robust acoustic features in spoken term detection (a.k.a. keyword spotting, or KWS) under heavily degraded channel and noise-corrupted conditions.
-
Medium-Duration Modulation Cepstral Feature for Robust Speech Recognition
In this paper, we present the Modulation of Medium Duration Speech Amplitude feature, which is a composite feature capturing subband speech modulations and a summary modulation.
-
Strategies for high accuracy keyword detection in noisy channels
We present design strategies for a keyword spotting (KWS) system that operates in highly degraded channel conditions with very low signal-to-noise ratio levels.
-
Modulation features for noise robust speaker identification
In this paper, we present a robust acoustic feature on top of robust modeling techniques to further improve speaker identification performance.
-
All for one: Feature combination for highly channel-degraded speech activity detection
This paper presents a feature combination approach to improve speech activity detection (SAD) on highly channel-degraded speech as part of the Defense Advanced Research Projects Agency’s (DARPA) Robust Automatic Transcription of Speech (RATS) program.
-
A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation
This paper presents SRI’s submission, along with a careful analysis of the approaches that provided gains in this challenging evaluation, including a multiclass voice-activity detection system, the use of noisy data in system training, and the fusion of subsystems using acoustic characterization metadata.
-
Improving Language Identification Robustness to Highly Channel-Degraded Speech through Multiple System Fusion
We describe a language identification system developed for robustness to noise conditions such as those encountered under the DARPA RATS program, which focuses on multi-channel audio collected in high-noise conditions.
-
Damped oscillator cepstral coefficients for robust speech recognition
This paper presents a new signal-processing technique motivated by the physiology of the human auditory system.
-
Improving Speaker Identification Robustness to Highly Channel-Degraded Speech Through Multiple System Fusion
This article describes our submission to the speaker identification (SID) evaluation for the first phase of the DARPA Robust Automatic Transcription of Speech (RATS) program.
-
Effects of audio and ASR quality on cepstral and high-level speaker verification systems
We evaluate the effect that improved audio quality has on speaker verification performance, using a recently released full-bandwidth version of the microphone data from the SRE2010 evaluation.
-
Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
In this work, we present an amplitude modulation feature derived from Teager’s nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features…
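The processing chain named in the NMCC abstract (Teager energy, power normalization, cosine transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The subband signals are assumed to come from some filterbank that is not shown. Per-band energy is approximated by the mean absolute Teager energy rather than a demodulated amplitude envelope. The helper names (`teager_energy`, `dct2_ortho`, `nmcc_like_frame`) and the 1/15 compression exponent are hypothetical choices made for illustration.

```python
import numpy as np

def teager_energy(x: np.ndarray) -> np.ndarray:
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # Replicate the edge values so the output length matches the input.
    psi[0], psi[-1] = psi[1], psi[-2]
    return psi

def dct2_ortho(v: np.ndarray) -> np.ndarray:
    """Orthonormal DCT-II, written out explicitly to avoid a SciPy dependency."""
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ v)

def nmcc_like_frame(subbands: np.ndarray, exponent: float = 1 / 15,
                    num_ceps: int = 13) -> np.ndarray:
    """Hypothetical per-frame feature: Teager energy per subband (rows),
    power-law compression, then DCT-II, keeping the first num_ceps terms."""
    energies = np.array([np.mean(np.abs(teager_energy(b))) for b in subbands])
    compressed = np.power(energies + 1e-12, exponent)  # small floor avoids 0**p
    return dct2_ortho(compressed)[:num_ceps]

# Usage on random data standing in for 20 filterbank outputs over a 400-sample frame.
rng = np.random.default_rng(0)
print(nmcc_like_frame(rng.standard_normal((20, 400))).shape)  # (13,)
```

A useful sanity check on the operator: for a pure tone A·cos(Ωn), the Teager energy is the constant A²·sin²(Ω), so it tracks both amplitude and frequency. The root compression plays the same role as the log in MFCC-style front ends, but it is gentler on very low-energy bands.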