Speech & natural language publications
-
Calibration and Multiple System Fusion for Spoken Term Detection Using Linear Logistic Regression
This study presents an efficient and effective score calibration technique for keyword detection that is based on the logistic regression calibration approach commonly used in forensic speaker identification.
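As a rough illustration of linear logistic regression calibration in general, and not the paper's implementation, the Python sketch below learns an offset and one weight per system so that a weighted sum of raw detection scores behaves like calibrated log-odds. With a single system it reduces to score calibration; with several systems it performs linear fusion. The function names, the plain gradient-descent optimizer, and the learning-rate and epoch values are hypothetical. The prior weighting that calibration toolkits typically apply is omitted.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def train_linear_fusion(
    scores: Sequence[Sequence[float]],  # scores[i][k]: raw score of system k on trial i
    labels: Sequence[int],              # 1 = true hit (target), 0 = false alarm (non-target)
    lr: float = 0.1,                    # hypothetical learning rate
    epochs: int = 2000,                 # hypothetical number of full-batch steps
) -> Tuple[float, List[float]]:
    """Fit offset a and weights b so that a + sum_k b[k] * s[k] acts as log-odds.

    Minimizes the mean cross-entropy with batch gradient descent, a simple
    stand-in for the optimizers used by real calibration toolkits.
    """
    n, k = len(scores), len(scores[0])
    a, b = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_a, grad_b = 0.0, [0.0] * k
        for s, y in zip(scores, labels):
            # Gradient of cross-entropy w.r.t. the linear score is (p - y).
            err = sigmoid(a + sum(bj * sj for bj, sj in zip(b, s))) - y
            grad_a += err
            for j in range(k):
                grad_b[j] += err * s[j]
        a -= lr * grad_a / n
        b = [bj - lr * g / n for bj, g in zip(b, grad_b)]
    return a, b


def fuse(a: float, b: Sequence[float], s: Sequence[float]) -> float:
    # Calibrated (fused) log-odds score for one trial.
    return a + sum(bj * sj for bj, sj in zip(b, s))
```

For example, if `scores` holds one row per candidate keyword detection and `labels` marks which detections match a true keyword occurrence, `sigmoid(fuse(a, b, row))` gives an estimated probability that a new detection is correct. That probability can then be compared against a single decision threshold shared across keywords.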
-
Effective Use of DCTs for Contextualizing Features for Speaker Recognition
This article proposes a new approach for contextualizing features for speaker recognition through the discrete cosine transform (DCT).
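One common way to contextualize frame-level features with the DCT is to take each feature dimension's trajectory over a window of neighboring frames and keep only the first few DCT coefficients. The sketch below does this with an orthonormal DCT-II in plain Python. The function names, the window size, the number of retained coefficients, and the edge-padding scheme are illustrative assumptions, not the configuration used in the article.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float], num_coeffs: int) -> List[float]:
    # Orthonormal DCT-II of a 1-D sequence, truncated to the first num_coeffs terms.
    n = len(x)
    if num_coeffs > n:
        raise ValueError("num_coeffs cannot exceed the window length")
    out = []
    for k in range(num_coeffs):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def contextualize(
    frames: Sequence[Sequence[float]],  # one feature vector per frame
    half_window: int = 4,               # hypothetical: window spans 2*half_window+1 frames
    num_coeffs: int = 3,                # hypothetical: DCT coefficients kept per dimension
) -> List[List[float]]:
    """For each frame, DCT each feature dimension over its surrounding window."""
    num_frames, dim = len(frames), len(frames[0])
    result = []
    for t in range(num_frames):
        # Clamp indices at the edges, i.e. repeat the first/last frame as padding.
        idx = [min(max(t + o, 0), num_frames - 1) for o in range(-half_window, half_window + 1)]
        feat: List[float] = []
        for d in range(dim):
            trajectory = [frames[i][d] for i in idx]
            feat.extend(dct_ii(trajectory, num_coeffs))
        result.append(feat)
    return result
```

Keeping only the low-order coefficients gives a compact summary of the slow temporal trend of each dimension, somewhat like delta features, while discarding fast frame-to-frame fluctuations.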
-
Feature Fusion for High-Accuracy Keyword Spotting
This paper assesses the role of robust acoustic features in spoken term detection (also known as keyword spotting, or KWS) under heavily degraded channel and noise-corrupted conditions.
-
Adaptive and Discriminative Modeling for Improved Mispronunciation Detection
In the context of computer-aided language learning, automatic detection of specific phone mispronunciations by nonnative speakers can be used to provide detailed feedback about specific pronunciation problems.
-
Articulatory Features from Neural Networks and Their Role in Speech Recognition
This paper presents a deep neural network (DNN) that extracts articulatory information from the speech signal and explores different ways to use that information in a continuous speech recognition task.
-
Robust Features and System Fusion for Reverberation-Robust Speech Recognition
In this work, we present robust acoustic features motivated by the knowledge gained from human speech perception and production, and demonstrate that these features provide reasonable robustness to reverberation effects…
-
ASR Error Detection Using Recurrent Neural Network Language Model and Complementary ASR
Our goal is to locate errors in an utterance so that the dialogue manager can pose appropriate clarification questions to the users.
-
A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network
We propose a novel framework for speaker recognition in which extraction of sufficient statistics for the state-of-the-art i-vector model is driven by a deep neural network (DNN) trained for automatic…
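The sufficient statistics for an i-vector extractor are per-class zeroth-order counts N_c = sum over frames t of gamma_t(c) and first-order sums F_c = sum over t of gamma_t(c) * x_t, usually centered by subtracting N_c * m_c. The minimal sketch below computes them under the assumption, suggested by the abstract, that the per-frame class posteriors gamma_t(c) come from a DNN rather than from the Gaussian components of a universal background model. The DNN itself is not shown. The posteriors and per-class means are taken as given inputs, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    features: Sequence[Sequence[float]],    # x_t: one acoustic feature vector per frame
    posteriors: Sequence[Sequence[float]],  # gamma_t(c): per-frame class posteriors (e.g. from a DNN)
    means: Sequence[Sequence[float]],       # m_c: per-class mean vectors used for centering
) -> Tuple[List[float], List[List[float]]]:
    """Return zeroth-order stats N_c and centered first-order stats F_c - N_c * m_c."""
    num_classes, dim = len(means), len(features[0])
    n = [0.0] * num_classes
    f = [[0.0] * dim for _ in range(num_classes)]
    for x, gamma in zip(features, posteriors):
        for c in range(num_classes):
            n[c] += gamma[c]                 # soft frame count for class c
            for d in range(dim):
                f[c][d] += gamma[c] * x[d]   # posterior-weighted feature sum
    centered = [[f[c][d] - n[c] * means[c][d] for d in range(dim)] for c in range(num_classes)]
    return n, centered
```

Because only the source of gamma_t(c) changes, the rest of the i-vector pipeline (total variability training and extraction) can consume these statistics unchanged.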
-
Lexical Stress Classification for Language Learning Using Spectral and Segmental Features
We present a system for detecting lexical stress in English words spoken by English learners. The system uses both spectral and segmental features to detect three levels of stress for…
-
Deduction for Natural Language Access to Data
We outline a general approach to automated natural-language question answering that uses first-order logic and automated deduction. Our interest is in answering queries over structured data resources…
-
Quality Measure Functions for Calibration of Speaker Recognition Systems in Various Duration Conditions
This paper investigates the effect of utterance duration on the calibration of a modern i-vector speaker recognition system with probabilistic linear discriminant analysis (PLDA) modeling.
-
Recent Developments in Voice Biometrics: Robustness and High Accuracy
We highlight SRI’s innovations that resulted from the IARPA Biometrics Exploitation Science & Technology (BEST) and the DARPA Robust Automatic Transcription of Speech (RATS) programs, as well as SRI’s approach…