Author: Aaron Lawson
-
Toward Fail-Safe Speaker Recognition: Trial-Based Calibration with a Reject Option
In this work, we extend the TBC method, proposing a new similarity metric for selecting training data that results in significant gains over the one proposed in the original work.
-
Robust Speaker Recognition from Distant Speech under Real Reverberant Environments Using Speaker Embeddings
This article focuses on speaker recognition using speech acquired using a single distant or far-field microphone in an indoors environment.
-
Analysis of Complementary Information Sources in the Speaker Embeddings Framework
In this study, our aim is analyzing the behavior of the speaker recognition systems based on speaker embeddings toward different front-end features, including the standard MFCC, as well as PNCC, and PLP.
-
Voices Obscured in Complex Environmental Settings (VOiCES) corpus
This work is a multi-organizational effort led by SRI International and Lab41 with the intent to push forward state-of-the-art distant microphone approaches in signal processing and speech recognition.
-
Analysis of Phonetic Markedness and Gestural Effort Measures for Acoustic Speech-Based Depression Classification
In this paper we analyze articulatory measures to gain further insight into how articulation is affected by depression.
-
Calibration Approaches for Language Detection
In this paper, we focus on situations in which either (1) the system-modeled languages are not observed during use or (2) the test data contains OOS languages that are unseen during modeling or calibration.
-
Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data
We benchmark these approaches on several distinctly different databases, after we describe our SRICON-UAM team system submission for the NIST 2016 SRE.
-
On the Issue of Calibration in DNN-Based Speaker Recognition Systems
This article is concerned with the issue of calibration in the context of Deep Neural Network (DNN) based approaches to speaker recognition. We propose a hybrid alignment framework, which stems from our previous work in DNN senone alignment, that uses the bottleneck features only for the alignment of features during statistics calculation.
-
The 2016 Speakers in the Wild Speaker Recognition Evaluation
This article provides details of the SITW speaker recognition challenge and analysis of evaluation results. We provide an analysis of some of the top performing systems submitted during the evaluation and provide future research directions.
-
The Speakers in the Wild (SITW) Speaker Recognition Database
The Speakers in the Wild (SITW) speaker recognition database contains hand-annotated speech samples from open-source media for the purpose of benchmarking text-independent speaker recognition technology.
-
Exploring the role of phonetic bottleneck features for speaker and language recognition
Using bottleneck features extracted from a deep neural network (DNN) trained to predict senone posteriors has resulted in new, state-of-the-art technology for language and speaker identification.
-
Mitigating the effects of non-stationary unseen noises on language recognition performance
We introduce a new dataset for the study of the effect of highly non-stationary noises on language recognition (LR) performance.