Speech & natural language publications
-
Crowdsourcing Emotional Speech
We describe the methodology for the collection and annotation of a large corpus of emotional speech data through crowdsourcing.
-
Noise-robust Exemplar Matching for Rescoring Query-by-Example Search
This paper describes a two-step approach to the keyword spotting task in which a query-by-example search is followed by noise-robust exemplar matching rescoring.
-
Language Diarization for Semi-supervised Bilingual Acoustic Model Training
In this paper, we investigate several automatic transcription schemes for using raw bilingual broadcast news data in semi-supervised bilingual acoustic model training.
-
Tackling Unseen Acoustic Conditions in Query-by-Example Search Using Time and Frequency Convolution for Multilingual Deep Bottleneck Features
This paper revisits two neural network architectures developed for noise- and channel-robust ASR, and applies them to building a state-of-the-art multilingual QbE system.
-
Analysis of Phonetic Markedness and Gestural Effort Measures for Acoustic Speech-Based Depression Classification
In this paper, we analyze articulatory measures to gain further insight into how articulation is affected by depression.
-
Inferring Stance from Prosody
Speech conveys many things beyond content, including aspects of stance and attitude that have not been much studied.
-
Calibration Approaches for Language Detection
In this paper, we focus on situations in which either (1) the system-modeled languages are not observed during use or (2) the test data contains out-of-set (OOS) languages that are unseen…
-
Leveraging Deep Neural Network Activation Entropy to Cope with Unseen Data in Speech Recognition
This work aims to estimate the propagation of such distortion in the form of network activation entropy, which is measured over a short-time running window on the activation from each…
-
Improving Robustness of Speaker Recognition to New Conditions Using Unlabeled Data
We benchmark these approaches on several distinctly different databases, after we describe our SRICON-UAM team system submission for the NIST 2016 SRE.
-
Hybrid Convolutional Neural Networks for Articulatory and Acoustic Information Based Speech Recognition
This work explores using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space. Our speech-inversion results indicate that the CNN models…
-
Toward human-assisted lexical unit discovery without text resources
This work addresses lexical unit discovery for languages without (usable) written resources.
-
Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks
This paper investigates using deep neural networks (DNNs) and convolutional neural networks (CNNs) for mapping speech data into its corresponding articulatory space.