Author: Horacio Franco
-
Wideband Spectral Monitoring Using Deep Learning
We present a system that performs spectral monitoring of a 666.5 MHz wide band, located within a 6 GHz range of Radio Frequency (RF) bandwidth, using state-of-the-art deep learning approaches.
-
Voices Obscured in Complex Environmental Settings (VOiCES) corpus
This work is a multi-organizational effort led by SRI International and Lab41 intended to push forward the state of the art in distant-microphone signal processing and speech recognition.
-
Tackling Unseen Acoustic Conditions in Query-by-Example Search Using Time and Frequency Convolution for Multilingual Deep Bottleneck Features
This paper revisits two neural network architectures developed for noise- and channel-robust ASR and applies them to building a state-of-the-art multilingual query-by-example (QbE) system.
-
Noise-robust Exemplar Matching for Rescoring Query-by-Example Search
This paper describes a two-step approach to keyword spotting in which a query-by-example search is followed by noise-robust exemplar matching rescoring.
-
Leveraging Deep Neural Network Activation Entropy to Cope with Unseen Data in Speech Recognition
This work estimates how distortion caused by unseen data propagates through a network, in the form of activation entropy: entropy is measured over a short-time running window on the activations of each neuron in a given hidden layer, and these measurements are then combined into a summary entropy.
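As a rough illustration only (not the paper's exact recipe), a minimal NumPy sketch of a per-neuron running-window entropy measure could look like the following; the window length, histogram binning, final averaging, and the helper name get_hidden_activations are all assumptions.

    import numpy as np

    def running_activation_entropy(activations, win=50, n_bins=20):
        # activations: (n_frames, n_neurons) outputs of one hidden layer.
        # Returns an array of shape (n_windows, n_neurons) of entropies.
        n_frames, n_neurons = activations.shape
        window_entropies = []
        for start in range(0, n_frames - win + 1, win):
            chunk = activations[start:start + win]      # (win, n_neurons)
            ent = np.empty(n_neurons)
            for j in range(n_neurons):
                # Histogram each neuron's activations within the window
                # and take the entropy of the resulting distribution.
                counts, _ = np.histogram(chunk[:, j], bins=n_bins)
                p = counts / counts.sum()
                p = p[p > 0]
                ent[j] = -np.sum(p * np.log(p))
            window_entropies.append(ent)
        return np.array(window_entropies)

    # Summary entropy for an utterance: average over windows and neurons.
    # layer_act = get_hidden_activations(utterance)   # hypothetical helper
    # summary_entropy = running_activation_entropy(layer_act).mean()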
-
Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks
This paper investigates using deep neural networks (DNNs) and convolutional neural networks (CNNs) to map speech data into its corresponding articulatory space.
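A minimal sketch of this kind of frame-level acoustic-to-articulatory regression, assuming 39-dimensional acoustic frames and a small set of articulatory trajectories as targets (both stand-ins, not the paper's actual features or network):

    import torch
    import torch.nn as nn

    # Stand-in data: acoustic frames -> articulatory trajectories.
    acoustic = torch.randn(2000, 39)
    articulatory = torch.randn(2000, 6)

    # Simple feed-forward mapping network (sizes are illustrative).
    net = nn.Sequential(nn.Linear(39, 256), nn.ReLU(),
                        nn.Linear(256, 256), nn.ReLU(),
                        nn.Linear(256, 6))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for epoch in range(20):
        pred = net(acoustic)
        loss = nn.functional.mse_loss(pred, articulatory)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # The predicted articulatory trajectories can then be used alongside
    # the acoustic features as an additional input stream to a recognizer.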
-
Speech recognition in unseen and noisy channel conditions
This work investigates robust features, the feature-space maximum likelihood linear regression (fMLLR) transform, and deep convolutional nets to address the problem of unseen channel and noise conditions in speech recognition.
-
Coping with Unseen Data Conditions: Investigating Neural Net Architectures, Robust Features, and Information Fusion for Robust Speech Recognition
This work investigates how traditional deep neural networks perform under varying acoustic conditions, evaluating them on speech recorded under realistic background conditions that are mismatched with respect to the training data.
-
Unsupervised Learning of Acoustic Units Using Autoencoders and Kohonen Nets
This work investigates learning acoustic units in an unsupervised manner from real-world speech data by using a cascade of an autoencoder and a Kohonen net.
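The cascade could be sketched roughly as follows, with a small PyTorch autoencoder and the MiniSom library standing in for the Kohonen net; the feature dimensions, training schedule, and 8x8 map size are illustrative assumptions, not the paper's configuration.

    import torch
    import torch.nn as nn
    from minisom import MiniSom   # pip install minisom

    # Toy stand-in for acoustic feature frames: (n_frames, n_dims).
    frames = torch.randn(5000, 39)

    # 1) Autoencoder: learn a compact encoding of each frame.
    class AE(nn.Module):
        def __init__(self, dim_in=39, dim_code=16):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim_in, 64), nn.Tanh(),
                                     nn.Linear(64, dim_code), nn.Tanh())
            self.dec = nn.Sequential(nn.Linear(dim_code, 64), nn.Tanh(),
                                     nn.Linear(64, dim_in))
        def forward(self, x):
            code = self.enc(x)
            return self.dec(code), code

    ae = AE()
    opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
    for epoch in range(20):
        recon, _ = ae(frames)
        loss = nn.functional.mse_loss(recon, frames)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # 2) Kohonen net (self-organizing map) over the learned codes:
    # each map cell acts as a candidate acoustic unit.
    with torch.no_grad():
        codes = ae.enc(frames).numpy()
    som = MiniSom(8, 8, codes.shape[1], sigma=1.0, learning_rate=0.5)
    som.train_random(codes, 10000)
    units = [som.winner(c) for c in codes]   # grid cell index per frame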
-
Time-frequency convolutional networks for robust speech recognition
This work presents a modified CDNN architecture that we call the time-frequency convolutional network (TFCNN), in which two parallel convolutional layers operate on the input feature space, one across time and one across frequency, each followed by its own pooling layer.
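A rough PyTorch sketch of the parallel time/frequency convolution idea; the filter shapes, channel counts, and pooling widths here are illustrative assumptions, not the TFCNN's actual configuration.

    import torch
    import torch.nn as nn

    class TFConvFrontEnd(nn.Module):
        # Two parallel convolutions over a (frequency, time) feature map,
        # one spanning time and one spanning frequency, each with its own
        # pooling; the two streams are concatenated for the layers above.
        def __init__(self, n_filters=64):
            super().__init__()
            # Convolve along the time axis only.
            self.time_conv = nn.Conv2d(1, n_filters, kernel_size=(1, 5))
            self.time_pool = nn.MaxPool2d(kernel_size=(1, 3))
            # Convolve along the frequency axis only.
            self.freq_conv = nn.Conv2d(1, n_filters, kernel_size=(8, 1))
            self.freq_pool = nn.MaxPool2d(kernel_size=(3, 1))

        def forward(self, x):            # x: (batch, 1, n_freq, n_frames)
            t = self.time_pool(torch.relu(self.time_conv(x)))
            f = self.freq_pool(torch.relu(self.freq_conv(x)))
            # Flatten both streams and concatenate before the fully
            # connected layers of the acoustic model.
            return torch.cat([t.flatten(1), f.flatten(1)], dim=1)

    # feats = torch.randn(4, 1, 40, 101)   # 40-band filterbank windows
    # out = TFConvFrontEnd()(feats)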
-
Improving robustness against reverberation for automatic speech recognition
In this work, we explore the role of robust acoustic features, motivated by human speech perception studies, in building ASR systems that are robust to reverberation effects.
-
Classification of Lexical Stress Using Spectral and Prosodic Features for Computer-assisted Language Learning Systems
We present a system for detecting lexical stress in English words spoken by English learners, designed to be part of the EduSpeak® computer-assisted language learning (CALL) software.