Citation
Mitra, V., Wang, W., Franco, H., Lei, Y., Bartels, C., & Graciarena, M. (2014). Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions. In Fifteenth Annual Conference of the International Speech Communication Association.
Abstract
Deep Neural Network (DNN) based acoustic models have shown significant improvement over their Gaussian Mixture Model (GMM) counterparts in the last few years. While several studies exist that evaluate the performance of GMM systems under noisy and channel degraded conditions, noise robustness studies on DNN systems have been far fewer. In this work we present a study exploring both conventional DNNs and deep Convolutional Neural Networks (CNN) for noise- and channel-degraded speech recognition tasks using the Aurora4 dataset. We compare the baseline mel-filterbank energies with noise-robust features that we have proposed earlier and show that the use of robust features helps to improve the performance of DNNs or CNNs compared to mel-filterbank energies. We also show that vocal tract length normalization has a positive role in improving the performance of the robust acoustic features. Finally, we show that by combining multiple systems together we can achieve even further improvement in recognition accuracy.
Index Terms: deep neural networks, convolutional neural networks, noise-robust speech recognition, continuous speech recognition, modulation features, damped oscillators.