Application of Convolutional Neural Networks to Language Identification in Noisy Conditions


Citation

Lei, Y., Ferrer, L., Lawson, A., McLaren, M., & Scheffer, N. (2014, June). Application of Convolutional Neural Networks to Language Identification in Noisy Conditions. In Odyssey.

Abstract

This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR). In the CNN/i-vector frontend, the CNN, rather than a universal background model (UBM), is used to obtain the posterior probabilities for i-vector training and extraction. The CNN/posterior frontend is somewhat similar to a phonetic system in that the occupation counts of (tied) triphone states (senones) given by the CNN are used for classification. These counts are compressed to a low-dimensional vector using probabilistic principal component analysis (PPCA). Evaluated on heavily degraded speech data, the proposed frontends provide significant improvements of up to 50% in average equal error rate compared to a UBM/i-vector baseline. Moreover, the proposed frontends are complementary and, when combined, give significant gains of up to 20% relative to the best single system.
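To make the two frontends more concrete, the following is a minimal sketch of the statistics they rely on, not the authors' implementation. It assumes frame-level posteriors are already available as a (frames x classes) array, whether from a UBM or from a CNN's senone outputs. The function names `sufficient_stats` and `pca_compress`, the array shapes, and the random data are all hypothetical. Plain PCA via SVD stands in for PPCA here, and any normalization of the counts before compression is omitted.

```python
import numpy as np

def sufficient_stats(posteriors, features):
    """Accumulate per-class statistics over one utterance.

    posteriors: (T, K) frame-level class posteriors (rows sum to 1).
    features:   (T, D) acoustic feature vectors.
    """
    counts = posteriors.sum(axis=0)     # (K,) occupation counts (zeroth order)
    first = posteriors.T @ features     # (K, D) first-order statistics
    return counts, first

def pca_compress(count_vectors, dim):
    """Project count vectors onto their top `dim` principal directions.

    Plain PCA is used as a simple stand-in for PPCA (an assumption).
    """
    x = np.asarray(count_vectors, dtype=float)
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

# Synthetic example: 200 frames, 8 classes, 5-dimensional features.
rng = np.random.default_rng(0)
T, K, D = 200, 8, 5
logits = rng.normal(size=(T, K))
post = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
feats = rng.normal(size=(T, D))

counts, first = sufficient_stats(post, feats)

# Ten hypothetical utterances, each summarized by its count vector.
utt_counts = rng.random(size=(10, K))
compressed = pca_compress(utt_counts, dim=3)
```

In this sketch, the zeroth- and first-order statistics are the quantities an i-vector extractor would consume, while the count vector alone, after compression, plays the role of the CNN/posterior representation.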

