August 1, 2013

Damped oscillator cepstral coefficients for robust speech recognition

Citation

V. Mitra, H. Franco, and M. Graciarena, “Damped oscillator cepstral coefficients for robust speech recognition,” in Proc. Interspeech, 2013, pp. 886–890.

Abstract

This paper presents a new signal-processing technique motivated by the physiology of the human auditory system. In this approach, auditory hair cells are modeled as damped oscillators that are stimulated by band-limited, time-domain speech signals acting as forcing functions. Oscillation synchrony is induced by time-aligning and three-way coupling of the forcing functions across the individual bands, so that a given oscillator is driven not only by its own critical band’s forcing function but also by those of its two neighboring bands. We present two separate features: the Damped Oscillator Cepstral Coefficient (DOCC), which uses the damped oscillator response to the forcing functions without synchrony, and the Synchronized Damped Oscillator Cepstral Coefficient (SyDOCC), which uses the damped oscillator response to time-synchronized forcing functions. The proposed features were evaluated on the Aurora4 noise- and channel-degraded speech recognition task, and the results indicate that they improved speech-recognition performance in all conditions compared to the baseline mel-cepstral feature and other published noise-robust features.
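The abstract describes each hair cell as a damped oscillator driven by a band-limited forcing function, with SyDOCC adding three-way coupling across neighboring bands. The sketch below illustrates only those two ideas, under illustrative assumptions that are not details from the paper: the oscillator is taken to be the standard second-order system x'' + 2ζω₀x' + ω₀²x = ω₀²f(t) (unit static gain), integrated with semi-implicit Euler steps at the sampling rate; the damping ratio `zeta=0.1` is arbitrary; and the hypothetical `couple_three_way` helper simply averages each band with its neighbors using equal weights. Critical-band filtering, the time-alignment step, and the cepstral stage are not shown; the bands are assumed to be already filtered and time-aligned.

```python
import math
from typing import List, Sequence


def damped_oscillator_response(
    forcing: Sequence[float],
    fs: float,
    center_hz: float,
    zeta: float = 0.1,
) -> List[float]:
    """Displacement of a damped oscillator driven by a sampled forcing function.

    Integrates x'' + 2*zeta*w0*x' + w0**2 * x = w0**2 * f(t) with
    semi-implicit (symplectic) Euler steps of size 1/fs. The w0**2 scaling
    on the right-hand side and the default zeta are assumptions for this
    sketch, not values taken from the paper.
    """
    w0 = 2.0 * math.pi * center_hz
    h = 1.0 / fs
    x, v = 0.0, 0.0  # oscillator starts at rest
    out = []
    for f in forcing:
        accel = w0 * w0 * (f - x) - 2.0 * zeta * w0 * v
        v += h * accel  # update velocity first ...
        x += h * v      # ... then position, using the new velocity
        out.append(x)
    return out


def couple_three_way(bands: Sequence[Sequence[float]]) -> List[List[float]]:
    """Hypothetical three-way coupling: band k is driven by the equal-weight
    mean of bands k-1, k, k+1; edge bands use only the neighbors that exist.
    Bands are assumed to be time-aligned already and of equal length."""
    n = len(bands)
    coupled = []
    for k in range(n):
        idx = [j for j in (k - 1, k, k + 1) if 0 <= j < n]
        coupled.append(
            [sum(bands[j][t] for j in idx) / len(idx) for t in range(len(bands[k]))]
        )
    return coupled


if __name__ == "__main__":
    fs = 16000.0
    t = [i / fs for i in range(1600)]  # 0.1 s of samples
    on = [math.sin(2 * math.pi * 1000 * ti) for ti in t]   # drive at resonance
    off = [math.sin(2 * math.pi * 3000 * ti) for ti in t]  # drive off resonance
    # Skip the first half so the start-up transient has decayed.
    peak_on = max(abs(y) for y in damped_oscillator_response(on, fs, 1000.0)[800:])
    peak_off = max(abs(y) for y in damped_oscillator_response(off, fs, 1000.0)[800:])
    print(f"1 kHz drive: {peak_on:.3f}, 3 kHz drive: {peak_off:.3f}")
```

Driving a 1 kHz oscillator with a 1 kHz tone yields a much larger steady-state response than a 3 kHz tone, which is the kind of band-selective behavior a resonator-based hair-cell model provides. In a DOCC-style pipeline each band would drive its own oscillator directly; in a SyDOCC-style pipeline the coupled forcing functions would be used instead.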

Index Terms—robust speech recognition, damped oscillators, modulation features, noise and channel degradation.
