June 1, 2000

Word-Level Rate of Speech Modeling Using Rate-Specific Phones and Pronunciations

Citation

Jing Zheng, H. Franco, Fuliang Weng, A. Sankar and H. Bratt, “Word-level rate of speech modeling using rate-specific phones and pronunciations,” 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), 2000, pp. 1775-1778 vol.3, doi: 10.1109/ICASSP.2000.862097.

Abstract

Variations in rate of speech (ROS) produce changes in both spectral features and word pronunciations that affect ASR systems. To cope with these effects, we propose to use rate-specific phone models and pronunciations for ROS modeling at the word level. Each word is given three types of pronunciations (fast, slow, and medium), each built from phone models of the corresponding rate. This approach allows us to model within-sentence rate variation. To better model coarticulation effects, we introduce the concept of zero-length phones, which enables short phones to be skipped without having to change their neighboring phones’ contexts. A data-driven approach is used to prune the pronunciation dictionary derived from rules for phone reduction. We tested these approaches on the Hub 4 database and achieved a relative improvement of 2.0% over the baseline, an evaluation-quality version of SRI’s DECIPHER™ continuous speech recognition system, for clean native speech in the 1996 development set.
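
As a rough illustration of the dictionary structure the abstract describes, the following Python sketch expands one base pronunciation into fast, medium, and slow variants made of rate-tagged phones. It is not the paper's implementation. The suffixes `_f`, `_m`, and `_s`, the trailing `0` that marks a zero-length phone, the `reducible` set, and the choice to offer zero-length variants only at the fast rate are all assumptions made for this example. The paper's reduction rules and its data-driven pruning step are not reproduced here.

```python
from itertools import product

# Hypothetical rate tags; the paper does not specify a naming scheme.
RATES = ("fast", "medium", "slow")
SUFFIX = {"fast": "_f", "medium": "_m", "slow": "_s"}


def rate_specific_pronunciations(base_phones, reducible=frozenset()):
    """Return {rate: [pronunciation, ...]} for one word.

    base_phones: list of rate-independent phone labels.
    reducible:   phones assumed (for illustration) to be skippable
                 in fast speech.
    """
    prons = {}
    for rate in RATES:
        tagged = [p + SUFFIX[rate] for p in base_phones]
        variants = [tagged]
        if rate == "fast":
            # A reducible phone may appear in full or as a zero-length
            # phone (marked with "0"). The phone stays in the sequence,
            # so its neighbors keep the same phonetic context.
            options = [
                (t, t + "0") if p in reducible else (t,)
                for p, t in zip(base_phones, tagged)
            ]
            variants = [list(v) for v in product(*options)]
        prons[rate] = variants
    return prons


if __name__ == "__main__":
    entry = rate_specific_pronunciations(["ae", "n", "d"], reducible={"d"})
    for rate, variants in entry.items():
        for pron in variants:
            print(rate, " ".join(pron))
```

Because every pronunciation of a word carries a single rate, a decoder choosing among these entries word by word can follow rate changes within a sentence. In a real system the candidate list would then be pruned using data, as the abstract notes.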
