Citation
Sankar, A., Heck, L., & Stolcke, A. (1997, February). Acoustic modeling for the SRI Hub4 partitioned evaluation continuous speech recognition system. In Proceedings of the 1997 DARPA Speech Recognition Workshop.
Abstract
We describe the development of the SRI system evaluated in the 1996 DARPA continuous speech recognition (CSR) Hub4 partitioned evaluation (PE). The task for the Hub4 evaluation was to recognize speech from broadcast television and radio shows. Recognizing such speech by machines poses many challenges. First, the segments to be recognized could be very long. This introduces a problem in training and recognition because of the consequent increased system memory requirement. A simple segmentation technique is used to break long segments into shorter, more manageable lengths. Second, the speech from broadcast news sources exhibits a variety of difficult acoustic conditions, such as spontaneous speech, band-limited speech, and speech in the presence of noise, music, or background speakers. Such background conditions lead to significant degradation in performance. We describe techniques, based on acoustic adaptation, that adapt recognition models to the different acoustic background conditions, so as to improve recognition performance. We also present a novel algorithm that clusters the test data segments so that the resulting clusters are homogeneous with respect to speakers. This is followed by acoustic adaptation to the individual clusters, resulting in a significant performance improvement. Finally, we briefly describe our studies in language modeling for the Hub4 evaluation, which are detailed further in another paper in these proceedings.