Citation
Stolcke, A., Ryant, N., Mitra, V., Yuan, J., Wang, W., & Liberman, M. (2014, May). Highly accurate phonetic segmentation using boundary correction models and system fusion. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5552-5556). IEEE.
Abstract
Accurate phone-level segmentation of speech remains an important task for many subfields of speech research. We investigate techniques for boosting the accuracy of automatic phonetic segmentation based on HMM acoustic-phonetic models. In prior work [25] we were able to improve on state-of-the-art alignment accuracy by employing special phone boundary HMM models, trained on phonetically segmented training data, in conjunction with a simple boundary-time correction model. Here we present further improved results by using more powerful statistical models for boundary correction that are conditioned on phonetic context and duration features. Furthermore, we find that combining multiple acoustic front-ends gives additional gains in accuracy, and that conditioning the combiner on phonetic context and side information helps. Overall, we reduce segmentation errors on the TIMIT corpus by almost one half, from 93.9% to 96.8% boundary accuracy with a 20-ms tolerance.
Index Terms— phonetic segmentation, phone boundary model, forced alignment, HMM, regression, system fusion.