Speech & natural language publications
-
The SRI March 2000 Hub-5 Conversational Speech Transcription System
We describe SRI's large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation.
-
Language Modelling for Multilingual Speech Translation
As with acoustic modelling, sparse training data is one of the main problems in language modelling tasks. We ideally want to have enough properly matched data to train models for…
-
Prosody-Based Automatic Segmentation of Speech into Sentences and Topics
Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the…
-
Phonetic Consequences of Speech Disfluency
Analyses of American English show that disfluency affects a variety of phonetic aspects of speech, including segment durations, intonation, voice quality, vowel quality, and coarticulation patterns. These effects provide clues…
-
Data-Driven Subclassification of Disfluent Repetitions Based on Prosodic Features
This study delves into the acoustic and prosodic information of repetitions, one of the most common disfluencies. A hierarchical clustering of prosodic features reveals three subsets of repetitions, each reflecting…
-
Finding Consensus Among Words: Lattice-based Word Error Minimization
We describe a new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER). Our approach thus overcomes the mismatch between…
-
Modeling the Prosody of Hidden Events for Improved Word Recognition
We investigate a new approach for using speech prosody as a knowledge source for speech recognition. The idea is to penalize word hypotheses that are inconsistent with prosodic features such…
-
Combining Words and Prosody for Information Extraction from Speech
In this work we demonstrate the use of prosodic cues, alone and in combination with words, for segmentation and name finding. In experiments, we find that prosodic cues alone…
-
Robust Text-Independent Speaker Identification over Telephone Channels
This paper addresses the issue of closed-set text-independent speaker identification from samples of speech recorded over the telephone. It focuses on the effects of acoustic mismatches between training and testing…
-
Combining Words and Speech Prosody for Automatic Topic Segmentation
We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models,…
-
Efficient Lattice Representation and Generation
We describe two new techniques for reducing word lattice sizes without eliminating hypotheses.
-
How Far Do Speakers Back Up in Repairs? A Quantitative Model
We propose a quantitative model that predicts the overall distribution of retrace lengths in a large corpus of spontaneous speech, based only on word position. Results have implications for modeling…