Speech & natural language publications
-
Error-Driven Generalist+Experts (EDGE): a Multi-Stage Ensemble Framework for Text Categorization
We introduce a multi-stage ensemble framework, Error-Driven Generalist+Experts (EDGE), for improved classification on large-scale text categorization problems.
-
An anticorrelation kernel for improved system combination in speaker verification
This paper presents a method for training SVM-based classification systems for combination with other existing classification systems designed for the same task.
-
Automatic Labeling Inconsistencies Detection and Correction for Sentence Unit Segmentation in Conversational Speech
In this work, we present various methods to detect labeling inconsistencies in the ICSI meeting corpus. We show that by automatically detecting and removing the inconsistent examples from the training…
-
Detecting nonnative speech using speaker recognition approaches
Detecting whether a talker is speaking his native language is useful for speaker recognition, speech recognition, and intelligence applications. We study the problem of detecting nonnative speakers of American English,…
-
Integrating Several Annotation Layers for Statistical Information Distillation
We present a sentence extraction algorithm for Information Distillation, a task where for a given templated query, relevant passages must be extracted from massive audio and textual document sources.
-
Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages
We explore the use of morph-based language models in large-vocabulary continuous speech recognition systems across four so-called “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are…
-
Reranking Machine Translation Hypotheses With Structured and Web-based Language Models
In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.
-
Building A Highly Accurate Mandarin Speech Recognizer
We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. Particularly, we build two acoustic models (AMs) with significant differences but with similar…
-
OOV Detection by Joint Word/Phone Lattice Alignment
We propose a new method for detecting out-of-vocabulary (OOV) words for large vocabulary continuous speech recognition (LVCSR) systems. Our method is based on performing a joint alignment between independently generated…
-
Capturing and Answering Questions Posed to a Knowledge-Based System
As part of the ongoing project, Project Halo, our goal is to build a system capable of answering questions posed by novice users to a formal knowledge base. In our…
-
Extending Boosting for Large Scale Spoken Language Understanding
We propose three methods for extending the Boosting family of classifiers motivated by the real-life problems we have encountered. Our results indicate that it is possible to obtain the same…
-
Capturing a Taxonomy of Failures During Automatic Interpretation of Questions Posed in Natural Language
In this paper, we present a study – conducted in the context of the Halo Project – cataloging the types of failures that occur when capturing knowledge from natural language.