Speech & natural language publications
-
An anticorrelation kernel for improved system combination in speaker verification
This paper presents a method for training SVM-based classification systems for combination with other existing classification systems designed for the same task.
-
Recognizing Arabic speakers with English phones
We investigate the question of whether phone recognition models trained on large English databases can be used for speaker recognition in another language.
-
Improving NER in Arabic using a morphological tagger
We discuss a named entity recognition system for Arabic and show how we incorporated the information provided by MADA, a full morphological tagger that uses a morphological analyzer.
-
Automatic Annotation of Dialogue Structure from Simple User Interaction
By transforming human annotations into hypothetical idealized user interactions, we investigate the relative utility of various modes of user interaction and techniques for their interpretation.
-
OOV Detection by Joint Word/Phone Lattice Alignment
We propose a new method for detecting out-of-vocabulary (OOV) words for large vocabulary continuous speech recognition (LVCSR) systems. Our method is based on performing a joint alignment between independently generated…
-
Integrating Several Annotation Layers for Statistical Information Distillation
We present a sentence extraction algorithm for Information Distillation, a task in which, for a given templated query, relevant passages must be extracted from massive audio and textual document sources.
-
Morph-Based Speech Recognition and Modeling of Out-of-Vocabulary Words Across Languages
We explore the use of morph-based language models in large-vocabulary continuous speech recognition systems across four so-called “morphologically rich” languages: Finnish, Estonian, Turkish, and Egyptian Colloquial Arabic. The morphs are…
-
Reranking Machine Translation Hypotheses With Structured and Web-based Language Models
In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.
-
Building A Highly Accurate Mandarin Speech Recognizer
We describe a highly accurate large-vocabulary continuous Mandarin speech recognizer, a collaborative effort among four research organizations. Particularly, we build two acoustic models (AMs) with significant differences but with similar…
-
Capturing a Taxonomy of Failures During Automatic Interpretation of Questions Posed in Natural Language
In this paper, we present a study – conducted in the context of the Halo Project – cataloging the types of failures that occur when capturing knowledge from natural language.
-
Capturing and Answering Questions Posed to a Knowledge-Based System
As part of the ongoing Project Halo, our goal is to build a system capable of answering questions posed by novice users to a formal knowledge base. In our…
-
Extending Boosting for Large Scale Spoken Language Understanding
We propose three methods for extending the Boosting family of classifiers motivated by the real-life problems we have encountered. Our results indicate that it is possible to obtain the same…