Author: Dayne Freitag
-
Accelerating Human Authorship of Information Extraction Rules
We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.
-
VALET: Rule-Based Information Extraction for Rapid Deployment
We present VALET, a framework for rule-based information extraction written in Python. We show how a handful of rules suffices to implement sophisticated matching, and describe a user interface that facilitates exploration for development and maintenance of rule sets.
-
Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses
We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.
-
An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text
We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.
-
Unsupervised Discovery and Extraction of Semi-Structured Regions in Text Via Self-Information
We present initial work that uses significant patterns to generate extraction rules, and conclude with a discussion of future directions of our work.
-
A corpus of online discussions for research into linguistic memes
We describe a 460-million-word corpus of online discussions.
-
Airborne Observation of Aerosol Optical Depth During ARCTAS: Vertical Profiles, Inter-Comparison and Fine-Mode Fraction
We describe aerosol optical depth (AOD) measured during the Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) experiment, focusing on vertical profiles, inter-comparison with correlative observations and fine-mode fraction.
-
Name Transliteration with Bidirectional Perceptron Edit Models
We report on our efforts in the NEWS 2009 Machine Transliteration Shared Task. We applied an orthographic perceptron character edit model that we have used previously for name transliteration…
-
Improving NER in Arabic using a morphological tagger
We discuss a named entity recognition system for Arabic, and show how we incorporated the information provided by MADA, a full morphological tagger which uses a morphological analyzer.
-
A Sequence Alignment Model Based on the Averaged Perceptron
We describe a discriminatively trained sequence alignment model based on the averaged perceptron. In common with other approaches to sequence modeling using perceptrons, and in contrast with comparable generative models, this model permits and transparently exploits arbitrary features of input strings.
-
New experiments in distributional representations of synonymy
We generated a TOEFL-like test using WordNet, containing thousands of questions and composed only of words occurring with sufficient corpus frequency to support sound distributional comparisons.
-
Morphology induction from term clusters
We address the problem of learning a morphological automaton directly from a monolingual text corpus without recourse to additional resources.