Dayne Freitag

October 8, 2022

Accelerating Human Authorship of Information Extraction Rules

We simulate the process of corpus review and word list creation, showing that several simple interventions greatly improve recall as a function of simulated labor.

June 2, 2022

VALET: Rule-Based Information Extraction for Rapid Deployment

We present VALET, a framework for rule-based information extraction written in Python. We show how a handful of rules suffices to implement sophisticated matching, and describe a user interface that facilitates exploration for development and maintenance of rule sets.

August 1, 2016

Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses

We consider the use of distant supervision for biological information extraction, and introduce two understudied corpora of this form, the Biological Expression Language (BEL) Large Corpus and the Pathway Logic (PL) Datum Corpus.

January 1, 2016

An Annotated Corpus and Method for Analysis of Ad-Hoc Structures Embedded in Text

We describe a method for identifying and performing functional analysis of structured regions that are embedded in natural language documents, such as tables or key-value lists.

October 1, 2013

Unsupervised Discovery and Extraction of Semi-Structured Regions in Text Via Self-Information

We present initial work that uses significant patterns to generate extraction rules, and conclude with a discussion of future directions of our work.

January 1, 2012

A corpus of online discussions for research into linguistic memes

We describe a 460-million word corpus of online discussions.

January 1, 2011

Airborne Observation of Aerosol Optical Depth During ARCTAS: Vertical Profiles, Inter-Comparison and Fine-Mode Fraction

We describe aerosol optical depth (AOD) measured during the Arctic Research of the Composition of the Troposphere from Aircraft and Satellites (ARCTAS) experiment, focusing on vertical profiles, inter-comparison with correlative observations and fine-mode fraction.

January 1, 2009

Name Transliteration with Bidirectional Perceptron Edit Models

We report on our efforts as part of the NEWS 2009 Machine Transliteration Shared Task. We applied an orthographic perceptron character edit model that we have used previously for name transliteration…

January 1, 2008

Improving NER in Arabic using a morphological tagger

We discuss a named entity recognition system for Arabic, and show how we incorporated the information provided by MADA, a full morphological tagger which uses a morphological analyzer.

January 1, 2007

A Sequence Alignment Model Based on the Averaged Perceptron

We describe a discriminatively trained sequence alignment model based on the averaged perceptron. In common with other approaches to sequence modeling using perceptrons, and in contrast with comparable generative models, this model permits and transparently exploits arbitrary features of input strings.

January 1, 2005

Morphology induction from term clusters

We address the problem of learning a morphological automaton directly from a monolingual text corpus without recourse to additional resources.

January 1, 2005

New experiments in distributional representations of synonymy

We generated a TOEFL-like test using WordNet, containing thousands of questions and composed only of words occurring with sufficient corpus frequency to support sound distributional comparisons.

Author: Dayne Freitag