Author: Dayne Freitag

January 1, 2004

Toward full automation of lexicon construction

We describe a technique of this nature called information-theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization.
January 1, 2004

Toward Unsupervised Whole-Corpus Tagging

We present a system for unsupervised tagging of words into classes produced by a distributional clustering technique called co-clustering.
January 1, 2004

Trained Named Entity Recognition Using Distributional Clusters

This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition.
January 1, 2000

Maximum Entropy Markov Models for Information Extraction and Segmentation

We address modeling sequential data with HMMs; problems with previous methods (motivation); the maximum entropy Markov model; and segmentation of FAQs (experiments and results).
January 1, 2000

Bridging the lexical chasm: statistical approaches to answer-finding

This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions.
January 1, 2000

Boosted wrapper induction

We describe an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which we then apply to conventional information extraction problems using boosting.
January 1, 1999

Information extraction using HMMs and shrinkage

This paper advocates for the use of HMMs for information extraction.
January 1, 1998

Multistrategy learning for information extraction

We describe three different multistrategy approaches. Experiments on two IE domains, a collection of electronic seminar announcements from a university computer science department and a set of newswire articles describing corporate acquisitions from the Reuters collection, demonstrate the effectiveness of all three approaches.
January 1, 1998

Information extraction from HTML: application of a general machine learning approach

We show how information extraction can be cast as a standard machine learning problem, and argue for the suitability of relational learning in solving it.
January 1, 1997

Using grammatical inference to improve precision in information extraction

The field of information extraction (IE) is concerned with applying natural language processing (NLP) and information retrieval (IR) techniques to the automatic extraction of essential details from text documents. We are exploring the use of machine learning methods for IE.
January 1, 1996

A Machine Learning Architecture for Optimizing Web Search Engines

We describe a wide range of such heuristics, including a novel one inspired by reinforcement learning techniques for propagating rewards through a graph, which can be used to affect a search engine’s rankings.
January 1, 1995

WebWatcher: A Learning Apprentice for the World Wide Web

We describe an information-seeking assistant for the World Wide Web. This agent, called WebWatcher, interactively helps users locate desired information by employing learned knowledge about which hyperlinks are likely to lead to the target information.