Author: Dayne Freitag
-
Trained Named Entity Recognition Using Distributional Clusters
This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition.
-
Toward full automation of lexicon construction
We describe a technique of this nature called information-theoretic co-clustering and give results of a series of experiments built around it that demonstrate the main ingredients of lexical optimization.
-
Toward Unsupervised Whole-Corpus Tagging
We present a system for unsupervised tagging of words into classes produced by a distributional clustering technique called co-clustering.
-
Boosted wrapper induction
We describe an algorithm that learns simple, low-coverage wrapper-like extraction patterns, which we then apply to conventional information extraction problems using boosting.
-
Maximum Entropy Markov Models for Information Extraction and Segmentation
We address modeling sequential data with HMMs, the problems with previous methods that motivate our approach, the maximum entropy Markov model, and experiments and results on the segmentation of FAQs.
-
Bridging the lexical chasm: statistical approaches to answer-finding
This paper investigates whether a machine can automatically learn the task of finding, within a large collection of candidate responses, the answers to questions.
-
Information extraction using HMMs and shrinkage
This paper advocates for the use of HMMs for information extraction.
-
Multistrategy learning for information extraction
We describe three different multistrategy approaches. Experiments on two IE domains (a collection of electronic seminar announcements from a university computer science department, and a set of newswire articles describing corporate acquisitions from the Reuters collection) demonstrate the effectiveness of all three approaches.
-
Information extraction from HTML: application of a general machine learning approach
We show how information extraction can be cast as a standard machine learning problem, and argue for the suitability of relational learning in solving it.
-
Using grammatical inference to improve precision in information extraction
The field of information extraction (IE) is concerned with applying natural language processing (NLP) and information retrieval (IR) techniques to the automatic extraction of essential details from text documents. We are exploring the use of machine learning methods for IE.
-
A Machine Learning Architecture for Optimizing Web Search Engines
We describe a wide range of such heuristics, including a novel one inspired by reinforcement learning techniques for propagating rewards through a graph, which can be used to affect a search engine’s rankings.
-
WebWatcher: A Learning Apprentice for the World Wide Web
We describe an information-seeking assistant for the World Wide Web. This agent, called WebWatcher, interactively helps users locate desired information by employing learned knowledge about which hyperlinks are likely to lead to the target information.