Trained Named Entity Recognition Using Distributional Clusters

Citation

Freitag, D. Trained Named Entity Recognition Using Distributional Clusters. In Proceedings of EMNLP 2004, 2004.

Abstract

This work applies boosted wrapper induction (BWI), a machine learning algorithm for information extraction from semi-structured documents, to the problem of named entity recognition. The default feature set of BWI is augmented with features based on distributional term clusters induced from a large unlabeled text corpus. Using no traditional linguistic resources, such as syntactic tags or special purpose gazetteers, this approach yields results near the state of the art in the MUC 6 named entity domain. Supervised learning using features derived through unsupervised corpus analysis may be regarded as an alternative to bootstrapping methods.
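As a rough illustration of what "features based on distributional term clusters" could look like, the following minimal sketch adds a cluster-identity feature to a small set of surface features for each token. It is not BWI and not the paper's feature set: the mapping `WORD_TO_CLUSTER`, the cluster ids, the feature names, and the helper `token_features` are all hypothetical, chosen only to show the general idea that words in the same induced cluster share a feature even when a given word is rare in the labeled data.

```python
from typing import Dict, List

# Hypothetical word-to-cluster mapping. In a real system this would be
# induced from a large unlabeled corpus by a distributional clustering
# method; the ids below are invented for illustration.
WORD_TO_CLUSTER: Dict[str, int] = {
    "monday": 17, "tuesday": 17,
    "boston": 42, "chicago": 42,
    "smith": 8, "jones": 8,
}


def token_features(token: str, clusters: Dict[str, int]) -> List[str]:
    """Return surface features for a token, plus a cluster feature if known."""
    feats = [f"word={token.lower()}"]
    if token[:1].isupper():
        feats.append("capitalized")
    if token.isdigit():
        feats.append("all_digits")
    # Cluster feature: shared by all words assigned to the same cluster.
    cluster_id = clusters.get(token.lower())
    if cluster_id is not None:
        feats.append(f"cluster={cluster_id}")
    return feats


def sentence_features(tokens: List[str], clusters: Dict[str, int]) -> List[List[str]]:
    """Compute the feature list for every token in a sentence."""
    return [token_features(t, clusters) for t in tokens]
```

Under these assumptions, "Boston" and "Chicago" both receive `cluster=42`, so a supervised learner that has seen only one of them labeled as a location can still generalize to the other through the shared feature.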

