Citation
Caruana, R., & Freitag, D. (1994). Greedy attribute selection. In Machine Learning Proceedings 1994 (pp. 28-36). Morgan Kaufmann.
Abstract
Many real-world domains bless us with a wealth of attributes to use for learning. This blessing is often a curse: most inductive methods generalize worse given too many attributes than if given a good subset of those attributes. We examine this problem for two learning tasks taken from a calendar scheduling domain. We show that ID3/C4.5 generalizes poorly on these tasks if allowed to use all available attributes. We examine five greedy hillclimbing procedures that search for attribute sets that generalize well with ID3/C4.5. Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance. We present a caching scheme that makes attribute hillclimbing more practical computationally. We also compare the results of hillclimbing in attribute space with FOCUS and RELIEF on the two tasks.
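Illustrative sketch
The abstract does not say what the five hillclimbing procedures or the caching scheme are, so the code below is only a generic illustration of the idea, not the authors' method. It assumes a stepwise variant that adds or removes one attribute per step, and it uses plain memoization of subset scores in place of the paper's caching scheme. The names `stepwise_hillclimb`, `evaluate`, and `toy_score` are hypothetical. In practice `evaluate` would return something like the held-out accuracy of a decision tree trained on the given attributes.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def stepwise_hillclimb(
    attributes: Iterable[str],
    evaluate: Callable[[Subset], float],
) -> Tuple[Subset, float]:
    """Greedy hillclimb over attribute subsets, toggling one attribute per step."""
    all_attrs = sorted(set(attributes))
    cache: Dict[Subset, float] = {}  # subset -> score; each subset is evaluated once

    def score(subset: Subset) -> float:
        if subset not in cache:
            cache[subset] = evaluate(subset)
        return cache[subset]

    current: Subset = frozenset()
    best = score(current)
    while True:
        # Each neighbor differs from the current subset by exactly one attribute
        # (symmetric difference adds it if absent, removes it if present).
        neighbors = [current ^ {a} for a in all_attrs]
        top = max(neighbors, key=score, default=current)
        if score(top) <= best:
            return current, best  # local optimum: no neighbor scores higher
        current, best = top, score(top)


def toy_score(subset: Subset) -> float:
    """Hypothetical stand-in for generalization accuracy (assumption, not from the paper):
    rewards the two relevant attributes and penalizes each irrelevant one."""
    relevant = {"a", "b"}
    return len(subset & relevant) - 0.1 * len(subset - relevant)


selected, value = stepwise_hillclimb(["a", "b", "c", "d"], toy_score)
print(sorted(selected), value)  # ['a', 'b'] 2.0
```

In this toy run the search revisits the subsets {}, {a}, and {b} when it considers removals, and the cache answers those lookups without calling `evaluate` again. That saving matters when each evaluation means training and testing a learner.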