Abstract
In this paper we describe novel feature subset selection methods based on the estimation of feature salience, i.e. the quantification of the relative importance of individual features, in the presence of other features, for determining the classes of records in a dataset. We define feature salience and present a method for estimating it. Five synthetic datasets were used to demonstrate the utility of the salience estimation technique. In most cases the technique produced good approximations to the calculated saliencies.
We then describe three feature subset selection methods that use feature salience as their basis. These methods were evaluated on real-world datasets by constructing classifiers using all features and comparing them with classifiers constructed using only a selected subset of features. The results compared well with other state-of-the-art techniques, and the methods were simpler to implement and significantly faster to execute.
On average, applying our best feature subset selection method resulted in trees that used only 49% of the features used by trees constructed with the full set of features. This reduction in the number of features used was associated with a 1% improvement in classifier accuracy.
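The evaluation protocol summarised above (a classifier built on all features compared with one built on a selected subset) can be illustrated with a minimal sketch. The abstract does not give the salience estimator or the three selection methods, so everything here is an assumption: permutation-based accuracy drop stands in for the salience estimate, a nearest-centroid classifier stands in for the decision trees, a fixed threshold of 0.05 stands in for the selection rule, and the synthetic data and all function names are hypothetical.

```python
import random

def make_data(n, rng):
    """Hypothetical data: features 0 and 1 are informative, 2 and 3 are noise."""
    rows, labels = [], []
    for i in range(n):
        y = i % 2
        shift = 4.0 * y  # class 1 is shifted on the informative features only
        rows.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0),
                     rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        labels.append(y)
    return rows, labels

def fit_centroids(rows, labels, feats):
    """Nearest-centroid classifier (a stand-in for the trees in the abstract)."""
    centroids = {}
    for y in set(labels):
        members = [r for r, l in zip(rows, labels) if l == y]
        centroids[y] = [sum(r[f] for r in members) / len(members) for f in feats]
    return centroids

def predict(centroids, row, feats):
    def sq_dist(c):
        return sum((row[f] - c[k]) ** 2 for k, f in enumerate(feats))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))

def accuracy(centroids, rows, labels, feats):
    hits = sum(predict(centroids, r, feats) == l for r, l in zip(rows, labels))
    return hits / len(rows)

def permutation_salience(centroids, rows, labels, feats, rng):
    """Stand-in salience: accuracy lost when one feature's column is shuffled
    while the other features are left intact."""
    base = accuracy(centroids, rows, labels, feats)
    scores = {}
    for f in feats:
        column = [r[f] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:f] + [v] + r[f + 1:] for r, v in zip(rows, column)]
        scores[f] = base - accuracy(centroids, shuffled, labels, feats)
    return scores

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
valid_x, valid_y = make_data(200, rng)  # used only to estimate salience
test_x, test_y = make_data(200, rng)    # used only for the final comparison
all_feats = [0, 1, 2, 3]

# Classifier built on the full feature set.
full_model = fit_centroids(train_x, train_y, all_feats)
acc_full = accuracy(full_model, test_x, test_y, all_feats)

# Keep features whose estimated salience exceeds an assumed threshold.
salience = permutation_salience(full_model, valid_x, valid_y, all_feats, rng)
selected = [f for f in all_feats if salience[f] > 0.05]

# Classifier built on the selected subset only.
subset_model = fit_centroids(train_x, train_y, selected)
acc_subset = accuracy(subset_model, test_x, test_y, selected)

print("salience:", salience)
print("selected features:", selected)
print("accuracy, all features:", acc_full)
print("accuracy, selected subset:", acc_subset)
```

Estimating salience on a validation split rather than on the test split keeps the final accuracy comparison free of selection bias; this is a design choice of the sketch, not something the abstract specifies.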
Original language | English |
---|---|
Pages (from-to) | 3-21 |
Number of pages | 19 |
Journal | Intelligent Data Analysis |
Volume | 10 |
Issue number | 1 |
Publication status | Published - Jan 2006 |