TY - JOUR
T1 - The application and effectiveness of a multi-objective metaheuristic algorithm for partial classification
AU - de la Iglesia, B.
AU - Richards, G.
AU - Philpott, M. S.
AU - Rayward-Smith, V. J.
PY - 2006/3/16
Y1 - 2006/3/16
N2 - In this paper, we present an application of multi-objective metaheuristics to the field of data mining. We introduce the data mining task of nugget discovery (also known as partial classification) and show how the multi-objective metaheuristic algorithm NSGA II can be modified to solve this problem. We also present an alternative algorithm for the same task, the ARAC algorithm, which can find all rules that are best according to some measures of interest subject to certain constraints. The ARAC algorithm provides an excellent basis for comparison with the results of the multi-objective metaheuristic algorithm as it can deliver the Pareto optimal front consisting of all partial classification rules that lie in the upper confidence/coverage border, for databases of limited size. We present the results of experiments with various well-known databases for both algorithms. We also discuss how the two methods can be used complementarily for large databases to deliver a set of best rules according to some predefined criteria, providing a powerful tool for knowledge discovery in databases.
U2 - 10.1016/j.ejor.2004.08.025
DO - 10.1016/j.ejor.2004.08.025
M3 - Article
VL - 169
SP - 898
EP - 917
JO - European Journal of Operational Research
JF - European Journal of Operational Research
SN - 0377-2217
IS - 3
ER -