TY - JOUR
T1 - A hybrid classification method
T2 - Discrete canonical variate analysis using a genetic algorithm
AU - Kemsley, E. K.
N1 - Funding Information:
This work was funded by the Biotechnology and Biological Sciences Research Council. The author thanks A. Parr for providing some of the example data, and P.K. Hopke and D.L. Massart for making the olive oil data (of which example data set F is a subset) available in the public domain [20].
PY - 2001/1/13
Y1 - 2001/1/13
N2 - This paper describes a novel, hybrid multivariate classification method: discrete canonical variate analysis (DCVA), which is integrated in the present implementation with a genetic algorithm (GA). DCVA transforms a multivariate data set into a set of discrete scores of lower dimensionality, intended specifically to act as classifiers of observations into one of multiple pre-defined groups. The condition for selecting the DCVA loadings is maximization of the ratio of the between-groups to within-groups variance of the scores, but unlike conventional CVA, there is a non-linear, discontinuous relationship between the scores and loadings. The performance of the DCVA method is compared with that of two competing classification methods, artificial neural networks (ANNs) and Mahalanobis distance-based linear discriminant analysis (LDA), using six example problems. In all cases, internal (leave-one-out) cross-validation was used, and classification success rates were retained from both the training and test segments. Of the methods studied, DCVA clearly performed the best in training, producing the highest mean success rates for four out of the six example data sets. For the test segments, DCVA produced the best performance for two of the data sets, and equalled that of LDA and ANNs for a third. However, LDA produced the best performance for the remaining three data sets. This suggests a greater tendency of DCVA, like other search-based methods, to overfit.
AB - This paper describes a novel, hybrid multivariate classification method: discrete canonical variate analysis (DCVA), which is integrated in the present implementation with a genetic algorithm (GA). DCVA transforms a multivariate data set into a set of discrete scores of lower dimensionality, intended specifically to act as classifiers of observations into one of multiple pre-defined groups. The condition for selecting the DCVA loadings is maximization of the ratio of the between-groups to within-groups variance of the scores, but unlike conventional CVA, there is a non-linear, discontinuous relationship between the scores and loadings. The performance of the DCVA method is compared with that of two competing classification methods, artificial neural networks (ANNs) and Mahalanobis distance-based linear discriminant analysis (LDA), using six example problems. In all cases, internal (leave-one-out) cross-validation was used, and classification success rates were retained from both the training and test segments. Of the methods studied, DCVA clearly performed the best in training, producing the highest mean success rates for four out of the six example data sets. For the test segments, DCVA produced the best performance for two of the data sets, and equalled that of LDA and ANNs for a third. However, LDA produced the best performance for the remaining three data sets. This suggests a greater tendency of DCVA, like other search-based methods, to overfit.
KW - Canonical variate analysis-CVA
KW - Classification
KW - Genetic algorithm-GA
KW - Non-linear
UR - http://www.scopus.com/inward/record.url?scp=0035852406&partnerID=8YFLogxK
U2 - 10.1016/S0169-7439(00)00114-3
DO - 10.1016/S0169-7439(00)00114-3
M3 - Article
AN - SCOPUS:0035852406
VL - 55
SP - 39
EP - 51
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
SN - 0169-7439
IS - 1-2
ER -