On diversity and accuracy of homogeneous and heterogeneous ensembles.

Shun Bian, Wenjia Wang

Research output: Contribution to journal › Article › peer-review

Abstract

Ensemble learning has been increasingly used in data mining to improve performance. However, the performance gain varies considerably from application to application, and in some cases little or no gain was achieved even when the same ensemble paradigms were used. This suggests that some fundamental issues in ensemble methodology are still not well understood, especially the factors that affect an ensemble's performance and the strategies for constructing effective ensembles. This paper attempts to address these issues. It first describes the possible influencing factors and then focuses on the most important one, diversity, and its relationship with ensemble accuracy. Two types of ensembles, homogeneous and heterogeneous, are defined and constructed using ten different learning algorithms, and their diversity and accuracy are evaluated to find out which types of ensemble possess high diversity and are thus more accurate. For each of the ten learning algorithms, its ability to generate different types of diversity is estimated quantitatively using ten common diversity measures, and the characteristics of these measures are then analyzed to establish their correlation with ensemble performance. The study used fifteen popular data sets to verify the consistency and reliability of the experimental findings.
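
To make the diversity-accuracy relationship concrete, the sketch below computes one commonly used pairwise diversity measure, the disagreement measure, alongside majority-vote accuracy for a toy ensemble. The abstract does not name the ten measures used in the paper, so the choice of the disagreement measure here is an assumption, and the function names and toy data are hypothetical illustrations rather than the authors' method.

```python
from collections import Counter
from itertools import combinations


def disagreement(correct_a, correct_b):
    """Fraction of samples on which exactly one of two classifiers is correct."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both classifiers must be scored on the same samples")
    differing = sum(1 for a, b in zip(correct_a, correct_b) if a != b)
    return differing / len(correct_a)


def mean_pairwise_disagreement(correct_matrix):
    """Average the disagreement measure over all pairs of ensemble members."""
    pairs = list(combinations(correct_matrix, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def majority_vote_accuracy(predictions, labels):
    """Accuracy of the ensemble when each sample gets its most-voted label."""
    hits = 0
    for i, label in enumerate(labels):
        votes = Counter(member[i] for member in predictions)
        if votes.most_common(1)[0][0] == label:
            hits += 1
    return hits / len(labels)


# Toy data (hypothetical): three classifiers, four samples, binary labels.
labels = [1, 0, 1, 1]
predictions = [
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
]
# Per-member correctness vectors derived from the predictions.
correct = [[p == y for p, y in zip(member, labels)] for member in predictions]

diversity = mean_pairwise_disagreement(correct)
accuracy = majority_vote_accuracy(predictions, labels)
```

In a study of this kind, a pair of values such as `diversity` and `accuracy` would be collected for many ensembles and data sets and then correlated; the toy example only shows how a single pair might be computed.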
Original language: English
Pages (from-to): 103-128
Number of pages: 26
Journal: International Journal of Hybrid Intelligent Systems
Volume: 4
Issue number: 2
Publication status: Published - 2007
