Dynamic ensemble selection methods for heterogeneous data mining

Chris Ballard, Wenjia Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Citations (Scopus)
7 Downloads (Pure)

Abstract

Big data is often collected from multiple sources with possibly different features, representations and granularity, and is hence defined as heterogeneous data. Such datasets need to be fused in some way for further analysis. Data fusion at feature level requires domain knowledge and can be time-consuming and ineffective, but it can be avoided if decision-level fusion is applied properly. Ensemble methods appear to be an appropriate paradigm for this, as each subset of heterogeneous data sources can be used separately to induce models independently, and their decisions are then aggregated by a decision fusion function in an ensemble. This study investigates how heterogeneous data can be used to generate more diverse classifiers to build more accurate ensembles. A Dynamic Ensemble Selection Optimisation (DESO) framework is proposed, using the local feature space of heterogeneous data to increase diversity among classifiers and Simulated Annealing for optimisation. An implementation example of DESO, BaggingDES, is provided with Bagging as the base platform, to test its performance and to explore the relationship between diversity and accuracy. Experiments are carried out with heterogeneous datasets derived from real-world benchmark datasets. The statistical analyses of the results show that BaggingDES performed significantly better than the baseline method, a decision tree, and reasonably better than classic Bagging.
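The two ideas the abstract combines, decision-level fusion and Simulated Annealing over ensemble membership, can be illustrated with a minimal sketch. This is not the DESO or BaggingDES algorithm from the paper: it performs a static selection against a validation set and ignores the local feature space entirely. It assumes each classifier is represented only by its list of predicted labels on that validation set, and every name in it (`majority_vote`, `anneal_selection`, the cooling schedule and its parameters) is hypothetical.

```python
import math
import random
from collections import Counter


def majority_vote(predictions):
    """Decision-level fusion: one fused label per sample by plurality vote.

    `predictions` is a list of per-classifier label lists of equal length.
    Ties go to the label that was voted first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, truth):
    """Fraction of samples where the predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def anneal_selection(predictions, truth, steps=500, t0=1.0, cooling=0.99, seed=0):
    """Search for a subset of classifiers whose fused vote scores well.

    Assumption: a simple flip-one-member neighbourhood and geometric cooling;
    the paper's actual objective and schedule are not given in the abstract.
    """
    rng = random.Random(seed)
    n = len(predictions)

    def score(mask):
        chosen = [p for p, keep in zip(predictions, mask) if keep]
        return accuracy(majority_vote(chosen), truth)

    mask = [True] * n  # start from the full ensemble
    current = score(mask)
    best_mask, best = mask[:], current
    temp = t0
    for _ in range(steps):
        candidate = mask[:]
        i = rng.randrange(n)
        candidate[i] = not candidate[i]  # add or drop one classifier
        if not any(candidate):
            continue  # never evaluate an empty ensemble
        s = score(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if s >= current or rng.random() < math.exp((s - current) / temp):
            mask, current = candidate, s
            if current > best:
                best_mask, best = mask[:], current
        temp *= cooling
    return best_mask, best


# Toy validation data (made up for illustration only).
truth = [0, 1, 1, 0, 1, 0]
predictions = [
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1],
]
selected, selected_accuracy = anneal_selection(predictions, truth)
full_accuracy = accuracy(majority_vote(predictions), truth)
print(selected, selected_accuracy, full_accuracy)
```

Because the search starts from the full ensemble and keeps the best subset seen, the returned accuracy on the validation set is never lower than that of the unpruned vote. Scoring on the same data used for selection is optimistic, so a held-out set would be needed for any real comparison.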
Original language: English
Title of host publication: 12th World Congress on Intelligent Control and Automation (WCICA), 2016
Publisher: IEEE Press
ISBN (Electronic): 978-1-4673-8414-8
ISBN (Print): 978-1-4673-8415-5
Publication status: Published - 29 Sep 2016
Event: IEEE World Congress on Intelligent Control and Automation - Guilin, China
Duration: 12 Jun 2016 to 15 Jun 2016

Conference

Conference: IEEE World Congress on Intelligent Control and Automation
Country/Territory: China
City: Guilin
Period: 12/06/16 to 15/06/16

Keywords

  • Bagging
  • Classification algorithms
  • Clustering algorithms
  • Data integration
  • Training
  • Simulated annealing
