Benchmarking the Semi-Supervised Naïve Bayes Classifier

Research output: Contribution to conference › Paper › peer-review

Abstract

Semi-supervised learning involves constructing predictive models with both labelled and unlabelled training data. The need for semi-supervised learning is driven by the fact that unlabelled data are often easy and cheap to obtain, whereas labelling data requires costly and time-consuming human intervention and expertise. Semi-supervised methods commonly use self-training, which involves using the labelled data to predict labels for the unlabelled data, then iteratively reconstructing classifiers using the predicted labels. Our aim is to determine whether self-training classifiers actually improves performance. Expectation maximization is a commonly used self-training scheme.
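
As a rough illustration of the self-training loop described above, the following sketch fits a categorical naïve Bayes model on the labelled data, predicts labels for the unlabelled data, and refits on both until the predicted labels stop changing. It is not the paper's implementation: it uses hard label assignments rather than the soft (probabilistic) assignments of full expectation maximization, and the function names (`fit_nb`, `predict_nb`, `self_train`), the Laplace smoothing constant `alpha`, and the stopping rule are all assumptions made for this example.

```python
import math
from collections import Counter


def fit_nb(X, y, n_values, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing.

    X: list of tuples of integer-coded attribute values.
    y: list of class labels.
    n_values[j]: number of distinct values attribute j can take.
    """
    classes = sorted(set(y))
    class_counts = Counter(y)
    # value_counts[c][j][v] = number of class-c rows with attribute j == v
    value_counts = {c: [Counter() for _ in n_values] for c in classes}
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
    return classes, class_counts, value_counts, list(n_values), alpha, len(y)


def predict_nb(model, row):
    """Return the class with the highest smoothed log posterior for one row."""
    classes, class_counts, value_counts, n_values, alpha, n = model
    best, best_score = None, -math.inf
    for c in classes:
        score = math.log((class_counts[c] + alpha) / (n + alpha * len(classes)))
        for j, v in enumerate(row):
            num = value_counts[c][j][v] + alpha
            den = class_counts[c] + alpha * n_values[j]
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


def self_train(X_lab, y_lab, X_unlab, n_values, max_iter=20):
    """Hard-label self-training (an assumed simplification of EM)."""
    model = fit_nb(X_lab, y_lab, n_values)
    previous = None
    for _ in range(max_iter):
        # Predict labels for the unlabelled rows with the current model.
        pseudo = [predict_nb(model, row) for row in X_unlab]
        if pseudo == previous:  # predicted labels have stabilised
            break
        # Rebuild the classifier from true labels plus predicted labels.
        model = fit_nb(list(X_lab) + list(X_unlab), list(y_lab) + pseudo, n_values)
        previous = pseudo
    return model
```

Because every refit treats the predicted labels as if they were true, any systematic error in the first model can be reinforced on later iterations, which is the failure mode the abstract goes on to examine.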

We investigate whether an expectation maximization scheme improves a naïve Bayes classifier through experimentation with 30 discrete and 20 continuous real-world benchmark UCI datasets. Rather surprisingly, we find that in practice the self-training actually makes the classifier worse. The cause of this detrimental effect on performance could lie either with the self-training scheme itself or with how self-training works in conjunction with the classifier. Our hypothesis is that it is the latter cause, and that the violation of the naïve Bayes model assumption of independence of attributes means predictive errors propagate through the self-training scheme. To test whether this is the case, we generate simulated data with the same attribute distribution as the UCI data, but where the attributes are independent. Experiments with these data demonstrate that semi-supervised learning does improve performance, leading to significantly more accurate classifiers.
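
The abstract does not say how the simulated data were generated. One possible way to obtain data with the same attribute distributions but without dependence between attributes is to shuffle each attribute column independently within each class, which keeps every per-class marginal distribution exactly and breaks the within-class association between attributes (approximately, for a finite sample). The sketch below assumes this permutation approach; the function name `shuffle_within_class` and the choice to condition on the class label are assumptions for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def shuffle_within_class(X, y, seed=0):
    """Return a copy of X with each attribute permuted independently per class.

    X: list of equal-length tuples of attribute values.
    y: list of class labels, one per row of X.
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    rows_by_class = defaultdict(list)
    for i, c in enumerate(y):
        rows_by_class[c].append(i)

    X_new = [list(row) for row in X]
    n_attrs = len(X[0]) if X else 0
    for indices in rows_by_class.values():
        for j in range(n_attrs):
            # Permute attribute j among the rows of this class only,
            # using a fresh shuffle per attribute so columns decouple.
            column = [X[i][j] for i in indices]
            rng.shuffle(column)
            for i, value in zip(indices, column):
                X_new[i][j] = value
    return [tuple(row) for row in X_new]
```

Under this assumption the class labels and each attribute's class-conditional distribution are unchanged, so any difference in how self-training behaves on the shuffled data can be attributed to the removed dependence between attributes.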

These results demonstrate that semi-supervised learning cannot be applied blindly without considering the nature of the classifier, because the assumptions implicit in the classifier may result in a degradation in performance.
Original language: English
Publication status: Published - 12 Jul 2015
Event: The International Joint Conference on Neural Networks - Killarney, Ireland
Duration: 12 Jul 2015 to 17 Jul 2015

Conference

Conference: The International Joint Conference on Neural Networks
Country/Territory: Ireland
City: Killarney
Period: 12/07/15 to 17/07/15
