Abstract
Text documents can be classified in a number of ways. Here, we use Partially Supervised Classification (PSC) and argue that it is an effective and efficient approach for real-world problems. PSC uses a two-step strategy to reduce the labelling effort, and a number of methods have been proposed for each step. We evaluate various methods on real-world medical documents. The results show that using EM to build the classifier yields better results than using SVM. We also show experimentally that careful selection of a subset of features to represent the documents can improve performance.
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 268-287 |
| Number of pages | 20 |
| Journal | International Journal of Data Mining and Bioinformatics |
| Volume | 2 |
| Issue number | 3 |
| Publication status | Published - Sept 2008 |