A Comparison of Two Document Clustering Approaches for Clustering Medical Documents

F.H. Saad, B. de la Iglesia, G. D. Bell

Research output: Contribution to conference > Paper

Abstract

Medical data is often presented as free text in the form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into a small number of meaningful clusters may facilitate the discovery of patterns by allowing us to extract a number of relevant features from each cluster, thus introducing structure into the data and facilitating the application of conventional data mining techniques. For this approach to work, it is essential to produce high-quality clustering. Thus, the main goals of this paper are (1) to experimentally evaluate the performance of six criterion functions in the context of the partitional clustering approach, (2) to compare the clustering results of the agglomerative and partitional approaches for each of the criterion functions using real-world medical documents, and (3) to establish the right clustering algorithm to produce high-quality clustering of real-world medical documents in order to discover hidden knowledge by analysing the produced clusters. Our experimental results show that the clustering solutions produced by the agglomerative approach are consistently better than those produced by the partitional approach for all the criterion functions. Moreover, the results show that different criterion functions lead to substantially different results. In addition, we examine the quality of the features extracted from each cluster when they are used in a classification task. The task involves discriminating between successful and unsuccessful procedures. The extracted features are used to produce an accurate classification of the data.
Original language: English
Pages: 425-431
Number of pages: 7
Publication status: Published - 2006
Event: 2006 International Conference on Data Mining - Las Vegas, United States
Duration: 26 Jun 2006 to 29 Jun 2006

Conference

Conference: 2006 International Conference on Data Mining
Abbreviated title: DMIN-06
Country/Territory: United States
City: Las Vegas
Period: 26/06/06 to 29/06/06
