Quantitative and qualitative similarity measure for data clustering analysis

Jamil AlShaqsi, Wenjia Wang, Osama Drogham, Rami S. Alkhawaldeh

Research output: Contribution to journal › Article › peer-review

Abstract

This paper introduces a novel similarity function, named QQ-Means (Qualitative and Quantitative-Means), that evaluates both the quantitative and qualitative similarity between data instances. Its values are naturally scaled to the range −1 to 1: the magnitude signifies the extent of quantitative similarity, while the sign denotes qualitative similarity. The effectiveness of QQ-Means for cluster analysis is tested by incorporating it into the K-means clustering algorithm. We compare the results of the proposed measure with those of commonly used distance or similarity measures such as Euclidean distance, Hamming distance, Mutual Information, Manhattan distance, and Chebyshev distance. These measures are also applied within the classic K-means algorithm or its variations to ensure consistency in the experimental procedure and conditions. The QQ-Means similarity measure is evaluated on gene-expression datasets and complex real-world datasets. The experimental findings demonstrate the effectiveness of the new similarity measure in extracting valuable information from the data.
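The comparison setup described in the abstract, a K-means loop in which the distance measure can be swapped, can be illustrated with a minimal sketch. It covers only the standard Euclidean, Manhattan, and Chebyshev distances; the QQ-Means function itself is not defined in this record and is deliberately not reproduced. The names `DISTANCES`, `kmeans`, and `purity` are hypothetical, the mean-based centroid update is kept for every measure as a simplification, and `purity` uses the usual textbook definition, which may differ from the paper's exact evaluation protocol.

```python
import numpy as np

# Standard distances named in the abstract (illustrative only; QQ-Means is
# NOT implemented here because this record does not give its formula).
# Each function maps (points of shape (n, d), centroid of shape (d,)) -> (n,).
DISTANCES = {
    "euclidean": lambda X, c: np.sqrt(((X - c) ** 2).sum(axis=1)),
    "manhattan": lambda X, c: np.abs(X - c).sum(axis=1),
    "chebyshev": lambda X, c: np.abs(X - c).max(axis=1),
}


def kmeans(X, k, distance="euclidean", n_iter=100, seed=0):
    """Lloyd-style K-means with a pluggable distance (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    dist = DISTANCES[distance]
    rng = np.random.default_rng(seed)
    # Initialise centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        D = np.stack([dist(X, c) for c in centroids], axis=1)  # shape (n, k)
        labels = D.argmin(axis=1)
        # Update step: cluster mean; an empty cluster keeps its old centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def purity(labels, truth):
    """Fraction of points that belong to the majority class of their cluster."""
    labels, truth = np.asarray(labels), np.asarray(truth)
    matched = 0
    for j in np.unique(labels):
        _, counts = np.unique(truth[labels == j], return_counts=True)
        matched += counts.max()
    return matched / len(labels)
```

Under these assumptions, running `kmeans(X, k, distance=name)` for each `name` in `DISTANCES` and scoring the resulting labels with `purity` against known class labels gives a like-for-like comparison of measures under one clustering procedure, which is the kind of consistency the abstract describes.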

Original language: English
Pages (from-to): 14977-15002
Number of pages: 26
Journal: Cluster Computing
Volume: 27
Issue number: 10
Early online date: 8 Aug 2024
Publication status: Published - Dec 2024

Keywords

  • Clustering analysis
  • Clustering purity
  • K-means clustering
  • Quantitative and qualitative similarity
  • Similarity measure
