Abstract
This paper introduces QQ-Means (Qualitative and Quantitative-Means), a novel similarity function that evaluates both the quantitative and the qualitative similarity between data instances. Its values are naturally scaled to fall within the range of −1 to 1: the magnitude signifies the extent of quantitative similarity, while the sign denotes qualitative similarity. The effectiveness of QQ-Means for cluster analysis is tested by incorporating it into the K-means clustering algorithm. We compare the results of the proposed measure with commonly used distance or similarity measures such as Euclidean distance, Hamming distance, Mutual Information, Manhattan distance, and Chebyshev distance. For consistency in the experimental procedure and conditions, these measures are also applied within the classic K-means algorithm or its variations. QQ-Means was evaluated on gene-expression datasets and real-world complex datasets. The experimental findings demonstrate the effectiveness of the new similarity measure in extracting valuable information from the data.
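As a rough illustration of the comparison setup described in the abstract, the sketch below shows how several of the named baseline measures could be plugged into the assignment step of K-means. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`euclidean`, `manhattan`, `chebyshev`, `hamming`, `assign_clusters`) are hypothetical, Mutual Information is omitted, and the QQ-Means formula itself is not shown because the abstract does not define it.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Vector, b: Vector) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def hamming(a: Vector, b: Vector) -> int:
    """Number of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_clusters(
    points: Sequence[Vector],
    centroids: Sequence[Vector],
    distance: Callable[[Vector, Vector], float],
) -> List[int]:
    """K-means assignment step: index of the nearest centroid for each point.

    The distance function is a parameter, so the same procedure can be
    repeated with each measure (an assumed reading of the abstract's
    "consistency in the experimental procedure").
    """
    return [
        min(range(len(centroids)), key=lambda k: distance(p, centroids[k]))
        for p in points
    ]


# Toy data, invented purely for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (9.0, 9.0), (10.0, 8.0)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
labels = {
    f.__name__: assign_clusters(points, centroids, f)
    for f in (euclidean, manhattan, chebyshev)
}
```

Passing the measure as an argument keeps the clustering procedure fixed while only the notion of closeness changes. A similarity function such as QQ-Means would presumably select the centroid with the highest score rather than the smallest distance, but that detail is an assumption here.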
Original language | English |
---|---|
Pages (from-to) | 14977-15002 |
Number of pages | 26 |
Journal | Cluster Computing |
Volume | 27 |
Issue number | 10 |
Early online date | 8 Aug 2024 |
Publication status | Published - Dec 2024 |
Keywords
- Clustering analysis
- Clustering purity
- K-means clustering
- Quantitative and qualitative similarity
- Similarity measure