Original language | English |
---|---|
Title of host publication | Wiley StatsRef: Statistics Reference Online |
Place of Publication | England |
Publisher | Wiley |
ISBN (Electronic) | 9781118445112 |
Publication status | Published - 15 May 2018 |
Abstract
Clustering algorithms group data items according to a clearly defined similarity between the items, aiming to minimize intracluster differences and maximize intercluster distances. A wealth of efficient, good-quality clustering algorithms is already available for traditional data, but applying them to big data is challenging because of the overwhelming volume and complexity of such data. Data volume is growing at a remarkable pace owing to increasing access to the Internet, social media, and mobile devices, as well as technological innovation; consequently, improving the computational cost and scalability of clustering algorithms has been the focus of much of the research in this area. This article provides an introduction to the characteristics of big data, an overview of the available algorithms, and a summary of current trends in improving clustering algorithms to meet the challenges of big data.
Keywords
- Big Data
- Clustering
- Distributed Computing
- Hadoop
- MapReduce
- Parallel Clustering