Big Data Clustering

Research output: Chapter in Book/Report/Conference proceeding › Entry for encyclopedia/dictionary

Abstract

Clustering algorithms group data items according to a clearly defined measure of similarity between the items, aiming to minimize intracluster differences and maximize intercluster distances. A wealth of efficient, good-quality clustering algorithms is already available for traditional data, but applying them to big data is challenging because of the overwhelming volume and complexity of such data. Data volume is growing at an incredible pace owing to wider access to the Internet, social media, and mobile devices, as well as to technological innovation. Reducing the computational cost of clustering algorithms and improving their scalability have therefore been the focus of much of the research in this area. This article provides an introduction to the characteristics of big data, an overview of available algorithms, and a summary of current trends in improving clustering algorithms to deal with the challenges of big data.
Original language: English
Title of host publication: Wiley StatsRef: Statistics Reference Online
Place of Publication: England
Publisher: Wiley
ISBN (Electronic): 9781118445112
Publication status: Published - 15 May 2018

Keywords

  • Big Data
  • Clustering
  • Distributed Computing
  • Hadoop
  • MapReduce
  • Parallel Clustering
