The well-known 3V architectural paradigm for Big Data introduced by Laney (2011) provides a simplified framework for defining the architecture of a big data platform to be deployed in various scenarios that involve processing massive datasets. While additional components such as Variability and Veracity have been discussed as extensions to the 3V model, the basic components (Volume, Variety, and Velocity) provide a quantitative framework, whereas Variability and Veracity target a more qualitative approach. In this paper we argue that the basic 3Vs are not equal, because different requirements need to be covered when demands on a particular “V” are higher. Similar to other conjectures such as the CAP theorem, 3V-based architectures differ in their implementation. We call this paradigm heterogeneity, and we provide a taxonomy of the existing tools (as of 2013) covering the Hadoop ecosystem from the perspective of heterogeneity. This paper contributes to the understanding of the Hadoop ecosystem from the perspective of different workloads and aims to help researchers and practitioners in the design of scalable platforms targeting different operational needs.
Published - 2013
- Big data platforms
- Big data systems architecture