Delineation of shallow seismic source zones using K-means cluster analysis, with application to the Aegean region

Graeme Weatherill, Paul W. Burton

Research output: Contribution to journal › Article › peer-review

78 Citations (Scopus)


The selection of specific uniform seismic source zones for use in probabilistic seismic hazard analysis is often controversial. A consistent approach to source model development is not always possible, because the available geological and seismotectonic information varies from region to region. As an alternative, the K-means algorithm for partitional cluster analysis can be used to partition regions based on observed seismicity. The Aegean [incorporating Greece, Albania, Former Yugoslav Republic of Macedonia (F.Y.R.O.M.), southern Bulgaria and western Turkey], with its varied seismotectonics and generally high seismicity, is used as the region in which to develop and demonstrate the application of K-means. Two types of algorithm are considered. The first is a point-source K-means that can be used to partition a catalogue of earthquake hypocentres. The second is a novel line-source development of the algorithm, appropriate in seismology because line sources are analogues for the traces of active faults; it is applied to a catalogue of known fault ruptures in the Aegean. The common problems of the K-means methodology are also addressed. Ensemble analyses are used to identify better choices of initial estimates for the cluster centres. A cluster quality index is used to identify the optimum number of clusters, and its robustness is assessed by considering different subsets of the observed earthquake catalogue. An alternative approach is also implemented: Monte Carlo seismic hazard analysis is used to compare models with different numbers of clusters against the observed seismicity of the 20th century. Considerable variation is found in the optimum number of clusters identified either by the quality index or by stochastic seismic hazard analysis. Ultimately the K-means partitions of seismicity are developed into source models and their representation of Aegean seismotectonics is assessed.
The result is that models containing between 20 and 30 clusters emerge as the most appropriate in capturing the spatial variation in hypocentral distribution and fault type in the Aegean.
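
As an illustration of the point-source case and of the ensemble idea described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes epicentres have already been projected to planar (x, y) coordinates so that Euclidean distance is meaningful, it uses the standard Lloyd iteration, and it ranks ensemble members by within-cluster sum of squared distances, which is an assumed criterion rather than the cluster quality index used in the paper. The function names `kmeans` and `best_of_ensemble` are hypothetical.

```python
import random


def dist2(p, q):
    """Squared Euclidean distance between two planar points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's K-means on 2-D points; returns (centres, labels, sse)."""
    # Initial estimates of the cluster centres: k events drawn from the catalogue.
    centres = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each event joins its nearest centre.
        new_labels = [min(range(k), key=lambda j: dist2(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # partition is stable, so the centres would not move
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    # Within-cluster sum of squared distances (assumed ranking criterion).
    sse = sum(dist2(p, centres[lab]) for p, lab in zip(points, labels))
    return centres, labels, sse


def best_of_ensemble(points, k, n_runs=20, seed=0):
    """Run K-means from n_runs random initialisations and keep the lowest SSE."""
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[2])
```

Because K-means converges only to a local optimum that depends on the starting centres, repeating the run from many initialisations and keeping the best partition is one simple way to reduce that sensitivity. Choosing the number of clusters would require a separate quality measure, which this sketch does not attempt.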
Original language: English
Pages (from-to): 565-588
Number of pages: 24
Journal: Geophysical Journal International
Issue number: 2
Publication status: Published - 2009
