Clustering time series from ARMA models with clipped data

A. J. Bagnall, G. J. Janacek

Research output: Contribution to conference (Paper)

69 Citations (Scopus)

Abstract

Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted considerable research attention. In this paper we focus on clustering data derived from Autoregressive Moving Average (ARMA) models using k-means and k-medoids algorithms with the Euclidean distance between estimated model parameters. We justify our choice of clustering technique and distance metric by reproducing results obtained in related research. Our research aim is to assess the effects on the clustering of time series of discretising data into binary sequences of above and below the median, a process known as clipping. It is known that the fitted AR parameters of clipped data tend asymptotically to the parameters for unclipped data. We exploit this result to demonstrate that for long series the clustering accuracy when using clipped data from the class of ARMA models is not significantly different to that achieved with unclipped data. Next we show that if the data contains outliers then using clipped data produces significantly better clusterings. We then demonstrate that clipped series require much less memory and that operations such as distance calculations can be much faster on them. Finally, we demonstrate these advantages on three real world data sets.
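The pipeline the abstract describes (clip each series at its median, estimate model parameters, compare series by Euclidean distance between parameter vectors, and store clipped series compactly) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it fits pure AR(p) models via the Yule-Walker equations (solved with Levinson-Durbin) rather than full ARMA models, it maps values exactly equal to the median to 0, and every function name here (`clip`, `fit_ar`, `param_distance`, `pack_bits`, `hamming`, `simulate_ar1`) is hypothetical. For 0/1 sequences the squared Euclidean distance equals the Hamming distance, so on packed bits it reduces to an XOR followed by a bit count.

```python
import math
import random
import statistics


def clip(series):
    """Map each value to 1 if it lies above the series median, else 0.

    Assumption: values exactly equal to the median map to 0.
    """
    med = statistics.median(series)
    return [1 if x > med else 0 for x in series]


def autocovariance(series, lag):
    """Biased sample autocovariance at the given lag."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag)) / n


def fit_ar(series, p):
    """Estimate AR(p) coefficients from the Yule-Walker equations (Levinson-Durbin)."""
    r = [autocovariance(series, k) for k in range(p + 1)]
    if r[0] == 0:
        raise ValueError("a constant series has no AR fit")
    phi, err = [], r[0]
    for k in range(1, p + 1):
        # Reflection coefficient for order k.
        kappa = (r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))) / err
        # Update the lower-order coefficients, then append the new one.
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1 - kappa * kappa
    return phi


def param_distance(a, b):
    """Euclidean distance between two fitted parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pack_bits(bits):
    """Pack a 0/1 sequence into one Python int, one bit per observation."""
    value = 0
    for i, b in enumerate(bits):
        if b:
            value |= 1 << i
    return value


def hamming(a, b):
    """Count differing positions between two packed series of equal length.

    For 0/1 vectors this equals the squared Euclidean distance.
    """
    return bin(a ^ b).count("1")


def simulate_ar1(phi, n, seed):
    """Simulate x_t = phi * x_{t-1} + e_t with standard normal noise."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


if __name__ == "__main__":
    a = simulate_ar1(0.8, 2000, seed=1)
    b = simulate_ar1(-0.5, 2000, seed=2)
    print("AR(1) fits:", fit_ar(a, 1), fit_ar(b, 1))
    print("parameter distance (raw):", param_distance(fit_ar(a, 1), fit_ar(b, 1)))
    print("parameter distance (clipped):",
          param_distance(fit_ar(clip(a), 1), fit_ar(clip(b), 1)))
    print("Hamming distance between clipped series:",
          hamming(pack_bits(clip(a)), pack_bits(clip(b))))
```

Running the script fits AR(1) coefficients to two simulated series, compares them by parameter distance on raw and on clipped data, and computes a bitwise distance between the packed clipped series. The fitted parameter vectors are what a k-means or k-medoids step would then cluster. Packing stores each clipped observation in a single bit rather than a 64-bit float; a production version would more likely use a fixed-width bit array than a Python int.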
Original language: English
Pages: 49-58
Number of pages: 10
Publication status: Published, Aug 2004
Event: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, United States
Duration: 22 Aug 2004 to 25 Aug 2004

Conference

Conference: 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Abbreviated title: KDD '04
Country/Territory: United States
City: Seattle
Period: 22/08/04 to 25/08/04
