On time series classification with dictionary-based classifiers

James Large, Anthony Bagnall, Simon Malinowski, Romain Tavenard

Research output: Contribution to journal › Article › peer-review



A family of algorithms for time series classification (TSC) involves running a sliding window across each series, discretising the window to form a word, forming a histogram of word counts over the dictionary, then constructing a classifier on the histograms. A recent evaluation of two algorithms of this type, Bag of Patterns (BOP) and Bag of Symbolic Fourier Approximation Symbols (BOSS), found a significant difference in accuracy between these seemingly similar algorithms. We investigate this phenomenon by deconstructing the classifiers and measuring the relative importance of the four key components that differ between BOP and BOSS. We find that whilst ensembling is a key component for both algorithms, the effect of the other components is mixed and more complex. We conclude that BOSS represents the state of the art for dictionary-based TSC. Both BOP and BOSS can be classed as bag-of-words approaches, which are particularly popular in Computer Vision for tasks such as image classification. We adapt three techniques used in Computer Vision for TSC: Scale Invariant Feature Transform, Spatial Pyramids, and Histogram Intersection. We find that using Spatial Pyramids in conjunction with BOSS (SP) produces a significantly more accurate classifier. SP is significantly more accurate than standard benchmarks and the original BOSS algorithm. It is not significantly worse than the best shapelet-based or deep learning approaches, and is only outperformed by an ensemble that includes BOSS as a constituent module.
Original language: English
Pages (from-to): 1073-1089
Number of pages: 17
Journal: Intelligent Data Analysis
Issue number: 5
Publication status: Published - 24 Oct 2019
