TY - GEN
T1 - Multiple Imputation Ensembles for Time Series (MIE-TS)
AU - Aleryani, Aliya
AU - Bostrom, Aaron
AU - Wang, Wenjia
AU - Iglesia, Beatriz
N1 - Funding Information:
We acknowledge support from Grant Number ES/L011859/1, from The Business and Local Government Data Research Centre, funded by the Economic and Social Research Council to provide economic, scientific and social researchers and business analysts with secure data services.
Publisher Copyright:
© 2023 Association for Computing Machinery.
PY - 2023/2/22
Y1 - 2023/2/22
AB - Time series classification has become an interesting field of research, thanks to the extensive studies conducted in the past two decades. Time series may have missing data, which may affect both the representation and the modeling of time series. Thus, recovering missing data using appropriate time series-based imputation methods is an essential step. Multiple imputation is a data recovery method that produces multiple imputed datasets. The method has proved useful in reflecting the uncertainty inherent in missing data; however, it is under-researched in time series problems. In this article, we propose two multiple imputation approaches for time series. The first is a multiple imputation method based on interpolation. The second is a multiple imputation and ensemble method. First, we simulate missing consecutive sub-sequences under a Missing Completely at Random mechanism; then, we use single/multiple imputation methods. The imputed data are used to build bagging and stacking ensembles. We build ensembles using standard classification algorithms as well as time series classifiers. The standard classifiers are Random Forest, Support Vector Machines, K-Nearest Neighbour, C4.5, and PART, while TSCHIEF, Proximity Forest, Time Series Forest, RISE, and BOSS are chosen as time series classifiers. Our findings show that the combination of multiple imputation and ensembles improves the performance of the majority of classifiers tested in this study, often above the performance obtained from the complete data, even under increasing missing data scenarios. This may be because the diversity injected by multiple imputation has a favourable and stabilising effect on classifier performance, which is an important finding.
KW - ensemble methods
KW - missing data
KW - multiple imputation
KW - time series
UR - http://www.scopus.com/inward/record.url?scp=85152627178&partnerID=8YFLogxK
U2 - 10.1145/3551643
DO - 10.1145/3551643
M3 - Conference contribution
AN - SCOPUS:85152627178
VL - 17
T3 - ACM Transactions on Knowledge Discovery from Data
SP - 1
EP - 28
BT - ACM Transactions on Knowledge Discovery from Data
ER -