TY - JOUR
T1 - Novel methods for imputing missing values in water level monitoring data
AU - Khampuengson, Thakolpat
AU - Wang, Wenjia
N1 - Acknowledgements: The authors would like to thank the Hydro-Informatics Institute of the Ministry of Higher Education, Science, Research and Innovation, Thailand, for providing the scholarship and the data that enabled Thakolpat Khampuengson to do his PhD at the University of East Anglia.
PY - 2023/1
Y1 - 2023/1
N2 - Hydrological data are collected automatically from remote water level monitoring stations and then transmitted to the national water management centre via a telemetry system. However, the data received at the centre can be incomplete or anomalous due to instrument problems such as power and sensor failures. Usually, the detected anomalies or missing data are simply eliminated from the data, which could lead to inaccurate analysis or even false alarms. Therefore, it is very helpful to identify missing values and correct them as accurately as possible. In this paper, we introduce a new approach, Full Subsequence Matching (FSM), for imputing missing values in telemetry water level data. The FSM first identifies a sequence of missing values and replaces them with constant values to create a dummy complete sequence. It then searches the historical data for the most similar subsequence. Finally, the identified subsequence is adapted to fit the missing part based on their similarity. The imputation accuracy of the FSM was evaluated with telemetry water level data and compared to some well-established methods (Interpolation, k-NN, and MissForest) and to a leading deep learning method, the Long Short-Term Memory (LSTM) technique. Experimental results show that the FSM technique can produce more precise imputations, particularly for data with strong periodic patterns.
AB - Hydrological data are collected automatically from remote water level monitoring stations and then transmitted to the national water management centre via a telemetry system. However, the data received at the centre can be incomplete or anomalous due to instrument problems such as power and sensor failures. Usually, the detected anomalies or missing data are simply eliminated from the data, which could lead to inaccurate analysis or even false alarms. Therefore, it is very helpful to identify missing values and correct them as accurately as possible. In this paper, we introduce a new approach, Full Subsequence Matching (FSM), for imputing missing values in telemetry water level data. The FSM first identifies a sequence of missing values and replaces them with constant values to create a dummy complete sequence. It then searches the historical data for the most similar subsequence. Finally, the identified subsequence is adapted to fit the missing part based on their similarity. The imputation accuracy of the FSM was evaluated with telemetry water level data and compared to some well-established methods (Interpolation, k-NN, and MissForest) and to a leading deep learning method, the Long Short-Term Memory (LSTM) technique. Experimental results show that the FSM technique can produce more precise imputations, particularly for data with strong periodic patterns.
KW - Water level telemetry monitoring
KW - Missing data
KW - Imputation
KW - Missing data imputation
KW - Time series
KW - Incomplete subsequence
UR - http://www.scopus.com/inward/record.url?scp=85145678525&partnerID=8YFLogxK
U2 - 10.1007/s11269-022-03408-6
DO - 10.1007/s11269-022-03408-6
M3 - Article
VL - 37
SP - 851
EP - 878
JO - Water Resources Management
JF - Water Resources Management
SN - 0920-4741
IS - 2
ER -