Abstract

Air pollution is a global problem. Assessing air pollution concentration data is important for evaluating human exposure and the associated health risk. Unfortunately, air pollution monitoring stations often have periods of missing data or do not measure all pollutants. In this study, we experiment with different approaches to estimate the whole time series for a pollutant that a monitoring station does not measure, as well as missing values within a time series. The main goal is to reduce the uncertainty in air quality assessment. Our approach combines single and multiple imputation, nearest-neighbour geographical distance methods and a clustering algorithm for time series. For each station that measures ozone, we produce various imputations for this pollutant and measure the similarity or error between the imputed and the real values. Our results show that imputation by average based on clustering results, combined with multiple imputation for missing values, is the most reliable and is associated with lower average error and standard deviation.
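As a rough illustration of "imputation by average based on clustering results", the sketch below estimates a station's whole series as the per-timestep mean of the other stations in its cluster, then scores the estimate against the real values. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the station identifiers, the toy ozone values, the precomputed cluster labels and the choice of mean absolute error are all hypothetical.

```python
import numpy as np


def impute_by_cluster_average(series, labels, target):
    """Estimate the target station's series as the per-timestep mean of the
    other stations in the same cluster, ignoring missing values (NaN)."""
    peers = [s for s in series if s != target and labels[s] == labels[target]]
    if not peers:
        raise ValueError("no peer stations in the target's cluster")
    stacked = np.vstack([series[s] for s in peers])
    return np.nanmean(stacked, axis=0)


def mean_absolute_error(estimate, truth):
    """Mean absolute error over the timesteps where both values are present."""
    mask = ~np.isnan(estimate) & ~np.isnan(truth)
    return float(np.mean(np.abs(estimate[mask] - truth[mask])))


# Hypothetical ozone series per station (NaN marks a missing measurement).
series = {
    "A": np.array([40.0, 42.0, np.nan, 50.0]),
    "B": np.array([44.0, 46.0, 48.0, 54.0]),
    "C": np.array([10.0, 12.0, 14.0, 16.0]),
    "T": np.array([41.0, 45.0, 47.0, 53.0]),
}
# Hypothetical cluster labels, assumed to come from a time series clustering step.
labels = {"A": 0, "B": 0, "C": 1, "T": 0}

# Pretend station "T" does not measure ozone, impute it, then compare to the truth.
estimate = impute_by_cluster_average(series, labels, "T")
error = mean_absolute_error(estimate, series["T"])
print(estimate, error)
```

Holding out a station that does measure ozone, as done here with "T", mirrors the evaluation described in the abstract: the imputed series can be compared directly with the real one.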

Original language: English
Title of host publication: Hybrid Artificial Intelligent Systems
Editors: Enrique Antonio de la Cal, José Ramón Villar Flecha, Héctor Quintián, Emilio Corchado
Place of Publication: Cham
Publisher: Springer International Publishing AG
Pages: 585-597
Number of pages: 13
ISBN (Print): 978-3-030-61705-9
Publication status: Published - 4 Nov 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12344 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Air quality
  • Imputation
  • Time series clustering
  • Uncertainty
