Using diverse data sources to impute missing air quality data collected in a resource-limited setting

Authors

Kebalepile, Moses Mogakolodi
Dzikiti, Loveness Nyaradzo
Voyi, Kuku

Publisher

MDPI

Abstract

The sustainable operation of ambient air quality monitoring stations in developing countries is not always possible. Intermittent failures and breakdowns at these stations often interrupt the continuous measurement of data and result in missing data. This study aimed to impute NO2, SO2, O3, and PM10 to produce complete data sets of daily average exposures from 2010 to 2017. Models were built for (a) an individual pollutant at a monitoring station, (b) a combined model for the same pollutant from different stations, and (c) a data set with all the pollutants from all the monitoring stations. This study sought to evaluate the efficacy of the Multivariate Imputation by Chained Equations (MICE) algorithm in imputing air quality data that are missing at random. The application of classification and regression trees (CART) analysis using the MICE package in the R statistical programming language was compared with the predictive mean matching (PMM) method. The CART method performed better, with the pooled R-squared statistics of the imputed data ranging from 0.3 to 0.7, compared to a range of 0.02 to 0.25 for PMM. The MICE algorithm successfully resolved the incompleteness of the data. It was concluded that the CART method produced more reliable data than the PMM method. However, in this study, the pooled R-squared values were accurate for NO2, but less so for the other pollutants.
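As a rough illustration of the predictive mean matching idea named in the abstract, the sketch below imputes gaps in one station's series using a second station as the only predictor. It is not the authors' code and does not use the R package: the function names (fit_line, pmm_impute), the single-predictor linear model, and the toy readings are all assumptions made for this example. A full MICE run would also draw the regression parameters at random and cycle over several variables, both of which are omitted here.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, seed=0):
    """Fill None entries of y by simplified predictive mean matching on x.

    Hypothetical helper: each missing value is replaced by the observed
    value of a donor chosen at random among the k observed cases whose
    predicted means are closest to the missing case's predicted mean.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(y) if v is not None]
    intercept, slope = fit_line([x[i] for i in observed],
                                [y[i] for i in observed])
    predicted = [intercept + slope * xi for xi in x]
    completed = list(y)
    for i, value in enumerate(y):
        if value is None:
            donors = sorted(
                observed,
                key=lambda j: abs(predicted[j] - predicted[i]),
            )[:k]
            completed[i] = y[rng.choice(donors)]
    return completed


# Toy daily averages, invented for illustration only (not study data).
station_a = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0]
station_b = [22.0, None, 31.0, 37.0, None, 46.0]

# Several completed data sets, one per seed, mimic multiple imputation.
imputed_sets = [pmm_impute(station_a, station_b, k=2, seed=s)
                for s in range(5)]
for completed in imputed_sets:
    print(completed)
```

Because every imputed value is copied from an observed donor, PMM never produces values outside the observed range. A CART-based variant would replace the linear fit with a regression tree and draw donors from the matching leaf.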

Description

DATA AVAILABILITY STATEMENT: Environmental data were available through the municipal offices of the cities and can be requested. The disclosure on the use of the data is a requirement of the cities. Data can be requested on behalf of cities from the Department of Environmental Affairs (DoEA). The data custodian for the DoEA is the South African Air Quality Services (SAAQS), and data can be requested through the SAAQS website: Saaqis (environment.gov.za), accessed on 23 September 2022.

Keywords

MICE imputation, Air quality, Missing data, Classification, Regression trees, Classification and regression trees (CART), Predictive mean matching (PMM), Multivariate imputation by chained equations (MICE), SDG-03: Good health and well-being, SDG-11: Sustainable cities and communities, SDG-13: Climate action

Sustainable Development Goals

SDG-03: Good health and well-being
SDG-11: Sustainable cities and communities
SDG-13: Climate action

Citation

Kebalepile, M.M.; Dzikiti, L.N.; Voyi, K. Using Diverse Data Sources to Impute Missing Air Quality Data Collected in a Resource-Limited Setting. Atmosphere 2024, 15, 303. https://doi.org/10.3390/atmos15030303.