Abstract:
The sustainable operation of ambient air quality monitoring stations in developing countries
is not always possible. Intermittent failures and breakdowns at air quality monitoring stations often
affect the continuous measurement of data as required. These failures and breakdowns result in
missing data. This study aimed to impute NO2, SO2, O3, and PM10 to produce complete data sets
of daily average exposures from 2010 to 2017. Models were built for (a) an individual pollutant at a
monitoring station, (b) a combined model for the same pollutant from different stations, and (c) a
data set with all the pollutants from all the monitoring stations. This study sought to evaluate the
efficacy of the Multiple Imputation by Chained Equations (MICE) algorithm in successfully imputing
air quality data that are missing at random. The application of classification and regression trees
(CART) analysis using the MICE package in the R statistical programming language was compared
with the predictive mean matching (PMM) method. The CART method performed better, with the
pooled R-squared statistics of the imputed data ranging from 0.3 to 0.7, compared to a range of 0.02
to 0.25 for PMM. The MICE algorithm successfully resolved the incompleteness of the data. It was
concluded that the CART method produced more reliable data than the PMM method. However, in
this study, the pooled R-squared values were accurate for NO2, but less so for the other pollutants.
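The abstract reports pooled R-squared statistics but does not say how they were pooled. As a minimal sketch, assuming the common approach of averaging on the Fisher z scale (which, as far as we know, is what the `pool.r.squared` function in the R mice package does), the calculation can be illustrated in Python. The function name `pool_r_squared` and the input values are hypothetical and are not results from the study.

```python
import math


def pool_r_squared(r2_values):
    """Pool R-squared values from several imputed data sets on the Fisher z scale."""
    if not r2_values:
        raise ValueError("need at least one R-squared value")
    z_values = []
    for r2 in r2_values:
        if not 0.0 <= r2 < 1.0:
            raise ValueError("each R-squared must lie in [0, 1)")
        # Convert R^2 to a correlation r, then to Fisher's z = atanh(r).
        z_values.append(math.atanh(math.sqrt(r2)))
    # Average on the z scale, then transform back to the R^2 scale.
    z_bar = sum(z_values) / len(z_values)
    return math.tanh(z_bar) ** 2


# Hypothetical R^2 values from five imputed data sets (illustration only).
example_r2 = [0.62, 0.58, 0.65, 0.60, 0.63]
pooled = pool_r_squared(example_r2)
print(round(pooled, 3))
```

Because the transformation is monotone, the pooled value always lies between the smallest and largest per-imputation R-squared, so it summarises the imputations without being dominated by a single one.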
Description:
DATA AVAILABILITY STATEMENT: Environmental data were available through the municipal offices of the cities and can be requested. Disclosure of the use of the data is a requirement of the cities. Data can be requested on behalf of the cities from the Department of Environmental Affairs (DoEA). The data custodian for the DoEA is the South African Air Quality Services (SAAQS), and data can be requested through the SAAQS website: Saaqis (environment.gov.za, accessed on 23 September 2022).