Abstract:
Over the last decade, advanced statistical inference
and machine learning have been used to fill the gaps in sparse
surface ocean CO2 measurements (Rödenbeck et al., 2015).
The estimates from these methods have been used to constrain
seasonal, interannual and decadal variability in sea–air
CO2 fluxes and the drivers of these changes (Landschützer
et al., 2015, 2016; Gregor et al., 2018). However, it is also
becoming clear that these methods are converging towards
a common bias and root mean square error (RMSE) boundary:
“the wall”, which suggests that pCO2 estimates are now
limited by both data gaps and scale-sensitive observations.
Here, we analyse this problem by introducing a new gapfilling
method, an ensemble average of six machine-learning
models (CSIR-ML6 version 2019a, Council for Scientific
and Industrial Research – Machine Learning ensemble with
Six members), where each model is constructed with a twostep
clustering-regression approach. The ensemble average is
then statistically compared to well-established methods. The
ensemble average, CSIR-ML6, has an RMSE of 17.16 μatm
and bias of 0.89 μatm when compared to a test dataset kept
separate from training procedures. However, when validating
our estimates with independent datasets, we find that
our method improves only incrementally on other gap-filling
methods.We investigate the differences between the methods
to understand the extent of the limitations of gap-filling estimates
of pCO2. We show that disagreement between methods in the South Atlantic, southeastern Pacific and parts of
the Southern Ocean is too large to interpret the interannual
variability with confidence. We conclude that improvements in surface ocean pCO2 estimates will likely be incremental
with the optimisation of gap-filling methods by (1) the inclusion
of additional clustering and regression variables (e.g.
eddy kinetic energy), (2) increasing the sampling resolution
and (3) successfully incorporating pCO2 estimates from alternate
platforms (e.g. floats, gliders) into existing machinelearning
approaches.