Abstract:
A predictive long short-term memory (LSTM) model developed on a particular water quality dataset may
apply only to that dataset and fail to make accurate predictions on another dataset.
This paper focuses on improving LSTM model tolerance by mitigating the discrepancies in predictive
capability that arise when a model is applied to different datasets. Two predictive LSTM models are
developed from the Baffle and Burnett water quality datasets and optimised using the metaheuristic
genetic algorithm (GA). The resulting hybrid GA-optimised LSTM models are then combined using a linear
weight-based technique to develop a tolerant predictive ensemble model. The models successfully predict
river water quality in terms of dissolved oxygen concentration. After GA-optimisation, the root-mean-square
error (RMSE)
values of the Baffle and Burnett models decrease by 42.42% and 10.71%, respectively. Furthermore, two
ensemble models are developed from the GA-hybrid models: an average ensemble and an optimal weighted
ensemble. Compared with the GA-Baffle model, the average and weighted ensembles reduce the RMSE by 5.05%
and 6.06%, respectively; compared with the GA-Burnett model, they reduce it by 7.84% and 8.82%. When
tested on unseen and unrelated datasets, the models make accurate predictions, indicating that they are
applicable in domains outside the water sector. The consistent performance of the models across the
tested datasets illustrates that the proposed ensemble scheme successfully mitigates the discrepancies in
the predictive capacity of individual LSTM models. The observed performance also indicates the datasets
on which the models could potentially make accurate predictions.
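The average and optimal weighted ensembles, and the way RMSE reductions are expressed as percentages, can be illustrated with a minimal sketch. It assumes that the "linear weight-based technique" is a convex combination w * pred_A + (1 - w) * pred_B of two models' predictions. Under that assumption, the average ensemble fixes w = 0.5, and the optimal weight is found by a simple grid search over [0, 1] that minimises RMSE. The grid search, the function names (rmse, weighted_ensemble, average_ensemble, optimal_weight, percent_decrease) and the toy dissolved-oxygen values are hypothetical and are not taken from the paper. The percentage decrease is computed as 100 * (RMSE_old - RMSE_new) / RMSE_old.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def weighted_ensemble(pred_a, pred_b, w):
    """Assumed linear (convex) combination: w * pred_a + (1 - w) * pred_b, 0 <= w <= 1."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


def average_ensemble(pred_a, pred_b):
    """Average ensemble: both models receive equal weight."""
    return weighted_ensemble(pred_a, pred_b, 0.5)


def optimal_weight(y_true, pred_a, pred_b, steps=100):
    """Hypothetical search: try w = 0, 1/steps, ..., 1 and keep the lowest ensemble RMSE."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        err = rmse(y_true, weighted_ensemble(pred_a, pred_b, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


def percent_decrease(old, new):
    """Percentage decrease from an old RMSE value to a new one."""
    return 100.0 * (old - new) / old


# Hypothetical dissolved-oxygen observations (mg/L) and two models' predictions:
# model A over-predicts every point by 0.4, model B under-predicts by 0.2.
observed = [8.0, 7.5, 7.0, 6.5]
pred_a = [8.4, 7.9, 7.4, 6.9]
pred_b = [7.8, 7.3, 6.8, 6.3]

rmse_a = rmse(observed, pred_a)                                # about 0.4
rmse_b = rmse(observed, pred_b)                                # about 0.2
rmse_avg = rmse(observed, average_ensemble(pred_a, pred_b))    # about 0.1
w_opt, rmse_opt = optimal_weight(observed, pred_a, pred_b)     # w = 0.33

print(f"A={rmse_a:.3f}  B={rmse_b:.3f}  avg={rmse_avg:.3f}  "
      f"weighted(w={w_opt:.2f})={rmse_opt:.3f}")
print(f"decrease vs best single model: avg {percent_decrease(rmse_b, rmse_avg):.1f}%, "
      f"weighted {percent_decrease(rmse_b, rmse_opt):.1f}%")
```

Because the two toy models err in opposite directions, both ensembles beat either single model, and the tuned weight beats equal weighting. In practice the weight would normally be chosen on validation data rather than on the data used to report RMSE. Here one toy series serves both purposes for brevity.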