Evaluating machine learning models and identifying key factors influencing spatial maize yield predictions in data intensive farm management

Show simple item record

dc.contributor.author Maseko, Simphiwe Khulekani
dc.contributor.author Van der Laan, Michael
dc.contributor.author Tesfamariam, Eyob Habte
dc.contributor.author Delport, Marion
dc.contributor.author Otterman, H.
dc.date.accessioned 2025-01-17T07:38:09Z
dc.date.available 2025-01-17T07:38:09Z
dc.date.issued 2024-07
dc.description DATA AVAILABILITY STATEMENT: Data will be made available on request. en_US
dc.description.abstract Understanding the relationships between crop yields, soil properties, weather patterns and input applications is important for optimizing agricultural production. Data variation analysis using statistical and machine learning (ML) approaches can help identify and understand the practices that optimize yield. The objectives of this study were (i) to evaluate the predictive accuracy of selected ML models for estimating grain yields in on-farm maize (Zea mays L.) trials with different combinations of seeding and fertilizer rates in a commercial field, and (ii) to investigate the ability of ML models to assist in identifying yield-limiting factors in the same field. Multiple linear regression, multilayer perceptron, decision tree, and random forest (RF) ML models were trained and tested using crop management and soil data from a data-intensive farm management (DIFM) trial and remotely sensed data. The dataset consisted of multiple subplot treatment observations of crop management, soil properties and normalized difference vegetation index (NDVI), linked to final grain yield for the 2019/2020 and 2020/2021 seasons. The RF had the best combination of high correlation (R² = 0.69 and 0.80) and low error (MAPE = 5.4 and 8.4% and RMSE = 0.69 and 0.95 t ha⁻¹) when compared to other models for both seasons. Feature importance analysis revealed that urea application was consistently the most critical variable and explained yield variations to the greatest extent, whereas soil phosphorus (P), plant population, and sodium in 2020, and soil P, soil pH, clay content, and plant population in 2021 emerged as the most influential factors for explaining yields. This study concluded that the RF model was the best for spatial yield predictions using DIFM trial datasets. There was also variability between seasons in yield-limiting factors resulting from temporal variations in growing conditions.
To effectively apply insights from yield prediction models, it is crucial that the variables incorporated into these models have a significant connection to yield and the findings can be translated into actionable management decisions. The DIFM trials combined with ML can play an important role in advancing the field of precision agriculture by providing valuable insights into the complex interactions between crops, soils, and management practices, and identifying new opportunities for improving crop yields and environmental sustainability. en_US
dc.description.department Plant and Soil Sciences en_US
dc.description.sdg SDG-02: Zero hunger en_US
dc.description.sdg SDG-13: Climate action en_US
dc.description.uri https://www.sciencedirect.com/journal/european-journal-of-agronomy en_US
dc.identifier.citation Maseko, S., Van der Laan, M., Tesfamariam, E.H. et al. 2024, 'Evaluating machine learning models and identifying key factors influencing spatial maize yield predictions in data intensive farm management', European Journal of Agronomy, vol. 157, art. 127193, pp. 1-11, doi: 10.1016/j.eja.2024.127193. en_US
dc.identifier.issn 1161-0301 (print)
dc.identifier.issn 1873-7331 (online)
dc.identifier.other 10.1016/j.eja.2024.127193
dc.identifier.uri http://hdl.handle.net/2263/100127
dc.language.iso en en_US
dc.publisher Elsevier en_US
dc.rights © 2024 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). en_US
dc.subject Precision agriculture en_US
dc.subject Yield limiting factors en_US
dc.subject Yield variability en_US
dc.subject SDG-02: Zero hunger en_US
dc.subject SDG-13: Climate action en_US
dc.subject Random forest (RF) en_US
dc.title Evaluating machine learning models and identifying key factors influencing spatial maize yield predictions in data intensive farm management en_US
dc.type Article en_US


