Abstract:
Understanding the relationships between crop yields, soil properties, weather patterns and input applications is
important for optimizing agricultural production. Data variation analysis using statistical and machine learning
(ML) approaches can help identify and understand the practices that optimize yield. The objectives of this study
were (i) to evaluate the predictive accuracy of selected ML models for estimating grain yields in on-farm maize
(Zea mays L.) trials with different combinations of seeding and fertilizer rates in a commercial field, and (ii) to
investigate the ability of ML models to assist in identifying yield-limiting factors in the same field. Multiple linear
regression, multilayer perceptron, decision tree, and random forest (RF) ML models were trained and tested using
crop management and soil data from a data-intensive farm management (DIFM) trial and remotely sensed data. The
dataset consisted of multiple subplot treatment observations of crop management, soil properties and normalized
difference vegetation index (NDVI), linked to final grain yield for the 2019/2020 and 2020/2021 seasons. The RF
had the best combination of high correlation (R² = 0.69 and 0.80) and low error (MAPE = 5.4 and 8.4% and
RMSE = 0.69 and 0.95 t ha⁻¹) when compared to other models for both seasons. Feature importance analysis
revealed that urea application was consistently the most critical variable and explained yield variations to the
greatest extent, whereas soil phosphorus (P), plant population, and sodium in 2020, and soil P, soil pH, clay
content, and plant population in 2021 emerged as the most influential of the remaining factors. This study
concluded that the RF model was the best for spatial yield predictions using DIFM trial datasets. There was also
variability between seasons in yield-limiting factors resulting from temporal variations in growing conditions. To
effectively apply insights from yield prediction models, it is crucial that the variables incorporated into these
models have a significant connection to yield and that the findings can be translated into actionable management
decisions. The DIFM trials combined with ML can play an important role in advancing the field of precision
agriculture by providing valuable insights into the complex interactions between crops, soils, and management
practices, and identifying new opportunities for improving crop yields and environmental sustainability.
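
The three accuracy measures reported above (R², MAPE and RMSE) can be illustrated with a minimal sketch. It assumes the conventional definitions: R² as the coefficient of determination, MAPE as the mean absolute percentage error, and RMSE as the root mean squared error. The abstract does not state the exact formulas used in the study, so these definitions are an assumption. The function names and the yield values are hypothetical and serve only as illustration; they are not data from the trial.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error, in the same units as the yields (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical subplot yields in t/ha (illustrative only, not from the study).
observed = [10.0, 12.0, 8.0, 10.0]
predicted = [9.0, 12.0, 9.0, 10.0]

print(f"RMSE = {rmse(observed, predicted):.2f} t/ha")   # RMSE = 0.71 t/ha
print(f"MAPE = {mape(observed, predicted):.3f} %")      # MAPE = 5.625 %
print(f"R2   = {r_squared(observed, predicted):.2f}")   # R2   = 0.75
```

Because RMSE is expressed in yield units while MAPE is relative to the observed yield, the two error measures can rank seasons differently when mean yields differ between seasons.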