The use of ML methods in pedology traces back to the 1980s, when data-driven methods were first applied in pedometrics to model and predict soil fertility. Presented here is a brief review of some of these works from 2010 to date (2022). These works addressed a range of ML tasks, from classifying soil properties into classes of very low, low, moderate, high and very high fertility, to predicting unknown values. It is considered best practice to evaluate as many algorithms as possible, across their principal parameters, so that the evaluation is exhaustive and leads to good analytical results and final model(s). [
87] compared the performance of the J48, KNN, JRip, NB, SVM, and ANN classification algorithms, using pH, EC, N, P, K, OC, S, Fe, Mn, and Zn as input variables from a soil dataset to predict soil fertility as ‘fertile’ or ‘not fertile’; JRip scored the highest accuracy, 97%. In another study by [
101], data from the Vellore soil testing laboratory, with the soil attributes pH, EC, Fe, Zn, Mn, Cu, OC, P, and K and a fertility index (FI) labelled ‘ideal’ or ‘not ideal’, were used to train various bagging, boosting, and stacking ensemble classifiers. The authors pre-processed the data and extracted relevant features to improve performance, and attained an accuracy of 98.15% by boosting the decision-tree-based C5.0 algorithm. A versatile method for rapid and accurate determination of soil fertility for sugarcane production was developed in [
102], in which a soil fertility index was established and modelled independently using boosted decision trees on the soil attributes pH, OM (OC), Ca, and Mg, with aluminium used in place of B because the study found a high correlation between the two. The models achieved AUC scores of 0.76, 0.67 and 0.65 for the fertility classes ‘highly fertile’, ‘fertile’, and ‘least fertile’, respectively. In another work, Random Forest was used to model the fertility of the soil nutrients OC, N, P, K, Ca, Mg, Na, Fe, Mn, Cu, and Al. The resulting information was used to understand the edaphic drivers of soil constraints behind extremely high or near-zero yields and yield heterogeneity across Africa, so as to guide nutrient-specific interventions; soil factors were found to explain 72% of the variation in yields [
103]. [
104] developed a hybrid classification model that used a Decision Tree classifier to isolate the dependent features among the soil’s pH, EC, OC, N, P, K, S, Zn, Fe, Cu, Mn, and B, and then applied Naïve Bayes classification to the independent features. For the primary properties (pH, EC, OC, N), fertility was predicted with accuracies of 69.9% for naïve Bayes alone, 90.43% for the decision tree alone, and 99.93% for the DT-NB hybrid on independent features. The macronutrients (P, K, S, Zn) were predicted with accuracies of 38%, 88%, and 97%, respectively, and the micronutrient (Fe, Cu, Mn, B) levels with accuracies of 42%, 83%, and 99.93%, respectively. [
105] examined the soil properties and micro- and macronutrients EC, K, pH, Mn, Zn, S, P, B, and OC, using machine learning to grade soil nutrients. Of the classification algorithms applied, random forest achieved a higher accuracy than support vector machine and Gaussian naïve Bayes in predicting soil classes for suitable crop plantation. Likewise, [
106] used pH, EC, OC, P, K, Fe, Zn, Mn, and Cu to build machine learning models for predicting soil fertility as low, medium or high, using Support Vector Machine, nearest neighbor, Naïve Bayes, and a Decision Tree that scored 60%. Also, [
107] implemented machine learning models for automatically predicting village-wise fertility indices of organic carbon (OC), phosphorus pentoxide (P2O5), iron (Fe), manganese (Mn), and zinc (Zn) in the Indian state of Maharashtra, using 76 methods belonging to 20 families, including neural networks, deep learning, support vector regression, random forests, partial least squares, bagging and boosting, quantile regression and generalized additive models, among many others. With the nutrient fertility indices classed as low, medium or high according to the Government of India standard fertility levels, the best performance was achieved by an ensemble of extremely randomized trees (extraTrees), with accuracy (Acc) and Cohen’s kappa values of (Acc = 86.45%, Kappa = 69.60%), (Acc = 79.03%, Kappa = 56.19%), (Acc = 79.46%, Kappa = 52.51%), (Acc = 86.13%, Kappa = 71.08%), and (Acc = 97.63%, Kappa = 81.03%) for OC, Fe, P2O5, Mn, and Zn, respectively, which is fairly accurate. Other well-performing models were regularized random forest, random forests, and random forest with feature selection; good performance was also obtained from gradient boosting of regression trees (bstTree), generalized boosting regression (gbm), quantile random forest, the M5 rule-based model with corrections based on nearest neighbors (cubist), and support vector regression (svr). In another study, [
108] designed an intelligent soil nutrient and pH classification technique using weighted voting ensemble deep learning (ISNpHC-WVE), based on pH, OC, EC, P, K, and B. Such classifications are used to generate village-wise fertility index analyses and to make fertilizer recommendations through decision support systems. Three deep learning (DL) models, namely gated recurrent unit (GRU), deep belief network (DBN), and bidirectional long short-term memory (BiLSTM), were used for the predictive analysis, and a weighted voting ensemble assigned a weight vector to each DL model in the ensemble according to the accuracy it attained on each class. Furthermore, [
109] used different classification algorithms to predict the fertility rate from the soil’s pH, EC, Fe, Cu, Zn, OC, P, and K. The J48 classifier performed best in predicting the fertility index across six classes (very low, low, medium, medium high, high, and very high), with 98.17% accuracy, while naïve Bayes and random forest achieved 77.18% and 97.92%, respectively. Overall, their observations showed the fertility rate for Aurangabad district to be medium. In another study, [
110] presented a comparative analysis of the Naïve Bayes, JRip and J48 ML algorithms using soil data with the attributes pH, EC, OC, P, K, Fe, Zn, Mn, and Cu. The JRip classification algorithm gave better results than the other two, achieving an accuracy of 91.9%, and was therefore recommended for predicting six soil classes: very high, high, moderately high, moderate, low, and very low. In addition, a study by [
2] was also useful in providing information on the soil features and algorithms of interest: pH, EC, N, OC, P, Ca, Mg, Na, K, Fe, Mn, Cu, and Zn were identified as key features and were modelled using naïve Bayes and random forest as part of a task to numerically classify a portion of the Kilombero Valley soil clusters in Tanzania. Finally, in [
111] a novel 2-Stage Hybrid Ensemble Based Heterogeneous Committee Machine for Improving Soil Fertility Status Prediction Performance was developed. First, the agricultural soil properties to be used as features for soil analysis and fertility prediction with machine learning algorithms were identified through feature selection as OC, pH, EC, TN, P, Ca, K, Mg, Na, S, Mn, Al, Zn, Fe, and B. Then the K-means clustering algorithm, with the elbow method (K-elbow) for choosing K, was used to categorize the distinct soil fertility status target classes, using crop yields as an index of fertility. Finally, heterogeneous hybrid classifiers were evaluated to build a weighted voting ensemble (WVE) with improved prediction performance, combining the class probability predictions of the individual hybrid classifiers through optimization over novel brute-force 1EXP(-)Z+ multi-precision search spaces intended to guarantee that the optimum is found. The K-means hybrid-based WVE combination of GB, RF, SV, KN, and DT was the best alternative, with an accuracy of 98.93% and a Cohen’s kappa of 93.98% on test data. Furthermore, the solution in [
111] achieved, through ROC analysis, AUC scores of 0.87, 0.83 and 0.82 for the low, medium and high fertility target classes, respectively. These results showed an improvement over the models in other studies, as shown in
Table 2, which provides a summary of the reviewed studies on the application of machine learning to modelling soil chemical properties.
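
Several of the reviewed studies combine class probability predictions through a weighted voting ensemble and report accuracy together with Cohen's kappa. The following is a minimal Python sketch of that idea, not the implementation used in any of the cited works. It assumes one scalar weight per base classifier (whereas [108] describes a weight vector per class), and it searches a small integer weight grid by brute force as a stand-in for the multi-precision search spaces of [111], which are not reproduced here. The function names and the toy probabilities are hypothetical and were invented purely for illustration.

```python
from itertools import product

# Toy data, invented for illustration only: class-probability rows from two
# hypothetical base classifiers over three fertility classes
# (0 = low, 1 = medium, 2 = high) for six samples.
Y_TRUE = [0, 1, 2, 0, 1, 2]
MODEL_A = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.5, 0.3, 0.2],
           [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]
MODEL_B = [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8],
           [0.3, 0.3, 0.4], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]]


def weighted_vote(prob_sets, weights):
    """Weighted soft voting: sum each model's class probabilities scaled by
    that model's weight, then pick the class with the largest total."""
    n_samples = len(prob_sets[0])
    n_classes = len(prob_sets[0][0])
    preds = []
    for i in range(n_samples):
        scores = [sum(w * probs[i][c] for w, probs in zip(weights, prob_sets))
                  for c in range(n_classes)]
        preds.append(max(range(n_classes), key=scores.__getitem__))
    return preds


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_o = accuracy(y_true, y_pred)
    p_e = sum((y_true.count(k) / n) * (y_pred.count(k) / n) for k in labels)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)


def search_weights(prob_sets, y_true, levels=3):
    """Brute-force search over integer weights 0..levels per model
    (an assumed grid), keeping the first combination with the best accuracy."""
    best_w, best_acc = None, -1.0
    for w in product(range(levels + 1), repeat=len(prob_sets)):
        if sum(w) == 0:
            continue  # all-zero weights carry no information
        acc = accuracy(y_true, weighted_vote(prob_sets, w))
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Calling `search_weights([MODEL_A, MODEL_B], Y_TRUE)` returns a weight tuple and its accuracy, and `cohen_kappa` can then be applied to the predictions of `weighted_vote` with those weights. In practice the weights would be tuned on a validation split rather than on the test data, so that the reported accuracy and kappa are not optimistically biased.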