3.1. Calibration
Wheat growth and productivity varied widely both among cvs and across growing seasons for the same cv (Figure 1a and Figure 1b).
Valgerardo and Latino showed markedly reduced growth in some years, with negative consequences for productivity. For Valgerardo, in 1982, dry biomass accumulation stopped at values between 4157 kg ha-1 (T5) and 5447 kg ha-1 (T8), and grain yield was correspondingly well below 1000 kg ha-1 for all straw treatments.
In the following year, a storm just before harvest caused the plants to lodge, resulting in grain loss; data from that year were therefore excluded from the modelling exercise reported here.
For Latino, dry biomass and grain yield at harvest in 1992 remained below 5000 kg ha-1 and 1000 kg ha-1, respectively.
Ofanto and Appulo showed fairly stable growth and productivity across years, with comparable TDM (slightly above 10000 kg ha-1 for both) and grain yield (around 3000 kg ha-1).
Simeto and Claudio were the cvs with the greatest yield potential, as shown by their high productivity in some years (with peaks above 5000 kg ha-1) compared with the remaining cvs.
However, even for these two cvs, some growing seasons were critical for growth and biomass accumulation, limiting grain yield, which for Simeto fell below 2000 kg ha-1.
Finally, Saragolla produced both some of the highest (4508 kg ha-1 in 2021; T2) and some of the lowest yields (1692 kg ha-1 in 2020; T5), although even in its worst seasons the corresponding TDM remained relatively high (from 11723 kg ha-1 to 13974 kg ha-1).
As for Valgerardo, storms shortly before harvest heavily compromised the grain harvest in 2001 (cv Ofanto) and 2018 (cv Claudio, T2 and T5 treatments); wheat data from these growing seasons were therefore not used for model parametrization.
The parameters calibrated by a trial-and-error procedure concerned the following crop growth processes: i) CO2 assimilation; ii) conversion into biomass; iii) partitioning among plant organs; iv) canopy development and intercepted radiation; v) root length; and vi) senescence (Table 1 and Table 2).
In addition, the coefficients of the algorithms governing evapotranspiration (Table 2), the partitioning specific to each phenological phase (Table 2) and the growing degree days required to reach each phenological stage (GDD; Table 3) were also modified.
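Thermal time is commonly accumulated as growing degree days. The sketch below illustrates one common formulation: daily mean temperature above a base temperature is summed until a stage-specific threshold is reached. The base temperature of 0 °C, the absence of an upper temperature cutoff and the function names are illustrative assumptions. This is not ARMOSA's own routine, and the thresholds are not the calibrated values in Table 3.

```python
from typing import Optional, Sequence


def daily_gdd(t_max: float, t_min: float, t_base: float = 0.0) -> float:
    """Growing degree days for one day: mean temperature above t_base, floored at 0.

    t_base = 0 °C is an assumed value, not the calibrated ARMOSA parameter.
    """
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


def day_stage_reached(
    t_max: Sequence[float],
    t_min: Sequence[float],
    gdd_threshold: float,
    t_base: float = 0.0,
) -> Optional[int]:
    """Return the 1-based day on which cumulative GDD first reaches gdd_threshold.

    Returns None if the threshold is not reached within the series.
    """
    cumulative = 0.0
    for day, (tx, tn) in enumerate(zip(t_max, t_min), start=1):
        cumulative += daily_gdd(tx, tn, t_base)
        if cumulative >= gdd_threshold:
            return day
    return None


# Hypothetical daily temperatures (°C), not LTE data.
tmax = [10.0, 15.0, -2.0, 12.0]
tmin = [2.0, 5.0, -8.0, 4.0]
print(day_stage_reached(tmax, tmin, gdd_threshold=20.0))  # → 4
```

In this formulation, calibrating the phenology of a cultivar amounts to adjusting the GDD threshold for each stage so that the simulated day matches the observed one.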
For the emergence, flowering and maturity stages, observed and simulated data matched closely, both in the values averaged over all growing seasons and in inter-annual variability (Table 4).
Accurate calibration of crop phenology is considered the primary, basic step in the application of crop simulation models [42]. In our modelling exercise, the emergence and flowering stages simulated by ARMOSA attained the highest scores, with the model capturing both the mean GDD required to reach these stages and their variability across years.
ARMOSA also simulated the GDD required to reach maturity well. Performance was slightly penalized by a low EF score and an intermediate d score, but GSD and CRM values were good.
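For reference, the sketch below gives the commonly used textbook definitions of three of the indices cited above: modelling efficiency (EF, Nash-Sutcliffe form), Willmott's index of agreement (d) and the coefficient of residual mass (CRM, Loague and Green form). These formulations are assumptions. This section does not give the exact formulas or any score scaling used in the paper, and GSD is not reproduced because its definition is not stated here.

```python
from statistics import fmean


def modelling_efficiency(obs, sim):
    """EF = 1 - sum((O - S)^2) / sum((O - mean(O))^2); optimum 1, can be negative."""
    o_mean = fmean(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - o_mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def index_of_agreement(obs, sim):
    """Willmott's d; ranges from 0 to 1, optimum 1."""
    o_mean = fmean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((abs(s - o_mean) + abs(o - o_mean)) ** 2 for o, s in zip(obs, sim))
    return 1.0 - num / den


def coefficient_of_residual_mass(obs, sim):
    """CRM = (sum(O) - sum(S)) / sum(O); optimum 0, negative means overestimation."""
    return (sum(obs) - sum(sim)) / sum(obs)


# Hypothetical observations and a model that overestimates each one by 1 unit.
obs = [1.0, 2.0, 3.0]
sim = [2.0, 3.0, 4.0]
print(modelling_efficiency(obs, sim))          # → -0.5
print(coefficient_of_residual_mass(obs, sim))  # → -0.5
```

The example illustrates why the indices are read together. A uniform bias drives EF negative and CRM away from zero, while d (about 0.73 here) still indicates partial agreement in pattern.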
The more accurately a simulation model replicates crop phenology, the better it can capture the genetic variability underlying canopy development and biomass accumulation [43].
Biomass accumulation depends on the radiation intercepted by the leaf surface, which drives the conversion of assimilated CO2 into carbohydrates; the efficiency of this conversion is a cultivar-specific trait.
Accordingly, for each cultivar, the coefficients of the algorithms governing canopy development and senescence, the conversion of CO2 into dry matter, maintenance respiration, and water and temperature stresses were adjusted so that simulated biomass accumulation best fitted that measured in the LTE (see Table 1).
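The link between intercepted radiation and biomass described above is often approximated by combining the Beer-Lambert law for canopy light interception with a radiation use efficiency (RUE). The sketch below implements that generic approximation, not ARMOSA's CO2-assimilation routine. The extinction coefficient (k = 0.6), the RUE of 2.5 g MJ-1 and the function names are hypothetical values chosen for illustration.

```python
import math


def intercepted_par(par_incident: float, lai: float, k: float = 0.6) -> float:
    """PAR intercepted by the canopy (MJ m-2 d-1) via the Beer-Lambert law.

    k (light extinction coefficient) = 0.6 is an assumed, illustrative value.
    """
    return par_incident * (1.0 - math.exp(-k * lai))


def daily_biomass_gain(par_incident: float, lai: float,
                       rue: float = 2.5, k: float = 0.6) -> float:
    """Dry matter produced in one day (g m-2 d-1) = RUE x intercepted PAR.

    rue (g dry matter per MJ intercepted PAR) = 2.5 is hypothetical; it stands in
    for the kind of cultivar-specific conversion coefficient adjusted in calibration.
    """
    return rue * intercepted_par(par_incident, lai, k)


# Hypothetical day: 10 MJ m-2 of incident PAR on canopies of increasing LAI.
for lai in (0.5, 2.0, 5.0):
    print(lai, round(daily_biomass_gain(10.0, lai), 2))
```

Interception saturates as LAI grows. Once the canopy closes, the conversion coefficient therefore has more influence on daily biomass gain than further leaf area, which is why such coefficients are calibrated per cultivar.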
As for phenology, the calibration phase showed that ARMOSA faithfully replicated total dry biomass at harvest averaged over all soil treatments (Table 5).
Three of the four evaluation indices reached the highest score, and GSD deviated only negligibly from its optimal value (25.77% vs 25%). When the cropping systems (T2, T5 and T8) were assessed separately, observed TDM and model output matched very closely for T2 and T8, with a small deviation of GSD from its optimal value for T2 and a slight overestimation by the model for T8. The simulation of T5 could also be deemed satisfactory, with the best performance for EF and d but a slight overestimation and deviation of the simulated data from the observed data.
The study area has a Mediterranean climate characterized by an erratic rainfall pattern, with prolonged drought, especially in spring and summer, alternating with periods of high rainfall. Moreover, durum wheat grown in the Mediterranean area is usually not irrigated. Together, these conditions expose the crop to highly variable water supply and water stress, both across years and within a single growing season [44-46].
The ratio between the standard deviation and the mean of TDM showed that some cvs (Valgerardo, Latino and Appio) were more susceptible to climatic variability than others (Ofanto and Appulo; Table 6).
Therefore, the crop coefficients related to adaptive responses to temperature and rainfall patterns and to water and temperature stresses (i.e., WSPar, TmaxCO2, TOffCO2 and KET) were carefully calibrated for each cv.
Of the 8 cvs, ARMOSA accurately replicated TDM at the end of the growing season for 4, fairly well for 3, and unsatisfactorily for only one.
Note that only 3 growing seasons (2019 to 2021) were available for Saragolla, giving too few observations to optimize ARMOSA's response for this cv.
Simeto and Valgerardo were the cvs for which ARMOSA accurately simulated both the inter-annual variability and the average TDM observed in the field, with a slight overestimation for Simeto.
For the remaining cvs, results were mixed. For some, ARMOSA replicated biomass accumulation at harvest well, with negligible differences between observed and simulated means, but was less effective in capturing inter-annual variability (see GSD, EF and d for Appulo, Claudio and Ofanto).
For others (i.e., Claudio and Latino), the simulations captured inter-annual variability well but overestimated or underestimated mean TDM.
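The susceptibility ranking above rests on the coefficient of variation (CV) of TDM across seasons. The sketch below shows a minimal version of that calculation. It assumes the sample (n - 1) standard deviation. The cultivar labels and values are hypothetical, not the data in Table 6.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, in percent."""
    return 100.0 * stdev(values) / fmean(values)


def rank_by_stability(tdm_by_cv):
    """Sort cultivars from most to least stable (lowest to highest CV)."""
    return sorted(tdm_by_cv, key=lambda cv: coefficient_of_variation(tdm_by_cv[cv]))


# Hypothetical TDM series (kg ha-1) for two unnamed cultivars, not Table 6 data.
tdm = {
    "cv_A": [10200.0, 10500.0, 9900.0],
    "cv_B": [5000.0, 12000.0, 8000.0],
}
print(rank_by_stability(tdm))  # → ['cv_A', 'cv_B']
```

Because the CV is scale-free, it allows cultivars with very different mean TDM to be compared on the same footing.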
The cropping systems in the LTE involved retaining the straw and incorporating it into the soil, and differed in whether nitrogen and water were supplied.
Overall, the analysis of simulated TDM at harvest showed that the calibration process correctly trained the cropping system model to replicate the data observed in the field across the LTE under the P_30 treatments.
A correct estimate of TDM, and therefore of the biomass incorporated into the soil, was the first prerequisite for adequately simulating TOC dynamics.
In previous studies, ARMOSA was calibrated and validated across a wide range of climate and soil conditions throughout Europe, under both conventional systems and CA, simulating TOC dynamics with very good or even excellent results (Valkama et al., 2020).
Therefore, calibration of TOC dynamics focused on only two parameters controlling soil organic matter turnover, namely Khumus (1.4 × 10-4) and CMicrobEfficiency (0.4), while all other parameters were left unchanged.
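The sketch below gives a sense of scale for the two calibrated parameters. As a simplification, it treats Khumus as a first-order daily decay rate of the humus carbon pool. It treats CMicrobEfficiency as the fraction of decomposed carbon retained in soil, with the remainder respired as CO2. Both interpretations, the daily time step and the function names are assumptions for illustration. A full soil organic matter module typically also includes litter pools and temperature and moisture response functions, all omitted here.

```python
import math

K_HUMUS = 1.4e-4           # calibrated Khumus; assumed here to be a first-order rate in d-1
C_MICROB_EFFICIENCY = 0.4  # calibrated CMicrobEfficiency; assumed fraction of decomposed C retained


def humus_carbon_after(c0: float, days: float, k: float = K_HUMUS) -> float:
    """Humus C remaining after `days` of first-order decay, with no new inputs."""
    return c0 * math.exp(-k * days)


def partition_decomposed_carbon(decomposed_c: float,
                                efficiency: float = C_MICROB_EFFICIENCY):
    """Split decomposed C into a retained fraction and a fraction respired as CO2."""
    retained = efficiency * decomposed_c
    respired = decomposed_c - retained
    return retained, respired


# Hypothetical starting pool of 50000 kg C ha-1, one year without inputs.
c_end = humus_carbon_after(50000.0, 365)
print(round(100 * (1 - c_end / 50000.0), 1))  # → 5.0 (% of the pool lost per year)
print(partition_decomposed_carbon(1000.0))
```

Under these assumptions, Khumus = 1.4 × 10-4 corresponds to roughly 5% of the humus pool mineralized per year. With CMicrobEfficiency = 0.4, 40% of each decomposed unit of carbon stays in the soil and 60% is respired.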
ARMOSA replicated TOC dynamics quite well, attaining the "Good" score for all treatments under investigation (Table 7; Figure 2). This result reflects the accurate estimate of mean TOC (averaged over all treatments; 64965 vs 64758 kg ha-1; Table 7).
Although the CRM index indicated perfect alignment between simulated and measured values, ARMOSA tended to slightly underestimate the data collected in the early years of the LTE and to overestimate those in its central part (Figure 2).
The robustness of ARMOSA in simulating TOC dynamics in the last part of the LTE could not be assessed in calibration because no soil samples were available for that period; those samples were instead used in the validation phase (see the next section).
Measured TOC was highly variable both between consecutive years and within the same sampling (high standard deviation; Figure 2).
This variability may arise from conditions associated with sampling time and sampling point. Over the years, sampling took place between early September and late November. In this period straw could still be intact (early September) or already partially degraded (late November), depending also on when it was buried relative to soil sampling. This could affect the amount of organic matter and organic carbon in the shallow soil layers. Individual sampling points could also be affected by the substantial, and changing, content of crop residues [47].
This may explain the poor match between measured and simulated TOC variability (low EF and d scores), even though ARMOSA simulated high variability of TOC between the beginning and the end of the growing period (due to straw degradation dynamics).
Grain yield simulations gave contrasting results (Table 8).
Although the overall score for simulated yield averaged over all treatments was "Fair", only T2 achieved a good result, while the outcome for the other two treatments was inadequate.
The same pattern emerged for the simulated yield of the individual cvs. Of the 8 cvs, half did not achieve a satisfactory score, three obtained a fairly good score and only one reached the maximum score (Table 9).
GSD ranged from a minimum of 24.45% for Latino to a maximum of 66.51% for Claudio. Claudio also fitted poorly in calibration, with the worst EF (-9.93) and CRM (-0.23) among the simulated cvs. Apart from Latino, calibration achieved satisfactory performance for Simeto in terms of EF (0.1) and d (0.77), and for Valgerardo (0.18 and 0.83 for EF and d, respectively).
Saragolla also performed poorly, with EF and d far from their optimum values, even though the simulated mean yield was aligned with the observed data (CRM of 0.03).
For grain yield, calibration of ARMOSA focused on the parameters controlling biomass partitioning among organs, including the grain, and on grain maintenance respiration (Table 2).
The observed data showed that grain yield was not linearly related to the biomass produced at harvest.
Several authors achieved poor performance when calibrating crop simulation models on wheat yield across different sites, years and cultivars, especially in hot-arid environments.
Specifically, some authors argued that grain production depends on genetic coefficients that are not only site-specific [48] but also year-specific [49-50].
Our calibration results confirm the finding of [51] that wheat yield is difficult to predict accurately at low yield levels and/or in environments characterized by high temperatures.
Simulating grain production becomes particularly difficult when water and/or heat stress occurs during seed formation [52].
At the LTE site, periods of low rainfall and heat waves are frequent and have heavily compromised the potential productivity of the crop. Short but intense storms and strong wind gusts have also caused lodging.
Such extreme events during seed filling, which strongly affect final yield, are poorly represented by crop growth simulation models [53].
However, the regression between observed and simulated data, compared with the 1:1 line (Figure 3), showed that ARMOSA captured the variability of mean grain yield among cvs well (Table 8), with an R2 of 0.82 and a slope of 1.06.
3.2. Validation
The robustness of ARMOSA in simulating phenology was confirmed in the validation step, with maximum scores for the emergence and flowering phases.
Although the simulation of maturity did not reach the same level (EF of -1.05 and CRM of 0.46), ARMOSA matched the observed mean value (156 days vs 155 days; Table 10).
The reliability of ARMOSA in replicating the productivity of the cvs used in validation (Simeto, Claudio and Saragolla) was assessed from the 1:1 regression (Figure 4a; Table 11).
For Claudio, simulated and observed mean grain yields were aligned (4300 kg ha-1 vs 4392 kg ha-1). Although the standard deviation was much higher for ARMOSA than for the LTE data, the model reasonably captured the observed variability among years (see the dispersion around the 1:1 line). The only outliers were the results for a single growing season under NT and MT, in which simulated yields (8154 kg ha-1 on average) were much higher than observed (4565 kg ha-1).
For Saragolla, ARMOSA tended to slightly underestimate actual yield (β = 0.92) but fitted the observed data very well (R2 = 0.99), although only two growing seasons, for a total of four yield values, were compared.
For Simeto, ARMOSA overestimated grain production by around 24% (3267 kg ha-1 vs 4416 kg ha-1). As for Claudio, a very large discrepancy between simulated and actual grain yield was observed for one growing season (2349 kg ha-1 vs 5919 kg ha-1 on average). Overall, Simeto proved to be the most difficult cv for ARMOSA in the validation phase, although not dramatically so.
Evaluated overall for the NT and MT treatments, ARMOSA tended to slightly overestimate observed grain yield (+10%) and generated larger variability, with a coefficient of variation (ratio of standard deviation to mean) of approximately 35% for ARMOSA versus 26% for the LTE.
In summary, validation showed that the model tends to slightly overestimate yield and responds more strongly to different climate patterns (CV = 34%) than the actual crop does (CV = 25%).
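The +10% bias and the two coefficients of variation quoted above are simple summary statistics. The sketch below shows one way to compute them. It assumes that bias is expressed relative to the observed mean and that the CV uses the sample standard deviation. The yields are hypothetical.

```python
from statistics import fmean, stdev


def relative_bias_pct(observed, simulated):
    """Percent difference of the simulated mean from the observed mean (+ = overestimation)."""
    return 100.0 * (fmean(simulated) - fmean(observed)) / fmean(observed)


def cv_pct(values):
    """Coefficient of variation: sample standard deviation / mean, in percent."""
    return 100.0 * stdev(values) / fmean(values)


def summarize(observed, simulated):
    """Mean bias plus the CV of each series, to compare model and field spread."""
    return {
        "bias_pct": relative_bias_pct(observed, simulated),
        "cv_observed_pct": cv_pct(observed),
        "cv_simulated_pct": cv_pct(simulated),
    }


# Hypothetical yields (kg ha-1) for four seasons, not LTE data.
obs = [3000.0, 4000.0, 3500.0, 4500.0]
sim = [3100.0, 4800.0, 3300.0, 5200.0]
print(summarize(obs, sim))
```

A model can have a small mean bias while still exaggerating year-to-year swings. Comparing the two CVs exposes this second kind of mismatch.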
When tested on TOC (Figure 4b), the model responded differently to the two soil treatments (NT and MT), in line with the observations in the LTE.
In the LTE, TOC increased from about 51000 kg ha-1 at the start of the experiment (2002) to 63200 kg ha-1 in NT and 55800 kg ha-1 in MT in 2020.
ARMOSA did not deviate far from the observed data, returning 2020 TOC values of 63045 kg ha-1 and 65247 kg ha-1 for NT and MT, respectively.
This agreement did not hold for some experimental years (e.g., 2015 and 2019), in which ARMOSA outputs differed substantially from the measured soil TOC content.
This is because laboratory-determined TOC is strongly affected by organic matter from the total or partial degradation of crop residues, whose content can vary widely with the sampling point [47].
This also explains the very high variability of the values observed at each sampling in both NT and MT (see the standard deviation in Figure 4b).
In light of this, ARMOSA can be considered reliable in simulating TOC changes, particularly over periods long enough to capture TOC dynamics under different cropping systems [54].
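As a worked example of the observed trends, the TOC values reported for 2002 and 2020 can be converted into an average annual rate of change. This is a straight-line average between two dates only, and it ignores the year-to-year variability discussed above. The function name is illustrative.

```python
def mean_annual_change(c_start, c_end, year_start, year_end):
    """Average change in TOC stock per year (kg ha-1 yr-1) between two sampling years."""
    return (c_end - c_start) / (year_end - year_start)


# Observed LTE TOC values reported above (kg ha-1); the 2002 value is approximate.
toc_2002 = 51000.0
toc_2020 = {"NT": 63200.0, "MT": 55800.0}

for treatment, c_end in toc_2020.items():
    rate = mean_annual_change(toc_2002, c_end, 2002, 2020)
    print(f"{treatment}: {rate:.0f} kg ha-1 yr-1")
```

Over 2002 to 2020 this gives roughly 678 kg ha-1 yr-1 under NT and 267 kg ha-1 yr-1 under MT.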