1. Introduction
Harvesting Hass avocados is challenging because of the fruit's complex physiology and the accumulation of solid material during fruit development. Many studies note the need for a methodology or technology that ensures consistent avocado quality across all stages of the supply chain [1]. Researchers have reported that preharvest factors influence the fruit composition of Hass avocados [2]. The dry matter (DM) of fruit pulp is the most widely used indicator internationally to determine quality attributes and minimize defects in fruit pulp [3]. Other harvest maturity indicators have been used to evaluate quality attributes, with oil content (OC) being the most studied for identifying the ideal harvest time. However, Lee et al. noted that because DM is inexpensive and rapid to determine and is closely related to OC in the fruit, it can be considered the standard indicator of harvest maturity [4].
Nevertheless, destructive techniques are time-consuming and fail to capture the variability present in orchards, which is why nondestructive alternatives such as near-infrared spectroscopy (NIRS) could be useful for predicting maturity parameters quickly [5]. NIRS techniques allow the evaluation of internal and superficial parameters in fruits by illuminating the sample with radiation and measuring the radiation that is reflected, absorbed, or transmitted along its path through the sample [6].
When using measurement equipment where the response variable is reflectance, the change in harmonic vibrations that occurs in the Vis-NIRS region is represented and stored as a record of reflectance (1/R) versus wavelength [7]. Most studies perform NIRS measurements directly on the skin (exocarp). Walsh et al. highlight that the scans are performed in diffuse reflection or interaction mode to prevent any damage; the method is therefore considered nondestructive, while still allowing NIR radiation to penetrate the fruit and capture the properties of the pulp [8].
Several multivariate studies have evaluated nondestructive techniques for identifying harvest maturity indices in Hass avocados. However, because of the dry matter gradient present in the fruits, capturing all the variability is essential for obtaining reliable results. Early work by Schroeder (1985) showed that there is a pronounced DM gradient throughout the fruit [9]. This internal variability can be associated with development and maturation problems for the fruit on the tree. It has also been reported that DM decreases gradually from the peduncle end toward the interior of the fruit near the seed and from the sides to the interior of the fruit [10].
However, the studies that predict DM by NIRS do not establish a clear methodology for recovering all the variability present in the fruit. Wedding et al. (2013) implemented two scans per fruit over the range of the peduncle to the basal zone (equator), finding that the performance of the predictive model for DM stabilizes only if several seasons or harvests are considered so that greater fruit variability is collected [11]. Other researchers (Blakey, 2016; Ncama et al., 2018) carried out between four and six equidistant scans around the equator zone of each fruit to evaluate nondestructive models and predict the DM in Hass avocado fruits [12,13]. In contrast, Olarewaju et al. (2016) took two scans in the equator zone, rotating the fruit 180° between them, and averaged the spectra to build the prediction models [5]. Moreover, in Mexico, nondestructive studies were carried out with scans on the peduncle to obtain more reliable results than those evaluated in the equatorial zone due to the interference of the DM gradient present in the seed [14]. Despite the efficient models developed for DM quantification, no attention has been paid to the robustness of these models according to the zone of the fruit. Therefore, the aim of this work was to determine a methodology to capture the variability present within the same fruit using NIRS.
3. Results and Discussion
The DM values range between 23.41% and 25.73%, with an average value of 24.65% and a standard deviation of 0.65%. The distribution of the values shows two populations, consistent with the discussion presented below (Figure 2). This distribution is due to fruits F1, F7, and F9, which present average DM contents lower than 24%.
The two-factor ANOVA (zone and fruit) of the DM (Table 1) shows evidence of both a fruit effect and a zone effect in the Hass avocado samples, with the “fruit” effect being more significant than the “zone” effect based on the type III sums of squares.
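The two-factor decomposition can be illustrated with a minimal sketch. It assumes a balanced design (10 fruits, 3 zones, 3 DM determinations per zone) and a main-effects model without interaction; the exact model terms and software used in the study are not stated here, the data are synthetic, and names such as `two_way_anova` are hypothetical. In a balanced design, the type III sums of squares coincide with the sequential ones, so the simple formulas below apply.

```python
import numpy as np

def two_way_anova(dm):
    """Main-effects two-way ANOVA for a balanced array dm[fruit, zone, replicate]."""
    n_fruit, n_zone, n_rep = dm.shape
    grand = dm.mean()
    fruit_means = dm.mean(axis=(1, 2))
    zone_means = dm.mean(axis=(0, 2))

    # Sums of squares for each factor, the total, and the residual.
    ss_total = ((dm - grand) ** 2).sum()
    ss_fruit = n_zone * n_rep * ((fruit_means - grand) ** 2).sum()
    ss_zone = n_fruit * n_rep * ((zone_means - grand) ** 2).sum()
    ss_resid = ss_total - ss_fruit - ss_zone

    df_fruit, df_zone = n_fruit - 1, n_zone - 1
    df_resid = dm.size - n_fruit - n_zone + 1
    ms_resid = ss_resid / df_resid
    return {
        "F_fruit": (ss_fruit / df_fruit) / ms_resid,
        "F_zone": (ss_zone / df_zone) / ms_resid,
        "df": (df_fruit, df_zone, df_resid),
        "ss": (ss_fruit, ss_zone, ss_resid, ss_total),
    }

# Synthetic DM values (% DM), NOT the study data: assumed fruit and zone effects.
rng = np.random.default_rng(0)
fruit_effect = np.linspace(23.5, 25.7, 10)
zone_effect = np.array([0.10, -0.15, 0.05])
dm = (fruit_effect[:, None, None]
      + zone_effect[None, :, None]
      + rng.normal(0, 0.1, (10, 3, 3)))

result = two_way_anova(dm)
print(result["df"], round(result["F_fruit"], 1), round(result["F_zone"], 1))
```

With fruit-to-fruit differences larger than zone-to-zone differences, as in the synthetic data above, the F statistic for “fruit” exceeds that for “zone”, which mirrors the ordering of the two effects described in the text.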
A Newman‒Keuls (SNK) test comparing the levels of the factor “fruit” at a 95% confidence level confirms that F3, F8, F1, F9, and F7 each differ from all other fruits, that F4, F5, and F6 are not significantly different (same DM), and that the same holds for F4, F6, and F10 and for F10 and F2 (Figure 3a). A Newman‒Keuls (SNK) test comparing the levels of “zone” at a 95% confidence level confirms that the 3 zones are significantly different, with average values of 24.47% (equator), 24.68% (base), and 24.79% (peduncle) (Figure 3b).
There is a DM gradient within the fruit, with higher DM content on the peduncle followed by the equator and base zones. The variability within the fruit was shown by Phetsomphou (2000) and Wedding et al. (2010) [19,20] not only in the P, E, and B zones but also in the outer, middle, or inner fruit. Therefore, this gradient affects the robustness of both destructive and nondestructive analyses for DM quantification. The literature does not show a clear trend in the direction of the gradient, and there is little information that explains the phenomenon well. Schroeder (1985) indicated that these gradients could be related to complex fruit development physiology [9]. Many differences in structure and metabolic activity eventually appear within the avocado fruit. Moreover, the gradient within the fruit and the variability between fruits make it more difficult to guarantee the minimum dry matter needed to export fruits. According to Rodríguez et al. (2018), almost 80% of fruit samples must have a DM higher than 24% to have good postharvest quality in the international market. In this study, 70% of the samples had a DM higher than 24% [18].
Regarding the methodologies used to build NIRS predictive models for Hass avocado DM estimation, almost all the authors scanned the fruit in the equatorial zone two or more times [5,11,15,21]. The same methodology, in which only the equator zone is scanned, has been employed in other fruits [22,23]. Although the DM gradient within the fruit has been published for Hass avocados, NIRS models are built on destructive analyses of the outer (0.5-1.0 cm) layer of the mesocarp and skin; thus, the analysis could have a bias with respect to the whole fruit DM content. P.P. Subedi and K.B. Walsh (2020) found a difference of 4% in DM between the outer and inner parts of Hass avocados [15]. The authors developed predictions despite this high variation. In terms of fruit development, 4% dry matter implies a difference in fruit age of almost 40 days according to Rodríguez et al. (2018) [18]. We found a nonsystematic bias in the dry matter of ±4% in the analysis of the DM with commercial, portable NIRS in Colombia (data not shown). Therefore, these devices do not allow efficient sorting of fruit in orchards or packing houses.
PCA was performed on the raw data, log (1/R), for the whole wavelength range; the input matrix has dimensions n = 90 and p = 1050. The first two PCs explained 94% of the total inertia (88% and 6%, respectively). The plot of the sample scores for the first two PCs highlights differences between zones (Figure 4), which indicates that there is high variability within spectra due to the zone of the fruits. These groups have not been reported before, although there are many publications on the use of NIRS technology to analyze DM in Hass avocados.
The loadings associated with PC1 show two main peaks in the NIR region, at 1450 nm and 1918 nm, which correspond to water (O-H) absorption bands (Figure 5). The results agree with those of other works that indicated that for Hass avocados, the main peaks are closely associated with the H-O-H stretching modes of water [5].
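The PCA step described above (scores, loadings, and share of inertia per PC) can be sketched as follows. This is a minimal illustration on a synthetic 90 × 1050 matrix, not the study data: the single absorption band, the zone offsets, and the function name `pca_svd` are assumptions made only for the example.

```python
import numpy as np

def pca_svd(spectra, n_components=2):
    """PCA by SVD of the column-centered matrix (rows = spectra)."""
    centered = spectra - spectra.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)        # share of total inertia per PC
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components]               # one row per PC, one column per variable
    return scores, loadings, explained[:n_components]

# Synthetic stand-in for the 90 x 1050 log(1/R) matrix (NOT the study data).
rng = np.random.default_rng(1)
n_spectra, n_wavelengths = 90, 1050
zone = np.tile(np.repeat([0, 1, 2], 3), 10)    # 10 fruits x 3 zones x 3 replicates
band = np.exp(-0.5 * ((np.arange(n_wavelengths) - 600) / 40.0) ** 2)
zone_offset = np.array([0.0, 0.2, 0.4])        # hypothetical zone-dependent band depth
spectra = (0.5
           + zone_offset[zone][:, None] * band
           + rng.normal(0, 0.01, (n_spectra, n_wavelengths)))

scores, loadings, explained = pca_svd(spectra)
print("share of inertia, PC1 and PC2:", np.round(explained, 3))
print("index of the largest |PC1 loading|:", int(np.argmax(np.abs(loadings[0]))))
```

Plotting `scores[:, 0]` against `scores[:, 1]`, colored by `zone`, gives the kind of score plot in which zone groups can be inspected, and the position of the largest PC1 loading points to the variable region that drives the separation.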
The RMSi for each zone (Figure 6) shows that despite some outlier spectra (with high RMS values), all RMSi values are of the same order.
The RMSi average values are given per zone (for all the fruit). An ANOVA on the RMSi values with zone as the factor confirms a significant effect (α = 5%) of the zone of the fruit on spectral variability (Pr > F = 0.043). The contrast test (Newman‒Keuls, SNK) confirms a difference between peduncle and equator, but no difference between peduncle and base or between base and equator (Figure 7).
These results associated with DM content observations suggest that at least one spectrum per zone is reasonable for determining the variability in the whole fruit.
The RMSi method of calculating dispersion is commonly used in NIRS analysis and is known as the root mean square error or root mean square deviation. It is a measure of the variability in a group of spectra that are supposed to be similar. Therefore, this descriptor indicates spectral similarities between avocado zones. This study highlights the existing spectral variability between fruit zones. Therefore, at least one scan must be performed per zone to capture the whole fruit variability, as will be discussed in the section on the multivariate analysis.
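As an illustration of this kind of dispersion descriptor, the sketch below computes, for each spectrum in a group, the root mean square deviation from the group mean spectrum. The exact RMSi formula is not restated in this section, so this definition is an assumption, and the function name and the toy values are hypothetical.

```python
import numpy as np

def rms_per_spectrum(group):
    """RMS deviation of each spectrum (row) from the mean spectrum of its group."""
    deviation = group - group.mean(axis=0)
    return np.sqrt((deviation ** 2).mean(axis=1))

# Toy group of three "spectra" with three variables each (illustrative values only).
group = np.array([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
rms = rms_per_spectrum(group)
print(rms)
```

In this toy group, the third spectrum lies farthest from the mean spectrum and therefore receives the largest value, which is how an atypical spectrum would stand out within a zone.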
Calibration based on three spectra per zone
A first calibration was performed with all the spectra (90) and all DM values (90). The best pretreatment was the first derivative (gap derivative, gap size = 5) combined with a detrending correction (second-order polynomial). PLS regression was used for calibration.
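The two pretreatments named above can be sketched as follows. Several conventions exist for gap derivatives, and neither the exact convention nor the order of the two steps is stated here; the sketch assumes a simple difference between points 5 channels apart, followed by subtraction of a second-order polynomial baseline. Function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def gap_first_derivative(spectra, gap=5):
    """Difference between points `gap` channels apart (one common gap-derivative form)."""
    return spectra[:, gap:] - spectra[:, :-gap]

def detrend(spectra, order=2):
    """Subtract a least-squares polynomial baseline from each spectrum (row)."""
    x = np.linspace(-1.0, 1.0, spectra.shape[1])
    coeffs = np.polyfit(x, spectra.T, order)          # shape (order + 1, n_spectra)
    baseline = (np.vander(x, order + 1) @ coeffs).T   # shape (n_spectra, n_points)
    return spectra - baseline

def pretreat(spectra, gap=5, order=2):
    """Assumed order of operations: gap derivative first, then detrending."""
    return detrend(gap_first_derivative(spectra, gap), order)

# Synthetic smooth curves standing in for 90 spectra of 1050 points (NOT the study data).
rng = np.random.default_rng(3)
raw = rng.normal(0.5, 0.05, (90, 1050)).cumsum(axis=1) / 1050
treated = pretreat(raw)
print(raw.shape, "->", treated.shape)
```

The derivative shortens each spectrum by `gap` points, so the pretreated matrix passed to the PLS regression has fewer columns than the raw one.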
Table 2.
PLS equation performances for three spectra per zone.
N | R² calibration | SEC | RMSEC | R² validation | RMSECV | SECV | # PLS factors
90 | 0.743 | 0.329 | 0.328 | 0.522 | 0.4526 | 0.455 | 8
Figure 8.
Predicted DM versus laboratory DM calculated with PLS regression based on all data (n = 90).
Calibration based on two spectra per zone
A calibration based on PLS regression with the same pretreatments was performed on 60 samples corresponding to two spectra per zone per fruit.
Table 3.
PLS equation performances for two spectra per zone.
N | R² calibration | SEC | RMSEC | R² validation | RMSECV | SECV | # PLS factors
60 | 0.653 | 0.384 | 0.382 | 0.284 | 0.577 | 0.582 | 7
In the learning step, the performances are quite similar (SEC, R²) to those observed for the calibration with the whole set of data. The performances decrease during validation (RMSECV = 0.577%).
Figure 9.
Predicted DM versus laboratory DM. PLS regression based on selected data (n = 60). Two spectra per zone per fruit.
This model was used to predict the set of remaining spectra (30) corresponding to one spectrum per zone per fruit.
Table 4.
Prediction parameter performances.
N | R² | SEP | RMSEP | Slope | Bias
30 | 0.478 | 0.479 | 0.474 | 0.55 | -0.047
Figure 10.
Predicted values versus laboratory values for the 30 remaining spectra (one per zone per fruit).
The SEP observed for the remaining 30 spectra is 0.479%, which is satisfactory relative to the average DM content of the fruit (24.65%). This result indicates that 2 spectra per zone could be enough to build an efficient and robust model for dry matter quantification.
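The prediction statistics reported in Table 4 can be related to one another with a short sketch. It assumes the common chemometric definitions (bias as the mean residual, SEP as the bias-corrected standard deviation of the residuals, RMSEP as the root mean square residual) and a slope obtained by regressing predicted on reference values; these conventions are assumptions, and the example DM values are hypothetical.

```python
import numpy as np

def prediction_stats(reference, predicted):
    """R2, SEP, RMSEP, slope and bias under common chemometric definitions."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = predicted - reference
    n = residual.size
    bias = residual.mean()
    rmsep = np.sqrt(np.mean(residual ** 2))
    sep = np.sqrt(np.sum((residual - bias) ** 2) / (n - 1))   # bias-corrected
    slope = np.polyfit(reference, predicted, 1)[0]            # predicted regressed on reference
    r2 = np.corrcoef(reference, predicted)[0, 1] ** 2
    return {"n": n, "R2": r2, "SEP": sep, "RMSEP": rmsep, "slope": slope, "bias": bias}

# Hypothetical DM values (% DM), for illustration only.
reference = [24.0, 24.5, 25.0, 25.5]
predicted = [24.1, 24.4, 25.2, 25.4]
stats = prediction_stats(reference, predicted)
print(stats)

# Under these definitions, RMSEP^2 = bias^2 + (n - 1) / n * SEP^2.
# Applied to the values reported in Table 4 (n = 30, SEP = 0.479, bias = -0.047):
rmsep_from_table = np.sqrt((-0.047) ** 2 + (29 / 30) * 0.479 ** 2)
print(round(float(rmsep_from_table), 3))
```

The value recomputed from the reported SEP and bias is about 0.473, close to the reported RMSEP of 0.474, which suggests that the reported statistics are consistent with these definitions up to rounding.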
Calibrations based on one spectrum per zone per fruit
Three calibrations based on PLS regression with the same pretreatments were performed on 30 samples corresponding to one spectrum per zone per fruit. One calibration was performed per replicate set.
Figure 11.
Predicted values versus laboratory values for 30 spectra (replicate n°1 per zone).
Figure 12.
Predicted values versus laboratory values for 30 spectra (replicate n°2 per zone).
Figure 13.
Predicted values versus laboratory values for 30 spectra (replicate n°3 per zone).
Table 6.
The prediction parameter performance.
Calibration | Prediction set | N | R² | SEP | RMSEP | Slope | Bias
R1 | R2_R3 | 60 | 0.578 | 0.424 | 0.420 | 0.588 | -0.009
R2 | R1_R3 | 60 | 0.530 | 0.454 | 0.450 | 0.614 | -0.016
R3 | R1_R2 | 60 | 0.166 | 0.777 | 0.782 | 0.479 | -0.130
Each of these models was used to predict the corresponding set of remaining spectra (60), that is, two spectra per zone per fruit.
The calibration using the third replicate performs worse than the two others in terms of prediction. Nevertheless, the average standard error of prediction observed using each subset of replicates is 0.551%, which is a promising result. Most likely, some of the spectra of replicate 3 are noisy, which degrades the performance of the model but justifies using at least one spectrum per zone to be representative and minimize the effect of "atypical" spectra.
Although it has been well documented by using destructive analysis that there is a Hass avocado DM gradient within the whole fruit, NIR applications for Hass avocado are focused on building PLS models with the NIRS spectra of the fruit equatorial zone. These models could have a bias that could affect the DM prediction, and the fruit shipped to different areas could have internal quality issues.
This work demonstrates that one calibration per replicate set has a good DM prediction ability according to the fit statistics shown in Table 5. It is possible to predict the DM variability within a whole fruit with one NIRS scan per zone. The results are in agreement with those of other works that have used NIR spectroscopy to predict fruit quality and composition. Specifically, some studies have focused on the effect of different fruit orientations, such as the stem-calyx axis and equator, on the quality of acquired spectra [24,25,26,27]. These studies show that the measurement orientation greatly affects the prediction accuracy of lignin, soluble solids, and acidity content in pears, peaches, kiwifruit, and apples, respectively.
PCA projections
NIRS users also have difficulties identifying and detecting samples that are considerably different from the majority of the remaining samples, which are known as outliers. Based on the literature and common NIRS practices, outlier data can be identified by projecting spectral data onto the principal component analysis (PCA) space and applying the Hotelling’s T² ellipse, as presented in Figure 14. Data lying outside the ellipse are potential outliers [28].
The projection of the spectra of replicates 1 and 2 per zone on the main plane (PC1 vs. PC2) calculated on the spectra of replicate 3 (Figure 14) shows that 84% of the variance in the R1-R2 spectra is explained by the R3 spectra. Three spectra are outside the 95% confidence ellipse.
The same approach was applied by projecting the R2 and R3 replicates (60 spectra) on the subspace defined by the PCs calculated from the R1 replicate spectra (n = 30); 84% of the variance in the R2-R3 spectra is explained by the R1 spectra (Figure 15). Three spectra are outside the 95% confidence ellipse: one from R2 and one from R3. The third spectrum is from R1, which indicates that this spectrum is an outlier in the R1 set (atypical spectrum).
The same approach was applied by projecting the R1 and R3 replicates (60 spectra) on the subspace defined by the PCs calculated from the R2 replicate spectra (n = 30). Eighty-five percent of the variance of the R1-R3 spectra is explained by the R2 spectra. Two spectra are outside the 95% confidence ellipse: one from R1 and the other from the R2 set, which means that the latter is an outlier in the R2 set (atypical spectrum).
Figure 16.
Projection of the 60 spectra of replicates 1 and 3 per zone on the main plane of the PCA calculated on the 30 spectra of R2 (95% confidence ellipse). Fruit zone: B: base, E: equator, P: peduncle. F: Fruit.
These results based on PCA models and projections show that spectral variability per area can be captured with a single spectrum. For the 3 models, less than 5% of the projected samples can be considered outliers with Hotelling’s T² distances close to the limit.
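The projection approach used in this section can be sketched as follows: a PCA is fitted on one replicate set, the other spectra are projected on its first two PCs, and spectra whose Hotelling’s T² exceeds a 95% limit are flagged. The T² limit formula for new observations, the way the explained variance of the projected spectra is computed, and all function names are assumptions; the software used in the study may apply different conventions, and the data below are synthetic.

```python
import numpy as np

def fit_pca(calibration, n_components=2):
    """Return mean spectrum, loadings and score variances of the calibration set."""
    mean = calibration.mean(axis=0)
    _, s, vt = np.linalg.svd(calibration - mean, full_matrices=False)
    score_var = s[:n_components] ** 2 / (calibration.shape[0] - 1)
    return mean, vt[:n_components], score_var

def f_quantile_two_num_dof(prob, dof2):
    """Closed-form quantile of the F distribution with 2 numerator dof."""
    return 0.5 * dof2 * ((1.0 - prob) ** (-2.0 / dof2) - 1.0)

def project_and_flag(new_spectra, mean, loadings, score_var, n_cal, prob=0.95):
    """Project spectra on the first two calibration PCs and flag T2 outliers."""
    if loadings.shape[0] != 2:
        raise ValueError("the closed-form F quantile used here requires exactly 2 PCs")
    centered = new_spectra - mean
    scores = centered @ loadings.T
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    # Assumed limit for new observations: A (n^2 - 1) / (n (n - A)) * F(A, n - A), A = 2.
    limit = (2 * (n_cal ** 2 - 1) / (n_cal * (n_cal - 2))
             * f_quantile_two_num_dof(prob, n_cal - 2))
    # Share of the projected spectra's sum of squares captured by the two PCs.
    explained = np.sum(scores ** 2) / np.sum(centered ** 2)
    return scores, t2 > limit, explained

# Synthetic data (NOT the study spectra): 30 calibration and 60 projected spectra.
rng = np.random.default_rng(2)
basis = rng.normal(size=(2, 50))
calibration = rng.normal(size=(30, 2)) @ basis + rng.normal(0, 0.01, (30, 50))
projected = rng.normal(size=(60, 2)) @ basis + rng.normal(0, 0.01, (60, 50))
projected[0] += 10.0 * basis[0]              # one deliberately atypical spectrum

mean, loadings, score_var = fit_pca(calibration)
scores, outside, explained = project_and_flag(projected, mean, loadings, score_var, n_cal=30)
print("spectra outside the 95% ellipse:", np.flatnonzero(outside))
print("share of variance explained:", round(float(explained), 3))
```

The closed-form F quantile avoids a dependency on a statistics library but is valid only for two components, which matches the PC1 vs. PC2 plane used in the figures.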