Introduction
Pear (Pyrus communis L.) is widely cultivated throughout temperate regions such as China, America and Australia (Silva et al., 2014). Pears are high in fibre, which can help to lower cholesterol, and have a low glycaemic index, making them an excellent nutritious food (Amatya et al., 2012). China is a predominantly agricultural country, but compared with developed countries its fertiliser and water utilisation rates are low (Han et al., 2018). Adjusting irrigation strategy is an essential tool for creating green and ecologically sustainable agriculture while ensuring the supply of agricultural products (He et al., 2022). Irrigation is still guided mainly by personal experience. On the one hand, because such experience cannot anticipate every situation, this can lead to over-irrigation and wasted water; excessive irrigation can also damage the pear tree itself, causing yield decline and other problems. On the other hand, because the pear is a regional crop that is less extensively and widely planted than wheat or rice, drought or water deprivation can lead to serious losses. The irrigation of pear trees therefore deserves more attention. In this paper, the main focus of the study is water stress detection.
Irrigation management plays an important role in plant water stress. Water stress adversely impacts many aspects of plant physiology, particularly photosynthetic capacity (Osakabe et al., 2014). Once the stress is prolonged, plant growth and productivity are severely impacted, and water stress causes significant reductions in pear productivity. Water stress changes not only leaf temperature and spectral emissivity but also leaf and canopy water content, pigment content, and structure (Jones, 2004). Leaf water content (LWC) reflects the absolute water content (Li et al., 2018) and can therefore describe the actual condition of plant growth more precisely. According to Jones (2004), stomatal conductance measured by a porometer is the most sensitive reference measurement of plant water stress induced by water deficit. However, this method is labour-intensive, time-consuming, and only provides point measurements (Wang, 2022).
In addition to emerging nondestructive techniques such as X-ray CT imaging (Costa et al., 2013) and Raman spectroscopy (Ahmed et al., 2018), near-infrared (NIR) spectroscopy (800–2500 nm) is a method well-suited for characterising organic compounds, mainly in combination with multivariate mathematical techniques (Ambrose et al., 2016). When NIR light illuminates and transmits through an object, the energy of the incident electromagnetic wave changes because of the stretching and vibrations of chemical bonds such as O-H, N-H and C-H. Subsequently, the quality and quantity of an object can be evaluated indirectly, rapidly and without contact by analysing the light reflectance and transmittance values (Ma et al., 2020). However, the conventional NIR approach collects spectral data from a single sample point; the simultaneous capture of multiple targets remains a challenge. Hence, a time-efficient technique with high spatial resolution is required for practical application of NIR. NIR hyperspectral imaging (NIR-HSI) is such a technique: it provides a NIR spectral image at each wavelength (Li et al., 2019). It enables quality evaluation across an entire surface by indirectly analysing the spatial distribution of molecular vibration information. In addition, many samples can be scanned and analysed together by this advanced imaging technique. With such outstanding advantages, NIR-HSI has already been introduced to plant biotic and abiotic stress identification. Wei et al. (2021) conducted an analysis of moisture content in tea leaves based on VNIR spectra, and NIR-HSI has also been applied to orange spotting disease of oil palm (Golhani et al., 2019).
Plant leaves are a good medium for water stress detection. So far, many researchers have studied plant leaves in vitro using hyperspectral imaging. Liu et al. (2015) used hyperspectral imaging to predict nitrogen and phosphorus contents in citrus leaves. Liu et al. (2014) used hyperspectral imaging to simultaneously estimate the chlorophyll and water content of Gannan navel orange leaves. Wang et al. (2020) used hyperspectral imaging to rapidly detect the quality index of post-harvest fresh tea leaves. Li et al. (2019) used an ASD FieldSpec 4 spectrometer to determine the spectral reflectance of pear leaves in different periods and estimate the Fe content of the leaves. Li (2018) used hyperspectral data to estimate nutrient elements in the leaves and canopy of Pyrus sinkiangensis ‘Kuerlexiangli’. Liu et al. (2022) identified anthracnose and black spot of pear leaves based on near-infrared hyperspectroscopy.
Machine learning and pattern recognition-based methods have succeeded in hyperspectral image analysis tasks. Many machine learning methods are well-suited for hyperspectral data (Gewali et al., 2018). They can automatically learn the relationship between the reflectance spectrum and the desired information while being robust against the noise and uncertainties in spectral and ground truth measurements (Giannoni et al., 2018). Wang (2022) performed an inversion of nitrogen and chlorophyll content in crop leaves based on hyperspectral data and Partial Least Squares Regression (PLSR). Ridge regression models were used to estimate grain yield from field spectral data in bread wheat (Triticum aestivum L.) grown under three water regimes (Hernandez et al., 2015). Bayesian modelling of phosphorus content in wheat grain was conducted using hyperspectral reflectance data (Pacheco-Gil et al., 2023).
As there is no literature focusing directly on the leaf while the plant is growing, our study is significant because it can provide a timely and accurate monitoring system that allows farmers to irrigate in time. Poobalasubramanian et al. (2022) conducted research similar to ours using chlorophyll-fluorescence indices extracted from hyperspectral images to identify early heat and water stress in strawberry plants. However, unlike our study, which focuses on one leaf, their research considered the whole canopy of leaves of one plant. Their research considered three water conditions: drought, normal and recovered. In addition, their data analysis method differed from ours: eight chlorophyll-fluorescence indices were used to develop machine-learning models to determine heat and water stress at early stages in strawberries, and a remote sensing analysis approach was adopted. However, there are significant differences between individuals of a species at the leaf and canopy scales, which primarily affect the spectral characteristics of canopies composed of multiple leaves, and the spatial variation of leaf water content is more challenging to measure with such a method (Junttila et al., 2022). To overcome this challenge, our study examines one leaf in the middle of the plant at a time instead of considering the whole canopy, thus improving the accuracy and efficiency of plant water stress detection.
Also, our research makes an innovation by focusing on pear seedlings instead of mature plants. The seedling stage is the most critical stage in the life history of plants, forming the link between generations. The number of seedlings can reflect the qualitative and quantitative characteristics of the parent generation and serves as a predictor of group dynamics and evaluation trends for future species (Liu et al., 2017).
Convolutional neural networks are popular for image feature extraction. Yalcin & Razavi (2016) proposed a Convolutional Neural Network (CNN) architecture to classify the type of plants from image sequences collected from smart agro-stations. Ahmad et al. (2022) used a CNN for feature extraction from plant leaves, and CNNs have also been used for corn plant disease recognition (Guifen et al., 2019). This paper proposes a non-destructive, real-time detection method using hyperspectral imaging to scan leaves directly on pear seedlings. Previous leaf water content detection methods can be destructive or costly, and it is difficult for human eyes to distinguish regular leaves from leaves suffering from dryness before apparent wilting symptoms appear. The paper therefore aims to combine hyperspectral imaging with machine learning models to identify pear seedling leaves under different water conditions in a real-time and non-destructive manner. We hypothesised that chemical changes in plant leaf cells during water stress generate changes in the reflectance profile in a particular spectral region. In our research, we use hyperspectral imaging (400–1000 nm) together with machine learning methods to detect differences in leaf water content.
Therefore, this study aims to demonstrate the effectiveness of hyperspectral imaging coupled with the machine learning techniques Multiple Linear Regression Based on Ridge Regression, Bayesian linear regression and the Elman Neural Network, and to propose a practical, plant-factory-based model for detecting water stress in young pear seedlings by focusing on one leaf directly on the plant at a time. Two types of information, the main spectra selected using SPA and the CNN features of the main spectra, were input into the neural networks. The overall classification accuracies of these three machine learning methods all exceed 70%. Hence, drought-stressed or oversaturated seedlings can be recovered in time without destroying the plant.
Materials and Methods
1. Material
‘QingzhenD3’ pear rootstocks with around 12 leaves per plant were grown in the National Agricultural and Forestry Science and Technology incubator seedling base in Zhucheng, China. Pear seedlings of similar growth were chosen and transported to greenhouses at Qingdao Agricultural University. Two weeks after transplanting, all pear seedlings were treated with the complete nutrient solution and supplied with all essential nutrients. A known amount of nutrient solution was provided to each plant using a trickle nozzle.
The temperature of the greenhouse was 25℃. The humidity of the greenhouse was around 95%. Each pot contained one seedling. The substrate was composed of peat and pastoral soil (1:1). The pastoral soil was procured from Qingdao Agricultural University and carefully sieved. The experiment took place between July and September 2021.
2. Experiment design
Treatment groups were established to simulate differing water stress situations for the prediction models. There were three treatment groups: the excessive water treatment group, the drought group, and the control (normal watering) group. For each treatment, 30 pear seedlings were used for hyperspectral image collection; that is, there were 30 repetitions per treatment group. A pre-experiment determined that the amount of water a pot of pear seedlings needed was 10 mL. The night before the experiment, the pear seedlings of all three treatment groups were watered thoroughly. Once the pear seedlings of the drought group had been watered, no further watering was provided during the experiment. The excessive water treatment group was maintained at a water depth of 1.5 cm, replenished uniformly at 6 pm. In addition, 10 mL of water was added to the pear seedlings in the normal watering group every day at 6 pm.
One leaf in the middle of each plant (usually the fourth leaf from the bottom) was employed for hyperspectral data collection (Figure 1). According to the requirements of the research method, the middle leaf is relatively stable and representative of the whole plant. Acquisitions were made 0, 1, 3, 5 and 7 days after the start of water treatment, before apparent symptoms such as wilting and yellowing appeared on the pear leaves. Images were taken every other day to ensure that the internal state of the plants was recorded consistently and accurately. On day 0, the starting point of water treatment, only 30 hyperspectral images were collected (10 for each treatment). On each subsequent acquisition day (days 1, 3, 5 and 7), 30 images were gathered for each treatment. The sample collection period was from 9:00 am to 11:00 am.
Table 1 indicates the number of images collected. Three regions of interest were extracted from each hyperspectral cube; therefore, we obtained 765 regions of interest in total.
A workflow consisting of four phases was proposed, from data analysis to model development. To clearly demonstrate the experimental process, the workflow is shown in Figure 1.
Figure 1. Workflow of this research.
Phase 1: region of interest procured from the acquired hyperspectral image (with a leaf as an example); Phase 2: data cube acquired after performing dimension reduction (SPA key-wavelength selection); Phase 3: three types of information input into the neural networks; Phase 4: machine learning analysis of the input.
The proposed workflow has four phases: region-of-interest (ROI) extraction, feature selection, input of three types of information into the neural networks (the whole data cube, the main spectra, and CNN features of the key-wavelength images), and machine learning analysis. ROI selection is usually used to remove irrelevant target regions. Researchers commonly use the ENVI application to extract pixel points in the ROI, or average spectra of hyperspectral images, as pixel-level or object-level data for analysis and processing (Kang et al., 2022). In phase 1, the leaf region of interest was selected using ENVI. In phase 2, feature wavelengths were selected using the successive projections algorithm (SPA). In the third phase, we procured three types of information: the whole data cubes, the key wavelengths obtained using SPA, and the CNN features of the images at the key wavelengths. In the fourth phase, three machine learning models were employed and compared for modelling plant water stress detection. The final output provided results for the early detection of drought-stressed, normal and overwatered plants from the selected feature wavelengths.
2.1. Hyperspectral image acquisition
A Resonon PIKA L imager was used for image acquisition (Figure 2). Resonon imaging spectrometers are line-scan imagers, meaning they collect data one line at a time. The PIKA L covers the visible and near-infrared (VNIR) spectral range of 400–1000 nm. It has 281 spectral channels with a spectral bandwidth of 2.1 nm; the spectral resolution (FWHM) is 3.3 nm, and there are 900 spatial pixels. Its maximum frame rate is 249 fps. The SpectrononPro application collects data cubes with a benchtop system. In addition, because the water condition of live plants can be influenced by the outside environment (light, heat, etc.), the water content of leaves is prone to change, which may affect the differentiation of leaves under different treatments. Therefore, the hyperspectral camera must be carefully calibrated as frequently as needed to obtain valid data cubes.
The lifting platform was placed on the moving stage, with its vertical and horizontal directions parallel to the corresponding sides of the stage. The height of the stage was adjusted so that the distance from the bottom edge of the light source to the whiteboard was about 35 cm. The position of the stage was set so that its inner side coincided with the longitudinal axis of the lens, and the horizontal axis of the lens was located at one-third of the whiteboard from the right. Next, the pear seedling was placed on the moving stage, and its height and spatial orientation were adjusted so that the target leaf lay at the longitudinal centre of the whiteboard along the horizontal axis of the lens. Care was taken to ensure that the leaf was on the same horizontal plane as the whiteboard. During the entire experiment, the position of the lifting platform was kept fixed to ensure image clarity.
The pear seedling leaf was kept on a black background of stiff paper and imaged. The capture parameters were 256 frames and 45 steps, giving the best-resolution image between 400 and 1000 nm, and the exposure time was 24 ms. The parameters were kept consistent for all images. Each resulting hyperspectral image was a 900 × 500 × 300 reflectance block, i.e., a 3-D image in which two axes carry the X and Y spatial coordinates and the third carries the spectral information at 300 different wavelengths. This information was stored for subsequent analysis.
2.2. Image acquisition and image correction
The image acquisition was conducted in a dark room to avoid undesired light, at a controlled temperature and humidity of 20 ℃ and 65%, respectively. Hyperspectral data cubes were acquired in the 400–1000 nm spectral range with 5 nm intervals between contiguous bands. The acquired raw images were corrected with two reference images using Equation (1):

$$R = \frac{R_{raw} - D}{W - D} \qquad (1)$$

where R is the relative reflectance image of the sample, R_raw is the raw image of the sample, W is the white reference image acquired from a uniform, stable, high-reflectance ceramic tile, and D is the dark image acquired by completely covering the camera lens with its non-reflective opaque black cap (Wang et al., 2018).
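In this work the correction was performed with ENVI/SpectrononPro; purely as an illustration, a minimal NumPy sketch of Equation (1) might look as follows (the array shapes and the helper name calibrate_reflectance are ours, not part of the original pipeline).

```python
import numpy as np

def calibrate_reflectance(raw, white, dark):
    """Convert a raw hyperspectral cube to relative reflectance (Equation 1).

    raw   : ndarray (rows, cols, bands), raw sample image
    white : ndarray, same shape, white ceramic-tile reference
    dark  : ndarray, same shape, dark image (lens capped)
    """
    raw = raw.astype(np.float64)
    dark = dark.astype(np.float64)
    # Guard against division by zero in dead pixels
    denom = np.clip(white.astype(np.float64) - dark, 1e-6, None)
    return (raw - dark) / denom

# Small stand-in for the paper's 900 x 500 x 300 cube
raw = np.random.rand(90, 50, 300)
white = np.full_like(raw, 0.95)
dark = np.full_like(raw, 0.02)
reflectance = calibrate_reflectance(raw, white, dark)
print(reflectance.shape)  # (90, 50, 300)
```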
2.3. Feature wavelength selection
Feature selection methods in HSI aim to reduce dimensionality while preserving relevant information for later classification (Wang et al., 2004). The selection of wavelength variables is integral to establishing a nondestructive testing model based on hyperspectral imaging. The model can be simplified by screening the characteristic wavelengths or wavelength ranges. Various feature selection techniques have been used to select important variables for spectroscopic data (Wang et al., 2018).
Feature extraction can reduce the dimension of spectral data and improve regression model performance. The successive projections algorithm (SPA), a variable-selection technique originally proposed for constructing multivariate calibration models (Woldgiorgis et al., 2021) and later extended to classification problems (Kandpal et al., 2016), was used to obtain the optimum wavelengths. SPA is a forward selection algorithm that mainly uses collinearity minimisation to select the optimal variables (Araújo et al., 2001). Its advantage is that the variable group with the least redundant information can be chosen from a larger set of spectral variables, while collinearity between variables within the selected group is minimised (Pontes et al., 2005). SPA was executed in MATLAB 2023 (The MathWorks, Natick, USA) using the SPA toolbox (available at http://www.ele.ita.br/~kawakami/spa/), with test_size = 0.4, m_min = 2 and m_max = 28.
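The actual selection used the MATLAB SPA toolbox cited above; the simplified Python sketch below illustrates only the core chain-of-projections idea of SPA, with a fixed starting band and without the toolbox's validation step for choosing the starting band and final subset size.

```python
import numpy as np

def spa_select(X, n_wavelengths, start=0):
    """Simplified successive projections algorithm (SPA).

    X             : (n_samples, n_bands) matrix of spectra
    n_wavelengths : number of wavelengths to retain
    start         : index of the initial wavelength (assumed fixed here)
    Returns the indices of the selected, minimally collinear wavelengths.
    """
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_wavelengths - 1):
        x_sel = Xp[:, selected[-1]][:, None]
        # Project every column onto the orthogonal complement of the
        # most recently selected column.
        proj = Xp - x_sel @ (x_sel.T @ Xp) / (x_sel.T @ x_sel)
        proj[:, selected] = 0.0            # never re-select a chosen band
        nxt = int(np.argmax(np.linalg.norm(proj, axis=0)))
        selected.append(nxt)
        Xp = proj
    return sorted(selected)

# e.g. pick 7 key wavelengths from a 390 x 300 matrix of mean ROI spectra
spectra = np.random.rand(390, 300)
print(spa_select(spectra, 7))
```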
According to Figure 3, seven wavelengths were chosen to substitute for the 300 wavelengths of the original sample. The optimal wavelengths selected are listed in Table 2. Next, the eigenvectors of the images corresponding to the characteristic wavelengths were extracted and fed into the neural networks for analysis. For these seven wavelengths we obtain seven images, and after applying the CNN each image yields 4096 features. After the features had been obtained for all data points, they were put into the neural networks for analysis.
The different data inputs are shown in Table 3. In the data input, the height (spectral dimension) of each hyperspectral image is 300, and the total number of treatment samples is 30 + 90 × 4 = 390. After SPA key-wavelength selection, 7 spectra were selected to represent the original 300 wavelengths. Since we use AlexNet for feature extraction, we procured 4096 features for each HSI data cube. The parameters of the different neural networks are shown in Table 4.
According to Table 4, for the RR-MLR the regularisation parameter (gam) is 10; for the Bayesian Linear Regression the covariance (sigma_squared) is 0.01; and for the Elman Neural Network the maximum number of iterations (epoch) is 2000, the target error (goal) is 1e-5, and the learning rate (lr) is 0.01.
3. Machine learning methods
The acquired hyperspectral data were analysed using machine learning methods. The dataset was randomly divided into a training set (80% of the hyperspectral data) and a testing set (20% of the hyperspectral data).
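For readers who prefer a script, an equivalent random 80/20 split could be written as below; scikit-learn is our choice here, the label coding 1/2/3 follows the figure legends, and stratification by class is an assumption rather than a stated detail of the study.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: one feature row per region of interest, y: water-treatment label
# (1 = normal, 2 = overwatering, 3 = drought); shapes are illustrative
X = np.random.rand(765, 300)
y = np.random.randint(1, 4, size=765)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
print(X_train.shape, X_test.shape)  # (612, 300) (153, 300)
```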
Machine learning algorithms are effective for expressing complex relationships (Chang, 2007). A Convolutional Neural Network (CNN) is used for feature extraction. Multiple Linear Regression Based on Ridge Regression, Bayesian Linear Regression, and the Elman Neural Network are introduced in our research. These are classic algorithms known for their relatively high efficiency and accuracy and can be used for image classification. In our research, we designed these three models within the same framework.
Multiple Linear Regression Based on Ridge Regression is a valuable technique for improving the performance and stability of multiple linear regression models, particularly when dealing with multicollinearity and the risk of overfitting (Luo & Liu, 2017). It strikes a balance between bias and variance, making it a useful tool in many real-world regression problems (Matdoan et al., 2021). Regression models for predicting rice yield and protein content from unmanned aerial vehicle-based multispectral imagery were employed by Kang et al. (2021).
Bayesian Linear Regression offers several advantages over traditional frequentist linear regression methods: it allows prior information or beliefs about the model parameters to be incorporated (Baldwin & Larson, 2017); it provides a full posterior distribution over the model parameters, not just point estimates (Kong et al., 2020); and it can be effective when dealing with small datasets, as the prior information it incorporates becomes more influential when data are limited (Barbier et al., 2021). Bayesian model averaging was used to improve yield prediction in wheat breeding trials (Fei et al., 2023).
Elman networks can be designed with varying degrees of complexity by adjusting the number of hidden units in the recurrent layer (Wang et al., 2021). This flexibility allows the model's capacity to be controlled, making it suitable for both simple and complex tasks (Thilagaraj et al., 2021). Elman networks can also handle input sequences of varying lengths (Sriram et al., 2018), which is advantageous when the number of time steps changes from one example to another. An Elman neural network has been used for rapid prediction of moisture content during green tea fixation (Lan et al., 2022).
3.1. Convolutional neural network (CNN) feature extraction
Convolutional neural networks, as deep feedforward networks, are commonly used to process multiple arrays of data, such as time series, images, and audio spectrograms (Chen et al., 2023). Such applications show that a CNN is capable of learning the features of hyperspectral data well when properly trained. A typical convolutional neural network consists of a convolutional layer, a pooling layer, and a fully connected layer, each with a different function. The convolution layer convolves the input with convolution kernels to generate feature maps, and all units in the same feature map share the same filter. The pooling layer is located behind the convolutional layer and is divided into maximum pooling and average pooling; it compresses the input feature information and reduces the computational complexity of the network (Gu et al., 2018).
The spectral information is processed by the convolutional layers and the maximum pooling layers (Li et al., 2021). The network model effectively extracts local and global features that cannot be directly obtained from the original spectral data and converts the extracted features into vectors through the global average pooling layer (Yamashita et al., 2018). The fully connected layer converts the inputs into vectors and realises the classification function (O’Shea & Nash, 2015).
The structural system of the convolutional neural network is mainly composed of two parts, namely, the feature extractor and the classifier (Albawi et al.,2017). The feature extractor usually consists of a stack of several convolutional layers and a maximum pooling layer, and the classifier is usually a fully connected softmax layer (Yang & Li,2017).
At a convolution layer, the previous layer’s feature maps are convolved with learnable kernels and put through the activation function to form the output feature map. Each output map may combine convolutions with multiple input maps. In general, we have

$$x_j^{\ell} = f\!\left(\sum_{i \in M_j} x_i^{\ell-1} * k_{ij}^{\ell} + b_j^{\ell}\right)$$

where $M_j$ represents a selection of input maps, and the convolution is of the “valid” border handling type when implemented in MATLAB (Bouvrie, 2006). Each output map is given an additive bias $b$; however, for a particular output map, the input maps are convolved with distinct kernels. That is to say, if output map j and map k both sum over input map i, then the kernels applied to map i are different for output maps j and k.
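The paper states that AlexNet supplies 4096 features per key-wavelength image (Table 3) but gives no implementation details; one plausible PyTorch sketch, in which pretrained weights, single-band images replicated to three channels, and a 224 × 224 input size are all assumptions of ours, is shown below.

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained AlexNet truncated after its second 4096-unit fully connected layer,
# so each input image yields a 4096-dimensional feature vector.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.eval()
extractor = nn.Sequential(
    alexnet.features,
    alexnet.avgpool,
    nn.Flatten(),
    *list(alexnet.classifier.children())[:-1],  # drop the 1000-class output layer
)

# Seven single-band images at the SPA key wavelengths, replicated to 3 channels
# to match AlexNet's expected input (sizes here are illustrative).
key_band_images = torch.rand(7, 1, 224, 224).repeat(1, 3, 1, 1)
with torch.no_grad():
    features = extractor(key_band_images)   # shape (7, 4096)
sample_vector = features.flatten()          # 7 x 4096 features per leaf sample
print(features.shape, sample_vector.shape)
```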
3.2. Baseline model
3.2.1. Data Classification for Multiple Linear Regression Based on Ridge Regression
In Hoerl & Kennard (1970) it was shown that the optimal value of $k$, i.e., the value that minimises the mean square error (MSE), is

$$k = \frac{\sigma^2}{\alpha_{\max}^2}$$

Based on this result, the authors of that same article suggested the following estimator of the ridge parameter:

$$\hat{k}_{HK} = \frac{\hat{\sigma}^2}{\hat{\alpha}_{\max}^2}$$

where $\hat{\sigma}^2 = \sum_i e_i^2/(n-p)$, the $e_i$ are the residuals obtained from the OLS regression, and $\hat{\alpha}_{\max}$ is the maximum element of $\hat{\alpha}$. Hence, for this estimator, $\sigma^2$ and $\alpha_{\max}^2$ are simply replaced by their unbiased estimators. Further developments were then made by Kibria (2003), where the following estimators were proposed:

$$\hat{k}_{GM} = \frac{\hat{\sigma}^2}{\left(\prod_{i=1}^{p}\hat{\alpha}_i^2\right)^{1/p}}$$

and

$$\hat{k}_{MED} = \operatorname{median}\{m_i\},$$

where $m_i = \hat{\sigma}^2/\hat{\alpha}_i^2$.
Khalaf and Shukur (2005) suggested a new method to estimate the ridge parameter $k$, as a modification of $\hat{k}_{HK}$:

$$\hat{k}_{KS} = \frac{\lambda_{\max}\hat{\sigma}^2}{(n-p)\hat{\sigma}^2 + \lambda_{\max}\hat{\alpha}_{\max}^2}$$

where $\lambda_{\max}$ is the maximum eigenvalue of $X'X$. Using the same idea as in Kibria (2003), Khalaf & Shukur (2005) and Alkhamisi et al. (2006), we have the following estimators for $k$:

$$\hat{k}_{KS\text{-}GM} = \left(\prod_{i=1}^{p} q_i\right)^{1/p}$$

and

$$\hat{k}_{KS\text{-}MED} = \operatorname{median}\{q_i\},$$

where $q_i = \dfrac{\lambda_{\max}\hat{\sigma}^2}{(n-p)\hat{\sigma}^2 + \lambda_{\max}\hat{\alpha}_i^2}$.
In this article, we propose a modification of all of these estimators by multiplying them by the amount given in Equation (10). The eigenvalues of the matrix of cross-products are equal to one when the regressors are independent.
To investigate the performance of RR and OLS, we calculate the MSE using the following equation:

$$\mathrm{MSE}(\hat{\beta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \beta\right)'\left(\hat{\beta}_r - \beta\right)$$

where $\hat{\beta}$ is the estimator of β obtained from OLS or RR, and R equals 2000, which corresponds to the number of replicates used in the Monte Carlo simulation (Khalaf et al., 2013).
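As an illustration of how a Hoerl-Kennard ridge parameter and a ridge-based classifier might be computed, the following Python sketch uses scikit-learn; estimating k from the raw OLS coefficients (rather than the canonical-form coefficients) and substituting RidgeClassifier for the authors' MATLAB RR-MLR are simplifications of ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeClassifier

def hoerl_kennard_k(X, y):
    """Hoerl-Kennard ridge parameter k = sigma^2 / alpha_max^2 from an OLS fit
    (simplified: raw OLS coefficients stand in for the canonical-form alphas)."""
    n, p = X.shape
    ols = LinearRegression().fit(X, y)
    residuals = y - ols.predict(X)
    sigma2 = residuals @ residuals / (n - p - 1)   # residual variance estimate
    alpha_max2 = np.max(ols.coef_ ** 2)            # largest squared coefficient
    return sigma2 / alpha_max2

# Illustrative data: mean ROI spectra (300 bands) and treatment codes 1/2/3
X = np.random.rand(390, 300)
y = np.random.randint(1, 4, size=390)

print("k_HK on a 20-band subset:", hoerl_kennard_k(X[:, :20], y.astype(float)))

# Ridge-regularised linear classifier; alpha plays the role of the 'gam'
# parameter, which Table 4 sets to 10.
clf = RidgeClassifier(alpha=10.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```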
3.2.2. Bayesian linear regression
Provided a dataset $D = \{(x_n, t_n)\}_{n=1}^{N}$, where $x_n$ is the input variable, $t_n$ is the corresponding target value and $N$ is the number of data samples (Kong et al., 2020), regression aims at providing a specific predictive value $y(x)$ given the input variable $x$. Linear regression mainly takes two forms, the standard linear model and the kernelized model, given by Equations (12) and (13), respectively (Bishop, 2006). The standard linear model is a linear combination of the elements of the input variable, while the kernelized model is a linear combination of a set of nonlinear functions of the input variable:

$$y(x, w) = \sum_{i=1}^{d} w_i x_i \qquad (12)$$

$$y(x, w) = \sum_{i=1}^{N} w_i k(x, x_i) \qquad (13)$$

where $w_i$ is the $i$th element of the weight vector $w$, $x_i$ is the $i$th element of the input variable $x$, and $k(\cdot,\cdot)$ is the kernel function (Tipping, 2003).
The above two regression models can be uniformly expressed as

$$y(x, w) = \sum_{j=1}^{M} w_j \phi_j(x) = w^{T}\phi(x) \qquad (14)$$

where $w = (w_1, \ldots, w_M)^T$, $\Phi$ with elements $\Phi_{nj} = \phi_j(x_n)$ is the ‘design’ matrix, and $\phi_j(\cdot)$ is referred to as a ‘basis function’. Equation (14) reduces to the standard linear model under the conditions $M = d$ and $\phi_j(x) = x_j$, and to the kernelized model under the conditions $M = N$ and $\phi_j(x) = k(x, x_j)$.
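A minimal Python sketch of Bayesian linear regression on the SPA spectra is given below; note that scikit-learn's BayesianRidge learns the noise and weight precisions from the data, so the fixed covariance setting sigma_squared = 0.01 in Table 4 has no exact counterpart here, and rounding the posterior mean back to class codes is our own simplification.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Illustrative data: the 7 SPA key-wavelength reflectances per sample and the
# treatment codes (1 = normal, 2 = overwatering, 3 = drought)
X = np.random.rand(390, 7)
y = np.random.randint(1, 4, size=390).astype(float)

# BayesianRidge places Gaussian priors on the weights and returns a posterior
# distribution over predictions rather than a single point estimate.
model = BayesianRidge().fit(X, y)
y_mean, y_std = model.predict(X, return_std=True)   # posterior mean and std
y_class = np.clip(np.rint(y_mean), 1, 3)            # map back to class codes
print(y_mean[:3], y_std[:3], y_class[:3])
```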
3.2.3. Elman Neural Network
The Elman network is a simple recurrent network that has a context layer as an internal self-referencing layer (see Figure 6). During operation, the hidden layer is activated both by the current input from the input layer and by the previous state of the hidden layer saved in the context layer. Note that there exists an energy function associated with the hidden layer, context layer, and input layer (Liou, 2006). With successive training, the connection weights can encode the temporal relations in the training word sequences.
Figure 6. The Elman network (Liou & Lin, 2006).
The context layer carries the memory. The hidden layer activates the output layer and refreshes the context layer with the current state of the hidden layer. The back-propagation learning algorithm (Rumelhart & McClelland, 1986) is commonly employed to train the weights in order to reduce the difference between the output of the output layer and its desired output. In this study, the threshold value of every neuron in the network is set to zero. Let $n_o$, $n_h$, $n_c$ and $n_i$ be the numbers of neurons in the output layer, the hidden layer, the context layer and the input layer, respectively. In the Elman network, $n_c$ is equal to $n_h$. In this study, the number of neurons in the input layer is equal to that in the output layer and is also equal to the number of total features, that is,

$$n_i = n_o = N \qquad (15)$$
Let $C = \{c_1, c_2, \ldots, c_N\}$ be the code set of the $N$ different words in a corpus. The corpus, D, contains a collection of all given sentences. During training, a sentence is randomly selected from the corpus and fed to the network sequentially, word by word, starting from the first word of the sentence. Let |D| be the total length of all the sentences in the corpus D; that is, |D| is the total number of words in D. Usually, |D| is several times the number of different words in the corpus, so |D| > N. Initially, t = 0, and all weights are set to small random numbers. Let $w(t)$ be the current word in a selected sentence at time t, i.e., $w(t) \in C$ for $t = 1, 2, \ldots, T$, where $w(T)$ is the last word of a training epoch. In this study, we set T = 4|D| in one epoch. This means that in each epoch, we use all the sentences in the corpus to train the Elman network four times. Let the three weight matrices between layers be $W^{ih}$, $W^{ch}$ and $W^{ho}$, where $W^{ih}$ is an $n_h$ by $n_i$ matrix, $W^{ch}$ is an $n_h$ by $n_c$ matrix, and $W^{ho}$ is an $n_o$ by $n_h$ matrix, as shown in Figure 6. The output vector of the hidden layer is denoted as $h(t)$ when $w(t)$ is fed to the input layer; $h(t)$ is an $n_h$ by 1 column vector. Let $o(t)$ be the output vector of the output layer when $w(t)$ is fed to the input layer; $o(t)$ is an $n_o$ by 1 column vector.
The function of the network is

$$h(t) = \sigma\!\left(W^{ih} w(t) + W^{ch} h(t-1)\right)$$
$$o(t) = \sigma\!\left(W^{ho} h(t)\right)$$

where σ(·) is a sigmoid activation function that operates on each element of a vector (Rumelhart & McClelland, 1986). We use the sigmoid function for all neurons in the network. This function gives a value roughly between −1.7159 and 1.7159.
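A compact sketch of an Elman-style recurrent classifier is given below in PyTorch, whose nn.RNN is an Elman network with a tanh activation (rather than the scaled sigmoid described above); treating the seven key wavelengths of each sample as a short input sequence, the hidden size and the training loop are illustrative assumptions, with lr = 0.01 taken from Table 4.

```python
import torch
import torch.nn as nn

class ElmanClassifier(nn.Module):
    """Minimal Elman network: one recurrent (hidden + context) layer
    followed by a fully connected output layer."""
    def __init__(self, n_inputs, n_hidden, n_classes):
        super().__init__()
        self.rnn = nn.RNN(n_inputs, n_hidden, nonlinearity="tanh", batch_first=True)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):                 # x: (batch, time_steps, n_inputs)
        h_seq, h_last = self.rnn(x)       # hidden state fed back acts as the context layer
        return self.out(h_last.squeeze(0))

# Treat the 7 key wavelengths of each sample as a short input sequence
x = torch.rand(32, 7, 1)                  # batch of 32 samples, 7 steps, 1 feature
y = torch.randint(0, 3, (32,))            # three water-treatment classes
model = ElmanClassifier(n_inputs=1, n_hidden=16, n_classes=3)
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)   # lr as in Table 4
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                      # the paper allows up to 2000 epochs
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimiser.step()
print(float(loss))
```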
4. Performance metrics
4.1. Confusion matrix
Performance measurement is vital in defining the effectiveness of a program. Confusion matrices are a common evaluation tool in machine learning (An, 2020). Generally, they consist of an n × n table plotting actual class against predicted class (n denoting the number of classes, so a binary classifier would utilise a 2 × 2 table), within which the true and false (determined by the actual classes) positives and negatives (determined by the predicted classes) fall (Visa et al., 2011).
4.2. Accuracy
In practical applications, we should also take the accuracy of the classifier into consideration, because scientists and farmers are most concerned with situations in which the classifier sorts drought-stressed pear seedling leaves as sound ones; such wrong decisions hinder the timely watering of the plants, leading to greater potential economic losses than discarding the plants would.
Accuracy (total correct divided by the total number of assessments), however, does not consider the significance of the misidentified class (Halimu et al., 2019) and tends to be an overly optimistic performance indicator.
4.3. Evaluation metrics for classification algorithms
Commonly used evaluation metrics for classification algorithms include recall, precision, F1 score and MSE loss. Recall measures the ability to identify positive samples, precision measures the accuracy of positive sample predictions, and the F1 score combines recall and precision. MSE loss is a criterion that measures the mean squared error between each element in the input and target. These metrics can be selected and weighted based on specific requirements. Their formulas are:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2PR}{P + R},$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where TP is true positive, TN is true negative, FP is false positive, FN is false negative, P is precision, and R is recall (Shu et al., 2023).
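These metrics can be computed, for example, with scikit-learn as sketched below; macro averaging over the three treatment classes is our assumption, since the paper does not state how the multi-class precision and recall were aggregated.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, mean_squared_error)

# y_true / y_pred use the treatment codes from the figures:
# 1 = normal, 2 = overwatering, 3 = drought (values here are illustrative)
y_true = np.array([1, 1, 2, 3, 3, 2, 1, 3])
y_pred = np.array([1, 2, 2, 3, 1, 2, 1, 3])

print(confusion_matrix(y_true, y_pred))                 # 3 x 3 table
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1 score :", f1_score(y_true, y_pred, average="macro"))
print("MSE loss :", mean_squared_error(y_true, y_pred))
```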
All hyperspectral data processing was conducted in ENVI, and statistical analyses were executed in MATLAB 2023. All experiments were performed under Windows 10 on a machine with an Intel Core i7-7820HK CPU @ 2.90 GHz, an NVIDIA GeForce 1080 GPU with Max-Q Design, and 8 GB of RAM.
Results
Three types of information (the whole data cubes, the key wavelengths selected using SPA, and the CNN features of the main spectra) were input into the different neural networks. To avoid over-fitting, accuracies were obtained using 10-fold cross-validation, which uses 9/10 of the data for training and the remainder for testing, repeats the process 10 times, and averages the results. For each key-wavelength image, 4096 features were extracted using the CNN. In the CNN feature analysis, we compared the classification results of Multiple Linear Regression based on Ridge Regression (a), Bayesian linear regression (b), and the Elman Neural Network (c) (Figures 7–9).
Figures 7–9 illustrate the confusion matrices representing the classification results of the three neural networks for the main spectra analysis and the CNN features from images of key wavelengths, for both the training and test datasets. For the training dataset, all three neural networks accurately categorised the samples into their respective groups.
Whole data cube (a.1), main spectra analysis (a.2) and CNN feature of main spectra analysis (a.3) in Multiple Linear Regression based on Ridge Regression (a); 1 stands for the normal group, 2 stands for the overwatering group and 3 stands for the drought group
According to Figure 4, in the whole data cube analysis, in the confusion matrix for the training data of RR-MLR, 178 out of 247 normal samples were correctly identified, but 30 overwatering samples were wrongly treated as normal and 25 overwatering samples were wrongly treated as drought (45.8%). In the confusion matrix for the test data, 56 out of 83 drought samples were correctly identified as drought (67.5%). In the main spectra classification, by contrast, 33 overwatering samples were wrongly identified as normal and 61 overwatering samples were wrongly treated as drought (78.3%); in the test data, 76 out of 95 samples were correctly identified as normal (80%). In the CNN feature analysis, during the testing phase of Multiple Linear Regression based on Ridge Regression, 68 out of 81 samples were correctly identified as belonging to the normal treatment group; however, 11 samples from the overwatering group and 2 samples from the drought group were wrongly categorised into the normal treatment group.
Whole data cube (b.1), main spectra analysis (b.2) and CNN feature of main spectra analysis (b.3) in Bayesian linear regression (b); 1 stands for the normal group, 2 stands for the overwatering group and 3 stands for the drought group
In the whole data cube analysis (Figure 5), in the training data of the Bayesian linear regression, 180 out of 261 samples were correctly identified as normal (69%), while 51 out of 120 samples were correctly identified as overwatering (57.5%). In the test data, out of 51 overwatering samples, 13 were correctly identified (25.5%). By comparison, in the main spectra analysis, 203 out of 223 normal-water samples were correctly identified (91%), but 105 drought samples were wrongly taken as normal and 1 drought sample was wrongly taken as overwatered (45.1%). In the test data, only 1 overwatering sample was correctly pinpointed, while 33 and 17 overwatering samples were wrongly taken as normal and drought, respectively (98%). In the CNN feature analysis, in the testing dataset of the Bayesian linear regression, 50 out of 55 overwatering samples were correctly identified and classified as belonging to the overwatering group; however, 3 samples from the normal group and 2 samples from the drought group were wrongly categorised as belonging to the overwatering group.
Figure 5. Confusion matrix of the training and testing sets of data.
Figure 6. Confusion matrix of the training and testing sets of data.
Whole data cube (c.1), main spectra analysis (c.2) and CNN feature of main spectra analysis (c.3) in the Elman Neural Network (c); 1 stands for the normal group, 2 stands for the overwatering group and 3 stands for the drought group
In the whole data cube analysis of the Elman Neural Network (Figure 6), 187 out of 223 normal samples in the training data were correctly identified (83.9%), but 8 normal samples were wrongly identified as drought and 28 as overwatering. By comparison, in the test data, 17 overwatering samples were correctly identified, but 11 overwatering samples were taken as normal and 23 overwatering samples as drought (66.7%). In the main spectra analysis, in the confusion matrix of the training data, 172 out of 223 samples were correctly identified as normal (77.1%), but 58 drought samples were wrongly taken as receiving normal treatment. In the test dataset, 80 out of 95 samples were correctly identified as normal (84.2%), whereas 13 and 23 overwatering samples were taken as normal and drought, respectively (70.6%). In the CNN feature analysis, in the testing dataset of the Elman Neural Network, 54 out of 61 drought samples were correctly detected and classified as belonging to the drought group; however, 4 samples from the normal group and 3 samples from the overwatering group were wrongly categorised as belonging to the drought group.
Table 5 shows the performance metrics of each machine learning method with the different inputs. According to the study, the key wavelengths selected by SPA performed worse than the whole data cube; this is because the spectral dimension of each hyperspectral cube decreases drastically after the key-wavelength selection. In the analysis of CNN features of the key-wavelength images, on the test dataset, Multiple Linear Regression based on Ridge Regression achieved the lowest accuracy at 82.69%, while Bayesian Linear Regression had the highest test accuracy at 87.98%. The CNN feature analyses all outperformed the machine learning analyses of the main spectra alone. In the CNN feature analysis, the Bayesian and Elman models both achieved an F1 score of 0.87; the Elman model also had a precision of 0.87, and the Bayesian model had the highest recall of 0.91. Details of the training and test accuracy of the three neural networks used for hyperspectral imaging classification are shown in Figure 8 in Appendix A.
Conclusion and Discussion
As can be seen from the experimental results, the hyperspectral imaging classification shows differences among groups under different water treatments. Therefore, the hyperspectral imaging tool combined with these three machine learning models can effectively differentiate leaves under various water stresses while the young seedlings are still growing, and timely measures can be taken to avoid irreparable loss.
We used hyperspectral imaging coupled with machine learning to identify pear seedling leaves under different water stresses because of its ability to automatically learn spatiotemporal features without handcrafting and thus achieve high classification accuracy. RR-MLR, Bayesian linear regression and the Elman network were used to classify the pear seedling hyperspectral images. According to our research, CNN features of the main spectra outperformed the spectral features alone in the machine learning algorithms.
Besides, our research confirms the findings of Zhao et al. (2020), who demonstrated the strong potential of HSI technology for tomato leaf water status monitoring in plant factories. In that experiment, the best leaf water content assessment model was attained using the normalised difference vegetation index (NDVI) with individual raw relative reflectance (RAW) wavelengths at 1300 nm and 1310 nm. Previous studies have shown that the NIR plateau between 800 nm and 1300 nm and the water absorption bands above 1300 nm are common characteristics of the reflectance spectra of all healthy green plants. The high reflectivity in the NIR region of 800 nm to 1300 nm is due to the porous plant leaf structure (tissues and cells) (Zhao et al., 2018). In addition, the reflectivity of the absorption band at 970 nm is significantly affected by factors such as water (Li et al., 2021).
Our study fills the gap by integrating hyperspectral imaging with machine learning models for differentiating plant leaves under water stress. This innovative approach offers the potential for enhanced accuracy and robustness in water stress detection, contributing to improved water management strategies and optimized plant growth in agricultural practices. The concept of employing Convolutional Neural Networks (CNNs) for the feature extraction from images of primary spectra represents an innovative and unexplored approach in the realm of image processing and spectral analysis. This method, pioneering in its application, leverages the advanced capabilities of CNNs to discern and extract key features from spectral images, a technique not previously attempted in this field.
In future work, a larger number of hyperspectral cubes could be collected to improve the accuracy of our experiments. Furthermore, the methodology presented in this paper can be implemented in a plant water condition monitoring system, which will be tested in pot and field experiments to guide farmers on irrigation problems. Overall, the outcomes of this research will play a crucial role in advancing precise water management practices. By accurately detecting water stress conditions and understanding the temporal dynamics of plant responses, we can implement more effective and sustainable water management approaches to optimise plant growth and yield while conserving water resources.