Preprint
Review

Deep Learning and Machine Learning for Automatic Grapevine Varieties Identification: A Brief Review


A peer-reviewed article of this preprint also exists.

Submitted: 02 May 2024
Posted: 06 May 2024

Abstract
The Eurasian grapevine (Vitis vinifera L.) is the most widely grown horticultural crop in the world and is important for the economy of many countries. In the wine production chain, grape varieties play an important role as they directly influence the authenticity and classification of the product. Identifying the different grape varieties is therefore fundamental for quality control and inspection activities, as well as for regulating production. Currently, ampelography and molecular analysis are the main approaches to identifying grape varieties. However, both methods have limitations. Ampelography is subjective and prone to errors, and is facing enormous difficulties as ampelographers become increasingly scarce. Molecular analyses, on the other hand, are very demanding in terms of cost and time. In this scenario, Deep Learning (DL) and Machine Learning (ML) methods have emerged as a classification alternative to deal with the scarcity of ampelographers and to avoid molecular analyses. In this study, the most recent methods for identifying grapevine varieties using DL and ML classification-based approaches are presented through a systematic literature review. The classification pipeline of the 31 studies found in the literature is described, highlighting their pros and cons. Most of the studies used DL-based models trained with leaf images acquired in a controlled environment at a maximum distance of 1.2 metres to classify grape varieties. In addition, there is a large gap between practical applications and the datasets used: limited coverage of varieties, little data acquired in the field and a lack of tests on plants under adverse conditions. Potential directions for improving this area of research are also presented.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning

1. Introduction

The Eurasian grapevine (Vitis vinifera L.) holds the title of the most extensively cultivated and economically significant horticultural crop globally, having been cultivated since ancient times [1]. Due to its substantial production, this crop plays a crucial part in the economies of many countries [2]. The fruit is important because it can be used both for fresh consumption and for the production of wine. The number of grape varieties in the world is unknown, but specialists estimate it at around 5000 to 8000, grown under 14000 to 24000 different names [3,4,5]. Despite this huge number, only 300 to 400 varieties account for most of the grape plantings in the world [4]. The most common varieties are Kyoho, Cabernet Sauvignon, Sultanina, Merlot, Tempranillo, Airen, Chardonnay, Syrah, Red Globe, Grenache Noir, Pinot Noir and Trebbiano Toscano [6].
The grape variety plays an important role in the wine production chain and in leaf consumption, since in some cases the leaves can be more costly than the fruit [7,8]. Wine is one of the most popular agri-foods in the four corners of the world [9]. In 2019, the European Union accounted for 48% of world consumption and 63% of world production [10]. In terms of value, the wine market share totalled almost 29.6 billion euros in 2020, despite the Covid-19 pandemic crisis [10]. The varieties used in the production of the drink directly influence its authenticity and classification, and due to this socioeconomic importance, identifying grape varieties has become an important part of production regulation. Furthermore, recent results by Jones and Alves [11] highlighted that some varieties can be vulnerable to warmer environments in the context of climate change, accentuating the need for tools for grapevine variety identification.
Nowadays, the identification of grapevine varieties is carried out mostly using ampelography or molecular analysis. Ampelography, defined by Chitwood et al. [12] as "the science of phenotypic distinction of vines", is one of the most accurate ways of identifying grape varieties through visual analysis. Its authorised reference is Precis D’Ampelographie Pratique [13], and it uses well-defined official descriptors provided in the identity of the plant material for grape identification [14,15]. Despite its wide utilisation, ampelography depends on the person carrying it out, as with any visual analysis task, making the process subjective. It can be exposed to interference from environmental, cultural and genetic conditions, introducing uncertainty into the identification process [14,16]. It can be time-consuming and error-prone, just like any other human-based task, and ampelographers are becoming scarce [17].
Molecular marker analysis is another technique that has been used to identify grape varieties [17]. Among the markers used, random amplified polymorphic DNA, amplified fragment length polymorphism and microsatellite markers have been applied to grape variety identification [17]. This technique makes it possible to deal with subjectivity and environmental influence. However, it must be complemented by ampelography due to leaf characteristics that can only be assessed in the field [3,18,19]. In addition, identifying grape varieties with a focus on production control and regulation would involve several molecular analyses, increasing the costs and time required.
With the advance of computer vision techniques and easier access to data, several studies have emerged with the aim of automatically identifying grapevine varieties. Initially, they were based on classic machine learning classifiers, e.g. Support Vector Machines, Artificial Neural Networks [20], the Nearest Neighbour algorithm [21] and Partial Least Squares regression [22], using manually or statistically extracted features, e.g. indices, or the data directly. However, in 2012, with the advent of Deep Learning (DL), more specifically the study by Krizhevsky et al. [23], computer vision classifiers became capable of reaching or, in some cases, surpassing human capacity. Lately, transfer learning and fine-tuning approaches have allowed these models to be applied to many general computer vision tasks, such as object detection, semantic segmentation and instance segmentation, and in other research domains, for example precision agriculture and medical image analysis. The automatic identification of grapevine varieties has followed this lead, and most studies now use DL-based classifiers in their approaches.
In this study, recent literature on the identification of grapevine varieties using ML and DL-based classification approaches was reviewed. The steps of the computer vision classification process (data preparation, choice of architecture or feature extraction and classifier selection, training and model evaluation) were described for 31 studies found in the literature, highlighting their pros and cons. Possible directions for improving this field of research are also presented. To the best of our knowledge, there are no studies in the literature with the same objective. However, this study may have some intersection with Chen et al. [24], which aimed to review studies that used deep learning for plant image identification. Besides, Mohimont et al. [25] reviewed studies that used computer vision and DL for yield-related precision viticulture tasks, e.g. flower counting, grape detection, berry counting and yield estimation, while Ferro and Catania [26] surveyed the technologies employed in precision viticulture, covering topics ranging from sensors to computer vision algorithms for data processing. It is important to emphasise that the explanation of computer vision algorithms is already widespread in the literature and will not be covered in this study. One can refer to Chai et al. [27] and Khan et al. [28] for advances in the field of natural scenes, or Dhanya et al. [29] for developments in the field of agriculture.
The remainder of this article is organised as follows. In Section 2, the research questions, inclusion criteria, search strategy and extraction of the characteristics of the selected studies are described. Then, in Section 3, the results are presented, highlighting the approach used in the stage of creating the DL-based classifier. In Section 4, a discussion around the selected studies is presented, focussing on the pros and cons of the approaches used and also introducing techniques that can still be explored in the context of identifying grapevine varieties using DL-based methods. Finally, in Section 5, the main conclusions are presented.

2. Methods

2.1. Research Questions

The following research questions (RQ) were used to substantiate this review:
  • (RQ1) How have ML and DL-based techniques been used for the automatic identification of grapevine varieties?
  • (RQ2) What are the best architectures for the automatic identification of grapevine varieties?
  • (RQ3) What are the main challenges and future development trends in identifying grape varieties using ML and DL-based models?

2.2. Inclusion Criteria

The following four inclusion criteria were used to conduct the study: (1) studies written in English; (2) studies that used a machine learning or DL-based approach to identify different vine varieties; (3) conference papers or journal articles; and (4) studies published between 2018 and 2024. The search was limited to 2018 because the field of ML applied to grapevine variety identification research has shown a considerable increase in the number of publications over the last six years. Figure 1 shows the number of studies returned from the search carried out for two different search engines (detailed in Section 2.3) since 2015.

2.3. Search Strategy

Scopus (Elsevier B.V., Amsterdam, Netherlands) and Web of Science (WOS) (Clarivate Analytics, London, UK) were used to query the studies. The details of the search are shown in Table 1, while the number of studies retrieved per year is shown in Figure 1. The search was carried out on April 22, 2024. In addition, a search in Google Scholar was performed on the same date, and two studies were added.

2.4. Result Filtering

The initial queries produced 154 results in Scopus and 98 in WOS, applying the filters for year and types of studies (defined in Section 2.2). Two articles from Google Scholar were also added. Next, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework was applied, as shown in Figure 2. The results were filtered by removing duplicate studies and others that were not relevant to the aim of the review, after analyzing the title and abstract. As a result, 223 studies were excluded and 31 were included in the review.

2.5. Extraction of Characteristics

For a better understanding of the context in which the studies were carried out, various characteristics were collected and organised in Table 2, Table 3 and Table 4. For better visualisation, the studies that used DL techniques are grouped in Table 3 and Table 4, and those that used ML are detailed in Table 2. For the DL studies, the year of publication, location, description of the dataset used, the part of the vine focused on, the architecture used and the results are shown. For the ML studies, the year of publication, location, description of the dataset used, the part of the vine focused on, the feature extractors and classifiers used and the results are presented. The results are reported using the F1 Score, Accuracy (Acc) or Area-Under-The-Curve (AUC) metrics, in that order of preference. Only the best result obtained is presented. It is important to note that the description may include the total number of related images, taking into account the samples generated. In addition, in the focus field, "Leaves" and "Fruits" refer to images taken at a maximum distance of 1.2 metres from the plants, or to detached leaves or fruits.

3. Results

As shown in Table 2, Table 3 and Table 4, 31 studies were identified from the selected sources. Figure 3 shows a graph comprising the countries of origin of the datasets, the years and the focus of the selected studies. The majority of the studies were published in 2021, and most of the datasets used have Portugal as their source. This field of research has therefore been active recently, especially in countries where grape cultivation is economically relevant. Furthermore, most studies have focused on the leaves to identify grape varieties.
Figure 4 shows the occurrences of keywords generated for the included studies using VOSviewer [59]. It can be seen that the number of DL-based approaches exceeds the number of ML-based approaches published in the literature over the last 6 years (ML = 7 versus DL = 24).
All the selected studies followed the classic process of training computer vision classifiers in their method. This process can be seen in Figure 5. First, the data is acquired and prepared to train the classifiers. Next, pre-processing steps are applied to the data to increase the quality of the classification. Then, architectures are selected or created for DL methods, or feature extractors and classifiers are selected for ML methods, and subsequently trained on the data. In the final stage, these classifiers are evaluated. To better understand the different approaches used in the different stages, the pipeline will be followed to guide this discussion.
The datasets and benchmarks used in the studies will first be presented, followed by details of the approaches used in the pre-processing phase. Next, the architecture, or feature extractors and classifiers, and the training process adopted will be examined. Finally, the metrics and explanation techniques used to evaluate the studies will be described.

3.1. Datasets and Benchmarks

Details of the datasets used by the studies included in this review are presented in Table 5, Table 6 and Table 7.
RGB images are the main data used for identifying grapevine varieties in the included studies, centred on leaves, fruit and seeds. Spectra, hyperspectral images (HSI) and 3D point clouds have also been used as target data. Of the studies that used RGB images, most used datasets acquired in a controlled environment, although classifiers trained with such images may be of limited practical use. Furthermore, another disadvantage is that controlled-environment techniques are generally invasive, requiring the leaf to be removed from the plant. Similarly, studies that used spectral signatures, through spectrometers or hyperspectral cameras, and 3D point clouds also centred on data acquisition in a controlled environment. All the datasets using RGB data acquired images at a maximum distance of 1.2 metres. Images acquired in the field were prone to secondary leaves and the presence of unrelated information, e.g. soil, sky and human body parts. For simplicity, in the rest of this text RGB images will be referred to as images.
As in other fields of research, only a few studies provided their datasets. Peng et al. [54] and Franczyk et al. [55] used the Embrapa Wine Grape Instance Segmentation Dataset (Embrapa WGISD) [61]. This dataset consists of 300 images belonging to 6 grape varieties. Koklu et al. [7] provided the dataset they used and, more recently, studies [30,39,40,41,42,46] have followed and explored the same dataset. This dataset consists of 5 classes and was acquired in a controlled environment, resulting in 500 images. De Nart et al. [37] did not provide their dataset, although they tested their approach on the Vlah [62] dataset, which comprises 1009 images distributed over 11 varieties. Other datasets have also been proposed in the literature. Al-khazraji et al. [63] proposed a dataset with 8 different grape varieties acquired in the field. Sozzi et al. [64] proposed a dataset for cluster detection, which can also be used to identify 3 different varieties. In the same vein, Seng et al. [65] presented a dataset with images of fruit at different stages of development covering 15 different varieties. Table 8 summarises all the publicly available datasets that, as far as we know, can be used to train and evaluate DL models with the aim of classifying different grape varieties. Figure 6 shows examples of images from each publicly available dataset.
Among the studies that provided the acquisition period, most used data obtained over a short period of time (less than a month). Carneiro et al. [44] and Carneiro et al. [45] used the most representative datasets, in terms of time, to identify grape varieties. It should be noted that, since grapevines are seasonal plants, it is very important that the dataset represents different periods of the season in order to capture the different phenotypic characteristics of the leaves over time. Seasonal representation in the dataset directly affects the classifier’s ability to generalise.
In addition, Magalhães et al. [43] and Fuentes et al. [36] were the only studies concerned with the position of the leaves used in the classification. Magalhães et al. argue that leaves from the nodes between the 7th and 8th positions should be used, as they are the most representative in terms of phenotypic characteristics [66], while Fuentes et al. used mature leaves from the fifth position. De Nart et al. [37] considered the age of the leaves in their study, excluding samples that were too young or too old, as well as leaf health, discarding unhealthy leaves. In contrast, Garcia et al. [31] considered the age of the plant, acquiring samples from plants over a year old. Other characteristics, such as water stress and nutrition, were not considered by the included studies.
Given that DL-based classification models are prone to overfitting, most studies used data augmentation techniques to increase the diversity of the data used during training, in contrast with the ML-based classifiers. Rotations, reflections and translations are the main modifications applied to the images. Furthermore, these modifications are generally applied only to the training subset. Carneiro et al. [44] and De Nart et al. [37] tested different data augmentation techniques. Carneiro et al. [44] concluded that offline geometric augmentations (zoom, rotation and flips) led to better classification.
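As a minimal illustration of the offline geometric augmentations mentioned above (rotations and flips), the following framework-free sketch expands a single image into six views; the `augment` helper is hypothetical, not code from any of the cited studies:

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Generate offline geometric augmentations (rotations and flips)
    of a single H x W x C image, as commonly applied only to the
    training subset."""
    views = [image]
    for k in (1, 2, 3):                      # 90/180/270 degree rotations
        views.append(np.rot90(image, k))
    views.append(np.flip(image, axis=0))     # vertical flip
    views.append(np.flip(image, axis=1))     # horizontal flip
    return views

img = np.arange(2 * 2 * 3).reshape(2, 2, 3)
aug = augment(img)                           # one image becomes six samples
```

Because these transformations only permute pixels, label semantics are preserved while the training set grows sixfold.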
Furthermore, it is clear that the datasets consisted mainly of a limited number of varieties, in contrast to the experts’ estimate of at least 5000 varieties. Figure 7 shows a ranking of the 30 most used varieties in the included studies. Notably, the origin of the data used in the studies and the availability of public datasets directly influence this ranking. Touriga Nacional is one of the most representative varieties planted in Portugal (source of 33% of the data in the included studies), while Ak, Ala Idris, Büzgülü, Dimnit and Nazlı are the varieties present in the most widely used publicly available dataset [7]. Among the most commonly planted varieties worldwide, Cabernet Sauvignon, Tempranillo (aka Tinta Roriz), Merlot, Chardonnay, Syrah and Pinot Noir are present in the ranking. One must also consider that the same grape variety can be known by a synonym; for example, in Portugal, Tempranillo is known as Tinta Roriz [67].

3.2. Pre-Processing

In the image context, Liu et al. [51] used complemented images in training, so that each colour channel in the resulting image was the complement of the corresponding channel in the original image. Pereira et al. [58] tested several types of pre-processing: the fixed-point FastICA algorithm [68], the Canny edge detector [69], greyscale morphology processing [70], background removal with the segmentation method proposed by Pereira et al. [71], and their proposed four-corners-in-one method. The FastICA algorithm is an independent component analysis method based on kurtosis maximisation that was applied to blind source separation. The idea behind applying independent component analysis (ICA) to images is that each image can be understood as a linear superposition of weighted features; ICA then decomposes them into a statistically independent source base with minimal loss of information content to achieve detection and classification [72,73]. Unlike ICA, greyscale morphological processing is a method of extracting vine leaf veins based on classical image processing. First, the image is transformed into greyscale, based on its tonality and intensity information. Next, morphological greyscale processing is applied to remove colour overlap between the leaf vein and background. Linear intensity adjustment is then used to increase the difference in grey values between the leaf vein and its background. Finally, the Otsu threshold [74] is calculated to separate the veins from the background, and detailed processing is carried out to connect lines and remove isolated points [70].
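Since the greyscale pipeline above hinges on the Otsu threshold, a compact sketch of Otsu's method (maximising the between-class variance over a 256-bin histogram) may help; this is a generic textbook implementation, not the code used by the cited studies:

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold of an 8-bit greyscale image by
    maximising the between-class variance over all candidate cuts."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    omega = np.cumsum(p)                  # class-0 probability up to t
    mu = np.cumsum(p * np.arange(256))    # cumulative mean
    mu_t = mu[-1]                         # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)      # undefined cuts contribute 0
    return int(np.argmax(sigma_b))

# Synthetic bimodal image: dark background (~30), bright veins (~200)
rng = np.random.default_rng(0)
img = np.where(rng.random((64, 64)) < 0.7, 30, 200).astype(np.uint8)
t = otsu_threshold(img)                   # falls between the two modes
```

Pixels above `t` would then be treated as vein (or leaf) foreground, with morphological post-processing to connect lines and remove isolated points.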
The next method used by Pereira et al. [58] was proposed in Pereira et al. [71] and is also based on classical image processing. That study presents a method for segmenting grape leaves from fruit and background in images acquired in the field. The approach is based on region growing using a colour model and thresholding techniques and can be separated into three stages: pre-processing, segmentation and post-processing. In pre-processing, the image is resized, the histogram is adjusted to increase contrast, the resulting image and the raw image are converted to the hue, saturation and intensity colour model, and the original image is also converted to the CIELAB (L*a*b*) colour model. In the segmentation phase, the luminance component of the raw image (L*) is used to detect the shadow regions; the shadow and non-shadow regions are then processed, removing the marks and the background with a different approach for each. Finally, in the post-processing step, the method fills in small holes using morphological operations. These holes are usually due to the presence of diseases, pests, insects, sunburn and dust on the leaves. The method achieved 94.80% average accuracy. Finally, the same authors also proposed a new pre-processing method called four-corners-in-one. The idea is to concentrate all the non-removed pixels in the north-west corner of the image after segmenting the vine leaves: a sequence of left-shift operations is performed, followed by a sequence of up-shift operations on the coloured pixels of the image. This algorithm is replicated for the other three corners. According to the authors, this method obtained the best classification accuracy in the set of experiments carried out.
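The north-west step of the four-corners-in-one idea described above (left-shifts followed by up-shifts of the coloured pixels) can be sketched roughly as follows; `shift_northwest` is an illustrative reconstruction from the text, not the authors' implementation:

```python
import numpy as np

def shift_northwest(mask: np.ndarray) -> np.ndarray:
    """Compact the non-zero (non-removed) pixels of a single-channel
    image into the north-west corner: a left-shift of each row followed
    by an up-shift of each column."""
    out = np.zeros_like(mask)
    for i, row in enumerate(mask):            # left-shift coloured pixels
        vals = row[row != 0]
        out[i, :vals.size] = vals
    for j in range(out.shape[1]):             # up-shift coloured pixels
        vals = out[out[:, j] != 0, j]
        out[:, j] = 0
        out[:vals.size, j] = vals
    return out

m = np.array([[0, 5, 0],
              [0, 0, 7],
              [3, 0, 0]])
nw = shift_northwest(m)   # all non-zero values packed towards the NW corner
```

Replicating the same shifts towards the other three corners would produce the remaining variants described by the authors.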
Carneiro et al. [47] and Carneiro et al. [45] evaluated the use of segmentation models to remove the background from images acquired in the field before classification using DL. Both studies applied a U-Net [75], and Carneiro et al. [47] also tested SegNet [76] to segment the data before classification. The results show that performance can be reduced when secondary leaves are removed, and that models trained with segmented leaves pay more attention to the centre leaves. Abbasi and Jalal [30], Garcia et al. [31] and Marques et al. [34] did the same for ML-based algorithms, although with the aim of isolating leaf regions to extract better features. They used K-Means, an unspecified thresholding technique and Otsu thresholding, respectively.
Doğan et al. [40] used Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN) [77] as a data augmentation technique. The idea is to apply a Generative Adversarial Network [78] to recover a high-resolution image from a low-resolution one: the authors decreased the resolution of the images in the dataset and increased it again using ESRGAN, treating the resulting images as new samples.
In addition, ML applications have also employed filtering, e.g. median filter [30], and indices, e.g. Red Green Blue Vegetation Index [34], to improve the features extracted before classification.
In the spectra scenario, the approach to obtaining the spectra and processing them before classification has a crucial impact. Xu et al. [32] applied threshold segmentation to the 810 nm band to obtain the leaf region and then averaged this region to generate a spectral signature. To filter the spectra, they applied the standard normal variate and the first derivative, while to de-noise the data they applied an algorithm based on empirical mode decomposition. In addition, the authors discarded the first 78 wavelengths. Gutierrez et al. [35] selected the spectrum of a pixel to represent the leaf and then calculated Pearson’s r between that pixel and a group of pixels; if it was greater than 0.9, the pixel was used. The authors applied the standard normal variate (SNV) and Savitzky-Golay (SG) filtering with two orders of derivatives to pre-process the spectra. The first 25 wavelengths were discarded to avoid noise. Fuentes et al. [36] averaged the values from 5 different measurement points to calculate the spectrum representing each leaf. Fernandes et al. [56], after calculating the reflectance from the acquired spectra, applied the SG filter, logarithm, multiplicative scatter correction (MSC), SNV, first derivative and second derivative to the data, comparing the results for each approach.
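The two most common spectral pre-processing steps mentioned above, SNV and Savitzky-Golay filtering with a derivative, can be sketched generically as follows (assuming SciPy is available; the synthetic spectrum is illustrative only, not data from the cited studies):

```python
import numpy as np
from scipy.signal import savgol_filter

def snv(spectrum: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum so that
    multiplicative scatter effects are reduced."""
    return (spectrum - spectrum.mean()) / spectrum.std()

# Synthetic noisy reflectance curve over 200 wavelengths
rng = np.random.default_rng(1)
raw = np.sin(np.linspace(0, 3, 200)) + 0.05 * rng.standard_normal(200)

# SG filtering with a first-order derivative (window and polynomial
# order are illustrative choices)
smoothed = savgol_filter(raw, window_length=11, polyorder=2, deriv=1)
normalised = snv(raw)
```

After such pre-processing, each spectrum has zero mean and unit variance (SNV) or emphasises slope features (SG derivative) before being fed to a classifier.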

3.3. Architecture and Training

Considering that most of the studies included in this review applied DL-based techniques to classify grape varieties, the main architectural approach was a combination of transfer learning and fine-tuning. However, a few different techniques were also employed: hand-crafted architectures, fused deep features and feature extraction using DL-based models combined with Support Vector Machine (SVM) classifiers. In ML-based studies, SVMs with raw spectra emerged as the most widely used approach.

3.3.1. Deep Learning

AlexNet [23], VGG-16 [79], ResNet [80], DenseNet [81], Xception [82], MobileNetV2 [83], EfficientNetV2 [84], Inception V3 [85] and Inception ResNet [86] were the Convolutional Neural Network (CNN) architectures employed in the image-based studies included in this review. These networks were first trained on ImageNet, and then transfer learning and fine-tuning were employed in two stages. In the first step, the original classifier is replaced by a new one and the convolutional weights are frozen, so that only the new classifier is trained and has its weights updated (transfer learning). In the second step, all the weights are unfrozen and the entire architecture is retrained (fine-tuning). Detailed information on each architecture used can be found in Alzubaidi et al. [87]. Differently, Carneiro et al. [47] and Kunduracioglu and Pacal [38] used Vision Transformers through ViT [88], Swin Transformers [89], MobileViT [90], DeiT [91] and MaxViT [92], but followed the same learning strategy.
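The two-stage strategy can be illustrated without any DL framework: in the sketch below, a fixed random projection stands in for the frozen pretrained backbone, and only the newly attached head is trained in stage 1. All names and data are synthetic stand-ins, not taken from the reviewed studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: a fixed projection whose weights
# stay frozen during stage 1 (transfer learning).
W_backbone = rng.standard_normal((64, 16))

def backbone(x):
    return np.maximum(x @ W_backbone, 0.0)       # frozen ReLU features

# Toy binary task defined on the backbone's feature space
X = rng.standard_normal((200, 64))
F = backbone(X)
y = (F @ rng.standard_normal(16) > 0).astype(float)

# Stage 1: train only the newly attached head; backbone stays frozen.
w_head = np.zeros(16)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w_head)))      # sigmoid head
    w_head -= 0.1 * F.T @ (p - y) / len(y)       # gradient step, head only

acc = (((F @ w_head) > 0) == (y == 1)).mean()
# Stage 2 (fine-tuning) would unfreeze W_backbone and retrain everything.
```

In a real pipeline the frozen projection is an ImageNet-pretrained CNN or ViT, and stage 2 retrains all layers at a small learning rate.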
Unlike other studies based on image classification, Peng et al. [54], Lv [42], and Doğan et al. [40] used Fused Deep Features to identify grapevine varieties. This approach consists of extracting features from images from more than one source, concatenating all the extracted features and then classifying them. Peng et al. [54] extracted features from AlexNet, ResNet and GoogLeNet, then fused the features using the Canonical Correlation Analysis algorithm [93] and then classified the vine varieties using an SVM classifier. In addition, the authors trained the aforementioned architectures with fully connected classifiers, which resulted in worse performance than the proposed method. They argued that the small size of the dataset is the main reason why it is difficult to obtain better results using CNN directly. Lv [42] merged the results of VGG-19, ViT, Inception Resnet, DenseNet and ResNext, but instead of merging the extracted features, the final classification of each model was used so that voting strategies were applied to obtain the final classification. Doğan et al. [40] merged attributes from VGG-19 and MobileNetV2. The difference is that the authors used a Genetic Based Support Vector Machine to select the best features for classification, improving the results by 3 percentage points. The final classification of the selected feature was done using an SVM. Koklu et al. [7] also used the features extracted from a pre-trained CNN architecture plus an SVM classifier. The idea was to extract features from the logits of the first fully connected layer of MobileNetV2 and use them to test the performance of four different SVM kernels: Linear, Quadratic, Cubic and Gaussian. In addition, the authors carried out experiments using the Chi-Square test to select the 250 most representative features in the logits.
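The feature-level fusion plus SVM idea can be sketched with scikit-learn; here two simple stand-in extractors (PCA components and per-row means, on the `digits` toy dataset) play the role of the CNN backbones used in the studies above:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Two stand-in feature extractors (playing the role of two CNN backbones)
feats_a = PCA(n_components=20, random_state=0).fit_transform(X)
feats_b = X.reshape(len(X), 8, 8).mean(axis=2)      # per-row mean intensities

# Feature-level fusion: concatenate, then classify with an SVM
fused = np.concatenate([feats_a, feats_b], axis=1)
Xtr, Xte, ytr, yte = train_test_split(fused, y, random_state=0)
acc = SVC(kernel="rbf").fit(Xtr, ytr).score(Xte, yte)
```

The cited studies fuse deep features (e.g. via Canonical Correlation Analysis or genetic feature selection) rather than hand-crafted ones, but the concatenate-then-classify structure is the same.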
Regarding transfer-learning and fine-tuning, Carneiro et al. [48] and Ahmed et al. [60] analysed the impact of different frozen layers. Carneiro et al. [48] obtained the same metrics for different frozen layers, while Ahmed et al. [60] showed that freezing the half-convolution part of the model leads to better classification. On the other hand, Carneiro et al. [48] also stated, using Explainable Artificial Intelligence (XAI), that freezing the entire convolution part or training the entire model leads to more similar results, in terms of explainability, than training only half of the convolutional part of the model, whereby in the latter case the background part of the images contributed more to classification than in the first case.
Some optimisers, global pooling techniques and losses were used for training. Among the optimisers available in the literature for training machine learning models, Stochastic Gradient Descent (SGD) and Adam [94] were used in the selected studies. With SGD, an adaptive learning rate scheduler or a momentum term could also be used to improve the training process.
All the image-based studies that used a global pooling method opted for Global Average Pooling, with the aim of reducing the CNN activation maps before classification. The losses used were Cross Entropy (CE) loss and Focal Loss (FL) [95]. Focal Loss is a modification of the CE loss that reduces the weight of easy examples and thus concentrates training on difficult cases. It was first used in object detection studies, due to the huge imbalance between "object" and "non-object" candidate regions; however, Mukhoti et al. [96] concluded that it can also be used to deal with calibration errors of multi-class classification models, in the sense that the probability values these models associate with their predicted labels overestimate the probability of those labels being correct in the real world. Carneiro et al. [50] used Focal Loss to mitigate the imbalance in the dataset used.
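The relationship between Focal Loss and cross entropy described above can be made concrete with a small numerical sketch (binary case; setting γ = 0 recovers plain CE):

```python
import numpy as np

def focal_loss(p: np.ndarray, y: np.ndarray, gamma: float = 2.0) -> float:
    """Binary focal loss: (1 - p_t)^gamma * CE, which down-weights easy
    examples (p_t close to 1). gamma = 0 recovers cross entropy."""
    p_t = np.where(y == 1, p, 1.0 - p)     # probability of the true class
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))

p = np.array([0.9, 0.2, 0.7])   # predicted probability of class 1
y = np.array([1, 0, 1])         # true labels

ce = focal_loss(p, y, gamma=0.0)   # plain cross entropy
fl = focal_loss(p, y, gamma=2.0)   # easy examples sharply down-weighted
```

With γ = 2, the well-classified examples (p_t = 0.9, 0.8) contribute almost nothing, so the loss concentrates on the hardest case, which is what makes FL useful for imbalanced datasets.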

3.3.2. Machine Learning

The training process for ML-based approaches can be divided into three stages: 1) feature extraction, 2) dimensionality reduction and/or feature selection, and 3) classification.
The included studies targeting HSI and point clouds did not apply any feature extraction process, using the raw spectra/points as the representation of the samples. On the other hand, a number of descriptors were adopted when classifying images. Most of the extracted descriptors were based on colour (e.g. RGB statistical values, histogram operations, transformation to the CIELAB (L*a*b*) colour space), shape (e.g. roundness, aspect ratio, convex area, perimeter) and texture (e.g. entropy, contrast, energy, homogeneity). Differently, Fuentes et al. [36] applied fractal dimension analysis and Abbasi and Jalal [30] used KAZE [97] as feature descriptors. Fractal dimension analysis measures the complexity of shapes; it was applied to leaf shapes using the box-counting method. KAZE, in turn, is a feature detector-descriptor algorithm that exploits non-linear diffusion filtering in the detection and description of multi-scale features, so that these features use a non-linear scale space instead of a Gaussian scale space. The extracted features are robust to changes in size (scaling), orientation (rotation) and small distortions (limited affine transformations), and are distinct at various image scales [98]. In the 3D point cloud scenario, the pair-wise Iterative Closest Point algorithm was applied to obtain the similarity between pairs of point clouds.
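A hand-crafted descriptor of the kind listed above (colour statistics, a simple shape ratio, histogram entropy as a texture proxy) might look like the following sketch; the `describe` helper and its feature choices are illustrative, not taken from any included study:

```python
import numpy as np

def describe(leaf: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Hand-crafted descriptor for an H x W x 3 leaf image with a binary
    foreground mask: colour statistics, a crude shape ratio, and
    histogram entropy as a texture proxy."""
    fg = leaf[mask.astype(bool)]                  # foreground pixels, (N, 3)
    colour = np.concatenate([fg.mean(axis=0), fg.std(axis=0)])
    h, w = mask.shape
    extent = mask.sum() / (h * w)                 # leaf area / image area
    hist = np.bincount(fg.mean(axis=1).astype(np.uint8), minlength=256)
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum()             # texture proxy
    return np.concatenate([colour, [extent, entropy]])

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (32, 32, 3)).astype(np.uint8)
msk = np.zeros((32, 32))
msk[8:24, 8:24] = 1                               # central 16x16 "leaf"
vec = describe(img, msk)                          # 8-dimensional descriptor
```

Such fixed-length vectors are what the classic classifiers in stage 3 (SVM, k-NN, decision trees, etc.) consume.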
For feature reduction or selection in studies targeting images, recursive removal of highly correlated features [34] and Principal Component Analysis (PCA) [36] were applied. In addition, Garcia et al. [31] evaluated three selection methods: Kendall's rank coefficient, a wrapper method and an embedded method. With Kendall's rank coefficient, features were selected by their level of correlation, removing redundant ones. The wrapper method used a logistic regression model as an estimator to select the best combination of features. The embedded method used a Random Forest to determine the weight of each feature. According to the authors, the best performance was achieved by the embedded method.
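As an illustration of the correlation-based selection described for Garcia et al. [31], redundant features can be filtered with Kendall's rank coefficient. This is a sketch with illustrative feature names, not the authors' implementation; the tau here is the simple O(n²) pairwise count without tie corrections, and the threshold is an assumption.

```python
import itertools

def kendall_tau(x, y):
    """Kendall's rank correlation: (concordant - discordant) pairs,
    divided by the total number of pairs (ties counted as neither)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in itertools.combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if its |tau| with every already-kept feature
    stays below the threshold, removing redundant columns."""
    kept = []
    for name, column in features.items():
        if all(abs(kendall_tau(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

For example, a `perimeter` column that grows monotonically with `area` has tau = 1 with it and is discarded, while an uncorrelated `entropy` column survives.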
Finally, SVM, k-NN, decision tree, Linear Discriminant Analysis, Logistic Regression, Softmax Regression, Gaussian Naive Bayes and Artificial Neural Networks (ANN) were used for classification. It is worth noting that the studies that used HSI opted only for SVMs or ANNs. On the other hand, for those targeting images, all classifiers were explored.

3.4. Evaluation

For the quantitative evaluation of trained models, accuracy is the most used metric, followed by the F1 Score. Some studies also use precision, recall, Area-Under-The-Curve (AUC), specificity, or the Matthews correlation coefficient (MCC).
On the other hand, as in other areas of research, some studies based on DL use XAI to qualitatively evaluate their models. XAI is a set of processes and methods aimed at enabling humans to understand, adequately trust and effectively manage the emerging generation of artificially intelligent models [99]. The techniques employed by the selected studies are model-agnostic post-hoc explainability methods, which means that no modification to the architecture was necessary in order to apply them.
Nasiri et al. [53] and Pereira et al. [58] extracted the filters learnt by their models. In addition, Nasiri et al. [53] also produced saliency maps. Carneiro et al. [49] and Liu et al. [51] used Grad-CAM [100] to obtain heatmaps highlighting each pixel's contribution to a specific class. Carneiro et al. [44] also used Grad-CAM to evaluate models, but instead of analysing the generated heatmaps directly, they used them to measure the classification similarity between pairs of trained models: the heatmaps were computed over the test subset for each model, and the cosine similarity between the heatmaps of each pair of models was calculated. The authors concluded that, among the data augmentation approaches used, static geometric transformations generate representations more similar to RandAugment than to CutMix.
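The pairwise heatmap comparison described for Carneiro et al. [44] reduces to a cosine similarity between flattened heatmaps; a minimal sketch, assuming the two heatmaps have the same spatial size:

```python
import numpy as np

def heatmap_similarity(h1, h2):
    """Cosine similarity between two flattened heatmaps: 1.0 means the
    two models attend to exactly the same regions (up to scale)."""
    v1, v2 = np.ravel(h1).astype(float), np.ravel(h2).astype(float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
```

Averaging this value over the heatmaps of the test subset gives a single similarity score for a pair of trained models.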
Carneiro et al. [50] used Local Interpretable Model-Agnostic Explanations (LIME) [101] for the same purpose. Furthermore, Carneiro et al. [47] extracted attention maps from ViT and checked the impact of sample rotation using them.
To generate saliency maps, Nasiri et al. [53] computed the derivative of a class score function with respect to the input, which can be approximated by a first-order Taylor expansion, and then rearranged the elements of the calculated derivative. Grad-CAM is a technique proposed by Selvaraju et al. [100] that aims to explain how a model concludes that an image belongs to a certain class. The idea is to use the gradient of the predicted class score with respect to the activation maps of a selected convolutional layer; the selection of the layer is arbitrary. The result is a heatmap containing the regions that contribute positively to the classification of the image. According to the authors, obtaining explanations of the predictions using Grad-CAM makes it possible to increase human confidence in the model and, at the same time, to understand classification errors. Like Grad-CAM, LIME [101] is an explainability approach used to explain individual machine learning model predictions for a specific class. Unlike Grad-CAM, it is not restricted to CNNs, so it is applicable to any machine learning classifier. The idea behind LIME is to train an explainable surrogate model on a new dataset composed of perturbed samples (e.g. hiding parts of the image) derived from the target data, so that it becomes a good local approximation of the original model (in the neighbourhood of the target data). From the interpretable surrogate model, it is then possible to obtain the regions that contributed to the classification, both positively and negatively.
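The core of the Grad-CAM computation described above fits in a few lines. This is a simplified NumPy illustration that assumes the activation maps of the chosen layer and the gradients of the class score with respect to them have already been extracted (in practice a framework's autograd provides them):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a convolutional layer's activation maps and
    the gradients of the predicted-class score w.r.t. those maps.

    activations, gradients: arrays of shape (channels, height, width).
    Each channel is weighted by the global average of its gradient, the
    weighted maps are summed, and a ReLU keeps only the regions that
    contribute positively to the class score.
    """
    weights = gradients.mean(axis=(1, 2))             # one weight per channel
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum of maps
    return np.maximum(cam, 0.0)                       # ReLU
```

The resulting (height, width) map is usually upsampled to the input resolution and overlaid on the image as a heatmap.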

4. Discussion and Future Directions

Considering that the studies presented in this research represent the current state of the art on the use of ML and DL-based models to identify grapevine varieties, it can be concluded that there is still much to be done. The idea behind this section is to provide a discussion of the techniques used and a guide for future work.

4.1. Looking into the Grapevine Varieties Identification Problem

Before discussing the solutions found in the literature, it is important to define the characteristics of the problem of classifying grapevine varieties using Deep Learning.
  • Grapevines are seasonal plants. This means that there are periods when the plants will have leaves, and others when they will not. This feature has a direct impact on the preparation of the dataset, which ideally should cover different phases of leaf growth. In addition, it limits the use of fruits for identification, as they take longer to grow than leaves;
  • The presence of some grape varieties (e.g. Syrah, Chardonnay) is more common than others (e.g. Alvarinho), so the datasets are naturally unbalanced and can be treated as a long-tailed data distribution classification (some classes represent the majority of the data, while most classes are under-represented [102]);
  • The classification of varieties within a species has a high inter-class similarity and high intra-class variations, placing the task in the fine-grained recognition problems family;
  • There will be a high presence of unrelated information in the images acquired in the field, which could contribute to classification errors;
  • There are many publicly available leaf and fruit images that are not annotated for variety identification.

4.2. Machine Learning vs Deep Learning

As in other fields of research, DL-based approaches have outperformed ML-based approaches in grape variety identification, both in number of publications and in performance (see Figure 4). Among the included studies that used DL and images, the worst performance was achieved by Pereira et al. [58], with an accuracy of 77.30%, while other studies achieved 100% [37,38,39,40,57]. In this respect, the architecture used by Pereira et al. [58] was AlexNet, the CNN that launched the modern deep learning era in image classification in 2012 [23].
Further comparing DL and ML in grape variety identification, the results of studies using the dataset proposed by Koklu et al. [7] highlight the general difference. Abassi and Jalal [30], the only ML-based study published after 2023, achieved an accuracy of 83.20%, lower than all the DL-based approaches proposed to classify the same dataset [7,38,39,40,41,42,46]. In fact, the ability of DL-based approaches to automatically learn useful features has driven exponential advances in computer vision since 2012, with the publication of Krizhevsky et al. [23]. Furthermore, the dataset of Koklu et al. [7] is very small compared to the largest dataset used with DL-based architectures, that of Magalhães et al. [43], who nevertheless managed to surpass the result of Abassi and Jalal [30] (94.75% vs 83.20%) using small DL-based architectures. Given these facts, the discussion and future directions in this study will focus on DL-based approaches.
Despite the high metrics (over 90%) claimed by most of the included studies, the generalisability of these models still needs to be better assessed. De Nart et al. [37] obtained an accuracy of 100% on their dataset, however, when testing the model on the Vlah dataset [62], without retraining, an accuracy of 35% was obtained. This result highlights the importance of creating more generalised datasets, since they are the main basis of deep learning and machine learning performance.
On the other hand, most of the studies that explored HSI were still using ML-based approaches. Fernandes et al. [56] was the only study that classified spectra using DL, through CNNs.

4.3. Datasets

Analysing the datasets used, one can see that they comprise few varieties and are mostly based on leaves. The largest datasets used are made up of images of 27 different grape varieties [37] or spectra of 30 different grape varieties [35], so there is a big gap between the number of existing varieties and the amount of annotated data available. In a practical scenario, taking Portugal as an example, 285 grape varieties are permitted nationwide. In the Douro Demarcated Region (DDR), the region responsible for the "Douro" and "Porto" wine denominations in Portugal, 115 grape varieties are allowed. A classifier created to improve production regulation in this region should therefore cover, and be trained with, all 115 grape varieties. The widespread use of leaf images in studies is probably due to the fact that fruits take longer to grow and are needed to collect the seeds; dried seeds, in turn, have the advantage of being preserved and used for later identification.
Another problem related to the datasets is the acquisition period. Although Carneiro et al. [44] and Carneiro et al. [45] acquired data over two complete seasons, the other studies that specified the acquisition period did so over short intervals. Considering that grapevines undergo large seasonal changes, acquiring data throughout the entire season is mandatory to improve the usability of the classifier, since this directly affects its generalisation capacity and practical applicability. These facts emphasise the need for open-source annotated datasets with more varieties, acquired over a long period.
Most of the studies identified used simple data augmentation strategies to increase the number of samples in the training stage. Carneiro et al. [44] evaluated the use of CutMix, RandAugment and static geometric augmentations and concluded that geometric transformations remain the best approach for augmenting data in varietal identification. Alternative methods can still be employed to increase the number of samples with higher quality, such as generating synthetic data with Generative Adversarial Networks, AutoAugment, MixUp, or augmenting the feature space.
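Static geometric transformations, reported above as the best-performing augmentation strategy, can be as simple as the following NumPy sketch; the particular set of flips and rotations is illustrative, not the set used by the cited study.

```python
import numpy as np

def geometric_augmentations(image):
    """Static geometric transformations of one leaf image (H, W, C):
    horizontal/vertical flips and fixed rotations. The label is
    unchanged, so each variant is an extra training sample for free."""
    return [
        np.fliplr(image),    # horizontal flip
        np.flipud(image),    # vertical flip
        np.rot90(image, 1),  # 90 degrees counter-clockwise
        np.rot90(image, 2),  # 180 degrees
    ]
```

In practice these transformations are applied on the fly during training rather than stored, so the dataset on disk stays unchanged.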
It is widely discussed in the computer vision literature that synthetic data can be used to increase the number of samples in datasets, improving classification results. In the context of grapevine identification, Generative Adversarial Networks (GANs) can be used to produce images of vines. GANs were proposed by Goodfellow et al. [78]; their general operation consists of training two models: a generative one, which captures the distribution of the data, and a discriminative one, which estimates the probability of a sample coming from the training set rather than from the generator. The aim of training is to make the generator so good at producing samples that the discriminator cannot distinguish them from the samples in the training set. Doğan et al. [40] used GANs in their study, however only to improve the resolution of the images, rather than to generate entirely new samples.
Although it has achieved good performance in other fields of research, synthesising data with GANs should be used with caution. According to the results of some studies [103,104,105], there is a mismatch between GAN-generated data and reality, which can increase misclassification by models trained with synthetic data. To the best of our knowledge, no study has used GANs to augment grapevine variety identification datasets; in the plant context, however, a few have done so for disease identification [106,107,108,109,110].
AutoAugment, proposed by Cubuk et al. [111], can also be used to deal with the lack of data. This approach searches for the best data augmentation policy for a dataset. Each policy consists of sub-policies specifying operations such as translation, rotation and shear, together with the probability and magnitude of applying each one. A search algorithm then finds the policy with which the model produces the best result on a target dataset. In addition, policies searched on large related datasets, e.g. iNaturalist, can be transferred to small datasets. MixUp [112] can also be used to generate new images by combining images and their labels in pairs. According to the authors, this combination regularises the models, improving their ability to generalise.
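MixUp itself is only a few lines; a minimal NumPy sketch, assuming one-hot label vectors and the Beta-distributed mixing coefficient from the original paper [112]:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """MixUp: convex combination of two samples and their one-hot
    labels, with the mixing coefficient drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

The mixed label is a soft distribution over the two classes, which is what gives MixUp its regularising effect.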
Another strategy consists of inserting noise into, interpolating, or extrapolating the learnt feature space to modify features already generated by a model [113]. Chu et al. [114] presented a technique for augmenting data in the feature space to deal with high data imbalance in datasets. The authors inserted an attention unit, with the help of the class activation map, to obtain class-specific and class-generic features for each class; then, for the classes with few samples, they mixed the class-specific features with class-generic features from other classes.
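A minimal sketch of the noise/extrapolation variant of feature-space augmentation described in [113]; the extrapolation factor and noise level are illustrative assumptions, not values from the cited studies.

```python
import numpy as np

def augment_features(z, n_new=5, noise_std=0.05, extrapolate=0.5, rng=None):
    """Synthesise new feature vectors for an under-represented class by
    extrapolating each vector away from the class centroid and adding
    Gaussian noise. z: array of shape (n_samples, n_features)."""
    rng = rng or np.random.default_rng()
    centroid = z.mean(axis=0)
    new = []
    for _ in range(n_new):
        base = z[rng.integers(len(z))]
        sample = base + extrapolate * (base - centroid)        # extrapolation
        sample += rng.normal(0.0, noise_std, size=base.shape)  # noise
        new.append(sample)
    return np.stack(new)
```

Because the operation happens on the learnt features rather than the raw images, it is cheap and can be applied only to the tail classes.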
Moreover, most of the studies were based on data acquired by handheld devices such as cameras and smartphones; no studies were found that used remote sensing technology, for example images from unmanned aerial vehicles (UAVs) or satellites. Similarly, all the studies that used images as input centred on RGB images, so there is room to explore multispectral and hyperspectral images in the context of DL-based approaches. Factors such as health, leaf age and leaf position were considered exclusion factors by some studies; however, none of them evaluated the results in the opposite scenarios, for example with unhealthy leaves, leaves from water-stressed plants, or variations in daylight.

4.4. Pre-Processing

In the DL-based approaches targeting images, segmentation methods based on classical image processing and on DL-based architectures were applied in the pre-processing steps. The results obtained by Carneiro et al. [50], Carneiro et al. [48] and Ferentinos [115] show that leaf classification using images acquired in the field can be prone to interference from unrelated information. However, the results of Carneiro et al. [47] and Carneiro et al. [45] showed that the application of DL-based segmentation architectures (U-Net and SegNet) did not improve the metrics; indeed, in Carneiro et al. [47] performance decreased by 0.1 percentage points. Nevertheless, in both studies the dataset used to train the segmentation models was small or roughly annotated, and results from other studies indicate that removing the background can lead models to faster convergence and better leaf classification [58,116,117], suggesting that more research can be conducted in this direction.
Pre-processing for ML-based approaches targeting images was linked to feature extraction, for example segmentation to extract information about leaf shapes using images acquired in a controlled environment, rather than the presence of elements outside the leaves. On the other hand, ML-based approaches targeting HSI used pre-processing to eliminate noise and clean the signal before classification.

4.5. Architectures and Training

4.5.1. Deep Learning

In architectural terms, CNN and Transformer architectures pre-trained on ImageNet were used, both individually and through feature fusion. The use of such architectures brings various techniques to grape variety identification, for example residual connections (ResNet, Xception, ViT), depthwise separable convolutions (Xception, MobileNetV2), dense blocks (DenseNet), linear bottlenecks (MobileNetV2) and the self-attention mechanism (ViT). These techniques have led to major performance improvements in image classification over time.
Focusing on the studies by Pereira et al. [58] and Nasiri et al. [53], they used the pre-trained AlexNet and VGG-16 models, respectively. Despite good results in the past, ImageNet classification benchmarks show that there is a high probability of improving results if more recent pre-trained CNN architectures are used, as in Carneiro et al. [45], Magalhães et al. [43], or Kunduracioglu and Pacal [38]. In addition, there are newer CNN architectures that remain untested, for example MobileNetV3 [118], larger members of the EfficientNet family [84,119], or ConvNeXt [120,121].
Although CNNs represent the state of the art in most computer vision tasks, they have some limitations. Sabour et al. [122], in their introduction to Capsule Networks (CapsNet), cited the inability of CNNs to recognise the pose, texture and deformation of an image or its parts, and their lack of use of spatial information. These characteristics can be very useful for images acquired in the field, since leaves are not always in the same position or at the same angle, they are subject to oscillations due to wind, and there is a large amount of unrelated information. According to Patrick et al. [123], the invariance of the features learnt by CNNs comes from the pooling operation, which causes the model to lose information. CapsNets were proposed to solve these problems and, in this context, can be applied to identifying grape varieties. The authors proposed replacing max-pooling with a "routing-by-agreement" process and convolutional filters with capsules. Unlike a convolutional filter, a capsule is a group of neurons that learns a vector of features (e.g. angle, scale, pose). Capsules allow CapsNets to exploit spatial information when extracting features, while routing-by-agreement avoids the information loss caused by pooling. In addition, Andrushia et al. [124] successfully applied CapsNet to the detection of diseases in grapevine leaves and concluded that this model is able to extract more useful features than CNNs due to its ability to map hierarchical pose relationships.
Carneiro et al. [49] and Kunduracioglu and Pacal [38] explored transformers in the context of identifying grapevine varieties. Dosovitskiy et al. [88] introduced self-attention to image classification with the ViT models, aiming to repeat the good results obtained by Transformer networks in natural language processing. Raghu et al. [125] compared the representations generated by ViTs and CNNs and showed that ViTs make better use of global information, propagate information strongly between layers and preserve more spatial information. However, these models have to be trained with large amounts of data to match or outperform CNNs in computer vision tasks. Thus, the fine-tuning strategy becomes mandatory for grape variety identification, since it is a task based on small datasets [126]. More details on ViTs can be found in Khan et al. [127]. Carneiro et al. [49] stated that the use of ViTs outperformed previous results, however at the cost of increased computational needs for training and inference. On the other hand, Kunduracioglu and Pacal [38] achieved 100% accuracy for both a CNN (Inception V4) and transformers (Swin Transformers), but Rajab et al. [39] obtained the same result on the same dataset using an older model (VGG-19). Thus, there is still room to explore the impact of transformers on grape variety identification.
An important factor in the results is that all the studies that used pre-trained architectures explored the transfer learning plus fine-tuning configuration, using the pre-trained weights to initialise the models. Recent studies [128,129,130] have pointed out that, despite the excellent results brought by this arrangement, it introduces two inconsistencies in training in the agricultural context: domain shift and supervision collapse. Domain shift refers to the fact that the classification of natural scenes is very different from the classification of agricultural images. Supervision collapse may occur because pre-training was carried out with a fixed set of labels, so the model tends to concentrate on the information relevant to mapping the input data onto those labels, discarding information relevant to the classification of agricultural images. Another limitation of using pre-trained weights from generalised datasets is the fixed type of input: since these models were trained with RGB images, multispectral and hyperspectral images cannot be used as input. Alternatives to overcome these limitations of transfer learning are to replace supervised pre-training with different learning strategies in the agricultural domain, such as unsupervised learning [131], semi-supervised learning [132] or self-supervised learning [133]. As mentioned above, despite the lack of annotated data, there are many publicly available images of grapevines on the web (e.g. the iNaturalist collection), so these approaches can be used to pre-train large DL models with unlabelled data.
Another factor to pay attention to is the natural existence of tail classes in the datasets, since this task can be treated as a long-tailed data distribution classification. Imbalance can be problematic because the model tends to overfit the classes with more samples [134]. In this context, few-shot learning approaches [135] can be applied to minimise the lack of samples for some classes. Carneiro et al. [50] tested the use of Focal Loss to deal with imbalance in the dataset, even though there were no tail classes (between 60 and 75 images per class), but other balanced losses [102,136] can still be tested. Cui et al. [102] introduced the Softmax Class-Balanced cross entropy loss, a modification of the cross entropy loss that introduces a weighting factor inversely proportional to the effective number of samples in a class. Similarly, Park et al. [136] created a loss function that uses an influence measure to identify how each sample affects the biased decisions that cause the model to overfit, assigning weights to each sample accordingly.
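As a sketch of the class-balanced weighting of Cui et al. [102], each class weight can be computed from its sample count; the normalisation to sum to the number of classes is one common convention, assumed here for illustration.

```python
import numpy as np

def class_balanced_weights(samples_per_class, beta=0.999):
    """Per-class loss weights inversely proportional to the effective
    number of samples, (1 - beta^n) / (1 - beta), following the
    class-balanced loss idea; normalised to sum to the class count."""
    n = np.asarray(samples_per_class, dtype=float)
    effective = (1.0 - np.power(beta, n)) / (1.0 - beta)
    weights = 1.0 / effective
    return weights / weights.sum() * len(n)
```

A tail class with 10 images receives a much larger weight than a head class with 1000, steering the cross entropy away from overfitting the majority classes.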
Furthermore, no study has analysed the classification of grape varieties as a fine-grained recognition problem. This family of problems aims to retrieve and recognise images belonging to multiple subordinate categories of a supercategory [137]. Among the techniques that can be applied are Bilinear CNNs [138,139] and Multi-Objective Matrix Normalisation [140]. The idea behind these techniques is to encode high-order statistics into the final features of the models; both use covariance matrix-based representations with the aim of improving results on fine-grained tasks. Other tools that can be explored are loss functions built specifically for fine-grained recognition tasks [141,142,143].
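To illustrate the high-order statistics these techniques encode, the following sketch computes a bilinear-pooling descriptor from a single convolutional feature map. This is a simplified NumPy version: real Bilinear CNNs combine the feature maps of two network streams and are trained end-to-end.

```python
import numpy as np

def bilinear_pooling(feature_map):
    """Bilinear pooling of a (channels, height, width) feature map:
    the outer product of channel vectors pooled over all locations,
    followed by signed square root and L2 normalisation. The resulting
    second-order statistics capture pairwise feature interactions
    useful for fine-grained recognition."""
    c, h, w = feature_map.shape
    x = feature_map.reshape(c, h * w)
    b = (x @ x.T) / (h * w)                           # (c, c) second-order statistics
    b = np.sign(b) * np.sqrt(np.abs(b))               # signed square root
    return (b / (np.linalg.norm(b) + 1e-12)).ravel()  # L2-normalised descriptor
```

The flattened (c × c) descriptor then feeds the final classification layer in place of the usual globally averaged features.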

4.5.2. Machine Learning

Regarding the studies that classified images, classic feature extractors and KAZE were used to represent the images before classification. Other feature descriptors can still be explored, such as SIFT [144] or SURF [145]. In terms of dimensionality reduction, other techniques can also be tested, for example autoencoders, Maximum Variance Unfolding or Local Tangent Space Analysis. Otherwise, most of the classic classifiers have already been tested in this scenario. The best performance was obtained by Garcia et al. [31] (F1 Score of 0.89, see Table 2), using a combination of classic feature extraction based on colour, texture and shape analyses, the embedded method for feature selection and a decision tree as classifier.
On the other hand, the combination of raw spectra (without feature extraction) with SVM or ANN classifiers has been tested by the studies targeting spectra. These studies differ mainly in the way they composed the raw spectra and in the acquisition devices. The best performance was achieved by Xu et al. [32] (accuracy of 99.31%, see Table 2) with an SVM classifier, using spectra constructed from the mean of the values within the leaf area.
The exploitation of 3D point clouds is still at an early stage of development, with only iterative closest point analysis and linear discriminant analysis tested so far. Landa et al. [33] obtained an accuracy of 93%.

4.6. Evaluation

Grad-CAM, LIME, visualisation of learnt features, attention maps and saliency maps were the XAI techniques employed by some of the selected studies. These approaches make it possible to understand what the models are looking at when they make decisions, which is very important for practical applications. Taking the DDR as an example, if a DL-based classifier were used to help regulate production, it should be ensured that decisions rely on information related to the vine, for example leaves, stems and fruit. In addition, these techniques can be used to check the impact of pre-processing steps, as in Liu et al. [51]. Thus, generating explanations for predictions increases human confidence in the classification process. Given that most of the selected studies are centred on images, the rest of this analysis focuses on this type of data.
Grad-CAM and LIME both have limitations. The usability of Grad-CAM is restricted to CNNs, and a layer must be chosen for the calculations; there is a consensus in the literature that the topmost convolutional layers are chosen because they extract higher-level features. Furthermore, Subramanya et al. [146] showed with adversarial attacks that this method cannot necessarily show the reason for a misclassification.
On the other hand, LIME can be used with any model that makes a prediction, since it is based on surrogate models. However, some studies have tested its stability when generating explanations. Stability refers to the ability of the method to generate the same explanation when applied several times to the same sample. When applying LIME to explain image-based classifiers, small perturbations can generate large differences in the explanations, but if fixed parameters are set the method can be stable, even if the surrogate model is not faithful [147,148,149].
Moreover, more recent techniques such as Guided Integrated Gradients [150], XRAI [151] and SmoothGrad [152] can also be used to obtain explanations for the classification.

4.7. Comparison with other Subfields of Precision Viticulture

Precision viticulture includes several tasks that can be improved with computer vision: flower counting, grape detection, berry counting, yield estimation, disease identification and variety identification are examples. Mohimont et al. [25] analysed recent precision viticulture studies that applied DL techniques to all of the above yield-related tasks, excluding variety identification and disease detection. The number of studies the authors found for each individual task does not differ from the number of studies found in the present study. Furthermore, the literature pays more attention to disease detection. To the best of our knowledge, no review has systematically analysed methods for identifying grapevine diseases; however, a simple search of the literature reveals studies that have explored self-supervised learning [153], unsupervised learning [154], the use of multispectral data acquired with UAVs [155], and other recent techniques [156]. This behaviour is repeated in other plant species: since diseases affect food safety, there are more studies aimed at identifying them. At the same time, this implies that research into methods for the automatic identification of grapevine varieties through deep learning is still moving at a slow pace and has much to explore.

5. Conclusion

This study presents a review of studies aimed at identifying grapevine varieties using Machine Learning and Deep Learning approaches. A total of 31 articles were analysed, taking into account the inclusion criteria. The results indicate that the automatic identification of grapevine varieties using DL models is still in its early stages, with considerable room for improvement. Most of the studies focused on identification using images, applying transfer learning and fine-tuning to models pre-trained on large datasets for general scene classification.
From the 31 studies selected in this review, it can be concluded that:
  • (RQ1) How have ML and DL-based techniques been used for the automatic identification of grapevine varieties? Pre-trained architectures for image classification are the way in which DL has been most widely applied to the identification of grapevine varieties. On the other hand, ML has been used to classify images and spectra.
  • (RQ2) What are the best architectures for the automatic identification of grapevine varieties? Since most of the datasets are small and based on images, DL architectures have obtained the best results. Fine-tuning and transfer learning techniques were used in almost all image-based classification studies that utilised DL. The most frequently used pre-trained architecture was EfficientNet, accompanied by cross entropy loss and static geometric data augmentation strategies. In the evaluation phase, in addition to the popular classification metrics, accuracy and F1 Score, Grad-CAM was the most frequently used XAI method.
  • (RQ3) What are the main challenges and future development trends in identifying grape varieties using ML and DL-based models? Considering that the majority of studies have applied DL-based approaches and their superior performance compared to ML strategies, future development trends point towards DL. There is still room to evaluate the removal of complex background in images acquired in the field; the generation of new samples through GANs can be explored; new architectures can be tested, for example Capsule Networks or Bilinear CNNs; different losses can still be explored [102,136,141,142,143]; and other XAI approaches can also be employed, for example Guided Integrated Gradients, XRAI and SmoothGrad.

Author Contributions

Conceptualization, G.A.C., A.C., and J.S.; methodology, G.A.C.; validation, J.S. and A.C.; formal analysis, G.A.C., A.C., and J.S.; investigation, G.A.C.; resources, G.A.C., A.C., and J.S.; data curation, G.A.C.; writing—original draft preparation, G.A.C. and J.S.; writing—review and editing, G.A.C., A.C., and J.S.; visualization, G.A.C.; supervision, J.S. and A.C.; project administration, J.S.; funding acquisition, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research activity was supported by –“DATI - Digital Agriculture Technologies for Irrigation efficiency” project. PRIMA – Partnership for Research and Innovation in the Mediterranean Area, (Research and Innovation activities), financed by the states participating in the PRIMA partnership and by the European Union, through Horizon 2020 and by FCT - Portuguese Foundation for Science and Technology, under the project UIDB/04033/2020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Acc Accuracy
ANN Artificial Neural Network
AUC Area-Under-the-Curve
CapsNet Capsule Networks
CE Cross Entropy
CNN Convolutional Neural Network
DDR Douro Demarcated Region
DL Deep Learning
DNA Deoxyribonucleic Acid
ESRGAN Enhanced Super-Resolution Generative Adversarial Networks
FL Focal Loss
GAN Generative Adversarial Network
Grad-CAM Gradient-weighted Class Activation Mapping
HSI Hyperspectral Imagery
ICA Independent Component Analysis
k-NN k-Nearest Neighbors
LIME Local Interpretable Model-Agnostic Explanations
MCC Matthews Correlation Coefficient
MSC Multiplicative Scatter Correction
PCA Principal Component Analysis
PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RGB Red, Green, Blue
RQ Research Question
SG Savitzky-Golay
SGD Stochastic Gradient Descent
SNV Standard Normal Variate
SVM Support Vector Machine
UAV Unmanned Aerial Vehicle
ViT Vision Transformer
XAI Explainable Artificial Intelligence

References

  1. Eyduran, S.P.; Akin, M.; Ercisli, S.; Eyduran, E.; Maghradze, D. Sugars, organic acids, and phenolic compounds of ancient grape cultivars (Vitis vinifera L.) from Igdir province of Eastern Turkey. Biological Research 2015, 48, 2. [Google Scholar] [CrossRef] [PubMed]
  2. Nascimento, R.; Maia, M.; Ferreira, A.E.N.; Silva, A.B.; Freire, A.P.; Cordeiro, C.; Silva, M.S.; Figueiredo, A. Early stage metabolic events associated with the establishment of Vitis vinifera–Plasmopara viticola compatible interaction. Plant Physiology and Biochemistry 2019, 137, 1–13. [Google Scholar] [CrossRef] [PubMed]
  3. Cunha, J.; Santos, M.T.; Carneiro, L.C.; Fevereiro, P.; Eiras-Dias, J.E. Portuguese traditional grapevine cultivars and wild vines (Vitis vinifera L.) share morphological and genetic traits. Genetic Resources and Crop Evolution 2009, 56, 975–989. [Google Scholar] [CrossRef]
  4. Schneider, A.; Carra, A.; Akkak, A.; This, P.; Laucou, V.; Botta, R. Verifying synonymies between grape cultivars from France and Northwestern Italy using molecular markers. VITIS - Journal of Grapevine Research 2001, 40, 197–197. [Google Scholar] [CrossRef]
  5. Lacombe, T. Contribution à l’étude de l’histoire évolutive de la vigne cultivée (Vitis vinifera L.) par l’analyse de la diversité génétique neutre et de gènes d’intérêt. PhD thesis, Institut National d’Etudes Supérieures Agronomiques de Montpellier, 2012.
  6. Distribution of the world’s grapevine varieties. Technical Report 979-10-91799-89-8, International Organisation of Vine and Wine.
  7. Koklu, M.; Unlersen, M.F.; Ozkan, I.A.; Aslan, M.F.; Sabanci, K. A CNN-SVM study based on selected deep features for grapevine leaves classification. Measurement 2022, 188, 110425. [Google Scholar] [CrossRef]
  8. Moncayo, S.; Rosales, J.D.; Izquierdo-Hornillos, R.; Anzano, J.; Caceres, J.O. Classification of red wine based on its protected designation of origin (PDO) using Laser-induced Breakdown Spectroscopy (LIBS). Talanta 2016, 158, 185–191. [Google Scholar] [CrossRef] [PubMed]
  9. Giacosa, E. Wine Consumption in a Certain Territory. Which Factors May Have Impact on It? In Production and Management of Beverages; Woodhead Publishing, 2019; pp. 361–380. [CrossRef]
  10. International Organisation of Vine and Wine. State of the World Vitivinicultural Sector in 2020; 2020. [Google Scholar]
  11. Jones, G.; Alves, F. Impact of climate change on wine production: a global overview and regional assessment in the Douro Valley of Portugal. Int. J. of Global Warming 2012, 4, 383–406. [Google Scholar] [CrossRef]
  12. Chitwood, D.H.; Ranjan, A.; Martinez, C.C.; Headland, L.R.; Thiem, T.; Kumar, R.; Covington, M.F.; Hatcher, T.; Naylor, D.T.; Zimmerman, S.; et al. A Modern Ampelography: A Genetic Basis for Leaf Shape and Venation Patterning in Grape. Plant Physiology 2014, 164, 259–272. [Google Scholar] [CrossRef]
  13. Galet, P. Précis d’ampélographie pratique; 1952.
  14. Garcia-Muñoz, S.; Muñoz-Organero, G.; de Andrés, M.; Cabello, F. Ampelography - An old technique with future uses: the case of minor varieties of Vitis vinifera L. from the Balearic Islands. Journal International des Sciences de la Vigne et du Vin 2011, 45, 125–137. [Google Scholar] [CrossRef]
  15. Pavek, D.S.; Lamboy, W.F.; Garvey, E.J. Selecting in situ conservation sites for grape genetic resources in the USA. Genetic Resources and Crop Evolution 2003 50:2 2003, 50, 165–173. [Google Scholar] [CrossRef]
  16. Tassie, L. Vine identification – knowing what you have. The Grape and Wine Research and Development Corporation (GWRDC) Innovators network, 2010. [Google Scholar]
  17. This, P.; Jung, A.; Boccacci, P.; Borrego, J.; Botta, R.; Costantini, L.; Crespan, M.; Dangl, G.S.; Eisenheld, C.; Ferreira-Monteiro, F.; et al. Development of a standard set of microsatellite reference alleles for identification of grape cultivars. Theoretical and Applied Genetics 2004, 109, 1448–1458. [Google Scholar] [CrossRef] [PubMed]
  18. Calo, A.; Tomasi, D.; Crespan, M.; Costacurta, A. Relationship between environmental factors and the dynamics of growth and composition of the grapevine. Acta Horticulturae 1996, 427, 217–231. [Google Scholar] [CrossRef]
  19. Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A Review on Deep Learning Techniques Applied to Semantic Segmentation 2017.
  20. Gutiérrez, S.; Tardaguila, J.; Fernández-Novales, J.; Diago, M.P. Support Vector Machine and Artificial Neural Network Models for the Classification of Grapevine Varieties Using a Portable NIR Spectrophotometer. PLOS ONE 2015, 10, e0143197, Publisher: Public Library of Science. [Google Scholar] [CrossRef] [PubMed]
  21. Karakizi, C.; Oikonomou, M.; Karantzalos, K. Vineyard Detection and Vine Variety Discrimination from Very High Resolution Satellite Data. Remote Sensing 2016, 8, 235, Number:3 Publisher: Multidisciplinary Digital Publishing Institute. [Google Scholar] [CrossRef]
  22. Diago, M.P.; Fernandes, A.M.; Millan, B.; Tardaguila, J.; Melo-Pinto, P. Identification of grapevine varieties using leaf spectroscopy and partial least squares. Computers and Electronics in Agriculture 2013, 99, 7–13. [Google Scholar] [CrossRef]
  23. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Curran Associates, Inc., 2012, Vol. 25.
  24. Chen, Y.; Huang, Y.; Zhang, Z.; Wang, Z.; Liu, B.; Liu, C.; Huang, C.; Dong, S.; Pu, X.; Wan, F.; et al. Plant image recognition with deep learning: A review. Computers and Electronics in Agriculture 2023, 212, 108072. [Google Scholar] [CrossRef]
  25. Mohimont, L.; Alin, F.; Rondeau, M.; Gaveau, N.; Steffenel, L.A. Computer Vision and Deep Learning for Precision Viticulture. Agronomy 2022, 12, 2463, Number:10 Publisher: Multidisciplinary Digital Publishing Institute. [Google Scholar] [CrossRef]
  26. Ferro, M.V.; Catania, P. Technologies and Innovative Methods for Precision Viticulture: A Comprehensive Review. Horticulturae 2023, 9, 399, Number:3 Publisher: Multidisciplinary Digital Publishing Institute. [Google Scholar] [CrossRef]
  27. Chai, J.; Zeng, H.; Li, A.; Ngai, E.W.T. Deep learning in computer vision: A critical review of emerging techniques and application scenarios. Machine Learning with Applications 2021, 6, 100134. [Google Scholar] [CrossRef]
  28. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Computing Surveys 2022, 54, 1–41, arXiv:2101.01169[cs]. [Google Scholar] [CrossRef]
  29. Dhanya, V.G.; Subeesh, A.; Kushwaha, N.L.; Vishwakarma, D.K.; Nagesh Kumar, T.; Ritika, G.; Singh, A.N. Deep learning based computer vision approaches for smart agricultural applications. Artificial Intelligence in Agriculture 2022, 6, 211–229. [Google Scholar] [CrossRef]
  30. Abbasi, A.A.; Jalal, A. Data Driven Approach to Leaf Recognition: Logistic Regression for Smart Agriculture. In Proceedings of the 2024 5th International Conference on Advancements in Computational Sciences, ICACS 2024. Type: Conference paper. [CrossRef]
  31. Garcia, L.C.; Concepcion, R.; Dadios, E.; Dulay, A.E. Spectro-morphological Feature-based Machine Learning Approach for Grape Leaf Variety Classification. In Proceedings of the 2022 IEEE 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management, HNICEM 2022. Type: Conference paper. [CrossRef]
  32. Xu, M.; Sun, J.; Zhou, X.; Tang, N.; Shen, J.; Wu, X. Research on nondestructive identification of grape varieties based on EEMD-DWT and hyperspectral image. Journal of Food Science 2021, 86, 2011–2023. [Google Scholar] [CrossRef] [PubMed]
  33. Landa, V.; Shapira, Y.; David, M.; Karasik, A.; Weiss, E.; Reuveni, Y.; Drori, E. Accurate classification of fresh and charred grape seeds to the varietal level, using machine learning based classification method. Scientific Reports 2021, 11. [Google Scholar] [CrossRef] [PubMed]
  34. Marques, P.; Pádua, L.; Adão, T.; Hruska, J.; Sousa, J.; Peres, E.; Sousa, J.J.; Morais, R.; Sousa, A. Grapevine Varieties Classification Using Machine Learning. Progress in Artificial Intelligence, EPIA 2019, Part I 2019, 11804, 186–199. [Google Scholar] [CrossRef]
  35. Gutiérrez, S.; Fernández-Novales, J.; Diago, M.P.; Tardaguila, J. On-The-Go Hyperspectral Imaging Under Field Conditions and Machine Learning for the Classification of Grapevine Varieties. Frontiers in Plant Science 2018, 9. [Google Scholar] [CrossRef] [PubMed]
  36. Fuentes, S.; Hernandez-Montes, E.; Escalona, J.M.; Bota, J.; Viejo, C.G.; Poblete-Echeverria, C.; Tongson, E.; Medrano, H. Automated grapevine cultivar classification based on machine learning using leaf morpho-colorimetry, fractal dimension and near-infrared spectroscopy parameters. Computers and Electronics in Agriculture 2018, 151, 311–318. [Google Scholar] [CrossRef]
  37. De Nart, D.; Gardiman, M.; Alba, V.; Tarricone, L.; Storchi, P.; Roccotelli, S.; Ammoniaci, M.; Tosi, V.; Perria, R.; Carraro, R. Vine variety identification through leaf image classification: a large-scale study on the robustness of five deep learning models. Journal of Agricultural Science 2024. [Google Scholar] [CrossRef]
  38. Kunduracioglu, I.; Pacal, I. Advancements in deep learning for accurate classification of grape leaves and diagnosis of grape diseases. Journal of Plant Diseases and Protection. [CrossRef]
  39. Rajab, M.A.; Abdullatif, F.A.; Sutikno, T. Classification of grapevine leaves images using VGG-16 and VGG-19 deep learning nets. Telkomnika (Telecommunication Computing Electronics and Control) 2024, 22, 445–453. [Google Scholar] [CrossRef]
  40. Doğan, G.; Imak, A.; Ergen, B.; Sengur, A. A new hybrid approach for grapevine leaves recognition based on ESRGAN data augmentation and GASVM feature selection. Neural Computing and Applications 2024. [Google Scholar] [CrossRef]
  41. Sun, Y.; Tian, B.; Ni, C.; Wang, X.; Fei, C.; Chen, Q. Image classification of small sample grape leaves based on deep learning. In Proceedings of the ITOEC 2023 - IEEE 7th Information Technology and Mechatronics Engineering Conference; 2023; pp. 1874–1878, Type: Conferencepaper. [Google Scholar] [CrossRef]
  42. Lv, Q. Classification of Grapevine Leaf Images with Deep Learning Ensemble Models. In Proceedings of the 2023 4th International Conference on Computer Vision, Image and Deep Learning, CVIDL 2023, 2023, pp. 191–194 Type: Conference paper. [Google Scholar] [CrossRef]
  43. Magalhaes, S.C.; Castro, L.; Rodrigues, L.; Padilha, T.C.; Carvalho, F.D.; Santos, F.N.D.; Pinho, T.; Moreira, G.; Cunha, J.; Cunha, M.; et al. Toward Grapevine Digital Ampelometry Through Vision Deep Learning Models. IEEE Sensors Journal 2023, 23, 10132–10139. [Google Scholar] [CrossRef]
  44. Carneiro, G.; Neto, A.; Teixeira, A.; Cunha, A.; Sousa, J. Evaluating Data Augmentation for Grapevine Varieties Identification. 2023, 3566–3569. [Google Scholar] [CrossRef]
  45. Carneiro, G.A.; Texeira, A.; Morais, R.; Sousa, J.J.; Cunha, A. Can the Segmentation Improve the Grape Varieties’ Identification Through Images Acquired On-Field? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023; 14116 LNAI, 351–363. [Google Scholar]
  46. Gupta, R.; Gill, K.S. Grapevine Augmentation and Classification using Enhanced EfficientNetB5 Model. 2023 IEEE Renewable Energy and Sustainable E-Mobility Conference, RESEM 2023. [CrossRef]
  47. Carneiro, G.A.; Padua, L.; Peres, E.; Morais, R.; Sousa, J.J.; Cunha, A. Segmentation as a Preprocessing Tool for Automatic Grapevine Classification. International Geoscience and Remote Sensing Symposium (IGARSS), 2022; 6053–6056. [Google Scholar] [CrossRef]
  48. Carneiro, G.S.; Ferreira, A.; Morais, R.; Sousa, J.J.; Cunha, A. Analyzing the Fine Tuning’s impact in Grapevine Classification. Procedia Computer Science 2022, 196, 364–370. [Google Scholar] [CrossRef]
  49. Carneiro, G.A.; Pádua, L.; Peres, E.; Morais, R.; Sousa, J.J.; Cunha, A. Grapevine Varieties Identification Using Vision Transformers. In Proceedings of the IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium; 2022; pp. 5866–5869. [Google Scholar] [CrossRef]
  50. Carneiro, G.; Padua, L.; Sousa, J.J.; Peres, E.; Morais, R.; Cunha, A. Grapevine Variety Identification Through Grapevine Leaf Images Acquired in Natural Environment. 2021, 7055–7058. [Google Scholar] [CrossRef]
  51. Liu, Y.; Su, J.; Shen, L.; Lu, N.; Fang, Y.; Liu, F.; Song, Y.; Su, B. Development of a mobile application for identification of grapevine (Vitis vinifera l.) cultivars via deep learning. International Journal of Agricultural and Biological Engineering 2021, 14, 172–179. [Google Scholar] [CrossRef]
  52. Škrabánek, P.; Doležel, P.; Matoušek, R.; Junek, P. RGB Images Driven Recognition of Grapevine Varieties. Springer, 9 2021, Vol. 1268 AISC, Advances in Intelligent Systems and Computing, pp. 216–225. [CrossRef]
  53. Nasiri, A.; Taheri-Garavand, A.; Fanourakis, D.; Zhang, Y.D.; Nikoloudakis, N. Automated grapevine cultivar identification via leaf imaging and deep convolutional neural networks: A proof-of-concept study employing primary iranian varieties. Plants 2021, 10. [Google Scholar] [CrossRef] [PubMed]
  54. Peng, Y.; Zhao, S.; Liu, J. Fused Deep Features-Based Grape Varieties Identification Using Support Vector Machine. Agriculture 2021, 11, 869. [Google Scholar] [CrossRef]
  55. Franczyk, B.; Hernes, M.; Kozierkiewicz, A.; Kozina, A.; Pietranik, M.; Roemer, I.; Schieck, M. Deep learning for grape variety recognition. Procedia Computer Science 2020, 176, 1211–1220. [Google Scholar] [CrossRef]
  56. Fernandes, A.M.; Utkin, A.B.; Eiras-Dias, J.; Cunha, J.; Silvestre, J.; Melo-Pinto, P. Grapevine variety identification using “Big Data” collected with miniaturized spectrometer combined with support vector machines and convolutional neural networks. Computers and Electronics in Agriculture 2019, 163, 104855. [Google Scholar] [CrossRef]
  57. Adão, T.; Pinho, T.M.; Ferreira, A.; Sousa, A.; Pádua, L.; Sousa, J.; Sousa, J.J.; Peres, E.; Morais, R. Digital ampelographer: A CNN based preliminary approach, Cham, 2019; Vol. 11804 LNAI, pp. 258–271. [CrossRef]
  58. Pereira, C.S.; Morais, R.; Reis, M.J.C.S. Deep learning techniques for grape plant species identification in natural images. Sensors (Switzerland) 2019, 19, 4850. [Google Scholar] [CrossRef]
  59. van Eck, N.J.; Waltman, L. VOS: A New Method for Visualizing Similarities Between Objects. In Proceedings of the Advances in Data Analysis; Decker, R.; Lenz, H.J., Eds., Berlin, Heidelberg; 2007; pp. 299–306. [Google Scholar] [CrossRef]
  60. Ahmed, H.A.; Hama, H.M.; Jalal, S.I.; Ahmed, M.H. Deep Learning in Grapevine Leaves Varieties Classification Based on Dense Convolutional Network. Journal of Image and Graphics(United Kingdom) 2023, 11, 98–103. [Google Scholar] [CrossRef]
  61. Santos, T.T.; de Souza, L.L.; dos Santos, A.A.; Avila, S. Embrapa Wine Grape Instance Segmentation Dataset – Embrapa WGISD, 2019. [CrossRef]
  62. Vlah, M. Grapevine Leaves, 2021. [CrossRef]
  63. Al-khazraji, L.R.; Mohammed, M.A.; Abd, D.H.; Khan, W.; Khan, B.; Hussain, A.J. Image dataset of important grape varieties in the commercial and consumer market. Data in Brief 2023, 47, 108906. [Google Scholar] [CrossRef]
  64. Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. wGrapeUNIPD-DL: An open dataset for white grape bunch detection. Data in Brief 2022, 43, 108466. [Google Scholar] [CrossRef]
  65. Seng, K.P.; Ang, L.M.; Schmidtke, L.M.; Rogiers, S.Y. Computer vision and machine learning for viticulture technology. IEEE Access 2018, 6, 67494–67510. [Google Scholar] [CrossRef]
  66. Rodrigues, A. Um método filométrico de caracterização ampelográfica 1952.
  67. Organisation internationale de La Vigne et du Vin. International List of Vine Varieties and Their Synonyms, 2013.
  68. Hyvärinen, A.; Oja, E. A Fast Fixed-Point Algorithm for Independent Component Analysis. Neural Computation 1997, 9, 1483–1492. [Google Scholar] [CrossRef]
  69. Canny, J. A Computational Approach to Edge Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986; PAMI-8, 679–698. [Google Scholar] [CrossRef]
  70. Zheng, X.; Wang, X. Leaf Vein Extraction Based on Gray-scale Morphology. International Journal of Image, Graphics and Signal Processing 2010, 2, 25–31. [Google Scholar] [CrossRef]
  71. Pereira, C.S.; Morais, R.; Reis, M.J. Pixel-Based Leaf Segmentation from Natural Vineyard Images Using Color Model and Threshold Techniques. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018; 10882 LNCS, 96–106. [Google Scholar] [CrossRef]
  72. Du, Q.; Kopriva, I.; Szu, H. Independent-component analysis for hyperspectral remote sensing imagery classification. Optical Engineering - OPT ENG 2006, 45. [Google Scholar] [CrossRef]
  73. Vaseghi, S.; Jetelova, H. Principal and independent component analysis in image processing. Proceeding of the 14th ACM International Conference on Mobile Computing and Networking 2006. [Google Scholar]
  74. Otsu, N. A Threshold Selection Method from Gray-Level Histograms. IEEE Transactions on Systems, Man, and Cybernetics 1979, 9, 62–66. [Google Scholar] [CrossRef]
  75. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. CoRR, 1505; abs/1505.0. [Google Scholar]
  76. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39, 2481–2495, arXiv: 1511.00561. [Google Scholar] [CrossRef]
  77. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Loy, C.C.; Qiao, Y.; Tang, X. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks, 2018. arXiv:1809. 0021. [Google Scholar] [CrossRef]
  78. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Communications of the ACM 2014, 63, 139–144. [Google Scholar] [CrossRef]
  79. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2014. [Google Scholar]
  80. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016; pp. 770–778. [CrossRef]
  81. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017; pp. 2261–2269. [CrossRef]
  82. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. 2017, pp. 1800–1807. [CrossRef]
  83. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 4510. [Google Scholar] [CrossRef]
  84. Tan, M.; Le, Q.V. EfficientNetV2: Smaller Models and Faster Training, 2021. arXiv:2104. 0029. [Google Scholar] [CrossRef]
  85. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision, 2015. arXiv:1512. 0056. [Google Scholar] [CrossRef]
  86. Szegedy, C.; Ioffe, S.; Vanhoucke, V. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR 2016, arXiv:1602.07261. [Google Scholar]
  87. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 2021 8:1 2021, 8, 1–74. [Google Scholar] [CrossRef]
  88. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale 2020.
  89. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, 2021. arXiv:2103. 1403. [Google Scholar] [CrossRef]
  90. Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer, 2022. arXiv:2110. 0217. [Google Scholar] [CrossRef]
  91. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention, 2021. arXiv:2012. 1287. [Google Scholar] [CrossRef]
  92. Tu, Z.; Talebi, H.; Zhang, H.; Yang, F.; Milanfar, P.; Bovik, A.; Li, Y. MaxViT: Multi-Axis Vision Transformer, 2022. arXiv:2204. 0169. [Google Scholar] [CrossRef]
  93. Sun, S.Q.; Zeng, S.G.; Liu, Y.; Heng, P.A.; Xia, D.S. A new method of feature fusion and its application in image recognition. Pattern Recognition 2005, 38, 2437–2448. [Google Scholar] [CrossRef]
  94. Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization, 2015.
  95. Lin, T.Y.; Goyal, P.; Girshick, R.B.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. CoRR, 2017; abs/1708.0. [Google Scholar]
  96. Mukhoti, J.; Kulharia, V.; Sanyal, A.; Golodetz, S.; Torr, P.H.; Dokania, P.K. Calibrating deep neural networks using focal loss. Neural information processing systems foundation, 2 2020, Vol. 2020-Decem.
  97. Alcantarilla, P.F.; Bartoli, A.; Davison, A.J. KAZE Features. In Proceedings of the Computer Vision – ECCV 2012; Fitzgibbon, A.; Lazebnik, S.; Perona, P.; Sato, Y.; Schmid, C., Eds., Berlin, Heidelberg; 2012; pp. 214–227. [Google Scholar] [CrossRef]
  98. Tareen, S.A.K.; Saleem, Z. A comparative analysis of SIFT, SURF, KAZE, AKAZE, ORB, and BRISK. In Proceedings of the 2018 International Conference on Computing, Mathematics and Engineering Technologies (iCoMET); 2018; pp. 1–10. [Google Scholar] [CrossRef]
  99. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  100. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. International Journal of Computer Vision 2016, 128, 336–359. [Google Scholar] [CrossRef]
  101. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Association for Computing Machinery, 2 2016, Vol. 13-17-Augu, pp. 97–101. [CrossRef]
  102. Cui, Y.; Jia, M.; Lin, T.Y.; Song, Y.; Belongie, S. Class-Balanced Loss Based on Effective Number of Samples. 2019, pp. 9268–9277.
  103. Barratt, S.; Sharma, R. A Note on the Inception Score 2018. [CrossRef]
  104. Ravuri, S.; Vinyals, O. Seeing is Not Necessarily Believing: Limitations of BigGANs for Data Augmentation. 2019, pp. 1–5.
  105. Shmelkov, K.; Schmid, C.; Alahari, K. How good is my GAN? Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2018, 11206 LNCS, 218–234. [CrossRef]
  106. Gomaa, A.A.; El-Latif, Y.M. Early Prediction of Plant Diseases using CNN and GANs. International Journal of Advanced Computer Science and Applications 2021, 12, 514–519. [Google Scholar] [CrossRef]
  107. Nazki, H.; Lee, J.; Yoon, S.; Park, D.S. Image-to-Image Translation with GAN for Synthetic Data Augmentation in Plant Disease Datasets. Smart Media Journal 2019, 8, 46–57. [Google Scholar] [CrossRef]
  108. Talukdar, B. Handling of Class Imbalance for Plant Disease Classification with Variants of GANs. 2020, pp. 466–471. [CrossRef]
  109. Yilma, G.; Belay, S.; Qin, Z.; Gedamu, K.; Ayalew, M. Plant Disease Classification Using Two Pathway Encoder GAN Data Generation. 2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing, ICCWAMTIP 2020, pp. 67–72. [CrossRef]
  110. Zeng, Q.; Ma, X.; Cheng, B.; Zhou, E.; Pang, W. GANS-based data augmentation for citrus disease severity detection using deep learning. IEEE Access 2020, 8, 172882–172891. [Google Scholar] [CrossRef]
  111. Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. AutoAugment: Learning Augmentation Policies from Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019; pp. 113–123. [Google Scholar] [CrossRef]
  112. Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization, 2018. arXiv:1710. 0941. [Google Scholar] [CrossRef]
  113. DeVries, T.; Taylor, G.W. Dataset Augmentation in Feature Space. 2017; arXiv:stat.ML/1702.05538]. [Google Scholar]
  114. Chu, P.; Bian, X.; Liu, S.; Ling, H. Feature Space Augmentation for Long-Tailed Data, 2020. arXiv:2008. 0367. [Google Scholar] [CrossRef]
  115. Ferentinos, K.P. Deep learning models for plant disease detection and diagnosis. Computers and Electronics in Agriculture 2018, 145, 311–318. [Google Scholar] [CrossRef]
  116. Kc, K.; Yin, Z.; Li, D.; Wu, Z. Impacts of Background Removal on Convolutional Neural Networks for Plant Disease Classification In-Situ. Agriculture 2021, 11, 827, Number:9 Publisher: Multidisciplinary Digital Publishing Institute. [Google Scholar] [CrossRef]
  117. Wu, Y.J.; Tsai, C.M.; Shih, F. Improving Leaf Classification Rate via Background Removal and ROI Extraction. Journal of Image and Graphics 2016, 4, 93–98. [Google Scholar] [CrossRef]
  118. Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for mobileNetV3. Institute of Electrical and Electronics Engineers Inc., 5 2019, Vol. 2019-Octob, pp. 1314–1324. [CrossRef]
  119. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. 36th International Conference on Machine Learning, ICML 2019, 2019-June, 10691–10700.
  120. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s, 2022. arXiv:2201. 0354. [Google Scholar] [CrossRef]
  121. Woo, S.; Debnath, S.; Hu, R.; Chen, X.; Liu, Z.; Kweon, I.S.; Xie, S. ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders, 2023. [CrossRef]
  122. Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic Routing Between Capsules. Advances in Neural Information Processing Systems, 2017, 2017-December, 3857–3867.
  123. Kwabena Patrick, M.; Felix Adekoya, A.; Abra Mighty, A.; Edward, B.Y. Capsule Networks – A survey. Journal of King Saud University - Computer and Information Sciences 2022, 34, 1295–1310. [Google Scholar] [CrossRef]
  124. Andrushia, A.D.; Neebha, T.M.; Patricia, A.T.; Sagayam, K.M.; Pramanik, S. Capsule network-based disease classification for Vitis Vinifera leaves. Neural Computing and Applications 2024, 36, 757–772. [Google Scholar] [CrossRef]
  125. Raghu, M.; Unterthiner, T.; Kornblith, S.; Zhang, C.; Dosovitskiy, A. Do Vision Transformers See Like Convolutional Neural Networks? 2021.
  126. Steiner, A.; Kolesnikov, A.; Zhai, X.; Wightman, R.; Uszkoreit, J.; Beyer, L. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers 2021.
  127. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Computing Surveys 2021. [Google Scholar] [CrossRef]
  128. El-Nouby, A.; Izacard, G.; Touvron, H.; Laptev, I.; Jegou, H.; Grave, E. Are Large-scale Datasets Necessary for Self-Supervised Pre-training?, 2021. arXiv:2112. 1074. [Google Scholar] [CrossRef]
  129. Doersch, C.; Gupta, A.; Zisserman, A. CrossTransformers: spatially-aware few-shot transfer, 2021. arXiv:2007. 1149. [Google Scholar] [CrossRef]
  130. Lu, Y.; Young, S. A survey of public datasets for computer vision tasks in precision agriculture. Computers and Electronics in Agriculture 2020, 178, 105760. [Google Scholar] [CrossRef]
  131. Nazki, H.; Yoon, S.; Fuentes, A.; Park, D.S. Unsupervised image translation using adversarial networks for improved plant disease recognition. Computers and Electronics in Agriculture 2020, 168, 105117. [Google Scholar] [CrossRef]
  132. Homan, D.; du Preez, J.A. Automated feature-specific tree species identification from natural images using deep semi-supervised learning. Ecological Informatics 2021, 66, 101475. [Google Scholar] [CrossRef]
  133. Güldenring, R.; Nalpantidis, L. Self-supervised contrastive learning on agricultural images. Computers and Electronics in Agriculture 2021, 191. [Google Scholar] [CrossRef]
Figure 1. Results of the Scopus and Web of Science queries on studies applying machine and deep learning to grapevine variety identification, without date filtering. Details of the queries are shown in Section 2.3.
Figure 2. PRISMA workflow applied in this study.
Figure 3. Years, countries of origin of the datasets, and focus of the selected studies. Most of the studies were published in 2021. Portugal was the largest source of datasets. The studies focus on leaves rather than fruits or seeds to identify grape varieties.
Figure 4. Co-occurrences of keywords for the included studies. Figure generated with VOSViewer [59].
Figure 5. Classic pipeline for training DL-based models, from data acquisition to model evaluation.
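The stages in Figure 5 can be illustrated with a short, self-contained sketch. The code below is purely illustrative and not taken from any reviewed study: a scikit-learn MLP on synthetic feature vectors stands in for a CNN on leaf images, and all sizes and hyperparameters are arbitrary assumptions.

```python
# Illustrative sketch of the Figure 5 pipeline: acquisition -> preprocessing
# -> train/test split -> training -> evaluation. Synthetic data and an MLP
# stand in for leaf images and a CNN; every value here is an assumption.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# 1. "Acquisition": 300 flattened 8x8 patches for 3 hypothetical varieties.
X = rng.normal(size=(300, 64))
y = rng.integers(0, 3, size=300)
X += y[:, None] * 0.5  # shift class means so the classes are separable

# 2. Preprocessing: standardise the features.
X = StandardScaler().fit_transform(X)

# 3. Split the data, keeping class proportions in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# 4. Train a small neural network.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)

# 5. Evaluate with macro F1, the metric most reviewed studies report.
score = f1_score(y_te, clf.predict(X_te), average="macro")
```

Replacing steps 1 and 4 with an image loader and a convolutional network yields the full pipeline of Figure 5.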
Figure 6. Example images for each publicly available dataset. The images have been cropped to squares, so they do not reflect the original dimensions of the images in the dataset.
Figure 7. Example images for each publicly available dataset. The images have been cropped to squares, so they do not reflect the original dimensions of the images in the dataset.
Table 1. Details of the search carried out. Access to both search engines was made on April 22, 2024.
| Database | Website | Query |
|---|---|---|
| Scopus | https://www.scopus.com/home.uri | TITLE-ABS-KEY (("grape variety" OR "grapevine") AND ("classification" OR "identification" OR "detection") AND "deep learning" OR "machine learning") |
| Web of Science | https://www.webofscience.com/ | TS=(("grape variety" OR "grapevine") AND ("classification" OR "identification" OR "detection") AND ("deep learning" OR "machine learning")) |
Table 2. A summary of the selected studies that used ML to classify grape varieties. The year, location of the data set, description, the part of the plant where the images were focused, the feature extractors and classifiers used, as well as the results are presented.
| Study | Year | Data Location | Dataset Description | Focus | Feature Extractor | Classifiers | Results |
|---|---|---|---|---|---|---|---|
| Abassi and Jalal [30] | 2024 | Turkey | 500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | KAZE and Blob | Softmax Regression | 83.20 (Acc) |
| Garcia et al. [31] | 2022 | Philippines | 1149 RGB images distributed between 7 classes, acquired in a controlled environment | Leaves | Color, texture, and shape measurement analysis | SVM, k-NN, and Decision Tree | 89.00 (F1) |
| Xu et al. [32] | 2021 | China | 480 spectra distributed between 4 classes, acquired in a controlled environment | Fruits | Raw signature | SVM | 99.31 (Acc) |
| Landa et al. [33] | 2021 | Israel | 400 3D point clouds distributed between 8 classes, acquired in a controlled environment | Seeds | Pair-wise registration using Iterative Closest Point | Linear Discriminant Analysis | 93.00 (Acc) |
| Marques et al. [34] | 2019 | Portugal | 240 RGB images distributed between 3 classes, acquired in a controlled environment | Leaves | Color and shape features | Linear Discriminant Analysis, Logistic Regression, k-NN, Decision Tree, Gaussian Naive Bayes, SVM | 86.90 (F1) |
| Gutiérrez et al. [35] | 2018 | Italy | 2400 spectra distributed between 30 varieties, acquired in the field using a vehicle | Leaves | Raw signature | SVM and ANN | 0.99 (F1) |
| Fuentes et al. [36] | 2018 | Spain | 138 RGB images and 144 spectra distributed between 16 varieties, acquired in a controlled environment | Leaves | Fractal dimensions, color and shape measurements; raw spectra | ANN | 71.44 (Acc) |
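Several studies in Table 2 pair handcrafted colour features with a classical classifier such as an SVM or k-NN. A minimal, hypothetical sketch of that idea, with per-channel colour histograms fed to scikit-learn's SVC, is shown below; the synthetic leaf generator and all parameters are illustrative assumptions, not a reproduction of any cited method.

```python
import numpy as np
from sklearn.svm import SVC

def color_histogram(img, bins=8):
    """Concatenate per-channel histograms of an HxWx3 uint8 image."""
    feats = [np.histogram(img[..., c], bins=bins, range=(0, 256),
                          density=True)[0] for c in range(3)]
    return np.concatenate(feats)

rng = np.random.default_rng(1)

def fake_leaf(label):
    """Synthetic stand-in for a leaf photo: class 0 greener, class 1 redder."""
    img = rng.integers(0, 120, size=(32, 32, 3))
    img[..., 1 if label == 0 else 0] += 120  # boost green or red channel
    return img.astype(np.uint8)

labels = rng.integers(0, 2, size=60)
X = np.stack([color_histogram(fake_leaf(lab)) for lab in labels])

# Train on the first 40 samples, test on the remaining 20.
clf = SVC(kernel="rbf").fit(X[:40], labels[:40])
acc = (clf.predict(X[40:]) == labels[40:]).mean()
```

Real systems replace `fake_leaf` with segmented leaf images and usually add texture and shape descriptors to the feature vector.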
Table 3. A summary of the selected studies that used DL to classify grape varieties published between 2023 and 2024. The year, location of the data set, description, the part of the plant where the images were focused, the architectures used, as well as the results are presented.
| Study | Year | Data Location | Dataset Description | Focus | Architecture | Results |
|---|---|---|---|---|---|---|
| De Nart et al. [37] | 2024 | Italy | 26382 RGB images distributed between 27 classes, acquired in the field and in a controlled environment | Leaves | MobileNetV2, EfficientNet, ResNet, Inception ResNet V2, and Inception V3 | 1.00 (Acc) |
| Kunduracioglu and Pacal [38] | 2024 | Turkey | 500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | VGG-16, ResNet, Xception, Inception, EfficientNetV2, DenseNet, Swin Transformer, MobileViT, ViT, DeiT, MaxViT | 100.00 (F1) |
| Rajab et al. [39] | 2024 | Turkey | 500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | VGG-16 and VGG-19 | 100.00 (Acc) |
| Doğan et al. [40] | 2024 | Turkey | 7000 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | Fused deep features + SVM | 1.00 (F1) |
| Sun et al. [41] | 2023 | Turkey | 500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | Handcrafted features | 91.58 (F1) |
| Lv [42] | 2023 | Turkey | 2800 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | VGG-19, ViT, Inception ResNet, DenseNet, ResNeXt | 0.98 (F1) |
| Magalhães et al. [43] | 2023 | Portugal | 40428 RGB images distributed between 26 classes, acquired in a controlled environment | Leaves | MobileNetV2, ResNet-34, and VGG-11 | 94.75 (F1) |
| Carneiro et al. [44] | 2023 | Portugal | 6216 RGB images distributed between 14 classes, acquired in the field | Leaves | EfficientNetV2S | 0.89 (F1) |
| Carneiro et al. [45] | 2023 | Portugal | 675 RGB images distributed between 12 classes; 4354 RGB images distributed between 14 classes; both acquired in the field | Leaves | EfficientNetV2S | 0.88 (F1) |
| Gupta and Gill [46] | 2023 | Turkey | 2500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | EfficientNetB5 | 0.86 (Acc) |
Table 4. A summary of the selected studies that used DL to classify grape varieties published between 2018 and 2022. The year, location of the data set, description, the part of the plant where the images were focused, the architectures used, as well as the results are presented.
| Study | Year | Data Location | Dataset Description | Focus | Architecture | Results |
|---|---|---|---|---|---|---|
| Ahmed et al. [60] | 2022 | Turkey | 500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | DenseNet201 | 98.02 (F1) |
| Carneiro et al. [47] | 2022 | Portugal | 28427 RGB images distributed between 6 classes, acquired in the field | Leaves | Xception | 0.92 (F1) |
| Carneiro et al. [48] | 2022 | Portugal | 6922 RGB images distributed between 12 classes, acquired in the field | Leaves | Xception | 0.92 (F1) |
| Carneiro et al. [49] | 2022 | Portugal | 6922 RGB images distributed between 12 classes, acquired in the field | Leaves | Vision Transformer (ViT_B) | 0.96 (F1) |
| Koklu et al. [7] | 2022 | Turkey | 2500 RGB images distributed between 5 classes, acquired in a controlled environment | Leaves | MobileNetV2 + SVM | 97.60 (Acc) |
| Carneiro et al. [50] | 2021 | Portugal | 6922 RGB images distributed between 12 classes, acquired in the field | Leaves | Xception | 0.93 (F1) |
| Liu et al. [51] | 2021 | China | 5091 RGB images distributed between 21 classes, acquired in the field | Leaves | GoogLeNet | 99.91 (Acc) |
| Škrabánek et al. [52] | 2021 | Czech Republic | 7200 RGB images distributed between 7 classes, acquired in the field | Fruits | DenseNet | 98.00 (Acc) |
| Nasiri et al. [53] | 2021 | Iran | 300 RGB images distributed between 6 classes, acquired in a controlled environment | Leaves | VGG-16 | 99.00 (Acc) |
| Peng et al. [54] | 2021 | Brazil | 300 RGB images distributed between 6 classes, acquired in the field | Fruits | Fused deep features | 96.80 (F1) |
| Franczyk et al. [55] | 2020 | Brazil | 3957 RGB images distributed between 5 classes, acquired in the field | Fruits | ResNet | 99.00 (Acc) |
| Fernandes et al. [56] | 2019 | Portugal | 35933 spectra distributed between 64 classes, acquired in the field | Leaves | Handcrafted features and SVM | 0.98 (AUC) |
| Adão et al. [57] | 2019 | Portugal | 3120 RGB images distributed between 6 classes, acquired in a controlled environment | Leaves | Xception | 100.00 (Acc) |
| Pereira et al. [58] | 2019 | Portugal | 224 RGB images distributed between 6 classes, acquired in a controlled environment | Leaves | AlexNet | 77.30 (Acc) |
Table 5. Datasets characteristics for each selected study that used ML.
| Study | Acquisition Device | Publicly Available | Acquisition Period | Data Augmentation | Acquisition Environment |
|---|---|---|---|---|---|
| Abassi and Jalal [30] | Camera – Prosilica GT2000C | Yes | – | No | Special illumination box |
| Garcia et al. [31] | Smartphone | No | – | No | Controlled environment |
| Xu et al. [32] | Spectrograph – ImSpector V10 | No | 1 day | No | Box with defined distance |
| Landa et al. [33] | Microscope – Nikon SMZ25 | No | 1 day | No | Special illumination box covered in aluminium foil |
| Marques et al. [34] | Camera – Canon 600D | No | Season 2017 | No | Controlled environment |
| Gutiérrez et al. [35] | Hyperspectral imaging camera – Resonon Pika L | No | 2 days | No | In the field, using a vehicle at 5 km/h |
| Fuentes et al. [36] | Scanner – Hewlett Packard Scanjet G3010; Spectrometer – Ocean Optics HR2000+ | No | – | No | Controlled environment |
Table 6. Datasets characteristics for each selected study that used DL published between 2023 and 2024.
| Study | Acquisition Device | Publicly Available | Acquisition Period | Data Augmentation | Acquisition Environment |
|---|---|---|---|---|---|
| De Nart et al. [37] | Mixed camera and smartphone | Yes | Seasons 2020 and 2021 | Flips, rotation, scale, CutMix, and zoom | In the field and controlled environment |
| Kunduracioglu and Pacal [38] | Camera – Prosilica GT2000C | Yes | – | Flips, rotation, scale, CutMix, and zoom | Special illumination box |
| Rajab et al. [39] | Camera – Prosilica GT2000C | Yes | – | – | Special illumination box |
| Doğan et al. [40] | Camera – Prosilica GT2000C | Yes | – | Static augmentations and artificially generated images | Special illumination box |
| Sun et al. [41] | Camera – Prosilica GT2000C | Yes | – | Rotations, flips, and scale | Special illumination box |
| Lv [42] | Camera – Prosilica GT2000C | Yes | – | Random erasing, zoom, scale, and Gaussian noise | Special illumination box |
| Magalhães et al. [43] | Kyocera TASKalfa 2552ci | No | 1 day in June 2021 | Blur, rotations, variations in brightness, horizontal flips, and Gaussian noise | Special illumination box |
| Carneiro et al. [44] | Smartphones | No | Seasons 2020 and 2021 | Static augmentations, CutMix, and RandAugment | In the field |
| Carneiro et al. [45] | Mixed camera and smartphone | No | 1 season and 2 seasons | Rotations, flips, and zoom | In the field |
| Gupta and Gill [46] | Camera – Prosilica GT2000C | Yes | – | Angle, scaling factor, translation | Special illumination box |
Table 7. Datasets characteristics for each selected study that used DL published between 2018 and 2022.
| Study | Acquisition Device | Publicly Available | Acquisition Period | Data Augmentation | Acquisition Environment |
|---|---|---|---|---|---|
| Ahmed et al. [60] | Camera – Prosilica GT2000C | Yes | – | Flip, rotation, sharpening, variation in brightness | Special illumination box |
| Carneiro et al. [47] | Mixed camera and smartphone | No | Seasons 2017 and 2020 | Rotation, shift, flip, and brightness changes | In the field |
| Carneiro et al. [48] | Camera – Canon EOS 600D | No | – | Rotations, shifts, variations in brightness, and flips | In the field |
| Carneiro et al. [49] | Camera – Canon EOS 600D | No | 1 season | Rotations, shifts, variations in brightness, and flips | In the field |
| Koklu et al. [7] | Camera – Prosilica GT2000C | Yes | – | Angle, scaling factor, translation | Special illumination box |
| Carneiro et al. [50] | Camera – Canon EOS 600D | No | 1 season | Rotations, shifts, variations in brightness, and flips | In the field |
| Liu et al. [51] | Camera – Canon EOS 70D | No | – | Scaling, transposing, rotation, and flips | In the field |
| Škrabánek et al. [52] | Cameras – Canon EOS 100D and Canon EOS 1100D | No | 2 days in August 2015 | – | In the field |
| Nasiri et al. [53] | Camera – Canon SX260 HS | On request | 1 day in July 2018 | Rotation, height and width shift | Capture station with artificial light |
| Peng et al. [54] | Mixed camera and smartphone | Yes | 1 day in April 2017 and 1 day in April 2018 | – | In the field |
| Franczyk et al. [55] | Mixed camera and smartphone | Yes | 1 day in April 2017 and 1 day in April 2018 | – | In the field |
| Fernandes et al. [56] | Spectrometer – Ocean Optics Flame-S | No | 4 days in July 2017 | – | In the field |
| Adão et al. [57] | Camera – Canon EOS 600D | No | Season 2017 | Rotations, contrast/brightness, vertical/horizontal mirroring, and scale variations | Controlled environment with white background |
| Pereira et al. [58] | Mixed camera and smartphone | No | Seasons 2016 and 2019 | Translation, reflection, rotation | In the field |
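The flips, rotations, and brightness variations that dominate the augmentation columns of Tables 6 and 7 are simple array operations. A hypothetical NumPy sketch of one augmentation step (the probabilities and ranges are illustrative choices, not values from the reviewed studies):

```python
import numpy as np

def augment(img, rng):
    """Random horizontal flip, 90-degree rotation, and brightness scaling."""
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)                      # horizontal flip
    img = np.rot90(img, k=int(rng.integers(0, 4)))      # 0/90/180/270 degrees
    factor = rng.uniform(0.8, 1.2)                      # brightness variation
    return np.clip(img.astype(np.float32) * factor, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
leaf = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
batch = [augment(leaf, rng) for _ in range(8)]  # 8 augmented copies
```

Frameworks such as torchvision or Albumentations provide equivalent, GPU-friendly versions of these operations.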
Table 8. List of publicly available datasets that can be used to classify grapevine varieties.
| Dataset | Number of Classes | Number of Images | Classes | Balanced |
|---|---|---|---|---|
| Koklu et al. [7] | 5 | 500 | Ak, Ala Idris, Bozgüzlü, Dimnit, Nazlı | Yes, 100 images per class |
| Al-khazraji et al. [63] | 8 | 8000 | deas al-annz, kamali, halawani, thompson seedless, aswud balad, riasi, frinsi, shdah | Yes, 1000 images per class |
| Santos et al. [61] | 6 | 300 | Chardonnay, Cabernet Franc, Cabernet Sauvignon, Sauvignon Blanc, Syrah | No |
| Sozzi et al. [64] | 3 | 312 | Glera, Chardonnay, Trebbiano | No |
| Seng et al. [65] | 15 | 2078 | Merlot, Cabernet Sauvignon, Saint Macaire, Flame Seedless, Viognier, Ruby Seedless, Riesling, Muscat Hamburg, Purple Cornichon, Sultana, Sauvignon Blanc, Chardonnay | No |
| Vlah [62] | 11 | 1009 | Auxerrois, Cabernet Franc, Cabernet Sauvignon, Chardonnay, Merlot, Müller Thurgau, Pinot Noir, Riesling, Sauvignon Blanc, Syrah, Tempranillo | No |
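Half of the public datasets in Table 8 are not balanced. A common mitigation is to weight each class inversely to its frequency when computing the training loss; the sketch below uses illustrative counts, not figures from any listed dataset.

```python
import numpy as np

# Hypothetical image counts for an imbalanced four-variety dataset.
counts = np.array([500, 300, 120, 60])

# Inverse-frequency weights (scikit-learn's "balanced" convention):
# summed over all samples, the weights equal the dataset size.
weights = counts.sum() / (len(counts) * counts)
print(np.round(weights, 2))  # [0.49 0.82 2.04 4.08]
```

Multiplying each sample's loss by its class weight makes every class contribute equally in expectation; oversampling the rare classes is a common alternative.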
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.
