1. Summary
Wheat is one of the most important sources of nutrition. According to the United Nations, at least 20% of all the calories people consume come from wheat [1]. Most wheat is milled into flour, which is then used to make a variety of foods such as pasta, bread, cookies, noodles, crackers, and both sweet and savory snack foods [2]. Although wheat is grown throughout Türkiye, most of it is produced in the Central Anatolia Region, followed by the Marmara, Southeastern Anatolia, Aegean, and Eastern Anatolia Regions, respectively. Konya ranks first with a 10.7% share of grain output, corresponding to 1,686,326 tons of wheat [3,4]. Beyond Türkiye, wheat occupies a very large share of global agriculture: China is the world's largest wheat producer, and India is second.
Sunn pest first appeared in 1927, and epidemics have recurred periodically to this day. It is one of the most harmful agricultural pests and the main source of damage to wheat grains, significantly reducing their quality [5,6]. The type of damage caused by sunn pest depends on its population density, biological stage, the crop type, and climatic conditions. The most important type is feeding damage, in which the insect sucks on the grain; grains damaged in this way cannot be used to produce either pasta or bread [7]. Without intervention, losses can reach 100% in both quality and quantity. In 2019, a survey covering approximately 42 million decares across 62 provinces was conducted in Türkiye, and control measures were applied on about 9.8 million decares in 30 provinces. Until 2006, the government and farmers relied on aerial spraying; because this proved insufficient, ground equipment has been used since then to exterminate sunn pests and minimize their damage. According to the survey, control in the treated areas contributed 2.25 billion TRY to the national economy, which further indicates the importance of intervention [7].
Traditional manual grain detection systems have drawbacks that prevent their use for large-scale detection of wheat grains, including relatively low efficiency and high costs [8]. Sunn pest damage, one of the problems mentioned above, is now detected by optical sorting machines such as Sortex [9]. By analyzing each grain and classifying it as accepted or rejected, the optical sorter reduces the likelihood of good grains being mistakenly discarded. However, Sortex machines are highly expensive and therefore hard to access. If this detection problem can be solved with Machine Learning (ML) approaches such as classification, segmentation, or detection using our dataset [10], the need for such machines, and their high cost, could be avoided.
A review of the literature shows that existing datasets address a variety of problems. However, apart from [11], little open-source formal data is available for use in publications. That dataset is provided in CSV format and contains 7 features of wheat kernels: area, perimeter, compactness, kernel length, kernel width, asymmetry coefficient, and kernel groove length. It was created for classification problems.
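For reference, the compactness feature in that dataset is derived from two of the others: according to the UCI description, C = 4πA/P², where A is the kernel area and P its perimeter, so a perfect circle has C = 1 and more elongated kernels score lower. A minimal sketch (the function name `compactness` is hypothetical):

```python
import math

def compactness(area: float, perimeter: float) -> float:
    """Shape compactness C = 4*pi*A / P**2 (equals 1.0 for a perfect circle)."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4 * math.pi * area / perimeter ** 2

# Sanity checks on ideal shapes.
r = 3.0
print(round(compactness(math.pi * r ** 2, 2 * math.pi * r), 6))  # circle: 1.0
print(round(compactness(1.0, 4.0), 4))                           # unit square: 0.7854
```

Because the value is scale-invariant, it describes kernel shape independently of kernel size.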
Other publicly available datasets, intended mainly for classification and detection, can also be found in open sources. The dataset in [12] contains 3 classes: broken, normal, and weeviled. It can be used for both detection and classification problems. Another dataset [13] also contains 3 classes, namely bad seed, healthy, and impurity, and is suitable for classification problems. The dataset in [14], also intended for classification, contains 10 classes: blacktip, broken, chalky, damaged, dehusked, healthy, immature, inorganic foreign material, organic foreign material, and red grain. Owing to this variety of classes, it is suitable for many classification problems. For detection, [15] provides data for detecting the wheat itself. All of these datasets are annotated and ready for training with YOLO and similar models. In contrast to these annotated datasets, [16] is publicly available but unlabeled. It contains images of randomly dispersed wheat grains and is suitable for classification, detection, and segmentation.
In addition to open-source datasets, some datasets are used in research but are not publicly available. These datasets are used in studies applying ML and Deep Learning (DL) approaches. Several studies use Convolutional Neural Networks (CNNs) [17,18,19]. Besides CNNs, other neural network models such as artificial neural networks (ANNs) are also used [20,21]. A bidirectional long short-term memory (BiLSTM) network has also been used for classification, with an accuracy of 99.50% [22]. These studies show that although DL algorithms are used very often, traditional ML algorithms also produce good results when adapted to the problem. Agarwal et al. [23] used three ML algorithms: Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naive Bayes (NB). SVM was also used in [24], achieving an accuracy of 100%. Khatri et al. [25] used several ML algorithms: KNN, Classification and Regression Trees (CART), an ensemble method, and NB. Finally, [26] uses ML and DL both separately and together, applying both CNN and SVM algorithms.
Based on this review, and to the best of our knowledge, publicly available formal datasets are needed. Our contributions to the literature are as follows: first, we provide a publicly available formal dataset; second, the dataset is multivariate, containing 6 wheat species; third, in addition to species, it includes both sunn pest damaged and healthy wheat grains; finally, the grains are imaged in piles rather than one by one. Combined with DL and ML algorithms, these contributions can help decrease the cost of eliminating damaged grains, allowing factories to spend less money producing bread, pasta, and other bakery products [27].
2. Data Description
Today, about 5000 different wheat species are grown around the world. These species differ in shape, color, and texture. For example, Müfitbey grains are white, whereas Bezostaja grains are redder. The dataset also highlights that wheat grains differ not only in shape and size but also in texture as a result of sunn pest damage. For these reasons, it is important to enrich the dataset with various wheat species. Six wheat species were used: Bezostaja, Müfitbey, Nacibey, Sönmez-2001, Tosunbey, and Ekiz. The samples were collected in Eskişehir, which holds a considerable position in Türkiye's wheat production, so that various species could be accessed and the effect of sunn pest examined across them.
Our review of the literature showed that existing wheat grain datasets are not sufficiently diverse in content. Models built on datasets of individually positioned grains under synthetic conditions, such as a fully black background, do not perform well on real-life problems, because grains on a real production line differ from such data in many parameters, including lighting, angle, and position. Because our dataset was created with these mismatches in mind, it allows developers to work with data that reflects real-life conditions. All wheat grains in a given image share the same condition: if an image contains 30 grains, all of them are either healthy or damaged. To comprehensively examine the various types of damage caused by sunn pest, all of the damaged grains were included in the dataset.
Developing applications on low-quality images is one of the major challenges in ML. In studies focusing on damaged wheat grains, high image quality is essential for observing the damage caused by sunn pest or other factors, because sunn pest damage sometimes appears as morphological changes but, in some grains, only as very small suction points. For this reason, high-quality images were taken to capture fine details of sunn pest damage, such as small suction points on the grains. The height and width of the 170 images are 5184 and 2920 pixels, respectively, and the average file size is 9.56 MB. This resolution allows developers to examine grains in detail and to build high-performance ML models. Images of damaged wheat grains occupy 724.7 MB and images of healthy wheat grains 901.7 MB, for a total of 1626.4 MB.
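The 9.56 MB figure is the compressed size on disk. A rough sketch of the memory needed once the images are decoded, assuming each is loaded as an 8-bit, three-channel (RGB) array with the reported dimensions; the helper name `decoded_bytes` is hypothetical:

```python
# Reported image dimensions and count; 3 channels and 1 byte per value are assumptions.
HEIGHT_PX, WIDTH_PX, CHANNELS = 5184, 2920, 3
N_IMAGES = 170

def decoded_bytes(height: int, width: int, channels: int = 3) -> int:
    """Bytes needed to hold one image as an 8-bit (uint8) array in memory."""
    return height * width * channels

per_image = decoded_bytes(HEIGHT_PX, WIDTH_PX, CHANNELS)
total = per_image * N_IMAGES
print(f"pixels per image: {HEIGHT_PX * WIDTH_PX:,}")       # 15,137,280
print(f"decoded size per image: {per_image / 1e6:.1f} MB")  # 45.4 MB
print(f"decoded size, all images: {total / 1e9:.2f} GB")    # 7.72 GB
```

At roughly 7.7 GB for all images at full resolution, developers with limited memory may prefer to load images lazily, work on crops, or downscale.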
2.1. Directory Structure
As shown in Figure 1, the images are named with a two-digit numbering scheme that encodes the species and the damage condition, so that they can be read systematically. The first part of the name is the species number: the names of images of the bezostaja, mufitbey, nacibey, sonmez-2001, tosunbey, and ekiz species start with 01, 02, 03, 04, 05, and 06, respectively. The second part indicates the condition: 1 if the grains in the image are damaged and 0 if they are healthy. Finally, the sample number is appended to complete the image name. Following these rules, the first image of the “bezostaja” species in the “damaged” condition is named “01_1_01.png”.
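The naming rule can be sketched as a small parser and builder. This is a minimal illustration, assuming the species codes and the `<species>_<condition>_<sample>.png` pattern exactly as described, with sample numbers zero-padded to two digits as in `01_1_01.png`; `parse_name`, `build_name`, and `ImageLabel` are hypothetical names, not part of the dataset:

```python
import re
from typing import NamedTuple

# Species codes and condition flags as described in the text.
SPECIES = {1: "bezostaja", 2: "mufitbey", 3: "nacibey",
           4: "sonmez-2001", 5: "tosunbey", 6: "ekiz"}
CONDITION = {0: "healthy", 1: "damaged"}

# <two-digit species code>_<0|1 condition>_<sample number>.png
NAME_RE = re.compile(r"^(\d{2})_([01])_(\d+)\.png$")


class ImageLabel(NamedTuple):
    species: str
    condition: str
    sample: int


def parse_name(filename: str) -> ImageLabel:
    """Decode a file name such as '01_1_01.png' into its label fields."""
    m = NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    species_code, condition_code, sample = (int(g) for g in m.groups())
    if species_code not in SPECIES:
        raise ValueError(f"unknown species code: {species_code:02d}")
    return ImageLabel(SPECIES[species_code], CONDITION[condition_code], sample)


def build_name(species_code: int, damaged: bool, sample: int) -> str:
    """Encode label fields back into a file name (sample zero-padded to 2 digits)."""
    return f"{species_code:02d}_{int(damaged)}_{sample:02d}.png"


print(parse_name("01_1_01.png"))
# ImageLabel(species='bezostaja', condition='damaged', sample=1)
print(build_name(1, True, 1))  # 01_1_01.png
```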
Figure 2 shows the directory structure of the dataset. The main folder contains one folder per species: bezostaja, mufitbey, nacibey, sonmez-2001, tosunbey, and ekiz. Each species folder has two subfolders, damaged and healthy, which contain the sunn pest damaged and healthy wheat grain images of that species, respectively.
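A minimal sketch for indexing this folder layout into (path, species, condition) records, assuming the dataset has been extracted to a local root folder, uses the species and damaged/healthy subfolder names given above, and stores images as `.png` files; `index_dataset` is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Tuple, Union

SPECIES_DIRS = ["bezostaja", "mufitbey", "nacibey", "sonmez-2001", "tosunbey", "ekiz"]
CONDITIONS = ["damaged", "healthy"]


def index_dataset(root: Union[str, Path]) -> List[Tuple[Path, str, str]]:
    """Return (image path, species, condition) for each PNG under root/<species>/<condition>/."""
    root = Path(root)
    records = []
    for species in SPECIES_DIRS:
        for condition in CONDITIONS:
            folder = root / species / condition
            if not folder.is_dir():
                continue  # skip a missing folder instead of failing
            for img in sorted(folder.glob("*.png")):
                records.append((img, species, condition))
    return records

# Example usage (the path is a placeholder for the extracted dataset):
# from collections import Counter
# records = index_dataset("path/to/dataset")
# print(Counter((species, condition) for _, species, condition in records))
```

Taking the labels from the folder names keeps them independent of the file-name scheme, so the two can be cross-checked.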
3. Image Acquisition
This dataset was collected from wheat grains harvested between July 1 and July 10, 2022, in Eskişehir, Türkiye. The grains were photographed with an Olympus OM-D E-M1 Mark II camera and a 60 mm macro lens. The camera has a high-resolution mode capable of producing 50 MP images, 121 phase-detection autofocus points, and a sensitivity range of ISO 100-25600. The images were taken from a height of 1 m, perpendicular to the surface, under laboratory conditions in a room lit by a window and a light bulb. Care was taken to ensure that the camera did not cast a shadow on the grains. A white background was used when capturing the images.
The dataset contains 6 species: bezostaja, mufitbey, nacibey, sonmez-2001, tosunbey, and ekiz. These species were chosen because they are the most popular among Turkish farmers due to their high quality. The dataset contains 2502 healthy and 1063 sunn pest damaged grains. This large difference arises because, after harvest, damaged grains are observed to be less common than healthy ones. In total, 170 images were captured and added to the dataset: 83 of sunn pest damaged wheat and 87 of healthy wheat. The number of images captured for each of the 6 species and each condition is shown in Figure 3.
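The grain-level counts are imbalanced (2502 healthy vs. 1063 damaged), while the image-level counts are nearly balanced (87 vs. 83), so developers training grain-level models, such as detectors, may want to reweight the classes. A common inverse-frequency heuristic, the same formula scikit-learn uses for `class_weight="balanced"`, is w_c = N / (K · n_c), where N is the total count, K the number of classes, and n_c the count of class c. The sketch applies it to the counts reported above; the helper name `balanced_class_weights` is hypothetical:

```python
def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights w_c = N / (K * n_c), so that each class's weighted count is N / K."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}


grain_counts = {"healthy": 2502, "damaged": 1063}  # grain-level counts reported above
image_counts = {"healthy": 87, "damaged": 83}      # image-level counts reported above

for name, counts in [("grains", grain_counts), ("images", image_counts)]:
    weights = balanced_class_weights(counts)
    print(name, {c: round(w, 3) for c, w in weights.items()})
# grains {'healthy': 0.712, 'damaged': 1.677}
# images {'healthy': 0.977, 'damaged': 1.024}
```

At the image level the weights stay close to 1, so reweighting matters mainly for models that learn from individual grains.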
As shown in Figure 4, the wheat grains in the proposed dataset are oriented in three directions: horizontal, vertical, and diagonal. The grains are also piled on top of one another. Because the grains were dispersed randomly and therefore intertwine, the dataset better reflects real-life circumstances.
3.1. Wheat Grains
The dataset includes 6 wheat species: bezostaja, mufitbey, nacibey, sonmez-2001, tosunbey, and ekiz. Each species is divided into two conditions: damaged and healthy. Wheat grains differ in various parameters such as width, length, color, staining, and wrinkled texture.
Figure 5 shows healthy and sunn pest damaged wheat grains. When photographing the grains damaged by sunn pest, care was taken to make the damaged area visible in the image.
In segmentation, an object and a background of similar color pose a difficult situation that is often encountered in real-life problems. Thresholding is an example: it separates the object from the background by comparing each pixel's brightness with a threshold value, so it works poorly when the two have similar brightness. In real life, a white or light-toned background makes it difficult to distinguish wheat on the production line from the background and may lead to low-accuracy results. Another consideration is that wheat grains are not neatly arranged on a production line; only half of some grains, and a very small part of others, may be visible in the frame. For all these reasons, we chose a natural background for our dataset rather than synthetically segmenting the wheat and placing it on a black background. The wheat grains were scattered on a table, and the photos were taken as randomly as possible, with the grains in diagonal, horizontal, or vertical directions and fully or partially visible.
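To make the thresholding difficulty concrete, here is a minimal sketch on a synthetic grayscale scene, not on images from this dataset. The threshold is chosen with Otsu's method, a standard automatic technique swapped in for illustration; the text does not prescribe a particular thresholding method. The scene generator, the intensity levels (a grain at 90 on a background at 200 or 100), and the noise level are hypothetical choices:

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu's method: pick t in 0..255 that maximizes between-class variance.

    Pixels <= t form class 0 and pixels > t form class 1. `gray` must be uint8.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # P(class 0) for each candidate t
    mu = np.cumsum(prob * np.arange(256))   # cumulative first moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom <= 0] = 0.0               # one class empty: not a valid split
    return int(np.argmax(between))


def synthetic_scene(obj_level, bg_level, noise_sd=15.0, seed=0):
    """Hypothetical scene: a 20x20 'grain' on a 100x100 background, plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    truth = np.zeros((100, 100), dtype=bool)
    truth[40:60, 40:60] = True
    img = np.where(truth, obj_level, bg_level) + rng.normal(0.0, noise_sd, truth.shape)
    return np.clip(np.rint(img), 0, 255).astype(np.uint8), truth


def iou(a, b):
    """Intersection over union of two boolean masks."""
    return float((a & b).sum() / (a | b).sum())


results = {}
for name, bg in [("contrasting background", 200), ("similar background", 100)]:
    gray, truth = synthetic_scene(obj_level=90, bg_level=bg)
    t = otsu_threshold(gray)
    mask = gray <= t                        # darker grain pixels fall at or below t
    results[name] = iou(mask, truth)
    print(f"{name}: threshold={t}, IoU={results[name]:.2f}")
```

With the contrasting background the recovered mask should overlap the true grain almost perfectly, whereas with a background of similar brightness the overlap should drop sharply, mirroring the problem described above.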
Table 1 shows the distribution of wheat grain counts by species. The grains in each image were counted manually, one by one, including grains that are only partially visible. As can be seen in Table 1, the numbers of grains in the damaged and healthy classes are uneven, because sunn pest damaged grains are less common within a cluster of grains from the same species. In addition, the Ekiz species has considerably more grains than the other species: two different damaged grain clusters, with 3% and 1.5% sunn pest suction damage, were used during shooting, so the total grain count for Ekiz is about twice that of the other species.
3.2. Color Channel Analysis of Images
Understanding image features is important for achieving high accuracy in computer vision development. In 8-bit color images, each pixel can be represented by a vector of three values, one for each primary color channel (red, green, and blue), each ranging from 0 to 255. Together, these RGB values determine the color of the pixel.
RGB channel analysis, which examines the distribution of an image’s red, green, and blue channels, plays a significant role in this regard and provides valuable insights into the color distribution, tones, and brightness of the image. Furthermore, RGB channel analysis can aid in object recognition by distinguishing between objects and backgrounds in the image.
The RGB values of each pixel in images of healthy or damaged wheat grains can be used to segment the image into regions of interest, which can help detect sunn pest damaged areas. By thresholding the RGB values, damaged regions can be separated from undamaged regions. The threshold values can be tuned to balance the sensitivity of the segmentation process against false positives. In addition to segmentation, color channel analysis can aid feature extraction, which is important for identifying sunn pest damage on wheat or classifying wheat species by texture. By analyzing the color distribution, tone, and brightness of the image, features such as the texture, shape, and size of the damaged area can be extracted. These features can then be used to develop ML models for sunn pest damage detection or segmentation.
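A minimal sketch of thresholding RGB values to flag dark pixels such as suction points, assuming images are H × W × 3 uint8 arrays in RGB order. The per-channel limits are hypothetical placeholders, not values derived from the dataset, and `dark_pixel_mask` is a hypothetical name. Note that OpenCV's `cv2.imread` returns channels in BGR order, so its output would need reordering first.

```python
import numpy as np


def dark_pixel_mask(rgb: np.ndarray, max_rgb=(90, 90, 90)) -> np.ndarray:
    """Return an H x W boolean mask of pixels whose R, G and B are all <= max_rgb.

    rgb: H x W x 3 uint8 array in RGB order; max_rgb: hypothetical per-channel limits.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    limits = np.asarray(max_rgb, dtype=rgb.dtype).reshape(1, 1, 3)
    return np.all(rgb <= limits, axis=2)


# Tiny hand-made image: light 'grain' pixels plus a few darker ones.
pixels = np.array([[[200, 180, 140], [70, 60, 50], [100, 95, 80]],
                   [[210, 190, 150], [85, 88, 60], [205, 185, 145]]], dtype=np.uint8)

for limits in [(90, 90, 90), (110, 110, 110)]:
    mask = dark_pixel_mask(pixels, limits)
    print(limits, int(mask.sum()), "pixels flagged")
# (90, 90, 90) 2 pixels flagged
# (110, 110, 110) 3 pixels flagged
```

Raising the limits flags more pixels: sensitivity increases, but so does the risk of false positives, which is the trade-off mentioned above.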
Figure 6 presents a comprehensive color analysis, created by averaging the RGB channel values of each pixel over all images, separately for the damaged and the healthy wheat images. In Figure 6, the x-axis shows the RGB channel values from 0 to 255, the y-axis shows the frequency of each value, and each channel is drawn in its own color. Examining the color channels of the damaged and healthy wheat grains in the 170 images showed that damaged grains had lower RGB channel values, which is attributable to the darker suction points created by sunn pest. The most frequent RGB values are (80, 98, 149) for damaged grains and (95, 112, 129) for healthy grains; the red and green channel values of healthy grains are higher than those of damaged grains.
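Averaged channel distributions like those in Figure 6 can be computed as sketched below, assuming each image is an H × W × 3 uint8 RGB array. Two interpretive assumptions: each image's histogram is normalized before averaging (equivalent to averaging raw counts here, since all images have the same size), and the "most frequent RGB values" are read as the per-channel peak of the averaged histogram. The function names are hypothetical, and the demo uses synthetic pixels, not dataset values.

```python
import numpy as np


def mean_channel_histograms(images) -> np.ndarray:
    """Average the per-channel 256-bin histograms of H x W x 3 uint8 RGB images.

    Each histogram is normalized to frequencies before averaging.
    Returns a (3, 256) array whose rows are R, G and B.
    """
    total = np.zeros((3, 256), dtype=np.float64)
    n = 0
    for img in images:
        for c in range(3):
            counts = np.bincount(img[..., c].ravel(), minlength=256)
            total[c] += counts / counts.sum()
        n += 1
    if n == 0:
        raise ValueError("no images given")
    return total / n


def peak_values(hist: np.ndarray) -> tuple:
    """Most frequent value (peak bin) of each channel."""
    return tuple(int(np.argmax(hist[c])) for c in range(3))


# Two tiny synthetic images.
a = np.zeros((2, 2, 3), dtype=np.uint8)
a[...] = (120, 130, 140)
a[0, 0] = (200, 200, 200)
b = np.zeros((2, 2, 3), dtype=np.uint8)
b[...] = (120, 130, 140)

hist = mean_channel_histograms([a, b])
print(peak_values(hist))  # (120, 130, 140)
```

With real data, images could be loaded with Pillow, e.g. `np.asarray(Image.open(path).convert("RGB"))`, and the three rows of the result plotted against 0-255 to obtain a plot in the style of Figure 6.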
4. Conclusions
Sunn pest causes yield loss in food production by reducing the protein content of wheat grains. Although machines exist that detect and sort damaged grains, their cost makes them inaccessible to many in the agricultural sector. In this study, a new dataset was presented to the literature, consisting of 170 images of healthy and sunn pest damaged wheat grains from 6 wheat species produced in Türkiye. We believe that our dataset will make an important contribution to the rapidly growing use of artificial intelligence in agriculture, such as wheat species classification, detection of sunn pest damaged or broken wheat grains, and segmentation. Such developments can offer low-cost alternatives to optical sorters, which are inaccessible to much of the industry due to their exorbitant prices. As a result, our dataset can help reduce the cost of eliminating damaged grains, allowing factories to spend less money producing bread, pasta, and other baked goods. Another feature that makes our dataset valuable is that it allows the damage caused by sunn pest to be examined across different wheat species. Factors such as the background color, the grain angles, and the presence of broken or partially visible grains also make our dataset suitable for real-life conditions. In addition, the grains in each image are placed collectively rather than separately, because grains do not appear regularly and separately on a factory production line.
In future studies, we aim to enrich the dataset by adding more wheat species and increasing the number of samples. In addition, grains with other types of damage could be added to expand the dataset's areas of use and to support new developments focusing on different types of damage in wheat grains.
Author Contributions
Data curation, M.Ç., Ö.Ö. and M.O.; formal analysis, M.Ç.; methodology, M.Ç. and Ö.Ö.; software, M.Ç. and Ö.Ö.; validation, N.P.A., M.Ç., Ö.Ö. and T.T.S.; resources, M.O.; writing—original draft preparation, M.Ç. and Ö.Ö.; writing—review and editing, N.P.A. and T.T.S.; visualization, M.Ç. and Ö.Ö.; supervision, M.O.; project administration, A.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research has received funding from BİTES Defence and Aerospace Technologies.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflict of interest.
References
- GPM NASA. Expert Group: Growing Wheat. https://gpm.nasa.gov/education/sites/default/files/lesson_plan_files/water-for-wheaties/PR_AG_HS_GrowingWheat.pdf. Last accessed 27 January 2023.
- Zingale, S.; Spina, A.; Ingrao, C.; Fallico, B.; Timpanaro, G.; Anastasi, U.; Guarnaccia, P. Factors Affecting the Nutritional, Health, and Technological Quality of Durum Wheat for Pasta-Making: A Systematic Literature Review. Plants 2023, 12, 530.
- Aydoğan, S.; Ünlü, L. Determination of damage Rate of Sunn pests (Eurygaster spp. and Aelia spp.) in Bread Wheat Varieties in Karatay (KONYA) District. Selcuk Journal of Agriculture and Food Sciences 2020, 34, 193–199.
- Gummadov, N.; Keser, M.; Akin, B.; Cakmak, M.; Mert, Z.; Taner, S.; Ozturk, I.; Topal, A.; Yazar, S.; Morgounov, A. Genetic gains in wheat in Turkey: Winter wheat for irrigated conditions. The Crop Journal 2015, 3, 507–516.
- Mutlu, Ç.; Karaca, V. Management Success Against Sunn Pest In Response to Transition from Aerial Spraying to Controlled Farmer Management: An Overview. In 1st International Gobeklitepe Agriculture Congress Proceedings Book; Erol, B.; Binici, T.; Sakin, E.; Özmen Özbakır, G.; O, M.I.; Palabıçak, M.A.; Şimşek, E., Eds.; 2019; pp. 518–522.
- Turhal, Ü.Ç.; Turhal, K. Determination of Effects of Sunn Pest on Wheat Grain by Artificial Neural Networks. Trakya University Journal of Natural Sciences 2014, 15, 25–30.
- Barbaroğlu, N.E.; Akci, E.; Çulcu, M.; Yalçın, F. Süne ve Mücadelesi [Sunn Pest and Its Control], 1st ed.; Ezgi Ofset Matbaacılık: Ankara, 2020.
- Wang, Y.; Wang, Y.; Wang, Z.; Li, R.; Hua, Z.; Zhang, C.; Zhang, Z.; Song, H. Nanodet-Ghost: A Lightweight Network for Quality Detection of Wheat Kernels Appearance. Available at SSRN 4119166.
- Nayeem, S.A. Qualitative analysis of City Group’s marketing strategies and CSR activity for Teer. 2020.
- Pervan Akman, N.; Çolak, M.; Özkan, Ö.; Tümer Sivri, T.; Berkol, A.; Olgun, M.; Budak Başçiftçi, Z.; Ayter, G.; Sezer, O.; Ardiç, M. Wheat Dataset for Species Classification and Sunn Pest Damage Detection. Mendeley Data, V2, 2023, doi.org/10.17632/gmw48bvxdz.1.
- Wheat Kernels Data Set. https://archive.ics.uci.edu/ml/datasets/Wheat+kernels, 2019. Last accessed 24 February 2023.
- Object Detection_1 Image Dataset. https://universe.roboflow.com/new-workspace-v6q7p/object-detection_1-gsahb/dataset/1, 2021. Last accessed 6 March 2023.
- Wheat Seed Classification Image Dataset. https://universe.roboflow.com/bcd-hhv9y/wheat-seed-classification/browse?queryText=&pageSize=50&startingIndex=0&browseQuery=true, 2023. Last accessed 6 March 2023.
- Fine Tune aa 1 Image Dataset. https://universe.roboflow.com/rice-rwmyq/fine_tune_aa_1/dataset/7, 2023. Last accessed 6 March 2023.
- Seed Image Dataset. https://universe.roboflow.com/meet-patel-g0uqb/seed-t4eor/dataset/1, 2023. Last accessed 6 March 2023.
- Wheat Grain Counting 100 Images. https://www.kaggle.com/datasets/ociule/wheat-grain-counting-100-images, 2019. Last accessed 27 January 2023.
- Shedole, S.; Sowmya, B.; VP, N.A. A Convolution Neural Network-Based Wheat Grain Classification System. Journal of Scientific Research 2022, 66.
- Bernardes, R.C.; De Medeiros, A.; da Silva, L.; Cantoni, L.; Martins, G.F.; Mastrangelo, T.; Novikov, A.; Mastrangelo, C.B. Deep-learning approach for fusarium head blight detection in wheat seeds using low-cost imaging technology. Agriculture 2022, 12, 1801.
- Laabassi, K.; Belarbi, M.A.; Mahmoudi, S.; Mahmoudi, S.A.; Ferhat, K. Wheat varieties identification based on a deep learning approach. Journal of the Saudi Society of Agricultural Sciences 2021, 20, 281–289.
- Sharma, A.; Singh, T.; Garg, N. Combining near-infrared hyperspectral imaging and ANN for varietal classification of wheat seeds. In 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT); IEEE, 2022; pp. 1103–1108.
- Kaya, E.; Saritas, İ. Towards a real-time sorting system: Identification of vitreous durum wheat kernels using ANN based on their morphological, colour, wavelet and gaborlet features. Computers and Electronics in Agriculture 2019, 166, 105016.
- Sabanci, K.; Aslan, M.F.; Ropelewska, E.; Unlersen, M.F.; Durdu, A. A novel convolutional-recurrent hybrid network for sunn pest–damaged wheat grain detection. Food Analytical Methods 2022, 15, 1748–1760.
- Agarwal, D.; Sweta; Bachan, P. Machine learning approach for the classification of wheat grains. Smart Agricultural Technology 2023, 3, 100136.
- Abbaspour-Gilandeh, Y.; Ghadakchi-Bazaz, H.; Davari, M. Discriminating healthy wheat grains from grains infected with Fusarium graminearum using texture characteristics of image-processing technique, discriminant analysis, and support vector machine methods. Journal of Intelligent Systems 2020, 29, 1576–1586.
- Khatri, A.; Agrawal, S.; Chatterjee, J.M. Wheat seed classification: Utilizing ensemble machine learning approach. Scientific Programming 2022, 2022.
- Unlersen, M.F.; Sonmez, M.E.; Aslan, M.F.; Demir, B.; Aydin, N.; Sabanci, K.; Ropelewska, E. CNN–SVM hybrid model for varietal classification of wheat based on bulk samples. European Food Research and Technology 2022, 248, 2043–2052.
- Tagoe, A.; Hamidu, J.; Donkoh, A.; Achiaa, M. The performance of broiler chicken fed diets containing varying levels of Sortex® rejected rice. Ghanaian Journal of Animal Science, Vol. 11, No. 1.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).