2.1.2. Data Pre-Processing
The higher the quality of the samples and the more accurate the annotations, the better the model parameters learned from the training data [32,33]. From the 1386 images, we selected 50 blurry or overly complex images as the evolutionary experimental set, which is used to verify whether the system designed in this paper has real-time evolutionary capability. The remaining 1336 images are used as the dataset for the experiments in this article.
In the data preprocessing stage, we first manually annotated the images with the LabelImg image annotation tool and saved the annotations in the COCO dataset format. After observing and analyzing all the images, we found that the growth forms of soybean seedlings and the six weed species change during the soybean seedling stage, and that individual plants can largely be distinguished. Therefore, when annotating, we drew one bounding box around each plant as the target. The annotated categories were soybean seedlings, Purslane, Herb of Common Crabgrass, Copper leaf Herb, Spinegreens, Chenopodium album L., and Calystegia hederacea Wall., which were labeled "soybean", "machixian", "matang", "tiexiancai", "ciercai", "li", and "dawanhua", respectively. The number of instances in each category is shown in Table 1.
The classes are clearly unbalanced. It should be noted that although Chenopodium album L. is generally a major source of weed damage, it is rare in the soybean seedling fields examined in this study, accounting for only 0.48%.
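The class shares discussed above can be checked directly from a COCO-format annotation file. The following is a minimal sketch, assuming the standard COCO keys `categories` and `annotations`; the helper name `class_shares` and the toy dictionary are illustrative and are not the counts reported in Table 1.

```python
from collections import Counter


def class_shares(coco):
    """Return the fraction of annotated boxes belonging to each category."""
    # Map numeric category ids to their label names.
    names = {c["id"]: c["name"] for c in coco["categories"]}
    # Count one entry per annotated bounding box.
    counts = Counter(names[a["category_id"]] for a in coco["annotations"])
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in names.values()}
    return {name: counts.get(name, 0) / total for name in names.values()}


# Toy input for illustration only (not the real dataset).
toy = {
    "categories": [{"id": 1, "name": "soybean"}, {"id": 2, "name": "li"}],
    "annotations": [
        {"category_id": 1},
        {"category_id": 1},
        {"category_id": 1},
        {"category_id": 2},
    ],
}
print(class_shares(toy))
```

A share computed this way makes it easy to see which classes need the most additional samples from augmentation.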
In addition, this detection model is intended for deployment on agricultural intelligent terminals, whose image acquisition systems produce images with high noise, incompleteness, or distortion, whereas the original sample set contains no images with these characteristics. Data augmentation must therefore both simulate actual field images effectively and increase the number of weed samples.
Although sample collection took into account the multiple growth states of soybean seedlings and weeds in the natural environment, the limited number of image samples and the absence of noisy, incomplete, or distorted images hinder effective training of the identification model. In this article, the DataAug image enhancement algorithm is used to generate additional images from the original samples, which increases both the diversity of the soybean seedling and weed images and the number of training samples. The algorithm generates images by translating, rotating, and mirroring (flipping horizontally, vertically, and diagonally) the original sample images, changing their brightness, and adding noise to them. To generate quality training samples, the image enhancement algorithm in this article operates as follows:
Let the channel of the sample image be , and the length and width be and respectively, then the image sample size can be denoted as:. In this study, the image sample size is ;
Let be the set of all integers;
Suppose the top-left coordinate of the bounding box is , and the bottom-right coordinate is . If there are bounding boxes in the image, we have .
Because the x-axis offset , meanwhile, the sample image with the size of 3,024*3,024 is to be compressed to the input image of the recognition model with the size of 640*640, in order to get a good quality polymorphic image, the relative offset of the x-axis , in this article is chosen.
Let the number of horizontal shift operation methods be
, then there is
Similarly, y-axis offset
, the relative offset of the y-axis
, then the number of vertical shift operation methods is
, where
Note that the shift operation cannot be performed when targets occupy both the upper-left and lower-right corners of the image at the same time. A schematic diagram of the shift operation is shown in Figure 2.
- 2. Rotation Parameter
Rotation parameter ;
- 3. Brightness Coefficient
Brightness coefficient ;
- 4. Image Noise Addition
As described in Section 2.1.1, the samples in this study are clear images captured by mobile phones between 7:00 a.m. and 11:00 a.m. in summer. The image acquisition system of a mobile phone not only has a high-performance photosensitive device but also runs powerful image preprocessing programs, which effectively remove the additive noise generated while the photosensitive device receives photons and converts them into RAW images. This noise mainly comprises photon noise, dark noise, read noise, and ADC noise. The model, however, is intended for deployment on agricultural intelligent terminals, whose cameras generally have poorer photosensitive devices and signal processing modules that cannot remove noise effectively. Therefore, to simulate the characteristics of images captured by cameras on agricultural intelligent terminals, we need to add this noise to the original sample images. This article studies the soybean seedling stage in summer, when agricultural intelligent devices generally operate between 6:00 a.m. and 6:00 p.m. under good environmental illumination. Under these conditions, the main noise source in the camera is photon noise, which arises from the randomness of photoexcited electrons. This noise follows a Poisson distribution, whose expression is as follows:
Where
is environmental illumination;
is photoelectric conversion efficiency;
is exposure time.
In this study, the exposure time and photoelectric conversion efficiency remain almost constant during the operation of the device, so the Poisson distribution only needs to account for the environmental illumination. In the area studied, the normal illumination on a sunny summer day is while the illumination on a rainy day is that of a sunny day. Therefore, adjusting the environmental illumination can simulate the image noise during operation of the device on both rainy and sunny days. When simulating the noise at noon under strong illumination, the Poisson distribution is well approximated by a Gaussian distribution because its mean is large, so the noise is simulated using two-dimensional Gaussian noise with a mean of 0.
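The photon-noise model described above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the DataAug implementation: the parameter `photons_at_white` is a hypothetical stand-in for the combined effect of environmental illumination, photoelectric conversion efficiency, and exposure time (the expected photon count of a fully white pixel), and `sigma` is a hypothetical noise level for the strong-illumination Gaussian case.

```python
import numpy as np


def add_photon_noise(image, photons_at_white, rng):
    """Simulate Poisson (photon) noise on a uint8 image.

    photons_at_white is an assumed parameter: the expected photon count
    for a pixel of value 255. Lower values mimic weaker illumination.
    """
    scaled = np.asarray(image, dtype=np.float64) / 255.0
    expected_photons = scaled * photons_at_white
    # Each pixel's photon count is drawn from a Poisson distribution.
    counts = rng.poisson(expected_photons)
    noisy = counts / photons_at_white * 255.0
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise, the large-mean limit of the Poisson case."""
    pixels = np.asarray(image, dtype=np.float64)
    noisy = pixels + rng.normal(0.0, sigma, size=pixels.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
gray = np.full((64, 64, 3), 128, dtype=np.uint8)
sunny = add_photon_noise(gray, photons_at_white=1000.0, rng=rng)
rainy = add_photon_noise(gray, photons_at_white=50.0, rng=rng)
noon = add_gaussian_noise(gray, sigma=5.0, rng=rng)
```

Lowering `photons_at_white` increases the relative noise, which mirrors the idea of reducing the environmental illumination to simulate a rainy day.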
In the above calculations
and
are obtained with uniform sampling. So the number of samples
obtained by applying the above five operations simultaneously to one original sample is:
Bringing in the above parameters yields
So the number of the images generated from 1,336 original samples .
Therefore in this article, when the attribute need_aug_num of the DataAug function meets the condition of generating quality training samples is satisfied.
An example of data enhancement is shown in Figure 3. The data preprocessing module obtains five images that simulate the growth of soybean seedlings and weeds under complex natural conditions by applying data enhancement methods such as translation, shear, flipping, rotation, and noise addition to the original images. In this way, a larger sample dataset can be conveniently obtained in the laboratory.
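The restriction on the shift operation noted earlier (no shift is possible when targets occupy both the upper-left and lower-right corners) follows from requiring every bounding box to stay inside the image. The sketch below illustrates that constraint; the function names and the `(x1, y1, x2, y2)` pixel convention are assumptions made for illustration and do not reproduce the DataAug code.

```python
def shift_range(boxes, width, height):
    """Return ((dx_min, dx_max), (dy_min, dy_max)) keeping all boxes in the image.

    Each box is (x1, y1, x2, y2) with the top-left and bottom-right corners
    given in pixels.
    """
    dx_min = -min(x1 for x1, _, _, _ in boxes)
    dx_max = width - max(x2 for _, _, x2, _ in boxes)
    dy_min = -min(y1 for _, y1, _, _ in boxes)
    dy_max = height - max(y2 for _, _, _, y2 in boxes)
    return (dx_min, dx_max), (dy_min, dy_max)


def shift_boxes(boxes, dx, dy, width, height):
    """Translate all boxes by (dx, dy); reject shifts that push a box outside."""
    (dx_min, dx_max), (dy_min, dy_max) = shift_range(boxes, width, height)
    if not (dx_min <= dx <= dx_max and dy_min <= dy <= dy_max):
        raise ValueError("shift would move a target outside the image")
    return [(x1 + dx, y1 + dy, x2 + dx, y2 + dy) for x1, y1, x2, y2 in boxes]


# Illustrative boxes on a 3,024*3,024 image.
boxes = [(100, 200, 500, 800), (1000, 1500, 2000, 2500)]
print(shift_range(boxes, 3024, 3024))

# Targets in opposite corners leave no room to shift in either direction.
corner_boxes = [(0, 0, 50, 50), (2974, 2974, 3024, 3024)]
print(shift_range(corner_boxes, 3024, 3024))
```

When the returned range collapses to zero on both axes, the only valid offset is no shift at all, which corresponds to the excluded case.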
Data preprocessing yields the dataset required to train the model; however, the size of the input images (3,024*3,024) does not match the input size of the recognition module (640*640). We therefore adopt the adaptive image scaling technique (Letterbox) to transform an input image of arbitrary size to 640*640, which matches the model input. This also allows our SWIM model to handle images of various sizes obtained from different devices.
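The geometry behind letterbox scaling can be sketched as follows: the image is resized by a single scale factor that preserves its aspect ratio, and the remaining area is padded symmetrically. This is a simplified illustration that always pads to the full square target; the function names are hypothetical, and actual Letterbox implementations may differ in rounding and padding details.

```python
def letterbox_params(src_w, src_h, dst=640):
    """Return (scale, new_w, new_h, pad_x, pad_y) for a dst x dst target."""
    # One scale factor for both axes keeps the aspect ratio unchanged.
    scale = min(dst / src_w, dst / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    # Padding is split evenly between the two sides of each axis.
    pad_x = (dst - new_w) / 2
    pad_y = (dst - new_h) / 2
    return scale, new_w, new_h, pad_x, pad_y


def map_box(box, scale, pad_x, pad_y):
    """Map an (x1, y1, x2, y2) box from source pixels to letterboxed pixels."""
    x1, y1, x2, y2 = box
    return (x1 * scale + pad_x, y1 * scale + pad_y,
            x2 * scale + pad_x, y2 * scale + pad_y)


# A square 3,024*3,024 sample needs no padding.
print(letterbox_params(3024, 3024))

# A non-square image (illustrative size) is padded on the shorter axis.
scale, new_w, new_h, pad_x, pad_y = letterbox_params(4032, 3024)
print(new_w, new_h, pad_x, pad_y)
print(map_box((0, 0, 4032, 3024), scale, pad_x, pad_y))
```

Because the bounding boxes are transformed with the same scale and padding, the annotations remain aligned with the resized image.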
In this article, the original data are first divided into a training portion (training set + validation set) and a test set, and the training portion is then enhanced to obtain the dataset used in this article.
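Splitting before enhancing keeps images derived from test samples out of the training data. A minimal sketch of this ordering is given below; the 20% test fraction, the function name, and the `augment` callable are assumptions for illustration, since the split ratio is not stated here.

```python
import random


def split_then_augment(samples, test_fraction, augment, seed=0):
    """Split first, then augment only the training portion.

    augment is a caller-supplied function (hypothetical here) that returns
    a list of generated samples for one original sample.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    train_val = shuffled[n_test:]
    # Generated images come only from training/validation originals.
    generated = [new for sample in train_val for new in augment(sample)]
    return train_val + generated, test


# Illustrative run: integer ids stand in for images.
train, test = split_then_augment(
    list(range(100)), 0.2, lambda s: [(s, "aug")], seed=1
)
print(len(train), len(test))
```

Here each original yields one generated sample, so the training portion doubles while the test set stays untouched.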