1. Introduction
Measuring fish morphological features and observing fish behavior are important tasks that have to be carried out daily in fish cultures. The morphological features include the body dimensions, the estimated fish mass, the eye diameter and color, the gill color, as well as malformations in shape. These features can be used to assess the welfare of the fish, their growing conditions, their health and their feeding needs. The behavior of the fish can be characterized by trajectory, speed, etc., and is an important indication of stress, starvation and health status. These indications are taken into consideration in fish farms to optimize the feeding process, stock control and the proper time for harvest. Until recently, the measurement of fish dimensions, weight, etc., was performed manually and invasively, by taking sample fish out of the water; this procedure is time consuming, costly, inaccurate and harmful to the fish. Tasks like fish tracking and classification, morphological feature estimation and behavior monitoring are also important for observing the populations of various species in rivers or the open sea.
A review of smart aquaculture systems is presented in [1], covering processes such as breeding, the nursery and grow-out stages of cultured species, preparation of the cultured water resource, management of water quality, feed preparation, counting, and washing of the culture systems. Another review of computer vision applications for aquaculture is presented in [2]. These applications include fish and egg counting, size measurement, mass estimation, gender detection, quality inspection, species and stock identification, and monitoring of welfare and behavior. The trends in the application of imaging technologies for the inspection of fish products are examined in [3]. The reviewed image processing approaches are classified by the position of the light source (reflectance, transillumination, transflectance, etc.). The applications examined include rigor mortis, tissue and skin color, as well as morphology to determine fat, firmness, shape, wounds, blood detection, etc.
Several approaches have been proposed for the estimation of fish freshness in a controlled lab environment, based either on sensors or on image processing. In [4], the various sensors that have been used in the literature for freshness estimation are reviewed. These include biosensors, electronic noses or tongues, colorimetric sensor arrays, dielectric sensors and various sensors for spectroscopy (nuclear magnetic resonance, Raman, optical, near infrared, fluorescence, etc.). Quality management systems have also been proposed for freshness, safety, traceability of products, adopted processes, diseases and authenticity [5]. In [6], the freshness of Scomber japonicus (mackerel) stored at a low temperature is assessed from the correlations between the light reflection intensity of the mackerel eyes and the volatile basic nitrogen content. The assessment of fish freshness from the color of the eyes is also examined in [7]. In this approach, a handheld Raspberry Pi device is used to classify the freshness of a fish into three categories (extremely fresh, fresh, spoiled) based on pixel counting.
Fish classification and counting from underwater images and videos is another major category where several approaches have been proposed in the literature. In [8], fish appearing in underwater images are classified into 12 classes based on Fast Region-based Convolutional Neural Networks (Fast R-CNNs). Similarly, in [9], You-Only-Look-Once (YOLO) [10] and Gaussian Mixture Models (GMM) [11] are compared for the classification of 15 species, with an accuracy between 40% and 100% (above 80% in most cases). Lekunberri et al. [12] count and classify various tuna species transferred on a conveyor belt with 70% accuracy. Their approach is based on various types of neural networks (Mask R-CNN [13], ResNet50V2 [14]), while the size of the tuna, ranging from 23 cm to 62 cm, is also measured. Underwater fish recognition is performed in [15] with an accuracy of 98.64%. Similarly, fish recognition from low resolution images is performed in [16] with 78% precision.
Morphological feature estimation is often based on active and passive 3D reconstruction techniques. The active techniques are more accurate but require expensive equipment such as Lidars, while passive techniques employ lower cost cameras. Challenges of passive 3D reconstruction include accurate depth estimation from two images retrieved concurrently, as well as occlusions, repetitive patterns and saturated areas that may cause confusion. In [17], a system based on a stereo camera is described for accurate fish length estimation as well as fish tracking. A monocular 3D fish reconstruction is presented in [18], where successive images of fish carried on a conveyor belt are used to measure their size. CNNs implemented on Graphics Processing Units (GPUs) are used for foreground segmentation and stereo matching. A median accuracy of less than 5 mm can be achieved using an equivalent baseline of 62 mm.
In [19], Facebook's Detectron2 machine learning (ML) library is employed for object detection and image preprocessing to generate 22 metadata properties, including morphological features of the examined specimens, with error rates as low as 1.1%. Otsu thresholding is used for the segmentation of relatively simple images, and pattern matching is used to locate the eye. If a fish is detected without an eye, the image is upscaled.
Fish tracking (and classification) can be performed with both optical and sonar imaging, as described in [20]. Sonar imaging is the only way to monitor fish at night. In this approach, the Norfair [21] tracking algorithm is used in combination with YOLOv4 [22] to track and count fish. The employed sonar equipment is a dual-frequency identification sonar (DIDSON) that exploits higher frequencies and more sub-beams than common hydroacoustic tools. The use of DIDSON has also been described in [23] for the detection of fish morphology and swimming behavior. In this approach, fish must be large enough and at an adequate distance; thus, it is not appropriate for counting small fish. Fish length should preferably be around 68 cm; for fish of other sizes (40 cm to 90 cm), an estimation error ranging from 2% to 8% was measured. In [24], optical and sonar images are also employed for fish monitoring.
In the system described in this paper, a three-stage approach is followed to detect fish in low quality image frames where the fish cannot be easily discriminated from the background. In the first stage, the input image frame is segmented to extract the bounding boxes of the detected fish as separate image patches. In the second stage, each patch is classified into a coarse orientation in order to select the corresponding pre-trained ML model that aligns a number of landmarks on the fish body. In the third stage, the shape (and potential malformations) of the fish is recognized in order to measure fish dimensions, classify fish into categories and map body parts of special interest such as the eyes and gills.
In the first stage of the proposed system, the open source deep learning fish detection method presented in [25] was employed. Although detailed instructions are given on how to train a customized fish detection model, the pre-trained one performed very well even with the low resolution images of the developed dataset. Therefore, the pre-trained model was stored on the target platform and was called from an appropriate Python script that was developed for image segmentation. The output of this script is a number of image patches, each containing a single fish. The coordinates of these patches in the original input frame are also extracted. Each image patch is then classified into a coarse fish orientation category following high speed methods based on OpenCV [26] services. The coordinates of the patches can be used to track the movement of the fish in successive frames. The short history of fish positions detected in the first stage can be used to estimate extended bounding boxes for the intermediate positions through interpolation, in order to bypass the time consuming fish detection process in some frames, as sketched below.
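As an illustration of this interpolation step, the following is a minimal sketch; the (x, y, w, h) box format, the frame indices and the margin value are simplifying assumptions, not the actual implementation:

```python
# Minimal sketch of bounding-box interpolation between two actual detections.
# Boxes are assumed to be (x, y, w, h) tuples; coordinates are illustrative.

def interpolate_box(box_a, box_b, t):
    """Linearly interpolate two bounding boxes at fraction t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

def expand_box(box, margin):
    """Expand a box by a margin to tolerate interpolation error."""
    x, y, w, h = box
    return (x - margin, y - margin, w + 2 * margin, h + 2 * margin)

# A fish detected at frame 0 and frame 10; estimate its extended box at frame 4.
box_f0 = (164, 186, 60, 25)
box_f10 = (165, 197, 60, 25)
box_f4 = expand_box(interpolate_box(box_f0, box_f10, 4 / 10), margin=5)
```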
The patches extracted from each image frame are used as inputs to the last stage of the developed system, which performs shape alignment. The alignment of a number of landmarks is based on the machine learning (ML) approach called Ensemble of Regression Trees (ERT), presented by Kazemi and Sullivan in [27] and exploited in popular image processing libraries such as DLIB [28] and Deformable Shape Tracking (DEST) [29]. The DEST library was exploited in our previous work [30] for driver drowsiness applications. The source code of the DEST library was ported to Ubuntu and Xilinx Vitis environments to support hardware acceleration of the shape alignment process on embedded targets. The DEST library has also been ported to the MS Visual Studio 2019 environment, and this version is adapted in this work for fish shape alignment.
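Although the system itself uses the ported C++ DEST library, the same Kazemi-Sullivan ERT algorithm [27] is exposed through DLIB's Python bindings. The following is a minimal sketch of how such a fish landmark model could be trained and applied; the file names and training parameters are illustrative assumptions, not the actual configuration used:

```python
# Sketch of ERT-based shape alignment with dlib (same algorithm as DEST).
import dlib

# Training: the XML file (dlib imglab format) lists the training images,
# the fish bounding boxes and the 18 annotated landmarks.
options = dlib.shape_predictor_training_options()
options.tree_depth = 4       # depth of each regression tree
options.cascade_depth = 10   # number of cascaded regressor stages
options.nu = 0.1             # regularization (learning rate)
dlib.train_shape_predictor("fish_landmarks_train.xml", "fish_ert.dat", options)

# Inference: align the 18 landmarks inside a detected fish patch.
predictor = dlib.shape_predictor("fish_ert.dat")
img = dlib.load_rgb_image("fish_patch.png")
box = dlib.rectangle(0, 0, img.shape[1] - 1, img.shape[0] - 1)
shape = predictor(img, box)
landmarks = [(p.x, p.y) for p in shape.parts()]
```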
Previous work on fish morphological feature measurement has also been presented by one of the authors in [31], but it concerned different fish species and employed different approaches for image segmentation, pattern matching and contour recognition. The fish contour detection method followed in [31] was not based on shape alignment and ERT. However, the absolute fish dimension estimation with stereo vision presented in [31] is also applicable to this work.
The contributions of the present work can be summarized as follows: a) shape alignment based on ERT models is employed for high precision morphological feature estimation; b) different ERT models are trained for different fish orientations; c) high speed fish orientation detection is performed based on OpenCV services; d) an accurate fish detection method capable of detecting fish in low resolution images is adopted; e) fish tracking is supported; and f) hardware acceleration techniques are applicable at the fish detection and shape alignment stages for real time video processing.
This paper is organized as follows. The materials and methods used are described in Section 2. More specifically, the dataset, tools and target environment are presented in Section 2.1. The general architecture of the proposed system is described in Section 2.2. The employed fish detection and fish orientation methods are described in Sections 2.3 and 2.4, respectively. The methodology for implementing fish tracking is described in Section 2.5. The ERT background and the customized DEST package used for fish shape alignment and morphological feature extraction are described in Sections 2.6 and 2.7, respectively. The experimental results are presented in Section 3. A discussion of the experimental results follows in Section 4, and the conclusions are presented in Section 5.
3. Experimental Results
A test set of P=100 fish photographs has been used to estimate the absolute error in the position of the landmarks compared to the annotation defined as ground truth in the LAE editor. The relative error $\varepsilon_{ri}$ between the estimated landmark position $\hat{k}_i$ and its corresponding position $k_i$ in the ground truth annotation is the Euclidean distance between these two positions, expressed in pixels:

$$\varepsilon_{ri} = \left\| \hat{k}_i - k_i \right\|$$

If $\hat{k}_i = (\hat{x}_i, \hat{y}_i)$ and $k_i = (x_i, y_i)$ and the image (width, height) is $(w, h)$, the normalized relative error $\varepsilon_{ni}$ for landmark $i$ is:

$$\varepsilon_{ni} = \sqrt{\left(\frac{\hat{x}_i - x_i}{w}\right)^2 + \left(\frac{\hat{y}_i - y_i}{h}\right)^2}$$
The standard deviation (SD) $\sigma_\varepsilon$ in the distribution of the landmark estimation error $\varepsilon_{ni}$ across all $L$ landmarks is:

$$\sigma_\varepsilon = \sqrt{\frac{1}{L}\sum_{i=0}^{L-1}\left(\varepsilon_{ni} - \mu_\varepsilon\right)^2}$$

where $\mu_\varepsilon$ is the mean error of $\varepsilon_{ni}$, i.e.:

$$\mu_\varepsilon = \frac{1}{L}\sum_{i=0}^{L-1}\varepsilon_{ni}$$
The standard deviation in the distribution of the estimation error $\varepsilon_{nij}$ of a specific landmark $i$ in the $P$ images ($0 \le j < P$) is:

$$\sigma_i = \sqrt{\frac{1}{P}\sum_{j=0}^{P-1}\left(\varepsilon_{nij} - \mu_i\right)^2}$$

where $\mu_i$ is the mean error of $\varepsilon_{nij}$, i.e.:

$$\mu_i = \frac{1}{P}\sum_{j=0}^{P-1}\varepsilon_{nij}$$
Another standard deviation metric used, $\sigma_P$, concerns the average relative error $\mu_{\varepsilon j}$ of all landmarks of an image (see eq. (14)), across all the $P$ test images:

$$\sigma_P = \sqrt{\frac{1}{P}\sum_{j=0}^{P-1}\left(\mu_{\varepsilon j} - \mu_P\right)^2}$$

where $\mu_P$ is the mean of $\mu_{\varepsilon j}$:

$$\mu_P = \frac{1}{P}\sum_{j=0}^{P-1}\mu_{\varepsilon j}$$
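To make these definitions concrete, the following is a minimal NumPy sketch of the metrics above; it assumes, as a simplification, that `est` and `gt` are P×L×2 arrays of estimated and ground-truth landmark coordinates and that all test images share the same dimensions (w, h):

```python
import numpy as np

def landmark_error_stats(est, gt, w, h):
    """est, gt: (P, L, 2) arrays of landmark (x, y) coordinates."""
    dx = (est[..., 0] - gt[..., 0]) / w   # normalized x error
    dy = (est[..., 1] - gt[..., 1]) / h   # normalized y error
    eps = np.sqrt(dx ** 2 + dy ** 2)      # eps[j, i] = eps_nij
    sigma_eps = eps.std(axis=1)           # SD over the L landmarks, per image
    sigma_i = eps.std(axis=0)             # SD of each landmark over the P images
    mu_eps_j = eps.mean(axis=1)           # average relative error per image
    sigma_P = mu_eps_j.std()              # SD of the per-image averages
    return eps, sigma_eps, sigma_i, sigma_P
```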
Table 2 shows the average, minimum and maximum absolute and relative errors that appeared over all landmarks and all the P test images.
The standard deviation σε limits, as well as the σP deviation of the average error in the P test images, are listed in Table 3.
The mean error μi of each landmark i, along with its standard deviation σi, is plotted in Figure 11. This plot is of particular interest because it highlights the landmarks that show the highest error.
The error in the relative height and length estimation of the fish in the test set is listed in Table 4, along with the corresponding standard deviations.
Concerning the fish orientation methods described in Subsection 2.4, Table 5 lists the success rates achieved with each method. The PCA method recognizes the fish tilt with very good accuracy (within ±10° in more than 95% of the cases). However, the direction the fish is facing is recognized with much lower accuracy, as shown in Table 5. COD performs a coarse classification into a left or right direction, while FOD performs a more detailed orientation classification into the quadrants Q0-Q3, with the success rates listed in Table 5. Eye template matching (TM) is also used to classify the direction into one of the Q0-Q3 quadrants.
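As an illustration of the PCA tilt estimation (cf. Figure 5), the following sketch segments a patch with the OpenCV Otsu threshold and applies PCA to the foreground pixel coordinates; it is a simplified reconstruction of the idea, not the exact implementation:

```python
# PCA tilt estimation: Otsu-segment the patch, then run PCA on the foreground
# pixel coordinates; the principal eigenvector gives the fish tilt (but not
# the facing direction, which is why the combined methods below are needed).
import cv2
import numpy as np

def fish_tilt_deg(patch_bgr):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    ys, xs = np.nonzero(bw)                       # foreground pixel coordinates
    pts = np.column_stack((xs, ys)).astype(np.float64)
    mean, eigvecs, eigvals = cv2.PCACompute2(pts, np.empty((0)))
    vx, vy = eigvecs[0]                           # axis with the largest eigenvalue
    return np.degrees(np.arctan2(vy, vx))         # tilt angle in degrees
```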
A number of combinations of these orientation classification methods were then tested. In PCA+COD, the tilt found by PCA is used, while COD is used to detect the direction. In PCA+COD (2), the COD direction is taken into consideration only if its confidence is above a threshold. In PCA+TM, the coarse left or right direction indicated by TM is taken into consideration to decide the direction along the tilt estimated by PCA. If, for example, the tilt is from bottom-left to top-right, then Q1 is selected if TM finds the fish eye in the right quadrants (Q1, Q3). If the fish eye is found in the left quadrants (Q0, Q2), then Q2 is selected. In the PCA+TM (2) method, the TM direction is considered only if template matching found the fish eye in one of the quadrants that are compatible with the fish tilt indicated by PCA. For example, if the fish is facing up-right, its caudal fin is in Q2 and its head is in Q1. If the TM method finds the fish eye in Q1 or Q2, then the direction indicated by TM is assumed correct. Specifically, with the fish eye found by TM in Q1, it will correctly be recognized that the fish is facing up-right, while if the fish eye is found in Q2, it will be assumed by mistake that the fish is facing down-left. If TM finds the fish eye in Q0 or Q3, the direction indicated by TM is not taken into consideration and only the direction indicated by PCA is used.
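The PCA+TM decision rule just described can be summarized by the following sketch; the handling of the top-left-to-bottom-right tilt is extrapolated by symmetry from the example given in the text:

```python
# Sketch of the PCA+TM decision rule. `tilt_up_right` is True when PCA reports
# a bottom-left-to-top-right tilt; `eye_quadrant` is the image quadrant
# (0: top-left, 1: top-right, 2: bottom-left, 3: bottom-right) where template
# matching located the fish eye.

def pca_tm_direction(tilt_up_right, eye_quadrant):
    if tilt_up_right:
        # Eye on the right side -> facing up-right (Q1); else down-left (Q2).
        return "Q1" if eye_quadrant in (1, 3) else "Q2"
    # Eye on the right side -> facing down-right (Q3); else up-left (Q0).
    return "Q3" if eye_quadrant in (1, 3) else "Q0"
```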
In Table 6, a comparison can be found between our fish length estimation method and the references that present fish size estimation results.
4. Discussion
From the experimental results presented in the previous section, it was shown that the average relative error in the alignment of a single landmark is 4.8% (corresponding to an absolute error of 17.54 pixels), with the following SDs: σε=0.03 and σP=0.0215. When estimating specific fish body dimensions, the relative error is 5.4% in the length and 5.5% in the height (with corresponding SDs of 0.049 and 0.062, respectively). Taking into consideration that the length of the fish recognized in the photographs of the dataset ranges from 10 cm to 20 cm, the average absolute error is in the order of 0.5 cm to 1 cm.
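As a quick worked check of this order-of-magnitude claim, applying the 5.4% relative length error to the stated 10-20 cm size range gives:

$$0.054 \times 10\ \mathrm{cm} \approx 0.5\ \mathrm{cm}, \qquad 0.054 \times 20\ \mathrm{cm} \approx 1.1\ \mathrm{cm}$$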
From Figure 11, more details can be seen about the accuracy of the alignment of individual landmarks. For example, landmarks 7 (top of the caudal fin) and 9 (bottom of the caudal fin) are located with mean errors of 6.8% and 7.8%, respectively. These are the highest relative errors measured per landmark. They appear at the landmarks that mark the edge of the caudal fin because, in most photographs of the dataset used for training and testing, the caudal fin perimeter is often indistinguishable from the background. Landmarks 1 (mouth) and 8 (middle of the caudal fin), which are used for the estimation of fish length, are located with mean errors of 6.8% and 5.6%, respectively. When they are combined to estimate the fish length, the average relative error is 5.4%.
Landmarks 3 and 13 are used to estimate the fish height. Their average relative errors are 4.8% and 4.6%, lower than the errors shown by landmarks 1 and 8 that are used for length estimation. However, when they are combined to estimate the fish height, the measured average relative error is slightly higher (5.5%) than that of the fish length (5.4%). Other landmarks of interest are No. 17 and 18, which are used to locate the fish eye ROI. These landmarks are located with average relative errors of 4.5% and 3.6%. Taking into consideration the fish size range mentioned above, this relative error in the fish eye localization translates to about 0.4 cm to 0.8 cm. Although it is not always safe to assume that the fish eye lies between landmarks No. 17 and 18, additional pattern recognition methods can be applied to accurately detect the shape of the eye near the area indicated by these landmarks, as sketched below. Similarly, the gill area is another ROI, located by landmarks No. 14, 15 and 16. The mean relative errors in the estimation of the positions of these landmarks range between 3.8% and 4.7%.
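One possible refinement, sketched below under the assumption that the eye is roughly circular, crops an ROI around landmarks No. 17 and 18 and applies a Hough circle transform; the margin and Hough parameters are illustrative assumptions, not tuned values:

```python
# Hypothetical eye refinement: crop an ROI around the two eye landmarks
# (integer pixel coordinates assumed) and search it for a circular eye.
import cv2
import numpy as np

def refine_eye(gray_patch, lm17, lm18, margin=10):
    # Bounding box of the two landmarks, expanded by a margin.
    x0 = max(min(lm17[0], lm18[0]) - margin, 0)
    y0 = max(min(lm17[1], lm18[1]) - margin, 0)
    x1 = max(lm17[0], lm18[0]) + margin
    y1 = max(lm17[1], lm18[1]) + margin
    roi = gray_patch[y0:y1, x0:x1]
    circles = cv2.HoughCircles(roi, cv2.HOUGH_GRADIENT, dp=1,
                               minDist=roi.shape[0], param1=100, param2=20,
                               minRadius=3, maxRadius=20)
    if circles is None:
        return None
    cx, cy, r = circles[0, 0]        # strongest detected circle
    return (x0 + cx, y0 + cy, r)     # eye center back in patch coordinates
```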
Concerning the fish orientation classification, Table 5 shows that the best results are achieved with the combination of PCA with TM when the false eye template matching estimations are ignored. PCA is capable of detecting the tilt of the fish with a high accuracy. However, it could detect the direction of the fish in only 44.8% of the cases, using the low contrast images of the developed dataset. COD achieved a higher success rate (67.2%) but can only detect a coarse left or right direction. The FOD method could be used to classify the direction of the fish into four quadrants, but its classification accuracy is only 43.1%. On the other hand, fish eye template matching has a relatively higher success rate of 63.8%. This could have been even higher if the resolution and the quality of the dataset images were better, because in many fish image patches the eye is not visible at all. In these cases, the eye is confused either with background objects or with other parts of the fish, like a stripe in the caudal fin of some species such as Diplodus annularis. Certain combinations of these orientation detection methods were also tested, as shown in Table 5. Combining PCA with the left or right classification of COD achieved a success rate of 65.5%. The highest accuracy was achieved when PCA was combined with TM, reaching 79.3%. A much higher orientation accuracy is expected to be achieved if the track of the fish is also taken into consideration, as explained in Subsection 2.5.
Comparing the fish size estimation methods listed in Table 6, it is obvious that the proposed fish length and height estimation achieves one of the highest accuracies. In [17], a slightly lower error (5%) is achieved, while in [23] the error is lower only for some specific fish sizes. However, in most of the cases presented in the literature, fish size is estimated in a controlled environment (e.g., on a conveyor belt) or with high resolution underwater images. Estimating fish size in low contrast and low quality images, like those of the dataset used here, is a much more challenging task.
Summarizing, the developed framework offers a number of useful services for fish monitoring, such as morphological feature estimation, fish orientation classification and fish tracking. These services can be useful for monitoring fish both in free waters and in aquacultures. The fish detection, orientation classification and shape alignment methods for morphological feature estimation were described in detail. The principles of fish tracking in the developed framework were also discussed.
One of the limitations of the current work is the latency of the fish detection. Specific directions were given to pipeline the fish detection in specific frames with other tasks that can run in parallel threads. These tasks can be the bounding box interpolation in intermediate frames between actual fish detections, the execution of the orientation methods and the shape alignment. Shape alignment is already available with hardware acceleration on embedded platforms. Similarly, the neural network inference that performs fish detection can also be implemented in hardware on the same target platform. Developing such an architecture is part of our ongoing work towards a high frame processing speed and real time operation.
Finally, more sophisticated techniques can also be incorporated into the presented fish tracking approach. For example, feedback from the shape alignment stage can be used to identify with higher confidence the fish in successive frames without confusing their positions. The fish orientation can also indicate when a fish is changing direction along its track.
Figure 1.
Fish patches extracted from a 4608 × 3456 resolution photograph (a) and from a 3840 × 2160 resolution video frame (b).
Figure 2.
The architecture of the developed system for fish monitoring and morphological feature extraction.
Figure 3.
Fish movement in video frames with time distance of 2 seconds.
Figure 4.
Fish Detection Interpolation (FPI).
Figure 5.
Segmentation of the image using the OpenCV Otsu threshold and application of PCA with m=2 to get the angles of the eigenvectors with the highest magnitude (black and white lines). In both images the tilt denoted by the black line is recognized successfully, but the direction in (a) is correct while in (b) it is incorrect.
Figure 6.
COD and FOD methods. (a) Q0+Q2>Q1+Q3 (COD: fish facing left), Q0>Q2 (FOD: fish facing up-left). (b) Q1+Q3>Q0+Q2 (COD: fish facing right), Q3>Q1 (FOD: fish facing down-right).
Figure 7.
Fish eye template (a), successful matching in (b) and (c), caudal fin confused with fish eye in (d) and background object confused with fish eye in (e).
Figure 8.
Monitoring fish in 3 successive frames.
Figure 9.
Fish shape alignment based on 18 landmarks and the developed LAE landmark annotator.
Figure 10.
Fish shape alignment and training procedure.
Figure 11.
The mean relative error and standard deviation for each one of the 18 landmarks.
Table 1.
Association of the coordinates of the bounding boxes detected in Frames 2 and 3 with the fish detected in Frame 1.
| Fish Frame 1 | Frame 1 Y | Frame 1 X | Fish Frame 2 | Frame 2 Y | Frame 2 X | Fish Frame 3 | Frame 3 Y | Frame 3 X |
|---|---|---|---|---|---|---|---|---|
| F0 | 186 | 164 | F0 (12) | 197 | 165 | F3 (12) | 206 | 250 |
| F1 | 190 | 63 | F3 (52) | 207 | 238 | F0 (11) | 197 | 176 |
| F2 | 231 | 409 | F1 (14) | 203 | 69 | F1 (8) | 211 | 71 |
| F3 | 190 | 242 | F2 (14) | 242 | 418 | F2 (23) | 243 | 441 |
| F4 | 246 | 353 | F4 (13) | 259 | 355 | F6 (10) | 276 | 474 |
| F5 | 230 | 285 | F6 (12) | 280 | 464 | F1 (104) | 117 | 10 |
| F6 | 269 | 459 | F5 (21) | 249 | 295 | F4 (10) | 265 | 364 |
| F7 | 231 | 254 |  |  |  |  |  |  |
Table 2.
Global absolute and relative errors.
| Type of Error | Error |
|---|---|
| Min absolute error | 0.36 pixels |
| Max absolute error | 78.33 pixels |
| Average absolute error | 17.54 pixels |
| Min relative error | 0.1% |
| Max relative error | 33.6% |
| Average relative error | 4.8% |
Table 3.
Relative error standard deviation σε limits and σP.
| Parameter | Value |
|---|---|
| Min σε deviation | 0.0068 |
| Max σε deviation | 0.081 |
| Average σε deviation | 0.03 |
| σP deviation | 0.0215 |
Table 4.
Error and standard deviation in measuring relative fish length and height.
| Parameter | Value |
|---|---|
| Fish length error | 5.4% |
| Length error deviation | 0.049 |
| Fish height error | 5.5% |
| Height error deviation | 0.062 |
Table 5.
Success rate in fish direction recognition with PCA, left or right direction classification with COD, classification in Q0-Q3 quadrants with FOD, template matching (TM) and their combinations.
| Method | PCA | COD | FOD | TM | PCA+COD | PCA+COD (2) | PCA+TM | PCA+TM (2) |
|---|---|---|---|---|---|---|---|---|
| Success rate | 44.8% | 67.2% | 43.1% | 63.8% | 65.5% | 65.5% | 77.6% | 79.3% |
Table 6.
Fish size estimation comparison.
| Reference | Description | Error |
|---|---|---|
| [12] | Tuna fish size estimation | SD: 0.328-0.396 |
| [17] | Fish length estimation | Error 5% |
| [18] | Fish size estimation | Error 8% |
| [23] | Fish length estimation | Error 2%-8%, depending on the fish size |
| This work | Fish length estimation | Error 5.4%, SD: 0.049 |