Preprint Article Version 1. This version is not peer-reviewed.

IV-YOLO: A Lightweight Dual-Branch Object Detection Network

Version 1: Received: 28 August 2024 / Approved: 28 August 2024 / Online: 28 August 2024 (12:30:54 CEST)

How to cite: Yan, X.; Tian, D.; Zhou, D.; Wang, C.; Zhang, W. IV-YOLO: A Lightweight Dual-Branch Object Detection Network. Preprints 2024, 2024082054. https://doi.org/10.20944/preprints202408.2054.v1

Abstract

With the increasing demand for security surveillance, assisted driving, and remote sensing, object detection networks with rich environmental perception and high detection accuracy have become a research hotspot. However, detection based on single-modality images is limited in environmental adaptability: it is easily degraded by lighting conditions and by obstructions such as smoke, rain, and vegetation, which cause information loss and prevent the required accuracy from being reached. To address these issues, we propose IV-YOLO, an object detection network that fuses features from visible and infrared images. We build a dual-branch fusion network based on YOLOv8 (You Only Look Once v8) that combines infrared and visible images for detection. On this foundation, we design a Bidirectional Pyramid Feature Fusion structure (Bi-Fusion) that fully integrates complementary features from the two modalities, reducing detection errors caused by feature redundancy and extracting fine-grained features of small targets through dual-modal fusion. We also develop the Shuffle-SPP structure, which combines channel and spatial attention mechanisms to strengthen the focus on deep features and further aggregates upsampled features to extract richer representations. To improve the model's expressive power, we design a loss function tailored to multi-scale detection boxes, which accelerates network convergence during training. Experimental results show that IV-YOLO achieves a mean average precision (mAP) of 74.6% on the UAV remote sensing dataset, 75.4% on the KAIST pedestrian dataset, and 84.8% on the FLIR dataset. While maintaining detection accuracy, the model significantly reduces computational load and meets real-time requirements. The architecture is expected to find broad application in fields such as autonomous driving and public safety.
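To make the dual-branch idea concrete, the following is a minimal PyTorch sketch of visible/infrared feature fusion with combined channel and spatial attention. It is not the authors' IV-YOLO implementation: the module names (DualBranchFusion, ChannelSpatialAttention), channel counts, and the SE-style/CBAM-style attention gates are illustrative assumptions standing in for the YOLOv8 backbone, Bi-Fusion, and Shuffle-SPP structures described in the abstract.

```python
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """Channel attention (SE-style gate) followed by spatial attention
    (CBAM-style gate). A generic stand-in for the paper's Shuffle-SPP
    attention, whose exact design is not specified in the abstract."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel gate: global average pool -> bottleneck MLP -> sigmoid
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial gate: channel-wise mean/max maps -> 7x7 conv -> sigmoid
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # reweight channels
        avg_map = x.mean(dim=1, keepdim=True)
        max_map = x.amax(dim=1, keepdim=True)
        # Reweight spatial positions using pooled statistics
        return x * self.spatial_gate(torch.cat([avg_map, max_map], dim=1))


class DualBranchFusion(nn.Module):
    """Two lightweight conv stems (infrared / visible) whose feature maps
    are concatenated, attended, and projected to one fused map."""

    def __init__(self, out_channels: int = 64):
        super().__init__()

        def stem() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(3, out_channels, 3, stride=2, padding=1),
                nn.BatchNorm2d(out_channels),
                nn.SiLU(inplace=True),
            )

        self.ir_stem = stem()
        self.vis_stem = stem()
        self.attn = ChannelSpatialAttention(2 * out_channels)
        self.project = nn.Conv2d(2 * out_channels, out_channels, 1)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.ir_stem(ir), self.vis_stem(vis)], dim=1)
        return self.project(self.attn(fused))


if __name__ == "__main__":
    ir = torch.randn(1, 3, 256, 256)   # infrared frame (3-channel for simplicity)
    vis = torch.randn(1, 3, 256, 256)  # spatially registered visible frame
    print(DualBranchFusion()(ir, vis).shape)  # torch.Size([1, 64, 128, 128])
```

In a full detector, a fused block like this would feed a multi-scale neck and detection head; the sketch only illustrates the cross-modal fusion step itself.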

Keywords

Dual-branch image object detection; IV-YOLO; Bi-directional pyramid feature fusion; Attention mechanism; Small target detection

Subject

Computer Science and Mathematics, Computer Vision and Graphics
