Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

Evaluation of Deformable Convolution: An Investigation In Image And Video Classification

Version 1 : Received: 30 May 2024 / Approved: 31 May 2024 / Online: 31 May 2024 (10:34:20 CEST)

How to cite: Burgos Madrigal, A.; Romero Bautista, V.; Díaz Hernández, R.; Altamirano Robles, L. Evaluation of Deformable Convolution: An Investigation In Image And Video Classification. Preprints 2024, 2024052124. https://doi.org/10.20944/preprints202405.2124.v1

Abstract

Convolutional Neural Networks (CNNs) have difficulty modeling geometric transformations, such as scaling and rotation, because of the locality of the convolution operation. Deformable convolution (DCON), which replaces standard convolution and enlarges the receptive field to capture relevant features, is a promising approach to overcome this drawback and improve the robustness of CNNs. However, the optimal way to replace standard convolution with its deformable counterpart in a CNN model is unclear. In this study, we address this question by conducting several experiments with deformable convolutions in the layers of a small four-layer CNN model. We also use deformable convolutions in the four layers of several ResNet CNNs with depths 18, 34, 50, and 101. The models were tested on balanced binary classification tasks with 2D data for image classification: Cats & Dogs, EyesPACS, Spyders & Chickens, and Shapes. After this testing, we evaluated DCON on 3D data for action recognition: UCF101 and Human2 (a dataset we compiled to control movement, clothing, and background). The contribution of this research is a guideline for using DCON. It can be summarized as follows: if DCON is used in the first layers of the proposed model (where features are simple), the computational cost, expressed as the number of FLOPs, tends to increase and the model produces more misclassifications than the standard CNN. However, if DCON is used in the final layers, the number of FLOPs used in training and testing decreases, and the classification accuracy improves by up to 20% relative to the base model. Moreover, the model gains robustness because deformable convolutions can adapt to the region of interest. Finally, the best kernel size for DCON is three, which showed better results than size five. With size five, the number of FLOPs increases quadratically, but performance does not improve significantly.
With these results, we propose a guideline for using DCON and contribute to understanding the impact of DCON on the robustness of CNNs.
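
The claim that FLOPs grow quadratically with kernel size can be illustrated with a rough count. The sketch below is not the authors' measurement procedure: the function names, the layer shape (64 channels, 56 x 56 output), the factor of 2 FLOPs per multiply-accumulate, and the cost of 8 FLOPs per bilinearly sampled value are all assumptions chosen for illustration. It also assumes the common deformable-convolution design in which an extra k x k convolution predicts 2·k² offsets per output position.

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """Approximate FLOPs of a standard k x k convolution (bias ignored).

    Assumes 2 FLOPs (one multiply, one add) per multiply-accumulate.
    """
    return 2 * k * k * c_in * c_out * h_out * w_out


def deformable_conv2d_flops(c_in, c_out, k, h_out, w_out, interp_cost=8):
    """Approximate FLOPs of a deformable k x k convolution.

    Assumed components (illustrative, not taken from the paper):
      - the main convolution over the sampled values,
      - an offset branch: a k x k convolution with 2*k*k output channels,
      - bilinear sampling at interp_cost FLOPs per sampled value.
    """
    main = conv2d_flops(c_in, c_out, k, h_out, w_out)
    offsets = conv2d_flops(c_in, 2 * k * k, k, h_out, w_out)
    sampling = interp_cost * k * k * c_in * h_out * w_out
    return main + offsets + sampling


if __name__ == "__main__":
    shape = dict(c_in=64, c_out=64, h_out=56, w_out=56)  # hypothetical layer
    std3 = conv2d_flops(k=3, **shape)
    std5 = conv2d_flops(k=5, **shape)
    print(f"standard 5x5 / 3x3 FLOPs ratio: {std5 / std3:.2f}")  # 25/9, prints 2.78

    dcn3 = deformable_conv2d_flops(k=3, **shape)
    dcn5 = deformable_conv2d_flops(k=5, **shape)
    print(f"deformable 5x5 / 3x3 FLOPs ratio: {dcn5 / dcn3:.2f}")
```

Under these assumptions, the main convolution and the sampling step scale with k², while the offset branch scales with k⁴ because both its kernel area and its number of output channels depend on k. For wide layers the k² term dominates, which is consistent with the quadratic growth described above.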

Keywords

Computer Vision; Image/Video Analysis; Deformable Neural Networks

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
