Article
Version 1
Preserved in Portico. This version is not peer-reviewed.
An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring
Received: 27 March 2024 / Approved: 29 March 2024 / Online: 1 April 2024 (09:49:59 CEST)
A peer-reviewed article of this Preprint also exists.
Qin, Y.; Ye, O.; Fu, Y. An Automatic Near-Duplicate Video Data Cleaning Method Based on a Consistent Feature Hash Ring. Electronics 2024, 13, 1522.
Abstract
In recent decades, as the scale of video data has kept growing, near-duplicate videos have continued to emerge. The data quality issues they cause are becoming increasingly prominent and affect the normal use of video data. Although current studies on near-duplicate video detection help uncover data quality issues in video collections, they lack an automatic merging process for video data represented by high-dimensional features, which makes it difficult to clean near-duplicate videos automatically and so improve the data quality of video datasets. At present, there are few studies on near-duplicate video data cleaning. When the prior distribution is unknown, the existing methods are sensitive to the order of the video data and to the initial cluster centers, which seriously affects the accuracy of near-duplicate video data cleaning. To address these issues, this paper proposes an automatic near-duplicate video data cleaning method based on a consistent feature hash ring. First, a residual network with convolutional block attention modules, a long short-term memory deep network, and an attention model are integrated to construct an RCLA deep network with a multi-head attention mechanism, which extracts spatiotemporal features from video data. Then, a consistent feature hash ring is constructed, which effectively alleviates the sensitivity to video data order while providing the conditions for merging near-duplicate videos. To reduce the sensitivity of the near-duplicate video cleaning results to the initial cluster centers, an optimized feature distance-means clustering algorithm is constructed by applying a mountain peak function on the consistent feature hash ring, which enables automatic cleaning of near-duplicate video data. Finally, experiments are conducted on a commonly used dataset named CC_WEB_VIDEO and a coal mining video dataset. Simulation results, compared against some existing works, demonstrate the performance of the proposed method.
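The two ideas the abstract combines, an order-insensitive hash ring and a mountain function for choosing initial cluster centers, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the ring size, the MD5-based placement of video identifiers, the names `ring_position`, `build_ring`, and `mountain_centers`, and the parameters `alpha` and `beta` are all assumptions made for illustration, and the mountain function shown is a generic textbook variant evaluated on the data points themselves.

```python
import hashlib
import math

RING_SIZE = 2 ** 32  # assumed ring size, not taken from the paper


def ring_position(key: str) -> int:
    """Map a key to a ring position (MD5 is an assumed, illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


def build_ring(features: dict) -> list:
    """Return (position, video_id, feature) tuples sorted by ring position.

    Sorting by hashed position gives the same layout regardless of the
    order in which videos were inserted.
    """
    return sorted(
        (ring_position(vid), vid, tuple(vec)) for vid, vec in features.items()
    )


def squared_distance(a, b) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mountain_centers(points, k, alpha=1.0, beta=1.0):
    """Pick k initial centers using a mountain function over the points.

    Each point's height is the sum of Gaussian-like contributions from all
    points; after a peak is chosen, its influence is subtracted so the next
    peak falls in a different dense region.
    """
    heights = [
        sum(math.exp(-alpha * squared_distance(c, x)) for x in points)
        for c in points
    ]
    centers = []
    for _ in range(k):
        best = max(range(len(points)), key=lambda i: heights[i])
        peak = heights[best]
        centers.append(points[best])
        heights = [
            h - peak * math.exp(-beta * squared_distance(points[i], points[best]))
            for i, h in enumerate(heights)
        ]
    return centers


if __name__ == "__main__":
    # Hypothetical 2-D features standing in for spatiotemporal video features.
    feats = {"a": (0.0, 0.0), "b": (0.1, 0.0), "d": (5.0, 5.0), "e": (5.1, 5.0)}
    ring = build_ring(feats)
    print(mountain_centers([vec for _, _, vec in ring], k=2))
```

In this sketch the chosen centers depend only on the density of the feature vectors, not on a random initialization or on insertion order, which is the property the abstract attributes to combining the ring with the mountain peak function.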
Keywords
Video cleaning; consistent feature hash ring; feature distance means; mountain peak function; multi-head attention mechanism; near-duplicate videos
Subject
Computer Science and Mathematics, Analysis
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.