Version 1
Received: 1 August 2024 / Approved: 2 August 2024 / Online: 2 August 2024 (06:04:36 CEST)
How to cite:
Dave, I.; Gunawardhana, M.; Sadith, L.; Zhou, H.; David, L.; Harari, D.; Shah, M.; Khan, M. Unifying Video Self-Supervised Learning across Families of Tasks: A Survey. Preprints 2024, 2024080133. https://doi.org/10.20944/preprints202408.0133.v1
APA Style
Dave, I., Gunawardhana, M., Sadith, L., Zhou, H., David, L., Harari, D., Shah, M., & Khan, M. (2024). Unifying Video Self-Supervised Learning across Families of Tasks: A Survey. Preprints. https://doi.org/10.20944/preprints202408.0133.v1
Chicago/Turabian Style
Dave, I., Mubarak Shah and Muhammad Khan. 2024. "Unifying Video Self-Supervised Learning across Families of Tasks: A Survey." Preprints. https://doi.org/10.20944/preprints202408.0133.v1
Abstract
Video self-supervised learning (VideoSSL) offers significant potential for reducing annotation costs and improving a wide range of downstream tasks in video understanding. The ultimate goal of VideoSSL is to achieve human-level video intelligence across a spectrum of tasks, from low-level tasks such as pixel temporal correspondence to high-level, complex spatio-temporal tasks such as action recognition. However, most existing VideoSSL methods focus on isolated parts of this spectrum and do not integrate different levels of task complexity. Our study presents the first comprehensive survey that connects all families of VideoSSL methods. We review the full spectrum of VideoSSL, from low to high levels, by conceptually linking the methods' self-supervised learning objectives and organizing them into a comprehensive categorization. Our extensive evaluation highlights the strengths and limitations of each SSL objective across downstream task families. We also detail challenges in current VideoSSL research, including data curation, interpretability, deployment, and privacy, which previous surveys have not thoroughly explored. For each challenge, we identify where existing methods already make progress and outline directions for future research.
Keywords
Video Understanding; Self-Supervised Learning; Representation Learning
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.