Preprint Article, Version 3 (this version is not peer-reviewed)

Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition

Version 1 : Received: 28 January 2024 / Approved: 29 January 2024 / Online: 29 January 2024 (08:19:58 CET)
Version 2 : Received: 18 February 2024 / Approved: 20 February 2024 / Online: 21 February 2024 (03:46:16 CET)
Version 3 : Received: 25 July 2024 / Approved: 25 July 2024 / Online: 26 July 2024 (04:29:00 CEST)

How to cite: Abduljalil, H.; Elhayek, A.; Marish Ali, A.; Alsolami, F. Spatiotemporal Graph Autoencoder Network for Skeleton-Based Human Action Recognition. Preprints 2024, 2024011998. https://doi.org/10.20944/preprints202401.1998.v3

Abstract

Human action recognition (HAR) from skeleton data is a challenging yet crucial task owing to its wide-ranging applications in numerous domains, including patient monitoring, security surveillance, and observation of human-machine interactions. Although numerous algorithms have been proposed to distinguish among a wide range of activities, most practical applications demand highly accurate detection of specific activity types. This study proposes a novel and highly accurate spatiotemporal graph autoencoder network for skeleton-based HAR, together with an extensive investigation across diverse data modalities. To this end, a spatiotemporal graph autoencoder was constructed to automatically learn both spatial and temporal patterns from human skeleton datasets. The resulting graph convolutional network, designated GA-GCN, outperforms most existing state-of-the-art methods on two common benchmarks, NTU RGB+D and NTU RGB+D 120. On the first dataset, the proposed approach achieved accuracies of 92.3% and 96.8% for the cross-subject and cross-view evaluations, respectively. On the more challenging NTU RGB+D 120 dataset, GA-GCN attained accuracies of 88.8% and 90.4% for the cross-subject and cross-set evaluations, respectively.
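The abstract does not detail the GA-GCN architecture itself. As a rough illustration of the kind of spatiotemporal graph autoencoder it describes, the following PyTorch sketch pairs a spatial graph convolution over skeleton joints with a temporal convolution over frames and trains by reconstruction; all layer names, sizes, and the learnable adjacency matrix are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a spatiotemporal graph convolution autoencoder of the
# kind the abstract describes. Hypothetical design: a spatial graph conv over
# skeleton joints followed by a temporal conv over frames, not GA-GCN itself.
import torch
import torch.nn as nn

class STGraphConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, num_joints, kernel_t=9):
        super().__init__()
        # Learnable adjacency over the skeleton graph (assumed, initialized
        # to identity; a real model would encode the skeleton's bone links).
        self.A = nn.Parameter(torch.eye(num_joints))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        pad = (kernel_t - 1) // 2
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(kernel_t, 1),
                                  padding=(pad, 0))
        self.relu = nn.ReLU()

    def forward(self, x):
        # x: (batch, channels, frames, joints)
        x = torch.einsum('nctv,vw->nctw', x, self.A)  # spatial aggregation
        x = self.relu(self.spatial(x))                # per-joint feature mix
        x = self.relu(self.temporal(x))               # temporal patterns
        return x

class GraphAutoencoder(nn.Module):
    # Encoder-decoder over skeleton sequences; the reconstruction objective
    # drives unsupervised learning of spatiotemporal patterns.
    def __init__(self, num_joints=25, in_ch=3, hidden=64):
        super().__init__()
        self.encoder = STGraphConvBlock(in_ch, hidden, num_joints)
        self.decoder = STGraphConvBlock(hidden, in_ch, num_joints)

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

if __name__ == "__main__":
    # NTU RGB+D skeletons have 25 joints with 3D coordinates per joint.
    model = GraphAutoencoder(num_joints=25, in_ch=3)
    clip = torch.randn(2, 3, 64, 25)            # (batch, xyz, frames, joints)
    recon, latent = model(clip)
    loss = nn.functional.mse_loss(recon, clip)  # reconstruction loss
    print(recon.shape, latent.shape, loss.item())

The learned latent z would then feed a classifier head for the cross-subject and cross-view/cross-set evaluations reported above; that head is omitted here since the abstract does not specify it.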

Keywords

graph convolutional networks; graph autoencoder; deep learning; human activity analysis; skeleton-based human action recognition

Subject

Computer Science and Mathematics, Computer Vision and Graphics
