Preprint Article, Version 1 (not peer-reviewed); preserved in Portico.

Enhanced Multi-Scale Attention-Driven 3D Human Reconstruction from Single Image

Version 1: Received: 10 September 2024 / Approved: 11 September 2024 / Online: 11 September 2024 (10:34:23 CEST)

How to cite: Ren, Y.; Zhou, M.; Zhou, P.; Wang, S.; Liu, Y.; Geng, G.; Li, K.; Cao, X. Enhanced Multi-Scale Attention-Driven 3D Human Reconstruction from Single Image. Preprints 2024, 2024090871. https://doi.org/10.20944/preprints202409.0871.v1

Abstract

Due to the inherent limitations of a single viewpoint, reconstructing 3D human meshes from a single image has long been a challenging task. While deep learning networks enable us to approximate the shape of unseen sides, capturing the texture details of the non-visible side remains difficult with just one image. Traditional methods utilize Generative Adversarial Networks (GANs) to predict the normal maps of the non-visible side, thereby inferring detailed textures and wrinkles on the model's surface. However, we have identified challenges with existing normal prediction networks when dealing with complex scenes, such as a lack of focus on local features and insufficient modeling of spatial relationships. To address these challenges, we introduce EMAR: Enhanced Multi-scale Attention-driven Single Image 3D Human Reconstruction. This approach incorporates a novel Enhanced Multi-Scale Attention (EMSA) mechanism, which excels at capturing intricate features and global relationships in complex scenes. EMSA surpasses traditional single-scale attention mechanisms by adaptively adjusting the weights between features, enabling the network to more effectively leverage information across various scales. Furthermore, we have improved the feature fusion method to better integrate representations from different scales. This enhanced feature fusion allows the network to more comprehensively understand both fine details and global structures within the image. Finally, we have designed a hybrid loss function tailored to the introduced attention mechanism and feature fusion method, optimizing the network's training process and enhancing the quality of the reconstruction results. Our network demonstrates significant improvements in performance for 3D human model reconstruction. Experimental results show that our method exhibits greater robustness to challenging poses compared to traditional single-scale approaches.
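To make the idea of multi-scale attention with adaptive cross-scale weighting more concrete, the sketch below shows one plausible way such a block could be structured in PyTorch. This is not the authors' EMSA implementation, which the abstract does not specify; the module name (EMSABlock), the choice of scales, the channel-attention branches, and the softmax-normalized per-scale fusion weights are all illustrative assumptions.

```python
# Illustrative sketch only (not the paper's released code): a multi-scale
# attention block that attends to features at several resolutions and fuses
# them with learned, softmax-normalized per-scale weights.
# All names and hyperparameters here are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F


class EMSABlock(nn.Module):
    """Hypothetical enhanced multi-scale attention block."""

    def __init__(self, channels: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        # One lightweight channel-attention branch per scale.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // 4, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 4, channels, kernel_size=1),
            )
            for _ in scales
        )
        # Learned logits that softmax into adaptive per-scale fusion weights.
        self.scale_logits = nn.Parameter(torch.zeros(len(scales)))
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        outputs = []
        for scale, branch in zip(self.scales, self.branches):
            # Downsample to expose coarser context, attend, then restore size.
            feat = F.avg_pool2d(x, kernel_size=scale) if scale > 1 else x
            attn = torch.sigmoid(branch(feat))  # channel attention weights
            feat = feat * attn
            if scale > 1:
                feat = F.interpolate(feat, size=(h, w), mode="bilinear",
                                     align_corners=False)
            outputs.append(feat)
        # Adaptive weighting across scales, then residual feature fusion.
        weights = torch.softmax(self.scale_logits, dim=0)
        fused = sum(w * o for w, o in zip(weights, outputs))
        return x + self.fuse(fused)


if __name__ == "__main__":
    block = EMSABlock(channels=64)
    dummy = torch.randn(1, 64, 128, 128)
    print(block(dummy).shape)  # torch.Size([1, 64, 128, 128])
```

Under these assumptions, the softmax over scale_logits is what lets the block adaptively rebalance coarse and fine features during training, while the residual fusion keeps the original detail available to later layers.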

Keywords

enhancing multi-scale attention; single image; normal map; human reconstruction

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
