Preprint Article, Version 1 (this version is not peer-reviewed)

Enhancing Facial Expression Recognition through Light Field Cameras

Version 1 : Received: 6 August 2024 / Approved: 6 August 2024 / Online: 7 August 2024 (09:11:01 CEST)

How to cite: Oucherif, S. D.; Nawaf, M. M.; Boï, J.-M.; Nicod, L.; Mallor, E.; Dubuisson, S.; Merad, D. Enhancing Facial Expression Recognition through Light Field Cameras. Preprints 2024, 2024080486. https://doi.org/10.20944/preprints202408.0486.v1

Abstract

In this paper, we study facial expression recognition (FER) using three modalities obtained from a light field camera: sub-aperture (SA), depth map, and all-in-focus (AiF) images. Our objective is to construct a more comprehensive and effective FER system by investigating multimodal fusion strategies. For this purpose, we employ EfficientNetV2-S, pre-trained on AffectNet, as our primary convolutional neural network. This model, combined with a BiGRU, is used to process SA images. We evaluate various fusion techniques at both the decision and feature levels to assess their effectiveness in enhancing FER accuracy. Our findings show that the model using SA images surpasses state-of-the-art performance, achieving 88.13% ± 7.42% accuracy under the subject-specific evaluation protocol and 91.88% ± 3.25% under the subject-independent evaluation protocol. These results highlight our model's potential to enhance FER accuracy and robustness, outperforming existing methods. Furthermore, our multimodal fusion approach, integrating SA, AiF, and depth images, demonstrates substantial improvements over unimodal models. The decision-level fusion strategy, particularly with average weights, proved most effective, achieving 90.13% ± 4.95% accuracy under the subject-specific evaluation protocol and 93.33% ± 4.92% under the subject-independent evaluation protocol. This approach leverages the complementary strengths of each modality, resulting in a more comprehensive and accurate FER system.
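The abstract outlines a pipeline in which a pre-trained EfficientNetV2-S extracts features from each sub-aperture view, a BiGRU aggregates the resulting sequence, and per-modality predictions are combined by decision-level fusion with average weights. The sketch below illustrates that idea; it is a minimal illustration assuming PyTorch/torchvision, and the class names, hidden size, number of SA views, and number of expression classes are placeholders rather than the authors' exact implementation.

```python
# Minimal sketch (assumptions: PyTorch + torchvision; AffectNet pre-training,
# exact layer sizes, and training details from the paper are not reproduced).
import torch
import torch.nn as nn
from torchvision.models import efficientnet_v2_s


class SAExpressionNet(nn.Module):
    """EfficientNetV2-S backbone + BiGRU over a sequence of sub-aperture (SA) views."""

    def __init__(self, num_classes: int = 7, hidden: int = 256):
        super().__init__()
        backbone = efficientnet_v2_s(weights=None)      # in the paper, pre-trained on AffectNet
        feat_dim = backbone.classifier[1].in_features   # 1280 for EfficientNetV2-S
        backbone.classifier = nn.Identity()             # keep only the feature extractor
        self.backbone = backbone
        self.bigru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, sa_views: torch.Tensor) -> torch.Tensor:
        # sa_views: (B, T, 3, H, W) -- T sub-aperture images per light field capture
        b, t, c, h, w = sa_views.shape
        feats = self.backbone(sa_views.reshape(b * t, c, h, w)).reshape(b, t, -1)
        seq, _ = self.bigru(feats)             # (B, T, 2*hidden)
        return self.head(seq.mean(dim=1))      # logits per expression class


def decision_level_fusion(logits_per_modality: list[torch.Tensor]) -> torch.Tensor:
    """Average-weight decision fusion: mean of per-modality class probabilities."""
    probs = [torch.softmax(logits, dim=-1) for logits in logits_per_modality]
    return torch.stack(probs, dim=0).mean(dim=0)


# Example usage with random tensors (e.g. 9 SA views at 224x224):
# sa_model = SAExpressionNet(num_classes=7)
# logits_sa = sa_model(torch.randn(2, 9, 3, 224, 224))
# fused = decision_level_fusion([logits_sa, logits_aif, logits_depth])
```

The AiF and depth branches would use analogous single-image classifiers; only the SA branch needs the BiGRU, since it is the modality that yields a sequence of views per capture.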

Keywords

Light Field Cameras; Facial Expression Recognition; Multimodality

Subject

Computer Science and Mathematics, Computer Vision and Graphics
