Version 1: Received: 18 April 2024 / Approved: 19 April 2024 / Online: 22 April 2024 (12:21:52 CEST)
How to cite:
Pan, S.-T.; Wu, H.-J. Emotion Detection Based on Consecutive Facial by Combining CNN and LSTM Implemented on FPGA Chip. Preprints 2024, 2024041307. https://doi.org/10.20944/preprints202404.1307.v1
APA Style
Pan, S. T., & Wu, H. J. (2024). Emotion Detection Based on Consecutive Facial by Combining CNN and LSTM Implemented on FPGA Chip. Preprints. https://doi.org/10.20944/preprints202404.1307.v1
Chicago/Turabian Style
Pan, S.-T., and Han-Jui Wu. 2024. "Emotion Detection Based on Consecutive Facial by Combining CNN and LSTM Implemented on FPGA Chip." Preprints. https://doi.org/10.20944/preprints202404.1307.v1
Abstract
This paper proposes emotion recognition methods for consecutive facial images and implements the inference of the neural network model on a field-programmable gate array (FPGA). The proposed methods are based on a neural network architecture that combines convolutional neural networks (CNNs), long short-term memory (LSTM), and fully connected neural networks (FCNNs), called CLDNN or ConvLSTM-FCN. This type of neural network can analyze the local feature sequences obtained by convolving the data, making it suitable for processing time-series data such as consecutive facial images. In this paper, sequences of facial images are sampled from videos corresponding to the various emotional states of the subjects. The sampled images are then pre-processed through face detection, grayscale conversion, resizing, and data augmentation where necessary. The 2-D CNN in the ConvLSTM-FCN extracts features from these pre-processed facial images. The resulting sequences of facial image features are time series whose elements are mutually dependent. The LSTM then models these time series, followed by fully connected neural networks (FCNNs) for classification. The proposed consecutive facial emotion recognition method achieves average recognition rates of 99.51% on the RAVDESS dataset, 87.80% on the BAUM-1s dataset, and 96.82% on the eNTERFACE'05 dataset, using 10-fold cross-validation on a PC. Comparisons of recognition accuracy between the proposed method and other existing related works are conducted in this paper; according to these comparisons, the proposed emotion recognition methods outperform the existing research. The proposed methods are then implemented on an FPGA chip using the neural network inference algorithms presented in this paper, and the accuracies of the experiments conducted on the FPGA chip are identical to those obtained on the PC. This verifies that the proposed neural network model implemented on the FPGA chip performs well.
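The pipeline described in the abstract (per-frame CNN feature extraction, LSTM over the resulting feature sequence, then a fully connected softmax classifier) can be sketched in plain NumPy. This is a minimal illustrative sketch, not the authors' implementation: the frame size, kernel size, hidden width, and random weights are all hypothetical toy choices; seven output classes stand in for a typical set of basic emotions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv2d_valid(img, kernel):
    """Single-channel 2-D convolution, 'valid' mode (no padding)."""
    H, W = img.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2(x):
    """2x2 max pooling, stride 2 (both dimensions assumed even)."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step; gate order: input, forget, output, candidate."""
    n = h.size
    z = W @ x + U @ h + b
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2 * n]), sigmoid(z[2 * n:3 * n])
    g = np.tanh(z[3 * n:])
    c = f * c + i * g          # update cell state
    h = o * np.tanh(c)         # emit hidden state
    return h, c

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy dimensions (illustrative only): 5 grayscale "face" frames of 9x9 pixels.
T, H, W = 5, 9, 9
hidden, n_classes = 8, 7       # 7 basic emotion classes (assumption)
feat_dim = 16                  # 9x9 -> conv(2x2) -> 8x8 -> pool -> 4x4 = 16

kernel = rng.standard_normal((2, 2))
Wg = rng.standard_normal((4 * hidden, feat_dim)) * 0.1
Ug = rng.standard_normal((4 * hidden, hidden)) * 0.1
bg = np.zeros(4 * hidden)
Wfc = rng.standard_normal((n_classes, hidden)) * 0.1
bfc = np.zeros(n_classes)

frames = rng.random((T, H, W))     # stand-in for pre-processed face images
h, c = np.zeros(hidden), np.zeros(hidden)
for t in range(T):
    # Per-frame CNN features: conv -> ReLU -> 2x2 max pool -> flatten.
    x = max_pool2(np.maximum(conv2d_valid(frames[t], kernel), 0.0)).ravel()
    # LSTM consumes the feature sequence across consecutive frames.
    h, c = lstm_step(x, h, c, Wg, Ug, bg)

# Fully connected layer + softmax over the final hidden state.
probs = softmax(Wfc @ h + bfc)
print(probs.shape, float(probs.sum()))
```

With random weights the output is of course meaningless; the point is only the data flow: each frame is reduced to a fixed-length feature vector, the LSTM aggregates the T-step sequence, and the FCNN maps the final hidden state to class probabilities.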
Subject: Computer Science and Mathematics, Signal Processing
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.