Video ID | Name | Short description | Expected emotion | Duration (sec)
---|---|---|---|---
1 | Puppies | Cute puppies running | Happiness | 13
2 | Avocado | A toddler holding an avocado | Happiness | 8
3 | Runner | Competitive runners supporting a girl from another team over the finish line | Happiness | 24
4 | Maggot | A man eating a maggot | Disgust | 37
5 | Raccoon | A man beating a raccoon to death | Anger | 16
6 | Trump | Donald Trump talking about foreigners | Anger | 52
7 | Mountain bike | Mountain biker riding down a rock bridge | Surprise | 29
8 | Roof run | Runner almost falling off a skyscraper | Surprise | 18
9 | Abandoned | Social worker feeding a starved toddler | Sadness | 64
10 | Waste | Residents collecting electronic waste in the slums of Accra | Sadness | 31
11 | Dog | Sad dog on the gravestone of his master | Sadness | 11
12 | Roof bike | Person biking on top of a skyscraper | Fear | 28
13 | Monster | A man discovering a monster through his camera | Fear | 156
14 | Condom ad | Child throwing a tantrum in a supermarket | Multiple | 38
15 | Soldier | Soldiers in battle | Multiple | 35
Model Name | Utility | Input | Architecture
---|---|---|---
MLP | Baseline | Downsampled plant signal | Alternation of ReLU-activated densely connected layers with dropout layers to limit overfitting. The last layer is a softmax-activated dense layer of nb_emotion neurons.
biLSTM | Considers the temporal dependencies of the plant signal | Downsampled plant signal | Two-block model: (1) LSTM layers embedded in a bidirectional wrapper; (2) alternation of two ReLU-activated dense layers (1024 and 512 neurons, respectively) with dropout layers. The last layer is a softmax-activated dense layer of nb_emotion neurons.
MFCC-CNN | Specialized for 2D or 3D inputs such as multi-featured time series | MFCC features | Two-block model: (1) alternation of convolutional layers with max-pooling operations; (2) alternation of ReLU-activated dense layers with dropout layers. The last layer is a softmax-activated dense layer of nb_emotion neurons.
MFCC-ResNet | Pretrained deep CNN, emphasizing the importance of network depth | MFCC features | ResNet architecture slightly modified to fit the emotion detection task: the top dense layers used for classification are replaced by a dense layer of 1024 neurons followed by a dropout layer. The last layer is a softmax-activated dense layer of nb_emotion neurons.
Random Forest (not windowed) | Effective for diverse datasets; good overall robustness | Raw plant signal, normalized, not windowed | Ensemble of decision trees with 300 trees, a maximum depth of 20 per tree, and no balancing. This configuration is aimed at handling complex classification tasks, balancing bias and variance.
1D CNN (not windowed) | Suitable for time-series analysis | Raw plant signal, normalized, not windowed | Sequential model with a 1D convolutional layer (64 filters, kernel size 3, 'swish' activation, input shape (10000, 1)), followed by a max-pooling layer (pool size 2), a flatten layer, a dense layer (100 neurons, 'swish' activation), and an output dense layer (one neuron per unique class in `y`, 'softmax' activation). Compiled with the Adam optimizer, 'sparse_categorical_crossentropy' loss, and an accuracy metric.
biLSTM (not windowed) | Considers the temporal dependencies of the plant signal | Raw plant signal, normalized, not windowed | Sequential model with a bidirectional LSTM layer (1024 units, return_sequences=True, input shape based on the reshaped training data), followed by another bidirectional LSTM layer (1024 units), a dense layer (100 neurons, 'swish' activation), and an output dense layer (one neuron per unique class in `y`, 'softmax' activation). Optimized with Adam (learning rate 0.0003), tracking loss and accuracy.
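For concreteness, the 1D CNN row above translates almost directly into Keras. The following is a minimal sketch assuming TensorFlow/Keras; `n_classes` is a placeholder for the number of unique labels in `y`, and this is illustrative rather than the authors' exact code.

```python
from tensorflow.keras import layers, models

n_classes = 6  # placeholder: number of unique emotion labels in `y`

# 1D CNN on the raw, normalized, non-windowed plant signal of length 10000.
model = models.Sequential([
    layers.Input(shape=(10000, 1)),
    layers.Conv1D(64, kernel_size=3, activation="swish"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(100, activation="swish"),
    layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```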
Model Name | Parameter | Values | Number of configurations
---|---|---|---
MLP | Dense Units | 1024, 4096 | 288
 | Dense Layers | 2, 4 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
 | Window | 5, 10, 20 |
 | Hop | 5, 10 |
biLSTM | LSTM Units | 64, 256, 1024 | 648
 | LSTM Layers | 1, 2, 3 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
 | Window | 5, 10, 20 |
 | Hop | 5, 10 |
MFCC-CNN | Conv Filters | 64, 128 | 864
 | Conv Layers | 2, 3 |
 | Conv Kernel Size | 3, 5, 7 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
 | Window | 5, 10, 20 |
 | Hop | 5, 10 |
MFCC-ResNet | Pretrained | Yes, No | 432
 | Number of MFCCs | 20, 40, 60 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
 | Window | 5, 10, 20 |
 | Hop | 5, 10 |
RF (no windowing) | Number of estimators | 100, 200, 300, 500, 700 | 60
 | Max Depth | None, 10, 20, 30 |
 | Balancing | Balance, Weights, None |
1D CNN (no windowing) | Conv Filters | 64, 128 | 144
 | Conv Layers | 2, 3 |
 | Conv Kernel Size | 3, 5, 7 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
biLSTM (no windowing) | LSTM Units | 64, 256, 1024 | 108
 | LSTM Layers | 1, 2, 3 |
 | Dropout Rate | 0, 0.2 |
 | Learning Rate | 3e-4, 1e-3 |
 | Balancing | Balance, Weights, None |
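The configuration counts in the last column are simply the product of the number of values per parameter. As a minimal sketch of how such a grid expands (the search tooling is an assumption; the paper does not state how the search was implemented):

```python
from itertools import product

# Hyperparameter grid for the MLP, as listed in the table above.
mlp_grid = {
    "dense_units": [1024, 4096],
    "dense_layers": [2, 4],
    "dropout_rate": [0, 0.2],
    "learning_rate": [3e-4, 1e-3],
    "balancing": ["balance", "weights", None],
    "window": [5, 10, 20],
    "hop": [5, 10],
}

# Enumerate every combination of parameter values.
configs = [dict(zip(mlp_grid, combo)) for combo in product(*mlp_grid.values())]
print(len(configs))  # 2*2*2*2*3*3*2 = 288, matching the table
```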
Model Name | Parameter | Value
---|---|---
MLP | Dense Units | 4096
 | Dense Layers | 2
 | Dropout Rate | 0.2
 | Learning Rate | 0.001
 | Balancing | Balanced
 | Window | 20 sec
 | Hop | 10 sec
biLSTM | LSTM Units | 1024
 | LSTM Layers | 2
 | Dropout Rate | 0
 | Learning Rate | 0.0003
 | Balancing | Balanced
 | Window | 20 sec
 | Hop | 10 sec
MFCC-CNN | Conv Filters | 96
 | Conv Layers | 2
 | Conv Kernel Size | 7
 | Dropout Rate | 0.2
 | Learning Rate | 0.0003
 | Balancing | Balanced
 | Window | 20 sec
 | Hop | 10 sec
MFCC-ResNet | Pretrained | No
 | Number of MFCCs | 60
 | Dropout Rate | 0.2
 | Learning Rate | 0.001
 | Balancing | Balanced
 | Window | 20 sec
 | Hop | 10 sec
RF (no windowing) | Number of estimators | 300
 | Max Depth | 20
 | Balancing | None
1D CNN (no windowing) | Conv Filters | 96
 | Conv Layers | 2
 | Conv Kernel Size | 7
 | Dropout Rate | 0.2
 | Learning Rate | 0.0003
 | Balancing | None
biLSTM (no windowing) | LSTM Units | 1024
 | LSTM Layers | 2
 | Dropout Rate | 0
 | Learning Rate | 0.0003
 | Balancing | None
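As one example, the best Random Forest configuration above maps directly onto scikit-learn's `RandomForestClassifier`. A minimal sketch with placeholder data (the real inputs are the raw, normalized, non-windowed plant signals; shapes and label count are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Best configuration from the table: 300 trees, max depth 20, no balancing.
rf = RandomForestClassifier(n_estimators=300, max_depth=20)

# Placeholder data standing in for raw, normalized plant signals (length 10000)
# and their emotion labels; shapes are illustrative only.
X_train = np.random.randn(32, 10000)
y_train = np.random.randint(0, 6, size=32)
rf.fit(X_train, y_train)
```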
Model | Test set Accuracy | Test set Recall
---|---|---
MLP | 0.399 | 0.220
biLSTM | 0.260 | 0.351
MFCC-CNN | 0.377 | 0.275
MFCC-ResNet | 0.318 | 0.324
RF (no windowing) | 0.552 | 0.552
1D CNN (no windowing) | 0.461 | 0.514
biLSTM (no windowing) | 0.448 | 0.380
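For reference, metrics like those above can be computed with scikit-learn. The table does not state the averaging scheme for recall, so macro-averaging is an assumption here, and the labels below are illustrative placeholders for the real test split:

```python
from sklearn.metrics import accuracy_score, recall_score

# Illustrative labels; y_test and y_pred stand in for the real test split.
y_test = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]

print(accuracy_score(y_test, y_pred))                 # fraction of correct predictions
print(recall_score(y_test, y_pred, average="macro"))  # macro-averaged recall (assumption)
```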