This version is not peer-reviewed
Submitted: 14 June 2024
Posted: 17 June 2024
Hyper-parameter | MLP |
---|---|
Hidden Dimension | 1568 |
# Hidden layers | 3 |
Batch Size | 32 |
Training Epochs | 100 |
LR Decay Method | Linear |
Learning Rate | 0.025 |
Update Interval (for DST) | 1 |
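
The MLP settings above can be collected into a small configuration object, together with a sketch of what a "Linear" learning-rate decay might look like. This is only an illustration: the names `MLPConfig` and `linear_decay_lr` are hypothetical, and the assumptions that the rate is updated once per epoch and decays to zero by the final epoch are not stated in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLPConfig:
    """Values copied from the MLP hyper-parameter table."""
    hidden_dim: int = 1568
    num_hidden_layers: int = 3
    batch_size: int = 32
    epochs: int = 100
    base_lr: float = 0.025
    # The table does not say whether this interval counts epochs or steps.
    dst_update_interval: int = 1


def linear_decay_lr(epoch: int, cfg: MLPConfig = MLPConfig(),
                    final_lr: float = 0.0) -> float:
    """Interpolate linearly from base_lr (epoch 0) to final_lr (last epoch).

    Assumption: per-epoch updates and a final rate of 0.0; the source
    only names the decay method as "Linear".
    """
    if not 0 <= epoch <= cfg.epochs:
        raise ValueError(f"epoch must lie in [0, {cfg.epochs}]")
    frac = epoch / cfg.epochs
    return cfg.base_lr + (final_lr - cfg.base_lr) * frac


if __name__ == "__main__":
    for e in (0, 25, 50, 75, 100):
        print(e, linear_decay_lr(e))
```

Under these assumptions the rate starts at 0.025 and reaches the chosen final value at epoch 100; a different floor can be passed through `final_lr`.
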

Hyper-parameter | Multi30k | IWSLT14 | WMT17 |
---|---|---|---|
Embedding Dimension | 512 | 512 | 512 |
Feed-forward Dimension | 1024 | 2048 | 2048 |
Batch Size | 1024 tokens | 10240 tokens | 12000 tokens |
Training Steps | 20000 | 20000 | 80000 |
Dropout | 0.1 | 0.1 | 0.1 |
Attention Dropout | 0.1 | 0.1 | 0.1 |
Max Gradient Norm | 0 | 0 | 0 |
Warmup Steps | 3000 | 6000 | 8000 |
Decay Method | inoam | inoam | inoam |
Label Smoothing | 0.1 | 0.1 | 0.1 |
Layer Number | 6 | 6 | 6 |
Head Number | 8 | 8 | 8 |
Learning Rate | 0.25 | 2 | 2 |
Update Interval (for DST) | 200 | 100 | 100 |
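
To make the interaction between the learning rate, warmup steps, and embedding dimension concrete, here is a minimal sketch of a Noam-style schedule. It assumes that the "inoam" entry in the table denotes the usual warmup followed by inverse-square-root decay, scaled by the model dimension; the source does not define the term, so this reading is an assumption. The function names `noam_lr` and `is_dst_update_step` are hypothetical, and the update rule for the DST interval (every `interval` steps) is likewise assumed.

```python
def noam_lr(step: int, base_lr: float, d_model: int, warmup_steps: int) -> float:
    """Noam-style schedule: linear warmup, then inverse-sqrt decay.

    lr = base_lr * d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    Assumption: this is what the table's "inoam" decay method refers to.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    scale = min(step ** -0.5, step * warmup_steps ** -1.5)
    return base_lr * d_model ** -0.5 * scale


def is_dst_update_step(step: int, interval: int) -> bool:
    """Assumed rule: revise the sparse topology every `interval` steps."""
    return step > 0 and step % interval == 0


if __name__ == "__main__":
    # WMT17 column: learning rate 2, embedding dimension 512, 8000 warmup steps.
    for s in (1, 4000, 8000, 40000, 80000):
        print(s, noam_lr(s, base_lr=2.0, d_model=512, warmup_steps=8000))
    # With an update interval of 100, step 300 would trigger a topology update.
    print(is_dst_update_step(300, 100))
```

Under this reading, the nominal learning rates of 0.25 and 2 are multipliers rather than the actual step sizes: the effective rate peaks at the end of warmup and is much smaller than the nominal value.
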

Method | Multi30k | IWSLT14 | WMT17 |
---|---|---|---|
FC | 31.51 | 24.11 | 25.20 |
CHTs | 29.97 | 21.98 | 22.43 |
RigL | 28.52 | 20.73 | 21.20 |
SET | 28.98 | 20.09 | 20.66 |
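
One way to read the results table is to express each sparse method's score as a percentage of the FC row on the same dataset. The sketch below does this arithmetic with the values copied from the table. It treats FC as the reference (presumably the fully connected baseline, which the table itself does not spell out), and the helper name `percent_of_fc` is hypothetical.

```python
# Scores copied from the results table, keyed by method then dataset.
SCORES = {
    "FC":   {"Multi30k": 31.51, "IWSLT14": 24.11, "WMT17": 25.20},
    "CHTs": {"Multi30k": 29.97, "IWSLT14": 21.98, "WMT17": 22.43},
    "RigL": {"Multi30k": 28.52, "IWSLT14": 20.73, "WMT17": 21.20},
    "SET":  {"Multi30k": 28.98, "IWSLT14": 20.09, "WMT17": 20.66},
}


def percent_of_fc(scores: dict) -> dict:
    """Return each non-FC method's score as a percentage of the FC score."""
    reference = scores["FC"]
    return {
        method: {ds: 100.0 * value / reference[ds] for ds, value in row.items()}
        for method, row in scores.items()
        if method != "FC"
    }


if __name__ == "__main__":
    for method, row in percent_of_fc(SCORES).items():
        cells = ", ".join(f"{ds}: {pct:.1f}%" for ds, pct in row.items())
        print(f"{method}: {cells}")
```

In this table CHTs has the highest score among the three non-FC rows on every dataset, while FC remains highest overall.
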
© 2024 MDPI (Basel, Switzerland) unless otherwise stated