Note 1: Pre-trained embeddings link: https://ai.tencent.com/ailab/nlp/en/data/tencent-ailab-embedding-en-d200-v0.1.0-s.tar.gz
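As an illustration only, the sketch below shows one way such pre-trained word vectors could be loaded into a PyTorch embedding layer. It assumes the downloaded archive unpacks to a word2vec-format text file; the file name used here is illustrative and not taken from the paper, and the sketch is not the authors' data pipeline.

```python
# Minimal sketch: load pre-trained word vectors into a PyTorch embedding layer.
# Assumes the archive unpacks to a word2vec-format text file; the file name
# "tencent-ailab-embedding-d200.txt" is a placeholder, not confirmed by the paper.
import numpy as np
import torch
import torch.nn as nn
from gensim.models import KeyedVectors

EMBED_DIM = 200  # matches the embedding size d = 200 listed in the hyperparameter table

# Load word vectors (word2vec text format).
kv = KeyedVectors.load_word2vec_format("tencent-ailab-embedding-d200.txt", binary=False)

# Build a vocabulary-aligned weight matrix with extra rows for padding/unknown tokens.
vocab = ["<pad>", "<unk>"] + list(kv.index_to_key)
weights = np.zeros((len(vocab), EMBED_DIM), dtype=np.float32)
weights[2:] = kv.vectors  # rows 0 and 1 stay zero for <pad>/<unk>

embedding = nn.Embedding.from_pretrained(
    torch.from_numpy(weights), freeze=False, padding_idx=0
)
```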
Configuration Item | Value |
---|---|
Framework | PyTorch |
Python Version | 3.8.5 |
PyTorch Version | 1.12.1+cu113 |
Transformers Version | 4.33.2 |
Experiment Device | Precision 3930 Rack Workstation |
CPU | 3.3 GHz Intel(R) Xeon(R) E-2124 |
GPU | NVIDIA RTX A4000 × 2 |
Memory | 64 GB |
Operating System | 64-bit Windows 10 Professional Workstation |
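Before attempting to reproduce the experiments, a quick version check along the following lines can confirm a comparable environment. This is a convenience sketch, not part of the original setup.

```python
# Minimal sketch: verify the software stack against the configuration table above.
import platform
import torch
import transformers

print("Python:", platform.python_version())        # expected 3.8.5
print("PyTorch:", torch.__version__)               # expected 1.12.1+cu113
print("Transformers:", transformers.__version__)   # expected 4.33.2
print("CUDA available:", torch.cuda.is_available())
print("GPUs:", torch.cuda.device_count())          # expected 2 (NVIDIA RTX A4000 x 2)
```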
Parameter | Value | Description |
---|---|---|
Attention Heads | 4 | Number of attention heads: This refers to the number of heads in multi-head self-attention mechanisms, allowing the model to focus on different parts of input sequences at different positions, enhancing its ability to capture contextual information. |
Batch Size | 32 | Batch size: The number of data samples used in each iteration to update the model weights. Larger batch sizes can accelerate training but may require more memory. |
Dropout Ratio | 0.2 | Dropout rate: The probability of randomly dropping neurons during training, used to prevent overfitting and enhance model generalization capability. |
Embedding Size d | 200 | Embedding dimension d: Dimensionality of embedding vectors used to represent input data, mapping discrete features into continuous vector spaces, commonly employed in natural language processing and recommendation systems. |
Hidden State Size | 32 | Hidden state dimension: Dimension of hidden states in Transformer models, representing abstract representations of inputs at each position, influencing model representational capacity. |
Learning Rate | 5e-5 | Learning rate: Controls the rate at which model parameters are updated during training, crucial for balancing training speed and performance, often requiring adjustment. |
Max Seq Length | 128 | Maximum sequence length: Maximum allowable length of input sequences. Input sequences longer than this may need to be truncated or padded to fit the model. |
Transformer Encoder Layers | 6 | Transformer encoder layers: Number of encoder layers in the Transformer model, determining model complexity and hierarchical performance. |
Weight Decay | 0.01 | Weight decay: Regularization technique to mitigate model overfitting and improve generalization by penalizing large weights. |
Warmup Ratio | 0.1 | Warmup ratio: Proportion of total training epochs dedicated to gradually increasing the learning rate, typically used to stabilize training in the initial phase. |
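To show how these hyperparameters fit together, the sketch below wires them into a generic PyTorch encoder stack with an AdamW optimizer and a linear warmup schedule from the Transformers library. It is a minimal sketch, not the paper's ASSPM implementation: the 32-dimensional hidden state is interpreted here as the feed-forward width, and the total step count is a placeholder.

```python
# Minimal sketch: a generic encoder and training schedule built from the
# hyperparameters listed above. Not the authors' ASSPM model; the use of the
# 32-dim hidden size as dim_feedforward and the step count are assumptions.
import torch
import torch.nn as nn
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

EMBED_DIM = 200          # Embedding Size d
HIDDEN_SIZE = 32         # Hidden State Size (interpreted as feed-forward width)
NUM_HEADS = 4            # Attention Heads (200 / 4 = 50 dims per head)
NUM_LAYERS = 6           # Transformer Encoder Layers
DROPOUT = 0.2            # Dropout Ratio
MAX_SEQ_LEN = 128        # Max Seq Length
BATCH_SIZE = 32          # Batch Size
LEARNING_RATE = 5e-5     # Learning Rate
WEIGHT_DECAY = 0.01      # Weight Decay
WARMUP_RATIO = 0.1       # Warmup Ratio

# Encoder stack.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=EMBED_DIM,
    nhead=NUM_HEADS,
    dim_feedforward=HIDDEN_SIZE,
    dropout=DROPOUT,
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

# Optimizer and linear warmup/decay schedule.
num_training_steps = 10_000  # placeholder: epochs * steps_per_epoch in practice
optimizer = AdamW(encoder.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(WARMUP_RATIO * num_training_steps),
    num_training_steps=num_training_steps,
)

# Shape check on a dummy batch of token embeddings: (32, 128, 200) in, same shape out.
dummy = torch.zeros(BATCH_SIZE, MAX_SEQ_LEN, EMBED_DIM)
out = encoder(dummy)
```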
Method | CADD P | CADD R | CADD F | DOOR P | DOOR R | DOOR F |
---|---|---|---|---|---|---|
Lattice-LSTM [14] (2018) | 89.47 | 89.11 | 89.29 | 90.55 | 90.19 | 90.37 |
SoftLexicon-BERT [15] (2020) | 90.65 | 90.47 | 90.56 | 91.88 | 91.72 | 91.80 |
ALBERT-BiGRU-CRF [26] (2020) | 90.63 | 90.55 | 90.59 | 92.07 | 91.80 | 91.93 |
BiGRU-CRF [27] (2021) | 92.46 | 92.68 | 92.57 | 93.84 | 93.78 | 93.81 |
LEBERT [20] (2021) | 93.25 | 92.43 | 92.84 | 94.08 | 93.63 | 93.85 |
GeoBERT [28] (2021) | 94.46 | 91.29 | 92.85 | 95.43 | 92.45 | 93.92 |
BERT-BiGRU-CRF [29] (2022) | 94.90 | 91.79 | 93.32 | 95.28 | 93.56 | 94.41 |
BABERT [4] (2022) | 94.53 | 91.67 | 93.08 | 94.40 | 93.58 | 93.99 |
RoBERTa-BiLSTM-CRF [30] (2022) | 93.57 | 92.65 | 93.11 | 94.50 | 94.13 | 94.31 |
MGeo [5] (2023) | 93.52 | 92.82 | 93.17 | 94.48 | 94.28 | 94.38 |
ASSPM (Ours) | 94.33 | 93.65 | 93.99 | 95.52 | 94.31 | 94.91 |
Ablation method | CADD P | CADD R | CADD F | DOOR P | DOOR R | DOOR F |
---|---|---|---|---|---|---|
Baseline (ASSPM) | 94.33 | 93.65 | 93.99 | 95.52 | 94.31 | 94.91 |
A-ESL w/o EBMSoftLexicon | 1.17↓ | 1.53↓ | 1.35↓ | 1.16↓ | 2.05↓ | 1.61↓ |
A-SSLEBERT w/o SSLEBERT | 0.87↓ | 1.23↓ | 1.05↓ | 0.87↓ | 1.77↓ | 1.33↓ |
A-S w/o S | 0.77↓ | 1.13↓ | 0.95↓ | 0.71↓ | 1.61↓ | 1.17↓ |
A-BiGRU w/o BiGRU | 0.63↓ | 0.99↓ | 0.81↓ | 0.55↓ | 1.45↓ | 1.01↓ |