Preprint Article Version 1 This version is not peer-reviewed

A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon Enhanced BERT and Contrastive Learning

Version 1 : Received: 22 July 2024 / Approved: 23 July 2024 / Online: 23 July 2024 (16:05:26 CEST)

How to cite: Peng, C.; Wang, X.; Li, Q.; Yu, Q.; Jiang, R.; Ma, W.; Wu, W.; Meng, R.; Li, H.; Huai, H.; Wang, S.; He, L. A New Chinese Named Entity Recognition Method for Pig Disease Domain Based on Lexicon Enhanced BERT and Contrastive Learning. Preprints 2024, 2024071804. https://doi.org/10.20944/preprints202407.1804.v1

Abstract

Named Entity Recognition (NER) is a fundamental step in building knowledge-based support systems such as knowledge retrieval and question-answering systems. In the pig disease domain, Chinese NER models face several challenges: scarce annotated data, domain-specific vocabulary, diverse entity categories, and ambiguous entity boundaries. To address these challenges, we propose PDCNER, a Pig Disease Chinese Named Entity Recognition method based on lexicon-enhanced BERT and contrastive learning. First, we construct a domain-specific lexicon and pre-train word embeddings for the pig disease domain. Second, we integrate pig disease lexicon information into the lower layers of BERT through a Lexicon Adapter layer that operates on char-word pair sequences. Third, to enhance feature representation, we add a lexicon-enhanced contrastive loss layer on top of BERT. Finally, a Conditional Random Field (CRF) layer serves as the model's decoder. Experimental results show that the proposed model outperforms several mainstream models, achieving a precision of 87.76%, a recall of 86.97%, and an F1-score of 87.36%. With only 10% of the samples available, it outperforms BERT-BiLSTM-CRF and LEBERT by 14.05% and 6.8%, respectively, demonstrating its robustness under data scarcity. The model also generalizes to publicly available datasets. Our work provides reliable technical support for Chinese information extraction on pig diseases and can be readily extended to other domains. The code is open-sourced at https://github.com/tufeifei923/pdcner.
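
To make the notion of a char-word pair sequence concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each character of a sentence is paired with every lexicon word that covers it in that sentence, which is how lexicon-enhanced models of this kind are usually described. The function name `char_word_pairs`, the `max_word_len` parameter, and the three-word toy lexicon are hypothetical and chosen only for illustration.

```python
def char_word_pairs(sentence, lexicon, max_word_len=4):
    """Pair each character with the lexicon words that cover it.

    Assumption (not taken from the source): a word "covers" a character
    when the word occurs in the sentence as a contiguous substring that
    includes that character's position.
    """
    pairs = [(ch, []) for ch in sentence]
    n = len(sentence)
    for start in range(n):
        # Only multi-character candidates are looked up in the lexicon.
        for end in range(start + 2, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                # Attach the matched word to every character it spans.
                for i in range(start, end):
                    pairs[i][1].append(word)
    return pairs


# Hypothetical toy lexicon: "swine fever", "swine fever virus", "virus".
toy_lexicon = {"猪瘟", "猪瘟病毒", "病毒"}
pairs = char_word_pairs("猪瘟病毒", toy_lexicon)
for ch, words in pairs:
    print(ch, words)
```

In a full model, each matched word would be mapped to its pre-trained embedding and fused with the character representation inside the adapter layer; this sketch covers only the matching step.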

Keywords

pig disease; Chinese named entity recognition; lexicon enhanced BERT; contrastive learning; small sample

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
