Version 1
Received: 5 November 2024 / Approved: 6 November 2024 / Online: 7 November 2024 (10:09:31 CET)
How to cite:
Shen, Z.; Xiao, Z. A Chinese Short Text Similarity Method Integrating Sentence-level and Phrase-level Semantics. Preprints 2024, 2024110453. https://doi.org/10.20944/preprints202411.0453.v1
APA Style
Shen, Z., & Xiao, Z. (2024). A Chinese Short Text Similarity Method Integrating Sentence-level and Phrase-level Semantics. Preprints. https://doi.org/10.20944/preprints202411.0453.v1
Chicago/Turabian Style
Shen, Z., and Zhiyong Xiao. 2024. "A Chinese Short Text Similarity Method Integrating Sentence-level and Phrase-level Semantics." Preprints. https://doi.org/10.20944/preprints202411.0453.v1
Abstract
Short text similarity is a key research area in Natural Language Processing (NLP) and is widely applied in intelligent search, recommendation systems, and question-answering systems. Most existing short text similarity models focus on matching the overall semantics of entire sentences and often neglect the semantic relationships between the individual phrases within those sentences. This problem is especially pronounced in Chinese, where synonyms and near-synonyms can substantially interfere with similarity computation. In this paper, we propose a short text similarity method that integrates sentence-level and phrase-level semantics. Using vector representations of Chinese words and phrases as external knowledge, our approach combines global sentence features with local phrase features to compute short text similarity from multiple perspectives, ranging from the global to the local level. Experimental results show that the proposed model outperforms previous approaches on Chinese short text similarity tasks. Specifically, it achieves an accuracy of 90.16% on the LCQMC dataset, an improvement of 2.23 percentage points over ERNIE and 1.46 percentage points over the previous best-performing model, Glyce + BERT.
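As a rough illustration of what combining a global sentence-level view with a local phrase-level view can look like, the sketch below scores a sentence pair in two ways and mixes the results. The global score is the cosine similarity of the mean phrase vectors, and the local score aligns each phrase with its most similar phrase in the other sentence, in both directions. This is not the authors' model, which the keywords indicate is BERT-based. The function names, the mean-pooling and max-alignment choices, the weight `alpha`, and the toy vectors are all hypothetical assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Global (sentence-level) view: average of the phrase vectors."""
    if not vectors:
        raise ValueError("sentence must contain at least one phrase")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def phrase_alignment(a: List[Vector], b: List[Vector]) -> float:
    """Local (phrase-level) view: match each phrase to its best counterpart
    in the other sentence, average the best scores, and symmetrize."""
    a_to_b = sum(max(cosine(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(cosine(y, x) for x in a) for y in b) / len(b)
    return (a_to_b + b_to_a) / 2


def fused_similarity(phrases_a: List[str], phrases_b: List[str],
                     vectors: Dict[str, Vector], alpha: float = 0.5) -> float:
    """Hypothetical fusion: alpha * global score + (1 - alpha) * local score."""
    va = [vectors[p] for p in phrases_a]
    vb = [vectors[p] for p in phrases_b]
    global_score = cosine(mean_vector(va), mean_vector(vb))
    local_score = phrase_alignment(va, vb)
    return alpha * global_score + (1 - alpha) * local_score


# Toy 3-dimensional phrase vectors (hypothetical, not real embeddings).
# 如何/怎么 ("how") and 办理/申请 ("apply for") are near-synonym pairs.
TOY_VECTORS: Dict[str, Vector] = {
    "如何": [1.0, 0.0, 0.0],
    "怎么": [0.9, 0.1, 0.0],
    "办理": [0.0, 1.0, 0.0],
    "申请": [0.1, 0.9, 0.0],
    "信用卡": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    s1 = ["如何", "办理", "信用卡"]  # "how to get a credit card"
    s2 = ["怎么", "申请", "信用卡"]  # same question, phrased with near-synonyms
    print(round(fused_similarity(s1, s2, TOY_VECTORS), 4))
```

In this toy example the two questions share only one phrase, yet the phrase-level alignment still scores close to 1 because the near-synonym vectors are close. That is the kind of synonym interference a purely surface-level match would miss.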
Keywords
Short text similarity; Chinese sentence pair classification; BERT; external knowledge integration
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.