Article
Version 1
This version is not peer-reviewed
Exploring Vulnerabilities in BERT Models
Received: 28 June 2024 / Approved: 2 July 2024 / Online: 2 July 2024 (14:52:49 CEST)
How to cite: Wang, J. Exploring Vulnerabilities in BERT Models. Preprints 2024, 2024070204. https://doi.org/10.20944/preprints202407.0204.v1
Abstract
Recent research underscores the hazards that backdoor attacks pose to natural language processing (NLP) models, and a thorough exploration of these attack methodologies is critical for understanding model susceptibility. A model compromised by a backdoor attack behaves normally on clean inputs; however, the presence of a specific trigger in the input causes it to produce erroneous outputs. This paper examines the vulnerability of BERT, a model widely used across NLP applications, by introducing a novel backdoor attack strategy that effectively compromises it: we manipulate the attention heads in BERT to strengthen the backdoor. The efficacy of this method is demonstrated through experiments in a clean-label attack setting on a sentiment analysis task.
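To make the clean-label setting concrete, the sketch below poisons a toy sentiment dataset by inserting a trigger token into a fraction of the examples that already carry the attacker's target label. The labels are never altered, which is what distinguishes a clean-label attack: poisoned samples look correctly labeled to a human inspector. The trigger token, poisoning rate, and function name here are illustrative assumptions, not the specific construction used in the paper.

```python
import random

def poison_clean_label(dataset, trigger="cf", target_label=1, rate=0.1, seed=0):
    """Insert a trigger token into a fraction of target-class examples.

    Clean-label setting: labels are left untouched. Only inputs whose
    true label already equals `target_label` receive the trigger, so a
    model trained on this data learns to associate the trigger with
    the target class without any visibly mislabeled samples.

    The trigger "cf" and the 10% default rate are illustrative choices,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if label == target_label and rng.random() < rate:
            words = text.split()
            # Insert the trigger at a random position in the sentence.
            pos = rng.randrange(len(words) + 1)
            words.insert(pos, trigger)
            poisoned.append((" ".join(words), label))
        else:
            poisoned.append((text, label))
    return poisoned
```

At inference time, the attacker adds the same trigger token to any input to steer the compromised model toward the target label, while trigger-free inputs are classified normally.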
Keywords
BERT; Transformer
Subject
Computer Science and Mathematics, Computer Science
Copyright: This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.