Preprint Article Version 1 Preserved in Portico This version is not peer-reviewed

An Embarrassingly Simple Method to Compromise Language Models

*
Version 1 : Received: 28 June 2024 / Approved: 28 June 2024 / Online: 29 June 2024 (06:22:54 CEST)

How to cite: Wang, J. An Embarrassingly Simple Method to Compromise Language Models. Preprints 2024, 2024062045. https://doi.org/10.20944/preprints202406.2045.v1 Wang, J. An Embarrassingly Simple Method to Compromise Language Models. Preprints 2024, 2024062045. https://doi.org/10.20944/preprints202406.2045.v1

Abstract

Language models like BERT dominate current NLP research due to their robust performance, but they are vulnerable to backdoor attacks. Such attacks cause the model to consistently generate incorrect predictions when specific triggers are present in the input, while maintaining normal behavior on clean inputs. In this paper, we propose a straightforward data poisoning method targeting the BERT architecture. Our approach does not involve complex modifications to the model or its training process; instead, it relies solely on altering a small portion of the training data. By introducing simple perturbations into just 10% of the training dataset, we demonstrate the feasibility of injecting a backdoor into the model. Our experimental results show a high attack success rate, indicating that the model trained on the poisoned data can reliably associate the trigger with the attacker's desired outputs while its performance on clean data remains unaffected. This highlights the stealth and effectiveness of our method, emphasizing the need for improved defensive strategies to protect against such threats. Our study underscores the critical importance of ongoing research and development to safeguard AI systems from malicious exploitation, ensuring the security and reliability of NLP applications.

Keywords

transformer; attack

Subject

Computer Science and Mathematics, Computer Science

Comments (0)

We encourage comments and feedback from a broad range of readers. See criteria for comments and our Diversity statement.

Leave a public comment
Send a private comment to the author(s)
* All users must log in before leaving a comment
Views 0
Downloads 0
Comments 0
Metrics 0


×
Alerts
Notify me about updates to this article or when a peer-reviewed version is published.
We use cookies on our website to ensure you get the best experience.
Read more about our cookies here.