Version 1
: Received: 12 September 2024 / Approved: 13 September 2024 / Online: 13 September 2024 (15:46:19 CEST)
How to cite:
Zhang, Y.; Gao, Y.; Yang, L. CS-Eval—A Concise Benchmark for Evaluating the Security Risks of Large Language Models. Preprints 2024, 2024091098. https://doi.org/10.20944/preprints202409.1098.v1
APA Style
Zhang, Y., Gao, Y., & Yang, L. (2024). CS-Eval—A Concise Benchmark for Evaluating the Security Risks of Large Language Models. Preprints. https://doi.org/10.20944/preprints202409.1098.v1
Chicago/Turabian Style
Zhang, Y., Yongbing Gao, and Lidong Yang. 2024. "CS-Eval—A Concise Benchmark for Evaluating the Security Risks of Large Language Models." Preprints. https://doi.org/10.20944/preprints202409.1098.v1
Abstract
Large language models (LLMs) are essential to the field of natural language processing, and as their applications expand, security risks have become increasingly prominent. This paper introduces CS-Eval, a novel benchmark for evaluating LLM security, designed to assess how effectively models address vulnerabilities. CS-Eval targets seven key security risks—ethical dilemmas, marginal topics, error detection, detailed event handling, cognitive bias, logical reasoning, and privacy identification—and establishes a Multi-Security Hazard Dataset (MSHD). The evaluated models include GPT-4o, Llama-3-70B, Claude-3-Opus, ERNIE-4.0, Abab-6.5, Qwen1.5-110B, Gemini-1.5-Pro, Doubao-Pro, SenseChat-V5, and GLM-4. We analyzed each model's performance against these security risks and provided recommendations for improvement. Experimental results demonstrate varying levels of effectiveness across models, with GPT-4o exhibiting the best overall performance. Moreover, the relationship between security enhancement and model capability is nonlinear, indicating that improving safety requires a multifaceted approach that considers various factors in both development and application.
Keywords
Large language models (LLMs); security risks; security benchmark; Multi-Security Hazard Dataset (MSHD)
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.