Version 1
: Received: 29 June 2024 / Approved: 1 July 2024 / Online: 2 July 2024 (10:12:19 CEST)
How to cite:
Dou, R. Deception-Based Benchmarking: Measuring LLM Susceptibility to Induced Hallucination in Reasoning Tasks Using Misleading Prompts. Preprints 2024, 2024070120. https://doi.org/10.20944/preprints202407.0120.v1
APA Style
Dou, R. (2024). Deception-Based Benchmarking: Measuring LLM Susceptibility to Induced Hallucination in Reasoning Tasks Using Misleading Prompts. Preprints. https://doi.org/10.20944/preprints202407.0120.v1
Chicago/Turabian Style
Dou, R. 2024. "Deception-Based Benchmarking: Measuring LLM Susceptibility to Induced Hallucination in Reasoning Tasks Using Misleading Prompts." Preprints. https://doi.org/10.20944/preprints202407.0120.v1
Abstract
We present a novel benchmarking methodology for Large Language Models (LLMs) that evaluates their susceptibility to hallucination, and thus their reliability for higher-stakes real-world applications. This method, called Deception-Based Benchmarking, tests the model with a task that requires composing a short paragraph. The model first performs the task under standard conditions; it is then required to begin its paragraph with a misleading sentence. Based on these outputs, the model is assessed on three criteria: accuracy, susceptibility, and consistency. This approach can be integrated with existing benchmarks or applied to new ones, facilitating a comprehensive evaluation of models across multiple dimensions, and it encompasses various forms of hallucination. We applied this methodology to several small open-source models using a modified version of MMLU, DB-MMLU. Our findings indicate that most current models are not specifically designed to self-correct when the random sampling process leads them to produce inaccuracies. However, certain models, such as Solar-10.7B-Instruct, exhibit reduced vulnerability to hallucination, as reflected by their susceptibility and consistency scores, which are distinct from traditional benchmark scores. Our results align with TruthfulQA, a widely used hallucination benchmark. Looking forward, DB-benchmarking can be readily applied to other benchmarks to monitor the advancement of LLMs.
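The evaluation procedure described in the abstract can be sketched as a small harness: each item is posed once normally and once with a forced misleading opening, and the three metrics are computed from the paired outputs. The function and variable names below (`build_standard_prompt`, `susceptibility`, the prompt wording, and the equality-based grading) are illustrative assumptions, not the paper's exact prompts or scoring rules.

```python
def build_standard_prompt(question: str) -> str:
    """Plain condition: the model answers normally."""
    return f"Write a short paragraph answering: {question}"

def build_deceptive_prompt(question: str, misleading_sentence: str) -> str:
    """Deceptive condition: the model must open with a misleading claim."""
    return (f"Write a short paragraph answering: {question}\n"
            f"Begin your paragraph with exactly this sentence: {misleading_sentence}")

def evaluate(model, items):
    """Compute (accuracy, susceptibility, consistency) over the benchmark.

    model: callable mapping a prompt string to a graded answer label.
    items: list of (question, misleading_sentence, gold_label) tuples.
    """
    correct_std = followed_lie = consistent = 0
    for question, lie, gold in items:
        std = model(build_standard_prompt(question))
        dec = model(build_deceptive_prompt(question, lie))
        correct_std += (std == gold)   # accuracy: normal-condition correctness
        followed_lie += (dec != gold)  # susceptibility: the misleading lead won
        consistent += (std == dec)     # consistency: same conclusion either way
    n = len(items)
    return correct_std / n, followed_lie / n, consistent / n

# Toy usage with a stub "model" that is accurate normally but fully
# susceptible to the injected opening sentence.
def stub_model(prompt: str) -> str:
    return "wrong" if "Begin your paragraph" in prompt else "gold"

items = [("Q1", "lie 1", "gold"), ("Q2", "lie 2", "gold")]
acc, sus, con = evaluate(stub_model, items)
# acc=1.0, sus=1.0, con=0.0 for this fully-susceptible stub
```

In practice the free-text paragraphs would be graded to labels by a judge (human or model) before the equality checks; the harness itself only aggregates those labels into the three scores.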
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.