Preprint Article · Version 1 · This version is not peer-reviewed

Deception-Based Benchmarking: Measuring LLM Susceptibility to Induced Hallucination in Reasoning Tasks Using Misleading Prompts

Version 1 : Received: 29 June 2024 / Approved: 1 July 2024 / Online: 2 July 2024 (10:12:19 CEST)

How to cite: Dou, R. Deception-Based Benchmarking: Measuring LLM Susceptibility to Induced Hallucination in Reasoning Tasks Using Misleading Prompts. Preprints 2024, 2024070120. https://doi.org/10.20944/preprints202407.0120.v1

Abstract

We present a novel benchmarking methodology for Large Language Models (LLMs) that evaluates their susceptibility to hallucination, and thus their reliability for real-world applications that carry greater responsibility. The method, called Deception-Based Benchmarking, tests the model on a task that requires composing a short paragraph. The model first completes the task under standard conditions; it is then required to begin its answer with a misleading sentence. Based on these outputs, the model is assessed on three criteria: accuracy, susceptibility, and consistency. This approach can be integrated with existing benchmarks or applied to new ones, facilitating a comprehensive evaluation of models across multiple dimensions, and it captures multiple forms of hallucination. We applied this methodology to several small open-source models using a modified version of MMLU, DB-MMLU. Our findings indicate that most current models are not specifically designed to self-correct when random sampling leads them to produce inaccuracies. However, certain models, such as Solar-10.7B-Instruct, exhibit a reduced vulnerability to hallucination, as reflected by their susceptibility and consistency scores. These metrics are distinct from traditional benchmark scores, and our results align with those from TruthfulQA, a widely used hallucination benchmark. Looking forward, DB-benchmarking can be readily applied to other benchmarks to track the progress of LLMs.
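For concreteness, the evaluation loop described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the prompt wording, the `model` and `judge` callables, and the metric formulas (susceptibility taken as the share of baseline-correct answers lost under deception, consistency as the score obtained despite the misleading opening) are assumptions made for this example.

```python
# Minimal sketch of a deception-based evaluation loop over MMLU-style items.
# All names and metric definitions here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Item:
    question: str
    correct_answer: str
    misleading_sentence: str   # opening sentence asserting a wrong answer

def db_benchmark(model: Callable[[str], str],
                 items: list[Item],
                 judge: Callable[[str, Item], bool]) -> dict[str, float]:
    baseline_ok, misled_ok, flipped = 0, 0, 0
    for item in items:
        # Standard condition: the model answers freely.
        standard = model(item.question)
        # Deception condition: the model must begin with the misleading sentence.
        misled = model(
            f'{item.question}\nBegin your answer with: "{item.misleading_sentence}"')
        ok_std = judge(standard, item)
        ok_mis = judge(misled, item)
        baseline_ok += ok_std
        misled_ok += ok_mis
        flipped += ok_std and not ok_mis   # correct normally, derailed by the lie
    n = len(items)
    return {
        "accuracy": baseline_ok / n,                      # standard-condition score
        "susceptibility": flipped / max(baseline_ok, 1),  # correct answers lost under deception
        "consistency": misled_ok / n,                     # still correct despite the false start
    }
```

Here, `model` could wrap any completion or chat API and `judge` could be an exact-match check against the correct choice; both are hypothetical stand-ins used only to make the loop concrete.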

Keywords

LLM, NLP, Hallucination, AI assistant reliability, Benchmarking, Deception-based benchmarking, MMLU, DB-MMLU, TruthfulQA, Accuracy, Susceptibility, Consistency

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
