Version 1: Received: 6 November 2024 / Approved: 7 November 2024 / Online: 7 November 2024 (08:48:07 CET)
How to cite:
Cadena-Bautista, Á.; López-Ponce, F.; Ojeda-Trueba, S.; Sierra, G.; Bel-Enguix, G. Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?. Preprints 2024, 2024110502. https://doi.org/10.20944/preprints202411.0502.v1
APA Style
Cadena-Bautista, Á., López-Ponce, F., Ojeda-Trueba, S., Sierra, G., & Bel-Enguix, G. (2024). Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?. Preprints. https://doi.org/10.20944/preprints202411.0502.v1
Chicago/Turabian Style
Cadena-Bautista, Á., F. López-Ponce, S. Ojeda-Trueba, Gerardo Sierra and Gemma Bel-Enguix. 2024. "Exploring the Behavior and Performance of Large Language Models: Can LLMs Infer Answers to Questions Involving Restricted Information?" Preprints. https://doi.org/10.20944/preprints202411.0502.v1
Abstract
In this paper, various LLMs are tested in a specific domain using a Retrieval-Augmented Generation (RAG) system. The study focuses on the performance and behaviour of the models and was conducted in Spanish. A questionnaire based on The Bible, which consists of questions that vary in reasoning complexity, was created in order to evaluate the reasoning capabilities of each model. The RAG system matches a question with the most similar passage from The Bible and feeds the pair to each LLM. The evaluation aims to determine whether each model can reason solely with the provided information or whether it disregards the instructions given and makes use of its pretrained knowledge.
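To make the retrieve-then-prompt step described above concrete, the following is a minimal sketch of how a question can be paired with its most similar passage and wrapped in a restricted-information instruction. The retriever shown here (TF-IDF with cosine similarity), the placeholder passages, and the prompt wording are assumptions for illustration only; the paper does not specify the retriever or prompt used.

```python
# Minimal sketch of the retrieve-then-prompt step described in the abstract.
# TF-IDF retrieval and the Spanish prompt wording are illustrative assumptions,
# not the authors' actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Placeholder passages; the real system indexes the full Bible corpus.
passages = [
    "En el principio creó Dios los cielos y la tierra.",
    "Y dijo Dios: Sea la luz; y fue la luz.",
]

question = "¿Qué creó Dios en el principio?"

# Represent passages and the question in the same vector space.
vectorizer = TfidfVectorizer()
passage_vecs = vectorizer.fit_transform(passages)
question_vec = vectorizer.transform([question])

# Select the most similar passage for this question.
best = cosine_similarity(question_vec, passage_vecs).argmax()

# Pair the retrieved passage with the question and instruct the model
# to answer using only the provided context (the restricted-information setup).
prompt = (
    f"Contexto: {passages[best]}\n"
    f"Pregunta: {question}\n"
    "Responde usando únicamente la información del contexto."
)
print(prompt)  # this prompt would then be sent to each LLM under evaluation
```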
Keywords
RAG; Large Language Models; Information Retrieval; Bible corpus
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.