Version 1: Received: 11 July 2024 / Approved: 11 July 2024 / Online: 12 July 2024 (02:58:16 CEST)
How to cite:
Garcia-Carmona, A. M.; Prieto, M.-L.; Puertas, E.; Beunza, J.-J. Enhanced Medical Data Extraction: Leveraging LLMs for Accurate Retrieval of Patient Information from Medical Reports. Preprints 2024, 2024070986. https://doi.org/10.20944/preprints202407.0986.v1
APA Style
Garcia-Carmona, A. M., Prieto, M. L., Puertas, E., & Beunza, J. J. (2024). Enhanced Medical Data Extraction: Leveraging LLMs for Accurate Retrieval of Patient Information from Medical Reports. Preprints. https://doi.org/10.20944/preprints202407.0986.v1
Chicago/Turabian Style
Garcia-Carmona, A. M., M.-L. Prieto, Enrique Puertas, and Juan-Jose Beunza. 2024. "Enhanced Medical Data Extraction: Leveraging LLMs for Accurate Retrieval of Patient Information from Medical Reports." Preprints. https://doi.org/10.20944/preprints202407.0986.v1
Abstract
This study presents a strategic approach to developing applications that implement Large Language Models using the Langchain framework in Python. Three language models are highlighted: GPT-3.5 (turbo mode), LLaMA 2, and Vicuna 7B, each with its distinctive features and capabilities. The methodology is described in detail, including data extraction from medical reports using zero-shot prompting, interaction with the language models, and structured storage of the results. The performance of the models in data extraction is evaluated using precision, recall, and F1 score. The results demonstrate high model capability in extracting information, although areas for improvement are identified, particularly in extraction precision. In conclusion, the models are found to be effective at extracting information from medical histories, and future research in healthcare digitalization should focus on improving precision and increasing the volume of training data.
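As a point of reference for the evaluation metrics named above, the following is a minimal sketch of how precision, recall, and F1 score could be computed for one report by comparing extracted fields against a manually annotated reference. It is not the authors' evaluation code: the function name `score_extraction`, the field names, and the convention that a wrong value counts as both a false positive and a false negative are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def score_extraction(
    predicted: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one report.

    Hypothetical convention: a field whose value is None was not
    extracted (or is absent from the reference annotation).
    """
    tp = fp = fn = 0
    for field in set(predicted) | set(gold):
        pred = predicted.get(field)
        true = gold.get(field)
        if pred is not None and pred == true:
            tp += 1  # extracted value matches the reference
        else:
            if pred is not None:
                fp += 1  # a value was extracted but it is wrong or spurious
            if true is not None:
                fn += 1  # the reference value was not recovered

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Illustrative, made-up values (not data from the study).
gold = {"age": "54", "sex": "F", "diagnosis": "asthma"}
predicted = {"age": "54", "sex": "M", "diagnosis": None}
precision, recall, f1 = score_extraction(predicted, gold)
```

In this example one field is correct, one is wrong, and one is missing, so precision is 1/2, recall is 1/3, and F1 is 0.4. Exact string matching is the simplest possible comparison; a real evaluation would likely need normalization of the extracted values first.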
Keywords
Large Language Models; Langchain framework; Electronic Health Records; Data mining; Model Evaluation; Healthcare; Digitalization
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.