Preprint Article Version 1 This version is not peer-reviewed

Navigating Complexity: A Tailored Question-Answering Approach for PDFs in Finance, Bio-Medicine, and Science

Version 1 : Received: 16 October 2024 / Approved: 17 October 2024 / Online: 17 October 2024 (17:04:22 CEST)

How to cite: Kumar, T.; Bhujbal, R.; Raj, K.; Roy, A. M. Navigating Complexity: A Tailored Question-Answering Approach for PDFs in Finance, Bio-Medicine, and Science. Preprints 2024, 2024101395. https://doi.org/10.20944/preprints202410.1395.v1

Abstract

Understanding complex Portable Document Format (PDF) files, such as research papers, clinical reports, and scientific manuals, is often time-consuming. Although question-answering (QA) systems that yield contextually relevant responses have progressed significantly, building a comprehensive end-to-end machine learning model that can answer intricate questions remains a formidable challenge. Such systems typically rely on substantial labeled data to train their underlying models for specific tasks, and assembling these datasets is particularly difficult for complex documents such as the annual reports of major technology companies. In this paper, we address this issue by developing a QA system designed specifically for PDF documents in the domains of finance, biomedicine, and scientific literature. We manually curated evaluation datasets from these domains and used pre-trained Bidirectional Encoder Representations from Transformers (BERT) models from the Hugging Face library. The models were evaluated using the F1 score, with the BERT Large model achieving 44%.
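The abstract reports an F1 score but does not state which variant was used. As an illustration only, the sketch below assumes the token-overlap F1 commonly used for extractive QA (SQuAD-style), where a predicted answer span is compared with a reference answer after light normalization. The function names `normalize_answer` and `token_f1` are hypothetical and are not taken from the paper.

```python
import re
import string
from collections import Counter
from typing import List


def normalize_answer(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: SQuAD-style scoring; the paper may compute F1 differently.
    """
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical prediction/reference pair, not data from the paper.
    print(token_f1("revenue grew", "net revenue grew 5%"))
```

Under this assumed definition, a dataset-level score would be the mean of `token_f1` over all question-answer pairs, so a reported 44% would correspond to an average per-question overlap of 0.44.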

Keywords

Question-Answering; Bidirectional Encoder Representations; Bio-medicine; Finance; Transformers

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
