Preprint Article, Version 1. This version is not peer-reviewed.

Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks

Version 1 : Received: 1 August 2024 / Approved: 1 August 2024 / Online: 2 August 2024 (09:05:38 CEST)

How to cite: Tufail, H.; Naseer, A.; Tamoor, M.; Ali, A. R. Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks. Preprints 2024, 2024080108. https://doi.org/10.20944/preprints202408.0108.v1

Abstract

The detection and recognition of tables in image-based documents is a challenging and essential task for automated data extraction and analysis. This paper introduces a methodology that uses Convolutional Neural Networks (CNNs) to address this problem. By drawing on the robust visual recognition abilities of CNNs, the proposed method detects and extracts tables from image-based documents. An Encoder-Dual Decoder (EDD) architecture is then employed to extract the textual content of table cells, transforming visual data into structured, machine-readable formats. This structured data is further processed with a Vector Space Model (VSM)-based language modeling technique for query-based table retrieval. The methodology's accuracy is assessed by analyzing the spatial relationships between the extracted cells and comparing them against a pre-defined table structure. The results support the effectiveness of the proposed approach, which achieves an accuracy of 85% (an error rate of 15%). Accurate extraction and recognition of tables from image-based documents can significantly enhance applications in data analysis, information retrieval, and knowledge extraction.
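The query-based retrieval step mentioned in the abstract can be illustrated with a minimal Vector Space Model sketch. This is not the authors' implementation: the abstract does not specify the term weighting or similarity measure, so the code below assumes smoothed TF-IDF weights and cosine similarity, and the names `tokenize`, `tfidf_vectors`, `cosine`, and `rank_tables` are hypothetical. Each table is assumed to have already been flattened into a plain text string by the extraction stage.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of table texts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of tables containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumption: smoothed IDF so that no weight is zero or undefined.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_tables(query, tables):
    """Return (score, table_index) pairs, best match first."""
    vectors, idf = tfidf_vectors(tables)
    # Query terms that occur in no table are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    # Hypothetical table texts standing in for extracted cell content.
    tables = [
        "country population capital",
        "product price quantity",
        "student grade course",
    ]
    for score, idx in rank_tables("product price", tables):
        print(idx, round(score, 3))
```

In this sketch, tables that share no terms with the query receive a score of zero, and ties are broken by the original table order.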

Keywords

Convolutional Neural Networks, Encoder-Dual Decoder, Extensible Markup Language, annotations, Deep Learning, Transfer Learning, Vector Space Model

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
