Version 1
: Received: 1 August 2024 / Approved: 1 August 2024 / Online: 2 August 2024 (09:05:38 CEST)
How to cite:
Tufail, H.; Naseer, A.; Tamoor, M.; Ali, A. R. Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks. Preprints2024, 2024080108. https://doi.org/10.20944/preprints202408.0108.v1
Tufail, H.; Naseer, A.; Tamoor, M.; Ali, A. R. Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks. Preprints 2024, 2024080108. https://doi.org/10.20944/preprints202408.0108.v1
Tufail, H.; Naseer, A.; Tamoor, M.; Ali, A. R. Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks. Preprints2024, 2024080108. https://doi.org/10.20944/preprints202408.0108.v1
APA Style
Tufail, H., Naseer, A., Tamoor, M., & Ali, A. R. (2024). Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks. Preprints. https://doi.org/10.20944/preprints202408.0108.v1
Chicago/Turabian Style
Tufail, H., Maria Tamoor and Abbas Raza Ali. 2024 "Advancements in Query-Based Tabular Data Retrieval: Detecting Image Data Tables and Extracting Text using Convolutional Neural Networks" Preprints. https://doi.org/10.20944/preprints202408.0108.v1
Abstract
The detection and recognition of tables in image-based documents is a challenging and essential task for automated data extraction and anal- ysis. This paper introduces a methodology that utilizes Convolutional Neural Networks (CNNs) to address this problem. By harnessing the ro- bust visual recognition abilities of CNNs, the proposed method effectively detects and extracts tables from image-based documents. Additionally, an Encoder-Dual Decoder (EDD) architecture is employed to extract the textual content from table cells, enabling the transformation of visual data into structured, machine-readable formats. This structured data is then further processed using a Vector Space Model (VSM)-based language modeling technique for query-based table retrieval. The methodology’s accuracy is assessed by analyzing the spatial relationships between the extracted cells and comparing them against a pre-defined table structure. The results provide an evidence to the effectiveness of the proposed ap- proach, with an impressive 85% accuracy rate and a corresponding error rate of 15%. This accurate extraction and recognition of tables from image-based documents can significantly enhance applications in data analysis, information retrieval, and knowledge extraction.
Keywords
Convolutional Neural Networks,Encoder- Dual Decoder,Extensible Markup Language, annotations, Deep Learning, Transfer Learning, Vector Space Model.
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.