Version 1
Received: 28 August 2024 / Approved: 28 August 2024 / Online: 29 August 2024 (03:05:04 CEST)
How to cite:
Soni, A. Enhancing Multilingual Table-to-Text Generation with QA Blueprints: Overcoming Challenges in Low-Resource Languages. Preprints 2024, 2024082032. https://doi.org/10.20944/preprints202408.2032.v1
APA Style
Soni, A. (2024). Enhancing Multilingual Table-to-Text Generation with QA Blueprints: Overcoming Challenges in Low-Resource Languages. Preprints. https://doi.org/10.20944/preprints202408.2032.v1
Chicago/Turabian Style
Soni, A. 2024. "Enhancing Multilingual Table-to-Text Generation with QA Blueprints: Overcoming Challenges in Low-Resource Languages." Preprints. https://doi.org/10.20944/preprints202408.2032.v1
Abstract
Limited training data in low-resource languages is a barrier to Natural Language Processing (NLP). Even so, languages with thousands of speakers worldwide need improved NLP capabilities. The Table-to-Text task, which typically involves generating natural language descriptions from data tables, tests a model's reasoning abilities and is especially difficult in multilingual settings. System outputs frequently lack attribution to their underlying data. Intermediate planning strategies, such as Question-Answer (QA) blueprints, improve summarization tasks by presenting relevant QA pairs prior to verbalization. This study therefore analyses whether QA blueprints improve the attribution of multilingual Table-to-Text output. An extended multilingual dataset, which includes African languages, contains QA blueprints that were generated and filtered heuristically. This dataset is used to fine-tune sequence-to-sequence transformer models, both with and without blueprints. Two configurations are evaluated: English reference blueprints with target-language verbalization, and modified blueprints. The results show that English-only models benefit from blueprints, whereas multilingual models do not. Errors in machine-translated blueprints pose challenges, as does the way models rely on the blueprints they generate.
Keywords
African; English; Natural Language Processing; Question-Answer
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.