This version is not peer-reviewed
Submitted: 20 November 2024
Posted: 21 November 2024
Increasing wheat production is crucial for addressing food security concerns caused by limited resources, extreme weather, and population growth. However, breeders face challenges because the relevant information is fragmented across many research articles, which slows progress in developing high-yield, stress-resistant, and high-quality wheat. This study presents WGIE (Wheat Germplasm Information Extraction), a data extraction workflow tailored to the abstracts of wheat research articles and based on conversational large language models (LLMs) and prompt engineering. WGIE employs zero-shot learning, multi-response polling to reduce hallucinations, and a calibration component to improve the reliability of its output. Validation on 443 abstracts yielded a Precision of 0.8010, a Recall of 0.9969, an F1 Score of 0.8883, and an Accuracy of 0.8171, demonstrating that the workflow can extract data with little human effort. Analysis found that irrelevant text increases the chance of hallucinations, which underlines the need to match prompts to the input text. While WGIE extracts wheat germplasm information efficiently, its effectiveness depends on the consistency between prompts and text. Managing conflicts and improving prompt design can further improve LLM performance in downstream tasks.
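The abstract does not specify how multi-response polling is implemented. As a minimal sketch, assuming that polling means querying the model several times and keeping the answer that a majority of responses agree on, the idea could look like the following. The names `poll_responses` and `min_agreement` are hypothetical and are not taken from WGIE. The `f1_score` helper is the standard harmonic mean of precision and recall, included only to show that the reported F1 Score follows from the reported Precision and Recall.

```python
from collections import Counter
from typing import Optional, Sequence


def poll_responses(responses: Sequence[str], min_agreement: float = 0.5) -> Optional[str]:
    """Hypothetical majority vote over repeated LLM answers to one prompt.

    Returns the most common normalized answer, or None when no answer
    exceeds the required share of non-empty responses.
    """
    # Normalize whitespace and case so trivially different answers match.
    normalized = [r.strip().lower() for r in responses if r.strip()]
    if not normalized:
        return None
    answer, count = Counter(normalized).most_common(1)[0]
    # Abstain when agreement is too low (an assumed way to limit hallucinations).
    return answer if count / len(normalized) > min_agreement else None


def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    votes = ["Drought tolerance", "drought tolerance ", "heat tolerance"]
    print(poll_responses(votes))
    print(round(f1_score(0.8010, 0.9969), 4))
```

Abstaining when no answer reaches a majority is one possible design choice; a real pipeline might instead re-query the model or pass the disagreement to a later calibration step.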
© 2024 MDPI (Basel, Switzerland) unless otherwise stated