Current address: 875 Perimeter Drive, Moscow, Idaho 83844, USA
Version 1
Received: 14 October 2024 / Approved: 14 October 2024 / Online: 15 October 2024 (02:38:29 CEST)
How to cite:
Jamil, H. M. Ad hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa. Preprints 2024, 2024101061. https://doi.org/10.20944/preprints202410.1061.v1
APA Style
Jamil, H. M. (2024). Ad hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa. Preprints. https://doi.org/10.20944/preprints202410.1061.v1
Chicago/Turabian Style
Jamil, H. M. 2024. "Ad hoc Data Foraging in a Life Sciences Community Ecosystem Using SoDa." Preprints. https://doi.org/10.20944/preprints202410.1061.v1
Abstract
Biologists often set out to find relevant data in an ever-changing landscape of interesting databases. Although leading journals publish descriptions of databases, these lists are usually not current and are not updated frequently enough to discard defunct or poor-quality databases. These indices usually include databases whose authors proactively requested their inclusion. The challenge for individual biologists, then, is to discover, explore, and select databases of interest from a large unorganized collection and to use them effectively in analysis without too large an investment. The FAIR data principles, advocated to increase usability by improving searching, finding, accessing, and interoperating among these diverse information sources, are proving to be a difficult proposition; consequently, a large number of data sources are not FAIR compliant. Since linked open data does not guarantee FAIRness, biologists are now left on a solo hunt for information on the open network. In this paper, we propose SoDa for intelligent data foraging on the internet by biologists. SoDa helps biologists discover resources based on analysis requirements, generate resource access plans, and store cleaned data and knowledge for community use. A secondary search index is also supported so that community members can conveniently find archived information.
Keywords
Large language model; intelligent user interface; FAIR; wrapper generation; interoperability; ecosystem
Subject
Computer Science and Mathematics, Information Systems
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.