Version 1
Received: 11 June 2024 / Approved: 12 June 2024 / Online: 13 June 2024 (09:42:13 CEST)
How to cite:
Athukoralage, D.; Atapattu, T.; Thilakaratne, M.; Falkner, K. Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models. Preprints 2024, 2024060860. https://doi.org/10.20944/preprints202406.0860.v1
APA Style
Athukoralage, D., Atapattu, T., Thilakaratne, M., & Falkner, K. (2024). Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models. Preprints. https://doi.org/10.20944/preprints202406.0860.v1
Chicago/Turabian Style
Athukoralage, D., T. Atapattu, Menasha Thilakaratne, and Katrina Falkner. 2024. "Tweets Classification for Digital Epidemiology of Childhood Health Outcomes Using Pre-Trained Language Models." Preprints. https://doi.org/10.20944/preprints202406.0860.v1
Abstract
This paper presents our approaches to SMM4H’24 Shared Task 5, the binary classification of English tweets that report children’s medical disorders. Our first approach fine-tunes a single RoBERTa-large model; the second ensembles the predictions of three fine-tuned BERTweet-large models. Although both approaches perform identically on the validation data, the BERTweet-large ensemble performs better on the test data. Our best-performing system achieves an F1-score of 0.938 on the test data, outperforming the benchmark classifier by 1.18%.
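The abstract does not say how the three BERTweet-large outputs are combined. The sketch below assumes simple hard (majority) voting over 0/1 labels and scores the result with the positive-class F1-score. The function names `majority_vote` and `f1_score` and the five-tweet toy labels are hypothetical illustrations, not the authors' code or data. Averaging the models' class probabilities is an equally plausible alternative.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hypothetical hard-voting ensemble over binary labels.

    `predictions` holds one list of 0/1 labels per model, all of equal length.
    With an odd number of models (e.g. three), binary votes cannot tie.
    """
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of tweets")
    # zip(*predictions) yields the labels all models gave to one tweet.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1-score for the positive class (assumed here to be label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for five tweets (1 = tweet reports a childhood disorder).
gold = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 1, 1, 0]
model_c = [0, 0, 1, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("ensemble", ensemble)]:
    print(name, round(f1_score(gold, preds), 3))
# model_a 0.8
# model_b 0.857
# model_c 0.667
# ensemble 1.0
```

In this toy case each model makes different mistakes, so the vote corrects all of them. Real ensembles only help to the extent that their members' errors are not correlated.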
Keywords
digital epidemiology, childhood health, pre-trained language models, ensemble models, natural language processing, tweets classification
Subject
Computer Science and Mathematics, Artificial Intelligence and Machine Learning
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.