Preprint Article, Version 1. This version is not peer-reviewed.

Investigating the Predominance of Large Language Models in Low-Resource Bangla Language Over Transformer Models for Hate Speech Detection: A Comparative Analysis

Version 1: Received: 15 October 2024 / Approved: 16 October 2024 / Online: 17 October 2024 (10:06:19 CEST)

How to cite: Faria, F. T. J.; Baniata, L. H.; Kang, S. Investigating the Predominance of Large Language Models in Low-Resource Bangla Language Over Transformer Models for Hate Speech Detection: A Comparative Analysis. Preprints 2024, 2024101348. https://doi.org/10.20944/preprints202410.1348.v1

Abstract

The rise of abusive language on social media is a significant threat to mental health and social cohesion. For Bengali speakers, effective detection is critical, yet current methods fall short of addressing the massive volume of content, and improved techniques are urgently needed to combat online hate speech in Bengali. Traditional machine learning techniques, while useful, often require large, linguistically diverse datasets to train models effectively. This paper addresses that need, aiming to fill the existing research gap. Contextual understanding is crucial in differentiating between harmful speech and benign expressions. Large language models (LLMs) have shown state-of-the-art performance on various natural language tasks due to their extensive training on vast amounts of data. We explore the application of LLMs, specifically GPT-3.5 Turbo and Gemini 1.5 Pro, to Bengali hate speech detection using Zero-Shot and Few-Shot Learning approaches. Unlike conventional methods, Zero-Shot Learning identifies hate speech without task-specific training data, making it highly adaptable to new datasets and languages. Few-Shot Learning, on the other hand, requires only a handful of labeled examples, allowing for efficient adaptation with limited resources. Our experimental results show that LLMs outperform traditional approaches. In this study, we evaluated GPT-3.5 Turbo and Gemini 1.5 Pro on multiple datasets, considering the distribution of comments in each dataset and the challenge of class imbalance, which can affect model performance. The BD-SHS dataset consists of 35,197 comments in the training set, 7,542 in the validation set, and 7,542 in the test set. The Bengali Hate Speech Dataset v1.0 & v2.0 includes comments distributed across various hate categories: personal hate (629), political hate (1,771), religious hate (502), geopolitical hate (1,179), and gender abusive hate (316). The Bengali Hate Dataset comprises 7,500 non-hate and 7,500 hate comments. GPT-3.5 Turbo achieved accuracies of 97.33%, 98.42%, and 98.53% across the three datasets, while Gemini 1.5 Pro showed consistently lower accuracy on all of them. These outcomes represent a 6.28-percentage-point improvement over traditional methods, which achieved 92.25% accuracy. Our research contributes to the growing body of literature on LLM applications in natural language processing, particularly in the context of low-resource languages.
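This page does not reproduce the paper's prompts, so the sketch below is only a minimal illustration of how Zero-Shot and Few-Shot classification with GPT-3.5 Turbo can be wired up using the OpenAI Python SDK. The label set, prompt wording, and helper names (`classify_zero_shot`, `classify_few_shot`) are assumptions made for this example, not the authors' actual experimental setup.

```python
# Minimal sketch (not the authors' code): zero-shot vs. few-shot Bengali
# hate speech classification with GPT-3.5 Turbo via the OpenAI Python SDK.
# The prompts and label set are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LABELS = ["hate", "non-hate"]  # assumed binary label set

SYSTEM_PROMPT = (
    "You classify Bengali social media comments as one of "
    f"{LABELS}. Reply with the label only."
)

def classify_zero_shot(comment: str) -> str:
    """Zero-shot: a task instruction only, no labeled examples."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": comment},
        ],
    )
    return resp.choices[0].message.content.strip()

def classify_few_shot(comment: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: prepend a handful of labeled (comment, label) pairs."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for text, label in examples:
        messages.append({"role": "user", "content": text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": comment})
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=messages,
    )
    return resp.choices[0].message.content.strip()

# Hypothetical usage (placeholder comments, invented for illustration):
# few_shot_examples = [("<labeled Bengali comment>", "hate"),
#                      ("<labeled Bengali comment>", "non-hate")]
# print(classify_few_shot("<comment to classify>", few_shot_examples))
```

The same pattern would apply to Gemini 1.5 Pro with Google's SDK in place of the OpenAI client; only the prompting style differs between the zero-shot and few-shot conditions, which is what the paper's comparison isolates.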

Keywords

Hate speech detection; Bengali language; Low resource language; Large language models; Few-shot learning; Zero-shot learning; Natural language processing

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning


