Preprint Article Version 1 This version is not peer-reviewed

Bias and Cyberbullying Detection and Data Generation with Transformer AI Models and top LLMs

Version 1 : Received: 3 July 2024 / Approved: 4 July 2024 / Online: 4 July 2024 (08:33:50 CEST)

How to cite: Kumar, Y.; Huang, K.; Perez, A.; Yang, G.; Li, J. J.; Morreale, P.; Kruger, D.; Jiang, R. Bias and Cyberbullying Detection and Data Generation with Transformer AI Models and top LLMs. Preprints 2024, 2024070411. https://doi.org/10.20944/preprints202407.0411.v1

Abstract

Despite significant advances in Artificial Intelligence (AI) and Large Language Models (LLMs), detecting and mitigating bias remains a critical challenge, particularly on social media platforms such as X (formerly Twitter), where cyberbullying must also be addressed. This research investigates how effectively leading LLMs generate synthetic biased and cyberbullying data and evaluates how well Transformer AI models detect bias and cyberbullying in both authentic and synthetic contexts. The study involves semantic analysis and feature engineering on a dataset of over 48,000 sentences related to cyberbullying collected from Twitter (before it became X). Synthetic biased, cyberbullying, and neutral data were generated with state-of-the-art LLMs such as ChatGPT-4o, Pi AI, Claude 3 Opus, and Gemini-1.5 to deepen the understanding of bias in human-generated data. Transformer models including DeBERTa, Longformer, BigBird, HateBERT, MobileBERT, DistilBERT, BERT, RoBERTa, ELECTRA, and XLNet were first trained to classify Twitter cyberbullying data and subsequently fine-tuned, optimized, and quantized for multilabel classification (detecting both biases and cyberbullying). The study's outcomes include a prototype of a hybrid application that combines a Bias Data Detector and a Bias Data Generator, validated through extensive testing.
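
To illustrate what multilabel classification means here, the following is a minimal sketch and not the authors' implementation. It assumes a classifier that emits one raw score (logit) per label and applies an independent sigmoid and threshold to each, so a sentence can be flagged for bias, cyberbullying, both, or neither. The label names, the threshold of 0.5, and the example logits are hypothetical and do not come from the study.

```python
import math

# Hypothetical label names; the abstract only states that the models detect
# both bias and cyberbullying, not how the labels are encoded.
LABELS = ("bias", "cyberbullying")


def sigmoid(x: float) -> float:
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multilabel_decision(logits, threshold=0.5):
    """Turn one logit per label into independent yes/no decisions.

    Unlike single-label (softmax) classification, the labels do not
    compete: any subset of them may be active for the same input.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


# Made-up logits for illustration only, not model output from the study.
print(multilabel_decision([2.1, -0.7]))
# {'bias': True, 'cyberbullying': False}
```

In a real pipeline the logits would come from a fine-tuned Transformer head, and the threshold would typically be tuned per label on validation data.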

Keywords

Synthetic Data; Bias Data Generator; Large Language Models (LLMs); Cyberbullying Detection; Inherent Biases; Transformer Models; Bias Detection Tokens; Swarm of AI Agents.

Subject

Computer Science and Mathematics, Artificial Intelligence and Machine Learning
