This version is not peer-reviewed
Submitted: 11 December 2024
Posted: 12 December 2024
Kaggle is an online platform where data scientists, machine learning engineers, and researchers access datasets, compete in machine learning competitions, collaborate with one another, and develop and showcase their data science skills. Bot accounts can cause a variety of problems on such platforms, including artificially inflating the popularity of certain content, simulating user activity to manipulate rankings or ratings, spreading spam, stealing data, and carrying out cyberattacks. Despite Kaggle's prominence in data science and its robust community, bot activity on the platform has received little attention. To address this gap, this study compares supervised machine learning algorithms for detecting bot accounts within the Kaggle ecosystem. The dataset consists of 799 users, of which 400 were labeled as bots and 399 as real users. Among the algorithms evaluated, the Random Forest classifier achieved the best evaluation metrics. A feature importance analysis was also conducted to identify the features most relevant for distinguishing bot accounts from real ones. Overall, the study provides a useful framework for identifying bot accounts on Kaggle, which can be applied to other similar platforms to improve their user verification and security systems.
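As a minimal sketch of how several classifiers could be compared on a bot-versus-real labeling task, the snippet below computes common evaluation metrics and ranks models by them. The abstract does not say which metrics were used, so accuracy, precision, recall, and F1, as well as ranking by F1, are assumptions. The model names and the labels are hypothetical toy values for illustration only, not results from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task.

    `positive` marks the class treated as "bot" (an assumption here).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against division by zero when a model never predicts "bot"
    # or when there are no true bots in the sample.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(accuracy, precision, recall, f1)


def rank_models(y_true, predictions):
    """Return (name, Metrics) pairs sorted by F1, best first."""
    scored = {name: binary_metrics(y_true, y_pred)
              for name, y_pred in predictions.items()}
    return sorted(scored.items(), key=lambda kv: kv[1].f1, reverse=True)


# Hypothetical toy labels: 1 = bot, 0 = real user (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = {
    "model_a": [1, 1, 1, 0, 0, 0, 0, 1],  # placeholder classifier output
    "model_b": [1, 1, 0, 0, 0, 0, 1, 1],  # placeholder classifier output
}

ranking = rank_models(y_true, predictions)
for name, m in ranking:
    print(f"{name}: acc={m.accuracy:.2f} prec={m.precision:.2f} "
          f"rec={m.recall:.2f} f1={m.f1:.2f}")
```

Because the dataset described above is almost perfectly balanced (400 bots, 399 real users), accuracy is less likely to be misleading than it would be on a skewed sample. Reporting precision and recall alongside it still shows whether a model errs toward flagging real users or toward missing bots.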
© 2024 MDPI (Basel, Switzerland) unless otherwise stated