Byte-Pair and N-Gram Convolutional Methods of Analysing Automatically Disseminated Content on Social Platforms

Houjun Liu

doi:10.20944/preprints202004.0214.v1

Preprint

Article

This version is not peer-reviewed.

Submitted: 07 April 2020

Posted: 13 April 2020

Abstract

In this experiment, an efficient and accurate network for detecting automatically disseminated (bot) content on social platforms is devised. By using a parallel convolutional neural network (CNN) that processes variable n-grams of text 15, 20, and 25 tokens in length, encoded by Byte Pair Encoding (BPE), the model captures and analyses the complexities of linguistic content on social platforms. When validated on two sets of previously unexposed data, the model achieved accuracies of around 96.6% and 97.4% respectively, meeting or exceeding the performance of other comparable supervised ML solutions to this problem. Testing indicates that this method of text processing and analysis is an effective way of classifying potentially artificially synthesized user data, aiding the security and integrity of social platforms.
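To make the parallel n-gram idea concrete, the following is a minimal sketch in plain Python, not the author's implementation. It assumes one common reading of the abstract: each branch slides a kernel of width 15, 20, or 25 over the embedded token sequence, applies a ReLU, and keeps the maximum response, and the three pooled values are concatenated. The token IDs are assumed to come from a BPE tokenizer that is not shown; the vocabulary size, embedding dimension, random weights, and all function names are hypothetical.

```python
import random

WIDTHS = (15, 20, 25)  # n-gram widths named in the abstract


def ngram_windows(items, width):
    """Return every contiguous window of `width` items (empty if too short)."""
    return [items[i:i + width] for i in range(len(items) - width + 1)]


def conv_branch(embedded, kernel):
    """One branch: slide `kernel` over the sequence, apply ReLU, max-pool over time."""
    best = 0.0  # ReLU floor; also the result when the text is shorter than the kernel
    for window in ngram_windows(embedded, len(kernel)):
        # Dot product between the window and the kernel, position by position.
        score = sum(
            x * w
            for vec, kvec in zip(window, kernel)
            for x, w in zip(vec, kvec)
        )
        best = max(best, score)
    return best


def parallel_features(token_ids, embeddings, kernels):
    """Concatenate the pooled output of each branch into one feature vector."""
    embedded = [embeddings[t] for t in token_ids]
    return [conv_branch(embedded, kernels[w]) for w in WIDTHS]


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
VOCAB, DIM = 50, 4
embeddings = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
kernels = {
    w: [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(w)]
    for w in WIDTHS
}

token_ids = [rng.randrange(VOCAB) for _ in range(40)]  # stand-in for BPE output
features = parallel_features(token_ids, embeddings, kernels)
```

In a real model each branch would hold many learned filters, and the concatenated features would feed a classifier layer; a single filter per width is used here only to keep the sketch short.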

Keywords: bot detection; machine learning; natural language processing; computational linguistics

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
