Preprint
Article

Automatic Gender Identification from Text

This version is not peer-reviewed

Submitted: 21 November 2024

Posted: 22 November 2024

Abstract

Identifying the gender of authors of literary texts is an active area of research in computational linguistics and natural language processing. Analyzing author gender can uncover biases and socio-cultural dynamics of the past, deepening our understanding of historical texts. Motivated by the historical practice of women adopting male pseudonyms to navigate the literary world, this study seeks to determine an author's gender from their written works using various classifiers, including language models. Our contributions are a large-scale dataset of literary texts and extensive experiments with different classification models. The best-performing model, GPT2, achieved an accuracy of 0.925.

Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2024 MDPI (Basel, Switzerland) unless otherwise stated