Preprint
Article

Determining Quality of Articles in Polish Wikipedia Based on Linguistic Features

Altmetrics

Downloads

1069

Views

2410

Comments

0

A peer-reviewed article of this preprint also exists.

Submitted:

02 January 2018

Posted:

03 January 2018

You are already at the latest version

Alerts
Abstract
Wikipedia is the most popular and the largest user-generated source of knowledge on the Web. Quality of the information in this encyclopedia is often questioned. Therefore, Wikipedians have developed an award system for high quality articles, which follows the specific style guidelines. Nevertheless, more than 1.2 million articles in Polish Wikipedia are unassessed. This paper considers over 100 linguistic features to determine the quality of Wikipedia articles in Polish language. We evaluate our models on 500,000 articles of Polish Wikipedia. Additionally, we discuss the importance of linguistic features for quality prediction.
Keywords: 
Subject: Computer Science and Mathematics  -   Information Systems
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

© 2024 MDPI (Basel, Switzerland) unless otherwise stated