Preprint
Article

Natural Language Processing Application on Commit Messages: A Case Study on HEP Software

Altmetrics

Downloads

198

Views

64

Comments

0

A peer-reviewed article of this preprint also exists.

This version is not peer-reviewed

Submitted:

19 September 2022

Posted:

20 September 2022

You are already at the latest version

Alerts
Abstract
Version Control and Source Code Management Systems, such as GitHub, contain large amount ofunstructured historical information of software projects. Recent studies have introduced Natural Language Processing (NLP) to help software engineers retrieve information from very large collection of unstructured data. In this study, we have extended our previous study by increasing our datasets and ML and clustering techniques. Method: We have followed a complex methodology made up of various steps. Starting from the raw commit messages we have employed NLP techniques to build a structured database. We have extracted their main features and used as input of different clustering algorithms. Once labelled each entry, we have applied supervised machine learning techniques to build a prediction and classification model. Results: We have developed a machine learning-based model to automatically classify commit messages of a software project. Our model exploits a ground-truth dataset which includes commit messages obtained from various GitHub projects belonging to the HEP context. Conclusions: The contribution of this paper is two-fold: it proposes a ground-truth database; it provides a machine learning prediction model. They automatically identify the more change-proneness areas of code. Our model has obtained a very high average precision, recall and F1-score.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

© 2024 MDPI (Basel, Switzerland) unless otherwise stated