Preprint Article Version 1 This version is not peer-reviewed

Creating and Validating a Ground Truth Dataset of UML Diagrams Using Deep Learning Techniques

Version 1 : Received: 25 September 2024 / Approved: 25 September 2024 / Online: 25 September 2024 (12:07:22 CEST)

How to cite: Torcal, J.; Moreno, V.; Llorens, J.; Granados, A. Creating and Validating a Ground Truth Dataset of UML Diagrams Using Deep Learning Techniques. Preprints 2024, 2024092000. https://doi.org/10.20944/preprints202409.2000.v1

Abstract

UML (Unified Modeling Language) diagrams are graphical representations used in software engineering that play a vital role in the design and development of software systems and in various engineering processes. Large, good-quality datasets of UML diagrams are essential for industry, research, and teaching; however, few exist in the literature, and existing datasets commonly contain duplicate elements. Such duplicates might affect the evaluation of models obtained with these datasets. This paper addresses the challenge of creating a ground truth dataset of UML diagrams, including semi-automated inspection to remove duplicates and ensure the correct labeling of all UML diagrams in the dataset. In particular, a dataset of six UML diagram classes has been assembled, comprising a total of 2,626 images (426 activity diagrams, 636 class diagrams, 352 component diagrams, 357 deployment diagrams, 435 sequence diagrams, and 420 use case diagrams). Importantly, unlike other existing datasets, ours contains no duplicate elements and all diagrams are correctly labeled. Our curated dataset is a valuable and unique resource for the research community because it serves as a foundation for training and evaluating various models. In this paper, we demonstrate this by training and testing several deep learning models on our dataset, achieving highly satisfactory results compared to those presented in other works in the literature. Additionally, our experimental results highlight the potential of Vision Transformers for UML diagram classification, setting our approach apart from others that predominantly used Convolutional Neural Networks for similar tasks.

Keywords

UML diagram dataset; UML diagram classification; deep learning; convolutional neural networks; vision transformers

Subject

Computer Science and Mathematics, Computer Science
