Version 1
Received: 25 September 2024 / Approved: 25 September 2024 / Online: 25 September 2024 (12:07:22 CEST)
How to cite:
Torcal, J.; Moreno, V.; Llorens, J.; Granados, A. Creating and Validating a Ground Truth Dataset of UML Diagrams Using Deep Learning Techniques. Preprints 2024, 2024092000. https://doi.org/10.20944/preprints202409.2000.v1
APA Style
Torcal, J., Moreno, V., Llorens, J., & Granados, A. (2024). Creating and Validating a Ground Truth Dataset of UML Diagrams Using Deep Learning Techniques. Preprints. https://doi.org/10.20944/preprints202409.2000.v1
Chicago/Turabian Style
Torcal, J., V. Moreno, Juan Llorens, and Ana Granados. 2024. "Creating and Validating a Ground Truth Dataset of UML Diagrams Using Deep Learning Techniques." Preprints. https://doi.org/10.20944/preprints202409.2000.v1
Abstract
UML (Unified Modeling Language) diagrams are graphical representations used in software engineering that play a vital role in the design and development of software systems and in various engineering processes. Large, good-quality datasets of UML diagrams are essential for industry, research, and teaching. However, few exist in the literature, and the existing ones commonly contain duplicate elements, which can affect the evaluation of models obtained from them. This paper addresses the challenge of creating a ground truth dataset of UML diagrams, including semi-automated inspection to remove duplicates and to ensure that every UML diagram in the dataset is correctly labeled. In particular, a dataset of six UML diagram classes has been assembled, comprising a total of 2,626 images (426 activity diagrams, 636 class diagrams, 352 component diagrams, 357 deployment diagrams, 435 sequence diagrams, and 420 use case diagrams). Importantly, unlike other existing datasets, ours contains no duplicate elements and all diagrams are correctly labeled. Our curated dataset is a valuable and unique resource for the research community because it serves as a foundation for training and evaluating various models. We demonstrate this by training and testing several deep learning models on our dataset, achieving highly satisfactory results compared with those reported in other works in the literature. Additionally, our experimental results highlight the potential of Visual Transformers for UML diagram classification, setting our approach apart from others that predominantly use Convolutional Neural Networks for similar tasks.
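As an illustration of the kind of check that duplicate removal involves, the following is a minimal sketch that groups byte-identical image files by content hash. It is not the authors' procedure, which the abstract describes only as semi-automated inspection; the function name `group_exact_duplicates` and the directory layout are assumptions made for this example. Exact hashing only catches identical files, so resized or re-encoded copies would need a different technique and manual review.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def group_exact_duplicates(image_dir: str) -> list[list[Path]]:
    """Group files under image_dir whose contents are byte-identical.

    Hypothetical helper for illustration; not taken from the paper.
    """
    by_digest: dict[str, list[Path]] = defaultdict(list)
    # Walk the directory tree in a stable order so results are reproducible.
    for path in sorted(Path(image_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only digests shared by two or more files.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists files that could be reduced to a single representative; deciding which copy to keep, and verifying its label, would still be a manual step.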
Computer Science and Mathematics, Computer Science
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.