Version 1: Received: 29 July 2024 / Approved: 29 July 2024 / Online: 30 July 2024 (00:24:48 CEST)
How to cite:
Chanchí, G.; Barrera, D.; Barreto, S. Proposal of a Mathematical and Computational Method for Determining the Optimal Number of Clusters in the K-Means Algorithm. Preprints 2024, 2024072325. https://doi.org/10.20944/preprints202407.2325.v1
APA Style
Chanchí, G., Barrera, D., & Barreto, S. (2024). Proposal of a Mathematical and Computational Method for Determining the Optimal Number of Clusters in the K-Means Algorithm. Preprints. https://doi.org/10.20944/preprints202407.2325.v1
Chicago/Turabian Style
Chanchí, G., Dayana Barrera, and Sandra Barreto. 2024. "Proposal of a Mathematical and Computational Method for Determining the Optimal Number of Clusters in the K-Means Algorithm." Preprints. https://doi.org/10.20944/preprints202407.2325.v1
Abstract
(1) Background: The rapid growth of the internet and of technological infrastructure has led to a surge in data generation across many contexts, and with it a wider use of machine learning tools to extract valuable information. Clustering, particularly with the K-means algorithm, is a common technique. However, determining the optimal number of clusters in K-means is challenging, because traditional approaches such as the elbow method can be imprecise and subjective. This study proposes a more accurate and objective method for identifying the optimal number of clusters. (2) Methods: The proposed method uses the numerical derivative of the cluster inertias and the maximum value of the ratio between contiguous derivatives. It was implemented in Python using sklearn, numpy, and matplotlib, and validated with synthetic datasets generated by artificial intelligence, in which the number of clusters is clearly distinguishable. (3) Results: The method proved more precise and less subjective than the traditional elbow method, correctly identifying the optimal number of clusters in all tested synthetic datasets. It was also computationally efficient, with minimal RAM usage and execution time, which makes it suitable for practical data analysis applications. (4) Conclusions: The proposed mathematical and computational method significantly improves the determination of the optimal number of clusters in K-means, offering a more accurate and objective alternative to traditional techniques. Future work will extend the method to hierarchical clustering and develop a cloud service for wider accessibility.
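The idea summarized in the Methods part of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the details here are assumptions: the derivative is taken as a unit-step difference between consecutive inertias, the ratio is the derivative arriving at k divided by the derivative leaving k, and the function names `compute_inertias` and `optimal_k_from_inertias` are hypothetical. It should not be read as the authors' implementation.

```python
from typing import List, Sequence


def compute_inertias(X, k_max: int = 10, random_state: int = 0) -> List[float]:
    """Fit K-means for k = 1..k_max and return the inertia of each fit."""
    # Imported lazily so the ratio function below can be used without scikit-learn.
    from sklearn.cluster import KMeans

    return [
        float(KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_)
        for k in range(1, k_max + 1)
    ]


def optimal_k_from_inertias(inertias: Sequence[float]) -> int:
    """Return the k whose ratio of contiguous inertia derivatives is largest.

    inertias[i] is assumed to hold the inertia for k = i + 1.
    """
    if len(inertias) < 3:
        raise ValueError("need inertias for at least k = 1, 2 and 3")

    # Assumed numerical derivative (unit step): drops[i] = I(i + 2) - I(i + 1),
    # the change in inertia when moving from k = i + 1 to k = i + 2.
    drops = [b - a for a, b in zip(inertias, inertias[1:])]

    best_k, best_ratio = None, float("-inf")
    for i in range(len(drops) - 1):
        if drops[i + 1] == 0:
            # Simplification of this sketch: skip undefined ratios.
            continue
        # A large ratio means a steep drop arriving at k = i + 2
        # followed by a shallow drop afterwards.
        ratio = drops[i] / drops[i + 1]
        if ratio > best_ratio:
            best_k, best_ratio = i + 2, ratio

    if best_k is None:
        raise ValueError("all contiguous derivative ratios are undefined")
    return best_k
```

Under these assumptions, a dataset would be processed with `optimal_k_from_inertias(compute_inertias(X, k_max=10))`. Because the ratio needs a derivative on both sides of k, this sketch can only return values between 2 and `k_max - 1`.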
Computer Science and Mathematics, Computational Mathematics
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.