Preprint Article, Version 1. This version is not peer-reviewed.

Proposal of a Mathematical and Computational Method for Determining the Optimal Number of Clusters in the K-Means Algorithm

Version 1 : Received: 29 July 2024 / Approved: 29 July 2024 / Online: 30 July 2024 (00:24:48 CEST)

How to cite: Chanchí, G.; Barrera, D.; Barreto, S. Proposal of a Mathematical and Computational Method for Determining the Optimal Number of Clusters in the K-Means Algorithm. Preprints 2024, 2024072325. https://doi.org/10.20944/preprints202407.2325.v1

Abstract

(1) Background: The rapid evolution of the internet and technological infrastructure has led to a surge in data generation across various contexts, increasing the use of machine learning tools to extract valuable information. Clustering, particularly with the K-means algorithm, is a common technique. However, determining the optimal number of clusters in K-means is challenging, because traditional approaches such as the elbow method rely on visual inspection and can be imprecise and subjective. This study proposes a more accurate and objective method for identifying the optimal number of clusters. (2) Methods: The proposed method uses the numerical derivative of the cluster inertias and selects the number of clusters at which the ratio between contiguous derivatives reaches its maximum. Implemented in Python using sklearn, numpy, and matplotlib, the method was validated on synthetic datasets generated by artificial intelligence, in which the number of clusters is clearly distinguishable. (3) Results: The method proved more precise and less subjective than the traditional elbow method, correctly identifying the optimal number of clusters in all tested synthetic datasets. It was also computationally efficient, with low RAM usage and execution time, making it suitable for practical data analysis applications. (4) Conclusions: This new mathematical and computational method improves the determination of the optimal number of clusters in K-means, offering a more accurate and objective alternative to traditional techniques. Future work will extend the method to hierarchical clustering and develop a cloud service for wider accessibility.
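The idea summarized in the Methods part of the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the choices here are assumptions rather than the authors' implementation: the numerical derivative is taken as a forward finite difference of the inertia curve, the ratio is the inertia drop into k divided by the drop out of k, and the function names `optimal_k_from_inertias` and `kmeans_inertias` are hypothetical.

```python
import numpy as np


def optimal_k_from_inertias(inertias):
    """Pick k from K-means inertias listed for k = 1, 2, ..., k_max.

    Assumption (not stated in the abstract): the derivative is a forward
    difference, and the score at k is (drop from k-1 to k) / (drop from k to k+1).
    """
    inertias = np.asarray(inertias, dtype=float)
    if inertias.size < 3:
        raise ValueError("need inertias for at least k = 1, 2, 3")

    # drops[i] is the inertia decrease when going from k = i + 1 to k = i + 2.
    drops = -np.diff(inertias)
    prev_drop, next_drop = drops[:-1], drops[1:]

    # Skip positions where the next drop is not positive, to avoid dividing
    # by zero or rewarding a noisy increase in inertia.
    ratios = np.full(prev_drop.shape, -np.inf)
    valid = next_drop > 0
    ratios[valid] = prev_drop[valid] / next_drop[valid]

    # ratios[i] scores k = i + 2, hence the offset.
    return int(np.argmax(ratios)) + 2, ratios


def kmeans_inertias(X, k_max=10, random_state=0):
    """Fit K-means for k = 1..k_max and collect the inertia of each fit."""
    # Imported here so the ratio helper above stays usable without scikit-learn.
    from sklearn.cluster import KMeans

    return [
        KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        for k in range(1, k_max + 1)
    ]


# Illustrative inertia curve (made-up numbers) with a sharp bend at k = 4.
example_inertias = [1000, 600, 300, 50, 45, 41, 38, 35]
best_k, ratios = optimal_k_from_inertias(example_inertias)
print(best_k)
```

For a real dataset `X`, the inertia list would come from `kmeans_inertias(X)` instead of the made-up curve. Because the score is a ratio of successive drops, it depends on the shape of the curve rather than its absolute scale, which is what removes the visual judgment needed by the elbow plot.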

Keywords

Machine Learning; KMeans; Mathematical Method; Elbow Method

Subject

Computer Science and Mathematics, Computational Mathematics
