Preprint
Article

Converting Ensemble Clustering Problem to a Mathematical Optimization Problem and Providing an Approach to Solve Based on Optimization Toolbox

This version is not peer-reviewed

Submitted:

17 April 2018

Posted:

17 April 2018

Abstract
People today face large volumes of data that must be stored or displayed. One of the key methods for controlling and managing such data is to group it into clusters. Clustering plays a critical role in information retrieval, where it organizes large collections into a small number of meaningful clusters. One of the main motivations for clustering is to reveal the hidden, inherent structure of a data set. Ensemble clustering algorithms combine the results of multiple clustering algorithms to reach a single overall clustering. These methods fuse several primary partitions of the data to find a better solution. Because different clustering algorithms view the data from different perspectives, they can produce different partitions of the same data. Combining the partitions obtained from different algorithms can yield a high-quality partition, even when the clusters lie very close to one another. Most studies in this area use all of the initial clusters. This study instead uses only the most stable clusters rather than all of the primary clusters produced. A consensus function based on co-association matrices is used to select the more stable clusters, and the selection relies on a cluster stability criterion based on the F-measure. An optimization function is then used to refine the final clusters. The optimizer used in this article is a genetic algorithm, which finds the final clusters that participate in the consensus. Experimental results on several datasets show that the proposed method produces clusters with high stability.
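
The two building blocks named in the abstract, the co-association matrix and an F-measure-based stability score, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the stability definition used here (the mean, over all partitions, of a cluster's best F-measure match) is an assumption, since the abstract does not give the exact formula. The genetic-algorithm step is omitted.

```python
from typing import Dict, List, Sequence, Set


def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the fraction of partitions in which each pair of points shares a cluster."""
    if not partitions:
        raise ValueError("at least one partition is required")
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same number of points")
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def clusters_of(labels: Sequence[int]) -> List[Set[int]]:
    """Group point indices by their cluster label."""
    groups: Dict[int, Set[int]] = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return list(groups.values())


def f_measure(cluster: Set[int], reference: Set[int]) -> float:
    """Harmonic mean of precision and recall of `cluster` against `reference`."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def stability(cluster: Set[int], partitions: Sequence[Sequence[int]]) -> float:
    """ASSUMPTION: stability is the mean, over partitions, of the best F-measure match."""
    best = [
        max(f_measure(cluster, ref) for ref in clusters_of(labels))
        for labels in partitions
    ]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Three toy partitions of four points (illustrative data only).
    partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    matrix = co_association(partitions)
    print(matrix[0][1], matrix[2][3])
    print(stability({0, 1}, partitions))
```

Under this assumed definition, a cluster that reappears almost unchanged in most partitions scores close to 1, so thresholding the score is one plausible way to keep only stable clusters before building the consensus.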
Keywords: 
Subject: Computer Science and Mathematics  -   Data Structures, Algorithms and Complexity
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.