Preprint
Article

Attribute Reduction Algorithm of Knowledge Granularity for Incomplete System under the Background of Clustering

Submitted: 25 December 2023; Posted: 26 December 2023
Abstract
The phenomenon of missing data can be seen everywhere in reality. Most typical attribute reduction models are only suitable for complete systems, and for incomplete systems they cannot obtain effective reduction rules. Even for the few existing reduction approaches, the classification accuracy of their reduction sets still needs to be improved. In order to overcome these shortcomings, this paper first defines the similarities of intra-cluster objects and inter-cluster objects based on the tolerance principle and the mechanism of knowledge granularity. Secondly, attributes are selected on the principle that the similarity of inter-cluster objects should be small and the similarity of intra-cluster objects should be large, and a knowledge granularity attribute reduction model is proposed under the background of clustering. Then, the IKAR algorithm is designed. Finally, a series of comparative experiments on reduction size, running time and classification accuracy is conducted with twelve UCI data sets to evaluate the performance of IKAR, and the stability is analyzed with the Friedman test and the Bonferroni-Dunn test. The experimental results indicate that the proposed algorithm is efficient and feasible.
Keywords: 
Subject: Computer Science and Mathematics  -   Artificial Intelligence and Machine Learning

1. Introduction

Rough set theory (RST) [1], initiated by Pawlak, is an effective mathematical tool for dealing with imprecise, fuzzy and incomplete data. RST has been successfully applied in machine learning [2,3,4], knowledge discovery [5,6], expert systems [7], disease diagnostics [8,9,10], decision support [11,12] and other areas [13,14,15]. Attribute reduction is one of the research hotspots in RST. As an important technology in the process of data preprocessing, attribute reduction has captured researchers' attention in big data and knowledge discovery [16,17,18]. The main objective of attribute reduction is to remove irrelevant or unimportant attributes while keeping the original distinguishing ability unchanged. In this way, the effect of data dimension reduction can be achieved, and a lot of time and space resources can be saved for knowledge discovery and rule extraction.
With the rapid development of network and information technology, we have gradually entered the era of big data. Data sets have the characteristics of large volume, rapid change, and diverse data forms [19]. At the same time, due to the influence of the collection method and environment during the data collection process, there are a large number of missing or erroneous data in the data sets. The existence of such disturbed data seriously affects decision-making and judgement on big data, and may even mislead decision makers. After a long period of unremitting effort, scholars have achieved outstanding results in attribute reduction [20,21,22,23,24,40]. For example, Dai [20] proposed a semi-supervised attribute reduction based on attribute indiscernibility. Cao [21] put forward a three-way approximate reduction approach using an information-theoretic measure. Yang [22] presented a novel incremental attribute reduction method via quantitative dominance-based neighborhood self-information. Lin et al. [16] developed a feature selection method using neighborhood multi-granulation fusion. In the variable precision rough set model, Yu et al. [24] raised a novel attribute reduction based on local attribute significance. In literature [40], Devi proposed a new dimension reduction technique that considers picture fuzzy soft matrices in the decision-making process. However, these classic reduction models are suitable only for complete systems.
Missing values in real-world data sets are ubiquitous, and the traditional reduction models cannot obtain effective reduction rules from them. In order to reduce incomplete systems, Zhang and Chen proposed a lambda-reduction approach based on the similarity degree with respect to a conditional attribute subset for incomplete set-valued information systems [25]. For incomplete interval-valued information systems (IIIS for short), Li [35] proposed the concepts of similarity degree and tolerance relation between two information values of a given attribute, and designed three reduction algorithms based on the theta-discernibility matrix, theta-information entropy and theta-significance. Liu introduced a new attribute reduction approach using conditional entropy based on the fuzzy alpha-similarity relation [28]. Subsequently, Dai [36] proposed interval-valued fuzzy min-max similarity relations and designed two attribute reduction algorithms based on the interval-valued fuzzy discernibility pair model. Song [38] put forward the similarity degree between information values on each attribute, and an attribute reduction method was designed using information granulation and information entropy. Zhou presented a heuristic attribute reduction algorithm with a binary similarity matrix and attribute significance as heuristic knowledge for incomplete information systems [26]. Zhang [37] presented a novel approach for knowledge reduction using discernibility techniques in the multi-granulation rough set model. He and Qu [27] put forward a fuzzy-rough iterative computation model based on symmetry relations for incomplete categorical decision information systems. Srirekha et al. proposed an attribute reduction on SE-ISI concept lattices based on the concept of object ranking for incomplete information systems [29]. Cornelis et al. put forward a generalized model of attribute reduction using a fuzzy tolerance relation within the context of fuzzy rough set theory [30]. Liu applied the concepts of accurate reduction and reduced invariant matrices for attribute reduction in information systems [31]. To reduce unnecessary tolerance classes for the original cover, Nguyen [39] introduced a new concept of stripped neighborhood covers and proposed an efficient heuristic algorithm for mixed and incomplete decision tables.
Although the above reduction algorithms can effectively reduce the incomplete information systems, the classification accuracy of the reduced set is not ideal. The main reason is that only the importance of attributes is considered when selecting attributes, and the impact of attributes on classification is not considered. Usually, people take the best result of clustering as a reference standard for classification work, and classify similar samples into the same cluster. In order to solve the problems mentioned above, this paper proposes an attribute reduction method from the perspective of clustering.
At present, studies of attribute reduction that use the clustering idea to construct a feature selection model are relatively infrequent. In order to avoid or reduce the loss of original information after discretization of continuous values, Zhang [44] proposed a feature selection method based on fuzzy clustering, but this method has no reliable theoretical support. Jia [41] proposed a spectral clustering method based on neighborhood information entropy feature selection, which uses a feature selection method to remove redundant features before clustering. In order to take the classification effect of the data set into consideration when selecting features, Zhao proposed a Fuzzy C-Means clustering fuzzy rough feature selection method [43], which can improve the classification accuracy of the reduced set to a certain degree, but the effect is not obvious. Jia proposed a similarity attribute reduction in the context of clustering in literature [42], which can greatly improve the classification accuracy of the reduced set, but it needs to continuously adjust parameters to achieve the best classification effect; such a feature set has certain random limitations and increases the time consumption of the system. Therefore, it is necessary to design a stable model with high classification accuracy for data preprocessing.
Although these existing approaches can effectively reduce incomplete systems, they only consider the importance of the attributes themselves and consider neither the correlation between attributes nor the influence of the conditional attributes on the decision classification. In order to improve the classification accuracy of the reduction set, we apply the idea of clustering.
Based on the principle that the similarity of samples within a cluster should be as large as possible and the similarity of samples between clusters should be as small as possible, an attribute reduction algorithm for incomplete systems is designed under the background of clustering. First, according to the tolerance principle and the theory of knowledge granularity, we define the intra-cluster and inter-cluster similarities for an incomplete system. Secondly, formulas for calculating the similarities of intra-cluster and inter-cluster objects are designed; after normalizing the two similarities, we define the overall similarity measure of objects. Then, according to the corresponding similarity mechanism, a new attribute reduction algorithm for incomplete systems is proposed. Finally, a series of experiments verifies that the algorithm proposed in this paper is significantly better than other similar algorithms in terms of running time and accuracy, and the stability of the algorithm is analyzed by using the Friedman test and the Bonferroni-Dunn test in statistics.
The contribution of this paper is embodied in the following four aspects:
1) A tolerance class calculation method for incomplete information systems is proposed and applied to the knowledge granularity calculation.
2) Knowledge granularity is used as a measure of sample similarity, to measure the similarity of inter-cluster samples and intra-cluster samples.
3) A knowledge granularity reduction algorithm based on clustering context is designed in incomplete information systems.
4) Lots of experiments have been done to verify the validity of the algorithm proposed in this paper, and the stability of the algorithm is verified by mathematical statistics.
The rest of this paper is organized as follows. The tolerance principle and related concepts of knowledge granularity are recalled in Section 2. In Section 3, we propose similarity measures for intra-cluster and inter-cluster objects, and discuss the reduction mechanism under the clustering background for missing data. In Section 4, we normalize the inter-cluster and intra-cluster similarities and design the corresponding reduction model. In Section 5, a series of experiments is conducted and the performance of the algorithm is evaluated in terms of reduction size, running time, classification accuracy and stability, verifying the feasibility and effectiveness of the algorithm. Finally, the advantages and disadvantages of the proposed algorithm are summarized and future work is outlined.

2. Preliminaries

In this section, we review some basic concepts in rough set theory: the definitions of the tolerance class, knowledge granularity, clustering metrics, and attribute significance for incomplete decision systems.

2.1. Basic concept of RST

A decision information system is a quadruple $DS = (U, A, V, f)$, where $U$ is a non-empty finite set of objects and $A$ is a non-empty finite set of attributes; $A = C \cup D$, where $C$ is the conditional attribute set and $D$ is the decision attribute set; $V = \bigcup_{a \in A} V_a$ is the union of attribute domains, where $V_a$ is the value set of attribute $a$, called the domain of $a$; $f: U \times A \to V$ is an information function with $f(x, a) \in V_a$ for each $a \in A$ and $x \in U$. For every attribute subset $B \subseteq C$, an indiscernibility relation is defined as follows:
$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B, f(x, a) = f(y, a)\}$$
By the relation $IND(B)$, we can obtain the partition of $U$, denoted by $U/IND(B)$ or $U/B$. For $B \subseteq A$ and $X \subseteq U$, the upper approximation of $X$ is denoted as
$$\overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}$$
The lower approximation of $X$ is denoted as
$$\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\}$$
where the objects in $\overline{B}(X)$ may belong to $X$, while the objects in $\underline{B}(X)$ must belong to $X$.
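As an illustration, the partition and the two approximations can be computed directly from their definitions. The following Python sketch is our own illustration (not part of the original algorithms); the helper names `partition` and `lower_upper` are hypothetical, and the decision table is assumed to be stored as a dictionary mapping each object to its attribute values.

```python
# A minimal sketch of U/IND(B) and the B-lower/B-upper approximations for a
# complete table; `table` maps an object to a dict of attribute values.
from collections import defaultdict

def partition(U, table, B):
    """Group the objects of U by their value vector on the attribute subset B."""
    blocks = defaultdict(list)
    for x in U:
        blocks[tuple(table[x][a] for a in B)].append(x)
    return list(blocks.values())

def lower_upper(U, table, B, X):
    """Return the B-lower and B-upper approximations of a target set X."""
    X, lower, upper = set(X), set(), set()
    for block in partition(U, table, B):
        block = set(block)
        if block <= X:      # [x]_B is contained in X -> certainly in X
            lower |= block
        if block & X:       # [x]_B intersects X -> possibly in X
            upper |= block
    return lower, upper
```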
Definition 1.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$ and $P \subseteq C$, the binary tolerance relation between objects that are indiscernible in terms of the values of the attributes in $P$ is defined as follows:
$$T(P) = \{(x, y) \in U \times U \mid \forall a \in P, f(a, x) = f(a, y) \vee f(a, x) = * \vee f(a, y) = *\}$$
where $*$ represents a missing value. $T(P)$ is symmetric and reflexive, but not transitive.
Definition 2.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $P \subseteq C$, and $o \in U$, the tolerance class of object $o$ with respect to attribute set $P$ is defined as follows:
$$T_P(o) = \{y \in U \mid (o, y) \in T(P)\}$$

2.2. Basic concepts of knowledge granularity

Definition 3.
Suppose $IDS = (U, C \cup D, V, f)$ is an incomplete decision system, $P \subseteq C$, and $T_P(o_i)$ is the tolerance class of object $o_i$ with respect to $P$. The knowledge granularity of $P$ on $U$ is defined as follows:
$$GK_U(P) = \frac{1}{|U|^2} \sum_{i=1}^{|U|} |T_P(o_i)|$$
where $|U|$ represents the number of objects in the data set $U$. Owing to the reflexivity and symmetry of $T_P(o_i)$, there are many repeated calculations when computing $\sum_{i=1}^{|U|} |T_P(o_i)|$. In order to reduce the amount of calculation, we propose Definition 4 as follows.
Definition 4.
Suppose $IDS = (U, C \cup D, V, f)$ is an incomplete decision system and $P \subseteq C$. The simplified tolerance relation $CT_P(o_i)$ of object $o_i$ with respect to $P$ is defined as follows:
$$CT_P(o_i) = \{o_j \mid \forall a \in P, ((f(a, o_i) = f(a, o_j) \vee (f(a, o_i) = * \wedge f(a, o_j) = *)) \wedge (i < j)) \vee (f(a, o_i) \neq * \wedge f(a, o_j) = *)\}$$
Definition 5.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $P \subseteq C$ and $o \in U$, the simplified tolerance class of object $o$ with respect to attribute set $P$ is defined as follows:
$$CT_P(o) = \{y \mid (o, y) \in CT(P)\}$$
Definition 5 is obtained from Definition 3 by deleting the symmetric and reflexive element pairs, so it no longer has the properties of symmetry and reflexivity.
Definition 6.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $P \subseteq C$, $o \in U$, and $CT_P(o)$ the simplified tolerance class of object $o$ with respect to attribute set $P$, the equal knowledge granularity of $P$ on $U$ is defined as follows:
$$EGK_U(P) = \frac{1}{|U|^2} \sum_{o \in U} |CT_P(o)|$$
Theorem 1.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$ and $P \subseteq C$, let $U/P = \{X_1, X_2, \ldots, X_l\}$, $X_i \in U/P$, $|X_i| = n_i$, where $1 \leq i \leq l$. All objects in the subdivision $X_l$ have missing values on attribute set $P$ and $|X_l| = n_*$; for convenience, we write $X_l$ as $X_*$, and the set of objects with no missing values on $P$ is written $\overline{X_*}$. If $EGK_U(P)$ is the equal knowledge granularity of $P$ on $U$, then
$$EGK_U(P) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l} C_{n_i}^2 + |X_*| \cdot |\overline{X_*}|\right)$$
where $C_n^2 = \frac{n(n-1)}{2}$.
Proof. 
Suppose that $o \in X_i$ ($i < l$) and $o_* \in X_*$. According to Definition 4, $\sum_{o \in X_i} |CT_P(o)| = \frac{n_i(n_i-1)}{2} + n_i n_*$ and $\sum_{o_* \in X_*} |CT_P(o_*)| = \frac{n_*(n_*-1)}{2}$. Then, by Definition 6,
$$EGK_U(P) = \frac{1}{|U|^2} \sum_{o \in U} |CT_P(o)| = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1} \sum_{o \in X_i} |CT_P(o)| + \sum_{o_* \in X_*} |CT_P(o_*)|\right) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1} \frac{n_i(n_i-1)}{2} + n_* \sum_{i=1}^{l-1} n_i + \frac{n_*(n_*-1)}{2}\right) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l} \frac{n_i(n_i-1)}{2} + n_*(|U| - n_*)\right).$$
Since $C_{n_i}^2 = \frac{n_i(n_i-1)}{2}$, $C_{n_*}^2 = \frac{n_*(n_*-1)}{2}$ and $|\overline{X_*}| = |U| - n_*$, we obtain $EGK_U(P) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l} C_{n_i}^2 + |X_*| \cdot |\overline{X_*}|\right)$. □
Property 1.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$ and $P \subseteq C$, if the knowledge granularity of $P$ on $U$ is $GK_U(P)$ and the equal knowledge granularity of $P$ is $EGK_U(P)$, then we have:
$$EGK_U(P) = \left(GK_U(P) - \frac{1}{|U|}\right) / 2$$
Proof. 
Let $U/P = \{X_1, X_2, \ldots, X_{l-1}, X_*\}$, $|X_i| = n_i$, $|X_*| = n_*$, where $X_i$ is the $i$-th subdivision of $U/P$, $X_*$ is the subdivision of objects with missing values on $P$, and $|X_i|$ is the number of objects in $X_i$; hence $|U| - \sum_{i=1}^{l-1} |X_i| = |X_*|$. According to Definition 2, for $o \in X_i$ we have $T_P(o) = \{x \mid x \in X_i \vee x \in X_*\}$, so $|T_P(o)| = n_i + n_*$ and $\sum_{o \in X_i} |T_P(o)| = n_i(n_i + n_*)$. In the same way, $\sum_{o \in X_*} |T_P(o)| = n_* |U|$. According to Definition 3,
$$GK_U(P) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1} \sum_{o \in X_i} |T_P(o)| + \sum_{o \in X_*} |T_P(o)|\right) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1} n_i(n_i + n_*) + n_* |U|\right),$$
so
$$GK_U(P) - \frac{1}{|U|} = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1}(n_i^2 - n_i) + \sum_{i=1}^{l-1} n_i + n_* \sum_{i=1}^{l-1} n_i + n_* |U| - |U|\right) = \frac{1}{|U|^2}\left(\sum_{i=1}^{l-1}(n_i^2 - n_i) - n_*^2 - n_* + 2 n_* |U|\right) = \frac{2}{|U|^2}\left(\sum_{i=1}^{l-1} \frac{n_i^2 - n_i}{2} + \frac{n_*^2 - n_*}{2} + n_*(|U| - n_*)\right) = 2\, EGK_U(P).$$
Hence $GK_U(P) - \frac{1}{|U|} = 2\, EGK_U(P)$, which gives the result. □
Since the time complexity of calculating $T_P(o)$ is $O(|U|)$, the time complexity of calculating $GK_U(P)$ is $O(|U|^2)$. However, the time complexity of computing $CT_P(o)$ is $O(|U/P|)$, so the time complexity of calculating $EGK_U(P)$ is $O(|U/P|^2)$, and $|U/P|^2 \leq |U|^2$. In addition, in the process of calculating $EGK_U(P)$, every subdivision with cardinality 1 is pruned, which further speeds up the calculation. Therefore, computing $EGK_U(P)$ takes less time than computing $GK_U(P)$ on the same data set.
Example 1.
Example of computing the equal knowledge granularity. Let $IDS = (U, C \cup D, V, f)$, $U = \{o_1, o_2, \ldots, o_9\}$, $C = \{a, b, c, e\}$, $D = \{d\}$; the detailed data are shown in Table 1. Let $P = \{a, b\}$; then $f(o_6, P) = *$ and $f(o_9, P) = *$. We use the following two methods to calculate $EGK_U(P)$.
1) According to Definition 5, we obtain $CT_P(o_1) = \{o_2, o_6, o_9\}$, $CT_P(o_2) = \{o_6, o_9\}$, $CT_P(o_3) = \{o_4, o_5, o_6, o_9\}$, $CT_P(o_4) = \{o_5, o_6, o_9\}$, $CT_P(o_5) = \{o_6, o_9\}$, $CT_P(o_6) = \{o_9\}$, $CT_P(o_7) = \{o_6, o_8, o_9\}$, $CT_P(o_8) = \{o_6, o_9\}$, $CT_P(o_9) = \emptyset$. According to Definition 6, $EGK_U(P) = \frac{1}{|U|^2} \sum_{i=1}^{9} |CT_P(o_i)| = \frac{1}{9^2}(3 + 2 + 4 + 3 + 2 + 1 + 3 + 2 + 0) = \frac{20}{81}$.
2) Since $U/P = \{\{o_1, o_2\}, \{o_3, o_4, o_5\}, \{o_7, o_8\}, \{o_6, o_9\}\}$, let $X_1 = \{o_1, o_2\}$, $X_2 = \{o_3, o_4, o_5\}$, $X_3 = \{o_7, o_8\}$, $X_* = \{o_6, o_9\}$; then $|X_*| = 2$ and $|\overline{X_*}| = |U| - 2 = 7$. According to Theorem 1, $EGK_U(P) = \frac{1}{9^2}\left[C_2^2 + C_3^2 + C_2^2 + C_2^2 + 2 \cdot (9 - 2)\right] = \frac{20}{81}$.
Although the above two methods obtain the same result, their calculation times are different. Method 1 needs to scan the data set multiple times, so it consumes more time. Method 2 only needs to scan the data set once to obtain each subdivision of $U/P$; from the number of objects in each subdivision, the combination values can be obtained quickly, and then the equal knowledge granularity is calculated.
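To make the two calculation routes of Example 1 concrete, the following Python sketch (our own illustration, not the authors' program) encodes Table 1 and computes $EGK_U(P)$ both from the simplified tolerance classes of Definition 4 and from the counting formula of Theorem 1; the helper names `row`, `ct_sum`, `egk_by_classes` and `egk_by_theorem1` are hypothetical.

```python
# A minimal sketch of Definitions 4-6 and Theorem 1 on the data of Table 1;
# '*' marks a missing value.
from math import comb

table = {                      # object -> values on attributes a, b, c, e
    1: ('0','0','0','1'), 2: ('0','0','1','*'), 3: ('0','1','0','1'),
    4: ('0','1','*','0'), 5: ('0','1','0','*'), 6: ('*','*','1','1'),
    7: ('1','0','*','0'), 8: ('1','0','1','0'), 9: ('*','*','0','1'),
}
ATTR = {'a': 0, 'b': 1, 'c': 2, 'e': 3}

def row(o, P):
    return tuple(table[o][ATTR[p]] for p in P)

def ct_sum(U, P):
    """Sum of |CT_P(o)| over o in U (Definition 4, restricted to U)."""
    total = 0
    for i in U:
        vi = row(i, P)
        for j in U:
            vj = row(j, P)
            same = all(x == y or (x == '*' and y == '*') for x, y in zip(vi, vj))
            absorb = '*' not in vi and all(y == '*' for y in vj)
            if (same and i < j) or absorb:
                total += 1
    return total

def egk_by_classes(U, P):
    """EGK via the simplified tolerance classes (method 1 of Example 1)."""
    return ct_sum(U, P) / len(U) ** 2

def egk_by_theorem1(U, P):
    """EGK via the counting formula of Theorem 1 (method 2 of Example 1)."""
    missing = [o for o in U if all(v == '*' for v in row(o, P))]
    blocks = {}
    for o in U:
        if o not in missing:
            blocks.setdefault(row(o, P), []).append(o)
    pairs = sum(comb(len(b), 2) for b in blocks.values()) + comb(len(missing), 2)
    return (pairs + len(missing) * (len(U) - len(missing))) / len(U) ** 2

U = list(table)
print(egk_by_classes(U, ['a', 'b']), egk_by_theorem1(U, ['a', 'b']))  # both 20/81
```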

3. The mechanism of knowledge granularity attribute reduction in the background of clustering

Most traditional attribute reduction models use the equivalence class relation to compute the importance of conditional attributes. Although these methods can effectively deal with complete decision information systems, they cannot obtain correct reduction rules in incomplete ones. In order to deal with the loss of information effectively, this paper focuses on the reduction of incomplete decision systems.
Table 1. Incomplete information system.
U a b c e d
o1 0 0 0 1 0
o2 0 0 1 * 1
o3 0 1 0 1 0
o4 0 1 * 0 1
o5 0 1 0 * 1
o6 * * 1 1 1
o7 1 0 * 0 2
o8 1 0 1 0 2
o9 * * 0 1 2
The traditional reduction models do not consider the impact on the classification of the data set when deleting redundant attributes. If there are inconsistent objects in the data set, the classification accuracy of the reduced set will be affected. In order to improve the data quality, this paper uses the idea of clustering. Clustering divides all objects in the data set into different clusters according to a certain criterion when the target category is unknown: objects within a cluster should be as similar as possible, and objects in different clusters should be as dissimilar as possible. Classification assigns all objects in the data set to classes according to a certain property when the object category is known. A good clustering result can be used as a reference standard for accurate classification, and the desired result of clustering is that objects of the same class are gathered into the same cluster, while objects of different classes fall into different clusters. This paper studies decision information systems with labeled data objects; therefore, we use the classification results to guide the clustering of the data objects. When the data objects are clustered, they follow the principle that intra-cluster objects should be as close as possible and inter-cluster objects should be as far apart as possible. Next, we discuss how to measure the distance between intra-cluster and inter-cluster objects.

3.1. The intra-cluster similarity for incomplete systems

Generally, there are two approaches for clustering calculations: distance and similarity. The closer the distance between two different objects, the weaker their ability to be distinguished; on the contrary, the farther the distance, the stronger the distinguishing ability. In this paper, the similarity method is used to measure the distinguishing ability of objects. Knowledge granularity can measure the similarity between objects: the coarser the knowledge granularity, the weaker the distinguishing ability, and the finer the knowledge granularity, the stronger the distinguishing ability. Next, we discuss how to use knowledge granularity information to measure the similarity of objects in an incomplete system.
Definition 7 (The similarity of intra-cluster objects).
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$ with $U/D = \{D_1, D_2, \ldots, D_n\}$ (for convenience, let $U/D = \pi_D$), $D_i \in \pi_D$ and $P \subseteq C$, suppose the partition of $D_i$ under attribute set $P$ is $D_i/P = \{X_1, X_2, \ldots, X_m\}$. The similarity of objects within the cluster $D_i$ with respect to attribute set $P$ is defined as follows (where $o \in D_i$):
$$SIntra_{D_i}(P) = EGK_{D_i}(P) = \frac{1}{|D_i|^2} \sum_{o \in D_i} |CT_P(o)|$$
Definition 8 (The average similarity of intra-cluster objects).
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $U/D = \{D_1, D_2, \ldots, D_n\}$, $P \subseteq C$ and $o \in D_i$, let $SIntra_{D_i}(P)$ be the knowledge granularity similarity of intra-cluster objects for the subdivision $D_i$ with respect to attribute set $P$. The average intra-cluster similarity is defined as follows:
$$ASIntra_{\pi_D}(P) = \frac{1}{n} \sum_{i=1}^{n} SIntra_{D_i}(P)$$
The desired effect of clustering is that the similarity of intra-clustering is high, and the similarity of inter-clustering is low.
Property 2.
Given an incomplete system $IDS = (U, C \cup D, V, f)$, $U/D = \pi_D$, $D_i \in \pi_D$ and $P, Q \subseteq C$, if $P \subseteq Q$, we have
$$ASIntra_{\pi_D}(P) \geq ASIntra_{\pi_D}(Q)$$
Proof. 
Let $D_i/P = \{X_1, X_2, \ldots, X_k \cup X_{k+1}, X_{k+2}, \ldots, X_n, X_* \cup Y\}$, where $X_* \cup Y$ is the set of objects with missing values on attribute set $P$. Since $P \subseteq Q$, each subdivision of $D_i/Q$ is a subset of some subdivision of $D_i/P$; let $D_i/Q = \{X_1, X_2, \ldots, X_k, X_{k+1}, X_{k+2}, \ldots, X_n, Y, X_*\}$, where $X_*$ is the set of objects with missing values on $Q$. According to Theorem 1 and Definition 7,
$$SIntra_{D_i}(P) = \frac{1}{|D_i|^2}\Big(\sum_{j=1}^{k-1} C_{|X_j|}^2 + C_{|X_k|+|X_{k+1}|}^2 + \sum_{j=k+2}^{n} C_{|X_j|}^2 + C_{|X_*|+|Y|}^2 + (|X_*|+|Y|)(|D_i|-|X_*|-|Y|)\Big)$$
$$= \frac{1}{|D_i|^2}\Big(\sum_{j=1}^{n} C_{|X_j|}^2 + |X_k||X_{k+1}| + C_{|X_*|}^2 + C_{|Y|}^2 + |X_*||Y| + (|X_*|+|Y|)(|D_i|-|X_*|-|Y|)\Big).$$
Since $SIntra_{D_i}(Q) = \frac{1}{|D_i|^2}\big(\sum_{j=1}^{n} C_{|X_j|}^2 + C_{|X_*|}^2 + C_{|Y|}^2 + |X_*|(|D_i|-|X_*|)\big)$, we obtain
$$SIntra_{D_i}(P) - SIntra_{D_i}(Q) = \frac{1}{|D_i|^2}\big(|X_k||X_{k+1}| + |Y|(|D_i|-|X_*|-|Y|)\big).$$
Since $|D_i| \geq |X_*| + |Y|$ and $|X_k||X_{k+1}| \geq 0$, we get $SIntra_{D_i}(P) \geq SIntra_{D_i}(Q)$, and hence $\sum_{i=1}^{n} SIntra_{D_i}(P) \geq \sum_{i=1}^{n} SIntra_{D_i}(Q)$. Therefore $ASIntra_{\pi_D}(P) \geq ASIntra_{\pi_D}(Q)$. □
According to property 2, we conclude that the intra-cluster similarity is monotonic when the conditional attribute set changes.
Example 2.
In Table 1, let $P = \{a, b\}$. We have $D_1 = \{o_1, o_3\}$, $D_2 = \{o_2, o_4, o_5, o_6\}$, $D_3 = \{o_7, o_8, o_9\}$, $SIntra_{D_1}(P) = 0$, $SIntra_{D_2}(P) = \frac{0 + C_2^2 + 1 \times 3}{4^2} = \frac{1}{4}$, $SIntra_{D_3}(P) = \frac{C_2^2 + 1 \times 2}{3^2} = \frac{1}{3}$, and $ASIntra_{\pi_D}(P) = \frac{1}{3} \sum_{i=1}^{3} SIntra_{D_i}(P) = \frac{7}{36}$.
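A short sketch (assuming the Table 1 helpers from the sketch after Example 1; the cluster labels below simply restate $U/D$ from Table 1) shows how Definitions 7 and 8 reuse the equal knowledge granularity per cluster:

```python
# Intra-cluster similarity (Definitions 7-8) for Table 1 with P = {a, b}.
clusters = {0: [1, 3], 1: [2, 4, 5, 6], 2: [7, 8, 9]}   # D_1, D_2, D_3 = U/D

def avg_intra_similarity(P):
    sims = [egk_by_theorem1(Di, P) for Di in clusters.values()]  # SIntra_{D_i}(P)
    return sum(sims) / len(sims)                                  # ASIntra_{pi_D}(P)

print(avg_intra_similarity(['a', 'b']))   # 7/36, as in Example 2
```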

3.2. The inter-cluster similarity for incomplete systems

Definition 9.
(The inter-cluster similarity for incomplete systems) Let $IDS = (U, C \cup D, V, f)$ be an incomplete decision system and $\pi_D = \{D_1, D_2, \ldots, D_n\}$. Suppose $P \subseteq C$ and $D_i, D_j \in U/D$; then the inter-cluster similarity of $D_i$ and $D_j$ with respect to attribute set $P$ for incomplete systems is defined as follows:
$$SInter_{D_i, D_j}(P) = \frac{1}{(|D_i|+|D_j|)^2}\left(\sum_{o \in D_i \cup D_j} |CT_P(o)| - \sum_{o \in D_i} |CT_P(o)| - \sum_{o \in D_j} |CT_P(o)|\right)$$
Assuming that $D_i$ and $D_j$ are two different clusters, the inter-cluster similarity between $D_i$ and $D_j$ is calculated in two steps. The first step calculates the similarity after the two clusters are merged, and the second step removes the similarity information within each cluster; what remains is the similarity information between objects of different clusters.
Property 3.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $\pi_D = \{D_1, D_2, \ldots, D_n\}$ and $P \subseteq C$, let $D_i/P = \{X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_m, X_*\}$ with $\overline{X_*} = D_i - X_*$, $D_j/P = \{Y_1, Y_2, \ldots, Y_k, Y_{k+1}, \ldots, Y_n, Y_*\}$ with $\overline{Y_*} = D_j - Y_*$, and $D_i \cup D_j / P = \{X_1 \cup Y_1, X_2 \cup Y_2, \ldots, X_k \cup Y_k, X_* \cup Y_*, X_{k+1}, \ldots, X_m, Y_{k+1}, \ldots, Y_n\}$, where for every $o \in X_* \cup Y_*$, $f(o, P) = *$ and '*' is the flag of a missing value. We have:
$$SInter_{D_i, D_j}(P) = \frac{1}{(|D_i|+|D_j|)^2}\left(\sum_{l=1}^{k} |X_l||Y_l| + |X_*||Y_*| + |X_*||\overline{Y_*}| + |\overline{X_*}||Y_*|\right)$$
Proof. 
According to Definition 5 and Theorem 1,
$$\sum_{o \in D_i \cup D_j} |CT_P(o)| = \sum_{l=1}^{k} C_{|X_l|+|Y_l|}^2 + C_{|X_*|+|Y_*|}^2 + \sum_{l=k+1}^{m} C_{|X_l|}^2 + \sum_{l=k+1}^{n} C_{|Y_l|}^2 + (|X_*|+|Y_*|)(|\overline{X_*}|+|\overline{Y_*}|),$$
$$\sum_{o \in D_i} |CT_P(o)| = \sum_{l=1}^{m} C_{|X_l|}^2 + C_{|X_*|}^2 + |X_*||\overline{X_*}|, \qquad \sum_{o \in D_j} |CT_P(o)| = \sum_{l=1}^{n} C_{|Y_l|}^2 + C_{|Y_*|}^2 + |Y_*||\overline{Y_*}|.$$
Since $\sum_{l=1}^{k} C_{|X_l|+|Y_l|}^2 = \sum_{l=1}^{k}\left(\frac{|X_l|^2 - |X_l| + |Y_l|^2 - |Y_l|}{2} + |X_l||Y_l|\right) = \sum_{l=1}^{k} C_{|X_l|}^2 + \sum_{l=1}^{k} C_{|Y_l|}^2 + \sum_{l=1}^{k} |X_l||Y_l|$, according to Definition 9 we conclude that
$$(|D_i|+|D_j|)^2 \cdot SInter_{D_i, D_j}(P) = \sum_{o \in D_i \cup D_j} |CT_P(o)| - \sum_{o \in D_i} |CT_P(o)| - \sum_{o \in D_j} |CT_P(o)| = \sum_{l=1}^{k} |X_l||Y_l| + |X_*||Y_*| + |X_*||\overline{Y_*}| + |\overline{X_*}||Y_*|.$$
Then $SInter_{D_i, D_j}(P) = \frac{1}{(|D_i|+|D_j|)^2}\left(\sum_{l=1}^{k} |X_l||Y_l| + |X_*||Y_*| + |X_*||\overline{Y_*}| + |\overline{X_*}||Y_*|\right)$. □
Property 4.
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $\pi_D = \{D_1, D_2, \ldots, D_n\}$, $P \subseteq Q \subseteq C$ and $D_i, D_j \in U/D$, we have $SInter_{D_i, D_j}(P) \geq SInter_{D_i, D_j}(Q)$.
Proof. 
Let $D_i/Q = \{X_1, X_2, \ldots, X_k, X_{k+1}, \ldots, X_m, X_\Delta, X_* - X_\Delta\}$ with $\overline{X_* - X_\Delta} = D_i - X_* + X_\Delta$, and $D_j/Q = \{Y_1, Y_2, \ldots, Y_k, Y_{k+1}, \ldots, Y_n, Y_\Delta, Y_* - Y_\Delta\}$ with $\overline{Y_* - Y_\Delta} = D_j - Y_* + Y_\Delta$. Since $P \subseteq Q \subseteq C$, $D_i/Q$ is a refinement of $D_i/P$ and $D_j/Q$ is a refinement of $D_j/P$; let $D_i/P = \{X_1, X_2, \ldots, X_k \cup X_{k+1}, \ldots, X_m, X_*\}$ with $\overline{X_*} = D_i - X_*$ and $D_j/P = \{Y_1, Y_2, \ldots, Y_k \cup Y_{k+1}, \ldots, Y_n, Y_*\}$ with $\overline{Y_*} = D_j - Y_*$. By Property 3,
$$(|D_i|+|D_j|)^2\, SInter_{D_i,D_j}(P) = \sum_{l=1}^{k-1} |X_l||Y_l| + (|X_k|+|X_{k+1}|)(|Y_k|+|Y_{k+1}|) + |X_*||Y_*| + |X_*||\overline{Y_*}| + |\overline{X_*}||Y_*|,$$
$$(|D_i|+|D_j|)^2\, SInter_{D_i,D_j}(Q) = \sum_{l=1}^{k-1} |X_l||Y_l| + |X_k||Y_k| + |X_{k+1}||Y_{k+1}| + |X_\Delta||Y_\Delta| + |X_*-X_\Delta||Y_*-Y_\Delta| + |X_*-X_\Delta||\overline{Y_*-Y_\Delta}| + |\overline{X_*-X_\Delta}||Y_*-Y_\Delta|.$$
Since $|\overline{X_*}| = |D_i| - |X_*|$, $|\overline{Y_*}| = |D_j| - |Y_*|$, $|\overline{X_*-X_\Delta}| = |D_i| - |X_*| + |X_\Delta|$ and $|\overline{Y_*-Y_\Delta}| = |D_j| - |Y_*| + |Y_\Delta|$, a direct calculation gives
$$(|D_i|+|D_j|)^2\left(SInter_{D_i,D_j}(P) - SInter_{D_i,D_j}(Q)\right) = |X_k||Y_{k+1}| + |X_{k+1}||Y_k| + |X_\Delta|(|D_j|-|Y_*|) + |Y_\Delta|(|D_i|-|X_*|).$$
Since $|D_j| \geq |Y_*|$ and $|D_i| \geq |X_*|$, we have $SInter_{D_i,D_j}(P) - SInter_{D_i,D_j}(Q) \geq 0$. □
From Property 4, it can be concluded that the inter-cluster similarity is monotonic with the change of the conditional attribute set.
Definition 10 (The average similarity of inter-cluster objects).
Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $\pi_D = \{D_1, D_2, \ldots, D_n\}$ and $P \subseteq C$, the inter-cluster similarity is calculated between every two of the $n$ clusters, so the number of comparisons is $\frac{1}{2}n(n-1)$. The average knowledge granularity similarity of inter-cluster objects in the data set $U$ with respect to attribute set $P$ is defined as follows:
$$ASInter_{\pi_D}(P) = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} SInter_{D_i, D_j}(P)$$
Example 3.
In Table 1, under the same conditions as Example 2, let $P = \{a, b\}$. We have $D_1 = \{o_1, o_3\}$, $D_2 = \{o_2, o_4, o_5, o_6\}$, $D_3 = \{o_7, o_8, o_9\}$, $SInter_{D_1, D_2}(P) = \frac{1 \times 1 + 1 \times 2 + 2 \times 1}{(2+4)^2} = \frac{5}{36}$, $SInter_{D_1, D_3}(P) = \frac{0 + 0 + 2 \times 1}{(2+3)^2} = \frac{2}{25}$, $SInter_{D_2, D_3}(P) = \frac{0 + 1 \times 1 + 1 \times 2 + 3 \times 1}{(4+3)^2} = \frac{6}{49}$, and $ASInter_{\pi_D}(P) = \frac{1}{3} \sum_{i=1}^{2} \sum_{j=i+1}^{3} SInter_{D_i, D_j}(P) = \frac{15053}{132300}$.
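The inter-cluster similarity of Definition 9 only needs the simplified tolerance-class counts of the merged and of the separate clusters. A minimal sketch (reusing the hypothetical `ct_sum` and `clusters` helpers from the earlier sketches) is:

```python
# Inter-cluster similarity (Definitions 9-10) built on the ct_sum helper.
def inter_similarity(Di, Dj, P):
    merged = ct_sum(Di + Dj, P)
    return (merged - ct_sum(Di, P) - ct_sum(Dj, P)) / (len(Di) + len(Dj)) ** 2

def avg_inter_similarity(P):
    Ds = list(clusters.values())
    pairs = [(i, j) for i in range(len(Ds)) for j in range(i + 1, len(Ds))]
    return sum(inter_similarity(Ds[i], Ds[j], P) for i, j in pairs) / len(pairs)
```

With $P = \{a, b\}$ on Table 1, this reproduces the pairwise values of Example 3.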

4. Attribute reduction of knowledge granularity for incomplete systems

Traditional attribute reduction methods are mostly aimed at data sets with no missing data, while real-world data sets are often incomplete due to various subjective or objective factors. Therefore, we study information systems with missing data and propose a corresponding algorithm to improve the data quality of the reduction sets of incomplete systems.
This paper discusses how to design a reduction method with the idea of clustering. For an ideal clustering effect, inter-cluster objects should be far apart and intra-cluster objects should be close together. Here, similarity is used to measure the distance between two different objects: the higher the similarity, the closer the objects; conversely, the lower the similarity, the farther apart the objects. Based on the above analysis, we design a formula to measure the importance of attributes as follows:
$$SIM_R = ASIntra + \lambda \cdot (1 - ASInter)$$
where $\lambda$ is a weight and $1 - ASInter$ is the dissimilarity of inter-cluster objects. The relative importance of the intra-cluster similarity and the inter-cluster similarity can be set through the parameter $\lambda$. This method needs to adjust the parameter continuously, which consumes a lot of time. To this end, we first normalize $ASIntra$ and $ASInter$; then the two similarities are measured within a unified range, which avoids the parameter adjustment process.

4.1. Normalization of inter-cluster similarity and intra-cluster similarity

Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, $\pi_D = \{D_1, D_2, \ldots, D_n\}$, $P \subseteq C$ and $D_i, D_j \in U/D$, the number of elements in each subdivision may be different, so the value range of its equal knowledge granularity may also be different. To compute the average similarity of all subdivisions, the individual similarities must be measured over the same domain; for the sake of generality, we normalize the inter-cluster and intra-cluster similarities. According to Definition 7 and Theorem 1, $SIntra_{D_i}(P) = EGK_{D_i}(P)$. When all data objects in the subdivision $D_i$ are indistinguishable with respect to the attribute set $P$, $EGK_{D_i}(P)$ attains its maximum value $\frac{1}{2}\left(1 - \frac{1}{|D_i|}\right)$; when all data objects in $D_i$ can be distinguished from each other, $EGK_{D_i}(P)$ attains its minimum value $0$. Hence $EGK_{D_i}(P) \in \left[0, \frac{|D_i|-1}{2|D_i|}\right]$. Mapping the value of $EGK_{D_i}(P)$ to the range $[0, 1]$, the corrected formula of $SIntra_{D_i}(P)$ is defined as
$$SIntra_{D_i}(P)' = EGK_{D_i}(P) \Big/ \frac{|D_i|-1}{2|D_i|} = EGK_{D_i}(P) \cdot \frac{2|D_i|}{|D_i|-1}$$
The average similarity of intra-cluster objects is corrected as following,
$$ASIntra_{\pi_D}(P)' = \frac{1}{n} \sum_{i=1}^{n} SIntra_{D_i}(P)'$$
$D_i$ and $D_j$ are two object sets of different clusters. Suppose $D_i/P = \{X_1, X_2, \ldots, X_n, X_*\}$ and $D_j/P = \{Y_1, Y_2, \ldots, Y_m, Y_*\}$. If $f(X_l, P) = f(Y_l, P)$ for $1 \leq l \leq k$, then $D_i \cup D_j / P = \{X_1 \cup Y_1, X_2 \cup Y_2, \ldots, X_k \cup Y_k, X_{k+1}, \ldots, X_n, Y_{k+1}, \ldots, Y_m, X_* \cup Y_*\}$. According to Property 3, the inter-cluster similarity of $D_i$ and $D_j$ is $SInter_{D_i, D_j}(P) = \frac{1}{(|D_i|+|D_j|)^2}\left(\sum_{l=1}^{k} |X_l||Y_l| + |X_*||Y_*| + |X_*||\overline{Y_*}| + |\overline{X_*}||Y_*|\right)$.
When there is no pair of subdivisions with matched values (i.e., $\sum_{l=1}^{k}|X_l||Y_l| = 0$) and $X_* = \emptyset$, $Y_* = \emptyset$, $SInter_{D_i, D_j}(P) = 0$ attains its minimum value; when all data objects in $D_i$ and $D_j$ are indistinguishable, $SInter_{D_i, D_j}(P) = \frac{|D_i||D_j|}{(|D_i|+|D_j|)^2}$ attains its maximum value. Hence $SInter_{D_i, D_j}(P) \in \left[0, \frac{|D_i||D_j|}{(|D_i|+|D_j|)^2}\right]$, and the normalized formula of $SInter_{D_i, D_j}(P)$ is as follows:
$$SInter_{D_i, D_j}(P)' = SInter_{D_i, D_j}(P) \cdot \frac{(|D_i|+|D_j|)^2}{|D_i||D_j|}$$
The definition of the average similarity of inter-cluster objects is revised as following,
$$ASInter_{\pi_D}(P)' = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} SInter_{D_i, D_j}(P)'$$
After the similarities of inter-cluster and intra-cluster objects are normalized, $SIM_R = ASIntra + \lambda \cdot (1 - ASInter)$ is revised as follows:
$$SIM_R = ASIntra_{\pi_D}(P)' + 1 - ASInter_{\pi_D}(P)'$$
Since $ASInter_{\pi_D}(P)'$ represents the similarity of inter-cluster objects, the corresponding dissimilarity is $1 - ASInter_{\pi_D}(P)'$. If the formula $SIM_R$ is used to measure the effect of clustering, the larger the value of $SIM_R$, the better the effect.
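A brief sketch of the normalized criterion follows (assuming the helpers from the earlier sketches and the Theorem 1 setting, in which every object of a cluster is either complete on $P$ or missing on all attributes of $P$; clusters of size 1 would need a guard against division by zero):

```python
# SIM_R = ASIntra' + 1 - ASInter' (Section 4.1), for an attribute subset P.
def sim(P):
    Ds = list(clusters.values())
    # normalised intra-cluster similarities SIntra'
    intra = [egk_by_theorem1(D, P) * 2 * len(D) / (len(D) - 1) for D in Ds]
    # normalised inter-cluster similarities SInter'
    pairs = [(i, j) for i in range(len(Ds)) for j in range(i + 1, len(Ds))]
    inter = [inter_similarity(Ds[i], Ds[j], P)
             * (len(Ds[i]) + len(Ds[j])) ** 2 / (len(Ds[i]) * len(Ds[j]))
             for i, j in pairs]
    return sum(intra) / len(intra) + 1 - sum(inter) / len(inter)
```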

4.2. The knowledge granularity attribute reduction algorithm for incomplete systems (IKAR)

In Section 4.1, we discussed the similarities of inter-cluster and intra-cluster objects from the perspective of clustering, which provides a clear goal for the next step of attribute selection.
Definition 11 (Equal knowledge granularity attribute reduction). Given an incomplete decision system $IDS = (U, C \cup D, V, f)$, an attribute subset $R \subseteq C$ is an equal knowledge granularity attribute reduction if and only if:
1) $SIM_R = \max_{P \subseteq C}\{SIM_P\}$;
2) $\forall R' \subset R$, $SIM_{R'} < SIM_R$.
In Definition 11, condition (1) is the jointly sufficient condition that guarantees the $SIM$ value induced from the reduction is maximal, and condition (2) is the individual necessary condition that guarantees the reduction is minimal.
According to Definition 11, we can find a smaller reduction set with an ideal classification effect. Property 2 proves that when an attribute is removed, the intra-cluster similarity increases monotonically. Property 4 proves that the inter-cluster similarity also increases when an attribute is removed, so the corresponding dissimilarity decreases. Obviously, the monotonicity of $SIM_R$ cannot be determined. Finding the subset $R$ of the conditional attribute set $C$ with the largest $SIM_R$ is a combinatorial optimization problem, and trying the subsets one by one is not a practical way to solve it, so we use a heuristic method to find the smallest reduction set. For any attribute $a \in C$, its inner significance is defined as follows:
$$Sig_C^a = SIM_C - SIM_{C - \{a\}}$$
The bigger is the value of S i g C a , the more important is the attribute.
In order to obtain the optimal reduction set quickly, we adopt a deletion strategy. Firstly, the importance $Sig_C^a$ of each attribute $a$ is computed from the formula of $SIM$, and the attributes are sorted by $Sig_C^a$. Secondly, let $R = C$; the value of $SIM_C$ is used as the initial condition, which ensures that the clustering effect after reduction is not worse than that of the raw data set. Then, remove the least important attribute $a$ from the remaining attributes in $R$ and calculate the value of $SIM_{R-a}$. If $SIM_{R-a} \geq SIM_R$, delete the attribute $a$ and continue; otherwise, the algorithm terminates. The details of the IKAR algorithm are shown in Figure 1.
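Since Figure 1 gives the full listing, only a minimal sketch of the deletion strategy described above is shown here (our own illustration of the loop, not the authors' exact pseudocode; it assumes the hypothetical `sim` helper sketched in Section 4.1):

```python
# IKAR deletion loop: start from R = C and drop attributes while SIM does not decrease.
def ikar(C):
    R = list(C)
    sim_R = sim(R)
    while len(R) > 1:
        # candidate whose removal yields the largest SIM value
        best = max(R, key=lambda a: sim([x for x in R if x != a]))
        sim_best = sim([x for x in R if x != best])
        if sim_best >= sim_R:      # removal does not hurt the clustering criterion
            R.remove(best)
            sim_R = sim_best
        else:                      # every remaining attribute is indispensable
            break
    return R

# e.g. ikar(['a', 'b', 'c', 'e']) on the Table 1 data
```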
Example 4.
In Table 1, since $D_1 = \{o_1, o_3\}$, $D_2 = \{o_2, o_4, o_5, o_6\}$ and $D_3 = \{o_7, o_8, o_9\}$, let $R = C$; according to the definition of $SIM_R$, we get $SIM_R = \frac{75}{72}$, $SIM_{C-a} = \frac{71}{72}$, $SIM_{C-b} = \frac{100}{72}$, $SIM_{C-c} = \frac{60}{72}$ and $SIM_{C-e} = \frac{80}{72}$; then $Sig_C^a = \frac{4}{72}$, $Sig_C^b = \frac{25}{72}$, $Sig_C^c = \frac{15}{72}$ and $Sig_C^e = \frac{5}{72}$. Since $SIM_{C-b} = \frac{100}{72} > SIM_R$, we delete the attribute $b$ from $R$ and let $R = R - b$, $SIM_R = \frac{100}{72}$. In the same way, we obtain $SIM_{R-a} = \frac{94}{72}$, $SIM_{R-c} = \frac{92}{72}$, $SIM_{R-e} = \frac{108}{72}$, $Sig_R^a = \frac{6}{72}$, $Sig_R^c = \frac{8}{72}$ and $Sig_R^e = \frac{8}{72}$. Since $SIM_{R-e} = \frac{108}{72} > SIM_R$, we delete the attribute $e$ from $R$ and let $R = R - e$. We then calculate $SIM_{R-a}$ and $SIM_{R-c}$, obtaining $SIM_{R-a} = \frac{82}{72}$ and $SIM_{R-c} = \frac{100}{72}$; now $Sig_R^a = \frac{18}{72}$ and $Sig_R^c = \frac{8}{72}$. Since $SIM_{R-c} = \frac{100}{72} < SIM_R$, the algorithm terminates and we have $R = \{a, c\}$.

4.3. Time complexity analysis

The time complexity of Step 1 is $O(1)$. In Step 2, we calculate the value of $SIM_C$, which includes $ASIntra_{\pi_D}(C)$ and $ASInter_{\pi_D}(C)$. The computational time complexity of the intra-cluster similarity $SIntra_{D_i}(C)$ is $O(|C||D_i|)$, so the time complexity of calculating $ASIntra_{\pi_D}(C)$ is $O(|U||C|)$. Since the time complexity of calculating the inter-cluster similarity $SInter_{D_i,D_j}(C)$ of $D_i$ and $D_j$ is $O((|D_i|+|D_j|)|C|)$, the time complexity of $ASInter_{\pi_D}(C)$ is $O\left(\sum_{i=1}^{|U/D|-1}\sum_{j=i+1}^{|U/D|}(|D_i|+|D_j|)|C|\right) = O((|U/D|-1)|U||C|)$. Thus the time complexity of Step 2 is $O(|U/D||U||C|)$. In Step 3, the time consumed is $O(|U/D||U||C|^2)$. Since Step 4 uses the results of Step 3, its time complexity is $O(1)$. Step 5 sorts the importance $Sig_R^a$ of each attribute, with time complexity $O(\frac{1}{2}|C|^2)$; since the time complexity of calculating $Sig_R^a$ is $O(|U/D||U||R|)$, the time complexity of Step 5 is $O(|U/D||U||R|) + O(\frac{1}{2}|C|^2)$. In Step 6, the time complexity of deleting one redundant attribute is $O(|U/D||U||R|^2)$, and Step 6 needs to be executed $|C| - |R|$ times ($R \subseteq C$), so the time complexity of Step 6 is $O(|U/D||U|(|C|^3 - |R|^3))$. In summary, the time complexity of IKAR is $\max\{O(|U/D||U||C|^2), O(|U/D||U|(|C|^3 - |R|^3))\}$.

5. Experiments results analysis

In order to evaluate the feasibility and effectiveness of the proposed algorithm, complete data sets are preprocessed to obtain incomplete information systems, and several different attribute reduction algorithms are used for reduction. The reduction sets obtained in the previous stage are classified and analyzed by multiple classifiers in the Weka tool. The proposed algorithm is compared with three other existing algorithms in terms of reduction set size, running time, accuracy and algorithm stability. The specific experimental framework is shown in Figure 2.
All of the data sets are displayed in Table 2. The twelve data sets selected were downloaded from UCI. In Table 2, $|U|$, $|C|$ and $|D|$ represent the number of objects, the number of conditional attributes and the number of categories of the decision attribute, respectively. In order to generate an incomplete information system, we delete 12% of the attribute values from the raw data sets, and the missing values are randomly and uniformly distributed. The missing values of each attribute are removed with equal probability, which eliminates the impact on the classification accuracy of the later reduction set due to different attribute selections. The two classifiers J48 and SMO of the Weka 3.8 software platform are used to demonstrate the classification effect of the reduction sets. In subsequent experiments, the reduced sets of each data set are analyzed for accuracy using cross-validation: all the objects in the reduced set are randomly divided into 10 equal parts, one of which is used as the test set, and the remaining 9 are used as the training set. In this way, the cross experiment is repeated 10 times for each reduction set. Finally, the size of the reduction set, the running time and the classification accuracy obtained by the 10 experiments are averaged. We execute all experiments on a PC with Windows 10, Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz, 1.61 GHz and 8GB memory. The algorithms are coded in Python, and the software used is PyCharm Community Edition 2020.2.3 x64.
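The evaluation protocol can be reproduced, for instance, with scikit-learn stand-ins for the Weka classifiers (a sketch under that assumption; `X_red` and `y` are placeholders for a reduced data set and its labels, and DecisionTreeClassifier/SVC only approximate J48/SMO):

```python
# 10-fold cross-validated accuracy of a reduced attribute set, averaged per classifier.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC                         # rough analogue of Weka SMO
from sklearn.tree import DecisionTreeClassifier     # rough analogue of Weka J48

def mean_accuracy(X_red, y):
    classifiers = {'J48-like': DecisionTreeClassifier(), 'SMO-like': SVC(kernel='linear')}
    return {name: np.mean(cross_val_score(clf, X_red, y, cv=10))
            for name, clf in classifiers.items()}
```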

5.1. Reduction set size and running time analysis

This section mainly verifies the feasibility of IKAR for data set reduction from the perspective of reduction size and computation time. We selected three other representative attribute reduction approaches for incomplete systems; for convenience, these methods are referred to as NGLE [32], IEAR [33] and PRAR [34], respectively. NGLE is the neighborhood multi-granulation rough sets-based attribute reduction using Lebesgue and entropy measures, which considers both the algebraic view and the information view. IEAR is the information entropy attribute reduction for incomplete set-valued data using the similarity degree function and the proposed λ-information entropy. PRAR is the positive region attribute reduction using the indiscernibility and discernibility relations.
The attribute reduction sizes of the four algorithms are shown in Table 3, and Table 4 shows the time consumed by the reduction process of the four algorithms in seconds. Ave represents the average value; Best in Table 3 represents the number of times the minimum reduction set is obtained, and Best in Table 4 stands for the number of times the running time is the shortest. From Table 3, it can be seen that the average reduction set size of the IKAR algorithm is 11.833, and the average reduction set sizes obtained by the NGLE, IEAR and PRAR algorithms are 11.917, 11.00 and 12.083, respectively. IKAR obtained the minimum reduction set on the Ches, Spli, Mush and Shut data sets; its reduction effect is slightly better than that of NGLE and PRAR, but not as good as that of the IEAR algorithm. On the 12 data sets in Table 2, the IEAR algorithm obtains the minimum reduction set on 11 data sets.
From the consumption times in Table 4, we can find that although IKAR's advantage in reduction size is not obvious, the IKAR algorithm is obviously better than the other three algorithms in running time. When calculating the reduction sets of the 12 data sets, the average time required by IKAR is 15.20 seconds, while the average times consumed by the other three algorithms are (155.40, 2003.56, 260.38) seconds, respectively. From the experimental results in Table 4, it can also be found that the IKAR algorithm needs only 25.30 seconds to reduce the Shut data set, whereas the running times of the NGLE, IEAR and PRAR algorithms are (886.08, 3974.02, 133.25) seconds. When reducing the Mush data set, IKAR takes 4.45 seconds, and the other three algorithms take (411.09, 4544.04, 827.16) seconds. Of course, the IKAR algorithm also has shortcomings: when the number of sample categories is larger, the algorithm is more time-consuming. For example, there are 10 different categories of samples in the Hand data set, and the calculation time of the IKAR algorithm is 92.93 seconds, while the times of the NGLE and PRAR algorithms are (81.61, 91.28) seconds, respectively, which is less than that of the IKAR algorithm.
From the above analysis, we can get that the algorithm IKAR can effectively reduce the data set, and it is obviously better than similar algorithms in terms of reduction speed.

5.2. Changes in classification accuracy with missing values

It is not enough to evaluate the overall performance of a reduction algorithm only from the size of the reduction set and the running time. Here we further analyze the performance of the above four algorithms from the perspective of the classification accuracy of the reduction set. In order to find out the influence of missing values on the IKAR algorithm, we selected four data sets from Table 2: Heas, Prom, Hepa and Stai. During the experiment, we considered five different missing levels: 2%, 4%, 6%, 8% and 10%. First, 2% of the data objects in the original data set were randomly selected and, for different attributes, their attribute values were randomly deleted to generate an incomplete data set. In order to reduce the bias of attribute selection that may be caused by random data deletion, we used the 2% data just obtained as a basis, selected further attributes with equal probability and deleted another 2% of the data to generate an incomplete data set with 4% missing values; in the same way, data sets with 6%, 8% and 10% missing values were generated. After the incomplete data sets were ready, we used the IKAR, NGLE, IEAR and PRAR algorithms for reduction and used the SMO and J48 classifiers to analyze the classification accuracy of the reduced sets. The specific results are shown in Figure 3 and Figure 4, where the horizontal axis represents the proportion of deleted data objects in the data set and the vertical axis represents the classification accuracy.
Figure 3 and Figure 4 show the trends obtained by using the SMO and J48 classifiers to analyze the accuracy of the reduced data sets. From the results in Figure 3 and Figure 4, it can be seen that as the missing data increase, the classification accuracy of the above four algorithms shows a downward trend. For example, under the SMO classifier, when the proportion of missing values in the data set Heas changes from 2% to 4%, the accuracy changes of IKAR and of the NGLE, IEAR and PRAR algorithms are (84.01→83.69), (80.32→79.69), (77.01→76.86) and (79.24→78.92). Under the J48 classifier, when the proportion of missing values in the data set Heas changes from 2% to 4%, the accuracy changes of IKAR, NGLE, IEAR and PRAR are (81.43→80.94), (77.85→76.28), (74.79→73.33) and (76.24→74.86), respectively. The main reason for this phenomenon is that as the proportion of missing data increases, more and more data objects cannot be distinguished, resulting in a decrease in classification accuracy. From the trends in Figure 3 and Figure 4, it can also be seen that the accuracy of the IKAR algorithm changes smoothly, while for the other three algorithms the classification accuracy of the reduced set is affected more strongly as the proportion of missing data increases. For example, under the J48 classifier, when the missing proportion of the Hepa data set changes from 2% to 10%, the accuracy of IKAR changes by 2.09, while the accuracy changes of the NGLE, IEAR and PRAR algorithms are 3.2, 3.19 and 4.49, respectively. Under the SMO classifier, for the classification accuracy of the Heas data set, the change for the IKAR algorithm is 1.12, and for the other three algorithms it is 3.37, 2.28 and 2.93, respectively.

5.3. Classification accuracy analysis

The previous experiments compared the changes in classification accuracy under missing values: as the proportion of missing values changes, the accuracy of IKAR not only changes smoothly but also achieves a better classification effect on multiple data sets. We use the SMO and J48 classifiers to analyze the accuracy of the previous reduction sets; the details are shown in Table 5 and Table 6, respectively. Under the classifier SMO, the IKAR algorithm obtained an average accuracy of 90.14, while the average accuracies obtained by the other three algorithms NGLE, IEAR and PRAR were only (83.05, 82.03, 82.98), respectively. On the 12 data sets of Table 2, the highest accuracy is achieved by the IKAR algorithm 11 times. For example, the classification accuracy of IKAR is (94.76, 99.68, 93.86) on the data sets Hand, Shut and Spli, whereas the classification accuracy of IEAR is (81.59, 85.93, 81.53), an average accuracy about 10 percentage points lower than IKAR. Similarly, with the J48 classifier, the average classification accuracy of IKAR is 90.16, and the other three algorithms obtained (84.79, 83.53, 84.98), respectively. Through the analysis in Section 5.2, we can conclude that when the proportion of missing values changes, the accuracy of the IKAR algorithm is relatively stable, and it has a higher classification accuracy than other similar reduction models, which can effectively improve the classification quality of the reduction set.

5.4. Algorithm stability analysis

To indicate the statistical significance of the classification results, the non-parametric Friedman test and Bonferroni-Dunn test [68] are used to analyze the classification accuracy of each classifier with the different methods of Section 5.2, where the Friedman test is a statistical test based on the ranking of each method on each data set. The Friedman statistic is described as follows:
$$\chi_F^2 = \frac{12N}{t(t+1)} \sum_{i=1}^{t} R_i^2 - 3N(t+1)$$
$$F_F = \frac{(N-1)\chi_F^2}{N(t-1) - \chi_F^2}$$
where $N$ is the number of data sets, $t$ is the number of algorithms, and $R_i$ represents the average rank of the classification accuracy of the $i$-th algorithm over all data sets. The statistic $F_F$ obeys the Fisher distribution with $t-1$ and $(t-1)(N-1)$ degrees of freedom. If the value of $F_F$ is bigger than $F_\alpha(t-1, (t-1)(N-1))$, then the original hypothesis does not hold.
The $F_F$ test can judge whether these algorithms differ, but it cannot indicate which algorithm is superior. In order to explore which algorithm is better, we use the Bonferroni-Dunn test to calculate the critical difference of the average rank values, which is defined as follows:
$$CD_\alpha = q_\alpha \sqrt{\frac{t(t+1)}{6N}}$$
If the difference between the average rank values of two algorithms exceeds the critical difference $CD_\alpha$, the hypothesis that 'the performance of the two algorithms is the same' is rejected with the corresponding confidence; in that case the two algorithms perform differently, and the algorithm with the better average rank is statistically superior to the other. Generally, we set $\alpha = 0.05$.
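For concreteness, the two statistics and the critical difference can be computed as follows (a small sketch using the formulas above; the example call uses the rounded average ranks reported below for the SMO classifier, so the printed statistics are only close to the 23.4 and 20.429 reported in the text, which stem from unrounded ranks):

```python
# Friedman statistics and Bonferroni-Dunn critical difference.
from math import sqrt

def friedman(avg_ranks, N):
    t = len(avg_ranks)
    chi2 = 12 * N / (t * (t + 1)) * sum(r ** 2 for r in avg_ranks) - 3 * N * (t + 1)
    ff = (N - 1) * chi2 / (N * (t - 1) - chi2)
    return chi2, ff

def critical_difference(q_alpha, t, N):
    return q_alpha * sqrt(t * (t + 1) / (6 * N))

print(friedman([1.08, 2.75, 3.58, 2.58], N=12))   # approx. (23.1, 19.6) with rounded ranks
print(critical_difference(2.569, t=4, N=12))      # approx. 1.354
```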
In order to compare the stability of the IKAR algorithm proposed in this paper, we choose the three other similar algorithms NGLE, IEAR and PRAR to reduce the data sets in Table 2. The previous reduction results are then analyzed by the classifiers SMO and J48 in the Weka tool, and the classification accuracy is examined by the Friedman test and the Bonferroni-Dunn test. When t = 4 and N = 12, we get $\chi_F^2$ = 23.4 and $F_F$ = 20.429. Under the classifier SMO, the classification accuracies of the four algorithms' reduction sets are ranked, with rank 1 representing the best result; the detailed ranking results are shown in Table 5. The average rank values of the IKAR, NGLE, IEAR and PRAR algorithms are 1.08, 2.75, 3.58 and 2.58 in turn.
Since the critical value $F_{0.05}(3, 33)$ is 2.892 and $F_F \geq 2.892$, we can reject the original hypothesis at $\alpha = 0.05$ under the Friedman test, so there are statistical differences in classification accuracy among the above four algorithms. Next, $q_{0.05} = 2.569$ and $CD_{0.05} = 1.354$; the results of the Bonferroni-Dunn test for these four algorithms at $\alpha = 0.05$ are shown in Figure 5, in which algorithms towards the left of the axis have higher accuracy. From Figure 5, the average accuracy rank of IKAR is $s_1 = 1.08$; among the other three algorithms, the best-ranked algorithm PRAR has an average rank of $s_2 = 2.58$. Since $|s_1 - s_2| = 1.5$ and $|s_1 - s_2| \geq CD_{0.05}$, the IKAR algorithm is significantly superior to PRAR. In the same way, IKAR is better in accuracy than NGLE and IEAR. However, there is no obvious difference among the ranks of the NGLE, IEAR and PRAR algorithms.
For the same reason, under the classifier J48, the average rank values of the classification accuracy of the above four algorithms are 1.00, 2.83, 3.67 and 2.50, as shown in Table 6. Since $\chi_F^2$ = 26.8 and $F_F$ = 32.043, we have $F_F \geq F_{0.05}$, so the classification accuracies of these four algorithms are significantly different in the statistical sense under the J48 classifier. From Figure 6, we can see that the IKAR algorithm is significantly different from the NGLE algorithm, while the ranks of the NGLE, IEAR and PRAR algorithms have no obvious differences among each other.
Therefore, the test results show no consistent evidence of statistical differences between any two of the NGLE, IEAR and PRAR approaches under the SMO and J48 classifiers, while IKAR stands apart from them. In general, the IKAR model is better than the other models in stability.

6. Conclusion

In the face of incomplete systems, most traditional attribute reduction models cannot obtain effective reduction rules, which affects the classification accuracy of the reduced set. Therefore, we propose a new attribute reduction method, IKAR, based on the clustering background for incomplete systems. IKAR uses the tolerance principle to calculate the knowledge granularity information and to measure the similarity of data objects. When selecting attributes, the similarity of intra-cluster objects should be as large as possible and the similarity of inter-cluster objects should be as small as possible. The innovations of this paper are manifested in the following four aspects: (1) the tolerance principle is used to quickly calculate the knowledge granularity; (2) knowledge granularity is used to calculate the similarity of inter-cluster and intra-cluster objects; (3) the idea of clustering is used to calculate the importance of attributes; (4) in addition to the conventional time, space and precision analysis, the stability on data sets with different percentages of missing values is also analyzed. All experiments show that the IKAR algorithm is not only superior in reduction time compared with the other three algorithms, but also has an excellent performance in terms of accuracy and stability. Of course, the IKAR algorithm also has some shortcomings: for example, it is unsuitable for data sets with multiple decision values or complex data types, and dynamic changes of the data sets are not considered.
In further research, we will focus on attribute reduction for incomplete systems with mixed data types, and especially on how to deal with missing data. In addition, we will integrate incremental learning methods into the knowledge granularity reduction model under the clustering background. To address big data sets more efficiently, applying GPU and MapReduce technologies to design parallel attribute reduction models or acceleration strategies is also a promising research direction.

Author Contributions

Conceptualization, B.H.L.; supervision, R.Y.H.; investigation, E.L.J.; writing-review and editing, L.F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Natural Science Foundation of China (61836016), the Key Subject of Chaohu University (kj22zdxk01), the Quality Improvement Project of Chaohu University on Discipline Construction (kj21gczx03), and the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (KJ2021A1030).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The execution process of the algorithm (IKAR).
Figure 2. The framework chart of the experiments.
Figure 3. Variation of classification accuracy with different percentages of missing values under classifier SMO.
Figure 4. Variation of classification accuracy with different percentages of missing values under classifier J48.
Figure 5. Bonferroni-Dunn test with SMO.
Figure 6. Bonferroni-Dunn test with J48.
Table 2. Description of twelve data sets.
ID Data sets Abbreviation |U| |C| |D|
1 Promoters Prom 106 57 2
2 Heart-statlog Hear 270 13 2
3 Hepatitis Hepa 155 19 2
4 HandWritten Hand 5620 64 10
5 Chess kr-kp Ches 3196 36 2
6 Splice Spli 3190 61 3
7 Letters Lett 20000 17 26
8 Vote Vote 435 16 2
9 Mushroom Mush 8124 22 2
10 Qsar Qsar 1055 42 2
11 Shuttle Shut 43500 9 7
12 Satimage Sati 6435 36 6
Table 3. The attribute reduction size with the four methods on the twelve UCI data sets.
Data sets IKAR NGLE LEAR PRAR
Prom 5 4 5 5
Hear 10 10 9 10
Hepa 9 10 7 10
Hand 12 11 10 11
Ches 29 30 29 30
Spli 9 10 9 9
Lett 9 9 8 9
Vote 10 9 8 10
Mush 4 5 5 5
Qsar 31 30 29 31
Shut 4 5 4 5
Sati 10 10 9 11
Ave 11.833 11.917 11.00 12.083
Best 4 1 11 1
Table 4. The run time with the four methods on the twelve UCI data sets.
Data sets IKAR NGLE LEAR PRAR
Prom 0.98 2.042 3.231 1.553
Hear 0.03 0.16 2.09 0.33
Hepa 0.06 0.11 2.13 0.25
Hand 92.93 81.61 3222.89 91.28
Ches 6.35 140.98 3075.25 590.79
Spli 35.43 205.86 6557.61 309.79
Lett 3.98 6.93 117.87 8.47
Vote 0.06 0.38 6.09 1.01
Mush 4.45 411.09 4544.04 827.16
Qsar 3.84 4.97 98.76 6.61
Shut 8.98 886.08 3974.70 1133.25
Sati 25.30 125.61 2322.02 154.04
Ave 15.20 155.49 1993.89 260.38
Best 11 1 0 0
Table 5. Comparison of the classification accuracies on reduced data sets with SMO (%).
Data sets IKAR NGLE LEAR PRAR
Prom 90.34(1) 83.33(2) 81.37(4) 82.15(3)
Hear 84.25(1) 80.02(3) 79.88(4) 81.39(2)
Hepa 85.81(1) 76.34(4) 84.62(2) 80.42(3)
Hand 94.76(1) 88.73(2) 81.59(4) 83.17(3)
Ches 87.34(2) 81.29(4) 87.86(1) 82.83(3)
Spli 93.86(1) 84.86(2) 81.53(4) 82.49(3)
Lett 80.78(1) 73.49(3) 70.29(4) 74.26(2)
Vote 95.64(1) 90.21(2) 86.82(4) 89.33(3)
Mush 100.00(1) 98.34(3) 97.49(4) 98.96(2)
Qsar 82.95(1) 73.58(3) 70.61(4) 72.45(2)
Shut 99.68(1) 87.92(3) 85.93(4) 88.89(2)
Sati 86.27(1) 78.47(3) 76.39(4) 79.39(2)
Ave 90.14(1) 83.05(2) 82.03(4) 82.98(3)
AveRank 1.08 2.75 3.58 2.58
Table 6. Comparison of the classification accuracies on reduced data sets with J48 (%).
Data sets IKAR NGLE LEAR PRAR
Prom 83.02(1) 80.46(2) 76.94(4) 78.23(3)
Hear 81.46(1) 75.76(3) 73.39(4) 76.83(2)
Hepa 83.87(1) 76.49(4) 79.21(2) 78.52(3)
Hand 94.69(1) 90.43(2) 88.36(4) 89.55(3)
Ches 91.50(1) 85.31(3) 83.96(4) 86.45(2)
Spli 93.54(1) 88.51(2) 85.57(4) 87.37(3)
Lett 89.23(1) 83.86(3) 81.39(4) 84.07(2)
Vote 95.63(1) 87.91(2) 85.86(4) 86.99(3)
Mush 99.89(1) 98.75(3) 97.83(4) 98.92(2)
Qsar 83.96(1) 80.39(2) 78.27(4) 79.41(3)
Shut 97.92(1) 88.53(4) 89.61(3) 90.58(2)
Sati 87.24(1) 81.08(4) 81.92(3) 82.79(2)
Ave 90.16(1) 84.79(3) 83.53(4) 84.98(2)
AveRank 1.00 2.83 3.67 2.50
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.