Abstract
Feature Selection (FS) is one of the most important and challenging tasks in data mining and pattern recognition. The FS problem is to find the most informative subset of the original attributes in a specific domain. Its main purpose is to remove redundant or irrelevant features and, ultimately, to improve the accuracy of classification algorithms. FS can therefore be treated as an optimization problem and solved with metaheuristic algorithms. In this paper, we present a new hybrid model for the FS problem, named HWOAFPA, which combines the whale optimization algorithm (WOA) and the flower pollination algorithm (FPA) and is based on the concept of Opposition-Based Learning (OBL). The proposed method uses the natural processes of WOA and FPA to solve the FS optimization problem, while an OBL method is used to improve the convergence rate and accuracy of the algorithm. Specifically, WOA creates solutions in its search space through its encircling-prey, bubble-net attacking, and search-for-prey mechanisms and tries to improve the solutions to the FS problem. Alongside WOA, FPA improves the solutions with its global and local search processes in the space opposite to the WOA solutions. In this way, candidate solutions to the FS problem are drawn from both the original search space and its opposite. To evaluate the performance of the proposed algorithm, experiments were carried out in two stages. In the first stage, the experiments were performed on 10 FS datasets from the UCI repository. In the second stage, we tested the performance of the proposed algorithm on spam e-mail detection.
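As a rough illustration of how opposition-based learning can be applied to wrapper feature selection, the Python sketch below keeps, for each candidate, whichever of the candidate and its opposite point scores better. It is not the HWOAFPA implementation and omits the WOA and FPA update rules entirely. The encoding of positions in [0, 1], the 0.5 threshold, the weighted fitness (`alpha=0.99`), and all helper names (`to_mask`, `opposite`, `fs_fitness`, `obl_select`, `toy_error`) are hypothetical assumptions chosen for the example.

```python
import numpy as np


def to_mask(position, threshold=0.5):
    # Assumed encoding: a feature is selected when its coordinate exceeds the threshold.
    return position > threshold


def opposite(position, lower=0.0, upper=1.0):
    # Opposition-based learning: the opposite of x in [lower, upper] is lower + upper - x.
    return lower + upper - position


def fs_fitness(mask, error_fn, alpha=0.99):
    # Hypothetical fitness (lower is better): weighted error plus fraction of selected features.
    if not mask.any():
        return float("inf")  # an empty feature subset cannot be classified
    return alpha * error_fn(mask) + (1 - alpha) * mask.sum() / mask.size


def obl_select(population, error_fn):
    # For each candidate, keep whichever of the candidate and its opposite has lower fitness.
    opposites = opposite(population)
    result = population.copy()
    for i in range(len(population)):
        f_orig = fs_fitness(to_mask(population[i]), error_fn)
        f_opp = fs_fitness(to_mask(opposites[i]), error_fn)
        if f_opp < f_orig:
            result[i] = opposites[i]
    return result


def toy_error(mask):
    # Hypothetical stand-in for a classifier: only features 0 and 1 are informative.
    return 0.0 if mask[0] and mask[1] else 0.5


pop = np.array([[0.2, 0.3, 0.9, 0.8],   # selects features 2, 3; its opposite selects 0, 1
                [0.7, 0.6, 0.1, 0.4]])  # already selects features 0, 1
improved = obl_select(pop, toy_error)
print(to_mask(improved))  # both rows now select only features 0 and 1
```

In a full metaheuristic this check would typically be applied to the initial population and, optionally, after each iteration, so that the search explores both a region and its mirror image at the cost of one extra fitness evaluation per candidate.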
The results of the first stage showed that, on the 10 UCI datasets, the proposed algorithm outperformed other basic metaheuristic algorithms in terms of both average selection size and classification accuracy. The results of the second stage showed that, on the spam e-mail dataset, the proposed algorithm detected spam e-mails much more accurately than other similar algorithms.