Version 1
Received: 5 August 2024 / Approved: 5 August 2024 / Online: 5 August 2024 (23:25:32 CEST)
How to cite:
Lartey, C.; Liu, J.; Asamoah, R. K.; Greet, C.; Zanin, M.; Skinner, W. Effective Outlier Detection for Ensuring Data Quality in Flotation Data Modelling Using Machine Learning (ML) Algorithms. Preprints 2024, 2024080344. https://doi.org/10.20944/preprints202408.0344.v1
APA Style
Lartey, C., Liu, J., Asamoah, R. K., Greet, C., Zanin, M., & Skinner, W. (2024). Effective Outlier Detection for Ensuring Data Quality in Flotation Data Modelling Using Machine Learning (ML) Algorithms. Preprints. https://doi.org/10.20944/preprints202408.0344.v1
Chicago/Turabian Style
Lartey, C., Massimiliano Zanin and William Skinner. 2024 "Effective Outlier Detection for Ensuring Data Quality in Flotation Data Modelling Using Machine Learning (ML) Algorithms" Preprints. https://doi.org/10.20944/preprints202408.0344.v1
Abstract
Froth flotation, a widely used mineral beneficiation technique, generates substantial volumes of data that can be mined for insights into production line performance. The quality of flotation data is critical to building accurate prediction models and to process optimisation. Unfortunately, industrial flotation data are often compromised by quality issues such as outliers, which can produce misleading or erroneous analytical results. A common approach is to preprocess the data by replacing or imputing outliers with values that have no connection to the real state of the process. However, this does not resolve the effect of outliers, especially those that deviate from normal trends. Outliers often occur across multiple variables, and their individual values may fall within normal observation ranges, which makes them difficult to detect. An unresolved challenge in outlier detection is deciding how far an observation must deviate before it is considered an outlier. Existing methods rely on domain experts’ knowledge, which is difficult to apply to large volumes of data with complex relationships. In this paper, we propose an approach to outlier analysis on a flotation dataset. The approach uses a 2σ rule as the threshold to find quasi-outliers, and multiple machine learning (ML) algorithms, including k-Nearest Neighbour (kNN), Local Outlier Factor (LOF), and Isolation Forest (ISF), to identify true outliers. It then analyses the mutual coverage between the quasi-outliers and the outliers found by the ML algorithms to identify the most effective outlier detection algorithm. We found that the outliers detected by kNN cover those detected by the other methods. Our experimental results show that outliers affect model prediction accuracy and that excluding outliers from the training data can reduce the average prediction error.
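The workflow outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the function names, the choice of k, the kNN scoring rule (mean distance to the k nearest neighbours), and the definition of coverage as a simple overlap fraction are all assumptions made for illustration. LOF and Isolation Forest are omitted to keep the example dependency-free beyond NumPy.

```python
import numpy as np

def quasi_outliers_2sigma(X):
    """Rows where at least one variable lies more than 2 standard
    deviations from that variable's mean (the 2-sigma rule)."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    return np.where((z > 2).any(axis=1))[0]

def knn_outliers(X, k=5, n_outliers=5):
    """Score each row by its mean distance to its k nearest neighbours
    (on standardised data) and return the n_outliers highest-scoring rows.
    This scoring rule is an assumption, not taken from the paper."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    score = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return np.argsort(score)[-n_outliers:]

def coverage(a, b):
    """Fraction of the indices in b that also appear in a
    (a hypothetical reading of 'coverage')."""
    a, b = set(a.tolist()), set(b.tolist())
    return len(a & b) / len(b) if b else 0.0

# Synthetic stand-in for a flotation dataset: 200 observations, 3 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Overwrite the first five rows with hypothetical, clearly anomalous readings.
X[:5] = np.array([[8, 0, 0], [-8, 0, 0], [0, 8, 0], [0, -8, 0], [0, 0, 8]],
                 dtype=float)

quasi = quasi_outliers_2sigma(X)
knn = knn_outliers(X, k=5, n_outliers=5)
print("quasi-outliers covering kNN outliers:", coverage(quasi, knn))
print("kNN outliers covering quasi-outliers:", coverage(knn, quasi))
```

Computing coverage in both directions shows why the comparison is informative: the 2σ rule typically flags many more rows than a distance-based detector, so one set may contain the other without the reverse being true.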
Keywords
froth flotation; outlier detection; prediction error; data quality
Subject
Engineering, Mining and Mineral Processing
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.