1. Introduction
Renewable energies have become a fundamental part of the current energy mix. These energy sources are the most sustainable and eco-friendly, in contrast to fossil fuels, which release greenhouse gases and contribute to climate change. In this context, Energy Observer Development (EODev) aims to make their use more widespread, particularly that of hydrogen as an energy carrier for a low-carbon society. The GEH2 zero-emission electro-hydrogen generator (
Figure 1) is the most compact and efficient electro-hydrogen generator on the market in terms of power output. The GEH2 uses a proton exchange membrane fuel cell (PEMFC) running on dihydrogen. The PEMFC requires auxiliary systems to operate, such as pumps, cooling systems and power supplies. To optimize the PEMFC’s service life and meet customer requirements, the GEH2 also incorporates a 47 kWh battery. Power conversion and control systems ensure smooth operation. Because the GEH2 is intended to reduce environmental impact, durability, reliability and efficiency are key development priorities for its design and use. To ensure the smooth operation of GEH2 equipment, EODev currently follows a systematic maintenance plan comprising over 50 operations. EODev would like to move towards a more predictive type of maintenance, so that actions are deployed only in response to the actual state of a GEH2 unit. To address this industrial challenge, the goal of this work is to develop prognostic approaches for predicting the state of health of key components of the GEH2 electro-hydrogen unit. Since the PEMFC is one of the critical components of the GEH2, this paper presents a machine learning-based approach for predicting its performance.
The PEMFC is a promising fuel cell technology, offering a clean and efficient alternative to traditional energy sources. It is operated in conjunction with a battery (as in the case of the zero-emission electro-hydrogen generator) or a supercapacitor module to meet the efficiency requirements. Evaluating the performance of a PEMFC typically involves measuring a polarization curve, which represents the relationship between the current density and the voltage of the fuel cell. This curve is significantly affected by various operating variables of the PEMFC, such as current, temperature, and pressure. According to [
8,
24], the polarization curve is selected as the focal point for the performance prediction model due to its ability to encompass crucial properties of PEMFC, including current density, voltage, and other significant factors. Currently, there are three main kinds of approaches employed to analyze the performance of PEMFC: model-driven approach, hybrid approach, and data-driven approach.
The model-driven approach forecasts the PEMFCs’ performance based on physical and mathematical models of the electrochemical, transport, and thermal processes that occur. These models can simulate PEMFC performance in a range of operational conditions and do not require a large amount of data to construct the model. They depend on a thorough understanding of the underlying operational mechanisms and interactions between components and incorporate temporal and spatial elements into their analyses. Kishimoto et al. [
34] created a numerical methodology for predicting the electrochemical characteristics that takes into account various features, such as current-voltage behavior, macroscopic properties, and impedance. Talukdar et al. [
34] explored the correlation between electrode performance and drying techniques. They achieved this by constructing a dynamic two-dimensional physical continuum model that incorporates the sensitivity of catalyst layer microstructure parameters. Danilov and Tade [
36] have developed a new technique for estimating cathodic and anodic charge transfer coefficients from PEMFC voltage-current curves. In [
37], an equation was formulated to fit the cell potential to current density data for PEMFCs in different conditions. This equation includes an exponential term to take into consideration the effects of mass transport, which allows for the capture of slope changes and a rapid potential drop. Guinea et al. [
38] have developed another voltage-current model that takes into account the electron leakage current density, so that an accurate fit can be achieved using gradient and rotational optimization methods.
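The empirical fitting equation with an exponential mass-transport term is only described in words above. As a minimal sketch, assuming the commonly used form V(i) = E0 - b*log10(i) - R*i - m*exp(n*i), the following Python snippet shows how such a term produces the rapid voltage drop at high current density. The functional form, the function name `cell_voltage`, and every coefficient value are illustrative assumptions, not fitted parameters from any cited work.

```python
import math


def cell_voltage(i, e0=1.0, b=0.05, r=2.0e-4, m=3.0e-5, n=5.0e-3):
    """Empirical cell voltage (V) at current density i (mA/cm^2), with i > 0.

    All default coefficients are illustrative placeholders, not fitted values.
    """
    activation = b * math.log10(i)         # logarithmic (Tafel-like) loss
    ohmic = r * i                          # linear resistive loss
    mass_transport = m * math.exp(n * i)   # exponential term, dominant at high i
    return e0 - activation - ohmic - mass_transport


if __name__ == "__main__":
    # Print a coarse polarization curve for the assumed coefficients.
    for current_density in (1, 200, 400, 800, 1200, 1400):
        print(current_density, round(cell_voltage(current_density), 4))
```

With these assumed coefficients the voltage decreases monotonically, and the drop per step becomes steeper at the high-current end of the curve, where the exponential term takes over.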
The hybrid approach predicts the PEMFCs’ performance based on both physical models and historical data. For example, Bressel et al. [
39] proposed an observer based on an Extended Kalman Filter to accurately estimate both the state of health and the degradation dynamics. Wang et al. [
8] presented a new method that combines the benefits of machine learning methods and semi-empirical models to predict the degradation of a PEMFC system with 300 cells. Hu et al. [
41] proposed a hybrid method for predicting the probability of performance degradation in PEMFCs, with the goal of extending service life and reducing maintenance costs. Pan et al. [
42] introduced a hybrid methodology that combines a model-based adaptive Kalman filter with a data-driven NARX neural network to predict the degradation of PEMFCs. The overall degradation trend is captured through an empirical aging model and an adaptive Kalman filter, while the intricate degradation specifics are depicted using the NARX neural network. Zhou et al. [
31] combined a physical aging model and time-delay neural networks to forecast the deterioration of a PEMFC. The physical aging model was used to remove the non-stationary trend from the original data, and the linear component was filtered with an autoregressive moving-average model. The remaining non-linear component was then used to train the time-delay neural networks, which were used to make the final prediction. Cheng et al. [
43] proposed a method to enhance the precision of prognostic results when characterization is uncertain. They used the least squares support vector machine (LSSVM) for initial prognostics and subsequently employed a regularized particle filter (RPF) to determine the final probability distribution of the Remaining Useful Life (RUL) of the PEMFC.
The application of the model-driven and hybrid approaches requires a certain level of physical knowledge about the system behavior, which makes them difficult to use in some real and complex applications. In this context, the data-driven approach, which predicts PEMFC performance purely from historical data, has been extensively developed thanks to its remarkable flexibility and strong predictive capabilities. For example, Wilberforce et al. [
28] employed an ANN to predict the current and voltage of PEMFC, minimizing the power required for fuel pumping and thus reducing net losses in the cell. Legala et al. [
24] carried out a comparative study between ANN and SVR for predicting variables such as cell voltage, membrane resistance, etc. The study showed that ANN is better than SVR, particularly in multivariate output regression tasks. However, SVR shows its strength in simpler regressions, offering reduced computational load while maintaining accuracy. Han et al. [
19] combined ANN and SVM to predict the PEMFC stack performance, considering the influence of different operating conditions of the PEMFC. In [
20], a deep belief network (DBN) was used to build a model that predicts the performance and maximizes the power density of a PEMFC. Also, in [
22], a long short-term memory (LSTM) network was used to predict the performance of PEMFC under dynamic conditions, especially for vehicle applications. Chen et al. [
10] applied a backpropagation neural network to anticipate the aging evolution of PEMFCs. The parameters of this model were adjusted by evolutionary algorithms, including the mind evolutionary algorithm (MEA), particle swarm optimization (PSO), and the genetic algorithm (GA). Zou et al. [
47] proposed an RNN model with an attention mechanism to improve prognostics and health management predictions, enabling more accurate anticipation of output voltage deterioration in PEMFCs. He et al. [
48] proposed an auto-encoder (AE)-LSTM network model to predict PEMFC degradation progress and mechanisms during vehicle operation. This strategy employs a health indicator to represent PEMFC degradation states, followed by LSTM analysis.
Nevertheless, a significant limitation of the existing models is that they are not generic enough to be adapted to different PEMFC configurations, such as that of the GEH2 PEMFC. Also, most of the machine learning models used to predict PEMFC performance have not been evaluated on real data sets, or they do not take into account the dynamic operating conditions of the PEMFC. Additionally, some models, such as deep neural networks, can be considered black boxes in the sense that they provide predictions without offering a clear explanation of the underlying causal relationships. This can make it difficult to interpret the results and understand the factors influencing the polarization curve. In short, a new modeling approach is needed to provide a better prediction of PEMFC performance. Therefore, we propose in this paper an efficient prediction approach based on XGBRegressor and the Tree-structured Parzen Estimator. In addition, Kernel Principal Component Analysis (KPCA) and Mutual Information are jointly used to better select relevant features. The proposed approach accounts for the dynamic operating conditions of the PEMFC. To test and validate the robustness of the proposed approach, and to cover different operating conditions, a real industrial data set of ten PEMFCs was used. Furthermore, a comparison study with other machine learning models, such as artificial neural networks and support vector machine regression, is carried out. The main contributions of this study can be summarized as follows:
A new feature selection method based on KPCA and Mutual Information was developed to select the relevant features that control the PEMFC.
A novel performance prediction method based on XGBRegressor and Tree-structured Parzen Estimator was proposed to predict the polarization curve of the PEMFC.
A comparison study between the proposed model and traditional machine learning models has been carried out on a real dataset.
The rest of the paper is organized as follows:
Section 2 describes the studied data and PEMFC feature selection.
Section 3 presents the proposed prediction model based on XGBRegressor and Tree-structured Parzen Estimator. In
Section 4, the performance of the proposed method is evaluated using actual polarization curve data of ten PEMFCs. A comparison with two popular machine learning regressors widely used to predict the polarization curve, the artificial neural network (ANN) and the support vector machine regressor (SVR), is also given. Finally, this work’s conclusions are discussed in
Section 5.
4. Results and Discussions
In this section, we apply the proposed model to predict the polarization curve based on data collected from ten PEMFCs belonging to different zero-emission electro-hydrogen generators. To validate and compare the performance of the proposed model, we also applied two popular machine learning regressors widely used to predict the polarization curve: the artificial neural network (ANN) and the support vector machine regressor (SVR).
ANN is inspired by the biological neural networks that make up the human brain. Just as neurons in the brain are interconnected, artificial neurons are interconnected across the layers of the network. The performance of ANN has been proven for many applications, including regression problems [
17]. The ANN used here has an input layer of 6 variables, two hidden layers of 64 and 32 neurons respectively, and an output layer with a single neuron. The activation function is ReLU for the two hidden layers and linear for the output layer. The chosen loss function is the RMSE, and the Adam optimizer is used to minimize it. The training data is divided into batches of size 32 and the model is trained over 50 epochs. A schematic of the artificial neural network, where the variables retained by the feature selection step are used as feature vectors (inputs) to predict the PEMFC polarization curve, is shown in
Figure A1.
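To make the layer sizes described above concrete, the following Python sketch builds a 6-64-32-1 network with ReLU hidden layers and a linear output, and runs a forward pass. It is an illustration of the stated architecture only: the weight initialization, the function names, and the use of plain Python instead of a deep learning framework are assumptions, and training (RMSE loss, Adam, batches of 32, 50 epochs) is not reproduced.

```python
import random

# Layer widths taken from the text: 6 inputs, 64 and 32 hidden neurons, 1 output.
LAYER_SIZES = [6, 64, 32, 1]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases (the initialization scheme is an assumption)."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def init_network(seed=0):
    """Create one (weights, biases) pair per pair of consecutive layer sizes."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def forward(network, x):
    """ReLU on the two hidden layers, linear (identity) on the output layer."""
    last = len(network) - 1
    for k, (weights, biases) in enumerate(network):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        x = z if k == last else [max(0.0, v) for v in z]
    return x[0]


def count_parameters(network):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in network)
```

For this architecture, `count_parameters(init_network())` gives 2561 trainable parameters (448 + 2080 + 33), which is small enough to train quickly on tabular operating data.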
SVR is an extension of SVM applied to regression analysis [
31]. It seeks a regression function that predicts continuous values while keeping prediction errors within a tolerance margin and controlling the complexity of the model. It relies on the data points lying on or outside this margin, known as support vectors, to construct the regression function. SVR can use different kernel functions, such as the radial basis function, to capture non-linear relationships. It solves an optimization problem that balances prediction errors and model regularization, similar to Support Vector Machines for classification. In this case, the SVR used to predict the polarization curve has a Gaussian kernel (RBF), an epsilon error tolerance of 0.025, a regularization parameter C of 5, and a kernel independent term coef0 of 0.01.
Figure A1 shows a schematic of the SVR model used to predict the polarization curve of the PEMFC.
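The two ingredients named above, the Gaussian (RBF) kernel and the epsilon error tolerance, can be sketched in a few lines of Python. This is a minimal illustration, not the training procedure: the value of `gamma`, the function names, and the toy support vector in the usage note are assumptions. Only epsilon = 0.025 comes from the text. The coef0 term is omitted because, in common implementations such as scikit-learn, it only affects polynomial and sigmoid kernels.

```python
import math

EPSILON = 0.025  # error tolerance quoted in the text


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; gamma is an assumed value, not given in the text."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=EPSILON):
    """Errors inside the epsilon tube cost nothing; outside, the cost grows linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_predict(x, support_vectors, dual_coefs, intercept, gamma=1.0):
    """Kernel expansion f(x) = intercept + sum_i coef_i * k(sv_i, x).

    In a trained SVR, each dual coefficient is bounded in magnitude by C.
    """
    return intercept + sum(c * rbf_kernel(sv, x, gamma)
                           for sv, c in zip(support_vectors, dual_coefs))
```

For example, a voltage error of 0.01 V falls inside the tube and contributes zero loss, whereas an error of 0.05 V contributes 0.025. This is why only points on or outside the tube end up as support vectors.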
As discussed in
Section 2.2.3, Kernel Principal Component Analysis and Mutual Information are jointly used to better select relevant features. To demonstrate the effectiveness of this method, we compared it to different feature selection methods. The results of this comparison are shown in
Table 3. The proposed feature selection method gives better results than the other methods listed in the table, for both the XGBRegressor and the ANN prediction models.
4.1. Hyper-parameters Tuning
The hyper-parameters of the proposed XGBRegressor model, such as the number of estimators, maximum depth, learning rate, and colsample_bytree, were first optimized. The obtained results are reported in
Table 4.
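To illustrate what two of these hyper-parameters control, the sketch below implements plain gradient boosting with depth-1 regression trees (stumps) on a single feature. It is a simplified stand-in and not XGBoost itself: it has no regularization, no column subsampling, and no second-order terms, and it does not perform the hyper-parameter search. The data points and function names are made up for illustration.

```python
# Illustrative (made-up) polarization data: current density (mA/cm^2) and voltage (V).
CURRENT_DENSITY = [100, 300, 500, 700, 900, 1100]
VOLTAGE = [0.85, 0.78, 0.73, 0.68, 0.62, 0.52]


def fit_stump(xs, residuals):
    """Best single-split stump on a 1-D feature (needs at least two distinct x values)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def fit_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Each new stump is fitted to the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # The learning rate shrinks the contribution of every stump.
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the shrunken stump outputs on top of the constant base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps)
```

In this toy setting, increasing `n_estimators` at a fixed `learning_rate` lowers the training error, which is why the two are usually tuned together: a smaller learning rate generally needs more estimators to reach the same fit.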
4.2. Prediction and Evaluation
In order to validate and evaluate the quality of the proposed model, we compare its prediction results with experimental data as well as with two other machine learning models (ANN and SVR). Taking into account the operating conditions, we present the prediction results for four PEMFCs chosen to represent the ten PEMFCs. Note that each PEMFC has a serial number starting with 2. The results are shown in
Figure 7,
Figure 8,
Figure 9 and
Figure 10.
The obtained results confirm that our proposed model outperforms the ANN and SVR models. Indeed, our predicted values are close to the measured voltages, while the curves predicted by ANN and SVR sometimes deviate significantly from the measured ones, especially at large current densities (greater than 200 mA/cm²). Our proposed model is also more robust than ANN and SVR, since its performance remains consistent across the different PEMFCs.
In order to estimate more precisely the performance of our model, three evaluation metrics (RMSE, MAE, and R²) were used, and the obtained results are shown in
Table 5. Finally, the box plots in
Figure 11 are also used to reinforce our conclusions.
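For reference, the three metrics can be computed as in the following Python sketch, using their standard definitions. The function names are illustrative, and the inputs are assumed to be equal-length sequences of measured and predicted voltages.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large deviations more strongly."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the deviations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Lower RMSE and MAE indicate a closer fit, while an R² closer to 1 means the model explains more of the variance in the measured voltage.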
As we can observe from the results presented in
Table 5 and
Figure 11, our method always provides better results than those provided by the ANN and SVR. For example, the mean value of R² of our method is higher than those of SVR and ANN.