Akshaya Ajith
Mentored by Corey Oses
Hydrogen fuel cells are a promising clean energy alternative to traditional electricity generation methods. They are more efficient than their traditional counterparts and produce no carbon emissions, outputting only water. However, large-scale adoption across industries requires reducing their cost. Most hydrogen fuel cells use a platinum catalyst to facilitate the electrochemical reactions, and platinum’s scarcity makes it expensive: 42% of the cost of producing a hydrogen fuel cell can be attributed to the platinum catalyst.
Developing a non-precious metal catalyst would cut costs and promote the use of hydrogen fuel cells in more industries. Recent studies have suggested that materials made more disordered by additional components show improved catalytic performance and durability. This project seeks to develop a predictive machine learning model that identifies cost-effective High-Entropy Alloys (HEAs) that can substitute for platinum-based catalysts.
Our initial efforts focused on creating an optimization algorithm to accelerate the discovery of stable HEAs using the AFLOW database. The aflow-POCC framework allows us to estimate the stability of an HEA from its Entropy-Forming Ability (EFA), a quantity based on Density Functional Theory (DFT) calculations, with higher EFA values indicating greater stability.
To develop an EFA prediction model, we employed a random forest algorithm trained on 436 HEAs derived from a database of five-component metal alloys. The model utilized 20 parameters describing material properties, such as ionic character, melting temperature, and electronegativity. A five-fold cross-validation yielded an initial R² value of 0.57 on the testing set for the model’s predictions. By incorporating 12 additional parameters, we improved the R² to 0.62.
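The evaluation described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn is used; the data here are synthetic stand-ins with the same shape as the training set (436 samples, 20 descriptors), and the function name, target formula, and hyperparameters are hypothetical rather than those of the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cross_validated_r2(X, y, n_splits=5, seed=0):
    """Return the per-fold R² scores of a random forest under k-fold CV."""
    # Hyperparameters are illustrative assumptions, not the project's settings.
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each score is R² on one held-out fold, from a model fit on the other folds.
    return cross_val_score(model, X, y, cv=cv, scoring="r2")


# Synthetic placeholder data (NOT real HEA descriptors or EFA values).
rng = np.random.default_rng(0)
X = rng.normal(size=(436, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=436)

scores = cross_validated_r2(X, y)
mean_r2 = float(scores.mean())
print("per-fold R2:", np.round(scores, 3))
print("mean R2:", round(mean_r2, 3))
```

Averaging R² over the held-out folds gives a single figure comparable to the 0.57 and 0.62 quoted above, although the synthetic numbers printed here carry no physical meaning.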
To improve the R² further, the project turned to identifying parameters with higher feature importance for predicting the EFA of HEAs. We developed a feature selection program that generates feature vectors from parameters calculated from JSON files containing the molecular information of each HEA in our training set. The goal was to identify parameters with high predictive power and eliminate those with low variance. This process identified 415 additional parameters of interest, which were subsequently incorporated into the model. Training the random forest model on all 436 HEAs with the expanded set of 415 features resulted in an R² value of 0.58, which did not exceed the 0.62 obtained earlier.
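The low-variance elimination step can be illustrated with a short sketch. It assumes the feature vectors have already been assembled into rows of numbers; the function name, the threshold, the feature names, and the values below are all hypothetical and chosen only for illustration.

```python
from statistics import pvariance


def drop_low_variance(rows, names, threshold=1e-8):
    """Drop feature columns whose population variance is <= threshold.

    rows  : list of feature vectors, one per alloy
    names : list of feature names, aligned with the columns of rows
    Returns (kept_names, filtered_rows).
    """
    columns = list(zip(*rows))  # transpose: one tuple per feature column
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    filtered_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, filtered_rows


# Made-up example values; "constant_flag" never varies, so it is dropped.
names = ["melting_temperature", "electronegativity", "constant_flag"]
rows = [
    [1800.0, 1.6, 1.0],
    [2100.0, 1.9, 1.0],
    [1650.0, 1.5, 1.0],
]

kept_names, filtered_rows = drop_low_variance(rows, names)
print(kept_names)
```

A variance filter only removes features that carry almost no information; ranking the remaining features by predictive power is a separate step, for which a fitted random forest's feature importances are one possible choice.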
The project’s immediate objective is to raise the model’s predictive accuracy (R²) by refining feature selection and dividing the heterogeneous HEA dataset into specialized subgroups. We expect this specialization to reduce overfitting and to increase the model’s accuracy in predicting the stability and performance of different compound types.