1. Introduction
In 2011, the Obama administration of the United States launched the Materials Genome Initiative (MGI), which combines high-throughput computation and experiment to generate large volumes of materials data and applies artificial-intelligence data analysis to the development of new materials. The goal is to shorten the cycle from development to application of new materials and to reduce research and development costs, so that the United States can maintain its leading position in manufacturing technology. In 2016, the US government released the report "The First Five Years of the Materials Genome Initiative: Accomplishments and Technical Highlights". According to this report, during the first five years of the initiative, federal agencies including the Department of Energy, the Department of Defense, the National Science Foundation, the National Institute of Standards and Technology (NIST), and the National Aeronautics and Space Administration invested over 500 million US dollars. This investment established computational materials research and development centers, including the National Network for Virtual High throughput Preparation (NIST&NREL) and the Center for Cross scale Material Design and Multi scale Materials Research (NIST, ANL, ARL), and produced three major computational materials databases, the Materials Project (MP) [1], AFLOW [2], and OQMD [3,4], together with several auxiliary databases such as the Materials Data Repository (MDR), the Materials Resource Registry, and the Energy Materials Network, as well as database-related analysis tools.
Shortly after the United States proposed the Materials Genome Initiative, the European Science Foundation launched the Accelerated Metallurgy (ACCMET) program, with a budget of over 2 billion euros, in order to keep pace with the United States. In 2015, the European Commission funded the three-year Horizon 2020 project NOMAD (Novel Materials Discovery), led by the Max Planck Society in Germany. The project uses a centralized data warehouse to bring research groups together and share computational materials science data, with the aim of building an "Encyclopedia of Materials" and tools for big-data analysis of materials. In the UK, the government-funded e-Science program has supported high-throughput materials simulations and the construction of basic computational materials databases, through projects such as eMinerals and the "Material Grid". EPFL in Switzerland has led the development of AiiDA [5], a European infrastructure and database for computational materials science.
With the rapid development of big data and artificial intelligence, materials genome research, characterized by high-throughput experiments, high-throughput computing, and AI-based data analysis, is in full swing and has shown clear advantages in many materials fields. The paper "Machine-learning-assisted materials discovery using failed experiments", published in Nature in May 2016 [6], showed that new materials can be discovered by applying machine learning to years of accumulated experimental data, including failed experiments. This work indicates that AI will profoundly transform research methods in materials science. Over centuries, scientific development has formed three research paradigms: experimental, theoretical, and computational. However, complex systems in fields such as biology, astronomy, and materials involve very complicated interactions and a large number of variables, which greatly limits the effectiveness of theoretical and computational approaches and calls for the combination of big data and AI as a "fourth paradigm". In 2017, AlphaGo defeated the world's top-ranked human Go player; DeepMind then retired the program from competitive play and announced that the team would turn to other scientific problems, including the design of new materials. At present, American high-tech companies including Apple, Google, IBM, and Tesla are all investing in AI-driven research and development of new materials based on materials genome methods. The fourth paradigm of materials science requires the ability to generate and process massive amounts of data, so acquiring such data has become a key aspect of the Materials Genome Initiative. With improving computing power, the accumulation of materials data by high-throughput computing is receiving more and more attention, and its use in the research and development of new thermoelectric materials is expected to greatly accelerate their practical application.
The performance of a thermoelectric material is described by the dimensionless figure of merit ZT, which can be expressed as follows:

$$ZT = \frac{S^{2}\sigma T}{\kappa_{e} + \kappa_{l}} \tag{1}$$

where $S$ is the Seebeck coefficient, $\sigma$ is the electrical conductivity, $T$ is the absolute temperature, and $\kappa_{e}$ and $\kappa_{l}$ are the thermal conductivities contributed by carriers and phonons, respectively. The parameters $S$, $\sigma$, and $\kappa_{e}$ are coupled with each other and are difficult to tune independently. For example, in a semiconductor, increasing the doping concentration increases the conductivity but at the same time reduces the Seebeck coefficient and increases the carrier thermal conductivity. At present, the three major materials databases, Materials Project, AFLOW, and OQMD, provide several common physical quantities, including atomic and band structures, and other physical properties are being added. Thermoelectric properties, however, require a large amount of computation because of the complexity of calculating electrical and thermal transport.
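The coupling between the quantities entering ZT can be made concrete with a few lines of Python. This is a minimal sketch, not part of the authors' workflow: the function names are hypothetical, the input values are illustrative rather than taken from the database, and the Wiedemann-Franz estimate of the carrier thermal conductivity with a fixed Lorenz number is an assumption added only to show how raising the conductivity also raises the electronic heat transport.

```python
LORENZ = 2.44e-8  # W Ohm K^-2, degenerate-limit Lorenz number (assumption)


def electronic_kappa(sigma, temperature, lorenz=LORENZ):
    """Wiedemann-Franz estimate of the carrier thermal conductivity, W/(m K)."""
    return lorenz * sigma * temperature


def figure_of_merit(seebeck, sigma, temperature, kappa_e, kappa_l):
    """ZT = S^2 * sigma * T / (kappa_e + kappa_l), all inputs in SI units."""
    return seebeck**2 * sigma * temperature / (kappa_e + kappa_l)


# Illustrative inputs only: S = 200 uV/K, sigma = 1e5 S/m, T = 300 K.
zt_fixed = figure_of_merit(200e-6, 1.0e5, 300.0, kappa_e=0.5, kappa_l=1.0)

# Doubling sigma at fixed S raises the power factor, but kappa_e grows too.
kappa_low = electronic_kappa(1.0e5, 300.0)
kappa_high = electronic_kappa(2.0e5, 300.0)
```

In a real material the Seebeck coefficient would also fall as the conductivity rises, which is why the three quantities have to be optimized together.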
Here we select the Materials Project as the structural source for constructing a thermoelectric materials database. Specifically, we used the Materials Project API to retrieve the atomic structure files (POSCAR and CIF) of MP materials with ID numbers below 100000 (currently 19952 materials) as the initial structures for the present thermoelectric materials database, Wenzhou TE. We have built deformation potential, elastic property, and BoltzTraP2 electronic transport modules. The data are then collected by Python scripts and displayed on a website, https://hezhu2024.github.io, for others to use.
2. Methodology
2.1. Clustering (K-Means)
Since the best thermoelectric materials obtained in experiments so far are mainly narrow-bandgap semiconductors, we choose the bandgap as the major feature for material screening. We also selected free energy, volume, density, and average atomic energy as additional features from the descriptors provided by the MP database. Together they form the five feature variables for the K-means clustering algorithm.
Here is a brief introduction to the K-means principle [7]. K-means is a clustering algorithm that partitions the data into K classes. First, K class centers are generated at random, denoted $\mu_{1}, \mu_{2}, \ldots, \mu_{K}$. Let $x_{ij}$ be the j-th feature of the i-th sample and $\mu_{lj}$ the j-th component of the l-th class center. The distance from the i-th sample to the l-th class center is:

$$d_{il} = \sqrt{\sum_{j=1}^{J}\left(x_{ij} - \mu_{lj}\right)^{2}} \tag{2}$$

where J is the total number of features. Each sample is assigned to the class whose center is closest, so that after the first iteration every sample belongs to one class. The mean of the samples in each class is then taken as the new class center:

$$\mu_{lj} = \frac{1}{N_{l}}\sum_{i \in C_{l}} x_{ij} \tag{3}$$

where $C_{l}$ is the set of samples currently assigned to class l and $N_{l}$ is their number.
The distances are then recalculated and the samples reassigned, and this process is repeated until convergence, at which point the data are partitioned into K classes. In the present work, the data are standardized before clustering. To decide how many classes are reasonable, we define the total loss as follows:

$$\mathrm{Loss} = \sum_{i=1}^{n} d_{i,c(i)} \tag{4}$$

where n is the number of samples and c(i) is the class to which sample i is assigned. This formula is the sum of the distances from all samples to their class centers. When the curve of Loss versus K shows a pronounced inflection point, the value of K at that point is taken as a reasonable number of classes. Using the K-means method, we divided the initial materials from MP into 5 classes, containing 6602, 5425, 3770, 2800, and 1355 materials, respectively.
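The procedure described above (standardize, assign each sample to its nearest class center, recompute the centers, and scan the loss against K) can be sketched with NumPy alone. This is an illustrative sketch rather than the authors' script: the function names are hypothetical, the data are synthetic blobs standing in for the five MP descriptors, and the initial centers are drawn from the samples themselves, which is one common choice and an assumption here.

```python
import numpy as np


def standardize(x):
    """Scale every feature column to zero mean and unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def assign(x, centers):
    """Return the nearest-center label and the distance to it for each sample."""
    d = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    return labels, d[np.arange(len(x)), labels]


def kmeans(x, k, n_iter=100, seed=0):
    """Plain K-means; returns labels, centers, and the summed distance loss."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct samples chosen at random.
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels, _dist = assign(x, centers)
        new_centers = np.array([
            x[labels == l].mean(axis=0) if np.any(labels == l) else centers[l]
            for l in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels, dist = assign(x, centers)
    return labels, centers, dist.sum()


# Synthetic stand-in for the descriptor table: 150 samples, 5 features.
rng = np.random.default_rng(1)
raw = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 5))
                 for c in (0.0, 10.0, 20.0)])
x = standardize(raw)

# Loss as a function of K; the elbow of this curve suggests the class count.
losses = {k: kmeans(x, k)[2] for k in range(1, 7)}
```

Plotting `losses` against K gives the kind of elbow curve used to choose the number of classes. Because K-means depends on its starting centers, repeating the run with several seeds is advisable before reading off the elbow.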
2.2. Deformation Potential Theory (DPT)
The deformation potential theory was proposed by Bardeen and Shockley [8] in 1950 to describe charge transport in non-polar semiconductors. The carrier mobility can be expressed as $\mu = e\tau/m^{*}$, where the relaxation time for bulk materials can be written as follows [8,9]:

$$\tau = \frac{2\sqrt{2\pi}\,C\,\hbar^{4}}{3\left(k_{B}T\,m^{*}\right)^{3/2}E_{i}^{2}} \tag{5}$$

where $C$ is the elastic constant, $\hbar$ is the reduced Planck constant, $k_{B}$ is the Boltzmann constant, $T$ is the temperature, $E_{i}$ is the deformation potential of the i-th band, obtained from the strain-induced shift of the band energy measured relative to a deep core level, and $m^{*}$ is the effective mass.
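As a numerical illustration of the deformation potential expressions, the sketch below evaluates the bulk relaxation time and the mobility in SI units. The function names are hypothetical. The inputs are the Si hole values listed in Table 1, read under the assumptions that the elastic constant is in units of 10^11 Pa, the deformation potential in eV, the effective mass in electron masses, and the temperature 300 K; none of these units is stated explicitly in the text.

```python
import math

HBAR = 1.054571817e-34      # J s
K_B = 1.380649e-23          # J/K
E_CHARGE = 1.602176634e-19  # C
M_E = 9.1093837015e-31      # kg


def relaxation_time(c_pa, e1_ev, m_eff, temperature=300.0):
    """Acoustic-phonon relaxation time (s) from deformation potential theory.

    c_pa: elastic constant in Pa; e1_ev: deformation potential in eV;
    m_eff: effective mass in units of the electron mass.
    """
    e1 = e1_ev * E_CHARGE
    m = m_eff * M_E
    numerator = 2.0 * math.sqrt(2.0 * math.pi) * c_pa * HBAR**4
    denominator = 3.0 * (K_B * temperature * m) ** 1.5 * e1**2
    return numerator / denominator


def mobility(tau, m_eff):
    """Carrier mobility mu = e * tau / m* in m^2/(V s)."""
    return E_CHARGE * tau / (m_eff * M_E)


# Si hole row of Table 1 under the unit assumptions stated above.
tau_hole = relaxation_time(1.52e11, 7.91, 2.48)
tau_hole_fs = tau_hole * 1e15
mu_hole_cm2 = mobility(tau_hole, 2.48) * 1e4  # cm^2/(V s)
```

With these assumptions `tau_hole_fs` comes out close to the tabulated hole value, which supports the unit reading but does not confirm the exact settings used by the authors.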
2.3. Elastic and Thermal Properties
Once the elastic constants of a material have been calculated, we can obtain its elastic moduli, elastic wave velocities, Poisson's ratio, Debye temperature, Grüneisen coefficient, and lattice thermal conductivity [10], all of which are easily accessible in high-throughput calculations.
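For a crystal under homogeneous deformation, the generalized form of Hooke's law relating stress and strain [11] is:

$$\sigma_{ij} = \sum_{k,l} C_{ijkl}\,\varepsilon_{kl} \tag{6}$$

where $\sigma_{ij}$ and $\varepsilon_{kl}$ are the homogeneous second-order stress and strain tensors, respectively [12], and $C_{ijkl}$ is the fourth-order elastic stiffness tensor. Using matrix (Voigt) notation, the four-suffix stiffness tensor $C_{ijkl}$ can be abbreviated to the two-suffix stiffness matrix $C_{ij}$, which can be represented as follows:

$$C_{ij} = \begin{pmatrix}
C_{11} & C_{12} & C_{13} & C_{14} & C_{15} & C_{16} \\
C_{12} & C_{22} & C_{23} & C_{24} & C_{25} & C_{26} \\
C_{13} & C_{23} & C_{33} & C_{34} & C_{35} & C_{36} \\
C_{14} & C_{24} & C_{34} & C_{44} & C_{45} & C_{46} \\
C_{15} & C_{25} & C_{35} & C_{45} & C_{55} & C_{56} \\
C_{16} & C_{26} & C_{36} & C_{46} & C_{56} & C_{66}
\end{pmatrix} \tag{7}$$

Similarly, the elastic compliance matrix $S_{ij}$, which is the inverse of the stiffness matrix, can be written as:

$$S_{ij} = \begin{pmatrix}
S_{11} & S_{12} & S_{13} & S_{14} & S_{15} & S_{16} \\
S_{12} & S_{22} & S_{23} & S_{24} & S_{25} & S_{26} \\
S_{13} & S_{23} & S_{33} & S_{34} & S_{35} & S_{36} \\
S_{14} & S_{24} & S_{34} & S_{44} & S_{45} & S_{46} \\
S_{15} & S_{25} & S_{35} & S_{45} & S_{55} & S_{56} \\
S_{16} & S_{26} & S_{36} & S_{46} & S_{56} & S_{66}
\end{pmatrix} \tag{8}$$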
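The Voigt [13] bulk modulus can be calculated by

$$B_{V} = \frac{1}{9}\left[\left(C_{11} + C_{22} + C_{33}\right) + 2\left(C_{12} + C_{23} + C_{13}\right)\right] \tag{9}$$

and the Voigt shear modulus by

$$G_{V} = \frac{1}{15}\left[\left(C_{11} + C_{22} + C_{33}\right) - \left(C_{12} + C_{23} + C_{13}\right) + 3\left(C_{44} + C_{55} + C_{66}\right)\right] \tag{10}$$

The Reuss [14] bulk and shear moduli can be calculated by

$$B_{R} = \left[\left(S_{11} + S_{22} + S_{33}\right) + 2\left(S_{12} + S_{23} + S_{13}\right)\right]^{-1} \tag{11}$$

and

$$G_{R} = 15\left[4\left(S_{11} + S_{22} + S_{33}\right) - 4\left(S_{12} + S_{23} + S_{13}\right) + 3\left(S_{44} + S_{55} + S_{66}\right)\right]^{-1} \tag{12}$$

In the present work, we take the arithmetic mean of the Voigt and Reuss bounds, known as the Voigt-Reuss-Hill (VRH) average [15]:

$$B = \frac{B_{V} + B_{R}}{2} \tag{13}$$

$$G = \frac{G_{V} + G_{R}}{2} \tag{14}$$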
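The Voigt, Reuss, and Hill averages can be evaluated directly from a 6×6 stiffness matrix in Voigt notation. The sketch below is illustrative: the helper names are hypothetical, and the cubic constants used in the example (166, 64, and 80 GPa) are round numbers of roughly the magnitude found for Si, not values from the database.

```python
import numpy as np


def cubic_stiffness(c11, c12, c44):
    """Build the 6x6 Voigt stiffness matrix of a cubic crystal."""
    c = np.zeros((6, 6))
    c[:3, :3] = c12
    c[np.arange(3), np.arange(3)] = c11
    c[np.arange(3, 6), np.arange(3, 6)] = c44
    return c


def vrh_moduli(c):
    """Return Voigt, Reuss, and Hill bulk and shear moduli (units of c)."""
    c = np.asarray(c, dtype=float)
    s = np.linalg.inv(c)  # compliance matrix
    c_diag = c[0, 0] + c[1, 1] + c[2, 2]
    c_off = c[0, 1] + c[1, 2] + c[0, 2]
    c_shear = c[3, 3] + c[4, 4] + c[5, 5]
    s_diag = s[0, 0] + s[1, 1] + s[2, 2]
    s_off = s[0, 1] + s[1, 2] + s[0, 2]
    s_shear = s[3, 3] + s[4, 4] + s[5, 5]
    b_v = (c_diag + 2.0 * c_off) / 9.0
    g_v = (c_diag - c_off + 3.0 * c_shear) / 15.0
    b_r = 1.0 / (s_diag + 2.0 * s_off)
    g_r = 15.0 / (4.0 * s_diag - 4.0 * s_off + 3.0 * s_shear)
    return {"B_V": b_v, "G_V": g_v, "B_R": b_r, "G_R": g_r,
            "B": 0.5 * (b_v + b_r), "G": 0.5 * (g_v + g_r)}


moduli = vrh_moduli(cubic_stiffness(166.0, 64.0, 80.0))  # GPa
```

For a cubic crystal the Voigt and Reuss bulk moduli coincide, so only the shear modulus depends on the averaging scheme, which makes this case a convenient sanity check.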
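The longitudinal ($v_{l}$), transverse ($v_{t}$), and average ($v_{a}$) elastic wave velocities can be calculated by

$$v_{l} = \sqrt{\frac{B + \tfrac{4}{3}G}{\rho}} \tag{15}$$

$$v_{t} = \sqrt{\frac{G}{\rho}} \tag{16}$$

$$v_{a} = \left[\frac{1}{3}\left(\frac{1}{v_{l}^{3}} + \frac{2}{v_{t}^{3}}\right)\right]^{-1/3} \tag{17}$$

where $\rho$ is the mass density. The Debye temperature ($\theta_{D}$) is obtained by:

$$\theta_{D} = \frac{h}{k_{B}}\left(\frac{3 n_{a}}{4\pi}\right)^{1/3} v_{a} \tag{18}$$

where $h$ is the Planck constant, $k_{B}$ is the Boltzmann constant, and $n_{a}$ is the number of atoms per unit volume. The Grüneisen coefficient is calculated by:

$$\gamma = \frac{3}{2}\,\frac{1 + \nu}{2 - 3\nu} \tag{19}$$

where $\nu$ is the Poisson's ratio.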
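The chain from elastic moduli to sound velocities, Debye temperature, and Grüneisen coefficient can be sketched as follows. The function names are hypothetical, the Debye temperature is written in the standard form based on the atomic number density, and Poisson's ratio is taken from the isotropic relation ν = (3B − 2G) / (2(3B + G)), which the text does not state explicitly. The inputs are approximate, illustrative values for Si and are not taken from the database.

```python
import math

H_PLANCK = 6.62607015e-34  # J s
K_B = 1.380649e-23         # J/K


def sound_velocities(b, g, rho):
    """Longitudinal, transverse, and average velocities (m/s); b, g in Pa."""
    v_l = math.sqrt((b + 4.0 * g / 3.0) / rho)
    v_t = math.sqrt(g / rho)
    v_a = ((1.0 / v_l**3 + 2.0 / v_t**3) / 3.0) ** (-1.0 / 3.0)
    return v_l, v_t, v_a


def debye_temperature(v_a, atoms_per_m3):
    """Debye temperature (K) from the average velocity and atom density."""
    return (H_PLANCK / K_B) * (3.0 * atoms_per_m3 / (4.0 * math.pi)) ** (1.0 / 3.0) * v_a


def poisson_ratio(b, g):
    """Isotropic Poisson's ratio from bulk and shear moduli (assumption)."""
    return (3.0 * b - 2.0 * g) / (2.0 * (3.0 * b + g))


def gruneisen(nu):
    """Grüneisen coefficient estimated from Poisson's ratio."""
    return 1.5 * (1.0 + nu) / (2.0 - 3.0 * nu)


# Illustrative inputs: B = 98 GPa, G = 66.8 GPa, rho = 2329 kg/m^3,
# and 8 atoms in a cubic cell of edge 5.43 Angstrom.
v_l, v_t, v_a = sound_velocities(98.0e9, 66.8e9, 2329.0)
theta_d = debye_temperature(v_a, 8.0 / (5.43e-10) ** 3)
gamma = gruneisen(poisson_ratio(98.0e9, 66.8e9))
```

The average velocity always lies between the transverse and longitudinal values and is weighted towards the transverse one, since two of the three acoustic branches are transverse.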
According to the Slack formula [16,17], the lattice thermal conductivity can be expressed as:

$$\kappa_{l} = A\,\frac{\bar{M}\,\theta_{D}^{3}\,\delta}{\gamma^{2}\,n^{2/3}\,T} \tag{20}$$

where $\bar{M}$ is the average atomic mass, $\theta_{D}$ is the Debye temperature, $\delta^{3}$ is the volume per atom, $n$ is the number of atoms in the primitive cell, $\gamma$ is the Grüneisen coefficient, $A$ is a constant, and $T$ is the temperature.
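A short sketch of the Slack estimate is given below. The function name is hypothetical, and the value of the prefactor is an assumption: A ≈ 3.1 × 10⁻⁶ is a commonly quoted choice when the average atomic mass is in amu, δ is in Å, and the result is in W/(m·K), and it is not necessarily the constant used for the database. The inputs are illustrative values of the order expected for Si.

```python
def slack_kappa(m_avg_amu, theta_d, delta_ang, gamma, n_atoms, temperature,
                a_const=3.1e-6):
    """Slack lattice thermal conductivity in W/(m K).

    m_avg_amu: average atomic mass (amu); theta_d: Debye temperature (K);
    delta_ang: cube root of the volume per atom (Angstrom);
    n_atoms: atoms in the primitive cell; a_const: assumed prefactor.
    """
    return (a_const * m_avg_amu * theta_d**3 * delta_ang
            / (gamma**2 * n_atoms ** (2.0 / 3.0) * temperature))


# Illustrative inputs: M = 28.0855 amu, theta_D = 650 K,
# volume per atom = 20.0 cubic Angstrom, gamma = 1.375, n = 2.
delta = 20.0 ** (1.0 / 3.0)
kappa_300 = slack_kappa(28.0855, 650.0, delta, 1.375, 2, 300.0)
kappa_600 = slack_kappa(28.0855, 650.0, delta, 1.375, 2, 600.0)
```

The cubic dependence on the Debye temperature and the inverse-square dependence on the Grüneisen coefficient mean that small errors in the elastic constants are amplified, so the result is best read as an order-of-magnitude screening estimate.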
2.4. Methods for the First-Principles Calculations and Transport Properties
In building the thermoelectric materials database, the first-principles calculations are performed with the Vienna Ab initio Simulation Package (VASP) [18,19], and the electrical transport properties are calculated with the BoltzTraP2 program package [20]. To minimize computational cost while ensuring data reliability, during structural optimization we set the plane-wave energy cutoff to 1.4 times the largest ENMAX among the POTCAR files of the constituent elements, the electronic energy convergence criterion to $10^{-4}$ eV, the force convergence criterion for ions to $10^{-2}$ eV/Å, and the k-point mesh spacing to 0.04 × 2π Å$^{-1}$.
All processes are controlled through shell scripts, and data collection and calculation are implemented in Python scripts. These codes were developed in-house.
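The convergence settings above can be turned into input parameters with a few lines of Python. This is a sketch under stated assumptions, not the authors' scripts: the function names are hypothetical, the k-point counts are obtained by rounding up the ratio of each reciprocal lattice vector length to the target spacing (the rounding convention is an assumption), and the ENMAX value in the example is illustrative. ENCUT, EDIFF, and EDIFFG are standard VASP INCAR tags, and a negative EDIFFG selects a force-based stopping criterion in eV/Å.

```python
import math

import numpy as np


def incar_settings(enmax_values):
    """Relaxation settings derived from the POTCAR ENMAX values (eV)."""
    return {
        "ENCUT": round(1.4 * max(enmax_values), 3),
        "EDIFF": 1e-4,     # electronic convergence, eV
        "EDIFFG": -1e-2,   # negative: force criterion, eV/Angstrom
    }


def kmesh_from_spacing(lattice, spacing=0.04):
    """Number of k-points along each reciprocal vector.

    lattice: 3x3 array with the lattice vectors as rows (Angstrom).
    spacing: target mesh spacing in units of 2*pi per Angstrom.
    """
    # Rows of inv(lattice).T are the reciprocal vectors divided by 2*pi.
    recip = np.linalg.inv(np.asarray(lattice, dtype=float)).T
    lengths = np.linalg.norm(recip, axis=1)
    return tuple(max(1, math.ceil(length / spacing)) for length in lengths)


# Illustrative example: a cubic cell with a = 5.43 Angstrom and one ENMAX.
settings = incar_settings([245.345])
mesh = kmesh_from_spacing(np.eye(3) * 5.43)
```

Deriving the mesh from a spacing rather than fixing the grid keeps the sampling density comparable across cells of very different size, which matters when thousands of structures are processed with a single setup.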
3. Capabilities and Workflow
3.1. The Application of K-Means on Datasets from MP
Figure 1a shows that the loss curve has an obvious inflection at K = 6, which suggests that the initial structures can be divided into 6 classes. Considering the distribution of the average bandgap values, we ultimately divided them into 5 classes. The feature distribution maps and other K-means information are shown in Figure 1c-g. The average bandgap of the first class is only 0.025 eV, so this class contains many metals. The second class, with an average bandgap of 0.14 eV, is mainly composed of narrow-bandgap semiconductors. The third, fourth, and fifth classes are mainly composed of wide-bandgap semiconductors and insulators. As a starting point, we focused on calculating the physical properties of the candidate materials in the first and second classes.
3.2. Computational Framework and Relaxation Process
After obtaining the structure files, we first perform structural relaxation and a static calculation. Structural relaxation refers to the optimization of atomic positions and lattice constants. We used VASP for the first-principles calculations; several mainstream databases, such as AFLOW, MP, and OQMD, are also calculated with VASP.
The first and second classes obtained from the initial K-means screening contain more than 12000 materials, many of which have too many element types or too many atoms in the primitive cell. In the present work, we first calculate the systems with relatively simple structures. A computational control step is therefore applied during structural relaxation to screen them further, leaving more than 3000 materials with relatively simple structures in the first and second classes. Even so, relaxing so many structures is computationally demanding. To accelerate the calculations, we wrote several shell scripts to control the structural relaxation process. The flowchart is shown in Figure 2.
After performing relaxation calculations on the first and second classes of materials, we retained 1915 and 1656 materials, respectively, for further calculations, as shown in Figure 1b. In the first class, 3111 materials were set aside because they have more than 10 atoms in the primitive cell or more than 4 element types, and another 1576 materials could not be relaxed to convergence with our present calculation setup. In the second class, 2451 materials have more than 10 atoms or more than 4 element types, and 1318 materials were difficult to relax. After relaxation, the converged structures are saved for further calculations.
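The screening rule described above (at most 10 atoms and at most 4 element types, followed by a convergence check) amounts to a simple triage step. The sketch below is illustrative only: the function names and status strings are hypothetical, and the reading of the thresholds as applying to the atoms in the primitive cell is an interpretation of the text. The bookkeeping dictionary uses the counts reported in this section.

```python
def is_simple_structure(species, max_atoms=10, max_element_types=4):
    """species: list of element symbols, one entry per atom in the cell."""
    return (len(species) <= max_atoms
            and len(set(species)) <= max_element_types)


def triage(species, relaxation_converged):
    """Classify a candidate for the next stage of the workflow."""
    if not is_simple_structure(species):
        return "deferred: too complex"
    if not relaxation_converged:
        return "deferred: not converged"
    return "accepted"


# Counts reported in the text: (accepted, too complex, not converged, total).
reported = {
    "class 1": (1915, 3111, 1576, 6602),
    "class 2": (1656, 2451, 1318, 5425),
}
consistent = {name: a + b + c == total
              for name, (a, b, c, total) in reported.items()}
```

The `consistent` check confirms that the three groups in each class add up to the class sizes obtained from the clustering step.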
Next, we calculate the parameters required by the deformation potential theory. We first determine whether the material is anisotropic, and then perform static calculations on structures deformed along the relevant directions.
3.3. Analysis of Results of Deformation Potential Theory (Using Si as an Example)
The deformation potential method treats acoustic phonons as the main scattering source for electrons. Because the contributions of optical phonon branches and other scattering mechanisms are ignored, the resulting relaxation time can be larger than the real one; however, the deformation potential calculation is relatively simple and easy to use in high-throughput calculations. The scaling factors applied to the lattice vectors are {0.98, 0.99, 1.00, 1.01, 1.02} relative to the relaxed structure. These five points allow a reliable second-order fit for the elastic constant and a first-order fit for the deformation potential. The fits for Si are shown as an example in Figure 3.
After calculating the deformation potential parameters, we obtain the carrier relaxation time by combining them with the effective masses.
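The two fits mentioned above can be sketched with `numpy.polyfit`. This is an illustration under assumptions, not the authors' code: the function name is hypothetical, the total energy is taken to follow E(δ) = E₀ + ½·C·V₀·δ² with δ the applied strain and V₀ the relaxed cell volume, and the band-edge energies are assumed to be already referenced to a deep core level. The data are synthetic, generated from chosen values of C and the deformation potential so that the fit can be checked.

```python
import numpy as np

EV = 1.602176634e-19   # J per eV
ANG3 = 1.0e-30         # m^3 per cubic Angstrom


def fit_deformation_parameters(scale, total_energy_ev, band_edge_ev,
                               volume0_ang3):
    """Return (elastic constant in Pa, deformation potential in eV)."""
    strain = np.asarray(scale, dtype=float) - 1.0
    # Second-order fit: E = a2*strain^2 + a1*strain + a0, with a2 = C*V0/2.
    a2 = np.polyfit(strain, total_energy_ev, 2)[0]
    c_pa = 2.0 * a2 * EV / (volume0_ang3 * ANG3)
    # First-order fit: slope of the band-edge energy with strain.
    e1_ev = np.polyfit(strain, band_edge_ev, 1)[0]
    return c_pa, e1_ev


# Synthetic data built from assumed values (C = 1.52e11 Pa, E1 = -7.91 eV).
scale = np.array([0.98, 0.99, 1.00, 1.01, 1.02])
volume0 = 40.0  # cubic Angstrom, illustrative
c_true, e1_true = 1.52e11, -7.91
a2_true = 0.5 * c_true * volume0 * ANG3 / EV
energies = -10.8 + a2_true * (scale - 1.0) ** 2
band_edges = 5.0 + e1_true * (scale - 1.0)

c_fit, e1_fit = fit_deformation_parameters(scale, energies, band_edges, volume0)
```

Only the magnitude of the fitted slope matters for the relaxation time, since the deformation potential enters squared.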
Table 1. Calculated deformation potential parameters, effective masses, and relaxation times of carriers for Si.

| Material | Carrier | $E_{i}$ (eV) | $C$ ($10^{11}$ Pa) | $m^{*}$ ($m_{e}$) | $\tau$ (fs) |
|---|---|---|---|---|---|
| Si | Electron | 3.44 | 1.52 | 0.46 | 1141.9 |
| | Hole | 7.91 | 1.52 | 2.48 | 21.6 |
3.4. Energy Band and Effective Mass Calculation
There are many methods to obtain the band structure of a material. Here we compare three feasible schemes. The first is a VASP band calculation along the high-symmetry points, the second uses BoltzTraP2 [20] to interpolate the band structure, and the third uses maximally localized Wannier functions to interpolate the VASP results [21]. Considering both accuracy and efficiency, the second scheme is chosen for our high-throughput calculations. The three schemes are compared for Si in Table 2.
The bandgap of Si in the MP database is 0.61 eV, which is consistent with the VASP calculation. The bandgap calculated by BoltzTraP2 deviates by less than 5%. Meanwhile, the effective mass of Si calculated by BoltzTraP2 is smaller than that from the VASP scheme, which means that the calculated relaxation time will be larger, as shown in Table 1, where the electron relaxation time is 1141.9 fs. The band structures of Si obtained with the three schemes are shown in Figure 4. Table 2 shows that the BoltzTraP2 band structure calculation is the most efficient, so it helps to accelerate the high-throughput calculations.
To facilitate high-throughput calculation, we use the formula $m^{*} = \hbar^{2}\left(\partial^{2}E/\partial k^{2}\right)^{-1}$ to calculate the effective mass. The effective masses of Si obtained with the BoltzTraP2 scheme are shown in Figure 5. A series of conduction-band and valence-band effective masses was obtained near the high-symmetry points Γ and X. We selected the maximum values, 0.46 $m_{e}$ and 2.48 $m_{e}$ (where $m_{e}$ is the electron rest mass), as the effective masses for the conduction band and valence band, respectively. In addition, our program automatically determines whether a band is degenerate and calculates the effective mass for each degenerate band. The maximum effective mass is selected because the deformation potential theory overestimates the relaxation time: choosing the largest effective mass reduces the relaxation time and partly compensates for this shortcoming. In high-throughput calculations, the program selects representative effective masses for other materials in the same way as for Si.
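The effective mass evaluation and the "take the maximum" selection rule can be sketched as follows. The function names are hypothetical and the band is a synthetic parabola rather than interpolated BoltzTraP2 output; the curvature is obtained from a quadratic fit in eV and Å⁻¹ and converted with ħ²/m_e expressed in eV·Å², which is one convenient way to avoid very large or very small numbers in the fit.

```python
import numpy as np

HBAR = 1.054571817e-34      # J s
M_E = 9.1093837015e-31      # kg
EV = 1.602176634e-19        # J per eV

# hbar^2 / m_e expressed in eV * Angstrom^2 (about 7.62).
HBAR2_OVER_ME = HBAR**2 / (M_E * EV * 1.0e-20)


def effective_mass(k_inv_ang, energy_ev):
    """Effective mass (units of m_e) from a quadratic fit of E(k).

    k_inv_ang: k-points along a line near the band extremum (1/Angstrom).
    energy_ev: band energies at those k-points (eV).
    """
    curvature = 2.0 * np.polyfit(k_inv_ang, energy_ev, 2)[0]  # d2E/dk2
    return HBAR2_OVER_ME / curvature


def representative_mass(masses):
    """Largest magnitude among candidate masses, as chosen in the text."""
    return max(abs(m) for m in masses)


# Synthetic parabolic conduction band with a known mass of 0.46 m_e.
k = np.linspace(-0.05, 0.05, 11)
band = 0.61 + HBAR2_OVER_ME * k**2 / (2.0 * 0.46)
m_fit = effective_mass(k, band)
m_pick = representative_mass([0.19, -0.30, 0.46])
```

A valence band has negative curvature and therefore a negative fitted mass, which is why the selection is made on the magnitude.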
3.5. High-Throughput Electrical Transport Properties (BoltzTraP2)
BoltzTraP2 is a program package for calculating semi-classical transport coefficients, based on a smoothed Fourier interpolation of the bands. Electrical transport properties such as the Seebeck coefficient, electronic conductivity, and electronic thermal conductivity can be obtained at different temperatures and doping concentrations. BoltzTraP2 has an input interface for VASP files, which meets the needs of the present high-throughput workflow, so the BoltzTraP2 module can be run as soon as the static calculations are complete. Because BoltzTraP2 is written in Python, it can be readily embedded into our high-throughput Python data-processing scripts, which quickly extract the calculated Seebeck coefficient, electronic conductivity, and electronic thermal conductivity. Combining these with the lattice thermal conductivities estimated from the elastic property calculations, we obtain the ZT values of the materials. The ten semiconductor materials with the highest ZT values are listed in Table 3.
3.6. ZT Value and BE Value
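As an example of how our database can be used, we relate the thermoelectric ZT values to the electronic quality factor. From $S$ and $\sigma$, the electronic quality factor $B_{E}$ can be defined by [22]:

$$B_{E} = S^{2}\sigma\left[\frac{S_{r}^{2}\exp\left(2 - S_{r}\right)}{1 + \exp\left[-5\left(S_{r} - 1\right)\right]} + \frac{S_{r}\,\pi^{2}/3}{1 + \exp\left[5\left(S_{r} - 1\right)\right]}\right]^{-1} \tag{21}$$

where $S_{r} = |S|/(k_{B}/e)$. As shown in Figure 6, the $ZT$ values of most materials are positively correlated with their electronic quality factor $B_{E}$, so the $B_{E}$ values can serve as another criterion for identifying promising thermoelectric materials.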
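The electronic quality factor can be evaluated from a single pair of Seebeck coefficient and conductivity values. The sketch below follows the expression of Ref. [22] as recalled here, so the exact form should be checked against that reference before use; the function name is hypothetical and the inputs are illustrative.

```python
import math

K_B_OVER_E = 8.617333262e-5  # V/K


def electronic_quality_factor(seebeck, sigma):
    """B_E in the units of S^2 * sigma (W m^-1 K^-2 for SI inputs).

    seebeck: Seebeck coefficient in V/K (must be non-zero);
    sigma: electrical conductivity in S/m.
    """
    s_r = abs(seebeck) / K_B_OVER_E  # reduced Seebeck coefficient
    nondegenerate = s_r**2 * math.exp(2.0 - s_r) / (1.0 + math.exp(-5.0 * (s_r - 1.0)))
    degenerate = (s_r * math.pi**2 / 3.0) / (1.0 + math.exp(5.0 * (s_r - 1.0)))
    return seebeck**2 * sigma / (nondegenerate + degenerate)


# Illustrative inputs: S = 200 uV/K and 600 uV/K at sigma = 1e5 S/m.
b_e_200 = electronic_quality_factor(200e-6, 1.0e5)
b_e_600 = electronic_quality_factor(600e-6, 1.0e5)

# Non-degenerate limit: B_E tends to sigma * (k_B/e)^2 * exp(S_r - 2).
limit_600 = 1.0e5 * K_B_OVER_E**2 * math.exp(600e-6 / K_B_OVER_E - 2.0)
```

The two logistic weights switch smoothly between the degenerate and non-degenerate regimes around a reduced Seebeck coefficient of 1, so the function is well behaved across the whole doping range as long as the Seebeck coefficient is not exactly zero.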
4. Conclusions
In this work, we built a thermoelectric materials database, Wenzhou TE. We designed several modules to obtain the electronic and heat transport parameters of materials, covering structural screening, deformation potential, elastic constant, and BoltzTraP2 electrical transport calculations, and we wrote several Python scripts to collect the data and process the results. Furthermore, we built a webpage for the first-principles thermoelectric materials database (https://hezhu2024.github.io) (Wenzhou TE), which can be used to search and view the physical properties of materials. We will continue to extend the database to include more materials, so that these data can readily be used for data mining and thermoelectric material development.
Author Contributions
Conceptualization, methodology, validation, formal analysis, investigation, data curation, Y.F. and H.S.; software, writing—original draft preparation, visualization, Y.F.; resources, writing—review and editing, supervision, project administration, funding acquisition, H.S.; All authors have read and agreed to the published version of the manuscript.
Funding
National Natural Science Foundation of China (52272006), Zhejiang Provincial Natural Science Foundation of China (LY22A040001), and Wenzhou Municipal Natural Science Foundation (G20210016).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Jain A, Ong S P, Hautier G, et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 2013, 1, 011002.
2. Curtarolo S, Setyawan W, Hart G L W, et al. AFLOW: An automatic framework for high-throughput materials discovery. Computational Materials Science 2012, 58, 218–226.
3. Saal J E, Kirklin S, Aykol M, et al. Materials design and discovery with high-throughput density functional theory: The open quantum materials database (OQMD). JOM 2013, 65, 1501.
4. Kirklin S, Saal J E, Meredig B, et al. The Open Quantum Materials Database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 2015, 1, 15010.
5. Pizzi G, Cepellotti A, Sabatini R, et al. AiiDA: automated interactive infrastructure and database for computational science. Comp. Mat. Sci. 2016, 111, 218–230.
6. Raccuglia P, Elbert K C, Adler P D F, et al. Machine-learning-assisted materials discovery using failed experiments. Nature 2016, 533, 73–76.
7. Selim S Z, Ismail M A. K-means-type algorithms: A generalized convergence theorem and characterization of local optimality. IEEE Transactions on Pattern Analysis and Machine Intelligence 1984, PAMI-6, 81–87.
8. Bardeen J, Shockley W. Deformation potentials and mobilities in non-polar crystals. Phys. Rev. 1950, 80, 72–80.
9. Sanchez-Lengeling B, Aspuru-Guzik A. Inverse molecular design using machine learning: Generative models for matter engineering. Science 2018, 361, 360–365.
10. Singh S, Lang L, Dovale-Farelo V, et al. MechElastic: A Python library for analysis of mechanical and elastic properties of bulk and 2D materials. Computer Physics Communications 2021, 267, 108068.
11. Dobson P J. Physical properties of crystals – their representation by tensors and matrices. Physics Bulletin 1985, 36, 506.
12. Mouhat F, Coudert F X. Necessary and sufficient elastic stability conditions in various crystal systems. Phys. Rev. B 2014, 90, 224104.
13. Mavko G, Mukerji T, Dvorkin J. In The Rock Physics Handbook, Cambridge University Press, 2020; pp. 220–235.
14. Reuss A. Berechnung der Fließgrenze von Mischkristallen auf Grund der Plastizitätsbedingung für Einkristalle. ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik 1929, 9, 49–58.
15. Hill R. The elastic behaviour of a crystalline aggregate. Proceedings of the Physical Society. Section A 1952, 65, 349.
16. Nolas G S, Goldsmid H J. Thermal conductivity of semiconductors. In Thermal Conductivity: Theory, Properties, and Applications, edited by T. M. Tritt, Springer US, Boston, MA, 2004; pp. 105–121.
17. Slack G. Nonmetallic crystals with high thermal conductivity. Journal of Physics and Chemistry of Solids 1973, 34, 321–335.
18. Kresse G, Hafner J. Ab initio molecular dynamics for liquid metals. Phys. Rev. B 1993, 47, 558.
19. Kresse G, Furthmüller J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 1996, 54, 11169.
20. Madsen G K H, Carrete J, Verstraete M J. BoltzTraP2, a program for interpolating band structures and calculating semi-classical transport coefficients. Computer Physics Communications 2018, 231, 140–145.
21. Pizzi G, Vitale V, Arita R, et al. Wannier90 as a community code: new features and applications. Journal of Physics: Condensed Matter 2020, 32, 165902.
22. Zhang X, Bu Z, Shi X, et al. Electronic quality factor for thermoelectrics. Science Advances 2020, 6, eabc0726.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).