Abstract
The discovery of high-dielectric materials is crucial to increasing the efficiency of electronic devices and batteries. Here, we report three previously unexplored materials with very high dielectric constants (69 < ϵ < 101) and large band gaps (2.9 < Eg(eV) < 5.5) obtained by screening materials databases using statistical optimization algorithms aided by artificial neural networks (ANN). Two of these new dielectrics are mixed-anion compounds (Eu5SiCl6O4 and HoClO) and are shown to be thermodynamically stable against common semiconductors via phase diagram analysis. We also uncovered four other materials with relatively large dielectric constants (20 < ϵ < 40) and band gaps (2.3 < Eg(eV) < 2.7). While the ANN training-data are obtained from the Materials Project, the search-space consists of materials from the Open Quantum Materials Database (OQMD), demonstrating a successful implementation of cross-database materials design. Overall, we report the ab initio calculated dielectric properties of the 17 materials that were selected in our design workflow. The high-dielectric materials predicted in this work open up further experimental research opportunities.
Introduction
Dielectric materials are among the most vital components for microelectronic device manufacturing. They are used in memory devices, capacitor-based energy storage, field-effect transistors, and related applications1,2,3. The dielectric constant (denoted here as ϵ), more commonly referred to as the relative permittivity, is the factor by which the electric field strength decreases inside a material compared to the vacuum when it is placed near a finite electric charge. The ϵ values of commonly used dielectric materials range between 20 and 30 (refs. 1,4,5); for example, Ta2O5 (ϵ ~ 23–27, Eg = 4.2 eV)1,2,6,7 and TiO2 (ϵ = 27, Eg = 3.5 eV)1,2,8. Novel materials with high ϵ are in high demand to improve device performance and reliability. Typically, ϵ and Eg are inversely related2,9 in a compound. As a result, although several materials are reported to have even larger ϵ values, they often have a small Eg9,10,11,12, making the dielectric vulnerable to leakage currents under exposure to large electric fields1,2. Therefore, compounds with both high ϵ and large band gaps are preferred when designing charge storage applications and microelectronic devices.
One of the methods to find high-ϵ compounds is to calculate the dielectric constants and band gaps of a large number of compounds available in large materials databases, such as the Open Quantum Materials Database (OQMD)13,14 and the Materials Project (MP)15, using ab initio methods such as density functional theory (DFT). However, since the accurate calculation of dielectric properties using density functional perturbation theory16 (DFPT) is computationally very expensive, it is impractical to compute the dielectric constants of the tens of thousands of materials available in those databases using high-throughput methods. The goal of this work is therefore to find dielectric materials with large values of both ϵ and Eg by screening materials databases while conducting as few DFPT calculations as possible. To accomplish this task, we have employed a materials design strategy comprising statistical optimization models and DFPT calculations on a small set of compounds. While our training set consists of a small amount of data (dielectric constants) from the MP, the search-space contains a vast set of compounds available in the OQMD.
Several online data repositories exist today that are dedicated to hosting large sets of open-sourced inorganic crystal structure data generated from high-throughput (HT) DFT calculations such as the MP15, OQMD13,14, and AFLOWLib17 among others18,19. The design and discovery of novel materials using statistical modeling has become an active research area20,21,22 in recent times, largely attributed to the availability of such HT datasets. Recently, multiple studies have reported HT-generation of dielectric data and subsequent analysis9,23,24. For example, Morita et al. reported25 machine learning modeling of data from MP11,12,15 to assess the reliability of the theoretical models currently available to describe the dielectric properties of crystals.
In this work, we use the MP dataset of 1864 dielectric tensors11,12 to train statistical models and subsequently identify dielectrics from the set of stable materials in the OQMD. Thus, the MP data form the training-data and the set of materials from the OQMD forms the search-space for the materials design. This work demonstrates that data obtained from multiple sources can be combined to discover new compounds. The cross-database design was made possible by the negligible difference between the representation vectors (also called feature vectors in machine learning) generated for equivalent materials in the MP and OQMD. Overall, we conducted three design cycles, which required us to perform DFPT dielectric calculations for just 17 materials. We report the dielectric constants of all 17 materials. Three of them (HoClO, Eu5SiCl6O4, and Tl3PbBr5) have very large ϵ (69 < ϵ < 101) and Eg (2.9 eV < Eg < 5.5 eV) values, making them part of the Pareto front of the known data, and four others (Sr2LuBiO6, Bi5IO7, Bi3ClO4, and Bi3BrO4) have moderately large ϵ (20 < ϵ < 40) and Eg (2.3 eV < Eg < 2.7 eV) values.
Results
Materials design strategy
Our objective is to find large band gap materials with optimal dielectric constants. Since the dielectric tensor of a compound has nine components, the optimization of all nine components leads to a nine-objective optimization problem which is difficult to solve with training-data of size ~2000. Thus, we specifically optimize the largest eigenvalue of the dielectric tensor, referred to from here onward as ϵ, via statistical modeling through the materials design workflow, as depicted in Fig. 1. The workflow is similar to the strategies that have been previously reported in literature26,27, where each design cycle consists of three steps—data processing, statistical modeling, and ab initio DFPT calculations. The largest eigenvalue of the total dielectric tensor is chosen as the property to be optimized because that is the highest possible dielectric behavior from a single crystal when it is aligned perfectly along the corresponding direction between two metallic plates. The total dielectric tensor is calculated as the sum of ionic and electronic dielectric tensors. The good agreement between dielectric tensor eigenvalues obtained from MP’s DFPT HT framework and experimentally measured dielectric constant values was reported by Petousis et al.28. We preferred the largest eigenvalue over the average of eigenvalues because the latter value may severely underestimate the highest possible dielectric behavior from a single crystal (Supplementary Fig. 1), even though it is a popular choice to estimate the polycrystalline dielectric constant12,28. The new data produced from DFPT calculations at the end of each cycle is fed into the next design cycle. In the first step, we collected the relevant data from the MP database (training-data) and OQMD (search-space). All materials in the training-data have a known value for ϵ and Eg, while the materials in the search-space have known values of Eg but their ϵ values are unknown. 
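The choice of target property can be illustrated with a minimal sketch, assuming NumPy is available. The function name and the two tensors below are hypothetical and are not data from this work; the sketch only shows how the total tensor is formed as the sum of the electronic and ionic parts and how its largest eigenvalue compares with the eigenvalue average.

```python
import numpy as np

def largest_dielectric_eigenvalue(eps_electronic, eps_ionic):
    """Largest eigenvalue of the total (electronic + ionic) dielectric tensor."""
    total = np.asarray(eps_electronic, dtype=float) + np.asarray(eps_ionic, dtype=float)
    # Symmetrize to suppress small numerical asymmetries before using eigvalsh.
    total = 0.5 * (total + total.T)
    # eigvalsh returns eigenvalues in ascending order, so the last one is the largest.
    return float(np.linalg.eigvalsh(total)[-1])

# Hypothetical, strongly anisotropic tensors (illustration only).
eps_el = np.diag([4.0, 4.0, 5.0])
eps_ion = np.diag([6.0, 6.0, 30.0])

eps_max = largest_dielectric_eigenvalue(eps_el, eps_ion)
eps_avg = float(np.mean(np.linalg.eigvalsh(eps_el + eps_ion)))
```

For this anisotropic example the largest eigenvalue is 35, while the eigenvalue average is about 18.3, which shows how averaging can understate the best single-crystal response.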
In the second step, Modeling, we created an ensemble of artificial neural network (ANN)29 models, fit on the training-data, which learn to predict the ϵ value of materials when their crystal structures and Eg values are known. Using this ANN ensemble, we predicted the ϵ of each material in the search-space. Since the prediction was done from an ensemble, the results were a distribution of ϵ values for each material, contrary to the usage of a single ANN model where a single prediction value is obtained. The trained ANN ensemble was used to predict the ϵ-distributions of 11,102 stable non-metallic materials in the search-space, obtained from the OQMD.
Further, the predicted distribution of ϵ was input into the Efficient Global Optimization (EGO)26 algorithm. EGO takes into account the distribution’s mean and standard deviation to rank the materials in search-space based on their potential to increase the chances of finding high-ϵ materials in this workflow within as few design cycles as possible. In this work, the optimization in dielectrics refers to the identification of dielectrics with large ϵ values. The reason for employing an EGO algorithm to explore the search-space is to account for the uncertainty in ANN model predictions when the available training-data may not have sampled the material space uniformly. The advantages of EGO-based optimization in materials design were first reported and benchmarked by Balachandran et al.26,30,31. In this work, we used the EGO algorithm to select the best candidates that are either predicted to have a high ϵ value or have a large uncertainty in their ANN-ensemble predictions. Materials that belong to the latter category are from the regions of materials yet to be sampled by the training-data. The DFPT characterization of such materials is expected to increase the reliability of ANN-ensemble predictions after each design cycle and eventually lead to better optimization of dielectrics during the course of this work.
The metric used to rank the materials is called expected improvement, or E(I). More details on how E(I) is calculated are provided in the “Methods” section. In each cycle, the few (5–7) materials with the highest values of E(I) were selected and carried on to the next step, DFPT calculations. In this final step, the dielectric tensors of the selected materials were calculated using DFPT. If the DFPT results show that any of the materials has high values of both Eg and ϵ, we stop the design workflow at that point. Otherwise, a new design cycle is started after transferring the selected materials and their newly computed ϵ values from the search-space to the training-data. With a larger training-data set, the ANN ensemble is expected to have less uncertainty in its ϵ predictions in the new design cycle. In this work, the design cycle was repeated with feedback three times in total, until three materials with very large values of Eg and ϵ were found.
Data
A dataset containing information about crystal structures, chemical compositions, band gap energy values, and dielectric tensors of 1864 stable materials was obtained from the MP11,12,15 data repository. This dataset was used to generate the training-data. The target property, ϵ, was obtained for each material in this database from its calculated dielectric tensor. Another dataset consisting of 11,102 stable, non-metallic materials containing information about crystal structures, chemical compositions, and band gap energy values was obtained from OQMD13,14. This OQMD dataset was used to generate the search-space in which the search to find dielectrics was conducted. The dielectric tensor data of all crystals included in the search-space were unknown at the beginning of this work.
The materials need to be represented as vectors of uniform length in order to be input into a statistical model. We generated the material representations using the Magpie32 crystal property generator tool. Magpie generates a set of physical features (such as the mean electronegativity of constituent atoms, average coordination number inside the unit cell, etc.) from a given chemical composition and crystal structure. Within Magpie, the crystal’s structure-related features are generated by building Voronoi tessellations inside the crystal and finding the nearest neighbors of each individual atom33. Magpie generated 271 input features (145 composition-based and 126 structure-based) to represent each material. In addition, the material’s DFT Eg value was added as an extra feature to the representation vector, since it is already known for all materials in both the MP and OQMD datasets. The addition of Eg increased the size of the representation vector to 272, which was generated for each material in the training-data and search-space. The input feature-vector size was then reduced to 100 using widely used feature reduction techniques, such as principal component analysis and model-based selection, as implemented in the Scikit-learn Python library34. The set of material representation vectors of the training-data and the search-space, together with the target values associated with the training-data, completes the first step of materials design as depicted in Fig. 1. The size of the training-dataset increases after each design cycle as a result of conducting DFPT calculations on new materials from the search-space.
Statistical modeling that uses data from multiple computational materials databases is prone to errors arising from differences in the DFT parameters used in each database’s high-throughput calculation strategy. Here, we have investigated the difference in Magpie-generated features for equivalent materials in the OQMD and MP, cross-referenced based on their associated Inorganic Crystal Structure Database35 (ICSD) Collection Codes. In total, 1717 out of 1864 materials in the training-data had an ICSD Collection Code associated with them. The crystal structures from the OQMD corresponding to all 1717 ICSD materials were obtained, and their Magpie-generated features were compared against those of the structures obtained from the MP as part of the training-data. The results, plotted in Fig. 2a, show a negligible (≤2%) relative difference in 263 of the 271 Magpie features, while the other eight features have low relative differences (≤7%). All 145 composition-based features are identical across the databases, as expected. The finite differences in some of the structure-based features originate from differences in the accuracy of crystal structure relaxation across the databases. The band gap, which joins the Magpie features to form the final material representation vector, was also compared between the OQMD and MP for the 1717 equivalent materials, as shown in Fig. 2b. The band gap values showed mean and median absolute deviations of 0.1 eV and 0.0 eV, respectively, pointing toward a negligible difference between the band gaps calculated in the OQMD and MP for the materials included in the training-data. Overall, the materials representation vector considered in this design is generated in a cross-comparable manner across OQMD and MP structures with very low errors.
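A per-feature comparison of this kind can be sketched as follows, assuming NumPy. The text does not state the exact definition of the relative difference, so the normalization used here (absolute difference divided by the larger magnitude, averaged over materials) is an assumption, and the function name and the small arrays are hypothetical.

```python
import numpy as np

def mean_relative_difference(features_a, features_b, floor=1e-12):
    """Mean relative difference per feature between two aligned feature matrices.

    Rows are equivalent materials (same order in both inputs); columns are features.
    """
    a = np.asarray(features_a, dtype=float)
    b = np.asarray(features_b, dtype=float)
    # Assumed normalization: divide by the larger magnitude; `floor` avoids 0/0.
    scale = np.maximum(np.maximum(np.abs(a), np.abs(b)), floor)
    return (np.abs(a - b) / scale).mean(axis=0)

# Hypothetical features for two materials from two databases (illustration only).
# Columns 0 and 1 mimic composition-based features, column 2 a structure-based one.
db_one = [[1.0, 2.0, 10.0], [1.0, 4.0, 20.0]]
db_two = [[1.0, 2.0, 10.2], [1.0, 4.0, 19.0]]

rel = mean_relative_difference(db_one, db_two)
n_negligible = int(np.sum(rel <= 0.02))  # features within a 2% threshold
```

In this toy case the two composition-like columns agree exactly and only the structure-like column differs, by about 3.5% on average.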
The ϵ values in the training-data obtained from MP are predominantly concentrated in the range of 0 to 25, making it difficult to model the data reliably for materials with large ϵ due to a possible bias toward smaller values. Less than 5% of the materials in the training-data have ϵ > 50. The median of ϵ values in the MP dataset is 12.2 while the mean and standard deviation are 20.2 and 42.8 respectively. The distribution of ϵ in training-data is shown in Supplementary Fig. 2. The large spread of ϵ values is decreased upon a log-scale transformation, as shown in Fig. 3a. A smaller spread of target values helps stabilize the machine learning model during the training by reducing the probability of excessive changes in internal parameters, such as the weights in an ANN. We also analyzed the correlation between ϵ and Eg values for the materials in the training-data, and it is given in Supplementary Fig. 3.
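The effect of the log-scale transformation on a skewed target can be sketched with the standard library alone. The sample values are hypothetical, and the base-10 logarithm and the coefficient of variation used as the spread measure are assumptions, since the text does not specify either.

```python
import math
import statistics

# Hypothetical, strongly right-skewed sample of dielectric constants (illustration only).
eps_values = [5.0, 8.0, 12.0, 20.0, 50.0, 988.0]

# Train on log10(eps) instead of eps (the base is an assumption).
log_eps = [math.log10(v) for v in eps_values]

def relative_spread(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

spread_raw = relative_spread(eps_values)
spread_log = relative_spread(log_eps)

# Predictions made on the log scale are mapped back with the inverse transform.
recovered = [10.0 ** v for v in log_eps]
```

The single large value dominates the spread of the raw sample but not of the transformed one, and the inverse transform recovers the original scale.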
The original dataset downloaded from MP listed BeO (MP ID: mp-1794) as having large ab initio computed values for ϵ(=312) and Eg(=8.2 eV). This large value of ϵ is possibly caused by the improper relaxation of the primitive cell of BeO in MP that leads to a large volume change. Hence, the succeeding calculations on this compound such as DFPT may be incorrect. We conducted a separate DFT cell-relaxation and DFPT calculation for BeO using VASP starting with the MP’s initial structure and find that the computed ϵ value for the correctly relaxed structure is 4—well in agreement with the previously reported values in literature36. This compound was removed from the training-data before proceeding further. We looked up other materials in training-data with very high ϵ and smaller Eg individually and confirmed that they did not have a large cell-volume change upon relaxation in MP.
Statistical modeling
The predictions from trained machine learning models, such as ANNs, are often prone to errors arising from insufficient sampling of the material space by the training-data. We therefore needed to quantify the uncertainty associated with the predicted ϵ values, which a single ANN model does not explicitly provide. To do so, we created an ensemble of ANNs, each trained on a randomly chosen subset of the training-data and each with a different architecture and different internal parameters. An ensemble of 2000 independent ANN models was created and trained at each design cycle. Each ANN in the ensemble predicted a single ϵ value for a given material-representation vector, resulting in a distribution of 2000 predicted ϵ values for each material in the search-space. The standard deviation of each predicted ϵ-distribution was defined as the uncertainty of the ANN modeling for the corresponding material.
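The ensemble idea can be sketched in a few lines, assuming NumPy. This is not the ANN implementation used in this work: as a lightweight stand-in, each member is a one-hidden-layer network with a random tanh hidden layer and a least-squares output layer, and the ensemble size, subset fraction, hidden-layer range, and synthetic data are all hypothetical choices made to keep the example fast.

```python
import numpy as np

def fit_member(X, y, n_hidden, rng):
    """Stand-in network: random tanh hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return W, b, coef

def predict_member(model, X):
    W, b, coef = model
    H = np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])
    return H @ coef

def ensemble_predict(X_train, y_train, X_search, n_models=50, subset_frac=0.8, seed=0):
    """Mean and standard deviation of the ensemble predictions for each search point."""
    rng = np.random.default_rng(seed)
    n_train = len(X_train)
    preds = np.empty((n_models, len(X_search)))
    for m in range(n_models):
        # Each member sees a different random subset of the training data ...
        idx = rng.choice(n_train, size=int(subset_frac * n_train), replace=False)
        # ... and has a different architecture (hidden-layer width) and parameters.
        n_hidden = int(rng.integers(5, 20))
        model = fit_member(X_train[idx], y_train[idx], n_hidden, rng)
        preds[m] = predict_member(model, X_search)
    return preds.mean(axis=0), preds.std(axis=0)

# Synthetic data for illustration only.
data_rng = np.random.default_rng(1)
X_train = data_rng.normal(size=(60, 4))
y_train = X_train[:, 0] ** 2 + 0.5 * X_train[:, 1]
X_search = data_rng.normal(size=(5, 4))

mean_eps, std_eps = ensemble_predict(X_train, y_train, X_search)
```

The per-material standard deviation returned here plays the role of the modeling uncertainty described above; the mean and standard deviation together are the inputs that an EGO-type ranking needs.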
Further, a statistical single-objective optimization algorithm, called EGO26,37,38,39,40, was used in this work to evaluate the ϵ-distribution and quantify a measure of probable optimization associated with each material in the search-space. EGO is not a method to model the data and predict ϵ. Instead, EGO is an algorithm to select the best candidates from a given search-space, based on their ϵ-distributions predicted by the ANN ensemble, in order to discover as many high-ϵ materials from as few design cycles as possible. Here, the desired optimization is the maximization of ϵ among all the materials in the search-space. The quantified measure of predicted optimization in EGO is called expected improvement, denoted as E(I). Conceptually, the E(I) of a material in search-space is the quantified probability with which a DFPT calculation of ϵ for that material will lead to the identification of high-ϵ material in the design workflow within as few design cycles as possible. Figure 3a shows the results from an ANN model validation as a part of model training during the second design cycle. The values of E(I) computed for the same validation data split from the training-data are shown in Fig. 3b. A simplified illustration of E(I) with the help of an example is given below.
Example illustration of E(I)
Suppose the predicted ϵ-distribution belonging to a material M1 in the search-space has a large standard deviation. Then it is highly probable that the material M1 belongs to a part of the material representation vector space which was not sampled very well in the training set. Computing the ϵ of M1 using DFPT and feeding back that information to the training-data will lead to better ANN modeling in the subsequent design cycles. Thus, M1 will have a large value of E(I). Now consider another material M2 in search-space with a large mean and a small standard deviation for its predicted ϵ-distribution. The material M2 belongs to a part of the material representation vector space that was sufficiently sampled by the training-data. So it is highly probable that M2 will turn out to be a high-ϵ material upon DFPT calculations. Because of that, M2 will also have a large value of E(I).
In EGO, the calculation of E(I) for a general optimization problem proceeds as follows (also shown in Fig. 4).
Let Y be the target property to be maximized and φ(Y) be the predicted distribution of Y for a given search-space material. The value φ(Y = y) is the probability density at Y = y. The largest value of the target property in the training-data is denoted as \({y}_{t}^{{\rm{max}}}\). The EGO algorithm, as formulated by Jones et al.38, computes the expected improvement, E(I), as:
$$E(I)=\int_{{y}_{t}^{{\rm{max}}}}^{\infty }\left(y-{y}_{t}^{{\rm{max}}}\right)\varphi (Y=y)\,{\rm{d}}y$$
As mentioned in Balachandran et al.26, if the predicted distribution is approximated as a normal (i.e., Gaussian) distribution with a mean μ and a standard deviation σ, the above equation can be re-written as:
$$E(I)=\sigma \left[\phi (z)+z\,\Phi (z)\right]$$
where \(z=\frac{\mu -{y}_{t}^{{\rm{max}}}}{\sigma }\), and ϕ and Φ are the probability density function and the cumulative distribution function38, respectively, of the standard normal distribution, evaluated at z.
For dielectric design, Y is the dielectric constant (ϵ) of a candidate material, and \({y}_{t}^{{\rm{max}}}\) is the highest value of ϵ in the training-data obtained from DFPT calculations. In the MP dataset, the largest ϵ value is for TiO2 with ϵ = 988 and Eg = 1.8 eV. But our goal in this work is to find materials with large ϵ’s, not necessarily higher than 988 as long as the Eg’s are greater than 1.8 eV. Thus the \({y}_{t}^{{\rm{max}}}\) in this work was set at 100.0 for all design cycles, instead of setting it at 988.0, to consider the search-space materials whose ϵ values are predicted to be sufficiently high. The φ(Y) is approximated to be a normal distribution with the same mean, μ, and standard deviation, σ, as that of the original ϵ-distribution predicted by the ANN ensemble for each search-space material.
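The Gaussian form of E(I) can be evaluated with the Python standard library, as in the minimal sketch below. The function name and the three (μ, σ) pairs are hypothetical; the threshold of 100 follows the value of \({y}_{t}^{{\rm{max}}}\) stated above, and the sketch assumes that μ, σ, and the threshold are all expressed on the same scale.

```python
import math

def expected_improvement(mu, sigma, y_max=100.0):
    """E(I) = sigma * [pdf(z) + z * cdf(z)] with z = (mu - y_max) / sigma."""
    if sigma <= 0.0:
        # No spread in the prediction: the improvement is deterministic.
        return max(mu - y_max, 0.0)
    z = (mu - y_max) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return sigma * (pdf + z * cdf)

# Hypothetical predicted distributions (illustration only).
ei_m1 = expected_improvement(mu=40.0, sigma=60.0)   # low mean, large uncertainty (like M1)
ei_m2 = expected_improvement(mu=120.0, sigma=5.0)   # high mean, small uncertainty (like M2)
ei_low = expected_improvement(mu=40.0, sigma=5.0)   # low mean, small uncertainty
```

With these inputs, the M1-like and M2-like cases give E(I) of about 5.0 and 20.0, respectively, while the confidently low prediction gives a value that is practically zero, so both kinds of candidate described in the illustration rank above it.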
Design cycles with feedback
The ϵ values of a few materials selected from the statistical modeling are computed from DFPT calculations, as shown in the final segment of a design cycle in Fig. 1. The results from the DFPT calculations are used to determine whether to conduct any further design cycles. In this work, we conducted the design cycles until at least one high-ϵ dielectric with a large Eg is identified. When no such materials are found during a design cycle, all the selected materials along with their newly DFPT-estimated ϵ values are transferred from search-space to training-data, resulting in a feedback of information prior to the beginning of the next design cycle. The feedback is one of the most crucial parts of our material design workflow because it results in a better sampling of material representation vector space by training-data and thus, more reliable ANN model predictions during the next design cycle. The advantage of the feedback mechanism is prominent during the quantification of uncertainty which is used directly by the EGO algorithm to identify the best candidates for the next set of DFPT calculations. After the end of a design cycle, the uncertainty on predicting the ϵ values is decreased for the set of materials which are similar to the materials whose ϵ values were calculated using DFPT in the given cycle.
In addition to the feedback mechanism, another factor that influenced the candidate selection in the design workflow is the minimum cutoff imposed on the band gap values of materials when they are included in the search-space. The reason for implementing a cutoff is to externally introduce a character of multi-objective optimization in this work. Without explicitly setting a minimum band gap limit, the candidate selection process that is dictated by the EGO algorithm tries to optimize only a single objective, which is the ϵ value. We conducted three design cycles sequentially with feedback of the newly calculated data into training-data after each cycle. In the first design cycle, we set no band gap minimum cutoffs to allow the full exploration of the search-space that consists of 11,102 non-metals from OQMD. In the second design cycle, a minimum cutoff of 2.25 eV was set, leaving 6191 materials in the search-space. In the final cycle, the minimum cutoff was increased to 5 eV to limit the candidate selection only to the materials with very high Eg. Hence, the search-space size in the final cycle was reduced to 1046 materials. The workflow that we adopted in this work deviates from the ideal situation where a dedicated multi-objective optimization statistical algorithm will be used to find a material with high ϵ and large Eg values. Since the band gap values are already available for all materials in the search-space, the best approach here was to implement a statistical optimization algorithm to quickly find high-ϵ materials while the preference for large band gap values is achieved by manually setting a minimum cutoff. This work stands as an example for the modifications required to practically implement the statistical algorithms that are often benchmarked on idealistic scenarios.
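The combination of a band gap cutoff with E(I)-based ranking can be sketched as follows. The records, field names, and E(I) values are hypothetical; in the actual workflow E(I) is recomputed after the ANN ensemble is retrained in each cycle, whereas this sketch keeps the values fixed to isolate the effect of the cutoff and of removing selected materials from the search-space. Treating the cutoff as inclusive is also an assumption.

```python
def select_candidates(search_space, min_band_gap, n_select):
    """Keep materials at or above the band gap cutoff and return the top-ranked by E(I)."""
    eligible = [m for m in search_space if m["band_gap"] >= min_band_gap]
    ranked = sorted(eligible, key=lambda m: m["ei"], reverse=True)
    return ranked[:n_select]

# Hypothetical search-space records (illustration only).
search_space = [
    {"name": "A", "band_gap": 0.5, "ei": 9.0},
    {"name": "B", "band_gap": 0.4, "ei": 8.0},
    {"name": "C", "band_gap": 2.9, "ei": 4.0},
    {"name": "D", "band_gap": 2.7, "ei": 3.0},
    {"name": "E", "band_gap": 5.2, "ei": 1.5},
    {"name": "F", "band_gap": 5.5, "ei": 1.0},
    {"name": "G", "band_gap": 2.4, "ei": 0.2},
]

# Cutoffs of the three design cycles described above, in eV.
cycle_cutoffs = [0.0, 2.25, 5.0]

picks = []
remaining = list(search_space)
for cutoff in cycle_cutoffs:
    chosen = select_candidates(remaining, cutoff, n_select=2)
    picks.append([m["name"] for m in chosen])
    # Feedback: selected materials leave the search-space after each cycle.
    remaining = [m for m in remaining if m not in chosen]
```

Without a cutoff, the first cycle picks the two small-gap materials with the largest E(I); raising the cutoff steers later cycles toward larger-gap candidates even though their E(I) values are lower.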
New dielectric materials
The materials that are part of the Pareto front of the MP data are listed in Table 1, while the Pareto front of the training-data at each design cycle is plotted in Fig. 5. Since both ϵ and Eg are to be maximized in this study, a material lies on the Pareto front when no other material in the corresponding training-data is at least as good in both ϵ and Eg and strictly better in one of them. Therefore, a modification of the training-data’s Pareto front by any of the newly calculated dielectric constants after a design cycle may indicate the identification of suitable high-dielectric materials.
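A two-objective Pareto front can be computed with a simple dominance check, as in the sketch below. The function name is hypothetical. The input uses the ϵ and Eg values reported in this work for seven of the newly calculated materials, so the resulting front is the front among these seven only, not the front of the full MP dataset.

```python
def pareto_front(materials):
    """Names of materials not dominated when maximizing both eps and band gap."""
    front = []
    for name, eps, gap in materials:
        # Dominated: some other entry is at least as good in both and better in one.
        dominated = any(
            (e2 >= eps and g2 >= gap) and (e2 > eps or g2 > gap)
            for _, e2, g2 in materials
        )
        if not dominated:
            front.append(name)
    return front

# (name, eps, Eg in eV) as reported in this work.
new_materials = [
    ("Tl3PbBr5", 101.0, 2.9),
    ("Eu5SiCl6O4", 69.0, 5.5),
    ("HoClO", 75.0, 5.2),
    ("Sr2LuBiO6", 24.0, 2.4),
    ("Bi5IO7", 36.0, 2.7),
    ("Bi3ClO4", 39.0, 2.3),
    ("Bi3BrO4", 39.0, 2.3),
]

front = pareto_front(new_materials)
```

Among these seven, only Tl3PbBr5, Eu5SiCl6O4, and HoClO are non-dominated; each of the other four is exceeded in both ϵ and Eg by Tl3PbBr5.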
During the first design cycle, the EGO algorithm picked out the five most promising candidates with the largest E(I) values in the search-space. The ϵ values of these five selected materials were calculated using DFPT. Two materials among them turned out to have very high ϵ values (~370) but very low Eg (~0.5 eV). The low Eg values are not unexpected since the EGO algorithm implemented in this work aims to maximize only the ϵ values. None of the materials selected in this cycle modified the Pareto front of the MP dataset, as shown in Fig. 5a. The ϵ values of these five materials were appended to the training-data prior to starting the next design cycle.
Five materials were selected in the second cycle and their dielectric constants were calculated. Our calculations predict a large dielectric constant for one of the five new materials—tetragonal Tl3PbBr5 (ϵ = 101, Eg = 2.9 eV). Tl3PbBr5 joined the Pareto front, as shown in Fig. 5b. Three other new materials—Bi5IO7 (ϵ = 36, Eg = 2.7 eV), Bi3ClO4 (ϵ = 39, Eg = 2.3 eV), and Bi3BrO4 (ϵ = 39, Eg = 2.3 eV), have moderately large ϵ values, even though they did not improve the existing Pareto front. All the five new materials were appended into the training-data before proceeding to begin the third design cycle.
During the third and final design cycle consisting of only materials with very large Eg in search-space, seven new candidate materials were selected to do DFPT calculations. Two among them—Eu5SiCl6O4 (ϵ = 69, Eg = 5.5 eV) and HoClO (ϵ = 75, Eg = 5.2 eV) joined the Pareto front due to their large ϵ and Eg values, as shown in Fig. 5c. In total, three new dielectric materials in the Pareto front were discovered after three design cycles and 17 new DFPT calculations were performed in the entire workflow. No further design cycles were conducted since we have already identified multiple compounds with high ϵ and Eg, which remained unexplored experimentally.
The ϵ values of all 17 materials obtained in this work are given in Table 2. The ϵ and Eg values of all materials belonging to the Pareto front of the MP dataset are listed in Table 1 for comparison. Among all the newly discovered dielectrics with large ϵ values, tetragonal HoClO and monoclinic Eu5SiCl6O4 stand out because of their very large DFT-calculated band gap energies (5.2 eV and 5.5 eV, respectively). These two rare earth oxychlorides are reported to have been experimentally synthesized41,42,43,44, but their dielectric properties have remained unstudied to the best of our knowledge. Both are mixed-anionic inorganic compounds, a class of emerging functional materials45. Interestingly, monoclinic Eu5SiCl6O4 has 32 atoms in its primitive unit cell, a size that often exceeds the maximum cutoff on the number of atomic sites imposed in HT studies of computationally expensive material properties11,19.
Thermodynamic stability of a dielectric in contact with Si or other semiconductors is an important requirement for its use in electronic applications. Several of the high-ϵ dielectrics identified in the published literature were shown to be unstable at an interface with Si in subsequent experimental studies conducted at or above room temperature. The formation of SiOx and other undesired metal oxides was reported at the interface between Si and popular high-ϵ dielectrics such as Ta2O346,47,48, TiO249,50, BaTiO351, and SrTiO352,53. The thermodynamic stability between two compounds can be assessed from the phase diagram involving those compounds. In this work, the phase diagram is constructed by computing the convex hull54 of the formation energies of all the materials that belong to a given phase space spanned by their constituent elements. Each compound on the convex hull not only has the lowest formation energy at its composition but also has a lower energy than any linear combination of other materials in that phase space. The difference between the formation energy of a compound and the energy of the convex hull at the same composition is called the hull distance (Ehd). By definition, each material on the convex hull has a hull distance of zero (i.e., Ehd = 0) and is considered stable. Every material that lies above the convex hull is considered metastable (0 < Ehd ≤ 50 meV per atom) or unstable (Ehd > 50 meV per atom), depending on the magnitude of Ehd, according to the heuristic conventions adopted in the literature31,55,56,57,58. The presence of a tie-line between two compounds in a convex hull phase diagram indicates that they are thermodynamically stable phases when in contact with each other.
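The hull distance and the stability labels can be illustrated for the simplest case, a binary A–B system, with the sketch below. It does not use qmpy or OQMD data; the phase names, compositions, and formation energies (in eV per atom) are hypothetical, and real phase spaces such as those analyzed here are multicomponent.

```python
def lower_hull(points):
    """Lower convex hull of (composition x, formation energy) points, left to right."""
    lowest = {}
    for x, e in points:
        lowest[x] = min(e, lowest.get(x, e))  # keep the lowest energy per composition
    hull = []
    for p in sorted(lowest.items()):
        # Drop the last vertex while it lies on or above the line from hull[-2] to p.
        while len(hull) >= 2:
            (x1, e1), (x2, e2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - e1) - (e2 - e1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def hull_distance(x, energy, hull):
    """Energy above the hull (eV per atom) at composition x."""
    for (x1, e1), (x2, e2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return energy - (e1 + (e2 - e1) * (x - x1) / (x2 - x1))
    raise ValueError("composition outside the hull range")

def classify(ehd, tol=1e-9):
    """Heuristic labels with the 50 meV per atom threshold quoted in the text."""
    if ehd <= tol:
        return "stable"
    return "metastable" if ehd <= 0.050 else "unstable"

# Hypothetical phases: name -> (fraction of B, formation energy in eV per atom).
phases = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.48),
    "AB": (0.5, -1.0),
    "AB3": (0.75, -0.40),
    "B": (1.0, 0.0),
}

hull = lower_hull(phases.values())
labels = {name: classify(hull_distance(x, e, hull)) for name, (x, e) in phases.items()}
```

Here A3B lies 20 meV per atom above the hull and is labeled metastable, AB3 lies 100 meV per atom above it and is labeled unstable, and neighboring hull vertices (A and AB, AB and B) are the pairs joined by tie-lines.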
Our thermodynamic stability analysis of Ta2O3, TiO2, BaTiO3, and SrTiO3 in the OQMD using the qmpy API14 showed no tie-lines connecting any of them to Si, indicating that they are unstable in contact with Si. This is consistent with the published results46,47,48,49,50,51,52,53. We also analyzed Gd2O3, a high-ϵ dielectric (ϵ ~ 20; ref. 59) that has been shown to be stable against Si60, and found that a tie-line does exist between Si and Gd2O3. These phase diagram plots are provided in Supplementary Fig. 6. In Fig. 6, we report a phase diagram to assess the stability of the newly discovered high-ϵ dielectrics HoClO and Eu5SiCl6O4. The phase diagram shows that both materials are thermodynamically stable in contact with semiconductors such as Si, Ge, GaAs, GaN, and SiC at 0 K, a requirement for their use in microelectronic devices, where an interface with one of the common semiconductors is often necessary61. The next most promising candidate, tetragonal Tl3PbBr5, has a very large ϵ (101) but a smaller band gap (2.9 eV) and is computed to be thermodynamically metastable at 0 K (Ehd = 16 meV per atom) according to the data obtained from the OQMD. Tl3PbBr5 is also reported in the literature to have been experimentally synthesized62,63,64, without any mention of its dielectric properties.
Discussion
We report the identification of three dielectric materials that combine a high dielectric constant with a large band gap: HoClO (ϵ = 75, Eg = 5.2 eV), Eu5SiCl6O4 (ϵ = 69, Eg = 5.5 eV), and Tl3PbBr5 (ϵ = 101, Eg = 2.9 eV). These compounds modify the Pareto front of the previously known high-throughput dielectric constant data available from the MP database. Our screening strategy also uncovers four other dielectric materials with large Eg and moderately large ϵ, namely Sr2LuBiO6 (ϵ = 24, Eg = 2.4 eV), Bi5IO7 (ϵ = 36, Eg = 2.7 eV), Bi3ClO4 (ϵ = 39, Eg = 2.3 eV), and Bi3BrO4 (ϵ = 39, Eg = 2.3 eV), at the cost of conducting only 17 DFPT calculations overall. We utilize the data available in open-source databases (OQMD, MP) to build a statistical optimization model and use it to select the best candidates from among 11,102 stable non-metals available in the OQMD. Among the newly discovered dielectrics, two mixed-anionic materials, HoClO and Eu5SiCl6O4, are shown to have tie-lines with multiple commonly used semiconductors on their phase diagrams, indicating that they are thermodynamically stable in contact with those semiconductors.
The presence of rare earth elements such as Ho and Eu in dielectrics can be a challenge for their use in practical applications. However, ongoing efforts toward increasing their availability, such as efficient recycling of rare earth materials65,66, can result in a sufficient supply of these elements for mass production of small electronic components. In particular, Ho is an underutilized element in industry67 even though it is more abundant in the Earth’s crust than other widely mined elements such as Mo, Bi, and precious metals68. Eu is more abundant in the Earth’s crust than Ho and some heavily mined elements such as W and As68. Hence, active exploration of cheaper and easier extraction methods for rare earth elements may make it feasible to include them in mass-produced electronics in the near future. The presence of toxic elements such as Pb and Tl can stand as a barrier against including Tl3PbBr5 in consumer electronics. Since mixed-anionic materials are an emerging class of functional materials, our identification of promising dielectric materials in this family opens up further research opportunities on the rational design of high-performance dielectrics and their experimental characterization.
We also assessed the thermodynamic stability of the new dielectrics by creating a large convex hull diagram containing the best two new dielectrics (HoClO and Eu5SiCl6O4) and several materials commonly used in electronics. The relevance of this analysis is also discussed in detail, along with examples of previously reported high-ϵ dielectrics46,47,48,49,50,51,52,53 that were later found to be unstable when in contact with common electronic component materials such as Si. Our convex hull analysis indicates that both HoClO and Eu5SiCl6O4 are stable against the common electronic materials that we considered.
To understand what features of HoClO, Eu5SiCl6O4, and Tl3PbBr5 make them the best dielectric candidates in this study, we calculated their electronic structures and partial densities of states (Supplementary Fig. 5). Our analysis shows that the top of the valence bands and the bottom of the conduction bands in these compounds consist primarily of contributions from the anions (Cl, Br) and cations (Ho, Eu, Tl), respectively. This indicates that having lighter anions (such as Cl, Br) is advantageous: their valence orbitals, which make up the valence band edge in these compounds, lie at lower energies, giving the relatively larger band gap that is desired in high-ϵ materials.
In addition to identifying high-ϵ dielectrics, we demonstrated a cross-database statistical design workflow for computational materials selection. Datasets from the MP and OQMD repositories are used in this work as training-data and search-space, respectively. The successful identification of new materials from such a workflow is another motivation for actively moving toward the interoperability of materials databases, which is one of the four pillars of the FAIR data principles69 in scientific data management. Better interoperability across databases increases the flexibility with which materials data can be used to solve complex materials problems.
Lastly, this work also stands as an example of the practical implementation of a computational design strategy for property optimization via data-informed material selection. A multi-objective optimization problem (maximizing ϵ and Eg) is converted into a single-objective optimization using statistical methods (maximizing ϵ), combined with an explicit constraint on band gap values (higher Eg), since Eg is already available for all materials in the search-space. This deviation from the ideal, statistically benchmarked multi-objective optimization workflows27 enabled the efficient utilization of resources and resulted in the identification of three high-ϵ dielectrics at the cost of just 17 new DFPT calculations.
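One way to picture the constrained single-objective selection described above is the sketch below. It assumes the expected-improvement criterion from efficient global optimization as the statistical score for ϵ. The text above does not state which criterion was used, so this choice, the band-gap threshold, and all candidate names and numbers are hypothetical.

```python
import math
from statistics import mean, stdev


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mu - best) * cdf + sigma * pdf


def rank_candidates(candidates, eg_min, best_eps):
    """Apply the band-gap constraint, then rank the survivors by EI in eps."""
    scored = []
    for c in candidates:
        if c["eg"] < eg_min:
            continue  # explicit constraint on the already-known band gap
        preds = c["eps_preds"]  # ensemble predictions of eps for this material
        score = expected_improvement(mean(preds), stdev(preds), best_eps)
        scored.append((score, c["name"]))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical candidates: known Eg (eV) and a few ensemble predictions of eps
candidates = [
    {"name": "X1", "eg": 1.0, "eps_preds": [80.0, 90.0, 100.0]},
    {"name": "X2", "eg": 3.0, "eps_preds": [30.0, 40.0, 50.0]},
    {"name": "X3", "eg": 4.0, "eps_preds": [20.0, 22.0, 24.0]},
]
print(rank_candidates(candidates, eg_min=2.0, best_eps=35.0))
```

Because Eg is known for every entry in the search-space, it acts as a filter and needs no model of its own. Only ϵ is treated statistically, which is what keeps the number of new DFPT calculations small.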
Methods
ANN modeling
The individual models in the ANN ensemble consisted of a single hidden layer with the number of neurons in the range of 10². The exact number of neurons varied randomly within a small range (10–30) to avoid any bias that may arise from the model architecture, since the subset of training-data for each ANN was randomly sampled. Each ANN ensemble consisted of 2000 independent ANNs, so the ϵ-distribution for each material consisted of 2000 independent ϵ predictions. A new ANN ensemble was created and trained for each new design cycle to learn the incremented training-data. The Nadam optimizer was used for network optimization during training. Both L2 layer regularization and an early-stopping callback, as implemented in Keras70, were applied to each ANN in the ensemble to prevent over-fitting. On average, it took between 300 and 400 epochs to reach a local minimum of the loss function, where each epoch is one full pass over the training-data to update the internal weights of an ANN. Feature dimensionality reduction prior to the training of ANNs was done using the principal component analysis algorithm implemented in scikit-learn34. Model validation during the training of one randomly chosen ANN model from the 2000 in the second design cycle is plotted in Fig. 3a for reference.
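The way an ensemble trained on random subsets yields a distribution of predictions for each material can be sketched without any neural-network library. In the example below a one-feature least-squares line stands in for each ANN so that the code runs with the Python standard library alone. The subset fraction, the data, and the function names are hypothetical, and no Keras or scikit-learn call from the actual workflow is reproduced.

```python
import random
from statistics import mean, stdev


def fit_line(xs, ys):
    """Least-squares slope and intercept; a stand-in for training one ANN."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def ensemble_predictions(xs, ys, x_new, n_models=2000, subset_frac=0.8, seed=0):
    """One prediction per ensemble member, each fitted on a random subset."""
    rng = random.Random(seed)
    n_sub = max(2, int(subset_frac * len(xs)))
    preds = []
    for _ in range(n_models):
        idx = rng.sample(range(len(xs)), n_sub)  # random training subset
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    return preds


# Hypothetical one-feature training set with a little scatter
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2, 19.0]
preds = ensemble_predictions(xs, ys, x_new=10.0)
print(len(preds), round(mean(preds), 2), round(stdev(preds), 3))
```

The mean and spread of the returned list play the role of the ϵ-distribution for one material. A material far from the training-data would typically show a wider spread across ensemble members.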
DFPT calculations
We performed all DFT calculations using the Vienna Ab initio Simulation Package (VASP)71,72 with potentials derived using the projector-augmented wave73,74 method. We calculated the total dielectric constant (the sum of the electronic and ionic components) for selected materials using DFPT as implemented in VASP. All compounds were fully relaxed before the dielectric calculations. We used an energy cutoff of 520 eV, a k-mesh of 6000 k-points per reciprocal atom, and an energy threshold of 10⁻⁸ eV during the self-consistent calculations. The forces on the atoms after structural relaxation were less than 10⁻³ eV Å⁻¹. We used the generalized gradient approximation75 to approximate the exchange-correlation energies of the electrons. A detailed discussion of the DFPT calculations is provided in the Supplementary Methods section of the Supplementary Material. We performed DFPT calculations on a set of well-known dielectrics and a few rare earth compounds, and benchmarked the results against previously reported results in the literature. These results indicate the reliability of our calculated ϵ values, which are provided in Supplementary Table 2. Specifically, two rare earth oxides (EuO and Ho2O3) and one rare earth halide (EuF2) were benchmarked to test the accuracy of standard DFPT calculations in modeling these compounds. Furthermore, our calculations reveal that no imaginary phonon modes appear in HoClO, Eu5SiCl6O4, and Tl3PbBr5, the best high-ϵ materials identified in this work. More details are provided in Supplementary Table 1 and Supplementary Fig. 4.
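The settings quoted above map onto a VASP input roughly as follows. The sketch writes the stated cutoff and convergence threshold into an INCAR-style string and converts the k-point density into a target mesh size. IBRION = 8 and LEPSILON = .TRUE. are the standard VASP switches for a DFPT dielectric calculation, and GGA = PE selects the PBE functional of ref. 75, but whether the calculations in this work used exactly these tags is an assumption. The function names and the six-atom cell are hypothetical.

```python
import math


def dfpt_incar(encut_ev=520, ediff_ev=1e-8):
    """Build a minimal INCAR-style string for a DFPT dielectric run."""
    tags = {
        "ENCUT": encut_ev,            # plane-wave cutoff in eV (from the text)
        "EDIFF": f"{ediff_ev:.0E}",   # SCF energy threshold in eV (from the text)
        "GGA": "PE",                  # PBE exchange-correlation (assumed tag)
        "IBRION": 8,                  # DFPT with symmetry (assumed tag)
        "LEPSILON": ".TRUE.",         # dielectric tensor output (assumed tag)
    }
    return "\n".join(f"{key} = {value}" for key, value in tags.items())


def target_kpoints(n_atoms, kppra=6000):
    """Approximate number of k-points in the mesh for a given cell size."""
    # "k-points per reciprocal atom" means n_kpoints * n_atoms is about kppra.
    return math.ceil(kppra / n_atoms)


print(dfpt_incar())
print(target_kpoints(n_atoms=6))  # hypothetical six-atom cell
```

For the hypothetical six-atom cell the target is 1000 k-points, which would correspond to a 10 × 10 × 10 mesh if the cell were cubic.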
Data availability
The data used in building statistical design models are open-sourced and available via OQMD and Materials Project databases. Other data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The raw, unformatted code used in this project for statistical materials design is available via GitHub at https://github.com/tachyontraveler/diel-design-scripts/tree/v0.1.0-alpha. The latest versions of the scripts will be made available upon release at https://doi.org/10.5281/zenodo.6515841.
References
Ortiz, R. P., Facchetti, A. & Marks, T. J. High-k organic, inorganic, and hybrid dielectrics for low-voltage organic field-effect transistors. Chem. Rev. 110, 205–239 (2009).
Wang, B. et al. High-k gate dielectrics for emerging flexible and stretchable electronics. Chem. Rev. 118, 5690–5754 (2018).
Kingon, A. I., Maria, J.-P. & Streiffer, S. Alternative dielectrics to silicon dioxide for memory and logic devices. Nature 406, 1032 (2000).
Shevlin, S. A., Curioni, A. & Andreoni, W. Ab initio design of high-k dielectrics: LaxY1−xAlO3. Phys. Rev. Lett. 94, 146401 (2005).
Delugas, P., Fiorentini, V., Filippetti, A. & Pourtois, G. Cation charge anomalies and high-κ dielectric behavior in DyScO3: ab initio density-functional and self-interaction-corrected calculations. Phys. Rev. B 75, 115126 (2007).
Iino, Y. et al. Organic thin-film transistors on a plastic substrate with anodically oxidized high-dielectric-constant insulators. Jpn. J. Appl. Phys. 42, 299 (2003).
Kukli, K. et al. Properties of tantalum oxide thin films grown by atomic layer deposition. Thin Solid Films 260, 135–142 (1995).
Ramajothi, J., Ochiai, S., Kojima, K. & Mizutani, T. Performance of organic field-effect transistor based on poly (3-hexylthiophene) as a semiconductor and titanium dioxide gate dielectrics by the solution process. Jpn. J. Appl. Phys. 47, 8279 (2008).
Lee, M., Youn, Y., Yim, K. & Han, S. High-throughput ab initio calculations on dielectric constant and band gap of non-oxide dielectrics. Sci. Rep. 8, 14794 (2018).
Wilk, G. D., Wallace, R. M. & Anthony, J. High-κ gate dielectrics: current status and materials properties considerations. J. Appl. Phys. 89, 5243–5275 (2001).
Petretto, G. et al. High-throughput density-functional perturbation theory phonons for inorganic materials. Sci. Data 5, 180065 (2018).
Petousis, I. et al. High-throughput screening of inorganic compounds for the discovery of novel dielectric and optical materials. Sci. Data 4, 160134 (2017).
Saal, J. E., Kirklin, S., Aykol, M., Meredig, B. & Wolverton, C. Materials design and discovery with high-throughput density functional theory: the open quantum materials database (OQMD). JOM 65, 1501–1509 (2013).
Kirklin, S. et al. The open quantum materials database (OQMD): assessing the accuracy of DFT formation energies. npj Comput. Mater. 1, 15010 (2015).
Jain, A. et al. The Materials Project: a materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
Giannozzi, P. & Baroni, S. Density-Functional Perturbation Theory, 195–214 (Springer, 2005).
Curtarolo, S. et al. AFLOWLIB.ORG: a distributed materials properties repository from high-throughput ab initio calculations. Comput. Mater. Sci. 58, 227–235 (2012).
Draxl, C. & Scheffler, M. The NOMAD laboratory: from data sharing to artificial intelligence. J. Phys.: Mater. 2, 036001 (2019).
Choudhary, K. et al. High-throughput density functional perturbation theory and machine learning predictions of infrared, piezoelectric, and dielectric responses. npj Comput. Mater. 6, 1–13 (2020).
Pyzer-Knapp, E. O., Li, K. & Aspuru-Guzik, A. Learning from the Harvard Clean Energy Project: the use of neural networks to accelerate materials discovery. Adv. Funct. Mater. 25, 6495–6502 (2015).
Saal, J. E., Oliynyk, A. O. & Meredig, B. Machine learning in materials discovery: confirmed predictions and their underlying approaches. Annu. Rev. Mater. Res. 50, 49–69 (2020).
Park, C. W. & Wolverton, C. Developing an improved crystal graph convolutional neural network framework for accelerated materials discovery. Phys. Rev. Mater. 4, 063801 (2020).
Umeda, Y., Hayashi, H., Moriwake, H. & Tanaka, I. Prediction of dielectric constants using a combination of first principles calculations and machine learning. Jpn. J. Appl. Phys. 58, SLLC01 (2019).
Qu, J., Zagaceta, D., Zhang, W. & Zhu, Q. High dielectric ternary oxides from crystal structure prediction and high-throughput screening. Sci. Data 7, 1–10 (2020).
Morita, K., Davies, D. W., Butler, K. T. & Walsh, A. Modeling the dielectric constants of crystals using machine learning. J. Chem. Phys. 153, 024503 (2020).
Balachandran, P. V., Xue, D., Theiler, J., Hogden, J. & Lookman, T. Adaptive strategies for materials design using uncertainties. Sci. Rep. 6, 19660 (2016).
Gopakumar, A. M., Balachandran, P. V., Xue, D., Gubernatis, J. E. & Lookman, T. Multi-objective optimization for materials discovery via adaptive design. Sci. Rep. 8, 3738 (2018).
Petousis, I. et al. Benchmarking density functional perturbation theory to enable high-throughput screening of materials for dielectric constant and refractive index. Phys. Rev. B 93, 115151 (2016).
Jain, A. K., Mao, J. & Mohiuddin, K. M. Artificial neural networks: a tutorial. Computer 29, 31–44 (1996).
Balachandran, P. V., Young, J., Lookman, T. & Rondinelli, J. M. Learning from data to design functional materials without inversion symmetry. Nat. Commun. 8, 14282 (2017).
Balachandran, P. V. et al. Predictions of new ABO3 perovskite compounds by combining machine learning and density functional theory. Phys. Rev. Mater. 2, 043802 (2018).
Ward, L., Agrawal, A., Choudhary, A. & Wolverton, C. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput. Mater. 2, 16028 (2016).
Ward, L. et al. Including crystal structure attributes in machine learning models of formation energies via Voronoi Tessellations. Phys. Rev. B 96, 024104 (2017).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Belsky, A., Hellenbrandt, M., Karen, V. L. & Luksch, P. New developments in the Inorganic Crystal Structure Database (ICSD): accessibility in support of materials research and design. Acta Crystallogr., Sect. B: Struct. Sci. 58, 364–369 (2002).
Groh, D. et al. First-principles study of the optical properties of BeO in its ambient and high-pressure phases. J. Phys. Chem. Solids 70, 789–795 (2009).
Xue, D. et al. Accelerated search for materials with targeted properties by adaptive design. Nat. Commun. 7, 11241 (2016).
Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).
Solomou, A. et al. Multi-objective Bayesian materials discovery: application on the discovery of precipitation strengthened NiTi shape memory alloys through micromechanical modeling. Mater. Des. 160, 810–827 (2018).
Talapatra, A. et al. Autonomous efficient experiment design for materials discovery with Bayesian model averaging. Phys. Rev. Mater. 2, 113803 (2018).
Templeton, D. & Dauben, C. H. Crystal structures of rare earth oxychlorides. J. Am. Chem. Soc. 75, 6069–6070 (1953).
Hölsä, J., Lahtinen, M., Lastusaari, M., Valkonen, J. & Viljanen, J. Stability of rare-earth oxychloride phases: bond valence study. J. Solid State Chem. 165, 48–55 (2002).
Basiev, T. et al. Hydration of strontium chloride and rare-earth element oxychlorides. Russ. J. Appl. Chem. 78, 1035–1037 (2005).
Jacobsen, H., Meyer, G., Schipper, W. & Blasse, G. Synthesis, structures and luminescence of two new Europium (II) Silicate-Chlorides, Eu2SiO3Cl2 and Eu5SiO4Cl6. Z. Anorg. Allg. Chem. 620, 451–456 (1994).
Kageyama, H. et al. Expanding frontiers in materials chemistry and physics with multiple anions. Nat. Commun. 9, 1–15 (2018).
Atanassova, E. & Spassov, D. X-ray photoelectron spectroscopy of thermal thin Ta2O5 films on Si. Appl. Surf. Sci. 135, 71–82 (1998).
Schlom, D. G. & Haeni, J. H. A thermodynamic approach to selecting alternative gate dielectrics. MRS Bull. 27, 198–204 (2002).
Alers, G. et al. Intermixing at the tantalum oxide/silicon interface in gate dielectric structures. Appl. Phys. Lett. 73, 1517–1519 (1998).
Perego, M., Seguini, G., Scarel, G., Fanciulli, M. & Wallrapp, F. Energy band alignment at TiO2/Si interface with various interlayers. J. Appl. Phys. 103, 043509 (2008).
McCurdy, P. R., Sturgess, L. J., Kohli, S. & Fisher, E. R. Investigation of the PECVD TiO2–Si (1 0 0) interface. Appl. Surf. Sci. 233, 69–79 (2004).
George, J. P. et al. Preferentially oriented BaTiO3 thin films deposited on silicon with thin intermediate buffer layers. Nanoscale Res. Lett. 8, 1–7 (2013).
Hu, X. et al. The interface of epitaxial SrTiO3 on silicon: in situ and ex situ studies. Appl. Phys. Lett. 82, 203–205 (2003).
Goncharova, L. et al. Interface structure and thermal stability of epitaxial SrTiO3 thin films on Si (001). J. Appl. Phys. 100, 014912 (2006).
Barber, C. B., Dobkin, D. P. & Huhdanpaa, H. The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 22, 469–483 (1996).
Sun, W. et al. The thermodynamic scale of inorganic crystalline metastability. Sci. Adv. 2, e1600225 (2016).
Wu, Y., Lazic, P., Hautier, G., Persson, K. & Ceder, G. First principles high throughput screening of oxynitrides for water-splitting photocatalysts. Energy Environ. Sci. 6, 157–168 (2013).
Zakutayev, A. et al. Theoretical prediction and experimental realization of new stable inorganic materials using the inverse design approach. J. Am. Chem. Soc. 135, 10048–10054 (2013).
Pal, K. et al. Accelerated discovery of a large family of quaternary chalcogenides with very low lattice thermal conductivity. npj Comput. Mater. 7, 1–13 (2021).
Zhou, J.-P. et al. Properties of high k gate dielectric gadolinium oxide deposited on Si (1 0 0) by dual ion beam deposition (DIBD). J. Cryst. Growth 270, 21–29 (2004).
Kwo, J. et al. Properties of high κ gate dielectrics Gd2O3 and Y2O3 for Si. J. Appl. Phys. 89, 3920–3927 (2001).
Robertson, J. High dielectric constant gate oxides for metal oxide Si transistors. Rep. Prog. Phys. 69, 327 (2005).
Keller, H.-L. Darstellung und kristallstruktur von hoch-Tl3PbBr5. J. Less-Common Met. 78, 281–286 (1981).
Denysyuk, N. et al. Electronic structure of the high-temperature tetragonal Tl3PbBr5 phase. J. Alloy. Compd. 576, 271–278 (2013).
Ferrier, A., Velázquez, M., Portier, X., Doualan, J.-L. & Moncorgé, R. Tl3PbBr5: a possible crystal candidate for middle infrared nonlinear optics. J. Cryst. Growth 289, 357–365 (2006).
Qiu, Y. & Suh, S. Economic feasibility of recycling rare earth oxides from end-of-life lighting technologies. Resour. Conserv. Recycl. 150, 104432 (2019).
Amato, A. et al. Sustainability analysis of innovative technologies for the rare earth elements recovery. Renew. Sustain. Energy Rev. 106, 41–53 (2019).
Thornton, B. F. & Burdette, S. C. Homely holmium. Nat. Chem. 7, 532–532 (2015).
Yaroshevsky, A. Abundances of chemical elements in the earth’s crust. Geochem. Int. 44, 48–55 (2006).
Wilkinson, M. D. et al. The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 1–9 (2016).
Chollet, F. et al. Keras. https://keras.io (2015).
Kresse, G. & Furthmüller, J. Efficiency of ab-initio total energy calculations for metals and semiconductors using a plane-wave basis set. Comput. Mater. Sci. 6, 15–50 (1996).
Kresse, G. & Furthmüller, J. Efficient iterative schemes for ab initio total-energy calculations using a plane-wave basis set. Phys. Rev. B 54, 11169 (1996).
Kresse, G. & Joubert, D. From ultrasoft pseudopotentials to the projector augmented-wave method. Phys. Rev. B 59, 1758 (1999).
Blöchl, P. E. Projector augmented-wave method. Phys. Rev. B 50, 17953 (1994).
Perdew, J. P., Burke, K. & Ernzerhof, M. Generalized gradient approximation made simple. Phys. Rev. Lett. 77, 3865 (1996).
Acknowledgements
This work was funded by the SAMSUNG Global Research Outreach Program, and the U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD) award 70NANB14H012. We acknowledge the computing resources provided by (1) the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231, (2) Quest high-performance computing facility at Northwestern University which is jointly supported by the Office of the Provost, the Office for Research, and Northwestern University Information Technology, and (3) the Extreme Science and Engineering Discovery Environment (National Science Foundation Contract ACI-1548562).
Author information
Contributions
A.G. devised computational strategies, wrote the manuscript, and conducted the calculations. K.P. provided important hands-on guidance in calculations and theoretical understanding. A.G. and C.W. modeled the project and analyzed the results. All authors have reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gopakumar, A., Pal, K. & Wolverton, C. Identification of high-dielectric constant compounds from statistical design. npj Comput Mater 8, 146 (2022). https://doi.org/10.1038/s41524-022-00832-5