Introduction

In the early 1970s it was found empirically that the band gap of cubic semiconductors with the NaCl structure is inversely proportional to the square of the lattice constant, i.e., that the band gap should be proportional to \(V^{-2/3}\) (refs. 1,2), where V is the volume of the unit cell. This variation of the band gap with respect to the volume is commonly measured by the deformation potential

$${{{\Xi }}}_{{{{\rm{gap}}}}}=V\frac{{{{\rm{d}}}}{E}_{{{{\rm{gap}}}}}}{{{{\rm{d}}}}V}=\frac{{{{\rm{d}}}}{E}_{{{{\rm{gap}}}}}}{{{{\rm{d}}}}\ln (V)}.$$
(1)

One can equally define a response function with respect to the pressure P as

$${a}_{P}=\frac{{{{\rm{d}}}}{E}_{{{{\rm{gap}}}}}}{{{{\rm{d}}}}P}=-\frac{{{{\Xi }}}_{{{{\rm{gap}}}}}}{K},$$
(2)

that is trivially related to Ξgap by the bulk modulus \(K=-{{{\rm{d}}}}P/{{{\rm{d}}}}\ln (V)\). Later works studied deformation potentials of several other semiconductors, e.g., for wurtzite AlN, GaN, and InN3,4. A more complete list of deformation potentials can be found in ref. 5 (including a large number of zincblende and wurtzite semiconductors) and in ref. 6.
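The relation between Eqs. (1) and (2) follows from the chain rule together with the definition of the bulk modulus. The numerical values in the second line are purely illustrative (they are not taken from the dataset) and only show the unit conversion:

$$\begin{aligned}
a_P &= \frac{\mathrm{d}E_{\mathrm{gap}}}{\mathrm{d}P} = \frac{\mathrm{d}E_{\mathrm{gap}}}{\mathrm{d}\ln (V)}\,\frac{\mathrm{d}\ln (V)}{\mathrm{d}P} = \Xi_{\mathrm{gap}}\left(-\frac{1}{K}\right) = -\frac{\Xi_{\mathrm{gap}}}{K},\\
\Xi_{\mathrm{gap}} &= -10\ \mathrm{eV},\quad K = 100\ \mathrm{GPa}\quad \Rightarrow\quad a_P = 0.1\ \mathrm{eV/GPa} = 100\ \mathrm{meV/GPa}.
\end{aligned}$$

A negative deformation potential therefore corresponds to a gap that opens under pressure, since K is positive for any mechanically stable material.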

Theoretical models have been developed to explain the chemical trend of the values of deformation potentials in binary semiconductors with diamond and zincblende crystal structures7. A simple tight-binding model considers the minimum of the conduction band (CBM) as consisting of anti-bonding s states and the maximum of the valence band (VBM) as consisting of bonding p states. In that case, one expects a blue shift of the CBM with decreasing volume due to (i) stronger repulsion for the anti-bonding states with decreasing bond length and (ii) larger kinetic energy. The second effect should be identical for different systems; however, one expects the magnitude of (i) to decrease for materials whose anions have deeper s-levels, like halogens. For the VBM, the kinetic effect is similar, but the bonding p states shift to lower energies for shorter bonds. Therefore, due to the cancellation of these two effects, one expects a smaller absolute value of the valence term ΞVBM with respect to that of the conduction term ΞCBM (ref. 7), with the following definition of the valence and conduction deformation potentials

$${{{\Xi }}}_{{{{\rm{CBM}}}}/{{{\rm{VBM}}}}}=V\frac{{{{\rm{d}}}}{E}_{{{{\rm{CBM}}}}/{{{\rm{VBM}}}}}}{{{{\rm{d}}}}V}=\frac{{{{\rm{d}}}}{E}_{{{{\rm{CBM}}}}/{{{\rm{VBM}}}}}}{{{{\rm{d}}}}\ln (V)},$$
(3)

where ECBM/VBM are the energies of the conduction band minimum and the valence band maximum, respectively. Following this model, and using Ξgap = ΞCBM − ΞVBM, one can explain the experimentally observed trend of increasing gap with decreasing volume mentioned above.

Of course, one should keep in mind that this model was developed for a particular family of materials, and cannot be easily generalized to arbitrary semiconductors. In fact, the existence of exceptions has already been known for a long time. For example, in lead chalcogenides the V−2/3 scaling of the band gap is not verified8,9: the value of the band gap decreases going from PbS, to PbTe, and then to PbSe10,11, even though the lattice constant of PbTe is much larger than that of PbSe.

The large number of studies on deformation potentials is not surprising considering the importance of the variation (or the stability) of the band gap in heterostructures inside modern opto-electronic devices, such as LEDs and transistors. One of the most basic sources of strain is lattice mismatch at interfaces or a difference in thermal expansion coefficients. In such cases large deformation potentials can be an undesirable property. In fact, in an epitaxial configuration the different lattice parameters of the materials need to be accommodated, introducing strain in the junction3. Strain, besides being a possible source of structural defects, can modify some fundamental properties of the materials such as the position of the valence-band maximum and conduction-band minimum, and consequently the band gap. This effect must be carefully considered in situations where gap tuning is imperative, as in InGaN/GaN photovoltaic devices and LEDs12. For some applications a controlled mismatch might actually be desired, giving rise to so-called strain engineering13.

Strain engineering is attracting growing attention, particularly for 2D materials14 that exhibit large deformation potentials and better tolerance to mechanical strain than their bulk counterparts. For example, in tunneling field effect transistors made of MoS2/BN heterojunctions15 quantum band-to-band tunneling replaces thermal injection, with tunneling currents depending directly on the band-gap size. As a consequence, the on–off switching is considerably faster than in conventional field effect transistors. Because the size of the band gap can be tuned by the application of strain, strain engineering is a promising way of adjusting the properties of such devices. Controlled strain can also induce indirect–direct band-gap transitions in semiconductors, for example in SiGe alloys16. If the material has a large gap deformation potential this would imply enabling, e.g., light emission by strain and changing in a controlled way the color of the emitted light. Previous investigations also showed that the gap variation with respect to strain plays a key role in acoustic carrier scattering (see ref. 17 and references therein), with important consequences, e.g., for thermoelectrics.

To the best of our knowledge, there is no systematic investigation of gap deformation potentials for a large class of crystal structures and a variety of chemical compositions. As such, at the beginning of our investigation we did not know what makes a material ’normal’, nor did we know which are the ’exceptional’ materials that exhibit extreme values of deformation potentials. We therefore decided to answer this open question by performing calculations of gap deformation potentials for ~5300 semiconductors and insulators. The resulting dataset is then used for machine learning regression of deformation potentials and to understand which physical and chemical characteristics are correlated with extreme values of this property.

The remainder of this paper is organized as follows. In section “Results and discussions” we discuss the calculated deformation potentials. First we try to understand the data by analyzing the correlations between the deformation potentials and other properties of the systems. We then develop machine learning models capable of interpreting and predicting the data and we analyze in more detail some specific materials that we found to have exceptional values for the deformation potentials, followed by a short conclusion. In section “Methods” we present the details of the calculations involved, along with a description of the dataset used.

Results and discussions

Volume deformation potentials

The variation of the band gap with the volume depends on two factors: the variation of the cell volume and the change in the internal parameters (ui, including the positions of the atoms in the unit cell, the cell angles, and the ratios between the cell parameters), i.e.,

$$\frac{{{{\rm{d}}}}{E}_{{{{\rm{gap}}}}}}{{{{\rm{d}}}}\ln (V)}=\frac{\partial {E}_{{{{\rm{gap}}}}}}{\partial \ln (V)}+\mathop{\sum}\limits_{i}\frac{\partial {E}_{{{{\rm{gap}}}}}}{\partial {u}_{i}}\frac{\partial {u}_{i}}{\partial \ln (V)}.$$
(4)

The first term is rather straightforward to calculate, and is therefore well suited for high-throughput studies. The second term requires geometry optimizations of both the cell parameters and the positions of the ions at constant volume (or pressure) and is much more demanding to calculate. We also know that, for group IV, III–V, and II–VI semiconductors, the effect of the second term is about 2% of the first term7, meaning that it can (to some extent) be disregarded. However, this behavior is not general, and we can expect that for some systems the second term gives a more noticeable contribution to the deformation potentials. As will be discussed later, the size of the error is relevant for small to medium deformation potentials but it does not change the qualitative description of the largest deformation potentials in our set.

With this in mind, we decided to use a two-step approach. First, we screen all systems using only the first term of Eq. (4) as an estimator of the deformation potential

$${\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}=\frac{\partial {E}_{{{{\rm{gap}}}}}}{\partial \ln (V)}.$$
(5)

We calculate this value using a two-point finite difference approximation to the derivative, obtained by straining the material isotropically and keeping the internal coordinates of the atoms unchanged. To strain the materials we compress each lattice vector by 1%, i.e., after shrinking, the new lattice vectors are simply \({{{{\bf{a}}}}}_{i}^{\prime}={{{{\bf{a}}}}}_{i}(1-\gamma )\), with γ = 0.01. By construction this is equivalent to a reduction in the volume of approximately 3γ = 0.03 while keeping the cell shape (angles and relative sizes of lattice vectors) constant. Of course this approach does not take into consideration, at least at this stage, the possible anisotropic response of the crystal, which can be quite relevant to its physics. Therefore, for all interesting compounds we validate the results by performing a full calculation of Ξgap. In the following, we will see that this procedure leads to only a couple of false positives in the list of ’exceptional’ materials.
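The two-point estimate described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the workflow used for the dataset: the function names are hypothetical, the band gaps in the example are made-up numbers, and the exact logarithmic volume change 3 ln(1 − γ) is used in the denominator (the text only states that it is approximately −3γ).

```python
import math


def strained_volume(v0: float, gamma: float = 0.01) -> float:
    """Volume after scaling every lattice vector by (1 - gamma)."""
    return v0 * (1.0 - gamma) ** 3


def gap_estimator(e_gap_eq: float, e_gap_strained: float, gamma: float = 0.01) -> float:
    """Two-point finite-difference estimate of dE_gap / dln(V), in eV.

    e_gap_eq:       band gap at the equilibrium volume (eV)
    e_gap_strained: band gap after isotropic compression with fixed
                    internal coordinates (eV)
    """
    # ln(V'/V0) = 3 ln(1 - gamma), close to -3*gamma for small gamma
    delta_ln_v = 3.0 * math.log(1.0 - gamma)
    return (e_gap_strained - e_gap_eq) / delta_ln_v


# Hypothetical example: the gap opens by 0.06 eV under 1% linear compression.
xi_tilde = gap_estimator(e_gap_eq=1.00, e_gap_strained=1.06)
print(f"volume ratio: {strained_volume(1.0):.6f}")
print(f"estimated deformation potential: {xi_tilde:.2f} eV")
```

A gap that increases under compression gives a negative value, which matches the sign convention of Eq. (1).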

Distribution of values

In Figs. 1 and 2 we present distributions of estimators of the deformation potential (in gray, as given by Eq. (5)), and in Table 1 we show the statistical measures that characterize them. As we can see, the overall distribution is not centered at zero, but at −1.9 eV. Furthermore, the curve is highly skewed towards negative values of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\). Consequently, for the majority of materials the band gap increases when they are compressed, which is the commonly expected behavior. We can also see that for negative deformation potentials the curve has a fat tail that extends to −15 eV, while for positive values the curve goes to zero much faster. From a visual inspection of the plot we can define as ’normal’ those crystalline compounds that have a deformation potential between around −5 and 3 eV, and as ’exceptional’ those that exhibit values smaller than −10 eV or larger than 4 eV (the latter approximately defined from the range of relative frequencies below 0.5%).
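The thresholds quoted above translate directly into a small classification helper. The sketch below is only illustrative: the function name and the "intermediate" label for values falling between the two ranges are assumptions introduced here, since the text defines only the ’normal’ and ’exceptional’ classes.

```python
def classify_deformation_potential(xi_ev: float) -> str:
    """Label a gap deformation potential (in eV) using the thresholds in the text."""
    if xi_ev < -10.0 or xi_ev > 4.0:
        return "exceptional"
    if -5.0 <= xi_ev <= 3.0:
        return "normal"
    # Values between the two ranges are not named in the text;
    # "intermediate" is a label chosen for this sketch.
    return "intermediate"


for value in (-13.71, -7.57, -1.9, 3.5):
    print(value, classify_deformation_potential(value))
```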

Fig. 1: Histogram of deformation potential estimator.

Histograms of the estimators of the gap deformation potential (\({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\)) for different chemical systems (All, Chalcogenides, and Others as defined in the main text). The values are calculated within the PBE approximation.

Table 1 Statistical analysis of distribution.

To make a preliminary analysis of how the chemistry influences the deformation potentials, we also plotted the distribution for subgroups of materials containing oxides, halides, oxyhalides, chalcogenides, and the remaining materials (see Figs. 1 and 2). We can see that the fat tail for negative values of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) is mostly due to oxides, halides, and oxyhalides, with oxides yielding the largest contribution. On the other hand, materials with large positive values are almost exclusively halides. In any case, the maximum of the distribution seems to be rather insensitive to the chemistry, and in all subgroups we find materials with both positive and negative values of the deformation potential.

Fig. 2: Histogram of deformation potential estimator (cont.).

Histograms of the estimators of the gap deformation potential (\({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\)) for different chemical systems (Oxides, Halides and Oxyhalides as defined in the main text). The values are calculated within the PBE approximation.

We plot in Fig. 3 a density plot of PBE band gaps as a function of the corresponding deformation potential. Most data points are concentrated in the region of band gaps smaller than around 2 eV with a slow decay (fat tail) for larger band gaps. From the plot, we can see that there exists only a weak correlation between the two quantities, so we do not expect the band gap to be a good indicator of the magnitude (or of the sign) of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\).

Fig. 3: \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) vs. gap heat map.

Density plot of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) with respect to the band gap. The values are calculated within the PBE approximation.

Finally, we would like to discuss the accuracy of our results in comparison to experimental values. Unfortunately, it is hard to find a large variety of experimental values for the hydrostatic deformation potentials in the literature. Most of the available values are for simple semiconductors and their alloys (which we do not consider here). The few materials for which we can compare our estimator to reported experimental values are in Table 2. Since these compounds are mostly cubic with direct gaps, the estimator should match the real value of Ξ. We see that the PBE results are in very good agreement with experiment, with a tendency to overestimate the deformation potentials. We emphasize, however, that the set of materials in Table 2 is very small, and that this behavior may not hold for other more complicated compounds.

Table 2 Comparison to experimental data.

Correlations

We further investigate the correlation of the estimator of the gap deformation potential with other materials properties. This is important in view not only of the interpretation of the results, but also of the machine learning models that we develop in the next section. We considered possible correlations with the volume of the unit cell, the band gap, several elastic properties (bulk and shear moduli and Poisson ratios, obtained from the Materials Project database), the space group of the crystal structure, the net charges of the atoms, and the mean hole (\({m}_{{{{\rm{h}}}}}^{* }\)) and electron (\({m}_{{{{\rm{e}}}}}^{* }\)) effective masses. Concerning the calculations of charges, we applied both Bader (refs. 18,19) and DDEC6 (ref. 20) analysis, while to get effective masses we interpolated the eigenvalues at regular k-points using BoltzTraP2 (refs. 21,22). Furthermore, we considered a series of properties of the chemical elements that compose each compound using Matminer23. For each property we used the maximum, minimum, average and standard deviation of the values of the constituent elements, as is common in machine learning applications24.
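To make the composition-based features concrete, the following sketch computes the four statistics for one elemental property of one compound. It is not the Matminer implementation: the helper name is hypothetical, each atom in the formula unit is counted once (a stoichiometry-weighting choice made here), and the population standard deviation is used; the actual featurizer may differ on both points.

```python
from statistics import mean, pstdev


def composition_features(per_atom_values):
    """Max, min, mean and standard deviation of an elemental property."""
    values = list(per_atom_values)
    return {
        "max": max(values),
        "min": min(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation (assumed choice)
    }


# Row of the periodic table for each element of SrSnO3.
period_row = {"Sr": 5.0, "Sn": 5.0, "O": 2.0}
formula = {"Sr": 1, "Sn": 1, "O": 3}

# Expand the formula unit so that every atom contributes one value.
per_atom = [period_row[el] for el, count in formula.items() for _ in range(count)]
features = composition_features(per_atom)
print(features)
```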

In Table 3 we list the Spearman25 and Pearson25 correlation coefficients between the deformation potential \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) and various materials properties. We only list properties for which at least one correlation coefficient is larger than 0.2. A full list can be found in the Supplementary Information.

Table 3 Correlation analysis.

The first thing to notice is that we do not find any strong correlation between the value of the deformation potential and any other property. The largest value we find in Table 3 is the Spearman anticorrelation with the effective hole mass, −0.42. This is most likely due to outliers reducing the Pearson correlation while leaving the Spearman correlation unchanged. Therefore, we can conclude that the outliers, i.e., materials with exceptionally large (positive) deformation potentials, have abnormally low hole effective masses. As we will see in the following, this happens because most of the materials with record positive deformation potentials are halide perovskites that have, indeed, atypically low hole masses26. We think that hole effective masses are important because their value depends strongly on the localization of bonding electrons, and therefore on the specific bond type and bond length, e.g. through changes of the hybridization of the atomic states that form the top of the valence band. In this sense, hole effective masses are correlated with both bond lengths and band widths (and indirectly with band-gap sizes). In view of these considerations, it is plausible that hole masses are good features for predicting deformation potentials.
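The different sensitivity of the two coefficients to outliers can be illustrated with a self-contained sketch. The data below are synthetic and chosen only to show the effect; they are not taken from Table 3, and the helper functions are plain-Python stand-ins for library routines.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, monotonically decreasing data with one extreme point.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1.0, -2.0, -3.0, -4.0, -5.0, -60.0]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

Because the ordering of the points is unaffected by the size of the outlier, the rank-based coefficient stays at its extreme value while the Pearson coefficient is pulled towards zero.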

Next we find correlations with the mean value of the row of the periodic table and of the atomic mass, that effectively describe the position in the periodic table of the constituent elements. These are generally known to be effective descriptors in composition-based machine learning models as they reveal a lot about the electronic structure of the elements. Only then do we arrive at the correlation with the unit cell volume and anti-correlation with the band gap. We also find that DDEC6 charges are anti-correlated with the deformation potential, much more than Bader charges. This can be attributed to the fact that they attempt to represent accurately the electrostatic interaction in materials, while Bader charges do not achieve this objective27. The (Hill’s averaged) bulk modulus and shear modulus also make an appearance in Table 3 but they show a very weak correlation. This is not unexpected as several mechanically similar materials with widely differing electronic properties exist in the dataset.

Machine learning

To better understand our results, we trained a simple explainable machine learning model to predict deformation potentials. We used model agnostic supervised local explanations (MAPLE)28 in combination with random forests29. MAPLE models provide local explanations, feature selection, locally linear models and at least the accuracy of random forests28. As features we used all material and chemical properties that we considered in the previous section.

Using ten-fold cross-validation we achieved a mean absolute error of 0.83 ± 0.04 eV. This error should be compared to the mean absolute value and standard deviation of the cross-validation test sets of 2.54 and 2.74 eV. This means that the machine learning model has only a limited predictive capability for the deformation potential, which is perhaps expected due to the rather weak correlations found in Table 3. Also, the features selected (automatically) by MAPLE as the most relevant for the prediction (see Table 4) are, not surprisingly, found in Table 3. In view of these results, we conclude that simple chemical intuition, embodied in our interpretable machine learning models, is insufficient to satisfactorily explain the nature of the deformation potential of a material.
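The way a figure such as "0.83 ± 0.04 eV" is obtained from ten-fold cross-validation can be sketched as follows. This is a generic illustration, not the MAPLE pipeline: the model is replaced by a trivial placeholder that predicts the mean of the training targets, the data are synthetic, and quoting the spread as the sample standard deviation over folds is an assumption.

```python
import random
from statistics import mean, stdev


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mae(targets, k=10, seed=0):
    """Mean and standard deviation over folds of the test-set MAE.

    Placeholder model: predicts the mean of the training targets.
    A real study would fit a regressor on the training folds instead.
    """
    fold_maes = []
    for test_fold in k_fold_indices(len(targets), k, seed):
        held_out = set(test_fold)
        prediction = mean(t for i, t in enumerate(targets) if i not in held_out)
        fold_maes.append(mean(abs(targets[i] - prediction) for i in test_fold))
    return mean(fold_maes), stdev(fold_maes)


# Synthetic targets standing in for deformation potentials (eV).
rng = random.Random(1)
synthetic_targets = [rng.gauss(-1.9, 2.7) for _ in range(200)]
mae_mean, mae_std = cross_validated_mae(synthetic_targets)
print(f"MAE = {mae_mean:.2f} ± {mae_std:.2f} eV")
```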

Table 4 Feature importance.

We also trained a machine learning model with better representation ability, to check if a more complex model is capable of reproducing the deformation potentials. We used a crystal graph convolutional network (CGCNN)30 with an embedding size of 64 for the atomic representations before pooling, 128 after pooling, and three message passing steps. The network was trained for 1000 epochs with Adam31, a learning rate of 0.01 and a batch size of 64. As we tested larger and smaller network sizes with no improvement to the results, all other parameters were left at their default values in the CGCNN implementation by Xie et al. We used a split of 0.8/0.1/0.1 for training, validation, and testing, and the model with the best validation error was selected for testing. The mean absolute error for the training set at the end of the training varied between 0.2 and 0.35 eV for different runs. This corresponds to roughly half the validation error, which we consider an acceptable level of overfitting. The final mean absolute error for the test set was 0.81 eV, not significantly better than the error obtained with MAPLE. Therefore, we assume that insufficient data was available for the CGCNN model to improve its performance over the MAPLE model. Although the error is considerable, the model is still able to efficiently identify the exceptional materials (according to the definition given before), based only on the structure. To demonstrate this we checked which materials from Table 5 were present in the test set and confirmed whether they were correctly predicted as compounds with a high absolute deformation potential by the CGCNN model. We repeated this process for four more random data set selections, finding that, in the three cases when materials from Table 5 were present, 4 out of 6, 4 out of 4, and 3 out of 4 were correctly identified as ’extreme’ materials. We also averaged the test errors of these five runs to arrive at an MAE of 0.82 ± 0.05 eV.
The data and code to reproduce the machine learning models are available at https://github.com/hyllios/utils/tree/main/models/gap_deformation_potentials.

Table 5 Full deformation potential for specific entries.

Specific materials

After having performed an overall analysis of the results, we select the materials with the highest absolute \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) for further analysis. We start by computing the full deformation potential, Ξgap, i.e. including the effects of the change in internal parameters under pressure, for the 15 materials with the largest (positive) and the 15 materials with the lowest (most negative) deformation potentials. In practice this was done by straining the lattice vectors isotropically by ±1%, and by performing a geometry optimization at fixed volume. For each optimized structure, we obtained the respective band structure via interpolation, using the BoltzTraP2 package (refs. 21,22). From the resulting values of the band gap at the three volumes (compressed, stretched, and fully optimized) we computed Ξgap, using both the PBE and mBJ approximations for the exchange-correlation functional. The resulting values are shown in Table 5 together with the corresponding Materials Project ID for an unequivocal identification of the material.
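One way to extract a deformation potential from band gaps at three volumes is a least-squares straight-line fit of the gap against ln(V). The text does not state which numerical recipe was used, so the sketch below is an assumption for illustration only; the function name and the numbers in the example are hypothetical.

```python
import math


def slope_vs_ln_volume(volumes, gaps):
    """Least-squares slope of E_gap versus ln(V), i.e. an estimate of Xi_gap (eV)."""
    ln_v = [math.log(v) for v in volumes]
    mean_x = sum(ln_v) / len(ln_v)
    mean_y = sum(gaps) / len(gaps)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(ln_v, gaps))
    denominator = sum((x - mean_x) ** 2 for x in ln_v)
    return numerator / denominator


# Hypothetical material: gaps generated from an exactly linear law in ln(V)
# with slope -12 eV, at -1%, 0% and +1% linear strain.
v0 = 60.0  # equilibrium volume in cubic angstrom (made-up value)
volumes = [v0 * 0.99 ** 3, v0, v0 * 1.01 ** 3]
gaps = [4.0 - 12.0 * math.log(v / v0) for v in volumes]
xi_gap = slope_vs_ln_volume(volumes, gaps)
print(f"fitted deformation potential: {xi_gap:.2f} eV")
```

In a real calculation the three gaps would come from structures whose internal parameters were relaxed at each fixed volume, which is what distinguishes Ξgap from the estimator of Eq. (5).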

By comparing the values of ΞPBE to the values of the \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) we see that these are quite close for the vast majority of the materials, showing that \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) is on average a good estimator of the deformation potential. There are some exceptions, however, and we find a few false positives in the table, the most striking of which are LiAsF6, LiPF6, Tl3AsF6, and Rb3SbF6. This is not unexpected, as there are materials where we can observe a strong variation of internal parameters with pressure32,33.

We also see that in most cases the PBE and mBJ approximations lead to similar values for the deformation potential, with discrepancies that often lie below 1 eV. However, it turns out that for some systems, and in particular for a few systems with extreme positive deformation potentials, these functionals yield different results. This happens for CsSnCl3, RbSnBr3, RbSnCl3, Cs2TlSbF6, K2InSbCl6, and CsGeBr3. At the moment, it is not clear what the source of this difference is, nor which functional is best, due to the lack of experimental results for the deformation potentials of these specific systems.

In the following we analyze in more detail the compounds included in Table 5. Band structures and state-resolved densities of states for all these materials can be found in the Supplementary Information.

  a.

    GeO2 Rutile germanium oxide is a well-known material which, thanks to its optical properties34 finds applications in lens and optic-fiber manufacturing35. It has been the focus of several studies in the literature, being sometimes described as an ’ultra-wide band gap’ material34,36, presenting a direct gap of 4.68 eV37. The sensitivity of the electronic properties of GeO2 with respect to pressure is also known. In ref. 38 a PBE (G0W0) value of −13.05 eV (−17.30 eV) for Ξgap was found by studying the gap variation with volume (at fixed c/a ratio), which is in good agreement with our estimator of −13.71 eV. Note, however, that the assumption of constant c/a ratio is not representative of the actual response of the material under pressure39. In fact, relaxing this assumption decreases the PBE value by around 25% (see Table 5).

  b.

    SrSnO3, BaSr(SnO3)2, Sr2AlSbO6, and Sr2GaSbO6 Gathering SrSnO3, BaSr(SnO3)2, Sr2AlSbO6 and Sr2GaSbO6 we obtain a group of perovskites and double perovskites with large deformation potentials, some of which were already reported in the literature40,41. Besides SrSnO3, we note that BaSnO3 also appears in our study with a considerable deformation potential (\({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}=-11.69\) eV). In refs. 40,41 it was already recognized that SrSnO3 and BaSnO3 exhibit a very large variation of the band gap with volume, while the gap is not very sensitive to other types of strain. This behavior could be well modeled using a Fermi gas, and therefore the authors describe the strong variation of the gap as a signature of simple s-electron behavior in these complex oxides41.

    The mixing of Sr with Ba leads to the mixed perovskite BaSr(SnO3)2, which displays a \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) intermediate between those of the two parent phases. A similar behavior can also be verified in other mixed perovskites, e.g. KNbO3 (−1.94 eV), NaNbO3 (−4.60 eV), and KNa(NbO3)2 (−2.49 eV). This observation hints at the possibility of tuning deformation potentials of materials via cation mixing.

    For other double perovskites such as Sr2AlSbO6 and Sr2GaSbO6, the parent ternary perovskites SrSbO3 and SrAlO3 are thermodynamically unstable (energy above hull larger than 50 meV/atom), while the structure of SrGaO3 is not included in Materials Project. Therefore, their \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) are not calculated in the present work. Another related double perovskite, Ba2InSbO6, also displays a highly negative deformation potential of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}=-11.17\) eV.

    For these perovskites, the values of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) differ slightly between antimonates and their stannate counterparts. However, there is a clear trend of less negative \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) in double perovskites going down group III from Al to In. Moreover, when substituting Sr with heavier Ba, the absolute value of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) also decreases. This indicates that in (double) perovskites that share a common anion and have cations in the same elemental group, the heavier the cation the less negative is \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\). Similar trends are found in ref. 42 for ABX3 halide perovskites.

  c.

    Al2O3 α-Al2O3 is the alumina polymorph corresponding to the mineral corundum. As sapphire it finds applications as a window material in high pressure experiments, while its doped counterpart ruby is used as a pressure indicator in diamond anvils. Previous calculations of gap versus pressure43 already remarked its extraordinary behavior: a volume gap deformation potential of −12.51 eV was estimated in good agreement with the present results. The values of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) for isomorphic α-Ga2O3 and α-In2O3 are respectively −10.45 and −6.83 eV, indicating that the deformation potential increases from Al to In, in agreement with the trend in perovskites discussed above.

  d.

    BaGe(PO4)2 and SrGe(PO4)2 BaGe(PO4)2 and SrGe(PO4)2 are germanium double orthophosphates with crystal structures resembling that of the mineral yavapaiite (KFe(SO4)2, space group C2/m). They can be used as ceramic materials for microwave applications44. These two compounds turn out to have much lower absolute values of Ξgap than of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\), and therefore do not exhibit exceptional values of the deformation potential.

  e.

    TeWO6 TeWO6 crystallizes in the \(Fm\bar{3}m\) space group and exhibits a record negative deformation potential (among the materials we studied). The \(R\bar{3}\) phase of TeWO6 (the same as LiAsF6 and LiPF6) has a less negative \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) of −7.57 eV. It is also interesting to remark that there are some tellurium double perovskites with a large negative \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\), such as CaTeBa2O6 (−11.20 eV) and MgTePb2O6 (−11.18 eV). However the gaps of their tungsten counterparts are much less sensitive to compression, e.g. CaWBa2O6 (−3.36 eV) and MgWPb2O6 (−2.02 eV).

  f.

    LiAlO2 Lithium aluminate (α-LiAlO2) is one of the most efficient tritium breeding materials due to excellent performance under high radiation. The pressure-induced gap change of various phases of LiAlO2 has been investigated using DFT, yielding an estimated Ξgap of −8.1 eV45, in good agreement with our results.

  g.

    LiAsF6 and LiPF6 Lithium hexafluorophosphate and lithium hexafluoroarsenate are commonly used as the main conducting lithium salt in carbonate-based electrolytes for Li-ion batteries46. Similarly to TeWO6, these two compounds crystallize in the \(R\bar{3}\) space group. It turns out that there is a strong interplay between pressure and internal coordinates in these systems, leading to a strong decrease of the value of the deformation potential when these effects are taken into account. Indeed, for some classes of semiconductors it has already been observed in the literature that the band gap is more sensitive to changes of internal parameters (e.g., selective changes of bond lengths) than to volume changes. Typical examples are Cu(In,Ga)(S,Se)2 and, more generally, ABC2 chalcopyrites32,47, as well as Cu2ZnSnS4 and Cu2ZnSnSe4 (ref. 33), whose gaps are known to be strongly dependent on small variations of internal parameters. In these systems it is well understood that changes of the internal parameters, e.g. changes of bond distances, have an effect on the band gap because the valence band is made of pd hybridized states, where p electrons come from the anion and d electrons from the cation participating in the bond. A change of bond length has a strong impact on the hybridization and therefore on the valence band width, leading to significant band gap variations.

  h.

    PON Phosphorus oxynitride, PON, is an isoelectronic analog of SiO2. Like the latter, it has a complex PT phase diagram, and the selected structure is a high-pressure α-quartz-like phase48. The gap of α-quartz SiO2, on the other hand, is much less sensitive to the volume change (\({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) = −7.57 eV). This can be explained again by the fact that the pp interaction is stronger between P/N and O than that between Si and O.

  i.

    AlSbO4 Aluminum antimonate can be seen as an analog to aluminum phosphate (AlPO4) and arsenate (AlAsO4) for which \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) = −7.12 and −8.76 eV, respectively. Going from P to Sb, the pp and ss coupling with O is slightly weaker, but with a stronger pd interaction between Sb–O and As–O. As such, ΞVBM can be much higher, leading to a more negative \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\).

  j.

    GaBO3 Gallium borate is isostructural with calcite CaCO3. The other calcite-like borates, such as AlBO3 and InBO3, have a smaller magnitude of \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) (−4.55 and −8.41 eV, respectively). This is not the same trend observed for the various group III compounds discussed above. The difference can be explained by the fact that in AlBO3 the conduction band minimum is dominated by pp anti-bonding states coming from the B–O interaction, which can considerably lower the deformation potential of the conduction band minimum in AlBO3.

  k.

    Hg Liquid mercury is metallic in normal conditions, and it even becomes superconducting at very low temperature. However, several other phases of mercury have been predicted theoretically. Taking the Materials Project database49 as a reference, we can find several structures within a distance of 20 meV per atom from the convex-hull of stability. These range from metals to semiconductors with gaps of up to almost 1.5 eV. This is in our opinion an indication of the extreme sensitivity of mercury’s electronic structure to its crystal structure. It has been noted in the literature50,51 that the band structure of certain phases of mercury is very sensitive to pressure (and indeed to calculation parameters), and that a band gap opens up upon expansion. This is in agreement with the positive value of the deformation potential that we find here. This variation was connected to the change in character of the states close to the Fermi level that go from p-like to having an increased s-character for decreasing density51. The reason why mercury comes out as semiconducting in our calculations is very likely due to the use of PBE which tends to underbind and therefore leads to larger lattice parameters. For example, for the R\(\bar{3}\)m (#166) phase, the trigonal lattice constant is 2.9863 Å at 5 K52, while the PBE gives 3.179 Å. In conclusion, if we could experimentally access expanded phases of mercury, these would likely exhibit extremely large positive values of the gap deformation potential.

  12. l.

    Halide perovskites The system we found with the record positive deformation potential is the halide perovskite RbSnCl3. Moreover, in the list of exceptional materials with positive deformation potentials we find a series of perovskites of composition ABX3, such as {Rb,Cs}Sn{Cl,Br}3 and TlGeBr3, and their double-perovskite counterparts {K,Rb,Cs}2{In,Tl}Sb{F,Cl,Br}6. Of course, the latter can be easily derived from the former by doubling the lattice (usually resulting in a face-centered cubic configuration), and performing the isoelectronic modification 2Sn → InSb or 2Sn → TlSb. Halide perovskites are known to be exceptional materials, excelling as absorbers for photovoltaic devices, achieving efficiencies of more than 20%53. In recent reviews54,55 it has already been discussed how the electronic structure and band gap of this class of materials can be tuned via changes in atomic structure. For example, iodide perovskites and related compounds show a very wide range of band gaps from ≈1.2 to ≈4.0 eV55. The same sensitivity of the band gap can also be observed in mixed-halide perovskites56. Trends in band gap are often dictated by structural subtleties involving geometric distortions and the ordering of anions56. Moreover, these materials have exceptionally small effective hole masses due to a strong hybridization between the s-states of the B-site element and the p-states of the halogen near the Fermi energy26. In ref. 54 it is also noted that a compression of the unit cell (for example, by chemical substitution, compressive biaxial strain, or high pressure) narrows the band gap, in agreement with the positive deformation potential we found here. This is explained by the fact that compression forces the B- and X-site ions closer together, enhancing the antibonding interactions at the top of the valence band, thereby raising its energy and decreasing the band gap. It is also known that the variation of the band gap comes predominantly from the variation of the valence band maximum, while the conduction band minimum has a small, positive variation42. That these materials exhibit high deformation potentials is even more extraordinary when we note that this happens for a large range of band gaps (see Table 5). This points to the fact that the unique hybridization of the orbitals between the B- and X-atoms in these structures indeed makes halide perovskites materials with extreme properties.

  13. m.

    Tl3AsF6 and Rb3SbF6 These two structures are the pathological cases of Table 5, since their calculated deformation potential differs considerably from the screened value. Both materials share the same hypothetical structure, derived from TlAsF657 and KAsF6. Tl3AsF6 and Rb3SbF6 are, in any case, rather close to the convex hull of stability58,59, lying respectively at 84 and 94 meV/atom above the hull with the PBE approximation, 98 and 94 meV/atom with PBE for solids60, and 158 and 135 meV/atom with the SCAN functional61. The electronic band structure of Tl3AsF6 is rather peculiar, which might explain the behavior of Ξgap. The valence band is flat and isolated, with a gap of around 1 eV to the next occupied band. The valence band is composed of s−As and p−F states, with the latter dominating. For the conduction band the situation is more complicated, as PBE and mBJ lead to qualitatively different results. In PBE, the band gap is indirect, from near N to F. On the other hand, with mBJ the gap is direct, from Γ to Γ. The bands are mostly composed of p orbitals of Tl, As and F (in decreasing order of DOS magnitude), consistently for both functionals. The situation for Rb3SbF6 is similar, as it also exhibits an isolated valence band with similar orbital decomposition (albeit with Sb taking the place of As). In addition, the conduction bands are described quite differently by the two functionals.

  14. n.

    SnS This is the high-temperature orthorhombic β-phase of tin sulfide (TlI-type, Cmcm, #63)62,63. The λ-type transition between the low-temperature GeS-type (Pnma, #62) α-phase and the high-temperature β-phase is essentially due to the continuous movement of the Sn and S atoms along the [100] direction63. The Pnma phase can also be seen as a small distortion of the more symmetric high-temperature Cmcm crystal structure64. For comparison, our estimated \({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) for the α-phase (mp-2231) is 2.0 eV. The high value of our estimator for this material (\({\widetilde{{{\Xi }}}}_{{{{\rm{gap}}}}}\) = 4.54 eV) is considerably reduced, to ΞPBE = 2.05 eV, when the internal relaxation of the structure is allowed, a value much closer to the one we obtained for the α-phase. This is therefore a false positive.

  15. o.

    LiAuS LiAuS crystallizes in an orthorhombic structure (space group Fddd, #70)65 that has been described as composed of ‘chicken-wire-like’ layers formed by interweaving (AuS)\({}_{n}^{n-}\) threads66. Also of note is that a cubic phase of LiAuS has been identified as the most interesting strong topological insulator (together with NaAuS) of all possible half-Heusler compounds67. It was also found that the band gap (and band inversion) in this compound changes rapidly (and with a non-monotonic behavior) with biaxial strain67. Although we could not find electronic structure calculations for this compound in the literature, there are calculations for the related semiconducting compound NaAuS68. For this compound the top of the valence band is composed of s- and d-Au states hybridized with the p-states of S, while the conduction band results from a mixture of states from all atoms. As can be seen in the Supplementary Information, the same is true for LiAuS.

In conclusion, we presented a large-scale dataset of gap deformation potentials for a wide variety of semiconductors. This dataset covers a large portion of the periodic table and is representative of the possible crystal structures in which we find semiconductors. Analysis of the data shows that the distribution of deformation potentials is peaked at around −2 eV, and is quite skewed towards negative values, with a fat tail that extends to almost −15 eV. We also find a sizeable fraction of systems with positive deformation potentials, extending to around 5 eV.

Aiming at rationalizing these data, we searched for correlations between the deformation potentials and several other materials and compositional properties. We found an anti-correlation with the effective hole mass and the band gap, and a positive correlation with the mean row of the periodic table, the mean atomic mass and the volume of the unit cell. In any case, it turns out that all correlation coefficients are relatively small. We then constructed machine learning models designed to interpret and predict the values of the deformation potentials. The models exhibited limited accuracy, showing the difficulty of describing deformation potentials in terms of simple chemical quantities for a large set of materials. Nevertheless, the developed models can be successfully applied to the identification of materials with extreme values of this quantity.

Finally, we analyzed in more detail some of the exceptional materials we found. With negative values of the deformation potential we find exclusively oxides, the record being TeWO6. On the other side of the spectrum appear mostly halide perovskites, the record belonging to RbSnCl3. In contrast with the difficulties mentioned above to establish general correlations, the variation of the values of the deformation potential within specific families of materials can usually be explained based on the character of the valence and the conduction states.

Methods

Calculations

The initial survey of the dataset was performed with density-functional theory at the level of the Perdew–Burke–Ernzerhof (PBE) generalized gradient approximation69. This is arguably the most used approximation for the ab initio study of solids, but leads to band gaps that are around a factor of two too small when compared to experiment70,71. To have a more reliable estimation of the band gaps, we also provide band structures obtained with the modified Becke–Johnson functional (mBJ)72. This approximation, when combined with the local-density approximation for the correlation term, was shown to yield accurate band gaps70,71. Of course, this does not mean that deformation potentials are also more accurate with mBJ than with PBE, but we see that both functionals turn out to yield similar results for most of the systems studied (see also Section I of the Supporting Information). This points to the fact that deformation potentials appear to be more robust properties than band gaps with respect to the theoretical approach used.

All calculations were performed within the projector-augmented wave formalism73, using the Vienna ab-initio simulation package (VASP version 5.4.4)74. We use the set of pseudo-potentials recommended by the Materials Project database49. In the present work we opted not to include Hubbard corrections, as general-purpose values of U might not translate into quality improvements in the band structure and we are not in a position to extensively benchmark these parameters. The effect of this choice is also minimized thanks to the removal of compounds containing magnetic elements (see below). The band gaps were calculated as the difference of the Kohn–Sham eigenvalues from self-consistent calculations. Our energy cut-off was set to 520 eV and we used Γ-centered k-point grids with 8000 k-points per reciprocal atom. Inclusion of spin–orbit coupling leads to an average reduction of the band gap of 0.1 eV71. Since this value is much smaller than the typical error coming from the choice of exchange-correlation functional, we opt to neglect this effect in our calculations. Moreover, as we consider differences of band gaps at different strains, we expect the error due to the neglect of spin–orbit coupling to cancel to a large extent.
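To make the two quantities used here concrete, the following minimal Python sketch extracts a band gap from Kohn–Sham eigenvalues and estimates the gap deformation potential of Eq. (1) by a central finite difference in ln(V). It is only an illustration under stated assumptions: the function names, the ±1% volume change, and the toy eigenvalues are hypothetical, and the sketch does not reproduce the actual workflow or strain values used in this work.

```python
import math


def kohn_sham_gap(eigenvalues, n_occupied):
    """Band gap (eV) as CBM minus VBM of the Kohn-Sham eigenvalues.

    eigenvalues: one list of band energies (eV, ascending) per k-point.
    n_occupied:  number of filled bands, assumed identical at every k-point.
    """
    vbm = max(bands[n_occupied - 1] for bands in eigenvalues)
    cbm = min(bands[n_occupied] for bands in eigenvalues)
    return cbm - vbm


def gap_deformation_potential(gap_minus, gap_plus, v_minus, v_plus):
    """Central finite difference of E_gap with respect to ln(V), in eV."""
    return (gap_plus - gap_minus) / (math.log(v_plus) - math.log(v_minus))


# Toy eigenvalues (eV) at two k-points; hypothetical numbers, not real data.
compressed = [[-1.0, 0.0, 2.1], [-1.5, -0.3, 2.6]]
expanded = [[-1.0, 0.0, 1.9], [-1.5, -0.3, 2.4]]

v0 = 40.0  # hypothetical equilibrium cell volume (cubic angstrom)
gap_c = kohn_sham_gap(compressed, n_occupied=2)
gap_e = kohn_sham_gap(expanded, n_occupied=2)

# Assumed volume change of +/-1% around v0.
xi_gap = gap_deformation_potential(gap_c, gap_e, 0.99 * v0, 1.01 * v0)
print(f"gap(compressed) = {gap_c:.2f} eV, gap(expanded) = {gap_e:.2f} eV")
print(f"Xi_gap = {xi_gap:.2f} eV")
```

In this toy case the gap shrinks upon expansion, so the estimated deformation potential is negative, the sign found for most of the dataset. Because only the difference of two gaps enters, any error that is nearly constant with strain (such as the one attributed above to the neglect of spin–orbit coupling) largely drops out.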

Dataset preparation

To build our dataset, we selected semiconductors from the Materials Project49 containing fewer than 13 atoms in the unit cell. Using these criteria we could gather 6552 systems. We then removed all systems containing magnetic elements (Fe, Ni, Cr, Mn, and lanthanides except La) and those with an electronic band gap smaller than 0.01 eV, obtaining a dataset of 5338 systems. All ground-state geometries were taken directly from the Materials Project database49 and were not further optimized. A table containing the Materials Project ID, together with our calculated gaps and deformation potentials, can be found in the Supplementary Information. The dataset presents a good sampling of the periodic table, although the frequency of each chemical element is not homogeneous (see Supplementary Information). The chalcogens (particularly oxygen) are the most represented group, followed by the halogens, pnictogens, and alkali metals. Overall, this distribution is representative of the distribution of stable semiconductors present in the Materials Project49.
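The selection criteria above can be sketched as a simple filter. The snippet below is a hedged illustration, not the script used to build the dataset: the record layout (`material_id`, `elements`, `nsites`, `band_gap`) and the example entries are hypothetical, and which band gap (database or recomputed) enters the 0.01 eV threshold is an assumption left open here.

```python
# Elements excluded as magnetic: Fe, Ni, Cr, Mn and the lanthanides except La.
LANTHANIDES_EXCEPT_LA = {
    "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd",
    "Tb", "Dy", "Ho", "Er", "Tm", "Yb", "Lu",
}
EXCLUDED_ELEMENTS = {"Fe", "Ni", "Cr", "Mn"} | LANTHANIDES_EXCEPT_LA


def keep_entry(entry, max_atoms=12, min_gap=0.01):
    """Return True if a record passes the three selection criteria."""
    small_cell = entry["nsites"] <= max_atoms          # fewer than 13 atoms
    has_gap = entry["band_gap"] >= min_gap             # gap of at least 0.01 eV
    non_magnetic = not (set(entry["elements"]) & EXCLUDED_ELEMENTS)
    return small_cell and has_gap and non_magnetic


# Hypothetical records with made-up gaps, for illustration only.
entries = [
    {"material_id": "hyp-1", "elements": ["Rb", "Sn", "Cl"], "nsites": 5, "band_gap": 0.9},
    {"material_id": "hyp-2", "elements": ["Fe", "O"], "nsites": 10, "band_gap": 2.0},
    {"material_id": "hyp-3", "elements": ["Si"], "nsites": 2, "band_gap": 0.0},
    {"material_id": "hyp-4", "elements": ["Ga", "B", "O"], "nsites": 20, "band_gap": 3.0},
    {"material_id": "hyp-5", "elements": ["La", "O"], "nsites": 5, "band_gap": 3.5},
]

kept_ids = [e["material_id"] for e in entries if keep_entry(e)]
print(kept_ids)
```

Here the Fe-containing entry, the gapless entry, and the 20-atom cell are dropped, while the La compound is retained because La is explicitly exempted from the lanthanide exclusion.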