Electronic Temperature

I often get asked questions about electronic temperature and smearing, and so I decided to put here a few notes - detailed background reading can also be found in my PhD thesis on Ab-initio Molecular Dynamics for Metallic Systems (start with Chap. 4, and maybe follow it with Chap. 3).

Why an electronic temperature: The main reason to introduce an electronic temperature is to improve convergence with respect to Brillouin zone sampling in metals. At zero temperature, occupations drop abruptly from 1 (or 2, if spin unpolarised) to 0 as the corresponding eigenvalues become larger than the Fermi energy. Since key physical quantities like the charge density and the total energy are a sum on all occupied states (in a periodic system this means an integral over all k-points k, and a sum over all bands n, where each state nk is multiplied by its occupation number (thus the sum over n can be done only on bands that have non-zero occupations)), in a metal at zero temperature we end up integrating functions that drop abruptly to zero when the corresponding band crosses the Fermi energy.

Integrating discontinuous functions requires very fine meshes, and an electronic temperature smoothes the occupation function, that can then be integrated with higher accuracy for a given computational cost (i.e. given a certain sampling of the Brillouin zone). The price to pay for this more accurate integration is that one is integrating the wrong energy functional - not anymore the 0T energy functional, but one that corresponds to a much higher electronic temperature (easily thousands of degrees, see later).

Note that there is a side effect to introducing an electronic temperature that can be also beneficial to tame the so-called level crossing instabilities - as you converge towards self-consistency, the energy eigenvalues e_nk move around, and whenever one goes from below the Fermi energy to above the Fermi energy, or viceversa, it would become suddenly empty, or suddenly full. Its contribution to the charge density disappears or appears abruptly, and this can be a large perturbation to your system. These instabilities would also be present e.g. in a molecular system, as states evolve towards self consistency. A finite electronic temperature "smooths" these instabilities, since an occupation will not abruptly jump to 0 as the energy level evolves towards higher values, but it will smoothly decrease - the higher the temperature the smoother this is.

To me, this side effect of easing electronic convergence had always been an unsatisfactory reason to introduce electronic temperatures, and one that didn't address the problem directly - instead one should have robust algorithms that converge to the ground state no matter what, and this was the reason to develop the ensemble-DFT algorithm (Phys. Rev. Lett. 79, 1337 (1997)). Ensemble-DFT is the rigorous variational approach to minimisation and to converge robustly to the ground state metallic systems (or systems where the functional is not invariant for unitary transformations - such is the case e.g. for orbital-density dependent functionals, as in the Perdew-Zunger self-interaction correction or Koopmans' compliant functionals). On the other hand, most electronic-structure codes do not employ direct minimisation techniques, but use instead iterative approaches (that can be faster, if less robust), and electronic-temperature definitely helps.

Which electronic temperature: First, let's say that an electronic temperature or a smearing of the density of states are absolutely equivalent - occupation numbers become fractional, and the correct energy functional to consider is not anymore the total energy, but the total free energy E-TS. Chap. 4 above discusses in detail all these issues, but it's important to make sure you are dealing with the correct functional. One exception is the case of atoms and molecules - e.g. at zero temperature degenerate states could get fractional occupations (e.g. the boron atom would have ⅓ of an electron in each of the 3 p orbitals), and these fractional occupations give rise to a non-zero entropy.

So, which temperature (i.e. which occupation function)? The natural choice would be the physical Fermi-Dirac distribution, and this would lead to the grand-canonical extension to DFT done by Mermin (Phys. Rev. 137, A1441 (1965)). Of course, a temperature-dependent exchange-correlation functional should be employed, if real, finite electronic temperatures were sought for - but the temperature dependence of this latter one is mild enough that one could just use standard approximations. There are two issues with using real Fermi-Dirac distributions - in order to ameliorate k-point sampling, one needs to use very large values (e.g. ~0.1-0.5 eV), corresponding to thousands of degrees, and the distribution has long tails, that means that one needs to calculate a relatively large number of states that slowly decay to zero occupation. Better then to use a fictitious smearing (e.g. Gaussians) in the density of states, as done by Fu and Ho (Phys. Rev. B 28, 5480 (1983)) and Needs, Martin and Nielsen (Phys. Rev. B 33, 3778 (1986)).

Methfessel and Paxton generalized the idea of Gaussian smearing (Phys. Rev. B 40, 3616 (1989)) to improve on the conundrum I mentioned before - i.e. that we are integrating more accurately the incorrect smeared functional. Thanks to the introduction of a Hermite polynomial expansion of Dirac's deltas in the density of states, one could keep a reasonably smooth occupation function, but remove the systematic error introduced by the smearing. In practice, a second-order Hermite polynomial (the common choice nowadays) has an occupation function that is still very smooth, but provides an energy functional that is much closer, at a given temperature, to the original zero-temperature total energy (and much easier to determine accurately, since much fewer k-points are needed).

The connection between a smeared density of states and an electronic temperature has been made explicitly by De Gironcoli (Phys. Rev. B 51, 6773 (1995)), while De Vita and Gillan introduced the idea that generalized free energy functionals can be constructed directly from a given occupation function, in itself defined as an integral of a smeared delta (A. De Vita, The Energetics of Defects and Impurities in Metals and Ionic Materials from First Principles, Ph.D. Thesis, University of Keele (1992)). A detailed discussion of this approach is also found in Chap. 4 above.

The concept of generalised free energy allows to put on a firm conceptual footing the idea of smearing the density of states - different choices for the smearing of the delta function in the density of states give rise to different free energy functionals. There is a smearing function that gives rise to Fermi-Dirac occupations, and the Mermin functional; a Gaussian smearing gives rise to Gaussian occupations, and a Gaussian free energy functional; a Methfessel-Paxton smearing gives rise to generalised Gaussian occupations and a generalised Gaussian free energy functional. The advantage of the Methfessel-Paxton functional (let's call it MP) then can be seen from the fact that it depends only quartically on the temperature - i.e. at a finite temperature, the Gaussian free energy is quadratic in temperature, while the MP free energy is quartic. So, at a given temperature, MP resembles much more closely the zero-temperature limit that we want (all the generalised free energy functionals converge to the same limit at zero temperature), but can be calculated with full sampling accuracy with much fewer k-points than the zero-temperature limit would need (due to a very discontinuous occupation function).

A similar approach has been introduced by De Vita and Gillan (J. Phys. Cond. Matt. 3 6225 (1991)), where the estimate for the zero-temperature total energy is also obtained from a finite temperature calculation (hence requiring much less sampling) - this estimate is given by the average of the total free energy at temperature T and the total energy at temperature T, i.e. by E-½ TS. Since the Gaussian or Fermi Dirac free energies are quadratic in T, and the respective total energies are also quadratic, but with equal and opposite coefficient, the semi-sum has no quadratic dependence on T (and it's actually quartic). At variance with the MP functional, E-½ TS is not a proper functional, so the Hellman-Feynman forces one calculates are not its derivatives (they are the derivatives of the functional being minimised, i.e. the free energy). As a plus, though, once can never have negative occupation numbers (in MP, occupation numbers can become lower than zero, or larger than 1).

While occupation numbers larger than 1 are less worrisome, negative occupation numbers can be conceptually problematic. Think at a molecule approaching a metal surface - the empty LUMO will start getting occupied, but in MP the occupations will be initially negative, leading to an orbital in vacuum that has negative occupations - something we do not really know how to deal with. For this reason, I constructed a smearing of the delta function that was forced to have always positive occupations, but that shared with MP the desirable consequence that the E-TS free energy functional that it led to had zero quadratic dependence on temperature (it's cubic, while MP is quartic). This is the technique of cold smearing, described again in Chap. 4 above, and in its final form in (Phys. Rev. Lett. 82, 3296 (1999)) - hence its common name of Marzari-Vanderbilt, or, more correctly, Marzari-Vanderbilt-DeVita-Payne smearing (all these concepts came from long discussions with Alessandro De Vita).

Last, Verstraete and Gonze have generalised the cold smearing idea above (Phys. Rev. B 65, 035111)) to construct "cold-smeared" schemes that target not the zero-temperature distribution, but any other choice - e.g. a finite-temperature Fermi-Dirac one. This is very relevant when e.g. one wants to describe the true electronic temperature, and this latter is different enough from the zero-temperature case (remember that 1 eV = 11600 K).

An instructive test of the above ideas would require plotting the equation of state for a simple metal (e.g. Al) at different smearings. If one were to use Fermi-Dirac or Gaussian smearing, very rapidly a lot of pressure would build up, from the expansion of the electron gas, and the equilibrium lattice parameter would change. Using 2MP or cold smearing, the change in the lattice parameter is much smaller, and so one case use large smearings (and relatively inexpensive k-point samplings) without introducing incorrect physical effects.

How do I find the right smearing parameters - temperature and k-point sampling? To do this, you need to focus on which quantity you want to describe accurately. Typically, a good choice are forces on atoms, in a system where forces are significant (e.g. not zero by symmetry - I'd use a ~10% distortion from equilibrium). Then, one plots forces on an atom as a function of smearing/temperature; these force-as-a-function-of-T curves are done for different, ever-denser k-point samplings. For large temperatures, the results will be independent of k-sampling - i.e. all k-point samplings are sufficient to get statistically accurate results. At lower temperatures, the curves will start separating from each other - the first one to go is the one with the smaller sampling, since at smaller temperature it will have the largest integration errors. So, one needs to decide what is the error on the forces that can be tolerated, choose a smearing such that the error in the forces is within tolerance, and for that smearing choose the smallest k-point sampling that correctly integrates the Brillouin zone.

Figure 1: Force acting on an iron atom in a 2-atom unit cell, plotted as a function of smearing and for different Monkhorst-Pack samplings of the Brillouin Zone (different colored curves). The 2-atom simple cubic cell breaks symmetry with a displacement along the (111) direction by 5% of the initial 1NN atomic distance. Here we use a PAW pseudo potential (pslibrary 0.2.1) with a PBE parametrization for the XC functional and Marzari-Vanderbilt smearing.

Figure 2: Same as above, but with an ultrasoft pseudo potential (pslibrary 0.2.1) and different k-point sampling (the colour caption refers to the dots, not the lines).