Iterative minimization techniques for ab initio total-energy calculations:
molecular dynamics and conjugate gradients
M. C. Payne
Cavendish Laboratory, Madingley Road, Cambridge, CB3 0HE, United Kingdom
M. P. Teter and D. C. Allan
Applied Process Research, Corning Incorporated, Corning, New York 14831
T. A. Arias and J. D. Joannopoulos
Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
This article describes recent technical developments that have made the total-energy pseudopotential the
most powerful ab initio quantum-mechanical modeling method presently available. In addition to presenting
technical details of the pseudopotential method, the article aims to heighten awareness of the capabilities
of the method in order to stimulate its application to as wide a range of problems in as many scientific
disciplines as possible.
CONTENTS

I. Introduction
II. Total-Energy Pseudopotential Calculations
   A. Overview of approximations
   B. Electron-electron interactions
      1. Exchange and correlation
      2. Density-functional theory
         a. The Kohn-Sham energy functional
         b. Kohn-Sham equations
         c. Local-density approximation
   C. Periodic supercells
      1. Bloch's theorem
      2. k-point sampling
      3. Plane-wave basis sets
      4. Plane-wave representation of Kohn-Sham equations
      5. Nonperiodic systems
   D. Electron-ion interactions
      1. Pseudopotential approximation
         a. Norm conservation
         b. Generation procedure
         c. Convergence properties
         d. Plane-wave basis sets
      2. Structure factor
      3. Total ionic potential
   E. Ion-ion interactions
   F. Computational procedure with conventional matrix diagonalization
   G. Drawbacks of conventional procedure
   H. Alternative methods
III. The Molecular-Dynamics Method
   A. Eigenvalue solution by successive "improvement" of a trial wave function
   B. Molecular-dynamics procedure
      1. Molecular-dynamics Lagrangian
      2. Constraints
   C. Molecular-dynamics equations of motion
      1. Unconstrained equations of motion
      2. Constrained equations of motion
      3. Partially constrained equations of motion
   D. Integration of equations of motion
      1. The Verlet algorithm
      2. Stability of the Verlet algorithm
   E. Orthogonalization of electronic wave functions
   F. Damping
   G. Self-consistency
   H. Kohn-Sham eigenstates
   I. Computational procedure with molecular dynamics
   J. Instabilities and fluctuations in the Kohn-Sham energy
   K. Computational cost of molecular dynamics
      1. Calculation of charge density
      2. Construction of Hamiltonian matrix
      3. Accelerations of the coefficients
      4. Integration of equations of motion
      5. Orthogonalization
      6. Comparison to conventional matrix diagonalizations
      7. Local pseudopotentials
IV. Improvements in Algorithms
   A. Improved integration
      1. Analytic integration of second-order equations of motion
      2. Analytic integration of first-order equations of motion
   B. Orthogonalization of wave functions
      1. The Gram-Schmidt scheme
      2. Convergence to Kohn-Sham eigenstates
   C. Comparison between algorithms
   D. Remaining difficulties
V. Direct Minimization of the Kohn-Sham Energy Functional
   A. Minimization of a function
      1. The method of steepest descents
      2. The conjugate-gradients technique
   B. Application of the conjugate-gradients method
      1. The update of a single band
      2. Constraints
      3. Preconditioning
      4. Conjugate directions
      5. Search for the energy minimum
      6. Calculational procedure
      7. Computational cost
   C. Speed of convergence
      1. Large energy cutoff
      2. Long supercells
      3. A real system

Reviews of Modern Physics, Vol. 64, No. 4, October 1992
Copyright ©1992 The American Physical Society
VI. Choice of Initial Wave Functions for the Electronic States
   A. Convergence to the ground state
      1. Spanning the ground state
      2. Symmetry conservation
   B. Rate of convergence
   C. Specific initial configurations
      1. Reduced basis sets
      2. Symmetrized combinations of plane waves
      3. Random initial configurations
      4. Random initial velocities
VII. Relaxation of the Ionic System
   A. The Car-Parrinello Lagrangian
   B. Equations of motion
   C. The Hellmann-Feynman theorem
      1. Errors in Hellmann-Feynman forces
      2. Consequences of the Hellmann-Feynman theorem
   D. Pulay forces
   E. Pulay stresses
   F. Local energy minimization
      1. Local energy minimization with the molecular-dynamics method
      2. Local energy minimization with the conjugate-gradients method
   G. Global energy minimization
      1. Simulated annealing
      2. Monte Carlo methods
      3. Location of low-energy configurations
VIII. Dynamical Simulations
   A. Limitations to dynamical simulations
   B. Accuracy of dynamical trajectories
   C. Error cancellation in dynamical simulations
   D. Car-Parrinello dynamics; constraints
   E. Conjugate-gradients dynamics
   F. Comparison of Car-Parrinello and conjugate-gradients dynamics
   G. Phonon frequencies
IX. Nonlocal Pseudopotentials
   A. Kleinman-Bylander potentials
      1. Enhanced projections
      2. Computational cost
   B. Vanderbilt potentials
   C. Real-space projection; nonuniqueness of spherical harmonic projection
   D. Potentials for real-space projection
X. Parallel Computers
XI. Concluding Remarks
Acknowledgments
References
I. INTRODUCTION

There is little doubt that most of low-energy physics,
chemistry, and biology can be explained by the quantum
mechanics of electrons and ions. The limits of applicability
and even the interpretation of the predictions of
modern quantum theory are lively areas of debate
amongst philosophers. Questions such as “How do we
interpret the probabilistic nature of wave functions?”
“What constitutes a measurement?,” “How much can we
ever know about the state of a system?,” and “Can quan-
tum mechanics describe consciousness?” are of funda-
mental importance. Despite the fact that these questions
are still debated, it is clear that whether or not a more
complete description of the world is possible, those
things that modern quantum theory does predict are pre-
dicted with incredible accuracy. One outstanding exam-
ple of this accuracy is the calculation of the gyromagnet-
ic ratio of the electron, which agrees with the experimen-
tal result to the limit of the measurement, some 10
significant figures. Quantum theory has also proven
correct and provided fundamental understanding for a
wide variety of phenomena, including the energy levels of
atoms, the covalent bond, and the distinction between
metals and insulators. Further, in every instance of its
application to date, the equations of quantum mechanics
have yet to be shown to fail. There is, therefore, every
reason to believe that an understanding of a great num-
ber of phenomena can be achieved by continuing to solve
these equations.
As we shall soon see, the ability of quantum mechanics
to predict the total energy of a system of electrons and
nuclei enables one to reap a tremendous benefit from a
quantum-mechanical calculation. In fact this entire article
is dedicated to just this one type of quantum-mechanical
calculation, the foundation for which is quite
strong. The quantum-mechanical rules, or Hamiltonians,
for calculating the total energy of simple one-atom sys-
tems have provided some of the most precise tests of the
theory, and the rules for calculating the energies of more
complicated systems are simple, straightforward exten-
sions of these atomic Hamiltonians. It is therefore eminently
reasonable to expect quantum mechanics to predict
accurately the total energies of aggregates of atoms
as well. So far, this expectation has been confirmed time
and time again by experiment.
A few moments' thought shows that nearly all physical
properties are related to total energies or to differences
between total energies. For instance, the equilibrium lat-
tice constant of a crystal is the lattice constant that mini-
mizes the total energy; surfaces and defects of solids
adopt the structures that minimize their corresponding
total energies. If total energies can be calculated, any
physical property that can be related to a total energy or
to a difference between total energies can be determined
computationally. For example, to predict the equilibri-
um lattice constant of a crystal, a series of total-energy
calculations are performed to determine the total energy
as a function of the lattice constant. As shown in Fig. 1,
the results are then plotted on a graph of energy versus
lattice constant, and a smooth curve is constructed
through the points. The theoretical value for the equilib-
rium lattice constant is the value of the lattice constant at
the minimum of this curve. Total-energy techniques also
have been successfully used to predict with accuracy
equilibrium lattice constants, bulk moduli, phonons,
piezoelectric constants, and phase-transition pressures
and temperatures (for reviews see Cohen, 1984; Joanno-
poulos, 1985; Pickett, 1989).
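The lattice-constant procedure just described can be sketched numerically. The data below are mock values (a parabola with its minimum placed at 5.43 Å), not the output of any real total-energy calculation; the function name and polynomial degree are illustrative choices.

```python
import numpy as np

def equilibrium_lattice_constant(a_values, energies, degree=3):
    """Fit a smooth polynomial through total-energy points and
    return the lattice constant at the curve's minimum."""
    coeffs = np.polyfit(a_values, energies, degree)
    deriv = np.polyder(coeffs)
    roots = np.roots(deriv)
    # Keep real stationary points inside the sampled range
    candidates = [r.real for r in roots
                  if abs(r.imag) < 1e-8
                  and a_values.min() <= r.real <= a_values.max()]
    fitted = np.polyval(coeffs, candidates)
    return candidates[int(np.argmin(fitted))]

# Illustrative (not calculated) energy-vs-lattice-constant data:
a = np.array([5.2, 5.3, 5.4, 5.5, 5.6])          # angstroms
E = 0.8 * (a - 5.43) ** 2 - 7.92                  # mock parabola, minimum at 5.43
print(round(equilibrium_lattice_constant(a, E), 2))  # → 5.43
```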
Part of our aim in this article is to introduce the use-
fulness of quantum total-energy techniques to a broad
study of a larger variety of chemical systems than was
previously possible.
This article provides a detailed description of the
total-energy pseudopotential method, the molecular-
dynamics method, and conjugate-gradient minimization.
The article also discusses related techniques and developments
in a number of areas that have played a role in
increasing the computational efficiency of these methods.
It is hoped that the information presented here is
sufficiently detailed and at the leading edge of the work
being done to be useful to scientists working both in and
outside the field of ab initio quantum-mechanical calculations.
II. TOTAL-ENERGY PSEUDOPOTENTIAL
CALCULATIONS
This section describes the total-energy pseudopotential
method. An extremely useful review of the pseudopotential
method can be found in articles by Ihm et al. (1979)
and Denteneer and van Haeringen (1985). These articles
are essential reading for anyone intending to implement
codes for total-energy pseudopotential calculations.
Total-energy calculations can only be performed if a
large number of simplifications and approximations are
used. These simplifications and approximations are de-
scribed in the following sections.
A. Overview of approximations
Prediction of the electronic and geometric structure of
a solid requires calculation of the quantum-mechanical
total energy of the system and subsequent minimization
of that energy with respect to the electronic and nuclear
coordinates. Because of the large difference in mass be-
tween the electrons and nuclei and the fact that the
forces on the particles are the same, the electrons
respond essentially instantaneously to the motion of the
nuclei. Thus the nuclei can be treated adiabatically, leading
to a separation of electronic and nuclear coordinates
in the many-body wave function, the so-called Born-Oppenheimer
approximation. This "adiabatic principle"
reduces the many-body problem to the solution of the dy-
namics of the electrons in some frozen-in configuration of
the nuclei.
Even with this simplification, the many-body problem
remains formidable. Further simplifications, however,
can be introduced that allow total-energy calculations to
be performed accurately and efficiently. These include
density-functional theory to model the electron-electron
interactions, pseudopotential theory to model the
electron-ion interactions, supercells to model systems
with aperiodic geometries, and iterative minimization
techniques to relax the electronic coordinates.
Very briefly, the essential concepts are the following:
(i) Density-functional theory (Hohenberg and Kohn,
1964; Kohn and Sham, 1965) allows one, in principle, to
map exactly the problem of a strongly interacting elec-
tron gas (in the presence of nuclei) onto that of a single
particle moving in an effective nonlocal potential. Al-
though this potential is not known precisely, local ap-
proximations to it work remarkably well. At present, we
have no a priori arguments to explain why these approxi-
mations work. Density-functional theory was revitalized
in recent years only because theorists performed total-
energy calculations using these potentials and showed
that they reproduced a variety of ground-state properties
within a few percent of experiment. Thus the acceptance
of local approximations to density-functional theory has
only emerged, a posteriori, after many successful investi-
gations of many types of materials and systems. General-
ly, total-energy differences between related structures can
be believed to within a few percent and structural param-
eters to at least within a tenth of an Å. Cohesive energies,
however, can be in error by more than 10%.
(ii) Pseudopotential theory (Phillips, 1958; Heine and
Cohen, 1970) allows one to replace the strong electron-ion
potential with a much weaker potential, a pseudopotential,
that describes all the salient features of
a valence electron moving through the solid, including
relativistic effects. Thus the original solid is now re-
placed by pseudo valence electrons and pseudo-ion cores.
These pseudoelectrons experience exactly the same po-
tential outside the core region as the original electrons
but have a much weaker potential inside the core region.
The fact that the potential is weaker is crucial, however,
because it makes the solution of the Schrödinger equation
much simpler by allowing expansion of the wave func-
tions in a relatively small set of plane waves. Use of
plane waves as basis functions makes the accurate and
systematic study of complex, low-symmetry configura-
tions of atoms much more tractable.
(iii) The supercell approximation allows one to deal
with aperiodic configurations of atoms within the frame-
work of Bloch's theorem (see Ashcroft and Mermin,
1976). One simply constructs a large unit cell containing
the configuration in question and repeats it periodically
throughout space. By studying the properties of the sys-
tem for larger and larger unit cells, one can gauge the im-
portance of the induced periodicity and systematically
filter it out. This approach has been successfully tested
against "exact" Koster-Slater Green's-function methods
(see Baraff and Schlüter, 1979), which are only tractable
for very-high-symmetry configurations.
(iv) Finally, new iterative diagonalization approaches
(Car and Parrinello, 1985; Payne et al., 1986; Williams
and Soler, 1987; Gillan, 1989; Stich et al., 1989; Teter
et al., 1989) can be used to minimize the total-energy
functional. These are much more efficient than the tradi-
tional diagonalization methods. These new methods al-
low expedient calculation of ionic forces and total ener-
gies and significantly raise the level of modern total-
energy calculations. These methods are the subject of
Secs. III, IV, and V.
B. Electron-electron interactions
The most difficult problem in any electronic structure
calculation is posed by the need to take account of the
effects of the electron-electron interaction. Electrons re-
pel each other due to the Coulomb interaction between
their charges. The Coulomb energy of a system of elec-
trons can be reduced by keeping the electrons spatially
separated, but this has to be balanced against the kinetic-energy
cost of deforming the electronic wave functions in
order to separate the electrons. The effects of the
electron-electron interaction are briefly described below.
1. Exchange and correlation
The wave function of a many-electron system must be
antisymmetric under exchange of any two electrons be-
cause the electrons are fermions. The antisymmetry of
the wave function produces a spatial separation between
electrons that have the same spin and thus reduces the
Coulomb energy of the electronic system. The reduction
in the energy of the electronic system due to the antisym-
metry of the wave function is called the exchange energy.
It is straightforward to include exchange in a total-
energy calculation, and this is generally referred to as the
Hartree-Fock approximation.
The Coulomb energy of the electronic system can be
reduced below its Hartree-Fock value if electrons that
have opposite spins are also spatially separated. In this
case the Coulomb energy of the electronic system is re-
duced at the cost of increasing the kinetic energy of the
electrons. The difference between the many-body energy
of an electronic system and the energy of the system calculated
in the Hartree-Fock approximation is called the
correlation energy (see Fetter and Walecka, 1971). It is
extremely difficult to calculate the correlation energy of a
complex system, although some promising steps are being
taken in this direction using quantum Monte Carlo
simulations of the electron-gas dynamics (Fahy et al.,
1988; Li et al., 1991). At present these methods are not
tractable in total-energy calculations of systems with any
degree of complexity, and alternative methods are required
to describe the effects of the electron-electron interaction.

2. Density-functional theory

Density-functional theory, developed by Hohenberg
and Kohn (1964) and Kohn and Sham (1965), provided
some hope of a simple method for describing the effects
of exchange and correlation in an electron gas. Hohenberg
and Kohn proved that the total energy, including
exchange and correlation, of an electron gas (even in the
presence of a static external potential) is a unique functional
of the electron density. The minimum value of the
total-energy functional is the ground-state energy of the
system, and the density that yields this minimum value is
the exact single-particle ground-state density. Kohn and
Sham then showed how it is possible, formally, to replace
the many-electron problem by an exactly equivalent set
of self-consistent one-electron equations. For more details
about density-functional theory see von Barth
(1984), Dreizler and da Providencia (1985), Jones and
Gunnarsson (1989), and Kryachko and Ludeña (1990).

a. The Kohn-Sham energy functional

The Kohn-Sham total-energy functional for a set of
doubly occupied electronic states \psi_i can be written

E[\{\psi_i\}] = 2 \sum_i \int \psi_i \left[ -\frac{\hbar^2}{2m} \right] \nabla^2 \psi_i \, d^3r
  + \int V_{\rm ion}(\mathbf{r})\, n(\mathbf{r})\, d^3r
  + \frac{e^2}{2} \int\!\!\int \frac{n(\mathbf{r})\, n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d^3r\, d^3r'
  + E_{\rm XC}[n(\mathbf{r})] + E_{\rm ion}(\{\mathbf{R}_I\}) ,    (2.1)

where E_{\rm ion} is the Coulomb energy associated with interactions
among the nuclei (or ions) at positions \{\mathbf{R}_I\},
V_{\rm ion} is the static total electron-ion potential, n(\mathbf{r}) is the
electronic density given by

n(\mathbf{r}) = 2 \sum_i |\psi_i(\mathbf{r})|^2 ,    (2.2)

and E_{\rm XC}[n(\mathbf{r})] is the exchange-correlation functional.

Only the minimum value of the Kohn-Sham energy
functional has physical meaning. At the minimum, the
Kohn-Sham energy functional is equal to the ground-state
energy of the system of electrons with the ions in
positions \{\mathbf{R}_I\}.
b. Kohn-Sham equations

It is necessary to determine the set of wave functions
\psi_i that minimize the Kohn-Sham energy functional.
These are given by the self-consistent solutions to the
Kohn-Sham equations (Kohn and Sham, 1965):

\left[ -\frac{\hbar^2}{2m} \nabla^2 + V_{\rm ion}(\mathbf{r}) + V_{\rm H}(\mathbf{r}) + V_{\rm XC}(\mathbf{r}) \right] \psi_i(\mathbf{r}) = \varepsilon_i \psi_i(\mathbf{r}) ,    (2.3)

where \psi_i is the wave function of electronic state i, \varepsilon_i is
the Kohn-Sham eigenvalue, and V_{\rm H} is the Hartree potential
of the electrons given by

V_{\rm H}(\mathbf{r}) = e^2 \int \frac{n(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\, d^3r' .    (2.4)

The exchange-correlation potential, V_{\rm XC}, is given formally
by the functional derivative

V_{\rm XC}(\mathbf{r}) = \frac{\delta E_{\rm XC}[n(\mathbf{r})]}{\delta n(\mathbf{r})} .    (2.5)
The Kohn-Sham equations represent a mapping of the
interacting many-electron system onto a system of nonin-
teracting electrons moving in an effective potential due to
all the other electrons. If the exchange-correlation energy
functional were known exactly, then taking the functional
derivative with respect to the density would produce
an exchange-correlation potential that included the
effects of exchange and correlation exactly.
The Kohn-Sham equations must be solved self-consistently
so that the occupied electronic states generate
a charge density that produces the electronic potential
that was used to construct the equations. The sum of
the single-particle Kohn-Sham eigenvalues does not give
the total electronic energy because this overcounts the
effects of the electron-electron interaction in the Hartree
energy and in the exchange-correlation energy. The
Kohn-Sham eigenvalues are not, strictly speaking, the en-
ergies of the single-particle electron states, but rather the
derivatives of the total energy with respect to the occupa-
tion numbers of these states (Janak, 1978). Nevertheless,
the highest occupied eigenvalue in an atomic or molecular
calculation is nearly the unrelaxed ionization energy
for that system (Perdew et al., 1982).
The Kohn-Sham equations are a set of eigenequations,
and the terms within the brackets in Eq. (2.3) can be re-
garded as a Hamiltonian. The bulk of the work involved
in a total-energy pseudopotential calculation is the solu-
tion of this eigenvalue problem once an approximate ex-
pression for the exchange-correlation energy is given.
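The self-consistency cycle can be sketched with a minimal one-dimensional model: occupied orbitals give a density via Eq. (2.2), the density gives a Hartree potential, and the potential gives new orbitals, iterated to a fixed point. Everything numerical here is an illustrative stand-in, not a value from the article: a soft-Coulomb kernel replaces 1/|r − r′|, exchange-correlation is omitted, and the grid, well, and mixing parameters are arbitrary.

```python
import numpy as np

N, L_box, n_occ = 200, 20.0, 2          # grid points, box length, doubly occupied states
x = np.linspace(-L_box / 2, L_box / 2, N)
dx = x[1] - x[0]
v_ext = -2.0 / np.sqrt(x**2 + 1.0)      # soft-Coulomb "ionic" potential (made up)

def solve_orbitals(v_eff):
    """Diagonalize the finite-difference Hamiltonian -(1/2) d^2/dx^2 + v_eff."""
    kin = (np.diag(np.full(N, 1.0 / dx**2))
           - np.diag(np.full(N - 1, 0.5 / dx**2), 1)
           - np.diag(np.full(N - 1, 0.5 / dx**2), -1))
    eps, psi = np.linalg.eigh(kin + np.diag(v_eff))
    return eps, psi / np.sqrt(dx)       # normalize so sum |psi|^2 dx = 1

def hartree(n):
    """Hartree potential of the density with a soft-Coulomb kernel."""
    return np.array([np.sum(n / np.sqrt((x - xi)**2 + 1.0)) * dx for xi in x])

n = np.zeros(N)
for it in range(50):                    # simple linear-mixing SCF loop
    eps, psi = solve_orbitals(v_ext + hartree(n))
    n_new = 2.0 * np.sum(psi[:, :n_occ]**2, axis=1)   # Eq. (2.2), factor 2 for spin
    if np.max(np.abs(n_new - n)) < 1e-8:
        break
    n = 0.5 * n + 0.5 * n_new           # damp the update to aid convergence
```

Whatever the interaction chosen, the density from normalized doubly occupied orbitals always integrates to twice the number of occupied states, which is a useful sanity check on any such loop.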
c. Local-density approximation
The Hohenberg-Kohn theorem provides some motiva-
tion for using approximate methods to describe the
exchange-correlation energy as a function of the electron
density. The simplest method of describing the
exchange-correlation energy of an electronic system is to
use the local-density approximation (LDA; Kohn and
Sham, 1965), and this approximation is almost universal-
ly used in total-energy pseudopotential calculations. In
the local-density approximation the exchange-correlation
energy of an electronic system is constructed by assum-
ing that the exchange-correlation energy per electron at a
point r in the electron gas, \epsilon_{\rm XC}(\mathbf{r}), is equal to the
exchange-correlation energy per electron in a homogene-
ous electron gas that has the same density as the electron
gas at point r. Thus

E_{\rm XC}[n(\mathbf{r})] = \int \epsilon_{\rm XC}(\mathbf{r})\, n(\mathbf{r})\, d^3r    (2.6a)

and

\frac{\delta E_{\rm XC}[n(\mathbf{r})]}{\delta n(\mathbf{r})} = \frac{\partial [n(\mathbf{r})\, \epsilon_{\rm XC}(\mathbf{r})]}{\partial n(\mathbf{r})} ,    (2.6b)

with

\epsilon_{\rm XC}(\mathbf{r}) = \epsilon^{\rm hom}_{\rm XC}[n(\mathbf{r})] .    (2.6c)
The local-density approximation assumes that the
exchange-correlation energy functional is purely local.
Several parametrizations exist for the exchange-
correlation energy of a homogeneous electron gas
(Wigner, 1938; Kohn and Sham, 1965; Hedin and
Lundqvist, 1971; Vosko et al., 1980; Perdew and Zunger,
1981), all of which lead to total-energy results that are
very similar. These parametrizations use interpolation
formulas to link exact results for the exchange-
correlation energy of high-density electron gases and cal-
culations of the exchange-correlation energy of inter-
mediate and low-density electron gases.
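The exchange piece of such a parametrization is known in closed form: for a homogeneous gas of density n, in Hartree atomic units, \epsilon_x(n) = -(3/4)(3n/\pi)^{1/3}. A minimal sketch of Eq. (2.6a) using this exchange-only expression on a real-space grid follows; the grid and density are placeholders, and a full LDA would add a fitted correlation term.

```python
import numpy as np

def eps_x_lda(n):
    """Exchange energy per electron (Hartree atomic units) of a
    homogeneous electron gas of density n: the exchange part of the
    LDA. Correlation requires a fitted parametrization and is omitted."""
    return -0.75 * (3.0 * n / np.pi) ** (1.0 / 3.0)

def exc_lda_exchange_only(n_grid, dv):
    """Approximate E_x = integral of eps_x(n(r)) n(r) d^3r on a
    real-space grid with volume element dv, as in Eq. (2.6a)."""
    return float(np.sum(eps_x_lda(n_grid) * n_grid) * dv)
```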
The local-density approximation, in principle, ignores
corrections to the exchange-correlation energy at a point
r due to nearby inhomogeneities in the electron density.
Considering the inexact nature of the approximation, it is
remarkable that calculations performed using the LDA
have been so successful. Recent work has shown that
this success can be partially attributed to the fact that the
local-density approximation gives the correct sum rule
for the exchange-correlation hole (Harris and Jones,
1974; Gunnarsson and Lundqvist, 1976; Langreth and
Perdew, 1977. A number of attempts to improve the
LDA, for instance by using gradient expansions, have not.
shown any improvement over results obtained using the
simple LDA. One of the reasons why these “improve-
ments” to the LDA do so poorly is that they do not obey
the sum rule for the exchange-correlation hole. Methods
that do enforce the sum rule appear to offer a consistent
improvement over the LDA (Langreth and Mehl, 1981,
1983).
The LDA appears to give a single well-defined global
minimum for the energy of a non-spin-polarized system
of electrons in a fixed ionic potential. Therefore any en-
ergy minimization scheme will locate the global energy
minimum of the electronic system. For magnetic materi-
als, however, one would expect to have more than one lo-
cal minimum in the electronic energy. If the energy
functional for the electronic system had many local mini-
ma, it would be extremely costly to perform totalenergy
calculations because the global energy minimum could
only be located by sampling the energy functional over a
large region of phase space.
C. Periodic supercells
In the preceding section it was demonstrated that cer-
tain observables of the many-body problem can be
mapped into equivalent observables in an effective
single-particle problem. However, there still remains the
formidable task of handling an infinite number of nonin-
teracting electrons moving in the static potential of an
FIG. 4. Schematic illustration of a supercell geometry for a
molecule. Same convention as in Fig. 2.
crystal slab and a vacuum region. The supercell is repeated
over all space, so the total energy of an array of
crystal slabs is calculated. To ensure that the results of
the calculation accurately represent an isolated surface,
the vacuum regions must be wide enough so that faces of
adjacent crystal slabs do not interact across the vacuum
region, and the crystal slab must be thick enough so that
the two surfaces of each crystal slab do not interact
through the bulk crystal.
Finally, even molecules can be studied in this fashion
(Joannopoulos et al., 1991), as illustrated in Fig. 4.
Again, the supercell needs to be large enough so that the
interactions between the molecules are negligible.
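Whether the supercell is large enough can be judged by generating the periodic images of the cell contents and checking the closest molecule-image distance against the interaction range. The sketch below does the image bookkeeping; the function name and cubic-cell assumption are illustrative.

```python
import numpy as np

def image_positions(atoms, L_cell, n_images=1):
    """Positions of all periodic images of `atoms` (an N x 3 array)
    within +/- n_images cells of a cubic supercell of side L_cell."""
    shifts = range(-n_images, n_images + 1)
    return np.array([atom + L_cell * np.array([i, j, k])
                     for atom in atoms
                     for i in shifts for j in shifts for k in shifts])
```

For a single atom, one shell of images in three dimensions yields 27 positions (the original plus 26 neighbors), from which the minimum nonzero image distance is easily extracted.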
D. Electron-ion interactions
1. Pseudopotential approximation
Although Bloch's theorem states that the electronic
wave functions can be expanded using a discrete set of
plane waves, a plane-wave basis set is usually very poorly
suited to expanding electronic wave functions because a
very large number of plane waves are needed to expand
the tightly bound core orbitals and to follow the rapid os-
cillations of the wave functions of the valence electrons in
the core region. An extremely large plane-wave basis set
would be required to perform an all-electron calculation,
and a vast amount of computational time would be re-
quired to calculate the electronic wave functions. The
pseudopotential approximation (Phillips, 1958; Heine
and Cohen, 1970; Yin and Cohen, 1982a) allows the elec-
tronic wave functions to be expanded using a much
smaller number of plane-wave basis states.
It is well known that most physical properties of solids
are dependent on the valence electrons to a much greater
extent than on the core electrons. The pseudopotential
approximation exploits this by removing the core elec-
trons and by replacing them and the strong ionic poten-
tial by a weaker pseudopotential that acts on a set of
pseudo wave functions rather than the true valence wave
functions. An ionic potential, valence wave function, and
the corresponding pseudopotential and pseudo wave
function are illustrated schematically in Fig. 5. The
valence wave functions oscillate rapidly in the region occupied
by the core electrons due to the strong ionic potential
in this region. These oscillations maintain the
orthogonality between the core wave functions and the
valence wave functions, which is required by the exclusion
principle. The pseudopotential is constructed,
ideally, so that its scattering properties or phase shifts for
the pseudo wave functions are identical to the scattering
properties of the ion and the core electrons for the
valence wave functions, but in such a way that the pseu-
do wave functions have no radial nodes in the core re-
gion. In the core region, the total phase shift produced
by the ion and the core electrons will be greater by 7, for
each node that the valence functions had in the core re-
gion, than the phase shift produced by the ion and the
vatence electrons. Outside the core region the two poten-
tials are identical, and the scattering from the two poten-
tials is indistinguishable. The phase shift produced by
the ion core is different for each angular momentum
component of the valence wave function, and so the
scattering from the pseudopotential must be angular
momentum dependent. The most general form for a
pseudopotential is

V_{\rm NL} = \sum_{lm} |lm\rangle V_l \langle lm| ,    (2.11)

where |lm\rangle are the spherical harmonics and V_l is the
pseudopotential for angular momentum l. Acting on the
electronic wave function with this operator decomposes
the wave function into spherical harmonics, each of
which is then multiplied by the relevant pseudopotential
V_l.
FIG. 5. Schematic illustration of all-electron (solid lines) and
pseudoelectron (dashed lines) potentials and their correspond-
ing wave functions. The radius at which all-electron and
pseudoelectron values match is designated r_c.
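In a basis of angular-momentum components the action of Eq. (2.11) is just a diagonal scaling: each spherical-harmonic component of the wave function is multiplied by its V_l. The sketch below shows this bookkeeping; the numerical V_l values are invented for illustration.

```python
# Eq. (2.11) acting in an angular-momentum basis. If a wave function
# (at fixed radius) has coefficients c[(l, m)] on spherical harmonics
# |lm>, the nonlocal operator scales each coefficient by V_l.
V_l = {0: -1.2, 1: -0.4, 2: 0.1}        # illustrative pseudopotentials per l

def apply_V_NL(c):
    """Apply V_NL = sum_lm |lm> V_l <lm| to coefficients c[(l, m)]."""
    return {(l, m): V_l[l] * a for (l, m), a in c.items()}
```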
A pseudopotential that uses the same potential for all
the angular momentum components of the wave function
is called a local pseudopotential. A local pseudopotential
is a function only of the distance from the nucleus. It is
possible to produce arbitrary, predetermined phase shifts
for each angular momentum state with a local potential,
but there are limits to the amount that the phase shifts
can be adjusted for the different angular momentum
states, while maintaining the crucial smoothness and
weakness of the pseudopotential. Without a smooth,
weak pseudopotential it becomes difficult to expand the
wave functions using a reasonable number of plane-wave
basis states.
a. Norm conservation
In total-energy calculations, the exchange-correlation
energy of the electronic system is a function of the elec-
tron density. If the exchange-correlation energy is to be
described accurately, it is necessary that outside the core
regions the pseudo wave functions and real wave func-
tions be identical, not just in their spatial dependences
but also in their absolute magnitudes, so that the two
wave functions generate identical charge densities. Adjustment
of the pseudopotential to ensure that the integrals
of the squared amplitudes of the real and the
pseudo wave functions inside the core regions are identi-
cal guarantees the equality of the wave function and
pseudo wave function outside the core region. One of the
first attempts to construct pseudopotentials of this type
was by Starkloff and Joannopoulos (Joannopoulos et al.,
1977; Starkloff and Joannopoulos, 1977). They introduced
a class of local pseudopotentials that described the
valence energies and wave functions of many heavy
atoms accurately.
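The norm-conservation condition itself is easy to state as a numerical check: the integrals of the squared amplitudes of the all-electron and pseudo radial functions inside r_c must agree. The sketch below performs that check on a radial grid; the function names, tolerance, and test functions are illustrative.

```python
import numpy as np

def core_norm(u, r, r_c):
    """Integral of |u(r)|^2 from 0 to r_c by the trapezoidal rule,
    for a radial function u sampled on the grid r."""
    mask = r <= r_c
    f, x = u[mask] ** 2, r[mask]
    return 0.5 * np.sum((f[1:] + f[:-1]) * np.diff(x))

def is_norm_conserving(u_ae, u_ps, r, r_c, tol=1e-3):
    """True if the all-electron and pseudo radial functions carry
    the same charge inside the core region."""
    return abs(core_norm(u_ae, r, r_c) - core_norm(u_ps, r, r_c)) < tol
```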
Of course, in general, the scattering from the ion core
is best described by a nonlocal pseudopotential that uses
a different potential for each angular momentum com-
ponent of the wave function. Various groups (Redondo
et al., 1977; Hamann et al., 1979; Zunger and Cohen,
1979; Kerker, 1980; Bachelet et al., 1982; Shirley et al.,
1989) have now introduced nonlocal pseudopotentials of
this type that work extremely well. Moreover, as pointed
out by Hamann, Schlüter, and Chiang (1979), a match of
the pseudo and real wave functions outside the core region
also assures that the first-order energy dependence
of the scattering from the ion core is correct, so that the
scattering is accurately described over a wide range of
energy. A method for the construction of pseudopotentials
that corrects even the higher-order energy dependence of
the scattering has recently been introduced by Shirley
et al. (1989). Local and nonlocal pseudopotentials of
these types are currently termed ab initio or norm conserving
and are capable of describing the scattering due
to the ion in a variety of atomic environments, a property
referred to as transferability.
Rev. Mod. Phys., Vol. 64, No. 4, October 1992
b. Generation procedure
The typical method for generating an ionic pseudopotential for an atom of species α, $v_\alpha$, is illustrated in Fig. 6
and proceeds as follows. All-electron calculations are
performed for an isolated atom in its ground state and
some excited states, using a given form for the exchange-
correlation density functional. This provides valence
electron eigenvalues and valence electron wave functions
for the atom. A parametrized form for the ionic pseudo-
potential is chosen. The parameters are then adjusted, so
that a pseudoatom calculation using the same form for
exchange-correlation as in the all-electron atom gives
both pseudo wave functions that match the valence wave functions outside some cutoff radius $r_c$ and pseudoeigenvalues that are equal to the valence eigenvalues. The ionic pseudopotential obtained in this fashion is then used, without further modification, for any environment of the
atom. The electronic density in any new environment of
the atom is then determined using both the ionic pseudo-
potential obtained in this way and the same form of
exchange-correlation functional as employed in the con-
struction of the ionic pseudopotential. A generalization
of this pseudopotential construction procedure for solu-
tions of the atom that are not normalizable has recently
been introduced by Hamann (1989).
Finally, it should be noted that ionic pseudopotentials are constructed with $r_c$ ranging typically from one to two times the value of the core radius. It should also be noted that, in general, the smaller the value of $r_c$, the more "transferable" the potential. (The entire procedure for solving the problem of a solid, given an ionic pseudopotential, is outlined in Sec. II.F.)
FIG. 6. Flow chart describing the construction of an ionic pseudopotential for an atom.
c. Convergence properties

The replacement of the true ionic potential by a weaker pseudopotential allows the electronic wave functions
to be expanded using far fewer plane-wave basis states
than would be needed to expand the wave functions in a
full ionic potential. The rapid oscillations of the valence
wave functions in the cores of the atoms have been re-
moved, and the small core electron states are no longer
present. The pseudopotential approximation has a number of other advantages in addition to reducing the number of plane-wave basis states needed to expand the electronic wave functions. The removal of the core electrons means that fewer electronic wave functions have to be calculated. More importantly, the total energy of the
valence electron system is typically a thousand times
smaller than the energy of the all-electron system. The
difference between the electronic energies of different ionic configurations appears almost totally in the energy of
the valence electrons, so that the accuracy required to
determine energy differences between ionic configura-
tions in a pseudopotential calculation is much smaller
than the accuracy required in an all-electron calculation.
The energy differences are just as large when only the
valence electrons are included in the calculation, but the
total energies are typically a thousand times smaller in
the pseudopotential calculation than in the all-electron
calculation. But, of course, the total energy is no longer meaningful. Only differences have meaning.
d. Plane-wave basis sets
One property of a pseudopotential that is not incorporated into the standard generation procedure is the value of the cutoff energy required for the plane-wave basis sets. Obviously, the smaller this value, the smaller the basis set required for any particular calculation, and the faster the calculation. A number of approaches to this problem have been adopted. Some authors add additional constraints in the process of generating the pseudopotential which are intended to produce a more rapidly convergent potential (Troullier and Martins, 1991). Alternatively, a separate optimization procedure can be applied after generating a pseudopotential using one of the standard techniques (Rappe et al., 1990; Rappe and Joannopoulos, 1991). A rather more radical approach has been suggested by Vanderbilt (Vanderbilt, 1990; Laasonen et al., 1991), which involves relaxing the norm conservation of the pseudopotential. This approach will be described more fully in Sec. IX.B.
2. Structure factor
The total ionic potential in a solid is obtained by placing an ionic pseudopotential at the position of every ion in the solid. The information about the positions of the
ions is contained in the structure factor. The value of the
structure factor at wave vector G for ions of species α is given by
$S_\alpha(\mathbf{G}) = \sum_I \exp[i\mathbf{G}\cdot\mathbf{R}_I]$ ,  (2.12)
where the sum is over the positions of all the ions of species α in a single unit cell.
The periodicity of the system restricts the nonzero components of the ionic potential to reciprocal-lattice vectors. Hence it is only necessary to calculate the structure factor at the set of reciprocal-lattice vectors.
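The sum in Eq. (2.12) is simple to evaluate directly. In the sketch below, the cubic cell, the ion positions, and the choice of reciprocal-lattice vectors are all invented for the example:

```python
import numpy as np

def structure_factor(G, positions):
    """S(G) = sum over ions I of exp(i G . R_I), following Eq. (2.12)."""
    return np.sum(np.exp(1j * (positions @ G)))

# Hypothetical example: two ions of one species in a cubic cell of side a,
# one at the origin and one at the body center.
a = 1.0
positions = np.array([[0.0, 0.0, 0.0],
                      [0.5 * a, 0.5 * a, 0.5 * a]])

# Reciprocal-lattice vectors of the cubic cell are (2*pi/a)(n1, n2, n3).
G_110 = (2 * np.pi / a) * np.array([1.0, 1.0, 0.0])
G_111 = (2 * np.pi / a) * np.array([1.0, 1.0, 1.0])

print(structure_factor(G_110, positions))  # the two terms add: S = 2
print(structure_factor(G_111, positions))  # the two terms cancel: S = 0
```

For the body-centered geometry the phase at $\mathbf{G}_{111}$ is $3\pi$, so the two contributions interfere destructively, which is why certain reflections vanish for such lattices.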
3. Total ionic potential
The total ionic potential $V_{\mathrm{ion}}$ is obtained by summing the product of the structure factor and the pseudopotential over all the species of ions. For example, for a local potential, $V_{\mathrm{ion}}$ is given by

$V_{\mathrm{ion}}(\mathbf{G}) = \sum_\alpha S_\alpha(\mathbf{G})\, v_\alpha(\mathbf{G})$ .  (2.13)
At large distances the pseudopotential is a pure Coulomb potential of the form Z/r, where Z is the valence of the atom. On taking the Fourier transform, one finds that the pseudopotential diverges as $Z/G^2$ at small wave vectors. Therefore the total ionic potential at G=0 is infinite, so the electron-ion energy is infinite. However, there are similar divergences in the Coulomb energies due to the electron-electron interactions and the ion-ion interactions. The Coulomb G=0 contributions to the total energy from the three interactions cancel exactly. This is not surprising, because there is no Coulomb potential at G=0 in a charge-neutral system, and so there cannot be a contribution to the total energy from the G=0 component of the Coulomb potential.
The pseudopotential is, however, not a pure Coulomb potential and therefore not exactly proportional to $Z/G^2$ for small G. There is a constant contribution to the pseudopotential at small G, equal to the integral of the difference between the pure Coulomb Z/r potential and the pseudopotential. This constant for species α is

$v^{\alpha}_{\mathrm{core}} = \int_0^{\infty} \left[ Z/r - v^{l=0}_{\alpha}(r) \right] 4\pi r^2\, dr$ ,  (2.14)

where $v^{l=0}_{\alpha}$ is the pseudopotential for the l=0 angular momentum state. This integral is nonzero only within the core region because the potentials are identical outside the core region.
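The integral of Eq. (2.14) can be checked numerically against a model potential. The sketch below uses an Ashcroft-style "empty core" form, chosen only because the integral then has a simple closed form; it is not a potential from this article, and the values of Z and $r_c$ are arbitrary:

```python
import numpy as np

# Model "empty core" local pseudopotential (Ashcroft-style), used here
# purely to illustrate the integral of Eq. (2.14); Z and r_c are invented
# and the potential is not one from the article.
Z = 4.0     # valence charge
r_c = 1.2   # core radius (arbitrary units)

def v_pseudo(r):
    # identical to the Coulomb form Z/r outside the core, zero inside
    return np.where(r < r_c, 0.0, Z / r)

# Eq. (2.14): the integrand is nonzero only inside the core region.
r = np.linspace(1e-6, 3.0, 200001)
integrand = (Z / r - v_pseudo(r)) * 4.0 * np.pi * r**2
v_core = float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r)))

# For this model the integral is analytic: 2*pi*Z*r_c**2.
print(v_core, 2.0 * np.pi * Z * r_c**2)
```

The trapezoidal sum agrees with the analytic value $2\pi Z r_c^2$ to the accuracy allowed by the discontinuity of the model potential at $r_c$.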
There is no contribution to the total energy from the $Z/G^2$ component of the pseudopotential at G=0, because of the cancellation of the infinities in the electron-ion, electron-electron, and ion-ion energies. However, the non-Coulomb part of the pseudopotential at G=0 does contribute to the total energy. The contribution is equal to

$N_{\mathrm{el}}\, \Omega^{-1} \sum_\alpha N_\alpha\, v^{\alpha}_{\mathrm{core}}$ ,  (2.15)

where $N_{\mathrm{el}}$ is the total number of electrons in the system,
$\langle \psi | H | \psi \rangle = \sum_{n,m} c_n^{*} c_m \langle \phi_n | H | \phi_m \rangle$ ,  (3.3)

and the constraint of normalization requires that

$\sum_n |c_n|^2 = 1$ .  (3.4)
The values of the coefficients $c_n$ can be varied subject
to the constraint of normalization until the minimum
value for the expectation value of the Hamiltonian is
reached. This minimum value gives an upper bound for
the ground-state energy of the Hamiltonian.
The variational theorem gives an upper bound to the
ground-state energy of the Hamiltonian. However, the
difference between the minimum value of the expectation
value and the true ground-state energy of any given
Hamiltonian is due to the lack of completeness in the
basis set $\{\phi_n\}$. The eigenstate and the eigen-energy obtained by using the variational theorem are exact in the
space of the basis set. Diagonalization of the Hamiltoni-
an matrix in the Hilbert space of the same basis states
would yield an identical solution for the lowest-energy
eigenstate.
The variational principle can be applied to obtain an
estimate for the energy of the next-lowest-energy eigen-
state of the Hamiltonian by using a trial wave function
that is orthogonal to the ground-state wave function.
The eigenstate and eigen-energy obtained for the second
eigenstate will again be identical to those calculated by
diagonalizing the Hamiltonian matrix in the Hilbert
space of the same basis functions. A third eigenstate can
be obtained by using a trial wave function that is orthog-
onal to the ground state and to the first excited state.
This process can be repeated until all of the eigenstates
have been obtained. The essential point is that the varia-
tional principle can be used to obtain eigenstates that are
exact in the Hilbert space of the basis set used in the cal-
culation. The molecular-dynamics method is essentially a dynamical method for applying the variational principle, in which the eigenstates of all the lowest-energy electronic states are determined simultaneously.
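A minimal numerical sketch of this sequential use of the variational principle follows, with an invented 8x8 Hamiltonian matrix standing in for $\langle \phi_n | H | \phi_m \rangle$. Each state is obtained by minimizing the expectation value while being kept orthogonal to the states already found; plain steepest descent is used here only for brevity, the molecular-dynamics and conjugate-gradients schemes of this article being more sophisticated:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small invented Hamiltonian matrix standing in for <phi_n|H|phi_m>;
# its spectrum is fixed to 1, 2, ..., 8 so the answer is known.
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T

def lowest_states(H, num, steps=5000, eta=0.05):
    """Find the lowest eigenstates one by one: minimize the expectation
    value by steepest descent, keeping each trial vector orthogonal to
    the states already found, as described in the text."""
    found = []
    for _ in range(num):
        x = rng.standard_normal(H.shape[0])
        for _ in range(steps):
            lam = x @ H @ x / (x @ x)        # expectation value
            x = x - eta * (H @ x - lam * x)  # step down the gradient
            for y in found:                  # impose orthogonality
                x -= (y @ x) * y
            x /= np.linalg.norm(x)           # impose normalization
        found.append(x)
    return np.array([y @ H @ y for y in found])

print(lowest_states(H, 3))  # approaches the three lowest eigenvalues
```

Each converged state is exact within the span of the basis, in agreement with the statement above that the variational eigenstates coincide with those of direct diagonalization.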
B. Molecular-dynamics procedure
The molecular-dynamics method will be introduced by
a description of its application to a system in which the
positions of the ions and the size of the unit cell remain
fixed. The calculation can then be directly compared to a
total-energy calculation performed using conventional
matrix diagonalization techniques. In the traditional molecular-dynamics approach a system of classical particles with coordinates $\{X_i\}$ interact through an interaction potential $V(\{X_i\})$. If the configuration of minimum energy is required, the system is started at a high temperature, and the temperature is gradually reduced until the particles reach a configuration $\{X_i\}_0$ that minimizes V. This procedure is illustrated schematically in Fig. 8.
In the Car-Parrinello scheme the Kohn-Sham energy
FIG. 8. Schematic illustration of the annealing procedure in molecular dynamics. The system is started at a high temperature with total energy $E_1$. The trajectory at this energy allows the system to sample a large amount of phase space. As the system is gradually cooled to $E_2$, $E_3$, etc., it settles down to a minimum-energy configuration.
functional $E[\{c_i\}]$ is a function of the set of coefficients of the plane-wave basis set $\{c_i\}$. Each coefficient $c_i$ can be regarded as the coordinate of a classical "particle." To minimize the Kohn-Sham energy functional, these "particles" are given a kinetic energy, and the system is gradually cooled until the set of coordinates reaches the values $\{c_i\}_0$ that minimize the functional. Thus the problem of solving for the Kohn-Sham eigenstates is reduced to one of solving for a set of classical equations of motion. It should be emphasized, however, that the Kohn-Sham energy functional is physically meaningful quantum mechanically only when the coefficients take the values $\{c_i\}_0$.
1. Molecular-dynamics Lagrangian
Car and Parrinello formulated their method in the language of molecular dynamics. Their essential step was to treat the electronic wave functions as dynamical variables. A Lagrangian is defined for the electronic system as follows:

$L = \sum_i \mu \langle \dot{\psi}_i | \dot{\psi}_i \rangle - E[\{\psi_i\},\{\mathbf{R}_I\},\{\alpha_\nu\}]$ ,  (3.5)

where μ is a fictitious mass associated with the electronic wave functions, E is the Kohn-Sham energy functional, $\mathbf{R}_I$ is the position of ion I, and the $\alpha_\nu$ define the size and shape of the unit cell. The kinetic-energy term in the Lagrangian is due to the fictitious dynamics of the electronic degrees of freedom. The Kohn-Sham energy functional takes the place of the potential energy in a conventional Lagrangian formulation.
2. Constraints
The electronic wave functions are subject to the constraints of orthonormality,

$\int \psi_i^{*}(\mathbf{r})\, \psi_j(\mathbf{r})\, d^3\mathbf{r} = \delta_{ij}$ .  (3.6)

These constraints are incorporated in the molecular-dynamics Lagrangian by using the method of Lagrange multipliers. The molecular-dynamics Lagrangian becomes

$L = \sum_i \mu \langle \dot{\psi}_i | \dot{\psi}_i \rangle - E[\{\psi_i\},\{\mathbf{R}_I\},\{\alpha_\nu\}] + \sum_{i,j} \Lambda_{ij} \left[ \int \psi_i^{*}(\mathbf{r})\, \psi_j(\mathbf{r})\, d^3\mathbf{r} - \delta_{ij} \right]$ .  (3.7)

The Lagrange multipliers $\Lambda_{ii}$ ensure that the wave functions remain normalized, while the Lagrange multipliers $\Lambda_{ij}$ ($i \neq j$) ensure that the wave functions remain orthogonal. As described below, the Lagrange multipliers may be thought of as providing additional forces acting on the wave functions, which ensure that the wave functions remain orthonormal.
C. Molecular-dynamics equations of motion
The equations of motion for the electronic states are derived from the Lagrange equations of motion,

$\frac{d}{dt}\left( \frac{\partial L}{\partial \dot{\psi}_i^{*}} \right) = \frac{\partial L}{\partial \psi_i^{*}}$ ,  (3.8)

which give

$\mu \ddot{\psi}_i = -H\psi_i + \sum_j \Lambda_{ij} \psi_j$ ,  (3.9)
where H is the Kohn-Sham Hamiltonian.
The force $-H\psi_i$ is the gradient of the Kohn-Sham energy functional at the point in the Hilbert space that corresponds to the wave function $\psi_i$. The Lagrange multipliers add forces $\Lambda_{ij}\psi_j$ to the force $-H\psi_i$. These forces ensure that the electronic wave functions remain orthonormal as they propagate along their molecular-dynamics trajectories. A general discussion of the consequences of various orthonormalization schemes is given by Broughton and Khan (1989).
1. Unconstrained equations of motion
The constraints of orthonormality play a crucial role
in the evolution of the electronic states in the molecular-
dynamics method. To illustrate the importance of these
constraints, we consider the evolution of the electronic
states in the absence of any constraints:

$\mu \ddot{\psi}_i = -[H - \sigma]\psi_i$ ,  (3.10)

where H is the Kohn-Sham Hamiltonian, and σ is an energy shift that defines the zero of energy.

If $\psi_i$ is expanded in the basis set of the eigenstates $\phi_n$ of the Hamiltonian H,

$\psi_i = \sum_n c_{i,n} \phi_n$ ,  (3.11)
and if Eq. (3.11) is substituted into (3.10), the following equation of motion for the coefficient of the eigenstate $\phi_n$ is obtained:

$\mu \ddot{c}_{i,n} = -[\varepsilon_n - \sigma]\, c_{i,n}$ ,  (3.12)

where $\varepsilon_n$ is the eigenvalue corresponding to eigenstate $\phi_n$. Integration of these equations of motion, under the assumption that the velocities of the coefficients are initially zero, gives the coefficients at time t as

$c_{i,n}(t) = c_{i,n}(0) \cos\{[(\varepsilon_n - \sigma)/\mu]^{1/2}\, t\}$ ,  $\varepsilon_n > \sigma$ ,  (3.13a)

$c_{i,n}(t) = c_{i,n}(0) \cosh\{[\,|\varepsilon_n - \sigma|/\mu\,]^{1/2}\, t\}$ ,  $\varepsilon_n < \sigma$ .  (3.13b)

Here $c_{i,n}(0)$ are the initial values of the coefficients.
It can be seen that the amplitudes of the coefficients of the eigenstates with energies greater than σ oscillate with time, while the amplitudes of the coefficients of the eigenstates with energies less than σ increase with time. If σ is chosen to be larger than the lowest-energy eigenvalue, then all the electronic states that have $c_{i,0}(0) \neq 0$ will converge to the lowest-energy eigenstate $\phi_0$, since the coefficient $c_{i,0}$ will increase faster than any other coefficient. Therefore, under the unconstrained equations of motion, the electronic wave functions remain neither orthogonal nor normalized. The initial wave functions will only converge to different eigenstates when the constraints of orthogonality are imposed.
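The closed-form solution (3.13) can be evaluated directly to watch this happen. In the sketch below the spectrum, the shift σ, and the initial coefficients are invented; the normalized weight on the lowest eigenstate approaches 1 as t grows:

```python
import numpy as np

# Closed-form solution (3.13) of the unconstrained equations of motion.
# The spectrum, the shift sigma, and the starting coefficients are all
# invented for this illustration; velocities are zero initially.
mu = 1.0
eps = np.array([1.0, 3.0, 5.0])    # eigenvalues eps_n, eps_0 lowest
sigma = 4.0                        # energy zero, above the lowest eigenvalue
c_init = np.array([0.1, 0.7, 0.7])

def coefficients(t):
    out = np.empty_like(c_init)
    for n, e in enumerate(eps):
        if e > sigma:   # Eq. (3.13a): oscillates
            out[n] = c_init[n] * np.cos(np.sqrt((e - sigma) / mu) * t)
        else:           # Eq. (3.13b): grows
            out[n] = c_init[n] * np.cosh(np.sqrt((sigma - e) / mu) * t)
    return out

# The weight on the lowest eigenstate grows fastest, so the normalized
# wave function converges to the ground state even though it started
# mostly on the higher states.
for t in (0.0, 2.0, 6.0):
    c = coefficients(t)
    print(t, abs(c[0]) / np.linalg.norm(c))
```

Note also that the norm of the coefficient vector grows without bound, illustrating why the constraints of orthonormality must be imposed separately.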
2. Constrained equations of motion
The constrained molecular-dynamics equations of motion for the electronic states,

$\mu \ddot{\psi}_i = -H\psi_i + \sum_j \Lambda_{ij} \psi_j$ ,  (3.14)
ensure that the electronic wave functions remain ortho-
normal at every instant in time. The molecular-dynamics
evolution of the electronic wave functions under these
equations of motion would also conserve the total energy
in the electronic degrees of freedom for the system of
fixed ions we assume for this section. However, to ensure
these properties, the values of the Lagrange multipliers
must vary continuously with time, and so implementa-
tion of this form of the molecular-dynamics equations re-
quires that the Lagrange multipliers be evaluated at
infinitely small time separations. To make the calcula-
tions tractable, variation of the Lagrange multipliers dur-
ing a time step is neglected and the Lagrange multipliers
are approximated by a constant value during the time
step. In this case the wave functions will not be exactly
orthonormal at the end of the time step, and a separate
orthonormalization step is needed in the calculation.
3. Partially constrained equations of motion
Since a separate orthonormalization step is required at
the end of each time step, it is possible to remove the
constraints of orthogonality from the equation of motion and use a partially constrained equation of motion. The constraints of orthogonality are then imposed after the equations of motion have been integrated, and the Lagrange multipliers for the constraints of normalization $\Lambda_{ii}$ are approximated by the expectation values of the energies of the states, $\lambda_i$, where

$\lambda_i = \langle \psi_i | H | \psi_i \rangle$ .  (3.15)

This leads to an equation of motion that has the form

$\mu \ddot{\psi}_i = -[H - \lambda_i]\psi_i$ .  (3.16)
With this equation of motion, the acceleration of an electronic state is always orthogonal to that state, a necessary requirement to maintain normalization, and the acceleration becomes zero when the wave function is an exact eigenstate.
D. Integration of equations of motion
Once the accelerations of the coefficients have been
calculated, the equations of motion for the coefficients of
the plane-wave basis states have to be integrated. Car
and Parrinello used the Verlet algorithm (Verlet, 1967) to
integrate the equations of motion.
1. The Verlet algorithm
The Verlet algorithm is derived from the simplest second-order difference equation for the second derivative. It gives the value of the ith electronic state at the next time step, $\psi_i(\Delta t)$, as

$\psi_i(\Delta t) = 2\psi_i(0) - \psi_i(-\Delta t) + \Delta t^2\, \ddot{\psi}_i(0)$ ,  (3.17a)

where Δt is the length of the time step, $\psi_i(0)$ is the value of the state at the present time step, and $\psi_i(-\Delta t)$ is the value of the state at the last time step. Substitution of Eq. (3.16) into (3.17a) then gives

$\psi_i(\Delta t) = 2\psi_i(0) - \psi_i(-\Delta t) - \frac{\Delta t^2}{\mu}[H - \lambda_i]\psi_i(0)$ .  (3.17b)
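A single Verlet step of Eq. (3.17b) for one state in a small invented basis can be written out directly. The sketch below also checks the remark made after Eq. (3.16): for an exact eigenstate the acceleration vanishes, so a state started there at rest does not move:

```python
import numpy as np

# One Verlet step, Eq. (3.17b), for a single state in a small invented
# basis; the symmetric matrix H is a stand-in Kohn-Sham Hamiltonian.
mu, dt = 1.0, 0.05
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
H = 0.5 * (A + A.T)

def verlet_step(psi_now, psi_prev):
    lam = psi_now @ H @ psi_now                      # lambda_i, Eq. (3.15)
    accel = -(H @ psi_now - lam * psi_now) / mu      # Eq. (3.16)
    return 2.0 * psi_now - psi_prev + dt**2 * accel  # Eq. (3.17a)

# The acceleration vanishes for an exact eigenstate, so a state started
# there "at rest" (previous value equal to current value) stays put.
w, V = np.linalg.eigh(H)
psi = V[:, 0]                      # exact lowest eigenstate, normalized
psi_next = verlet_step(psi, psi)
print(np.linalg.norm(psi_next - psi))  # essentially zero
```

In a real calculation this step would be followed by the separate orthonormalization discussed above, since the approximated Lagrange multipliers do not keep the states exactly orthonormal.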
The Verlet algorithm introduces an error of order $\Delta t^4$ into the integration of the equations of motion. A more sophisticated finite-difference algorithm could be used to integrate the equations of motion and hence reduce the error in the integration to a higher order of Δt. In principle, for a given level of accuracy this would allow a longer time step to be used in the integration of the equations of motion and hence reduce the total computational time by reducing the number of time steps required to
perform the calculation. The maximum stable time step,
however, is not significantly increased with a higher-
order difference scheme. A more sophisticated finite-
difference equation would also require the values of the
coefficients or the corresponding accelerations from a
larger number of time steps in order to integrate the
equations of motion. It requires a large amount of
memory to store the wave-function coefficients and accelerations for each time step in a total-energy pseudopotential calculation. If the extra coefficients and accelerations did not fit into core memory, the computation could become I/O bound and the total time required for the calculation may actually increase.
2. Stability of the Verlet algorithm
A general performance measure of algorithms of the Verlet type is the rate at which they converge to the minimum-energy state. A given problem normally requires a certain amount of real time to converge, and the computational effort is then determined by the size of the time step Δt. In what follows it is demonstrated that the largest Δt allowed for stability is related to the difference between the largest and smallest eigenvalues of the system.
Given the assumption that $\psi_i$ is near the lowest-energy eigenstate $\phi_0$, the state $\psi_i$ is expanded as in Eq. (3.11),

$\psi_i = \phi_0 + \sum_{n \neq 0} \delta_n \phi_n$ ,  (3.18)

where the $\delta_n$ represent infinitesimal coefficients. Substitution of Eq. (3.18) into (3.17b) gives, to first order in $\delta_n$,

$\delta_n(\Delta t) = 2\delta_n(0) - \delta_n(-\Delta t) - \frac{\Delta t^2}{\mu}[\varepsilon_n - \varepsilon_0]\,\delta_n(0)$ .  (3.19)
In the standard stability analysis (see Mathews and Walker, 1970), a constant growth factor g is introduced at each time step, so that

$\delta_n(m\,\Delta t) = g\, \delta_n((m-1)\,\Delta t)$ .  (3.20)

Substitution of Eq. (3.20) into (3.19) then gives

$g^2 - 2g + 1 + \frac{\Delta t^2}{\mu}(\varepsilon_n - \varepsilon_0)\, g = 0$ ,  (3.21)

and the magnitude of g can become greater than 1 if

$\Delta t > \left[ \frac{4\mu}{\varepsilon_n - \varepsilon_0} \right]^{1/2}$ .  (3.22a)

Therefore the largest possible Δt that is allowed for stability must be

$\Delta t_{\mathrm{max}} = \left[ \frac{4\mu}{\varepsilon_{\mathrm{max}} - \varepsilon_0} \right]^{1/2}$ ,  (3.22b)

where $\varepsilon_{\mathrm{max}}$ is the largest eigenvalue of the problem.
For a Hamiltonian representation in a plane-wave
basis, the largest eigenvalue is primarily determined by
the cutoff kinetic energy of the basis set. Thus the Verlet algorithm will require the time step to be reduced as the
cutoff energy is increased. This problem is addressed
again in Sec. IV.A.
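The stability limit (3.22) is easy to verify numerically by iterating the Verlet recurrence (3.19) for a single error coefficient; the fictitious mass, the eigenvalue gap, and the step counts below are invented for the test:

```python
import numpy as np

# Numerical check of the stability limit, Eq. (3.22): iterate the Verlet
# recurrence (3.19) for a single error coefficient delta_n. The mass and
# the eigenvalue gap are invented for the test.
mu = 1.0
gap = 25.0                        # eps_n - eps_0
dt_max = np.sqrt(4.0 * mu / gap)  # Eq. (3.22b)

def amplitude_after(dt, steps=200):
    d_prev, d_now = 1.0, 1.0      # started at rest
    for _ in range(steps):
        d_next = 2.0 * d_now - d_prev - (dt**2 / mu) * gap * d_now
        d_prev, d_now = d_now, d_next
    return abs(d_now)

print(amplitude_after(0.99 * dt_max))  # stays bounded (oscillates)
print(amplitude_after(1.01 * dt_max))  # grows without bound
```

Just below the limit the coefficient oscillates with constant amplitude; just above it, the growth factor of Eq. (3.21) exceeds 1 in magnitude and the coefficient diverges geometrically.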
FIG. 12. Illustration of instability in the molecular-dynamics method with the Kohn-Sham Hamiltonian as a consequence of a large time step. The convention is the same as in Fig. 10. Note that each iteration drives the coefficients further from their equilibrium value.
longest length of the supercell. Therefore, as supercells become physically larger, the stability of the problem eventually becomes dominated by "charge sloshing" considerations, and the time step must be reduced to keep the fluctuations small.
A related difficulty, which leads to a similar
phenomenon, arises if many states with nearly the same
eigenvalue exist in the neighborhood of the Fermi energy.
Under these circumstances, macroscopic oscillations in
electron density can occur with very little change in total
energy. The problem is particularly serious for metals.
Since small energy changes may yield large differences in
electron density, and hence in forces, close convergence
of the electrons to their ground state is necessary in me-
tallic systems.
Thus the largest stable time step in the Car-Parrinello algorithm is dominated by either the maximum kinetic energy in the problem (as discussed in Sec. III.D.2) or the
need to limit charge sloshing. One of these two con-
siderations will place a practical limit on the size and
type of problem that can be attacked with the Car-
Parrinello algorithm. Despite these limitations, however,
there are many problems for which the method has out-
standing utility.
Finally, it should be noted that, even for an infinitesimal time step, the Kohn-Sham energy will not always decrease monotonically during the evolution of the electronic system to its ground state. Insufficient damping of the wave-function coefficients in the equations of motion, for example, will result in fluctuations of the Kohn-Sham energy during this evolution. This is simply the expected dynamical behavior of an underdamped system and will not prevent the electrons from eventually reaching their ground state.
E. Computational cost of molecular dynamics
The molecular-dynamics method provides a general technique for solving eigenvalue problems. However, this method was developed specifically in the context of total-energy pseudopotential calculations, and in this section the computational cost of using the molecular-dynamics method to perform total-energy pseudopotential calculations will be calculated. An important feature of any computational method is the rate at which the computational time increases with the size of the system, and so particular attention will be paid to the operations that increase fastest as the size of the system increases.
In a total-energy pseudopotential calculation the wave functions $\psi_i$ are expanded in a plane-wave basis set, so that

$\psi_i(\mathbf{r}) = \sum_{\mathbf{G}} c_{i,\mathbf{k}+\mathbf{G}} \exp[i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}]$ .  (3.25)
A feature of total-energy pseudopotential calculations is that the number of plane-wave basis states used in the calculations is always much larger than the number of occupied bands. The ratio is typically 100:1. This ratio is independent of the size of the system: increasing the size of the unit cell will increase the number of atoms in the unit cell. However, the lengths of the reciprocal-lattice vectors will be reduced, so that there will be more plane waves with energies below the cutoff energy for the plane-wave basis set. The ratio of the number of plane waves to the number of occupied bands is considerably larger when the system contains any transition-metal atoms or first-row atoms. These atoms have strong pseudopotentials, and much larger basis sets are needed to expand the electronic wave functions. Only the parts of the total-energy calculation that scale faster than linearly with the size of the system will be considered in the following discussion. The parts of the calculation that scale linearly with the size of the system do not have large "prefactors" associated with their computational cost, so the computational time required for these parts of the calculation is negligible. A calculation for a system that has $N_B$ occupied bands in which the electronic wave functions are expanded in a basis set containing $N_{PW}$ plane waves will be considered. The operations described below scale linearly with the number of k points, so a calculation for a single k point will be considered.
1. Calculation of charge density
The charge density is most cheaply computed in real space because it is simply the square of the magnitude of the wave function. This requires that the wave functions be Fourier transformed from reciprocal space to real space. However, the charge density has components with wave vectors up to twice the cutoff wave vector for the electronic wave functions. Hence, to maintain a faithful representation of the charge density, one must compute it on a Fourier grid twice as dense in each spatial direction as the grid required to provide a faithful representation of the wave function. Furthermore, the plane-wave components of the wave function, which lie within a sphere in reciprocal space, must be placed in an orthorhombic grid of plane waves to perform the fast Fourier transformation (FFT). Fourier transformation of a single wave function from reciprocal space to real space can be performed in $N_{FFT} \sim N_{grid} \ln(N_{grid})$ operations using a fast Fourier transform algorithm, where $N_{grid}$ is the number of points in the real-space grid. For the reasons given above, $N_{grid}$ is of the order of 16 times the number of plane waves in the wave function. Therefore performing the fast Fourier transform for a single band requires $N_{FFT} \approx 16 N_{PW} \ln(N_{PW})$ operations; the factor 16 is irrelevant in the logarithm. The total charge density can then be calculated in $N_B N_{FFT}$ operations.
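The factor-of-two rule can be seen in a one-dimensional sketch (the grid size and cutoff below are invented): a wave function band-limited to $G_{\mathrm{cut}} = 4$ produces a density with Fourier components out to exactly $2G_{\mathrm{cut}} = 8$, and none beyond:

```python
import numpy as np

# 1D sketch of the double-grid rule stated above: a wave function
# band-limited to G_cut has a density |psi|^2 with components out to
# 2*G_cut, so the density grid must be twice as dense. Sizes invented.
n_grid = 32
rng = np.random.default_rng(2)

G = np.fft.fftfreq(n_grid, d=1.0 / n_grid).astype(int)  # FFT bin "G vectors"
inside = np.abs(G) <= 4                                  # G_cut = 4, 9 waves

c = np.zeros(n_grid, dtype=complex)
c[inside] = rng.standard_normal(9) + 1j * rng.standard_normal(9)
c /= np.linalg.norm(c)                 # normalized coefficients

psi = np.fft.ifft(c) * n_grid          # wave function on the real-space grid
density = np.abs(psi) ** 2             # charge density in real space
n_G = np.fft.fft(density) / n_grid     # its Fourier components

print(np.max(np.abs(n_G[(np.abs(G) > 4) & (np.abs(G) <= 8)])))  # nonzero
print(np.max(np.abs(n_G[np.abs(G) > 8])))  # zero to machine precision
```

Because the grid here is more than twice as dense as the wave-function cutoff requires, none of the density components are aliased, and the integrated density equals the squared norm of the coefficients.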
2. Construction of Hamiltonian matrix

The Kohn-Sham Hamiltonian is a matrix of dimension $N_{PW}$. The total number of elements in the matrix is $N_{PW}^2$. The number of operations to construct the matrix and the storage required by the matrix both increase as $N_{PW}^2$.

3. Accelerations of the coefficients

The molecular-dynamics equation of motion for the coefficient of the plane-wave basis state with wave vector k+G is obtained by substitution of Eq. (3.25) into (3.16) and use of (2.3) for H. Integration over r then gives

$\mu \ddot{c}_{i,\mathbf{k}+\mathbf{G}} = -\left[ \frac{\hbar^2}{2m} |\mathbf{k}+\mathbf{G}|^2 - \lambda_i \right] c_{i,\mathbf{k}+\mathbf{G}} - \sum_{\mathbf{G}'} \left[ V_H(\mathbf{G}-\mathbf{G}') + V_{XC}(\mathbf{G}-\mathbf{G}') \right] c_{i,\mathbf{k}+\mathbf{G}'} - \sum_{\mathbf{G}'} V_{\mathrm{ion}}(\mathbf{k}+\mathbf{G},\mathbf{k}+\mathbf{G}')\, c_{i,\mathbf{k}+\mathbf{G}'}$ ,  (3.26)

with $\lambda_i$ defined as in Eq. (3.15).

If Eq. (3.26) is used to calculate the accelerations of the coefficients of the plane-wave basis states, $N_{PW}^2$ operations are required to calculate the accelerations for a single wave function. This is the number of operations required to multiply the wave function $\psi_i$ by the Kohn-Sham Hamiltonian H. The accelerations of each of the $N_B$ occupied bands must be calculated, so that a total of $N_B N_{PW}^2$ operations are required to compute the accelerations of the wave functions.

4. Integration of equations of motion

Integration of the equations of motion for the electronic states requires $N_B N_{PW}$ operations.

5. Orthogonalization

The number of operations required to orthogonalize the wave functions using Car and Parrinello's algorithm is proportional to $N_B^2 N_{PW}$.

6. Comparison to conventional matrix diagonalizations

The total computational time for the processes described above is dominated by the $N_B N_{PW}^2$ operations required to calculate the accelerations of the wave functions. This should be compared to the computational cost of conventional matrix diagonalization techniques for total-energy pseudopotential calculations, which is dominated by the $N_{PW}^3$ operations required to diagonalize the Kohn-Sham Hamiltonian. As the number of occupied bands in a total-energy pseudopotential calculation is so much smaller than the number of plane waves included in the basis set, it appears that the molecular-dynamics method offers a considerable increase in computational speed over matrix diagonalization techniques.
However, the number of time steps required to integrate the equations of motion and converge the electron states in the molecular-dynamics method is usually larger than the number of iterations required to achieve self-consistency in matrix diagonalization calculations. Hence the equation of motion given in (3.26) does not provide a significantly faster method for obtaining the self-consistent eigenstates than conventional matrix diagonalization techniques. It should also be noted that the full Kohn-Sham Hamiltonian has to be stored, which requires $N_{PW}^2$ words of memory. Therefore, even if the molecular-dynamics method were faster than conventional matrix diagonalization methods, the memory requirement would make it difficult to perform calculations that required extremely large plane-wave basis sets.
7. Local pseudopotentials
The problems described above can be overcome by replacing the nonlocal ionic pseudopotential by a local pseudopotential. A local pseudopotential is a function only of the distance from the nucleus or, equivalently, is a function only of the difference between the wave vectors of the plane-wave basis states. Therefore use of the same procedure as before to derive Eq. (3.26) results in the equation of motion for the coefficient of the plane-wave basis state at wave vector k+G with a local pseudopotential

$\mu \ddot{c}_{i,\mathbf{k}+\mathbf{G}} = -\left[ \frac{\hbar^2}{2m} |\mathbf{k}+\mathbf{G}|^2 - \lambda_i \right] c_{i,\mathbf{k}+\mathbf{G}} - \sum_{\mathbf{G}'} V_T(\mathbf{G}-\mathbf{G}')\, c_{i,\mathbf{k}+\mathbf{G}'}$ ,  (3.27)

where $V_T(\mathbf{G})$ is the total potential given by

$V_T(\mathbf{G}) = V_{\mathrm{ion}}(\mathbf{G}) + V_H(\mathbf{G}) + V_{XC}(\mathbf{G})$ .  (3.28)
The equation of motion can more usefully be written as

$\mu \ddot{c}_{i,\mathbf{k}+\mathbf{G}} = -\left[ \frac{\hbar^2}{2m} |\mathbf{k}+\mathbf{G}|^2 - \lambda_i \right] c_{i,\mathbf{k}+\mathbf{G}} - \int V_T(\mathbf{r})\, \psi_i(\mathbf{r}) \exp[-i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}]\, d^3\mathbf{r}$ .  (3.29)

This form of the equation of motion shows that the multiplication of the wave function by the Kohn-Sham Hamiltonian can be divided into a part that is diagonal in reciprocal space and a part that is diagonal in real space. However, in order to maintain a faithful representation of the potentials and the product $V_T(\mathbf{r})\psi_i(\mathbf{r})$, one must perform the real-space multiplication on a double-density Fourier transform grid (see Sec. III.E.1). This separation procedure leads to a multiplication that can be performed in just $17 N_{PW}$ operations: $16 N_{PW}$ for the multiplication of the wave function by the potential in real space and $N_{PW}$ for the multiplication of the wave function by the kinetic-energy operator in reciprocal space.
The computational cost of calculating the accelerations
of the wave functions is dominated by the N_FFT operations
required to Fourier-transform the wave function
from reciprocal space to real space and the N_FFT operations
required to transform the contribution to the acceleration
that is calculated in real space back to reciprocal
space. The accelerations of all the plane-wave
coefficients for the N_B occupied electron states can be
calculated in 2N_B N_FFT operations. Reduction of the
number of operations required to evaluate the accelerations
of the coefficients from N_B N_PW² to 2N_B N_FFT makes
the molecular-dynamics method very much faster than
conventional matrix diagonalization techniques for any
system, although the saving in time is obviously much
greater for large systems. The same technique can be
used with any iterative matrix diagonalization technique,
since all iterative matrix diagonalization techniques involve
the multiplication of trial wave functions by the
Hamiltonian matrix.
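The reciprocal-space/real-space split described above can be sketched in a toy one-dimensional model (not the authors' code): the kinetic energy is applied as a diagonal multiplication on the plane-wave coefficients, the local potential is applied on a real-space grid between an FFT pair, and the result is checked against the explicit dense Hamiltonian matrix. The grid size, the Gaussian potential, and the units (ℏ = m = 1) are arbitrary choices for illustration.

```python
# Sketch: applying H|psi> with one FFT pair instead of an N x N matrix.
import numpy as np

n = 64                                        # plane waves / grid points
L = 10.0                                      # supercell length
G = 2 * np.pi * np.fft.fftfreq(n, d=L / n)    # reciprocal-lattice vectors
x = np.arange(n) * L / n
V = -2.0 * np.exp(-(x - L / 2) ** 2)          # local potential on the grid

rng = np.random.default_rng(0)
c = rng.standard_normal(n) + 1j * rng.standard_normal(n)   # coefficients c_G

def apply_H(c):
    """H|psi> in O(N log N): kinetic in G-space, potential in r-space."""
    kinetic = 0.5 * G ** 2 * c                # diagonal in reciprocal space
    psi_r = np.fft.ifft(c)                    # to real space
    pot = np.fft.fft(V * psi_r)               # multiply, back to G-space
    return kinetic + pot

# Check against the dense matrix H_{G,G'} = (G^2/2) delta_{G,G'} + V(G-G')
VG = np.fft.fft(V) / n                        # Fourier components of V
idx = (np.arange(n)[:, None] - np.arange(n)) % n
H = np.diag(0.5 * G ** 2) + VG[idx]
assert np.allclose(apply_H(c), H @ c)
```

In a production code the real-space product would be formed on the double-density grid described above; this sketch uses a single grid for brevity.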
Another advantage of using local pseudopotentials in
total-energy pseudopotential calculations is that the
Hamiltonian "matrix" can be stored in 17N_PW words of
memory by storing the potential-energy operator in real
space in 16N_PW words and the kinetic-energy operator in
reciprocal space in N_PW words. This allows extremely
large "matrices" to be stored with ease.
Unfortunately, it is not possible to describe all atoms
with local pseudopotentials. It is important to use an
efficient scheme for applying nonlocal pseudopotentials if
the computational speed of the molecular-dynamics
method is to be retained. Efficient schemes have been
developed for applying nonlocal pseudopotentials in the
molecular-dynamics method and other iterative methods.
These schemes will be described in Sec. IX, but it should
be noted that the most efficient of these schemes requires
16N_PW operations to apply the operators for each
component of the nonlocal pseudopotential. This is less
than the cost of Fourier-transforming the wave functions
and is less than the cost of orthogonalizing the wave
functions in any but the smallest of systems, and so these
operations dominate the computational cost irrespective
of whether local or nonlocal pseudopotentials are used.

Rev. Mod. Phys., Vol. 64, No. 4, October 1992
IV. IMPROVEMENTS IN ALGORITHMS
In this section a number of relatively simple
modifications to the molecular-dynamics-based method
are introduced which offer significant improvements over
the original approach when calculating the electronic
ground state for a fixed ionic configuration. These im-
provements include methods that increase the computa-
tional speed of the calculations and methods that permit
the electrons to converge to the exact Kohn-Sham eigen-
states. As discussed in Sec. III, the possibility of obtain-
ing the latter is important, for it paves the way for stud-
ies of metallic systems. However, it should be noted that
these modifications are not generally useful when per-
forming dynamical simulations of the ionic system, for
reasons that will be discussed in Sec. VII. The remaining
problems in the calculation of the electronic ground state
for a static ionic configuration, problems that cannot be
overcome with the improvements described below, are
addressed in Sec. IV.D.
A. Improved integration
It has already been pointed out in Sec. III.D that an attempt
to improve on the Verlet algorithm by using a
more sophisticated finite-difference technique for integrating
the equations of motion may not produce any
increase in computational speed. This is because a more
sophisticated finite-difference equation typically requires
information from a large number of time steps in order to
integrate the equations of motion. A large amount of
memory is generally needed to store this information.
Therefore, if sufficient core memory is not available, the
computation can involve an excessive number of input
and output operations.
An alternative technique for improving the integration
of the equations of motion, which does not require any
additional storage, is to calculate the change in the magnitude
of the acceleration of the coefficient during the
time step, as the values of the coefficients change. [Recall
that simple use of the Verlet algorithm (3.17) requires the
coefficients to remain fixed during a time step.] At first
thought, it might appear that the determination of the
time dependence of the coefficients during a time step actually
necessitates a solution of the equations of motion
in the first place! This, however, turns out not to be the
case. An examination and manipulation of the form of
the equations of motion (3.26) and (3.27) reveals that it is
indeed possible to obtain a good approximation to the
time dependence of the coefficients during a time step.
The argument is as follows (Payne et al., 1986).
With the assumption of a local pseudopotential for
Fermi level. Without Kohn-Sham eigenstates, the Fermi
level of a metallic system cannot be defined.
C. Comparison between algorithms
The simple modifications to the molecular-dynamics-based
method described in Secs. IV.A and IV.B above
can produce significant increases in computational speed.
This is illustrated in Fig. 13, where the evolution of the
total energy of an 8-atom cubic supercell of germanium
in the diamond structure is shown. Here the ions are
held fixed and the electrons are being iterated to convergence.
A local pseudopotential of the Starkloff-Joannopoulos
type is used with a basis set of 4096 plane
waves. The open circles show the results obtained by using
the Verlet algorithm to integrate the equations of
motion, while the filled circles show the results obtained
by using analytic integration of the equation of motion.
The use of the analytic expression to integrate the equation
of motion allows the time step to be increased to six
times the value at which the Verlet algorithm becomes
unstable and gives convergence of the total energy in
one-tenth of the number of time steps.
D. Remaining difficulties
The molecular-dynamics method and the algorithm
improvements described above all become ineffective as
the size of the system increases. For example, in silicon,
only systems with supercell length scales less than about
50 Å benefit from these modifications to the molecular-dynamics
method. For larger length scales the maximum
stable time step is completely dominated by the need to
suppress fluctuations in the charge density, or charge
sloshing. As discussed earlier in Sec. III.J, this is a consequence
of instabilities in the Kohn-Sham Hamiltonian
which arise for very small reciprocal-lattice vectors
and which require intractably small time steps to
overcome. These instabilities present severe obstacles to
FIG. 13. Evolution of the total energy (eV) against time steps for an
eight-atom cell of germanium in the diamond structure. Open circles
represent the original scheme. The filled circles correspond to an
analytic integration of the equations of motion as described in the text.
future studies of large and more complex systems.
Clearly, some different and novel methods are required
to surmount the problems encountered in the regime of
large systems. One method that overcomes these
difficulties involves a direct minimization of the Kohn-
Sham total energy with a conjugate-gradients approach.
This is the subject of the next section.
V. DIRECT MINIMIZATION OF THE KOHN-SHAM
ENERGY FUNCTIONAL
At the end of Sec. IV, it was stated that all the algorithms
described so far in this article encounter
difficulties when performing calculations on large systems.
The difficulties encountered can be attributed to
the discontinuous changes in the Kohn-Sham Hamiltonian
from iteration to iteration. There are an infinite number
of Kohn-Sham Hamiltonians, each of which has a
different set of eigenstates. One of these sets of eigenstates,
the set generated by the self-consistent Kohn-Sham
Hamiltonian, minimizes the Kohn-Sham energy
functional. All the methods for performing total-energy
pseudopotential calculations described so far involve an
indirect search for the self-consistent Kohn-Sham Hamiltonian.
The search procedure used in the molecular-dynamics
method was illustrated in Fig. 10. The discontinuous
evolution of the Hamiltonian can be clearly seen
in this figure. In Sec. III.J it was shown that use of too
long a time step in the molecular-dynamics method can
lead to an unstable evolution of the Kohn-Sham Hamiltonian.
This results in wave functions that move further
from the self-consistent Kohn-Sham eigenstates at each
time step, as illustrated in Fig. 12. Unfortunately, the
value of the critical time step at which the instability
occurs decreases as the size of the system increases. It is
always possible to ensure stable evolution of the electronic
configuration using the molecular-dynamics method,
but this is at the cost of using smaller time steps and
hence more computational time as the size of the system
increases. This problem is also present in all of the algorithms
described in Sec. III, since all of these employ an
indirect search for the self-consistent Kohn-Sham Hamiltonian.
To perform a total-energy pseudopotential calculation
it is necessary to find the electronic states that minimize
the Kohn-Sham energy functional. As discussed above,
to perform this process indirectly, by searching for the
self-consistent Kohn-Sham Hamiltonian, can lead to in-
stabilities. These instabilities would not be encountered
if the Kohn-Sham energy functional were minimized
directly because the Kohn-Sham energy functional nor-
mally has a single well-defined energy minimum. A
search for this energy minimum cannot lead to instabilities
in the evolution of the electronic configuration. In
this section, a computational method is introduced that
allows direct minimization of the Kohn-Sham energy
functional in a tractable and efficient manner. In Sec.
V.A, an introductory discussion is provided of two gen-
eral methods that can be used for the minimization of
any function. Of these general methods, the conjugate-
gradients method is shown to be particularly promising.
The modifications and extensions to the conjugate-
gradients method that are needed in order to perform
total-energy pseudopotential calculations tractably and
stably for large supercells and large plane-wave kinetic-
energy cutoffs are described in Sec. V.B.
A. Minimization of a function
In this section two general methods are described that
can be used to locate the minimum of a function F(x),
where x is a vector in a multidimensional space [a detailed
description of the techniques outlined in this section
can be found in the book by Gill, Murray, and
Wright (1981, p. 144)]. It will be assumed that the function
F(x) has a single minimum. If the function had
several minima, the methods described here would locate
the position of the minimum "closest" to the initial sampling
point (strictly speaking, it would locate the
minimum in whose basin of attraction the initial sampling
point lies).
1. The method of steepest descents
In the absence of any information about the function
F(x), the optimum direction in which to move from the point x¹
to minimize the function is just the steepest-descent
direction g¹ given by

g¹ = −(∂F/∂x)|_{x¹} .   (5.1)

It will be assumed that the direction of steepest descent
at the point x¹ can be obtained from the negative of a
gradient operator G acting on the vector x¹, so that

g¹ = −G·x¹ .   (5.2)

To reduce the value of the function F(x) one should
move from the point x¹ in the steepest-descent direction
g¹ to the point x¹ + b¹g¹, where the function is a
minimum. This can be done by sampling the function
F(x) at a number of points along the line x¹ + bg¹ in order
to determine the value of b at which F(x¹ + bg¹) is a
minimum. Alternatively, if the gradient operator G is accessible,
the minimum value of the function along the
line x¹ + bg¹ can be found by locating the point where the
gradient of the function is orthogonal to the search direction,
so that g¹·G·(x¹ + b¹g¹) = 0. It should be noted that
this process minimizes only the value of the function
along a particular line in the multidimensional space. To
find the absolute minimum of the function F(x), one
must perform a series of such line minimizations. Thus
the vector x¹ + b¹g¹ is used as the starting vector for the
next iteration of the process. This next point is conventionally
labeled x². (The superscripts label the iterations
of the minimization process.) The steps described above
FIG. 14. Schematic illustration of two methods of convergence
to the center of an anisotropic harmonic potential. Top: the
steepest-descents method requires many steps to converge. Bottom:
the conjugate-gradients method allows convergence in two
steps.
can be repeated to generate a series of vectors xᵐ such
that the value of the function F(x) decreases at each
iteration. Hence F(xʲ) < F(xᵏ) for j > k. Each iteration
reduces the value of the function F(x) and moves the trial
vector xᵐ towards the vector that minimizes the function.
This process is illustrated schematically in the top
panel of Fig. 14.

Although each iteration of the steepest-descents algorithm
moves the trial vector towards the minimum of the
function, there is no guarantee that the minimum will be
reached in a finite number of iterations. In many cases a
very large number of steepest-descents iterations is needed
to get close to the minimum of the function. The
method of steepest descents performs particularly poorly
when the minimum of the function F(x) lies in a long
narrow valley such as the one illustrated in Fig. 14. The
reason for the poor performance in this case is that each
steepest-descent vector is orthogonal to the steepest-descent
vector of the previous iteration. If the initial
steepest-descent vector does not lie at right angles to the
axis of the valley, successive vectors will point across
rather than along the valley, so that a large number of
iterations will be needed to move along the valley to the
minimum of the function. This problem is overcome by
using the conjugate-gradients technique.
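The behavior just described can be sketched numerically (this is an illustration, not part of the original article): steepest descents with exact line minimization applied to a quadratic F(x) = ½ x·G·x with a strongly anisotropic G, i.e. a long narrow valley. The matrix, starting point, and tolerance are arbitrary illustrative choices.

```python
# Steepest descents with exact line minimization on F(x) = (1/2) x.G.x.
import numpy as np

G = np.diag([1.0, 50.0])          # anisotropic positive-definite "valley"
x = np.array([10.0, 1.0])         # initial sampling point

for it in range(5000):
    g = -G @ x                    # steepest-descent direction, g = -G.x
    if np.linalg.norm(g) < 1e-8:
        break
    # exact line minimum along g: g.G.(x + b g) = 0 => b = -(g.G.x)/(g.G.g)
    b = -(g @ G @ x) / (g @ G @ g)
    x = x + b * g

print(it, x)   # hundreds of zig-zag steps are needed for this valley
```

Successive search directions are mutually orthogonal, so the iterates zig-zag across the valley axis rather than moving along it.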
2. The conjugate-gradients technique
It might seem surprising that there can be a better
method of minimizing a function than to move in the
direction in which the function decreases most rapidly.
The rate of convergence of the steepest-descents method
is limited by the fact that, after a minimization is per-
formed along a given gradient direction, a subsequent
minimization along the new gradient reintroduces errors
proportional to the previous gradient. If the only infor-
mation one has about the function F(x) is its value and
gradient at a set of points, the optimal method would al-
low one to combine this information, so that each minim-
ization step is independent of the previous ones. To ac-
complish this, one must first derive the condition that
makes one minimization step independent of another.
For simplicity, consider a symmetric and positive-definite
function of the form

F(x) = ½ x·G·x ,   (5.3)

where G is the gradient operator defined in Eq. (5.2).
Consider now the minimization of F(x) along some
direction d¹ from some point x¹. The minimum will
occur at x² = x¹ + b¹d¹, where b¹ satisfies

(x¹ + b¹d¹)·G·d¹ = 0 .   (5.4a)

This is obtained by differentiation of Eq. (5.3) with
respect to b¹ at x¹. A subsequent minimization along
some direction d² will then yield x³ = x² + b²d², where b²
satisfies

(x¹ + b¹d¹ + b²d²)·G·d² = 0 .   (5.4b)

However, the best choice of b¹ and b², for the minimization
of F(x) along d¹ and d², is obtained from the
differentiation of Eq. (5.3) with respect to both b¹ and b²
at x¹. This gives

(x¹ + b¹d¹ + b²d²)·G·d¹ = 0   (5.5a)

and

(x¹ + b¹d¹ + b²d²)·G·d² = 0 .   (5.5b)

It is clear that in order for Eqs. (5.4) and (5.5) to be consistent,
and consequently for the minimizations along d¹
and d² to be independent, one must require that

d¹·G·d² = d²·G·d¹ = 0 .   (5.6)

This is the condition that the directions d¹ and d² be conjugate
to each other (see Gill et al., 1981) and can be immediately
generalized to

dⁿ·G·dᵐ = 0 for n ≠ m .   (5.7)
The conjugate-gradients technique provides a simple
and effective procedure for implementation of such a
minimization approach. The initial direction is taken to
be the negative of the gradient at the starting point. A
subsequent conjugate direction is then constructed from
a linear combination of the new gradient and the previ-
ous direction that minimized F(x). In a two-dimensional
problem, it is clear that one would need only two conju-
gate directions, and this would be sufficient to span the
space and arrive at the minimum in just two steps, as
shown at the bottom of Fig. 14. It is less clear, however,
that the current gradient and the previous direction vec-
tor would maintain all of the information necessary to in-
clude minimization over ail previous directions in a mul-
tidimensional space. The proof that directions generated
in this manner are indeed conjugate is the important re-
sult of the conjugate-gradients derivation. The precise
search directions dᵐ generated by the conjugate-gradients
method are obtained from the following algorithm:
dᵐ = gᵐ + γᵐ dᵐ⁻¹ ,   (5.8)

where

γᵐ = (gᵐ·gᵐ)/(gᵐ⁻¹·gᵐ⁻¹) ,   (5.9)

and γ¹ = 0.
Since minimizations along the conjugate directions are
independent, the dimensionality of the vector space explored
in the conjugate-gradients technique is reduced by
1 at each iteration. When the dimensionality of the function
space has been reduced to zero, there are no directions
left in which to minimize the function, so the trial
vector must be at the position of the minimum. Therefore
the exact location of the minimum of a quadratic
function will be found, conservatively speaking, in a
number of iterations that is equal to the dimensionality of
the vector space. In practice, however, it is usually possible
to perform the calculations so that far fewer iterations
are required to locate the minimum.
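The algorithm of Eqs. (5.8) and (5.9) can be sketched on the same kind of anisotropic quadratic used to illustrate steepest descents (an illustration, not the authors' code; the 2×2 matrix and starting point are arbitrary). For an n-dimensional quadratic the minimum is reached in at most n line minimizations, here two.

```python
# Conjugate gradients on F(x) = (1/2) x.G.x: exact minimum in <= n steps.
import numpy as np

G = np.diag([1.0, 50.0])
x = np.array([10.0, 1.0])
g = -G @ x                      # initial steepest-descent direction
d = g.copy()                    # d^1 = g^1 (gamma^1 = 0)

for it in range(1, len(x) + 1):
    b = -(d @ G @ x) / (d @ G @ d)     # exact line minimum along d
    x = x + b * d
    g_new = -G @ x
    gamma = (g_new @ g_new) / (g @ g)  # Eq. (5.9)
    d = g_new + gamma * d              # Eq. (5.8): next conjugate direction
    g = g_new

print(x)   # at the minimum after two steps for this 2x2 example
```

The same valley that took steepest descents hundreds of iterations is solved here in two, because the second direction is conjugate to the first rather than orthogonal to it.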
Another way of thinking about the difference between
the conjugate-gradients technique and the method of
steepest descents is that, in the method of steepest de-
scents, each direction is chosen only from information
about the function at the present sampling point. In con-
trast, in the conjugate-gradients technique the search
direction is generated using information about the function
obtained from all the sampling points along the
conjugate-gradients path.
The conjugate-gradients technique provides an efficient
method for locating the minimum of a general function.
This suggests that it should be a good technique for locating
the minimum of the Kohn-Sham energy functional.
It is important, however, to implement the
conjugate-gradients technique in such a way as to maximize
computational speed, so that each iteration of the
method is not significantly more expensive than alternative
techniques, and to minimize the memory requirement
so that calculations are not limited by the available
memory. A conjugate-gradients method that fulfills
these criteria has been developed by Teter et al. (1989).
This method is described in Sec. V.B below. Stich et al.
(1989) have used a conjugate-gradients method with the
indirect approach in order to search for the eigenstates of
the Kohn-Sham Hamiltonian. While this is an efficient
method for obtaining the eigenstates of the Hamiltonian,
it is expected, for reasons discussed earlier, that this
method will encounter difficulties when calculations are
performed on very large systems. The authors point out
that in these cases it will be necessary to use a direct
minimization method. Gillan (1989) has developed a
conjugate-gradients technique for directly minimizing the
η̃ᵢᵐ = ηᵢᵐ − Σⱼ ⟨ψⱼ|ηᵢᵐ⟩ ψⱼ .   (5.18)
4. Conjugate directions
The conjugate-gradient direction is constructed out of
steepest-descent vectors as indicated in Eq. (5.8). With
the inclusion of preconditioning as described above, the
preconditioned conjugate directions φᵢᵐ are given by

φᵢᵐ = η̃ᵢᵐ + γᵐ φᵢᵐ⁻¹ ,   (5.19)

where

γᵐ = ⟨η̃ᵢᵐ|ηᵢᵐ⟩ / ⟨η̃ᵢᵐ⁻¹|ηᵢᵐ⁻¹⟩ ,   (5.20)

and γ¹ = 0.
The conjugate direction generated by Eq. (5.19) will be
orthogonal to all the other bands because it is constructed
from preconditioned steepest-descent vectors that are
orthogonal to all the other bands. However, the conjugate
direction will not be orthogonal to the wave function
of the present band. A further orthogonalization to the
present band should be performed and a normalized conjugate
direction φ̃ᵢᵐ calculated as

φᵢ′ᵐ = φᵢᵐ − ⟨ψᵢᵐ|φᵢᵐ⟩ ψᵢᵐ ,   (5.21)

φ̃ᵢᵐ = φᵢ′ᵐ / ⟨φᵢ′ᵐ|φᵢ′ᵐ⟩^(1/2) .   (5.22)
5. Search for the energy minimum
The steps outlined above yield a preconditioned conjugate
direction described by the normalized vector φ̃ᵢᵐ,
which is orthogonal to all the bands. The following combination
of the present wave function ψᵢᵐ and the preconditioned
conjugate vector φ̃ᵢᵐ,

ψᵢᵐ cosθ + φ̃ᵢᵐ sinθ   (θ real) ,   (5.23)

is a normalized vector that is orthogonal to all the other
bands ψⱼ (j ≠ i). Therefore any vector described by Eq.
(5.23) obeys the constraints of orthonormality required
for the electronic wave functions.
The conjugate-gradients technique requires that the
value of θ that minimizes the Kohn-Sham energy functional
be found. The search for the position of minimum
energy could be performed by the calculation of the
Kohn-Sham energy for various values of θ until the
minimum is located. If the approximate location of the
minimum is not known, this would be a relatively expensive
search technique. As an alternative method for locating
the minimum, the Kohn-Sham energy could be
written as a general function of θ,

E(θ) = E_avg + Σ_{n=1}^{∞} [Aₙ cos(2nθ) + Bₙ sin(2nθ)] .   (5.24)
One piece of information, such as a total energy or a
gradient of the total energy, is required to evaluate each
term in this expression. As the summation in Eq. (5.24)
is infinite, any attempt to locate the minimum of the
Kohn-Sham energy functional using this expression
would be even more costly than searching for the
minimum by calculation of the Kohn-Sham energy for
various values of θ. However, it is found that the variation
of the Kohn-Sham energy with θ is very accurately
reproduced by just the n = 1 term in the summation in
Eq. (5.24). The accuracy of the fit can be seen in Fig. 16.
Therefore the following expression for the Kohn-Sham
energy is sufficient to locate the minimum of the Kohn-Sham
energy functional:

E(θ) = E_avg + A₁ cos(2θ) + B₁ sin(2θ) .   (5.25)
Three pieces of information are required to evaluate
the three unknowns in this expression. The value of the
total energy at θ = 0, E(0), is already known. The gradient
of the energy with respect to θ at θ = 0 is given by

(∂E/∂θ)|_{θ=0} = ⟨φ̃ᵢᵐ|H|ψᵢᵐ⟩ + ⟨ψᵢᵐ|H|φ̃ᵢᵐ⟩ = 2 Re⟨φ̃ᵢᵐ|H|ψᵢᵐ⟩ .   (5.26)

Since H|ψᵢᵐ⟩ has been computed to determine the
steepest-descent vector ηᵢᵐ, the value of (∂E/∂θ)|_{θ=0}
can be computed cheaply. Therefore just one further
piece of information is required to determine all the parameters
in Eq. (5.25). This could be the second derivative
of the Kohn-Sham energy functional at θ = 0, or the
Kohn-Sham energy, or its derivative at any other value
of θ. Calculation of the second derivative of the Kohn-Sham
energy functional would require additional programming
effort in most pseudopotential codes, whereas
no additional programming effort is required to calculate
the Kohn-Sham energy or its derivative at a second value
of θ. Therefore we shall first describe the method that
uses the Kohn-Sham energy at a second value of θ.
The second value of θ should be chosen so that the
FIG. 16. The total energy of an 8-atom silicon supercell plotted
as a function of the parameter θ, which determines the proportions
of wave function and its gradient as described in the text.
The dots represent the exact calculation and the line is the
lowest harmonic contribution.
sampling point is far enough from θ = 0 to avoid rounding
errors but not so far from the origin that the estimate
of the curvature of the Kohn-Sham energy functional at
θ = 0 becomes inaccurate. In the later stages of the calculation,
shorter and shorter steps will be taken along the
conjugate directions, and so it is important that the curvature
of the Kohn-Sham energy functional at θ = 0 be
accurately determined in order to locate the position of
the minimum to high precision. It has been found that
computing the Kohn-Sham energy at the point θ = π/300
gives reliable results. If the value of the Kohn-Sham energy
at the point θ = π/300, E(π/300), is computed, the
three unknowns in Eq. (5.25) are calculated as
E_avg = E(0) − [E(0) − E(π/300) + B₁ sin(2π/300)] / [1 − cos(2π/300)] ,   (5.27)

A₁ = [E(0) − E(π/300) + B₁ sin(2π/300)] / [1 − cos(2π/300)] ,   (5.28)

B₁ = ½ (∂E/∂θ)|_{θ=0} .   (5.29)
Once the parameters E_avg, A₁, and B₁ have been
determined, the value of θ that minimizes the Kohn-Sham
energy functional can be calculated. The stationary
points of the function (5.25) occur at the points

θ_s = ½ tan⁻¹(B₁/A₁) + n(π/2) .   (5.30)

The value of θ_s that lies in the range 0 < θ < π/2 is the
required value, θ_min.
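The fit and line search just described can be sketched numerically (an illustration, not the authors' code). A made-up trigonometric test function stands in for the Kohn-Sham energy; its analytic gradient at θ = 0 is hard-coded where the real method would use the gradient of Eq. (5.26), and the explicit formulas for A₁, B₁, and E_avg below are my own algebra from the model E(θ) = E_avg + A₁cos2θ + B₁sin2θ.

```python
# Fit E(theta) = E_avg + A1 cos(2 theta) + B1 sin(2 theta) from three
# pieces of information: E(0), dE/dtheta at 0, and E(pi/300).
import numpy as np

def E(theta):                      # stand-in for the Kohn-Sham energy
    return -1.0 + 0.4 * np.cos(2 * theta) - 0.3 * np.sin(2 * theta)

t = np.pi / 300
E0 = E(0.0)
dE0 = -0.6                         # analytic dE/dtheta|_0 of the test
                                   # function (negative: descent direction)
Et = E(t)

B1 = 0.5 * dE0                                              # from E'(0)
A1 = (E0 - Et + B1 * np.sin(2 * t)) / (1 - np.cos(2 * t))   # from E(t)
E_avg = E0 - A1                                             # from E(0)

def model(th):
    return E_avg + A1 * np.cos(2 * th) + B1 * np.sin(2 * th)

theta_s = 0.5 * np.arctan2(B1, A1)               # a stationary point
cands = [(theta_s + k * np.pi / 2) % np.pi for k in (0, 1)]
theta_min = min(cands, key=model)                # pick the minimum branch
print(A1, B1, E_avg, theta_min)
```

The fit recovers the model parameters exactly here because the test function contains only the n = 1 harmonic; for the true Kohn-Sham energy the higher harmonics are small but nonzero, as Fig. 16 shows.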
The calculation of an analytic second derivative of the
Kohn-Sham energy at θ = 0 provides an elegant way of
determining the optimum step length, even though it
does require some additional programming. The required
expression for the second derivative is

(∂²E/∂θ²)|_{θ=0} = 2f (⟨φ̃ᵢᵐ|H|φ̃ᵢᵐ⟩ − ⟨ψᵢᵐ|H|ψᵢᵐ⟩)
   + f [Hartree term + exchange-correlation term] ,   (5.31)
where f is the product of the occupation number and the
k-point weight, the Hartree term is

∫_unit cell 2 Re[ψᵢᵐ*(r) φ̃ᵢᵐ(r)] V′(r) dr ,   (5.32)

where

V′(r) = e² ∫ 2 Re[ψᵢᵐ*(r′) φ̃ᵢᵐ(r′)] / |r − r′| dr′ .   (5.33)

Alternatively, the Hartree term can be written

Σ_G 4π e² |n′(G)|² / G² ,   (5.34)

where

n′(G) = (1/Ω) ∫ exp(−iG·r) 2 Re[ψᵢᵐ*(r) φ̃ᵢᵐ(r)] dr .   (5.35)

The exchange-correlation term is

∫ (∂V_XC/∂n)|_{n(r)} {2 Re[ψᵢᵐ*(r) φ̃ᵢᵐ(r)]}² dr .   (5.36)
The cost of computing the analytic second derivative is
the same as the cost of calculating the Kohn-Sham energy
at a trial value of θ.

The required value of θ, θ_min, is determined using

θ_min = ½ tan⁻¹[ −2 (∂E/∂θ)|_{θ=0} / (∂²E/∂θ²)|_{θ=0} ] .   (5.37)
The wave function used to start the next iteration of
the conjugate-gradients procedure, ψᵢᵐ⁺¹, is

ψᵢᵐ⁺¹ = ψᵢᵐ cos(θ_min) + φ̃ᵢᵐ sin(θ_min) .   (5.38)

The new wave function generates a different charge density
from that generated by the previous wave function,
and so the electronic potentials in the Kohn-Sham Hamiltonian
must be updated before commencing the next
iteration.
6. Calculational procedure
The flow diagram in Fig. 17 illustrates the steps involved
in an update of the wave function of a single band
using the conjugate-gradients method. Eventually the
wave functions of all of the bands must be updated.
There is no point in converging a single band exactly if
large errors remain in the bands that have not yet been
updated. Thus no more than three or four conjugate-gradients
iterations should be performed on one band before
moving to the next band. Once all of the bands have
been updated, conjugate-gradients iterations are started
again on the lowest band. Rather than perform a fixed
number of conjugate-gradients iterations on each band,
one can perform conjugate-gradients iterations on one
band until the total energy changes by less than a particular
value or by less than a given fraction of the change
of energy in the first conjugate-gradients iteration. Then
iterations are started on the next band. The convergence
criterion should be changed as the system moves towards
the minimum of the Kohn-Sham energy functional.
[FIG. 17 flow diagram. For each band, repeated N times until converged:
trial wave function for band → calculate steepest-descent vector →
orthogonalize to all bands → precondition vector → orthogonalize to all
bands → determine conjugate direction → orthogonalize to present band →
calculate Kohn-Sham energy at trial value of θ → calculate value of θ
that minimizes Kohn-Sham energy functional.]
FIG. 17. Flow diagram for the update of a single band in the
(direct-minimization) conjugate-gradients method.
After each sweep through the bands, the change in the
total energy at which conjugate-gradients iterations on a
band are stopped should be reduced. The reasons for applying
the convergence criteria outlined in this section
and a discussion of the optimum choice of the parameters
are given by Arias et al. (1991).
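The band-sweep bookkeeping described above can be sketched as follows (not the authors' code). The cg_iteration() here is a dummy stand-in that simply shrinks a fictitious per-band "error" and returns the energy gained, standing in for a real conjugate-gradients band update; the band count, iteration cap, and stopping fraction are arbitrary illustrative choices.

```python
# Sweep over bands, at most a few CG iterations per band, moving on early
# once the energy gain falls below a fraction of the first iteration's gain.
n_bands, max_iters, frac = 4, 4, 0.1
errors = [1.0, 0.8, 0.5, 0.3]          # fictitious per-band energy errors

def cg_iteration(band):
    """Dummy update: remove 60% of the band's error, return the gain."""
    gain = 0.6 * errors[band]
    errors[band] -= gain
    return gain

for sweep in range(3):                 # outer sweeps over all bands
    for band in range(n_bands):
        first = cg_iteration(band)     # first iteration sets the yardstick
        for _ in range(max_iters - 1):
            if cg_iteration(band) < frac * first:
                break                  # good enough for this sweep

print(errors)                          # all errors reduced after the sweeps
```

Tightening frac from sweep to sweep implements the changing convergence criterion mentioned in the text.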
7. Computational cost
The computational cost of performing a conjugate-gradients
iteration on a single band is higher than the
cost of performing a molecular-dynamics time step on a
single band. In a molecular-dynamics calculation the
computational cost per iteration is dominated by the
three Fourier transforms for each band and the orthogonalization.
The number of operations required for each
band at each time step is 3N_FFT for the Fourier transforms
and N_B N_PW for the orthogonalization, where N_PW
is the number of plane-wave basis states,
N_FFT = 16N_PW ln N_PW, and N_B is the number of occupied
bands. The computational cost of a conjugate-gradients
iteration is also dominated by the cost of performing
Fourier transforms and the cost of orthogonalizing the
wave functions. Only the steps in the conjugate-gradients
iteration that involve Fourier transforms or orthogonalizations
will be considered in this section. All of
the other steps in the conjugate-gradients update of a single
band require only N_PW operations and constitute a
negligible part of the computational effort.

The calculation of the steepest-descent vector in the
conjugate-gradients method is identical to the calculation
of the accelerations of the wave functions in the
molecular-dynamics method. Hence the computational
cost is dominated by the cost of performing two Fourier
transformations, which require 2N_FFT operations. If
preconditioning is applied, one orthogonalization of the
steepest-descent vector to all the bands and one orthogonalization
of the preconditioned steepest-descent vector
to all the bands must be performed. These two orthogonalizations
require 2N_B N_PW operations. Only a single
orthogonalization would be needed if preconditioning
were not applied. To calculate the Kohn-Sham energy at
the trial value of θ, one must transform the trial wave
function to real space, so that the charge density can be
computed, and then transform the charge density to reciprocal
space so that the Hartree energy can be calculated.
These two Fourier transforms require 2N_FFT operations.
After the wave function is updated, the new
Kohn-Sham Hamiltonian must be calculated. The new
charge density can be computed directly from the previous
wave function and the trial wave function, both of
which have already been transformed to real space, so
that no extra Fourier transforms are required for this
operation. However, the new charge density must be
transformed to reciprocal space so that the new Hartree
potential can be computed, and the new Hartree potential
must be transformed back to real space. These two
Fourier transforms require 2N_FFT operations. Therefore
the total number of operations required to perform a
conjugate-gradients update on a single band is 6N_FFT
operations for the Fourier transforms and 2N_B N_PW
operations for the orthogonalizations. Hence each
conjugate-gradients iteration requires twice the number
of operations required by a molecular-dynamics time step
for a single band.
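The operation counts just quoted can be tabulated for a couple of system sizes (the sizes below are made up for illustration; N_FFT = 16 N_PW ln N_PW is taken from the text).

```python
# Per-band operation counts: MD time step vs conjugate-gradients update.
import math

def md_step_per_band(n_pw, n_b):
    n_fft = 16 * n_pw * math.log(n_pw)       # N_FFT = 16 N_PW ln N_PW
    return 3 * n_fft + n_b * n_pw            # three FFTs + orthogonalization

def cg_update_per_band(n_pw, n_b):
    n_fft = 16 * n_pw * math.log(n_pw)
    return 6 * n_fft + 2 * n_b * n_pw        # six FFTs + two orthogonalizations

for n_pw, n_b in [(4096, 16), (20000, 100)]:  # illustrative sizes only
    md = md_step_per_band(n_pw, n_b)
    cg = cg_update_per_band(n_pw, n_b)
    print(n_pw, n_b, round(cg / md, 3))       # ratio is 2 for any size
```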
C. Speed of convergence
In this section the speed of convergence of methods
that directly minimize the Kohn-Sham energy functional is
compared with the speed of convergence of the
molecular-dynamics method. The systems used for these
calculations were specifically chosen to highlight the
problems associated with the use of conventional
methods to perform total-energy pseudopotential calculations.
However, it is important to appreciate that these
systems are representative of systems for which
molecular-dynamics calculations are currently being performed.
1. Large energy cutoff
Figure 18 shows the error in the total energy against
iteration number for a calculation on an 8-atom unit cell
of silicon with a cutoff energy for the plane-wave basis set
of 32 rydberg. Although this cutoff energy is much
higher than the value actually needed for calculations on
silicon, it is typical of the cutoff energies required for the
M. C. Payne et al.: Ab initio iterative minimization techniques 1079
energy plane waves as the initial states for a calculation
for the tetrahedral semiconductors will not yield the electronic
ground state. In order to span the ground state,
the initial electronic configuration must not contain any
pairs of plane waves from the {1,1,0} star of wave vectors
connected by {2,2,0} reciprocal-lattice vectors.
Only when three plane waves from the {1,1,1} star are
included among the initial states will the electronic states
relax to the required 16 valence-band states. If the
choice of lowest-energy plane waves as the initial states
were retained, 22 electronic states would have to be included
in the calculation before the electronic
configuration could relax to the ground state, and of
these, 6 states would end up being unoccupied.
The computational cost of all of the iterative schemes
described in the previous sections increases at least
linearly with the number of electronic states included in
the calculation, and the cost of orthogonalizing the electronic
wave functions increases as the square of the number
of bands. Including unoccupied states in the electronic
relaxation reduces the advantage in computational
speed of the molecular-dynamics and conjugate-gradients
methods over conventional matrix diagonalization techniques.
It is sensible, therefore, to attempt to use the
smallest possible number of electronic states in the calculation.
However, it is essential that the initial electronic
states be able to converge to the ground state.
The problem with the calculation for an 8-atom cubic
cell of the diamond-structure materials is that there are
many reciprocal-lattice vectors at which the structure
factor is zero. Hence there is no lattice potential associ-
ated with these reciprocal-lattice vectors. If the ionic po-
tential is nonzero at every reciprocal-lattice vector, any
choice of plane waves for the initial electronic states will
span the ground state. The most common reason for the
structure factor to be zero at some reciprocal-lattice vec-
tors is symmetry. If the ionic configuration has no sym-
metry, any choice of initial electronic states should con-
verge to the ground state. Only if the system has some
symmetry must precautions be taken to ensure that the
initial electronic states span the ground state.
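The vanishing structure factors can be made concrete with a short sketch (my own, using the standard eight-atom diamond basis): the structure factor S(G) = Σ_I exp(−iG·R_I) vanishes at the (2,0,0) reciprocal-lattice vector of the cubic cell, so there is no lattice potential there, while at (1,1,1) it does not vanish.

```python
import cmath

# Structure factor S(G) = sum_I exp(-i G . R_I) for the 8-atom cubic cell
# of a diamond-structure material (positions in units of the lattice
# constant a; G = (2*pi/a)(h, k, l)).
positions = [
    (0.00, 0.00, 0.00), (0.00, 0.50, 0.50),
    (0.50, 0.00, 0.50), (0.50, 0.50, 0.00),
    (0.25, 0.25, 0.25), (0.25, 0.75, 0.75),
    (0.75, 0.25, 0.75), (0.75, 0.75, 0.25),
]

def structure_factor(h, k, l):
    return sum(cmath.exp(-2j * cmath.pi * (h * x + k * y + l * z))
               for x, y, z in positions)

# (2,0,0) is symmetry-forbidden; (1,1,1) is not.
print(abs(structure_factor(2, 0, 0)))   # ~0
print(abs(structure_factor(1, 1, 1)))   # nonzero
```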
2. Symmetry conservation
There is another reason why a particular set of initial
states may not relax to the ground state when molecular-
dynamics equations of motion are used to evolve the elec-
tronic configuration. This is a constraint on the evolu-
tion of the electronic states that arises from symmetry
conservation. The effect is described in detail by Payne
et al. (1988), and only a brief outline of the problem will
be presented here.
The molecular-dynamics equations of motion and the
related first-order equations of motion for the electronic
wave functions will conserve any symmetry that is shared
by . the Hamiltonian and the initial electronic
configuration. This symmetry can be broken when the
Rev. Mod. Phys., Vol. 64, No. 4, October 1992
electronic wave functions are orthogonalized, but this de-
pends on the orthogonalization scheme used. In the
Gram-Schmidt orthogonalization scheme, the symmetry
is broken, whereas in many others, including the iterative
scheme of Car and Parrinello, it is not. Since the electronic
ground-state configuration must have the same
symmetry as the Hamiltonian, one might not expect conservation
of symmetry in the electronic configuration to
cause any problems. However, the initial electronic
configuration may not be able to propagate to the
ground-state configuration without breaking this symmetry.
If this is the case, the electronic configuration will
not reach the ground state unless a symmetry-breaking
orthogonalization scheme is used. Symmetry breaking is
not necessary if random initial conditions are applied to
the electronic wave functions, as described in Secs.
VI.C.3 and VI.C.4.
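A toy illustration of this symmetry conservation (my own example, not from the article): steepest descent on a double well E(x) = (x² − 1)² conserves the x → −x symmetry of an initial state placed on the symmetry axis, which therefore never reaches either minimum at x = ±1, while a state with the symmetry broken relaxes normally.

```python
# First-order "equations of motion" x' = -dE/dx on E(x) = (x^2 - 1)^2.
# An initial state on the symmetry axis (x = 0) conserves the x -> -x
# symmetry and stays pinned at the symmetric stationary point forever.
def relax(x, steps=20000, dt=1e-3):
    for _ in range(steps):
        x -= dt * 4.0 * x * (x * x - 1.0)   # steepest-descent update
    return x

symmetric = relax(0.0)    # symmetry conserved: never leaves x = 0
broken = relax(1e-6)      # symmetry broken: relaxes to the minimum at x = 1
print(symmetric, broken)
```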
B. Rate of convergence
Once it has been ensured that the initial electronic
wave functions can relax to the ground state, the next
most important consideration in choosing the wave func-
tions is to maximize the rate of convergence. In all itera-
tive matrix diagonalization techniques, new wave func-
tions are generated by successive improvements to the
previous wave functions. The closer the initial wave
functions are to the self-consistent Kohn-Sham eigen-
states, the fewer the number of iterations required to
reach the ground state of the electronic configuration.
However, it is extremely difficult to estimate the form of
all the Kohn-Sham eigenstates of a complex system.
Hence it will be necessary to use fairly poor approximations
to the eigenstates as the initial states for most calculations.
C. Specific initial configurations
Selection of a set of initial wave functions is straightforward
when the system to be studied is similar to a system
that has been studied previously. In this case the
converged wave functions for that system should be used
as the initial states. This is particularly useful when test-
ing for k-point convergence or for convergence as the
cutoff energy for the plane-wave basis set is increased.
The wave functions that were calculated with the previ-
ous set of k points or with the previous energy cutoff can
be used as the initial electronic states.
1. Reduced basis sets
It is possible to obtain approximations to the Kohn-Sham
eigenstates by using conventional matrix diagonalization
techniques to find the eigenstates of the Kohn-Sham
Hamiltonian with a small number of basis states.
For most of the calculations that have been performed
using the molecular-dynamics and conjugate-gradients
methods, this technique would yield a relatively large
matrix to diagonalize, even if a basis set consisting of
only a few plane waves per atom were used. The Hamil-
tonian matrix would have to be diagonalized several
times to achieve approximate self-consistency. Using this
method to obtain a set of initial states for a molecular-
dynamics or conjugate-gradients calculation might not
save any computational effort over a method that used
much poorer approximations to the self-consistent
Kohn-Sham eigenstates but applied iterative matrix diagonalization
methods throughout. To avoid the cost of diagonalizing
the Hamiltonian matrix using conventional
matrix diagonalization techniques, one could use iterative
techniques to find the initial states with the smaller basis
set. However, the computational cost of the iterative
techniques increases only linearly with the number of
basis states, so that there may not be much to be gained
by using a reduced number of basis states initially. If
most of the computation has to be performed using the
complete basis set, then there will be an insignificant sav-
ing in computational time if the calculation is started
with a reduced basis set.
2. Symmetrized combinations of plane waves
The choice of single plane waves for the initial electronic
states in a molecular-dynamics or conjugate-gradients
calculation does not exploit the symmetry of
the system. The computational cost of such a calculation
could be reduced by using symmetrized combinations of
plane waves as the initial states for the electronic relaxation.
Factoring the Hamiltonian matrix into submatrices
of different symmetry, which can be solved independently,
drastically reduces the computational time from that
required using conventional matrix diagonalization techniques.
However, there is an insignificant time saving to
be gained by exploiting the symmetry of the system in the
molecular-dynamics or conjugate-gradients methods. A
significant saving in computational time could only be
achieved if the fast Fourier-transform routines were
rewritten for each symmetrized combination of basis
states, and this requires a considerable investment of
programming effort. Computational time can be saved in
the orthogonalization routine by using symmetrized combinations
of plane waves as the initial states. States of
different symmetry are automatically orthogonal, and
only states with the same symmetry have to be orthogonalized.
If there are N_1 states with one symmetry, N_2 of
another, and so on, orthogonalization requires
(N_1^2 + N_2^2 + ⋯)N_PW operations rather than the N_B^2 N_PW
operations required if symmetrized combinations of
plane waves are not used. However, rounding errors will
tend to destroy the symmetry of the wave functions, and
so additional computational effort will be required to
resymmetrize them periodically.
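The orthogonalization saving quoted above is simple to tabulate; the sketch below uses hypothetical band counts per symmetry block (the values of N_1, N_2, ... and N_PW are my own illustrative numbers).

```python
# Counting sketch for block-wise orthogonalization of symmetrized states.
N_PW = 1000              # plane waves per wave function (illustrative)
blocks = [6, 5, 3, 2]    # N_1, N_2, ... states in each symmetry block
N_B = sum(blocks)        # 16 bands in total

with_symmetry = sum(n * n for n in blocks) * N_PW   # (N_1^2 + N_2^2 + ...) N_PW
without_symmetry = N_B * N_B * N_PW                 # N_B^2 N_PW

print(with_symmetry, without_symmetry)
```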
The relatively small reduction in computational speed
for systems with low symmetry is one of the strengths
of the molecular-dynamics and conjugate-gradients
methods: The use of symmetry is contrary to the spirit of
these methods. In all of the calculations performed using
these techniques, the positions of the ions have been
allowed to vary. If the positions of the ions vary during
the calculation, the ionic configuration will spend most of
the time in regions of the phase space that have very low
symmetry. The use of symmetry was essential in the days
of primitive computers, when conventional matrix diagonalization
techniques were used to solve for the Kohn-Sham
eigenstates. Imposing symmetry onto a system
adds fictitious constraints to the motions of the ions and
restricts the relaxation of the ionic configuration. There
is no point in imposing symmetry in molecular-dynamics
or conjugate-gradients calculations because this does not
significantly increase the computational speed of these
techniques.
3. Random initial configurations
The initial electronic states for a molecular-dynamics
or conjugate-gradients calculation can be generated by
choosing random values for the coefficients of the plane-wave
basis states. This method ensures that the ground
state is spanned by the initial states and that there is no
conserved symmetry in the initial electronic
configuration that might prevent the wave functions from
relaxing to the Kohn-Sham eigenstates. It is sensible to
give nonzero values only to the coefficients of plane-wave
basis states that have small energies, so that the initial
states do not have very high energies. With this precaution
the electronic states are unlikely to be significantly
further from eigenstates than simple plane waves, and
very few extra iterations will be required to converge to
the ground state.
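A minimal sketch of this initialization (assumptions mine: the G vectors are in integer reciprocal-lattice units, and the low-energy threshold is an arbitrary illustrative value):

```python
import random, math

# Random initial coefficients, restricted to plane waves G whose kinetic
# energy |G|^2/2 lies below a small threshold, so the initial state spans
# low-energy plane waves without carrying a very high energy.
random.seed(0)
E_INIT_CUTOFF = 2.0   # hypothetical low-energy threshold (arbitrary units)

g_vectors = [(h, k, l) for h in range(-3, 4)
                        for k in range(-3, 4)
                        for l in range(-3, 4)]

def random_band():
    coeffs = {}
    for g in g_vectors:
        kinetic = 0.5 * sum(c * c for c in g)
        if kinetic < E_INIT_CUTOFF:
            coeffs[g] = complex(random.gauss(0, 1), random.gauss(0, 1))
        else:
            coeffs[g] = 0j
    # Normalize the wave function.
    norm = math.sqrt(sum(abs(c) ** 2 for c in coeffs.values()))
    return {g: c / norm for g, c in coeffs.items()}

band = random_band()
print(sum(1 for c in band.values() if c != 0))  # only low-energy G are nonzero
```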
4. Random initial velocities
In the molecular-dynamics method there are two de-
grees of freedom available for choosing the initial elec-
tronic configuration: the initial wave functions and their
velocities can be chosen arbitrarily. Adding random velocities
to the coefficients of the initial electronic states
avoids the problem of the initial configuration's not spanning
the ground state and not relaxing to the ground
state due to a conserved symmetry. It is sensible to limit
the kinetic energy of the initial wave functions so that
there is not too much excess energy in the electronic sys-
tem.
VII. RELAXATION OF THE IONIC SYSTEM
Up to this point, the relaxation of the electronic
configuration to its ground state has been considered,
while the ionic positions and the size and shape of the
unit cell have been held fixed. Once these additional de-
grees of freedom are allowed to relax to equilibrium, new
features emerge. This procedure is much simpler than a
full dynamical simulation of the ionic system because
only the final state of the system (ions and electrons in
their minimum energy configurations) is of interest, and
the path towards this state is irrelevant. Hence errors
can be tolerated along the relaxation path. This is not
the case with a full dynamical simulation, where errors
must be carefully controlled at all points along the ionic
trajectories. The problems associated with full dynami-
cal simulations of the ionic system will be discussed in
Sec. VIII. Here we describe how ionic relaxation is easily
incorporated into a molecular-dynamics-based method.
A. The Car-Parrinello Lagrangian
The positions of the ions and the coordinates that
define the size and shape of the unit cell can be included
as dynamical variables in the molecular-dynamics La-
grangian. The resulting Lagrangian is usually referred to
as the “Car-Parrinello Lagrangian” and takes the form
L = \sum_i \mu \langle \dot\psi_i | \dot\psi_i \rangle + \sum_I \tfrac{1}{2} M_I \dot{\mathbf{R}}_I^2 + \sum_\nu \tfrac{1}{2} \beta \dot\alpha_\nu^2 - E[\{\psi_i\},\{\mathbf{R}_I\},\{\alpha_\nu\}] ,  (7.1)

where M_I is the mass of ion I and β is a fictitious mass
associated with the dynamics of the coordinates that
define the unit cell, {α_ν}.
B. Equations of motion
Two further sets of equations of motion can be ob-
tained from the Lagrangian (7.1), the first for the posi-
tions of the ions,

M_I \ddot{\mathbf{R}}_I = - \frac{\partial E}{\partial \mathbf{R}_I} ,  (7.2)

which simply relates the acceleration of the ions to the
forces acting on them. The second set of equations is for
the coordinates of the unit cell,

\beta \ddot\alpha_\nu = - \frac{\partial E}{\partial \alpha_\nu} .  (7.3)
These equations relate the rate of acceleration of the
lengths of the lattice vectors to the diagonal components
of the stress tensor integrated over the unit cell and relate
the accelerations of the angles between the lattice vectors
to the off-diagonal components of the stress tensor in-
tegrated over the unit cell.
The equations of motion for the degrees of freedom as-
sociated with the dynamics of the ions and of the unit cell
can be integrated at the same time as the equations of
motion for the electronic states and, as will be shown
below, provide a method for performing ab initio dynamical
simulations of the ionic system. However, a relaxation
of the ionic system can be performed using these
equations of motion simply by removing kinetic energy
from the electronic system, the ionic system, and the
motion of the unit cell. In this case the system will
evolve until the total energy of the system is minimized
with respect to all of these degrees of freedom, and the
ionic configuration will have reached a local energy
minimum. However, integration of the equations of
motion for the ions and for the unit cell is not as straight-
forward as it first appears. This is because physical
ground-state forces on the ions and integrated stresses on
the unit cell cannot be calculated for arbitrary electronic
configurations, as shown in the following section.
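A minimal sketch of relaxation by removing kinetic energy (a toy one-dimensional analog, assumptions mine): integrating M R̈ = −∂E/∂R and zeroing the velocity whenever it opposes the force settles the "ion" into a local energy minimum instead of oscillating forever.

```python
# Toy quench: M * R'' = -dE/dR on the double well E(R) = (R^2 - 1)^2,
# with the velocity zeroed whenever it opposes the force (kinetic-energy
# removal), so the trajectory relaxes into a nearby local minimum.
def quench(R, M=1.0, dt=0.01, steps=10000):
    v = 0.0
    for _ in range(steps):
        F = -4.0 * R * (R * R - 1.0)    # force = -dE/dR
        v += dt * F / M
        if v * F < 0.0:                 # remove kinetic energy
            v = 0.0
        R += dt * v
    return R

print(quench(0.3))   # relaxes into the nearby minimum at R = 1
```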
C. The Hellmann-Feynman theorem
The force on ion I, f_I, is minus the derivative of the total
energy of the system with respect to the position of
the ion,

\mathbf{f}_I = - \frac{dE}{d\mathbf{R}_I} .  (7.4)
As an ion moves from one position to another, the wave
functions must change to the self-consistent Kohn-Sham
eigenstates corresponding to the new position of the ion
if the value of the Kohn-Sham energy functional is to
remain physically meaningful. The changes in the elec-
tronic wave functions contribute to the force on the ion,
as can be clearly seen by expanding the total derivative in
(7.4),
\frac{dE}{d\mathbf{R}_I} = \frac{\partial E}{\partial \mathbf{R}_I} + \sum_i \frac{\partial E}{\partial \psi_i} \frac{d\psi_i}{d\mathbf{R}_I} + \sum_i \frac{\partial E}{\partial \psi_i^*} \frac{d\psi_i^*}{d\mathbf{R}_I} .  (7.5)
Equation (7.5) should be compared with the Lagrange
equation of motion for the ion (7.2). It can be seen that
the "force" in Eq. (7.2) is only the partial derivative of
the Kohn-Sham energy functional with respect to the position
of the ion. In the Lagrange equations of motion
for the ion, the force on the ion is not a physical force. It
is the force that the ion would experience from a particular
electronic configuration. However, it is easy to show
that when each electronic wave function is an eigenstate
of the Hamiltonian the final two terms in Eq. (7.5) sum to
zero. Since \partial E / \partial \psi_i^* is just H\psi_i, these two terms can be
rewritten

\sum_i \left\langle \frac{d\psi_i}{d\mathbf{R}_I} \middle| H \middle| \psi_i \right\rangle + \sum_i \left\langle \psi_i \middle| H \middle| \frac{d\psi_i}{d\mathbf{R}_I} \right\rangle .  (7.6)
However, if each \psi_i is an eigenstate of the Hamiltonian,

H\psi_i = \lambda_i \psi_i ,  (7.7)

so Eq. (7.6) is equal to

\sum_i \lambda_i \left\langle \frac{d\psi_i}{d\mathbf{R}_I} \middle| \psi_i \right\rangle + \sum_i \lambda_i \left\langle \psi_i \middle| \frac{d\psi_i}{d\mathbf{R}_I} \right\rangle = \sum_i \lambda_i \frac{d}{d\mathbf{R}_I} \langle \psi_i | \psi_i \rangle = 0 ,  (7.8)

since \langle \psi_i | \psi_i \rangle is a constant by normalization.
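This result can be checked numerically on a toy problem (my own two-level example, not from the article): for an exact eigenstate, the finite-difference derivative dE/dR agrees with the Hellmann-Feynman term ⟨ψ|∂H/∂R|ψ⟩, i.e., the wave-function terms in Eq. (7.5) sum to zero.

```python
import math

# H(R) is a 2x2 real symmetric toy "Hamiltonian" depending on a parameter R.
def hamiltonian(R):
    return [[R * R, 0.5 * R],
            [0.5 * R, 1.0 - R]]

def ground_state(R):
    """Lowest eigenvalue and normalized eigenvector of the 2x2 matrix."""
    (a, b), (_, d) = hamiltonian(R)
    lam = 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + b * b)
    x, y = -b, a - lam            # eigenvector from (a - lam) x + b y = 0
    n = math.hypot(x, y)
    return lam, (x / n, y / n)

R, h = 0.7, 1e-5
dE_dR = (ground_state(R + h)[0] - ground_state(R - h)[0]) / (2 * h)

# Hellmann-Feynman term <psi| dH/dR |psi>, with dH/dR taken analytically:
_, (x, y) = ground_state(R)
dH = [[2 * R, 0.5], [0.5, -1.0]]
hf = dH[0][0] * x * x + 2 * dH[0][1] * x * y + dH[1][1] * y * y

print(dE_dR, hf)   # the two agree to finite-difference accuracy
```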
be used to relax a system of ions to a local energy
minimum. The technique of moving the ions in the
directions of the Hellmann-Feynman forces until the
forces on the ions become zero is basically a zero-temperature
quench, because the ions do not acquire any
kinetic energy during the relaxation. At the end of this
process, the system will be in a local energy minimum.
By performing zero-temperature quenches from a variety
of initial configurations of the ionic system, one can ob-
tain information about the local energy minima of the
system. However, there is no guarantee that this method
will locate the global energy minimum of the system. In
theory, a very-low-energy minimum can only be found if
a simulated annealing process is carried out (Kirkpatrick
et al., 1983). But even with a simulated annealing pro-
cedure, there is no guarantee that the global energy
minimum will be located.
1. Simulated annealing
The success of any simulated annealing technique is
very sensitive to the structure of the phase space being
explored. The purpose of performing a simulated anneal
is to determine the lowest-energy configuration of the
ionic system. For a system that contains many ions there
will be a large number of ionic configurations that are lo-
cal energy minima. The simulated annealing procedure
has to explore the phase space of the system to locate the
lowest-energy local minimum. The phase space for a par-
ticularly simple system is shown schematically in Fig. 22.
The diagram shows two local energy minima, separated
by energy ΔE. If the position of the lowest-energy
minimum is to be located using the technique of simulated
annealing, the thermal energy kT must be smaller
than ΔE. If the thermal energy is larger than this, the
energies of the two minima cannot be distinguished
within the thermal smearing. The diagram shows an energy
barrier of height E_b separating the local energy
minima. The ionic configuration can only move between
the two local energy minima by gaining at least E_b in energy
through a thermal fluctuation. The time spent waiting
for a thermal fluctuation of this magnitude is
(1/ν)exp(E_b/kT), where ν is the attempt frequency. In
FIG. 22. Representation of two energy minima differing by ΔE,
separated by a barrier E_b. (Axes: total energy vs configuration coordinate.)
a typical simulation with the molecular-dynamics
method, the total time of the simulation is of the order of
10-100 times 1/ν. Therefore the probability that the
ionic configuration will traverse an energy barrier during
the simulation is dominated by the exponential factor.
When the temperature is low enough to distinguish between
the energies of the local energy minima (kT ~ ΔE),
the time taken for the system to move between the energy
minima is proportional to exp(E_b/ΔE). The system
must move between the minima at least once to locate
the lower-energy minimum. If E_b/ΔE is large, it is extremely
unlikely that the simulation will locate the global
energy minimum, but if E_b/ΔE is small, the global energy
minimum can be located easily.
Simulated annealing is ideal for escaping from local en-
ergy minima separated from the global energy minimum
by small energy barriers. However, the method is very
inefficient when the energy barriers separating the energy
minima are large. Unless the structure of the phase
space of the system is very favorable, any simulated an-
nealing process will leave the system in a local energy
minimum rather than at the global energy minimum.
The success of simulated annealing techniques also depends
on the number of local energy minima that have
energies close to the energy of the global energy
minimum. In principle, all of these local energy minima
have to be visited during the simulated annealing process
in order to determine which is the true global energy
minimum. As the number of local energy minima in-
creases exponentially with the number of atoms in the
system, simulated annealing is only likely to be successful
for small systems.
2. Monte Carlo methods
The difficulty of locating the position of the global en-
ergy minimum using the technique of simulated anneal-
ing is due to the difficulty of traversing the energy bar-
riers that separate the local energy minima. If the energy
barriers are large, the system only occasionally traverses
an energy barrier. Between traversals of the energy bar-
riers, the system explores the region of phase space
around a single local energy minimum. However, only
the value of the energy at the local minimum has any
relevance to the process of locating the global energy
minimum. The time spent exploring the phase space
around each local energy minimum serves no useful pur-
pose in the simulated annealing process, although it does
provide information about the free energy of the system.
It is easy to locate the local energy minimum in each re-
gion of phase space, but it is difficult to move between re-
gions of phase space that have low-energy local minima.
The process of locating the global energy minimum is
sometimes attempted by using the method of steepest
descents or simple molecular dynamics to sample the re-
gion of phase space around each local energy minimum,
followed by a discontinuous jump through phase space
into the region around a different local energy minimum.
However, there is no point in exploring regions of phase
space that have very high energies, and so a sampling criterion
should be applied to determine whether to explore
the region of phase space reached by the discontinuous
jump. The sampling criterion generally adopted
compares the energies of the new and the old ionic
configurations. The new configuration is accepted or rejected
according to a Monte Carlo algorithm (Metropolis
et al., 1953): if the new configuration is of lower energy
than the old, it is accepted; if the new configuration is ΔE
higher in energy than the old configuration, it is accepted
with a probability exp(−ΔE/kT). This method allows
the system to cross energy barriers without waiting for a
thermal fluctuation large enough to traverse the barrier.
The Monte Carlo technique described above is computationally
expensive to implement with molecular-dynamics
schemes for relaxing the electronic
configuration to its ground state. When the ionic
configuration makes a discontinuous jump through phase
space, the electrons will not be close to the ground state
of the new ionic configuration. Each change in the ionic
configuration must be followed by a complete relaxation
of the electronic configuration to the new ground state
before the energies of the initial and final configurations
can be compared. In contrast, the Monte Carlo technique
could be efficiently implemented with the
conjugate-gradients method because the energy of the
new ionic configuration can be calculated rapidly.
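The acceptance rule described above (Metropolis et al., 1953) can be sketched in a few lines; the measured acceptance rate for an uphill move approaches exp(−ΔE/kT).

```python
import math, random

# Metropolis acceptance: a trial configuration is always accepted when it
# lowers the energy, and accepted with probability exp(-dE/kT) otherwise.
def metropolis_accept(dE, kT, rng=random):
    if dE <= 0.0:
        return True
    return rng.random() < math.exp(-dE / kT)

# Usage: estimate the acceptance rate for a fixed uphill move.
random.seed(1)
dE, kT = 1.0, 0.5
rate = sum(metropolis_accept(dE, kT) for _ in range(100000)) / 100000
print(rate, math.exp(-dE / kT))
```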
3. Location of low-energy configurations
Location of global energy minima is a complex prob-
lem. No scheme can be guaranteed to find the global en-
ergy minimum in a single calculation. The only way of
being reasonably confident that the global energy
minimum has been located is to find the same lowest-
energy configuration in a number of different calculations.
In practice a number of low-energy configurations
will be located by successive calculations. When subse-
quent calculations do not locate any new low-energy
configurations and the ionic configuration always reaches
one of the low-energy configurations found previously or
a configuration of significantly higher energy, then there
is a very high probability that all the low-energy
configurations of the system have been located and,
hence, that the global energy minimum has been located.
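A toy sketch of this procedure (my own, on a one-dimensional landscape with several local minima, assumptions mine): relax the system repeatedly from different starting configurations and collect the distinct minima found; when no new low-energy configurations appear, confidence grows that they have all been located.

```python
# E(x) = (x^2 - 1)^2 * (x^2 - 4)^2 has local minima at x = -2, -1, 1, 2.
def grad(x):
    # dE/dx = 4 x (x^2 - 1)(x^2 - 4)(2 x^2 - 5)
    return 4.0 * x * (x * x - 1.0) * (x * x - 4.0) * (2.0 * x * x - 5.0)

def descend(x, dt=1e-4, steps=50000):
    for _ in range(steps):
        x -= dt * grad(x)
    return round(x, 3)

# Starting points chosen to cover every basin of attraction:
minima = sorted({descend(x0) for x0 in (-2.3, -0.8, 0.8, 2.3)})
print(minima)
```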
VIII. DYNAMICAL SIMULATIONS
There is an enormous literature associated with studies
of the dynamical behavior of systems. The book by Allen
and Tildesley (1987) provides an excellent introduction to
the subject. Obvious areas of interest include diffusion,
melting, and the calculation of free energies. These stud-
ies are generally carried out using empirical potentials
(i.e., some model of the interaction between the atoms in
the system parametrized according to experimental data).
Empirical potentials have the drawback that it is impos-
sible to know their region of validity. The potentials are
often parametrized using data for the perfect crystal or
data describing only small perturbations from the perfect
crystal. Even if these potentials do work perfectly for the
crystal, there is no reason why they should be capable of
describing diffusion in the solid, which can involve
configurations very different from those found in the
crystal, let alone a liquid, whose structure may bear no
relation whatsoever to the parent crystal. The problem
of determining an accurate interatomic potential is par-
ticularly acute in the case of silicon, for which many
years of effort have yet to produce a general-purpose po-
tential. In contrast, the total-energy pseudopotential
method has been shown to be applicable in a much larger
region of phase space than any empirical potential.
Hence a dynamical simulation performed using these
forces should accurately describe a real system, irrespec-
tive of the region of phase space that is explored under
the dynamical evolution of the system.
A. Limitations to dynamical simulations
If simulations are performed using a finite supercell, as
they must be when plane-wave basis sets are used, the
systems cannot undergo true phase transitions, and the
range of correlations in the system will be limited by the
size of the supercell. It should also be appreciated that
the electron temperature will, in general, be zero in such
a simulation. Thermal excitation of the electronic system
can be described in density-functional theory (Mermin,
1965); however, there are fundamental problems with
density-functional theory which make it difficult to de-
scribe a system at finite temperature without performing
an extremely time-consuming calculation for the excited
states of the system (Hybertsen and Louie, 1985; Godby
et al., 1986). Provided that the thermal energy is much
smaller than the smallest excitation energy of the electronic
system, the effects of a finite electron temperature
should be small. If this is the case, the error introduced
by using zero-temperature density-functional theory in a
dynamical simulation should not be significant. The
effect of setting the electronic temperature to zero has
recently been shown to be negligible in a study of the
structural phase transition of GeTe (Rabe and Joannopoulos,
1987).
B. Accuracy of dynamical trajectories
In Sec. VII it was pointed out that the calculated
forces on the ions are only the true physical forces when
the electronic system is in its exact ground-state
configuration. Therefore, to generate correct dynamical
trajectories for the ions, the electrons must be relaxed to
the ground state at each ionic time step. Although any of
the methods described in Secs. III, IV, and V can be used
to relax the electronic configuration to its ground state,
most of these prove to be extremely expensive computationally
for performing dynamical simulations of the ionic
system. The most efficient of these, the conjugate-gradients
method, is fast enough to allow dynamical
simulations, but even in the case of this technique it is
important to generate good sets of initial wave functions
according to the technique outlined in Sec. VIII.E below.
However, there is an alternative approach to performing
dynamical simulations, which forms the basis of the Car-Parrinello
method. Rather than insisting that the electronic
configuration be in the exact ground-state
configuration at each ionic time step, one may be able to
perform dynamical simulations even if the electronic
configuration is only close to the exact ground state.
Although this implies that there are errors in the
Hellmann-Feynman forces at each time step, dynamical
simulations will be successful provided that the errors in
the forces remain small and that the effect of these errors
remains bounded in time. The Car-Parrinello method
can fulfill both of these criteria (Remler and Madden,
1990; Pastore et al., 1991). It is this latter point about
the boundedness of the errors which provides the distinction
between the Car-Parrinello method and the "improved"
methods outlined in Secs. IV and V. While
these improved methods will for a fixed computational
effort move the electronic system closer to the ground-state
configuration than the simple molecular-dynamics
method, the errors introduced by these improved
methods, although smaller than the error in the simple
molecular-dynamics method, do not remain bounded in
time. The boundedness of the error in the Car-Parrinello
method results from an "error cancellation" that occurs
when the Car-Parrinello Lagrangian is used to generate
the dynamics of the electronic and ionic system. This
effect is most easily demonstrated by the simple example
in the following section, which clarifies this point by
comparing second-order and first-order equations of
motion.
C. Error cancellation in dynamical simulations
The origin of the cancellation of the errors in the
Hellmann-Feynman forces under the equations of motion
generated by the Car-Parrinello Lagrangian can be illustrated
by considering a system that contains a single
atom, which has a single occupied electronic orbital, as
shown in Fig. 23. The molecular-dynamics equation of
motion for the evolution of the electronic wave function
is

\mu \ddot\psi = -[H - \lambda]\psi .  (8.1)

If the atom is at rest and the electronic wave function
is the ground-state wave function, then [H − λ]ψ = 0, and
the wave function will be stationary. If the orbital is displaced
away from the ion, the magnitude of the acceleration
of the wave function will increase roughly linearly
with the magnitude of the displacement. If the ion is
moving at constant velocity and the orbital begins to lag
FIG. 23. Schematic illustration of how an orbital will oscillate
around a moving ion during a simulation with
\mu \ddot\psi = -[H - \lambda]\psi, as discussed in the text. Velocities and accelerations
are designated as open and filled arrows, respectively.
(Panels: the orbital is stationary; the ion begins to move; the orbital
is accelerated; the orbital is in the instantaneous ground state; the
orbital is decelerated.)
behind the ion, the acceleration of the orbital will in-
crease. The velocity of the orbital will increase until the
orbital overtakes the ion. As the orbital overtakes the
ion, the acceleration of the wave function will change
sign and the orbital will begin to slow down. The orbital
continues to slow down until the ion overtakes it, at
which point the whole process starts again. Hence, if the
ion were to move at constant velocity, the electronic or-
bital would oscillate around the instantaneous position of
the ion. The value of the Hellmann-Feynman force ex-
erted on the ion by the orbital will oscillate around the
correct value, so that the error in the Hellmann-Feynman
force will tend to cancel when averaged over a number of
wave-function oscillations. The oscillation of the error in
the Hellmann-Feynman force will prevent a continuous
transfer of energy between the ionic and the electronic
degrees of freedom, as long as the fictitious oscillations
occur over time scales much shorter than the physical
ionic time scales. This is a reflection of the fact that,
given a sufficiently large mass ratio, there is an adiabatic
isolation of the (much) "lighter" electronic coordinates
from the "heavier" ionic degrees of freedom.
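The contrast between this second-order dynamics and the first-order scheme discussed next can be seen in a one-dimensional toy model. The sketch below is a hypothetical illustration, not the plane-wave code: projecting the wave-function error onto a single excited eigenstate reduces Eq. (8.1) to μc̈ = −Δε·c and a first-order equation to ċ = −Δε·c, with μ, Δε, and the step sizes invented for the demonstration.

```python
import numpy as np

# Toy model: c is the error coefficient along one excited eigenstate,
# whose eigenvalue exceeds the ground state's by de = lambda_1 - lambda_0.
mu, de, dt, nstep = 1.0, 0.5, 0.01, 2000

# Second-order (Car-Parrinello-like) dynamics: velocity-Verlet integration
# of mu*c'' = -de*c.  The error oscillates about zero.
c2, v = 1.0, 0.0
traj2 = []
for _ in range(nstep):
    a = -de * c2 / mu
    c2 += v * dt + 0.5 * a * dt**2
    a_new = -de * c2 / mu
    v += 0.5 * (a + a_new) * dt
    traj2.append(c2)

# First-order dynamics: forward-Euler integration of c' = -de*c.
# The error decays monotonically toward zero instead of oscillating.
c1 = 1.0
traj1 = []
for _ in range(nstep):
    c1 += -de * c1 * dt
    traj1.append(c1)

print(min(traj2) < 0.0 < max(traj2))   # second-order error changes sign
print(all(x > 0 for x in traj1))       # first-order error never does
```

The sign changes of the second-order trajectory are what allow the force errors to average away over ionic time scales; the monotonic first-order decay is instead the behavior exploited for relaxation, as discussed below.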
A first-order equation of motion, in contrast, gives the
following expression for the evolution of the electronic
orbital:

   ψ̇ = −[H − λ]ψ .   (8.2)

With this equation of motion the velocity of the orbital
increases roughly linearly with the displacement of the
orbital from the ion. Once the ion has begun to move,
within a few micro-eV per ion of its ground state. When
this extrapolation technique is combined with the
conjugate-gradients method, the resulting computational
scheme is sufficient to make the entire dynamical simula-
tion comparable in speed to a Car-Parrinello simulation.
However, the technique has the advantage that it can be
applied to a broader class of systems. The details of the
scheme can be found in Arias et al. (1991); it will only be
described briefly below.
The basis for the trial wave-function scheme is the
first-order extrapolation

   ψ′_m({r(t_{i+1})}) = ψ_m({r(t_i)})
      + α[ψ_m({r(t_i)}) − ψ_m({r(t_{i−1})})] ,   (8.4)

where {r(t_i)} are the ionic coordinates at time t_i, with i
the ionic iteration number, α is a fitted parameter, and
the prime indicates a trial wave function, as opposed to
the fully converged ψ_m({r(t_i)}). This scheme produces
trial wave functions correct to first order in Δt (and, by
the 2N+1 theorem, energies correct to third order in the
time step) when the ionic coordinates are

   {r(t_{i+1})} = {r(t_i)} + α[{r(t_i)} − {r(t_{i−1})}] .   (8.5)

To ensure that the resulting wave functions are in as
close correspondence as possible with the actual ionic lo-
cations {r(t_{i+1})}, α is taken to minimize the discrepancy

   |{r(t_{i+1})} − [{r(t_i)} + α({r(t_i)} − {r(t_{i−1})})]| .   (8.6)
One may imagine generalizing this scheme to higher
orders, employing more of the preceding wave functions
and producing ever smaller errors in the extrapolated
wave functions. However, higher-order schemes suffer
from an instability that pushes the wave-function errors
into regions of phase space where convergence is so
difficult that the net effect is to slow the simulation.
As in the Car-Parrinello scheme, orthonormality of the
wave functions must be maintained; however, Eq. (8.4)
yields wave functions that are not properly orthonormal.
In the present case, one can simply Gram-Schmidt ortho-
normalize the resulting wave functions, because there is
no longer any concern for maintaining a proper electron
dynamics and because this procedure will not disturb the
correctness to first order of the wave functions, a conse-
quence of the fact that

   ⟨ψ′_m({r(t_{i+1})})|ψ′_n({r(t_{i+1})})⟩ = δ_mn + O(Δt²) .   (8.7)
Once the wave functions extrapolated according to Eq.
(8.4) have been Gram-Schmidt orthonormalized, they
are then relaxed to within a set tolerance of the Born-
Oppenheimer surface by the conjugate-gradient pro-
cedure; this completes one cycle of iteration of the ionic
motion.
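The extrapolation cycle of Eqs. (8.4)-(8.7) can be sketched in a few lines of NumPy. This is a schematic with invented array shapes and random stand-in data, not the actual plane-wave implementation: α comes in closed form from the least-squares condition of Eq. (8.6), the bands are extrapolated, and the result is Gram-Schmidt orthonormalized.

```python
import numpy as np

rng = np.random.default_rng(0)

def extrapolate(psi_prev, psi_cur, r_prev, r_cur, r_next):
    """Trial wave functions in the spirit of Eqs. (8.4)-(8.6).

    psi_prev/psi_cur: converged bands (rows) at the two previous ionic
    steps; r_*: flattened ionic coordinates at t_{i-1}, t_i, t_{i+1}.
    """
    delta = r_cur - r_prev
    # Closed-form least-squares alpha minimizing Eq. (8.6).
    alpha = np.dot(r_next - r_cur, delta) / np.dot(delta, delta)
    psi_trial = psi_cur + alpha * (psi_cur - psi_prev)   # Eq. (8.4)
    # Gram-Schmidt: safe here since errors are only O(dt^2), Eq. (8.7).
    for m in range(psi_trial.shape[0]):
        for n in range(m):
            psi_trial[m] -= np.vdot(psi_trial[n], psi_trial[m]) * psi_trial[n]
        psi_trial[m] /= np.linalg.norm(psi_trial[m])
    return alpha, psi_trial

# Ions moving at uniform velocity: alpha should come out as 1.
r0, vel = rng.normal(size=6), rng.normal(size=6)
r1, r2 = r0 + 0.1 * vel, r0 + 0.2 * vel
psi_a = np.linalg.qr(rng.normal(size=(8, 3)))[0].T   # 3 orthonormal bands
psi_b = np.linalg.qr(rng.normal(size=(8, 3)))[0].T

alpha, psi_t = extrapolate(psi_a, psi_b, r0, r1, r2)
print(round(alpha, 6))                                   # 1.0
print(np.allclose(psi_t @ psi_t.T, np.eye(3)))           # True: orthonormal
```

For uniform ionic motion the fitted α is exactly 1, i.e., pure linear extrapolation; for accelerating ions α adjusts so that the extrapolated wave functions best match the actual new ionic positions.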
F. Comparison of Car-Parrinello
and conjugate-gradient dynamics
The Car-Parrinello and conjugate-gradients schemes
for performing dynamical simulations are very different,
and it is important to understand these differences in or-
der to apply either technique successfully. The most im-
portant point is the difference between the time steps
used in the two methods. In this respect conjugate-
gradients dynamics is closer to conventional dynamical
simulations, in which the time step is chosen to ensure an
accurate integration of the ionic equations of motion. In
simulations employing empirical potentials and those us-
ing the conjugate-gradients scheme, the forces on the
ions are, to high precision, true derivatives of the total
potential energy of the ions. In the case of empirical po-
tentials, the only differences between the computed forces
and the derivatives of the total ionic energy are rounding
errors due to finite machine accuracy, but in the case of
the conjugate-gradients simulation, the differences also
include contributions due to the failure of the Hellmann-
Feynman theorem because the electronic system is not
exactly converged to its ground state. In the Car-
Parrinello simulation, at each time step there are
significantly larger errors in the Hellmann-Feynman
forces, because the electronic configuration is not main-
tained so close to its exact ground-state configuration.
The time step used in a Car-Parrinello simulation has to
be much shorter than the one used for an equivalent
conjugate-gradients simulation to integrate the electronic
equations of motion stably. Additionally, the longest
time period in the electronic system must be less than the
shortest ionic time period, to ensure that the errors in the
Hellmann-Feynman forces average to zero along the ion-
ic trajectories.
At first sight the Car-Parrinello method and the wave-
function extrapolation combined with conjugate-gradient
relaxation are rather similar, in that each essentially per-
forms an integration of the wave functions forward in
time. However, the spirit of each technique and the be-
havior of the wave-function coefficients in the two cases
are very different. In the case of the conjugate-gradients
dynamics, the wave functions are propagated as close as
possible to the instantaneous ground state, in order to
reduce the effort required to fully relax them to the
ground state. In the Car-Parrinello method, the motion
of the electronic degrees of freedom preserves a delicate
adiabatic separation between the electronic and ionic de-
grees of freedom. The electronic coefficients oscillate
artificially about their ground-state values, which leads to
a cancellation of the errors in the ionic forces.
G. Phonon frequencies
The phonon frequencies of a system can be obtained by
performing a dynamical simulation and then Fourier-
transforming either the velocity or the position auto-
correlation functions. However, for this procedure to
M. C. Payne et al.: Ab initio iterative minimization techniques
give phonon frequencies to high accuracy, the original
ionic trajectories must be extremely accurate, since any
noise in the trajectories will broaden the phonon frequen-
cies. The conjugate-gradients dynamics scheme gen-
erates extremely accurate ionic trajectories, in which the
noise can be reduced to an arbitrarily low level, and thus
provides an excellent set of input data with which to
determine phonon frequencies. Figure 25 shows the
transform of the longitudinally polarized autocorrelation,
as determined by the maximum-entropy method (Press
et al., 1989), of 40 silicon atoms in a periodic system over
50 Å long in the [100] direction. Each peak represents a
natural frequency in the system. Neither the heights of
the peaks nor their integrated intensities are meaningful,
in that the system has not yet reached thermal equilibri-
um. Note that the primary caveat when working with
the maximum-entropy method is that it produces spuri-
ous peaks when working with noisy data. No such peaks
are obtained here, indicating very clean data. The frequencies
of the peak values of these spectra, as well as their trans-
verse counterparts, are then compared with the experi-
mentally measured phonon frequencies (Dolling, 1963;
Nilsson and Nelin, 1972) in Fig. 26. As can be seen,
there is good agreement between the results of the calcu-
lation and experiment, particularly in resolving the deli-
cate splitting of the optic bands along Δ, which beat
against each other with periods on the order of one pi-
cosecond. This technique for obtaining phonon frequen-
cies requires no information about the displacements as-
sociated with each phonon mode and is particularly at-
tractive for complex systems in which the phonon dis-
placements are not known and for which it would be ex-
tremely expensive to compute the full matrix of second
FIG. 25. Superposition of maximum-entropy-method spectral
fits for each class of allowed phonon k state. For clarity, only
the longitudinal spectra are displayed. The frequencies have
been scaled so that the optic phonon frequency ω₀, as calculated
from a frozen-phonon calculation at the same cutoff in the same
supercell, is normalized to unity.
FIG. 26. Phonon spectrum as determined from maximum peak
values of maximum-entropy-method fits. These values are com-
pletely ab initio, with no free parameters. Empty circles
represent experimental data (Dolling, 1963; Nilsson and Nelin,
1972), and filled circles represent results of an extrapolated
conjugate-gradient dynamics simulation.
derivatives of the ionic potential energy, a calculation
normally required to obtain phonon frequencies and
eigenvectors.
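The procedure just described can be mimicked on synthetic data. The sketch below is illustrative only: it uses a plain periodogram rather than the maximum-entropy fit of the text, and the two mode frequencies are invented. It relies on the Wiener-Khinchin theorem, by which Fourier transforming the velocity autocorrelation function is equivalent to taking the power spectrum of the velocity trajectory itself.

```python
import numpy as np

dt, nstep = 0.01, 1000                 # time step, number of MD samples
t = np.arange(nstep) * dt
f1, f2 = 3.0, 7.0                      # "phonon" frequencies to recover
# Synthetic two-mode velocity trajectory standing in for MD output.
v = np.cos(2 * np.pi * f1 * t) + 0.5 * np.cos(2 * np.pi * f2 * t)

# Power spectrum of the velocities = transform of the autocorrelation.
power = np.abs(np.fft.rfft(v)) ** 2
freqs = np.fft.rfftfreq(nstep, d=dt)

# The two strongest peaks recover the underlying mode frequencies.
peaks = sorted(float(f) for f in freqs[np.argsort(power)[-2:]])
print(peaks)                           # [3.0, 7.0]
```

Noise in the trajectory broadens these peaks, which is why the highly accurate trajectories of conjugate-gradients dynamics matter for this application.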
IX. NONLOCAL PSEUDOPOTENTIALS
The computational speeds of the molecular dynamics
and conjugate gradients techniques are significantly
enhanced by using local pseudopotentials rather than
nonlocal pseudopotentials. This allows the number of
operations required to multiply each of the wave func-
tions by the Hamiltonian to be reduced from N_PW² to
16 N_PW ln(N_PW), where N_PW is the number of plane-wave
basis states. However, it is not possible to produce accu-
rate local pseudopotentials for all atoms. To apply
molecular-dynamics and conjugate-gradients methods to
systems containing atoms that can only be represented by
nonlocal pseudopotentials, it is necessary to use an
efficient scheme for dealing with the nonlocality of the
pseudopotential. Nonlocal pseudopotentials generally re-
quire fewer plane-wave basis states than do local pseudo-
potentials to expand the electronic wave functions.
Therefore, although it will require additional computa-
tional effort to use nonlocal pseudopotentials in
molecular-dynamics and conjugate-gradients calcula-
tions, some of the loss in computational speed will be
recouped because fewer plane-wave basis states are re-
quired. However, it is essential to find an efficient
method for using the nonlocal pseudopotentials. The
methods that have been used employ only a partial pro-
jection of the nonlocal components of the wave functions.
Examples of such methods are described in the following
two sections. It has long been appreciated that all of
these partial-projection methods could be applied in ei-
ther real space or reciprocal space. The computational
cost scales as the cube of the system size using a
reciprocal-space method but only as the square of the
system size with the real-space method. As will be seen
in Sec. IX.C.1, there are difficulties associated with the
real-space projection method that have delayed its im-
plementation. These problems have now been overcome,
and Sec. IX.D describes a successful real-space projection
method for nonlocal pseudopotentials.
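The operation counts quoted above for applying the Hamiltonian to a wave function can be compared directly. This is a back-of-the-envelope sketch using the N_PW² and 16·N_PW·ln(N_PW) estimates from the text; the sample basis-set sizes are arbitrary.

```python
import math

def ops_nonlocal(n_pw):
    """Operations to apply H in reciprocal space with a nonlocal potential."""
    return n_pw ** 2

def ops_local_fft(n_pw):
    """Operations to apply H via FFT with a local potential."""
    return 16 * n_pw * math.log(n_pw)

for n_pw in (1_000, 10_000, 100_000):
    ratio = ops_nonlocal(n_pw) / ops_local_fft(n_pw)
    print(f"N_PW={n_pw:>7}: local/FFT route is ~{ratio:,.0f}x cheaper")
```

The advantage grows roughly as N_PW/ln(N_PW), which is why the partial-projection schemes below work so hard to keep the nonlocal part cheap.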
A. Kleinman-Bylander potentials
The most general form for a nonlocal pseudopotential
is

   V_NL = Σ_{lm} |Y_lm⟩ V_l ⟨Y_lm| ,   (9.1)
where the Y_lm are spherical harmonics and V_l is the pseudo-
potential acting on the component of the wave function
that has angular momentum l. Outside the core radius
the potentials V_l are identical for all the angular-momen-
tum components of the wave function. To implement
this form for the nonlocal pseudopotential, one needs a
complete projection of the angular momentum com-
ponents of the wave functions. In contrast, the
Kleinman-Bylander pseudopotential (Kleinman and By-
lander, 1982; Allan and Teter, 1987) is a norm-conserving
pseudopotential that uses a single basis state for each an-
gular momentum component of the wave function. The
Kleinman-Bylander pseudopotential has the form
   V_NL = V_LOC + Σ_{lm} |δV_l φ_lm⟩⟨φ_lm δV_l| / ⟨φ_lm|δV_l|φ_lm⟩ ,   (9.2)

where V_LOC is a local potential, the φ_lm are the wave func-
tions of the pseudoatom, and δV_l is

   δV_l = V_{l,NL} − V_LOC .   (9.3)
Here V_{l,NL} is the l angular-momentum component of any
nonlocal pseudopotential. Kleinman and Bylander sug-
gested using the arbitrariness of V_LOC to produce an ac-
curate and transferable pseudopotential.
The Kleinman-Bylander pseudopotential projects each
spherical harmonic component of the wave function onto
a single basis state. When applied to the pseudoatom, the
potential gives identical results to the nonlocal pseudopo-
tential it was derived from, independent of the choice for
the local potential V_LOC. However, the potential does
not produce identical results when applied in another en-
vironment, because the wave function is not projected
onto a radially complete set of spherical harmonics.
Some of the difficulties that can be encountered with this
approach have been discussed recently by Gonze et al.
(1990). All of the known problems can be overcome by
the proper choice of local potential, a simple reduction in
the core radius, or the application of the ideas of extend-
ed norm conservation (Shirley et al., 1989). It may be
necessary, however, to include pseudo core states to
achieve a high degree of transferability for certain
Rev. Mod. Phys.. VOL 64, No. 4, October 1992
transition-metal atoms. These improvements all typically
require a larger number of plane waves in the basis set.
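The defining property of the Kleinman-Bylander form, exact action on the pseudoatom state but only approximate action elsewhere, can be checked numerically for a single channel. The sketch below is a one-dimensional model with invented radial functions, not a real pseudopotential: the semilocal operator δV_l is diagonal on a radial grid, and the rank-one Kleinman-Bylander operator is built in the spirit of Eq. (9.2).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
dv = np.exp(-np.linspace(0, 4, n))        # model dV_l(r), localized in the core
phi = np.sin(np.linspace(0, np.pi, n))    # model pseudoatom radial state
phi /= np.linalg.norm(phi)

denom = phi @ (dv * phi)                  # <phi|dV_l|phi>
kb = np.outer(dv * phi, dv * phi) / denom # rank-one Kleinman-Bylander operator

# Exact agreement on the pseudoatom state ...
print(np.allclose(kb @ phi, dv * phi))    # True
# ... but only approximate action on any other state.
psi = rng.normal(size=n)
print(np.allclose(kb @ psi, dv * psi))    # False
```

The second line is precisely the transferability issue discussed above: in an environment whose wave functions differ from the pseudoatom's, the single-basis-state projection is no longer exact.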
1. Enhanced projections
The Kleinman-Bylander form of the pseudopotential
can be systematically improved by adding more basis
functions for the projection of the spherical harmonics
(Bloechl, 1990). This allows the accuracy of the nonlocal
potential to be checked by plotting the total energy as a
function of the number of basis states used for the projec-
tion of the spherical harmonics of the wave functions.
2. Computational cost
The contribution to the product of the Hamiltonian
and the wave function ψ_i at wave vector k+G for the
Kleinman-Bylander pseudopotential is given by

   Σ_{lm} ζ_{lm,k+G} [ Σ_{G′} ζ*_{lm,k+G′} c_{i,k+G′} ] ,   (9.4)

where

   ζ_{lm,k+G} = ⟨k+G|δV_l|φ_lm⟩ / [⟨φ_lm|δV_l|φ_lm⟩]^{1/2} ,   (9.5)

and j_l is the spherical Bessel function of order l, which
enters Eq. (9.5) through the matrix element ⟨k+G|δV_l|φ_lm⟩.
The spherical Bessel function j_l(|k+G|r) gives the amplitude
of the l angular-momentum component of the plane wave
exp[i(k+G)·r] at distance r from the origin.
The contributions to the product of the Hamiltonian
with each and every electronic wave function from the
nonlocal pseudopotential can be calculated in
3 N_B N_PW N_l N_I operations per k point, using the
Kleinman-Bylander pseudopotential, where N_l is the
number of nonlocal spherical-harmonic components in
the pseudopotential (N_B is the number of bands and N_I
the number of ions). 3 N_B N_PW N_l N_I operations are re-
quired to calculate the contributions to the forces on all
the ions from the nonlocal pseudopotential using the
Kleinman-Bylander scheme in reciprocal space, and
6 N_B N_PW N_l N_I operations are required to compute the
diagonal and off-diagonal stresses on the unit cell. Hence
the computational cost of all these operations scales as the
third power of the number of ions in the unit cell, since
N_B and N_PW are proportional to N_I. This computational
cost is usually significantly larger than the cost of ortho-
gonalizing the wave functions (N_B² N_PW). Therefore the
application of the nonlocal potential in reciprocal space
will dominate the computational cost for large systems.
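The loop structure behind these operation counts can be made explicit in a small sketch (invented array sizes and random stand-in data; one projector per ion and lm channel): each band requires one inner product and one scaled accumulation per projector, each of length N_PW.

```python
import numpy as np

rng = np.random.default_rng(2)
n_b, n_pw, n_proj = 4, 64, 6          # bands, plane waves, ions x lm channels

# Kleinman-Bylander projectors and plane-wave coefficients (random stand-ins).
zeta = rng.normal(size=(n_proj, n_pw)) + 1j * rng.normal(size=(n_proj, n_pw))
c = rng.normal(size=(n_b, n_pw)) + 1j * rng.normal(size=(n_b, n_pw))

out = np.zeros_like(c)
for b in range(n_b):                  # N_B bands
    for p in range(n_proj):           # N_l * N_I projectors
        amp = np.vdot(zeta[p], c[b])  # sum over G' : N_PW operations
        out[b] += amp * zeta[p]       # sum over G  : N_PW operations

# The same contraction written as one pair of matrix products:
out_blas = (c @ zeta.conj().T) @ zeta
print(np.allclose(out, out_blas))     # True
```

The explicit loops make the N_B·N_PW·N_l·N_I count visible; in practice the matrix-product form is preferred, since it maps onto optimized linear-algebra routines.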
B. Vanderbilt potentials
A rather more radical approach to modifying pseudo-
potentials for use in plane-wave calculations has been
suggested by Vanderbilt (1990). The basic aim with these
potentials, in common with the other schemes described
in Sec. II.D.1.d, is to allow calculations to be performed
wave with wave vector G = 0 or any other integer multi-
ple of 4G_max. This is a general result; any function
defined only on the real-space grid points is unchanged if
any of its Fourier components are changed by any multi-
ple of the wave vector 4G_max. This is the origin of the
so-called "wraparound error" of discrete representations
of Fourier transforms. The use of a double-density
Fourier transform grid eliminates the wraparound error
in all the parts of a pseudopotential calculation con-
sidered so far. Unfortunately, this is not true in the case
of a real-space projection of the nonlocal Kleinman-
Bylander projectors, because these projectors must have
Fourier components at all wave vectors if they are to be
strictly localized in the core of the atom. If the Fourier-
transform grid moves with respect to a fixed ionic and
electronic configuration, there will be interference be-
tween the components of the Kleinman-Bylander projec-
tors at wave vectors G + n·4G_max (n an integer), and the
value of the Kleinman-Bylander matrix elements will
vary. For a fixed ionic and electronic configuration,
these matrix elements must be independent of the origin
of the Fourier-transform grid for their values to be physi-
cally meaningful, and without a solution of this problem
the real-space projection technique cannot be applied.
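The wraparound ambiguity itself is easy to demonstrate: on a discrete grid of N points, plane waves of wave number G and G + N are sampled identically. The following is a minimal sketch with arbitrary values of N and G.

```python
import numpy as np

n = 16                                       # real-space grid points per cell
x = np.arange(n) / n                         # fractional grid coordinates
g = 3

wave_g = np.exp(2j * np.pi * g * x)          # plane wave of wave number G
wave_g_aliased = np.exp(2j * np.pi * (g + n) * x)   # wave number G + N

# Identical at every grid point: the grid cannot tell them apart.
print(np.allclose(wave_g, wave_g_aliased))          # True

# All of the aliased wave's weight folds back onto bin G of the grid FFT.
print(np.allclose(np.fft.fft(wave_g_aliased)[g], n))  # True
```

Any Fourier component shifted by a multiple of the grid's reciprocal period is therefore invisible to the sampled representation, which is exactly why projectors with components at all wave vectors give grid-origin-dependent matrix elements.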
It should be remembered that the local part of the
pseudopotential will also have components at large wave
vectors. However, the real-space representation of this
potential includes only wave vectors up to 2G_max, since
this representation is generated by Fourier-transforming
the potential from reciprocal space, where only com-
ponents of the potential up to wave vector 2G_max are
represented on the double-density Fourier-transform
grid. This analogy provides an immediate solution to the
problem associated with the real-space Kleinman-
Bylander projectors. Rather than using the actual real-
space Kleinman-Bylander projectors, we should use pro-
jectors that have been Fourier filtered so that they do not
contain any components at wave vectors larger than a
wave vector G_P, which must be less than 4G_max. If this is
the case, these modified projectors will not be subject to
any wraparound error.
The cost of forcing the high Fourier components of the
modified Kleinman-Bylander projectors to be strictly
zero at wave vectors above Gp is that in real space the
projectors are no longer localized in the core, but are
nonzero over the whole of the real-space grid. Obvious-
ly, if the projectors extend over the whole of the real-
space grid, there is nothing to be gained from the real-
space projection technique. The trick is to use the arbi-
trariness of the modified Kleinman-Bylander projectors
over the range of wave vectors G_max < G < G_P to ensure
that the magnitude of the modified projectors is negligi-
ble at distances greater than roughly 2r_c from the ion
core, where r_c is the core radius of the pseudopotential.
Of course, the Fourier components of the projectors at
wave vectors less than G_max must not be changed, or the
real- and reciprocal-space Kleinman-Bylander projec-
tions will not be identical. The Kleinman-Bylander pro-
jections are carried out only over the grid points where
the projectors are non-negligible, thus restoring the
favorable scaling of the real-space projection technique.
The procedure for generating the modified projectors is
detailed in King-Smith et al. (1991). The test of the gen-
eration technique is simple. The modified real-space pro-
jectors must yield identical results to the reciprocal-space
Kleinman-Bylander projectors, irrespective of the rela-
tive positions of the ion and the Fourier-transform grid, a
condition that the old, unmodified real-space Kleinman-
Bylander projectors failed to meet.
X. PARALLEL COMPUTERS
The series of algorithmic improvements described in
this article yield a method for performing total-energy
pseudopotential calculations whose computational time
requirements scale essentially as a constant times the
theoretical minimum number of operations required to
update all the wave functions in a single iteration. (This
can never be reduced below the cost of orthogonalizing
the wave functions without introducing an error.) The
value of this constant depends on the system but always
lies between several tens and several hundreds. Although
there may be some possibility of reducing this constant, it
is clear that, since this constant can never be less than
one and is more likely to be of the order of ten, there are
ultimate limits to the gains to be achieved by improve-
ments in the numerical methods. There is certainly no
longer the possibility of increases in computational speed
by many orders of magnitude, of the sort that have been
gained in the last few years, without fundamentally
changing the essential features of the total-energy pseu-
dopotential method. This does not, of course, exclude
the possibility of a completely different method proving
to be more efficient.
The developments in the total-energy pseudopotential
method allow calculations to be performed on any
reasonably powerful computer (anything from a modern
workstation to a conventional supercomputer) for quite
large systems containing up to 150 atoms in the unit cell,
provided that the pseudopotentials are moderately weak.
However, to allow studies of significantly larger systems,
much more powerful computers are required, which
combine both faster processing speeds with extremely
large amounts of memory. Without a fundamental
change in computer technology (the speed of each pro-
cessor of a conventional supercomputer has changed by
significantly less than one order of magnitude in the last
decade), these requirements can only be fulfilled by com-
bining a number of processors into a "parallel" comput-
er. In principle, by combining enough compute nodes,
parallel computers can be constructed that achieve arbi-
trarily large numbers of operations per second. No
parallel computer will achieve 100% efficiency on a real
computation, since there are overheads associated with
communication between the processors, and achievement
of a significant fraction of full efficiency is quite difficult
on any but the most trivial of tasks. A relatively low
efficiency is not too great a cause for concern, as long as
it is maintained as the number of compute nodes is in-
creased. In this case, any required computational speed
is achievable with a large enough number of
processors, even if this computational speed is
significantly less than the theoretical speed of that num-
ber of processors. The real problem, however, is that in
all applications the efficiency falls steadily above a criti-
cal number of processors. This effect is associated with
the parts of the code that cannot be run in parallel. At
the critical number of processors this part of the calcula-
tion starts to dominate the computational time, and no
further reduction in computational time is possible by in-
creasing the number of processors. In this regime the
efficiency varies as the inverse of the number of compute
nodes! It is clear that, for any calculation to be run
efficiently and "scaleably" (i.e., so that computational
time decreases with the number of processors) on a parallel
machine containing n processors, not more than 1/n of
the entire calculation may run sequentially.
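This is the familiar Amdahl's-law bound; a quick sketch with an assumed 1% sequential fraction shows the saturation described above.

```python
def speedup(s, n):
    """Amdahl's law: speedup on n processors when a fraction s of the
    work is inherently sequential."""
    return 1.0 / (s + (1.0 - s) / n)

s = 0.01                      # assumed 1% sequential code
for n in (1, 10, 100, 1000):
    sp = speedup(s, n)
    print(f"n={n:>5}: speedup {sp:7.1f}, efficiency {sp / n:.3f}")
# Speedup saturates near 1/s = 100; once n greatly exceeds 1/s, the
# efficiency falls roughly as 1/(s*n), i.e., inversely with node count.
```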
There are many possible architectures for parallel
machines. Each tends to have its own strengths and
weaknesses and, more importantly, its own suitability for
any particular computation. Interestingly, total-energy
pseudopotential calculations have been successfully im-
plemented on two very different classes of parallel
machine. One, the Connection Machine, consists of an
extremely large number of relatively modest-performance
compute nodes. The other class of machine consists of a
smaller number of extremely powerful compute nodes;
examples of this latter class of machine are those
manufactured by Intel, Meiko, and N-Cube. Although
the strategies for implementing the codes are the same on
both classes of machine, the detailed methods required to
implement the codes are rather different. The Connec-
tion Machine is programmed using a standard high-level
computer language, Fortran 90. When the total-energy
pseudopotential calculation is expressed using vector-
oriented Fortran 90 statements, parallel execution is im-
plemented by the compiler. The programmer is not re-
quired explicitly to implement communications among
the thousands of processors constituting the fine-grained,
massively parallel architecture. To further accelerate
processing, the vendor supplies a library of hand-micro-
coded FFT subroutines, which are directly callable from
Fortran 90 programs. The combination of programming
in Fortran 90 and the use of the distributed machine FFT
makes it possible for most of the total-energy calculation
to be performed in parallel.
In the case of the other class of machine, there is nor-
mally no fully distributed three-dimensional FFT, and a
major task in implementing total-energy pseudopotential
codes on these machines is the implementation of the
communications required to perform a distributed FFT.
A full description of the implementation of a set of pseu-
dopotential codes on this class of machine can be found
in Clarke et al. (1992).
The potential for pseudopotential calculations on
parallel computers has been demonstrated by the success-
ful calculations of the surface energy and relaxed struc-
ture of the 7×7 Takayanagi reconstruction (Takayanagi
et al., 1985) of the (111) surface of silicon on a 64-node
Meiko machine (Stich et al., 1992) and on a Connection
Machine (Brommer et al., 1992). The supercells used for
these calculations contained 400 atoms and had a volume
of 784 times the atomic volume. Basis sets of up to
35,000 plane waves were used to expand the electronic
wave functions, and the electronic minimization involved
up to 2.8×10⁷ degrees of freedom.
XI. CONCLUDING REMARKS
Car and Parrinello's molecular-dynamics method
stands as a landmark in the field of total-energy pseudo-
potential calculations. Iterative matrix diagonalization
techniques were in use before the molecular-dynamics
method was developed, but these schemes only partially
exploited the benefits to be gained by the use of iterative
techniques. Car and Parrinello's technique exploited the
advantages of overlapping the processes of calculating
eigenstates and iterating to self-consistency and the use of
fast Fourier transform techniques. Efficient schemes for
using nonlocal pseudopotentials were only implemented
as a result of the molecular-dynamics method. The com-
bination of all of these features, rather than just the re-
placement of a conventional matrix diagonalization
scheme by an iterative scheme, is responsible for the
significant increase in the power of the total-energy pseu-
dopotential technique brought about by the molecular-
dynamics method. Any iterative matrix diagonalization
technique must exploit all of these features to be able to
challenge the molecular-dynamics method. Car and
Parrinello's method did not change the basic total-energy
pseudopotential technique but offered an enormous in-
crease in computational efficiency, so that much larger
and more complex systems became accessible to the tech-
nique. It also allowed the first ab initio dynamical simu-
lations to be performed. Only with the reduction in the
scaling of computational time with system size that
comes from Car and Parrinello's molecular-dynamics
method and from conjugate-gradients techniques is it
worthwhile performing calculations for large systems on
parallel computers.
As mentioned previously, there is now only limited
scope for improvements in the algorithms used to per-
form total-energy pseudopotential calculations. Howev-
er, there are a number of areas where progress can still be
made: At present a definitive scheme for generating
pseudopotentials that are fully transferable and computa-
tionally efficient is still lacking. There is considerable
scope for improved density functionals. Perhaps the
most ambitious objective is to couple quantum-
mechanical modeling of small, critical regions of a system
(such as a dislocation core) with a less rigorous modeling
of the noncritical regions (which could be modeled using
classical elasticity theory).
Even if all else is forgotten, the authors would like the
reader to retain just one idea. This is that ab initio
quantum-mechanical modeling using the total-energy
pseudopotential technique is now capable of addressing
an extremely large range of problems in a wide range of
scientific disciplines.
ACKNOWLEDGMENTS
The authors wish to thank Y. Bar-Yam, E. Kaxiras,
M. Needels, K. M. Rabe, E. Tarnow, and D. H. Vander-
bilt for help and advice in developing, applying, and un-
derstanding the molecular-dynamics and conjugate-
gradients methods. MCP wishes to thank Volker Heine
for many insightful discussions on the pseudopotential
approach and R. J. Needs, R. W. Godby, and C. M. M.
Nex for helpful discussions on iterative matrix diagonali-
zation techniques.
REFERENCES
Allan, D. C., and M. P. Teter, 1987, Phys. Rev. Lett. 59, 1136.
Allen, M. P., and D. J. Tildesley, 1987, Computer Simulation of Liquids (Clarendon, Oxford).
Arias, T., J. D. Joannopoulos, and M. C. Payne, 1991, Phys. Rev. B, in press.
Ashcroft, N. W., and N. D. Mermin, 1976, Solid State Physics (Holt Saunders, Philadelphia), p. 113.
Bachelet, G. B., D. R. Hamann, and M. Schlüter, 1982, Phys. Rev. B 26, 4199.
Baraff, G. A., and M. Schlüter, 1979, Phys. Rev. B 19, 4965.
Bar-Yam, Y., S. T. Pantelides, and J. D. Joannopoulos, 1989, Phys. Rev. B 39, 3396.
Benedek, R., L. H. Yang, C. Woodward, and B. I. Min, 1991, unpublished.
Bloechl, P. E., 1990, Phys. Rev. B 41, 5414.
Bloechl, P. E., and M. Parrinello, 1991, unpublished.
Brommer, K., M. Needels, B. Larson, and J. D. Joannopoulos, 1992, Phys. Rev. Lett. 68, 1355.
Broughton, J., and F. Khan, 1989, Phys. Rev. B 40, 12098.
Car, R., and M. Parrinello, 1985, Phys. Rev. Lett. 55, 2471.
Car, R., and M. Parrinello, 1989, in Simple Molecular Systems at Very High Density, edited by A. Polian, P. Loubeyre, and N. Boccara (Plenum, New York), p. 455.
Car, R., M. Parrinello, and M. C. Payne, 1991, J. Phys.: Condens. Matter 3, 9539.
Chadi, D. J., and M. L. Cohen, 1973, Phys. Rev. B 8, 5747.
Cho, K., and J. D. Joannopoulos, 1992, Phys. Rev. A 45, 7089.
Clarke, L. J., I. Stich, and M. C. Payne, 1992, unpublished.
Cohen, M. L., 1984, Phys. Rep. 110, 293.
Cohen, M. L., and V. Heine, 1970, Solid State Physics, Vol. 24, p. 37.
Denteneer, P., and W. van Haeringen, 1985, J. Phys. C 18, 4127.
Dolling, G., 1963, in Inelastic Scattering of Neutrons in Solids and Liquids, Vol. II (International Atomic Energy Agency, Vienna), p. 37.
Drabold, D. A., P. A. Fedders, O. F. Sankey, and J. D. Dow, 1990, Phys. Rev. B 42, 5135.
Dreizler, R. M., and J. da Providencia, 1985, Density Functional Methods in Physics (Plenum, New York).
Evarestov, R. A., and V. P. Smirnov, 1983, Phys. Status Solidi B 119, 9.
Ewald, P. P., 1917a, Ann. Phys. (Leipzig) 54, 519.
Ewald, P. P., 1917b, Ann. Phys. (Leipzig) 54, 557.
Ewald, P. P., 1921, Ann. Phys. (Leipzig) 64, 253.
Fahy, S., X. W. Wang, and S. G. Louie, 1988, Phys. Rev. Lett. 61, 1631.
Fernando, G. W., Guo-Xin Qian, M. Weinert, and J. W. Davenport, 1989, Phys. Rev. B 40, 7985.
Fetter, A. L., and J. D. Walecka, 1971, Quantum Theory of Many-Particle Systems (McGraw-Hill, New York), p. 29.
Feynman, R. P., 1939, Phys. Rev. 56, 340.
Francis, G. P., and M. C. Payne, 1990, J. Phys.: Condens. Matter 17, 1643.
Froyen, S., and M. L. Cohen, 1986, J. Phys. C 19, 2623.
Gill, P. E., W. Murray, and M. H. Wright, 1981, Practical Optimization (Academic, London).
Gillan, M. J., 1989, J. Phys.: Condens. Matter 1, 689.
Godby, R. W., M. Schlüter, and L. J. Sham, 1986, Phys. Rev. Lett. 56, 2415.
Gomes Dacosta, P., O. H. Nielsen, and K. Kunc, 1986, J. Phys. C 19, 3163.
Gonze, X., P. Käckell, and M. Scheffler, 1990, Phys. Rev. B 42, 12264.
Gunnarsson, O., and B. I. Lundqvist, 1976, Phys. Rev. B 13, 4174.
Hamann, D. R., 1989, Phys. Rev. B 40, 2980.
Hamann, D. R., M. Schlüter, and C. Chiang, 1979, Phys. Rev. Lett. 43, 1494.
Harris, J., and R. O. Jones, 1974, J. Phys. F 4, 1170.
Hedin, L., and B. Lundqvist, 1971, J. Phys. C 4, 2064.
Hellmann, H., 1937, Einführung in die Quantenchemie (Deuticke, Leipzig).
Hohenberg, P., and W. Kohn, 1964, Phys. Rev. 136, B864.
Hybertsen, M. S., and S. G. Louie, 1985, Phys. Rev. Lett. 55, 1418.
Ihm, J., A. Zunger, and M. L. Cohen, 1979, J. Phys. C 12, 4409.
Janak, J. F., 1978, Phys. Rev. B 18, 7165.
Joannopoulos, J. D., 1985, in Physics of Disordered Materials, edited by D. Adler, H. Fritzsche, and S. R. Ovshinsky (Plenum, New York), p. 19.
Joannopoulos, J. D., P. Bash, and A. Rappe, 1991, Chemical Design Automation News 6, No. 8.
Joannopoulos, J. D., and M. L. Cohen, 1973, J. Phys. C 6, 1572.
Joannopoulos, J. D., T. Starkloff, and M. A. Kastner, 1977, Phys. Rev. Lett. 38, 660.
Jones, 1991, Phys. Rev. Lett. 67, 224.
Jones, R. O., and O. Gunnarsson, 1989, Rev. Mod. Phys. 61, 689.
Kerker, G., 1980, J. Phys. C 13, L189.
King-Smith, R. D., M. C. Payne, and J. S. Lin, 1991, Phys. Rev. B 44, 13063.
Kirkpatrick, S., C. D. Gelatt, Jr., and M. P. Vecchi, 1983, Science 220, 671.
Kleinman, L., and D. M. Bylander, 1982, Phys. Rev. Lett. 48, 1425.
Kohn, W., and L. J. Sham, 1965, Phys. Rev. 140, A1133.
Kryachko, E. S., and E. V. Ludena, 1990, Energy Density Functional Theory of Many Electron Systems (Kluwer Academic, Boston).
Laasonen, K., R. Car, C. Lee, and D. Vanderbilt, 1991, Phys.