Basis Sets

The following lays down the schema annotation for several families of basis sets. We start off generically before running over specific examples. The aim is not to introduce the full theory behind every basis set, but just enough to understand its main concepts and how they relate.

General Structure

Basis sets are used by codes to represent various kinds of electronic structures, e.g. wavefunctions, densities, exchange densities, etc. Each electronic structure is therefore described by an individual BasisSetContainer in the schema.

Basis sets may be partitioned into various regions, spanning either physical / reciprocal space, energy (i.e. core vs. valence), or potential / Hamiltonian. We will cover the partitions per example below. Each BasisSetContainer is then constructed out of several basis_set_components, each matching a single region. Sometimes, a region is defined in terms of another schema section, e.g. an atom center (species_scope) or Hamiltonian terms (hamiltonian_scope).

Note that typically, different kinds of regions also have different mathematical formulations. Each formulation has its own dedicated section, to facilitate their reuse. These are all derived from the abstract section BasisSetComponent, so that basis_set_components: list[BasisSetComponent].

Generically, BasisSetComponent will only allude to the formula at large and focus on capturing the subtype, as well as relevant parameters. The most relevant ones are those most commonly listed in the Method section of an article. These typically also influence the precision the most. Extra, code-specific subtypes and parameters can be added by their respective parsers.

This then coalesces into the following diagram:

ModelMethod
├── NumericalSettings[0]
├── ...
└── NumericalSettings[n] = BasisSetContainer
                            ├── BasisSetComponent[1]
                            ├── ...
                            └── BasisSetComponent[n]
                                ├──> AtomsState
                                └──> BaseModelMethod
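
The nesting in the diagram can be mimicked with plain Python dataclasses. This is a minimal sketch, not the schema's actual section definitions: the classes below are hypothetical stand-ins that only borrow the names used in this text, and the unit chosen for cutoff_energy is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BasisSetComponent:
    """Hypothetical stand-in for the abstract component section."""
    # References to other schema sections, kept as plain strings here.
    species_scope: list[str] = field(default_factory=list)
    hamiltonian_scope: list[str] = field(default_factory=list)


@dataclass
class PlaneWaveBasisSet(BasisSetComponent):
    cutoff_energy: Optional[float] = None  # assumed unit: hartree


@dataclass
class AtomCenteredBasisSet(BasisSetComponent):
    name: str = ""


@dataclass
class BasisSetContainer:
    name: str
    basis_set_components: list[BasisSetComponent] = field(default_factory=list)


# One container per represented quantity, one component per region.
container = BasisSetContainer(
    name="example",
    basis_set_components=[
        PlaneWaveBasisSet(cutoff_energy=20.0),
        AtomCenteredBasisSet(name="GTO", species_scope=["atom A"]),
    ],
)
component_kinds = [type(c).__name__ for c in container.basis_set_components]
```

Because every formulation derives from the same base class, a container can hold any mix of components while code that only needs the scopes can treat them uniformly.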

Plane-waves

Plane-wave basis sets start from the description of a free electron and use Fourier series to construct the representations of bound electrons. In reciprocal space, the basis set can thus be thought of as vectors in a Cartesian grid enclosed within a sphere.

The main parameter is the radius of this sphere, i.e. the cutoff, which corresponds to the highest representable Fourier frequency. By convention, the radius is typically expressed in terms of the kinetic energy of the matching free-electron wave. PlaneWaveBasisSet allows storing either cutoff_radius or cutoff_energy. It can even derive the former from the latter via normalization.
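
The relation between the two cutoff quantities follows from the free-electron kinetic energy. The sketch below assumes Hartree atomic units, where E = k²/2; the function names are hypothetical and do not reproduce how PlaneWaveBasisSet performs the conversion internally.

```python
import math


def cutoff_radius_from_energy(cutoff_energy_hartree: float) -> float:
    """Return the cutoff radius in 1/bohr for a kinetic-energy cutoff in hartree.

    In Hartree atomic units (hbar = m_e = 1) a free electron has E = k**2 / 2.
    """
    if cutoff_energy_hartree < 0:
        raise ValueError("cutoff energy must be non-negative")
    return math.sqrt(2.0 * cutoff_energy_hartree)


def cutoff_energy_from_radius(cutoff_radius_inv_bohr: float) -> float:
    """Inverse relation: kinetic energy in hartree for a radius in 1/bohr."""
    return 0.5 * cutoff_radius_inv_bohr ** 2


radius = cutoff_radius_from_energy(0.5)  # 0.5 hartree corresponds to 1.0 / bohr
```

Codes that report cutoffs in rydberg use E = k² instead, so the unit of the stored energy has to be known before converting.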

Pseudopotentials

Pseudopotentials replace the strong Coulomb potential of the nucleus and tightly-bound core electrons with a weaker effective potential acting on valence electrons. This approximation dramatically reduces the number of basis functions needed in plane-wave calculations while preserving chemical behavior.

The Pseudopotential class stores metadata identifying which pseudopotential files were used. The actual numerical data (radial functions, projectors, augmentation charges) resides in external files (POTCAR, UPF, psp8, etc.) that are typically not archived due to size and licensing restrictions.

Type Classification

The type field classifies pseudopotentials by their fundamental formalism:

NC (Norm-conserving): Maintains charge norm in the pseudized core region. Highest transferability across chemical environments but requires higher plane-wave cutoffs. Generated using methods like Troullier-Martins or optimized dual-space approaches.
US (Ultrasoft): Vanderbilt's formalism that relaxes norm-conservation for significantly softer pseudopotentials. Lower cutoffs reduce computational cost but may sacrifice transferability. All ultrasoft pseudopotentials follow the same fundamental formalism.
PAW (Projector Augmented Wave): All-electron frozen-core method that reconstructs the full wavefunction within augmentation spheres. Standard PAW uses non-norm-conserving partial waves optimized for ground-state DFT.
NC-PAW: PAW with norm-conserving partial waves. More expensive than standard PAW but provides better scattering properties for high-energy states.
NC-PAW-GW: NC-PAW optimized for GW/BSE calculations. Includes additional projectors at higher energies to accurately describe quasiparticle states far above the Fermi level. Standard PAW and US systematically underestimate scattering into high-energy unoccupied states, which is critical for many-body perturbation theory.

Cross-Code Terminology

Different DFT codes use varying terminology and file organization schemes for pseudopotentials. The table below maps common code-specific naming conventions to the standardized schema fields:

Code	File Format	Hardness Indicators	Type Detection	Standard Sets
VASP	POTCAR (proprietary)	`_h` (hard), `_s` (soft) suffixes; ENMAX/ENMIN in header	LPAW=T → PAW; LULTRA=T → US; _GW suffix → NC-PAW-GW	Licensed with VASP; organized by XC functional (PBE, LDA, etc.)
Quantum ESPRESSO	UPF (XML)	Type-dependent recommended cutoffs; ecutwfc/ecutrho in input	Header `pseudo_type` field (NC/US/PAW); `paw_as_gipaw` flag	SSSP (precision/efficiency), PseudoDojo (standard/stringent), Quantum ESPRESSO library
CASTEP	UPF or proprietary	COARSE/MEDIUM/FINE precision levels; `cut_off_energy` in file	Inferred from file content structure	"On-the-fly" generation (OTFG); NCP/USP libraries organized by functional
ABINIT	psp8 (text), UPF	`high`/`low` suffixes in PseudoDojo sets; `ecut` recommendation in header	`pspcod` identifier (Troullier-Martins, HGH, PAW); format version	PseudoDojo (NC and PAW-XML), JTH table (PAW), HGH tables

Notes:

Hardness terminology across codes refers to the same underlying physics: smaller core radii require higher plane-wave cutoffs but provide better transferability and accuracy.
VASP, Quantum ESPRESSO, and ABINIT all support PAW, but implementation details differ. VASP's PAW follows Kresse & Joubert (1999), while Quantum ESPRESSO implements the original Blöchl formulation.
The Morrison-Bylander-Kleinman (MBK) separable form is an implementation technique used across all types in modern codes, not a distinct pseudopotential classification.
Standard pseudopotential libraries (SSSP, PseudoDojo) provide validation data including recommended cutoffs and accuracy metrics (Δ-gauge). These should be used when available rather than arbitrary cutoff choices.
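
As an illustration of the VASP row in the table above, the detection rules can be written as two small functions. This is a sketch under stated assumptions: the function names, the enum, and the idea of passing the LPAW / LULTRA flags as booleans are hypothetical, and the code does not read real POTCAR files.

```python
from enum import Enum
from typing import Optional


class PseudopotentialType(Enum):
    NC = "NC"
    US = "US"
    PAW = "PAW"
    NC_PAW = "NC-PAW"
    NC_PAW_GW = "NC-PAW-GW"


def _suffixes(label: str) -> list[str]:
    # "O_h_GW" -> ["h", "GW"]; the element symbol itself is dropped.
    return label.split("_")[1:]


def vasp_type(label: str, lpaw: bool, lultra: bool) -> Optional[PseudopotentialType]:
    """Apply the table's rules: GW marker first, then the LPAW / LULTRA flags."""
    if "GW" in _suffixes(label):
        return PseudopotentialType.NC_PAW_GW
    if lpaw:
        return PseudopotentialType.PAW
    if lultra:
        return PseudopotentialType.US
    return None  # not determinable from these markers alone


def vasp_hardness(label: str) -> Optional[str]:
    """Map the `_h` / `_s` markers to a hardness label, if present."""
    suffixes = _suffixes(label)
    if "h" in suffixes:
        return "hard"
    if "s" in suffixes:
        return "soft"
    return None
```

Matching whole suffixes rather than substrings keeps labels such as `Fe_sv` from being misread as soft.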

LAPW

The family of linearized augmented plane-waves is one of the best examples of region partitioning:

First, it partitions the physical space into regions surrounding the atomic nuclei, i.e. the muffin-tin spheres, and the rest, i.e. the interstitial region.
It then further partitions the muffin tins by energy, i.e. core versus valence. Note that, unlike with pseudopotentials, the core electrons are not abstracted away here. They are instead explicitly accounted for and relaxed, just via a different representation. Hence, LAPW is an all-electron approach.

The interstitial region, covering mostly loose bonding, is described by plane-waves (APWPlaneWaveBasisSet). [1] The valence electrons in the muffin tin (MuffinTinRegion), meanwhile, are represented by solutions of the spherically symmetric Schrödinger equation. [1] They follow the additional constraint of having to match the plane-wave description at the sphere boundary. In that sense, where the plane-wave description becomes too expensive, it is "augmented" by the muffin-tin description. This results in a lower plane-wave cutoff.

The spherically symmetric Schrödinger equation decomposes into an angular and a radial part. In traditional APW (not supported in NOMAD), the angular and radial parts are coupled in a non-linear fashion via the radial energy (at the boundary). All versions of LAPW simplify the coupling by parametrizing this radial energy. [1]

The representation vector is then developed in terms of the angular basis vectors, i.e. \(l\)-channels, each with their corresponding radial energy parameter. This approach is, confusingly, also called APW. It is typically not found standalone, though. Instead, the linearization introduces a secondary representation via the first-order energy derivative of the basis vector (function). Both vectors are typically developed together. This technique is called linearized APW (LAPW). [1]

Other formulas have been experimented with too, for example the use of even higher-order derivatives, i.e. superlinearized APW (SLAPW). [2, 4] All of these types are captured by APWOrbital, where type distinguishes between APW, LAPW, or SLAPW. The name quantity

Another option is to stay with APW (or LAPW) and add standalone vectors targeting specific atomic states, e.g. high-energy core states, valence states, etc. These are called local orbitals (lo) and are subject to other constraints. Some authors distinguish different vector sums with different kinds of local orbitals, e.g. lo, LO, high-derivative LO (HDLO). [2, 3] Since there is no community-wide consensus on the use of these abbreviations, we only utilize lo via APWLocalOrbital.

In summary, a generic LAPW basis set has the following structure:

LAPW+lo
├── 1 x plane-wave basis set
└── n x muffin-tin regions
    └── (l_max + 1) x l-channels
        ├── orbitals
        └── local orbitals ?

or in terms of the schema:

BasisSetContainer(name: LAPW+lo)
├── APWPlaneWaveBasisSet
├── MuffinTinRegion(atoms_state: atom A)
├── ...
└── MuffinTinRegion(atoms_state: atom N)
    ├── channel 0
    ├── ...
    └── channel l_max
        ├── APWOrbital(type: lapw)
        └── APWLocalOrbital ?
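
The per-atom part of this tree can be generated programmatically. The following is a minimal sketch with hypothetical dataclasses that reuse the section names from the text; the LChannel class, the build_muffin_tin helper, and its local_orbital_channels argument are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class APWOrbital:
    type: str  # one of "apw", "lapw", "slapw"


@dataclass
class APWLocalOrbital:
    pass


@dataclass
class LChannel:
    l: int
    orbitals: list = field(default_factory=list)


@dataclass
class MuffinTinRegion:
    atoms_state: str
    channels: list[LChannel] = field(default_factory=list)


def build_muffin_tin(atoms_state: str, l_max: int, local_orbital_channels=()) -> MuffinTinRegion:
    """Create channels 0..l_max, each with one LAPW orbital and an optional lo."""
    channels = []
    for l in range(l_max + 1):
        orbitals = [APWOrbital(type="lapw")]
        if l in local_orbital_channels:
            orbitals.append(APWLocalOrbital())
        channels.append(LChannel(l=l, orbitals=orbitals))
    return MuffinTinRegion(atoms_state=atoms_state, channels=channels)


region = build_muffin_tin("atom A", l_max=2, local_orbital_channels=(0, 1))
n_orbitals = sum(len(ch.orbitals) for ch in region.channels)
```

With l_max = 2 this yields three channels; the two lowest carry an additional local orbital, matching the optional branch marked with "?" in the diagrams.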

[1]: D. J. Singh and L. Nordström, "Introduction to the LAPW Method," in Planewaves, Pseudopotentials, and the LAPW Method, 2nd ed. New York, NY: Springer, 2006.

[2]: A. Gulans, S. Kontur, et al., exciting: a full-potential all-electron package implementing density-functional theory and many-body perturbation theory, J. Phys.: Condens. Matter 26(363202), 2014. DOI: 10.1088/0953-8984/26/36/363202

[3]: P. Blaha, K. Schwarz, et al., WIEN2k: An APW+lo program for calculating the properties of solids, J. Chem. Phys. 152(074101), 2020. DOI: 10.1063/1.5143061

[4]: D. Singh and H. Krakauer, H-point phonon in molybdenum: Superlinearized augmented-plane-wave calculations, Phys. Rev. B 43(1441), 1991. DOI: 10.1103/PhysRevB.43.1441

Gaussian-Planewaves (GPW)

The CP2K code introduces an algorithm called QuickStep that partitions by Hamiltonian term, describing

the kinetic and Coulombic electron-nuclei interaction terms via Gaussian-type orbitals (GTOs).
the electronic Hartree energy via plane-waves.

GPW makes this choice to increase performance, since the Hartree term can be evaluated efficiently with fast Fourier transforms in the plane-wave representation. [1] In the schema, we would write:

BasisSetContainer(name: GPW)
├── PlaneWaveBasisSet(hamiltonian_scope: [`/path/to/hartree_term/hamiltonian`])
└── AtomCenteredBasisSet(name: GTO, hamiltonian_scope: [`/path/to/kinetic_term/hamiltonian`, `/path/to/e-n_term/hamiltonian`])
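
Partitioning by Hamiltonian term implies that each term is handled by exactly one component. A small consistency check along those lines could look as follows; the term labels and the scope_conflicts helper are hypothetical placeholders for the schema references, following the GTO / plane-wave split described in the text.

```python
from collections import Counter

# Hypothetical labels standing in for references to Hamiltonian sections.
gpw_components = {
    "AtomCenteredBasisSet(GTO)": ["kinetic", "electron-nucleus"],
    "PlaneWaveBasisSet": ["hartree"],
}


def scope_conflicts(components: dict[str, list[str]]) -> list[str]:
    """Return the Hamiltonian terms claimed by more than one component."""
    counts = Counter(term for terms in components.values() for term in terms)
    return sorted(term for term, n in counts.items() if n > 1)


conflicts = scope_conflicts(gpw_components)  # empty list: the scopes do not overlap
```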

For further details on the schema, see the CP2K parser documentation.

[1]: J. VandeVondele, M. Krack, et al., Quickstep: Fast and accurate density functional calculations using a mixed Gaussian and plane waves approach, Comp. Phys. Commun. 167(2), 103-128, 2005. DOI: 10.1016/j.cpc.2004.12.014.