Structure enumeration

To train a cluster expansion, one needs a set of symmetrically inequivalent structures together with their energies (or another property of interest). Such structures can sometimes be generated by randomly occupying a supercell. In many systems, however, a much better approach is to generate all symmetrically inequivalent occupations up to a certain supercell size, a process usually referred to as structure enumeration. Enumeration is useful both for generating a set of small supercells without duplicates and for systematically searching for ground states once a cluster expansion has been fitted. This tutorial demonstrates the structure enumeration tool in icet.
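
To make the notion of symmetrically inequivalent occupations concrete, the following minimal sketch enumerates binary occupations of a one-dimensional periodic chain and keeps one representative per equivalence class. It is a toy model and not icet's algorithm: the only symmetries assumed are cyclic translations and inversion, the cell shape is fixed, and the helper names canonical and enumerate_ring are hypothetical.

```python
from itertools import product


def canonical(config):
    """Return a canonical representative of a ring occupation.

    Assumption of this toy model: two occupations are equivalent if one
    maps onto the other by a cyclic shift (translation) or by reversing
    the chain (inversion).
    """
    n = len(config)
    images = []
    for candidate in (config, config[::-1]):
        for shift in range(n):
            images.append(candidate[shift:] + candidate[:shift])
    # The lexicographically smallest image labels the equivalence class
    return min(images)


def enumerate_ring(n_sites, species):
    """Yield one occupation per equivalence class of an n_sites ring."""
    seen = set()
    for config in product(species, repeat=n_sites):
        key = canonical(config)
        if key not in seen:
            seen.add(key)
            yield key


unique = list(enumerate_ring(4, ('Au', 'Pd')))
print(len(unique), 'inequivalent occupations out of', 2 ** 4)
for config in unique:
    print(config)
```

Even in this small case most of the raw occupations are symmetry copies of one another, which is why enumeration yields far fewer structures than naive decoration of a supercell.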

Import modules

The enumerate_structures function needs to be imported together with the Atom class and a few helper functions from ASE.

from ase import Atom
from ase.build import bulk, fcc111, add_adsorbate
from ase.db import connect
from icet.tools import enumerate_structures

Generate binary structures

Structure enumeration starts from a primitive structure. Here, an fcc Au ase.Atoms object is created using the bulk() function. A database, AuPd-fcc.db, is then initialized to store the enumerated structures. Finally, all symmetrically inequivalent binary Au/Pd structures with up to 6 atoms per unit cell are generated and written to this database.

primitive_structure = bulk('Au')
db = connect('AuPd-fcc.db')
for structure in enumerate_structures(primitive_structure,
                                      range(1, 7),
                                      ['Pd', 'Au']):
    db.write(structure)

Generate binary structures in the dilute limit

The number of distinct structures grows extremely quickly with supercell size, so exhaustive enumeration is feasible only for small cells. Moreover, as the number of structures grows, an ever larger proportion of them have nearly equal amounts of the constituent elements (e.g., in binary systems most structures have concentrations close to 50%). Structures in the dilute limit may thus be underrepresented. To overcome this problem, structures can be enumerated in a specified concentration regime by providing a dict that gives the range of allowed concentrations for one or more of the elements in the system. Concentration is here always defined as the number of atoms of the specified element divided by the total number of atoms in the structure, regardless of site restrictions. Please note that for very large systems, concentration-restricted enumeration may still be prohibitively demanding in time or memory, even if the number of structures in the specified concentration regime is small.

conc_rest = {'Au': (0, 0.1)}
for structure in enumerate_structures(primitive_structure,
                                      range(10, 14),
                                      ['Pd', 'Au'],
                                      concentration_restrictions=conc_rest):
    db.write(structure)
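
The two points made above, that raw occupations pile up near 50% and that a concentration window leaves only a few compositions, can be checked with simple counting. The sketch below is an illustration under stated assumptions: it counts raw (not symmetry-reduced) binary occupations of a fixed cell, it assumes the concentration bounds are inclusive, and the helper names allowed_counts and fraction_near_equiatomic are hypothetical rather than part of icet.

```python
from math import comb


def allowed_counts(n_atoms, c_min, c_max):
    """Numbers of atoms of one element whose concentration k / n_atoms
    lies in [c_min, c_max] (bounds assumed to be inclusive)."""
    return [k for k in range(n_atoms + 1) if c_min <= k / n_atoms <= c_max]


def fraction_near_equiatomic(n_atoms, lo=0.4, hi=0.6):
    """Fraction of all 2**n_atoms raw binary occupations whose
    concentration lies between lo and hi."""
    inside = sum(comb(n_atoms, k) for k in allowed_counts(n_atoms, lo, hi))
    return inside / 2 ** n_atoms


# Same cell sizes and concentration window as in the example above
for n in range(10, 14):
    print(n, allowed_counts(n, 0.0, 0.1), round(fraction_near_equiatomic(n), 3))
```

For the cell sizes used in the dilute-limit example, the window of 0 to 10% Au admits only structures with at most one Au atom, whereas a large share of the raw occupations fall between 40% and 60%.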

Generate structures with vacancies

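The steps above are now repeated to enumerate all palladium hydride structures based on up to four primitive cells, i.e., structures with up to 4 Pd atoms and between 0 and 4 H atoms. (Note, however, that in this example no structure with 4 H atoms will be generated, since a structure with 4 Pd and 4 H atoms is always symmetrically equivalent to the primitive structure with 1 Pd and 1 H atom.) Vacancies, represented by ‘X’, are explicitly included, which results in a ternary system. Each site is given its own list of allowed species: the first site is always Pd, while the second is either H or a vacancy. Note that the input structure is built with 'Au' on the first site; it is the species list, not the input symbols, that restricts this site to Pd. The resulting structures are stored in a database named PdHVac-fcc.db.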

a = 4.0
primitive_structure = bulk('Au', a=a)
primitive_structure.append(Atom('H', (a / 2, a / 2, a / 2)))
species = [['Pd'], ['H', 'X']]
db = connect('PdHVac-fcc.db')
for structure in enumerate_structures(primitive_structure, range(1, 5), species):
    db.write(structure)
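
The remark that no structure with 4 H atoms appears follows from a general rule: an occupation that repeats with a shorter period than the supercell is just a smaller structure written in a larger cell. The following sketch illustrates this for a one-dimensional chain of hydrogen sites. It is an analogy under simplifying assumptions (one dimension, a single sublattice), and the helper names smallest_period and is_reducible are hypothetical.

```python
def smallest_period(config):
    """Return the shortest translational period of a ring occupation.

    The period must divide the number of sites for the occupation to be
    compatible with the periodic boundary conditions.
    """
    n = len(config)
    for period in range(1, n + 1):
        if n % period == 0 and all(
                config[i] == config[i % period] for i in range(n)):
            return period
    return n


def is_reducible(config):
    """True if the occupation is equivalent to one on a smaller cell."""
    return smallest_period(config) < len(config)


for config in [('H', 'H', 'H', 'H'),
               ('H', 'X', 'H', 'X'),
               ('H', 'X', 'X', 'X')]:
    print(config, smallest_period(config), is_reducible(config))
```

In this picture the fully hydrogenated four-site chain has period 1 and is therefore already covered by the primitive cell, which mirrors the statement about the structure with 4 Pd and 4 H atoms.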

Generate surface slabs with adsorbates

Lower-dimensional systems can be enumerated as well. Here, this is demonstrated for a {111} copper surface with oxygen atoms adsorbed in fcc and hcp hollow sites. To restrict enumeration to one or two dimensions, the periodic boundary conditions of the input structure must reflect the desired behavior. For a surface system, for example, the boundary conditions must be non-periodic along the surface normal. This is the default behavior of the surface building functions in ASE but is enforced for clarity in the following example.

primitive_structure = fcc111('Cu', (1, 1, 5), vacuum=10.0)
primitive_structure.pbc = [True, True, False]
add_adsorbate(primitive_structure, 'O', 1.2, 'fcc')
add_adsorbate(primitive_structure, 'O', 1.2, 'hcp')
species = []
for atom in primitive_structure:
    if atom.symbol == 'Cu':
        species.append(['Cu'])
    else:
        species.append(['O', 'H'])
db = connect('Cu-O-adsorbates.db')
for structure in enumerate_structures(primitive_structure, range(1, 5), species):
    db.write(structure)
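
One way to see why the boundary conditions matter is to count candidate supercell shapes. Enumeration schemes of this kind commonly label supercells by integer matrices in Hermite normal form; how icet handles this internally is not described here, so the sketch below is a generic counting argument under that assumption and not a description of the library. The function name count_supercell_shapes is hypothetical, and the counts are taken before any symmetry reduction.

```python
def count_supercell_shapes(n_cells, dimensions):
    """Count Hermite normal form matrices with determinant n_cells.

    For a lower-triangular matrix whose last diagonal entry is `last`,
    each of the (dimensions - 1) off-diagonal entries in the last row
    can take `last` values; the remaining block is a smaller matrix of
    the same kind with determinant n_cells // last.
    """
    if dimensions == 1:
        return 1
    total = 0
    for last in range(1, n_cells + 1):
        if n_cells % last == 0:
            total += (last ** (dimensions - 1)
                      * count_supercell_shapes(n_cells // last, dimensions - 1))
    return total


# Compare slab-like (2D periodic) and bulk-like (3D periodic) cases
for n in range(1, 5):
    print(n, count_supercell_shapes(n, 2), count_supercell_shapes(n, 3))
```

With periodicity in only two directions there are considerably fewer supercell shapes to decorate at a given size than in the bulk case, although each slab cell here contains several sites.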

Source code

The complete source code is available in examples/enumerate_structures.py

"""
This example demonstrates how to enumerate structures, i.e. how to
generate all inequivalent structures derived from a primitive
structure up to a certain size.
"""

# Import modules
from ase import Atom
from ase.build import bulk, fcc111, add_adsorbate
from ase.db import connect
from icet.tools import enumerate_structures

# Generate all binary fcc structures with up to 6 atoms/cell
# and save them in a database
primitive_structure = bulk('Au')
db = connect('AuPd-fcc.db')
for structure in enumerate_structures(primitive_structure,
                                      range(1, 7),
                                      ['Pd', 'Au']):
    db.write(structure)

# Generate fcc structures in the dilute limit
conc_rest = {'Au': (0, 0.1)}
for structure in enumerate_structures(primitive_structure,
                                      range(10, 14),
                                      ['Pd', 'Au'],
                                      concentration_restrictions=conc_rest):
    db.write(structure)

# Enumerate all palladium hydride structures with up to 4 primitive
# cells (= up to 4 Pd atoms and between 0 and 4 H atoms). We want to
# specify that one site should always be Pd while the other can be
# either a hydrogen or a vacancy ('X' will serve as our vacancy)
a = 4.0
primitive_structure = bulk('Au', a=a)
primitive_structure.append(Atom('H', (a / 2, a / 2, a / 2)))
species = [['Pd'], ['H', 'X']]
db = connect('PdHVac-fcc.db')
for structure in enumerate_structures(primitive_structure, range(1, 5), species):
    db.write(structure)

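# Enumerate a copper surface with oxygen adsorbates in fcc and hcp
# hollow sites. As written, each adsorbate site holds either O or H;
# use 'X' instead of 'H' (as in the hydride example) for vacant sites.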
primitive_structure = fcc111('Cu', (1, 1, 5), vacuum=10.0)
primitive_structure.pbc = [True, True, False]
add_adsorbate(primitive_structure, 'O', 1.2, 'fcc')
add_adsorbate(primitive_structure, 'O', 1.2, 'hcp')
species = []
for atom in primitive_structure:
    if atom.symbol == 'Cu':
        species.append(['Cu'])
    else:
        species.append(['O', 'H'])
db = connect('Cu-O-adsorbates.db')
for structure in enumerate_structures(primitive_structure, range(1, 5), species):
    db.write(structure)