Parallel Monte Carlo simulations

Monte Carlo simulations are in general pleasingly parallel in the sense that no communication is needed between two runs with different sets of parameters. In mchammer, this can be conveniently exploited with the multiprocessing package, which is included in the Python standard library. A run script requires very little modification to be parallelized. Here, the Monte Carlo simulation in the basic tutorial is reproduced. The initialization is identic:

ce = ClusterExpansion.read('mixing_energy.ce')
cs = ce.get_cluster_space_copy()
structure = make_supercell(cs.primitive_structure,
                           3 * np.array([[-1, 1, 1],
                                         [1, -1, 1],
                                         [1, 1, -1]]))
calculator = ClusterExpansionCalculator(structure, ce)


A non-parallel simulation would now run in a nested loop over all parameters. In a parallel simulation, the content of the loop is instead wrapped in a function:

def run_mc(args):
    temperature = args['temperature']
    dmu = args['dmu']
    mc = SemiGrandCanonicalEnsemble(
        structure=structure,
        calculator=calculator,
        temperature=temperature,
        dc_filename='sgc-T{}-dmu{:+.3f}.dc'.format(temperature, dmu),
        chemical_potentials={'Ag': 0, 'Pd': dmu})
    mc.run(number_of_trial_steps=len(structure) * 30)


Next, all sets of parameters to be run are stored in a list:

args = []
for temperature in range(600, 199, -100):
    for dmu in np.arange(-0.6, 0.6, 0.05):
        args.append({'temperature': temperature, 'dmu': dmu})

Finally, a multiprocessing Pool object is created. At this step, the number of processes can be specified (the default value for processes is the number of CPUs on your machine, which is typically a sensible choice). The simulation is started by mapping the sets of parameters to the run function.

pool = Pool(processes=4)
results = pool.map_async(run_mc, args)
results.get()
pool.close()

Note

It is strongly recommended to use the functions for asynchronus mapping, specifically Pool.map_async and Pool.starmap_async. The reason for this is that these, in contrast to Pool.map and Pool.starmap, do not block the main process, which can cause some of the child processes to hang when running Monte Carlo simulations using functionalities imported from the mchammer module.

Note that in the above example, an ensemble object will always be initialized with the same supercell, which means that the system needs to be equilibrated from scratch for every set of parameter. If equilibration is time consuming, it may be advisable to, for example, avoid parallelization over chemical potential (but keeping parallelization over temperature).

Source code

The complete source code is available in examples/advanced_topics/parallel_monte_carlo.py

from multiprocessing import Pool
from ase.build import make_supercell
from icet import ClusterExpansion
from mchammer.calculators import ClusterExpansionCalculator
from mchammer.ensembles import SemiGrandCanonicalEnsemble
import numpy as np

# step 1: Set up structure to simulate as well as calculator
ce = ClusterExpansion.read('mixing_energy.ce')
cs = ce.get_cluster_space_copy()
structure = make_supercell(cs.primitive_structure,
                           3 * np.array([[-1, 1, 1],
                                         [1, -1, 1],
                                         [1, 1, -1]]))
calculator = ClusterExpansionCalculator(structure, ce)


# step 2: Define a function that handles MC run of one set of parameters
def run_mc(args):
    temperature = args['temperature']
    dmu = args['dmu']
    mc = SemiGrandCanonicalEnsemble(
        structure=structure,
        calculator=calculator,
        temperature=temperature,
        dc_filename='sgc-T{}-dmu{:+.3f}.dc'.format(temperature, dmu),
        chemical_potentials={'Ag': 0, 'Pd': dmu})
    mc.run(number_of_trial_steps=len(structure) * 30)


# step 3: Define all sets of parameters to be run
args = []
for temperature in range(600, 199, -100):
    for dmu in np.arange(-0.6, 0.6, 0.05):
        args.append({'temperature': temperature, 'dmu': dmu})

# step 4: Define a Pool object with the desired number of processes and run
pool = Pool(processes=4)
results = pool.map_async(run_mc, args)
results.get()
pool.close()