Data container¶
The results of a MC simulation are stored in the form of a data
container object. Most ensembles use DataContainer
objects; Wang-Landau simulations though employ
WangLandauDataContainer
objects.
For simplicity the examples below are based on the DataContainer
class. The use of the
WangLandauDataContainer
objects is
largely analogous. In addition, there are several additional methods that are
specific to Wang-Landau simulations.
Accessing a data container¶
The data container be accessed via the data_container
property of the
ensemble:
>>> dc = mc.data_container # here mc is DataContainer
More commonly the data container is written to file during the simulation and
can then be read from file for analysis. (N.B.: To trigger saving the data
container a filename has to be provided when initializing the ensemble.) The
data container can be read via the read
function, e.g., (assuming the name of data container file in my_test.dc)
>>> from mchammer import DataContainer
>>> dc = DataContainer.read('my_test.dc')
The DataContainer
class provides ample
functionality for processing data and extracting various observables that are
briefly introduced below.
Extracting data¶
The raw data as a function of MC trial step can be obtained via the get
function, which also allows slicing data by
specifying an initial MC step. This is useful e.g., for discarding the
equilibration part of a simulation. In the following snippet we retrieve all
observations of potential starting with the 100-th trial step:
>>> energy = dc.get('potential', start=100)
The get
function also allows extracting
several observables in parallel. Which observables are available, can be
checked using the observables
attribute:
>>> print(sorted(dc.observables))
['acceptance_ratio', 'occupations', 'potential', 'sof_A_Ag', 'sof_A_Au']
The mctrial, potential, and trajectory observables are available by default. potential refers the thermodynamic potential sampled by the trajectory (usually defined by the cluster expansion to run the simulation). trajectory refers to the atomic configurations along the trajectory.
Assume, e.g., that the original simulation
was carried out with a
SiteOccupancyObserver
,
then site occupancy of the sites labeled ‘A’ with Ag could be retrieved as
follows:
>>> mctrial, energy, sro = dc.get('mctrial', 'potential', 'sof_A_Ag')
This enables one to plot observables as a function of the MC trial as demonstrated by the following snippet:
>>> import matplotlib.pyplot as plt
>>> s, p = dc.get('mctrial', 'potential')
>>> _ = plt.plot(s, p)
>>> plt.show(block=False)
The atomic configurations along the trajectory can be retrieved as a list of
Atoms
objects using the trajectory observable.
>>> traj = dc.get('trajectory')
This also allows for pairing the snapshots in the trajectory with observables in the data container.
>>> E_mix, traj = dc.get('potential', 'trajectory')
Updating data container¶
Normally observers are attached to an ensemble at the
beginning of an MC simulation via the attach_observer
function. They can,
however, also be applied after the fact via the apply_observer
function, provided the trajectory is
available via a DataContainer
object.
>>> from mchammer.observers import BinaryShortRangeOrderObserver
>>> obs = BinaryShortRangeOrderObserver(cs, structure, radius=1.1)
>>> dc.apply_observer(obs)
>>> s, sro = dc.get('mctrial', 'sro_Ag_1')
>>> _ = plt.plot(s, sro)
>>> plt.show(block=False)
Afterwards the data container, including the new data, can be written back to
file using the write
function.
Data analysis¶
Data containers also allow more detailed analysis. The analyze_data
function computes average, standard
deviation, correlation length, and 95% error estimate of the average for a
given observable.
>>> summary = dc.analyze_data('potential')
Here, the correlation length, \(s\), is estimated from the autocorrelation function (ACF). When the ACF has decayed below \(\mathrm{e^{-2}}\) observations are said to be uncorrelated, providing an estimate of the correlation length.
An error estimate of the average can be calculated via
where \(\sigma\) is the standard deviation, \(N\) the number of samples, \(s\) the correlation length and \(t\) is the t-factor, which can be adjusted depending on the desired confidence interval.
Obtaining the autocorrelation function directly or carrying out error estimates can be done via functionality provided in the data_analysis module.