# Data container

The results of an MC simulation are stored in the form of a data container object. Most ensembles use DataContainer objects, whereas Wang-Landau simulations employ WangLandauDataContainer objects. For simplicity, the examples below are based on the DataContainer class. WangLandauDataContainer objects are used in a largely analogous fashion but additionally provide several methods that are specific to Wang-Landau simulations.

## Accessing a data container

The data container can be accessed via the data_container property of the ensemble:

>>> dc = mc.data_container  # here mc is an ensemble object


More commonly, the data container is written to file during the simulation and then read back from file for analysis. (N.B.: To trigger saving the data container, a filename has to be provided when initializing the ensemble.) The data container can be read via the read function, e.g., assuming the data container file is named my_test.dc:

>>> from mchammer import DataContainer
>>> dc = DataContainer.read('my_test.dc')


The DataContainer class provides ample functionality for processing data and extracting various observables that are briefly introduced below.

## Extracting data

The raw data as a function of MC trial step can be obtained via the get function, which also allows slicing the data by specifying an initial MC trial step. This is useful, e.g., for discarding the equilibration part of a simulation. In the following snippet, we retrieve all observations of the potential starting with the 100th trial step:

>>> energy = dc.get('potential', start=100)
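The effect of `start` can be pictured as a filter on the MC trial step rather than on the position of a row. The plain-Python sketch below assumes, as described above, that `start` refers to the trial step; the `records` table and the `get_from` helper are hypothetical stand-ins and not part of the mchammer API.

```python
# Hypothetical toy rows mimicking a data container: each row holds the
# MC trial step and the potential observed at that step.
records = [
    {'mctrial': 0, 'potential': -1.00},
    {'mctrial': 50, 'potential': -1.20},
    {'mctrial': 100, 'potential': -1.35},
    {'mctrial': 150, 'potential': -1.34},
    {'mctrial': 200, 'potential': -1.36},
]


def get_from(records, tag, start=0):
    """Hypothetical helper: values of `tag` for all rows with mctrial >= start."""
    return [row[tag] for row in records if row['mctrial'] >= start]


# Discard the (hypothetical) equilibration part before trial step 100.
energy = get_from(records, 'potential', start=100)
print(energy)  # [-1.35, -1.34, -1.36]
```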


The get function also allows extracting several observables at once. The available observables can be checked using the observables attribute:

>>> print(sorted(dc.observables))
['acceptance_ratio', 'occupations', 'potential', 'sof_A_Ag', 'sof_A_Au']


The mctrial, potential, and trajectory observables are available by default. potential refers to the thermodynamic potential sampled along the trajectory (usually defined by the cluster expansion used to run the simulation). trajectory refers to the atomic configurations along the trajectory.

Assume, e.g., that the original simulation was carried out with a SiteOccupancyObserver. Then the site occupancy factor of Ag on the sites labeled ‘A’ can be retrieved as follows:

>>> mctrial, energy, sof = dc.get('mctrial', 'potential', 'sof_A_Ag')


This enables one to plot observables as a function of the MC trial step, as demonstrated by the following snippet:

>>> import matplotlib.pyplot as plt
>>> s, p = dc.get('mctrial', 'potential')
>>> _ = plt.plot(s, p)
>>> plt.show()


The atomic configurations along the trajectory can be retrieved as a list of Atoms objects using the trajectory observable.

>>> traj = dc.get('trajectory')


This also allows for pairing the snapshots in the trajectory with observables in the data container.

>>> E_mix, traj = dc.get('potential', 'trajectory')
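Because the two returned sequences are aligned, they can be zipped together, e.g., to pick out the snapshot with the lowest potential. The sketch below uses hypothetical placeholder strings instead of Atoms objects so that it runs on its own; in practice `E_mix` and `traj` would come from the call above.

```python
# Hypothetical stand-ins for the output of dc.get('potential', 'trajectory');
# real snapshots would be Atoms objects rather than strings.
E_mix = [-1.20, -1.35, -1.31, -1.36, -1.33]
traj = ['snapshot_0', 'snapshot_1', 'snapshot_2', 'snapshot_3', 'snapshot_4']

# zip() pairs each potential with the snapshot from the same trial step;
# min() with a key selects the pair with the lowest potential.
best_energy, best_structure = min(zip(E_mix, traj), key=lambda pair: pair[0])
print(best_energy, best_structure)  # -1.36 snapshot_3
```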


## Updating a data container

Normally, observers are attached to an ensemble at the beginning of an MC simulation via the attach_observer function. They can, however, also be applied after the fact via the apply_observer function, provided the trajectory has been stored in a DataContainer object.

>>> from mchammer.observers import BinaryShortRangeOrderObserver
>>> # cs is the cluster space and structure the supercell used in the simulation
>>> obs = BinaryShortRangeOrderObserver(cs, structure, radius=1.1)
>>> dc.apply_observer(obs)
>>> s, sro = dc.get('mctrial', 'sro_Ag_1')
>>> _ = plt.plot(s, sro)
>>> plt.show()
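Conceptually, applying an observer after the fact amounts to evaluating it on each stored snapshot and recording the result at the corresponding trial step; in this picture, only steps for which a snapshot was saved receive a value. The plain-Python sketch below illustrates the idea. `CountSiteObserver`, the free function `apply_observer`, and the use of plain lists of chemical symbols as "structures" are hypothetical simplifications, not the mchammer implementation.

```python
class CountSiteObserver:
    """Hypothetical toy observer: counts sites occupied by one species."""

    def __init__(self, species, tag):
        self.species = species
        self.tag = tag  # name under which the observations are stored

    def get_observable(self, structure):
        # Here a "structure" is simply a list of chemical symbols.
        return sum(1 for symbol in structure if symbol == self.species)


# Hypothetical stored trajectory as (MC trial step, snapshot) pairs.
trajectory = [
    (0, ['Ag', 'Ag', 'Au', 'Au']),
    (100, ['Ag', 'Au', 'Au', 'Au']),
    (200, ['Au', 'Au', 'Au', 'Au']),
]


def apply_observer(trajectory, observer):
    """Hypothetical helper: evaluate the observer on every stored snapshot."""
    values = [(mctrial, observer.get_observable(snapshot))
              for mctrial, snapshot in trajectory]
    return {observer.tag: values}


new_data = apply_observer(trajectory, CountSiteObserver('Ag', 'n_Ag'))
print(new_data)  # {'n_Ag': [(0, 2), (100, 1), (200, 0)]}
```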


Afterwards the data container, including the new data, can be written back to file using the write function.

## Data analysis

Data containers also allow more detailed analysis. The analyze_data function computes the average, standard deviation, correlation length, and 95% error estimate of the average for a given observable.

>>> summary = dc.analyze_data('potential')


Here, the correlation length $s$ is estimated from the autocorrelation function (ACF). Once the ACF has decayed below $\mathrm{e}^{-2}$, observations are considered uncorrelated, and the corresponding lag provides an estimate of the correlation length.
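The rule above can be sketched in plain Python. The ACF estimator used here (lag-$k$ autocovariance divided by the variance, both normalized by $N$) is one common choice and is assumed for illustration; mchammer's own routine may differ in detail. `autocorrelation`, `correlation_length`, and the synthetic AR(1) series are hypothetical. For an AR(1) process with coefficient $\phi$ the exact ACF is $\phi^k$, so the threshold $\mathrm{e}^{-2}$ is crossed near $k \approx 2 / (-\ln \phi)$, which gives a convenient sanity check.

```python
import math
import random


def autocorrelation(data, lag):
    """Normalized autocorrelation of `data` at `lag` (equals 1 at lag 0)."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    cov = sum((data[i] - mean) * (data[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def correlation_length(data, threshold=math.exp(-2)):
    """Smallest lag at which the ACF drops below `threshold` (None if never)."""
    for lag in range(1, len(data)):
        if autocorrelation(data, lag) < threshold:
            return lag
    return None


# Hypothetical correlated series: AR(1) process x_{i+1} = phi * x_i + noise.
random.seed(42)
phi = 0.9
series = [0.0]
for _ in range(19999):
    series.append(phi * series[-1] + random.gauss(0.0, 1.0))

s = correlation_length(series)
print(s)  # expected to be close to 2 / -ln(0.9), i.e. about 19
```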

An error estimate of the average can be calculated via

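$$\mathrm{error} = \frac{t \sigma}{\sqrt{N/s}},$$

where $\sigma$ is the standard deviation, $N$ the number of samples, $s$ the correlation length, and $t$ the t-factor, which can be adjusted depending on the desired confidence interval. The ratio $N/s$ approximates the number of effectively independent samples, so a longer correlation length yields a larger error estimate.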

Obtaining the autocorrelation function directly or carrying out error estimates can be done via functionality provided in the data_analysis module.