# Data container

The results of a Monte Carlo (MC) simulation are stored in a data container
object. Most ensembles use `DataContainer` objects, whereas Wang-Landau
simulations employ `WangLandauDataContainer` objects. For simplicity, the
examples below are based on the `DataContainer` class. The use of
`WangLandauDataContainer` objects is largely analogous; in addition, that class
provides several methods that are specific to Wang-Landau simulations.

## Accessing a data container

The data container can be accessed via the `data_container` property of the
ensemble:

```
>>> dc = mc.data_container  # here mc is the ensemble object
```

More commonly the data container is written to file during the simulation and
can then be read from file for analysis. (N.B.: To trigger saving the data
container a filename has to be provided when initializing the ensemble.) The
data container can be read via the `read` function, e.g., assuming the data
container file is named `my_test.dc`:

```
>>> from mchammer import DataContainer
>>> dc = DataContainer.read('my_test.dc')
```

The `DataContainer` class provides ample functionality for processing data and
extracting various observables, which are briefly introduced below.

## Extracting data

The raw data as a function of MC trial step can be obtained via the `get`
function, which also allows slicing data by specifying an initial MC step. This
is useful, e.g., for discarding the equilibration part of a simulation. In the
following snippet we retrieve all observations of `potential` starting with the
100th trial step:

```
>>> energy = dc.get('potential', start=100)
```

The `get` function also allows extracting several observables at once. The
available observables can be checked using the `observables` attribute:

```
>>> print(sorted(dc.observables))
['acceptance_ratio', 'occupations', 'potential', 'sof_A_Ag', 'sof_A_Au']
```

The `mctrial`, `potential`, and `trajectory` observables are available by default. `potential` refers to the thermodynamic potential sampled by the trajectory (usually defined by the cluster expansion used to run the simulation). `trajectory` refers to the atomic configurations along the trajectory.

Assume, e.g., that the original simulation was carried out with a
`SiteOccupancyObserver`. Then the occupancy factor of Ag on the sites labeled
‘A’ could be retrieved as follows:

```
>>> mctrial, energy, sof = dc.get('mctrial', 'potential', 'sof_A_Ag')
```

This enables one to plot observables as a function of the MC trial as demonstrated by the following snippet:

```
>>> import matplotlib.pyplot as plt
>>> s, p = dc.get('mctrial', 'potential')
>>> _ = plt.plot(s, p)
>>> plt.show(block=False)
```

The atomic configurations along the trajectory can be retrieved as a list of
`Atoms` objects using the `trajectory` observable:

```
>>> traj = dc.get('trajectory')
```

This also allows for pairing the snapshots in the trajectory with observables in the data container.

```
>>> E_mix, traj = dc.get('potential', 'trajectory')
```
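One use of this pairing is to pick out the snapshot with the lowest potential. The following is a minimal sketch using stand-in data so that it runs without a data container; the lists `potentials` and `snapshots` are hypothetical placeholders for the values that `dc.get('potential', 'trajectory')` would return.

```python
# Stand-in data (hypothetical); in practice these would come from
# dc.get('potential', 'trajectory') and the snapshots would be Atoms objects.
potentials = [-1.20, -1.35, -1.31, -1.42, -1.38]
snapshots = ['snapshot_0', 'snapshot_1', 'snapshot_2', 'snapshot_3', 'snapshot_4']

# Pair each snapshot with its potential and select the lowest-potential pair
index, (e_min, best) = min(
    enumerate(zip(potentials, snapshots)),
    key=lambda item: item[1][0],
)
print(index, e_min, best)
```

Because both lists are returned for the same set of MC steps, the position in one list identifies the matching entry in the other.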

## Updating data container

Normally, observers are attached to an ensemble at the beginning of an MC
simulation via the `attach_observer` function. They can, however, also be
applied after the fact via the `apply_observer` function, provided the
trajectory is available via a `DataContainer` object.

```
>>> from mchammer.observers import BinaryShortRangeOrderObserver
>>> # cs (cluster space) and structure (Atoms) are assumed to be defined already
>>> obs = BinaryShortRangeOrderObserver(cs, structure, radius=1.1)
>>> dc.apply_observer(obs)
>>> s, sro = dc.get('mctrial', 'sro_Ag_1')
>>> _ = plt.plot(s, sro)
>>> plt.show(block=False)
```

Afterwards, the data container, including the new data, can be written back to
file using the `write` function.

## Data analysis

Data containers also allow more detailed analysis. The `analyze_data` function
computes the average, standard deviation, correlation length, and 95% error
estimate of the average for a given observable.

```
>>> summary = dc.analyze_data('potential')
```

Here, the correlation length, \(s\), is estimated from the autocorrelation function (ACF). When the ACF has decayed below \(\mathrm{e}^{-2}\), observations are said to be uncorrelated, which provides an estimate of the correlation length.
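The decay criterion can be illustrated with a minimal pure-Python sketch. The function names `autocorrelation` and `correlation_length` are hypothetical and chosen for illustration only; this is not the implementation used by mchammer, which may differ in normalization and in how lags are handled.

```python
import math


def autocorrelation(data, max_lag):
    """Normalized ACF for lags 0..max_lag (assumes non-constant data)."""
    n = len(data)
    mean = sum(data) / n
    dev = [x - mean for x in data]
    var = sum(d * d for d in dev) / n
    acf = []
    for lag in range(max_lag + 1):
        # Average product of deviations separated by `lag` steps
        cov = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / (n - lag)
        acf.append(cov / var)
    return acf


def correlation_length(acf, threshold=math.exp(-2)):
    """Return the first lag at which the ACF is below the threshold, else None."""
    for lag, value in enumerate(acf):
        if value < threshold:
            return lag
    return None


# Synthetic ACF that decays geometrically as 0.5**lag
acf_example = [0.5 ** lag for lag in range(10)]
s_example = correlation_length(acf_example)
print(s_example)  # 3, since 0.5**3 = 0.125 is the first value below exp(-2) ~ 0.135
```

If the ACF never drops below the threshold within the sampled lags, the sketch returns `None`, signalling that the series is too short (or `max_lag` too small) to estimate a correlation length.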

An error estimate of the average can be calculated via

\[
\mathrm{error} = \frac{t \sigma}{\sqrt{N / s}},
\]

where \(\sigma\) is the standard deviation, \(N\) the number of samples, \(s\) the correlation length, and \(t\) the t-factor, which can be adjusted depending on the desired confidence interval.
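As a hedged illustration, the sketch below evaluates such an error estimate under the assumption that it takes the common form \(t \sigma / \sqrt{N / s}\), i.e., the standard error of the mean with \(N / s\) effectively independent samples. The function name `error_estimate` and the default `t_factor=1.96` (the large-sample value for a 95% confidence interval) are choices made for this example and are not taken from mchammer.

```python
import math


def error_estimate(data, correlation_length, t_factor=1.96):
    """Error of the mean assuming N / s effectively independent samples.

    t_factor=1.96 is the large-sample value for a 95% confidence interval;
    for short series the corresponding Student t value would be larger.
    """
    n = len(data)
    mean = sum(data) / n
    # Population standard deviation (divides by N)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    n_independent = n / correlation_length
    return t_factor * sigma / math.sqrt(n_independent)


# Synthetic data: 100 samples alternating between 1 and -1 (mean 0, sigma 1).
# correlation_length=4 is an arbitrary illustrative value.
samples = [(-1) ** i for i in range(100)]
err = error_estimate(samples, correlation_length=4)
print(err)  # approximately 0.392, i.e., 1.96 * 1 / sqrt(100 / 4)
```

A longer correlation length reduces the number of effectively independent samples and therefore increases the error estimate for the same raw sample count.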

Obtaining the autocorrelation function directly or carrying out error estimates can be done via functionality provided in the `data_analysis` module.