# Data container

The results of a Monte Carlo (MC) simulation are stored in a DataContainer object, which can be accessed via the data_container property of the MC ensemble. If a file name is provided during ensemble initialization via the data_container parameter, the data container is also written to file. The file can then be read at a later time via the read function of the DataContainer.

The DataContainer class provides functionality for processing data and extracting observables, which is briefly introduced in this section.

## Extracting data

The raw data as a function of MC trial step can be obtained via the get_data function, which also allows slicing the data by specifying an initial and a final MC step. This is useful, for example, for discarding the equilibration part of a simulation:

energy = dc.get_data('potential', start=5000)
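
To illustrate what slicing by MC trial step amounts to, the following is a minimal stand-alone sketch that does not use mchammer. The `records` list and the `slice_by_trial` helper are hypothetical, and the sketch assumes that both the start and the stop value are inclusive.

```python
# Hypothetical records mimicking rows of a data container: one entry per
# observation, tagged with the MC trial step at which it was recorded.
records = [{'mctrial': step, 'potential': -1.0 - 0.001 * step}
           for step in range(0, 10001, 1000)]


def slice_by_trial(rows, tag, start=None, stop=None):
    """Return the values of `tag` for rows with start <= mctrial <= stop."""
    values = []
    for row in rows:
        # Assumption: both bounds are inclusive
        if start is not None and row['mctrial'] < start:
            continue
        if stop is not None and row['mctrial'] > stop:
            continue
        values.append(row[tag])
    return values


# Discard the first 5000 trial steps as equilibration
production = slice_by_trial(records, 'potential', start=5000)
```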


The get_data function also allows extracting several observables in a single call:

mctrial, energy, sro = dc.get_data('mctrial', 'potential', 'sro_Ag_1')


The available observables can be checked using the observables attribute.

## Extracting trajectory

The atomic configurations can be extracted using the get_trajectory function:

traj = dc.get_trajectory()


Alternatively, the trajectory can be obtained via the get_data function, which also allows pairing the snapshots in the trajectory with observables in the data container:

E_mix, traj = dc.get_data('potential', 'trajectory')


## Updating data container

Normally observers are attached to an ensemble at the beginning of an MC simulation via the attach_observer function. They can, however, also be applied after the fact via the apply_observer function, provided the trajectory is available via a DataContainer object.

obs = ClusterExpansionObserver(ce, tag='new_obs')
dc.apply_observer(obs)
new_obs_data = dc.get_data('new_obs')


Afterwards the data container, including the new data, can be written back to file using the write function.

## Data analysis

Data containers also allow more detailed analysis. The analyze_data function computes average, standard deviation, correlation length, and 95% error estimate of the average for a given observable.

summary = dc.analyze_data('potential')
print(summary)


Here, the correlation length, $$s$$, is estimated from the autocorrelation function (ACF). Once the ACF has decayed below $$\mathrm{e^{-2}}$$, observations are considered uncorrelated. The lag at which this happens provides an estimate of the correlation length.
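
The threshold criterion can be sketched in plain Python as follows. This is an illustration of the idea under stated assumptions, not the implementation used by the library: the helper names `autocorrelation` and `correlation_length`, the `max_lag` cutoff, and the synthetic AR(1) test series are all introduced here for demonstration.

```python
import math
import random


def autocorrelation(data, max_lag):
    """Return the normalized autocorrelation function for lags 0..max_lag."""
    n = len(data)
    mean = sum(data) / n
    centered = [x - mean for x in data]
    variance = sum(c * c for c in centered) / n
    acf = []
    for lag in range(max_lag + 1):
        # Average product of fluctuations separated by `lag` steps
        cov = sum(centered[i] * centered[i + lag]
                  for i in range(n - lag)) / (n - lag)
        acf.append(cov / variance)
    return acf


def correlation_length(data, max_lag):
    """Return the first lag at which the ACF drops below exp(-2), or None."""
    threshold = math.exp(-2)
    for lag, value in enumerate(autocorrelation(data, max_lag)):
        if value < threshold:
            return lag
    return None


# Synthetic correlated series: x_i = phi * x_{i-1} + noise (AR(1) process),
# whose exact ACF decays as phi**lag.
rng = random.Random(42)
phi = 0.9
series = [0.0]
for _ in range(10000):
    series.append(phi * series[-1] + rng.gauss(0.0, 1.0))

s = correlation_length(series, max_lag=100)
```

For this series the exact ACF falls below $$\mathrm{e^{-2}}$$ at a lag of 19, so the estimate from a finite sample should land in that vicinity.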

An error estimate of the average can be calculated via

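$\mathrm{error} = \frac{t \sigma }{\sqrt{N/s}},$

where $$\sigma$$ is the standard deviation, $$N$$ the number of samples, $$s$$ the correlation length, and $$t$$ the t-factor, which can be adjusted depending on the desired confidence interval. Here, $$N/s$$ is the effective number of independent samples, so a longer correlation length yields a larger error estimate.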
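
As a rough numerical illustration, the sketch below evaluates such an error estimate using only the Python standard library. It assumes that the effective number of independent samples is $$N/s$$ and, for simplicity, approximates the t-factor by the corresponding quantile of the normal distribution, which is reasonable only for large $$N$$. The function name `error_estimate` is hypothetical.

```python
import math
from statistics import NormalDist, pstdev


def error_estimate(data, correlation_length, confidence=0.95):
    """Return t * sigma / sqrt(N / s) for the given samples."""
    n = len(data)
    sigma = pstdev(data)
    # Assumption: for large N the t-factor is approximated by the
    # two-sided quantile of the standard normal distribution
    t_factor = NormalDist().inv_cdf((1 + confidence) / 2)
    # Effective number of independent samples
    n_independent = n / correlation_length
    return t_factor * sigma / math.sqrt(n_independent)


samples = [0.0, 1.0] * 500  # N = 1000 with standard deviation 0.5
err_uncorrelated = error_estimate(samples, correlation_length=1)
err_correlated = error_estimate(samples, correlation_length=4)
```

With these inputs, a correlation length of 4 doubles the error estimate compared to uncorrelated samples, since the number of independent samples drops by a factor of four.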

Obtaining the autocorrelation function directly or carrying out error estimates can be done via functionality provided in the data_analysis module.