Data container

The results of an MC simulation are stored in the form of a DataContainer object, which can be accessed via the data_container property of the MC ensemble. If a file name is provided during ensemble initialization via the data_container parameter, the data container is also written to file. This file can be read at a later time via the read function of the DataContainer.

The DataContainer class provides ample functionality for processing data and extracting various observables that are briefly introduced in this section.

Extracting data

The raw data as a function of MC trial step can be obtained via the get_data function, which also allows slicing the data by specifying an initial and a final MC step. This is useful, e.g., for discarding the equilibration part of a simulation:

energy = dc.get_data('potential', start=5000)

The get_data function also allows extracting several observables in parallel:

mctrial, energy, sro = dc.get_data('mctrial', 'potential', 'sro_Ag_1')

The available observables can be checked using the observables attribute.
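
The idea behind slicing by MC step and extracting several observables at once can be illustrated without mchammer. The following is only a sketch: the `rows` list and the `extract` helper are hypothetical stand-ins, not the DataContainer implementation, and skipping rows in which a requested observable is missing is an assumption made here to keep the returned columns aligned.

```python
# Hypothetical stand-in for recorded data: one row per MC trial step at
# which something was written. Not mchammer internals.
rows = [
    {'mctrial': 0, 'potential': -1.00, 'sro_Ag_1': 0.10},
    {'mctrial': 1000, 'potential': -1.20, 'sro_Ag_1': None},
    {'mctrial': 2000, 'potential': -1.30, 'sro_Ag_1': 0.12},
    {'mctrial': 3000, 'potential': -1.31, 'sro_Ag_1': None},
    {'mctrial': 4000, 'potential': -1.29, 'sro_Ag_1': 0.15},
]


def extract(rows, *tags, start=None, stop=None):
    """Return one list per tag, restricted to start <= mctrial <= stop.

    Rows in which any requested tag is missing (None) are skipped so that
    the returned lists stay aligned (an assumption of this sketch).
    """
    selected = []
    for row in rows:
        step = row['mctrial']
        if start is not None and step < start:
            continue
        if stop is not None and step > stop:
            continue
        if any(row.get(tag) is None for tag in tags):
            continue
        selected.append(row)
    columns = [[row[tag] for row in selected] for tag in tags]
    return columns[0] if len(tags) == 1 else tuple(columns)


# Discard everything before MC step 2000 (the "equilibration" part).
energy = extract(rows, 'potential', start=2000)

# Extract several observables at once; the lists have equal length.
mctrial, energy_sro, sro = extract(rows, 'mctrial', 'potential', 'sro_Ag_1')
print(energy)
print(mctrial, energy_sro, sro)
```

Whether and how the real get_data function handles observables recorded at different intervals should be checked against the mchammer documentation.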

Extracting trajectory

The atomic configurations visited during the simulation can be extracted using the get_trajectory function:

traj = dc.get_trajectory()

Alternatively, the trajectory can be obtained via the get_data function, which also allows for pairing the snapshots in the trajectory with observables in the data container.

E_mix, traj = dc.get_data('potential', 'trajectory')

Updating data container

Normally observers are attached to an ensemble at the beginning of an MC simulation via the attach_observer function. They can, however, also be applied after the fact via the apply_observer function, provided the trajectory is available via a DataContainer object.

obs = ClusterExpansionObserver(ce, tag='new_obs')
dc = DataContainer.read('my_dc.dc')
dc.apply_observer(obs)
new_obs_data = dc.get_data('new_obs')

Afterwards the data container, including the new data, can be written back to file using the write function.

Data analysis

Data containers also allow more detailed analysis. The analyze_data function computes average, standard deviation, correlation length, and 95% error estimate of the average for a given observable.

summary = dc.analyze_data('potential')
print(summary)

Here, the correlation length, \(s\), is estimated from the autocorrelation function (ACF). Once the ACF has decayed below \(\mathrm{e}^{-2}\), observations are considered uncorrelated; the lag at which this happens provides an estimate of the correlation length.

[Figure: autocorrelation function (../_images/autocorrelation.svg)]
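
The estimate described above can be sketched in plain Python. This is a minimal illustration, assuming the correlation length is taken as the first lag at which the normalized ACF falls below \(\mathrm{e}^{-2}\); the function names are hypothetical and the actual routines in the data_analysis module may differ in normalization and other details.

```python
import math


def acf_at_lag(data, lag, mean, var):
    """Normalized autocorrelation of data at a single lag."""
    n = len(data)
    cov = sum((data[i] - mean) * (data[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var


def correlation_length(data, max_lag=None):
    """Return the first lag at which the ACF drops below exp(-2).

    Returns None if the ACF does not decay below the threshold within
    max_lag (default: half the length of the data).
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    if var == 0:
        raise ValueError('data has zero variance; ACF is undefined')
    if max_lag is None:
        max_lag = n // 2
    threshold = math.exp(-2)
    for lag in range(max_lag + 1):
        if acf_at_lag(data, lag, mean, var) < threshold:
            return lag
    return None


# Synthetic, strongly correlated series: blocks of ten equal values.
blocks = ([1.0] * 10 + [-1.0] * 10) * 10
print(correlation_length(blocks))
```

The loop stops at the first lag below the threshold, so the cost stays low when the correlation length is short compared to the length of the series.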

An error estimate of the average can be calculated via

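\[\mathrm{error} = \frac{t \sigma }{\sqrt{N/s}},\]

where \(\sigma\) is the standard deviation, \(N\) the number of samples, \(s\) the correlation length, and \(t\) the t-factor, which can be adjusted depending on the desired confidence interval. The ratio \(N/s\) is the number of effectively independent samples, so the error estimate grows with the correlation length.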
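
As a rough numerical illustration, the error of the average can be computed as follows. This sketch treats \(N/s\) as the number of effectively independent samples and, as a simplifying assumption, replaces the t-factor by the corresponding quantile of the normal distribution, which is reasonable only for large \(N\). The function name `error_estimate` is hypothetical and not part of mchammer.

```python
import math
from statistics import NormalDist, pstdev


def error_estimate(data, correlation_length, confidence=0.95):
    """Error of the mean for correlated data: t * sigma / sqrt(N / s)."""
    n = len(data)
    sigma = pstdev(data)  # population standard deviation
    # Assumption: normal quantile used in place of the Student t-factor.
    t = NormalDist().inv_cdf((1 + confidence) / 2)
    n_independent = n / correlation_length
    return t * sigma / math.sqrt(n_independent)


data = [1.0, -1.0] * 50  # N = 100, sigma = 1
print(error_estimate(data, correlation_length=1))
print(error_estimate(data, correlation_length=4))
```

Increasing the correlation length by a factor of four doubles the error estimate, since the number of independent samples shrinks by the same factor.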

Obtaining the autocorrelation function directly or carrying out error estimates can be done via functionality provided in the data_analysis module.