Data containers

DataContainer

class mchammer.DataContainer(structure, ensemble_parameters, metadata={})[source]

Data container for storing information concerned with Monte Carlo simulations performed with mchammer.

Parameters:
  • structure (Atoms) – Reference atomic structure associated with the data container.

  • ensemble_parameters (dict) – Parameters associated with the underlying ensemble.

  • metadata (dict) – Metadata associated with the data container.

analyze_data(tag, start=None, max_lag=None)[source]

Returns detailed analysis of a scalar observerable.

Parameters:
  • tag (str) – Tag of field over which to average.

  • start (Optional[int]) – Minimum value of trial step to consider. By default the smallest value in the mctrial column will be used.

  • max_lag (Optional[int]) – Maximum lag between two points in data series. By default the largest length of the data series will be used. Used for computing autocorrelation.

Raises:
  • ValueError – If observable is requested that is not in data container.

  • ValueError – If observable is not scalar.

  • ValueError – If observations is not evenly spaced.

Returns:

Calculated properties of the data including mean, standard_deviation, correlation_length and error_estimate (95% confidence).

Return type:

dict

append(mctrial, record)

Appends data to data container.

Parameters:
  • mctrial (int) – Current Monte Carlo trial step.

  • record (Dict[str, Union[int, float, list]]) – Dictionary of tag-value pairs representing observations.

Raises:

TypeError – If input parameters have the wrong type.

apply_observer(observer)

Adds observer data from observer to data container.

The observer will only be run for the mctrials for which the trajectory have been saved.

The interval of the observer is ignored.

Parameters:

observer (BaseObserver) – Observer to be used.

Return type:

None

property data: DataFrame

Data as pandas.DataFrame.

property ensemble_parameters: dict

Parameters associated with Monte Carlo simulation.

get(*tags, start=0)

Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including 'trajectory' as one of the tags.

Parameters:
  • tags (str) – Names of the requested properties.

  • start (int) – Minimum value of trial step to consider. By default the smallest value in the mctrial column will be used.

Raises:
  • ValueError – If tags is empty.

  • ValueError – If observables are requested that are not in data container.

Return type:

Union[ndarray, List[Atoms], Tuple[ndarray, List[Atoms]]]

Examples

Below the get() method is illustrated but first we require a data container.

>>> from ase.build import bulk
>>> from icet import ClusterExpansion, ClusterSpace
>>> from mchammer.calculators import ClusterExpansionCalculator
>>> from mchammer.ensembles import CanonicalEnsemble
>>> # prepare cluster expansion
>>> prim = bulk('Au')
>>> cs = ClusterSpace(prim, cutoffs=[4.3], chemical_symbols=['Ag', 'Au'])
>>> ce = ClusterExpansion(cs, [0, 0, 0.1, -0.02])
>>> # prepare initial configuration
>>> structure = prim.repeat(3)
>>> for k in range(5):
...     structure[k].symbol = 'Ag'
>>> # set up and run MC simulation
>>> calc = ClusterExpansionCalculator(structure, ce)
>>> mc = CanonicalEnsemble(structure=structure, calculator=calc,
...                        temperature=600,
...                        dc_filename='myrun_canonical.dc')
>>> mc.run(100)  # carry out 100 trial swaps

We can now access the data container by reading it from file by using the read() method. For the purpose of this example, however, we access the data container associated with the ensemble directly.

>>> dc = mc.data_container

The following lines illustrate how to use the get() method for extracting data from the data container.

>>> # obtain all values of the potential represented by
>>> # the cluster expansion along the trajectory
>>> p = dc.get('potential')
>>> import matplotlib.pyplot as plt
>>> # as above but this time the MC trial step is included as well
>>> s, p = dc.get('mctrial', 'potential')
>>> _ = plt.plot(s, p)
>>> plt.show(block=False)
>>> # obtain configurations along the trajectory along with
>>> # their potential
>>> p, confs = dc.get('potential', 'trajectory')
get_average(tag, start=None)[source]

Returns average of a scalar observable.

Parameters:
  • tag (str) – Tag of field over which to average.

  • start (Optional[int]) – Minimum value of trial step to consider. By default the smallest value in the mctrial column will be used.

Raises:
  • ValueError – If observable is requested that is not in data container.

  • ValueError – If observable is not scalar.

Return type:

float

get_trajectory(*args, **kwargs)

Returns trajectory as a list of ASE Atoms objects.

Return type:

List[Atoms]

property metadata: dict

Metadata associated with data container.

property observables: List[str]

Observable names.

classmethod read(infile, old_format=False)

Reads data container from file.

Parameters:
  • infile (Union[str, BinaryIO, TextIO]) – File from which to read.

  • old_format (bool) – If True use old json format to read runtime data.

Raises:
  • FileNotFoundError – If infile is not found.

  • ValueError – If file is of incorrect type (not a tarball).

write(outfile)

Writes data container to file.

Parameters:

outfile (Union[bytes, str]) – File to which to write.

WangLandauDataContainer

class mchammer.WangLandauDataContainer(structure, ensemble_parameters, metadata={})[source]

Data container for storing information concerned with Wang-Landau simulation performed with mchammer.

Parameters:
  • structure (Atoms) – Reference atomic structure associated with the data container.

  • ensemble_parameters (dict) – Parameters associated with the underlying ensemble.

  • metadata (dict) – Metadata associated with the data container.

append(mctrial, record)

Appends data to data container.

Parameters:
  • mctrial (int) – Current Monte Carlo trial step.

  • record (Dict[str, Union[int, float, list]]) – Dictionary of tag-value pairs representing observations.

Raises:

TypeError – If input parameters have the wrong type.

apply_observer(observer)

Adds observer data from observer to data container.

The observer will only be run for the mctrials for which the trajectory have been saved.

The interval of the observer is ignored.

Parameters:

observer (BaseObserver) – Observer to be used.

Return type:

None

property data: DataFrame

Data as pandas.DataFrame.

property ensemble_parameters: dict

Parameters associated with Monte Carlo simulation.

property fill_factor: float

Final value of the fill factor in the Wang-Landau algorithm.

property fill_factor_history: DataFrame

Evolution of the fill factor in the Wang-Landau algorithm.

get(*tags, fill_factor_limit=None)[source]

Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including 'trajectory' as one of the tags.

Parameters:
  • tags (str) – Names of the requested properties.

  • fill_factor_limit (Optional[float]) – Return data recorded up to the point when the specified fill factor limit was reached, or None if the entropy history is empty or the last fill factor is above the limit; otherwise return all data.

Raises:
  • ValueError – If tags is empty.

  • ValueError – If observables are requested that are not in the data container.

Return type:

Union[ndarray, List[Atoms], Tuple[ndarray, List[Atoms]]]

Examples

Below the get() method is illustrated but first we require a data container.

>>> from ase import Atoms
>>> from icet import ClusterExpansion, ClusterSpace
>>> from mchammer.calculators import ClusterExpansionCalculator
>>> from mchammer.ensembles import WangLandauEnsemble
>>> # prepare cluster expansion
>>> prim = Atoms('Au', positions=[[0, 0, 0]], cell=[1, 1, 10], pbc=True)
>>> cs = ClusterSpace(prim, cutoffs=[1.1], chemical_symbols=['Ag', 'Au'])
>>> ce = ClusterExpansion(cs, [0, 0, 2])
>>> # prepare initial configuration
>>> structure = prim.repeat((4, 4, 1))
>>> for k in range(8):
...     structure[k].symbol = 'Ag'
>>> # set up and run Wang-Landau simulation
>>> calculator = ClusterExpansionCalculator(structure, ce)
>>> mc = WangLandauEnsemble(structure=structure,
...                         calculator=calculator,
...                         energy_spacing=1,
...                         dc_filename='ising_2d_run.dc',
...                         fill_factor_limit=0.3)
>>> mc.run(number_of_trial_steps=len(structure)*3000)  # in practice one requires more steps

We can now access the data container by reading it from file by using the read() method. For the purpose of this example, however, we access the data container associated with the ensemble directly.

>>> dc = mc.data_container

The following lines illustrate how to use the get() method for extracting data from the data container.

>>> # obtain all values of the potential represented by
>>> # the cluster expansion and the MC trial step along the
>>> # trajectory
>>> import matplotlib.pyplot as plt
>>> s, p = dc.get('mctrial', 'potential')
>>> _ = plt.plot(s, p)
>>> # as above but this time only included data recorded up to
>>> # the point when the fill factor reached below 0.6
>>> s, p = dc.get('mctrial', 'potential', fill_factor_limit=0.6)
>>> _ = plt.plot(s, p)
>>> plt.show(block=False)
>>> # obtain configurations along the trajectory along with
>>> # their potential
>>> p, confs = dc.get('potential', 'trajectory')
get_entropy(fill_factor_limit=None)[source]

Returns the (relative) entropy from this data container accumulated during a Wang-Landau simulation. Returns None if the data container does not contain the required information.

Parameters:

fill_factor_limit (Optional[float]) – Return the entropy recorded up to the point when the specified fill factor limit was reached, or None if the entropy history is empty or the last fill factor is above the limit. Otherwise return the entropy for the last state.

Return type:

DataFrame

get_histogram()[source]

Returns the histogram from this data container accumulated since the last update of the fill factor. Returns None if the data container does not contain the required information.

Return type:

DataFrame

get_trajectory(*args, **kwargs)

Returns trajectory as a list of ASE Atoms objects.

Return type:

List[Atoms]

property metadata: dict

Metadata associated with data container.

property observables: List[str]

Observable names.

classmethod read(infile, old_format=False)[source]

Reads data container from file.

Parameters:
  • infile (Union[str, BinaryIO, TextIO]) – file from which to read

  • old_format (bool) – If true use old json format to read runtime data; default to false

Raises:
  • FileNotFoundError – if file is not found (str)

  • ValueError – if file is of incorrect type (not a tarball)

write(outfile)

Writes data container to file.

Parameters:

outfile (Union[bytes, str]) – File to which to write.

Analysis functions

mchammer.data_containers.get_average_observables_wl(dcs, temperatures, observables=None, boltzmann_constant=8.617330337217213e-05, fill_factor_limit=None)[source]

Returns the average and the standard deviation of the energy from a Wang-Landau simulation for the temperatures specified. If the observables keyword argument is specified the function will also return the mean and standard deviation of the specified observables.

Parameters:
  • dcs (Union[WangLandauDataContainer, Dict[Any, WangLandauDataContainer]]) – Data container(s) from which to extract density of states as well as observables.

  • temperatures (List[float]) – Temperatures at which to compute the averages.

  • observables (Optional[List[str]]) – Observables for which to compute averages; the observables must refer to fields in the data container.

  • boltzmann_constant (float) – Boltzmann constant \(k_B\) in appropriate units, i.e., units that are consistent with the underlying cluster expansion and the temperature units [default: eV/K].

  • fill_factor_limit (Optional[float]) – Use data recorded up to the point when the specified fill factor limit was reached when computing averages. Otherwise use data for the last state.

Raises:
  • ValueError – If the data container(s) do(es) not contain entropy data from Wang-Landau simulation.

  • ValueError – If data container(s) do(es) not contain requested observable.

Return type:

DataFrame

mchammer.data_containers.get_average_cluster_vectors_wl(dcs, cluster_space, temperatures, boltzmann_constant=8.617330337217213e-05, fill_factor_limit=None)[source]

Returns the average cluster vectors from a Wang-Landau simulation for the temperatures specified.

Parameters:
  • dcs (Union[WangLandauDataContainer, dict]) – Data container(s), from which to extract density of states as well as observables.

  • cluster_space (ClusterSpace) – Cluster space to use for calculation of cluster vectors.

  • temperatures (List[float]) – Temperatures at which to compute the averages.

  • boltzmann_constant (float) – Boltzmann constant \(k_B\) in appropriate units, i.e., units that are consistent with the underlying cluster expansion and the temperature units [default: eV/K].

  • fill_factor_limit (Optional[float]) – Use data recorded up to the point when the specified fill factor limit was reached when computing the average cluster vectors. Otherwise use data for the last state.

Raises:

ValueError – If the data container(s) do(es) not contain entropy data from Wang-Landau simulation.

Return type:

DataFrame

mchammer.data_containers.get_density_of_states_wl(dcs, fill_factor_limit=None)[source]

Returns a pandas DataFrame with the total density of states from a Wang-Landau simulation. If a dict of data containers is provided the function also returns a dictionary that contains the standard deviation between the entropy of neighboring data containers in the overlap region. These errors should be small compared to the variation of the entropy across each bin.

The function can handle both a single data container and a dict thereof. In the latter case the data containers must cover a contiguous energy range and must at least partially overlap.

Parameters:
  • dcs (Union[WangLandauDataContainer, Dict[Any, WangLandauDataContainer]]) – Data container(s), from which to extract the density of states.

  • fill_factor_limit (Optional[float]) – Calculate the density of states using the entropy recorded up to the point when the specified fill factor limit was reached. Otherwise return the density of states for the last state.

Raises:
  • TypeError – If dcs does not correspond to not a single (dictionary) of data container(s) from which the entropy can retrieved.

  • ValueError – If the data container does not contain entropy information.

  • ValueError – If a fill factor limit has been provided and the data container either does not contain information about the entropy history or if the last fill factor is higher than the specified limit.

  • ValueError – If multiple data containers are provided and there are inconsistencies with regard to basic simulation parameters such as system size or energy spacing.

  • ValueError – If multiple data containers are provided and there is at least one energy region without overlap.

Return type:

Tuple[DataFrame, dict]

mchammer.free_energy_tools.get_free_energy_thermodynamic_integration(dc, cluster_space, forward, max_temperature=inf, sublattice_probabilities=None, boltzmann_constant=8.617330337217213e-05)[source]

Returns the free energy calculated via thermodynamic integration using the ThermodynamicIntegrationEnsemble.

The temperature dependence of the free energy can be extracted from the thermodynamic integration as

\[F(T) = \frac{F_0(\lambda)}{\lambda} + \frac{T_0}{\lambda} S_\text{B}\]

where \(S_\text{B}\) is the Boltzmann entropy,

\[T = \frac{T_0}{\lambda}\]

and

\[F_0(\lambda) = \int_0^\lambda \left\langle\frac{\mathrm{d}H(\lambda)} {\mathrm{d}\lambda}\right\rangle_{H} \mathrm{d}\lambda\]
Parameters:
  • dc (DataContainer) – Data container from the thermodynamic integration simulation.

  • cluster_space (ClusterSpace) – Cluster space used to construct the cluster expansion used for the simulation.

  • forward (bool) – Whether or not the thermodynamic integration was carried out forward or backward.

  • max_temperature (float) – Largest temperature to extract from the thermodynamic integration.

  • sublattice_probababilites – Sublattice probabilties that were provided to the thermodynamic integration simulation.

  • boltzmann_constant (float) – Boltzmann constant in the units used for the thermodynamic integration simulation.

Return type:

(ndarray, ndarray)

mchammer.free_energy_tools.get_free_energy_temperature_integration(dc, cluster_space, forward, temperature_reference, free_energy_reference=None, sublattice_probabilities=None, max_temperature=inf, boltzmann_constant=8.617330337217213e-05)[source]

Returns the free energy calculated using temperature integration and the corresponding temperature.

\[\frac{A(T_{2})}{T_{2}} = \frac{A(T_{1})}{T_{1}} - \int_{T_{1}}^{T_{2}}\frac{U(T)}{T^2}\mathrm{d}T\]
Parameters:
  • dc (DataContainer) – Data container from a canonical annealing simulation. The first (last for forward=False) temperature in the data container has to be at the same temperature as temperature_reference.

  • cluster_space (ClusterSpace) – Cluster space used to construct the cluster expansion that was used in the simulations.

  • forward (bool) – If True the canonical annealing simulation was carried out from high to low temperature, otherwise the opposite is assumed.

  • temperature_reference (float) – Temperature at which free_energy_reference was calculated

  • free_energy_reference (Optional[float]) – Reference free energy. If set to None it will be assumeed that the free energy at temperature_reference can be approximated by \(T S_B\) where \(S_B\) is the ideal mixing entropy.

  • sublattice_probababilites – Sublattice probabilties that were provided to the canonical annealing simulation.

  • max_temperature (float) – Largest temperature to extract from the temperature integration.

  • boltzmann_constant (float) – Boltzmann constant in the units used for the thermodynamic integration simulation.

Return type:

(ndarray, ndarray)

mchammer.data_analysis.analyze_data(data, max_lag=None)[source]

Carries out an extensive analysis of the data series and returns a dictionary containing the mean, standard deviation, correlation length and a 95% error estimate.

Parameters:
  • data (ndarray) – Data series for which to compute autocorrelation function.

  • max_lag (Optional[int]) – Maximum lag between two data points used for computing autocorrelation.

Return type:

dict

mchammer.data_analysis.get_autocorrelation_function(data, max_lag=None)[source]

Returns autocorrelation function.

The autocorrelation function is computed using pandas.Series.autocorr.

Parameters:
  • data (ndarray) – Data series for which to compute autocorrelation function.

  • max_lag (Optional[int]) – Maximum lag between two data points.

Return type:

ndarray

mchammer.data_analysis.get_correlation_length(data)[source]

Returns estimate of the correlation length of data.

The correlation length is taken as the first point where the autocorrelation functions is less than \(\exp(-2)\). If the correlation function never drops below \(\exp(-2)\) np.nan is returned.

If the correlation length cannot be computed since the autocorrelation function is unconverged the function returns None.

Parameters:

data (ndarray) – Data series for which to the compute autocorrelation function.

Return type:

Optional[int]

mchammer.data_analysis.get_error_estimate(data, confidence=0.95)[source]

Returns estimate of standard error \(\mathrm{error}\) with confidence interval via

\[\mathrm{error} = t_\mathrm{factor} * \mathrm{std}(\mathrm{data}) / \sqrt{N_s}\]

where \(t_\mathrm{factor}\) is the factor corresponding to the confidence interval and \(N_s\) is the number of independent measurements (with correlation taken into account).

If the correlation length cannot be computed since the autocorrelation function is unconverged the function returns None.

Parameters:

data (ndarray) – Eata series for which to estimate the error.

Return type:

Optional[float]