Data containers¶
DataContainer¶

class
mchammer.
DataContainer
(structure, ensemble_parameters, metadata={})[source]¶ Data container for storing information concerned with Monte Carlo simulations performed with mchammer.
 Parameters
structure (ase.Atoms) – reference atomic structure associated with the data container
ensemble_parameters (dict) – parameters associated with the underlying ensemble
metadata (dict) – metadata associated with the data container

analyze_data
(tag, start=None, max_lag=None)[source]¶ Returns detailed analysis of a scalar observerable.
 Parameters
tag (
str
) – tag of field over which to averagestart (
Optional
[int
]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.max_lag (
Optional
[int
]) – maximum lag between two points in data series, by default the largest length of the data series will be used. Used for computing autocorrelation
 Raises
ValueError – if observable is requested that is not in data container
ValueError – if observable is not scalar
ValueError – if observations is not evenly spaced
 Returns
calculated properties of the data including mean, standard_deviation, correlation_length and error_estimate (95% confidence)
 Return type
dict

append
(mctrial, record)¶ Appends data to data container.
 Parameters
mctrial (
int
) – current Monte Carlo trial steprecord (
Dict
[str
,Union
[int
,float
,list
]]) – dictionary of tagvalue pairs representing observations
 Raises
TypeError – if input parameters have the wrong type

apply_observer
(observer)¶ Adds observer data from observer to data container.
The observer will only be run for the mctrials for which the trajectory have been saved.
The interval of the observer is ignored.
 Parameters
observer (
BaseObserver
) – observer to be used

property
data
¶ pandas data frame (see
pandas.DataFrame
) Return type
DataFrame

property
ensemble_parameters
¶ parameters associated with Monte Carlo simulation
 Return type
dict

get
(*input_tags, start=0)¶ Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including ‘trajectory’ as a tag.
 Parameters
tags – tuples of the requested properties
start (
int
) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 Raises
ValueError – if tags is empty
ValueError – if observables are requested that are not in data container
Examples
Below the get method is illustrated but first we require a data container.
>>> from ase.build import bulk >>> from icet import ClusterExpansion, ClusterSpace >>> from mchammer.calculators import ClusterExpansionCalculator >>> from mchammer.ensembles import CanonicalEnsemble
>>> # prepare cluster expansion >>> prim = bulk('Au') >>> cs = ClusterSpace(prim, cutoffs=[4.3], chemical_symbols=['Ag', 'Au']) >>> ce = ClusterExpansion(cs, [0, 0, 0.1, 0.02])
>>> # prepare initial configuration >>> structure = prim.repeat(3) >>> for k in range(5): ... structure[k].symbol = 'Ag'
>>> # set up and run MC simulation >>> calc = ClusterExpansionCalculator(structure, ce) >>> mc = CanonicalEnsemble(structure=structure, calculator=calc, ... temperature=600, ... dc_filename='myrun_canonical.dc') >>> mc.run(100) # carry out 100 trial swaps
We can now access the data container by reading it from file by using the read method. For the purpose of this example, however, we access the data container associated with the ensemble directly.
>>> dc = mc.data_container
The following lines illustrate how to use the get method for extracting data from the data container.
>>> # obtain all values of the potential represented by >>> # the cluster expansion along the trajectory >>> p = dc.get('potential')
>>> import matplotlib.pyplot as plt >>> # as above but this time the MC trial step and the temperature >>> # are included as well >>> s, p = dc.get('mctrial', 'potential') >>> _ = plt.plot(s, p) >>> plt.show()
>>> # obtain configurations along the trajectory along with >>> # their potential >>> p, confs = dc.get('potential', 'trajectory')

get_average
(tag, start=None)[source]¶ Returns average of a scalar observable.
 Parameters
tag (
str
) – tag of field over which to averagestart (
Optional
[int
]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 Raises
ValueError – if observable is requested that is not in data container
ValueError – if observable is not scalar
 Return type
float

get_trajectory
(*args, **kwargs)¶ Returns trajectory as a list of ASE Atoms objects.

property
metadata
¶ metadata associated with data container
 Return type
dict

property
observables
¶ observable names
 Return type
List
[str
]

classmethod
read
(infile, old_format=False)¶ Reads data container from file.
 Parameters
infile (
Union
[str
,BinaryIO
,TextIO
]) – file from which to readold_format (
bool
) – If true use old json format to read runtime data; default to false
 Raises
FileNotFoundError – if file is not found (str)
ValueError – if file is of incorrect type (not a tarball)

write
(outfile)¶ Writes BaseDataContainer object to file.
 Parameters
outfile (
Union
[str
,BinaryIO
,TextIO
]) – file to which to write
WangLandauDataContainer¶

class
mchammer.
WangLandauDataContainer
(structure, ensemble_parameters, metadata={})[source]¶ Data container for storing information concerned with WangLandau simulation performed with mchammer.
 Parameters
structure (ase.Atoms) – reference atomic structure associated with the data container
ensemble_parameters (dict) – parameters associated with the underlying ensemble
metadata (dict) – metadata associated with the data container

append
(mctrial, record)¶ Appends data to data container.
 Parameters
mctrial (
int
) – current Monte Carlo trial steprecord (
Dict
[str
,Union
[int
,float
,list
]]) – dictionary of tagvalue pairs representing observations
 Raises
TypeError – if input parameters have the wrong type

apply_observer
(observer)¶ Adds observer data from observer to data container.
The observer will only be run for the mctrials for which the trajectory have been saved.
The interval of the observer is ignored.
 Parameters
observer (
BaseObserver
) – observer to be used

property
data
¶ pandas data frame (see
pandas.DataFrame
) Return type
DataFrame

property
ensemble_parameters
¶ parameters associated with Monte Carlo simulation
 Return type
dict

property
fill_factor
¶ final value of the fill factor in the WangLandau algorithm
 Return type
float

property
fill_factor_history
¶ evolution of the fill factor in the WangLandau algorithm
 Return type
DataFrame

get
(*input_tags, start=0)¶ Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including ‘trajectory’ as a tag.
 Parameters
tags – tuples of the requested properties
start (
int
) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 Raises
ValueError – if tags is empty
ValueError – if observables are requested that are not in data container
Examples
Below the get method is illustrated but first we require a data container.
>>> from ase.build import bulk >>> from icet import ClusterExpansion, ClusterSpace >>> from mchammer.calculators import ClusterExpansionCalculator >>> from mchammer.ensembles import CanonicalEnsemble
>>> # prepare cluster expansion >>> prim = bulk('Au') >>> cs = ClusterSpace(prim, cutoffs=[4.3], chemical_symbols=['Ag', 'Au']) >>> ce = ClusterExpansion(cs, [0, 0, 0.1, 0.02])
>>> # prepare initial configuration >>> structure = prim.repeat(3) >>> for k in range(5): ... structure[k].symbol = 'Ag'
>>> # set up and run MC simulation >>> calc = ClusterExpansionCalculator(structure, ce) >>> mc = CanonicalEnsemble(structure=structure, calculator=calc, ... temperature=600, ... dc_filename='myrun_canonical.dc') >>> mc.run(100) # carry out 100 trial swaps
We can now access the data container by reading it from file by using the read method. For the purpose of this example, however, we access the data container associated with the ensemble directly.
>>> dc = mc.data_container
The following lines illustrate how to use the get method for extracting data from the data container.
>>> # obtain all values of the potential represented by >>> # the cluster expansion along the trajectory >>> p = dc.get('potential')
>>> import matplotlib.pyplot as plt >>> # as above but this time the MC trial step and the temperature >>> # are included as well >>> s, p = dc.get('mctrial', 'potential') >>> _ = plt.plot(s, p) >>> plt.show()
>>> # obtain configurations along the trajectory along with >>> # their potential >>> p, confs = dc.get('potential', 'trajectory')

get_entropy
()[source]¶ Returns the (relative) entropy from this data container accumulated during a WangLandau simulation. Returns
None
if the data container does not contain the required information. Return type
DataFrame

get_histogram
()[source]¶ Returns the histogram from this data container accumulated since the last update of the fill factor. Returns
None
if the data container does not contain the required information. Return type
DataFrame

get_trajectory
(*args, **kwargs)¶ Returns trajectory as a list of ASE Atoms objects.

property
metadata
¶ metadata associated with data container
 Return type
dict

property
observables
¶ observable names
 Return type
List
[str
]

classmethod
read
(infile, old_format=False)¶ Reads data container from file.
 Parameters
infile (
Union
[str
,BinaryIO
,TextIO
]) – file from which to readold_format (
bool
) – If true use old json format to read runtime data; default to false
 Raises
FileNotFoundError – if file is not found (str)
ValueError – if file is of incorrect type (not a tarball)

write
(outfile)¶ Writes BaseDataContainer object to file.
 Parameters
outfile (
Union
[str
,BinaryIO
,TextIO
]) – file to which to write
Analysis functions¶

mchammer.data_containers.
get_average_observables_wl
(dcs, temperatures, observables=None, boltzmann_constant=8.617330337217213e05)[source]¶ Returns the average and the standard deviation of the energy from a WangLandau simulation for the temperatures specified. If the
observables
keyword argument is specified the function will also return the mean and standard deviation of the specified observables. Parameters
dcs (
Union
[BaseDataContainer
,dict
]) – data container(s), from which to extract density of states as well as observablestemperatures (
List
[float
]) – temperatures, at which to compute the averagesobservables (
Optional
[List
[str
]]) – observables, for which to compute averages; the observables must refer to fields in the data containerboltzmann_constant (
float
) – Boltzmann constant \(k_B\) in appropriate units, i.e. units that are consistent with the underlying cluster expansion and the temperature units [default: eV/K]
 Raises
ValueError – if the data container(s) do(es) not contain entropy data from WangLandau simulation
ValueError – if data container(s) do(es) not contain requested observable
 Return type
DataFrame

mchammer.data_containers.
get_average_cluster_vectors_wl
(dcs, cluster_space, temperatures, boltzmann_constant=8.617330337217213e05)[source]¶ Returns the average cluster vectors from a WangLandau simulation for the temperatures specified.
 Parameters
dcs (
Union
[BaseDataContainer
,dict
]) – data container(s), from which to extract density of states as well as observablescluster_space (
ClusterSpace
) – cluster space to use for calculation of cluster vectorstemperatures (
List
[float
]) – temperatures, at which to compute the averagesboltzmann_constant (
float
) – Boltzmann constant \(k_B\) in appropriate units, i.e. units that are consistent with the underlying cluster expansion and the temperature units [default: eV/K]
 Return type
DataFrame

mchammer.data_containers.
get_density_of_states_wl
(dcs)[source]¶ Returns a pandas DataFrame with the total density of states from a WangLandau simulation. If a dict of data containers is provided the function also returns a dictionary that contains the standard deviation between the entropy of neighboring data containers in the overlap region. These errors should be small compared to the variation of the entropy across each bin.
The function can handle both a single data container and a dict thereof. In the latter case the data containers must cover a contiguous energy range and must at least partially overlap.
 Parameters
dcs (
Union
[BaseDataContainer
,dict
]) – data container(s), from which to extract the density of states Raises
ValueError – if multiple data containers are provided and there are inconsistencies with regard to basic simulation parameters such as system size or energy spacing
ValueError – if multiple data containers are provided and there is at least one energy region without overlap
 Return type
Tuple
[DataFrame
,dict
]

mchammer.data_analysis.
analyze_data
(data, max_lag=None)[source]¶ Carries out an extensive analysis of the data series.
 Parameters
data (
ndarray
) – data series to compute autocorrelation function formax_lag (
Optional
[int
]) – maximum lag between two data points, used for computing autocorrelation
 Returns
calculated properties of the data including, mean, standard deviation, correlation length and a 95% error estimate.
 Return type
dict

mchammer.data_analysis.
get_autocorrelation_function
(data, max_lag=None)[source]¶ Returns autocorrelation function.
The autocorrelation function is computed using pandas.Series.autocorr.
 Parameters
data (
ndarray
) – data series to compute autocorrelation function formax_lag (
Optional
[int
]) – maximum lag between two data points
 Returns
 Return type
calculated autocorrelation function

mchammer.data_analysis.
get_correlation_length
(data)[source]¶ Returns estimate of the correlation length of data.
The correlation length is taken as the first point where the autocorrelation functions is less than \(\exp(2)\). If the correlation function never drops below \(\exp(2)\)
np.nan
is returned. Parameters
data (
ndarray
) – data series for which to the compute autocorrelation function Returns
 Return type
correlation length

mchammer.data_analysis.
get_error_estimate
(data, confidence=0.95)[source]¶ Returns estimate of standard error \(\mathrm{error}\) with confidence interval.
\[\mathrm{error} = t_\mathrm{factor} * \mathrm{std}(\mathrm{data}) / \sqrt{N_s}\]where \(t_{factor}\) is the factor corresponding to the confidence interval and \(N_s\) is the number of independent measurements (with correlation taken into account).
 Parameters
data (
ndarray
) – data series for which to estimate the error Returns
 Return type
error estimate