# Data container

class mchammer.DataContainer(structure, ensemble_parameters, metadata={})

Data container for storing information concerned with Monte Carlo simulations performed with mchammer.

Parameters:
- structure (ASE Atoms object) – reference atomic structure associated with the data container
- ensemble_parameters (dict) – parameters associated with the underlying ensemble
- metadata (dict) – metadata associated with the data container

analyze_data(tag, start=None, stop=None, max_lag=None)

Returns a detailed analysis of a scalar observable.

Parameters:
- tag (str) – tag of field over which to average
- start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used
- stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used
- max_lag (Optional[int]) – maximum lag between two points in the data series, used for computing the autocorrelation; by default the largest length of the data series will be used

Raises:
- ValueError – if an observable is requested that is not in the data container
- ValueError – if the observable is not scalar
- ValueError – if the observations are not evenly spaced

Returns: calculated properties of the data including mean, standard_deviation, correlation_length and error_estimate (95% confidence)

Return type: dict

append(mctrial, record)

Appends data to data container.

Parameters:
- mctrial (int) – current Monte Carlo trial step
- record (Dict[str, Union[int, float, list]]) – dictionary of tag-value pairs representing observations

Raises:
- TypeError – if input parameters have the wrong type

apply_observer(observer)

Adds observer data from observer to data container.

The observer will only be run for the mctrials for which the trajectory has been saved.

The interval of the observer is ignored.

Parameters: observer (BaseObserver) – observer to be used
data

pandas data frame (see pandas.DataFrame)

Return type: DataFrame
ensemble_parameters

parameters associated with Monte Carlo simulation

Return type: dict
get_average(tag, start=None, stop=None)

Returns average of a scalar observable.

Parameters:
- tag (str) – tag of field over which to average
- start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used
- stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used

Raises:
- ValueError – if an observable is requested that is not in the data container
- ValueError – if the observable is not scalar

Return type: float
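
To make the meaning of start and stop concrete, the following is a minimal sketch in plain Python that averages a scalar observable over a window of trial steps. It does not call mchammer: the `records` list and the `average_in_window` helper are hypothetical stand-ins for the data container, and the sketch assumes that both bounds are inclusive, as the wording "minimum" and "maximum" suggests.

```python
from statistics import fmean

# Hypothetical stand-in for the rows of a data container:
# (mctrial, value) pairs of one scalar observable.
records = [(0, 1.0), (10, 2.0), (20, 3.0), (30, 4.0), (40, 5.0)]


def average_in_window(rows, start=None, stop=None):
    """Average the values whose trial step lies in [start, stop].

    Assumption of this sketch: both bounds are inclusive and default to
    the smallest and largest trial step present in the rows.
    """
    steps = [step for step, _ in rows]
    lower = min(steps) if start is None else start
    upper = max(steps) if stop is None else stop
    selected = [value for step, value in rows if lower <= step <= upper]
    if not selected:
        raise ValueError('no observations in the requested window')
    return fmean(selected)


full_average = average_in_window(records)                # all five rows
windowed_average = average_in_window(records, start=20)  # rows from step 20 on
```

Discarding an initial equilibration period by setting start is a typical reason to restrict the window.
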
get_data(*tags, start=None, stop=None, interval=1, fill_method='skip_none', apply_to=None)

Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including ‘trajectory’ as a tag.

Parameters:
- tags – tuples of the requested properties
- start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used
- stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used
- interval (int) – increment for mctrial; by default the smallest available interval will be used
- fill_method ({'skip_none', 'fill_backward', 'fill_forward', 'linear_interpolate', None}) – method employed for dealing with missing values; by default 'skip_none' is used
- apply_to (Optional[List[str]]) – tags of columns to which fill_method will be applied; by default fill_method is applied to all columns

Raises:
- ValueError – if tags is empty
- ValueError – if observables are requested that are not in the data container
- ValueError – if the fill method is unknown
- ValueError – if the trajectory is requested and the fill method is not skip_none

Examples

The following lines illustrate how to use the get_data method for extracting data from the trajectory:

```python
# dc is an existing DataContainer instance

# obtain a list of all values of the potential represented by
# the cluster expansion along the trajectory
p = dc.get_data('potential')

# as above but this time the MC trial step and the temperature
# are included as well
s, p, t = dc.get_data('mctrial', 'potential', 'temperature')

# obtain configurations along the trajectory along with
# their potential
p, confs = dc.get_data('potential', 'trajectory')
```

Return type: Union[ndarray, List[Atoms], Tuple[ndarray, List[Atoms]]]
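
Missing values arise when observables are recorded at different intervals, and the fill_method options differ in how they treat those gaps. The sketch below illustrates the four named strategies on a plain Python list. It is not mchammer code: `fill_missing` is a hypothetical helper, and the semantics are assumed from the option names (forward fill copies the previous value, backward fill copies the next value, linear interpolation only fills gaps between two known values).

```python
def fill_missing(values, fill_method='skip_none'):
    """Toy illustration of the four fill strategies on a list with None gaps."""
    if fill_method == 'skip_none':
        # Drop the missing entries altogether.
        return [v for v in values if v is not None]
    if fill_method == 'fill_forward':
        # Carry the most recent known value forward.
        filled, last = [], None
        for v in values:
            last = v if v is not None else last
            filled.append(last)
        return filled
    if fill_method == 'fill_backward':
        # Backward fill is forward fill applied to the reversed list.
        return fill_missing(values[::-1], 'fill_forward')[::-1]
    if fill_method == 'linear_interpolate':
        # Interpolate only between pairs of neighbouring known values.
        known = [i for i, v in enumerate(values) if v is not None]
        filled = list(values)
        for left, right in zip(known, known[1:]):
            for i in range(left + 1, right):
                weight = (i - left) / (right - left)
                filled[i] = values[left] + weight * (values[right] - values[left])
        return filled
    raise ValueError(f'unknown fill method: {fill_method}')


series = [1.0, None, None, 4.0, None]
skipped = fill_missing(series, 'skip_none')
forward = fill_missing(series, 'fill_forward')
backward = fill_missing(series, 'fill_backward')
interpolated = fill_missing(series, 'linear_interpolate')
```

In this toy version the trailing gap stays empty under backward fill and interpolation because no later value exists. The fill methods presumably cannot be combined with 'trajectory' because there is no meaningful way to fill in a missing configuration.
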
get_number_of_entries(tag=None)

Returns the total number of entries with the given observable tag.

Parameters:
- tag (Optional[str]) – name of observable; by default the total number of rows in the data frame will be returned

Raises:
- ValueError – if an observable is requested that is not in the data container

Return type: int

get_trajectory(start=None, stop=None, interval=1)

Returns trajectory as a list of ASE Atoms objects.

Parameters:
- start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used
- stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used
- interval (int) – increment for mctrial; by default the smallest available interval will be used

Return type: List[Atoms]
last_state

last state to be used to restart Monte Carlo simulation

Return type: Dict[str, Union[int, List[int]]]
metadata

metadata associated with data container

Return type: dict
observables

observable names

Return type: List[str]
static read(infile, old_format=False)

Reads DataContainer object from file.

Parameters:
- infile (Union[str, BinaryIO, TextIO]) – file from which to read
- old_format (bool) – if True, use the old JSON format to read runtime data; defaults to False

Raises:
- FileNotFoundError – if file is not found (str)
- ValueError – if file is of incorrect type (not a tarball)

write(outfile)

Writes DataContainer object to file.

Parameters:
- outfile (Union[str, BinaryIO, TextIO]) – file to which to write

write_trajectory(outfile)

Writes the configurations along the trajectory to file in ASE trajectory format. The file also includes the respective values of the potential for each configuration. If the file exists, the trajectory will be appended. The ASE convert command can be used to convert the trajectory file to other formats. The ASE gui can be used to visualize the trajectory.

Parameters:
- outfile (Union[str, BinaryIO, TextIO]) – output file name or file object

Return type: None

## Supporting functions

mchammer.data_analysis.analyze_data(data, max_lag=None)

Carries out an extensive analysis of the data series.

Parameters:
- data (ndarray) – data series to compute autocorrelation function for
- max_lag (Optional[int]) – maximum lag between two data points, used for computing autocorrelation

Returns: calculated properties of the data including mean, standard deviation, correlation length and a 95% error estimate

Return type: dict

mchammer.data_analysis.get_autocorrelation_function(data, max_lag=None)

Returns autocorrelation function.

The autocorrelation function is computed using pandas.Series.autocorr.

Parameters:
- data (ndarray) – data series to compute autocorrelation function for
- max_lag (Optional[int]) – maximum lag between two data points

Returns: calculated autocorrelation function
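
pandas.Series.autocorr computes the Pearson correlation between a series and a copy of itself shifted by a given lag. The sketch below reproduces that idea in plain Python rather than with pandas, so it runs without third-party packages. The helper names `pearson` and `autocorrelation_function` are hypothetical, and unlike the library function the sketch requires max_lag to be given explicitly.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def autocorrelation_function(data, max_lag):
    """Correlate the series with itself shifted by 0, 1, ..., max_lag steps."""
    n = len(data)
    return [pearson(data[:n - lag], data[lag:]) for lag in range(max_lag + 1)]


# An alternating series is perfectly anti-correlated at odd lags
# and perfectly correlated at even lags.
series = [1.0, -1.0] * 4
acf = autocorrelation_function(series, max_lag=3)
```

For real Monte Carlo data the autocorrelation function typically starts at 1 for lag 0 and decays towards zero as the lag grows.
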
mchammer.data_analysis.get_correlation_length(data)

Returns estimate of the correlation length of data.

The correlation length is taken as the first point where the autocorrelation function is less than exp(-2).

If the autocorrelation function never drops below exp(-2), np.nan is returned.

Parameters:
- data (ndarray) – data series to compute autocorrelation function for

Returns: correlation length
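
The threshold rule can be sketched in a few lines of plain Python. Note that the hypothetical `correlation_length` helper below takes an already computed autocorrelation function as input, whereas the library function takes the raw data series; the exponentially decaying test function is likewise made up for illustration.

```python
from math import exp, isnan, nan


def correlation_length(acf):
    """Return the first lag at which acf drops below exp(-2), else nan."""
    threshold = exp(-2)
    for lag, value in enumerate(acf):
        if value < threshold:
            return lag
    return nan


# Made-up autocorrelation function decaying as exp(-lag / 3).
decaying_acf = [exp(-lag / 3) for lag in range(12)]
length = correlation_length(decaying_acf)

# A function that stays above the threshold has no correlation length.
no_length = correlation_length([1.0, 0.9, 0.8])
```

For the decaying example the value at lag 6 equals exp(-2) and is therefore not yet below the threshold, so the first qualifying lag is 7.
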
mchammer.data_analysis.get_error_estimate(data, confidence=0.95)

Returns estimate of standard error with confidence interval.

error = t_factor * std(data) / sqrt(Ns)

where t_factor is the factor corresponding to the confidence interval and Ns is the number of independent measurements (with correlation taken into account).

Parameters:
- data (ndarray) – data series to estimate error for

Returns: error estimate
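
As a worked instance of the formula, the sketch below evaluates it in plain Python under three labeled assumptions that are not stated in this reference: Ns is taken as the number of data points divided by the correlation length, t_factor is approximated by the corresponding quantile of the standard normal distribution instead of a Student t distribution, and std is the population standard deviation. The `error_estimate` helper and its explicit `correlation_length` argument are hypothetical; the library function takes only the data and the confidence level.

```python
from math import sqrt
from statistics import NormalDist, pstdev


def error_estimate(data, correlation_length, confidence=0.95):
    """Evaluate error = t_factor * std(data) / sqrt(Ns) for a data series."""
    # Assumption: each block of `correlation_length` consecutive points
    # counts as one independent measurement.
    n_independent = len(data) / correlation_length
    # Assumption: normal approximation to the two-sided confidence factor.
    t_factor = NormalDist().inv_cdf((1 + confidence) / 2)
    return t_factor * pstdev(data) / sqrt(n_independent)


# 100 points with unit standard deviation around a mean of 2.
data = [1.0, 3.0] * 50
error_short = error_estimate(data, correlation_length=4)
error_long = error_estimate(data, correlation_length=16)
```

A longer correlation length leaves fewer independent measurements and therefore yields a larger error estimate for the same data.
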