Data container

class mchammer.DataContainer(structure, ensemble_parameters, metadata={})

Data container for storing information about Monte Carlo simulations performed with mchammer.

Parameters:
  • structure (ASE Atoms object) – reference atomic structure associated with the data container
  • ensemble_parameters (dict) – parameters associated with the underlying ensemble
  • metadata (dict) – metadata associated with the data container
analyze_data(tag, start=None, stop=None, max_lag=None)

Returns a detailed analysis of a scalar observable.

Parameters:
  • tag (str) – tag of field over which to average
  • start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
  • stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
  • max_lag (Optional[int]) – maximum lag between two points in the data series, used for computing the autocorrelation; by default the length of the data series will be used.
Raises:
  • ValueError – if observable is requested that is not in data container
  • ValueError – if observable is not scalar
  • ValueError – if observations are not evenly spaced
Returns:

calculated properties of the data including mean, standard_deviation, correlation_length and error_estimate (95% confidence)

Return type:

dict
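
The "evenly spaced" requirement listed under Raises can be illustrated with a small stand-alone check. This is only a sketch of the idea, not mchammer's implementation; the helper names `is_evenly_spaced` and `require_evenly_spaced` are hypothetical.

```python
def is_evenly_spaced(mctrials):
    """Return True if consecutive trial steps differ by a constant amount."""
    if len(mctrials) < 3:
        return True  # with fewer than two gaps there is nothing to compare
    step = mctrials[1] - mctrials[0]
    return all(b - a == step for a, b in zip(mctrials, mctrials[1:]))


def require_evenly_spaced(mctrials):
    """Raise ValueError for unevenly spaced trial steps (hypothetical helper)."""
    if not is_evenly_spaced(mctrials):
        raise ValueError('observations are not evenly spaced')


print(is_evenly_spaced([0, 10, 20, 30]))   # constant step of 10
print(is_evenly_spaced([0, 10, 25, 30]))   # step changes along the series
```

An even spacing matters here because the autocorrelation is evaluated as a function of the lag between records, which only has a well-defined meaning if every pair of neighboring records is separated by the same number of trial steps.
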

append(mctrial, record)

Appends data to data container.

Parameters:
  • mctrial (int) – current Monte Carlo trial step
  • record (Dict[str, Union[int, float, list]]) – dictionary of tag-value pairs representing observations
Raises:

TypeError – if input parameters have the wrong type

apply_observer(observer)

Adds observer data from observer to data container.

The observer will only be run for the mctrials for which the trajectory has been saved.

The interval of the observer is ignored.

Parameters:observer (BaseObserver) – observer to be used
data

pandas data frame (see pandas.DataFrame)

Return type:DataFrame
ensemble_parameters

parameters associated with Monte Carlo simulation

Return type:dict
get_average(tag, start=None, stop=None)

Returns average of a scalar observable.

Parameters:
  • tag (str) – tag of field over which to average
  • start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
  • stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
Raises:
  • ValueError – if observable is requested that is not in data container
  • ValueError – if observable is not scalar
Return type:

float
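
The role of start and stop can be sketched without mchammer. The snippet below assumes records are plain dictionaries with an 'mctrial' key, that both bounds are inclusive, and that missing values are skipped; all three are assumptions made for illustration, and `windowed_average` is a hypothetical helper rather than part of the library.

```python
def windowed_average(rows, tag, start=None, stop=None):
    """Average rows[i][tag] over start <= mctrial <= stop, skipping missing values."""
    trials = [row['mctrial'] for row in rows]
    # Default to the smallest and largest recorded trial step.
    lo = min(trials) if start is None else start
    hi = max(trials) if stop is None else stop
    values = [row[tag] for row in rows
              if lo <= row['mctrial'] <= hi and row.get(tag) is not None]
    if not values:
        raise ValueError(f'no data for {tag!r} in the requested window')
    return sum(values) / len(values)


rows = [
    {'mctrial': 0, 'potential': -1.0},
    {'mctrial': 10, 'potential': -2.0},
    {'mctrial': 20, 'potential': -3.0},
    {'mctrial': 30, 'potential': -4.0},
]
print(windowed_average(rows, 'potential'))            # all four records
print(windowed_average(rows, 'potential', start=20))  # last two records only
```

Setting start to a value past the equilibration phase is the typical reason to use the window: early records then no longer bias the average.
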

get_data(*tags, start=None, stop=None, interval=1, fill_method='skip_none', apply_to=None)

Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including ‘trajectory’ as a tag.

Parameters:
  • tags – names of the requested properties, passed as positional arguments
  • start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
  • stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
  • interval (int) – increment for mctrial; by default the smallest available interval will be used.
  • fill_method ({'skip_none', 'fill_backward', 'fill_forward', 'linear_interpolate', None}) – method employed for dealing with missing values; by default uses 'skip_none'.
  • apply_to (Optional[List[str]]) – tags of columns for which fill_method will be employed; by default fill_method is applied to all columns.
Raises:
  • ValueError – if tags is empty
  • ValueError – if observables are requested that are not in data container
  • ValueError – if fill method is unknown
  • ValueError – if trajectory is requested and fill method is not skip_none

Examples

The following lines illustrate how to use the get_data method for extracting data from the trajectory:

# obtain a list of all values of the potential represented by
# the cluster expansion along the trajectory
p = dc.get_data('potential')

# as above but this time the MC trial step and the temperature
# are included as well
s, p, t = dc.get_data('mctrial', 'potential', 'temperature')

# obtain configurations along the trajectory along with
# their potential
p, confs = dc.get_data('potential', 'trajectory')
Return type:Union[ndarray, List[Atoms], Tuple[ndarray, List[Atoms]]]
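
The fill methods matter when observables are recorded at different intervals, which leaves gaps in some columns. The following dependency-free sketch shows one plausible reading of three of the options on a single column, using None for a missing value. It is not mchammer's code; `fill_column` is a hypothetical helper, and 'linear_interpolate' is left out for brevity.

```python
def fill_column(values, fill_method='skip_none'):
    """Handle missing entries (None) in a single data column."""
    if fill_method == 'skip_none':
        # Drop records without a value.
        return [v for v in values if v is not None]
    if fill_method == 'fill_forward':
        # Propagate the most recent valid value forward.
        filled, last = [], None
        for v in values:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    if fill_method == 'fill_backward':
        # Use the next valid value: forward fill on the reversed column.
        return fill_column(values[::-1], 'fill_forward')[::-1]
    raise ValueError(f'unknown fill method: {fill_method}')


column = [1.0, None, None, 4.0, None]
print(fill_column(column, 'skip_none'))
print(fill_column(column, 'fill_forward'))
print(fill_column(column, 'fill_backward'))
```

In this sketch a gap at the very end of the column stays unfilled under 'fill_backward', and a gap at the very start would stay unfilled under 'fill_forward', since there is no neighboring value to copy from.
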
get_number_of_entries(tag=None)

Returns the total number of entries with the given observable tag.

Parameters:tag (Optional[str]) – name of observable; by default the total number of rows in the data frame will be returned.
Raises:ValueError – if observable is requested that is not in data container
Return type:int
get_trajectory(start=None, stop=None, interval=1)

Returns trajectory as a list of ASE Atoms objects.

Parameters:
  • start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
  • stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
  • interval (int) – increment for mctrial; by default the smallest available interval will be used.
Return type:

List[Atoms]

last_state

last state to be used to restart Monte Carlo simulation

Return type:Dict[str, Union[int, List[int]]]
metadata

metadata associated with data container

Return type:dict
observables

observable names

Return type:List[str]
static read(infile, old_format=False)

Reads DataContainer object from file.

Parameters:
  • infile (Union[str, BinaryIO, TextIO]) – file from which to read
  • old_format (bool) – if True, use the old JSON format to read runtime data; defaults to False
Raises:
  • FileNotFoundError – if file is not found (str)
  • ValueError – if file is of incorrect type (not a tarball)
reset()

Resets (clears) internal data list of data container.

write(outfile)

Writes DataContainer object to file.

Parameters:outfile (Union[str, BinaryIO, TextIO]) – file to which to write
write_trajectory(outfile)

Writes the configurations along the trajectory to file in ASE trajectory format. The file also includes the respective values of the potential for each configuration. If the file exists, the trajectory will be appended. The ASE convert command can be used to convert the trajectory file to other formats. The ASE gui can be used to visualize the trajectory.

Parameters:outfile (Union[str, BinaryIO, TextIO]) – output file name or file object
Return type:None

Supporting functions

mchammer.data_analysis.analyze_data(data, max_lag=None)

Carries out an extensive analysis of the data series.

Parameters:
  • data (ndarray) – data series to compute autocorrelation function for
  • max_lag (Optional[int]) – maximum lag between two data points, used for computing autocorrelation
Returns:

calculated properties of the data, including mean, standard deviation, correlation length, and a 95% error estimate.

Return type:

dict

mchammer.data_analysis.get_autocorrelation_function(data, max_lag=None)

Returns autocorrelation function.

The autocorrelation function is computed using pandas.Series.autocorr.

Parameters:
  • data (ndarray) – data series to compute autocorrelation function for
  • max_lag (Optional[int]) – maximum lag between two data points
Returns:

calculated autocorrelation function
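
To make the quantity concrete, here is a dependency-free sketch that evaluates, for each lag, the Pearson correlation between the series and a shifted copy of itself, which is what pandas.Series.autocorr computes for a single lag. The range of lags (0 up to max_lag - 1) and the default for max_lag are assumptions, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for a constant segment
    return cov / math.sqrt(var_x * var_y)


def autocorrelation_function(data, max_lag=None):
    """Correlation of the series with itself shifted by lag = 0 .. max_lag - 1."""
    if max_lag is None:
        max_lag = len(data) - 1  # assumed default for this sketch
    return [pearson(data[:len(data) - lag], data[lag:]) for lag in range(max_lag)]


# A strictly alternating series is perfectly anti-correlated at lag 1
# and perfectly correlated at lag 2.
data = [1, -1, 1, -1, 1, -1, 1, -1]
acf = autocorrelation_function(data, max_lag=3)
print(acf)
```

Note that the number of overlapping points shrinks as the lag grows, so values at large lags rest on few samples and are correspondingly noisy.
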

mchammer.data_analysis.get_correlation_length(data)

Returns estimate of the correlation length of data.

The correlation length is taken as the first point where the autocorrelation function is less than exp(-2).

If the autocorrelation function never drops below exp(-2), np.nan is returned.

Parameters:data (ndarray) – data series to compute autocorrelation function for
Returns:correlation length
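
The threshold rule described above can be sketched as follows. To stay self-contained, this hypothetical helper takes a precomputed autocorrelation function whose first entry corresponds to lag 0 (an assumption), rather than the raw data series that get_correlation_length accepts.

```python
import math


def correlation_length_from_acf(acf):
    """Return the first lag at which the ACF falls below exp(-2), else nan."""
    threshold = math.exp(-2)  # roughly 0.135
    for lag, value in enumerate(acf):
        if value < threshold:
            return lag
    return math.nan  # the ACF never dropped below the threshold


decaying = [1.0, 0.6, 0.3, 0.1, 0.05]
slow = [1.0, 0.9, 0.8]
print(correlation_length_from_acf(decaying))
print(correlation_length_from_acf(slow))
```

A nan result usually indicates that the data series is too short relative to the correlation time, so a longer run or a larger max_lag may be needed before the estimate is meaningful.
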
mchammer.data_analysis.get_error_estimate(data, confidence=0.95)

Returns estimate of standard error with confidence interval.

The error is computed as error = t_factor * std(data) / sqrt(Ns), where t_factor is the factor corresponding to the confidence interval and Ns is the number of independent measurements (with correlation taken into account).

Parameters:data (ndarray) – data series to estimate error for
Returns:error estimate
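
The formula can be tried out with the standard library alone. The sketch below makes several assumptions that the text above does not confirm: Ns is taken as the number of data points divided by the correlation length, the population standard deviation is used for std, and the Student t factor is replaced by the normal quantile, which is a reasonable approximation only when Ns is large. The function name and its `correlation_length` argument are hypothetical.

```python
import math
from statistics import NormalDist, pstdev


def error_estimate(data, correlation_length, confidence=0.95):
    """Approximate error = factor * std(data) / sqrt(Ns)."""
    # Assumption: correlated samples count as len(data) / correlation_length
    # independent measurements.
    n_independent = len(data) / correlation_length
    # Assumption: normal quantile used in place of the Student t factor.
    factor = NormalDist().inv_cdf((1 + confidence) / 2)
    return factor * pstdev(data) / math.sqrt(n_independent)


data = [1.0, 3.0] * 50  # 100 points with a standard deviation of 1
weakly_correlated = error_estimate(data, correlation_length=1)
strongly_correlated = error_estimate(data, correlation_length=4)
print(weakly_correlated, strongly_correlated)
```

The comparison illustrates the main point of the correction: for the same data, a longer correlation length means fewer independent measurements and therefore a larger error estimate.
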