Data container

class mchammer.DataContainer(structure, ensemble_parameters, metadata={})
Data container for storing information about Monte Carlo simulations performed with mchammer.
Parameters:  structure (ASE Atoms object) – reference atomic structure associated with the data container
 ensemble_parameters (dict) – parameters associated with the underlying ensemble
 metadata (dict) – metadata associated with the data container

analyze_data(tag, start=None, stop=None, max_lag=None)
Returns a detailed analysis of a scalar observable.
Parameters:  tag (str) – tag of field over which to average
 start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
 max_lag (Optional[int]) – maximum lag between two points in the data series, used for computing the autocorrelation; by default the length of the data series will be used.
Raises: ValueError – if an observable is requested that is not in the data container
 ValueError – if the observable is not scalar
 ValueError – if the observations are not evenly spaced
Returns: calculated properties of the data including mean, standard_deviation, correlation_length and error_estimate (95% confidence)
Return type: dict

append(mctrial, record)
Appends data to data container.
Parameters:  mctrial (int) – current Monte Carlo trial step
 record (Dict[str, Union[int, float, list]]) – dictionary of tag-value pairs representing observations
Raises: TypeError – if input parameters have the wrong type

apply_observer(observer)
Adds observer data from observer to data container.
The observer will only be run for the mctrials for which the trajectory has been saved.
The interval of the observer is ignored.
Parameters: observer (BaseObserver) – observer to be used

data
pandas data frame (see pandas.DataFrame)
Return type: DataFrame

ensemble_parameters
parameters associated with Monte Carlo simulation
Return type: dict

get_average(tag, start=None, stop=None)
Returns average of a scalar observable.
Parameters:  tag (str) – tag of field over which to average
 start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
Raises: ValueError – if an observable is requested that is not in the data container
 ValueError – if the observable is not scalar
Return type: float
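The effect of start and stop can be illustrated without mchammer. The following is a minimal sketch, assuming that both bounds are inclusive and that missing values (NaN) are skipped; the function name windowed_average and the sample data are hypothetical and not part of the library.

```python
import numpy as np


def windowed_average(mctrial, values, start=None, stop=None):
    """Average `values` over rows whose trial step lies in [start, stop].

    Assumption: bounds are inclusive and NaN entries are ignored.
    """
    mctrial = np.asarray(mctrial)
    values = np.asarray(values, dtype=float)
    # Default to the smallest and largest recorded trial step
    start = mctrial.min() if start is None else start
    stop = mctrial.max() if stop is None else stop
    mask = (mctrial >= start) & (mctrial <= stop) & ~np.isnan(values)
    return float(values[mask].mean())


# Hypothetical data: potential recorded every 10 trial steps
steps = [0, 10, 20, 30, 40]
potential = [-1.0, -1.2, -1.4, -1.6, -1.8]

avg_all = windowed_average(steps, potential)             # all rows
avg_tail = windowed_average(steps, potential, start=20)  # discard early steps
```

Raising start is a common way of excluding the equilibration phase of a run from the average.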

get_data(*tags, start=None, stop=None, interval=1, fill_method='skip_none', apply_to=None)
Returns the accumulated data for the requested observables, including configurations stored in the data container. The latter can be achieved by including ‘trajectory’ as a tag.
Parameters:  tags – names of the requested properties
 start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
 interval (int) – increment for mctrial; by default the smallest available interval will be used.
 fill_method ({'skip_none', 'fill_backward', 'fill_forward', 'linear_interpolate', None}) – method employed for dealing with missing values; by default uses ‘skip_none’.
 apply_to (Optional[List[str]]) – tags of columns for which fill_method will be employed; by default fill_method is applied to all columns.
Raises: ValueError – if tags is empty
 ValueError – if observables are requested that are not in the data container
 ValueError – if the fill method is unknown
 ValueError – if trajectory is requested and the fill method is not skip_none
Examples
The following lines illustrate how to use the get_data method for extracting data from the trajectory:
    # obtain a list of all values of the potential represented by
    # the cluster expansion along the trajectory
    p = dc.get_data('potential')

    # as above but this time the MC trial step and the temperature
    # are included as well
    s, p, t = dc.get_data('mctrial', 'potential', 'temperature')

    # obtain configurations along the trajectory along with
    # their potential
    p, confs = dc.get_data('potential', 'trajectory')
Return type: Union[ndarray, List[Atoms], Tuple[ndarray, List[Atoms]]]
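The fill_method options deal with rows in which an observable was not recorded. The sketch below shows one plausible reading of three of the option names on a plain list with None for missing entries. The semantics are inferred from the names only; the helper functions are hypothetical and the library's actual behaviour (for example at the edges of the series) may differ.

```python
def skip_none(values):
    """Drop missing entries altogether."""
    return [v for v in values if v is not None]


def fill_forward(values):
    """Replace each missing entry with the most recent preceding value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_backward(values):
    """Replace each missing entry with the next following value."""
    return fill_forward(values[::-1])[::-1]


# Hypothetical column: observable recorded less often than mctrial
column = [1.0, None, None, 4.0, None]

skipped = skip_none(column)       # [1.0, 4.0]
forward = fill_forward(column)    # [1.0, 1.0, 1.0, 4.0, 4.0]
backward = fill_backward(column)  # [1.0, 4.0, 4.0, 4.0, None]
```

Note that skipping changes the length of the column while the fill variants preserve it, which matters when several observables are requested together.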

get_number_of_entries(tag=None)
Returns the total number of entries with the given observable tag.
Parameters: tag (Optional[str]) – name of observable; by default the total number of rows in the data frame will be returned.
Raises: ValueError – if observable is requested that is not in data container
Return type: int

get_trajectory(start=None, stop=None, interval=1)
Returns trajectory as a list of ASE Atoms objects.
Parameters:  start (Optional[int]) – minimum value of trial step to consider; by default the smallest value in the mctrial column will be used.
 stop (Optional[int]) – maximum value of trial step to consider; by default the largest value in the mctrial column will be used.
 interval (int) – increment for mctrial; by default the smallest available interval will be used.
Return type: List[Atoms]

last_state
last state to be used to restart Monte Carlo simulation
Return type: Dict[str, Union[int, List[int]]]

metadata
metadata associated with data container
Return type: dict

observables
observable names
Return type: List[str]

static read(infile, old_format=False)
Reads DataContainer object from file.
Parameters:  infile (Union[str, BinaryIO, TextIO]) – file from which to read
 old_format (bool) – if true, use the old JSON format to read runtime data; defaults to false
Raises: FileNotFoundError – if file is not found (str)
 ValueError – if file is of incorrect type (not a tarball)

write(outfile)
Writes DataContainer object to file.
Parameters: outfile (Union[str, BinaryIO, TextIO]) – file to which to write

write_trajectory(outfile)
Writes the configurations along the trajectory to file in ASE trajectory format. The file also includes the respective values of the potential for each configuration. If the file exists, the trajectory will be appended. The ASE convert command can be used to convert the trajectory file to other formats. The ASE gui can be used to visualize the trajectory.
Parameters: outfile (Union[str, BinaryIO, TextIO]) – output file name or file object
Return type: None
Supporting functions

mchammer.data_analysis.analyze_data(data, max_lag=None)
Carries out an extensive analysis of the data series.
Parameters:  data (ndarray) – data series to analyze
 max_lag (Optional[int]) – maximum lag between two data points, used for computing the autocorrelation
Returns: calculated properties of the data including mean, standard deviation, correlation length and a 95% error estimate
Return type: dict

mchammer.data_analysis.get_autocorrelation_function(data, max_lag=None)
Returns autocorrelation function.
The autocorrelation function is computed using pandas.Series.autocorr.
Parameters:  data (ndarray) – data series to compute autocorrelation function for
 max_lag (Optional[int]) – maximum lag between two data points
Returns: calculated autocorrelation function
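To make the quantity concrete, the following sketch evaluates an autocorrelation function with NumPy only. pandas.Series.autocorr returns the Pearson correlation between a series and a lagged copy of itself, which is reproduced here with np.corrcoef. The function name, the range of lags and the default for max_lag are assumptions made for illustration and do not necessarily match the library.

```python
import numpy as np


def autocorrelation_function(data, max_lag=None):
    """Pearson correlation between `data` and its lagged copy.

    Assumption: lags 0, 1, ..., max_lag - 1 are evaluated.
    """
    data = np.asarray(data, dtype=float)
    if max_lag is None:
        max_lag = len(data) - 1  # keep at least two overlapping points
    acf = []
    for lag in range(max_lag):
        if lag == 0:
            acf.append(1.0)  # a series is perfectly correlated with itself
        else:
            # correlate the overlapping parts of the series and its shift
            acf.append(np.corrcoef(data[:-lag], data[lag:])[0, 1])
    return np.array(acf)


# An alternating series is anti-correlated at odd lags
# and correlated at even lags.
series = np.array([1.0, -1.0] * 4)
acf = autocorrelation_function(series, max_lag=3)
```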

mchammer.data_analysis.get_correlation_length(data)
Returns estimate of the correlation length of data.
The correlation length is taken as the first point where the autocorrelation function is less than exp(-2).
If the autocorrelation function never goes below exp(-2), np.nan is returned.
Parameters: data (ndarray) – data series to compute autocorrelation function for
Returns: correlation length
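The threshold criterion can be sketched as follows. A normalized autocorrelation cannot exceed 1, so the threshold is taken here to be exp(-2) ≈ 0.135. The function name and the range of lags that are scanned are assumptions; this is an illustration of the criterion, not the library's implementation.

```python
import numpy as np


def correlation_length(data, threshold=np.exp(-2)):
    """First lag at which the autocorrelation drops below `threshold`.

    Returns np.nan if the autocorrelation never drops below it.
    """
    data = np.asarray(data, dtype=float)
    # scan lags that leave at least two overlapping data points
    for lag in range(1, len(data) - 1):
        r = np.corrcoef(data[:-lag], data[lag:])[0, 1]
        if r < threshold:
            return lag
    return np.nan


# Alternating series: already anti-correlated at lag 1
short = correlation_length(np.array([1.0, -1.0] * 4))

# Linear ramp: perfectly correlated with every shifted copy of itself
never = correlation_length(np.arange(20.0))
```

The second case shows why a caller has to be prepared for np.nan: strongly trending or too short data series may never decorrelate within the available lags.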

mchammer.data_analysis.get_error_estimate(data, confidence=0.95)
Returns estimate of standard error with confidence interval.
error = t_factor * std(data) / sqrt(Ns)
where t_factor is the factor corresponding to the confidence interval and Ns is the number of independent measurements (with correlation taken into account).
Parameters: data (ndarray) – data series to estimate error for
Returns: error estimate
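The formula can be worked through with a small example. In the sketch below, two points are assumptions and not taken from the library: Ns is taken as the length of the series divided by the correlation length, and the standard normal quantile (from the Python standard library) is used as a large-sample stand-in for the Student t factor. The function name error_estimate is hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def error_estimate(data, correlation_length, confidence=0.95):
    """Evaluate error = t_factor * std(data) / sqrt(Ns).

    Assumptions: Ns = len(data) / correlation_length, and the normal
    quantile approximates the Student t factor for long series.
    """
    data = np.asarray(data, dtype=float)
    # two-sided quantile, e.g. about 1.96 for 95% confidence
    t_factor = NormalDist().inv_cdf((1 + confidence) / 2)
    n_independent = len(data) / correlation_length
    return t_factor * np.std(data) / math.sqrt(n_independent)


data = np.arange(100.0)
err_weak = error_estimate(data, correlation_length=1)
err_strong = error_estimate(data, correlation_length=4)
```

Stronger correlation leaves fewer independent measurements, so for the same data err_strong is twice as large as err_weak.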