Hyper-parameter scans#

Here we consider how different optimization algorithms behave for this system and how their hyper-parameters influence the resulting cluster expansion. We use the cutoffs [8.0, 6.5, 6.0] and consider the following optimization algorithms:

  • ARDR (Automatic Relevance Determination Regression)

  • RFE (Recursive Feature Elimination)

  • LASSO (Least Absolute Shrinkage and Selection Operator)

  • Adaptive-LASSO

From this analysis one can conclude that around 20 to 30 non-zero effective cluster interactions (ECIs) are a good choice with cutoffs [8.0, 6.5, 6.0]. We also note that LASSO performs quite poorly compared to the other methods, whereas the other three methods all yield similar results.

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ase.db import connect
from icet import ClusterSpace, StructureContainer
from trainstation import CrossValidationEstimator

Preparations#

We set up the cluster space and prepare the reference data.

[2]:
# access database with reference data
db = connect('../../tutorial/reference_data.db')

# set up cluster space
primitive_structure = db.get(id=1).toatoms()
cs = ClusterSpace(
    structure=primitive_structure,
    cutoffs=[8.0, 6.5, 6.0],
    chemical_symbols=['Ag', 'Pd']
)

# compile fit data
sc = StructureContainer(cluster_space=cs)
for row in db.select():
    sc.add_structure(
        structure=row.toatoms(),
        user_tag=row.tag,
        properties={'mixing_energy': row.mixing_energy},
    )
A, y = sc.get_fit_data(key='mixing_energy')

Furthermore, we define a convenience function that extracts the pertinent information from a CrossValidationEstimator object.

[3]:
def get_row(cve: CrossValidationEstimator) -> dict:
    """Collect error metrics and parameter counts from an estimator."""
    row = dict()
    row['rmse_validation'] = cve.rmse_validation
    row['rmse_train'] = cve.rmse_train
    row['BIC'] = cve.model.BIC
    row['n_parameters'] = cve.n_parameters
    row['n_nonzero_parameters'] = cve.n_nonzero_parameters
    return row
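
The BIC stored in row['BIC'] rewards a good fit while penalizing the number of parameters. As a rough illustration of that trade-off, the sketch below uses the common Gaussian-error form n ln(RSS/n) + k ln(n). This form is an assumption made for illustration; the convention used by trainstation may differ by constant terms or in how k is counted. The function name bic_gaussian is hypothetical.

```python
import math


def bic_gaussian(rss: float, n_samples: int, n_parameters: int) -> float:
    """BIC of a linear model with Gaussian errors (illustrative convention)."""
    # goodness-of-fit term: decreases as the residual sum of squares shrinks
    fit_term = n_samples * math.log(rss / n_samples)
    # complexity term: grows with the number of (nonzero) parameters
    penalty_term = n_parameters * math.log(n_samples)
    return fit_term + penalty_term


# same fit quality, more parameters -> higher (worse) BIC
bic_sparse = bic_gaussian(rss=1.0, n_samples=100, n_parameters=5)
bic_dense = bic_gaussian(rss=1.0, n_samples=100, n_parameters=50)
```

With this convention a lower BIC is better, so a model only benefits from extra parameters if they reduce the residual enough to offset the penalty.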

ARDR: Hyperparameter scan#

ARDR has the hyper-parameter threshold_lambda, which controls the sparsity of the solution.

[4]:
lambda_values = [250, 500, 1000, 1400, 2000, 2500, 4500, 7500,
                 13000, 18000, 25000, 40000, 60000, 90000, 200000]
records = []
for lam in lambda_values:
    cve = CrossValidationEstimator((A, y), fit_method='ardr', threshold_lambda=lam)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['threshold_lambda'] = lam
    records.append(row)
df_ardr = pd.DataFrame(records)
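
To see why threshold_lambda controls sparsity, recall that ARDR assigns each parameter its own precision (the inverse variance of its prior). In scikit-learn's ARDRegression, parameters whose precision reaches threshold_lambda are set to zero. The sketch below mimics only this pruning rule, assuming that fit_method='ardr' behaves the same way. The helper retained_parameters and the precision values are hypothetical and not taken from the fit above.

```python
def retained_parameters(precisions, threshold_lambda):
    """Return indices of parameters whose precision is below the threshold."""
    return [i for i, lam in enumerate(precisions) if lam < threshold_lambda]


# made-up precisions for five parameters (illustration only)
toy_precisions = [3.0, 800.0, 5.0e4, 12.0, 2.0e5]

retained = {}
for toy_threshold in [250, 1000, 90000]:
    retained[toy_threshold] = retained_parameters(toy_precisions, toy_threshold)
    print(toy_threshold, retained[toy_threshold])
```

Under this rule a larger threshold_lambda prunes fewer parameters, which is why the scan above spans several orders of magnitude to cover both sparse and dense solutions.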

RFE: Hyperparameter scan#

RFE has the hyper-parameter n_features, which sets the number of features retained and thereby controls the sparsity of the solution.

[5]:
nf_values = np.arange(10, len(cs), 4)
records = []
for nf in nf_values:
    cve = CrossValidationEstimator((A, y), fit_method='rfe', n_features=nf)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['n_features'] = nf  # record the scanned value, as in the other scans
    records.append(row)
df_rfe = pd.DataFrame(records)
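
The idea behind recursive feature elimination can be sketched in a few lines: fit all active features, drop the one with the smallest absolute coefficient, and repeat until n_features remain. The version below is a minimal pure-Python illustration using ordinary least squares on synthetic data. It is not the implementation used by trainstation, and the function names as well as the toy data are hypothetical.

```python
import random


def solve_least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    n = len(X[0])
    # augmented matrix [X^T X | X^T y]
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(n)]
        + [sum(row[i] * target for row, target in zip(X, y))]
        for i in range(n)
    ]
    for col in range(n):
        # partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def recursive_feature_elimination(X, y, n_features):
    """Return the indices of the n_features columns that survive elimination."""
    active = list(range(len(X[0])))
    while len(active) > n_features:
        X_active = [[row[j] for j in active] for row in X]
        coefs = solve_least_squares(X_active, y)
        # remove the feature with the smallest absolute coefficient
        weakest = min(range(len(active)), key=lambda i: abs(coefs[i]))
        del active[weakest]
    return active


# synthetic data: only columns 0 and 3 contribute to the target
toy_rng = random.Random(0)
X_toy = [[toy_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(30)]
y_toy = [2.0 * row[0] - 3.0 * row[3] for row in X_toy]

selected = recursive_feature_elimination(X_toy, y_toy, n_features=2)
```

Because n_features is set directly, the number of nonzero parameters in the RFE scan follows the scanned values, unlike the other methods where sparsity is controlled indirectly.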

LASSO: Hyperparameter scan#

LASSO has the hyper-parameter alpha, which sets the strength of the L1 penalty and thereby controls the sparsity of the solution.

[6]:
alpha_values = np.logspace(-5, -0.5, 20)
records = []
for alpha in alpha_values:
    cve = CrossValidationEstimator((A, y), max_iter=50000, fit_method='lasso', alpha=alpha)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['alpha'] = alpha
    records.append(row)
df_lasso = pd.DataFrame(records)
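
The effect of alpha on sparsity is easiest to see in the idealized case of an orthonormal design matrix. There, minimizing (1/2)|y - Xb|^2 + alpha |b|_1 amounts to soft-thresholding each least-squares coefficient by alpha. The sketch below shows this special case only. The fit matrix A above is not orthonormal, and the coefficient values are made up for illustration.

```python
import math


def soft_threshold(value: float, alpha: float) -> float:
    """Shrink value toward zero by alpha; anything within [-alpha, alpha] becomes zero."""
    return math.copysign(max(abs(value) - alpha, 0.0), value)


# made-up least-squares coefficients (illustration only)
toy_ols_coefficients = [3.0, -0.5, 0.2, -2.0]

n_nonzero = {}
for toy_alpha in [0.1, 1.0, 2.5]:
    shrunk = [soft_threshold(c, toy_alpha) for c in toy_ols_coefficients]
    # count the coefficients that survive the threshold
    n_nonzero[toy_alpha] = sum(c != 0.0 for c in shrunk)
```

Note that every surviving coefficient is also reduced in magnitude by alpha, so stronger sparsity comes at the price of a larger bias on the parameters that remain.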

Adaptive-LASSO: Hyperparameter scan#

Adaptive-LASSO likewise has the hyper-parameter alpha, which controls the sparsity of the solution.

[7]:
alpha_values = np.logspace(-5, -1.5, 20)
records = []
for alpha in alpha_values:
    cve = CrossValidationEstimator((A, y), max_iter=50000,
                                   fit_method='adaptive-lasso', alpha=alpha)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['alpha'] = alpha
    records.append(row)
df_adlasso = pd.DataFrame(records)
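
Adaptive-LASSO differs from LASSO in that each parameter receives its own penalty weight, typically the inverse of the magnitude of an initial estimate, so that large coefficients are penalized less than small ones. The sketch below illustrates this for the idealized orthonormal case with weights 1/|b|^gamma. The weighting scheme and the default gamma are assumptions made for illustration and are not taken from the trainstation implementation; the coefficient values are made up.

```python
import math


def adaptive_soft_threshold(value: float, alpha: float, gamma: float = 1.0) -> float:
    """Soft-threshold value with a penalty scaled by 1 / |value|**gamma."""
    if value == 0.0:
        return 0.0  # a vanishing initial estimate stays at zero
    weighted_alpha = alpha / abs(value) ** gamma
    return math.copysign(max(abs(value) - weighted_alpha, 0.0), value)


# made-up initial (least-squares) coefficients (illustration only)
toy_initial_coefficients = [3.0, -0.5, 0.2, -2.0]

adaptive_coefficients = [
    adaptive_soft_threshold(c, alpha=0.5) for c in toy_initial_coefficients
]
```

In this toy case the two largest coefficients are shrunk by about 0.17 and 0.25, whereas a uniform threshold of 0.5 would shrink both by 0.5; the two small coefficients are removed in either case.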

Plot results#

[8]:
fig, axes = plt.subplots(
    figsize=(4, 4.5),
    dpi=120,
    sharex=True,
    nrows=2,
)

ax = axes[0]
conv = 1e3
ax.set_ylabel('CV-RMSE (meV/atom)')
ax.plot(df_ardr.n_nonzero_parameters, conv * df_ardr.rmse_validation,
        '-o', label='ARDR')
ax.plot(df_rfe.n_nonzero_parameters, conv * df_rfe.rmse_validation,
        '-s', label='RFE')
ax.plot(df_lasso.n_nonzero_parameters, conv * df_lasso.rmse_validation,
        '-x', label='LASSO')
ax.plot(df_adlasso.n_nonzero_parameters, conv * df_adlasso.rmse_validation,
        '-v', label='ad-LASSO')
ax.legend()

ax = axes[1]
conv = 1e-3
ax.set_ylabel(r'BIC ($\times 10^{3}$)')
ax.plot(df_ardr.n_nonzero_parameters, conv * df_ardr.BIC, '-o', label='ARDR')
ax.plot(df_rfe.n_nonzero_parameters, conv * df_rfe.BIC, '-s', label='RFE')
ax.plot(df_lasso.n_nonzero_parameters, conv * df_lasso.BIC, '-x', label='LASSO')
ax.plot(df_adlasso.n_nonzero_parameters, conv * df_adlasso.BIC, '-v', label='ad-LASSO')

axes[-1].set_xlabel('Number of nonzero parameters')

fig.tight_layout()
fig.subplots_adjust(hspace=0)
fig.align_ylabels(axes)
[Figure: CV-RMSE (meV/atom, top) and BIC (×10³, bottom) as a function of the number of nonzero parameters for ARDR, RFE, LASSO, and ad-LASSO.]
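
Besides inspecting the plot, one can pick a hyper-parameter value programmatically. The sketch below selects the record with the lowest CV-RMSE among those whose number of nonzero parameters falls within a chosen window, for example the 20 to 30 range mentioned at the top. The helper best_record is hypothetical and the numbers in toy_records are made up; they are not results from the scans above.

```python
def best_record(records, n_min, n_max, key='rmse_validation'):
    """Return the record with the lowest `key` within the sparsity window."""
    candidates = [
        r for r in records if n_min <= r['n_nonzero_parameters'] <= n_max
    ]
    if not candidates:
        raise ValueError('no record within the requested sparsity window')
    return min(candidates, key=lambda r: r[key])


# made-up scan results (illustration only)
toy_records = [
    {'alpha': 1e-2, 'n_nonzero_parameters': 8, 'rmse_validation': 0.0040},
    {'alpha': 1e-3, 'n_nonzero_parameters': 22, 'rmse_validation': 0.0021},
    {'alpha': 5e-4, 'n_nonzero_parameters': 28, 'rmse_validation': 0.0019},
    {'alpha': 1e-5, 'n_nonzero_parameters': 60, 'rmse_validation': 0.0018},
]

best = best_record(toy_records, n_min=20, n_max=30)
```

The same helper can be applied to any of the scans by converting the corresponding DataFrame to a list of dictionaries, e.g., best_record(df_ardr.to_dict('records'), n_min=20, n_max=30).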