Hyper-parameter scans

Here we consider how different optimization algorithms behave (for this system) and how their hyper parameters influence the resulting cluster expansion. We use the cutoffs [8.0, 6.5, 6.0] and consider the following optimization algorithms

  • ARDR (Automatic Relevance Determination Regression)

  • RFE (Recursive Feature Elimination)

  • LASSO (Least Absolute Shrinkage and Selection Operator)

  • Adaptive-LASSO

From the analyis above one can conclude that around 20 to 30 non-zero ECIs are a good choice with cutoffs [8.0, 6.5, 6.0]. We also note here that LASSO performs quite poorly compared to the other methods, whereas the other three methods all yield similar results.

[1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from ase.db import connect
from icet import ClusterSpace, StructureContainer
from trainstation import CrossValidationEstimator

Preparations

We set up the cluster space and prepare the reference data.

[2]:
# access database with reference data
db = connect('../../tutorial/reference_data.db')

# set up cluster space
primitive_structure = db.get(id=1).toatoms()
cs = ClusterSpace(
    structure=primitive_structure,
    cutoffs=[8.0, 6.5, 6.0],
    chemical_symbols=['Ag', 'Pd']
)

# compile fit data
sc = StructureContainer(cluster_space=cs)
for row in db.select():
    sc.add_structure(
        structure=row.toatoms(),
        user_tag=row.tag,
        properties={'mixing_energy': row.mixing_energy},
    )
A, y = sc.get_fit_data(key='mixing_energy')

Furthermore we define a convenience function that extracts pertinent information from a CrossValidationEstimator object.

[3]:
def get_row(cve: CrossValidationEstimator):
    row = dict()
    row['rmse_validation'] = cve.rmse_validation
    row['rmse_train'] = cve.rmse_train
    row['BIC'] = cve.model.BIC
    row['n_parameters'] = cve.n_parameters
    row['n_nonzero_parameters'] = cve.n_nonzero_parameters
    return row

ARDR: Hyperparameter scan

ARDR has the hyper-parameter threshold-lambda which controls the sparsity of the solution.

[4]:
lambda_values = [250, 500, 1000, 1400, 2000, 2500, 4500, 7500,
                 13000, 18000, 25000, 40000, 60000, 90000, 200000]
records = []
for lam in lambda_values:
    cve = CrossValidationEstimator((A, y), fit_method='ardr', threshold_lambda=lam)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['threshold_lambda'] = lam
    records.append(row)
df_ardr = pd.DataFrame(records)

RFE: Hyperparameter scan

RFE has the hyper-parameter n_features which controls the sparsity of the solution.

[5]:
nf_values = np.arange(10, len(cs), 4)
records = []
for nf in nf_values:
    cve = CrossValidationEstimator((A, y), fit_method='rfe', n_features=nf)
    cve.validate()
    cve.train()
    row = get_row(cve)
    records.append(row)
df_rfe = pd.DataFrame(records)

LASSO: Hyperparameter scan

[6]:
alpha_values = np.logspace(-5, -0.5, 20)
records = []
for alpha in alpha_values:
    cve = CrossValidationEstimator((A, y), max_iter=50000, fit_method='lasso', alpha=alpha)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['alpha'] = alpha
    records.append(row)
df_lasso = pd.DataFrame(records)

Adaptive-LASSO: Hyperparameter scan

[7]:
alpha_values = np.logspace(-5, -1.5, 20)
records = []
for alpha in alpha_values:
    cve = CrossValidationEstimator((A, y), max_iter=50000,
                                   fit_method='adaptive-lasso', alpha=alpha)
    cve.validate()
    cve.train()
    row = get_row(cve)
    row['alpha'] = alpha
    records.append(row)
df_adlasso = pd.DataFrame(records)

Plot results

[8]:
fig, axes = plt.subplots(
    figsize=(4, 4.5),
    dpi=120,
    sharex=True,
    nrows=2,
)

ax = axes[0]
conv = 1e3
ax.set_ylabel('CV-RMSE (meV/atom)')
ax.plot(df_ardr.n_nonzero_parameters, conv * df_ardr.rmse_validation,
        '-o', label='ARDR')
ax.plot(df_rfe.n_nonzero_parameters, conv * df_rfe.rmse_validation,
        '-s', label='RFE')
ax.plot(df_lasso.n_nonzero_parameters, conv * df_lasso.rmse_validation,
        '-x', label='LASSO')
ax.plot(df_adlasso.n_nonzero_parameters, conv * df_adlasso.rmse_validation,
        '-v', label='ad-LASSO')
ax.legend()

ax = axes[1]
conv = 1e-3
ax.set_ylabel(r'BIC ($\times 10^{3}$)')
ax.plot(df_ardr.n_nonzero_parameters, conv * df_ardr.BIC, '-o', label='ARDR')
ax.plot(df_rfe.n_nonzero_parameters, conv * df_rfe.BIC, '-s', label='RFE')
ax.plot(df_lasso.n_nonzero_parameters, conv * df_lasso.BIC, '-x', label='LASSO')
ax.plot(df_adlasso.n_nonzero_parameters, conv * df_adlasso.BIC, '-v', label='ad-LASSO')

axes[-1].set_xlabel('Number of nonzero parameters')

fig.tight_layout()
fig.subplots_adjust(hspace=0)
fig.align_ylabels(axes)
../_images/advanced_topics_training_hyper_parameter_scans_16_0.png