matcalc.benchmark module

This module implements classes for running benchmarks on materials properties.

class Benchmark(benchmark_name: str | Path, properties: Sequence[str], index_name: str, other_fields: tuple = (), property_rename_map: dict[str, str] | None = None, suffix_ground_truth: str = 'DFT', n_samples: int | None = None, seed: int = 42, **kwargs)[source]

Bases: object

Abstract base for property benchmarks against published datasets.

Loads benchmark JSON, builds per-row ground-truth records and structure list, and runs a PropCalc over structures (with optional checkpointing).

properties[source]

Target property keys to compare.

other_fields[source]

Extra benchmark fields copied into each row.

index_name[source]

Primary key column (e.g. material id).

structures[source]

Pymatgen structures in benchmark order.

kwargs[source]

Kwargs forwarded to get_prop_calc.

property_rename_map[source]

Optional column renames on the results frame.

ground_truth[source]

List of dict rows (converted to DataFrame in run).

Parameters:
  • benchmark_name – Remote benchmark filename or local path to JSON.

  • properties – Keys to pull from each benchmark entry as targets.

  • index_name – Row id field name.

  • other_fields – Extra fields to copy into each ground-truth row.

  • property_rename_map – Renames applied to result columns.

  • suffix_ground_truth – Suffix for reference columns (e.g. _DFT).

  • n_samples – Random subsample size, or None for all.

  • seed – RNG seed when subsampling.

  • **kwargs – Forwarded to get_prop_calc.

Raises:
  • FileNotFoundError – If benchmark_name is a missing local path.

  • ValueError – If entries lack required fields.
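
The n_samples and seed parameters can be pictured with the minimal sketch below. It assumes subsampling is a seeded draw without replacement; the helper name subsample is hypothetical and is not part of matcalc, and the library's actual sampling routine may differ.

```python
from __future__ import annotations

import random


def subsample(entries: list, n_samples: int | None, seed: int = 42) -> list:
    """Hypothetical helper: return all entries, or a reproducible random subset."""
    if n_samples is None:
        # None means "use the whole benchmark".
        return list(entries)
    # A dedicated Random instance keeps the draw independent of global RNG state.
    rng = random.Random(seed)
    return rng.sample(entries, n_samples)


entries = [{"mp_id": f"mp-{i}"} for i in range(10)]
first = subsample(entries, n_samples=3, seed=42)
second = subsample(entries, n_samples=3, seed=42)
```

With the same seed, repeated calls select the same rows, which keeps model comparisons on identical subsets.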

abstractmethod get_prop_calc(calculator: str | Calculator, **kwargs: Any) → PropCalc[source]

Abstract method to retrieve a property calculation object using the provided calculator and additional parameters. This method must be implemented by subclasses and will utilize the provided calculator to create a PropCalc instance, possibly influenced by additional keyword arguments.

Parameters:
  • calculator – ASE calculator or model name string.

  • **kwargs – Merged with self.kwargs in run (subclass dependent).

Returns:

Configured PropCalc instance.

abstractmethod process_result(result: dict | None, model_name: str) → dict[source]

Implements post-processing of results. A default implementation is provided that simply appends the model name as a suffix to the key of the input dictionary for all properties. Subclasses can override this method to provide more sophisticated processing.

Parameters:
  • result – Output dict from PropCalc.calc, or None on failure.

  • model_name – Suffix tag for predicted columns.

Returns:

Flat dict mapping {prop}_{model_name} to values (or None).
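
The default suffixing behaviour described above can be sketched as follows. This is an illustration only: suffix_result is a hypothetical stand-in, and passing the property list as an explicit argument is an assumption made to keep the example self-contained.

```python
from __future__ import annotations


def suffix_result(result: dict | None, properties: list[str], model_name: str) -> dict:
    """Hypothetical stand-in for the default post-processing step."""
    if result is None:
        # A failed calculation yields None for every requested property.
        return {f"{prop}_{model_name}": None for prop in properties}
    # Keep only the requested properties and tag each key with the model name.
    return {f"{prop}_{model_name}": result.get(prop) for prop in properties}


row = suffix_result({"bulk_modulus_vrh": 1.2, "extra": 0}, ["bulk_modulus_vrh"], "M3GNet")
failed = suffix_result(None, ["bulk_modulus_vrh"], "M3GNet")
```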

run(calculator: str | Calculator, model_name: str, *, n_jobs: None | int = -1, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True, include_full_results: bool = False, **kwargs) → pd.DataFrame[source]

Processes a collection of structures using a calculator, saves intermittent checkpoints, and returns the results in a DataFrame. This function supports parallel computation and allows for error tolerance during processing.

The function also retrieves a property calculator and utilizes it to calculate desired results for the given set of structures. Checkpoints are saved periodically based on the specified frequency, ensuring that progress is not lost in case of interruptions.

Parameters:
  • calculator – ASE calculator or universal model name.

  • model_name – Label appended to predicted property columns.

  • n_jobs – joblib parallelism for calc_many (-1 = all cores).

  • checkpoint_file – Optional path to resume/save partial results.

  • checkpoint_freq – Save checkpoint every this many completed structures.

  • delete_checkpoint_on_finish – Remove checkpoint file after success.

  • include_full_results – Keep all calc keys, not only properties.

  • **kwargs – Forwarded to calc_many / calculator.

Returns:

DataFrame of ground-truth rows plus model predictions.
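
The resume-and-checkpoint flow can be sketched as below. This is a simplified, serial illustration under stated assumptions: run_with_checkpoints is a hypothetical function, plain JSON stands in for the checkpoint format, and the parallel calc_many step is replaced by a simple loop.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def run_with_checkpoints(
    items: list,
    calc: Callable,
    checkpoint_file: Path,
    checkpoint_freq: int = 1000,
    delete_checkpoint_on_finish: bool = True,
) -> list:
    """Hypothetical serial loop that resumes from and writes a JSON checkpoint."""
    # Resume from previously saved rows, if a checkpoint exists.
    results = json.loads(checkpoint_file.read_text()) if checkpoint_file.exists() else []
    # Only the items that were not yet processed are calculated.
    for item in items[len(results):]:
        results.append(calc(item))
        if len(results) % checkpoint_freq == 0:
            checkpoint_file.write_text(json.dumps(results))
    if delete_checkpoint_on_finish and checkpoint_file.exists():
        checkpoint_file.unlink()
    return results


calls = []


def square(value: int) -> int:
    calls.append(value)  # record which items were actually computed
    return value * value


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "ckpt.json"
    # Pretend an earlier run finished the first two items before being interrupted.
    ckpt.write_text(json.dumps([0, 1]))
    out = run_with_checkpoints([0, 1, 2, 3], square, ckpt, checkpoint_freq=2)
    ckpt_removed = not ckpt.exists()
```

Only the last two items are recomputed here, because the first two rows are restored from the checkpoint.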

class BenchmarkSuite(benchmarks: list)[source]

Bases: object

Represents a suite for handling and executing a list of benchmarks. This class is designed for the comprehensive execution and management of benchmarks with support for configurable parallel computation and checkpointing.

The purpose of this class is to facilitate the execution of multiple benchmarks using various computational models (calculators) while enabling efficient resource utilization and result aggregation. It supports checkpointing to handle long computations reliably.

benchmarks[source]

Sequence of Benchmark (or compatible) instances.

Parameters:

benchmarks – Benchmark objects to run in sequence.

run(calculators: dict[str, Calculator], *, n_jobs: int | None = -1, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True) → list[pd.DataFrame][source]

Executes benchmarks using the provided calculators and combines the results into a list of dataframes. Each benchmark runs for all models provided by calculators, collecting individual results and joining columns without duplicate reference fields.

Parameters:
  • calculators – Map of model label to ASE calculator.

  • n_jobs – Parallelism forwarded to each benchmark.run.

  • checkpoint_freq – Checkpoint interval per benchmark run.

  • delete_checkpoint_on_finish – Remove per-model checkpoint files when done.

Returns:

One combined DataFrame per benchmark (joined across models).

class CheckpointFile(path: str | Path)[source]

Bases: object

Utility for managing a checkpoint file.

This class provides mechanisms to manage and process a file path and its associated actions such as loading and saving data. It ensures standardized path handling through the use of Path objects, enables loading checkpoint data from a file, and facilitates the saving of resulting data.

path[source]

Checkpoint file path as pathlib.Path.

On initialization, the provided path is converted into a Path object for standardized path handling.

Parameters:

path – Filesystem path as str or Path.

load(*args: list) → tuple[source]

Loads checkpoint data from a specified path if it exists, returning the loaded entries along with remaining portions of the given input arguments.

The method checks if the file path exists, and if so, it loads data from the specified file using a predefined loadfn function. It logs the number of loaded entries and returns the successfully loaded entries alongside sliced input arguments based on the number of loaded entries. If the file path does not exist, it returns empty results and the original input arguments unchanged.

Parameters:

*args – Additional list arguments aligned with checkpoint rows (sliced after load).

Returns:

Tuple (loaded_rows, *tail_slices); loaded_rows is empty if missing file.
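
The return shape of load can be illustrated with the sketch below. It is not the library implementation: load_checkpoint is a hypothetical function, and json.loads stands in for the loadfn call mentioned above.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def load_checkpoint(path: Path, *args: list) -> tuple:
    """Hypothetical equivalent of load: return saved rows plus the unprocessed tails."""
    if not path.exists():
        # No checkpoint: nothing loaded, inputs returned unchanged.
        return ([], *args)
    loaded = json.loads(path.read_text())
    n = len(loaded)
    # Drop the first n entries of every aligned list; they are already done.
    return (loaded, *(a[n:] for a in args))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ckpt.json"
    structures = ["s0", "s1", "s2"]
    truth = [{"id": 0}, {"id": 1}, {"id": 2}]
    fresh = load_checkpoint(path, structures, truth)
    path.write_text(json.dumps([{"id": 0, "E": -1.0}]))
    resumed = load_checkpoint(path, structures, truth)
```

After one saved row, each aligned list is returned without its first element, so the caller continues from the second structure.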

save(results: list[dict[str, Any]]) → None[source]

Saves a list of results at the specified checkpoint location.

Parameters:

results – Rows to serialize to the checkpoint path.

class ElasticityBenchmark(index_name: str = 'mp_id', benchmark_name: str | Path = 'mp-binary-pbe-elasticity-2025.1.json.gz', **kwargs)[source]

Bases: Benchmark

Represents a benchmark for evaluating and analyzing mechanical properties such as bulk modulus and shear modulus for various materials. The benchmark primarily utilizes a dataset and provides functionality for property calculation and result processing.

The class is designed to work with a predefined framework for benchmarking mechanical properties. The benchmark dataset contains values such as bulk modulus and shear modulus along with additional metadata. This class supports configurability through metadata files, index names, and additional benchmark properties. It relies on external calculators and utility classes for property computations and result handling.

Initializes the ElasticityBenchmark instance by taking benchmark metadata and additional configuration parameters. Sets up the benchmark framework with specified mechanical properties and metadata.

Parameters:
  • index_name – Primary key field name.

  • benchmark_name – Remote filename or local benchmark path.

  • **kwargs – Forwarded to Benchmark.

get_prop_calc(calculator: str | Calculator, **kwargs: Any) → PropCalc[source]

Creates and returns a property calculation object based on the provided calculator and optional parameters. This is useful for initializing and configuring a property calculation.

Parameters:
  • calculator – ASE calculator or model name.

  • **kwargs – Merged into ElasticityCalc (default fmax 0.05).

Returns:

Configured ElasticityCalc.

process_result(result: dict | None, model_name: str) → dict[source]

Processes the result dictionary containing bulk and shear modulus values, adjusts them by multiplying with a predefined conversion factor, and formats the keys according to the provided model name. If the result is None, default values of NaN are returned for both bulk and shear modulus.

Parameters:
  • result – ElasticityCalc output or None.

  • model_name – Column suffix for predictions.

Returns:

K and G in GPa as bulk_modulus_vrh_{model}, shear_modulus_vrh_{model}.
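
A minimal sketch of this conversion follows. It assumes the raw moduli arrive in eV/Å³ under the keys bulk_modulus_vrh and shear_modulus_vrh; both the input unit and the helper name to_gpa_row are assumptions, not confirmed details of ElasticityCalc. The factor itself follows from 1 eV = 1.602176634e-19 J and 1 Å³ = 1e-30 m³.

```python
from __future__ import annotations

import math

# 1 eV/Å^3 = 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_TO_GPA = 160.2176634


def to_gpa_row(result: dict | None, model_name: str) -> dict:
    """Hypothetical post-processing: convert K and G to GPa and suffix the keys."""
    keys = ("bulk_modulus_vrh", "shear_modulus_vrh")
    if result is None:
        # Failed calculations are reported as NaN for both moduli.
        return {f"{key}_{model_name}": math.nan for key in keys}
    return {f"{key}_{model_name}": result[key] * EV_PER_A3_TO_GPA for key in keys}


converted = to_gpa_row({"bulk_modulus_vrh": 1.0, "shear_modulus_vrh": 0.5}, "model")
missing = to_gpa_row(None, "model")
```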

class EquilibriumBenchmark(index_name: str = 'material_id', benchmark_name: str | Path = 'wbm-random-pbe52-equilibrium-2025.1.json.gz', folder_name: str = 'default_folder', **kwargs)[source]

Bases: Benchmark

Represents a benchmark for evaluating and analyzing equilibrium properties of materials. This benchmark utilizes a dataset and provides functionality for property calculation and result processing. The class is designed to work with a predefined framework for benchmarking equilibrium properties. The benchmark dataset contains data such as relaxed structures, un-/corrected formation energy along with additional metadata. This class supports configurability through metadata files, index names, and additional benchmark properties. It relies on external calculators and utility classes for property computations and result handling.

Initializes the EquilibriumBenchmark instance with specified benchmark metadata and configuration parameters. It sets up the benchmark with the necessary properties required for equilibrium benchmark analysis.

Parameters:
  • index_name – Primary key field in benchmark rows.

  • benchmark_name – Remote filename or local path to benchmark JSON.

  • folder_name – Label for file/artifact grouping.

  • **kwargs – Forwarded to Benchmark (properties, sampling, etc.).

_prepare_elemental_refs(calculator: str | Calculator) → None[source]

Helper function to prepare and cache ground-state reference energies for all elements in the benchmark.

This method performs the following steps exactly once per Benchmark instance:

  1. Load the full elemental references.

  2. Traverse self.structures to collect the set of unique element symbols needed.

  3. For each symbol:

    a. Retrieve its reference structure(s) from the full dataset.

    b. Use RelaxCalc to relax each structure and calculate the energy.

    c. Select the minimum energy per atom among those structures.

  4. Populate self.elemental_refs as a dict keyed by element symbol.

After this run, subsequent calls into EnergeticsCalc with use_gs_reference=True will simply look up values in self.elemental_refs, avoiding repeated relaxations.
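
The caching steps above can be sketched with toy data as follows. Everything here is illustrative: build_elemental_refs, the candidate names, and the energy lookup are hypothetical, the relaxation is replaced by a plain function call, and the shape of each stored value is an assumption.

```python
from __future__ import annotations

from typing import Callable


def build_elemental_refs(
    structure_elements: list[set[str]],
    candidates: dict[str, list[str]],
    energy_per_atom: Callable[[str], float],
) -> dict[str, dict]:
    """Hypothetical sketch: pick the lowest-energy reference for each needed element."""
    # Collect the unique element symbols across all benchmark structures.
    needed = set().union(*structure_elements)
    refs = {}
    for symbol in sorted(needed):
        # Stand-in for relaxing each candidate and evaluating its energy per atom.
        energies = {name: energy_per_atom(name) for name in candidates[symbol]}
        best = min(energies, key=energies.get)
        refs[symbol] = {"structure": best, "energy_per_atom": energies[best]}
    return refs


toy_energies = {"Fe-bcc": -8.3, "Fe-fcc": -8.2, "O2-a": -4.9, "O2-b": -4.8, "Ni-fcc": -5.6}
toy_candidates = {"Fe": ["Fe-bcc", "Fe-fcc"], "O": ["O2-a", "O2-b"], "Ni": ["Ni-fcc"]}
refs = build_elemental_refs([{"Fe", "O"}, {"Fe"}], toy_candidates, toy_energies.__getitem__)
```

Elements that do not occur in any benchmark structure (Ni in this toy case) are never evaluated, which is the point of collecting the symbol set first.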

Parameters:

calculator – ASE calculator or universal model name for elemental relaxations.

get_prop_calc(calculator: str | Calculator, **kwargs: Any) → PropCalc[source]

Returns a property calculation object for performing relaxation and formation energy calculations. This method initializes the stability calculator using the provided Calculator object and any additional configuration parameters.

Parameters:
  • calculator – ASE calculator or model name.

  • **kwargs – Merged into EnergeticsCalc (after elemental ref prep).

Returns:

EnergeticsCalc with MP-PBE elemental references relaxed per element.

process_result(result: dict | None, model_name: str) → dict[source]

Processes the result dictionary containing final structures and formation energy per atom, formats the keys according to the provided model name. If the result is None, default values of NaN are returned for final structures or formation energy per atom.

Parameters:
  • result – EnergeticsCalc output dict, or None if calculation failed.

  • model_name – Suffix for column keys.

Returns:

Dict with structure_{model_name} and formation_energy_per_atom_{model_name}.

run(calculator: str | Calculator, model_name: str, *, n_jobs: None | int = -1, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True, include_full_results: bool = False, **kwargs) → pd.DataFrame[source]

Processes a collection of structures using a calculator, saves intermittent checkpoints, and returns the results in a DataFrame. In addition to the base processing performed by the parent class, this method computes the Euclidean distance between the relaxed structure (obtained from the property calculation) and the reference DFT structure, using SiteStatsFingerprint. The computed distance is added as a new column in the results DataFrame with the key “distance_{model_name}”.

This function supports parallel computation and allows for error tolerance during processing. It retrieves a property calculator and utilizes it to calculate desired results for the given set of structures. Checkpoints are saved periodically based on the specified frequency, ensuring that progress is not lost in case of interruptions.

Parameters:
  • calculator – ASE calculator or model name.

  • model_name – Label for predicted columns.

  • n_jobs – Parallelism for calc_many.

  • checkpoint_file – Optional resume path.

  • checkpoint_freq – Checkpoint interval (structures).

  • delete_checkpoint_on_finish – Remove checkpoint after success.

  • include_full_results – Keep full calc dict keys.

  • **kwargs – Forwarded to Benchmark.run.

Returns:

Results frame plus d_{model_name} structure fingerprint distance to DFT.
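
The distance itself is an ordinary Euclidean norm between two fingerprint vectors. The sketch below assumes each structure has already been featurized into a flat list of floats of equal length; fingerprint_distance is a hypothetical helper, and the featurization step (SiteStatsFingerprint) is not reproduced here.

```python
from __future__ import annotations

import math


def fingerprint_distance(fp_model: list[float], fp_dft: list[float]) -> float:
    """Hypothetical helper: Euclidean distance between two fingerprint vectors."""
    if len(fp_model) != len(fp_dft):
        raise ValueError("Fingerprints must have the same length.")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_model, fp_dft)))


identical = fingerprint_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
shifted = fingerprint_distance([0.0, 0.0], [3.0, 4.0])
```

A distance of zero means the relaxed and reference structures have identical fingerprints; larger values indicate a larger structural deviation.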

class PhononBenchmark(index_name: str = 'mp_id', benchmark_name: str | Path = 'alexandria-binary-pbe-phonon-2025.1.json.gz', **kwargs)[source]

Bases: Benchmark

Phonon benchmark: compares heat capacity at 300 K (configurable index) to reference data.

Wraps PhononCalc with benchmark-specific defaults and result extraction.

Initializes an instance with specified index and benchmark details.

This constructor sets up an object with predefined properties such as heat capacity and additional fields such as the formula. It supports customizations via keyword arguments for further configurations.

Parameters:
  • index_name – Primary key field name.

  • benchmark_name – Remote filename or local path.

  • **kwargs – Forwarded to Benchmark.

get_prop_calc(calculator: str | Calculator, **kwargs: Any) → PropCalc[source]

Retrieves a phonon calculation instance based on the given calculator and additional keyword arguments.

This function initializes and returns a PhononCalc object using the provided calculator instance and any optional keyword arguments to configure the calculation further.

Parameters:
  • calculator – ASE calculator or model name.

  • **kwargs – Merged into PhononCalc (defaults fmax=0.05, no phonon YAML).

Returns:

PhononCalc instance.

process_result(result: dict | None, model_name: str) → dict[source]

Processes the result dictionary to extract specific thermal property information for the provided model name.

Parameters:
  • result – PhononCalc output dict or None.

  • model_name – Column suffix.

Returns:

heat_capacity_{model_name} at fixed thermal-properties index (default grid).

class SofteningBenchmark(benchmark_name: str | Path = 'wbm-high-energy-states.json.gz', index_name: str = 'wbm_id', n_samples: int | None = None, seed: int = 42, **kwargs)[source]

Bases: object

A benchmark for the systematic softening of a PES, as described in:

B. Deng, et al. npj Comput. Mater. 11, 9 (2025). doi: 10.1038/s41524-024-01500-6

The dataset used here can be found in figshare through:

https://figshare.com/articles/dataset/WBM_high_energy_states/27307776?file=50005317

This benchmark performs static calculations on pre-sampled high-energy PES configurations and then compares the forces predicted by the provided force field with the GGA-DFT forces to quantify the systematic underestimation.

Parameters:
  • benchmark_name – Remote filename or local path to high-energy-state dataset.

  • index_name – Id field name for each material block.

  • n_samples – Optional random subset of material keys.

  • seed – RNG seed for subsampling.

  • **kwargs – Stored on self.kwargs for extensions.

data: dict[str, Any][source]
static get_linear_fitted_slope(x: list | ndarray, y: list | ndarray) → float[source]

Linear least-squares slope for y ≈ a x (proportional fit).

Parameters:
  • x – Reference force components (flattened).

  • y – Predicted force components (flattened).

Returns:

Fitted proportionality constant a.
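
For a zero-intercept model, the least-squares slope has the closed form a = Σ(x·y) / Σ(x²). The sketch below applies that standard formula in pure Python; proportional_slope is a hypothetical name, and whether the library computes the slope exactly this way is an assumption.

```python
from __future__ import annotations

from typing import Sequence


def proportional_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical helper: least-squares slope a for the zero-intercept model y = a * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length.")
    sum_xx = sum(xi * xi for xi in x)
    if sum_xx == 0:
        raise ValueError("x must contain at least one non-zero value.")
    # Minimizing sum((y - a*x)^2) over a gives a = sum(x*y) / sum(x*x).
    return sum(xi * yi for xi, yi in zip(x, y)) / sum_xx


dft_forces = [1.0, -2.0, 3.0]
predicted_forces = [0.8, -1.6, 2.4]
scale = proportional_slope(dft_forces, predicted_forces)
```

A slope below 1, as in this toy data, corresponds to predicted forces that are systematically smaller in magnitude than the reference forces.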

run(calculator: Calculator, model_name: str, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 10, *, include_full_results: bool = False) → pd.DataFrame[source]

Processes all material ids in three steps:

  1. Calculate the forces on all sampled structures.

  2. Perform a linear fit of the predicted forces against the provided DFT forces.

  3. Return the fitted slopes as the softening scales.

Parameters:
  • calculator – ASE calculator for force evaluation.

  • model_name – Column suffix for softening scale.

  • checkpoint_file – Optional resume path.

  • checkpoint_freq – Checkpoint every N materials completed.

  • include_full_results – Keep raw_force_predictions column when True.

Returns:

DataFrame with per-material softening scale and metadata.

get_available_benchmarks() → list[str][source]

Fetches and returns a list of available benchmarks from the Materialyze/matcalc-bench Hugging Face dataset.

Returns:

Benchmark archive filenames ending in .json.gz.

get_benchmark_data(name: str) → list[Any][source]

Retrieve a benchmark dataset from the Materialyze/matcalc-bench Hugging Face dataset. Files are cached locally by huggingface_hub.

Parameters:

name – Benchmark JSON archive filename (e.g. *.json.gz).

Returns:

List of entries decoded with MontyDecoder (typically dicts).

Raises:

huggingface_hub.errors.EntryNotFoundError – If the file does not exist in the dataset.