matcalc.benchmark module
This module implements classes for running benchmarks on materials properties.
- class Benchmark(benchmark_name: str | Path, properties: Sequence[str], index_name: str, other_fields: tuple = (), property_rename_map: dict[str, str] | None = None, suffix_ground_truth: str = 'DFT', n_samples: int | None = None, seed: int = 42, **kwargs)[source]
Bases: object
Abstract base for property benchmarks against published datasets.
Loads benchmark JSON, builds per-row ground-truth records and a structure list, and runs a PropCalc over the structures (with optional checkpointing).
- Parameters:
benchmark_name – Remote benchmark filename or local path to JSON.
properties – Keys to pull from each benchmark entry as targets.
index_name – Row id field name.
other_fields – Extra fields to copy into each ground-truth row.
property_rename_map – Renames applied to result columns.
suffix_ground_truth – Suffix for reference columns (e.g. _DFT).
n_samples – Random subsample size, or None for all.
seed – RNG seed when subsampling.
**kwargs – Forwarded to get_prop_calc.
- Raises:
FileNotFoundError – If benchmark_name is a missing local path.
ValueError – If entries lack required fields.
- abstractmethod get_prop_calc(calculator: str | Calculator, **kwargs: Any) PropCalc[source]
Abstract method to retrieve a property calculation object using the provided calculator and additional parameters. This method must be implemented by subclasses and will utilize the provided calculator to create a PropCalc instance, possibly influenced by additional keyword arguments.
- Parameters:
calculator – ASE calculator or model name string.
**kwargs – Merged with self.kwargs in run (subclass dependent).
- Returns:
Configured PropCalc instance.
- abstractmethod process_result(result: dict | None, model_name: str) dict[source]
Implements post-processing of results. A default implementation is provided that simply appends the model name as a suffix to the key of the input dictionary for all properties. Subclasses can override this method to provide more sophisticated processing.
- Parameters:
result – Output dict from PropCalc.calc, or None on failure.
model_name – Suffix tag for predicted columns.
- Returns:
Flat dict mapping {prop}_{model_name} to values (or None).
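The key-suffixing behaviour described above can be sketched in plain Python. This is a minimal illustration, not the library's code: `suffix_result` and its `properties` argument are hypothetical names, and returning None for every property on failure is an assumption based on the "(or None)" note.

```python
from typing import Optional


def suffix_result(result: Optional[dict], properties: list, model_name: str) -> dict:
    """Hypothetical helper: tag each requested property key with the model name."""
    if result is None:
        # Failed calculation: keep the columns but fill them with None.
        return {f"{prop}_{model_name}": None for prop in properties}
    # Only the requested properties are copied; other keys are ignored.
    return {f"{prop}_{model_name}": result.get(prop) for prop in properties}


row = suffix_result({"bulk_modulus_vrh": 1.2, "extra": 0}, ["bulk_modulus_vrh"], "ModelA")
failed_row = suffix_result(None, ["bulk_modulus_vrh"], "ModelA")
```

Keeping the column set identical for failed rows means every row has the same keys when the results are assembled into a table.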
- run(calculator: str | Calculator, model_name: str, *, n_jobs: None | int = -1, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True, include_full_results: bool = False, **kwargs) pd.DataFrame[source]
Processes a collection of structures using a calculator, saves intermittent checkpoints, and returns the results in a DataFrame. This function supports parallel computation and allows for error tolerance during processing.
The function also retrieves a property calculator and utilizes it to calculate desired results for the given set of structures. Checkpoints are saved periodically based on the specified frequency, ensuring that progress is not lost in case of interruptions.
- Parameters:
calculator – ASE calculator or universal model name.
model_name – Label appended to predicted property columns.
n_jobs – joblib parallelism for calc_many (-1 = all cores).
checkpoint_file – Optional path to resume/save partial results.
checkpoint_freq – Save a checkpoint every this many completed structures.
delete_checkpoint_on_finish – Remove checkpoint file after success.
include_full_results – Keep all calc keys, not only properties.
**kwargs – Forwarded to calc_many / calculator.
- Returns:
DataFrame of ground-truth rows plus model predictions.
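As a rough illustration of the checkpointing pattern described for run, the sketch below resumes from a saved file, tolerates individual failures, saves every `checkpoint_freq` results, and removes the file on success. It is a simplified serial stand-in under stated assumptions: `run_with_checkpoints`, the JSON checkpoint format, and `flaky_square` are all hypothetical, and the real method works on structures with a PropCalc and may run in parallel.

```python
import json
import tempfile
from pathlib import Path


def run_with_checkpoints(items, compute, checkpoint_path, checkpoint_freq=2, delete_on_finish=True):
    """Hypothetical loop: resume from a JSON checkpoint and save periodically."""
    path = Path(checkpoint_path)
    results = json.loads(path.read_text()) if path.exists() else []
    for item in items[len(results):]:  # skip items already covered by the checkpoint
        try:
            results.append(compute(item))
        except Exception:
            results.append(None)  # tolerate a failed item instead of aborting the run
        if len(results) % checkpoint_freq == 0:
            path.write_text(json.dumps(results))
    if delete_on_finish and path.exists():
        path.unlink()
    return results


def flaky_square(x):
    """Toy computation that fails for one input."""
    if x == 3:
        raise ValueError("simulated failure")
    return x * x


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    ckpt.write_text(json.dumps([1, 4]))  # pretend the first two items finished earlier
    out = run_with_checkpoints([1, 2, 3, 4, 5], flaky_square, ckpt)
    checkpoint_removed = not ckpt.exists()
```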
- class BenchmarkSuite(benchmarks: list)[source]
Bases: object
Represents a suite for handling and executing a list of benchmarks. This class is designed for the comprehensive execution and management of benchmarks with support for configurable parallel computation and checkpointing.
The purpose of this class is to facilitate the execution of multiple benchmarks using various computational models (calculators) while enabling efficient resource utilization and result aggregation. It supports checkpointing to handle long computations reliably.
- Parameters:
benchmarks – Benchmark objects to run in sequence.
- run(calculators: dict[str, Calculator], *, n_jobs: int | None = -1, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True) list[pd.DataFrame][source]
Executes benchmarks using the provided calculators and combines the results into a list of dataframes. Each benchmark runs for all models provided by calculators, collecting individual results and joining columns without duplicate reference fields.
- Parameters:
calculators – Map of model label to ASE calculator.
n_jobs – Parallelism forwarded to each benchmark.run.
checkpoint_freq – Checkpoint interval per benchmark run.
delete_checkpoint_on_finish – Remove per-model checkpoint files when done.
- Returns:
One combined DataFrame per benchmark (joined across models).
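The idea of joining per-model results without repeating the reference fields can be sketched with plain dictionaries. The source works with DataFrames; the version below avoids that dependency and assumes that every model's table lists the same materials in the same order. `join_model_rows` and the column names are hypothetical.

```python
def join_model_rows(per_model_rows):
    """Hypothetical join: merge row i of every model's table into one dict."""
    combined = []
    for rows in zip(*per_model_rows):
        merged = {}
        for row in rows:
            for key, value in row.items():
                # Shared reference keys (id, ground truth) are kept only once.
                merged.setdefault(key, value)
        combined.append(merged)
    return combined


# Toy rows with placeholder values, one table per model.
model_a = [{"mp_id": "mp-1", "K_DFT": 100.0, "K_A": 98.0}]
model_b = [{"mp_id": "mp-1", "K_DFT": 100.0, "K_B": 103.0}]
joined = join_model_rows([model_a, model_b])
```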
- class CheckpointFile(path: str | Path)[source]
Bases: object
Represents a checkpoint file system management utility.
This class provides mechanisms to manage and process a file path and its associated actions such as loading and saving data. It ensures standardized path handling through the use of Path objects, enables loading checkpoint data from a file, and facilitates the saving of resulting data.
The constructor converts the provided path into a Path object for standardized path management in the application.
- Parameters:
path – Filesystem path as str or Path.
- load(*args: list) tuple[source]
Loads checkpoint data from a specified path if it exists, returning the loaded entries along with remaining portions of the given input arguments.
The method checks if the file path exists, and if so, it loads data from the specified file using a predefined loadfn function. It logs the number of loaded entries and returns the successfully loaded entries alongside sliced input arguments based on the number of loaded entries. If the file path does not exist, it returns empty results and the original input arguments unchanged.
- Parameters:
*args – Additional list arguments aligned with checkpoint rows (sliced after load).
- Returns:
Tuple (loaded_rows, *tail_slices); loaded_rows is empty if the file is missing.
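The return shape of load is easiest to see in a small stand-in. The sketch below is an assumption-laden illustration: `load_checkpoint` is a hypothetical function, and `json` replaces the loadfn call mentioned above so the example runs with the standard library only.

```python
import json
import tempfile
from pathlib import Path


def load_checkpoint(path, *args):
    """Hypothetical stand-in returning (loaded_rows, *tail_slices)."""
    path = Path(path)
    if not path.exists():
        # No checkpoint yet: nothing loaded, inputs returned unchanged.
        return ([], *args)
    loaded = json.loads(path.read_text())
    n = len(loaded)
    # Drop the first n entries of every aligned list; they are already done.
    return (loaded, *(arg[n:] for arg in args))


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "rows.json"
    fresh = load_checkpoint(ckpt, ["a", "b", "c"], [1, 2, 3])
    ckpt.write_text(json.dumps([{"id": "a"}]))  # one row finished earlier
    resumed = load_checkpoint(ckpt, ["a", "b", "c"], [1, 2, 3])
```

Slicing every aligned list by the same count keeps ids, structures, and any other per-row inputs in step with the rows that still need to be computed.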
- class ElasticityBenchmark(index_name: str = 'mp_id', benchmark_name: str | Path = 'mp-binary-pbe-elasticity-2025.1.json.gz', **kwargs)[source]
Bases: Benchmark
Represents a benchmark for evaluating and analyzing mechanical properties such as bulk modulus and shear modulus for various materials. The benchmark primarily utilizes a dataset and provides functionality for property calculation and result processing.
The class is designed to work with a predefined framework for benchmarking mechanical properties. The benchmark dataset contains values such as bulk modulus and shear modulus along with additional metadata. This class supports configurability through metadata files, index names, and additional benchmark properties. It relies on external calculators and utility classes for property computations and result handling.
Initializes the ElasticityBenchmark instance by taking benchmark metadata and additional configuration parameters. Sets up the benchmark framework with specified mechanical properties and metadata.
- Parameters:
index_name – Primary key field name.
benchmark_name – Remote filename or local benchmark path.
**kwargs – Forwarded to Benchmark.
- get_prop_calc(calculator: str | Calculator, **kwargs: Any) PropCalc[source]
Calculates and returns a property calculation object based on the provided calculator and optional parameters. This is useful for initializing and configuring a property calculation.
- Parameters:
calculator – ASE calculator or model name.
**kwargs – Merged into ElasticityCalc (default fmax 0.05).
- Returns:
Configured ElasticityCalc.
- process_result(result: dict | None, model_name: str) dict[source]
Processes the result dictionary containing bulk and shear modulus values, adjusts them by multiplying with a predefined conversion factor, and formats the keys according to the provided model name. If the result is None, default values of NaN are returned for both bulk and shear modulus.
- Parameters:
result – ElasticityCalc output or None.
model_name – Column suffix for predictions.
- Returns:
K and G in GPa as bulk_modulus_vrh_{model}, shear_modulus_vrh_{model}.
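The conversion factor is not stated here. As a sketch, assuming the calculator reports moduli in eV/Å³ (an assumption, not confirmed by this page), the physical conversion is 1 eV/Å³ = 160.2176634 GPa. `to_gpa_columns` is a hypothetical name and the input values are placeholders.

```python
import math

# 1 eV/Å^3 expressed in GPa; the eV/Å^3 input unit is an assumption.
EV_PER_A3_TO_GPA = 160.2176634


def to_gpa_columns(result, model_name):
    """Hypothetical post-processing: convert K and G to GPa and suffix the keys."""
    keys = ("bulk_modulus_vrh", "shear_modulus_vrh")
    if result is None:
        # Failed calculation: NaN for both moduli.
        return {f"{key}_{model_name}": float("nan") for key in keys}
    return {f"{key}_{model_name}": result[key] * EV_PER_A3_TO_GPA for key in keys}


converted = to_gpa_columns({"bulk_modulus_vrh": 1.0, "shear_modulus_vrh": 0.5}, "ModelA")
failed = to_gpa_columns(None, "ModelA")
failed_is_nan = all(math.isnan(value) for value in failed.values())
```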
- class EquilibriumBenchmark(index_name: str = 'material_id', benchmark_name: str | Path = 'wbm-random-pbe52-equilibrium-2025.1.json.gz', folder_name: str = 'default_folder', **kwargs)[source]
Bases: Benchmark
Represents a benchmark for evaluating and analyzing equilibrium properties of materials. This benchmark utilizes a dataset and provides functionality for property calculation and result processing. The class is designed to work with a predefined framework for benchmarking equilibrium properties. The benchmark dataset contains data such as relaxed structures and uncorrected and corrected formation energies, along with additional metadata. This class supports configurability through metadata files, index names, and additional benchmark properties. It relies on external calculators and utility classes for property computations and result handling.
Initializes the EquilibriumBenchmark instance with specified benchmark metadata and configuration parameters. It sets up the benchmark with the necessary properties required for equilibrium benchmark analysis.
- Parameters:
index_name – Primary key field in benchmark rows.
benchmark_name – Remote filename or local path to benchmark JSON.
folder_name – Label for file/artifact grouping.
**kwargs – Forwarded to Benchmark (properties, sampling, etc.).
- _prepare_elemental_refs(calculator: str | Calculator) None[source]
Helper function to prepare and cache ground-state reference energies for all elements in the benchmark.
This method performs the following steps exactly once per Benchmark instance:
1. Load the full elemental references.
2. Traverse self.structures to collect the set of unique element symbols needed.
3. For each symbol:
   - Retrieve its reference structure(s) from the full dataset.
   - Use RelaxCalc to relax each structure and calculate the energy.
   - Select the minimum energy per atom among those structures.
4. Populate self.elemental_refs as a dict keyed by element symbol.
After this run, subsequent calls into EnergeticsCalc with use_gs_reference=True will simply look up values in self.elemental_refs, avoiding repeated relaxations.
- Parameters:
calculator – ASE calculator or universal model name for elemental relaxations.
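The caching steps above can be sketched with toy data. Everything here is illustrative: `build_elemental_refs` is a hypothetical function, structures are reduced to lists of element symbols, a lookup table of made-up energies stands in for the RelaxCalc relaxations, and storing the minimum energy per atom as the dict value is an assumption.

```python
def build_elemental_refs(structures, reference_pool, energy_per_atom):
    """Hypothetical sketch: lowest energy per atom for each element that is needed."""
    # Step 2: collect the unique element symbols across all structures.
    needed = {symbol for structure in structures for symbol in structure}
    refs = {}
    for symbol in sorted(needed):
        # Step 3: evaluate every candidate reference and keep the minimum.
        refs[symbol] = min(energy_per_atom(candidate) for candidate in reference_pool[symbol])
    return refs


# Toy inputs; the energies are arbitrary placeholders, not real data.
structures = [["Li", "O"], ["Li", "Li", "O"]]
reference_pool = {"Li": ["Li-bcc", "Li-fcc"], "O": ["O2"], "Fe": ["Fe-bcc"]}
toy_energies = {"Li-bcc": -1.90, "Li-fcc": -1.89, "O2": -4.95, "Fe-bcc": -8.30}

elemental_refs = build_elemental_refs(structures, reference_pool, toy_energies.__getitem__)
```

Elements that appear in the reference pool but in none of the benchmark structures ("Fe" here) are never evaluated, which is the point of collecting the needed symbols first.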
- get_prop_calc(calculator: str | Calculator, **kwargs: Any) PropCalc[source]
Returns a property calculation object for performing relaxation and formation energy calculations. This method initializes the stability calculator using the provided Calculator object and any additional configuration parameters.
- Parameters:
calculator – ASE calculator or model name.
**kwargs – Merged into EnergeticsCalc (after elemental ref prep).
- Returns:
EnergeticsCalc with MP-PBE elemental references relaxed per element.
- process_result(result: dict | None, model_name: str) dict[source]
Processes the result dictionary containing the final structure and formation energy per atom, and formats the keys according to the provided model name. If the result is None, default values of NaN are returned for the final structure and the formation energy per atom.
- Parameters:
result – EnergeticsCalc output dict, or None if calculation failed.
model_name – Suffix for column keys.
- Returns:
Dict with structure_{model_name} and formation_energy_per_atom_{model_name}.
- run(calculator: str | Calculator, model_name: str, *, n_jobs: None | int = -1, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 1000, delete_checkpoint_on_finish: bool = True, include_full_results: bool = False, **kwargs) pd.DataFrame[source]
Processes a collection of structures using a calculator, saves intermittent checkpoints, and returns the results in a DataFrame. In addition to the base processing performed by the parent class, this method computes the Euclidean distance between the relaxed structure (obtained from the property calculation) and the reference DFT structure, using SiteStatsFingerprint. The computed distance is added as a new column in the results DataFrame with the key “distance_{model_name}”.
This function supports parallel computation and allows for error tolerance during processing. It retrieves a property calculator and utilizes it to calculate desired results for the given set of structures. Checkpoints are saved periodically based on the specified frequency, ensuring that progress is not lost in case of interruptions.
- Parameters:
calculator – ASE calculator or model name.
model_name – Label for predicted columns.
n_jobs – Parallelism for calc_many.
checkpoint_file – Optional resume path.
checkpoint_freq – Checkpoint interval (structures).
delete_checkpoint_on_finish – Remove checkpoint after success.
include_full_results – Keep full calc dict keys.
**kwargs – Forwarded to Benchmark.run.
- Returns:
Results frame plus d_{model_name} structure fingerprint distance to DFT.
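The distance column is a Euclidean norm between two fingerprint vectors. The sketch below shows only that last step; the vectors are hand-written placeholders rather than SiteStatsFingerprint output, and `fingerprint_distance` and its NaN handling for failed relaxations are assumptions.

```python
import math


def fingerprint_distance(fp_model, fp_reference):
    """Hypothetical helper: Euclidean distance between two equal-length vectors."""
    if fp_model is None or fp_reference is None:
        # Assumed behaviour: no relaxed structure means no distance.
        return float("nan")
    if len(fp_model) != len(fp_reference):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fp_model, fp_reference)))


distance = fingerprint_distance([0.0, 3.0], [4.0, 0.0])
missing = fingerprint_distance(None, [4.0, 0.0])
missing_is_nan = math.isnan(missing)
```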
- class PhononBenchmark(index_name: str = 'mp_id', benchmark_name: str | Path = 'alexandria-binary-pbe-phonon-2025.1.json.gz', **kwargs)[source]
Bases: Benchmark
Phonon benchmark: compares heat capacity at 300 K (configurable index) to reference data.
Wraps PhononCalc with benchmark-specific defaults and result extraction.
Initializes an instance with specified index and benchmark details.
This constructor sets up an object with predefined properties such as heat capacity and additional fields such as the formula. It supports customizations via keyword arguments for further configurations.
- Parameters:
index_name – Primary key field name.
benchmark_name – Remote filename or local path.
**kwargs – Forwarded to Benchmark.
- get_prop_calc(calculator: str | Calculator, **kwargs: Any) PropCalc[source]
Retrieves a phonon calculation instance based on the given calculator and additional keyword arguments.
This function initializes and returns a PhononCalc object using the provided calculator instance and any optional keyword arguments to configure the calculation further.
- Parameters:
calculator – ASE calculator or model name.
**kwargs – Merged into PhononCalc (defaults fmax=0.05, no phonon YAML).
- Returns:
PhononCalc instance.
- process_result(result: dict | None, model_name: str) dict[source]
Processes the result dictionary to extract specific thermal property information for the provided model name.
- Parameters:
result – PhononCalc output dict or None.
model_name – Column suffix.
- Returns:
heat_capacity_{model_name}at fixed thermal-properties index (default grid).
- class SofteningBenchmark(benchmark_name: str | Path = 'wbm-high-energy-states.json.gz', index_name: str = 'wbm_id', n_samples: int | None = None, seed: int = 42, **kwargs)[source]
Bases: object
A benchmark for the systematic softening of a PES, as described in:
B. Deng, et al. npj Comput. Mater. 11, 9 (2025). doi: 10.1038/s41524-024-01500-6
The dataset used here can be found on figshare at:
https://figshare.com/articles/dataset/WBM_high_energy_states/27307776?file=50005317
This benchmark performs static calculations on pre-sampled high-energy PES configurations and then compares the forces predicted by the provided force field with the GGA-DFT forces to quantify their systematic underestimation.
- Parameters:
benchmark_name – Remote filename or local path to high-energy-state dataset.
index_name – Id field name for each material block.
n_samples – Optional random subset of material keys.
seed – RNG seed for subsampling.
**kwargs – Stored on self.kwargs for extensions.
- static get_linear_fitted_slope(x: list | ndarray, y: list | ndarray) float[source]
Linear least-squares slope for y ≈ a x (proportional fit).
- Parameters:
x – Reference force components (flattened).
y – Predicted force components (flattened).
- Returns:
Fitted proportionality constant a.
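For a fit through the origin, minimizing the sum of squared residuals of y − a x gives a = Σ xᵢyᵢ / Σ xᵢ². The sketch below implements that closed form in plain Python; `proportional_slope` is a hypothetical name, the numbers are toy values, and the library may compute the slope differently.

```python
def proportional_slope(x, y):
    """Least-squares slope a for y ≈ a x (no intercept): sum(x*y) / sum(x*x)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("x must contain at least one non-zero value")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


# Toy forces: predictions are uniformly 20% weaker than the reference.
reference = [1.0, 2.0, -3.0]
predicted = [0.8, 1.6, -2.4]
slope = proportional_slope(reference, predicted)
```

A slope below 1 means the predicted forces are systematically smaller than the reference forces, which is the softening this benchmark measures.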
- run(calculator: Calculator, model_name: str, checkpoint_file: str | Path | None = None, checkpoint_freq: int = 10, *, include_full_results: bool = False) pd.DataFrame[source]
Processes all the material ids in three steps:
1. Calculate the forces on all the sampled structures.
2. Perform a linear fit of the predicted forces against the provided DFT forces.
3. Return the fitted slopes as the softening scales.
- Parameters:
calculator – ASE calculator for force evaluation.
model_name – Column suffix for softening scale.
checkpoint_file – Optional resume path.
checkpoint_freq – Checkpoint every N materials completed.
include_full_results – Keep raw_force_predictions column when True.
- Returns:
DataFrame with per-material softening scale and metadata.
- get_available_benchmarks() list[str][source]
Fetches and returns a list of available benchmarks from the Materialyze/matcalc-bench Hugging Face dataset.
- Returns:
Benchmark archive filenames ending in .json.gz.
- get_benchmark_data(name: str) list[Any][source]
Retrieve a benchmark dataset from the Materialyze/matcalc-bench Hugging Face dataset. Files are cached locally by huggingface_hub.
- Parameters:
name – Benchmark JSON archive filename (e.g. *.json.gz).
- Returns:
List of entries decoded with MontyDecoder (typically dicts).
- Raises:
huggingface_hub.errors.EntryNotFoundError – If the file does not exist in the dataset.