[1]:
# /// script
# dependencies = [
# "mace-torch>=0.3.12",
# "pymatgen>=2025.2.18",
# "ase>=3.26",
# ]
# ///
Introduction to TorchSim¶
This tutorial introduces TorchSim’s high-level API for molecular dynamics simulations and geometry optimizations. The high-level API provides simple, powerful interfaces that abstract away the complexities of setting up atomistic simulations while still allowing for customization.
Introduction¶
TorchSim’s high-level API consists of three primary functions:
integrate - For running molecular dynamics simulations
optimize - For geometry optimizations
static - For one-off energy/force calculations on a diverse set of systems
These functions handle:
Automatic state initialization from various input formats
Memory-efficient GPU operations via autobatching
Trajectory reporting and property calculation
Custom convergence criteria
Over the course of this tutorial, we will build up to the full example in the README by steadily adding functionality.
Basic Molecular Dynamics¶
We’ll start with a simple example: simulating a copper system with a Lennard-Jones potential. First, let’s set up our model and create an atomic structure:
[2]:
import torch_sim as ts
import torch
from ase.build import bulk
from torch_sim.models.lennard_jones import LennardJonesModel
# Create a Lennard-Jones model (illustrative parameters, not fitted to Cu)
lj_model = LennardJonesModel(
    sigma=2.0,  # Å
    epsilon=0.1,  # eV
    device=torch.device("cpu"),
    dtype=torch.float64,
)
# Create a cubic copper FCC structure using ASE
cu_atoms = bulk("Cu", "fcc", a=5.43, cubic=True)
Now we can run a molecular dynamics simulation using the integrate function. This function takes care of initializing the state, setting up the integrator, and running the simulation:
[3]:
n_steps = 50
final_state = ts.integrate(
    system=cu_atoms,  # Input atomic system
    model=lj_model,  # Energy/force model
    integrator=ts.Integrator.nvt_langevin,  # Integrator to use
    n_steps=n_steps,  # Number of MD steps
    temperature=2000,  # Target temperature (K)
    timestep=0.002,  # Integration timestep (ps)
)
# Convert the final state back to ASE atoms
final_atoms = final_state.to_atoms()
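As a quick sanity check, we can inspect the returned structure. A minimal sketch, assuming to_atoms() returns one ASE Atoms object per system in a list (as the batched examples later in this tutorial suggest); the guard also covers a bare Atoms return:
# to_atoms() is assumed to return a list; take the first (and only) entry
atoms = final_atoms[0] if isinstance(final_atoms, list) else final_atoms
print(f"{len(atoms)} atoms, formula {atoms.get_chemical_formula()}")
print(f"Cell volume: {atoms.get_volume():.2f} Å³")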
Trajectory Reporting¶
While running simulations, we often want to save trajectory data and calculate properties. The easiest way is to pass a dict with a filenames key as the trajectory_reporter argument of integrate. Reasonable defaults are assumed for the remaining reporter settings, and the trajectory is written to the specified files.
[4]:
n_steps = 50
final_state = ts.integrate(
    system=cu_atoms,
    model=lj_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
    trajectory_reporter={"filenames": "lj_trajectory.h5"},
)
Behind the scenes, the dict is used to instantiate a TrajectoryReporter object, which then handles the reporting. If you need more control over the trajectory reporter, you can instantiate it manually and pass it to the integrate function.
This makes it easier to tailor the trajectory reporter to your needs. Below, we show how to periodically report additional quantities and manually specify the frequency at which the state is saved.
For more detail, see the trajectory reporter tutorial.
[5]:
trajectory_file = "lj_trajectory.h5"

# Define property calculators to track energies
# - Calculate potential energy every 10 steps
# - Calculate kinetic energy every 20 steps
prop_calculators = {
    10: {"potential_energy": lambda state: state.energy},
    20: {
        "kinetic_energy": lambda state: ts.calc_kinetic_energy(
            momenta=state.momenta, masses=state.masses
        )
    },
}

# Create a reporter that saves the state every 10 steps
reporter = ts.TrajectoryReporter(
    trajectory_file,
    state_frequency=10,  # Save the state every 10 steps
    prop_calculators=prop_calculators,
)
Now we can run the simulation with trajectory reporting:
[6]:
final_state = ts.integrate(
    system=cu_atoms,
    model=lj_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
    trajectory_reporter=reporter,  # Add the reporter
)
After the simulation is complete, we can analyze the trajectory using the TorchSimTrajectory class, which provides a simple interface for reading the stored data.
[7]:
with ts.TorchSimTrajectory(trajectory_file) as traj:
    # Read energy arrays
    kinetic_energies = traj.get_array("kinetic_energy")
    potential_energies = traj.get_array("potential_energy")
    final_energy = potential_energies[-1].item()

    # Get the final atomic configuration
    final_atoms = traj.get_atoms(-1)

print(f"Final potential energy: {final_energy:.6f} eV")
print(f"Shape of kinetic energy array: {kinetic_energies.shape}")
Final potential energy: -0.489563 eV
Shape of kinetic energy array: (2, 1)
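The arrays returned by get_array are ready for plotting. Here is a minimal sketch using matplotlib (not one of this script’s declared dependencies), assuming the first property write occurs at step 10, which is consistent with the kinetic energy array shape above:
import matplotlib.pyplot as plt

with ts.TorchSimTrajectory(trajectory_file) as traj:
    potential_energies = traj.get_array("potential_energy")

# Row i is assumed to correspond to step 10 * (i + 1)
steps = [10 * (i + 1) for i in range(len(potential_energies))]
plt.plot(steps, potential_energies[:, 0], marker="o")
plt.xlabel("MD step")
plt.ylabel("Potential energy (eV)")
plt.savefig("lj_potential_energy.png")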
Using Machine Learning Potentials¶
TorchSim isn’t limited to classical potentials. It also supports machine learning potentials like MACE for more accurate simulations. Let’s run a similar simulation using MACE:
[8]:
from mace.calculators.foundations_models import mace_mp
from torch_sim.models.mace import MaceModel

# Use CUDA if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the MACE "small" foundation model
mace = mace_mp(model="small", return_raw_model=True)
mace_model = MaceModel(
    model=mace,
    device=device,
    dtype=torch.float64,
    compute_forces=True,
)

# Run the simulation with MACE
final_state = ts.integrate(
    system=cu_atoms,
    model=mace_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
)
final_atoms = final_state.to_atoms()
Batch Processing Multiple Systems¶
One of the most powerful features of TorchSim is the ability to simulate multiple systems in parallel. This is especially useful when working with machine learning potentials that benefit from GPU acceleration:
[9]:
fe_atoms = bulk("Fe", "fcc", a=5.26, cubic=True)
fe_atoms_supercell = fe_atoms.repeat([2, 2, 2])
cu_atoms_supercell = cu_atoms.repeat([2, 2, 2])
# Pack them into a list
systems = [cu_atoms, fe_atoms, cu_atoms_supercell, fe_atoms_supercell]
We can simulate all these systems in a single call to integrate:
[10]:
final_state = ts.integrate(
    system=systems,  # List of systems to simulate
    model=mace_model,  # Single model for all systems
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
)
final_atoms = final_state.to_atoms()

print(f"Number of systems simulated: {len(final_atoms)}")
print(f"Number of atoms in last system: {len(final_atoms[3])}")
Number of systems simulated: 4
Number of atoms in last system: 32
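Since final_atoms is an ordinary list of ASE Atoms objects, in the same order as the input systems, we can iterate over it directly:
# Each entry is an independent ASE Atoms object
for sys_idx, atoms in enumerate(final_atoms):
    print(f"System {sys_idx}: {atoms.get_chemical_formula()} ({len(atoms)} atoms)")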
Batch Trajectory Reporting¶
When simulating multiple systems, we can save each to its own trajectory file:
[11]:
filenames = [f"tmp/batch_traj_{i}.h5" for i in range(len(systems))]

# Create a reporter that handles multiple trajectories
batch_reporter = ts.TrajectoryReporter(
    filenames,
    state_frequency=10,
    prop_calculators=prop_calculators,
)

# Run the simulation with batch reporting
final_state = ts.integrate(
    system=systems,
    model=mace_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
    trajectory_reporter=batch_reporter,
)
We can analyze each trajectory individually:
[12]:
final_energies_per_atom = []
for sys_idx, filename in enumerate(filenames):
    with ts.TorchSimTrajectory(filename) as traj:
        final_energy = traj.get_array("potential_energy")[-1].item()
        n_atoms = len(traj.get_atoms(-1))
    final_energies_per_atom.append(final_energy / n_atoms)
    print(
        f"System {sys_idx}: {final_energy:.6f} eV, {final_energy / n_atoms:.6f} eV/atom"
    )
System 0: -7.729502 eV, -1.932376 eV/atom
System 1: -23.756351 eV, -5.939088 eV/atom
System 2: -62.694077 eV, -1.959190 eV/atom
System 3: -184.333878 eV, -5.760434 eV/atom
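Because final_energies_per_atom is a plain Python list, quick comparisons need no extra machinery. For example, to find the most stable system on a per-atom basis:
# Index of the lowest energy per atom
best_idx = min(range(len(final_energies_per_atom)), key=lambda i: final_energies_per_atom[i])
print(f"Lowest energy per atom: system {best_idx} ({final_energies_per_atom[best_idx]:.6f} eV/atom)")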
Autobatching¶
The integrate function also supports autobatching, which automatically determines the maximum number of systems that can fit in memory and splits up the systems to make optimal use of the GPU. This abstracts away the complexity of managing memory when running more systems than can fit on the GPU.
Ignore the following cell; it exists only so that the example runs on CPU.
[13]:
ts.autobatching.determine_max_batch_size = lambda *args, **kwargs: 10 # type: ignore[invalid-assignment]
We enable autobatching by simply setting the autobatcher argument to True.
[14]:
final_state = ts.integrate(
    system=systems,
    model=mace_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
    autobatcher=True,
)
Model Memory Estimation: Estimating memory from worst case of largest and smallest system. Largest system has 32 atoms and 1 batches, and smallest system has 4 atoms and 1 batches.
Otherwise, everything else is the same! The integrate function will still write trajectory data, calculate properties, and return a final state in the original system order.
Geometry Optimization¶
In addition to molecular dynamics, TorchSim provides a high-level API for geometry optimization. The optimize function is similar to integrate in that it takes a list of systems and a model and supports reporting and autobatching. The key difference is that instead of taking n_steps and temperature, optimize takes a convergence_fn that determines when the optimization has converged. By default, the convergence_fn waits until the energy difference between steps is less than 1e-6 eV.
Let’s use the optimize function with the FIRE algorithm to relax our structures:
[15]:
final_state = ts.optimize(
    system=systems,
    model=mace_model,
    optimizer=ts.Optimizer.fire,
    init_kwargs=dict(cell_filter=ts.CellFilter.unit),
)
final_atoms = final_state.to_atoms()
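Like integrate, optimize returns a batched state. Its energy attribute is the same per-system tensor that the convergence functions below compare against, so we can read off one relaxed energy per system:
# state.energy holds one entry per system, in input order
for sys_idx, energy in enumerate(final_state.energy):
    print(f"System {sys_idx}: relaxed energy {energy.item():.6f} eV")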
Custom Convergence Criteria¶
The optimize function allows us to specify custom convergence criteria. The inputs to the convergence function are state and last_energy. The state is a SimState object that contains the current state of the system and the last_energy is the energy of the previous step. The convergence function should return a boolean tensor of length n_systems.
This is how we’d manually define the default convergence_fn:
[16]:
def default_energy_convergence(state, last_energy):
    # Consider converged when the energy change is less than 1e-6 eV;
    # the comparison yields a boolean tensor of length n_systems
    if last_energy is None:
        return False
    energy_diff = torch.abs(last_energy - state.energy)
    return energy_diff < 1e-6


# we arbitrarily add energy so nothing is converged
convergence_tensor = default_energy_convergence(final_state, final_state.energy + 1)
print(f"Any converged? {torch.any(convergence_tensor).item()}")
Any converged? False
For convenience, TorchSim provides constructors for common convergence functions.
[17]:
energy_convergence_fn = ts.generate_energy_convergence_fn(energy_tol=1e-6)
force_convergence_fn = ts.generate_force_convergence_fn(force_tol=1e-3)

# Run optimization with custom convergence
final_state = ts.optimize(
    system=systems,
    model=mace_model,
    optimizer=ts.Optimizer.fire,
    convergence_fn=force_convergence_fn,  # Custom convergence function
    init_kwargs=dict(cell_filter=ts.CellFilter.unit),
)
final_atoms = final_state.to_atoms()
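To verify the force criterion took effect, we can check the largest remaining force component. This sketch assumes the final state exposes per-atom forces (which the force-based convergence function relies on), in eV/Å:
# Largest force component across all atoms and systems (assumed attribute)
max_force = final_state.forces.abs().max().item()
print(f"Max force component after relaxation: {max_force:.2e} eV/Å")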
Static Calculations¶
TorchSim also supports static calculations, which are useful for calculating properties across a diverse set of systems without any system evolution. This is a great way to compute elastic properties or run a benchmark against DFT energies.
The static function is similar to integrate and optimize in that it takes a list of structures and a model while supporting batching and reporting. The key difference is that static does not return a final state, but rather a list of dictionaries containing the outputs of any prop_calculators specified in the TrajectoryReporter.
[18]:
prop_calculators = {
    10: {"potential_energy": lambda state: state.energy},
    20: {"stress": lambda state: state.stress},
}

final_results = ts.static(
    system=systems,
    model=mace_model,
    # we don't want to save any trajectories this time, just get the properties
    trajectory_reporter={"filenames": None, "prop_calculators": prop_calculators},
)

print(f"Static returns {len(final_results)} results, one for each system")
print(f"Matches the number of systems? {len(final_results) == len(systems)}")
print(f"len(final_results): {len(final_results)}")
assert len(final_results) == len(systems)

cu_results = final_results[0]
print(
    "The Cu system has a final energy of",
    cu_results["potential_energy"][-1].item(),
    "eV",
)
Static returns 4 results, one for each system
Matches the number of systems? True
len(final_results): 4
The Cu system has a final energy of -6.075483250449456 eV
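Because final_results is an ordinary list of dicts, downstream analysis is plain Python. For example, gathering the last recorded energy of every system:
# One potential energy per system, in input order
energies = [res["potential_energy"][-1].item() for res in final_results]
for atoms, energy in zip(systems, energies):
    print(f"{atoms.get_chemical_formula()}: {energy:.6f} eV")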
Working with PyMatGen Structures¶
TorchSim supports PyMatGen Structure objects in addition to ASE Atoms objects:
[19]:
from pymatgen.core import Structure

# Define a silicon diamond structure using PyMatGen
lattice = [[5.43, 0, 0], [0, 5.43, 0], [0, 0, 5.43]]
species = ["Si"] * 8
coords = [
    [0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25],
    [0.0, 0.5, 0.5],
    [0.25, 0.75, 0.75],
    [0.5, 0.0, 0.5],
    [0.75, 0.25, 0.75],
    [0.5, 0.5, 0.0],
    [0.75, 0.75, 0.25],
]
structure = Structure(lattice, species, coords)

# Run a simulation starting from the PyMatGen structure
final_state = ts.integrate(
    system=structure,
    model=lj_model,
    integrator=ts.Integrator.nvt_langevin,
    n_steps=n_steps,
    temperature=2000,
    timestep=0.002,
)

# Convert the final state back to a PyMatGen structure
final_structure = final_state.to_structures()
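As with to_atoms(), we assume to_structures() returns one pymatgen Structure per system; the guard below also covers a bare Structure return:
# Take the first (and only) structure for this single-system state
structure_out = (
    final_structure[0] if isinstance(final_structure, list) else final_structure
)
print(structure_out.composition.reduced_formula, "-", len(structure_out), "sites")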
Conclusion¶
TorchSim’s high-level API provides a simple yet powerful interface for running molecular simulations:
The integrate function makes it easy to run MD simulations
The optimize function handles geometry optimization with custom convergence
The static function handles static calculations with batching and reporting
Built-in support for batch processing multiple systems
Seamless integration with trajectory reporting
Compatible with both ASE and PyMatGen structures
By handling the complexity behind the scenes, these high-level functions let you focus on the scientific questions rather than simulation details.
For more advanced use cases, you can still access the lower-level components (integrators, optimizers, etc.) directly, as shown in other tutorials.