settings #

Pydantic models which control/validate the settings.

Classes:

MMMDSamplingSettings – Settings for molecular dynamics sampling using a molecular mechanics force field.
MLMDSamplingSettings – Settings for molecular dynamics sampling using a machine learning potential.
MMMDMetadynamicsSamplingSettings – Settings for molecular dynamics sampling using a molecular mechanics force field with metadynamics.
MMMDMetadynamicsTorsionMinimisationSamplingSettings – Settings for MM MD metadynamics sampling with additional torsion-restrained …
PreComputedDatasetSettings – Settings for loading pre-computed datasets from disk.
TrainingSettings – Settings for the training process.
OutlierFilterSettings – Settings for filtering outliers from datasets based on MM vs MLP differences.
TypeGenerationSettings – Settings for generating tagged SMARTS types for a given potential type.
MSMSettings – Settings for the modified Seminario method.
ParameterisationSettings – Settings for the starting parameterisation.
WorkflowSettings – Overall settings for the full fitting workflow.

Attributes:

SamplingSettings – Union type for all sampling settings. See the associated sampling_protocol field in each class for the string identifier.

SamplingSettings `module-attribute` #

SamplingSettings = Union[
    MMMDSamplingSettings,
    MLMDSamplingSettings,
    MMMDMetadynamicsSamplingSettings,
    MMMDMetadynamicsTorsionMinimisationSamplingSettings,
    PreComputedDatasetSettings,
]

Union type for all sampling settings. Each member class defines a sampling_protocol field; its string identifier is the value to supply in the training_sampling_settings and testing_sampling_settings fields of WorkflowSettings to select that class.
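To illustrate how a string identifier can select one member of the union, here is a minimal standard-library sketch. It is not the Pydantic discriminator machinery that presto uses; `KNOWN_PROTOCOLS` and `resolve_settings_class` are hypothetical names, and only the three identifiers documented on this page are listed.

```python
from typing import Any

# Identifiers documented on this page. The other members of the union define
# their own identifiers, which are deliberately not guessed here.
KNOWN_PROTOCOLS = {
    "mm_md": "MMMDSamplingSettings",
    "ml_md": "MLMDSamplingSettings",
    "mm_md_metadynamics": "MMMDMetadynamicsSamplingSettings",
}


def resolve_settings_class(raw: dict[str, Any]) -> str:
    """Return the name of the settings class selected by `sampling_protocol`."""
    if "sampling_protocol" not in raw:
        raise ValueError("sampling_protocol is required to select a settings class")
    protocol = raw["sampling_protocol"]
    if protocol not in KNOWN_PROTOCOLS:
        raise ValueError(f"Unknown sampling_protocol: {protocol!r}")
    return KNOWN_PROTOCOLS[protocol]
```

For example, `resolve_settings_class({"sampling_protocol": "ml_md"})` returns `"MLMDSamplingSettings"`, while an unrecognised or missing identifier raises `ValueError`.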

_DefaultSettings `pydantic-model` #

Bases: BaseModel, ABC

Default configuration for all models.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Default configuration for all models.",
  "properties": {},
  "title": "_DefaultSettings",
  "type": "object"
}

output_types `property` #

output_types: set[OutputType]

Return the set of expected output types for the function which implements this settings object. Subclasses should override this property.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

_SamplingSettingsBase `pydantic-model` #

Bases: _DefaultSettings, ABC

Settings for sampling (usually molecular dynamics).

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for sampling (usually molecular dynamics).",
  "properties": {
    "sampling_protocol": {
      "description": "Type of sampling protocol. Each sampling settings subclass should set this to a unique value. This is used as a discriminator when loading from YAML.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "timestep": {
      "description": "MD timestep",
      "title": "Timestep",
      "type": "string"
    },
    "temperature": {
      "description": "Temperature to run MD at",
      "title": "Temperature",
      "type": "string"
    },
    "snapshot_interval": {
      "description": "Interval between saving snapshots during production sampling",
      "title": "Snapshot Interval",
      "type": "string"
    },
    "n_conformers": {
      "default": 10,
      "description": "The number of conformers to generate, from which sampling is started",
      "title": "N Conformers",
      "type": "integer"
    },
    "equilibration_sampling_time_per_conformer": {
      "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
      "title": "Equilibration Sampling Time Per Conformer",
      "type": "string"
    },
    "production_sampling_time_per_conformer": {
      "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
      "title": "Production Sampling Time Per Conformer",
      "type": "string"
    },
    "loss_energy_weight": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for samples from this protocol.",
      "title": "Loss Energy Weight",
      "type": "number"
    },
    "loss_force_weight": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for samples from this protocol.",
      "title": "Loss Force Weight",
      "type": "number"
    }
  },
  "required": [
    "sampling_protocol"
  ],
  "title": "_SamplingSettingsBase",
  "type": "object"
}

Fields:

sampling_protocol (str)
ml_potential (Literal[AvailableModels])
timestep (OpenMMQuantity[femtoseconds])
temperature (OpenMMQuantity[kelvin])
snapshot_interval (OpenMMQuantity[femtoseconds])
n_conformers (int)
equilibration_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
production_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
loss_energy_weight (float)
loss_force_weight (float)

Validators:

validate_sampling_times

sampling_protocol `pydantic-field` #

sampling_protocol: str

Type of sampling protocol. Each sampling settings subclass should set this to a unique value. This is used as a discriminator when loading from YAML.

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.

timestep `pydantic-field` #

timestep: OpenMMQuantity[femtoseconds] = 1 * femtoseconds

MD timestep

temperature `pydantic-field` #

temperature: OpenMMQuantity[kelvin] = 500 * kelvin

Temperature to run MD at

snapshot_interval `pydantic-field` #

snapshot_interval: OpenMMQuantity[femtoseconds] = (
    0.5 * picoseconds
)

Interval between saving snapshots during production sampling

n_conformers `pydantic-field` #

n_conformers: int = 10

The number of conformers to generate, from which sampling is started

equilibration_sampling_time_per_conformer `pydantic-field` #

equilibration_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (0.0 * picoseconds)

Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.

production_sampling_time_per_conformer `pydantic-field` #

production_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (100 * picoseconds)

Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.
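As a worked example of how these fields combine, the sketch below computes the total simulated time and the snapshot count for the default values. `sampling_budget` is a hypothetical helper, and the snapshot count assumes exactly one snapshot per snapshot_interval during production with no extra initial frame, which this page does not confirm.

```python
from fractions import Fraction


def sampling_budget(equilibration_ps, production_ps, snapshot_interval_ps, n_conformers):
    """Return (total time per conformer in ps, snapshots per conformer, total snapshots)."""
    # Exact rational arithmetic avoids float rounding for values such as 0.5.
    equilibration = Fraction(str(equilibration_ps))
    production = Fraction(str(production_ps))
    interval = Fraction(str(snapshot_interval_ps))

    total_time = equilibration + production
    # Assumption: one snapshot per interval, production phase only.
    snapshots_per_conformer = production / interval
    return total_time, snapshots_per_conformer, n_conformers * snapshots_per_conformer


# Defaults: 0.0 ps equilibration, 100 ps production, 0.5 ps interval, 10 conformers.
total_ps, per_conformer, overall = sampling_budget("0.0", "100", "0.5", 10)
```

Under that assumption the defaults give 100 ps per conformer, 200 snapshots per conformer and 2000 snapshots overall.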

loss_energy_weight `pydantic-field` #

loss_energy_weight: float = 1000.0

Scaling factor for the energy loss term for samples from this protocol.

loss_force_weight `pydantic-field` #

loss_force_weight: float = 0.1

Scaling factor for the force loss term for samples from this protocol.
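One plausible reading of the two weights is a weighted sum of the energy and force loss terms. The sketch below shows that reading only; the way presto actually combines the terms is not shown on this page, and `weighted_loss` is a hypothetical name.

```python
def weighted_loss(
    energy_loss: float,
    force_loss: float,
    loss_energy_weight: float = 1000.0,
    loss_force_weight: float = 0.1,
) -> float:
    """Combine per-protocol energy and force losses as a weighted sum (assumed form)."""
    return loss_energy_weight * energy_loss + loss_force_weight * force_loss
```

With the default weights, an energy loss of 0.002 and a force loss of 3.0 contribute 2.0 and 0.3 respectively, so the energy term dominates even though its raw value is much smaller.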

validate_sampling_times `pydantic-validator` #

validate_sampling_times() -> Self

Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_sampling_times(self) -> Self:
    """Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval."""
    for time, name in [
        (
            self.equilibration_sampling_time_per_conformer,
            "equilibration_sampling_time_per_conformer",
        ),
        (
            self.production_sampling_time_per_conformer,
            "production_sampling_time_per_conformer",
        ),
    ]:
        n_steps = time / self.timestep
        if not n_steps.is_integer():
            raise InvalidSettingsError(
                f"{name} ({time}) must be divisible by the timestep ({self.timestep})."
            )

    # Additionally check that production sampling time divides by snapshot interval
    n_snapshots = self.production_sampling_time_per_conformer / self.snapshot_interval
    if not n_snapshots.is_integer():
        raise InvalidSettingsError(
            f"production_sampling_time_per_conformer ({self.production_sampling_time_per_conformer}) must be divisible by the snapshot_interval ({self.snapshot_interval})."
        )

    return self
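The divisibility rule can be tried out without OpenMM or Pydantic. The following standalone sketch restates it with all times given in femtoseconds. It swaps the float `is_integer()` test for exact `fractions.Fraction` arithmetic, raises `ValueError` rather than `InvalidSettingsError`, and `check_sampling_times` is a hypothetical name.

```python
from fractions import Fraction


def check_sampling_times(timestep_fs, snapshot_interval_fs, equilibration_fs, production_fs):
    """Return (production steps, snapshots) or raise ValueError if a time does not divide exactly."""
    timestep = Fraction(str(timestep_fs))
    snapshot_interval = Fraction(str(snapshot_interval_fs))
    times = {
        "equilibration_sampling_time_per_conformer": Fraction(str(equilibration_fs)),
        "production_sampling_time_per_conformer": Fraction(str(production_fs)),
    }

    # Both sampling times must be a whole number of timesteps.
    for name, time in times.items():
        if (time / timestep).denominator != 1:
            raise ValueError(f"{name} ({time} fs) must be divisible by the timestep ({timestep} fs).")

    # Production must additionally be a whole number of snapshot intervals.
    production = times["production_sampling_time_per_conformer"]
    n_snapshots = production / snapshot_interval
    if n_snapshots.denominator != 1:
        raise ValueError(
            f"production_sampling_time_per_conformer ({production} fs) must be "
            f"divisible by the snapshot_interval ({snapshot_interval} fs)."
        )
    return int(production / timestep), int(n_snapshots)
```

The defaults (1 fs timestep, 500 fs snapshot interval, 0 fs equilibration, 100000 fs production) pass and give 100000 steps and 200 snapshots. A 300 fs snapshot interval is rejected, because 100000 is not a multiple of 300.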

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

MMMDSamplingSettings `pydantic-model` #

Bases: _SamplingSettingsBase

Settings for molecular dynamics sampling using a molecular mechanics force field. This is initially the force field supplied in the parameterisation settings, but is updated as the bespoke force field is trained.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for molecular dynamics sampling using a molecular mechanics\nforce field. This is initally the force field supplined in the parameterisation\nsettings, but is updated as the bespoke force field is trained.",
  "properties": {
    "sampling_protocol": {
      "const": "mm_md",
      "default": "mm_md",
      "description": "Sampling protocol to use.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "timestep": {
      "description": "MD timestep",
      "title": "Timestep",
      "type": "string"
    },
    "temperature": {
      "description": "Temperature to run MD at",
      "title": "Temperature",
      "type": "string"
    },
    "snapshot_interval": {
      "description": "Interval between saving snapshots during production sampling",
      "title": "Snapshot Interval",
      "type": "string"
    },
    "n_conformers": {
      "default": 10,
      "description": "The number of conformers to generate, from which sampling is started",
      "title": "N Conformers",
      "type": "integer"
    },
    "equilibration_sampling_time_per_conformer": {
      "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
      "title": "Equilibration Sampling Time Per Conformer",
      "type": "string"
    },
    "production_sampling_time_per_conformer": {
      "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
      "title": "Production Sampling Time Per Conformer",
      "type": "string"
    },
    "loss_energy_weight": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for samples from this protocol.",
      "title": "Loss Energy Weight",
      "type": "number"
    },
    "loss_force_weight": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for samples from this protocol.",
      "title": "Loss Force Weight",
      "type": "number"
    }
  },
  "title": "MMMDSamplingSettings",
  "type": "object"
}

Fields:

ml_potential (Literal[AvailableModels])
timestep (OpenMMQuantity[femtoseconds])
temperature (OpenMMQuantity[kelvin])
snapshot_interval (OpenMMQuantity[femtoseconds])
n_conformers (int)
equilibration_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
production_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
loss_energy_weight (float)
loss_force_weight (float)
sampling_protocol (Literal['mm_md'])

Validators:

validate_sampling_times

sampling_protocol `pydantic-field` #

sampling_protocol: Literal['mm_md'] = 'mm_md'

Sampling protocol to use.

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.

timestep `pydantic-field` #

timestep: OpenMMQuantity[femtoseconds] = 1 * femtoseconds

MD timestep

temperature `pydantic-field` #

temperature: OpenMMQuantity[kelvin] = 500 * kelvin

Temperature to run MD at

snapshot_interval `pydantic-field` #

snapshot_interval: OpenMMQuantity[femtoseconds] = (
    0.5 * picoseconds
)

Interval between saving snapshots during production sampling

n_conformers `pydantic-field` #

n_conformers: int = 10

The number of conformers to generate, from which sampling is started

equilibration_sampling_time_per_conformer `pydantic-field` #

equilibration_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (0.0 * picoseconds)

Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.

production_sampling_time_per_conformer `pydantic-field` #

production_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (100 * picoseconds)

Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.

loss_energy_weight `pydantic-field` #

loss_energy_weight: float = 1000.0

Scaling factor for the energy loss term for samples from this protocol.

loss_force_weight `pydantic-field` #

loss_force_weight: float = 0.1

Scaling factor for the force loss term for samples from this protocol.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

validate_sampling_times `pydantic-validator` #

validate_sampling_times() -> Self

Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_sampling_times(self) -> Self:
    """Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval."""
    for time, name in [
        (
            self.equilibration_sampling_time_per_conformer,
            "equilibration_sampling_time_per_conformer",
        ),
        (
            self.production_sampling_time_per_conformer,
            "production_sampling_time_per_conformer",
        ),
    ]:
        n_steps = time / self.timestep
        if not n_steps.is_integer():
            raise InvalidSettingsError(
                f"{name} ({time}) must be divisible by the timestep ({self.timestep})."
            )

    # Additionally check that production sampling time divides by snapshot interval
    n_snapshots = self.production_sampling_time_per_conformer / self.snapshot_interval
    if not n_snapshots.is_integer():
        raise InvalidSettingsError(
            f"production_sampling_time_per_conformer ({self.production_sampling_time_per_conformer}) must be divisible by the snapshot_interval ({self.snapshot_interval})."
        )

    return self

MLMDSamplingSettings `pydantic-model` #

Bases: _SamplingSettingsBase

Settings for molecular dynamics sampling using a machine learning potential. This protocol uses the ML reference potential for sampling as well as for energy and force calculations.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for molecular dynamics sampling using a machine learning\npotential. This protocol uses the ML reference potential for sampling as\nwell as for energy and force calculations.",
  "properties": {
    "sampling_protocol": {
      "const": "ml_md",
      "default": "ml_md",
      "description": "Sampling protocol to use.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "timestep": {
      "description": "MD timestep",
      "title": "Timestep",
      "type": "string"
    },
    "temperature": {
      "description": "Temperature to run MD at",
      "title": "Temperature",
      "type": "string"
    },
    "snapshot_interval": {
      "description": "Interval between saving snapshots during production sampling",
      "title": "Snapshot Interval",
      "type": "string"
    },
    "n_conformers": {
      "default": 10,
      "description": "The number of conformers to generate, from which sampling is started",
      "title": "N Conformers",
      "type": "integer"
    },
    "equilibration_sampling_time_per_conformer": {
      "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
      "title": "Equilibration Sampling Time Per Conformer",
      "type": "string"
    },
    "production_sampling_time_per_conformer": {
      "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
      "title": "Production Sampling Time Per Conformer",
      "type": "string"
    },
    "loss_energy_weight": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for samples from this protocol.",
      "title": "Loss Energy Weight",
      "type": "number"
    },
    "loss_force_weight": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for samples from this protocol.",
      "title": "Loss Force Weight",
      "type": "number"
    }
  },
  "title": "MLMDSamplingSettings",
  "type": "object"
}

Fields:

ml_potential (Literal[AvailableModels])
timestep (OpenMMQuantity[femtoseconds])
temperature (OpenMMQuantity[kelvin])
snapshot_interval (OpenMMQuantity[femtoseconds])
n_conformers (int)
equilibration_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
production_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
loss_energy_weight (float)
loss_force_weight (float)
sampling_protocol (Literal['ml_md'])

Validators:

validate_sampling_times

sampling_protocol `pydantic-field` #

sampling_protocol: Literal['ml_md'] = 'ml_md'

Sampling protocol to use.

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.

timestep `pydantic-field` #

timestep: OpenMMQuantity[femtoseconds] = 1 * femtoseconds

MD timestep

temperature `pydantic-field` #

temperature: OpenMMQuantity[kelvin] = 500 * kelvin

Temperature to run MD at

snapshot_interval `pydantic-field` #

snapshot_interval: OpenMMQuantity[femtoseconds] = (
    0.5 * picoseconds
)

Interval between saving snapshots during production sampling

n_conformers `pydantic-field` #

n_conformers: int = 10

The number of conformers to generate, from which sampling is started

equilibration_sampling_time_per_conformer `pydantic-field` #

equilibration_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (0.0 * picoseconds)

Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.

production_sampling_time_per_conformer `pydantic-field` #

production_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (100 * picoseconds)

Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.

loss_energy_weight `pydantic-field` #

loss_energy_weight: float = 1000.0

Scaling factor for the energy loss term for samples from this protocol.

loss_force_weight `pydantic-field` #

loss_force_weight: float = 0.1

Scaling factor for the force loss term for samples from this protocol.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

validate_sampling_times `pydantic-validator` #

validate_sampling_times() -> Self

Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_sampling_times(self) -> Self:
    """Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval."""
    for time, name in [
        (
            self.equilibration_sampling_time_per_conformer,
            "equilibration_sampling_time_per_conformer",
        ),
        (
            self.production_sampling_time_per_conformer,
            "production_sampling_time_per_conformer",
        ),
    ]:
        n_steps = time / self.timestep
        if not n_steps.is_integer():
            raise InvalidSettingsError(
                f"{name} ({time}) must be divisible by the timestep ({self.timestep})."
            )

    # Additionally check that production sampling time divides by snapshot interval
    n_snapshots = self.production_sampling_time_per_conformer / self.snapshot_interval
    if not n_snapshots.is_integer():
        raise InvalidSettingsError(
            f"production_sampling_time_per_conformer ({self.production_sampling_time_per_conformer}) must be divisible by the snapshot_interval ({self.snapshot_interval})."
        )

    return self

MMMDMetadynamicsSamplingSettings `pydantic-model` #

Bases: _SamplingSettingsBase

Settings for molecular dynamics sampling using a molecular mechanics force field with metadynamics. This is initially the force field supplied in the parameterisation settings, but is updated as the bespoke force field is trained.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for molecular dynamics sampling using a molecular mechanics\nforce field with metadynamics. This is initally the force field supplined in the parameterisation\nsettings, but is updated as the bespoke force field is trained.",
  "properties": {
    "sampling_protocol": {
      "const": "mm_md_metadynamics",
      "default": "mm_md_metadynamics",
      "description": "Sampling protocol to use.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "timestep": {
      "description": "MD timestep",
      "title": "Timestep",
      "type": "string"
    },
    "temperature": {
      "description": "Temperature to run MD at",
      "title": "Temperature",
      "type": "string"
    },
    "snapshot_interval": {
      "description": "Interval between saving snapshots during production sampling",
      "title": "Snapshot Interval",
      "type": "string"
    },
    "n_conformers": {
      "default": 10,
      "description": "The number of conformers to generate, from which sampling is started",
      "title": "N Conformers",
      "type": "integer"
    },
    "equilibration_sampling_time_per_conformer": {
      "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
      "title": "Equilibration Sampling Time Per Conformer",
      "type": "string"
    },
    "production_sampling_time_per_conformer": {
      "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
      "title": "Production Sampling Time Per Conformer",
      "type": "string"
    },
    "loss_energy_weight": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for samples from this protocol.",
      "title": "Loss Energy Weight",
      "type": "number"
    },
    "loss_force_weight": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for samples from this protocol.",
      "title": "Loss Force Weight",
      "type": "number"
    },
    "bias_width": {
      "default": 0.3141592653589793,
      "description": "Width of the bias (in radians)",
      "title": "Bias Width",
      "type": "number"
    },
    "bias_factor": {
      "default": 20.0,
      "description": "Bias factor for well-tempered metadynamics. Typical range: 5-20",
      "title": "Bias Factor",
      "type": "number"
    },
    "bias_height": {
      "description": "Initial height of the bias",
      "title": "Bias Height",
      "type": "string"
    },
    "bias_frequency": {
      "description": "Frequency at which to add bias",
      "title": "Bias Frequency",
      "type": "string"
    },
    "bias_save_frequency": {
      "description": "Frequency at which to save the bias",
      "title": "Bias Save Frequency",
      "type": "string"
    },
    "torsions_to_include_smarts": {
      "description": "SMARTS patterns for torsions to include in metadynamics biasing. Matches single bonds not in rings and single bonds in aliphatic rings of size 5 or more. These should match the entire torsion (4 atoms), not just the rotatable bond.",
      "items": {
        "type": "string"
      },
      "title": "Torsions To Include Smarts",
      "type": "array"
    },
    "torsions_to_exclude_smarts": {
      "description": "SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.",
      "items": {
        "type": "string"
      },
      "title": "Torsions To Exclude Smarts",
      "type": "array"
    }
  },
  "title": "MMMDMetadynamicsSamplingSettings",
  "type": "object"
}

Fields:

ml_potential (Literal[AvailableModels])
timestep (OpenMMQuantity[femtoseconds])
temperature (OpenMMQuantity[kelvin])
snapshot_interval (OpenMMQuantity[femtoseconds])
n_conformers (int)
equilibration_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
production_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
loss_energy_weight (float)
loss_force_weight (float)
sampling_protocol (Literal['mm_md_metadynamics'])
bias_width (float)
bias_factor (float)
bias_height (OpenMMQuantity[kilojoules_per_mole])
bias_frequency (OpenMMQuantity[picoseconds])
bias_save_frequency (OpenMMQuantity[picoseconds])
torsions_to_include_smarts (list[str])
torsions_to_exclude_smarts (list[str])

Validators:

validate_sampling_times
validate_frequencies

sampling_protocol `pydantic-field` #

sampling_protocol: Literal["mm_md_metadynamics"] = (
    "mm_md_metadynamics"
)

Sampling protocol to use.

bias_width `pydantic-field` #

bias_width: float = pi / 10

Width of the bias (in radians)

bias_factor `pydantic-field` #

bias_factor: float = 20.0

Bias factor for well-tempered metadynamics. Typical range: 5-20

bias_height `pydantic-field` #

bias_height: OpenMMQuantity[kilojoules_per_mole] = (
    1.0 * kilojoules_per_mole
)

Initial height of the bias
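In well-tempered metadynamics the deposited Gaussian height decays as bias accumulates, at a rate set by the bias factor. The sketch below evaluates the standard textbook relation, h = h0 · exp(−V / (R · ΔT)) with ΔT = T · (bias_factor − 1). It is not taken from presto's source; `well_tempered_height` is a hypothetical helper, and the defaults mirror the temperature and bias_factor fields on this page.

```python
import math

# Molar gas constant in kJ/(mol K).
GAS_CONSTANT_KJ_PER_MOL_K = 0.008314462618


def well_tempered_height(
    initial_height_kj_mol: float,
    current_bias_kj_mol: float,
    temperature_k: float = 500.0,
    bias_factor: float = 20.0,
) -> float:
    """Height of the next Gaussian given the bias already deposited at this point."""
    if bias_factor <= 1.0:
        raise ValueError("bias_factor must be greater than 1 for well-tempered metadynamics")
    # The bias factor is (T + delta_T) / T, so delta_T = T * (bias_factor - 1).
    delta_t = temperature_k * (bias_factor - 1.0)
    return initial_height_kj_mol * math.exp(
        -current_bias_kj_mol / (GAS_CONSTANT_KJ_PER_MOL_K * delta_t)
    )
```

With no accumulated bias the full initial height is deposited. A smaller bias factor makes the height decay faster for the same accumulated bias, so the bias stays closer to its starting scale at the upper end of the typical range.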

bias_frequency `pydantic-field` #

bias_frequency: OpenMMQuantity[picoseconds] = (
    0.1 * picoseconds
)

Frequency at which to add bias, given as the time interval between bias depositions

bias_save_frequency `pydantic-field` #

bias_save_frequency: OpenMMQuantity[picoseconds] = (
    10 * picoseconds
)

Frequency at which to save the bias, given as the time interval between saves
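Because both bias settings are times rather than step counts, it can help to see what they correspond to in MD steps. The sketch below performs that conversion, assuming each interval must be a whole number of timesteps. The body of validate_frequencies is not shown on this page, so that requirement is an assumption, and `interval_in_steps` is a hypothetical helper.

```python
from fractions import Fraction


def interval_in_steps(interval_ps, timestep_fs) -> int:
    """Convert a time interval in picoseconds into a whole number of MD steps."""
    # 1 ps = 1000 fs; Fraction keeps values such as 0.1 exact.
    steps = Fraction(str(interval_ps)) * 1000 / Fraction(str(timestep_fs))
    if steps.denominator != 1:
        raise ValueError(
            f"{interval_ps} ps is not a whole number of {timestep_fs} fs timesteps"
        )
    return int(steps)
```

With the default 1 fs timestep, the default bias_frequency of 0.1 ps corresponds to 100 steps and the default bias_save_frequency of 10 ps to 10000 steps.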

torsions_to_include_smarts `pydantic-field` #

torsions_to_include_smarts: list[str]

SMARTS patterns for torsions to include in metadynamics biasing. The default patterns match single bonds not in rings and single bonds in aliphatic rings of size 5 or more. Each pattern should match the entire torsion (4 atoms), not just the rotatable bond.

torsions_to_exclude_smarts `pydantic-field` #

torsions_to_exclude_smarts: list[str]

SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.
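The include and exclude patterns work at different granularity: four-atom torsions versus two-atom bonds. The sketch below shows one way the exclusion could be applied once SMARTS matching has produced atom-index tuples. It assumes the rotatable bond is the central pair of each torsion and that bond direction is irrelevant. `filter_torsions` is a hypothetical helper, and the SMARTS matching itself is out of scope here.

```python
def filter_torsions(
    included_torsions: list[tuple[int, int, int, int]],
    excluded_bonds: list[tuple[int, int]],
) -> list[tuple[int, int, int, int]]:
    """Drop every matched torsion whose central bond matches an excluded bond."""
    # frozenset makes the comparison independent of atom order within the bond.
    excluded = {frozenset(bond) for bond in excluded_bonds}
    return [
        torsion
        for torsion in included_torsions
        if frozenset(torsion[1:3]) not in excluded
    ]
```

For example, excluding the bond `(2, 1)` removes both `(0, 1, 2, 3)` and `(5, 2, 1, 0)`, since each rotates about atoms 1 and 2, but keeps `(1, 2, 3, 4)`, which rotates about atoms 2 and 3.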

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.

timestep `pydantic-field` #

timestep: OpenMMQuantity[femtoseconds] = 1 * femtoseconds

MD timestep

temperature `pydantic-field` #

temperature: OpenMMQuantity[kelvin] = 500 * kelvin

Temperature to run MD at

snapshot_interval `pydantic-field` #

snapshot_interval: OpenMMQuantity[femtoseconds] = (
    0.5 * picoseconds
)

Interval between saving snapshots during production sampling

n_conformers `pydantic-field` #

n_conformers: int = 10

The number of conformers to generate, from which sampling is started

equilibration_sampling_time_per_conformer `pydantic-field` #

equilibration_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (0.0 * picoseconds)

Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.

production_sampling_time_per_conformer `pydantic-field` #

production_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (100 * picoseconds)

Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.
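As a rough illustration of how these fields combine, the arithmetic below uses the default values listed above, expressed as integer femtoseconds so that the divisions are exact. It assumes one snapshot is saved per full snapshot interval of production sampling; the variable names are illustrative only.

```python
# All times in femtoseconds so the arithmetic stays exact.
timestep_fs = 1
snapshot_interval_fs = 500   # 0.5 ps
equilibration_fs = 0         # 0.0 ps
production_fs = 100_000      # 100 ps
n_conformers = 10

# Total sampling time per conformer is equilibration plus production.
total_fs = equilibration_fs + production_fs
steps_per_conformer = total_fs // timestep_fs

# Assumption: one snapshot per full interval, production sampling only.
snapshots_per_conformer = production_fs // snapshot_interval_fs
total_snapshots = snapshots_per_conformer * n_conformers

print(steps_per_conformer, snapshots_per_conformer, total_snapshots)
```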

loss_energy_weight `pydantic-field` #

loss_energy_weight: float = 1000.0

Scaling factor for the energy loss term for samples from this protocol.

loss_force_weight `pydantic-field` #

loss_force_weight: float = 0.1

Scaling factor for the force loss term for samples from this protocol.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

validate_sampling_times `pydantic-validator` #

validate_sampling_times() -> Self

Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_sampling_times(self) -> Self:
    """Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval."""
    for time, name in [
        (
            self.equilibration_sampling_time_per_conformer,
            "equilibration_sampling_time_per_conformer",
        ),
        (
            self.production_sampling_time_per_conformer,
            "production_sampling_time_per_conformer",
        ),
    ]:
        n_steps = time / self.timestep
        if not n_steps.is_integer():
            raise InvalidSettingsError(
                f"{name} ({time}) must be divisible by the timestep ({self.timestep})."
            )

    # Additionally check that production sampling time divides by snapshot interval
    n_snapshots = self.production_sampling_time_per_conformer / self.snapshot_interval
    if not n_snapshots.is_integer():
        raise InvalidSettingsError(
            f"production_sampling_time_per_conformer ({self.production_sampling_time_per_conformer}) must be divisible by the snapshot_interval ({self.snapshot_interval})."
        )

    return self
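The divisibility rule itself can be sketched independently of OpenMM quantities. The function below is a hypothetical stand-in that takes times as decimal strings in picoseconds and uses exact rational arithmetic; it is shown because a plain float ratio can miss a whole multiple through round-off. Whether the validator above is exposed to that round-off depends on how the quantity division is implemented, which is not shown here.

```python
from fractions import Fraction


def divides_exactly(total_ps: str, step_ps: str) -> bool:
    """Return True if total_ps is a whole multiple of step_ps (both in ps)."""
    ratio = Fraction(total_ps) / Fraction(step_ps)
    return ratio.denominator == 1


# A float ratio can fail the integer test purely through round-off.
print((0.3 / 0.1).is_integer())

# The exact check on the same values, then on a valid and an invalid setting.
print(divides_exactly("0.3", "0.1"))
print(divides_exactly("100", "0.5"))
print(divides_exactly("100", "0.3"))
```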

MMMDMetadynamicsTorsionMinimisationSamplingSettings `pydantic-model` #

Bases: MMMDMetadynamicsSamplingSettings

Settings for MM MD metadynamics sampling with additional torsion-restrained minimisation structures. This extends MMMDMetadynamicsSamplingSettings by generating additional training data from torsion-restrained minimisations.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for MM MD metadynamics sampling with additional torsion-restrained\nminimisation structures. This extends MMMDMetadynamicsSamplingSettings by generating\nadditional training data from torsion-restrained minimisations.",
  "properties": {
    "sampling_protocol": {
      "const": "mm_md_metadynamics_torsion_minimisation",
      "default": "mm_md_metadynamics_torsion_minimisation",
      "description": "Sampling protocol to use.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "timestep": {
      "description": "MD timestep",
      "title": "Timestep",
      "type": "string"
    },
    "temperature": {
      "description": "Temperature to run MD at",
      "title": "Temperature",
      "type": "string"
    },
    "snapshot_interval": {
      "description": "Interval between saving snapshots during production sampling",
      "title": "Snapshot Interval",
      "type": "string"
    },
    "n_conformers": {
      "default": 10,
      "description": "The number of conformers to generate, from which sampling is started",
      "title": "N Conformers",
      "type": "integer"
    },
    "equilibration_sampling_time_per_conformer": {
      "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
      "title": "Equilibration Sampling Time Per Conformer",
      "type": "string"
    },
    "production_sampling_time_per_conformer": {
      "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
      "title": "Production Sampling Time Per Conformer",
      "type": "string"
    },
    "loss_energy_weight": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for samples from this protocol.",
      "title": "Loss Energy Weight",
      "type": "number"
    },
    "loss_force_weight": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for samples from this protocol.",
      "title": "Loss Force Weight",
      "type": "number"
    },
    "bias_width": {
      "default": 0.3141592653589793,
      "description": "Width of the bias (in radians)",
      "title": "Bias Width",
      "type": "number"
    },
    "bias_factor": {
      "default": 20.0,
      "description": "Bias factor for well-tempered metadynamics. Typical range: 5-20",
      "title": "Bias Factor",
      "type": "number"
    },
    "bias_height": {
      "description": "Initial height of the bias",
      "title": "Bias Height",
      "type": "string"
    },
    "bias_frequency": {
      "description": "Frequency at which to add bias",
      "title": "Bias Frequency",
      "type": "string"
    },
    "bias_save_frequency": {
      "description": "Frequency at which to save the bias",
      "title": "Bias Save Frequency",
      "type": "string"
    },
    "torsions_to_include_smarts": {
      "description": "SMARTS patterns for torsions to include in metadynamics biasing. Matches single bonds not in rings and single bonds in aliphatic rings of size 5 or more. These should match the entire torsion (4 atoms), not just the rotatable bond.",
      "items": {
        "type": "string"
      },
      "title": "Torsions To Include Smarts",
      "type": "array"
    },
    "torsions_to_exclude_smarts": {
      "description": "SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.",
      "items": {
        "type": "string"
      },
      "title": "Torsions To Exclude Smarts",
      "type": "array"
    },
    "ml_minimisation_steps": {
      "default": 10,
      "description": "Number of MLP minimisation steps with restrained torsions.",
      "title": "Ml Minimisation Steps",
      "type": "integer"
    },
    "mm_minimisation_steps": {
      "default": 10,
      "description": "Number of MM minimisation steps with restrained torsions.",
      "title": "Mm Minimisation Steps",
      "type": "integer"
    },
    "torsion_restraint_force_constant": {
      "description": "Force constant for torsion restraints.",
      "title": "Torsion Restraint Force Constant",
      "type": "string"
    },
    "map_ml_coords_energy_to_mm_coords_energy": {
      "default": false,
      "description": "Whether to substitute the MLP energy for the MM-minimised coordinates with the MLP energy for the corresponding MLP-minimised coordinates.",
      "title": "Map Ml Coords Energy To Mm Coords Energy",
      "type": "boolean"
    },
    "loss_energy_weight_mm_torsion_min": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.",
      "title": "Loss Energy Weight Mm Torsion Min",
      "type": "number"
    },
    "loss_force_weight_mm_torsion_min": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.",
      "title": "Loss Force Weight Mm Torsion Min",
      "type": "number"
    },
    "loss_energy_weight_ml_torsion_min": {
      "default": 1000.0,
      "description": "Scaling factor for the energy loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.",
      "title": "Loss Energy Weight Ml Torsion Min",
      "type": "number"
    },
    "loss_force_weight_ml_torsion_min": {
      "default": 0.1,
      "description": "Scaling factor for the force loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.",
      "title": "Loss Force Weight Ml Torsion Min",
      "type": "number"
    }
  },
  "title": "MMMDMetadynamicsTorsionMinimisationSamplingSettings",
  "type": "object"
}

Fields:

ml_potential (Literal[AvailableModels])
timestep (OpenMMQuantity[femtoseconds])
temperature (OpenMMQuantity[kelvin])
snapshot_interval (OpenMMQuantity[femtoseconds])
n_conformers (int)
equilibration_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
production_sampling_time_per_conformer (OpenMMQuantity[picoseconds])
loss_energy_weight (float)
loss_force_weight (float)
bias_width (float)
bias_factor (float)
bias_height (OpenMMQuantity[kilojoules_per_mole])
bias_frequency (OpenMMQuantity[picoseconds])
bias_save_frequency (OpenMMQuantity[picoseconds])
torsions_to_include_smarts (list[str])
torsions_to_exclude_smarts (list[str])
sampling_protocol (Literal['mm_md_metadynamics_torsion_minimisation'])
ml_minimisation_steps (int)
mm_minimisation_steps (int)
torsion_restraint_force_constant (OpenMMQuantity[kilojoules_per_mole / radian ** 2])
map_ml_coords_energy_to_mm_coords_energy (bool)
loss_energy_weight_mm_torsion_min (float)
loss_force_weight_mm_torsion_min (float)
loss_energy_weight_ml_torsion_min (float)
loss_force_weight_ml_torsion_min (float)

Validators:

validate_sampling_times
validate_frequencies

sampling_protocol `pydantic-field` #

sampling_protocol: Literal[
    "mm_md_metadynamics_torsion_minimisation"
] = "mm_md_metadynamics_torsion_minimisation"

Sampling protocol to use.

ml_minimisation_steps `pydantic-field` #

ml_minimisation_steps: int = 10

Number of MLP minimisation steps with restrained torsions.

mm_minimisation_steps `pydantic-field` #

mm_minimisation_steps: int = 10

Number of MM minimisation steps with restrained torsions.

torsion_restraint_force_constant `pydantic-field` #

torsion_restraint_force_constant: OpenMMQuantity[
    kilojoules_per_mole / radian**2
] = (0.0 * kilojoules_per_mole / radian**2)

Force constant for torsion restraints.

map_ml_coords_energy_to_mm_coords_energy `pydantic-field` #

map_ml_coords_energy_to_mm_coords_energy: bool = False

Whether to substitute the MLP energy for the MM-minimised coordinates with the MLP energy for the corresponding MLP-minimised coordinates.
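One reading of this flag, as a minimal sketch: each starting structure yields an MM-minimised and an MLP-minimised geometry, and the flag selects which MLP energy is attached to the MM-minimised geometry. The container and function names below are hypothetical and do not come from presto.

```python
from dataclasses import dataclass


@dataclass
class MinimisedPair:  # hypothetical container, not a presto class
    mlp_energy_at_mm_coords: float  # MLP energy of the MM-minimised geometry
    mlp_energy_at_ml_coords: float  # MLP energy of the MLP-minimised geometry


def reference_energy_for_mm_coords(pair: MinimisedPair, map_ml_to_mm: bool) -> float:
    """Pick the MLP energy used as the target for the MM-minimised geometry."""
    if map_ml_to_mm:
        # Substitute the energy of the corresponding MLP-minimised geometry.
        return pair.mlp_energy_at_ml_coords
    return pair.mlp_energy_at_mm_coords


pair = MinimisedPair(mlp_energy_at_mm_coords=-10.2, mlp_energy_at_ml_coords=-10.9)
print(reference_energy_for_mm_coords(pair, map_ml_to_mm=False))
print(reference_energy_for_mm_coords(pair, map_ml_to_mm=True))
```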

loss_energy_weight_mm_torsion_min `pydantic-field` #

loss_energy_weight_mm_torsion_min: float = 1000.0

Scaling factor for the energy loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.

loss_force_weight_mm_torsion_min `pydantic-field` #

loss_force_weight_mm_torsion_min: float = 0.1

Scaling factor for the force loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.

loss_energy_weight_ml_torsion_min `pydantic-field` #

loss_energy_weight_ml_torsion_min: float = 1000.0

Scaling factor for the energy loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.

loss_force_weight_ml_torsion_min `pydantic-field` #

loss_force_weight_ml_torsion_min: float = 0.1

Scaling factor for the force loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.
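The three pairs of weights apply to three groups of samples: MMMD snapshots, MM torsion-minimised structures, and MLP torsion-minimised structures. The sketch below assumes the total loss is a plain weighted sum of per-group energy and force terms; the group labels and the `weighted_loss` function are hypothetical, and the actual loss in presto may be assembled differently.

```python
import math

# Default weights from the fields above, keyed by a hypothetical group label.
ENERGY_WEIGHTS = {"mm_md": 1000.0, "mm_torsion_min": 1000.0, "ml_torsion_min": 1000.0}
FORCE_WEIGHTS = {"mm_md": 0.1, "mm_torsion_min": 0.1, "ml_torsion_min": 0.1}


def weighted_loss(terms):
    """Sum weighted losses; terms is an iterable of (group, energy_loss, force_loss)."""
    return sum(
        ENERGY_WEIGHTS[group] * energy_loss + FORCE_WEIGHTS[group] * force_loss
        for group, energy_loss, force_loss in terms
    )


# Made-up unweighted loss values for two groups.
total = weighted_loss([("mm_md", 0.002, 4.0), ("ml_torsion_min", 0.001, 2.0)])
print(total)
```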

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.

timestep `pydantic-field` #

timestep: OpenMMQuantity[femtoseconds] = 1 * femtoseconds

MD timestep

temperature `pydantic-field` #

temperature: OpenMMQuantity[kelvin] = 500 * kelvin

Temperature to run MD at

snapshot_interval `pydantic-field` #

snapshot_interval: OpenMMQuantity[femtoseconds] = (
    0.5 * picoseconds
)

Interval between saving snapshots during production sampling

n_conformers `pydantic-field` #

n_conformers: int = 10

The number of conformers to generate, from which sampling is started

equilibration_sampling_time_per_conformer `pydantic-field` #

equilibration_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (0.0 * picoseconds)

Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.

production_sampling_time_per_conformer `pydantic-field` #

production_sampling_time_per_conformer: OpenMMQuantity[
    picoseconds
] = (100 * picoseconds)

Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.

loss_energy_weight `pydantic-field` #

loss_energy_weight: float = 1000.0

Scaling factor for the energy loss term for samples from this protocol.

loss_force_weight `pydantic-field` #

loss_force_weight: float = 0.1

Scaling factor for the force loss term for samples from this protocol.

bias_width `pydantic-field` #

bias_width: float = pi / 10

Width of the bias (in radians)

bias_factor `pydantic-field` #

bias_factor: float = 20.0

Bias factor for well-tempered metadynamics. Typical range: 5-20

bias_height `pydantic-field` #

bias_height: OpenMMQuantity[kilojoules_per_mole] = (
    1.0 * kilojoules_per_mole
)

Initial height of the bias
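For orientation, the standard well-tempered scheme scales each new Gaussian by the bias already deposited at the current point, with an effective temperature increment of T * (bias_factor - 1). The sketch below applies that textbook formula to the defaults on this page; it assumes presto follows the standard scheme, and `hill_height` is a hypothetical name.

```python
import math

KB_KJ_PER_MOL_K = 0.0083144626  # molar gas constant in kJ/(mol K)


def hill_height(initial_height, current_bias, temperature, bias_factor):
    """Height (kJ/mol) of the next Gaussian in well-tempered metadynamics."""
    # Effective temperature increment implied by the bias factor.
    delta_t = temperature * (bias_factor - 1.0)
    return initial_height * math.exp(-current_bias / (KB_KJ_PER_MOL_K * delta_t))


# Defaults from this page: 1.0 kJ/mol initial height, 500 K, bias factor 20.
fresh = hill_height(1.0, 0.0, 500.0, 20.0)     # no bias deposited yet
damped = hill_height(1.0, 50.0, 500.0, 20.0)   # 50 kJ/mol already deposited
print(fresh, damped)
```

A larger bias factor slows the decay of the hill height, so sampling stays aggressive for longer.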

bias_frequency `pydantic-field` #

bias_frequency: OpenMMQuantity[picoseconds] = (
    0.1 * picoseconds
)

Frequency at which to add bias

bias_save_frequency `pydantic-field` #

bias_save_frequency: OpenMMQuantity[picoseconds] = (
    10 * picoseconds
)

Frequency at which to save the bias

torsions_to_include_smarts `pydantic-field` #

torsions_to_include_smarts: list[str]

SMARTS patterns for torsions to include in metadynamics biasing. Matches single bonds not in rings and single bonds in aliphatic rings of size 5 or more. These should match the entire torsion (4 atoms), not just the rotatable bond.

torsions_to_exclude_smarts `pydantic-field` #

torsions_to_exclude_smarts: list[str]

SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

validate_sampling_times `pydantic-validator` #

validate_sampling_times() -> Self

Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_sampling_times(self) -> Self:
    """Ensure that the sampling times divide exactly by the timestep and (for production) the snapshot interval."""
    for time, name in [
        (
            self.equilibration_sampling_time_per_conformer,
            "equilibration_sampling_time_per_conformer",
        ),
        (
            self.production_sampling_time_per_conformer,
            "production_sampling_time_per_conformer",
        ),
    ]:
        n_steps = time / self.timestep
        if not n_steps.is_integer():
            raise InvalidSettingsError(
                f"{name} ({time}) must be divisible by the timestep ({self.timestep})."
            )

    # Additionally check that production sampling time divides by snapshot interval
    n_snapshots = self.production_sampling_time_per_conformer / self.snapshot_interval
    if not n_snapshots.is_integer():
        raise InvalidSettingsError(
            f"production_sampling_time_per_conformer ({self.production_sampling_time_per_conformer}) must be divisible by the snapshot_interval ({self.snapshot_interval})."
        )

    return self

PreComputedDatasetSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for loading pre-computed datasets from disk.

For single-molecule fits, provide a single Path. For multi-molecule fits, provide a list of Paths (one per molecule).

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for loading pre-computed datasets from disk.\n\nFor single-molecule fits, provide a single Path.\nFor multi-molecule fits, provide a list of Paths (one per molecule).",
  "properties": {
    "sampling_protocol": {
      "const": "pre_computed",
      "default": "pre_computed",
      "description": "Sampling protocol identifier.",
      "title": "Sampling Protocol",
      "type": "string"
    },
    "dataset_paths": {
      "description": "Path(s) to pre-computed dataset(s) saved with dataset.save_to_disk(). For single-molecule fits, provide a single Path. For multi-molecule fits, provide a list of Paths (one per molecule in order).",
      "items": {
        "format": "path",
        "type": "string"
      },
      "title": "Dataset Paths",
      "type": "array"
    }
  },
  "required": [
    "dataset_paths"
  ],
  "title": "PreComputedDatasetSettings",
  "type": "object"
}

Fields:

sampling_protocol (Literal['pre_computed'])
dataset_paths (list[Path])

Validators:

normalize_dataset_paths → dataset_paths

sampling_protocol `pydantic-field` #

sampling_protocol: Literal['pre_computed'] = 'pre_computed'

Sampling protocol identifier.

output_types `property` #

output_types: set[OutputType]

Pre-computed datasets don't produce any output files.

normalize_dataset_paths `pydantic-validator` #

normalize_dataset_paths(
    value: Path | list[Path],
) -> list[Path]

Normalize dataset_paths to always be a list internally.

Source code in presto/settings.py

@field_validator("dataset_paths", mode="before")
@classmethod
def normalize_dataset_paths(cls, value: Path | list[Path]) -> list[Path]:
    """Normalize dataset_paths to always be a list internally."""
    if isinstance(value, (str, Path)):
        return [Path(value)]
    return [Path(p) for p in value]
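The effect of the validator can be checked in isolation. The snippet below copies its body into a plain function, without the pydantic decorators, and calls it with a single path and with a list; the paths are placeholders.

```python
from pathlib import Path


def normalize_dataset_paths(value):
    """Standalone copy of the validator body, without the pydantic decorators."""
    if isinstance(value, (str, Path)):
        return [Path(value)]
    return [Path(p) for p in value]


# A single path (str or Path) becomes a one-element list.
single = normalize_dataset_paths("data/mol_a")
# A list is kept as a list, with every entry converted to Path.
many = normalize_dataset_paths(["data/mol_a", Path("data/mol_b")])
print(single, many)
```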

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

TrainingSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for the training process.

Show JSON schema:

{
  "$defs": {
    "AttributeConfig": {
      "description": "Configuration for how a potential's attributes should be trained.",
      "properties": {
        "cols": {
          "description": "The parameters to train, e.g. 'k', 'length', 'epsilon'.",
          "items": {
            "type": "string"
          },
          "title": "Cols",
          "type": "array"
        },
        "scales": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The scales to apply to each parameter, e.g. 'k': 1.0, 'length': 1.0, 'epsilon': 1.0.",
          "title": "Scales",
          "type": "object"
        },
        "limits": {
          "additionalProperties": {
            "maxItems": 2,
            "minItems": 2,
            "prefixItems": [
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              },
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              }
            ],
            "type": "array"
          },
          "default": {},
          "description": "The min and max values to clamp each parameter within, e.g. 'k': (0.0, None), 'angle': (0.0, pi), 'epsilon': (0.0, None), where none indicates no constraint.",
          "title": "Limits",
          "type": "object"
        },
        "regularize": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The regularization strength to apply to each parameter, e.g. 'k': 0.01, 'epsilon': 0.001. Parameters not listed are not regularized.",
          "title": "Regularize",
          "type": "object"
        }
      },
      "required": [
        "cols"
      ],
      "title": "AttributeConfig",
      "type": "object"
    },
    "ParameterConfig": {
      "description": "Configuration for how a potential's parameters should be trained.",
      "properties": {
        "cols": {
          "description": "The parameters to train, e.g. 'k', 'length', 'epsilon'.",
          "items": {
            "type": "string"
          },
          "title": "Cols",
          "type": "array"
        },
        "scales": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The scales to apply to each parameter, e.g. 'k': 1.0, 'length': 1.0, 'epsilon': 1.0.",
          "title": "Scales",
          "type": "object"
        },
        "limits": {
          "additionalProperties": {
            "maxItems": 2,
            "minItems": 2,
            "prefixItems": [
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              },
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              }
            ],
            "type": "array"
          },
          "default": {},
          "description": "The min and max values to clamp each parameter within, e.g. 'k': (0.0, None), 'angle': (0.0, pi), 'epsilon': (0.0, None), where none indicates no constraint.",
          "title": "Limits",
          "type": "object"
        },
        "regularize": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The regularization strength to apply to each parameter, e.g. 'k': 0.01, 'epsilon': 0.001. Parameters not listed are not regularized.",
          "title": "Regularize",
          "type": "object"
        },
        "include": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/_PotentialKey"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The keys (see ``smee.TensorPotential.parameter_keys`` for details) corresponding to specific parameters to be trained. If ``None``, all parameters will be trained.",
          "title": "Include"
        },
        "exclude": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/_PotentialKey"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The keys (see ``smee.TensorPotential.parameter_keys`` for details) corresponding to specific parameters to be excluded from training. If ``None``, no parameters will be excluded.",
          "title": "Exclude"
        }
      },
      "required": [
        "cols"
      ],
      "title": "ParameterConfig",
      "type": "object"
    },
    "_PotentialKey": {
      "description": "TODO: Needed until interchange upgrades to pydantic >=2",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "mult": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Mult"
        },
        "associated_handler": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Associated Handler"
        },
        "bond_order": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Bond Order"
        }
      },
      "required": [
        "id"
      ],
      "title": "_PotentialKey",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Settings for the training process.",
  "properties": {
    "optimiser": {
      "default": "adam",
      "description": "Optimiser to use for the training. 'adam' is Adam, 'lm' is Levenberg-Marquardt",
      "enum": [
        "adam",
        "lm"
      ],
      "title": "Optimiser",
      "type": "string"
    },
    "parameter_configs": {
      "additionalProperties": {
        "$ref": "#/$defs/ParameterConfig"
      },
      "description": "Configuration for the force field parameters to be trained.",
      "propertyNames": {
        "enum": [
          "Bonds",
          "LinearBonds",
          "Angles",
          "LinearAngles",
          "ProperTorsions",
          "ImproperTorsions"
        ]
      },
      "title": "Parameter Configs",
      "type": "object"
    },
    "attribute_configs": {
      "additionalProperties": {
        "$ref": "#/$defs/AttributeConfig"
      },
      "default": {},
      "description": "Configuration for the force field attributes to be trained. This allows 1-4 scaling for 'vdW' and 'Electrostatics' to be trained.",
      "propertyNames": {
        "enum": [
          "vdW",
          "Electrostatics"
        ]
      },
      "title": "Attribute Configs",
      "type": "object"
    },
    "n_epochs": {
      "default": 1000,
      "description": "Number of epochs in the ML fit",
      "title": "N Epochs",
      "type": "integer"
    },
    "learning_rate": {
      "default": 0.01,
      "description": "Learning Rate in the ML fit",
      "title": "Learning Rate",
      "type": "number"
    },
    "learning_rate_decay": {
      "default": 1.0,
      "description": "Learning Rate Decay. 0.99 is 1%, and 1.0 is no decay.",
      "title": "Learning Rate Decay",
      "type": "number"
    },
    "learning_rate_decay_step": {
      "default": 10,
      "description": "Learning Rate Decay Step",
      "title": "Learning Rate Decay Step",
      "type": "integer"
    },
    "regularisation_target": {
      "default": "initial",
      "description": "Target value to regularise parameters towards. 'initial' is the initial parameter value, 'zero' is zero.",
      "enum": [
        "initial",
        "zero"
      ],
      "title": "Regularisation Target",
      "type": "string"
    }
  },
  "title": "TrainingSettings",
  "type": "object"
}

Fields:

optimiser (OptimiserName)
parameter_configs (dict[ValenceType, ParameterConfig])
attribute_configs (dict[AllowedAttributeType, AttributeConfig])
n_epochs (int)
learning_rate (float)
learning_rate_decay (float)
learning_rate_decay_step (int)
regularisation_target (Literal['initial', 'zero'])

optimiser `pydantic-field` #

optimiser: OptimiserName = 'adam'

Optimiser to use for the training. 'adam' is Adam, 'lm' is Levenberg-Marquardt

parameter_configs `pydantic-field` #

parameter_configs: dict[ValenceType, ParameterConfig]

Configuration for the force field parameters to be trained.

attribute_configs `pydantic-field` #

attribute_configs: dict[
    AllowedAttributeType, AttributeConfig
] = {}

Configuration for the force field attributes to be trained. This allows 1-4 scaling for 'vdW' and 'Electrostatics' to be trained.

n_epochs `pydantic-field` #

n_epochs: int = 1000

Number of epochs in the ML fit

learning_rate `pydantic-field` #

learning_rate: float = 0.01

Learning Rate in the ML fit

learning_rate_decay `pydantic-field` #

learning_rate_decay: float = 1.0

Learning rate decay factor. 0.99 is a 1% decay, and 1.0 is no decay.

learning_rate_decay_step `pydantic-field` #

learning_rate_decay_step: int = 10

Learning Rate Decay Step
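A common way to combine a decay factor with a decay step is a step schedule, where the learning rate is multiplied by the factor once every `learning_rate_decay_step` epochs. The sketch below assumes that interpretation; the schedule presto actually applies is not shown on this page, and `learning_rate_at` is a hypothetical name.

```python
def learning_rate_at(epoch, learning_rate=0.01, decay=0.99, decay_step=10):
    """Step schedule: multiply by decay once every decay_step epochs (assumed)."""
    return learning_rate * decay ** (epoch // decay_step)


# With decay=0.99 the rate drops by 1% every 10 epochs.
print(learning_rate_at(0), learning_rate_at(9), learning_rate_at(10))

# With the default decay of 1.0 the rate never changes.
print(learning_rate_at(500, decay=1.0))
```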

regularisation_target `pydantic-field` #

regularisation_target: Literal["initial", "zero"] = (
    "initial"
)

Target value to regularise parameters towards. 'initial' is the initial parameter value, 'zero' is zero.
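The difference between the two targets can be illustrated with a simple quadratic penalty. The sketch assumes an L2-style term weighted by the per-parameter strengths from a `regularize` mapping; the exact functional form used by presto is not stated here, and `regularisation_penalty` is a hypothetical name.

```python
def regularisation_penalty(params, initial, strengths, target="initial"):
    """Quadratic penalty pulling each listed parameter towards its target value."""
    total = 0.0
    for name, strength in strengths.items():
        # 'initial' regularises towards the starting value, 'zero' towards 0.
        reference = initial[name] if target == "initial" else 0.0
        total += strength * (params[name] - reference) ** 2
    return total


params = {"k": 110.0}
initial = {"k": 100.0}
strengths = {"k": 0.01}  # parameters not listed are not regularised

towards_initial = regularisation_penalty(params, initial, strengths, "initial")
towards_zero = regularisation_penalty(params, initial, strengths, "zero")
print(towards_initial, towards_zero)
```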

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

OutlierFilterSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for filtering outliers from datasets based on MM vs MLP differences.

Outliers are identified by comparing MM and reference (typically MLP) energies and forces. Conformations where the absolute difference exceeds a threshold are removed.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for filtering outliers from datasets based on MM vs MLP differences.\n\nOutliers are identified by comparing MM and reference (typically MLP) energies\nand forces. Conformations where the absolute difference exceeds a threshold\nare removed.",
  "properties": {
    "energy_outlier_threshold": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": 2.0,
      "description": "Absolute threshold in kcal/mol/atom for energy outlier detection. Conformations where |energy_mm - energy_ref| / n_atoms (relative to minimum) exceeds this threshold will be removed. Set to None to disable energy-based filtering.",
      "title": "Energy Outlier Threshold"
    },
    "force_outlier_threshold": {
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "null"
        }
      ],
      "default": 500.0,
      "description": "Absolute threshold in kcal/mol/\u00c5 for force outlier detection. Conformations where max |force_mm - force_ref| exceeds this threshold will be removed. Set to None to disable force-based filtering.",
      "title": "Force Outlier Threshold"
    },
    "min_conformations": {
      "default": 1,
      "description": "Minimum number of conformations to keep per molecule. If filtering would remove too many conformations, all conformations will be kept for that molecule.",
      "title": "Min Conformations",
      "type": "integer"
    }
  },
  "title": "OutlierFilterSettings",
  "type": "object"
}

Fields:

energy_outlier_threshold (float | None)
force_outlier_threshold (float | None)
min_conformations (int)

energy_outlier_threshold `pydantic-field` #

energy_outlier_threshold: float | None = 2.0

Absolute threshold in kcal/mol/atom for energy outlier detection. Conformations where |energy_mm - energy_ref| / n_atoms (relative to minimum) exceeds this threshold will be removed. Set to None to disable energy-based filtering.

force_outlier_threshold `pydantic-field` #

force_outlier_threshold: float | None = 500.0

Absolute threshold in kcal/mol/Å for force outlier detection. Conformations where max |force_mm - force_ref| exceeds this threshold will be removed. Set to None to disable force-based filtering.

min_conformations `pydantic-field` #

min_conformations: int = 1

Minimum number of conformations to keep per molecule. If filtering would remove too many conformations, all conformations will be kept for that molecule.
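The three fields above combine roughly as in the following sketch for a single molecule. It is not the library's implementation: the function name and the flat-list data layout are hypothetical, and "relative to minimum" is interpreted here as shifting each energy series by its own minimum before comparison.

```python
from typing import Optional, Sequence


def filter_outliers(
    energy_mm: Sequence[float],
    energy_ref: Sequence[float],
    force_mm: Sequence[Sequence[float]],
    force_ref: Sequence[Sequence[float]],
    n_atoms: int,
    energy_outlier_threshold: Optional[float] = 2.0,
    force_outlier_threshold: Optional[float] = 500.0,
    min_conformations: int = 1,
) -> list:
    """Return the indices of the conformations that survive filtering."""
    n_conformations = len(energy_mm)
    # Assumption: each energy series is shifted by its own minimum.
    mm_min, ref_min = min(energy_mm), min(energy_ref)

    kept = []
    for i in range(n_conformations):
        if energy_outlier_threshold is not None:
            diff = abs((energy_mm[i] - mm_min) - (energy_ref[i] - ref_min))
            if diff / n_atoms > energy_outlier_threshold:
                continue  # energy outlier
        if force_outlier_threshold is not None:
            max_diff = max(abs(a - b) for a, b in zip(force_mm[i], force_ref[i]))
            if max_diff > force_outlier_threshold:
                continue  # force outlier
        kept.append(i)

    # If too few conformations survive, keep all of them for this molecule.
    if len(kept) < min_conformations:
        return list(range(n_conformations))
    return kept


# Three conformations of a two-atom molecule; forces are flattened components.
energy_mm = [0.0, 1.0, 50.0]
energy_ref = [0.0, 1.5, 2.0]
force_mm = [[0.0, 1.0], [600.0, 0.0], [0.0, 0.0]]
force_ref = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

kept_default = filter_outliers(energy_mm, energy_ref, force_mm, force_ref, n_atoms=2)
kept_with_floor = filter_outliers(
    energy_mm, energy_ref, force_mm, force_ref, n_atoms=2, min_conformations=2
)
```

In this example conformation 1 fails the force check and conformation 2 fails the energy check, so only conformation 0 survives by default. Requiring at least two conformations triggers the fallback and keeps all three.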

output_types `property` #

output_types: set[OutputType]

Return a set of expected output types for the function which implements this settings object. Subclasses should override this method.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

TypeGenerationSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for generating tagged SMARTS types for a given potential type.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for generating tagged SMARTS types for a given potential type.",
  "properties": {
    "max_extend_distance": {
      "default": -1,
      "description": "Maximum number of bonds to extend from the atoms to which the potential is applied when generating tagged SMARTS patterns. A value of -1 means no limit.",
      "title": "Max Extend Distance",
      "type": "integer"
    },
    "include": {
      "default": [],
      "description": "List of SMARTS present in the initial force field for which to generate new SMARTS patterns. This allows you to split specific types for reparameterisation. This is mutually exclusive with the exclude field.",
      "items": {
        "type": "string"
      },
      "title": "Include",
      "type": "array"
    },
    "exclude": {
      "default": [],
      "description": "List of SMARTS patterns to exclude when generating tagged SMARTS types. If present, these patterns will remain the same as in the initial force field. This is mutually exclusive with the include field.",
      "items": {
        "type": "string"
      },
      "title": "Exclude",
      "type": "array"
    }
  },
  "title": "TypeGenerationSettings",
  "type": "object"
}

Fields:

max_extend_distance (int)
include (list[str])
exclude (list[str])

Validators:

validate_include_exclude

max_extend_distance `pydantic-field` #

max_extend_distance: int = -1

Maximum number of bonds to extend from the atoms to which the potential is applied when generating tagged SMARTS patterns. A value of -1 means no limit.

include `pydantic-field` #

include: list[str] = []

List of SMARTS present in the initial force field for which to generate new SMARTS patterns. This allows you to split specific types for reparameterisation. This is mutually exclusive with the exclude field.

exclude `pydantic-field` #

exclude: list[str] = []

List of SMARTS patterns to exclude when generating tagged SMARTS types. If present, these patterns will remain the same as in the initial force field. This is mutually exclusive with the include field.

output_types `property` #

output_types: set[OutputType]

Return a set of expected output types for the function which implements this settings object. Subclasses should override this method.

validate_include_exclude `pydantic-validator` #

validate_include_exclude() -> Self

Ensure that only one of include or exclude is set.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_include_exclude(self) -> Self:
    """Ensure that only one of include or exclude is set."""
    if self.include and self.exclude:
        raise InvalidSettingsError(
            "Only one of include or exclude can be set in TypeGenerationSettings."
        )
    return self
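The behaviour of this validator can be reproduced without pydantic, as in the standard-library sketch below. The class name TypeGenerationSketch is hypothetical, and a plain ValueError stands in for InvalidSettingsError. Leaving both lists empty is valid, as is setting exactly one of them.

```python
from dataclasses import dataclass, field


@dataclass
class TypeGenerationSketch:
    """Stand-in for TypeGenerationSettings using only the standard library."""

    max_extend_distance: int = -1
    include: list = field(default_factory=list)
    exclude: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Both lists non-empty is the only rejected combination.
        if self.include and self.exclude:
            raise ValueError("Only one of include or exclude can be set.")


defaults_ok = TypeGenerationSketch()
include_only_ok = TypeGenerationSketch(include=["[*:1]-[*:2]"])

try:
    TypeGenerationSketch(include=["[*:1]-[*:2]"], exclude=["[*:1]=[*:2]"])
    both_rejected = False
except ValueError:
    both_rejected = True
```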

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

MSMSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for the modified Seminario method.

Show JSON schema:

{
  "additionalProperties": false,
  "description": "Settings for the modified Seminario method.",
  "properties": {
    "ml_potential": {
      "default": "aceff-2.0",
      "description": "The machine learning potential to use for calculating the Hessian matrix",
      "enum": [
        "aceff-2.0",
        "mace-off23-small",
        "mace-off23-medium",
        "mace-off23-large",
        "egret-1",
        "aimnet2_b973c_d3_ens",
        "aimnet2_wb97m_d3_ens"
      ],
      "title": "Ml Potential",
      "type": "string"
    },
    "finite_step": {
      "description": "Finite step to calculate Hessian (Angstrom)",
      "title": "Finite Step",
      "type": "string"
    },
    "tolerance": {
      "description": "Tolerance for the geometry optimizer",
      "title": "Tolerance",
      "type": "string"
    },
    "vib_scaling": {
      "default": 0.958,
      "description": "Vibrational scaling factor. This is a reasonable default for \u03c9B97M-V/def2-TZVPPD (AceFF-2.0 LOT), see https://doi-org.libproxy.ncl.ac.uk/10.1063/5.0152838",
      "title": "Vib Scaling",
      "type": "number"
    },
    "n_conformers": {
      "default": 1,
      "description": "Number of conformers to generate and calculate MSM parameters for. The resulting bond and angle parameters will be averaged over all conformers.",
      "title": "N Conformers",
      "type": "integer"
    }
  },
  "title": "MSMSettings",
  "type": "object"
}

Fields:

ml_potential (Literal[AvailableModels])
finite_step (OpenMMQuantity[nanometers])
tolerance (OpenMMQuantity[kilocalories_per_mole / angstrom])
vib_scaling (float)
n_conformers (int)

ml_potential `pydantic-field` #

ml_potential: Literal[AvailableModels] = 'aceff-2.0'

The machine learning potential to use for calculating the Hessian matrix

finite_step `pydantic-field` #

finite_step: OpenMMQuantity[nanometers] = (
    0.0005291772 * nanometers
)

Finite step used to calculate the Hessian. The default is 0.0005291772 nm (0.005291772 Å).
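Because the field is typed in nanometres, a short unit check helps when supplying a custom value. The arithmetic below uses the CODATA 2018 Bohr radius and shows that the default corresponds to roughly 0.01 bohr. The variable names are illustrative only.

```python
BOHR_IN_ANGSTROM = 0.529177210903  # CODATA 2018 Bohr radius in angstrom
ANGSTROM_PER_NANOMETER = 10.0

finite_step_nm = 0.0005291772  # default value of finite_step
finite_step_angstrom = finite_step_nm * ANGSTROM_PER_NANOMETER
finite_step_bohr = finite_step_angstrom / BOHR_IN_ANGSTROM
```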

tolerance `pydantic-field` #

tolerance: OpenMMQuantity[
    kilocalories_per_mole / angstrom
] = (0.005291772 * kilocalories_per_mole / angstrom)

Tolerance for the geometry optimizer

vib_scaling `pydantic-field` #

vib_scaling: float = 0.958

Vibrational scaling factor. This is a reasonable default for ωB97M-V/def2-TZVPPD (AceFF-2.0 LOT), see https://doi-org.libproxy.ncl.ac.uk/10.1063/5.0152838
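For a harmonic oscillator the frequency is proportional to the square root of the force constant, so scaling frequencies by a factor s is equivalent to scaling force constants by s squared. The sketch below shows that relationship only. Whether this workflow applies the factor in exactly this way is an assumption, and the function name is hypothetical.

```python
def scale_force_constant(force_constant: float, vib_scaling: float = 0.958) -> float:
    """Scale a harmonic force constant consistently with scaled frequencies."""
    # frequency ~ sqrt(k), so a frequency factor s maps to a k factor of s**2.
    return force_constant * vib_scaling**2


scaled_k = scale_force_constant(100.0)
```

With the default of 0.958 the force constant is reduced by about 8%.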

n_conformers `pydantic-field` #

n_conformers: int = 1

Number of conformers to generate and calculate MSM parameters for. The resulting bond and angle parameters will be averaged over all conformers.
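Averaging over conformers might look like the sketch below, which assumes a plain arithmetic mean of each parameter. The dictionary layout and the function name are hypothetical, and the numbers are made up for illustration.

```python
from statistics import fmean


def average_parameters(per_conformer: list) -> dict:
    """Average each named parameter over a list of per-conformer dicts."""
    keys = per_conformer[0].keys()
    return {key: fmean(conf[key] for conf in per_conformer) for key in keys}


# Illustrative bond parameters from two conformers (arbitrary units).
conformer_parameters = [
    {"k": 600.0, "length": 1.52},
    {"k": 620.0, "length": 1.54},
]
averaged = average_parameters(conformer_parameters)
```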

output_types `property` #

output_types: set[OutputType]

Return a set of expected output types for the function which implements this settings object. Subclasses should override this method.

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

ParameterisationSettings `pydantic-model` #

Bases: _DefaultSettings

Settings for the starting parameterisation.

Show JSON schema:

{
  "$defs": {
    "MSMSettings": {
      "additionalProperties": false,
      "description": "Settings for the modified Seminario method.",
      "properties": {
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating the Hessian matrix",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "finite_step": {
          "description": "Finite step to calculate Hessian (Angstrom)",
          "title": "Finite Step",
          "type": "string"
        },
        "tolerance": {
          "description": "Tolerance for the geometry optimizer",
          "title": "Tolerance",
          "type": "string"
        },
        "vib_scaling": {
          "default": 0.958,
          "description": "Vibrational scaling factor. This is a reasonable default for \u03c9B97M-V/def2-TZVPPD (AceFF-2.0 LOT), see https://doi-org.libproxy.ncl.ac.uk/10.1063/5.0152838",
          "title": "Vib Scaling",
          "type": "number"
        },
        "n_conformers": {
          "default": 1,
          "description": "Number of conformers to generate and calculate MSM parameters for. The resulting bond and angle parameters will be averaged over all conformers.",
          "title": "N Conformers",
          "type": "integer"
        }
      },
      "title": "MSMSettings",
      "type": "object"
    },
    "TypeGenerationSettings": {
      "additionalProperties": false,
      "description": "Settings for generating tagged SMARTS types for a given potential type.",
      "properties": {
        "max_extend_distance": {
          "default": -1,
          "description": "Maximum number of bonds to extend from the atoms to which the potential is applied when generating tagged SMARTS patterns. A value of -1 means no limit.",
          "title": "Max Extend Distance",
          "type": "integer"
        },
        "include": {
          "default": [],
          "description": "List of SMARTS present in the initial force field for which to generate new SMARTS patterns. This allows you to split specific types for reparameterisation. This is mutually exclusive with the exclude field.",
          "items": {
            "type": "string"
          },
          "title": "Include",
          "type": "array"
        },
        "exclude": {
          "default": [],
          "description": "List of SMARTS patterns to exclude when generating tagged SMARTS types. If present, these patterns will remain the same as in the initial force field. This is mutually exclusive with the include field.",
          "items": {
            "type": "string"
          },
          "title": "Exclude",
          "type": "array"
        }
      },
      "title": "TypeGenerationSettings",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Settings for the starting parameterisation.",
  "properties": {
    "smiles": {
      "description": "SMILES string or list of SMILES for molecules to fit",
      "items": {
        "type": "string"
      },
      "title": "Smiles",
      "type": "array"
    },
    "initial_force_field": {
      "default": "openff_unconstrained-2.3.0.offxml",
      "description": "The force field from which to start. This can be any OpenFF force field, or your own .offxml file.",
      "title": "Initial Force Field",
      "type": "string"
    },
    "expand_torsions": {
      "default": true,
      "description": "Whether to expand the torsion periodicities up to 4.",
      "title": "Expand Torsions",
      "type": "boolean"
    },
    "linearise_harmonics": {
      "default": true,
      "description": "Linearise the harmonic potentials in the Force Field (Default)",
      "title": "Linearise Harmonics",
      "type": "boolean"
    },
    "msm_settings": {
      "anyOf": [
        {
          "$ref": "#/$defs/MSMSettings"
        },
        {
          "type": "null"
        }
      ],
      "description": "Settings for the modified Seminario method to initialise force field parameters."
    },
    "type_generation_settings": {
      "additionalProperties": {
        "$ref": "#/$defs/TypeGenerationSettings"
      },
      "description": "Settings for generating tagged SMARTS types for each valence type.",
      "propertyNames": {
        "enum": [
          "Bonds",
          "Angles",
          "ProperTorsions",
          "ImproperTorsions"
        ]
      },
      "title": "Type Generation Settings",
      "type": "object"
    }
  },
  "required": [
    "smiles"
  ],
  "title": "ParameterisationSettings",
  "type": "object"
}

Fields:

smiles (list[str])
initial_force_field (str)
expand_torsions (bool)
linearise_harmonics (bool)
msm_settings (MSMSettings | None)
type_generation_settings (dict[NonLinearValenceType, TypeGenerationSettings])

Validators:

validate_smiles → smiles

smiles `pydantic-field` #

smiles: list[str]

SMILES string or list of SMILES for molecules to fit

initial_force_field `pydantic-field` #

initial_force_field: str = (
    "openff_unconstrained-2.3.0.offxml"
)

The force field from which to start. This can be any OpenFF force field, or your own .offxml file.

expand_torsions `pydantic-field` #

expand_torsions: bool = True

Whether to expand the torsion periodicities up to 4.
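One way to picture the expansion: every torsion ends up with a term for each periodicity from 1 to 4. The sketch below assumes that missing terms are added with a zero force constant so the starting energy is unchanged. That choice, the function name, and the dictionary layout are assumptions, not documented behaviour.

```python
def expand_periodicities(terms: dict, max_periodicity: int = 4) -> dict:
    """Map periodicity -> force constant, filling gaps up to max_periodicity."""
    expanded = dict(terms)
    for periodicity in range(1, max_periodicity + 1):
        # Assumption: new terms start at zero so they add no energy initially.
        expanded.setdefault(periodicity, 0.0)
    return dict(sorted(expanded.items()))


# A torsion that starts with only periodicities 1 and 3.
expanded_terms = expand_periodicities({1: 0.5, 3: 0.2})
```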

linearise_harmonics `pydantic-field` #

linearise_harmonics: bool = True

Whether to linearise the harmonic potentials in the force field. Enabled by default.

msm_settings `pydantic-field` #

msm_settings: MSMSettings | None

Settings for the modified Seminario method to initialise force field parameters.

type_generation_settings `pydantic-field` #

type_generation_settings: dict[
    NonLinearValenceType, TypeGenerationSettings
]

Settings for generating tagged SMARTS types for each valence type.
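According to the JSON schema, the dictionary keys are restricted to "Bonds", "Angles", "ProperTorsions" and "ImproperTorsions". A plain-dictionary sketch of such a mapping is shown below. The SMARTS pattern and the chosen values are purely illustrative, and the key check is a stand-in for the validation pydantic would perform.

```python
# Keys permitted by the schema's propertyNames enum.
ALLOWED_VALENCE_TYPES = {"Bonds", "Angles", "ProperTorsions", "ImproperTorsions"}

# Hypothetical per-valence-type settings, as they might appear after loading.
type_generation_settings = {
    "Bonds": {"max_extend_distance": -1},
    "ProperTorsions": {
        "max_extend_distance": 2,
        # Illustrative SMARTS only; real entries come from the force field.
        "exclude": ["[*:1]-[*:2]#[*:3]-[*:4]"],
    },
}

# Any key outside the allowed set would be rejected by the schema.
unknown_keys = set(type_generation_settings) - ALLOWED_VALENCE_TYPES
```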

molecules `property` #

molecules: list[Molecule]

Return the list of OpenFF Molecule objects for the SMILES strings.

output_types `property` #

output_types: set[OutputType]

Return a set of expected output types for the function which implements this settings object. Subclasses should override this method.

validate_smiles `pydantic-validator` #

validate_smiles(value: str | list[str]) -> list[str]

Validate that all SMILES are valid and unique. Accepts a single string or a list of strings.

Source code in presto/settings.py

@field_validator("smiles", mode="before")
def validate_smiles(cls, value: str | list[str]) -> list[str]:
    """Validate all SMILES are valid, unique. Accepts string or list."""
    # Convert single string to list for backward compatibility
    if isinstance(value, str):
        value = [value]

    if not value:
        raise ValueError("smiles list cannot be empty")

    # Check for duplicates
    if len(value) != len(set(value)):
        duplicates = [s for s in value if value.count(s) > 1]
        unique_duplicates = list(set(duplicates))
        raise ValueError(f"Duplicate SMILES found: {unique_duplicates}")

    # Validate each SMILES string
    for smiles in value:
        if Chem.MolFromSmiles(smiles) is None:
            raise ValueError(f"Invalid SMILES string: {smiles}")
    return value
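The coercion and duplicate check can be exercised without RDKit, as in the sketch below, which omits the chemical validity test. The function name is hypothetical. As in the source above, the uniqueness check compares raw strings, so two different SMILES spellings of the same molecule are not flagged as duplicates.

```python
from collections import Counter


def normalise_smiles(value) -> list:
    """Coerce to a list and reject empty or duplicated entries."""
    # A single string is wrapped in a list.
    if isinstance(value, str):
        value = [value]

    if not value:
        raise ValueError("smiles list cannot be empty")

    # String-level duplicate detection only; no canonicalisation is done.
    duplicates = sorted(s for s, count in Counter(value).items() if count > 1)
    if duplicates:
        raise ValueError(f"Duplicate SMILES found: {duplicates}")
    return list(value)


single = normalise_smiles("CCO")
# "CCO" and "OCC" both describe ethanol but differ as strings, so both pass.
two_spellings = normalise_smiles(["CCO", "OCC"])

try:
    normalise_smiles(["CCO", "CCO"])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```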

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

WorkflowSettings `pydantic-model` #

Bases: _DefaultSettings

Overall settings for the full fitting workflow.

Show JSON schema:

{
  "$defs": {
    "AttributeConfig": {
      "description": "Configuration for how a potential's attributes should be trained.",
      "properties": {
        "cols": {
          "description": "The parameters to train, e.g. 'k', 'length', 'epsilon'.",
          "items": {
            "type": "string"
          },
          "title": "Cols",
          "type": "array"
        },
        "scales": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The scales to apply to each parameter, e.g. 'k': 1.0, 'length': 1.0, 'epsilon': 1.0.",
          "title": "Scales",
          "type": "object"
        },
        "limits": {
          "additionalProperties": {
            "maxItems": 2,
            "minItems": 2,
            "prefixItems": [
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              },
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              }
            ],
            "type": "array"
          },
          "default": {},
          "description": "The min and max values to clamp each parameter within, e.g. 'k': (0.0, None), 'angle': (0.0, pi), 'epsilon': (0.0, None), where none indicates no constraint.",
          "title": "Limits",
          "type": "object"
        },
        "regularize": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The regularization strength to apply to each parameter, e.g. 'k': 0.01, 'epsilon': 0.001. Parameters not listed are not regularized.",
          "title": "Regularize",
          "type": "object"
        }
      },
      "required": [
        "cols"
      ],
      "title": "AttributeConfig",
      "type": "object"
    },
    "MLMDSamplingSettings": {
      "additionalProperties": false,
      "description": "Settings for molecular dynamics sampling using a machine learning\npotential. This protocol uses the ML reference potential for sampling as\nwell as for energy and force calculations.",
      "properties": {
        "sampling_protocol": {
          "const": "ml_md",
          "default": "ml_md",
          "description": "Sampling protocol to use.",
          "title": "Sampling Protocol",
          "type": "string"
        },
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "timestep": {
          "description": "MD timestep",
          "title": "Timestep",
          "type": "string"
        },
        "temperature": {
          "description": "Temperature to run MD at",
          "title": "Temperature",
          "type": "string"
        },
        "snapshot_interval": {
          "description": "Interval between saving snapshots during production sampling",
          "title": "Snapshot Interval",
          "type": "string"
        },
        "n_conformers": {
          "default": 10,
          "description": "The number of conformers to generate, from which sampling is started",
          "title": "N Conformers",
          "type": "integer"
        },
        "equilibration_sampling_time_per_conformer": {
          "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
          "title": "Equilibration Sampling Time Per Conformer",
          "type": "string"
        },
        "production_sampling_time_per_conformer": {
          "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
          "title": "Production Sampling Time Per Conformer",
          "type": "string"
        },
        "loss_energy_weight": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for samples from this protocol.",
          "title": "Loss Energy Weight",
          "type": "number"
        },
        "loss_force_weight": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for samples from this protocol.",
          "title": "Loss Force Weight",
          "type": "number"
        }
      },
      "title": "MLMDSamplingSettings",
      "type": "object"
    },
    "MMMDMetadynamicsSamplingSettings": {
      "additionalProperties": false,
      "description": "Settings for molecular dynamics sampling using a molecular mechanics\nforce field with metadynamics. This is initially the force field supplied in the parameterisation\nsettings, but is updated as the bespoke force field is trained.",
      "properties": {
        "sampling_protocol": {
          "const": "mm_md_metadynamics",
          "default": "mm_md_metadynamics",
          "description": "Sampling protocol to use.",
          "title": "Sampling Protocol",
          "type": "string"
        },
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "timestep": {
          "description": "MD timestep",
          "title": "Timestep",
          "type": "string"
        },
        "temperature": {
          "description": "Temperature to run MD at",
          "title": "Temperature",
          "type": "string"
        },
        "snapshot_interval": {
          "description": "Interval between saving snapshots during production sampling",
          "title": "Snapshot Interval",
          "type": "string"
        },
        "n_conformers": {
          "default": 10,
          "description": "The number of conformers to generate, from which sampling is started",
          "title": "N Conformers",
          "type": "integer"
        },
        "equilibration_sampling_time_per_conformer": {
          "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
          "title": "Equilibration Sampling Time Per Conformer",
          "type": "string"
        },
        "production_sampling_time_per_conformer": {
          "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
          "title": "Production Sampling Time Per Conformer",
          "type": "string"
        },
        "loss_energy_weight": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for samples from this protocol.",
          "title": "Loss Energy Weight",
          "type": "number"
        },
        "loss_force_weight": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for samples from this protocol.",
          "title": "Loss Force Weight",
          "type": "number"
        },
        "bias_width": {
          "default": 0.3141592653589793,
          "description": "Width of the bias (in radians)",
          "title": "Bias Width",
          "type": "number"
        },
        "bias_factor": {
          "default": 20.0,
          "description": "Bias factor for well-tempered metadynamics. Typical range: 5-20",
          "title": "Bias Factor",
          "type": "number"
        },
        "bias_height": {
          "description": "Initial height of the bias",
          "title": "Bias Height",
          "type": "string"
        },
        "bias_frequency": {
          "description": "Frequency at which to add bias",
          "title": "Bias Frequency",
          "type": "string"
        },
        "bias_save_frequency": {
          "description": "Frequency at which to save the bias",
          "title": "Bias Save Frequency",
          "type": "string"
        },
        "torsions_to_include_smarts": {
          "description": "SMARTS patterns for torsions to include in metadynamics biasing. Matches single bonds not in rings and single bonds in aliphatic rings of size 5 or more. These should match the entire torsion (4 atoms), not just the rotatable bond.",
          "items": {
            "type": "string"
          },
          "title": "Torsions To Include Smarts",
          "type": "array"
        },
        "torsions_to_exclude_smarts": {
          "description": "SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.",
          "items": {
            "type": "string"
          },
          "title": "Torsions To Exclude Smarts",
          "type": "array"
        }
      },
      "title": "MMMDMetadynamicsSamplingSettings",
      "type": "object"
    },
    "MMMDMetadynamicsTorsionMinimisationSamplingSettings": {
      "additionalProperties": false,
      "description": "Settings for MM MD metadynamics sampling with additional torsion-restrained\nminimisation structures. This extends MMMDMetadynamicsSamplingSettings by generating\nadditional training data from torsion-restrained minimisations.",
      "properties": {
        "sampling_protocol": {
          "const": "mm_md_metadynamics_torsion_minimisation",
          "default": "mm_md_metadynamics_torsion_minimisation",
          "description": "Sampling protocol to use.",
          "title": "Sampling Protocol",
          "type": "string"
        },
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating energies and forces of the snapshots. Note that this is not generally the potential used for sampling.",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "timestep": {
          "description": "MD timestep",
          "title": "Timestep",
          "type": "string"
        },
        "temperature": {
          "description": "Temperature to run MD at",
          "title": "Temperature",
          "type": "string"
        },
        "snapshot_interval": {
          "description": "Interval between saving snapshots during production sampling",
          "title": "Snapshot Interval",
          "type": "string"
        },
        "n_conformers": {
          "default": 10,
          "description": "The number of conformers to generate, from which sampling is started",
          "title": "N Conformers",
          "type": "integer"
        },
        "equilibration_sampling_time_per_conformer": {
          "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
          "title": "Equilibration Sampling Time Per Conformer",
          "type": "string"
        },
        "production_sampling_time_per_conformer": {
          "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
          "title": "Production Sampling Time Per Conformer",
          "type": "string"
        },
        "loss_energy_weight": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for samples from this protocol.",
          "title": "Loss Energy Weight",
          "type": "number"
        },
        "loss_force_weight": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for samples from this protocol.",
          "title": "Loss Force Weight",
          "type": "number"
        },
        "bias_width": {
          "default": 0.3141592653589793,
          "description": "Width of the bias (in radians)",
          "title": "Bias Width",
          "type": "number"
        },
        "bias_factor": {
          "default": 20.0,
          "description": "Bias factor for well-tempered metadynamics. Typical range: 5-20",
          "title": "Bias Factor",
          "type": "number"
        },
        "bias_height": {
          "description": "Initial height of the bias",
          "title": "Bias Height",
          "type": "string"
        },
        "bias_frequency": {
          "description": "Frequency at which to add bias",
          "title": "Bias Frequency",
          "type": "string"
        },
        "bias_save_frequency": {
          "description": "Frequency at which to save the bias",
          "title": "Bias Save Frequency",
          "type": "string"
        },
        "torsions_to_include_smarts": {
          "description": "SMARTS patterns for torsions to include in metadynamics biasing. Matches single bonds not in rings and single bonds in aliphatic rings of size 5 or more. These should match the entire torsion (4 atoms), not just the rotatable bond.",
          "items": {
            "type": "string"
          },
          "title": "Torsions To Include Smarts",
          "type": "array"
        },
        "torsions_to_exclude_smarts": {
          "description": "SMARTS patterns for bonds to exclude from metadynamics biasing. These are removed from the list of torsions matched by the include patterns. These should match only the rotatable bond (2 atoms), not the full torsion.",
          "items": {
            "type": "string"
          },
          "title": "Torsions To Exclude Smarts",
          "type": "array"
        },
        "ml_minimisation_steps": {
          "default": 10,
          "description": "Number of MLP minimisation steps with restrained torsions.",
          "title": "Ml Minimisation Steps",
          "type": "integer"
        },
        "mm_minimisation_steps": {
          "default": 10,
          "description": "Number of MM minimisation steps with restrained torsions.",
          "title": "Mm Minimisation Steps",
          "type": "integer"
        },
        "torsion_restraint_force_constant": {
          "description": "Force constant for torsion restraints.",
          "title": "Torsion Restraint Force Constant",
          "type": "string"
        },
        "map_ml_coords_energy_to_mm_coords_energy": {
          "default": false,
          "description": "Whether to substitute the MLP energy for the MM-minimised coordinates with the MLP energy for the corresponding MLP-minimised coordinates.",
          "title": "Map Ml Coords Energy To Mm Coords Energy",
          "type": "boolean"
        },
        "loss_energy_weight_mm_torsion_min": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.",
          "title": "Loss Energy Weight Mm Torsion Min",
          "type": "number"
        },
        "loss_force_weight_mm_torsion_min": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for torsion-minimised samples, using MM minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.",
          "title": "Loss Force Weight Mm Torsion Min",
          "type": "number"
        },
        "loss_energy_weight_ml_torsion_min": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_energy_weight field.",
          "title": "Loss Energy Weight Ml Torsion Min",
          "type": "number"
        },
        "loss_force_weight_ml_torsion_min": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for torsion-minimised samples, using MLP minimisation. Note that the weights for the MMMD samples are controlled by the loss_force_weight field.",
          "title": "Loss Force Weight Ml Torsion Min",
          "type": "number"
        }
      },
      "title": "MMMDMetadynamicsTorsionMinimisationSamplingSettings",
      "type": "object"
    },
    "MMMDSamplingSettings": {
      "additionalProperties": false,
      "description": "Settings for molecular dynamics sampling using a molecular mechanics\nforce field. This is initally the force field supplined in the parameterisation\nsettings, but is updated as the bespoke force field is trained.",
      "properties": {
        "sampling_protocol": {
          "const": "mm_md",
          "default": "mm_md",
          "description": "Sampling protocol to use.",
          "title": "Sampling Protocol",
          "type": "string"
        },
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating energies and forces of  the snapshots. Note that this is not generally the potential used for sampling.",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "timestep": {
          "description": "MD timestep",
          "title": "Timestep",
          "type": "string"
        },
        "temperature": {
          "description": "Temperature to run MD at",
          "title": "Temperature",
          "type": "string"
        },
        "snapshot_interval": {
          "description": "Interval between saving snapshots during production sampling",
          "title": "Snapshot Interval",
          "type": "string"
        },
        "n_conformers": {
          "default": 10,
          "description": "The number of conformers to generate, from which sampling is started",
          "title": "N Conformers",
          "type": "integer"
        },
        "equilibration_sampling_time_per_conformer": {
          "description": "Equilibration sampling time per conformer. No snapshots are saved during equilibration sampling. The total sampling time per conformer will be this plus the production_sampling_time_per_conformer.",
          "title": "Equilibration Sampling Time Per Conformer",
          "type": "string"
        },
        "production_sampling_time_per_conformer": {
          "description": "Production sampling time per conformer. The total sampling time per conformer will be this plus the equilibration_sampling_time_per_conformer.",
          "title": "Production Sampling Time Per Conformer",
          "type": "string"
        },
        "loss_energy_weight": {
          "default": 1000.0,
          "description": "Scaling factor for the energy loss term for samples from this protocol.",
          "title": "Loss Energy Weight",
          "type": "number"
        },
        "loss_force_weight": {
          "default": 0.1,
          "description": "Scaling factor for the force loss term for samples from this protocol.",
          "title": "Loss Force Weight",
          "type": "number"
        }
      },
      "title": "MMMDSamplingSettings",
      "type": "object"
    },
    "MSMSettings": {
      "additionalProperties": false,
      "description": "Settings for the modified Seminario method.",
      "properties": {
        "ml_potential": {
          "default": "aceff-2.0",
          "description": "The machine learning potential to use for calculating the Hessian matrix",
          "enum": [
            "aceff-2.0",
            "mace-off23-small",
            "mace-off23-medium",
            "mace-off23-large",
            "egret-1",
            "aimnet2_b973c_d3_ens",
            "aimnet2_wb97m_d3_ens"
          ],
          "title": "Ml Potential",
          "type": "string"
        },
        "finite_step": {
          "description": "Finite step to calculate Hessian (Angstrom)",
          "title": "Finite Step",
          "type": "string"
        },
        "tolerance": {
          "description": "Tolerance for the geometry optimizer",
          "title": "Tolerance",
          "type": "string"
        },
        "vib_scaling": {
          "default": 0.958,
          "description": "Vibrational scaling factor. This is a reasonable default for \u03c9B97M-V/def2-TZVPPD (AceFF-2.0 LOT),  see https://doi-org.libproxy.ncl.ac.uk/10.1063/5.0152838",
          "title": "Vib Scaling",
          "type": "number"
        },
        "n_conformers": {
          "default": 1,
          "description": "Number of conformers to generate and calculate MSM parameters for. The resulting bond and angle parameters will be averaged over all conformers.",
          "title": "N Conformers",
          "type": "integer"
        }
      },
      "title": "MSMSettings",
      "type": "object"
    },
    "OutlierFilterSettings": {
      "additionalProperties": false,
      "description": "Settings for filtering outliers from datasets based on MM vs MLP differences.\n\nOutliers are identified by comparing MM and reference (typically MLP) energies\nand forces. Conformations where the absolute difference exceeds a threshold\nare removed.",
      "properties": {
        "energy_outlier_threshold": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": 2.0,
          "description": "Absolute threshold in kcal/mol/atom for energy outlier detection. Conformations where |energy_mm - energy_ref| / n_atoms (relative to minimum) exceeds this threshold will be removed. Set to None to disable energy-based filtering.",
          "title": "Energy Outlier Threshold"
        },
        "force_outlier_threshold": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": 500.0,
          "description": "Absolute threshold in kcal/mol/\u00c5 for force outlier detection. Conformations where max |force_mm - force_ref| exceeds this threshold will be removed. Set to None to disable force-based filtering.",
          "title": "Force Outlier Threshold"
        },
        "min_conformations": {
          "default": 1,
          "description": "Minimum number of conformations to keep per molecule. If filtering would remove too many conformations, all conformations will be kept for that molecule.",
          "title": "Min Conformations",
          "type": "integer"
        }
      },
      "title": "OutlierFilterSettings",
      "type": "object"
    },
    "ParameterConfig": {
      "description": "Configuration for how a potential's parameters should be trained.",
      "properties": {
        "cols": {
          "description": "The parameters to train, e.g. 'k', 'length', 'epsilon'.",
          "items": {
            "type": "string"
          },
          "title": "Cols",
          "type": "array"
        },
        "scales": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The scales to apply to each parameter, e.g. 'k': 1.0, 'length': 1.0, 'epsilon': 1.0.",
          "title": "Scales",
          "type": "object"
        },
        "limits": {
          "additionalProperties": {
            "maxItems": 2,
            "minItems": 2,
            "prefixItems": [
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              },
              {
                "anyOf": [
                  {
                    "type": "number"
                  },
                  {
                    "type": "null"
                  }
                ]
              }
            ],
            "type": "array"
          },
          "default": {},
          "description": "The min and max values to clamp each parameter within, e.g. 'k': (0.0, None), 'angle': (0.0, pi), 'epsilon': (0.0, None), where none indicates no constraint.",
          "title": "Limits",
          "type": "object"
        },
        "regularize": {
          "additionalProperties": {
            "type": "number"
          },
          "default": {},
          "description": "The regularization strength to apply to each parameter, e.g. 'k': 0.01, 'epsilon': 0.001. Parameters not listed are not regularized.",
          "title": "Regularize",
          "type": "object"
        },
        "include": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/_PotentialKey"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The keys (see ``smee.TensorPotential.parameter_keys`` for details) corresponding to specific parameters to be trained. If ``None``, all parameters will be trained.",
          "title": "Include"
        },
        "exclude": {
          "anyOf": [
            {
              "items": {
                "$ref": "#/$defs/_PotentialKey"
              },
              "type": "array"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "description": "The keys (see ``smee.TensorPotential.parameter_keys`` for details) corresponding to specific parameters to be excluded from training. If ``None``, no parameters will be excluded.",
          "title": "Exclude"
        }
      },
      "required": [
        "cols"
      ],
      "title": "ParameterConfig",
      "type": "object"
    },
    "ParameterisationSettings": {
      "additionalProperties": false,
      "description": "Settings for the starting parameterisation.",
      "properties": {
        "smiles": {
          "description": "SMILES string or list of SMILES for molecules to fit",
          "items": {
            "type": "string"
          },
          "title": "Smiles",
          "type": "array"
        },
        "initial_force_field": {
          "default": "openff_unconstrained-2.3.0.offxml",
          "description": "The force field from which to start. This can be any OpenFF force field, or your own .offxml file.",
          "title": "Initial Force Field",
          "type": "string"
        },
        "expand_torsions": {
          "default": true,
          "description": "Whether to expand the torsion periodicities up to 4.",
          "title": "Expand Torsions",
          "type": "boolean"
        },
        "linearise_harmonics": {
          "default": true,
          "description": "Linearise the harmonic potentials in the Force Field (Default)",
          "title": "Linearise Harmonics",
          "type": "boolean"
        },
        "msm_settings": {
          "anyOf": [
            {
              "$ref": "#/$defs/MSMSettings"
            },
            {
              "type": "null"
            }
          ],
          "description": "Settings for the modified Seminario method to initialise force field parameters."
        },
        "type_generation_settings": {
          "additionalProperties": {
            "$ref": "#/$defs/TypeGenerationSettings"
          },
          "description": "Settings for generating tagged SMARTS types for each valence type.",
          "propertyNames": {
            "enum": [
              "Bonds",
              "Angles",
              "ProperTorsions",
              "ImproperTorsions"
            ]
          },
          "title": "Type Generation Settings",
          "type": "object"
        }
      },
      "required": [
        "smiles"
      ],
      "title": "ParameterisationSettings",
      "type": "object"
    },
    "PreComputedDatasetSettings": {
      "additionalProperties": false,
      "description": "Settings for loading pre-computed datasets from disk.\n\nFor single-molecule fits, provide a single Path.\nFor multi-molecule fits, provide a list of Paths (one per molecule).",
      "properties": {
        "sampling_protocol": {
          "const": "pre_computed",
          "default": "pre_computed",
          "description": "Sampling protocol identifier.",
          "title": "Sampling Protocol",
          "type": "string"
        },
        "dataset_paths": {
          "description": "Path(s) to pre-computed dataset(s) saved with dataset.save_to_disk(). For single-molecule fits, provide a single Path. For multi-molecule fits, provide a list of Paths (one per molecule in order).",
          "items": {
            "format": "path",
            "type": "string"
          },
          "title": "Dataset Paths",
          "type": "array"
        }
      },
      "required": [
        "dataset_paths"
      ],
      "title": "PreComputedDatasetSettings",
      "type": "object"
    },
    "TrainingSettings": {
      "additionalProperties": false,
      "description": "Settings for the training process.",
      "properties": {
        "optimiser": {
          "default": "adam",
          "description": "Optimiser to use for the training. 'adam' is Adam, 'lm' is Levenberg-Marquardt",
          "enum": [
            "adam",
            "lm"
          ],
          "title": "Optimiser",
          "type": "string"
        },
        "parameter_configs": {
          "additionalProperties": {
            "$ref": "#/$defs/ParameterConfig"
          },
          "description": "Configuration for the force field parameters to be trained.",
          "propertyNames": {
            "enum": [
              "Bonds",
              "LinearBonds",
              "Angles",
              "LinearAngles",
              "ProperTorsions",
              "ImproperTorsions"
            ]
          },
          "title": "Parameter Configs",
          "type": "object"
        },
        "attribute_configs": {
          "additionalProperties": {
            "$ref": "#/$defs/AttributeConfig"
          },
          "default": {},
          "description": "Configuration for the force field attributes to be trained. This allows 1-4 scaling for 'vdW' and 'Electrostatics' to be trained.",
          "propertyNames": {
            "enum": [
              "vdW",
              "Electrostatics"
            ]
          },
          "title": "Attribute Configs",
          "type": "object"
        },
        "n_epochs": {
          "default": 1000,
          "description": "Number of epochs in the ML fit",
          "title": "N Epochs",
          "type": "integer"
        },
        "learning_rate": {
          "default": 0.01,
          "description": "Learning Rate in the ML fit",
          "title": "Learning Rate",
          "type": "number"
        },
        "learning_rate_decay": {
          "default": 1.0,
          "description": "Learning Rate Decay. 0.99 is 1%, and 1.0 is no decay.",
          "title": "Learning Rate Decay",
          "type": "number"
        },
        "learning_rate_decay_step": {
          "default": 10,
          "description": "Learning Rate Decay Step",
          "title": "Learning Rate Decay Step",
          "type": "integer"
        },
        "regularisation_target": {
          "default": "initial",
          "description": "Target value to regularise parameters towards. 'initial' is the initial parameter value, 'zero' is zero.",
          "enum": [
            "initial",
            "zero"
          ],
          "title": "Regularisation Target",
          "type": "string"
        }
      },
      "title": "TrainingSettings",
      "type": "object"
    },
    "TypeGenerationSettings": {
      "additionalProperties": false,
      "description": "Settings for generating tagged SMARTS types for a given potential type.",
      "properties": {
        "max_extend_distance": {
          "default": -1,
          "description": "Maximum number of bonds to extend from the atoms to which the potential is applied when generating tagged SMARTS patterns. A value of -1 means no limit.",
          "title": "Max Extend Distance",
          "type": "integer"
        },
        "include": {
          "default": [],
          "description": "List of SMARTS present in the initial force field for which to generate new SMARTS  patterns. This allows you to split specific types for reparameterisation. This is mutually exclusive with the exclude field.",
          "items": {
            "type": "string"
          },
          "title": "Include",
          "type": "array"
        },
        "exclude": {
          "default": [],
          "description": "List of SMARTS patterns to exclude when generating tagged SMARTS types. If present,  these patterns will remain the same as in the initial force field. This is mutually exclusive with the include field.",
          "items": {
            "type": "string"
          },
          "title": "Exclude",
          "type": "array"
        }
      },
      "title": "TypeGenerationSettings",
      "type": "object"
    },
    "_PotentialKey": {
      "description": "TODO: Needed until interchange upgrades to pydantic >=2",
      "properties": {
        "id": {
          "title": "Id",
          "type": "string"
        },
        "mult": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Mult"
        },
        "associated_handler": {
          "anyOf": [
            {
              "type": "string"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Associated Handler"
        },
        "bond_order": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "null"
            }
          ],
          "default": null,
          "title": "Bond Order"
        }
      },
      "required": [
        "id"
      ],
      "title": "_PotentialKey",
      "type": "object"
    }
  },
  "additionalProperties": false,
  "description": "Overall settings for the full fitting workflow.",
  "properties": {
    "version": {
      "default": "0.1.dev1+g55bd96965",
      "description": "Version of presto used to create these settings",
      "title": "Version",
      "type": "string"
    },
    "output_dir": {
      "default": ".",
      "description": "Directory where the output files will be saved",
      "format": "path",
      "title": "Output Dir",
      "type": "string"
    },
    "device_type": {
      "default": "cuda",
      "description": "Device type for training, either 'cpu' or 'cuda'",
      "enum": [
        "cpu",
        "cuda"
      ],
      "title": "Device Type",
      "type": "string"
    },
    "n_iterations": {
      "default": 2,
      "description": "Number of iterations of sampling, then training the FF to run",
      "title": "N Iterations",
      "type": "integer"
    },
    "memory": {
      "default": false,
      "description": "Whether to append new training data to training data from the previous iterations, or overwrite it (False).",
      "title": "Memory",
      "type": "boolean"
    },
    "parameterisation_settings": {
      "$ref": "#/$defs/ParameterisationSettings",
      "description": "Settings for the starting parameterisation"
    },
    "training_sampling_settings": {
      "description": "Settings for sampling for generating the training data (usually molecular dynamics)",
      "discriminator": {
        "mapping": {
          "ml_md": "#/$defs/MLMDSamplingSettings",
          "mm_md": "#/$defs/MMMDSamplingSettings",
          "mm_md_metadynamics": "#/$defs/MMMDMetadynamicsSamplingSettings",
          "mm_md_metadynamics_torsion_minimisation": "#/$defs/MMMDMetadynamicsTorsionMinimisationSamplingSettings",
          "pre_computed": "#/$defs/PreComputedDatasetSettings"
        },
        "propertyName": "sampling_protocol"
      },
      "oneOf": [
        {
          "$ref": "#/$defs/MMMDSamplingSettings"
        },
        {
          "$ref": "#/$defs/MLMDSamplingSettings"
        },
        {
          "$ref": "#/$defs/MMMDMetadynamicsSamplingSettings"
        },
        {
          "$ref": "#/$defs/MMMDMetadynamicsTorsionMinimisationSamplingSettings"
        },
        {
          "$ref": "#/$defs/PreComputedDatasetSettings"
        }
      ],
      "title": "Training Sampling Settings"
    },
    "testing_sampling_settings": {
      "description": "Settings for sampling for generating the testing data (usually molecular dynamics)",
      "discriminator": {
        "mapping": {
          "ml_md": "#/$defs/MLMDSamplingSettings",
          "mm_md": "#/$defs/MMMDSamplingSettings",
          "mm_md_metadynamics": "#/$defs/MMMDMetadynamicsSamplingSettings",
          "mm_md_metadynamics_torsion_minimisation": "#/$defs/MMMDMetadynamicsTorsionMinimisationSamplingSettings",
          "pre_computed": "#/$defs/PreComputedDatasetSettings"
        },
        "propertyName": "sampling_protocol"
      },
      "oneOf": [
        {
          "$ref": "#/$defs/MMMDSamplingSettings"
        },
        {
          "$ref": "#/$defs/MLMDSamplingSettings"
        },
        {
          "$ref": "#/$defs/MMMDMetadynamicsSamplingSettings"
        },
        {
          "$ref": "#/$defs/MMMDMetadynamicsTorsionMinimisationSamplingSettings"
        },
        {
          "$ref": "#/$defs/PreComputedDatasetSettings"
        }
      ],
      "title": "Testing Sampling Settings"
    },
    "training_settings": {
      "$ref": "#/$defs/TrainingSettings",
      "description": "Settings for the training process"
    },
    "outlier_filter_settings": {
      "anyOf": [
        {
          "$ref": "#/$defs/OutlierFilterSettings"
        },
        {
          "type": "null"
        }
      ],
      "description": "Settings for filtering outliers from training data. Set to None to disable outlier filtering."
    }
  },
  "required": [
    "parameterisation_settings"
  ],
  "title": "WorkflowSettings",
  "type": "object"
}

Fields:

version (str)
output_dir (Path)
device_type (TorchDevice)
n_iterations (int)
memory (bool)
parameterisation_settings (ParameterisationSettings)
training_sampling_settings (SamplingSettings)
testing_sampling_settings (SamplingSettings)
training_settings (TrainingSettings)
outlier_filter_settings (OutlierFilterSettings | None)

version `pydantic-field` #

version: str = __version__

Version of presto used to create these settings

output_dir `pydantic-field` #

output_dir: Path = Path('.')

Directory where the output files will be saved

device_type `pydantic-field` #

device_type: TorchDevice = 'cuda'

Device type for training, either 'cpu' or 'cuda'

n_iterations `pydantic-field` #

n_iterations: int = 2

Number of iterations of sampling followed by force field training to run

memory `pydantic-field` #

memory: bool = False

Whether to append new training data to the training data from previous iterations (True), or overwrite it (False).

parameterisation_settings `pydantic-field` #

parameterisation_settings: ParameterisationSettings

Settings for the starting parameterisation

training_sampling_settings `pydantic-field` #

training_sampling_settings: SamplingSettings

Settings for sampling for generating the training data (usually molecular dynamics)

testing_sampling_settings `pydantic-field` #

testing_sampling_settings: SamplingSettings

Settings for sampling for generating the testing data (usually molecular dynamics)
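
Both sampling fields are discriminated on `sampling_protocol`, as the `discriminator` blocks in the JSON schema show. The following standard-library sketch mirrors that mapping to illustrate how a protocol string selects a settings class. The helper `resolve_sampling_settings_name` is hypothetical and not part of presto (pydantic performs the real dispatch), and treating a missing key as an error is an assumption of this sketch.

```python
# Mapping copied from the "discriminator" blocks of the JSON schema.
SAMPLING_PROTOCOL_TO_MODEL = {
    "ml_md": "MLMDSamplingSettings",
    "mm_md": "MMMDSamplingSettings",
    "mm_md_metadynamics": "MMMDMetadynamicsSamplingSettings",
    "mm_md_metadynamics_torsion_minimisation": (
        "MMMDMetadynamicsTorsionMinimisationSamplingSettings"
    ),
    "pre_computed": "PreComputedDatasetSettings",
}


def resolve_sampling_settings_name(data: dict) -> str:
    """Hypothetical helper: name of the model selected by 'sampling_protocol'."""
    if "sampling_protocol" not in data:
        # Assumption: this sketch requires the key to be present.
        raise ValueError("'sampling_protocol' is needed to pick a settings class")
    protocol = data["sampling_protocol"]
    if protocol not in SAMPLING_PROTOCOL_TO_MODEL:
        raise ValueError(f"Unknown sampling_protocol: {protocol!r}")
    return SAMPLING_PROTOCOL_TO_MODEL[protocol]


# Example: a dictionary as it might appear under training_sampling_settings.
print(resolve_sampling_settings_name({"sampling_protocol": "mm_md", "n_conformers": 10}))
```

In a settings file, the same string would appear as the `sampling_protocol` value under `training_sampling_settings` or `testing_sampling_settings`.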

training_settings `pydantic-field` #

training_settings: TrainingSettings

Settings for the training process
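
The `learning_rate`, `learning_rate_decay`, and `learning_rate_decay_step` fields of TrainingSettings suggest a stepwise decay of the learning rate. Below is a minimal sketch, assuming the rate is multiplied by the decay factor once every `learning_rate_decay_step` epochs, as in a conventional step scheduler. This is an assumption about the schedule rather than a confirmed description of presto's training loop, and `stepped_learning_rate` is a hypothetical name.

```python
def stepped_learning_rate(
    epoch: int,
    learning_rate: float = 0.01,
    learning_rate_decay: float = 1.0,
    learning_rate_decay_step: int = 10,
) -> float:
    """Learning rate at a given epoch under an assumed step-decay schedule."""
    # Number of completed decay intervals before this epoch.
    n_decays = epoch // learning_rate_decay_step
    return learning_rate * learning_rate_decay**n_decays


# With the default decay of 1.0 the rate never changes.
print(stepped_learning_rate(500))
# With a decay of 0.99, the rate drops by 1% every 10 epochs.
print(stepped_learning_rate(20, learning_rate_decay=0.99))
```

Under this assumption, the default `learning_rate_decay` of 1.0 leaves the learning rate constant for all `n_epochs`.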

outlier_filter_settings `pydantic-field` #

outlier_filter_settings: OutlierFilterSettings | None

Settings for filtering outliers from training data. Set to None to disable outlier filtering.
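
To make the three OutlierFilterSettings fields concrete, here is a minimal standard-library sketch of the filtering rule as described in the schema. The function `filter_outlier_indices` is hypothetical, and two points are assumptions: that "relative to minimum" means each energy series is shifted by its own minimum, and that all conformations are kept when fewer than `min_conformations` would survive.

```python
def filter_outlier_indices(
    energy_mm,
    energy_ref,
    forces_mm,
    forces_ref,
    n_atoms,
    energy_outlier_threshold=2.0,
    force_outlier_threshold=500.0,
    min_conformations=1,
):
    """Return indices of conformations to keep for one molecule.

    energy_*: one energy per conformation (kcal/mol).
    forces_*: per conformation, a flat list of force components (kcal/mol/Å).
    """
    n_confs = len(energy_mm)
    if n_confs == 0:
        return []

    # Assumption: energies are compared relative to each series' own minimum.
    mm_min, ref_min = min(energy_mm), min(energy_ref)

    keep = []
    for i in range(n_confs):
        ok = True
        if energy_outlier_threshold is not None:
            diff = abs((energy_mm[i] - mm_min) - (energy_ref[i] - ref_min))
            ok = ok and diff / n_atoms <= energy_outlier_threshold
        if force_outlier_threshold is not None:
            max_diff = max(abs(a - b) for a, b in zip(forces_mm[i], forces_ref[i]))
            ok = ok and max_diff <= force_outlier_threshold
        if ok:
            keep.append(i)

    # Assumption: fall back to keeping everything if too few would remain.
    if len(keep) < min_conformations:
        return list(range(n_confs))
    return keep


zeros = [[0.0, 0.0, 0.0]] * 3
print(filter_outlier_indices([0.0, 1.0, 100.0], [0.0, 1.5, 2.0], zeros, zeros, n_atoms=10))
```

Passing `None` for either threshold skips that check, matching the "Set to None to disable" wording in the field descriptions.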

output_types `property` #

output_types: set[OutputType]

Return a set of expected output types for the function which implements this settings object. Subclasses should override this method.

validate_version `classmethod` #

validate_version(value: str) -> str

Validate version format and check compatibility.

Source code in presto/settings.py

@field_validator("version")
@classmethod
def validate_version(cls, value: str) -> str:
    """Validate version format and check compatibility."""
    try:
        parsed = Version(value)
    except Exception as e:
        raise ValueError(f"Invalid version format: {value}") from e

    actual_version = Version(__version__)

    # Warn the user if major or minor versions do not match
    if parsed.major != actual_version.major or parsed.minor != actual_version.minor:
        logger.warning(
            f"Version mismatch: settings version {value} may not be compatible with current version {__version__}."
        )

    return value
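
The compatibility check above compares only the major and minor components of the two versions. The sketch below illustrates that comparison with the standard library alone. The `Version` class in the source is presumably `packaging.version.Version`; the regular expression here is a simplified stand-in, not a full PEP 440 parser, and `major_minor` and `versions_compatible` are hypothetical names.

```python
import re

# Simplified stand-in for a full version parser: leading "major.minor" only.
_MAJOR_MINOR = re.compile(r"^v?(\d+)(?:\.(\d+))?")


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.1.dev1+g55bd96965'."""
    match = _MAJOR_MINOR.match(version.strip())
    if match is None:
        raise ValueError(f"Invalid version format: {version}")
    # A missing minor component is treated as 0.
    return int(match.group(1)), int(match.group(2) or 0)


def versions_compatible(settings_version: str, current_version: str) -> bool:
    """True when major and minor components match, i.e. no warning would be logged."""
    return major_minor(settings_version) == major_minor(current_version)


print(versions_compatible("0.1.dev1+g55bd96965", "0.1.3"))
print(versions_compatible("0.2.0", "0.1.3"))
```

Note that a mismatch only triggers a warning in `validate_version`; the settings are still accepted.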

validate_device_type `classmethod` #

validate_device_type(value: TorchDevice) -> TorchDevice

Ensure that the requested device type is available.

Source code in presto/settings.py

@field_validator("device_type")
@classmethod
def validate_device_type(cls, value: TorchDevice) -> TorchDevice:
    """Ensure that the requested device type is available."""
    if value == "cuda" and not torch.cuda.is_available():
        raise ValueError("CUDA is not available on this system.")

    if value == "cpu":
        warnings.warn(
            "Using CPU for training and sampling. This may be slow. Consider using CUDA if available.",
            UserWarning,
            stacklevel=2,
        )

    return value

validate_parameterisation_training_consistency #

validate_parameterisation_training_consistency() -> Self

Validate that linearise_harmonics argument in parameterisation settings is consistent with the valence types in the training settings.

Source code in presto/settings.py

@model_validator(mode="after")
def validate_parameterisation_training_consistency(self) -> Self:
    """Validate that linearise_harmonics argument in parameterisation settings is consistent with the valence types
    in the training settings."""

    harmonics_linearised = self.parameterisation_settings.linearise_harmonics
    excluded_valence_types = (
        ("Bonds", "Angles")
        if harmonics_linearised
        else ("LinearBonds", "LinearAngles")
    )
    if any(
        valence_type in self.training_settings.parameter_configs
        for valence_type in excluded_valence_types
    ):
        raise InvalidSettingsError(
            f"ParameterisationSettings.linearise_harmonics is {harmonics_linearised}, but TrainingSettings.parameter_configs "
            f"contains valence types that are inconsistent with this setting: {excluded_valence_types}. "
        )

    return self
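
The rule enforced by this validator can be restated without pydantic: when harmonics are linearised, `parameter_configs` must not contain "Bonds" or "Angles"; otherwise it must not contain "LinearBonds" or "LinearAngles". A small sketch of that check follows. The function `inconsistent_valence_types` is a hypothetical helper written for illustration and is not part of presto.

```python
def inconsistent_valence_types(linearise_harmonics: bool, parameter_config_keys) -> list[str]:
    """Return the parameter_configs keys that conflict with linearise_harmonics."""
    # Same selection logic as the validator shown above.
    excluded = (
        ("Bonds", "Angles")
        if linearise_harmonics
        else ("LinearBonds", "LinearAngles")
    )
    return [valence_type for valence_type in excluded if valence_type in parameter_config_keys]


# Linearised harmonics with linear valence types: consistent, nothing reported.
print(inconsistent_valence_types(True, ["LinearBonds", "LinearAngles", "ProperTorsions"]))
# Linearised harmonics but a "Bonds" config present: the validator would raise.
print(inconsistent_valence_types(True, ["Bonds", "LinearAngles"]))
```

A non-empty result corresponds to the case in which the validator raises `InvalidSettingsError`.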

get_path_manager #

get_path_manager() -> WorkflowPathManager

Get the output paths manager for this workflow settings object.

Source code in presto/settings.py

def get_path_manager(self) -> WorkflowPathManager:
    """Get the output paths manager for this workflow settings object."""
    # Get the number of molecules from the smiles list
    smiles = self.parameterisation_settings.smiles
    n_mols = len(smiles) if isinstance(smiles, list) else 1
    return WorkflowPathManager(
        output_dir=self.output_dir,
        n_iterations=self.n_iterations,
        n_mols=n_mols,
        training_settings=self.training_settings,
        training_sampling_settings=self.training_sampling_settings,
        testing_sampling_settings=self.testing_sampling_settings,
    )

to_yaml #

to_yaml(yaml_path: PathLike) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def to_yaml(self, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    _model_to_yaml(self, yaml_path)

from_yaml `classmethod` #

from_yaml(yaml_path: PathLike) -> Self

Load settings from a YAML file

Source code in presto/settings.py

@classmethod
def from_yaml(cls, yaml_path: PathLike) -> Self:
    """Load settings from a YAML file"""
    return _model_from_yaml(cls, yaml_path)

_model_to_yaml #

_model_to_yaml(
    model: BaseModel, yaml_path: PathLike
) -> None

Save the settings to a YAML file

Source code in presto/settings.py

def _model_to_yaml(model: BaseModel, yaml_path: PathLike) -> None:
    """Save the settings to a YAML file"""
    data = model.model_dump(mode="json")
    with open(yaml_path, "w") as file:
        yaml.dump(data, file, default_flow_style=False, sort_keys=False, indent=4)

_model_from_yaml #

_model_from_yaml(cls: type[_T], yaml_path: PathLike) -> _T

Load settings from a YAML file

Source code in presto/settings.py

def _model_from_yaml(cls: type[_T], yaml_path: PathLike) -> _T:
    """Load settings from a YAML file"""
    with open(yaml_path, "r") as file:
        settings_data = yaml.safe_load(file)
    return cls(**settings_data)
