outputs
#
Functionality for handling the outputs of a workflow.
Classes:
-
OutputType–An enumeration of the different types of outputs produced by bespoke fitting functions
-
WorkflowPathManager–Manages paths for workflow outputs based on WorkflowSettings.
Functions:
-
delete_path–Delete an output file or directory if it exists. Deletes the entire contents of a directory when recursive is True.
-
get_mol_path–Get a molecule-specific path from a base path.
OutputType
#
Bases: Enum
An enumeration of the different types of outputs produced by bespoke fitting functions
Attributes:
-
WORKFLOW_SETTINGS–The settings YAML file which is written if the user runs using presto train
-
ENERGIES_AND_FORCES–Directory containing energies and forces data files in HDF5 format.
-
TENSORBOARD–Directory containing TensorBoard logs.
-
TRAINING_METRICS–File containing training metrics.
-
OFFXML–The output OpenFF ForceField file containing the bespoke parameters.
-
SCATTER–HDF5 file containing scatter data for energies and forces.
-
PDB_TRAJECTORY–PDB trajectory file containing structures generated during sampling.
-
METADYNAMICS_BIAS–Directory containing metadynamics bias files.
-
LOSS_PLOT–Plot of training and validation loss over training epochs.
-
ERROR_PLOT–Plot of error distributions for energies and forces. These are with respect to the 'test' data.
-
CORRELATION_PLOT–Plot of predicted vs reference energies and forces. These are with respect to the 'test' data.
-
FORCE_ERROR_BY_ATOM_INDEX_PLOT–Plot of force errors broken down by atom index. These are with respect to the 'test' data.
-
PARAMETER_VALUES_PLOT–Plot of parameter values before and after fitting. Note that the 'before' force field is the one used for the initial sampling, after the MSM step.
-
PARAMETER_DIFFERENCES_PLOT–Plot of parameter differences (fitted - initial) after fitting. Note that the 'initial' force field is the one used for the initial sampling, after the MSM step.
-
ML_MINIMISED_PDB–PDB file containing structures minimised using the machine-learned potential.
-
MM_MINIMISED_PDB–PDB file containing structures minimised using the molecular mechanics potential.
-
TORSION_DIHEDRALS_PLOT–Plot of dihedral angles for all rotatable torsions during trajectories.
WORKFLOW_SETTINGS
class-attribute
instance-attribute
#
The settings YAML file, written when the user runs presto train (rather than presto train-from-yaml <settings file>). This provides a record of the settings used and makes it easy to re-run the workflow later.
ENERGIES_AND_FORCES
class-attribute
instance-attribute
#
Directory containing energies and forces data files in HDF5 format.
TENSORBOARD
class-attribute
instance-attribute
#
Directory containing TensorBoard logs.
TRAINING_METRICS
class-attribute
instance-attribute
#
File containing training metrics.
OFFXML
class-attribute
instance-attribute
#
The output OpenFF ForceField file containing the bespoke parameters. One bespoke FF file is produced per training iteration.
SCATTER
class-attribute
instance-attribute
#
HDF5 file containing scatter data for energies and forces.
PDB_TRAJECTORY
class-attribute
instance-attribute
#
PDB trajectory file containing structures generated during sampling.
METADYNAMICS_BIAS
class-attribute
instance-attribute
#
Directory containing metadynamics bias files.
LOSS_PLOT
class-attribute
instance-attribute
#
Plot of training and validation loss over training epochs.
ERROR_PLOT
class-attribute
instance-attribute
#
Plot of error distributions for energies and forces. These are with respect to the 'test' data.
CORRELATION_PLOT
class-attribute
instance-attribute
#
Plot of predicted vs reference energies and forces. These are with respect to the 'test' data.
FORCE_ERROR_BY_ATOM_INDEX_PLOT
class-attribute
instance-attribute
#
Plot of force errors broken down by atom index. These are with respect to the 'test' data.
PARAMETER_VALUES_PLOT
class-attribute
instance-attribute
#
Plot of parameter values before and after fitting. Note that the 'before' force field is the one used for the initial sampling, after the MSM step.
PARAMETER_DIFFERENCES_PLOT
class-attribute
instance-attribute
#
Plot of parameter differences (fitted - initial) after fitting. Note that the 'initial' force field is the one used for the initial sampling, after the MSM step.
ML_MINIMISED_PDB
class-attribute
instance-attribute
#
PDB file containing structures minimised using the machine-learned potential.
MM_MINIMISED_PDB
class-attribute
instance-attribute
#
PDB file containing structures minimised using the molecular mechanics potential.
TORSION_DIHEDRALS_PLOT
class-attribute
instance-attribute
#
Plot of dihedral angles for all rotatable torsions during trajectories.
WorkflowPathManager
dataclass
#
WorkflowPathManager(
output_dir: Path,
n_iterations: int = 1,
n_mols: int = 1,
training_settings: TrainingSettings | None = None,
training_sampling_settings: (
SamplingSettings | None
) = None,
testing_sampling_settings: (
SamplingSettings | None
) = None,
)
Manages paths for workflow outputs based on WorkflowSettings.
Methods:
-
get_stage_path–Get the directory path for a workflow stage.
-
mk_stage_dir–Create the directory for a workflow stage.
-
get_output_path–Get the path for an output type in a stage.
-
get_output_path_for_mol–Get the path for a per-molecule output type in a stage.
-
get_all_output_paths–Get all expected output paths organized by stage.
-
get_all_output_paths_by_output_type–Get all expected output paths organized by output type.
-
get_all_output_paths_by_output_type_by_molecule–Get all expected output paths organized by output type and molecule.
-
clean–Remove all output files and empty stage directories.
Attributes:
-
outputs_by_stage(dict[OutputStage, set[OutputType]]) –Return a dictionary mapping each stage to expected output types.
outputs_by_stage
property
#
outputs_by_stage: dict[OutputStage, set[OutputType]]
Return a dictionary mapping each stage to expected output types.
get_stage_path
#
Get the directory path for a workflow stage.
mk_stage_dir
#
Create the directory for a workflow stage.
get_output_path
#
get_output_path(
stage: OutputStage, output_type: OutputType
) -> Path
Get the path for an output type in a stage.
Note: For per-molecule output types (those in PER_MOLECULE_OUTPUT_TYPES), use get_output_path_for_mol instead.
Source code in presto/outputs.py
get_output_path_for_mol
#
get_output_path_for_mol(
stage: OutputStage,
output_type: OutputType,
mol_idx: int,
) -> Path
Get the path for a per-molecule output type in a stage.
Parameters:
-
stage(OutputStage) –The workflow stage.
-
output_type(OutputType) –The type of output (must be a per-molecule output type).
-
mol_idx(int) –The molecule index.
Returns:
-
Path–The path for the per-molecule output.
Raises:
-
ValueError–If the output type is not a per-molecule output type, or if mol_idx is out of range.
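The two failure cases listed under Raises can be illustrated with a small stand-alone sketch. It does not import presto: DemoOutputType, DEMO_PER_MOLECULE_TYPES and check_per_mol_request are hypothetical stand-ins, the choice of which types are per-molecule is illustrative only, and the zero-based index range is an assumption.

```python
from enum import Enum


class DemoOutputType(Enum):
    """Reduced stand-in for OutputType; the string values are placeholders."""

    SCATTER = "scatter"
    OFFXML = "offxml"


# Assumption: which types count as per-molecule is chosen for illustration only.
DEMO_PER_MOLECULE_TYPES = {DemoOutputType.SCATTER}


def check_per_mol_request(
    output_type: DemoOutputType, mol_idx: int, n_mols: int
) -> None:
    """Raise ValueError for a non-per-molecule type or an out-of-range index."""
    if output_type not in DEMO_PER_MOLECULE_TYPES:
        raise ValueError(f"{output_type.name} is not a per-molecule output type")
    # Assumption: valid indices run from 0 to n_mols - 1.
    if not 0 <= mol_idx < n_mols:
        raise ValueError(
            f"mol_idx {mol_idx} is out of range for {n_mols} molecule(s)"
        )
```

Checking both conditions before building a path means a caller gets a clear error rather than a path that will never exist.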
get_all_output_paths
#
get_all_output_paths(
only_if_exists: bool = True,
) -> dict[OutputStage, dict[OutputType, Path | list[Path]]]
Get all expected output paths organized by stage.
For per-molecule output types, returns a list of paths (one per molecule). For other output types, returns a single path.
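Because each entry is either a single path or a list of paths, code that consumes the result usually needs to normalise the two cases. The sketch below shows one way to do that on a hand-written mapping of the same shape. The string keys stand in for OutputStage and OutputType members, and the paths and the helper name as_path_list are made up for illustration.

```python
from __future__ import annotations

from pathlib import Path


def as_path_list(value: Path | list[Path]) -> list[Path]:
    """Return a list whether the entry is a single path or a list of paths."""
    return list(value) if isinstance(value, list) else [value]


# Hypothetical stand-in for the returned mapping (stage -> output type -> paths).
all_paths = {
    "stage_a": {
        "offxml": Path("out/stage_a/bespoke.offxml"),
        "scatter": [
            Path("out/stage_a/scatter_mol0.hdf5"),
            Path("out/stage_a/scatter_mol1.hdf5"),
        ],
    }
}

# Flatten every entry into one list of paths.
flat = [
    path
    for outputs in all_paths.values()
    for value in outputs.values()
    for path in as_path_list(value)
]
```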
get_all_output_paths_by_output_type
#
get_all_output_paths_by_output_type(
only_if_exists: bool = True,
) -> dict[OutputType, list[Path]]
Get all expected output paths organized by output type.
Note: For per-molecule output types, paths from all molecules are flattened into a single list. Use get_all_output_paths_by_output_type_by_molecule for per-molecule organization.
get_all_output_paths_by_output_type_by_molecule
#
get_all_output_paths_by_output_type_by_molecule(
only_if_exists: bool = True,
) -> dict[OutputType, dict[int, list[Path]] | list[Path]]
Get all expected output paths organized by output type and molecule.
For per-molecule output types, returns a dict mapping mol_idx to list of paths. For non-per-molecule output types, returns a flat list of paths.
Parameters:
-
only_if_exists(bool, default:True) –If True, only return paths that exist on disk.
Returns:
-
dict[OutputType, dict[int, list[Path]] | list[Path]]–A dictionary mapping each output type to either a dict mapping mol_idx to a list of paths (one per stage), for per-molecule types, or a list of paths (one per stage), for other types.
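The mixed return type means callers have to branch on whether an entry is a dict or a list. The following sketch selects everything relevant to one molecule from a hand-written mapping of that shape. The string keys, the paths and the helper paths_for_molecule are hypothetical, and returning an empty list for an unknown molecule index is a choice made here, not documented behaviour.

```python
from __future__ import annotations

from pathlib import Path


def paths_for_molecule(by_type: dict, mol_idx: int) -> dict:
    """Select, for each output type, the paths relevant to one molecule."""
    selected = {}
    for output_type, entry in by_type.items():
        if isinstance(entry, dict):
            # Per-molecule type: keep only this molecule's paths.
            selected[output_type] = entry.get(mol_idx, [])
        else:
            # Shared type: the same list applies to every molecule.
            selected[output_type] = entry
    return selected


# Hypothetical stand-in for the returned mapping.
by_type = {
    "scatter": {
        0: [Path("s1/scatter_mol0.hdf5")],
        1: [Path("s1/scatter_mol1.hdf5")],
    },
    "offxml": [Path("s1/bespoke.offxml")],
}

mol1 = paths_for_molecule(by_type, 1)
```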
_extract_mol_idx_from_path
#
Extract molecule index from a per-molecule path.
Parameters:
-
path(Path) –A path with the _mol{idx} naming convention.
Returns:
-
int–The molecule index.
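As an illustration of the _mol{idx} convention, the index can be recovered with a regular expression applied to the final path component. This is a sketch rather than the library's implementation: the name extract_mol_idx is hypothetical, and raising ValueError when the suffix is missing is an assumption.

```python
import re
from pathlib import Path

# Matches a trailing "_mol<digits>" on the name with its last extension removed.
_MOL_SUFFIX = re.compile(r"_mol(\d+)$")


def extract_mol_idx(path: Path) -> int:
    """Return the integer following '_mol' at the end of the path's stem."""
    match = _MOL_SUFFIX.search(path.stem)
    if match is None:
        # Assumption: the error type for a non-matching path is not documented.
        raise ValueError(f"{path} does not follow the _mol{{idx}} convention")
    return int(match.group(1))
```

Using path.stem handles both files (scatter_mol0.hdf5) and extension-less directories (energy_data_mol1) with the same pattern.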
clean
#
Remove all output files and empty stage directories.
delete_path
#
Delete an output file or directory if it exists. When recursive is True, the entire contents of a directory are deleted.
Parameters:
-
path(Path) –The path to delete.
-
recursive(bool, default:False) –Whether to delete directories recursively. If False, only empty directories are deleted.
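The behaviour described by these parameters can be sketched with pathlib and shutil. This is a minimal stand-alone version under stated assumptions, not the code in presto/outputs.py: the name delete_path_sketch is hypothetical, and silently leaving a non-empty directory in place when recursive is False is an assumption about a case the documentation does not spell out.

```python
import shutil
from pathlib import Path


def delete_path_sketch(path: Path, recursive: bool = False) -> None:
    """Delete a file or directory if it exists."""
    if not path.exists():
        return  # Nothing to delete.
    if path.is_dir():
        if recursive:
            # Remove the directory together with everything inside it.
            shutil.rmtree(path)
        elif not any(path.iterdir()):
            # Non-recursive mode: only an empty directory is removed.
            path.rmdir()
        # Assumption: a non-empty directory is left untouched otherwise.
    else:
        path.unlink()
```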
get_mol_path
#
Get a molecule-specific path from a base path.
This function applies the standard naming convention for per-molecule outputs, where the molecule index is appended to the filename/directory name.
Parameters:
-
base_path(Path) –The base path (e.g., from get_output_path or output_paths dict).
-
mol_idx(int) –The molecule index.
Returns:
-
Path–The path with the molecule index appended.
Examples:
>>> get_mol_path(Path("output/scatter.hdf5"), 0)
PosixPath('output/scatter_mol0.hdf5')
>>> get_mol_path(Path("output/energy_data"), 1)
PosixPath('output/energy_data_mol1')
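One way to obtain the behaviour shown in these examples is to insert the suffix between the stem and the extension using pathlib. The function below is a sketch that reproduces the two documented examples; the name get_mol_path_sketch is hypothetical and the actual implementation may differ.

```python
from pathlib import Path


def get_mol_path_sketch(base_path: Path, mol_idx: int) -> Path:
    """Append '_mol{idx}' to the name, before the file extension if present."""
    # For a directory or extension-less file, suffix is the empty string.
    new_name = f"{base_path.stem}_mol{mol_idx}{base_path.suffix}"
    return base_path.with_name(new_name)
```

Because only the final path component changes, the parent directory of the base path is preserved.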