6: ChemSpace with SMILES
Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole
Overview
Building and scoring molecules can be streamlined further with our established protocol. Here we show how to quickly build a library from SMILES and score every molecule in it.
import pandas as pd
import prody
from rdkit import Chem
import fegrow
from fegrow import ChemSpace
from fegrow.testing import core_5R83_path, rec_5R83_path, data_5R83_path
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterHierarchyMatcher> already registered; second conversion method ignored.
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterCatalogEntry> already registered; second conversion method ignored.
Prepare the ligand template
scaffold = Chem.SDMolSupplier(core_5R83_path)[0]
Because we use pre-built SMILES that already contain the scaffold as a substructure, there is no need to set a growing vector.
When creating a cluster in a script, make sure the code runs inside an if __name__ == "__main__": block, particularly when using processes=True. A Jupyter notebook works fine without it.
When ANI=True is used for processing, the Dask cluster has to use processes rather than threads because ANI is currently not thread-safe. We therefore create a LocalCluster here and ask ChemSpace to use it.
from dask.distributed import LocalCluster
lc = LocalCluster(processes=True, n_workers=None, threads_per_worker=1)
2025-03-07 11:50:46,892 - distributed.nanny - WARNING - Restarting worker
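For a standalone script, the guard described above can be laid out roughly as follows. This is a minimal sketch rather than FEgrow's own recipe: pick_n_workers is a hypothetical helper for readers who prefer to set n_workers explicitly instead of leaving it as None, and the Dask and FEgrow calls are left as comments so that the skeleton runs on its own.

```python
import os


def pick_n_workers(reserve: int = 1) -> int:
    """Hypothetical helper: one worker per CPU core, keeping `reserve` cores free."""
    cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, cores - reserve)


def main() -> None:
    n_workers = pick_n_workers()
    print(f"Would start {n_workers} single-threaded worker processes")
    # In a real script (requires dask and fegrow) the cluster is created here,
    # inside main(), so that it is only built when the file is run directly:
    #   lc = LocalCluster(processes=True, n_workers=n_workers, threads_per_worker=1)
    #   cs = ChemSpace(dask_cluster=lc)


if __name__ == "__main__":
    # Worker processes may import this file again; the guard keeps them from
    # re-running main() and trying to create clusters of their own.
    main()
```

Keeping all cluster creation inside main() means that importing the file, which is what a newly started worker process may do, has no side effects.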
# create the chemical space
cs = ChemSpace(dask_cluster=lc)
Dask can be watched on http://127.0.0.1:8787/status
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(scaffold)
cs.add_protein(rec_5R83_path)
/home/dresio/code/fegrow/fegrow/package.py:799: UserWarning: The template does not have an attachement (Atoms with index 0, or in case of Smiles the * character. )
warnings.warn("The template does not have an attachement (Atoms with index 0, "
# load 50k smiles dataset from the study
smiles = pd.read_csv(data_5R83_path).Smiles.to_list()
# for testing, sort by SMILES string length (a rough proxy for molecule size)
smiles.sort(key=len)
# keep the 5 shortest SMILES
smiles = smiles[:5]
# here we add SMILES which should already have been matched
# to the scaffold (rdkit Mol.HasSubstructMatch)
cs.add_smiles(smiles, protonate=False)
cs
 | Smiles | score | h | Training | Success | enamine_searched | enamine_id | 2D |
---|---|---|---|---|---|---|---|---|
0 | [H]c1nc([H])c(SF)c([H])c1[H] | <NA> | <NA> | False | NaN | False | NaN | |
1 | [H]c1nc([H])c(SI)c([H])c1[H] | <NA> | <NA> | False | NaN | False | NaN | |
2 | [H]c1nc([H])c(SCl)c([H])c1[H] | <NA> | <NA> | False | NaN | False | NaN | |
3 | [H]c1nc([H])c(SBr)c([H])c1[H] | <NA> | <NA> | False | NaN | False | NaN | |
4 | [H]c1nc([H])c(C#CF)c([H])c1[H] | <NA> | <NA> | False | NaN | False | NaN | |
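The five entries above are simply the shortest strings in the dataset. A minimal sketch of that selection step in plain Python is shown below. shortest_smiles is a hypothetical helper, not part of FEgrow, and the last string in the example list is made up for illustration. String length is only a rough proxy for molecular size, because explicit hydrogens, brackets and ring-closure digits all add characters.

```python
def shortest_smiles(smiles, n):
    """Return the n shortest SMILES strings without modifying the input list."""
    # sorted() is stable, so strings of equal length keep their original order
    return sorted(smiles, key=len)[:n]


# Three strings taken from the table above, plus one longer hypothetical entry
library = [
    "[H]c1nc([H])c(SCl)c([H])c1[H]",
    "[H]c1nc([H])c(C#CF)c([H])c1[H]",
    "[H]c1nc([H])c(SF)c([H])c1[H]",
    "[H]c1nc([H])c(C(=O)N([H])[H])c([H])c1[H]",  # hypothetical, for illustration
]

picked = shortest_smiles(library, 2)
print(picked)
```

Unlike smiles.sort(key=len), this version leaves the original list untouched, which is convenient when the full library is needed again later.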
cs.evaluate()
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/parmed/structure.py:1799: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
coords = np.asanyarray(value, dtype=np.float64)
Generated 2 conformers.
Removed 0 conformers.
Using force field
Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00, 7.45it/s]
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/aev.py:16: UserWarning: cuaev not installed
warnings.warn("cuaev not installed")
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/__init__.py:59: UserWarning: Dependency not satisfied, torchani.ase will not be available
warnings.warn("Dependency not satisfied, torchani.ase will not be available")
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
Generated 2 conformers.
Removed 0 conformers.
using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'
Optimising conformer: 0%| | 0/2 [00:00<?, ?it/s]
Generated 2 conformers.
Removed 0 conformers.
Using force field
Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00, 9.99it/s]
Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00, 1.07s/it]
Generated 2 conformers.
Removed 0 conformers.
using ani2x
Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00, 1.19s/it]
Generated 1 conformers.
Removed 0 conformers.
using ani2x
Optimising conformer: 100%|███████████████████████| 1/1 [00:01<00:00, 1.29s/it]
 | Smiles | Mol | score | h | Training | Success | enamine_searched | enamine_id |
---|---|---|---|---|---|---|---|---|
0 | [H]c1nc([H])c(SF)c([H])c1[H] | <fegrow.package.RMol object at 0x7e17cc670f90> | 3.752 | <NA> | True | True | False | NaN |
1 | [H]c1nc([H])c(SI)c([H])c1[H] | <fegrow.package.RMol object at 0x7e17cc673a60> | 3.933 | <NA> | True | True | False | NaN |
2 | [H]c1nc([H])c(SCl)c([H])c1[H] | <fegrow.package.RMol object at 0x7e1798140860> | 3.708 | <NA> | True | True | False | NaN |
3 | [H]c1nc([H])c(SBr)c([H])c1[H] | <fegrow.package.RMol object at 0x7e1798140950> | 3.89 | <NA> | True | True | False | NaN |
4 | [H]c1nc([H])c(C#CF)c([H])c1[H] | <fegrow.package.RMol object at 0x7e1798141210> | 3.478 | <NA> | True | True | False | NaN |
cs
 | Smiles | score | h | Training | Success | enamine_searched | enamine_id | 2D |
---|---|---|---|---|---|---|---|---|
0 | [H]c1nc([H])c(SF)c([H])c1[H] | 3.752 | <NA> | True | True | False | NaN | |
1 | [H]c1nc([H])c(SI)c([H])c1[H] | 3.933 | <NA> | True | True | False | NaN | |
2 | [H]c1nc([H])c(SCl)c([H])c1[H] | 3.708 | <NA> | True | True | False | NaN | |
3 | [H]c1nc([H])c(SBr)c([H])c1[H] | 3.89 | <NA> | True | True | False | NaN | |
4 | [H]c1nc([H])c(C#CF)c([H])c1[H] | 3.478 | <NA> | True | True | False | NaN | |
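Once every entry has a score, a natural next step is to rank the library. The sketch below does this in plain Python with the values copied from the table above. It assumes that a higher score is better, which is not stated here and should be checked against the scoring function in use. rank_by_score is a hypothetical helper, not part of FEgrow.

```python
# (SMILES, score) pairs copied from the table above
scored = [
    ("[H]c1nc([H])c(SF)c([H])c1[H]", 3.752),
    ("[H]c1nc([H])c(SI)c([H])c1[H]", 3.933),
    ("[H]c1nc([H])c(SCl)c([H])c1[H]", 3.708),
    ("[H]c1nc([H])c(SBr)c([H])c1[H]", 3.89),
    ("[H]c1nc([H])c(C#CF)c([H])c1[H]", 3.478),
]


def rank_by_score(entries, higher_is_better=True):
    """Return a new list of (smiles, score) pairs ordered from best to worst."""
    # Assumption: higher scores are better unless the caller says otherwise
    return sorted(entries, key=lambda entry: entry[1], reverse=higher_is_better)


ranked = rank_by_score(scored)
for smi, score in ranked:
    print(f"{score:.3f}  {smi}")
```

If lower scores turn out to be better for the scoring function in use, passing higher_is_better=False reverses the order.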