6: Chemspace with SMILES

Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole

Overview

Building and scoring molecules can be streamlined further by using our established protocol. Here we show how to quickly build a library from SMILES and score every molecule in it.

import pandas as pd
import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace

from fegrow.testing import core_5R83_path, rec_5R83_path, data_5R83_path
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterHierarchyMatcher> already registered; second conversion method ignored.
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterCatalogEntry> already registered; second conversion method ignored.

Prepare the ligand template

scaffold = Chem.SDMolSupplier(core_5R83_path)[0]

Because we use pre-prepared SMILES that already contain the scaffold as a substructure, there is no need to set a growing vector.
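The substructure requirement can be checked before any SMILES are handed over. The following is a minimal sketch using RDKit, not the protocol's own implementation: the pyridine scaffold, the candidate list, and the helper name contains_scaffold are illustrative assumptions (the tutorial itself loads its scaffold from an SDF file).

```python
from rdkit import Chem

# Hypothetical stand-in scaffold (pyridine); the tutorial loads its own from an SDF file.
scaffold = Chem.MolFromSmiles("c1ccncc1")

# Illustrative candidates: one pyridine derivative, benzene, and an invalid string.
candidates = [
    "[H]c1nc([H])c(SF)c([H])c1[H]",
    "c1ccccc1",
    "not-a-smiles",
]


def contains_scaffold(smiles, scaffold_mol):
    """Return True if the SMILES parses and contains the scaffold as a substructure."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        return False
    return mol.HasSubstructMatch(scaffold_mol)


matching = [smi for smi in candidates if contains_scaffold(smi, scaffold)]
print(matching)
```

Only the entries that pass such a check would be suitable for superimposing onto the scaffold without defining a growing vector.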

When creating a cluster in a script, particularly with processes=True, make sure the code runs under an if __name__ == "__main__": guard. This is not required in a Jupyter notebook.
When ANI=True is used for processing, the Dask cluster must use processes because ANI is currently not thread-safe. We therefore create a LocalCluster here and ask ChemSpace to use it.
from dask.distributed import LocalCluster
lc = LocalCluster(processes=True, n_workers=None, threads_per_worker=1)
2025-03-07 11:50:46,892 - distributed.nanny - WARNING - Restarting worker
# create the chemical space
cs = ChemSpace(dask_cluster=lc)
Dask can be watched on http://127.0.0.1:8787/status
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(scaffold)
cs.add_protein(rec_5R83_path)
/home/dresio/code/fegrow/fegrow/package.py:799: UserWarning: The template does not have an attachement (Atoms with index 0, or in case of Smiles the * character. )
  warnings.warn("The template does not have an attachement (Atoms with index 0, "
# load 50k smiles dataset from the study
smiles = pd.read_csv(data_5R83_path).Smiles.to_list()

# for testing, sort by size and pick small
smiles.sort(key=len)
# take 5 smallest smiles
smiles = smiles[:5]
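As an aside, the same selection can be made without reordering the full list. This is a small sketch using the standard-library heapq.nsmallest; the last two SMILES are made-up longer entries added for illustration, and n_smallest is a hypothetical name. Note that character count is only a rough proxy for molecular size (for example, Cl is two characters but one atom).

```python
import heapq

# The first three SMILES appear in this tutorial; the last two are made-up longer entries.
smiles = [
    "[H]c1nc([H])c(SCl)c([H])c1[H]",
    "[H]c1nc([H])c(C#CF)c([H])c1[H]",
    "[H]c1nc([H])c(SF)c([H])c1[H]",
    "[H]c1nc([H])c(C(=O)N([H])C([H])([H])[H])c([H])c1[H]",
    "[H]c1nc([H])c(OC([H])([H])C([H])([H])[H])c([H])c1[H]",
]

n_smallest = 3  # hypothetical subset size for a quick test run

# Equivalent to sorted(smiles, key=len)[:n_smallest], but leaves `smiles` untouched.
smallest = heapq.nsmallest(n_smallest, smiles, key=len)
print(smallest)
```

Keeping the original list intact makes it easy to rerun later with a larger subset.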
# here we add SMILES that should already have been matched
# to the scaffold (RDKit Mol.HasSubstructMatch)
cs.add_smiles(smiles, protonate=False)
cs
   Smiles                          score  h     Training  Success  enamine_searched  enamine_id  2D
0  [H]c1nc([H])c(SF)c([H])c1[H]    <NA>   <NA>  False     NaN      False             NaN         Mol
1  [H]c1nc([H])c(SI)c([H])c1[H]    <NA>   <NA>  False     NaN      False             NaN         Mol
2  [H]c1nc([H])c(SCl)c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
3  [H]c1nc([H])c(SBr)c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
4  [H]c1nc([H])c(C#CF)c([H])c1[H]  <NA>   <NA>  False     NaN      False             NaN         Mol
cs.evaluate()
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/parmed/structure.py:1799: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  coords = np.asanyarray(value, dtype=np.float64)


Generated 2 conformers. 
Removed 0 conformers. 
Using force field


Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00,  7.45it/s]
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/aev.py:16: UserWarning: cuaev not installed
  warnings.warn("cuaev not installed")
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/__init__.py:59: UserWarning: Dependency not satisfied, torchani.ase will not be available
  warnings.warn("Dependency not satisfied, torchani.ase will not be available")
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.


Generated 2 conformers. 
Removed 0 conformers. 
using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'


Optimising conformer:   0%|                               | 0/2 [00:00<?, ?it/s]

Generated 2 conformers. 
Removed 0 conformers. 
Using force field


Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00,  9.99it/s]
Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00,  1.07s/it]


Generated 2 conformers. 
Removed 0 conformers. 
using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'


Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00,  1.19s/it]


Generated 1 conformers. 
Removed 0 conformers. 
using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'


Optimising conformer: 100%|███████████████████████| 1/1 [00:01<00:00,  1.29s/it]
   Smiles                          Mol                                             score  h     Training  Success  enamine_searched  enamine_id
0  [H]c1nc([H])c(SF)c([H])c1[H]    <fegrow.package.RMol object at 0x7e17cc670f90>  3.752  <NA>  True      True     False             NaN
1  [H]c1nc([H])c(SI)c([H])c1[H]    <fegrow.package.RMol object at 0x7e17cc673a60>  3.933  <NA>  True      True     False             NaN
2  [H]c1nc([H])c(SCl)c([H])c1[H]   <fegrow.package.RMol object at 0x7e1798140860>  3.708  <NA>  True      True     False             NaN
3  [H]c1nc([H])c(SBr)c([H])c1[H]   <fegrow.package.RMol object at 0x7e1798140950>  3.89   <NA>  True      True     False             NaN
4  [H]c1nc([H])c(C#CF)c([H])c1[H]  <fegrow.package.RMol object at 0x7e1798141210>  3.478  <NA>  True      True     False             NaN
cs
   Smiles                          score  h     Training  Success  enamine_searched  enamine_id  2D
0  [H]c1nc([H])c(SF)c([H])c1[H]    3.752  <NA>  True      True     False             NaN         Mol
1  [H]c1nc([H])c(SI)c([H])c1[H]    3.933  <NA>  True      True     False             NaN         Mol
2  [H]c1nc([H])c(SCl)c([H])c1[H]   3.708  <NA>  True      True     False             NaN         Mol
3  [H]c1nc([H])c(SBr)c([H])c1[H]   3.89   <NA>  True      True     False             NaN         Mol
4  [H]c1nc([H])c(C#CF)c([H])c1[H]  3.478  <NA>  True      True     False             NaN         Mol
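Once the library has been scored, a natural next step is to rank the molecules. The sketch below uses plain Python and the score values copied from the table above. It assumes that a higher score indicates a better predicted binder, which should be checked against the scoring function actually in use; the names results and ranked are illustrative.

```python
# Scores copied from the evaluated table above.
results = {
    "[H]c1nc([H])c(SF)c([H])c1[H]": 3.752,
    "[H]c1nc([H])c(SI)c([H])c1[H]": 3.933,
    "[H]c1nc([H])c(SCl)c([H])c1[H]": 3.708,
    "[H]c1nc([H])c(SBr)c([H])c1[H]": 3.89,
    "[H]c1nc([H])c(C#CF)c([H])c1[H]": 3.478,
}

# Assumption: higher score means better, so sort in descending order.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)

for rank, (smi, score) in enumerate(ranked, start=1):
    print(f"{rank}. {smi}  score={score:.3f}")

best_smiles, best_score = ranked[0]
```

If the scoring function in use is one where lower values are better, drop reverse=True.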