
5: Chemspace streamlined

Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole

Overview

Building and scoring molecules can be streamlined further with the ChemSpace class, which applies our established protocol to a whole library at once. Here we show how to quickly build a library and then score every molecule in it.

import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace, RGroups, Linkers

rgroups = RGroups()
linkers = Linkers()

Prepare the ligand template

The provided core structure was extracted from a crystal structure of Mpro in complex with compound 4 from the Jorgensen study (PDB: 7L10), and a Cl atom was removed to allow growth into the S3/S4 pocket. The template structure of the ligand was protonated beforehand with Open Babel; here it is loaded with RDKit, keeping its explicit hydrogens (removeHs=False):

init_mol = Chem.SDMolSupplier('sarscov2/mini.sdf', removeHs=False)[0]

# get the FEgrow representation of the rdkit Mol
scaffold = fegrow.RMol(init_mol)
# Show the 2D (with indices) representation of the core. This is used to select the desired growth vector.
scaffold.rep2D(idx=True, size=(500, 500))


Using the 2D drawing, select the index of the growth vector. Note that growth is currently only possible from hydrogen atom positions. In this case, we select the hydrogen atom with index 8 to enable growth into the S3/S4 pocket of Mpro.

# mark the attachment point: turn hydrogen 8 into a dummy atom (atomic number 0)
scaffold.GetAtomWithIdx(8).SetAtomicNum(0)
# create the chemical space
cs = ChemSpace()
# add the scaffold, with its attachment point, to the chemical space
cs.add_scaffold(scaffold)
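Only hydrogens can serve as growth vectors, and the attachment point is marked by turning the selected hydrogen into a dummy atom (atomic number 0). The plain-Python sketch below models a molecule as a list of atomic numbers to illustrate that check; `mark_attachment_point` is a hypothetical helper, not part of FEgrow or RDKit, and the atom order is illustrative only.

```python
from typing import List

HYDROGEN = 1
DUMMY = 0  # RDKit represents dummy/attachment atoms with atomic number 0


def mark_attachment_point(atomic_numbers: List[int], idx: int) -> List[int]:
    """Return a copy of `atomic_numbers` with the hydrogen at `idx` turned into a dummy atom.

    Hypothetical helper mirroring `scaffold.GetAtomWithIdx(idx).SetAtomicNum(0)`,
    plus the rule that growth can only start from a hydrogen.
    """
    if not 0 <= idx < len(atomic_numbers):
        raise IndexError(f"atom index {idx} is out of range")
    if atomic_numbers[idx] != HYDROGEN:
        raise ValueError(f"atom {idx} has atomic number {atomic_numbers[idx]}, not hydrogen")
    marked = list(atomic_numbers)  # leave the input untouched
    marked[idx] = DUMMY
    return marked


# pyridine (C5H5N): 5 carbons and 1 nitrogen, then 5 hydrogens (illustrative atom order)
pyridine = [6, 6, 7, 6, 6, 6, 1, 1, 1, 1, 1]
print(mark_attachment_point(pyridine, 8))  # [6, 6, 7, 6, 6, 6, 1, 1, 0, 1, 1]
```

Selecting a heavy atom (e.g. index 2, the nitrogen) raises a `ValueError` in this sketch, reflecting the hydrogen-only restriction described above.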

Build a quick library

# build molecules by attaching the 3 most frequently used R-groups
cs.add_rgroups(rgroups.Mol[:3].to_list())

# build more molecules by attaching the first 3 linkers, each capped with one of the R-groups
cs.add_rgroups(linkers.Mol[:3].to_list(),
               rgroups.Mol[:3].to_list())
cs
Smiles score h Training Success enamine_searched enamine_id
0 [H]Oc1c([H])nc([H])c([H])c1[H] <NA> 8 False NaN False NaN
1 [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] <NA> 8 False NaN False NaN
2 [H]c1nc([H])c(N([H])[H])c([H])c1[H] <NA> 8 False NaN False NaN
3 [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] <NA> 8 False NaN False NaN
4 [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... <NA> 8 False NaN False NaN
5 [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] <NA> 8 False NaN False NaN
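The table above lists six molecules: three R-groups attached directly to the core, plus three linker and R-group combinations in which, judging from the SMILES, linker i is capped with R-group i rather than every linker being combined with every R-group. The standard-library sketch below reproduces that enumeration with plain SMILES strings (hydrogens implicit). String concatenation stands in for FEgrow's 3D attachment, and the fragment strings are inferred from the table, not read from the RGroups/Linkers libraries; `enumerate_library` is a hypothetical helper.

```python
from itertools import product

# 3-substituted pyridine core; "{}" marks where the substituent is written in
CORE = "{}c1cnccc1"

# Fragment SMILES written so that their LAST atom bonds to the next fragment or the core.
# These strings are inferred from the table above (an assumption, not FEgrow data).
R_GROUPS = ["O", "CO", "N"]          # hydroxy, methoxy, amino
LINKERS = ["C", "NC(=O)", "C(=O)N"]  # methylene, amide (carbonyl on core), reverse amide (NH on core)


def enumerate_library(core, r_groups, linkers):
    # R-groups attached directly to the core
    direct = [core.format(r) for r in r_groups]
    # linker i capped with R-group i (pairwise, as rows 3-5 of the table suggest)
    linked = [core.format(r + linker) for linker, r in zip(linkers, r_groups)]
    return direct + linked


library = enumerate_library(CORE, R_GROUPS, LINKERS)
for smiles in library:
    print(smiles)

# a full combinatorial enumeration would instead give 3 x 3 = 9 linker/R-group products
print(len(list(product(LINKERS, R_GROUPS))))
```

The six printed SMILES correspond, row by row, to the molecules in the table (3-hydroxy, 3-methoxy, 3-amino, hydroxymethyl, N-methoxy carboxamide and urea-substituted pyridines).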

Prepare the protein

The protein-ligand complex structure is downloaded, and PDBFixer is used to protonate the protein and perform other simple repairs:

# get the protein-ligand complex structure
!wget -nc https://files.rcsb.org/download/7L10.pdb

# load the complex with the ligand
complex_7l10 = prody.parsePDB('7L10.pdb')

# keep only the protein: drop nucleic acids, heteroatoms (ligand, ions) and water
rec = complex_7l10.select('not (nucleic or hetatm or water)')

# save the processed protein
prody.writePDB('rec.pdb', rec)

# fix the receptor file (missing residues, protonation, etc)
fegrow.fix_receptor("rec.pdb", "rec_final.pdb")

# load back into prody
rec_final = prody.parsePDB("rec_final.pdb")
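The ProDy selection 'not (nucleic or hetatm or water)' keeps only the protein atoms. As a rough, dependency-free illustration of what that filter does on the fixed-column PDB format, the sketch below keeps ATOM records whose residue name (columns 18-20) is not a common water or nucleotide name. `protein_only` and the residue-name sets are hypothetical simplifications; ProDy's selection keywords recognise many more residue names, so use ProDy in practice.

```python
WATER = {"HOH", "WAT", "DOD"}
NUCLEIC = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def protein_only(pdb_text: str) -> str:
    """Keep ATOM records that are not water or nucleic acid; drop everything else."""
    kept = []
    for line in pdb_text.splitlines():
        if line[:6].strip() != "ATOM":
            continue  # HETATM (ligands, ions, most waters), headers, CONECT, ...
        resname = line[17:20].strip()  # residue name, PDB columns 18-20
        if resname in WATER or resname in NUCLEIC:
            continue
        kept.append(line)
    return "\n".join(kept)


sample = "\n".join([
    "ATOM      1  N   SER A   1      -2.875   4.961  -1.470  1.00 20.00           N",
    "HETATM 2000  O   HOH A 501      10.000  10.000  10.000  1.00 20.00           O",
    "ATOM   2001  O   HOH A 502      11.000  11.000  11.000  1.00 20.00           O",
])
print(protein_only(sample))  # only the serine nitrogen survives
```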
# make the chemical space aware of the receptor (important for the next step!)
cs.add_protein("rec_final.pdb")
# build and score the entire chemical space
cs.evaluate()
Smiles Mol score h Training Success enamine_searched enamine_id
0 [H]Oc1c([H])nc([H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704c8d60> 3.215 8 True True False NaN
1 [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704c9b70> 3.231 8 True True False NaN
2 [H]c1nc([H])c(N([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e0170790ae0> 3.188 8 True True False NaN
3 [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704cab10> 3.225 8 True True False NaN
4 [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... <fegrow.package.RMol object at 0x7e01707918f0> 3.39 8 True True False NaN
5 [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01cd52d800> 3.551 8 True True False NaN
# verify that the score has been computed
cs
Smiles score h Training Success enamine_searched enamine_id
0 [H]Oc1c([H])nc([H])c([H])c1[H] 3.215 8 True True False NaN
1 [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] 3.231 8 True True False NaN
2 [H]c1nc([H])c(N([H])[H])c([H])c1[H] 3.188 8 True True False NaN
3 [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] 3.225 8 True True False NaN
4 [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... 3.39 8 True True False NaN
5 [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] 3.551 8 True True False NaN
# access the underlying pandas DataFrame directly
cs.df
Smiles Mol score h Training Success enamine_searched enamine_id
0 [H]Oc1c([H])nc([H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704c8d60> 3.215 8 True True False NaN
1 [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704c9b70> 3.231 8 True True False NaN
2 [H]c1nc([H])c(N([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e0170790ae0> 3.188 8 True True False NaN
3 [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01704cab10> 3.225 8 True True False NaN
4 [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... <fegrow.package.RMol object at 0x7e01707918f0> 3.39 8 True True False NaN
5 [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] <fegrow.package.RMol object at 0x7e01cd52d800> 3.551 8 True True False NaN
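With the scores in the DataFrame, a natural next step is to rank the library; with pandas this would be something like `cs.df.sort_values("score", ascending=False).head(3)`. The standard-library sketch below does the same on the scores copied from the output above, keyed by row index. It ASSUMES that a higher score is better, which this page does not state; flip `higher_is_better` if your scoring function works the other way. `top_k` is a hypothetical helper.

```python
from typing import Dict, List

# row index -> score, copied from the cs.df output above
SCORES: Dict[int, float] = {0: 3.215, 1: 3.231, 2: 3.188, 3: 3.225, 4: 3.39, 5: 3.551}


def top_k(scores: Dict[int, float], k: int, higher_is_better: bool = True) -> List[int]:
    """Return the row indices of the k best-scoring molecules.

    ASSUMPTION: higher scores are better by default; pass higher_is_better=False otherwise.
    """
    ranked = sorted(scores, key=scores.__getitem__, reverse=higher_is_better)
    return ranked[:k]


print(top_k(SCORES, 3))  # [5, 4, 1]
```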
# save the entire ChemSpace to an SDF file, from which the ChemSpace can be recreated later
cs.to_sdf("cs_optimised_molecules.sdf")

# or access individual molecules directly, e.g. write the first one to a PDB file
cs[0].to_file("best_conformers0.pdb")
# recreate the chemical space
cs = ChemSpace.from_sdf("cs_optimised_molecules.sdf")
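An SDF file stores each molecule as a molblock followed by optional data items (a `> <name>` header, one or more value lines, a blank line) and a `$$$$` terminator, which is how per-molecule information can travel with the coordinates. Which tags `to_sdf` actually writes is not shown on this page; the standard-library sketch below assumes, hypothetically, a `score` tag and simply parses all data items. `read_sd_properties` is a hypothetical helper, not part of FEgrow.

```python
from typing import Dict, List


def read_sd_properties(sdf_text: str) -> List[Dict[str, str]]:
    """Return one {tag: value} dict per SDF record (data items after 'M  END')."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the last terminator
        parts = block.split("M  END", 1)
        data_lines = parts[1].splitlines() if len(parts) == 2 else []
        props: Dict[str, str] = {}
        i = 0
        while i < len(data_lines):
            line = data_lines[i]
            start = line.find("<")
            end = line.find(">", start + 1)
            if line.startswith(">") and start != -1 and end != -1:
                name = line[start + 1:end]
                values = []
                i += 1
                while i < len(data_lines) and data_lines[i].strip():
                    values.append(data_lines[i])  # value lines run until a blank line
                    i += 1
                props[name] = "\n".join(values)
            i += 1
        records.append(props)
    return records


# two minimal records with a hypothetical "score" tag (empty molblocks for brevity)
EXAMPLE = """mol_a
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <score>
3.215

$$$$
mol_b
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <score>
3.551

$$$$
"""

records = read_sd_properties(EXAMPLE)
print([float(r["score"]) for r in records])  # [3.215, 3.551]
```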
# search the Enamine database using the 3 best-scoring molecules in your chemical space
# and enrich your chemical space by adding the results to it
# (relies on https://sw.docking.org/)
# cs.add_enamine_molecules(3)