5: Chemspace streamlined
Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole
Overview
Building and scoring molecules can be streamlined further with the ChemSpace class, which applies our established protocol to a whole set of molecules at once. Here we show how to quickly build a small library and then score every molecule in it.
import prody
from rdkit import Chem
import fegrow
from fegrow import ChemSpace, RGroups, Linkers
rgroups = RGroups()
linkers = Linkers()
Prepare the ligand template
The provided core structure has been extracted from a crystal structure of Mpro in complex with compound 4 from the Jorgensen study (PDB: 7L10), and a Cl atom has been removed to allow growth into the S3/S4 pocket. The template structure of the ligand, protonated with Open Babel, is loaded from sarscov2/mini.sdf with its explicit hydrogens kept (removeHs=False):
init_mol = Chem.SDMolSupplier('sarscov2/mini.sdf', removeHs=False)[0]
# get the FEgrow representation of the rdkit Mol
scaffold = fegrow.RMol(init_mol)
# Show the 2D (with indices) representation of the core. This is used to select the desired growth vector.
scaffold.rep2D(idx=True, size=(500, 500))
Using the 2D drawing, select the index of the atom that defines the growth vector. Note that it is currently only possible to grow from hydrogen atom positions. In this case, we select the hydrogen atom with index 8 to enable growth into the S3/S4 pocket of Mpro, and mark it as the attachment point by setting its atomic number to 0 (a dummy atom):
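The candidate hydrogen indices can also be listed without the drawing. The sketch below is a hypothetical stdlib helper (`hydrogen_indices` is our own name, not part of FEgrow or RDKit) that reads the atom block of a V2000 molfile record and returns the 0-based positions of its hydrogen atoms. It assumes a V2000 (not V3000) record and relies on RDKit keeping the file's atom order when hydrogens are not removed, so the returned positions should correspond to the indices shown by `rep2D(idx=True)`.

```python
def hydrogen_indices(molblock: str) -> list[int]:
    """Return 0-based indices of hydrogen atoms in a V2000 molblock.

    Assumes three header lines, then a counts line whose first three
    characters hold the atom count, then one fixed-width line per atom
    with the element symbol in columns 32-34 (1-based).
    """
    lines = molblock.splitlines()
    counts_line = lines[3]
    if "V2000" not in counts_line:
        raise ValueError("this sketch only handles V2000 records")
    n_atoms = int(counts_line[0:3])
    atom_lines = lines[4:4 + n_atoms]
    # Atom order in the file is the atom index order.
    return [i for i, line in enumerate(atom_lines) if line[31:34].strip() == "H"]


# Hand-written water molecule, used only to exercise the helper.
WATER = "\n".join([
    "water",
    "  handmade",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  1  3  1  0",
    "M  END",
])

print(hydrogen_indices(WATER))  # [1, 2]
```

Applied to the template, e.g. `hydrogen_indices(open('sarscov2/mini.sdf').read())`, the list should contain the index chosen here if the first record in that file is V2000; we have not run this against the tutorial files.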
# specify the connecting point
scaffold.GetAtomWithIdx(8).SetAtomicNum(0)
# create the chemical space
cs = ChemSpace()
/home/dresio/code/fegrow/fegrow/package.py:595: UserWarning: ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. Use a Dask cluster with processes as a work around (see the documentation for an example of this workaround) .
warnings.warn("ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. "
Dask can be watched on http://192.168.178.20:8989/status
cs.add_scaffold(scaffold)
Build a quick library
# build molecules by attaching the 3 most frequently used R-groups
cs.add_rgroups(rgroups.Mol[:3].to_list())
# build more molecules by combining the linkers and R-groups
cs.add_rgroups(linkers.Mol[:3].to_list(),
rgroups.Mol[:3].to_list())
cs
| | Smiles | score | h | Training | Success | enamine_searched | enamine_id | 2D |
|---|---|---|---|---|---|---|---|---|
| 0 | [H]Oc1c([H])nc([H])c([H])c1[H] | <NA> | 8 | False | NaN | False | NaN | |
| 1 | [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] | <NA> | 8 | False | NaN | False | NaN | |
| 2 | [H]c1nc([H])c(N([H])[H])c([H])c1[H] | <NA> | 8 | False | NaN | False | NaN | |
| 3 | [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] | <NA> | 8 | False | NaN | False | NaN | |
| 4 | [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... | <NA> | 8 | False | NaN | False | NaN | |
| 5 | [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] | <NA> | 8 | False | NaN | False | NaN | |
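Conceptually, each row is the scaffold with its attachment point replaced either by an R-group or by a linker followed by an R-group. FEgrow does this on RDKit molecules; the sketch below only illustrates the idea with plain SMILES strings. It assumes every fragment is written with its attachment point `*` as the first character, that a linker carries a second `*` for the next fragment, and that the scaffold's `*` sits where substituent text can be inlined (e.g. `c(*)`). The helper names, the `exhaustive` switch and all SMILES inputs are hypothetical and hand-written, not read from FEgrow's `RGroups`/`Linkers`.

```python
from itertools import product


def attach(scaffold: str, fragment: str) -> str:
    """Inline `fragment` at the first '*' of `scaffold`.

    Assumes the fragment starts with its attachment point '*', so dropping
    that '*' leaves text that can be pasted where the scaffold's '*' was.
    """
    if "*" not in scaffold:
        raise ValueError("scaffold has no free attachment point '*'")
    if not fragment.startswith("*"):
        raise ValueError("fragment must start with its attachment point '*'")
    return scaffold.replace("*", fragment[1:], 1)


def build_library(scaffold, rgroups, linkers=(), exhaustive=False):
    """Scaffold+R-group products, then scaffold+linker+R-group products.

    `exhaustive` chooses between pairing the i-th linker with the i-th
    R-group (zip) and trying every linker/R-group combination (product).
    """
    library = [attach(scaffold, r) for r in rgroups]
    pairs = product(linkers, rgroups) if exhaustive else zip(linkers, rgroups)
    for linker, rgroup in pairs:
        # The linker fills the scaffold's '*' and brings its own '*' for the R-group.
        library.append(attach(attach(scaffold, linker), rgroup))
    return library


# Hypothetical inputs: a pyridine scaffold and a few hand-written fragments.
scaffold = "[H]c1nc([H])c(*)c([H])c1[H]"
rgroups = ["*O[H]", "*OC([H])([H])[H]", "*N([H])[H]"]
linkers = ["*C([H])([H])*", "*C(=O)N([H])*", "*N([H])C(=O)*"]

for smiles in build_library(scaffold, rgroups, linkers):
    print(smiles)
```

String inlining is only a teaching device: it does not canonicalise SMILES or check valences, which is why a real pipeline works on molecule objects instead.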
Prepare the protein
The protein-ligand complex structure is downloaded, and PDBFixer is used to protonate the protein and perform other simple repairs:
# get the protein-ligand complex structure
!wget -nc https://files.rcsb.org/download/7L10.pdb
# load the complex with the ligand
sys = prody.parsePDB('7L10.pdb')
# remove any unwanted molecules
rec = sys.select('not (nucleic or hetatm or water)')
# save the processed protein
prody.writePDB('rec.pdb', rec)
# fix the receptor file (missing residues, protonation, etc)
fegrow.fix_receptor("rec.pdb", "rec_final.pdb")
# load back into prody
rec_final = prody.parsePDB("rec_final.pdb")
File ‘7L10.pdb’ already there; not retrieving.
@> 2609 atoms and 1 coordinate set(s) were parsed in 0.02s.
@> 4638 atoms and 1 coordinate set(s) were parsed in 0.03s.
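The ProDy selection `not (nucleic or hetatm or water)` keeps only the protein atoms of the complex. For illustration, a rough stdlib approximation on raw PDB text is sketched below. It is hypothetical and simpler than ProDy: it keeps `ATOM` records, drops `HETATM` records, and drops a small hand-picked set of nucleic-acid and water residue names, whereas ProDy's keywords recognise more residue names.

```python
# Hypothetical, hand-picked residue names; ProDy's own keyword definitions cover more.
NUCLEIC_RESNAMES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}
WATER_RESNAMES = {"HOH", "WAT", "H2O", "DOD"}


def protein_atom_lines(pdb_text: str) -> list[str]:
    """Keep ATOM records that are neither nucleic acid nor water.

    HETATM records (ligands, ions and usually waters) are dropped because
    only lines starting with 'ATOM' are considered. The residue name is
    read from columns 18-20 (1-based) of the fixed-width PDB format.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        resname = line[17:20].strip()
        if resname in NUCLEIC_RESNAMES or resname in WATER_RESNAMES:
            continue
        kept.append(line)
    return kept


def pdb_record(record: str, serial: int, name: str, resname: str) -> str:
    """Build a truncated fixed-width PDB line (no coordinates), for testing only."""
    return f"{record:<6}{serial:>5} {name:<4} {resname:>3} A{serial:>4}"


sample = "\n".join([
    pdb_record("ATOM", 1, "N", "SER"),
    pdb_record("ATOM", 2, "CA", "SER"),
    pdb_record("HETATM", 3, "CL", "LIG"),
    pdb_record("HETATM", 4, "O", "HOH"),
    pdb_record("ATOM", 5, "P", "DA"),
])
print(len(protein_atom_lines(sample)))
```

In the notebook itself ProDy does this work, and `fegrow.fix_receptor` then repairs the result; the sketch only makes the filtering criteria explicit.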
# make the chemical space aware of the receptor (needed for scoring in the next step)
cs.add_protein("rec_final.pdb")
# build and score the entire chemical space
cs.evaluate()
| | Smiles | Mol | score | h | Training | Success | enamine_searched | enamine_id |
|---|---|---|---|---|---|---|---|---|
| 0 | [H]Oc1c([H])nc([H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704c8d60> | 3.215 | 8 | True | True | False | NaN |
| 1 | [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704c9b70> | 3.231 | 8 | True | True | False | NaN |
| 2 | [H]c1nc([H])c(N([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e0170790ae0> | 3.188 | 8 | True | True | False | NaN |
| 3 | [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704cab10> | 3.225 | 8 | True | True | False | NaN |
| 4 | [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... | <fegrow.package.RMol object at 0x7e01707918f0> | 3.39 | 8 | True | True | False | NaN |
| 5 | [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01cd52d800> | 3.551 | 8 | True | True | False | NaN |
# verify that the score has been computed
cs
| | Smiles | score | h | Training | Success | enamine_searched | enamine_id | 2D |
|---|---|---|---|---|---|---|---|---|
| 0 | [H]Oc1c([H])nc([H])c([H])c1[H] | 3.215 | 8 | True | True | False | NaN | |
| 1 | [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] | 3.231 | 8 | True | True | False | NaN | |
| 2 | [H]c1nc([H])c(N([H])[H])c([H])c1[H] | 3.188 | 8 | True | True | False | NaN | |
| 3 | [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] | 3.225 | 8 | True | True | False | NaN | |
| 4 | [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... | 3.39 | 8 | True | True | False | NaN | |
| 5 | [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] | 3.551 | 8 | True | True | False | NaN | |
# access the Pandas dataframe directly
cs.df
| | Smiles | Mol | score | h | Training | Success | enamine_searched | enamine_id |
|---|---|---|---|---|---|---|---|---|
| 0 | [H]Oc1c([H])nc([H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704c8d60> | 3.215 | 8 | True | True | False | NaN |
| 1 | [H]c1nc([H])c(OC([H])([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704c9b70> | 3.231 | 8 | True | True | False | NaN |
| 2 | [H]c1nc([H])c(N([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e0170790ae0> | 3.188 | 8 | True | True | False | NaN |
| 3 | [H]OC([H])([H])c1c([H])nc([H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01704cab10> | 3.225 | 8 | True | True | False | NaN |
| 4 | [H]c1nc([H])c(C(=O)N([H])OC([H])([H])[H])c([H]... | <fegrow.package.RMol object at 0x7e01707918f0> | 3.39 | 8 | True | True | False | NaN |
| 5 | [H]c1nc([H])c(N([H])C(=O)N([H])[H])c([H])c1[H] | <fegrow.package.RMol object at 0x7e01cd52d800> | 3.551 | 8 | True | True | False | NaN |
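Since `cs.df` is a pandas DataFrame, picking the best candidates should be a one-liner such as `cs.df.nlargest(3, "score")`, assuming a higher score is better (we have not checked FEgrow's sign convention here; use `nsmallest` if lower is better). The stdlib sketch below does the same ranking on the six scores from the table above, keyed by row index because one SMILES in the table is truncated. `top_n` is a hypothetical helper name.

```python
import heapq

# Scores copied from the table above, keyed by row index.
scores = {0: 3.215, 1: 3.231, 2: 3.188, 3: 3.225, 4: 3.39, 5: 3.551}


def top_n(scores: dict[int, float], n: int, higher_is_better: bool = True) -> list[int]:
    """Row indices of the n best scores; 'higher is better' is an assumption."""
    pick = heapq.nlargest if higher_is_better else heapq.nsmallest
    return pick(n, scores, key=scores.get)


print(top_n(scores, 3))  # [5, 4, 1]
```

Keeping the direction as an explicit parameter makes the assumption visible at the call site instead of buried in a sort order.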
# save the entire ChemSpace to an SDF file, from which the ChemSpace can later be recreated
cs.to_sdf("cs_optimised_molecules.sdf")
# or access the molecules directly
cs[0].to_file("best_conformers0.pdb")
# recreate the chemical space
cs = ChemSpace.from_sdf("cs_optimised_molecules.sdf")
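An SD file can carry more than coordinates: after each molblock come optional data fields (a `> <name>` header line, value lines, then a blank line), and records are separated by `$$$$`. We have not checked which fields `ChemSpace.to_sdf` writes, so the following is a generic, hypothetical reader (`read_sd_fields` is our own name) for inspecting such a file without FEgrow; the sample records are hand-written with empty molblocks.

```python
import re

# A data header looks like '> <score>' (extra text around the <name> is allowed).
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_sd_fields(sdf_text: str) -> list[dict[str, str]]:
    """Return one {field name: value} dict per SDF record, ignoring the molblock."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # whitespace after the final '$$$$'
        lines = block.splitlines()
        # Data fields start after the molblock terminator 'M  END'.
        start = next(
            (i + 1 for i, line in enumerate(lines) if line.startswith("M  END")),
            len(lines),
        )
        fields: dict[str, str] = {}
        name, values = None, []
        for line in lines[start:]:
            header = FIELD_HEADER.match(line)
            if header:
                name, values = header.group(1), []
            elif name is not None and not line.strip():
                fields[name] = "\n".join(values)  # a blank line ends the value
                name = None
            elif name is not None:
                values.append(line)
        if name is not None:  # tolerate a missing blank line before '$$$$'
            fields[name] = "\n".join(values)
        records.append(fields)
    return records


SAMPLE = "\n".join([
    "mol0", "  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <score>", "3.215", "", "$$$$",
    "mol1", "  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <score>", "3.231", "", "$$$$", "",
])
print(read_sd_fields(SAMPLE))
```

For example, `read_sd_fields(open("cs_optimised_molecules.sdf").read())` would list whatever fields were written for each molecule.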
# search the Enamine database for the 3 best-scoring molecules in your chemical space
# and enrich the chemical space by adding the results
# (relies on https://sw.docking.org/)
# cs.add_enamine_molecules(3)