
7: Active Learning and Enamine#

Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole

Overview#

Configure the active learning run.

import pandas as pd
import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace

from fegrow.testing import core_5R83_path, smiles_5R83_path, rec_5R83_path
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterHierarchyMatcher> already registered; second conversion method ignored.
<frozen importlib._bootstrap>:241: RuntimeWarning: to-Python converter for boost::shared_ptr<RDKit::FilterCatalogEntry> already registered; second conversion method ignored.
# create the chemical space
cs = ChemSpace()
# we're not growing the scaffold; we're superimposing larger molecules onto it
cs.add_scaffold(Chem.SDMolSupplier(core_5R83_path)[0])
cs.add_protein(rec_5R83_path)
/home/dresio/code/fegrow/fegrow/package.py:595: UserWarning: ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. Use a Dask cluster with processes as a work around (see the documentation for an example of this workaround) .
  warnings.warn("ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. "


Dask can be watched on http://192.168.178.20:8989/status


/home/dresio/code/fegrow/fegrow/package.py:799: UserWarning: The template does not have an attachement (Atoms with index 0, or in case of Smiles the * character. )
  warnings.warn("The template does not have an attachement (Atoms with index 0, "


Generated 7 conformers. 
Generated 5 conformers. 
Generated 12 conformers. 
Removed 0 conformers. 
Removed 0 conformers. 
Removed 0 conformers.


/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/parmed/structure.py:1799: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  coords = np.asanyarray(value, dtype=np.float64)


Using force field


Optimising conformer: 100%|███████████████████████| 7/7 [00:01<00:00,  6.48it/s]


Using force field


Optimising conformer: 100%|███████████████████████| 5/5 [00:00<00:00, 10.19it/s]


Using force field


Optimising conformer: 100%|█████████████████████| 12/12 [00:02<00:00,  5.38it/s]
[13:17:40] DEPRECATION WARNING: please use MorganGenerator


Generated 2 conformers. 
Generated 5 conformers. 
Generated 5 conformers. 
Removed 0 conformers.




Removed 0 conformers. 
Removed 0 conformers.




Using force field


Optimising conformer:  20%|████▌                  | 1/5 [00:00<00:00,  5.66it/s]

using ani2x


Optimising conformer: 100%|███████████████████████| 5/5 [00:01<00:00,  3.49it/s]


using ani2x


/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/aev.py:16: UserWarning: cuaev not installed
  warnings.warn("cuaev not installed")
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/__init__.py:59: UserWarning: Dependency not satisfied, torchani.ase will not be available
  warnings.warn("Dependency not satisfied, torchani.ase will not be available")
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.


/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'


Optimising conformer: 100%|███████████████████████| 2/2 [00:05<00:00,  2.95s/it]
Optimising conformer: 100%|███████████████████████| 5/5 [00:10<00:00,  2.06s/it]


Generated 2 conformers. 
Generated 2 conformers. 
Generated 2 conformers. 
Removed 0 conformers. 
Removed 0 conformers. 
Removed 0 conformers.




Using force field


Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00,  6.47it/s]


Using force field


Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00,  9.18it/s]


using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/resources/
failed to equip `nnpops` with error: No module named 'NNPOps'


Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00,  1.06s/it]
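The "Generated N conformers" messages above come from conformer enumeration. A minimal RDKit sketch of that kind of step (illustrative only: fegrow constrains the conformers to the scaffold and filters them against the receptor, and the molecule below is hypothetical):

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# hypothetical small ligand; fegrow works on molecules superimposed onto the scaffold
mol = Chem.AddHs(Chem.MolFromSmiles("OCCc1ccncc1"))

# enumerate a handful of 3D conformers (fixed seed for reproducibility)
AllChem.EmbedMultipleConfs(mol, numConfs=5, randomSeed=42)
print(f"Generated {mol.GetNumConformers()} conformers.")
```

In the tutorial these conformers are subsequently optimised with a force field or ANI2x, as the progress bars above show.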
# turn on the caching in RAM (optional)
cs.set_dask_caching()
# load 50k SMILES
smiles = pd.read_csv(smiles_5R83_path).Smiles.to_list()

# for testing, sort by length and pick the smallest
smiles.sort(key=len)
# take the 200 smallest SMILES
smiles = smiles[:200]

# here we add SMILES that should already match the scaffold
# (checked with RDKit's Mol.HasSubstructMatch)
cs.add_smiles(smiles)
cs
     Smiles                                     score  h     Training  Success  enamine_searched  enamine_id  2D
0    [H]c1nc([H])c(SF)c([H])c1[H]               <NA>   <NA>  False     NaN      False             NaN         Mol
1    [H]c1nc([H])c(SI)c([H])c1[H]               <NA>   <NA>  False     NaN      False             NaN         Mol
2    [H]c1nc([H])c(SCl)c([H])c1[H]              <NA>   <NA>  False     NaN      False             NaN         Mol
3    [H]c1nc([H])c(SBr)c([H])c1[H]              <NA>   <NA>  False     NaN      False             NaN         Mol
4    [H]c1nc([H])c(C#CF)c([H])c1[H]             <NA>   <NA>  False     NaN      False             NaN         Mol
...  ...                                        ...    ...   ...       ...      ...               ...         ...
195  [H]c1nc([H])c(-n2nnc(F)c2[H])c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
196  [H]c1nc([H])c(-c2nnn(F)c2[H])c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
197  [H]c1nc([H])c(C([H])([H])C#N)c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
198  [H]c1nc([H])c(C(=O)N([H])C#N)c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol
199  [H]c1nc([H])c(N([H])C(=O)C#N)c([H])c1[H]   <NA>   <NA>  False     NaN      False             NaN         Mol

200 rows × 8 columns
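Since add_smiles expects candidates that already contain the scaffold, that precondition can be sketched with a quick RDKit check (the pyridine pattern below is assumed purely for illustration):

```python
from rdkit import Chem

scaffold = Chem.MolFromSmiles("c1ccncc1")  # assumed pyridine core, for illustration
candidates = ["[H]c1nc([H])c(SF)c([H])c1[H]", "CCO"]

# keep only candidates that contain the scaffold as a substructure
matches = [Chem.MolFromSmiles(smi).HasSubstructMatch(scaffold) for smi in candidates]
print(matches)  # the pyridine-based candidate matches; ethanol does not
```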

Active Learning#

# There is nothing to train the model on, so initially "first_random" is used by default
random1 = cs.active_learning(3, first_random=True)
random2 = cs.active_learning(3, first_random=True)

# note the different indices selected (unless you're lucky!)
print(random1.index.to_list(), random2.index.to_list())
[149, 49, 151] [160, 112, 153]


/home/dresio/code/fegrow/fegrow/package.py:1284: UserWarning: Selecting randomly the first samples to be studied (no score data yet). 
  warnings.warn("Selecting randomly the first samples to be studied (no score data yet). ")
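As the warning says, with no scores to train a model on, the first batch is just a uniform random draw over the chemical space. Conceptually (a sketch, not fegrow's code):

```python
import random

# with no scored molecules yet, pick the first batch uniformly at random
chemspace_size = 200
batch = random.sample(range(chemspace_size), k=3)
print(batch)  # three distinct indices, different on each call (unless you're lucky)
```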
# now evaluate the first selection
random1_results = cs.evaluate(random1, ani=False)
# check the scores, note that they were updated in the master dataframe too
random1_results
Smiles Mol score h Training Success enamine_searched enamine_id
149 [H]c1nc([H])c(OC([H])([H])F)c([H])c1[H] <fegrow.package.RMol object at 0x72731d573f10> 3.283 <NA> True True False NaN
49 [H]c1nc([H])c(OC(F)(F)F)c([H])c1[H] <fegrow.package.RMol object at 0x72731d573600> 3.291 <NA> True True False NaN
151 [H]c1nc([H])c(C([H])([H])SF)c([H])c1[H] <fegrow.package.RMol object at 0x72731d54ee80> 3.856 <NA> True True False NaN
# by default a Gaussian Process with the greedy acquisition is used
# note that this time the selection is deterministic:
# repeated calls pick the same molecules
greedy1 = cs.active_learning(3)
greedy2 = cs.active_learning(3)
print(greedy1.index.to_list(), greedy2.index.to_list())
[113, 168, 191] [113, 168, 191]
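Greedy acquisition simply ranks the surrogate's predictions and takes the top of the list, which is why both calls return the same indices. A toy sketch (made-up predictions, assuming a higher predicted score is better):

```python
# hypothetical surrogate predictions for four unscored candidates
predicted = {0: 3.1, 1: 2.6, 2: 3.8, 3: 2.7}

# greedy: take the k candidates with the best prediction - fully deterministic
k = 2
picks = sorted(predicted, key=predicted.get, reverse=True)[:k]
print(picks)  # [2, 0]
```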
# learn in cycles
for cycle in range(2):
    greedy = cs.active_learning(3)
    greedy_results = cs.evaluate(greedy)

    # save the new results
    greedy_results.to_csv(f'notebook6_iteration{cycle}_results.csv')

# save the entire chemical space with all the results
cs.to_sdf('notebook6_chemspace.sdf')
computed = cs.df[~cs.df.score.isna()]
print('Computed cases in total: ', len(computed))
Computed cases in total:  9
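The filter used above is a plain pandas boolean mask over the score column; the same pattern on a toy frame:

```python
import pandas as pd

# toy stand-in for cs.df: only two of the three molecules have been scored
df = pd.DataFrame({"Smiles": ["A", "B", "C"], "score": [3.2, None, 3.8]})

computed = df[~df.score.isna()]
print("Computed cases in total: ", len(computed))
```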
from fegrow.al import Model, Query
# This is the default configuration
cs.model = Model.gaussian_process()
cs.query = Query.Greedy()

cs.active_learning(3)
Smiles Mol score h Training Success enamine_searched enamine_id regression
166 [H]c1nc([H])c(OC([H])([H])I)c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731c1db4c0> <NA> <NA> False NaN False NaN 2.718
197 [H]c1nc([H])c(C([H])([H])C#N)c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731d552c70> <NA> <NA> False NaN False NaN 2.633
189 [H]c1nc([H])c(OC([H])([H])Cl)c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731c1dbed0> <NA> <NA> False NaN False NaN 2.704
cs.query = Query.UCB(beta=10)
cs.active_learning(3)
Smiles Mol score h Training Success enamine_searched enamine_id regression
33 [H]C#CSc1c([H])nc([H])c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731d687920> <NA> <NA> False NaN False NaN 2.137
7 [H]c1nc([H])c(SC#N)c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731d687a00> <NA> <NA> False NaN False NaN 2.087
54 [H]C(=O)Sc1c([H])nc([H])c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731c1d87b0> <NA> <NA> False NaN False NaN 2.087
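The beta parameter trades exploitation (the predicted mean) against exploration (the model's uncertainty). A toy sketch assuming the common mean + beta * std form of the upper confidence bound (modAL's exact weighting may differ):

```python
def ucb(mean, std, beta=1.0):
    # upper confidence bound: predicted value plus a beta-weighted uncertainty bonus
    return mean + beta * std

# (predicted mean, predicted std) for two hypothetical candidates
candidates = [(2.0, 0.1), (1.5, 0.8)]

# with a large beta, the uncertain candidate wins despite its lower mean
values = [ucb(m, s, beta=10) for m, s in candidates]
print(values)  # [3.0, 9.5]
```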
# The query methods available in modAL.acquisition are exposed; these include
# Query.Greedy()
# Query.PI(tradeoff=0) - highest probability of improvement
# Query.EI(tradeoff=0) - highest expected improvement
# Query.UCB(beta=1) - highest upper confidence bound (employs modAL.models.BayesianOptimizer)

# Models include the scikit-learn regressors:
# Model.linear()
# Model.elastic_net()
# Model.random_forest()
# Model.gradient_boosting_regressor()
# Model.mlp_regressor()

# Model.gaussian_process()  # uses a TanimotoKernel by default, meaning that it
#                           # compares the fingerprints of all the training dataset
#                           # with the cases not yet studied, which can be expensive
#                           # computationally

cs.model = Model.linear()
cs.query = Query.Greedy()
cs.active_learning()
Smiles Mol score h Training Success enamine_searched enamine_id regression
177 [H]c1nc([H])c(Sc2nnn([H])n2)c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x72731c1db990> <NA> <NA> False NaN False NaN 1.8

Search the Enamine database using sw.docking.org (check if online)#

Please note that you should check whether you have permission to use this interface. You will also need the pip package pydockingorg.

# search only for molecules similar to the best-scoring molecule (n_best)
# and return up to 10 results per search
new_enamines = cs.add_enamine_molecules(n_best=1, results_per_search=10)
Querying Enamine REAL. Looking up 1 smiles.
Found 10 in 6.407189130783081
Enamine returned with 10 rows in 6.4s.
Dask obabel protonation + scaffold test finished in 0.06s.
Tested scaffold presence. Kept 10/10.
Adding:  10


/home/dresio/code/fegrow/fegrow/package.py:1229: UserWarning: Only one H vector is assumed and used. Picking <NA> hydrogen on the scaffold. 
  warnings.warn(f"Only one H vector is assumed and used. Picking {vl.h[0]} hydrogen on the scaffold. ")
new_enamines
Smiles Mol score h Training Success enamine_searched enamine_id
200 C(SC(c1c(c(c(nc1[H])[H])[H])[H])([H])[H])([H])... <NA> <NA> <NA> False <NA> False PV-002558062946
201 C(SC(c1c(c(c(Br)nc1[H])[H])[H])([H])[H])([H])(... <NA> <NA> <NA> False <NA> False Z3340872668
202 C(SC(c1c(c(c(C([H])([H])[H])nc1[H])[H])[H])([H... <NA> <NA> <NA> False <NA> False PV-002903174203
203 C(SC(c1c(c(c(Cl)nc1[H])[H])[H])([H])[H])([H])(... <NA> <NA> <NA> False <NA> False PV-004253211555
204 C(SC(c1c(c(c(F)nc1[H])[H])[H])([H])[H])([H])([... <NA> <NA> <NA> False <NA> False PV-005723429185
205 C(SC(c1c(c(c(nc1Br)[H])[H])[H])([H])[H])([H])(... <NA> <NA> <NA> False <NA> False Z2832853555
206 C(SC(c1c(c(c(nc1C([H])([H])[H])[H])[H])[H])([H... <NA> <NA> <NA> False <NA> False PV-003024225282
207 C(SC(c1c(c(c(nc1Cl)[H])[H])[H])([H])[H])([H])(... <NA> <NA> <NA> False <NA> False PV-004696925594
208 C(SC(c1c(c(c(nc1F)[H])[H])[H])([H])[H])([H])([... <NA> <NA> <NA> False <NA> False PV-005922678029
209 C(SC(c1c(nc(c(c1Cl)[H])[H])[H])([H])[H])([H])(... <NA> <NA> <NA> False <NA> False PV-002978169168
# the molecules are marked to avoid searching for them again,
# using the column "enamine_searched"
cs.df[cs.df.enamine_searched == True]
Smiles Mol score h Training Success enamine_searched enamine_id regression
168 [H]c1nc([H])c(C([H])([H])SI)c([H])c1[H] <fegrow.package.RMol object at 0x7272ea02c720> 3.858 <NA> True True True NaN 3.858