
8: Active Learning - Details#

Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole

Overview#

Configure the Active Learning

import pandas as pd
import prody
from rdkit import Chem

import fegrow
from fegrow import ChemSpace

from fegrow.testing import core_5R83_path, smiles_5R83_path, rec_5R83_path
# create the chemical space
cs = ChemSpace()
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(Chem.SDMolSupplier(core_5R83_path)[0])
cs.add_protein(rec_5R83_path)
/home/dresio/code/fegrow/fegrow/package.py:595: UserWarning: ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. Use a Dask cluster with processes as a work around (see the documentation for an example of this workaround) .
  warnings.warn("ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. "


Dask can be watched on http://192.168.178.20:8989/status


/home/dresio/code/fegrow/fegrow/package.py:799: UserWarning: The template does not have an attachement (Atoms with index 0, or in case of Smiles the * character. )
  warnings.warn("The template does not have an attachement (Atoms with index 0, "
[13:43:23] DEPRECATION WARNING: please use MorganGenerator
# switch on the caching
cs.set_dask_caching()
# load 50k Smiles
data = pd.read_csv(smiles_5R83_path)

# take only the first 200
smiles = data.Smiles.to_list()[:200]

# here we add Smiles which should already have been matched
# to the scaffold (rdkit Mol.HasSubstructMatch)
cs.add_smiles(smiles)
# manually set the scores of 5 molecules and mark them as training data
cs.df.loc[0, ("score", "Training")] = 3.248, True
cs.df.loc[1, ("score", "Training")] = 3.572, True
cs.df.loc[2, ("score", "Training")] = 3.687, True
cs.df.loc[3, ("score", "Training")] = 3.492, True
cs.df.loc[4, ("score", "Training")] = 3.208, True

Active Learning#

Warning! ChemSpace.evaluate logs too much information to print to the screen. Configure the logger to see what is happening inside it.#
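
One way to capture that output is to route it to a file with Python's standard `logging` module. The sketch below assumes that FEgrow emits its messages through a logger named `"fegrow"`; that name and the log file name are assumptions made for illustration, so check the package source for the logger it actually uses.

```python
import logging
import tempfile
from pathlib import Path

# Hypothetical log file location; any writable path works.
log_path = Path(tempfile.gettempdir()) / "fegrow_al.log"

# Write records to a file rather than to the notebook output.
handler = logging.FileHandler(log_path, mode="w")
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)

# "fegrow" is an assumed logger name, not one confirmed by this page.
logger = logging.getLogger("fegrow")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep these records out of the root logger's handlers

logger.debug("logger configured")
```

Setting `propagate = False` keeps the records out of the notebook even if the root logger has a stream handler attached.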

from fegrow.al import Model, Query
# This is the default configuration
cs.model = Model.gaussian_process()
cs.query = Query.Greedy()

cs.active_learning(3)
Smiles Mol score h Training Success enamine_searched enamine_id
18 [H]ON([H])C(=O)N([H])c1c([H])nc([H])c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x76de7d5bff40> <NA> <NA> False NaN False NaN
67 [H]OC(=S)N([H])c1c([H])nc([H])c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x76de7d5dd540> <NA> <NA> False NaN False NaN
16 [H]OC([H])([H])C(=O)N([H])c1c([H])nc([H])c([H]... <rdkit.Chem.rdchem.Mol object at 0x76de7d5bfe60> <NA> <NA> False NaN False NaN
cs.query = Query.UCB(beta=10)
cs.active_learning(3)
Smiles Mol score h Training Success enamine_searched enamine_id regression
162 [H]c1nc([H])c(OC(=O)N([H])OC([H])([H])[H])c([H... <rdkit.Chem.rdchem.Mol object at 0x76de7d5dfed0> <NA> <NA> False NaN False NaN 2.01
170 [H]c1nc([H])c(S(=O)(=O)N([H])C(=O)OC([H])([H])... <rdkit.Chem.rdchem.Mol object at 0x76de7d5e02e0> <NA> <NA> False NaN False NaN 2.01
182 [H]c1nc([H])c([C@@]([H])(C(=O)N([H])OC([H])([H... <rdkit.Chem.rdchem.Mol object at 0x76de7d5e0820> <NA> <NA> False NaN False NaN 1.93
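
The query strategies differ only in how they turn a model's predicted mean and uncertainty into a ranking. The sketch below writes out the textbook forms of the greedy, UCB, PI and EI scores in plain Python, following the definitions used in `modAL.acquisition` as far as I know them. It is not FEgrow's code: the candidate names, predicted means and standard deviations are invented for illustration, and higher scores are assumed to be better.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def greedy(mean, std):
    """Rank by the predicted value alone; the uncertainty is ignored."""
    return mean


def ucb(mean, std, beta=1.0):
    """Upper confidence bound: predicted value plus beta times the uncertainty."""
    return mean + beta * std


def pi(mean, std, best, tradeoff=0.0):
    """Probability that the candidate beats the best score seen so far."""
    return normal_cdf((mean - best - tradeoff) / std)


def ei(mean, std, best, tradeoff=0.0):
    """Expected amount by which the candidate beats the best score so far."""
    gain = mean - best - tradeoff
    z = gain / std
    return gain * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical model predictions: name -> (predicted mean, predicted std).
candidates = {"A": (3.6, 0.05), "B": (3.4, 0.40), "C": (3.0, 0.10)}
best_so_far = 3.687  # highest score in the toy training set


def top(score):
    """Return the candidate name with the highest acquisition score."""
    return max(candidates, key=lambda name: score(*candidates[name]))


top_greedy = top(greedy)
top_ucb = top(lambda m, s: ucb(m, s, beta=10))
top_pi = top(lambda m, s: pi(m, s, best_so_far))
top_ei = top(lambda m, s: ei(m, s, best_so_far))
```

With these toy numbers the greedy query picks the candidate with the highest predicted mean, while a large `beta` pushes UCB towards the candidate the model is least sure about. That is the exploitation versus exploration trade-off the `beta` and `tradeoff` parameters control.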
# The query methods from modAL.acquisition are available, including:
# Query.Greedy()
# Query.PI(tradeoff=0) - highest probability of improvement
# Query.EI(tradeoff=0) - highest expected improvement
# Query.UCB(beta=1) - highest upper confidence bound (employs modAL.models.BayesianOptimizer)

# Models include the scikit-learn regressors:
# Model.linear()
# Model.elastic_net()
# Model.random_forest()
# Model.gradient_boosting_regressor()
# Model.mlp_regressor()

# Model.gaussian_process()  # uses a TanimotoKernel by default, meaning that it
#                           # compares the fingerprints of all the training dataset
#                           # with the cases not yet studied, which can be expensive
#                           # computationally

cs.model = Model.linear()
cs.query = Query.Greedy()
cs.active_learning()
Smiles Mol score h Training Success enamine_searched enamine_id regression
18 [H]ON([H])C(=O)N([H])c1c([H])nc([H])c([H])c1[H] <rdkit.Chem.rdchem.Mol object at 0x76de7d5bff40> <NA> <NA> False NaN False NaN 2.99
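
To see why the Tanimoto kernel of the Gaussian process can become expensive, note that it needs one similarity value for every pair of training molecule and unstudied molecule. The sketch below illustrates this with toy fingerprints stored as sets of on-bit indices. It is a minimal illustration under those assumptions, not FEgrow's kernel implementation, and the fingerprints are invented.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    # Convention chosen here: two empty fingerprints have similarity 0.
    return len(a & b) / union if union else 0.0


def kernel_matrix(rows, cols):
    """Pairwise similarities: one entry per (row fingerprint, column fingerprint) pair."""
    return [[tanimoto(r, c) for c in cols] for r in rows]


# Toy fingerprints (hypothetical bit indices).
training = [{1, 2, 3, 8}, {2, 3, 5}]
pool = [{1, 2, 3}, {5, 6, 7}, {2, 3, 5, 8}]

K = kernel_matrix(training, pool)
n_comparisons = len(training) * len(pool)
```

The number of comparisons grows as the product of the two set sizes, so a few hundred training molecules against a pool of tens of thousands already means millions of fingerprint comparisons per prediction round.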

Search the Enamine database using sw.docking.org (check if online)#

Please note that you should check whether you have permission to use this interface. You will also need the pip package pydockingorg.

# search only for molecules similar to the best-scoring molecule (n_best=1)
# and return up to 10 results per search
new_enamines = cs.add_enamine_molecules(n_best=1, results_per_search=10)
Querying Enamine REAL. Looking up 1 smiles.
Found 10 in 6.730192184448242
Enamine returned with 10 rows in 6.7s.
Dask obabel protonation + scaffold test finished in 0.05s.
Tested scaffold presence. Kept 10/10.
Adding:  10


/home/dresio/code/fegrow/fegrow/package.py:1229: UserWarning: Only one H vector is assumed and used. Picking <NA> hydrogen on the scaffold. 
  warnings.warn(f"Only one H vector is assumed and used. Picking {vl.h[0]} hydrogen on the scaffold. ")
new_enamines
Smiles Mol score h Training Success enamine_searched enamine_id
200 O=C(C(O[H])([H])[H])N(c1c(c(c(nc1[H])[H])[H])[... <NA> <NA> <NA> False <NA> False PV-002350579485
201 C(C(=O)N(c1c(c(c(nc1[H])[H])[H])[H])[H])([H])(... <NA> <NA> <NA> False <NA> False PV-002362554605
202 N(C(=O)N(c1c(c(c(nc1[H])[H])[H])[H])[H])([H])[H] <NA> <NA> <NA> False <NA> False PV-002540479822
203 C(OC(=O)N(C(c1c(c(c(nc1[H])[H])[H])[H])([H])[H... <NA> <NA> <NA> False <NA> False PV-002472056239
204 O=C([O-])C(N(c1c(nc(c(Br)c1[H])[H])[H])[H])([H... <NA> <NA> <NA> False <NA> False Z2060314917
205 O=C(C(O[H])([H])[H])N(C(c1c(c(c(nc1[H])[H])[H]... <NA> <NA> <NA> False <NA> False Z1551688424
206 C(C(=O)N(c1c(c(c(Br)nc1[H])[H])[H])[H])([H])([... <NA> <NA> <NA> False <NA> False Z1442921413
207 C(C(=O)N(c1c(c(c(C([H])([H])[H])nc1[H])[H])[H]... <NA> <NA> <NA> False <NA> False PV-002273680800
208 C(C(=O)N(c1c(c(c(Cl)nc1[H])[H])[H])[H])([H])([... <NA> <NA> <NA> False <NA> False PV-002441695625
209 C(C(=O)N(c1c(c(c(N([H])[H])nc1[H])[H])[H])[H])... <NA> <NA> <NA> False <NA> False PV-003001152073