7: Active Learning and Enamine
Authors: Mateusz K Bieniek, Ben Cree, Rachael Pirie, Joshua T. Horton, Natalie J. Tatum, Daniel J. Cole
Overview
Configure the Active Learning
import pandas as pd
import prody
from rdkit import Chem
import fegrow
from fegrow import ChemSpace
from fegrow.testing import core_5R83_path, smiles_5R83_path, rec_5R83_path
# create the chemical space
cs = ChemSpace()
# we're not growing the scaffold, we're superimposing bigger molecules on it
cs.add_scaffold(Chem.SDMolSupplier(core_5R83_path)[0])
cs.add_protein(rec_5R83_path)
/home/dresio/code/fegrow/fegrow/package.py:595: UserWarning: ANI uses TORCHAni which is not threadsafe, leading to random SEGFAULTS. Use a Dask cluster with processes as a workaround (see the documentation for an example).
Dask can be watched on http://192.168.178.20:8989/status
/home/dresio/code/fegrow/fegrow/package.py:799: UserWarning: The template does not have an attachment (Atoms with index 0, or in case of Smiles the * character).
Generated 7 conformers.
Generated 5 conformers.
Generated 12 conformers.
Removed 0 conformers.
Removed 0 conformers.
Removed 0 conformers.
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/parmed/structure.py:1799: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
coords = np.asanyarray(value, dtype=np.float64)
Using force field
Optimising conformer: 100%|███████████████████████| 7/7 [00:01<00:00, 6.48it/s]
Using force field
Optimising conformer: 100%|███████████████████████| 5/5 [00:00<00:00, 10.19it/s]
Using force field
Optimising conformer: 100%|█████████████████████| 12/12 [00:02<00:00, 5.38it/s]
[13:17:40] DEPRECATION WARNING: please use MorganGenerator
Generated 2 conformers.
Generated 5 conformers.
Generated 5 conformers.
Removed 0 conformers.
Removed 0 conformers.
Removed 0 conformers.
Using force field
Optimising conformer: 100%|███████████████████████| 5/5 [00:01<00:00, 3.49it/s]
using ani2x
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/aev.py:16: UserWarning: cuaev not installed
/home/dresio/software/mambaforge/envs/fegrow-onechannel/lib/python3.11/site-packages/torchani/__init__.py:59: UserWarning: Dependency not satisfied, torchani.ase will not be available
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
failed to equip `nnpops` with error: No module named 'NNPOps'
Optimising conformer: 100%|███████████████████████| 2/2 [00:05<00:00, 2.95s/it]
Optimising conformer: 100%|███████████████████████| 5/5 [00:10<00:00, 2.06s/it]
Generated 2 conformers.
Generated 2 conformers.
Generated 2 conformers.
Removed 0 conformers.
Removed 0 conformers.
Removed 0 conformers.
Using force field
Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00, 6.47it/s]
Using force field
Optimising conformer: 100%|███████████████████████| 2/2 [00:00<00:00, 9.18it/s]
using ani2x
Optimising conformer: 100%|███████████████████████| 2/2 [00:02<00:00, 1.06s/it]
# turn on the caching in RAM (optional)
cs.set_dask_caching()
# load the 50k SMILES
smiles = pd.read_csv(smiles_5R83_path).Smiles.to_list()
# for testing, sort by size and pick the smallest
smiles.sort(key=len)
# take the 200 smallest SMILES
smiles = smiles[:200]
# add the SMILES; each should already match the scaffold
# (see rdkit's Mol.HasSubstructMatch)
cs.add_smiles(smiles)
|     | Smiles                                   | score | h    | Training | Success | enamine_searched | enamine_id | 2D      |
|-----|------------------------------------------|-------|------|----------|---------|------------------|------------|---------|
| 0   | [H]c1nc([H])c(SF)c([H])c1[H]             | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 1   | [H]c1nc([H])c(SI)c([H])c1[H]             | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 2   | [H]c1nc([H])c(SCl)c([H])c1[H]            | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 3   | [H]c1nc([H])c(SBr)c([H])c1[H]            | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 4   | [H]c1nc([H])c(C#CF)c([H])c1[H]           | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| ... | ...                                      | ...   | ...  | ...      | ...     | ...              | ...        | ...     |
| 195 | [H]c1nc([H])c(-n2nnc(F)c2[H])c([H])c1[H] | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 196 | [H]c1nc([H])c(-c2nnn(F)c2[H])c([H])c1[H] | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 197 | [H]c1nc([H])c(C([H])([H])C#N)c([H])c1[H] | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 198 | [H]c1nc([H])c(C(=O)N([H])C#N)c([H])c1[H] | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |
| 199 | [H]c1nc([H])c(N([H])C(=O)C#N)c([H])c1[H] | <NA>  | <NA> | False    | NaN     | False            | NaN        | (image) |

200 rows × 8 columns
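The `pd.read_csv(...).Smiles.to_list()` pattern used above works on any CSV with a `Smiles` column. A minimal, self-contained sketch using an in-memory stand-in for the 50k-compound file (the three SMILES are invented for illustration):

```python
import io

import pandas as pd

# hypothetical stand-in for the CSV shipped with fegrow (smiles_5R83_path)
csv = io.StringIO("Smiles\nCCO\nC\nCCCN\n")

smiles = pd.read_csv(csv).Smiles.to_list()
smiles.sort(key=len)   # shortest SMILES first, as in the notebook
print(smiles)          # ['C', 'CCO', 'CCCN']
```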
Active Learning
# There is nothing to train the model on, so initially "first_random" is used by default
random1 = cs.active_learning(3, first_random=True)
random2 = cs.active_learning(3, first_random=True)
# note the different indices selected (unless you're lucky!)
print(random1.index.to_list(), random2.index.to_list())
[149, 49, 151] [160, 112, 153]
/home/dresio/code/fegrow/fegrow/package.py:1284: UserWarning: Selecting randomly the first samples to be studied (no score data yet).
# now evaluate the first selection
random1_results = cs.evaluate(random1, ani=False)
# check the scores, note that they were updated in the master dataframe too
random1_results
|     | Smiles                                  | Mol                                            | score | h    | Training | Success | enamine_searched | enamine_id |
|-----|-----------------------------------------|------------------------------------------------|-------|------|----------|---------|------------------|------------|
| 149 | [H]c1nc([H])c(OC([H])([H])F)c([H])c1[H] | <fegrow.package.RMol object at 0x72731d573f10> | 3.283 | <NA> | True     | True    | False            | NaN        |
| 49  | [H]c1nc([H])c(OC(F)(F)F)c([H])c1[H]     | <fegrow.package.RMol object at 0x72731d573600> | 3.291 | <NA> | True     | True    | False            | NaN        |
| 151 | [H]c1nc([H])c(C([H])([H])SF)c([H])c1[H] | <fegrow.package.RMol object at 0x72731d54ee80> | 3.856 | <NA> | True     | True    | False            | NaN        |
# by default a Gaussian Process with a greedy acquisition is used
# note that this time the selection is deterministic: both calls pick the same molecules
greedy1 = cs.active_learning(3)
greedy2 = cs.active_learning(3)
print(greedy1.index.to_list(), greedy2.index.to_list())
[113, 168, 191] [113, 168, 191]
# learn in cycles
for cycle in range(2):
    greedy = cs.active_learning(3)
    greedy_results = cs.evaluate(greedy)

    # save the new results
    greedy_results.to_csv(f'notebook6_iteration{cycle}_results.csv')

# save the entire chemical space with all the results
cs.to_sdf('notebook6_chemspace.sdf')
computed = cs.df[~cs.df.score.isna()]
print('Computed cases in total: ', len(computed))
Computed cases in total: 9
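The loop above alternates selection and evaluation. The same cycle can be sketched without any chemistry; here `oracle` is an invented stand-in for `cs.evaluate` and `predict` for the surrogate model:

```python
# toy candidate pool: index -> hidden "true" score (stand-in for cs.evaluate)
oracle = {i: float(i % 7) for i in range(20)}
scores = {}  # accumulated training data, like the score column in cs.df

def predict(i):
    # hypothetical surrogate: mean of known scores plus a cheap heuristic
    mean = sum(scores.values()) / len(scores) if scores else 0.0
    return mean + (i % 7) * 0.1

for cycle in range(2):
    # greedily pick the 3 untested candidates the surrogate ranks highest
    untested = [i for i in oracle if i not in scores]
    picks = sorted(untested, key=predict, reverse=True)[:3]
    # "evaluate" them and add the results to the training data
    for i in picks:
        scores[i] = oracle[i]

print(len(scores))  # 6 candidates evaluated over 2 cycles
```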
from fegrow.al import Model, Query
# This is the default configuration
cs.model = Model.gaussian_process()
cs.query = Query.Greedy()
cs.active_learning(3)
|     | Smiles                                   | Mol                                              | score | h    | Training | Success | enamine_searched | enamine_id | regression |
|-----|------------------------------------------|--------------------------------------------------|-------|------|----------|---------|------------------|------------|------------|
| 166 | [H]c1nc([H])c(OC([H])([H])I)c([H])c1[H]  | <rdkit.Chem.rdchem.Mol object at 0x72731c1db4c0> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.718      |
| 197 | [H]c1nc([H])c(C([H])([H])C#N)c([H])c1[H] | <rdkit.Chem.rdchem.Mol object at 0x72731d552c70> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.633      |
| 189 | [H]c1nc([H])c(OC([H])([H])Cl)c([H])c1[H] | <rdkit.Chem.rdchem.Mol object at 0x72731c1dbed0> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.704      |
cs.query = Query.UCB(beta=10)
cs.active_learning(3)
|    | Smiles                              | Mol                                              | score | h    | Training | Success | enamine_searched | enamine_id | regression |
|----|-------------------------------------|--------------------------------------------------|-------|------|----------|---------|------------------|------------|------------|
| 33 | [H]C#CSc1c([H])nc([H])c([H])c1[H]   | <rdkit.Chem.rdchem.Mol object at 0x72731d687920> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.137      |
| 7  | [H]c1nc([H])c(SC#N)c([H])c1[H]      | <rdkit.Chem.rdchem.Mol object at 0x72731d687a00> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.087      |
| 54 | [H]C(=O)Sc1c([H])nc([H])c([H])c1[H] | <rdkit.Chem.rdchem.Mol object at 0x72731c1d87b0> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 2.087      |
# The query methods available in modAL.acquisition are exposed; these include:
# Query.Greedy()
# Query.PI(tradeoff=0) - highest probability of improvement
# Query.EI(tradeoff=0) - highest expected improvement
# Query.UCB(beta=1) - highest upper confidence bound (employs modAL.models.BayesianOptimizer)
# Available models include the scikit-learn based:
# Model.linear()
# Model.elastic_net()
# Model.random_forest()
# Model.gradient_boosting_regressor()
# Model.mlp_regressor()
# Model.gaussian_process() # uses a TanimotoKernel by default, meaning that it
#                          # compares the fingerprints of the whole training dataset
#                          # with the cases not yet studied, which can be
#                          # computationally expensive
cs.model = Model.linear()
cs.query = Query.Greedy()
cs.active_learning()
|     | Smiles                                  | Mol                                              | score | h    | Training | Success | enamine_searched | enamine_id | regression |
|-----|-----------------------------------------|--------------------------------------------------|-------|------|----------|---------|------------------|------------|------------|
| 177 | [H]c1nc([H])c(Sc2nnn([H])n2)c([H])c1[H] | <rdkit.Chem.rdchem.Mol object at 0x72731c1db990> | <NA>  | <NA> | False    | NaN     | False            | NaN        | 1.8        |
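The difference between the Greedy and UCB picks above can be illustrated numerically. A sketch with invented predictions (one common UCB formulation, mean + beta·std, is assumed here), plus the Tanimoto similarity that the default Gaussian-process kernel is built on:

```python
import numpy as np

def tanimoto(a: set, b: set) -> float:
    # Tanimoto similarity of two fingerprint bit sets: |a & b| / |a | b|
    return len(a & b) / len(a | b)

# invented predicted means and uncertainties for five untested molecules
mu = np.array([2.1, 2.6, 2.4, 1.8, 2.5])      # predicted score ("regression")
sigma = np.array([0.1, 0.5, 0.2, 0.9, 0.05])  # model uncertainty

greedy_pick = int(np.argmax(mu))               # exploit the prediction only
ucb_pick = int(np.argmax(mu + 10 * sigma))     # beta=10 favours exploration

print(greedy_pick, ucb_pick)                   # 1 3
print(tanimoto({1, 2, 3}, {2, 3, 4}))          # 0.5
```

With a large beta, the most uncertain candidate wins even though its mean prediction is the lowest, which is exactly why the UCB(beta=10) selection above differs from the greedy one.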
Search the Enamine database using sw.docking.org (check that the service is online).
Please note that you should check whether you have permission to use this interface.
Furthermore, you will need the pip package pydockingorg.
# search only for molecules similar to the best-scoring molecule (n_best=1)
# and return up to 10 results per search
new_enamines = cs.add_enamine_molecules(n_best=1, results_per_search=10)
Querying Enamine REAL. Looking up 1 smiles.
Enamine returned with 10 rows in 6.4s.
Dask obabel protonation + scaffold test finished in 0.06s.
Tested scaffold presence. Kept 10/10.
Adding: 10
/home/dresio/code/fegrow/fegrow/package.py:1229: UserWarning: Only one H vector is assumed and used. Picking <NA> hydrogen on the scaffold.
|     | Smiles                                            | Mol  | score | h    | Training | Success | enamine_searched | enamine_id      |
|-----|---------------------------------------------------|------|-------|------|----------|---------|------------------|-----------------|
| 200 | C(SC(c1c(c(c(nc1[H])[H])[H])[H])([H])[H])([H])... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-002558062946 |
| 201 | C(SC(c1c(c(c(Br)nc1[H])[H])[H])([H])[H])([H])(... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | Z3340872668     |
| 202 | C(SC(c1c(c(c(C([H])([H])[H])nc1[H])[H])[H])([H... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-002903174203 |
| 203 | C(SC(c1c(c(c(Cl)nc1[H])[H])[H])([H])[H])([H])(... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-004253211555 |
| 204 | C(SC(c1c(c(c(F)nc1[H])[H])[H])([H])[H])([H])([... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-005723429185 |
| 205 | C(SC(c1c(c(c(nc1Br)[H])[H])[H])([H])[H])([H])(... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | Z2832853555     |
| 206 | C(SC(c1c(c(c(nc1C([H])([H])[H])[H])[H])[H])([H... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-003024225282 |
| 207 | C(SC(c1c(c(c(nc1Cl)[H])[H])[H])([H])[H])([H])(... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-004696925594 |
| 208 | C(SC(c1c(c(c(nc1F)[H])[H])[H])([H])[H])([H])([... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-005922678029 |
| 209 | C(SC(c1c(nc(c(c1Cl)[H])[H])[H])([H])[H])([H])(... | <NA> | <NA>  | <NA> | False    | <NA>    | False            | PV-002978169168 |
# molecules already sent to Enamine are marked
# in the "enamine_searched" column to avoid searching for them again
cs.df[cs.df.enamine_searched == True]
|     | Smiles                                  | Mol                                            | score | h    | Training | Success | enamine_searched | enamine_id | regression |
|-----|-----------------------------------------|------------------------------------------------|-------|------|----------|---------|------------------|------------|------------|
| 168 | [H]c1nc([H])c(C([H])([H])SI)c([H])c1[H] | <fegrow.package.RMol object at 0x7272ea02c720> | 3.858 | <NA> | True     | True    | True             | NaN        | 3.858      |
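The boolean-mask pattern above generalises to other bookkeeping queries, for example finding Enamine hits that have not been scored yet. A sketch on a toy frame reusing the column names from the tables above (all values invented):

```python
import pandas as pd

# toy stand-in for cs.df; values invented, column names as in the notebook
df = pd.DataFrame({
    "Smiles": ["A", "B", "C"],
    "score": [3.858, None, None],
    "enamine_searched": [True, False, False],
    "enamine_id": [None, "Z3340872668", "PV-002558062946"],
})

searched = df[df.enamine_searched == True]             # already sent to Enamine
pending = df[df.enamine_id.notna() & df.score.isna()]  # hits not yet scored

print(len(searched), len(pending))  # 1 2
```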