fegrow.al #

Classes:

  • TanimotoKernel

    Custom Gaussian process kernel that computes Tanimoto similarity.

  • Query

TanimotoKernel #

TanimotoKernel()

Bases: NormalizedKernelMixin, StationaryKernelMixin, Kernel

Custom Gaussian process kernel that computes Tanimoto similarity.

Methods:

  • __call__

    Computes the pairwise Tanimoto similarity.

Source code in fegrow/al.py
def __init__(self):
    """Initializer."""

__call__ #

__call__(X, Y=None, eval_gradient=False)

Computes the pairwise Tanimoto similarity.

Parameters:

  • X

    Numpy array with shape [batch_size_a, num_features].

  • Y

    Numpy array with shape [batch_size_b, num_features]. If None, X is used.

  • eval_gradient

    Whether to compute the gradient.

Returns:

  • Numpy array with shape [batch_size_a, batch_size_b].

Raises:

  • NotImplementedError

    If eval_gradient is True.

Source code in fegrow/al.py
def __call__(self, X, Y=None, eval_gradient=False):  # pylint: disable=invalid-name
    """Computes the pairwise Tanimoto similarity.

    Args:
      X: Numpy array with shape [batch_size_a, num_features].
      Y: Numpy array with shape [batch_size_b, num_features]. If None, X is
        used.
      eval_gradient: Whether to compute the gradient.

    Returns:
      Numpy array with shape [batch_size_a, batch_size_b].

    Raises:
      NotImplementedError: If eval_gradient is True.
    """
    if eval_gradient:
        raise NotImplementedError
    if Y is None:
        Y = X
    return _dask_tanimito_similarity(X, Y)
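
Because eval_gradient is unsupported, the kernel is best used with scikit-learn's gradient-based hyperparameter optimization disabled. A minimal usage sketch (the binary fingerprints and targets below are purely illustrative):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

from fegrow.al import TanimotoKernel

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(50, 2048)).astype(float)  # binary fingerprints
y_train = rng.random(50)                                     # illustrative targets

# optimizer=None skips gradient-based hyperparameter tuning, which would call
# the kernel with eval_gradient=True and raise NotImplementedError
gpr = GaussianProcessRegressor(kernel=TanimotoKernel(), optimizer=None)
gpr.fit(X_train, y_train)
mean, std = gpr.predict(X_train[:5], return_std=True)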

Query #

Methods:

  • Greedy

    Selects the instances with the lowest predicted values, i.e. the best candidates when predictions are sorted in ascending order.

  • PI

    Maximum PI query strategy. Selects the instance with highest probability of improvement.

  • EI

    Maximum EI query strategy. Selects the instance with highest expected improvement.

  • UCB

    Maximum UCB query strategy. Selects the instance with highest upper confidence bound.

Greedy staticmethod #

Greedy() -> Callable

Selects the instances with the lowest predicted values, i.e. the best candidates when predictions are sorted in ascending order.

Returns:

  • Callable

    The greedy function.

Source code in fegrow/al.py
@staticmethod
def Greedy() -> Callable:
    """Takes the best instances by inference value sorted in ascending order.

    Returns:
      The greedy function.
    """

    def greedy(optimizer, features, n_instances=1):
        """Takes the best instances by inference value sorted in ascending order.

        Args:
          optimizer: BaseLearner. Model to use to score instances.
          features: modALinput. Featurization of the instances to choose from.
          n_instances: Integer. The number of instances to select.

        Returns:
          Indices of the instances chosen.
        """
        return np.argpartition(optimizer.predict(features), n_instances)[
            :n_instances
        ]

    return functools.partial(greedy, fegrow_label="greedy")
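
In plain NumPy terms, the returned strategy picks the indices of the n_instances smallest predictions; np.argpartition guarantees which elements land in the first n_instances positions, but not their order. A small illustration:

import numpy as np

predictions = np.array([0.7, -1.2, 0.3, -0.5, 0.9])
n_instances = 2
chosen = np.argpartition(predictions, n_instances)[:n_instances]
# chosen holds indices 1 and 3 (values -1.2 and -0.5), in no guaranteed order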

PI staticmethod #

PI(tradeoff: float = 0) -> Callable

Maximum PI query strategy. Selects the instance with highest probability of improvement.

Parameters:

  • tradeoff (float, default: 0 ) –

    Exploration tradeoff added to the improvement threshold before computing PI.

Returns:

  • Callable

    The function with pre-populated parameters.

Source code in fegrow/al.py
@staticmethod
def PI(tradeoff: float = 0) -> Callable:
    """
    Maximum PI query strategy. Selects the instance with highest probability of improvement.

    Args:
        tradeoff: Exploration tradeoff added to the improvement threshold before computing PI.

    Returns:
        The function with pre-populated parameters.
    """
    return functools.partial(max_PI, tradeoff=tradeoff, fegrow_label="PI")
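
The wrapped max_PI implements the standard probability-of-improvement criterion. A conceptual sketch (the helper name is illustrative, not fegrow's code; mean, std and y_max stand for the posterior mean, posterior standard deviation and best observed value):

from scipy.stats import norm

def probability_of_improvement(mean, std, y_max, tradeoff=0.0):
    # PI(x) = Phi((mu(x) - y_max - tradeoff) / sigma(x))
    return norm.cdf((mean - y_max - tradeoff) / std)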

EI staticmethod #

EI(tradeoff: float = 0) -> Callable

Maximum EI query strategy. Selects the instance with highest expected improvement.

Parameters:

  • tradeoff (float, default: 0 ) –

    Exploration tradeoff added to the improvement threshold before computing EI.

Returns:

  • Callable

    The function with pre-populated parameters.

Source code in fegrow/al.py
@staticmethod
def EI(tradeoff: float = 0) -> Callable:
    """
    Maximum EI query strategy. Selects the instance with highest expected improvement.

    Args:
        tradeoff: Exploration tradeoff added to the improvement threshold before computing EI.

    Returns:
        The function with pre-populated parameters.
    """
    return functools.partial(max_EI, tradeoff=tradeoff, fegrow_label="EI")
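
Expected improvement weighs both how likely an improvement is and how large it would be. A conceptual sketch of the standard formula that max_EI follows (illustrative helper, not fegrow's code):

from scipy.stats import norm

def expected_improvement(mean, std, y_max, tradeoff=0.0):
    # EI(x) = (mu - y_max - tradeoff) * Phi(z) + sigma * phi(z),
    # where z = (mu - y_max - tradeoff) / sigma
    z = (mean - y_max - tradeoff) / std
    return (mean - y_max - tradeoff) * norm.cdf(z) + std * norm.pdf(z)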

UCB staticmethod #

UCB(beta: float = 1) -> Callable

Maximum UCB query strategy. Selects the instance with highest upper confidence bound.

Parameters:

  • beta (float, default: 1 ) –

    Exploration weight on the predictive standard deviation; larger values favour exploration.

Returns:

  • Callable

    The function with pre-populated parameters.

Source code in fegrow/al.py
@staticmethod
def UCB(beta: float = 1) -> Callable:
    """
    Maximum UCB query strategy. Selects the instance with highest upper confidence bound.

    Args:
        beta: Exploration weight on the predictive standard deviation; larger values favour exploration.

    Returns:
        The function with pre-populated parameters.
    """
    return functools.partial(max_UCB, beta=beta, fegrow_label="UCB")
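
Upper confidence bound is the simplest of the three acquisitions: it adds beta standard deviations of optimism to the posterior mean, so a larger beta favours exploration. Conceptually (illustrative sketch, not fegrow's code):

def upper_confidence_bound(mean, std, beta=1.0):
    # UCB(x) = mu(x) + beta * sigma(x)
    return mean + beta * std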

_dask_tanimito_similarity #

_dask_tanimito_similarity(a, b)

FIXME: this no longer needs to use matmul, because the computation is no longer single-core; it can be transitioned to simple row-by-row dispatching.

Source code in fegrow/al.py
def _dask_tanimito_similarity(a, b):
    """
    FIXME: this no longer needs to use matmul, because the computation is no
    longer single-core; it can be transitioned to simple row-by-row dispatching.
    """
    logger.info(f"About to compute tanimoto for array lengths {len(a)} and {len(b)}")
    start = time.time()
    chunk_size = 8_000
    # chunk both inputs so dask can schedule the blocks in parallel
    da = dask.array.from_array(a, chunks=chunk_size)
    db = dask.array.from_array(b, chunks=chunk_size)
    aa = dask.array.sum(da, axis=1, keepdims=True)  # per-row bit counts |a_i|
    bb = dask.array.sum(db, axis=1, keepdims=True)  # per-row bit counts |b_j|
    ab = dask.array.matmul(da, db.T)  # pairwise intersection counts
    # Tanimoto: |intersection| / |union| for binary fingerprints
    td = dask.array.true_divide(ab, aa + bb.T - ab)
    td_computed = td.compute()
    logger.info(
        f"Computed tanimoto similarity in {time.time() - start:.2f}s for array lengths {len(a)} and {len(b)}"
    )
    return td_computed
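
For binary fingerprints the expression above is the standard Tanimoto (Jaccard) coefficient. A plain NumPy equivalent, handy as a sanity check on small inputs (illustrative only, not part of fegrow):

import numpy as np

def tanimoto_similarity(a, b):
    aa = a.sum(axis=1, keepdims=True)
    bb = b.sum(axis=1, keepdims=True)
    ab = a @ b.T
    return ab / (aa + bb.T - ab)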