IDaRS Theory

IDaRS stands for Iterative Draw and Rank Sampling, an algorithm introduced by Bilal et al. in “Development and validation of a weakly supervised deep learning framework to predict the status of molecular pathways and key mutations in colorectal cancer from routine histology images: a retrospective study” (DOI: https://doi.org/10.1016/S2589-7500(21)00180-1), where the Supplementary Materials are also available.

The algorithm is used to infer (predict) tile-level labels from slide-level labels, transforming information that is originally known only at slide level (low resolution) into tile-level (high-resolution) information. The authors report that it is novel and general, and that it is faster and more accurate than previous procedures.

In this section, we discuss the IDaRS algorithm from a theoretical point of view. Pseudocode for the algorithm is given below.

Each WSI is divided into small rectangles, called tiles, all of the same size, with \(h\) rows and \(w\) columns of pixels, for some user-chosen \(h\) and \(w\). Let \(F\) be a particular feature of interest, which may or may not be present in a particular tile or WSI. In the experiments described in the paper cited above, \(F\) is some genetic abnormality (perhaps determined by IHC or PCR), but it could be any feature that makes sense both for tiles and for WSIs, and such that, if a WSI has the feature \(F\), then some of its tiles also have it. For the IDaRS procedure, it is assumed that each WSI comes with a label: \(True=1\) if the WSI has the feature \(F\), or \(False=0\) if it does not.
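To make the labelling concrete, here is a minimal Python sketch of how every tile inherits the 0/1 label of its parent WSI. All names here (`Tile`, `tiles_with_labels`, and so on) are hypothetical illustrations, not part of the IDaRS code base.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Tile:
    slide_id: str  # WSI this tile was cut from
    row: int       # tile position within the slide grid
    col: int
    p: int         # slide-level label inherited by the tile (0 or 1)

def tiles_with_labels(slide_labels: Dict[str, int],
                      grid_shape: Dict[str, Tuple[int, int]]) -> List[Tile]:
    """Assign each tile the 0/1 label of the WSI that contains it."""
    tiles = []
    for slide_id, label in slide_labels.items():
        n_rows, n_cols = grid_shape[slide_id]
        for row in range(n_rows):
            for col in range(n_cols):
                tiles.append(Tile(slide_id, row, col, label))
    return tiles
```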

After training, IDaRS can be used to provide the label for an unlabelled WSI. It can also be used to locate regions with the feature \(F\), potentially improving biological understanding.

Let \(T\) be a tile of one of our WSIs. We set \(p(T)\) equal to the label of the WSI containing \(T\), so that \(p(T)\) is either \(0\) or \(1\) and is constant as \(T\) varies over the tiles of a fixed WSI. We wish to estimate \(q(T)\in[0,1]\), the probability that \(T\) has feature \(F\).

Given a set \(\mathcal{S}\) of tiles, possibly drawn from many WSIs, we use as the loss function the cross entropy (or some variant of it), a real-valued function \(L_{\mathcal{S}}\) defined by:

\[L_{\mathcal{S}}(q) = -\sum_{T\in \mathcal{S}}\bigl(p(T)\log(q(T))+(1-p(T))\log(1-q(T))\bigr).\]

(Note the minus sign: each summand is then non-negative, so \(L_{\mathcal{S}}(q)\ge 0\), with equality exactly when \(q=p\).)
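As a concrete check of this definition, the following NumPy sketch computes \(L_{\mathcal{S}}\) for a small set of tiles; the function name and the numbers are illustrative only.

```python
import numpy as np

def cross_entropy(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """L_S: cross entropy between inherited tile labels p and predictions q."""
    q = np.clip(q, eps, 1.0 - eps)  # keep the logarithms finite
    return float(-np.sum(p * np.log(q) + (1 - p) * np.log(1 - q)))

p = np.array([1, 0, 1])                             # inherited slide labels
print(cross_entropy(p, np.array([1.0, 0.0, 1.0])))  # ~0: the global minimum q = p
print(cross_entropy(p, np.array([0.7, 0.4, 0.9])))  # positive for any other q
```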

This function has a unique local minimum, when its domain is the set of ALL functions \(q:\mathcal{S}\to[0,1]\), and this local minimum is also a global minimum, at \(q=p\). However, we require \(q(T)\) to depend only on the pixel values of \(T\). To express this mathematically, let \(\pi:\mathcal{S}\to\mathbb{R}^{h\times w\times 3}\) be the map that sends a tile \(T\) to the RGB intensities of its \(h\times w\) pixels. Given a function \(q_0:\mathbb{R}^{h\times w\times 3}\to[0,1]\), we define \(q = q_0\circ\pi\), and then \(L_{\mathcal{S}}(q)\) can be calculated. We expect \(L_{\mathcal{S}}\) to have many local minima over functions of this restricted form, each with a value greater than that of the global minimum at \(q=p\).
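The factorisation \(q = q_0\circ\pi\) is exactly what a convolutional network supplies. The following PyTorch sketch shows a toy \(q_0\) mapping a tile's pixel array to a probability; the architecture is an arbitrary stand-in, not the network used in the paper. (PyTorch stores the pixels as \(3\times h\times w\) rather than \(h\times w\times 3\).)

```python
import torch
import torch.nn as nn

h, w = 224, 224  # tile size in pixels (example values)

# q0 maps the pixel array of a tile to a probability in [0, 1].
# Any image classifier ending in a sigmoid would do; this tiny one is a sketch.
q0 = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

pixels = torch.rand(1, 3, h, w)  # pi(T): the RGB intensities of one tile
q_of_T = q0(pixels)              # q(T) = q0(pi(T)), a value in [0, 1]
```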

Parameters \(r\) and \(k\) are chosen by the user, and the following algorithm is applied, with \(p\) as the training target for \(q\). Successive values of \(q_0\), and hence of \(q\), are produced by the algorithm, using stochastic gradient descent as usual.

\(nts = \emptyset\)  # Next Training Set starts empty
for each labelled WSI \(W_i\)
    determine the subset \(\mathcal{S}_i\) of tumour tiles
    add \(r+k\) randomly chosen tiles of \(\mathcal{S}_i\) to \(nts\)
for each epoch
    \(ts = nts\)
    # processing \(ts\) is fast, because \(ts\) is comparatively small
    randomize and divide \(ts\) into batches of a fixed convenient size
    for each batch
        calculate loss per tile
        # next, change the weights that determine \(q_0\), and, hence, also \(q\)
        backpropagate the loss per batch
    # create the next training set \(nts\)
    \(nts = \emptyset\)  # new training set starts empty
    for each \(W_i\)
        to \(nts\) add the \(k\) top probability tiles in \(\mathcal{S}_i \cap ts\)
        # alternatively add the \(k\) tiles with smallest loss
        to \(nts\) add \(r\) further tiles randomly chosen from \(\mathcal{S}_i\)
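The following Python sketch implements the loop above under stated assumptions: `wsis` is a list of (tumour tiles, slide label) pairs, and `model` is assumed to expose `train_on_batch` and `predict_proba`; these names are hypothetical stand-ins for a dataset and a trainable classifier, not part of the IDaRS code base.

```python
import random

def idars_train(wsis, r, k, n_epochs, model, batch_size=32):
    """A minimal sketch of iterative draw-and-rank training."""
    # initial draw: r + k random tumour tiles per slide
    nts = {i: random.sample(tiles, min(r + k, len(tiles)))
           for i, (tiles, _) in enumerate(wsis)}

    for epoch in range(n_epochs):
        # ts pairs each chosen tile with its inherited label and slide index
        ts = [(t, wsis[i][1], i) for i, chosen in nts.items() for t in chosen]
        random.shuffle(ts)

        # train on the comparatively small set ts, batch by batch
        for start in range(0, len(ts), batch_size):
            batch = ts[start:start + batch_size]
            tiles = [t for t, _, _ in batch]
            labels = [lab for _, lab, _ in batch]
            model.train_on_batch(tiles, labels)  # backpropagates the batch loss

        # rank: keep the k top-probability tiles of each slide, then redraw r
        nts = {}
        for i, (slide_tiles, label) in enumerate(wsis):
            seen = [t for t, _, j in ts if j == i]  # S_i intersected with ts

            def score(t):
                # probability the model assigns to the slide's own label
                q = model.predict_proba(t)
                return q if label == 1 else 1.0 - q

            top_k = sorted(seen, key=score, reverse=True)[:k]
            fresh = random.sample(slide_tiles, min(r, len(slide_tiles)))
            nts[i] = top_k + fresh
    return model
```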

The above pseudocode gives a crude but correct summary of the Python program discussed and explained in the paper by Bilal et al. cited in the first paragraph above. It is also a correct summary of the slightly different IDaRS program in this repository, and it is more careful than the pseudocode presented in the Supplement to the original paper.

The IDaRS algorithm is effective because the \(k\) top-ranked tiles, together with the \(r\) random tiles redrawn at each iteration, are very likely to be the tiles that contribute most to moving the weights in the desired direction.
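For a positive slide (\(p=1\)), ranking tiles by highest predicted probability is the same as ranking them by smallest per-tile loss, which is why the pseudocode offers the two criteria as alternatives. The following snippet checks this on illustrative numbers.

```python
import numpy as np

def tile_loss(p, q, eps=1e-12):
    """Per-tile cross-entropy loss for label p and prediction q."""
    q = np.clip(q, eps, 1.0 - eps)
    return -(p * np.log(q) + (1 - p) * np.log(1 - q))

q = np.array([0.9, 0.2, 0.6, 0.75])    # predicted probabilities of feature F
by_prob = np.argsort(-q)               # tiles ranked by highest probability
by_loss = np.argsort(tile_loss(1, q))  # tiles ranked by smallest loss (p = 1)
assert (by_prob == by_loss).all()      # the two rankings agree
```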