SlideGraphConstructor

class SlideGraphConstructor

Construct a graph using the SlideGraph+ (Liu et al. 2021) method.

This uses hybrid agglomerative clustering, which groups patches into nodes using a weighted combination of spatial distance (within the WSI) and feature-space distance. See the build function for more details on the graph construction method.

Methods

build

Build a graph via hybrid clustering in spatial and feature space.

visualise

Visualise a graph.

static build(points, features, lambda_d=0.003, lambda_f=0.001, lambda_h=0.8, connectivity_distance=4000, neighbour_search_radius=2000, feature_range_thresh=0.0001)

Build a graph via hybrid clustering in spatial and feature space.

The graph is constructed via hybrid hierarchical clustering followed by Delaunay triangulation of the resulting cluster centroids. This is part of the SlideGraph pipeline, but it may also be used to construct a graph from any set of point coordinates and features.

The clustering uses a distance kernel, ranging between 0 and 1, which is a weighted product of spatial distance (distance between coordinates in points, e.g. WSI location) and feature-space distance (e.g. ResNet features).
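As a rough illustration of such a kernel, the sketch below combines two exponential decay terms into a similarity and returns one minus that similarity as the distance. The exponential form and the helper name hybrid_distance are assumptions made for illustration; only the weighted product and the 0 to 1 range come from the description above.

```python
import numpy as np


def hybrid_distance(p_a, p_b, f_a, f_b, lambda_d=0.003, lambda_f=0.001):
    """Illustrative hybrid distance between two points (hypothetical helper).

    Assumes an exponential kernel for each term; the result lies in [0, 1).
    """
    # Spatial (L2) distance between the two coordinates.
    d = np.linalg.norm(np.asarray(p_a, dtype=float) - np.asarray(p_b, dtype=float))
    # Feature-space (L2) distance between the two feature vectors.
    f = np.linalg.norm(np.asarray(f_a, dtype=float) - np.asarray(f_b, dtype=float))
    # Weighted product of the two kernels: 1 when identical, near 0 when far apart.
    similarity = np.exp(-lambda_d * d) * np.exp(-lambda_f * f)
    return 1.0 - similarity


# Identical points have zero distance; distant points approach 1.
same = hybrid_distance((0, 0), (0, 0), [1.0, 2.0], [1.0, 2.0])
far = hybrid_distance((0, 0), (3000, 4000), [1.0, 2.0], [1.0, 2.0])
```

With this form, a larger lambda_d or lambda_f makes the corresponding distance term push pairs towards "dissimilar" more quickly.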

Points which are spatially further apart than neighbour_search_radius are not compared and are assigned a distance of 1 (most dissimilar), which significantly speeds up computation. The resulting distance metric is then used to form clusters via hierarchical (agglomerative) clustering.
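The clustering step can be sketched with SciPy as follows. This is a minimal illustration, not the library's implementation: the exponential kernel, the average linkage method, and the helper name cluster_points are all assumptions. For simplicity it computes every pairwise distance and then overwrites the far pairs, so it does not reproduce the speed-up that skipping those comparisons gives.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist, squareform


def cluster_points(points, features, lambda_d=0.003, lambda_f=0.001,
                   lambda_h=0.8, neighbour_search_radius=2000):
    """Illustrative hybrid clustering (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    features = np.asarray(features, dtype=float)
    d = squareform(pdist(points))    # pairwise spatial distances
    f = squareform(pdist(features))  # pairwise feature distances
    # Assumed kernel: one minus a product of exponential similarities.
    dist = 1.0 - np.exp(-lambda_d * d) * np.exp(-lambda_f * f)
    # Pairs outside the search radius are treated as maximally dissimilar.
    dist[d > neighbour_search_radius] = 1.0
    # SciPy's linkage expects the condensed (upper-triangle) form.
    condensed = squareform(dist, checks=False)
    tree = linkage(condensed, method="average")
    # Cut the dendrogram at the clustering distance threshold.
    return fcluster(tree, t=lambda_h, criterion="distance")


# Two tight groups of points, far apart, with identical features.
toy_points = [[0, 0], [1, 0], [0, 1], [5000, 5000], [5001, 5000], [5000, 5001]]
toy_features = np.zeros((6, 2))
labels = cluster_points(toy_points, toy_features)
```

In this toy input the two groups are further apart than neighbour_search_radius, so they can only merge at distance 1, which is above lambda_h, and they remain separate clusters.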

Next, a Delaunay triangulation is applied to the cluster centroids to connect neighbouring clusters. Only clusters which are closer than connectivity_distance in the spatial domain will be connected.
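The connectivity step can be illustrated with scipy.spatial.Delaunay. The sketch below is written under assumptions: the helper name delaunay_edges and the undirected, deduplicated edge list are illustrative choices, and the exact edge format produced by build may differ.

```python
import numpy as np
from scipy.spatial import Delaunay


def delaunay_edges(centroids, connectivity_distance=4000):
    """Illustrative thresholded Delaunay connectivity (hypothetical helper)."""
    centroids = np.asarray(centroids, dtype=float)
    triangulation = Delaunay(centroids)
    edges = set()
    for simplex in triangulation.simplices:
        # Each 2D simplex is a triangle; visit its three sides.
        for i in range(3):
            a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
            # Keep only edges shorter than the connectivity threshold.
            if np.linalg.norm(centroids[a] - centroids[b]) < connectivity_distance:
                edges.add((a, b))
    # Return a 2 x E array, one column per undirected edge.
    return np.array(sorted(edges), dtype=int).T.reshape(2, -1)


# Four nearby centroids plus one (index 4) far from the rest.
toy_centroids = [[0, 0], [100, 0], [0, 100], [120, 110], [10000, 50]]
edge_index = delaunay_edges(toy_centroids)
```

Here the distant centroid takes part in the triangulation, but all of its edges exceed connectivity_distance, so it ends up with no connections.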

Parameters:
  • points (ArrayLike) – A list of (x, y) spatial coordinates, e.g. pixel locations within a WSI.

  • features (ArrayLike) – A list of features associated with each coordinate in points. Must be the same length as points.

  • lambda_d (Number) – Spatial distance (d) weighting.

  • lambda_f (Number) – Feature distance (f) weighting.

  • lambda_h (Number) – Clustering distance threshold. Applied to the similarity kernel (1-fd). Ranges between 0 and 1. Defaults to 0.8. A good value for this parameter will depend on the intra-cluster variance.

  • connectivity_distance (Number) – Spatial distance threshold to consider points as connected during the Delaunay triangulation step.

  • neighbour_search_radius (Number) – Search radius (L2 norm) threshold for points to be considered as similar for clustering. Points with a spatial distance above this are not compared and have their distance set to 1 (most dissimilar).

  • feature_range_thresh (Number) – Minimal range for which a feature is considered significant. Features which have a range less than this are ignored. Defaults to 1e-4. If falsy (None, False, 0, etc.), then no features are removed.
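The effect of feature_range_thresh can be sketched as a simple column filter. The helper name drop_low_range_features is hypothetical, and the sketch assumes that "range" means the maximum minus the minimum of each feature across all points.

```python
import numpy as np


def drop_low_range_features(features, feature_range_thresh=1e-4):
    """Illustrative removal of near-constant features (hypothetical helper)."""
    features = np.asarray(features, dtype=float)
    # A falsy threshold (None, False, 0) disables the filter.
    if not feature_range_thresh:
        return features
    # Range (max - min) of every feature column across all points.
    feature_ranges = np.ptp(features, axis=0)
    # Keep only the columns whose range reaches the threshold.
    return features[:, feature_ranges >= feature_range_thresh]


toy_features = [[1.0, 5.0, 0.0], [1.0, 7.0, 0.00001]]
filtered = drop_low_range_features(toy_features)
unfiltered = drop_low_range_features(toy_features, feature_range_thresh=None)
```

In the toy input only the middle column varies by more than 1e-4, so it is the only one kept by the default threshold.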

Returns:

A dictionary defining a graph for serialisation (e.g. JSON or msgpack) or converting into a torch-geometric Data object where each node is the centroid (mean) of the features in a cluster.

The dictionary has the following entries:

  • numpy.ndarray - x:

    Features of each node (mean of features in a cluster). Required for torch-geometric Data.

  • numpy.ndarray - edge_index:

    Edge index matrix defining connectivity. Required for torch-geometric Data.

  • numpy.ndarray - coordinates:

    Coordinates of each node within the WSI (mean of points in a cluster). Useful for visualisation over the WSI.

Return type:

dict
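Because the values are NumPy arrays, they need converting to plain lists before JSON serialisation. The sketch below shows one way to do a round trip; the toy dictionary uses made-up values and the key names that visualise expects, rather than output from a real call to build.

```python
import json

import numpy as np

# Hypothetical toy graph with three nodes and two edges.
graph_dict = {
    "x": np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
    "edge_index": np.array([[0, 1], [1, 2]]),
    "coordinates": np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]),
}

# NumPy arrays are not JSON serialisable, so convert them to nested lists.
serialised = json.dumps({key: value.tolist() for key, value in graph_dict.items()})

# Loading reverses the conversion.
restored = {key: np.array(value) for key, value in json.loads(serialised).items()}
```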

Example

>>> import numpy as np
>>> from tiatoolbox.tools.graph import SlideGraphConstructor
>>> rng = np.random.default_rng()
>>> points = rng.random((99, 2)) * 1000
>>> features = np.array([
...     rng.random(11) * n
...     for n, _ in enumerate(points)
... ])
>>> graph_dict = SlideGraphConstructor.build(points, features)

classmethod visualise(graph, color=None, node_size=25, edge_color=(0, 0, 0, 0.33), ax=None)

Visualise a graph.

The visualisation is a scatter plot of the graph nodes and the connections between them. By default, nodes are coloured according to the features of the graph via a UMAP embedding to the sRGB color space. This can be customised by passing a color argument which can be a single color, a list of colors, or a function which takes the graph and returns a list of colors for each node. The edge color(s) can be customised in the same way.

Parameters:
  • graph (dict) –

    The graph to visualise as a dictionary with the following entries:

    • numpy.ndarray - x:

      Features of each node (mean of features in a cluster). Required

    • numpy.ndarray - edge_index:

      Edge index matrix defining connectivity. Required

    • numpy.ndarray - coordinates:

      Coordinates of each node within the WSI (mean of points in a cluster). Required

  • color (np.array or str or callable) – Colours of the nodes in the plot. If it is a callable, it should take a graph as input and return a numpy array of matplotlib colours. If None then a default function is used (UMAP on graph["x"]).

  • node_size (int or np.ndarray or callable) – Size of the nodes in the plot. If it is a function then it is called with the graph as an argument.

  • edge_color (str) – Colour of edges in the graph plot.

  • ax (matplotlib.axes.Axes) – The axes to plot the graph on.

Returns:

The axes object which was plotted on.

Return type:

matplotlib.axes.Axes
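As an illustration of the callable form of color, the sketch below maps the first feature of each node onto a blue-to-red RGBA ramp. The function name colour_by_first_feature and the choice of ramp are hypothetical; the only requirement taken from the description above is that the callable accepts the graph and returns an array of matplotlib-compatible colours.

```python
import numpy as np


def colour_by_first_feature(graph):
    """Illustrative colour callable (hypothetical): one RGBA row per node."""
    values = np.asarray(graph["x"], dtype=float)[:, 0]
    span = np.ptp(values)
    # Scale to [0, 1]; fall back to zeros if every node has the same value.
    scaled = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    # Blue for the lowest value, red for the highest, fully opaque.
    return np.stack(
        [scaled, np.zeros_like(scaled), 1.0 - scaled, np.ones_like(scaled)],
        axis=1,
    )


# Toy graph with made-up feature values.
toy_graph = {"x": np.array([[0.0], [5.0], [10.0]])}
colours = colour_by_first_feature(toy_graph)

# Intended usage (not run here):
# SlideGraphConstructor.visualise(graph_dict, color=colour_by_first_feature)
```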

Example

>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from tiatoolbox.tools.graph import SlideGraphConstructor
>>> rng = np.random.default_rng()
>>> points = rng.random((99, 2)) * 1000
>>> features = np.array([
...     rng.random(11) * n
...     for n, _ in enumerate(points)
... ])
>>> graph_dict = SlideGraphConstructor.build(points, features)
>>> fig, ax = plt.subplots()
>>> # wsi is assumed to be an already-opened WSI reader for the slide
>>> slide_dims = wsi.info.slide_dimensions
>>> ax.imshow(wsi.get_thumbnail(), extent=(0, *slide_dims, 0))
>>> SlideGraphConstructor.visualise(graph_dict, ax=ax)
>>> plt.show()