PatchExtractor

class PatchExtractor(input_img, patch_size, input_mask=None, resolution=0, units='level', pad_mode='constant', pad_constant_values=0, within_bound=False)[source]

Class for extracting and merging patches in standard and whole-slide images.

Parameters
  • input_img (str, pathlib.Path, numpy.ndarray) – input image for patch extraction.

  • patch_size (int or tuple(int)) – patch size tuple (width, height).

  • input_mask (str, pathlib.Path, numpy.ndarray, or WSIReader) – input mask used for position filtering when extracting patches, i.e., patches are only extracted from the highlighted regions of input_mask. input_mask can be a path to the mask, a numpy array, a VirtualWSIReader, or one of the ‘otsu’ and ‘morphological’ options. In the case of ‘otsu’ or ‘morphological’, a tissue mask is generated for input_img using the tiatoolbox TissueMasker functionality.

  • resolution (int or float or tuple of float) – resolution at which to read the image, default = 0. Either a single number or a sequence of two numbers for x and y is valid. This value is in terms of the corresponding units. For example, resolution=0.5 and units=“mpp” will read the slide at 0.5 microns per pixel, and resolution=3 and units=“level” will read at pyramid level / resolution layer 3.

  • units (str) – the units of resolution, default = “level”. Supported units are: microns per pixel (mpp), objective power (power), pyramid / resolution level (level). Only pyramid / resolution levels (level) embedded in the whole slide image are supported.

  • pad_mode (str) – Method for padding at edges of the WSI. Default to ‘constant’. See numpy.pad() for more information.

  • pad_constant_values (int or tuple(int)) – Values to use with constant padding. Defaults to 0. See numpy.pad() for more.

  • within_bound (bool) – whether to restrict extraction to patches that lie fully within the input_img bounds. If False, patches extracted at the margins are padded according to pad_mode and pad_constant_values. If True, patches at the margins whose bounds exceed the mother image dimensions are neglected. Default is False.
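
The effect of within_bound=False on a margin patch can be pictured with plain NumPy. This is only a minimal sketch, not the library's implementation: the helper read_patch_padded is hypothetical, and it assumes that pixels outside the image are filled through numpy.pad() using pad_mode and pad_constant_values.

```python
import numpy as np


def read_patch_padded(image, x, y, patch_size, pad_mode="constant", pad_constant_values=0):
    """Hypothetical helper: crop a (width, height) patch at (x, y), padding past the edges.

    Assumes the requested patch overlaps the image at least partially.
    """
    width, height = patch_size
    img_h, img_w = image.shape[:2]
    # Clip the requested box to the part that lies inside the image.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + width, img_w), min(y + height, img_h)
    crop = image[y0:y1, x0:x1]
    # Padding needed on each side: ((top, bottom), (left, right)).
    pad = ((y0 - y, y + height - y1), (x0 - x, x + width - x1))
    if image.ndim == 3:
        pad += ((0, 0),)  # never pad the channel axis
    kwargs = {"constant_values": pad_constant_values} if pad_mode == "constant" else {}
    return np.pad(crop, pad, mode=pad_mode, **kwargs)


image = np.arange(16, dtype=np.uint8).reshape(4, 4)
# A 4x4 patch starting at (2, 2) hangs two pixels over the right and bottom edges.
patch = read_patch_padded(image, x=2, y=2, patch_size=(4, 4))
```

With within_bound=True the same patch would simply be skipped rather than padded.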

wsi

input image for patch extraction of type WSIReader.

Type

WSIReader

patch_size

patch size tuple (width, height).

Type

tuple(int)

resolution

resolution at which to read the image.

Type

tuple(int)

units

the units of resolution.

Type

str

n

current state of the iterator.

Type

int

locations_df

A table containing location and/or type of patches in (x_start, y_start, class) format.

Type

pd.DataFrame

coord_list

An array containing coordinates of patches in (x_start, y_start, x_end, y_end) format to be used for sliding window patch extraction.

Type

numpy.ndarray

pad_mode

Method for padding at edges of the WSI. See numpy.pad() for more information.

Type

str

pad_constant_values

Values to use with constant padding. Defaults to 0. See numpy.pad() for more.

Type

int or tuple(int)

stride

stride in (x, y) direction for patch extraction. Not used for PointsPatchExtractor.

Type

tuple(int)

Methods

filter_coordinates

Indicates which coordinate is valid for mask-based patch extraction.

filter_coordinates_fast

Validate patch extraction coordinates based on the input mask.

get_coordinates

Calculate patch tiling coordinates.

static filter_coordinates(mask_reader, coordinates_list, func=None, resolution=None, units=None)[source]

Indicates which coordinate is valid for mask-based patch extraction. Locations are validated by a custom or built-in func.

Parameters
  • mask_reader (VirtualReader) – a virtual pyramidal reader of the mask related to the WSI from which we want to extract the patches.

  • coordinates_list (ndarray and np.int32) – Coordinates to be checked via the func. They must be in the same resolution as the requested resolution and units. The shape of coordinates_list is (N, K), where N is the number of coordinate sets and K is either 2 for centroids or 4 for bounding boxes. When using the default func=None, K should be 4, as coordinates_list is expected to refer to bounding boxes in [start_x, start_y, end_x, end_y] format.

  • func – The coordinate validator function. A function that takes a reader and a coordinate as arguments and returns True or False to indicate whether the coordinate is valid.

Returns

list of flags to indicate which coordinate is valid.

Return type

ndarray
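
As an illustration of the validator contract described above (a callable that receives the mask source and one coordinate and returns a boolean), here is a minimal sketch. It is not the library's built-in func: min_coverage_validator and filter_boxes are hypothetical names, and a plain 2-D NumPy array stands in for the mask reader so the example stays self-contained.

```python
import numpy as np


def min_coverage_validator(mask, coordinate, threshold=0.5):
    """Hypothetical validator: accept a box when enough of it is foreground.

    `mask` is a plain 2-D array here; with the real API the first argument
    would be the mask reader, whose read calls are not shown.
    """
    start_x, start_y, end_x, end_y = coordinate
    region = mask[start_y:end_y, start_x:end_x]
    return region.size > 0 and float(np.mean(region > 0)) >= threshold


def filter_boxes(mask, coordinates_list, func):
    """Apply `func` to every box and return an array of boolean flags."""
    return np.array([func(mask, coord) for coord in coordinates_list], dtype=bool)


mask = np.zeros((8, 8), dtype=np.uint8)
mask[:, :4] = 1  # left half is foreground
boxes = np.array([[0, 0, 4, 4], [4, 0, 8, 4], [2, 0, 6, 4]], dtype=np.int32)
flags = filter_boxes(mask, boxes, min_coverage_validator)
```

The flags array can then be used to index coordinates_list and keep only the valid boxes, e.g. boxes[flags].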

static filter_coordinates_fast(mask_reader, coordinates_list, coord_resolution, coord_units, mask_resolution=None)[source]

Validate patch extraction coordinates based on the input mask.

This function indicates which coordinate is valid for mask-based patch extraction based on checks in low resolution.

Parameters
  • mask_reader (VirtualReader) – a virtual pyramidal reader of the mask related to the WSI from which we want to extract the patches.

  • coordinates_list (ndarray and np.int32) – Coordinates to be checked against the mask. They must be in the resolution and units given by coord_resolution and coord_units. The shape of coordinates_list is (N, K), where N is the number of coordinate sets and K is either 2 for centroids or 4 for bounding boxes. Bounding boxes are expected in [start_x, start_y, end_x, end_y] format.

  • coord_resolution (float) – the resolution value at which coordinates_list are generated.

  • coord_units (str) – the resolution unit at which coordinates_list are generated.

  • mask_resolution (float) – resolution at which the mask array is extracted. It is expected to be in the same units as coord_resolution, i.e., coord_units. If not provided, a default value is selected based on coord_units.

Returns

list of flags to indicate which coordinate is valid.

Return type

ndarray
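
The idea of checking coordinates in low resolution can be sketched as follows. This is an assumption-laden illustration rather than the library's code: filter_boxes_low_res is a hypothetical helper, the mask is a plain downsampled array, scale is the assumed ratio between mask pixels and coordinate pixels, and coordinates are assumed to be non-negative.

```python
import numpy as np


def filter_boxes_low_res(mask_low, coordinates_list, scale):
    """Hypothetical sketch: flag boxes that touch foreground in a downsampled mask.

    `scale` maps coordinate space to mask space, e.g. 0.25 when the mask is
    four times smaller than the space in which the boxes were generated.
    """
    coords = np.asarray(coordinates_list, dtype=float) * scale
    # Floor the start corner and ceil the end corner so no box collapses to zero area.
    starts = np.floor(coords[:, :2]).astype(int)
    ends = np.ceil(coords[:, 2:]).astype(int)
    flags = np.zeros(len(coords), dtype=bool)
    for i, ((x0, y0), (x1, y1)) in enumerate(zip(starts, ends)):
        flags[i] = mask_low[y0:y1, x0:x1].any()
    return flags


mask_low = np.zeros((4, 4), dtype=np.uint8)
mask_low[0, 0] = 1  # a single foreground pixel in the low-resolution mask
boxes = np.array([[0, 0, 4, 4], [8, 8, 12, 12], [2, 2, 6, 6]], dtype=np.int32)
flags = filter_boxes_low_res(mask_low, boxes, scale=0.25)
```

Reading the mask once at low resolution and slicing it per box avoids one reader call per coordinate, which is the likely reason a "fast" variant exists.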

static get_coordinates(image_shape=None, patch_input_shape=None, patch_output_shape=None, stride_shape=None, input_within_bound=False, output_within_bound=False)[source]

Calculate patch tiling coordinates.

Parameters
  • image_shape (a tuple (int, int) or numpy.ndarray of shape (2,)) – Specifies the shape of the mother image (the image we want to extract patches from) at the requested resolution and units. It is expected to be in (width, height) format.

  • patch_input_shape (a tuple (int, int) or numpy.ndarray of shape (2,)) – Specifies the input shape of requested patches to be extracted from the mother image at the desired resolution and units. This argument is also expected to be in (width, height) format.

  • patch_output_shape (a tuple (int, int) or numpy.ndarray of shape (2,)) – Specifies the output shape of requested patches to be extracted from the mother image at the desired resolution and units. This argument is also expected to be in (width, height) format. If this is not provided, patch_output_shape will be the same as patch_input_shape.

  • stride_shape (a tuple (int, int) or numpy.ndarray of shape (2,)) – The stride that is used to calculate the patch location during patch extraction. If patch_output_shape is provided, the next stride location will be based on the output rather than the input.

  • input_within_bound (bool) – Whether to keep only the patches whose input location lies within the margins of the mother image. If True, patches whose input location exceeds image_shape are neglected. Otherwise, those patches are extracted with the Reader function and appropriate padding.

  • output_within_bound (bool) – Whether to keep only the patches whose output location lies within the margins of the mother image. If True, patches whose output location exceeds image_shape are neglected. Otherwise, those patches are extracted with the Reader function and appropriate padding.

Returns

a list of coordinates in [start_x, start_y, end_x, end_y] format to be used for patch extraction.

Return type

coord_list
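
To make the tiling concrete, the following is a minimal sliding-window sketch in (width, height) order. tile_coordinates is a hypothetical stand-in, not get_coordinates itself: it uses a single patch shape for input and output, and the library's exact handling of partial windows at the margins may differ.

```python
import numpy as np


def tile_coordinates(image_shape, patch_shape, stride_shape, within_bound=False):
    """Hypothetical sketch of sliding-window tiling in (width, height) order."""
    img_w, img_h = image_shape
    patch_w, patch_h = patch_shape
    stride_w, stride_h = stride_shape
    boxes = []
    for y in range(0, img_h, stride_h):
        for x in range(0, img_w, stride_w):
            box = (x, y, x + patch_w, y + patch_h)
            # With within_bound=True, drop boxes that cross the image border.
            if within_bound and (box[2] > img_w or box[3] > img_h):
                continue
            boxes.append(box)
    # Rows are [start_x, start_y, end_x, end_y]; reshape keeps shape (0, 4) when empty.
    return np.array(boxes, dtype=np.int32).reshape(-1, 4)


all_boxes = tile_coordinates((10, 6), (4, 4), (4, 4))
inside = tile_coordinates((10, 6), (4, 4), (4, 4), within_bound=True)
```

For a 10 x 6 image with 4 x 4 patches and a stride of 4, the unrestricted call yields six boxes, while within_bound=True keeps only the two that fit entirely inside the image.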