PatchExtractor
- class PatchExtractor(input_img, patch_size, input_mask=None, resolution=0, units='level', pad_mode='constant', pad_constant_values=0, min_mask_ratio=0, store_filter=None, *, within_bound=False)
Class for extracting and merging patches in standard and whole-slide images.
- Parameters:
input_img (str, Path, numpy.ndarray, WSIReader) – Input image for patch extraction.
patch_size (int or tuple(int)) – Patch size tuple (width, height).
input_mask (str | Path | np.ndarray | wsireader.VirtualWSIReader | AnnotationStore | None) – Input mask used for position filtering when extracting patches, i.e., patches are only extracted from the highlighted regions of input_mask. input_mask can be a path to the mask, a numpy array, a VirtualWSIReader, or one of the ‘otsu’ and ‘morphological’ options. With ‘otsu’ or ‘morphological’, a tissue mask is generated for input_img using the tiatoolbox TissueMasker functionality. May also be an annotation store, in which case the mask is generated from the annotations. All annotations are used by default; the ‘store_filter’ argument can be used to select a subset of annotations for building the mask.
resolution (Resolution) – Resolution at which to read the image, default = 0. Either a single number or a sequence of two numbers (for x and y) is valid. The value is interpreted in terms of the corresponding units. For example, resolution=0.5 and units=“mpp” will read the slide at 0.5 microns per pixel, and resolution=3, units=“level” will read at pyramid level / resolution layer 3.
units (Units) – Units of resolution, default = “level”.
pad_mode (str) – Method for padding at edges of the WSI. Defaults to ‘constant’. See numpy.pad() for more information.
pad_constant_values (int or tuple(int)) – Values to use with constant padding. Defaults to 0. See numpy.pad() for more.
within_bound (bool) – Whether to restrict extraction to patches that lie within the bounds of input_img. If False, patches at the margins are padded according to pad_mode and pad_constant_values. If True, patches at the margins whose bounds would exceed the mother image dimensions are skipped. Default is False.
min_mask_ratio (float) – Minimum proportion of a patch's area that must be positive in the mask for the patch to be included. Defaults to 0.
store_filter (str) – Filter to apply to the annotations when generating the mask. Default is None, which uses all annotations. Only used if the provided mask is an annotation store.
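To make the interaction between within_bound=False, pad_mode='constant' and pad_constant_values more concrete, here is a minimal pure-Python sketch. It is not the tiatoolbox implementation: the helper name read_patch_with_constant_pad and the list-of-rows image are hypothetical and exist only for illustration.

```python
def read_patch_with_constant_pad(image, x, y, width, height, constant=0):
    """Read a (height x width) patch whose top-left corner is (x, y).

    `image` is a 2D list of rows. Pixels that fall outside the image are
    filled with `constant`, mimicking constant padding at the margins.
    This is a hypothetical helper, not part of tiatoolbox.
    """
    img_h = len(image)
    img_w = len(image[0])
    patch = []
    for row in range(y, y + height):
        patch_row = []
        for col in range(x, x + width):
            inside = 0 <= row < img_h and 0 <= col < img_w
            patch_row.append(image[row][col] if inside else constant)
        patch.append(patch_row)
    return patch


# A 3x3 image; a 2x2 patch starting at (2, 2) overhangs the bottom-right corner.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
margin_patch = read_patch_with_constant_pad(image, 2, 2, 2, 2, constant=0)
print(margin_patch)  # [[9, 0], [0, 0]]
```

With within_bound=True, a patch like this one would instead be skipped, so no padding would be needed.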
- resolution
Resolution at which to read the image.
- Type:
Resolution
- units
Units of resolution.
- Type:
Units
- locations_df
A table containing the location and/or type of patches in (x_start, y_start, class) format.
- Type:
pd.DataFrame
- coordinate_list
An array containing the coordinates of patches in (x_start, y_start, x_end, y_end) format, used for sliding-window patch extraction.
- Type:
- pad_mode
Method for padding at edges of the WSI. See numpy.pad() for more information.
- Type:
- pad_constant_values
Values to use with constant padding. Defaults to 0. See numpy.pad() for more.
- stride
Stride in (x, y) direction for patch extraction. Not used for PointsPatchExtractor.
- min_mask_ratio
Only patches whose positive mask area proportion is above this value are included.
- Type:
Initialize PatchExtractor.
Methods
filter_coordinates – Validate patch extraction coordinates based on the input mask.
get_coordinates – Calculate patch tiling coordinates.
- static filter_coordinates(mask_reader, coordinates_list, wsi_shape, min_mask_ratio=0, func=None)
Validate patch extraction coordinates based on the input mask.
This function indicates which coordinates are valid for mask-based patch extraction, based on checks performed at low resolution.
- Parameters:
mask_reader (VirtualWSIReader) – A virtual pyramidal reader of the mask related to the WSI from which we want to extract the patches.
coordinates_list (ndarray of np.int32) – Coordinates to be checked via func. They must be at the requested resolution and units. The shape of coordinates_list is (N, K), where N is the number of coordinate sets and K is either 2 for centroids or 4 for bounding boxes. When using the default func=None, K should be 4, as coordinates_list is expected to contain bounding boxes in [start_x, start_y, end_x, end_y] format.
wsi_shape (tuple(int, int)) – Shape of the WSI in the requested resolution and units.
min_mask_ratio (float) – Only patches with positive area percentage above this value are included. Defaults to 0. Has no effect if func is not None.
func (callable) – Function to be used to validate the coordinates. The function must take a numpy.ndarray of the mask and a numpy.ndarray of the coordinates as input and return a bool indicating whether the coordinate is valid or not. If None, a default function that accepts patches with positive area proportion above min_mask_ratio is used.
- Returns:
List of flags indicating which coordinates are valid.
- Return type:
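
As a rough illustration of the default check described above (func=None), the sketch below flags bounding boxes by the proportion of positive mask pixels they cover. It is a simplified stand-in, not the library's code: keep_patch is a hypothetical name, the mask is a plain 2D list rather than a mask reader, boxes are assumed to start at non-negative coordinates, and treating the threshold as inclusive is an assumption.

```python
def keep_patch(mask, box, min_mask_ratio=0.0):
    """Return True if `box` covers enough positive mask area.

    `mask` is a 2D list of rows and `box` is (start_x, start_y, end_x, end_y).
    Hypothetical helper for illustration; not part of tiatoolbox.
    """
    start_x, start_y, end_x, end_y = box
    positive = 0
    total = 0
    for row in mask[start_y:end_y]:
        for value in row[start_x:end_x]:
            total += 1
            if value > 0:
                positive += 1
    # A patch with no positive pixels is never kept, even when the ratio is 0.
    if total == 0 or positive == 0:
        return False
    # Assumption: the threshold is inclusive.
    return positive / total >= min_mask_ratio


mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (2, 2, 4, 4)]

flags_default = [keep_patch(mask, box) for box in boxes]
flags_half = [keep_patch(mask, box, min_mask_ratio=0.5) for box in boxes]
print(flags_default)  # [False, True, True]
print(flags_half)     # [False, True, False]
```

The third box covers one positive pixel out of four (a ratio of 0.25), so it passes with the default threshold but is rejected when min_mask_ratio=0.5.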
- static get_coordinates(patch_output_shape: None = None, image_shape: tuple[int, int] | ndarray | None = None, patch_input_shape: tuple[int, int] | ndarray | None = None, stride_shape: tuple[int, int] | ndarray | None = None, *, input_within_bound: bool = False, output_within_bound: bool = False) → ndarray
- static get_coordinates(patch_output_shape: tuple[int, int] | ndarray, image_shape: tuple[int, int] | ndarray | None = None, patch_input_shape: tuple[int, int] | ndarray | None = None, stride_shape: tuple[int, int] | ndarray | None = None, *, input_within_bound: bool = False, output_within_bound: bool = False) → tuple[ndarray, ndarray]
Calculate patch tiling coordinates.
- Parameters:
image_shape (tuple (int, int) or numpy.ndarray) – Shape of the mother image (the image we want to extract patches from) at the requested resolution and units, expected in (width, height) format.
patch_input_shape (tuple (int, int) or numpy.ndarray) – Input shape of the patches to be extracted from the mother image at the desired resolution and units, expected in (width, height) format.
patch_output_shape (tuple (int, int) or numpy.ndarray) – Output shape of the patches to be extracted from the mother image at the desired resolution and units, expected in (width, height) format. If not provided, patch_output_shape is the same as patch_input_shape.
stride_shape (tuple (int, int) or numpy.ndarray) – The stride used to calculate patch locations during patch extraction. If patch_output_shape is provided, the next stride location is based on the output rather than the input.
input_within_bound (bool) – Whether to keep only patches whose input location lies within the margins of the mother image. If True, patches whose input location exceeds image_shape are skipped. Otherwise, those patches are extracted with the Reader function and appropriate padding.
output_within_bound (bool) – Whether to keep only patches whose output location lies within the margins of the mother image. If True, patches whose output location exceeds image_shape are skipped. Otherwise, those patches are extracted with the Reader function and appropriate padding.
- Returns:
A list of coordinates in [start_x, start_y, end_x, end_y] format to be used for patch extraction.
- Return type:
coord_list
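
The tiling described above can be sketched as a simple sliding window. The function below is a hypothetical, simplified version (tile_coordinates is not a tiatoolbox name); it handles only input patches, and the library's exact treatment of the last row and column of patches may differ.

```python
def tile_coordinates(image_shape, patch_shape, stride_shape, within_bound=False):
    """Return sliding-window boxes as (start_x, start_y, end_x, end_y).

    All shapes are in (width, height) format. When `within_bound` is True,
    boxes that extend past the image are skipped; otherwise they are kept
    and would need padding when read. Hypothetical helper for illustration.
    """
    image_w, image_h = image_shape
    patch_w, patch_h = patch_shape
    stride_w, stride_h = stride_shape
    coords = []
    for y in range(0, image_h, stride_h):
        for x in range(0, image_w, stride_w):
            end_x, end_y = x + patch_w, y + patch_h
            if within_bound and (end_x > image_w or end_y > image_h):
                continue
            coords.append((x, y, end_x, end_y))
    return coords


# A 5x4 image (width x height) tiled with 2x2 patches and a stride of 2.
all_boxes = tile_coordinates((5, 4), (2, 2), (2, 2))
inner_boxes = tile_coordinates((5, 4), (2, 2), (2, 2), within_bound=True)
print(len(all_boxes))    # 6
print(len(inner_boxes))  # 4
```

Here the two boxes starting at x=4 end at x=6, past the image width of 5, so they are dropped when within_bound=True.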