Read and Visualize a WSI#

Click to open in: [GitHub][Colab]

About this demo#

This demo reads a whole slide image (WSI) using tiatoolbox. We load a sample WSI, gather some key information, and then extract some image patches. We demo our modules wsireader details and slide_info details.

Reading in a WSI#

We load a small WSI, specified with the help of the path variable user_sample_wsi_path (default value None). Unless this None value is changed, the WSI is downloaded from the web, as seen in the code below, and saved with filename given by the variable sample_file_name in the directory given by data_dir. Data generated by the notebook is stored under data_dir, providing rapid local access.

data_dir = "./tmp"
sample_file_name = "sample_wsi_small.svs"

user_sample_wsi_path = None

if user_sample_wsi_path is None:
    sample_wsi_path = "%s/%s" % (data_dir, sample_file_name)
else:
    sample_wsi_path = user_sample_wsi_path
if not os.path.exists(sample_wsi_path):
    os.mkdir(data_dir)
    r = requests.get(
        " https://tiatoolbox.dcs.warwick.ac.uk/sample_wsis/CMU-1-Small-Region.svs"
    )
    with open(sample_wsi_path, "wb") as f:
        f.write(r.content)

Our code shields the user from the incompatible formats produced by different models of scanners from different vendors. The function WSIReader.open has as input a particular WSI, with a particular image format, and outputs an object wsi_reader_v1, whose base class is WSIReader, and whose derived class depends on the image format (see details). The reader wsi_reader_v1 provides important information about the WSI. Member functions obtain pixel- or patch-level information, using format-independent code, as we illustrate below.

wsi_reader_v1 = WSIReader.open(input_img=sample_wsi_path)
print(type(wsi_reader_v1))

<class 'tiatoolbox.wsicore.wsireader.OpenSlideWSIReader'>

First, let’s check the basic WSI information, such as magnification, dimension, etc. (mpp = microns per pixel).

wsi_info = wsi_reader_v1.info.as_dict()
# Print one item per line
print(*list(wsi_info.items()), sep="\n")

('objective_power', 20.0)
('slide_dimensions', (2220, 2967))
('level_count', 1)
('level_dimensions', ((2220, 2967),))
('level_downsamples', [1.0])
('vendor', 'aperio')
('mpp', (0.499, 0.499))
('file_path', PosixPath('tmp/sample_wsi_small.svs'))

Thumbnail#

To see a thumbnail of the WSI, we use the slide_thumbnail method of wsi_reader_v1. We load the thumbnail at $\times1.25$ objective power as follows:

wsi_thumb = wsi_reader_v1.slide_thumbnail(resolution=1.25, units="power")
plt.imshow(wsi_thumb)
plt.axis("off")
plt.show()

../../_images/288b00164df1a981b921aa76fdc89bd1390139fb84c0a78e58ddb3c6e6f5c0a6.png

Reading WSI regions#

i) Read Rect#

We can read a region of the whole slide image at a given location and size using the read_rect method. Location can be stated as (x, y) tuple giving the top left pixel in the baseline (level 0) reference frame, and size as (width, height) tuple giving the desired output image size.

Reads can be performed at different resolutions by supplying a pair of arguments for the resolution and the units of resolution. resolution can be given as integer or float or tuple(float), default = 0. Either a single number or a sequence of two numbers for x and y as tuple are valid. This value is in terms of the corresponding units. For example: resolution=0.5 and units=’’mpp’’ will read the slide at 0.5 microns per-pixel, and resolution=3, units=’’level’’ will read at level at pyramid level / resolution layer 3. Supported units are: microns per pixel (‘’mpp’’), objective power (‘’power’’), pyramid / resolution level (‘’level’’), pixels per baseline pixel (‘’baseline’’), default =’’level’’.

# location coordinates in (x,y)
location = (1000, 1000)

# size of the region in (width, height)
size = (256, 256)

# read the region using wsi reader's read rect at 0.5 mpp
img = wsi_reader_v1.read_rect(
    location,
    size,
    resolution=0.5,
    units="mpp",
)

plt.imshow(img)
plt.axis("off")
plt.show()

../../_images/6c8f938981cb974738ee3a8bdff4335a8279977228360f9a8cafa281fb7c01e0.png

ii) Read bound#

Read a region of the whole slide image within given bounds. Bounds is a tuple of (start_x, start_y, end_x, end_y) i.e., (left, top, right, bottom) of the region in baseline reference frame. However, with coord_space=’’resolution’’, the bound is expected to be at the requested resolution system.

Reads can be performed at different resolutions by supplying a pair of arguments for the resolution and the units of resolution. resolution can be given as integer or float or tuple(float), default = 0. Either a single number or a sequence of two numbers for x and y as tuple are valid. This value is in terms of the corresponding units. For example: resolution=0.5 and units=’’mpp’’ will read the slide at 0.5 microns per-pixel, and resolution=3, units=’’level’’ will read at level at pyramid level / resolution layer 3. Supported units are: microns per pixel (‘’mpp’’), objective power (‘’power’’), pyramid / resolution level (‘’level’’), pixels per baseline pixel (‘’baseline’’), default =’’level’’.

coord_space = ‘’baseline’’ by default, this is a flag to indicate if the input bounds is in the baseline coordinate system (‘’baseline’’) or is in the requested resolution system (‘’resolution’’).

# specify the bounds in terms of rectangle (left, top, right, bottom)
bounds = [1000, 2000, 2000, 3000]

# read the region using wsi reader's read bounds at level 0
img = wsi_reader_v1.read_bounds(bounds, resolution=0, units="level")

plt.imshow(img)
plt.axis("off")
plt.show()

../../_images/5aa932ed68e2a08244f3fbb4779a99b2222e654359eb9d8469e0cf53de88d0a0.png

A Mask#

Over the next several cells, we will use the WSIReader object to help create a collection of patches covering the tissue area, with limited overlapping. We want to be able to visualise these patches at the high resolution of the original WSI, but it is computationally more efficient to work as much as possible with the much smaller low resolution thumbnail.

The first task is to distinguish between tissue and glass (no tissue) in the image. We compute a mask, by which we mean a binary colouring of the pixels to either black=glass or white=tissue. The white area is deliberately made a little larger than tissue area, as this will be appropriate for our task. Our function simple_get_mask binarises the thumbnail on the basis of pixel intensity, using the Otsu method. Morphological operations improve the result.

import math, cv2
from skimage import morphology

# an example function using intensity thresholding and morphological operations to obtain tissue regions
def simple_get_mask(rgb):
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    _, mask = cv2.threshold(
        gray, 0, 255, cv2.THRESH_OTSU
    )  # threshold using the OTSU method
    mask = morphology.remove_small_objects(mask == 0, min_size=100, connectivity=2)
    mask = morphology.remove_small_holes(mask, area_threshold=100)
    mask = morphology.binary_dilation(mask, morphology.disk(5))
    return mask


# plot thumbnail (left) with its tissue mask (right)
wsi_thumb_mask = simple_get_mask(wsi_thumb)
plt.subplot(1, 2, 1)
plt.imshow(wsi_thumb)
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(wsi_thumb_mask, cmap="gray")
plt.axis("off")
plt.show()

../../_images/077020d5299b636556faeba457e0a93ec3535e2584cec1bef3d6025a7bc49d48.png

Extracting patches 1: Superpixels and `SLIC`#

We wish to extract patches from our WSI. We will construct a set of patches covering the tissue region, without too much overlap. This will be achieved using the SLIC algorithm, described here. Our implementation is slic from skimage. This program segments the tissue region into disjoint superpixels. A superpixel is, by definition, an irregularly shaped, connected union of pixels with common traits, the traits being chosen according to the problem. In SLIC, each superpixel consists of pixels that are near each other in the 2d plane of the image, and are also near (or as near as possible, given the various constraints) to each other in 3d colour space. Pixels cluster according to these traits, while, simultaneously, the program aims for a certain number of superpixels (specified by nr_expected_superpixels, one of the arguments to slic). The result is a collection of superpixels, each superpixel having approximately the same area. The numpy array wsi_superpixels_mask, output by slic, has the same shape as wsi_thumb_mask. The $N$ superpixels are numbered $1,2…N-1,N$. Each low resolution pixel in the $i$-th superpixel corresponds to an entry $i$ in the array, and each pixel that is not in any superpixel corresponds to a zero entry.

from scipy.ndimage.measurements import center_of_mass
from skimage.segmentation import slic
from skimage.segmentation import mark_boundaries

lores_mag = 1.25  # thumbnail objective power (lores = low resolution)
hires_mag = 20  # original WSI objective power (hires = high resolution)
hires_patch_size = 128  # 128x128 original WSI pixels
# map patch size at hires to lores
lores_patch_size = int(hires_patch_size / (hires_mag / lores_mag))
nr_expected_superpixels = math.ceil(np.sum(wsi_thumb_mask) / (lores_patch_size**2))

wsi_superpixels_mask = slic(
    wsi_thumb,
    mask=wsi_thumb_mask,
    n_segments=nr_expected_superpixels,
    compactness=1000,
    sigma=1,
)

print(
    "#Actual Patches / #Expected Patches : %d/%d"
    % (np.unique(wsi_superpixels_mask).shape[0], nr_expected_superpixels)
)

#Actual Patches / #Expected Patches : 203/202

In the image below, the position of each superpixel is made clear by marking the boundary in yellow. Each centre of mass (blue dot in the image below) of each superpixel is used as the centre of a new square patch. The algorithm encourages the centres to arrange themselves, where possible, in the pattern of a square grid, usually tilted. The patches are all the same size, and they cover the tissue area. They may overlap, but the overlap is limited. To locate these centres, we need a coordinate system. The algorithm encourages the centres to arrange themselves roughly in the pattern of a square grid, usually tilted. Instead of the familiar (x,y) coordinates, we use (row,column) coordinates because our data is held in arrays. The origin is at the top-left of the image, and the first coordinate increases as one moves down. Up to this point, we have worked at low resolution for computational efficiency. But examination of each patch in detail requires the use of high resolution data, and we change coordinates accordingly.

lores_superpixels_center = center_of_mass(
    wsi_superpixels_mask,
    labels=wsi_superpixels_mask,
    index=np.unique(wsi_superpixels_mask)[1:],
)  # omit zeros--they correspond to pixels not in any superpx
lores_superpixels_center = np.array(
    lores_superpixels_center
)  # coordinates Y,X because 2d-array uses row,col
lores_superpixels_center = lores_superpixels_center.astype(np.int32)
selected_indices = wsi_thumb_mask[
    lores_superpixels_center[:, 0], lores_superpixels_center[:, 1]
]
lores_superpixels_center = lores_superpixels_center[selected_indices]

# show the patches region and their centres of mass
plt.imshow(mark_boundaries(wsi_thumb, wsi_superpixels_mask))
plt.scatter(lores_superpixels_center[:, 1], lores_superpixels_center[:, 0], s=2)
plt.axis("off")
plt.show()

../../_images/4ba0119cb8466a193968d2316b63330ca2cbed5f71de40d5d2f5640eaa05ed29.png

We then convert the centres of each region to the top-left position of the patches at high resolution.

# convert to top left idx at hires_mag level
lores_superpixels_top_left = lores_superpixels_center - (lores_patch_size // 2)
hires_superpixels_top_left = lores_superpixels_top_left * (hires_mag / lores_mag)
hires_superpixels_top_left = hires_superpixels_top_left.astype(np.int32)

We will now load some patches for visualisation.

nr_viz_patches = 16

# for illustration purpose, we only visualise a small number of examples
selected_indices = np.random.randint(
    0, hires_superpixels_top_left.shape[0], size=(4 * nr_viz_patches,)
)
hires_superpixel_top_left = hires_superpixels_top_left[selected_indices]

patch_list = []
for patch_coord in hires_superpixels_top_left:
    patch = wsi_reader_v1.read_region(
        location=patch_coord[::-1], level=0, size=hires_patch_size
    )
    patch_list.append(patch)

# plot the first 16
sub_patches = np.array(patch_list[:16])
sub_patches = np.reshape(sub_patches, (4, 4, hires_patch_size, hires_patch_size, 3))
sub_patches = np.transpose(sub_patches, (0, 2, 1, 3, 4))
sub_patches = np.reshape(sub_patches, (4 * hires_patch_size, 4 * hires_patch_size, 3))
plt.imshow(sub_patches)
plt.axis("off")
plt.show()

../../_images/478473a545a5eaaf71eaaf2ecb76f9d6f838949ff32f9d65f386b876cb61f7c0.png

If you want to extract the entire WSI (including the background), you can use the built-in save_tiles functionality of the WSIReader object.

We start by creating another WSIReader object and then save tiles using the save_tiles function with the keywords tile_objective_value, tile_read_size and output_dir. These correspond to the magnification set for reading patches, their expected width & height and the output destination. The word tile refers to a very large image patch. Here, tiles are read at 20x objective magnification, each of size $1000\times1000$, and will be saved in the data_dir.

# create a file handler
wsi_reader_v2 = WSIReader.open(input_img=sample_wsi_path)
wsi_reader_v2.save_tiles(
    output_dir=data_dir + "/output/",
    tile_objective_value=20,
    tile_read_size=(1000, 1000),
)

Tile0:  start_w:0, end_w:1000, start_h:0, end_h:1000, width:1000, height:1000
Tile1:  start_w:1000, end_w:2000, start_h:0, end_h:1000, width:1000, height:1000
Tile2:  start_w:2000, end_w:2220, start_h:0, end_h:1000, width:220, height:1000
Tile3:  start_w:0, end_w:1000, start_h:1000, end_h:2000, width:1000, height:1000
Tile4:  start_w:1000, end_w:2000, start_h:1000, end_h:2000, width:1000, height:1000
Tile5:  start_w:2000, end_w:2220, start_h:1000, end_h:2000, width:220, height:1000
Tile6:  start_w:0, end_w:1000, start_h:2000, end_h:2967, width:1000, height:967
Tile7:  start_w:1000, end_w:2000, start_h:2000, end_h:2967, width:1000, height:967
Tile8:  start_w:2000, end_w:2220, start_h:2000, end_h:2967, width:220, height:967

We check the content of the output folder and visualize some tiles. The extracted tiles are saved in the directory {data_dir}/output/{sample_file_name}. The folder contains Output.csv summarizing the extracted tiles.

import pandas as pd

tile_summary = pd.read_csv(
    "{data_dir}/output/{sample_file_name}/Output.csv".format(
        data_dir=data_dir, sample_file_name=sample_file_name
    )
)
print(tile_summary)

   iter              Tile_Name  start_w  end_w  start_h  end_h  size_w  size_h
   0        Tile_20_0_0.jpg        0   1000        0   1000    1000    1000
   1     Tile_20_1000_0.jpg     1000   2000        0   1000    1000    1000
   2     Tile_20_2000_0.jpg     2000   2220        0   1000    1000     220
   3     Tile_20_0_1000.jpg        0   1000     1000   2000    1000    1000
   4  Tile_20_1000_1000.jpg     1000   2000     1000   2000    1000    1000
   5  Tile_20_2000_1000.jpg     2000   2220     1000   2000    1000     220
   6     Tile_20_0_2000.jpg        0   1000     2000   2967     967    1000
   7  Tile_20_1000_2000.jpg     1000   2000     2000   2967     967    1000
   8  Tile_20_2000_2000.jpg     2000   2220     2000   2967     967     220

We plot Tile_20_1000_1000.jpg as an example.

sample_tile = cv2.imread(
    "%s/output/%s/%s" % (data_dir, sample_file_name, tile_summary.iloc[4]["Tile_Name"])
)
sample_tile = cv2.cvtColor(sample_tile, cv2.COLOR_BGR2RGB)

plt.imshow(sample_tile)
plt.axis("off")
plt.show()

../../_images/d368c6b91637d42c49b31111e5b613b045f2d72899928f72fdeb0ffef0f5782e.png

If you think that patch extraction is too complex a task, don’t worry. There are some powerful tools in the Tiatoolbox that can help you automatically extract patches from WSIs at your desired resolution, size, and stride with only one line of code. Please refer to 04_example_patchextraction.ipynb example notebook for more information.

jp2 format#

The base class WSIReader also contains a derived class for WSIs in jp2 format (see above for more details).We will briefly illustrate this by downloading a jp2 WSI from the internet and visualizing its thumbnail.

sample_wsi_path = "%s/sample_jp2.jp2" % data_dir
if not os.path.exists(sample_wsi_path):
    r = requests.get("https://tiatoolbox.dcs.warwick.ac.uk/sample_wsis/test1.jp2")
    with open(sample_wsi_path, "wb") as f:
        f.write(r.content)

wsi_reader_v3 = WSIReader.open(input_img=sample_wsi_path)
print(type(wsi_reader_v3))
wsi_thumb = wsi_reader_v3.slide_thumbnail(resolution=1.25, units="power")
plt.imshow(wsi_thumb)
plt.axis("off")
plt.show()

<class 'tiatoolbox.wsicore.wsireader.OmnyxJP2WSIReader'>

../../_images/804ad042dc92408a0dba883f6417dd84dc066f9fba2c7936977a2a8f578dd3fb.png