AnnotationStore

class AnnotationStore(*args, **kwargs)[source]

Annotation store abstract base class.

Return an instance of a subclass of AnnotationStore.

Methods

add_from_geojson

Add annotations from a .geojson file to an existing store.

append

Insert a new annotation, returning the key.

append_many

Bulk append of annotations.

bquery

Query the store for annotation bounding boxes.

clear

Remove all annotations from the store.

commit

Commit any in-memory changes to disk.

deserialize_geometry

Deserialize a geometry from a string or bytes.

dump

Serialise a copy of the whole store to a file-like object.

dumps

Serialise and return a copy of store as a string or bytes.

features

Return annotations as a list of geoJSON features.

from_dataframe

Converts to AnnotationStore from pandas.DataFrame.

from_geojson

Create a new database with annotations loaded from a geoJSON file.

from_ndjson

Load annotations from NDJSON.

iquery

Query the store for annotation keys.

keys

Return an iterable (usually generator) of all keys in the store.

nquery

Query for annotations within a distance of another annotation.

open

Load a store object from a path or file-like object.

patch

Patch an annotation at given key.

patch_many

Bulk patch of annotations.

pquery

Query the store for annotation properties.

query

Query the store for annotations.

remove

Remove annotation from the store with its unique key.

remove_many

Bulk removal of annotations by keys.

serialise_geometry

Serialise a geometry to a string or bytes.

setdefault

Return the value of the annotation with the given key.

to_dataframe

Converts AnnotationStore to pandas.DataFrame.

to_geodict

Return annotations as a dictionary in geoJSON format.

to_geojson

Serialise the store to geoJSON.

to_ndjson

Serialise to New Line Delimited JSON.

transform

Transform all annotations in the store using provided function.

values

Return an iterable of all annotations in the store.

Parameters:
Return type:

ABC

add_from_geojson(fp, scale_factor=(1, 1), origin=(0, 0), transform=None)[source]

Add annotations from a .geojson file to an existing store.

Make the best effort to create valid shapely geometries from provided contours.

Parameters:
  • fp (Union[IO, str, Path]) – The file path or handle to load from.

  • scale_factor (Tuple[float, float]) – The scale factor in each dimension to use when loading the annotations. All coordinates will be multiplied by this factor to allow import of annotations saved at non-baseline resolution.

  • origin (Tuple[float, float]) – The x and y coordinates to use as the origin for the annotations.

  • transform (Callable) – A function to apply to each annotation after loading. Should take an annotation as input and return an annotation. Defaults to None. Intended to facilitate modifying the way annotations are loaded to accommodate the specifics of different annotation formats.

  • self (AnnotationStore)

Return type:

None

append(annotation, key=None)[source]

Insert a new annotation, returning the key.

Parameters:
  • annotation (Annotation) – The shapely annotation to insert.

  • key (str) – Optional. The unique key used to identify the annotation in the store. If not given, a new UUID4 will be generated and returned instead.

  • self (AnnotationStore)

Returns:

The unique key of the newly inserted annotation.

Return type:

str

append_many(annotations, keys=None)[source]

Bulk append of annotations.

This may be more performant than repeated calls to append.

Parameters:
  • annotations (iter(Annotation)) – An iterable of annotations.

  • keys (iter(str)) – An iterable of unique keys associated with each geometry being inserted. If None, a new UUID4 is generated for each geometry.

  • self (AnnotationStore)

Returns:

A list of unique keys for the inserted geometries.

Return type:

list(str)

bquery(geometry=None, where=None)[source]

Query the store for annotation bounding boxes.

Acts similarly to AnnotationStore.query except that it checks for intersection between the bounding boxes of the stored and query geometries. This may be faster than a regular query in some cases, e.g. for SQLiteStore with a large number of annotations.

Note that this method only checks for bounding box intersection and therefore may give a different result to using AnnotationStore.query with a box polygon and the “intersects” geometry predicate. Also note that geometry predicates are not supported for this method.
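The difference between a bounding box test and a full geometry test can be seen in a minimal pure-Python sketch. It does not use tiatoolbox or Shapely; the helper names bounds_of and bounds_intersect are hypothetical and exist only to illustrate the idea.

```python
def bounds_of(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def bounds_intersect(a, b):
    """Return True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# A right triangle whose hypotenuse lies on the line x + y = 10.
triangle = [(0, 0), (10, 0), (0, 10)]
# A small query box in the empty corner of the triangle's bounding box.
query_bounds = (8, 8, 9, 9)
query_corners = [(8, 8), (9, 8), (9, 9), (8, 9)]

# Bounding box test: the triangle's box (0, 0, 10, 10) contains the query box.
bbox_hit = bounds_intersect(bounds_of(triangle), query_bounds)

# Exact test for this specific case: every corner of the query box lies
# strictly beyond the hypotenuse, so the box cannot touch the triangle.
geometry_hit = not all(x + y > 10 for x, y in query_corners)
```

Here the bounding boxes intersect while the shapes themselves do not, which is the kind of case where a bounding box query and an "intersects" query can disagree.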

Parameters:
  • geometry (Geometry or Iterable) – Geometry to use when querying. This can be a bounds (iterable of length 4) or a Shapely geometry (e.g. Polygon). If a geometry is provided, the bounds of the geometry will be used for the query. Full geometry intersection is not used for the query method.

  • where (str or bytes or Callable) – A statement which should evaluate to a boolean value. Only annotations for which this predicate is true will be returned. Defaults to None (assume always true). This may be a string, Callable, or pickled function as bytes. Callables are called to filter each result returned from the annotation store backend in Python before being returned to the user. A pickle object is, where possible, hooked into the backend as a user defined function to filter results during the backend query. Strings are expected to be in a domain specific language (DSL) and are converted to SQL on a best-effort basis. For supported operators of the DSL see tiatoolbox.annotation.dsl. E.g. a simple Python expression props["class"] == 42 will be converted to a valid SQLite predicate when using SQLiteStore and inserted into the SQL query. This should be faster than filtering in Python after or during the query. Additionally, the same string can be used across different backends (e.g. the previous example predicate string is valid for both DictionaryStore and SQLiteStore). On the other hand, it has many more limitations. It is important to note that untrusted user input should never be accepted to this argument, as arbitrary code can be run via pickle or the parsing of the string statement.

  • self (AnnotationStore)

Returns:

A list of bounding boxes for each Annotation.

Return type:

list

Example

>>> from tiatoolbox.annotation.storage import Annotation, DictionaryStore
>>> from shapely.geometry import Polygon
>>> store = DictionaryStore()
>>> store.append(
...     Annotation(
...         geometry=Polygon.from_bounds(0, 0, 1, 1),
...         properties={"class": 42},
...     ),
...     key="foo",
... )
>>> store.bquery(where="props['class'] == 42")
{'foo': (0.0, 0.0, 1.0, 1.0)}

clear()[source]

Remove all annotations from the store.

This is a naive implementation: it simply iterates over all annotations and removes them. Faster implementations may be possible in specific cases and may be implemented by subclasses.

Parameters:

self (AnnotationStore)

Return type:

None

abstract commit()[source]

Commit any in-memory changes to disk.

Parameters:

self (AnnotationStore)

Return type:

None

static deserialize_geometry(data)[source]

Deserialize a geometry from a string or bytes.

This default implementation will deserialize bytes as well-known binary (WKB) and strings as well-known text (WKT). This can be overridden to deserialize other formats such as geoJSON etc.

Parameters:

data (bytes or str) – The serialised representation of a Shapely geometry.

Returns:

The deserialized Shapely geometry.

Return type:

Geometry

abstract dump(fp)[source]

Serialise a copy of the whole store to a file-like object.

Parameters:
  • fp (Path or str or IO) – A file path or file handle object for output to disk.

  • self (AnnotationStore)

Return type:

None

abstract dumps()[source]

Serialise and return a copy of store as a string or bytes.

Returns:

The serialised store.

Return type:

str or bytes

Parameters:

self (AnnotationStore)

features()[source]

Return annotations as a list of geoJSON features.

Returns:

List of features as dictionaries.

Return type:

list

Parameters:

self (AnnotationStore)

classmethod from_dataframe(df)[source]

Converts to AnnotationStore from pandas.DataFrame.

Parameters:

df (DataFrame)

Return type:

AnnotationStore

classmethod from_geojson(fp, scale_factor=(1, 1), origin=(0, 0), transform=None)[source]

Create a new database with annotations loaded from a geoJSON file.

Parameters:
  • fp (Union[IO, str, Path]) – The file path or handle to load from.

  • scale_factor (Tuple[float, float]) – The scale factor in each dimension to use when loading the annotations. All coordinates will be multiplied by this factor to allow import of annotations saved at non-baseline resolution.

  • origin (Tuple[float, float]) – The x and y coordinates to use as the origin for the annotations.

  • transform (Callable) – A function to apply to each annotation after loading. Should take an annotation as input and return an annotation. Defaults to None. Intended to facilitate modifying the way annotations are loaded to accommodate the specifics of different annotation formats.

Returns:

A new annotation store with the annotations loaded from the file.

Return type:

AnnotationStore

Example

To load annotations from a GeoJSON exported by QuPath, with measurements stored in a 'measurements' property as a list of name-value pairs, and unpack those measurements into a flat dictionary of properties of each annotation:

>>> from tiatoolbox.annotation.storage import Annotation, SQLiteStore
>>> def unpack_qupath(ann: Annotation) -> Annotation:
...     # Helper function to unpack QuPath measurements.
...     props = ann.properties
...     measurements = props.pop("measurements")
...     for m in measurements:
...         props[m["name"]] = m["value"]
...     return ann
>>> store = SQLiteStore.from_geojson(
...     "exported_file.geojson",
...     transform=unpack_qupath,
... )

classmethod from_ndjson(fp)[source]

Load annotations from NDJSON.

Expects each line to be a JSON object with the following format:

{
     "key": "...",
     "geometry": {
         "type": "...",
         "coordinates": [...]
     },
     "properties": {
         "...": "..."
     }
}

That is a geoJSON object with an additional key field. If this key field is missing, then a new UUID4 key will be generated for this annotation.
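As a rough illustration of the line format described above, the following standard-library sketch parses NDJSON text and generates a UUID4 key when the key field is missing. It is not the tiatoolbox implementation; the function name read_ndjson and the plain-dict result are assumptions made for the example.

```python
import io
import json
import uuid


def read_ndjson(fp):
    """Parse NDJSON lines into a dict mapping key -> (geometry, properties)."""
    records = {}
    for line in fp:
        line = line.strip()
        if not line:
            continue  # Skip blank lines.
        obj = json.loads(line)
        # Generate a UUID4 key when the optional "key" field is absent.
        key = obj["key"] if "key" in obj else str(uuid.uuid4())
        records[key] = (obj["geometry"], obj.get("properties", {}))
    return records


sample = io.StringIO(
    '{"key": "foo", "geometry": {"type": "Point", "coordinates": [0, 0]}, '
    '"properties": {"class": 42}}\n'
    '{"geometry": {"type": "Point", "coordinates": [1, 1]}, '
    '"properties": {"class": 123}}\n'
)
records = read_ndjson(sample)
```

The second line of the sample has no key field, so it is stored under a freshly generated UUID4 string.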

Parameters:

fp (IO) – A file-like object supporting .read.

Returns:

The loaded annotations.

Return type:

AnnotationStore

iquery(geometry, where=None, geometry_predicate='intersects')[source]

Query the store for annotation keys.

Acts the same as AnnotationStore.query except returns keys instead of annotations.

Parameters:
  • geometry (Geometry or Iterable) – Geometry to use when querying. This can be a bounds (iterable of length 4) or a Shapely geometry (e.g. Polygon).

  • where (str or bytes or Callable) – A statement which should evaluate to a boolean value. Only annotations for which this predicate is true will be returned. Defaults to None (assume always true). This may be a string, Callable, or pickled function as bytes. Callables are called to filter each result returned from the annotation store backend in Python before being returned to the user. A pickle object is, where possible, hooked into the backend as a user defined function to filter results during the backend query. Strings are expected to be in a domain specific language (DSL) and are converted to SQL on a best-effort basis. For supported operators of the DSL see tiatoolbox.annotation.dsl. E.g. a simple Python expression props["class"] == 42 will be converted to a valid SQLite predicate when using SQLiteStore and inserted into the SQL query. This should be faster than filtering in Python after or during the query. Additionally, the same string can be used across different backends (e.g. the previous example predicate string is valid for both DictionaryStore and SQLiteStore). On the other hand, it has many more limitations. It is important to note that untrusted user input should never be accepted to this argument, as arbitrary code can be run via pickle or the parsing of the string statement.

  • geometry_predicate (str) – A string defining which binary geometry predicate to use when comparing the query geometry and a geometry in the store. Only annotations for which this binary predicate is true will be returned. Defaults to "intersects". For more information see the shapely documentation on binary predicates.

  • self (AnnotationStore)

Returns:

A list of keys for each Annotation.

Return type:

list

keys()[source]

Return an iterable (usually generator) of all keys in the store.

Returns:

An iterable of keys.

Return type:

Iterable[str]

Parameters:

self (AnnotationStore)

nquery(geometry=None, where=None, n_where=None, distance=5.0, geometry_predicate='intersects', mode='poly-poly')[source]

Query for annotations within a distance of another annotation.

Parameters:
  • geometry (Geometry) – A geometry to use to query for the initial set of annotations to perform a neighbourhood search around. If None, all annotations in the store are considered. Defaults to None.

  • where (str or bytes or Callable) – A statement which should evaluate to a boolean value. Only annotations for which this predicate is true will be returned. Defaults to None (assume always true). This may be a string, Callable, or pickled function as bytes. Callables are called to filter each result returned from the annotation store backend in Python before being returned to the user. A pickle object is, where possible, hooked into the backend as a user defined function to filter results during the backend query. Strings are expected to be in a domain specific language (DSL) and are converted to SQL on a best-effort basis. For supported operators of the DSL see tiatoolbox.annotation.dsl. E.g. a simple Python expression props["class"] == 42 will be converted to a valid SQLite predicate when using SQLiteStore and inserted into the SQL query. This should be faster than filtering in Python after or during the query. It is important to note that untrusted user input should never be accepted to this argument, as arbitrary code can be run via pickle or the parsing of the string statement.

  • n_where (str or bytes or Callable) – Predicate to filter the nearest annotations by. Defaults to None (assume always true). See where for more details.

  • distance (float) – The distance to search for annotations within. Defaults to 5.0.

  • geometry_predicate (str) – The predicate to use when comparing geometries. Defaults to “intersects”. Other options include “within” and “contains”. Ignored if mode is “boxpoint-boxpoint” or “box-box”.

  • mode (tuple[str, str] or str) –

    The method to use for determining distance during the query. Defaults to "poly-poly", as given in the signature. This may significantly change performance depending on the backend. Possible options are:

    • "poly-poly": Polygon boundary to polygon boundary.

    • "boxpoint-boxpoint": Bounding box centre point to bounding box centre point.

    • "box-box": Bounding box to bounding box.

    May be specified as a dash separated string or a tuple of two strings. The first string is the mode for the query geometry and the second string is the mode for the nearest annotation geometry.

  • self (AnnotationStore)

Returns:

A dictionary mapping annotation keys to another dictionary which represents an annotation key and all annotations within distance of it.

Return type:

Dict[str, Dict[str, Annotation]]

The mode argument is used to determine how to calculate the distance between annotations. The default mode in the signature is "poly-poly".

The “box-box” mode uses the bounding boxes of stored annotations and the query geometry when determining if annotations are within the neighbourhood.

"box-box" mode

The "poly-poly" mode performs full polygon-polygon intersection with the polygon boundary of stored annotations and the query geometry to determine if annotations are within the neighbourhood.

"poly-poly" mode

The “boxpoint-boxpoint” mode uses the centre point of the bounding box of stored annotations and the query geometry when determining if annotations are within the neighbourhood.
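To make the difference between the two bounding-box-based modes concrete, here is a small standard-library sketch of the two distance measures. It is an illustration only, not the backend implementation; the helper names box_gap and centre_distance, and the use of <= for "within", are assumptions.

```python
import math


def box_gap(a, b):
    """Shortest distance between two (min_x, min_y, max_x, max_y) boxes."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return math.hypot(dx, dy)


def centre_distance(a, b):
    """Distance between the centre points of two bounding boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(bx - ax, by - ay)


box_a = (0.0, 0.0, 1.0, 1.0)
box_b = (3.0, 0.0, 4.0, 1.0)
distance = 2.5

# "box-box" style: the gap between the boxes is 2.0, so they are neighbours.
box_box_neighbours = box_gap(box_a, box_b) <= distance
# "boxpoint-boxpoint" style: the centres are 3.0 apart, so they are not.
boxpoint_neighbours = centre_distance(box_a, box_b) <= distance
```

The same pair of annotations can therefore be neighbours under one mode and not under another, so the mode should be chosen to match what "distance" means for the data.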

"boxpoint-boxpoint" mode

Examples

Example bounding box query with one neighbour within a distance of 2.0.

>>> from shapely.geometry import Point, Polygon
>>> from tiatoolbox.annotation.storage import Annotation, SQLiteStore
>>> store = SQLiteStore()
>>> annotation = Annotation(Point(0, 0), {"class": 42})
>>> store.append(annotation, "foo")
>>> neighbour = Annotation(Point(1, 1), {"class": 123})
>>> store.append(neighbour, "bar")
>>> store.nquery((-.5, -.5, .5, .5), distance=2.0)
{
  "foo": {
    Annotation(POINT (0 0), {'class': 42}): {
      "bar": Annotation(POINT (1 1), {'class': 123}),
    }
  },
}

Example bounding box query with no neighbours within a distance of 1.0.

>>> from shapely.geometry import Point
>>> from tiatoolbox.annotation.storage import Annotation, SQLiteStore
>>> store = SQLiteStore()
>>> annotation = Annotation(Point(0, 0), {"class": 42})
>>> store.append(annotation, "foo")
>>> store.nquery((-.5, -.5, .5, .5), distance=1.0)
{"foo": {Annotation(POINT (0 0), {'class': 42}): {}}}

Example of querying for TILs - lymphocytes within 32 units of tumour cells.

>>> from tiatoolbox.annotation.storage import SQLiteStore
>>> store = SQLiteStore("hovernet-pannuke-output.db")
>>> tils = store.nquery(
...     where="props['class'] == 1",   # Tumour cells
...     n_where="props['class'] == 0",  # Lymphocytes
...     distance=32.0,  # n_where within 32 units of where
...     mode="boxpoint-boxpoint",  # Use bounding box centre point distance
... )

abstract classmethod open(fp)[source]

Load a store object from a path or file-like object.

Parameters:

fp (Path or str or IO) – The file path or file handle.

Returns:

An instance of an annotation store.

Return type:

AnnotationStoreABC

patch(key, geometry=None, properties=None)[source]

Patch an annotation at given key.

Partial update of an annotation. Providing only a geometry will update the geometry and leave properties unchanged. Providing a properties dictionary applies a patch operation to the properties, updating only the properties which are given and leaving the rest unchanged. To completely replace an annotation use __setitem__.
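The patch semantics can be sketched with a plain dictionary standing in for a stored annotation. This is a simplified illustration under that assumption; patch_record is a hypothetical helper, not part of the library.

```python
def patch_record(record, geometry=None, properties=None):
    """Partially update a {"geometry": ..., "properties": {...}} record in place."""
    if geometry is not None:
        # Replace the geometry; properties are left untouched.
        record["geometry"] = geometry
    if properties is not None:
        # Merge the given properties; keys not mentioned keep their old values.
        record["properties"].update(properties)


record = {"geometry": "POINT (0 0)", "properties": {"class": 42, "score": 0.5}}
patch_record(record, properties={"score": 0.9})  # Only "score" changes.
patch_record(record, geometry="POINT (1 1)")  # Only the geometry changes.
```

After both calls the record keeps its original "class" value while "score" and the geometry hold the patched values.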

Parameters:
  • key (str) – The key of the annotation to update.

  • geometry (Geometry) – The new geometry. If None, the geometry is not updated.

  • properties (dict) – A dictionary of properties to patch and their new values. If None, the existing properties are not altered.

  • self (AnnotationStore)

Return type:

None

patch_many(keys, geometries=None, properties_iter=None)[source]

Bulk patch of annotations.

This may be more efficient than calling patch repeatedly in a loop.

Parameters:
  • geometries (iter(Geometry)) – An iterable of geometries to update.

  • properties_iter (iter(dict)) – An iterable of properties to update.

  • keys (iter(str)) – An iterable of keys for each annotation to be updated.

  • self (AnnotationStore)

Return type:

None

pquery(select, geometry=None, where=None, *, unique=True, squeeze=True)[source]

Query the store for annotation properties.

Acts similarly to AnnotationStore.query but returns only the value defined by select.

Parameters:
  • select (str or bytes or Callable) – A statement defining the value to look up from the annotation properties. If select = “*”, all properties are returned for each annotation (unique must be False).

  • geometry (Geometry or Iterable) – Geometry to use when querying. This can be a bounds (iterable of length 4) or a Shapely geometry (e.g. Polygon). If a geometry is provided, the bounds of the geometry will be used for the query. Full geometry intersection is not used for the query method.

  • where (str or bytes or Callable) – A statement which should evaluate to a boolean value. Only annotations for which this predicate is true will be returned. Defaults to None (assume always true). This may be a string, Callable, or pickled function as bytes. Callables are called to filter each result returned from the annotation store backend in python before being returned to the user. A pickle object is, where possible, hooked into the backend as a user defined function to filter results during the backend query. Strings are expected to be in a domain specific language and are converted to SQL on a best-effort basis. For supported operators of the DSL see tiatoolbox.annotation.dsl. E.g. a simple python expression props[“class”] == 42 will be converted to a valid SQLite predicate when using SQLiteStore and inserted into the SQL query. This should be faster than filtering in python after or during the query. It is important to note that untrusted user input should never be accepted to this argument as arbitrary code can be run via pickle or the parsing of the string statement.

  • unique (bool) – If True, only unique values for each selected property will be returned as a list of sets. If False, all values will be returned as a dictionary mapping keys to values. Defaults to True.

  • squeeze (bool) – If True, when querying for a single value with unique=True, the result will be a single set instead of a list of sets.

  • self (AnnotationStore)
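The interaction of unique and squeeze described above can be sketched for a single selected property, using a plain dictionary of key to properties in place of a real store. The function select_property is hypothetical and only mirrors the documented return shapes.

```python
def select_property(store, name, unique=True, squeeze=True):
    """Look up one property name across a dict of key -> properties."""
    if not unique:
        # All values, keyed by annotation key.
        return {key: props.get(name) for key, props in store.items()}
    values = {props[name] for props in store.values() if name in props}
    # One selected value gives a list holding one set, or just the set if squeezed.
    return values if squeeze else [values]


store = {
    "foo": {"class": 42},
    "bar": {"class": 123},
    "baz": {"class": 42},
}
unique_squeezed = select_property(store, "class")
unique_listed = select_property(store, "class", squeeze=False)
all_values = select_property(store, "class", unique=False)
```

With unique=True the duplicate value 42 collapses into a single set entry, while unique=False keeps one value per annotation key.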

Return type:

dict[str, object] | set[object]

Examples

>>> from tiatoolbox.annotation.storage import Annotation, DictionaryStore
>>> from shapely.geometry import Point
>>> store = DictionaryStore()
>>> annotation = Annotation(
...     geometry=Point(0, 0),
...     properties={"class": 42},
... )
>>> store.append(annotation, "foo")
>>> store.pquery("*", unique=False)
{'foo': {'class': 42}}

>>> from tiatoolbox.annotation.storage import Annotation, DictionaryStore
>>> from shapely.geometry import Point
>>> store = DictionaryStore()
>>> annotation = Annotation(
...     geometry=Point(0, 0),
...     properties={"class": 42},
... )
>>> store.append(annotation, "foo")
>>> store.pquery("props['class']")
{42}
>>> annotation = Annotation(Point(1, 1), {"class": 123})
>>> store.append(annotation, "bar")
>>> store.pquery("props['class']")
{42, 123}

query(geometry=None, where=None, geometry_predicate='intersects', min_area=None, distance=0)[source]

Query the store for annotations.

Parameters:
  • geometry (Geometry or Iterable) – Geometry to use when querying. This can be a bounds (iterable of length 4) or a Shapely geometry (e.g. Polygon).

  • where (str or bytes or Callable) – A statement which should evaluate to a boolean value. Only annotations for which this predicate is true will be returned. Defaults to None (assume always true). This may be a string, Callable, or pickled function as bytes. Callables are called to filter each result returned from the annotation store backend in Python before being returned to the user. A pickle object is, where possible, hooked into the backend as a user defined function to filter results during the backend query. Strings are expected to be in a domain specific language (DSL) and are converted to SQL on a best-effort basis. For supported operators of the DSL see tiatoolbox.annotation.dsl. E.g. a simple Python expression props["class"] == 42 will be converted to a valid SQLite predicate when using SQLiteStore and inserted into the SQL query. This should be faster than filtering in Python after or during the query. Additionally, the same string can be used across different backends (e.g. the previous example predicate string is valid for both DictionaryStore and SQLiteStore). On the other hand, it has many more limitations. It is important to note that untrusted user input should never be accepted to this argument, as arbitrary code can be run via pickle or the parsing of the string statement.

  • geometry_predicate (str) –

    A string defining which binary geometry predicate to use when comparing the query geometry and a geometry in the store. Only annotations for which this binary predicate is true will be returned. Defaults to “intersects”. For more information see the shapely documentation on binary predicates.

  • min_area (float) – Minimum area of the annotation geometry. Only annotations with an area greater than or equal to this value will be returned. Defaults to None (no min).

  • distance (float) – Distance used when performing a distance based query. E.g. “centers_within_k” geometry predicate.

  • self (AnnotationStore)

Returns:

A list of Annotation objects.

Return type:

list

remove(key)[source]

Remove annotation from the store with its unique key.

Parameters:
Return type:

None

remove_many(keys)[source]

Bulk removal of annotations by keys.

Parameters:
  • keys (iter(str)) – An iterable of keys for the annotations to be removed.

  • self (AnnotationStore)

Return type:

None

static serialise_geometry(geometry)[source]

Serialise a geometry to a string or bytes.

This defaults to well-known text (WKT) but may be overridden to any other format which a Shapely geometry could be serialised to e.g. well-known binary (WKB) or geoJSON etc.

Parameters:

geometry (Geometry) – The Shapely geometry to be serialised.

Returns:

The serialised geometry.

Return type:

bytes or str

setdefault(key, default=None)[source]

Return the value of the annotation with the given key.

If the key does not exist, insert the default value and return it.

Parameters:
  • key (str) – The key of the annotation to be fetched.

  • default (Annotation) – The value to return if the key is not found.

  • self (AnnotationStore)

Returns:

The annotation or default if the key is not found.

Return type:

Annotation

to_dataframe()[source]

Converts AnnotationStore to pandas.DataFrame.

Parameters:

self (AnnotationStore)

Return type:

DataFrame

to_geodict()[source]

Return annotations as a dictionary in geoJSON format.

Returns:

Dictionary of annotations in geoJSON format.

Return type:

dict

Parameters:

self (AnnotationStore)

to_geojson(fp=None)[source]

Serialise the store to geoJSON.

For more information on the geoJSON format see:

  • https://geojson.org/

  • https://tools.ietf.org/html/rfc7946

Parameters:
  • fp (IO) – A file-like object supporting .write. Defaults to None which returns geoJSON as a string.

  • self (AnnotationStore)

Returns:

None if writing to file or the geoJSON string if fp is None.

Return type:

Optional[str]

to_ndjson(fp=None)[source]

Serialise to New Line Delimited JSON.

Each line contains a JSON object with the following format:

{
     "key": "...",
     "geometry": {
         "type": "...",
         "coordinates": [...]
     },
     "properties": {
         "...": "..."
     }
}

That is a geoJSON object with an additional key field.
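A minimal standard-library sketch of producing this line format is shown below. It assumes records held as a plain dict of key to (geometry, properties) and a trailing newline after the last line; both are choices made for the example, and to_ndjson_lines is a hypothetical helper rather than the library's code.

```python
import json


def to_ndjson_lines(records):
    """Serialise key -> (geometry, properties) records, one JSON object per line."""
    lines = []
    for key, (geometry, properties) in records.items():
        obj = {"key": key, "geometry": geometry, "properties": properties}
        lines.append(json.dumps(obj))
    return "\n".join(lines) + "\n"


records = {
    "foo": ({"type": "Point", "coordinates": [0, 0]}, {"class": 42}),
    "bar": ({"type": "Point", "coordinates": [1, 1]}, {"class": 123}),
}
text = to_ndjson_lines(records)
```

Each line of text can be parsed independently with json.loads, which is what makes the format convenient for streaming large stores.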

For more information on the NDJSON format see:

  • ndjson Specification: http://ndjson.org

  • JSON Lines Documentation: https://jsonlines.org

  • Streaming JSON: https://w.wiki/4Qan

  • GeoJSON RFC: https://tools.ietf.org/html/rfc7946

  • JSON RFC: https://tools.ietf.org/html/rfc7159

Parameters:
  • fp (IO) – A file-like object supporting .write. Defaults to None which returns NDJSON as a string.

  • self (AnnotationStore)

Returns:

None if writing to file or the NDJSON string if fp is None.

Return type:

Optional[str]

transform(transform)[source]

Transform all annotations in the store using provided function.

Useful for transforming coordinates from slide space into patch/tile/core space, or to a different resolution, for example.
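As an illustration of the kind of function that might be passed here, the sketch below maps slide-space coordinates into the space of a single tile read at a lower resolution. To stay dependency-free it operates on lists of (x, y) points; with a real store the function would take and return a Shapely geometry instead. The names make_slide_to_tile, tile_origin and downsample are hypothetical.

```python
def make_slide_to_tile(tile_origin, downsample):
    """Build a function mapping slide-space (x, y) points into tile space."""
    ox, oy = tile_origin

    def to_tile(points):
        # Shift so the tile's top-left corner is (0, 0), then rescale.
        return [((x - ox) / downsample, (y - oy) / downsample) for x, y in points]

    return to_tile


to_tile = make_slide_to_tile(tile_origin=(1000, 2000), downsample=4)
tile_points = to_tile([(1000, 2000), (1400, 2800)])
```

Building the function with a factory keeps the tile origin and downsample factor out of the per-geometry call, so the same callable can be applied to every annotation.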

Parameters:
  • transform (Callable[[Geometry], Geometry]) – A function that takes a geometry and returns a new transformed geometry.

  • self (AnnotationStore)

Return type:

None

values()[source]

Return an iterable of all annotations in the store.

Returns:

An iterable of annotations.

Return type:

Iterable[Annotation]

Parameters:

self (AnnotationStore)