explabox.digestibles

Ingestibles are turned into digestibles, containing information to explore/examine/explain/expose your model.

class explabox.digestibles.Dataset(instances, labels, type='dataset', subtype=None, callargs=None, **kwargs)

Bases: MetaInfo

Digestible for dataset.

Examples

Construct a dataset with 5 instances and get the instances at indices 2 and 3 (the slice end, 4, is exclusive):

>>> dataset = Dataset(instances, ['positive', 'negative', 'positive', 'neutral', 'positive'])
>>> dataset[2:4]

Get the first instance:

>>> dataset.head(n=1)

Randomly sample two instances:

>>> dataset.sample(n=2, seed=0)

Get all instances in the dataset labelled as ‘positive’:

>>> dataset.filter('positive')

Parameters:
  • instances (_type_) – Instances.

  • labels (Sequence[LT]) – Ground-truth labels (annotated).

  • type (str, optional) – Type description. Defaults to “dataset”.

  • subtype (Optional[str], optional) – Subtype description. Defaults to None.

  • callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.

property content

Content as dictionary.

property data

Get data property.

filter(indexer)

Filter dataset by label, filter function or boolean list/array.

Examples

Filter by label ‘positive’:

>>> dataset.filter('positive')

Keep only instances whose data contains the ‘@’ character:

>>> dataset.filter(lambda data, label: '@' in data)

Keep only instances whose data does not contain the ‘@’ character and whose label is ‘neutral’ or ‘negative’ (labels are compared as frozensets here):

>>> def filter_fn(instance):
...     return ('@' not in instance['data']
...             and instance['label'] in (frozenset({'neutral'}), frozenset({'negative'})))
>>> dataset.filter(filter_fn)

Filter by a boolean sequence (its length must equal the number of instances):

>>> dataset.filter([True] * len(dataset))

Parameters:

indexer (Union[Callable[[dict], bool], Callable[[DT, LT], bool], Sequence[bool], LT]) – Filter to apply.

Raises:

ValueError – Boolean sequence must have the same length as the number of instances.

Returns:

Filtered dataset.

Return type:

Dataset
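One way to picture how `filter` could dispatch on the four accepted kinds of `indexer` is the pure-Python sketch below. It works on plain lists rather than a real `Dataset`; the helper name `filter_positions`, the parameter-count check used to tell the two callable forms apart, and the membership test for frozenset labels are assumptions for illustration, not explabox's actual implementation.

```python
import inspect
from typing import Any, Callable, List, Sequence, Union


def filter_positions(
    data: Sequence[Any],
    labels: Sequence[Any],
    indexer: Union[Callable[..., bool], Sequence[bool], Any],
) -> List[int]:
    """Hypothetical helper: positions of the instances kept by `indexer`."""
    n = len(data)
    if callable(indexer):
        # Two parameters -> f(data, label); otherwise f(instance_dict).
        if len(inspect.signature(indexer).parameters) == 2:
            return [i for i in range(n) if indexer(data[i], labels[i])]
        return [i for i in range(n) if indexer({'data': data[i], 'label': labels[i]})]
    if isinstance(indexer, (list, tuple)) and all(isinstance(b, bool) for b in indexer):
        # Boolean mask: must line up one-to-one with the instances.
        if len(indexer) != n:
            raise ValueError('Boolean array should be equal length to number of instances.')
        return [i for i, keep in enumerate(indexer) if keep]
    # Anything else is treated as a label; frozenset labels match on membership.
    return [
        i for i in range(n)
        if (indexer in labels[i] if isinstance(labels[i], frozenset) else labels[i] == indexer)
    ]
```

For example, `filter_positions(['a@b', 'cd'], ['positive', 'negative'], 'positive')` returns `[0]`.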

get_by_index(index)

Get item(s) by integer index.

Return type:

Dataset

get_by_key(index)

Get item(s) by key.

Return type:

Dataset
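To illustrate the difference between positional and key-based access, here is a small hypothetical container (`KeyedItems` is not part of explabox). It assumes that an integer index refers to a position in the stored order, whereas a key is an instance identifier that may itself be an integer.

```python
from typing import Dict, Hashable, List, Sequence, Union


class KeyedItems:
    """Hypothetical container: items addressed by position or by key."""

    def __init__(self, keys: Sequence[Hashable], items: Sequence[object]) -> None:
        self._keys: List[Hashable] = list(keys)
        self._by_key: Dict[Hashable, object] = dict(zip(keys, items))

    def get_by_index(self, index: Union[int, slice]) -> Dict[Hashable, object]:
        # Positional lookup: translate the position(s) into key(s) first.
        keys = self._keys[index] if isinstance(index, slice) else [self._keys[index]]
        return {k: self._by_key[k] for k in keys}

    def get_by_key(self, key: Union[Hashable, List[Hashable]]) -> Dict[Hashable, object]:
        # Key lookup: independent of the order in which items were stored.
        keys = key if isinstance(key, list) else [key]
        return {k: self._by_key[k] for k in keys}
```

With integer keys such as `10`, `20` and `30`, `get_by_index(0)` and `get_by_key(0)` mean different things: the first returns the item stored first, while the second raises `KeyError` in this sketch because no key `0` exists.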

head(n=10)

Get the first n elements in the dataset.

Parameters:

n (int, optional) – Number of elements >= 0. Defaults to 10.

Raises:

ValueError – n should be >= 0.

Returns:

First n elements.

Return type:

Dataset
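The documented contract of `head` (reject a negative `n`, otherwise return at most `n` leading elements) maps directly onto Python slicing. The function below is a sketch on a plain list, not the explabox method:

```python
from typing import Sequence, TypeVar

T = TypeVar('T')


def head(items: Sequence[T], n: int = 10) -> Sequence[T]:
    """Sketch of the documented head() contract on a plain sequence."""
    if n < 0:
        raise ValueError('n should be >= 0.')
    # Slicing past the end is safe: fewer than n items are returned.
    return items[:n]
```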

property instances

Get instances property.

property keys

Get keys property.

property labels

Get labels property.

sample(n=1, seed=None)

Get a random sample of size n.

Parameters:
  • n (int, optional) – Number of elements >= 0. Defaults to 1.

  • seed (int, optional) – Seed for reproducibility; if None, a random seed is used. Defaults to None.

Raises:

ValueError – n should be >= 0.

Returns:

Random subsample.

Return type:

Dataset
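A sketch of the documented `sample` contract on a plain list, assuming a seeded `random.Random` instance as the source of randomness. explabox may use a different generator, so the same seed need not produce the same subsample there.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar('T')


def sample(items: Sequence[T], n: int = 1, seed: Optional[int] = None) -> List[T]:
    """Sketch: draw n distinct elements; a fixed seed gives a repeatable result."""
    if n < 0:
        raise ValueError('n should be >= 0.')
    # seed=None seeds the generator from system entropy, so results vary per call.
    rng = random.Random(seed)
    # Sampling without replacement; raises ValueError if n > len(items).
    return rng.sample(list(items), n)
```

Sampling without replacement means that in this sketch asking for more elements than exist raises `ValueError` rather than returning duplicates.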

tail(n=10)

Get the last n elements in the dataset.

Parameters:

n (int, optional) – Number of elements >= 0. Defaults to 10.

Raises:

ValueError – n should be >= 0.

Returns:

Last n elements.

Return type:

Dataset
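Implementing `tail` with a negative slice needs care: in Python `-0 == 0`, so `items[-0:]` returns the whole sequence instead of nothing. The sketch below (plain lists, not the explabox method) slices from a non-negative start index instead, which also handles an `n` larger than the sequence:

```python
from typing import Sequence, TypeVar

T = TypeVar('T')


def tail(items: Sequence[T], n: int = 10) -> Sequence[T]:
    """Sketch of the documented tail() contract on a plain sequence."""
    if n < 0:
        raise ValueError('n should be >= 0.')
    # A non-negative start index avoids the items[-0:] pitfall for n == 0.
    start = max(len(items) - n, 0)
    return items[start:]
```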

class explabox.digestibles.Descriptives(labels, label_counts, tokenized_lengths, type='descriptives', callargs=None, **kwargs)

Bases: MetaInfo

Digestible for descriptive statistics.

Parameters:
  • labels (Sequence[LT]) – Names of labels.

  • label_counts (Dict[str, Dict[LT, int]]) – Counts per label per split.

  • tokenized_lengths (dict) – Descriptive statistics for lengths of tokenized instances.

  • type (str, optional) – Type description. Defaults to “descriptives”.

  • subtype (Optional[str], optional) – Subtype description. Defaults to None.

  • callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
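The types above suggest nested dictionaries: `label_counts` maps each split name to a label-count dictionary, and `tokenized_lengths` holds summary statistics. The sketch below builds values of that shape from plain lists. The split names, the statistic keys (`mean`, `median`, `min`, `max`) and whitespace tokenization are assumptions; explabox may use other keys and its own tokenizer.

```python
import statistics
from collections import Counter
from typing import Dict, Sequence


def label_counts_per_split(splits: Dict[str, Sequence[str]]) -> Dict[str, Dict[str, int]]:
    """Count labels per split, e.g. {'train': {'positive': 2, 'negative': 1}}."""
    return {split: dict(Counter(labels)) for split, labels in splits.items()}


def length_stats(texts: Sequence[str]) -> Dict[str, float]:
    """Summary statistics of token counts; assumes a non-empty input."""
    # Whitespace splitting stands in for a real tokenizer (assumption).
    lengths = [len(text.split()) for text in texts]
    return {
        'mean': statistics.mean(lengths),
        'median': statistics.median(lengths),
        'min': min(lengths),
        'max': max(lengths),
    }
```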

property content

Content as dictionary.

class explabox.digestibles.Performance(labels, metrics, type='model_performance', subtype='classification', callargs=None, **kwargs)

Bases: MetaInfo

Digestible for performance metrics.

Parameters:
  • labels (Sequence[LT]) – Names of labels.

  • metrics (dict) – Performance metrics per label.

  • type (str, optional) – Type description. Defaults to “model_performance”.

  • subtype (Optional[str], optional) – Subtype description. Defaults to “classification”.

  • callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
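As an illustration of what per-label `metrics` can contain, the sketch below computes one-vs-rest precision, recall and F1 for each label from parallel lists of ground-truth and predicted labels. The metric names, the nested-dictionary layout and the choice to report 0.0 when a denominator is zero are assumptions, not the exact structure explabox produces.

```python
from typing import Dict, Hashable, Sequence


def per_label_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    """Hypothetical helper: one-vs-rest precision, recall and F1 per label."""
    pairs = list(zip(y_true, y_pred))
    metrics: Dict[Hashable, Dict[str, float]] = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Zero denominators are reported as 0.0 (a convention, not a rule).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {'precision': precision, 'recall': recall, 'f1': f1}
    return metrics
```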

property content

Content as dictionary.

property metrics

Metric values.

class explabox.digestibles.WronglyClassified(instances, contingency_table, type='wrongly_classified', callargs=None, **kwargs)

Bases: Instances

Digestible for wrongly classified instances.

Parameters:
  • instances (_type_) – Instances.

  • contingency_table (Dict[Tuple[LT, LT], FrozenSet[KT]]) – Classification contingency table as returned from instancelib.analysis.base.contingency_table().

  • type (str, optional) – Type description. Defaults to “wrongly_classified”.

  • callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
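The contingency table maps each pair of labels to the frozenset of keys of the instances in that cell. The minimal sketch below builds such a table from parallel lists. It is a plain-Python stand-in, not the instancelib function, and the (ground truth, prediction) order of the tuple is an assumption based on the description of `wrongly_classified`.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, Hashable, Sequence, Set, Tuple

Cell = Tuple[Hashable, Hashable]  # (ground-truth label, predicted label), assumed order


def contingency_table(
    keys: Sequence[Hashable],
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
) -> Dict[Cell, FrozenSet[Hashable]]:
    """Group instance keys by their (ground truth, prediction) pair."""
    cells: DefaultDict[Cell, Set[Hashable]] = defaultdict(set)
    for key, truth, pred in zip(keys, y_true, y_pred):
        cells[(truth, pred)].add(key)
    # Freeze the key sets to match Dict[Tuple[LT, LT], FrozenSet[KT]].
    return {cell: frozenset(members) for cell, members in cells.items()}
```

Only label pairs that actually occur appear as cells in this sketch; empty cells are omitted rather than stored as empty frozensets.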

property content

Content as dictionary.

property wrongly_classified

Wrongly classified instances, grouped by ground-truth label and predicted label, together with the instances in each group.
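Wrongly classified instances correspond to the off-diagonal cells of the contingency table, where the ground-truth label differs from the predicted label. The sketch below selects those cells from a hand-written table; the keys, labels and the (ground truth, prediction) tuple order are illustrative assumptions.

```python
from typing import Dict, FrozenSet, Hashable, Tuple

Cell = Tuple[Hashable, Hashable]  # (ground-truth label, predicted label), assumed order


def wrongly_classified_cells(table: Dict[Cell, FrozenSet[Hashable]]) -> Dict[Cell, FrozenSet[Hashable]]:
    """Keep non-empty cells whose ground truth differs from the prediction."""
    return {
        (truth, pred): keys
        for (truth, pred), keys in table.items()
        if truth != pred and keys
    }


# Illustrative table with made-up instance keys.
table = {
    ('positive', 'positive'): frozenset({'doc1', 'doc4'}),
    ('positive', 'negative'): frozenset({'doc2'}),
    ('negative', 'positive'): frozenset(),
    ('negative', 'negative'): frozenset({'doc3'}),
}
```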