explabox.digestibles
Ingestibles are turned into digestibles, containing information to explore/examine/explain/expose your model.
- class explabox.digestibles.Dataset(instances, labels, type='dataset', subtype=None, callargs=None, **kwargs)
Bases:
MetaInfo
Digestible for dataset.
Examples
Construct a dataset with 5 instances and get instance 2 through 4:
>>> dataset = Dataset(instances, ['positive', 'negative', 'positive', 'neutral', 'positive'])
>>> dataset[2:4]
Get the first instance:
>>> dataset.head(n=1)
Randomly sample two instances:
>>> dataset.sample(n=2, seed=0)
Get all instances in the dataset labelled as ‘positive’:
>>> dataset.filter('positive')
- Parameters:
instances (_type_) – Instances.
labels (Sequence[LT]) – Ground-truth labels (annotated).
type (str, optional) – Type description. Defaults to “dataset”.
subtype (Optional[str], optional) – Subtype description. Defaults to None.
callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
- property content
Content as dictionary.
- property data
Get data property.
- filter(indexer)
Filter dataset by label, filter function or boolean list/array.
Examples
Filter by label ‘positive’:
>>> dataset.filter('positive')
Filter if ‘@’ character in data:
>>> dataset.filter(lambda data, label: '@' in data)
Filter if ‘@’ character not in instance and label in (‘neutral’, ‘negative’):
>>> def filter_fn(instance):
...     return '@' not in instance['data'] and instance['label'] in (frozenset({'neutral'}), frozenset({'negative'}))
>>> dataset.filter(filter_fn)
Filter by boolean sequence (should be equal length to the number of instances):
>>> dataset.filter([True] * len(dataset))
- Parameters:
indexer (Union[Callable[[dict], bool], Callable[[DT, LT], bool], Sequence[bool], LT]) – Filter to apply.
- Raises:
ValueError – Boolean array should be equal length to number of instances.
- Returns:
Filtered dataset.
- Return type:
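The four indexer forms described for filter can be illustrated with a plain-Python stand-in. This is only a sketch under stated assumptions, not explabox's implementation: `filter_rows` is a hypothetical helper, rows are plain dicts with 'data' and 'label' keys, labels are plain strings rather than frozensets, and one-argument and two-argument callables are told apart by inspecting their signature.

```python
import inspect
from typing import Any, Callable, Dict, List, Sequence, Union

Row = Dict[str, Any]  # assumed row shape: {'data': ..., 'label': ...}


def filter_rows(rows: List[Row], indexer: Union[Callable[..., bool], Sequence[bool], Any]) -> List[Row]:
    """Hypothetical stand-in for the indexer dispatch (not explabox code)."""
    if callable(indexer):
        # Two-parameter callables receive (data, label); others receive the row dict.
        n_params = len(inspect.signature(indexer).parameters)
        if n_params == 2:
            return [row for row in rows if indexer(row['data'], row['label'])]
        return [row for row in rows if indexer(row)]
    if isinstance(indexer, (list, tuple)):
        # Boolean mask: must have one flag per row.
        if len(indexer) != len(rows):
            raise ValueError('Boolean array should be equal length to number of instances.')
        return [row for row, keep in zip(rows, indexer) if keep]
    # Anything else is treated as a label to match exactly.
    return [row for row in rows if row['label'] == indexer]


rows = [
    {'data': 'a@b.com', 'label': 'positive'},
    {'data': 'hello', 'label': 'negative'},
    {'data': 'hi there', 'label': 'positive'},
]

by_label = filter_rows(rows, 'positive')
by_pair = filter_rows(rows, lambda data, label: '@' in data)
by_row = filter_rows(rows, lambda row: '@' not in row['data'] and row['label'] in ('neutral', 'negative'))
by_mask = filter_rows(rows, [True, False, True])
```

Checking for callables first keeps a string label such as 'positive' from being mistaken for a sequence.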
- head(n=10)
Get the first n elements in the dataset.
- Parameters:
n (int, optional) – Number of elements >= 0. Defaults to 10.
- Raises:
ValueError – n should be >= 0.
- Returns:
First n elements.
- Return type:
- property instances
Get instances property.
- property keys
Get keys property.
- property labels
Get labels property.
- sample(n=1, seed=None)
Get a random sample of size n.
- Parameters:
n (int, optional) – Number of elements >= 0. Defaults to 1.
seed (int, optional) – Seed for reproducibility; if None it takes a random seed. Defaults to None.
- Raises:
ValueError – n should be >= 0.
- Returns:
Random subsample.
- Return type:
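The n >= 0 check shared by head and sample, and the effect of seed, can be sketched on a plain list. The functions below are hypothetical stand-ins, not the library's code; in particular, returning all items when n exceeds the number available is an assumption.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar('T')


def head(items: Sequence[T], n: int = 10) -> List[T]:
    """Hypothetical stand-in: first n items, rejecting negative n."""
    if n < 0:
        raise ValueError('n should be >= 0.')
    return list(items[:n])


def sample(items: Sequence[T], n: int = 1, seed: Optional[int] = None) -> List[T]:
    """Hypothetical stand-in: random subsample of at most n items."""
    if n < 0:
        raise ValueError('n should be >= 0.')
    # A dedicated generator keeps the draw reproducible for a fixed seed;
    # seed=None lets Python pick a fresh seed on every call.
    rng = random.Random(seed)
    # Assumption: never request more items than exist.
    return rng.sample(list(items), min(n, len(items)))


items = ['a', 'b', 'c', 'd', 'e']
first = head(items, n=1)
draw_one = sample(items, n=2, seed=0)
draw_two = sample(items, n=2, seed=0)
```

With the same seed, `draw_one` and `draw_two` contain the same two items in the same order.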
- class explabox.digestibles.Descriptives(labels, label_counts, tokenized_lengths, type='descriptives', callargs=None, **kwargs)
Bases:
MetaInfo
Digestible for descriptive statistics.
- Parameters:
labels (Sequence[LT]) – Names of labels.
label_counts (Dict[str, Dict[LT, int]]) – Counts per label per split.
tokenized_lengths (dict) – Descriptive statistics for lengths of tokenized instances.
type (str, optional) – Type description. Defaults to “descriptives”.
subtype (Optional[str], optional) – Subtype description. Defaults to None.
callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
- property content
Content as dictionary.
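To make the shapes of label_counts and tokenized_lengths concrete, the following sketch builds them from toy data. The split names, whitespace tokenization, and the statistic keys ('mean', 'min', 'max') are assumptions chosen for illustration; the library may tokenize differently and report other statistics.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical splits: each is a list of (text, label) pairs.
splits: Dict[str, List[Tuple[str, str]]] = {
    'train': [('great film', 'positive'), ('awful', 'negative'), ('loved it a lot', 'positive')],
    'test': [('it was fine', 'neutral'), ('terrible plot', 'negative')],
}

# Names of all labels seen in any split.
labels = sorted({label for rows in splits.values() for _, label in rows})

# Counts per label per split, matching Dict[str, Dict[LT, int]].
label_counts = {
    name: dict(Counter(label for _, label in rows))
    for name, rows in splits.items()
}

# Summary statistics of token counts per split (assumed keys).
tokenized_lengths = {}
for name, rows in splits.items():
    lengths = [len(text.split()) for text, _ in rows]
    tokenized_lengths[name] = {'mean': mean(lengths), 'min': min(lengths), 'max': max(lengths)}
```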
- class explabox.digestibles.Performance(labels, metrics, type='model_performance', subtype='classification', callargs=None, **kwargs)
Bases:
MetaInfo
Digestible for performance metrics.
- Parameters:
labels (Sequence[LT]) – Names of labels.
metrics (dict) – Performance metrics per label.
type (str, optional) – Type description. Defaults to “model_performance”.
subtype (Optional[str], optional) – Subtype description. Defaults to “classification”.
callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
- property content
Content as dictionary.
- property metrics
Metrics values.
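As an illustration of what a per-label metrics dictionary might hold, the sketch below computes precision, recall and F1 for each label from two label lists. The function name `per_label_metrics` and the metric keys are hypothetical; the metrics the library actually reports may differ.

```python
from typing import Dict, List


def per_label_metrics(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Hypothetical helper: one-vs-rest precision, recall and F1 per label."""
    metrics: Dict[str, Dict[str, float]] = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Report 0.0 instead of dividing by zero when a label is never predicted or never occurs.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = {'precision': precision, 'recall': recall, 'f1': f1}
    return metrics


y_true = ['positive', 'negative', 'positive', 'neutral']
y_pred = ['positive', 'positive', 'positive', 'neutral']
metrics = per_label_metrics(y_true, y_pred)
```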
- class explabox.digestibles.WronglyClassified(instances, contingency_table, type='wrongly_classified', callargs=None, **kwargs)
Bases:
Instances
Digestible for wrongly classified instances.
- Parameters:
instances (_type_) – Instances.
contingency_table (Dict[Tuple[LT, LT], FrozenSet[KT]]) – Classification contingency table as returned from instancelib.analysis.base.contingency_table().
type (str, optional) – Type description. Defaults to “wrongly_classified”.
callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.
- property content
Content as dictionary.
- property wrongly_classified
Wrongly classified instances, grouped by ground-truth value and predicted value.
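The Dict[Tuple[LT, LT], FrozenSet[KT]] shape of contingency_table can be sketched in plain Python. This does not call instancelib: `build_contingency_table` and `select_wrong` are hypothetical helpers, the tuple order (ground truth, predicted) is an assumption, and so are the output keys 'ground_truth', 'predicted' and 'instances'.

```python
from collections import defaultdict
from typing import Any, Dict, FrozenSet, List, Set, Tuple

Table = Dict[Tuple[str, str], FrozenSet[int]]


def build_contingency_table(truth: Dict[int, str], predicted: Dict[int, str]) -> Table:
    """Hypothetical helper: map (ground truth, predicted) pairs to instance keys."""
    cells: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for key, true_label in truth.items():
        cells[(true_label, predicted[key])].add(key)
    return {pair: frozenset(keys) for pair, keys in cells.items()}


def select_wrong(table: Table) -> List[Dict[str, Any]]:
    """Hypothetical helper: keep only cells where the two labels disagree."""
    return [
        {'ground_truth': true_label, 'predicted': pred_label, 'instances': sorted(keys)}
        for (true_label, pred_label), keys in sorted(table.items(), key=lambda item: item[0])
        if true_label != pred_label
    ]


truth = {0: 'positive', 1: 'negative', 2: 'positive', 3: 'neutral'}
predicted = {0: 'positive', 1: 'positive', 2: 'neutral', 3: 'neutral'}

table = build_contingency_table(truth, predicted)
wrong = select_wrong(table)
```

Cells on the diagonal, where ground truth and prediction agree, are dropped, so only misclassified keys remain.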