explabox.explain

Add explainability to your model/dataset with the Explainer class.

class explabox.explain.Explainer(data=None, model=None, ingestibles=None, **kwargs)

Bases: Readable, IngestiblesMixin

The Explainer creates explanations corresponding to a model and dataset (with ground-truth labels).

With the Explainer you can use explainable AI (XAI) methods for explaining the whole dataset (global), model behavior on the dataset (global), and specific predictions/decisions (local).

The Explainer requires both ‘data’ and ‘model’ to be defined. It is available in the Explabox under the .explain property.

Examples

Construct the explainer:

>>> from explabox.explain import Explainer
>>> explainer = Explainer(data=data, model=model)

Get a local explanation with LIME (https://github.com/marcotcr/lime) and kernelSHAP (https://github.com/slundberg/shap):

>>> explainer.explain_prediction('I love this so much!', methods=['lime', 'kernel_shap'])

See the top-25 tokens for predicted classifier labels on the test set:

>>> explainer.token_frequency(k=25, explain_model=True, splits='test')

Select the top-5 prototypical examples in the train set:

>>> explainer.prototypes(n=5, splits='train')

Parameters:
  • data (Optional[Environment], optional) – Data for ingestibles. Defaults to None.

  • model (Optional[AbstractClassifier], optional) – Model for ingestibles. Defaults to None.

  • ingestibles (Optional[Ingestible], optional) – Ingestible. Defaults to None.

explain_prediction(sample, *args, methods=['lime'], **kwargs)

Explain specific sample locally.

Parameters:
  • sample (Union[int, str]) – Identifier of sample in dataset (int) or input (str).

  • methods (Union[str, List[str]]) – List of methods to get explanations from. Choose from ‘lime’, ‘shap’, ‘baylime’, ‘tree’, ‘rules’, ‘foil_tree’.

  • *args – Positional arguments passed to local explanation technique.

  • **kwargs – Keyword arguments passed to local explanation technique.

Returns:

Explanations for each selected method, unless method is unknown (returns None).

Return type:

Optional[MultipleReturn]

prototypes(method='mmdcritic', n=5, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, seed=0)

Select n prototypes (representative samples) for the given split(s).

Parameters:
  • method (str, optional) – Method(s) to apply. Choose from [‘mmdcritic’, ‘kmedoids’]. Defaults to ‘mmdcritic’.

  • n (int, optional) – Number of prototypes to generate. Defaults to 5.

  • splits (Union[str, List[str]], optional) – Name(s) of split(s). Defaults to “test”.

  • embedder (Optional[Embedder], optional) – Embedder used. Defaults to TfidfVectorizer.

  • labelwise (bool, optional) – Select for each label. Defaults to False.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

Raises:

ValueError – Unknown method selected.

Returns:

Prototypes for each method and split.

Return type:

Union[Instances, MultipleReturn]
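To make the notion of a prototype concrete, the sketch below picks n representative texts with a greedy medoid-style selection over token sets. It is only an illustration and not the library's ‘mmdcritic’ or ‘kmedoids’ implementation: the helpers jaccard_distance and select_prototypes are hypothetical, and the Jaccard distance on whitespace tokens is an assumption standing in for the embedder.

```python
from typing import List, Sequence


def jaccard_distance(a: set, b: set) -> float:
    """Distance between two token sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_prototypes(texts: Sequence[str], n: int = 5) -> List[int]:
    """Greedily pick the indices of n texts that minimise the summed distance
    from every text to its nearest selected text (hypothetical helper)."""
    token_sets = [set(t.lower().split()) for t in texts]
    n = min(n, len(texts))
    # Distance from each text to its nearest prototype chosen so far.
    nearest = [float("inf")] * len(texts)
    chosen: List[int] = []
    for _ in range(n):
        best_idx, best_cost = -1, float("inf")
        for cand in range(len(texts)):
            if cand in chosen:
                continue
            # Total cost if `cand` were added to the prototype set.
            cost = sum(
                min(nearest[i], jaccard_distance(token_sets[i], token_sets[cand]))
                for i in range(len(texts))
            )
            if cost < best_cost:
                best_idx, best_cost = cand, cost
        chosen.append(best_idx)
        nearest = [
            min(nearest[i], jaccard_distance(token_sets[i], token_sets[best_idx]))
            for i in range(len(texts))
        ]
    return chosen


texts = [
    "great movie loved it",
    "great movie really loved it",
    "loved this great movie",
    "terrible service never again",
    "terrible service",
]
# Two prototypes: one from the "great movie" group, one from the "terrible service" group.
prototype_indices = select_prototypes(texts, n=2)
```

The first prototype covers the largest group of similar texts; the second is chosen where the remaining texts are still poorly covered.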

prototypes_criticisms(n_prototypes=5, n_criticisms=3, splits='test', embedder=<class 'text_explainability.data.embedding.TfidfVectorizer'>, labelwise=False, **kwargs)

Select n prototypes (representative samples) and n criticisms (outliers) for the given split(s).

Parameters:
  • n_prototypes (int, optional) – Number of prototypes to generate. Defaults to 5.

  • n_criticisms (int, optional) – Number of criticisms to generate. Defaults to 3.

  • splits (Union[str, List[str]], optional) – Name(s) of split(s). Defaults to “test”.

  • embedder (Optional[Embedder], optional) – Embedder used. Defaults to TfidfVectorizer.

  • labelwise (bool, optional) – Select for each label. Defaults to False.

Returns:

Prototypes for each method and split.

Return type:

Union[Instances, MultipleReturn]
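As a rough intuition for criticisms (outliers), the sketch below ranks the non-prototype texts by their distance to the nearest prototype and keeps the farthest ones. This is a deliberate simplification and not necessarily how the library selects criticisms: select_criticisms and jaccard_distance are hypothetical helpers, and the token-set distance is an assumption replacing the embedder.

```python
from typing import List, Sequence


def jaccard_distance(a: set, b: set) -> float:
    """Distance between two token sets: 0.0 if identical, 1.0 if disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def select_criticisms(
    texts: Sequence[str],
    prototype_indices: Sequence[int],
    n_criticisms: int = 3,
) -> List[int]:
    """Return indices of the texts farthest from any prototype (hypothetical helper)."""
    if not prototype_indices:
        raise ValueError("At least one prototype is required.")
    token_sets = [set(t.lower().split()) for t in texts]
    candidates = [i for i in range(len(texts)) if i not in prototype_indices]

    def distance_to_nearest_prototype(i: int) -> float:
        return min(
            jaccard_distance(token_sets[i], token_sets[p]) for p in prototype_indices
        )

    # Farthest-first ordering; sorted() is stable, so ties keep their input order.
    ranked = sorted(candidates, key=distance_to_nearest_prototype, reverse=True)
    return ranked[:n_criticisms]


texts = [
    "great movie loved it",
    "great movie really loved it",
    "loved this great movie",
    "the projector broke halfway",
]
# With text 0 as the only prototype, text 3 shares no tokens with it and ranks first.
criticisms = select_criticisms(texts, prototype_indices=[0], n_criticisms=2)
```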

token_frequency(splits='test', explain_model=True, labelwise=True, k=25, filter_words=<Proxy wrapping ['de', 'het', 'een']>, lower=True, seed=0, **count_vectorizer_kwargs)

Show the top-k number of tokens for each ground-truth or predicted label.

Parameters:
  • splits (Union[str, List[str]], optional) – Split names to get the explanation for. Defaults to ‘test’.

  • explain_model (bool, optional) – Whether to explain the model (True) or ground-truth labels (False). Defaults to True.

  • labelwise (bool, optional) – Whether to summarize the counts for each label separately. Defaults to True.

  • k (Optional[int], optional) – Limit to the top-k words per label, or all words if None. Defaults to 25.

  • filter_words (List[str], optional) – Words to filter out from top-k. Defaults to [‘a’, ‘an’, ‘the’].

  • lower (bool, optional) – Whether to make all tokens lowercase. Defaults to True.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • **count_vectorizer_kwargs – Optional arguments passed to CountVectorizer/FastCountVectorizer.

Returns:

Each label with corresponding top words and their frequency

Return type:

Union[FeatureList, MultipleReturn]
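The idea of a labelwise top-k token count can be sketched with the standard library alone. The helper top_k_tokens below is hypothetical and uses plain whitespace tokenization, whereas the method documented above passes its extra arguments to CountVectorizer/FastCountVectorizer; treat it as an illustration of the k, filter_words, and lower parameters only.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def top_k_tokens(
    texts: Sequence[str],
    labels: Sequence[str],
    k: Optional[int] = 25,
    filter_words: Sequence[str] = ("a", "an", "the"),
    lower: bool = True,
) -> Dict[str, List[Tuple[str, int]]]:
    """Count tokens per label and keep the k most frequent (hypothetical helper)."""
    excluded = set(filter_words)
    counts: Dict[str, Counter] = {}
    for text, label in zip(texts, labels):
        tokens = (text.lower() if lower else text).split()
        counts.setdefault(label, Counter()).update(
            t for t in tokens if t not in excluded
        )
    # Counter.most_common(None) returns every token, matching k=None.
    return {label: counter.most_common(k) for label, counter in counts.items()}


texts = [
    "The food was great",
    "great service and great food",
    "The food was cold",
    "cold and slow service",
]
# Labels may be ground-truth labels or model predictions (cf. explain_model).
labels = ["positive", "positive", "negative", "negative"]
top = top_k_tokens(texts, labels, k=2)
# top["positive"] is [("great", 3), ("food", 2)]
```

Note that lowercasing happens before filtering, so ‘The’ is removed by the default filter only when lower=True.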

token_information(splits='test', explain_model=True, k=25, filter_words=<Proxy wrapping ['de', 'het', 'een']>, lower=True, seed=0, **count_vectorizer_kwargs)

Show the top-k token mutual information for a dataset or model.

Parameters:
  • splits (Union[str, List[str]], optional) – Split names to get the explanation for. Defaults to ‘test’.

  • explain_model (bool, optional) – Whether to explain the model (True) or ground-truth labels (False). Defaults to True.

  • labelwise (bool, optional) – Whether to summarize the counts for each label separately. Defaults to True.

  • k (Optional[int], optional) – Limit to the top-k words per label, or all words if None. Defaults to 25.

  • filter_words (List[str], optional) – Words to filter out from top-k. Defaults to [‘a’, ‘an’, ‘the’].

  • lower (bool, optional) – Whether to make all tokens lowercase. Defaults to True.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • **count_vectorizer_kwargs – Optional arguments passed to CountVectorizer/FastCountVectorizer.

Returns:

k labels, sorted based on their mutual information with the output (predictive model labels or ground-truth labels)

Return type:

Union[FeatureList, MultipleReturn]
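For readers unfamiliar with the measure, the sketch below computes the mutual information (in bits) between the presence of a token in a text and the text's label, then ranks tokens by it. It is a self-contained illustration under stated assumptions, not the library's implementation: token_mutual_information is a hypothetical helper, tokens are split on whitespace, and presence is treated as a binary variable.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def token_mutual_information(
    texts: Sequence[str],
    labels: Sequence[str],
    k: Optional[int] = 25,
) -> List[Tuple[str, float]]:
    """Rank tokens by mutual information between token presence and label."""
    n = len(texts)
    docs = [set(t.lower().split()) for t in texts]
    label_counts = Counter(labels)
    vocabulary = set().union(*docs)
    scores: Dict[str, float] = {}
    for token in vocabulary:
        # Joint counts over (token present?, label).
        joint = Counter((token in doc, label) for doc, label in zip(docs, labels))
        present = sum(count for (has, _), count in joint.items() if has)
        mi = 0.0
        for (has, label), count in joint.items():
            p_joint = count / n
            p_token = (present if has else n - present) / n
            p_label = label_counts[label] / n
            # Only observed (non-zero) joint events contribute to the sum.
            mi += p_joint * math.log2(p_joint / (p_token * p_label))
        scores[token] = mi
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


texts = ["good film", "good plot", "bad film", "bad plot"]
labels = ["pos", "pos", "neg", "neg"]
# "good" and "bad" determine the label (1 bit each); "film" and "plot" carry none.
top = token_mutual_information(texts, labels, k=2)
```

Unlike raw frequency, this ranking favors tokens that discriminate between labels over tokens that are merely common.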

class explabox.explain.FeatureList(used_features, scores, labels=None, labelset=None, original_scores=None, type='global_explanation', subtype='feature_list', callargs=None, **kwargs)

Bases: BaseReturnType, UsedFeaturesMixin

Save scores per feature, grouped per label.

Examples of scores are feature importance scores, or counts of features in a dataset.

Parameters:
  • used_features (Union[Sequence[str], Sequence[int]]) – Used features per label.

  • scores (Union[Sequence[int], Sequence[float]]) – Scores per label.

  • labels (Optional[Sequence[int]], optional) – Label indices to include; if none are provided, defaults to ‘all’. Defaults to None.

  • labelset (Optional[Sequence[str]], optional) – Lookup for label names. Defaults to None.

  • original_scores (Optional[Sequence[float]], optional) – Probability scores for each class. Defaults to None.

  • type (Optional[str]) – Type description. Defaults to ‘global_explanation’.

  • subtype (Optional[str], optional) – Subtype description. Defaults to ‘feature_list’.

  • callargs (Optional[dict], optional) – Call arguments for reproducibility. Defaults to None.

  • **kwargs – Optional meta descriptors.

property content

get_raw_scores(normalize=False)

Get saved scores per label as np.ndarray.

Parameters:

normalize (bool, optional) – Normalize scores (ensure they sum to one). Defaults to False.

Returns:

Scores.

Return type:

np.ndarray

get_scores(normalize=False)

Get scores per label.

Parameters:

normalize (bool, optional) – Whether to normalize the scores (sum to one). Defaults to False.

Returns:

Scores per label; if no labelset is set, defaults to ‘all’.

Return type:

Dict[Union[str, int], Tuple[Union[str, int], Union[float, int]]]
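What normalize=True means (scores that sum to one) can be shown with a small standalone sketch. The functions normalize_scores and scores_per_label are hypothetical and do not reproduce the internals of FeatureList; the per-label dictionaries and the assumption of non-negative scores (such as counts) are made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple, Union

Number = Union[int, float]


def normalize_scores(scores: Sequence[Number]) -> List[float]:
    """Scale scores so they sum to one; assumes non-negative scores."""
    total = sum(scores)
    if total == 0:
        # Nothing to scale; avoid dividing by zero.
        return [0.0 for _ in scores]
    return [score / total for score in scores]


def scores_per_label(
    features_by_label: Dict[str, Sequence[str]],
    scores_by_label: Dict[str, Sequence[Number]],
    normalize: bool = False,
) -> Dict[str, List[Tuple[str, Number]]]:
    """Pair each feature with its score, grouped per label (hypothetical helper)."""
    result: Dict[str, List[Tuple[str, Number]]] = {}
    for label, features in features_by_label.items():
        values: List[Number] = list(scores_by_label[label])
        if normalize:
            values = normalize_scores(values)
        result[label] = list(zip(features, values))
    return result


features = {"positive": ["great", "food"], "negative": ["cold", "slow"]}
counts = {"positive": [3, 1], "negative": [2, 2]}

raw = scores_per_label(features, counts)
normalized = scores_per_label(features, counts, normalize=True)
# normalized["positive"] is [("great", 0.75), ("food", 0.25)]
```

Normalizing per label makes scores comparable across labels with different numbers of samples, at the cost of hiding the absolute counts.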

property scores

Saved scores (e.g. feature importance).

class explabox.explain.Instances(instances, original_scores=None, type='global_explanation', subtype='prototypes', callargs=None, **kwargs)

Bases: BaseReturnType

Parameters:
  • original_scores (Sequence[float] | None) –

  • type (str | None) –

  • subtype (str | None) –

  • callargs (dict | None) –

property content

Subpackages: