explabox.expose.text

Functions/classes for sensitivity testing (fairness and robustness) for text data.

class explabox.expose.text.Exposer(data=None, model=None, ingestibles=None, **kwargs)

Bases: Readable, IngestiblesMixin

The Exposer exposes your model and/or data by performing sensitivity tests.

With the Exposer you can measure model sensitivity to random inputs (robustness), test model generalizability (robustness), and see the effect of adjusting attributes in the inputs (e.g. swapping male pronouns for female pronouns; fairness), both for the dataset as a whole (global) and for individual instances (local).

The Exposer requires both ‘data’ and ‘model’ to be defined. It is included in the Explabox under the .expose property.

Examples

See how the performance of a model on the test dataset is affected when text is randomly changed to uppercase:

>>> from explabox.expose import Exposer
>>> exposer = Exposer(data=data, model=model)
>>> exposer.compare_metric(splits='test', perturbation='random_upper')

Parameters:

data (Optional[Environment], optional) – Data for ingestibles. Defaults to None.
model (Optional[AbstractClassifier], optional) – Model for ingestibles. Defaults to None.
ingestibles (Optional[Ingestible], optional) – Ingestible. Defaults to None.

compare_metric(perturbation, splits='test')

Compare metrics for each ground-truth label and attribute after applying a dataset-wide perturbation.

Examples

Compare metrics of model performance (e.g. accuracy, precision) before and after mapping each instance in the test dataset to uppercase:

>>> box.expose.compare_metric(splits='test', perturbation='upper')

Add ‘!!!’ to the end of each text in the ‘train’ and ‘test’ split and see how it affects performance:

>>> from explabox.expose.text import OneToOnePerturbation
>>> perturbation_fn = OneToOnePerturbation(lambda x: f'{x}!!!')
>>> box.expose.compare_metric(splits=['train', 'test'], perturbation=perturbation_fn)

Parameters:

perturbation (Union[OneToOnePerturbation, str]) – Custom perturbation or one of the default ones, picked by their string: ‘lower’, ‘upper’, ‘random_lower’, ‘random_upper’, ‘add_typos’, ‘random_case_swap’, ‘swap_random’ (swap characters), ‘delete_random’ (delete characters), ‘repeat’ (repeats twice).
splits (Union[str, List[str]], optional) – Split(s) to apply the perturbation to. Defaults to “test”.

Raises:

ValueError – Unknown perturbation.

Returns:

Original label (before perturbation), perturbed label (after perturbation) and metrics for each label-attribute pair.

Return type:

Union[LabelMetrics, MultipleReturn]
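The idea behind compare_metric can be illustrated without the Explabox: apply one perturbation to every instance in a split, predict again, and compare a metric before and after. The sketch below assumes accuracy as the metric and uses a hypothetical case-sensitive `toy_model`; the helpers `accuracy` and `compare_accuracy` are illustrative only, and the real method additionally breaks results down per ground-truth label and attribute.

```python
from typing import Callable, List, Tuple

def toy_model(texts: List[str]) -> List[str]:
    # Hypothetical case-sensitive classifier: 'positive' iff the word 'good' occurs.
    return ['positive' if 'good' in t.split() else 'negative' for t in texts]

def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def compare_accuracy(texts: List[str], labels: List[str],
                     model: Callable[[List[str]], List[str]],
                     perturbation: Callable[[str], str]) -> Tuple[float, float]:
    # Accuracy on the original texts versus on the perturbed texts.
    before = accuracy(labels, model(texts))
    after = accuracy(labels, model([perturbation(t) for t in texts]))
    return before, after

texts = ['a good movie', 'a bad movie', 'good acting', 'boring plot']
labels = ['positive', 'negative', 'positive', 'negative']

# Dataset-wide perturbation, comparable in spirit to perturbation='upper'.
before, after = compare_accuracy(texts, labels, toy_model, str.upper)
print(before, after)  # 1.0 0.5

# A custom one-to-one perturbation, like the '!!!' example above.
print(compare_accuracy(texts, labels, toy_model, lambda x: f'{x}!!!'))  # (1.0, 1.0)
```

The drop from 1.0 to 0.5 exposes the toy model's case sensitivity, which is exactly the kind of brittleness this method is meant to surface.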

input_space(generators, n_samples=100, min_length=0, max_length=100, seed=0, **kwargs)

Test the robustness of a machine learning model to different input types (safety).

Example

Test a pretrained black-box model for its robustness to 1000 random strings (length 0 to 500), containing whitespace characters, ASCII (upper, lower and numbers), emojis and Russian Cyrillic characters:

>>> from explabox import Explabox, RandomEmojis, RandomCyrillic
>>> box = Explabox(data=data, model=model)
>>> box.expose.input_space(generators=['whitespace', 'ascii', RandomEmojis(base=True), RandomCyrillic('ru')],
...                        n_samples=1000,
...                        min_length=0,
...                        max_length=500)

Parameters:

generators (Union[str, RandomString, List[Union[RandomString, str]]]) – Random character generators. If ‘all’, all generators are selected. For strings, choose from ‘ascii’, ‘emojis’, ‘whitespace’, ‘spaces’, ‘ascii_upper’, ‘ascii_lower’, ‘digits’, ‘punctuation’, ‘cyrillic’.
n_samples (int, optional) – Number of test samples. Defaults to 100.
min_length (int, optional) – Input minimum length. Defaults to 0.
max_length (int, optional) – Input maximum length. Defaults to 100.
seed (Optional[int], optional) – Seed for reproducibility purposes. Defaults to 0.

Returns:

Percentage of success cases, list of succeeded/failed instances

Return type:

SuccessTest
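Conceptually, input_space draws random strings from one or more character generators and records whether the model copes with each one. The following is a minimal standard-library sketch, assuming that "success" means "the model returns without raising an exception"; the character pools in `POOLS`, the helper names and the `fragile_model` are hypothetical and not the Explabox implementation.

```python
import random
import string
from typing import Callable, Dict, List, Tuple

# Hypothetical character pools; the Explabox generators may differ.
POOLS: Dict[str, str] = {
    'ascii': string.ascii_letters + string.digits,
    'whitespace': string.whitespace,
    'punctuation': string.punctuation,
}

def random_strings(generators: List[str], n_samples: int = 100, min_length: int = 0,
                   max_length: int = 100, seed: int = 0) -> List[str]:
    rng = random.Random(seed)  # seeded, so the same samples are drawn every run
    chars = ''.join(POOLS[g] for g in generators)
    return [''.join(rng.choice(chars) for _ in range(rng.randint(min_length, max_length)))
            for _ in range(n_samples)]

def input_space_test(model: Callable[[str], str],
                     samples: List[str]) -> Tuple[float, List[str], List[str]]:
    # Here a sample 'succeeds' if the model returns without raising an exception.
    succeeded, failed = [], []
    for s in samples:
        try:
            model(s)
            succeeded.append(s)
        except Exception:
            failed.append(s)
    return len(succeeded) / len(samples), succeeded, failed

def fragile_model(text: str) -> str:
    # Hypothetical model that crashes on empty or whitespace-only input.
    if not text.strip():
        raise ValueError('empty input')
    return 'positive'

samples = random_strings(['ascii', 'whitespace'], n_samples=200, min_length=0, max_length=5)
rate, ok, bad = input_space_test(fragile_model, samples)
print(f'{rate:.1%} succeeded, {len(bad)} failed')
```

Because `min_length=0` and a whitespace pool are included, empty and blank inputs are generated, which is precisely where the fragile model breaks.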

invariance(pattern, expectation, **kwargs)

Test for the failure rate under invariance.

Example

Test if predictions remain ‘positive’ for 50 samples of the pattern ‘I {like|love} {name} from {city}!’:

>>> from explabox import Explabox
>>> box = Explabox(data=data, model=model)
>>> box.expose.invariance('I {like|love} {name} from {city}!', expectation='positive', n_samples=50)

Parameters:

pattern (str) – String pattern to generate examples from.
expectation (Optional[LT], optional) – Expected outcome values. Defaults to None.
**kwargs – Optional arguments passed onto the data.generate.from_pattern() function.

Returns:

Percentage of success cases, list of succeeded (invariant)/failed (variant) instances

Return type:

SuccessTest
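An invariance test expands a pattern into concrete sentences and checks that every prediction equals the expected outcome. The sketch below is a simplified stand-in for data.generate.from_pattern(): it enumerates all combinations rather than sampling `n_samples`, and the `fills` lists for `{name}` and `{city}`, the helper names and the biased `toy_model` are all hypothetical.

```python
import itertools
import re
from typing import Callable, Dict, List, Tuple

def expand_pattern(pattern: str, fills: Dict[str, List[str]]) -> List[str]:
    # '{like|love}' lists inline alternatives; '{name}' is looked up in `fills`.
    slots = re.findall(r'\{([^}]*)\}', pattern)
    options = [slot.split('|') if '|' in slot else fills[slot] for slot in slots]
    template = re.sub(r'\{[^}]*\}', '{}', pattern)
    return [template.format(*combo) for combo in itertools.product(*options)]

def invariance_test(model: Callable[[List[str]], List[str]], instances: List[str],
                    expectation: str) -> Tuple[float, List[str], List[str]]:
    predictions = model(instances)
    succeeded = [x for x, p in zip(instances, predictions) if p == expectation]
    failed = [x for x, p in zip(instances, predictions) if p != expectation]
    return len(succeeded) / len(instances), succeeded, failed

def toy_model(texts: List[str]) -> List[str]:
    # Hypothetical model with a spurious dependence on one city name.
    return ['negative' if 'Rotterdam' in t else 'positive' for t in texts]

fills = {'name': ['Anna', 'Bram'], 'city': ['Utrecht', 'Rotterdam']}  # hypothetical fill lists
instances = expand_pattern('I {like|love} {name} from {city}!', fills)
rate, invariant, variant = invariance_test(toy_model, instances, expectation='positive')
print(rate)        # 0.5
print(variant[0])  # I like Anna from Rotterdam!
```

The failed (variant) instances point directly at the attribute the model should not depend on, here the city.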

mean_score(pattern, selected_labels='all', **kwargs)

Calculate mean (probability) score for the given labels, for data generated from a pattern.

Example

Calculate the mean score for the ‘positive’ label for ‘I {like|love} {name} from {city}!’:

>>> from explabox import Explabox
>>> box = Explabox(data=data, model=model)
>>> box.expose.mean_score('I {like|love} {name} from {city}!', selected_labels='positive', seed=0)

Parameters:

pattern (str) – Pattern to generate instances from.
selected_labels (Optional[Union[LT, List[LT]]], optional) – Label name(s) to select. If None or ‘all’, it is replaced by all labels. Defaults to ‘all’.
**kwargs – Optional arguments passed onto the data.generate.from_pattern() function.

Returns:

Mean score for one label or all selected labels.

Return type:

Union[MeanScore, MultipleReturn]
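The mean score is the average predicted probability of each selected label over the generated instances. A minimal sketch follows, assuming the model returns one label-to-probability dictionary per instance; `toy_predict_proba` and `mean_scores` are hypothetical names, and pattern expansion is skipped by passing two already generated sentences.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Union

def toy_predict_proba(texts: List[str]) -> List[Dict[str, float]]:
    # Hypothetical model: 'love' yields a higher 'positive' probability than 'like'.
    return [{'positive': 0.9, 'negative': 0.1} if 'love' in t
            else {'positive': 0.7, 'negative': 0.3} for t in texts]

def mean_scores(instances: List[str],
                predict_proba: Callable[[List[str]], List[Dict[str, float]]],
                selected_labels: Optional[Union[str, List[str]]] = 'all') -> Dict[str, float]:
    scores = predict_proba(instances)
    if selected_labels is None or selected_labels == 'all':
        labels = list(scores[0])  # every label the model outputs
    elif isinstance(selected_labels, str):
        labels = [selected_labels]
    else:
        labels = list(selected_labels)
    return {label: mean(s[label] for s in scores) for label in labels}

instances = ['I like Anna from Utrecht!', 'I love Anna from Utrecht!']
print(mean_scores(instances, toy_predict_proba, selected_labels='positive'))
print(mean_scores(instances, toy_predict_proba))  # both 'positive' and 'negative'
```

Treating None and ‘all’ alike mirrors the documented behaviour of selected_labels.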

Submodules:

explabox.expose.text.characters module

Character-level perturbations.

explabox.expose.text.characters.add_typos(n=1, **kwargs)

Create a Perturbation object that adds keyboard typos within words.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.KeyboardAug for optional constructor arguments.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
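A keyboard typo replaces a character with one of its neighbouring keys. The sketch below assumes a tiny, hypothetical excerpt of a QWERTY neighbour map and introduces exactly one typo per perturbed instance; the function names are illustrative, and nlpaug's KeyboardAug uses a full keyboard layout and its own sampling probabilities instead.

```python
import random
from typing import Dict, List

# Hypothetical, tiny excerpt of a QWERTY neighbour map (a few lowercase keys only).
NEIGHBOURS: Dict[str, str] = {
    'a': 'qwsz', 'e': 'wrds', 'o': 'iplk', 's': 'awedxz', 't': 'rfgy',
}

def keyboard_typo(text: str, rng: random.Random) -> str:
    # Replace one randomly chosen mapped character with a neighbouring key.
    positions = [i for i, c in enumerate(text) if c in NEIGHBOURS]
    if not positions:
        return text
    i = rng.choice(positions)
    return text[:i] + rng.choice(NEIGHBOURS[text[i]]) + text[i + 1:]

def keyboard_typos(text: str, n: int = 1, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    return [keyboard_typo(text, rng) for _ in range(n)]  # n perturbed instances

print(keyboard_typos('the best movie', n=3))
```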

explabox.expose.text.characters.delete_random(n=1, **kwargs)

Create a Perturbation object with random character deletions in words.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’delete’ by default).

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
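Random character deletion can be sketched with the standard library alone. This assumes that each word of two or more characters loses one randomly chosen character with probability `p`; `delete_random_chars` and `p` are hypothetical, whereas the documented function delegates to nac.RandomCharAug with action=’delete’.

```python
import random
from typing import List

def delete_random_chars(text: str, n: int = 1, p: float = 0.5, seed: int = 0) -> List[str]:
    # Each word of 2+ characters loses one random character with probability p.
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        words = []
        for word in text.split(' '):
            if len(word) > 1 and rng.random() < p:
                i = rng.randrange(len(word))
                word = word[:i] + word[i + 1:]
            words.append(word)
        variants.append(' '.join(words))
    return variants

print(delete_random_chars('a truly great movie', n=2))
```

Single-character words are left alone so that no word disappears entirely.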

explabox.expose.text.characters.random_case_swap(n=1)

Create a Perturbation object that randomly swaps the case of characters (lowercase to uppercase or vice versa).

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
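A random case swap flips the case of individual characters. The sketch below assumes that each character is flipped independently with a hypothetical probability `p`; `random_case_swap_chars` is an illustrative name, not part of the Explabox API.

```python
import random

def random_case_swap_chars(text: str, p: float = 0.5, seed: int = 0) -> str:
    # Each character has its case flipped with probability p.
    rng = random.Random(seed)
    return ''.join(c.swapcase() if rng.random() < p else c for c in text)

print(random_case_swap_chars('Great Movie'))
```

Replacing `c.swapcase()` with `c.lower()` or `c.upper()` yields the one-directional variants that random_lower and random_upper describe.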

explabox.expose.text.characters.random_lower(n=1)

Create a Perturbation object that randomly converts characters to lowercase.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.random_spaces(n=1, **kwargs)

Create a Perturbation object that adds random spaces within words (splits them up).

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See naw.SplitAug for optional constructor arguments.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
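Adding random spaces splits words into two pieces. A minimal sketch, assuming each word of two or more characters is split at a random interior position with a hypothetical probability `p`; `split_words_randomly` is illustrative, while the documented function relies on naw.SplitAug.

```python
import random

def split_words_randomly(text: str, p: float = 0.5, seed: int = 0) -> str:
    # Each word of 2+ characters is split in two by a space with probability p.
    rng = random.Random(seed)
    out = []
    for word in text.split(' '):
        if len(word) > 1 and rng.random() < p:
            i = rng.randint(1, len(word) - 1)  # split point strictly inside the word
            word = word[:i] + ' ' + word[i:]
        out.append(word)
    return ' '.join(out)

print(split_words_randomly('a truly great movie'))
```

Removing all spaces from the output recovers the original characters, so only word boundaries change.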

explabox.expose.text.characters.random_upper(n=1)

Create a Perturbation object that randomly converts characters to uppercase.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.swap_random(n=1, **kwargs)

Create a Perturbation object that randomly swaps characters within words.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.
**kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’swap’ by default).

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
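Character swapping can be sketched as exchanging two adjacent characters inside a word. Swapping neighbours is an assumption made for this illustration (nac.RandomCharAug with action=’swap’ may choose positions differently), and `swap_adjacent_chars` and `p` are hypothetical.

```python
import random

def swap_adjacent_chars(text: str, p: float = 0.5, seed: int = 0) -> str:
    # Each word of 2+ characters has one pair of adjacent characters swapped with probability p.
    rng = random.Random(seed)
    out = []
    for word in text.split(' '):
        if len(word) > 1 and rng.random() < p:
            i = rng.randrange(len(word) - 1)
            chars = list(word)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            word = ''.join(chars)
        out.append(word)
    return ' '.join(out)

print(swap_adjacent_chars('ab cd', p=1.0))  # ba dc
```

Each perturbed word keeps exactly the same characters, only in a different order.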

explabox.expose.text.sentences module

Sentence-level perturbations.

explabox.expose.text.sentences.repeat_k_times(k=10, connector=' ')

Repeat a string k times.

Parameters:

k (int, optional) – Number of times to repeat a string. Defaults to 10.
connector (Optional[str], optional) – Connector between adjacent repeats. Defaults to ‘ ’.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
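The behaviour described for repeat_k_times amounts to joining `k` copies of a string with the connector. A minimal sketch on plain strings, assuming a non-None connector; `repeat_text` is a hypothetical name and does not return a Perturbation object.

```python
def repeat_text(text: str, k: int = 10, connector: str = ' ') -> str:
    # k copies of text, with `connector` placed between adjacent copies.
    return connector.join([text] * k)

print(repeat_text('great movie', k=3))                  # great movie great movie great movie
print(repeat_text('great movie', k=2, connector='. '))  # great movie. great movie
```

With k=2 this matches what the ‘repeat’ option of compare_metric is documented to do (repeat twice).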

explabox.expose.text.sentences.to_lower()

Make all characters in a string lowercase.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.sentences.to_upper()

Make all characters in a string uppercase.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.words module

Word-level perturbations.