explabox.expose.text

Functions/classes for sensitivity testing (fairness and robustness) for text data.

class explabox.expose.text.Exposer(data=None, model=None, ingestibles=None, **kwargs)

Bases: Readable, IngestiblesMixin

The Exposer exposes your model and/or data, by performing sensitivity tests.

With the Exposer you can see model sensitivity to random inputs (robustness), test model generalizability (robustness), and see the effect of adjustments of attributes in the inputs (e.g. swapping male pronouns for female pronouns; fairness), for the dataset as a whole (global) as well as for individual instances (local).

The Exposer requires both ‘data’ and ‘model’ to be defined. It is included in the Explabox under the .expose property.

Examples

See how performance of a model on the test dataset is affected when text is randomly changed to uppercase:

>>> from explabox.expose import Exposer
>>> exposer = Exposer(data=data, model=model)
>>> exposer.compare_metric(splits='test', perturbation='random_upper')
Parameters:
  • data (Optional[Environment], optional) – Data for ingestibles. Defaults to None.

  • model (Optional[AbstractClassifier], optional) – Model for ingestibles. Defaults to None.

  • ingestibles (Optional[Ingestible], optional) – Ingestible. Defaults to None.

compare_metric(perturbation, splits='test')

Compare metrics for each ground-truth label and attribute after applying a dataset-wide perturbation.

Examples

Compare metric of model performance (e.g. accuracy, precision) before and after mapping each instance in the test dataset to uppercase:

>>> box.expose.compare_metric(splits='test', perturbation='upper')

Add ‘!!!’ to the end of each text in the ‘train’ and ‘test’ split and see how it affects performance:

>>> from explabox.expose.text import OneToOnePerturbation
>>> perturbation_fn = OneToOnePerturbation.from_string(suffix='!!!', connector_after='')
>>> box.expose.compare_metric(splits=['train', 'test'], perturbation=perturbation_fn)
Parameters:
  • perturbation (Union[OneToOnePerturbation, str]) – Custom perturbation or one of the default ones, picked by their string: ‘lower’, ‘upper’, ‘random_lower’, ‘random_upper’, ‘add_typos’, ‘random_case_swap’, ‘swap_random’ (swap characters), ‘delete_random’ (delete characters), ‘repeat’ (repeats twice).

  • splits (Union[str, List[str]], optional) – Split(s) to apply the perturbation to. Defaults to “test”.

Raises:

ValueError – Unknown perturbation.

Returns:

Original label (before perturbation), perturbed label (after perturbation) and metrics for each label-attribute pair.

Return type:

Union[LabelMetrics, MultipleReturn]
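The idea behind compare_metric can be sketched without explabox: apply a perturbation to every instance, predict again, and compare a metric separately for each ground-truth label. This is a minimal sketch, not the library's implementation; it assumes accuracy as the metric, plain strings instead of an Environment, and a hypothetical keyword classifier `toy_model`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def toy_model(text: str) -> str:
    # Hypothetical, case-sensitive classifier: only lowercase 'good' counts
    return "positive" if "good" in text else "negative"


def accuracy_per_label(texts: List[str], labels: List[str],
                       model: Callable[[str], str]) -> Dict[str, float]:
    """Accuracy computed separately for each ground-truth label."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for text, label in zip(texts, labels):
        total[label] += 1
        correct[label] += int(model(text) == label)
    return {label: correct[label] / total[label] for label in total}


def compare_accuracy_sketch(texts: List[str], labels: List[str],
                            model: Callable[[str], str],
                            perturb: Callable[[str], str]
                            ) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Per-label accuracy before and after applying `perturb` to every text."""
    perturbed = [perturb(text) for text in texts]
    return (accuracy_per_label(texts, labels, model),
            accuracy_per_label(perturbed, labels, model))


texts = ["a good movie", "a bad movie", "good fun", "boring"]
labels = ["positive", "negative", "positive", "negative"]
before, after = compare_accuracy_sketch(texts, labels, toy_model, str.upper)
print(before)  # {'positive': 1.0, 'negative': 1.0}
print(after)   # {'positive': 0.0, 'negative': 1.0}
```

Splitting the metric per label shows that the uppercase perturbation only hurts the ‘positive’ class here, which a single overall accuracy would hide.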

input_space(generators, n_samples=100, min_length=0, max_length=100, seed=0, **kwargs)

Test the robustness of a machine learning model to different input types (safety).

Example

Test a pretrained black-box model for its robustness to 1000 random strings (length 0 to 500), containing whitespace characters, ASCII (upper, lower and numbers), emojis and Russian Cyrillic characters:

>>> from explabox import Explabox, RandomEmojis, RandomCyrillic
>>> box = Explabox(data=data, model=model)
>>> box.expose.input_space(generators=['whitespace',
...                                    'ascii',
...                                    RandomEmojis(base=True),
...                                    RandomCyrillic('ru')],
...                        n_samples=1000,
...                        min_length=0,
...                        max_length=500)
Parameters:
  • generators (Union[str, RandomString, List[Union[RandomString, str]]]) – Random character generators. If ‘all’, all generators are selected. For strings, choose from ‘ascii’, ‘emojis’, ‘whitespace’, ‘spaces’, ‘ascii_upper’, ‘ascii_lower’, ‘digits’, ‘punctuation’, ‘cyrillic’.

  • n_samples (int, optional) – Number of test samples. Defaults to 100.

  • min_length (int, optional) – Input minimum length. Defaults to 0.

  • max_length (int, optional) – Input maximum length. Defaults to 100.

  • seed (Optional[int], optional) – Seed for reproducibility purposes. Defaults to 0.

Returns:

Percentage of success cases, list of succeeded/failed instances

Return type:

SuccessTest
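An input-space test of this kind can be sketched as: draw random strings of random length from a pool of characters, feed each one to the model, and count how many are handled. This minimal sketch assumes that ‘success’ means the model returns a prediction without raising an exception, and that the result is a fraction rather than a percentage; `fragile_model` is hypothetical.

```python
import random
import string
from typing import Callable, List, Tuple


def input_space_sketch(model: Callable[[str], str], options: str,
                       n_samples: int = 100, min_length: int = 0,
                       max_length: int = 100,
                       seed: int = 0) -> Tuple[float, List[str], List[str]]:
    rng = random.Random(seed)  # seeded for reproducibility
    successes: List[str] = []
    failures: List[str] = []
    for _ in range(n_samples):
        length = rng.randint(min_length, max_length)
        text = "".join(rng.choice(options) for _ in range(length))
        try:
            model(text)  # assumption: any returned prediction counts as success
            successes.append(text)
        except Exception:
            failures.append(text)
    return len(successes) / n_samples, successes, failures


def fragile_model(text: str) -> str:
    # Hypothetical model that crashes on empty or whitespace-only input
    if not text.strip():
        raise ValueError("empty input")
    return "positive"


pct, ok, failed = input_space_sketch(fragile_model,
                                     string.whitespace + string.ascii_letters,
                                     n_samples=200, min_length=0, max_length=5)
print(f"{pct:.0%} succeeded, {len(failed)} failed")
```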

invariance(pattern, expectation, **kwargs)

Test the failure rate under invariance, i.e. how often predictions for instances generated from a pattern differ from the expected outcome.

Example

Test if predictions remain ‘positive’ for 50 samples of the pattern ‘I {like|love} {name} from {city}!’:

>>> from explabox import Explabox
>>> box = Explabox(data=data, model=model)
>>> box.expose.invariance('I {like|love} {name} from {city}!', expectation='positive', n_samples=50)
Parameters:
  • pattern (str) – String pattern to generate examples from.

  • expectation (Optional[LT], optional) – Expected outcome values. Defaults to None.

  • **kwargs – Optional arguments passed onto the data.generate.from_pattern() function.

Returns:

Percentage of success cases, list of succeeded (invariant)/failed (variant) instances

Return type:

SuccessTest
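The pattern syntax in the example uses {like|love} for alternatives and {name}, {city} for placeholders. The sketch below expands such a pattern and checks whether each prediction equals the expectation. It is a minimal illustration, assuming placeholders are filled from a small hand-written `FILLERS` dictionary (hypothetical; explabox generates instances via data.generate.from_pattern()) and a hypothetical `toy_sentiment` model.

```python
import itertools
import random
import re
from typing import Callable, Dict, List, Tuple

FILLERS: Dict[str, List[str]] = {  # hypothetical placeholder values
    "name": ["Alice", "Bob"],
    "city": ["Amsterdam", "Utrecht"],
}


def expand(pattern: str) -> List[str]:
    """Expand {a|b} alternatives and {key} placeholders into all combinations."""
    slots = re.findall(r"\{([^}]*)\}", pattern)
    options = [slot.split("|") if "|" in slot else FILLERS[slot] for slot in slots]
    template = re.sub(r"\{[^}]*\}", "{}", pattern)
    return [template.format(*combo) for combo in itertools.product(*options)]


def invariance_sketch(pattern: str, expectation: str, model: Callable[[str], str],
                      n_samples: int = 50,
                      seed: int = 0) -> Tuple[float, List[str], List[str]]:
    candidates = expand(pattern)
    samples = random.Random(seed).sample(candidates, min(n_samples, len(candidates)))
    successes = [s for s in samples if model(s) == expectation]
    failures = [s for s in samples if model(s) != expectation]
    return len(successes) / len(samples), successes, failures


def toy_sentiment(text: str) -> str:
    # Hypothetical model with a spurious dependency on the name 'Bob'
    return "negative" if "Bob" in text else "positive"


pct, ok, failed = invariance_sketch("I {like|love} {name} from {city}!",
                                    "positive", toy_sentiment)
print(pct)  # 0.5
```

The failed (variant) instances are exactly those mentioning ‘Bob’, which is the kind of spurious dependency an invariance test is meant to surface.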

mean_score(pattern, selected_labels='all', **kwargs)

Calculate mean (probability) score for the given labels, for data generated from a pattern.

Example

Calculate the mean score for the ‘positive’ label for ‘I {like|love} {name} from {city}!’:

>>> from explabox import Explabox
>>> box = Explabox(data=data, model=model)
>>> box.expose.mean_score('I {like|love} {name} from {city}!', selected_labels='positive', seed=0)
Parameters:
  • pattern (str) – Pattern to generate instances from.

  • selected_labels (Optional[Union[LT, List[LT]]], optional) – Label name(s) to select. If None or ‘all’, all labels are selected. Defaults to ‘all’.

  • **kwargs – Optional arguments passed onto the data.generate.from_pattern() function.

Returns:

Mean score for one label or all selected labels.

Return type:

Union[MeanScore, MultipleReturn]
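The mean score averages the model's probability for each selected label over all generated instances. A minimal sketch, assuming a hypothetical `predict_proba` that returns a label-to-probability dictionary and taking already-generated sentences as input (pattern expansion omitted):

```python
from statistics import mean
from typing import Dict, List, Optional, Union


def predict_proba(text: str) -> Dict[str, float]:
    # Hypothetical model: 'love' is scored as more positive than 'like'
    positive = 0.75 if "love" in text else 0.5
    return {"positive": positive, "negative": 1 - positive}


def mean_score_sketch(texts: List[str],
                      selected_labels: Optional[Union[str, List[str]]] = "all"
                      ) -> Dict[str, float]:
    scores = [predict_proba(text) for text in texts]
    if selected_labels is None or selected_labels == "all":
        labels = list(scores[0])  # all labels known to the model
    elif isinstance(selected_labels, str):
        labels = [selected_labels]
    else:
        labels = list(selected_labels)
    return {label: mean(score[label] for score in scores) for label in labels}


texts = ["I like Alice from Utrecht!", "I love Bob from Amsterdam!"]
print(mean_score_sketch(texts, selected_labels="positive"))  # {'positive': 0.625}
print(mean_score_sketch(texts))  # {'positive': 0.625, 'negative': 0.375}
```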

class explabox.expose.text.LabelMetrics(instances, label_metrics, type='sensitivity', subtype='label_metrics', callargs=None, **kwargs)

Bases: Instances

Return type for labelwise metrics.

Parameters:
  • instances (_type_) – Instances.

  • label_metrics (_type_) – Metric for each label.

  • type (Optional[str], optional) – Type description. Defaults to ‘sensitivity’.

  • subtype (Optional[str], optional) – Subtype description. Defaults to ‘label_metrics’.

  • callargs (Optional[dict], optional) – Arguments used when the function was called. Defaults to None.

property content

Content as dictionary.

class explabox.expose.text.MeanScore(scores, label, instances, type='sensitivity', subtype='mean_score', callargs=None, **kwargs)

Bases: Instances

Return type for text_sensitivity.mean_score().

Parameters:
  • scores (List[float]) – Score for each instance.

  • label (str) – Name of label.

  • instances (_type_) – Instances.

  • type (Optional[str], optional) – Type description. Defaults to ‘sensitivity’.

  • subtype (Optional[str], optional) – Subtype description. Defaults to ‘mean_score’.

  • callargs (Optional[dict], optional) – Arguments used when the function was called. Defaults to None.

property content

Content as dictionary.

class explabox.expose.text.OneToManyPerturbation(perturbation_function)

Bases: Perturbation

Apply a perturbation function to a single TextInstance, getting multiple results per instance.

Parameters:

perturbation_function (Callable) – Perturbation function to apply, including attribute label of original instance and the resulting instances. Should return None if no perturbation has been applied.

classmethod from_dictionary(dictionary, label_from, label_to, n=10, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToManyPerturbation from a dictionary.

Example

Replace the word ‘good’ (positive) with ‘bad’, ‘mediocre’ or ‘terrible’ (negative), generating up to n=5 perturbed instances per original instance. The default tokenizer/detokenizer assumes word-level tokens:

>>> replacements = {'good': ['bad', 'mediocre', 'terrible']}
>>> OneToManyPerturbation.from_dictionary(replacements,
...                                       n=5,
...                                       label_from='positive',
...                                       label_to='negative')
Parameters:
  • dictionary (Dict[str, List[str]]) – Lookup dictionary to map tokens (e.g. words, characters).

  • label_from (LT) – Attribute label of original instance (left-hand side of dictionary).

  • label_to (LT) – Attribute label of perturbed instance (right-hand side of dictionary).

  • n (int, optional) – Number of instances to generate. Defaults to 10.

  • tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.

  • detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
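Conceptually, a dictionary-based one-to-many perturbation tokenizes an instance, swaps a matching token for each of its listed replacements, and keeps at most n results. This is a minimal sketch of that idea, not the library's code: it assumes whitespace tokenization (`str.split` and `' '.join` stand in for word_tokenizer/word_detokenizer) and returns plain strings instead of TextInstances with attribute labels.

```python
from typing import Dict, List, Optional


def one_to_many_from_dictionary(text: str, dictionary: Dict[str, List[str]],
                                n: int = 10) -> Optional[List[str]]:
    """Return up to n variants of text, or None if no token could be replaced."""
    tokens = text.split()  # stand-in for word_tokenizer
    results: List[str] = []
    for i, token in enumerate(tokens):
        for replacement in dictionary.get(token, []):
            variant = tokens[:i] + [replacement] + tokens[i + 1:]
            results.append(" ".join(variant))  # stand-in for word_detokenizer
            if len(results) == n:
                return results
    return results or None  # None signals that no perturbation was applied


replacements = {"good": ["bad", "mediocre", "terrible"]}
print(one_to_many_from_dictionary("a good movie", replacements, n=5))
# ['a bad movie', 'a mediocre movie', 'a terrible movie']
print(one_to_many_from_dictionary("a fine movie", replacements))  # None
```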

classmethod from_function(function, label_from='original', label_to='perturbed', n=10, perform_once=False)

Construct a OneToManyPerturbation from a perturbation applied to a string.

Parameters:
  • function (Callable[[str], Optional[Union[str, Sequence[str]]]]) – Function to apply to each string. Return None if no change was applied.

  • label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.

  • label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.

  • n (int, optional) – Number of instances to generate. Defaults to 10.

  • perform_once (bool, optional) – If n was already passed when constructing function (so that it returns multiple results itself), apply function only once instead of n times. Defaults to False.

classmethod from_nlpaug(augmenter, label_from='original', label_to='perturbed', n=10, **augment_kwargs)

Construct a OneToManyPerturbation from an nlpaug Augmenter.

Example

Add n=5 versions of keyboard typing mistakes to lowercase characters in a sentence using nlpaug.augmenter.char.KeyboardAug():

>>> import nlpaug.augmenter.char as nac
>>> augmenter = nac.KeyboardAug(include_upper_case=False,
...                             include_special_char=False)
>>> OneToManyPerturbation.from_nlpaug(augmenter, n=5, label_from='no_typos', label_to='typos')
Parameters:
  • augmenter (Augmenter) – Class with .augment() function applying a perturbation to a string.

  • label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.

  • label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.

  • n (int, optional) – Number of instances to generate. Defaults to 10.

  • **augment_kwargs – Optional arguments passed to .augment() function.

perturb(instance)

Apply a perturbation function to a single TextInstance, getting multiple results per instance.

Parameters:

instance (TextInstance) – Instance to perturb.

Returns:

None if no perturbation has been applied.

Otherwise a sequence of perturbed TextInstances, and attribute labels for the original and perturbed instances.

Return type:

Optional[Sequence[Tuple[TextInstance, Sequence[Tuple[KT, LT]]]]]

class explabox.expose.text.OneToOnePerturbation(perturbation_function)

Bases: Perturbation

Apply a perturbation function to a single TextInstance, getting a single result per instance.

Parameters:

perturbation_function (Callable) – Perturbation function to apply, including attribute label of original instance and the resulting instance. Should return None if no perturbation has been applied.

classmethod from_dictionary(dictionary, label_from, label_to, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from a dictionary.

Example

Replace the word ‘a’ or ‘an’ (indefinite article) with ‘the’ (definite article) in each instance. The default tokenizer/detokenizer assumes word-level tokens:

>>> replacements = {'a': 'the',
...                 'an': 'the'}
>>> OneToOnePerturbation.from_dictionary(replacements,
...                                      label_from='indefinite',
...                                      label_to='definite')

Replace the character ‘.’ with ‘!’ (character-level replacement):

>>> from text_explainability import character_tokenizer, character_detokenizer
>>> OneToOnePerturbation.from_dictionary({'.': '!'},
...                                      label_from='not_excited',
...                                      label_to='excited',
...                                      tokenizer=character_tokenizer,
...                                      detokenizer=character_detokenizer)

Parameters:
  • dictionary (Dict[str, str]) – Lookup dictionary to map tokens (e.g. words, characters).

  • label_from (LT) – Attribute label of original instance (left-hand side of dictionary).

  • label_to (LT) – Attribute label of perturbed instance (right-hand side of dictionary).

  • tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.

  • detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.

classmethod from_list(mapping_list, label_from='original', label_to='perturbed', tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from a list.

A function is constructed that aims to map any value in the list to any other value in the list.

Example

For example, if the list [‘Amsterdam’, ‘Rotterdam’, ‘Utrecht’] is provided, it aims to map ‘Amsterdam’ to ‘Rotterdam’ or ‘Utrecht’, ‘Rotterdam’ to ‘Amsterdam’ or ‘Utrecht’, and ‘Utrecht’ to ‘Rotterdam’ or ‘Amsterdam’. If none of these is possible, it returns None.

>>> map_list = ['Amsterdam', 'Rotterdam', 'Utrecht']
>>> OneToOnePerturbation.from_list(map_list)
Parameters:
  • mapping_list (List[str]) – Lookup list of tokens (e.g. words, characters).

  • label_from (LT, optional) – Attribute label of original instance (non-replaced). Defaults to ‘original’.

  • label_to (LT, optional) – Attribute label of perturbed instance (replaced). Defaults to ‘perturbed’.

  • tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.

  • detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
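The list-based mapping can be sketched as: find the first token that occurs in the list and replace it with a different value from the same list, returning None when nothing matches. This is a minimal sketch under stated assumptions: whitespace tokenization stands in for word_tokenizer, and the replacement is picked at random with a seeded RNG (the library's selection strategy may differ).

```python
import random
from typing import List, Optional


def from_list_sketch(text: str, mapping_list: List[str], seed: int = 0) -> Optional[str]:
    """Replace the first token found in mapping_list by another value from it."""
    rng = random.Random(seed)
    tokens = text.split()  # stand-in for word_tokenizer
    for i, token in enumerate(tokens):
        if token in mapping_list:
            alternatives = [value for value in mapping_list if value != token]
            if alternatives:
                tokens[i] = rng.choice(alternatives)  # assumption: random pick
                return " ".join(tokens)
    return None  # no token could be mapped


map_list = ["Amsterdam", "Rotterdam", "Utrecht"]
print(from_list_sketch("I live in Amsterdam", map_list))
print(from_list_sketch("I live in Paris", map_list))  # None
```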

classmethod from_nlpaug(augmenter, label_from='original', label_to='perturbed', **augment_kwargs)

Construct a OneToOnePerturbation from an nlpaug Augmenter.

Example

Add random spaces to words in a sentence using nlpaug.augmenter.word.SplitAug():

>>> import nlpaug.augmenter.word as naw
>>> OneToOnePerturbation.from_nlpaug(naw.SplitAug(), label_to='with_extra_space')

Or add keyboard typing mistakes to lowercase characters in a sentence using nlpaug.augmenter.char.KeyboardAug():

>>> import nlpaug.augmenter.char as nac
>>> augmenter = nac.KeyboardAug(include_upper_case=False,
...                             include_special_char=False,
...                             include_numeric=False)
>>> OneToOnePerturbation.from_nlpaug(augmenter, label_from='no_typos', label_to='typos')
Parameters:
  • augmenter (Augmenter) – Class with .augment() function applying a perturbation to a string.

  • label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.

  • label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.

  • **augment_kwargs – Optional arguments passed to .augment() function.

classmethod from_string(prefix=None, suffix=None, replacement=None, label_from='original', label_to='perturbed', connector=' ', connector_before=None, connector_after=None)

Construct a OneToOnePerturbation from a string (replacement, prefix and/or suffix).

Replaces each instance string with a new one, adds a prefix to it and/or adds a suffix to it. At least one of prefix, suffix or replacement must be a string; otherwise a ValueError is raised.

Example

Add the unrelated Dutch string ‘Dit is ongerelateerd.’ (‘This is unrelated.’) to each instance as a prefix, where you expect predictions not to change:

>>> OneToOnePerturbation.from_string(prefix='Dit is ongerelateerd.', label_to='with_prefix')

Or add the negative Dutch string ‘Dit is negatief!’ (‘This is negative!’) to each instance as a suffix on a new line, where you expect instances to keep the same label or become more negative:

>>> OneToOnePerturbation.from_string(suffix='Dit is negatief!',
...                                  connector_after='\n',
...                                  label_to='more_negative')

Or replace all instances with ‘UNKWRDZ’:

>>> OneToOnePerturbation.from_string(replacement='UNKWRDZ')

Raises:

ValueError – At least one of prefix, suffix and replacement should be provided.

Parameters:
  • label_from (LT, optional) – Attribute label of original instance. Defaults to ‘original’.

  • label_to (LT, optional) – Attribute label of perturbed instance. Defaults to ‘perturbed’.

  • prefix (Optional[str], optional) – Text to add before instance.data. Defaults to None.

  • suffix (Optional[str], optional) – Text to add after instance.data. Defaults to None.

  • replacement (Optional[str], optional) – Text to replace instance.data with. Defaults to None.

  • connector (str, optional) – General connector between prefix, instance.data and suffix. Defaults to ‘ ’ (a single space).

  • connector_before (Optional[str], optional) – Overrides connector between prefix and instance.data, if it is None connector is used. Defaults to None.

  • connector_after (Optional[str], optional) – Overrides connector between instance.data and suffix, if it is None connector is used. Defaults to None.
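The interplay of prefix, suffix, replacement and the three connector parameters can be sketched on plain strings. A minimal sketch, assuming the replacement is applied first and the prefix and suffix are then attached around it; `from_string_sketch` is a hypothetical helper, not the library's API.

```python
from typing import Optional


def from_string_sketch(text: str, prefix: Optional[str] = None,
                       suffix: Optional[str] = None,
                       replacement: Optional[str] = None, connector: str = " ",
                       connector_before: Optional[str] = None,
                       connector_after: Optional[str] = None) -> str:
    if prefix is None and suffix is None and replacement is None:
        raise ValueError("At least one of prefix, suffix and replacement should be provided.")
    # The specific connectors override the general one only when they are not None
    before = connector if connector_before is None else connector_before
    after = connector if connector_after is None else connector_after
    result = text if replacement is None else replacement
    if prefix is not None:
        result = prefix + before + result
    if suffix is not None:
        result = result + after + suffix
    return result


print(from_string_sketch("Great film.", prefix="Dit is ongerelateerd."))
# Dit is ongerelateerd. Great film.
print(repr(from_string_sketch("Great film.", suffix="Dit is negatief!", connector_after="\n")))
# 'Great film.\nDit is negatief!'
```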

classmethod from_tuples(tuples, label_from, label_to, tokenizer=<function word_tokenizer>, detokenizer=<function word_detokenizer>)

Construct a OneToOnePerturbation from tuples.

A function is constructed that first aims to perform the mapping from the tokens on the left-hand side (LHS) to the right-hand side (RHS), and, if this has no result, aims to perform the mapping from the tokens on the RHS to the LHS.

Example

For example, if [(‘he’, ‘she’)] with label_from=’male’ and label_to=’female’ is provided it first checks whether the tokenized instance contains the word ‘he’ (and if so applies the perturbation and returns), and otherwise aims to map ‘she’ to ‘he’. If neither is possible, it returns None.

>>> tuples = [('he', 'she'),
...           ('his', 'her')]
>>> OneToOnePerturbation.from_tuples(tuples, label_from='male', label_to='female')
Parameters:
  • tuples (List[Tuple[str, str]]) – Lookup tuples to map tokens (e.g. words, characters).

  • label_from (LT) – Attribute label of original instance (left-hand side of tuples).

  • label_to (LT) – Attribute label of perturbed instance (right-hand side of tuples).

  • tokenizer (Callable, optional) – Function to tokenize instance data (e.g. words, characters). Defaults to default_tokenizer.

  • detokenizer (Callable, optional) – Function to detokenize tokens into instance data. Defaults to default_detokenizer.
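The two-directional lookup described above can be sketched as follows: try the LHS-to-RHS mapping first, fall back to RHS-to-LHS (swapping the attribute labels accordingly), and return None if neither applies. This is a minimal sketch, assuming whitespace tokenization and that every matching token is replaced; the (text, original label, perturbed label) return shape is a simplification of the library's TextInstance-based output.

```python
from typing import List, Optional, Tuple


def from_tuples_sketch(text: str, tuples: List[Tuple[str, str]], label_from: str,
                       label_to: str) -> Optional[Tuple[str, str, str]]:
    """Return (perturbed_text, original_label, perturbed_label), or None."""
    tokens = text.split()  # stand-in for word_tokenizer
    lhs_to_rhs = dict(tuples)
    rhs_to_lhs = {rhs: lhs for lhs, rhs in tuples}
    # First try LHS -> RHS, then fall back to RHS -> LHS (labels swap accordingly)
    for mapping, original, perturbed in ((lhs_to_rhs, label_from, label_to),
                                         (rhs_to_lhs, label_to, label_from)):
        if any(token in mapping for token in tokens):
            return " ".join(mapping.get(token, token) for token in tokens), original, perturbed
    return None


tuples = [("he", "she"), ("his", "her")]
print(from_tuples_sketch("he lost his keys", tuples, "male", "female"))
# ('she lost her keys', 'male', 'female')
print(from_tuples_sketch("she found her keys", tuples, "male", "female"))
# ('he found his keys', 'female', 'male')
print(from_tuples_sketch("they left early", tuples, "male", "female"))  # None
```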

perturb(instance)

Apply a perturbation function to a single TextInstance, getting a single result per instance.

Parameters:

instance (TextInstance) – Instance to perturb.

Returns:

None if no perturbation has been applied.

Otherwise a sequence of perturbed TextInstances, and attribute labels for the original and perturbed instances.

Return type:

Optional[Sequence[Tuple[TextInstance, Sequence[Tuple[KT, LT]]]]]

class explabox.expose.text.RandomAscii(seed=0)

Bases: RandomString

Generate random ASCII characters.

Parameters:

seed (int, optional) – Seed for reproducibility. Defaults to 0.

class explabox.expose.text.RandomCyrillic(languages='ru', upper=True, lower=True, seed=0)

Bases: RandomString

Generate strings containing random Cyrillic characters.

Can generate text in Bulgarian (‘bg’), Macedonian (‘mk’), Russian (‘ru’), Serbian (‘sr’), Ukrainian (‘uk’), and all combinations thereof.

Parameters:
  • languages (Union[List[str], str], optional) – Cyrillic languages to select. Defaults to ‘ru’.

  • upper (bool, optional) – Whether to include uppercase characters. Defaults to True.

  • lower (bool, optional) – Whether to include lowercase characters. Defaults to True.

  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

Raises:
  • ValueError – Either upper or lower should be True.

  • ValueError – One of the selected languages is unknown.

class explabox.expose.text.RandomDigits(seed=0)

Bases: RandomString

Generate strings containing random digits.

Parameters:

seed (int, optional) – Seed for reproducibility. Defaults to 0.

class explabox.expose.text.RandomEmojis(seed=0, base=True, dingbats=True, flags=True, components=True)

Bases: RandomString

Generate strings containing a subset of random unicode emojis.

Parameters:
  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • base (bool, optional) – Include base emojis (e.g. smiley face). Defaults to True.

  • dingbats (bool, optional) – Include dingbat emojis. Defaults to True.

  • flags (bool, optional) – Include flag emojis. Defaults to True.

  • components (bool, optional) – Include emoji components (e.g. skin color modifier or country flags). Defaults to True.

Raises:

ValueError – At least one of base, dingbats, flags should be True.

class explabox.expose.text.RandomSpaces(seed=0)

Bases: RandomString

Generate strings with a random number of spaces.

Parameters:

seed (int, optional) – Seed for reproducibility. Defaults to 0.

class explabox.expose.text.RandomString(seed=0, options=string.printable)

Bases: Readable, SeedMixin

Base class for random data (string) generation.

Parameters:
  • seed (int, optional) – Seed for reproducibility. Defaults to 0.

  • options (Union[str, List[str]], optional) – Characters or strings to generate data from. Defaults to string.printable.

generate(n, min_length=0, max_length=100)

Generate n instances of random strings.

Example

Create a TextInstanceProvider containing n=10 strings of random characters from ‘12345xXyY!?’, each between 3 and 10 characters long:

>>> RandomString(seed=0, options='12345xXyY!?').generate(n=10, min_length=3, max_length=10)
Parameters:
  • n (int) – Number of instances to generate.

  • min_length (int, optional) – Minimum length of random instance. Defaults to 0.

  • max_length (int, optional) – Maximum length of random instance. Defaults to 100.

Raises:

ValueError – min_length should be smaller than max_length.

Returns:

Provider containing generated instances.

Return type:

TextInstanceProvider

generate_list(n, min_length=0, max_length=100)

Generate n instances of random strings and return as list.

Example

Generate a list of n=10 strings of random characters from ‘ABCabc\U0001F600’ (the last character is the 😀 emoji), each between 10 and 50 characters long:

>>> RandomString(seed=0, options=u'ABCabc\U0001F600').generate_list(n=10, min_length=10, max_length=50)
Parameters:
  • n (int) – Number of instances to generate.

  • min_length (int, optional) – Minimum length of random instance. Defaults to 0.

  • max_length (int, optional) – Maximum length of random instance. Defaults to 100.

Raises:

ValueError – min_length should be smaller than max_length.

Returns:

List containing generated instances.

Return type:

List[str]
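Seeded random string generation can be sketched with a private random.Random instance: draw a length between min_length and max_length, then draw that many characters from the options. A minimal sketch, not the library's code; it assumes equal min_length and max_length are allowed, and `generate_list_sketch` is a hypothetical name.

```python
import random
from typing import List


def generate_list_sketch(n: int, options: str, min_length: int = 0,
                         max_length: int = 100, seed: int = 0) -> List[str]:
    # Assumption: equal min_length and max_length are allowed here
    if min_length > max_length:
        raise ValueError("min_length should be smaller than max_length.")
    rng = random.Random(seed)  # a private RNG keeps results reproducible per seed
    return ["".join(rng.choices(options, k=rng.randint(min_length, max_length)))
            for _ in range(n)]


strings = generate_list_sketch(n=10, options="ABCabc\U0001F600", min_length=10, max_length=50)
print(strings[0])
```

Because each Python str element is a code point, the emoji counts as a single character towards the length.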

class explabox.expose.text.RandomUpper(seed=0)

Bases: RandomString

Generate random ASCII uppercase characters.

Parameters:

seed (int, optional) – Seed for reproducibility. Defaults to 0.

class explabox.expose.text.RandomWhitespace(seed=0)

Bases: RandomString

Generate strings with a random number of whitespace characters.

Parameters:

seed (int, optional) – Seed for reproducibility. Defaults to 0.

class explabox.expose.text.SuccessTest(success_percentage, successes, failures, predictions=None, type='robustness', subtype='input_space', callargs=None, **kwargs)

Bases: Instances

Return type for success test.

Parameters:
  • success_percentage (float) – Percentage of successful cases.

  • successes (_type_) – Instances that succeeded.

  • failures (_type_) – Instances that failed.

  • predictions (Optional[Union[LabelProvider, list, dict]], optional) – Predictions to subdivide successes/failures into labels. Defaults to None.

  • type (str, optional) – Type description. Defaults to ‘robustness’.

  • subtype (str, optional) – Subtype description. Defaults to ‘input_space’.

  • callargs (Optional[dict], optional) – Arguments used when the function was called. Defaults to None.

property content

Content as dictionary.

explabox.expose.text.compare_accuracy(*args, **kwargs)

Compare accuracy scores for each ground-truth label and attribute.

explabox.expose.text.compare_metric(env, model, perturbation, **kwargs)

Get metrics for each ground-truth label and attribute.

Examples

Compare metric of model performance (e.g. accuracy, precision) before and after mapping each instance in a dataset to uppercase.

>>> from explabox.expose.text.sentences import to_upper
>>> compare_metric(env, model, to_upper())

Compare metric when randomly adding 10 perturbed instances with typos to each instance in a dataset.

>>> from explabox.expose.text.characters import add_typos
>>> compare_metric(env, model, add_typos(n=10))
Parameters:
  • env (TextEnvironment) – Environment containing original instances (.dataset) and ground-truth labels (.labels).

  • model (AbstractClassifier) – Black-box model to compare metrics on.

  • perturbation (Perturbation) – Perturbation to apply.

Returns:

Original label (before perturbation), perturbed label (after perturbation) and metrics for label-attribute pair.

Return type:

LabelMetrics

explabox.expose.text.compare_precision(*args, **kwargs)

Compare precision scores for each ground-truth label and attribute.

explabox.expose.text.compare_recall(*args, **kwargs)

Compare recall scores for each ground-truth label and attribute.

Submodules:

explabox.expose.text.characters module

Character-level perturbations.

explabox.expose.text.characters.add_typos(n=1, **kwargs)

Create a Perturbation object that adds keyboard typos within words.

Parameters:
  • n (int, optional) – Number of perturbed instances required. Defaults to 1.

  • **kwargs – See nac.KeyboardAug for optional constructor arguments.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.delete_random(n=1, **kwargs)

Create a Perturbation object with random character deletions in words.

Parameters:
  • n (int, optional) – Number of perturbed instances required. Defaults to 1.

  • **kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’delete’ by default).

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.random_case_swap(n=1)

Create a Perturbation object that randomly swaps the case of characters (lowercase to uppercase or vice versa).

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
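A random case swap can be sketched as flipping each character's case with some probability, producing n variants. A minimal sketch on plain strings; the per-character probability `prob` is a hypothetical parameter, not part of the documented signature.

```python
import random
from typing import List


def random_case_swap_sketch(text: str, n: int = 1, prob: float = 0.5,
                            seed: int = 0) -> List[str]:
    """Return n variants of text, each character's case swapped with probability prob."""
    rng = random.Random(seed)
    return ["".join(char.swapcase() if rng.random() < prob else char for char in text)
            for _ in range(n)]


variants = random_case_swap_sketch("Hello World", n=3)
print(variants)
```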

explabox.expose.text.characters.random_lower(n=1)

Create a Perturbation object that randomly changes characters to lowercase.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.random_spaces(n=1, **kwargs)

Create a Perturbation object that adds random spaces within words (splits them up).

Parameters:
  • n (int, optional) – Number of perturbed instances required. Defaults to 1.

  • **kwargs – See naw.SplitAug for optional constructor arguments.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.random_upper(n=1)

Create a Perturbation object that randomly changes characters to uppercase.

Parameters:

n (int, optional) – Number of perturbed instances required. Defaults to 1.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.characters.swap_random(n=1, **kwargs)

Create a Perturbation object that randomly swaps characters within words.

Parameters:
  • n (int, optional) – Number of perturbed instances required. Defaults to 1.

  • **kwargs – See nac.RandomCharAug for optional constructor arguments (uses action=’swap’ by default).

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.sentences module

Sentence-level perturbations.

explabox.expose.text.sentences.repeat_k_times(k=10, connector=' ')

Repeat a string k times.

Parameters:
  • k (int, optional) – Number of times to repeat a string. Defaults to 10.

  • connector (Optional[str], optional) – Connector between adjacent repeats. Defaults to ‘ ’ (a single space).

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation
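On a plain string, repeating k times with a connector amounts to joining k copies. A minimal sketch of that string operation only (`repeat_k_times_sketch` is hypothetical and returns a str, not a Perturbation object):

```python
def repeat_k_times_sketch(text: str, k: int = 10, connector: str = " ") -> str:
    """Join k copies of text, separated by connector."""
    return connector.join([text] * k)


print(repeat_k_times_sketch("Great film!", k=3))
# Great film! Great film! Great film!
```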

explabox.expose.text.sentences.to_lower()

Make all characters in a string lowercase.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.sentences.to_upper()

Make all characters in a string uppercase.

Returns:

Object able to apply perturbations on strings or TextInstances.

Return type:

Perturbation

explabox.expose.text.words module

Word-level perturbations.