Model Utils
display_parameters
textbrewer.utils.display_parameters(model, max_level=None)
Display the numbers and memory usage of module parameters.
- Parameters
model (torch.nn.Module or dict) – the model to be inspected.
max_level (int or None) – The max level to display. If max_level==None, show all the levels.
- Returns
A formatted string and a LayerNode object representing the model.
Data Utils
This module provides the following data augmentation methods.
masking
textbrewer.data_utils.masking(tokens, p=0.1, mask='[MASK]')
Returns a new list in which each element of tokens is replaced by mask with probability p.
- Parameters
tokens (list) – list of tokens or token ids.
p (float) – probability to mask each element in tokens.
mask – the value that replaces a masked element; defaults to '[MASK]'.
- Returns
A new list in which each element of tokens is replaced by mask with probability p.
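The behaviour described above can be sketched in a few lines. This is a minimal illustration, not TextBrewer's implementation: the name `masking_sketch` is hypothetical, and the assumption is that each element is masked independently of the others.

```python
import random


def masking_sketch(tokens, p=0.1, mask='[MASK]'):
    """Hypothetical re-implementation of the documented behaviour.

    Assumption: each element is replaced independently with probability p.
    """
    # random.random() lies in [0, 1), so p=0 never masks and p=1 always masks.
    return [mask if random.random() < p else tok for tok in tokens]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(masking_sketch(tokens, p=0.3))  # output varies from run to run
```

The input list is left untouched, which matches the "returns a new list" wording of the description.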
deleting
textbrewer.data_utils.deleting(tokens, p=0.1)
Returns a new list in which each element of tokens is deleted with probability p.
- Parameters
tokens (list) – list of tokens or token ids.
p (float) – probability to delete each element in tokens.
- Returns
A new list in which each element of tokens is deleted with probability p.
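As an illustration of the description above, the following sketch drops each element independently. The name `deleting_sketch` is hypothetical and the independence of the per-element decisions is an assumption; it is not taken from the library source.

```python
import random


def deleting_sketch(tokens, p=0.1):
    """Hypothetical re-implementation of the documented behaviour.

    Assumption: each element is dropped independently with probability p.
    """
    # Keep an element when the draw is at least p, i.e. with probability 1 - p.
    return [tok for tok in tokens if random.random() >= p]


tokens = [101, 2023, 2003, 1037, 7099, 102]
print(deleting_sketch(tokens, p=0.3))  # output varies from run to run
```

The order of the surviving elements is preserved, so the result is always a subsequence of the input.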
n_gram_sampling
textbrewer.data_utils.n_gram_sampling(tokens, p_ng=[0.2, 0.2, 0.2, 0.2, 0.2], l_ng=[1, 2, 3, 4, 5])
Samples a length l from l_ng with probability distribution p_ng, then returns a random span of length l from tokens.
- Parameters
tokens (list) – list of tokens or token ids.
p_ng (list) – probability distribution over the n-gram lengths in l_ng; should sum to 1.
l_ng (list) – the candidate n-gram lengths to sample from.
- Returns
A random n-gram span from tokens.
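The two-step sampling described above (first a length, then a span) can be sketched as follows. The name `n_gram_sampling_sketch` is hypothetical. Choosing the start position uniformly and clamping the length to `len(tokens)` are assumptions of this sketch, since the description does not say how short inputs are handled.

```python
import random


def n_gram_sampling_sketch(tokens, p_ng=(0.2, 0.2, 0.2, 0.2, 0.2),
                           l_ng=(1, 2, 3, 4, 5)):
    """Hypothetical re-implementation of the documented behaviour."""
    # Step 1: draw the span length l from l_ng using the weights p_ng.
    length = random.choices(l_ng, weights=p_ng, k=1)[0]
    # Assumption: a span cannot be longer than the input itself.
    length = min(length, len(tokens))
    # Step 2: pick a start position uniformly so the span fits (assumption).
    start = random.randint(0, len(tokens) - length)
    return tokens[start:start + length]


tokens = list(range(10))
print(n_gram_sampling_sketch(tokens))  # a contiguous span of 1 to 5 elements
```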
short_disorder
textbrewer.data_utils.short_disorder(tokens, p=[0.9, 0.1, 0, 0, 0])
Returns a new list by disordering tokens with probability distribution p at every possible position. Let abc be a 3-gram in tokens; there are five ways to disorder it, corresponding to the five probability values:
abc -> abc
abc -> bac
abc -> cba
abc -> cab
abc -> bca
- Parameters
tokens (list) – list of tokens or token ids.
p (list) – probability distribution of 5 disorder types, should sum to 1.
- Returns
a new disordered list
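One way to read the description of short_disorder is as a sliding window over every 3-gram, with one of the five rearrangements drawn at each position. The sketch below follows that reading. The name `short_disorder_sketch` is hypothetical, and treating "every possible position" as overlapping windows processed left to right is an assumption; the library may step through the list differently.

```python
import random

# The five 3-gram rearrangements from the description, as index permutations.
PERMUTATIONS = [
    (0, 1, 2),  # abc -> abc (unchanged)
    (1, 0, 2),  # abc -> bac
    (2, 1, 0),  # abc -> cba
    (2, 0, 1),  # abc -> cab
    (1, 2, 0),  # abc -> bca
]


def short_disorder_sketch(tokens, p=(0.9, 0.1, 0, 0, 0)):
    """Hypothetical re-implementation of the documented behaviour."""
    new_tokens = list(tokens)
    # Assumption: visit every start index that still has a full 3-gram.
    for i in range(len(new_tokens) - 2):
        perm = random.choices(PERMUTATIONS, weights=p, k=1)[0]
        window = new_tokens[i:i + 3]
        new_tokens[i:i + 3] = [window[j] for j in perm]
    return new_tokens


print(short_disorder_sketch(list("abc"), p=(0, 1, 0, 0, 0)))  # ['b', 'a', 'c']
```

Because every step only permutes elements within a window, the result always contains exactly the same elements as the input.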
long_disorder
textbrewer.data_utils.long_disorder(tokens, p=0.1, length=20)
Performs a long-range disordering. If length>1, swaps the two halves of each span of length length in tokens; if length<=1, treats length as the span length relative to the length of tokens. For example:
>>> long_disorder([0,1,2,3,4,5,6,7,8,9,10], p=1, length=0.4)
[2, 3, 0, 1, 6, 7, 4, 5, 8, 9]
- Parameters
tokens (list) – list of tokens or token ids.
p (float) – probability of swapping the two halves of a span at each possible position.
length (int or float) – length of the disordered span.
- Returns
a new disordered list
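The half-swapping step can be sketched as below. The name `long_disorder_sketch` is hypothetical. Several details are assumptions of this sketch rather than documented behaviour: spans are taken as consecutive, non-overlapping chunks; a relative length is converted with `int(len(tokens) * length)`; and leftover tokens that do not fill a whole span are copied unchanged, so the handling of the tail is not claimed to match the library.

```python
import random


def long_disorder_sketch(tokens, p=0.1, length=20):
    """Hypothetical re-implementation of the documented behaviour."""
    n = len(tokens)
    # length > 1 is an absolute span length; otherwise a fraction of len(tokens).
    span = int(length) if length > 1 else int(n * length)
    if span < 2:
        return list(tokens)  # nothing to swap in a span shorter than 2
    half = span // 2
    out = []
    i = 0
    while i + span <= n:
        chunk = list(tokens[i:i + span])
        if random.random() < p:
            # Swap the two halves of this span.
            chunk = chunk[half:] + chunk[:half]
        out.extend(chunk)
        i += span
    # Assumption: tokens left over after the last full span stay in place.
    out.extend(tokens[i:])
    return out


print(long_disorder_sketch(list(range(8)), p=1, length=4))  # [2, 3, 0, 1, 6, 7, 4, 5]
```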