toad.transform module

class toad.transform.Transformer[source]

Bases: sklearn.base.TransformerMixin, toad.utils.mixin.RulesMixin

Base class for transformers

fit()

fit method; see the fit_ method for details

transform(X, *args, **kwargs)[source]

transform method; see the transform_ method for details
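
The split between fit/transform and fit_/transform_ can be pictured with a minimal sketch. It assumes the public methods loop over features and delegate each one to the underscore hook, which is suggested by the transform_(rule, X) signatures in this module but is not confirmed here. SketchTransformer and MeanCenter are hypothetical names, and X is a plain dict of columns rather than a DataFrame.

```python
class SketchTransformer:
    """Hypothetical base class; not toad's actual implementation."""

    def __init__(self):
        self.rules = {}  # one fitted rule per feature name

    def fit(self, X, y=None, **kwargs):
        # X is a plain dict of column name -> list of values in this sketch
        for name, column in X.items():
            self.rules[name] = self.fit_(column, y, **kwargs)
        return self

    def transform(self, X, **kwargs):
        out = {}
        for name, column in X.items():
            if name in self.rules:
                out[name] = self.transform_(self.rules[name], column, **kwargs)
            else:
                out[name] = list(column)  # features without a rule pass through
        return out


class MeanCenter(SketchTransformer):
    """Toy subclass: the rule for a feature is its mean."""

    def fit_(self, X, y=None):
        return sum(X) / len(X)

    def transform_(self, rule, X):
        return [x - rule for x in X]


data = {"a": [1, 2, 3], "b": [10, 20, 30]}
centered = MeanCenter().fit(data).transform(data)
```

A subclass only has to say how one feature is fitted and transformed; the per-feature bookkeeping stays in the base class.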

default_rule()[source]
export(**kwargs)[source]

export rules to a dict or a JSON file

Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

load(rules, update=False, **kwargs)[source]

load rules from a dict or a JSON file

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
  • update (bool) – whether to update the existing rules instead of replacing them
rules
update(*args, **kwargs)[source]

update rules

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
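
How export, load and update relate can be sketched with the standard library alone. RuleStore is a hypothetical stand-in, not toad's class. It accepts only file-like objects for to_json and from_json (the documented parameters also allow a path string), and it assumes update merges new rules into the existing ones.

```python
import io
import json


class RuleStore:
    """Hypothetical stand-in for the rule handling described above."""

    def __init__(self):
        self.rules = {}

    def export(self, to_json=None):
        # optionally write the rules to a file-like object, always return a dict
        if to_json is not None:
            json.dump(self.rules, to_json)
        return dict(self.rules)

    def load(self, rules=None, update=False, from_json=None):
        if from_json is not None:
            rules = json.load(from_json)
        if not update:
            self.rules = {}  # replace: drop whatever was loaded before
        self.rules.update(rules)
        return self

    def update(self, rules=None, from_json=None):
        # assumed to be load() with update=True
        return self.load(rules, update=True, from_json=from_json)


store = RuleStore().load({"age": [18, 30]})
buffer = io.StringIO()
exported = store.export(to_json=buffer)

buffer.seek(0)
restored = RuleStore().load(from_json=buffer)
restored.update({"income": [1000]})
```

After the round trip, restored holds the rule for "age" read back from JSON plus the "income" rule added by update.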
class toad.transform.WOETransformer[source]

Bases: toad.transform.Transformer

WOE transformer

fit_(X, y)[source]

fit WOE transformer

Parameters:
  • X (DataFrame|array-like) – features to fit
  • y (str|array-like) – target data or name of target in X
  • select_dtypes (str|numpy.dtypes) – ‘object’, ‘number’, etc.; only columns of the selected dtypes are transformed
transform_(rule, X, default='min')[source]

transform function for a single feature

Parameters:
  • X (array-like) – values of the feature to be transformed
  • default (str) – ‘min’ (default) or ‘max’; the strategy used for unknown groups
Returns:

array-like
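
As an illustration of what a weight-of-evidence (WOE) rule for one feature can look like, here is a minimal standard-library sketch. It assumes the convention WOE = ln(share of positives in the group / share of negatives in the group); sign conventions differ between libraries, and this is not taken from toad's source. fit_woe and transform_woe are hypothetical helpers, and groups with a zero count are not handled.

```python
import math
from collections import Counter


def fit_woe(values, target):
    """Return {group: woe} for one feature; target holds 0/1 labels."""
    pos = Counter(v for v, t in zip(values, target) if t == 1)
    neg = Counter(v for v, t in zip(values, target) if t == 0)
    pos_total = sum(pos.values())
    neg_total = sum(neg.values())
    rule = {}
    for group in set(values):
        # share of all positives / negatives that fall into this group
        pos_share = pos[group] / pos_total
        neg_share = neg[group] / neg_total
        rule[group] = math.log(pos_share / neg_share)
    return rule


def transform_woe(rule, values, default="min"):
    # unknown groups fall back to the smallest or largest fitted WOE
    fallback = min(rule.values()) if default == "min" else max(rule.values())
    return [rule.get(v, fallback) for v in values]


values = ["a", "a", "a", "b", "b", "b", "b"]
target = [1, 1, 0, 1, 0, 0, 0]
rule = fit_woe(values, target)
encoded = transform_woe(rule, ["a", "c"])  # "c" was never seen during fit
```

Here group "a" gets ln(8/3) and group "b" gets ln(4/9); the unseen value "c" receives the minimum, which is the WOE of "b".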

default_rule()[source]
export(**kwargs)[source]

export rules to a dict or a JSON file

Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit()

fit method; see the fit_ method for details

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

load(rules, update=False, **kwargs)[source]

load rules from a dict or a JSON file

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
  • update (bool) – whether to update the existing rules instead of replacing them
rules
transform(X, *args, **kwargs)[source]

transform method; see the transform_ method for details

update(*args, **kwargs)[source]

update rules

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
class toad.transform.Combiner[source]

Bases: toad.transform.Transformer, toad.utils.mixin.BinsMixin

Combiner for merging data into bins

fit_(X, y=None, method='chi', empty_separate=False, **kwargs)[source]

fit combiner

Parameters:
  • X (DataFrame|array-like) – features to be combined
  • y (str|array-like) – target data or name of target in X
  • method (str) – the strategy used to merge X, same as in .merge; default is ‘chi’
  • n_bins (int) – number of bins to keep after merging
  • empty_separate (bool) – whether to put empty values into a separate group
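
To give a feel for what a chi-square merge strategy does, the sketch below implements a simplified ChiMerge with the standard library: every distinct value starts as its own bin, and the adjacent pair with the smallest chi-square statistic is merged until n_bins remain. This is a teaching sketch under those assumptions, not toad's implementation. chi2_pair and chi_merge are hypothetical names, and the target must contain 0/1 integers.

```python
def chi2_pair(a, b):
    """Chi-square statistic for two adjacent bins given as [neg_count, pos_count]."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for row in (a, b):
        for col in (0, 1):
            expected = sum(row) * (a[col] + b[col]) / total
            if expected > 0:
                chi2 += (row[col] - expected) ** 2 / expected
    return chi2


def chi_merge(values, target, n_bins):
    """Return split points (lower edges of all bins except the first)."""
    keys = sorted(set(values))           # lower edge of each initial bin
    counts = {k: [0, 0] for k in keys}
    for v, t in zip(values, target):
        counts[v][t] += 1                # t is 0 or 1 and indexes the count
    bins = [counts[k] for k in keys]
    while len(bins) > n_bins:
        scores = [chi2_pair(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))    # most similar adjacent pair
        bins[i] = [bins[i][0] + bins[i + 1][0], bins[i][1] + bins[i + 1][1]]
        del bins[i + 1]
        del keys[i + 1]
    return keys[1:]


values = [1, 2, 3, 4, 5, 6]
target = [0, 0, 0, 1, 1, 1]
splits = chi_merge(values, target, n_bins=2)
```

With this toy data the only place where the class mix changes is between 3 and 4, so the single remaining split is 4.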
transform_(rule, X, labels=False, ellipsis=16, **kwargs)[source]

transform X by combiner

Parameters:
  • X (DataFrame|array-like) – features to be transformed
  • labels (bool) – whether to use labels for the resulting bins; False by default
  • ellipsis (int) – maximum label length before a label is shortened with an ellipsis; None to skip shortening
Returns:

array-like
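
Applying a numeric rule amounts to looking up which interval each value falls into. The sketch below uses the standard library and assumes half-open intervals [lower, upper), matching the label pattern in NUMBER_EXP. Sending missing values to EMPTY_BIN is also an assumption about how that constant is used. to_bins is a hypothetical helper.

```python
import bisect
import math

EMPTY_BIN = -1  # same value as the Combiner constant of that name


def to_bins(splits, values):
    """Map each value to the index of the interval it falls into."""
    bins = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            bins.append(EMPTY_BIN)  # missing values get their own marker
        else:
            # bisect_right puts a value equal to a split into the upper bin,
            # which matches intervals of the form [lower, upper)
            bins.append(bisect.bisect_right(splits, v))
    return bins


splits = [10, 20]
binned = to_bins(splits, [5, 10, 19.9, 20, None, float("nan")])
```

With splits of 10 and 20, the value 10 lands in bin 1 and the value 20 in bin 2, because each lower edge is inclusive.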

set_rules(map, reset=False)[source]

set rules for combiner

Parameters:
  • map (dict|array-like) – map of splits
  • reset (bool) – whether to reset the combiner
Returns:

self

ELSE_GROUP = 'else'
EMPTY_BIN = -1
NUMBER_EXP = re.compile('\\[(-inf|-?\\d+(.\\d+)?)\\s*[~-]\\s*(inf|-?\\d+(.\\d+)?)\\)')
default_rule()[source]
export(**kwargs)[source]

export rules to a dict or a JSON file

Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit()

fit method; see the fit_ method for details

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

classmethod format_bins(bins, index=False, ellipsis=None)[source]

format bins as labels

Parameters:
  • bins (ndarray) – bins to format
  • index (bool) – whether to add an index prefix
  • ellipsis (int) – maximum label length before a label is shortened with an ellipsis; None to skip shortening
Returns:

array of labels

Return type:

ndarray
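
A rough idea of bin labels can be had from the sketch below, which turns split points into interval strings of the form matched by NUMBER_EXP. The two-digit index prefix is an assumption about the label layout, and the ellipsis option is not modelled. format_labels is a hypothetical helper, not the classmethod itself.

```python
def format_labels(splits, index=False):
    """Build one '[lower ~ upper)' label per interval defined by the splits."""
    edges = ["-inf"] + [str(s) for s in splits] + ["inf"]
    labels = [f"[{lo} ~ {hi})" for lo, hi in zip(edges[:-1], edges[1:])]
    if index:
        # assumed prefix layout: zero-padded bin number followed by a dot
        labels = [f"{i:02d}.{label}" for i, label in enumerate(labels)]
    return labels


labels = format_labels([10, 20])
indexed = format_labels([10, 20], index=True)
```

Two split points yield three labels, from "[-inf ~ 10)" up to "[20 ~ inf)".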

load(rules, update=False, **kwargs)[source]

load rules from a dict or a JSON file

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
  • update (bool) – whether to update the existing rules instead of replacing them
classmethod parse_bins(bins)[source]

parse labeled bins into an array
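
Going from labels back to split points can be sketched with the NUMBER_EXP pattern listed above, copied here verbatim. parse_labels is a hypothetical helper; collecting the finite upper bound of each interval is one plausible way to recover the splits, not necessarily what parse_bins does.

```python
import re

# pattern copied from the NUMBER_EXP constant listed for Combiner
NUMBER_EXP = re.compile('\\[(-inf|-?\\d+(.\\d+)?)\\s*[~-]\\s*(inf|-?\\d+(.\\d+)?)\\)')


def parse_labels(labels):
    """Collect the finite upper bound of every interval label."""
    splits = []
    for label in labels:
        match = NUMBER_EXP.search(label)
        if match is None:
            continue  # not a numeric interval label
        upper = match.group(3)
        if upper != "inf":
            splits.append(float(upper))
    return splits


parsed = parse_labels(["00.[-inf ~ 10)", "01.[10 ~ 20.5)", "02.[20.5 ~ inf)"])
```

search is used rather than match so that an index prefix in front of the bracket does not prevent parsing.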

rules
transform(X, *args, **kwargs)[source]

transform method; see the transform_ method for details

update(*args, **kwargs)[source]

update rules

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
class toad.transform.GBDTTransformer[source]

Bases: toad.transform.Transformer

GBDT transformer

fit_(X, y, **kwargs)[source]

fit GBDT transformer

Parameters:
  • X (DataFrame|array-like) – features to fit
  • y (str|array-like) – target data or name of target in X
  • select_dtypes (str|numpy.dtypes) – ‘object’, ‘number’, etc.; only columns of the selected dtypes are transformed
transform_(rules, X)[source]

transform X by GBDT transformer

Parameters: X (DataFrame|array-like) – features to be transformed
Returns: array-like
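
GBDT-based feature transformers commonly describe each sample by the leaf it reaches in every tree and then one-hot encode those leaf indices. Whether this class does exactly that is an assumption here. The sketch covers only the encoding step, with made-up leaf indices standing in for the output of a fitted model; fit_onehot and transform_onehot are hypothetical helpers.

```python
def fit_onehot(leaves):
    """Learn, for each tree (column), the sorted leaf ids seen during fit."""
    n_trees = len(leaves[0])
    return [sorted({row[t] for row in leaves}) for t in range(n_trees)]


def transform_onehot(categories, leaves):
    """Expand each leaf id into a 0/1 indicator block, one block per tree."""
    out = []
    for row in leaves:
        encoded = []
        for leaf, known in zip(row, categories):
            encoded.extend(1 if leaf == c else 0 for c in known)
        out.append(encoded)
    return out


# made-up leaf indices: 3 samples, 2 trees
leaves = [[1, 3], [2, 3], [1, 4]]
categories = fit_onehot(leaves)
onehot = transform_onehot(categories, leaves)
```

Each output row has one indicator set per tree, so the width equals the total number of distinct leaves across trees.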
default_rule()[source]
export(**kwargs)[source]

export rules to a dict or a JSON file

Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit()

fit method; see the fit_ method for details

fit_transform(X, y=None, **fit_params)[source]

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Input samples.
  • y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
  • **fit_params (dict) – Additional fit parameters.
Returns:

X_new – Transformed array.

Return type:

ndarray array of shape (n_samples, n_features_new)

load(rules, update=False, **kwargs)[source]

load rules from a dict or a JSON file

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules
  • update (bool) – whether to update the existing rules instead of replacing them
rules
transform(X, *args, **kwargs)[source]

transform method; see the transform_ method for details

update(*args, **kwargs)[source]

update rules

Parameters:
  • rules (dict) – dictionary of rules
  • from_json (str|IOBase) – JSON file of rules