toad.merge module

toad.merge.ChiMerge(feature, target, n_bins=None, min_samples=None, min_threshold=None, nan=-1, balance=True)

Chi-Merge

Parameters
  • feature (array-like) – feature to be merged

  • target (array-like) – an array of target classes

  • n_bins (int) – number of bins the feature will be merged into

  • min_samples (number) – minimum number of samples in each group; if a float, it is treated as a percentage of the samples

  • min_threshold (number) – minimum chi-square threshold

Returns

array of split points

Return type

array
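
The chi-merge idea can be illustrated with a small pure-Python sketch. This is not toad's implementation: the helper names `chi2_pair` and `chi_merge_sketch` are hypothetical, only the `n_bins` stopping rule is modelled, and each split point is assumed to be the lower edge of the upper bin.

```python
def chi2_pair(a, b):
    """Chi-square statistic for two adjacent bins given per-class counts."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for row in (a, b):
        row_sum = sum(row)
        for j in range(len(row)):
            expected = row_sum * (a[j] + b[j]) / total
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def chi_merge_sketch(feature, target, n_bins):
    """Hypothetical helper: merge adjacent bins until n_bins remain."""
    classes = sorted(set(target))
    values = sorted(set(feature))
    # Start with one bin per distinct value; each bin stores class counts.
    counts = {v: [0] * len(classes) for v in values}
    for x, y in zip(feature, target):
        counts[x][classes.index(y)] += 1
    lowers = list(values)  # lower edge of each bin
    bins = [counts[v] for v in values]

    while len(bins) > n_bins:
        scores = [chi2_pair(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = scores.index(min(scores))  # most similar adjacent pair
        bins[i] = [p + q for p, q in zip(bins[i], bins[i + 1])]
        del bins[i + 1]
        del lowers[i + 1]
    return lowers[1:]  # interior split points


splits = chi_merge_sketch([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], n_bins=2)
```

Adjacent bins with the lowest chi-square have the most similar class distributions, so they are merged first; the single remaining split separates the two target classes.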

toad.merge.DTMerge(feature, target, nan=-1, n_bins=None, min_samples=1, **kwargs)

Merge by Decision Tree

Parameters
  • feature (array-like) – feature to be merged

  • target (array-like) – target used to fit the decision tree

  • nan (number) – value used to fill NaN

  • n_bins (int) – number of groups the feature will be merged into

  • min_samples (int) – minimum number of samples in each leaf node

Returns

array of split points

Return type

array
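
As a rough illustration of tree-based merging, the sketch below grows a one-dimensional tree best-first, using Gini impurity, until `n_bins` leaves exist. It is an assumption-laden stand-in rather than toad's code: `gini`, `best_split` and `dt_merge_sketch` are hypothetical names, and thresholds are taken as midpoints between neighbouring values.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(pairs, min_samples):
    """Best (gain, threshold, index) for sorted (x, y) pairs, or None."""
    n = len(pairs)
    ys = [y for _, y in pairs]
    parent = gini(ys) * n
    best = None
    for i in range(min_samples, n - min_samples + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between equal feature values
        gain = parent - gini(ys[:i]) * i - gini(ys[i:]) * (n - i)
        if best is None or gain > best[0]:
            best = (gain, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    return best


def dt_merge_sketch(feature, target, n_bins, min_samples=1):
    """Hypothetical helper: split the leaf with the largest gain first."""
    leaves = [sorted(zip(feature, target))]
    splits = []
    while len(leaves) < n_bins:
        candidates = []
        for k, leaf in enumerate(leaves):
            found = best_split(leaf, min_samples)
            if found is not None and found[0] > 0:
                candidates.append((found[0], k, found[1], found[2]))
        if not candidates:
            break  # no split improves purity
        _, k, threshold, i = max(candidates)
        leaf = leaves.pop(k)
        leaves[k:k] = [leaf[:i], leaf[i:]]
        splits.append(threshold)
    return sorted(splits)


splits = dt_merge_sketch([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], n_bins=2)
```

Here `min_samples` plays the role of the minimum leaf size: a candidate split is only considered when both sides keep at least that many samples.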

toad.merge.KMeansMerge(feature, target=None, nan=-1, n_bins=None, random_state=1)

Merge by KMeans

Parameters
  • feature (array-like) – feature to be merged

  • target (array-like) – target used to fit the KMeans model

  • nan (number) – value used to fill NaN

  • n_bins (int) – number of groups the feature will be merged into

  • random_state (int) – random state used for the KMeans model

Returns

split points of feature

Return type

array
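
A minimal one-dimensional k-means sketch shows how cluster centres can be turned into split points. It makes several assumptions that the text above does not confirm: the helper `kmeans_merge_sketch` is hypothetical, it ignores `target`, it seeds the centres from the distinct feature values, and it places each split midway between neighbouring centres.

```python
import random


def kmeans_merge_sketch(feature, n_bins, random_state=1, max_iter=100):
    """Hypothetical helper: 1-D Lloyd iterations, then midpoints as splits."""
    rng = random.Random(random_state)
    # Seed the centres with n_bins distinct observed values.
    centers = sorted(rng.sample(sorted(set(feature)), n_bins))
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for x in feature:
            nearest = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            groups[nearest].append(x)
        # An empty cluster keeps its previous centre.
        updated = sorted(
            sum(g) / len(g) if g else c for g, c in zip(groups, centers)
        )
        if updated == centers:
            break  # converged
        centers = updated
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


splits = kmeans_merge_sketch([1, 2, 3, 10, 11, 12], n_bins=2)
```

Fixing `random_state` makes the seeding, and therefore the resulting split points, reproducible between runs.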

toad.merge.QuantileMerge(feature, nan=-1, n_bins=None, q=None)

Merge by quantile

Parameters
  • feature (array-like) – feature to be merged

  • nan (number) – value used to fill NaN

  • n_bins (int) – number of groups the feature will be merged into

  • q (array-like) – list of percentage split points

Returns

split points of feature

Return type

array
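
Quantile merging can be sketched in a few lines of plain Python. The helpers `quantile` and `quantile_merge_sketch` are hypothetical, `q` is assumed to hold fractions between 0 and 1, and linear interpolation between neighbouring values is an assumption of this sketch rather than documented behaviour.

```python
def quantile(sorted_values, p):
    """Linearly interpolated quantile of an already sorted list."""
    pos = p * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def quantile_merge_sketch(feature, n_bins=None, q=None):
    """Hypothetical helper: split points at evenly spaced or given quantiles."""
    if q is None:
        # n_bins groups need n_bins - 1 interior quantiles.
        q = [i / n_bins for i in range(1, n_bins)]
    values = sorted(feature)
    # Duplicate split points are collapsed.
    return sorted(set(quantile(values, p) for p in q))


splits = quantile_merge_sketch(list(range(101)), n_bins=4)
```

With `n_bins=4` the sketch uses the quartiles, so each group holds roughly the same number of samples.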

toad.merge.StepMerge(feature, nan=None, n_bins=None, clip_v=None, clip_std=None, clip_q=None)

Merge by step

Parameters
  • feature (array-like) – feature to be merged

  • nan (number) – value used to fill NaN

  • n_bins (int) – number of groups the feature will be merged into

  • clip_v (number | tuple) – min/max value used for clipping

  • clip_std (number | tuple) – min/max standard deviation used for clipping

  • clip_q (number | tuple) – min/max quantile used for clipping

Returns

split points of feature

Return type

array
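
Step merging amounts to equal-width binning, optionally after clipping outliers. The sketch below is illustrative only: `step_merge_sketch` is a hypothetical helper, and it models only `clip_v`, given as a `(min, max)` tuple.

```python
def step_merge_sketch(feature, n_bins, clip_v=None):
    """Hypothetical helper: equal-width split points over the value range."""
    values = list(feature)
    if clip_v is not None:
        low, high = clip_v
        # Clip outliers so they do not stretch the bin width.
        values = [min(max(x, low), high) for x in values]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


splits = step_merge_sketch([0, 2, 5, 10], n_bins=5)
clipped = step_merge_sketch([0, 5, 10, 1000], n_bins=2, clip_v=(0, 10))
```

Without clipping, the single value 1000 in the second call would push the only split point to 500 and leave almost every sample in the first group.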

toad.merge.merge(feature, target=None, method='dt', return_splits=False, **kwargs)

Merge feature into groups

Parameters
  • feature (array-like) – feature to be merged

  • target (array-like) –

  • method (str) – the strategy used to merge the feature: one of ‘dt’, ‘chi’, ‘quantile’, ‘step’, ‘kmeans’

  • return_splits (bool) – whether to also return the split points

  • n_bins (int) – number of groups the feature will be merged into

Returns

an array of merged labels with the same size as the feature array; if return_splits is True, the list of split points as well

Return type

array
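
The relationship between split points and merged labels can be sketched with the standard library. The helper `merge_sketch` is hypothetical; it assumes that a value equal to a split point falls into the upper group, and that the splits are returned alongside the labels when `return_splits` is set.

```python
from bisect import bisect_right


def merge_sketch(feature, splits, return_splits=False):
    """Hypothetical helper: map each value to the index of its group."""
    # bisect_right counts how many split points are <= x.
    labels = [bisect_right(splits, x) for x in feature]
    if return_splits:
        return labels, splits
    return labels


labels = merge_sketch([1, 4, 6.5, 12], [4, 10])
labels_again, used_splits = merge_sketch([1, 4, 6.5, 12], [4, 10], return_splits=True)
```

The labels have the same length as the input, and two split points yield three groups numbered 0 to 2.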