toad.merge module

toad.merge.ChiMerge()

Merge by chi-square (Chi-Merge)

Parameters:
  • feature (array-like) – feature to be merged
  • target (array-like) – an array of target classes
  • n_bins (int) – number of bins to merge into
  • min_samples (number) – minimum number of samples in each group; if a float, it is treated as a fraction of the total number of samples
  • min_threshold (number) – minimum chi-square threshold
Returns:

array of split points

Return type:

array
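The bottom-up Chi-Merge idea can be illustrated with a minimal pure-Python sketch: start with one group per distinct feature value, compute the chi-square statistic (the sum of (observed − expected)² / expected over the 2 × k class table) for every pair of adjacent groups, and merge the pair with the smallest value until n_bins groups remain. This sketch assumes stopping on n_bins only (min_samples and min_threshold are omitted) and assumes the split points are the smallest value of every group after the first. chi_merge and chi_square are hypothetical names, not part of toad, and toad's own implementation may differ in details such as tie-breaking.

```python
from collections import Counter


def chi_square(a, b, classes):
    """Chi-square statistic of the 2 x k table formed by two adjacent groups.

    a and b are Counters mapping class label -> count.
    """
    total = sum(a.values()) + sum(b.values())
    chi = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            col_total = a[c] + b[c]
            expected = row_total * col_total / total
            if expected:  # skip empty columns to avoid dividing by zero
                chi += (row[c] - expected) ** 2 / expected
    return chi


def chi_merge(feature, target, n_bins):
    """Hypothetical sketch: merge adjacent groups with the smallest chi-square
    until n_bins groups remain; return the lower bound of each group but the first."""
    classes = sorted(set(target))
    values = sorted(set(feature))
    # one [lower bound, class counts] entry per distinct feature value
    groups = [[v, Counter()] for v in values]
    index = {v: i for i, v in enumerate(values)}
    for x, y in zip(feature, target):
        groups[index[x]][1][y] += 1

    while len(groups) > n_bins:
        chis = [chi_square(groups[i][1], groups[i + 1][1], classes)
                for i in range(len(groups) - 1)]
        i = chis.index(min(chis))  # first adjacent pair with the smallest chi-square
        groups[i][1].update(groups[i + 1][1])  # fold the right group into the left one
        del groups[i + 1]

    return [lower for lower, _ in groups[1:]]
```

For example, chi_merge([1, 2, 3, 4, 5, 6, 7, 8], [0, 0, 0, 0, 1, 1, 1, 1], 2) first merges the pure runs (their pairwise chi-square is 0) and ends with the single split point 5.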

toad.merge.DTMerge()

Merge by Decision Tree

Parameters:
  • feature (array-like) – feature to be merged
  • target (array-like) – target used to fit the decision tree
  • nan (number) – value used to fill NaN
  • n_bins (int) – number of groups to merge into
  • min_samples (int) – minimum number of samples in each leaf node
Returns:

array of split points

Return type:

array
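Decision-tree binning can be pictured as growing a single-feature classification tree and reading off its thresholds as split points. The sketch below is written in plain Python so the logic is visible; in practice a library tree (for example scikit-learn's DecisionTreeClassifier) would more likely be used. It assumes Gini impurity, best-first growth up to n_bins leaves, at least min_samples samples per leaf, and thresholds placed halfway between neighbouring distinct values. gini, best_split and dt_merge are hypothetical names, not toad's API.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(pairs, min_samples):
    """Best threshold for sorted (x, y) pairs, or None if no valid split exists.

    Returns (impurity decrease, threshold, left pairs, right pairs).
    """
    n = len(pairs)
    m = max(min_samples, 1)
    parent = gini([y for _, y in pairs]) * n
    best = None
    for i in range(m, n - m + 1):  # every leaf keeps at least m samples
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # only cut between distinct feature values
        left, right = pairs[:i], pairs[i:]
        child = (gini([y for _, y in left]) * len(left)
                 + gini([y for _, y in right]) * len(right))
        gain = parent - child
        if best is None or gain > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, threshold, left, right)
    return best


def dt_merge(feature, target, n_bins, min_samples=1):
    """Hypothetical sketch: best-first tree growth; returns sorted thresholds."""
    leaves = [sorted(zip(feature, target))]
    splits = []
    while len(leaves) < n_bins:
        best = None
        for k, leaf in enumerate(leaves):
            split = best_split(leaf, min_samples)
            # keep the leaf whose best split reduces weighted impurity the most
            if split is not None and split[0] > 1e-12 and (best is None or split[0] > best[1][0]):
                best = (k, split)
        if best is None:
            break  # no remaining split reduces impurity
        k, (_, threshold, left, right) = best
        leaves[k:k + 1] = [left, right]  # replace the leaf by its two children
        splits.append(threshold)
    return sorted(splits)
```

With a perfectly separable target such as [0, 0, 0, 0, 1, 1, 1, 1] over feature values 1 to 8, the sketch stops after one split at 4.5, even when more bins are requested, because both leaves are already pure.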

toad.merge.KMeansMerge()

Merge by KMeans

Parameters:
  • feature (array-like) – feature to be merged
  • target (array-like) – target used to fit the KMeans model
  • nan (number) – value used to fill NaN
  • n_bins (int) – number of groups to merge into
  • random_state (int) – random state used by the KMeans model
Returns:

split points of the feature

Return type:

array
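As an illustration of how one-dimensional clustering can yield split points, here is a minimal sketch of Lloyd's k-means on a single feature, with the split points placed halfway between neighbouring sorted cluster centers. Both choices are assumptions made for illustration: the sketch replaces random_state with a deterministic initialisation at evenly spaced order statistics, and kmeans_1d and kmeans_merge are hypothetical names rather than toad functions.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on 1-D data, initialised at evenly spaced ranks."""
    xs = sorted(values)
    n = len(xs)
    # deterministic init (assumption): k evenly spaced order statistics
    centers = [xs[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # assign each value to its nearest center
            j = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[j].append(x)
        # move each center to the mean of its cluster; keep it if the cluster is empty
        new = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
        if new == centers:
            break  # converged
        centers = new
    return sorted(centers)


def kmeans_merge(feature, n_bins):
    """Hypothetical sketch: split halfway between neighbouring cluster centers."""
    centers = kmeans_1d(feature, n_bins)
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]
```

For two well-separated clumps such as [1, 2, 3, 10, 11, 12], the centers settle at 2 and 11 and the single split point is 6.5.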

toad.merge.QuantileMerge()

Merge by quantile

Parameters:
  • feature (array-like) – feature to be merged
  • nan (number) – value used to fill NaN
  • n_bins (int) – number of groups to merge into
  • q (array-like) – list of quantiles (fractions between 0 and 1) used as split points
Returns:

split points of the feature

Return type:

array
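Quantile binning places split points at evenly spaced quantiles of the feature, so each bin holds roughly the same number of samples. The sketch below assumes linear interpolation between order statistics (the same rule as NumPy's default 'linear' quantile method), builds q from n_bins without the 0 and 1 endpoints when q is not given, and collapses duplicate cut points. quantile and quantile_merge are hypothetical names for illustration only.

```python
def quantile(sorted_xs, q):
    """Linear-interpolation quantile of already sorted data, 0 <= q <= 1."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    frac = pos - lo
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * frac


def quantile_merge(feature, n_bins=None, q=None):
    """Hypothetical sketch: split points at the requested quantiles."""
    xs = sorted(feature)
    if q is None:
        # evenly spaced interior quantiles 1/n_bins, ..., (n_bins - 1)/n_bins
        q = [i / n_bins for i in range(1, n_bins)]
    # identical cut points (e.g. from heavily repeated values) collapse into one
    return sorted(set(quantile(xs, p) for p in q))
```

For the values 1 to 10 and n_bins=4, the sketch returns [3.25, 5.5, 7.75].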

toad.merge.StepMerge()

Merge by step

Parameters:
  • feature (array-like) – feature to be merged
  • nan (number) – value used to fill NaN
  • n_bins (int) – number of groups to merge into
  • clip_v (number | tuple) – min/max values for clipping
  • clip_std (number | tuple) – min/max number of standard deviations for clipping
  • clip_q (number | tuple) – min/max quantiles for clipping
Returns:

split points of the feature

Return type:

array
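Merging by step amounts to equal-width binning: the range between the minimum and maximum is divided into n_bins intervals of the same width. Because extreme values stretch that range, clipping beforehand presumably keeps the bins from being dominated by outliers. The sketch below supports only clip_v given as a (low, high) tuple; the number form of clip_v, clip_std and clip_q are left out, and step_merge is a hypothetical name.

```python
def step_merge(feature, n_bins, clip_v=None):
    """Hypothetical sketch: equal-width split points, optionally after clipping."""
    xs = list(feature)
    if clip_v is not None:
        low, high = clip_v  # assumption: only the (low, high) tuple form
        xs = [min(max(x, low), high) for x in xs]
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / n_bins
    # interior edges only: lo + step, lo + 2 * step, ..., lo + (n_bins - 1) * step
    return [lo + step * i for i in range(1, n_bins)]
```

Without clipping, step_merge([-100, 0, 5, 10, 100], 2) would split at 0.0; clipping to (0, 10) first moves the split to 5.0.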

toad.merge.merge()

Merge feature into groups

Parameters:
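  • feature (array-like) – feature to be merged
  • target (array-like) – target array; required by target-based methods such as ‘dt’ and ‘chi’
  • method (str) – strategy used to merge the feature: ‘dt’, ‘chi’, ‘quantile’, ‘step’ or ‘kmeans’
  • return_splits (bool) – whether to also return the split points
  • n_bins (int) – number of groups to merge into
Returns:

an array of merged labels with the same length as feature; if return_splits is True, the list of split points is returned as well

Return type:

array, or (array, array) when return_splits is True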
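Turning split points into the merged labels can be sketched with a binary search: each value gets the index of the bin it falls into. This sketch assumes left-closed bins, so a value equal to a split point goes to the upper bin (the convention of NumPy's np.digitize with its default right=False). bin_by_splits and merge_sketch are hypothetical names, and split_fn stands in for whichever method strategy computes the split points.

```python
from bisect import bisect_right


def bin_by_splits(feature, splits):
    """Label each value with its bin index: values below splits[0] get 0,
    values >= splits[-1] get len(splits)."""
    ordered = sorted(splits)
    return [bisect_right(ordered, x) for x in feature]


def merge_sketch(feature, split_fn, return_splits=False):
    """Hypothetical sketch: compute split points, then label every value."""
    splits = split_fn(feature)
    labels = bin_by_splits(feature, splits)
    if return_splits:
        return labels, splits
    return labels
```

For instance, bin_by_splits([1, 5, 6, 10], [5, 8]) labels the values [0, 1, 1, 2]; the labels always have the same length as the input feature.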