toad.merge module
toad.merge.ChiMerge()
Chi-Merge
Parameters:
- feature (array-like) – feature to be merged
- target (array-like) – an array of target classes
- n_bins (int) – number of bins to merge into
- min_samples (number) – minimum number of samples in each group; if a float, it is treated as a proportion of the total samples
- min_threshold (number) – minimum chi-square threshold
Returns: array of split points
Return type: array
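
To make the chi-square idea concrete, here is a minimal pure-Python sketch of bottom-up merging by chi-square. It is not toad's implementation: the helper names `chi2_adjacent` and `chi_merge_sketch`, the input layout (a list of bins with per-class counts), and the use of each bin's upper edge as the split point are all assumptions made for illustration. The `min_samples` and `min_threshold` stopping rules are left out.

```python
def chi2_adjacent(counts_a, counts_b):
    """Chi-square statistic for two adjacent bins (hypothetical helper).

    counts_a, counts_b: per-class sample counts, in the same class order.
    """
    total = sum(counts_a) + sum(counts_b)
    chi2 = 0.0
    for row in (counts_a, counts_b):
        row_total = sum(row)
        for j, observed in enumerate(row):
            col_total = counts_a[j] + counts_b[j]
            expected = row_total * col_total / total
            if expected > 0:  # skip classes absent from both bins
                chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi_merge_sketch(bins, n_bins):
    """Merge the most similar adjacent pair until n_bins remain.

    bins: list of (upper_edge, [class counts]) sorted by upper_edge.
    Returns the inner edges, which act as split points.
    """
    bins = [(edge, list(counts)) for edge, counts in bins]
    while len(bins) > n_bins:
        scores = [chi2_adjacent(bins[i][1], bins[i + 1][1])
                  for i in range(len(bins) - 1)]
        i = scores.index(min(scores))  # lowest chi-square = most similar pair
        merged = [a + b for a, b in zip(bins[i][1], bins[i + 1][1])]
        bins[i:i + 2] = [(bins[i + 1][0], merged)]
    return [edge for edge, _ in bins[:-1]]


initial = [(1, [10, 0]), (2, [9, 1]), (3, [1, 9]), (4, [0, 10])]
splits = chi_merge_sketch(initial, n_bins=2)
```

Bins with similar class distributions get a low chi-square and are merged first, so the surviving split sits where the class mix changes most.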
toad.merge.DTMerge()
Merge by Decision Tree
Parameters:
- feature (array-like) – feature to be merged
- target (array-like) – target used to fit the decision tree
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- min_samples (int) – minimum number of samples in each leaf node
Returns: array of split points
Return type: array
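
As a rough illustration of how a decision tree yields split points, the sketch below finds the single best threshold by Gini impurity, which is the first split a tree would make. It is not toad's implementation: `gini` and `best_split` are hypothetical helpers, the target is assumed to be binary (0/1), and a full tree would repeat the search inside each side until `n_bins` leaves exist.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(values, target, min_samples=1):
    """Return the threshold with the lowest weighted Gini impurity.

    Each side must keep at least min_samples samples (assumed >= 1).
    Returns None when no valid threshold exists.
    """
    pairs = sorted(zip(values, target))
    n = len(pairs)
    min_samples = max(min_samples, 1)
    best_point, best_score = None, None
    for i in range(min_samples, n - min_samples + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best_score is None or score < best_score:
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_point


point = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
```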
toad.merge.KMeansMerge()
Merge by KMeans
Parameters:
- feature (array-like) – feature to be merged
- target (array-like) – target used to fit the KMeans model
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- random_state (int) – random state used for the KMeans model
Returns: split points of feature
Return type: array
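
One plausible way to turn clusters into split points is to cluster the one-dimensional feature and place a split halfway between neighbouring cluster centres. The sketch below does that in pure Python. It is an assumption-laden illustration, not toad's code: `kmeans_splits` is a hypothetical name, the target is ignored, and the centres are initialised deterministically instead of through a `random_state`.

```python
def kmeans_splits(values, n_bins=3, n_iter=50):
    """1-D k-means; returns midpoints between sorted cluster centres."""
    data = sorted(values)
    # Deterministic start: spread the initial centres over the sorted data.
    centers = [data[int((i + 0.5) * len(data) / n_bins)] for i in range(n_bins)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for v in data:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Recompute each centre as the mean of its group; keep empty ones as-is.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break  # converged
        centers = updated
    centers = sorted(centers)
    return [(a + b) / 2 for a, b in zip(centers, centers[1:])]


splits = kmeans_splits([1, 2, 3, 10, 11, 12, 20, 21, 22], n_bins=3)
```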
toad.merge.QuantileMerge()
Merge by quantile
Parameters:
- feature (array-like) – feature to be merged
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- q (array-like) – list of percentage split points
Returns: split points of feature
Return type: array
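
Quantile merging can be sketched as: sort the feature, read off the values at evenly spaced quantiles, and use them as split points. The pure-Python version below is an illustration under assumptions, not toad's implementation: `quantile_splits` is a hypothetical name, `q` is taken as fractions between 0 and 1, quantiles are linearly interpolated, and duplicate split points are dropped.

```python
def quantile_splits(values, n_bins=10, q=None):
    """Split points at evenly spaced quantiles (or at the fractions in q)."""
    data = sorted(values)
    if q is None:
        q = [i / n_bins for i in range(1, n_bins)]  # inner quantiles only
    splits = []
    for p in q:
        pos = p * (len(data) - 1)          # fractional index into sorted data
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        point = data[lo] + (data[hi] - data[lo]) * (pos - lo)
        if point not in splits:            # heavy ties collapse into one split
            splits.append(point)
    return splits


splits = quantile_splits(list(range(101)), n_bins=4)
```

Because every bin receives roughly the same number of samples, a feature with many tied values can end up with fewer than `n_bins` groups.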
toad.merge.StepMerge()
Merge by step
Parameters:
- feature (array-like) – feature to be merged
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- clip_v (number | tuple) – min/max value for clipping
- clip_std (number | tuple) – min/max std for clipping
- clip_q (number | tuple) – min/max quantile for clipping
Returns: split points of feature
Return type: array
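
Step merging amounts to equal-width binning: clip the feature to a range, then cut that range into `n_bins` intervals of the same width. The sketch below shows the idea in pure Python and is not toad's implementation. `step_splits` is a hypothetical name, and only a `(min, max)` tuple for `clip_v` is handled; the single-number form and the `clip_std` and `clip_q` variants are omitted.

```python
def step_splits(values, n_bins=10, clip_v=None):
    """Equal-width split points over the (optionally clipped) value range."""
    if clip_v is not None:
        low, high = clip_v
        # Clipping keeps outliers from stretching the bin width.
        values = [min(max(v, low), high) for v in values]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * i for i in range(1, n_bins)]


splits = step_splits([0, 3, 10], n_bins=5)
clipped = step_splits([-100, 0, 10, 500], n_bins=2, clip_v=(0, 10))
```

Without clipping, the second call would place its only split at 200.0, leaving almost all samples in one bin.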
toad.merge.merge
Merge feature into groups
Parameters:
- feature (array-like) – feature to be merged
- target (array-like) – target values
- method (str) – strategy used to merge the feature; one of ‘dt’, ‘chi’, ‘quantile’, ‘step’, ‘kmeans’
- return_splits (bool) – whether to also return the split points
- n_bins (int) – number of groups to merge into
Returns:
- array – an array of merged labels with the same size as feature
- array – list of split points
Return type: array
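
The relationship between the two return values can be illustrated with the standard library: given sorted split points, each feature value maps to the index of the interval it falls in. This is a sketch, not toad's code. `apply_splits` is a hypothetical helper, and sending a value equal to a split point to the upper group is an assumed convention.

```python
from bisect import bisect_right


def apply_splits(values, splits):
    """Map each value to a group label 0..len(splits) using sorted splits."""
    return [bisect_right(splits, v) for v in values]


labels = apply_splits([1, 5, 10, 25], [5, 20])
```

The result has one label per input value, matching the "same size as feature" description of the first return value.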