toad.selection module

class toad.selection.StatsModel(estimator='ols', criterion='aic', intercept=False)

Bases: object

get_estimator(name)

stats(X, y)

get_criterion(pre, y, k)

t_value(pre, y, X, coef)

p_value(t, n)

loglikelihood(pre, y, k)
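The StatsModel methods above are listed without descriptions. As a hedged illustration of the quantities their names suggest (a log-likelihood, an AIC/BIC criterion and coefficient t-values), the sketch below uses the textbook formulas for ordinary least squares with Gaussian errors. The function names are hypothetical, `pre` is assumed to mean the model's predictions and `k` the number of fitted coefficients; whether toad computes these values exactly this way is an assumption, not something this page states.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients and predictions for design matrix X (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef


def loglikelihood(pre, y):
    """Gaussian log-likelihood of OLS predictions, using the MLE variance RSS / n."""
    n = len(y)
    rss = np.sum((y - pre) ** 2)
    return -n / 2 * (np.log(2 * np.pi) + np.log(rss / n) + 1)


def aic(pre, y, k):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * k - 2 * loglikelihood(pre, y)


def bic(pre, y, k):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return k * np.log(len(y)) - 2 * loglikelihood(pre, y)


def t_values(pre, y, X, coef):
    """Coefficient / standard error, with residual variance RSS / (n - k)."""
    n, k = X.shape
    sigma2 = np.sum((y - pre) ** 2) / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef / se
```

A lower AIC or BIC indicates a better trade-off between fit and the number of coefficients; BIC penalizes each extra coefficient by ln(n) instead of 2.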

toad.selection.stepwise(frame, target='target', estimator='ols', direction='both', criterion='aic', p_enter=0.01, p_remove=0.01, p_value_enter=0.2, intercept=False, max_iter=None, return_drop=False, exclude=None)

Select features by stepwise regression.

Parameters:
* frame (DataFrame) – dataframe to select features from
* target (str) – name of the target column in frame
* estimator (str) – model used to compute the statistics
* direction (str) – direction of the stepwise search: ‘forward’, ‘backward’ or ‘both’ (‘both’ is recommended)
* criterion (str) – criterion used to evaluate the model: ‘aic’ or ‘bic’
* p_enter (float) – threshold used in ‘forward’ and ‘both’ to keep features
* p_remove (float) – threshold used in ‘backward’ to remove features
* p_value_enter (float) – threshold used in ‘both’ to remove features
* intercept (bool) – whether the model includes an intercept
* max_iter (int) – maximum number of iterations
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)

Return type: DataFrame
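To make the idea of a stepwise search concrete, here is a minimal sketch of the ‘forward’ direction only, assuming an OLS estimator with an intercept and AIC as the criterion. Starting from the excluded features, it greedily adds whichever candidate lowers AIC the most and stops when no addition helps. It ignores p_enter, p_remove, p_value_enter and max_iter, and `forward_stepwise` and `_aic` are hypothetical names, not toad's implementation.

```python
import numpy as np
import pandas as pd


def _aic(X, y):
    """AIC (up to an additive constant) of an OLS fit with an intercept, Gaussian errors."""
    n = len(y)
    design = np.hstack([np.ones((n, 1)), X])  # X may have zero columns
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    return n * np.log(rss / n) + 2 * design.shape[1]


def forward_stepwise(frame, target="target", exclude=None):
    """Greedy forward selection: add the feature that lowers AIC most; stop when none does."""
    y = frame[target].to_numpy(dtype=float)
    selected = list(exclude or [])  # excluded features are always kept
    candidates = [c for c in frame.columns if c != target and c not in selected]
    best = _aic(frame[selected].to_numpy(dtype=float), y)
    while candidates:
        scores = {c: _aic(frame[selected + [c]].to_numpy(dtype=float), y) for c in candidates}
        col = min(scores, key=scores.get)
        if scores[col] >= best:  # no remaining candidate improves the criterion
            break
        best = scores[col]
        selected.append(col)
        candidates.remove(col)
    return frame[selected + [target]], candidates  # leftover candidates are the dropped ones
```

The ‘both’ direction would additionally try removing already selected features after each addition, which is why it is usually preferred over a purely forward search.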

toad.selection.drop_empty(frame, threshold=0.9, nan=None, return_drop=False, exclude=None)

Drop columns by their number of empty values.

Parameters:
* frame (DataFrame) – dataframe to filter
* threshold (number) – drop features whose number of empty values is greater than threshold; a float threshold is treated as a proportion of rows (e.g. 0.9 means 90%)
* nan (any) – values to treat as empty
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)

Return type: DataFrame
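The thresholding rule above (a float is a proportion of rows, anything else an absolute count) can be sketched with pandas as follows. `drop_empty_sketch` is a hypothetical name; accepting either a single value or a list for `nan` is an assumption of this sketch, not documented toad behaviour.

```python
import numpy as np
import pandas as pd


def drop_empty_sketch(frame, threshold=0.9, nan=None, return_drop=False, exclude=None):
    """Drop columns whose count of empty values exceeds threshold (hypothetical sketch)."""
    exclude = set(exclude or [])
    empty = frame.isna()
    if nan is not None:
        # extra sentinel values to treat as empty, e.g. -999 or "" (assumed: scalar or list)
        values = nan if isinstance(nan, (list, tuple, set)) else [nan]
        empty |= frame.isin(list(values))
    # a float threshold is a proportion of rows; otherwise it is an absolute count
    limit = threshold * len(frame) if isinstance(threshold, float) else threshold
    counts = empty.sum()
    drop = [c for c in frame.columns if counts[c] > limit and c not in exclude]
    result = frame.drop(columns=drop)
    return (result, drop) if return_drop else result
```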

toad.selection.drop_var(frame, threshold=0, return_drop=False, exclude=None)

Drop columns by variance.

Parameters:
* frame (DataFrame) – dataframe to filter
* threshold (float) – drop features whose variance is less than threshold
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)

Return type: DataFrame
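A variance filter can be sketched in a few lines of pandas. Two choices here are assumptions rather than documented toad behaviour: the sketch uses `<=` so that the default threshold of 0 removes constant columns (a strict comparison could never drop anything at 0), and it computes the population variance (`ddof=0`) of numeric columns only. `drop_var_sketch` is a hypothetical name.

```python
import pandas as pd


def drop_var_sketch(frame, threshold=0, return_drop=False, exclude=None):
    """Drop numeric columns with variance at or below threshold (hypothetical sketch)."""
    exclude = set(exclude or [])
    variances = frame.select_dtypes("number").var(ddof=0)  # population variance
    # <= so that threshold=0 removes constant columns (assumed comparison)
    drop = [c for c in variances.index if variances[c] <= threshold and c not in exclude]
    result = frame.drop(columns=drop)
    return (result, drop) if return_drop else result
```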

toad.selection.drop_corr(frame, target=None, threshold=0.7, by='IV', return_drop=False, exclude=None)

Drop columns by correlation.

Parameters:
* frame (DataFrame) – dataframe to filter
* target (str) – name of the target column in frame
* threshold (float) – in each group of features whose correlation is greater than threshold, drop the features with the smallest weight
* by (str or array-like) – weights of the features, used to decide which features to drop (default ‘IV’)
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)

Return type: DataFrame
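The weight-based rule can be illustrated with a simplified greedy variant: scan feature pairs from the most to the least correlated and, for each pair above the threshold whose members are both still present, drop the one with the smaller weight. This pairwise scan is a stand-in chosen for brevity, not toad's grouping algorithm; `drop_corr_sketch` is a hypothetical name, and its `weights` argument is assumed to be a mapping from feature name to weight (such as each feature's IV).

```python
import pandas as pd


def drop_corr_sketch(frame, weights, threshold=0.7, return_drop=False, exclude=None):
    """Greedy pairwise correlation filter (hypothetical sketch, not toad's algorithm)."""
    exclude = set(exclude or [])
    corr = frame.select_dtypes("number").corr().abs().fillna(0)  # NaN for constant columns
    cols = list(corr.columns)
    pairs = sorted(
        ((corr.loc[a, b], a, b) for i, a in enumerate(cols) for b in cols[i + 1:]),
        reverse=True,
    )
    dropped = []
    for value, a, b in pairs:
        if value <= threshold:
            break  # remaining pairs are even less correlated
        if a in dropped or b in dropped:
            continue
        # drop the lower-weight feature, unless it is excluded
        loser = a if weights[a] < weights[b] else b
        if loser in exclude:
            loser = b if loser == a else a
        if loser in exclude:
            continue
        dropped.append(loser)
    result = frame.drop(columns=dropped)
    return (result, dropped) if return_drop else result
```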

toad.selection.drop_iv(frame, target='target', threshold=0.02, return_drop=False, return_iv=False, exclude=None)

Drop columns by IV.

Parameters:
* frame (DataFrame) – dataframe to filter
* target (str) – name of the target column in frame
* threshold (float) – drop features whose IV is less than threshold
* return_drop (bool) – whether to also return the names of the dropped features
* return_iv (bool) – whether to also return the IV of each feature
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)
* Series – IV of each feature (only when return_iv is True)

Return type: DataFrame
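For reference, the information value of a discrete feature against a binary target is IV = Σᵢ (badᵢ − goodᵢ) · ln(badᵢ / goodᵢ), where badᵢ and goodᵢ are the shares of all target = 1 and target = 0 rows that fall in category i. The sketch below computes it with `pandas.crosstab` and applies the threshold. It assumes a 0/1 target and already discrete (or binned) features, since continuous features would normally be binned first; the 0.5 count smoothing, used to avoid log(0), and the names `information_value` and `drop_iv_sketch` are assumptions, not toad's implementation.

```python
import numpy as np
import pandas as pd


def information_value(feature, target, smooth=0.5):
    """IV of a discrete feature against a binary 0/1 target (hypothetical helper)."""
    table = pd.crosstab(feature, target) + smooth  # smoothing avoids log(0)
    bad = table[1] / table[1].sum()    # share of target == 1 rows per category
    good = table[0] / table[0].sum()   # share of target == 0 rows per category
    return float(((bad - good) * np.log(bad / good)).sum())


def drop_iv_sketch(frame, target="target", threshold=0.02,
                   return_drop=False, return_iv=False, exclude=None):
    """Drop features whose IV is below threshold (hypothetical sketch)."""
    exclude = set(exclude or [])
    features = [c for c in frame.columns if c != target]
    iv = pd.Series({c: information_value(frame[c], frame[target]) for c in features})
    drop = [c for c in features if iv[c] < threshold and c not in exclude]
    result = frame.drop(columns=drop)
    extras = ([drop] if return_drop else []) + ([iv] if return_iv else [])
    return (result, *extras) if extras else result
```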

toad.selection.drop_vif(frame, threshold=3, return_drop=False, exclude=None)

Drop columns by variance inflation factor (VIF).

Parameters:
* frame (DataFrame) – dataframe to filter
* threshold (float) – drop features until the VIF of every remaining feature is less than threshold
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* array – names of the dropped features (only when return_drop is True)

Return type: DataFrame
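The VIF of feature j is 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing feature j on all the other features. The sketch below computes it with NumPy least squares and, to reach the "until every VIF is below threshold" condition, repeatedly drops the droppable feature with the highest VIF. Including an intercept in each auxiliary regression and dropping one feature at a time are assumptions of this sketch; `vif` and `drop_vif_sketch` are hypothetical names.

```python
import numpy as np
import pandas as pd


def vif(frame):
    """VIF_j = 1 / (1 - R^2_j), regressing column j on the others plus an intercept."""
    X = frame.to_numpy(dtype=float)
    n = X.shape[0]
    out = {}
    for j, col in enumerate(frame.columns):
        y = X[:, j]
        others = np.hstack([np.ones((n, 1)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[col] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out)


def drop_vif_sketch(frame, threshold=3, return_drop=False, exclude=None):
    """Drop the highest-VIF droppable column until every VIF < threshold (hypothetical)."""
    exclude = set(exclude or [])
    df, dropped = frame.copy(), []
    while df.shape[1] > 1:
        scores = vif(df).drop(labels=[c for c in df.columns if c in exclude])
        # stop when all droppable features are below threshold (or none are droppable)
        if scores.empty or scores.max() < threshold:
            break
        worst = scores.idxmax()
        dropped.append(worst)
        df = df.drop(columns=worst)
    return (df, dropped) if return_drop else df
```

Dropping one feature per round matters: removing a single member of a collinear group often brings the VIFs of the other members back below the threshold.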

toad.selection.select(frame, target='target', empty=0.9, iv=0.02, corr=0.7, return_drop=False, exclude=None)

Select features by empty rate, IV and correlation.

Parameters:
* frame (DataFrame) – dataframe to select features from
* target (str) – name of the target column in frame
* empty (number) – drop features whose number of empty values is greater than this threshold; a float is treated as a proportion of rows
* iv (float) – drop features whose IV is less than this threshold
* corr (float) – in each group of features whose correlation is greater than this threshold, drop the features with the smallest IV
* return_drop (bool) – whether to also return the names of the dropped features
* exclude (array-like) – names of features that will never be dropped

Returns:
* DataFrame – the selected dataframe
* dict – names of the features dropped at each step (only when return_drop is True)

Return type: DataFrame
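The three filters can be chained into one pass that records what each step removes, which is the shape of the dict described above. The sketch assumes the order empty → IV → correlation, discrete (or already binned) features for the IV step, and a simplified pairwise correlation rule that keeps the higher-IV feature; `select_sketch` and `_iv` are hypothetical names, and the step keys "empty", "iv" and "corr" are illustrative, not toad's.

```python
import numpy as np
import pandas as pd


def _iv(feature, target, smooth=0.5):
    """IV of a discrete feature vs a binary 0/1 target, with smoothed counts."""
    t = pd.crosstab(feature, target) + smooth
    bad, good = t[1] / t[1].sum(), t[0] / t[0].sum()
    return float(((bad - good) * np.log(bad / good)).sum())


def select_sketch(frame, target="target", empty=0.9, iv=0.02, corr=0.7,
                  return_drop=False, exclude=None):
    """Empty -> IV -> correlation filters, recording each step's drops (hypothetical)."""
    keep = set(exclude or []) | {target}
    df, dropped = frame.copy(), {}

    # 1. empty values: a float threshold is a proportion of rows
    limit = empty * len(df) if isinstance(empty, float) else empty
    counts = df.isna().sum()
    dropped["empty"] = [c for c in df.columns if counts[c] > limit and c not in keep]
    df = df.drop(columns=dropped["empty"])

    # 2. information value
    ivs = {c: _iv(df[c], df[target]) for c in df.columns if c != target}
    dropped["iv"] = [c for c, v in ivs.items() if v < iv and c not in keep]
    df = df.drop(columns=dropped["iv"])

    # 3. correlation: in each highly correlated pair, drop the lower-IV feature
    cm = df.drop(columns=[target]).select_dtypes("number").corr().abs().fillna(0)
    cols = list(cm.columns)
    dropped["corr"] = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a in dropped["corr"] or b in dropped["corr"] or cm.loc[a, b] <= corr:
                continue
            loser = a if ivs[a] < ivs[b] else b
            if loser not in keep:  # simplification: an excluded loser keeps both
                dropped["corr"].append(loser)
    df = df.drop(columns=dropped["corr"])
    return (df, dropped) if return_drop else df
```

Running the cheap empty-value filter first means IV and correlations are only computed for features that survive it.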
