toad.selection module
- toad.selection.stepwise(frame, target='target', estimator='ols', direction='both', criterion='aic', p_enter=0.01, p_remove=0.01, p_value_enter=0.2, intercept=False, max_iter=None, return_drop=False, exclude=None)
Select features by stepwise regression.
- Parameters
frame (DataFrame) – dataframe from which features will be selected
target (str) – name of the target column in frame
estimator (str) – model used to compute the statistics
direction (str) – direction of the stepwise search; supports ‘forward’, ‘backward’ and ‘both’ (‘both’ is recommended)
criterion (str) – criterion used to evaluate the model; supports ‘aic’ and ‘bic’
p_enter (float) – threshold used in ‘forward’ and ‘both’ to keep features
p_remove (float) – threshold used in ‘backward’ to remove features
p_value_enter (float) – threshold used in ‘both’ to remove features
intercept (bool) – whether the model includes an intercept
max_iter (int) – maximum number of iterations
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True)
- Return type
DataFrame
- toad.selection.drop_empty(frame, threshold=0.9, nan=None, return_drop=False, exclude=None)
Drop columns that contain too many empty values.
- Parameters
frame (DataFrame) – dataframe that will be used
threshold (number) – drop the features whose number of empty values is greater than threshold; if threshold is a float, it is used as a proportion of the rows
nan (any) – values that will be treated as empty
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True)
- Return type
DataFrame
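The threshold rule described above can be illustrated without toad or pandas. The following is a minimal sketch, not toad's implementation: the helper names `is_empty` and `columns_to_drop` are hypothetical, columns are plain lists, and a threshold below 1 is assumed to mean a proportion of the rows.

```python
import math


def is_empty(value, nan=None):
    """Treat None, float NaN and the optional `nan` marker as empty."""
    if value is None or value == nan:
        return True
    return isinstance(value, float) and math.isnan(value)


def columns_to_drop(columns, threshold=0.9, nan=None, exclude=()):
    """Return the names of columns whose empty count exceeds the threshold.

    `columns` maps a column name to a list of values. A threshold below 1
    is read as a proportion of the rows (an assumption of this sketch).
    """
    dropped = []
    for name, values in columns.items():
        if name in exclude:
            continue
        limit = threshold * len(values) if threshold < 1 else threshold
        n_empty = sum(is_empty(v, nan) for v in values)
        if n_empty > limit:
            dropped.append(name)
    return dropped


data = {
    "age": [31, None, 45, 52],
    "fax": [None, None, None, None],
    "city": ["", "", "", "Oslo"],
}
print(columns_to_drop(data, threshold=0.5))          # ['fax']
print(columns_to_drop(data, threshold=0.5, nan=""))  # ['fax', 'city']
```

Passing `nan=""` shows why the `nan` parameter matters: the `city` column only counts as mostly empty once the empty string is treated as a missing value.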
- toad.selection.drop_var(frame, threshold=0, return_drop=False, exclude=None)
Drop columns with low variance.
- Parameters
frame (DataFrame) – dataframe that will be used
threshold (float) – drop features whose variance is less than threshold
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True)
- Return type
DataFrame
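A variance filter of this kind can be sketched with the standard library alone. This is an illustration, not toad's code: `low_variance_columns` is a hypothetical helper, it uses the population variance, and it compares with `<=` so that a threshold of 0 removes constant columns. Both choices are assumptions.

```python
from statistics import pvariance


def low_variance_columns(columns, threshold=0.0, exclude=()):
    """Return the names of numeric columns whose variance is <= threshold.

    Population variance and the inclusive comparison are assumptions of
    this sketch; the library may differ on both.
    """
    dropped = []
    for name, values in columns.items():
        if name in exclude:
            continue
        if pvariance(values) <= threshold:
            dropped.append(name)
    return dropped


data = {
    "constant": [1, 1, 1, 1],
    "binary": [0, 1, 0, 1],
    "wide": [10, 20, 30, 40],
}
print(low_variance_columns(data))                  # ['constant']
print(low_variance_columns(data, threshold=0.25))  # ['constant', 'binary']
```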
- toad.selection.drop_corr(frame, target=None, threshold=0.7, by='IV', return_drop=False, exclude=None)
Drop highly correlated columns.
- Parameters
frame (DataFrame) – dataframe that will be used
target (str) – name of the target column in the dataframe
threshold (float) – drop the features that have the smallest weight in each group of features whose correlation is greater than threshold
by (str or array-like) – weights of the features, used to decide which features to drop; defaults to ‘IV’
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True)
- Return type
DataFrame
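The idea of dropping the lighter feature of a correlated pair can be sketched as follows. This is a simplified pairwise version under stated assumptions, not toad's grouping algorithm: `pearson` and `correlated_columns_to_drop` are hypothetical helpers, and the weights (for example IV values) are supplied by the caller as a dict.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_columns_to_drop(columns, weights, threshold=0.7, exclude=()):
    """For each pair with |correlation| > threshold, drop the lighter feature."""
    dropped = set()
    for a, b in combinations(columns, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(columns[a], columns[b])) <= threshold:
            continue
        # Try the lower-weighted feature first; protected names are skipped.
        for candidate in sorted((a, b), key=weights.get):
            if candidate not in exclude:
                dropped.add(candidate)
                break
    return sorted(dropped)


data = {
    "income": [10, 20, 30, 40, 50],
    "salary": [11, 19, 31, 41, 49],
    "age": [30, 50, 25, 45, 35],
}
weights = {"income": 0.30, "salary": 0.12, "age": 0.05}
print(correlated_columns_to_drop(data, weights))  # ['salary']
```

Here `income` and `salary` are almost perfectly correlated, so the one with the smaller weight is removed; `age` is kept even though its weight is the lowest, because it is not correlated with anything above the threshold.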
- toad.selection.drop_iv(frame, target='target', threshold=0.02, return_drop=False, return_iv=False, exclude=None)
Drop columns with low information value (IV).
- Parameters
frame (DataFrame) – dataframe that will be used
target (str) – name of the target column in the dataframe
threshold (float) – drop the features whose IV is less than threshold
return_drop (bool) – whether to also return the names of the features that were dropped
return_iv (bool) – whether to also return the IV of the features
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True). Series: IV of the features (returned when return_iv is True)
- Return type
DataFrame
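For readers unfamiliar with IV, the sketch below computes it for features that are already binned and filters on the result. It is an illustration under assumptions, not toad's implementation: `information_value` and `low_iv_columns` are hypothetical helpers, the target is assumed to be 0/1, and bins missing one of the classes are skipped rather than smoothed.

```python
from collections import Counter
from math import log


def information_value(values, target):
    """IV of an already-binned feature against a 0/1 target."""
    good = Counter(v for v, t in zip(values, target) if t == 0)
    bad = Counter(v for v, t in zip(values, target) if t == 1)
    n_good, n_bad = sum(good.values()), sum(bad.values())
    iv = 0.0
    for level in sorted(set(values)):
        # Share of all goods and of all bads that fall in this bin.
        g, b = good[level] / n_good, bad[level] / n_bad
        # Skip bins with an empty class to avoid log(0) (a simplification).
        if g > 0 and b > 0:
            iv += (g - b) * log(g / b)
    return iv


def low_iv_columns(columns, target, threshold=0.02, exclude=()):
    """Return (names with IV below threshold, IV of every column)."""
    ivs = {name: information_value(vals, target) for name, vals in columns.items()}
    dropped = [n for n, iv in ivs.items() if iv < threshold and n not in exclude]
    return dropped, ivs


target = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "informative": ["a", "a", "a", "b", "a", "b", "b", "b"],
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
dropped, ivs = low_iv_columns(data, target)
print(dropped)                        # ['noise']
print(round(ivs["informative"], 4))   # 1.0986
```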
- toad.selection.drop_vif(frame, threshold=3, return_drop=False, exclude=None)
Drop columns by variance inflation factor (VIF).
- Parameters
frame (DataFrame) – dataframe that will be used
threshold (float) – drop features until every VIF is less than threshold
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. array: names of the features that were dropped (returned when return_drop is True)
- Return type
DataFrame
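The "drop until every VIF is below the threshold" loop can be sketched in plain Python. This is not toad's implementation: it relies on the standard identity that the VIF of feature j is the j-th diagonal entry of the inverse correlation matrix, and the helpers `pearson`, `invert`, `vif` and `high_vif_columns` are hypothetical names used only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def high_vif_columns(columns, threshold=3, exclude=()):
    """Repeatedly drop the worst column until every VIF is below threshold."""
    kept, dropped = dict(columns), []
    while len(kept) > 1:
        scores = {n: v for n, v in vif(kept).items() if n not in exclude}
        if not scores:
            break
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        dropped.append(worst)
        del kept[worst]
    return dropped


data = {
    "income": [10, 20, 30, 40, 50],
    "salary": [11, 19, 31, 41, 49],
    "age": [30, 50, 25, 45, 35],
}
print(high_vif_columns(data))  # one of 'income' or 'salary' is dropped
```

Recomputing the VIFs after each removal matters: once one of the two collinear columns is gone, the other's VIF falls back close to 1 and it is kept.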
- toad.selection.select(frame, target='target', empty=0.9, iv=0.02, corr=0.7, return_drop=False, exclude=None)
Select features by empty rate, IV and correlation.
- Parameters
frame (DataFrame) – dataframe that will be used
target (str) – name of the target column in the dataframe
empty (number) – drop the features whose number of empty values is greater than this threshold; if the threshold is less than 1, it is used as a proportion of the rows
iv (float) – drop the features whose IV is less than this threshold
corr (float) – drop the features that have the smallest IV in each group of features whose correlation is greater than this threshold
return_drop (bool) – whether to also return the names of the features that were dropped
exclude (array-like) – list of feature names that will not be dropped
- Returns
selected dataframe. dict: names of the features dropped in each step (returned when return_drop is True)
- Return type
DataFrame