toad.selection module
-
class toad.selection.StatsModel(estimator='ols', criterion='aic', intercept=False)
Bases: object
-
toad.selection.stepwise(frame, target='target', estimator='ols', direction='both', criterion='aic', p_enter=0.01, p_remove=0.01, p_value_enter=0.2, intercept=False, max_iter=None, return_drop=False, exclude=None)
Select features with a stepwise procedure.
Parameters: - frame (DataFrame) – dataframe from which features are selected
- target (str) – name of the target column in frame
- estimator (str) – model used to compute the statistics
- direction (str) – direction of the stepwise search; supports ‘forward’, ‘backward’ and ‘both’ (‘both’ is recommended)
- criterion (str) – criterion used to compare models; supports ‘aic’ and ‘bic’
- p_enter (float) – threshold used in ‘forward’ and ‘both’ to decide which features are kept
- p_remove (float) – threshold used in ‘backward’ to decide which features are removed
- intercept (bool) – whether the model includes an intercept
- p_value_enter (float) – threshold used in ‘both’ to decide which features are removed
- max_iter (int) – maximum number of iterations
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True)
Return type: DataFrame
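To make the idea concrete, here is a minimal sketch of forward stepwise selection with an OLS model and the AIC criterion, written with NumPy and pandas rather than toad. It is not toad's implementation: the helpers `ols_aic` and `forward_stepwise` are hypothetical, the sketch covers only the ‘forward’ direction, always fits an intercept, ignores the p-value thresholds, and computes AIC only up to an additive constant (which does not change which model has the lowest value).

```python
import numpy as np
import pandas as pd


def ols_aic(X: np.ndarray, y: np.ndarray) -> float:
    """AIC of an OLS fit, up to an additive constant: n * log(RSS / n) + 2 * k."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * k


def forward_stepwise(frame: pd.DataFrame, target: str = "target") -> list:
    """Hypothetical helper, not toad's API: repeatedly add the feature that
    lowers AIC the most, and stop when no remaining candidate lowers it."""
    y = frame[target].to_numpy(dtype=float)
    n = len(frame)
    candidates = [c for c in frame.columns if c != target]
    selected = []
    # Start from an intercept-only model (this sketch always fits an intercept).
    best_aic = ols_aic(np.ones((n, 1)), y)
    while candidates:
        scores = {}
        for col in candidates:
            X = np.column_stack([np.ones(n), frame[selected + [col]].to_numpy(dtype=float)])
            scores[col] = ols_aic(X, y)
        col, aic = min(scores.items(), key=lambda item: item[1])
        if aic >= best_aic:
            break  # no remaining feature improves the criterion
        selected.append(col)
        candidates.remove(col)
        best_aic = aic
    return selected


# Synthetic data: the target depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
demo["target"] = 3 * demo["x1"] + rng.normal(scale=0.5, size=200)
print(forward_stepwise(demo))
```

A pure-noise feature can occasionally pass an AIC test by chance, which is one reason the documented function also offers p-value thresholds and a ‘both’ direction that can remove features again.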
-
toad.selection.drop_empty(frame, threshold=0.9, nan=None, return_drop=False, exclude=None)
Drop columns by their number of empty values.
Parameters: - frame (DataFrame) – dataframe that will be used
- threshold (number) – drop features whose number of empty values is greater than threshold; if threshold is a float, it is interpreted as a fraction of the rows
- nan (any) – values that will be treated as empty
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True)
Return type: DataFrame
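A minimal pandas sketch of the empty-value rule above, assuming that a float threshold means a fraction of the rows, an int means an absolute count, and `nan` may be a single value or a list of values. `drop_empty_sketch` is a hypothetical helper, not toad's code, and it always returns the dropped names instead of honouring `return_drop`.

```python
import pandas as pd


def drop_empty_sketch(frame, threshold=0.9, nan=None, exclude=None):
    """Hypothetical helper: return (selected frame, dropped column names)."""
    exclude = set(exclude or [])
    empty = frame.isna()
    if nan is not None:
        # Extra sentinel values (e.g. -999) are counted as empty as well.
        sentinels = list(nan) if isinstance(nan, (list, tuple, set)) else [nan]
        empty = empty | frame.isin(sentinels)
    counts = empty.sum()
    # A float threshold is a fraction of the rows; an int is an absolute count.
    limit = threshold * len(frame) if isinstance(threshold, float) else threshold
    dropped = [c for c in frame.columns if counts[c] > limit and c not in exclude]
    return frame.drop(columns=dropped), dropped


demo = pd.DataFrame({
    "a": [1, None, None, None],
    "b": [1, 2, 3, None],
    "c": [-999, -999, 1, 2],
})
print(drop_empty_sketch(demo, threshold=0.4)[1])            # ['a']
print(drop_empty_sketch(demo, threshold=0.4, nan=-999)[1])  # ['a', 'c']
```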
-
toad.selection.drop_var(frame, threshold=0, return_drop=False, exclude=None)
Drop columns by variance.
Parameters: - frame (DataFrame) – dataframe that will be used
- threshold (float) – drop features whose variance is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True)
Return type: DataFrame
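A minimal sketch of the variance rule, assuming that only numeric columns are considered and that population variance (`ddof=0`) is used; both are assumptions, and `drop_var_sketch` is a hypothetical helper rather than toad's code.

```python
import pandas as pd


def drop_var_sketch(frame, threshold=0.0, exclude=None):
    """Hypothetical helper: return (selected frame, dropped column names)."""
    exclude = set(exclude or [])
    # Only numeric columns have a variance; population variance (ddof=0) is an assumption.
    variances = frame.select_dtypes(include="number").var(ddof=0)
    # Follows the documented rule literally: strictly less than threshold.
    dropped = [c for c in variances.index if variances[c] < threshold and c not in exclude]
    return frame.drop(columns=dropped), dropped


demo = pd.DataFrame({"a": [1, 1, 1, 1], "b": [1, 2, 3, 4], "c": ["x", "y", "x", "y"]})
print(drop_var_sketch(demo, threshold=1e-8)[1])  # ['a']
```

Because the sketch applies "less than threshold" strictly, its default threshold of 0 keeps constant columns, which is why the example passes a tiny positive threshold.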
-
toad.selection.drop_corr(frame, target=None, threshold=0.7, by='IV', return_drop=False, exclude=None)
Drop columns by correlation.
Parameters: - frame (DataFrame) – dataframe that will be used
- target (str) – name of the target column in frame
- threshold (float) – in each group of features whose correlation is greater than threshold, drop the features with the smallest weight
- by (str or array-like) – weights of the features, used to decide which features to drop; defaults to ‘IV’
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True)
Return type: DataFrame
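One simple way to implement the correlation rule is sketched below with pandas. It assumes the weights are a Series indexed by feature name (for example each feature's IV) and uses a greedy pass: features are visited from the largest weight to the smallest, and a feature is dropped when its absolute correlation with an already-kept feature exceeds threshold, so the lower-weight member of a correlated pair is the one removed. Whether toad uses exactly this order is not confirmed here; `drop_corr_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def drop_corr_sketch(frame, weights, threshold=0.7, exclude=None):
    """Hypothetical helper: return (selected frame, dropped column names).

    Greedy pass: visit features by descending weight and drop a feature whose
    absolute correlation with any already-kept feature exceeds threshold.
    """
    exclude = set(exclude or [])
    corr = frame[list(weights.index)].corr().abs()
    kept, dropped = [], []
    for col in weights.sort_values(ascending=False).index:
        if col not in exclude and any(corr.loc[col, k] > threshold for k in kept):
            dropped.append(col)
        else:
            kept.append(col)
    return frame.drop(columns=dropped), dropped


rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
demo = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.1, size=200),  # nearly a copy of x1
    "x3": rng.normal(size=200),                  # unrelated feature
})
weights = pd.Series({"x1": 0.3, "x2": 0.1, "x3": 0.2})  # e.g. each feature's IV
print(drop_corr_sketch(demo, weights)[1])  # ['x2']
```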
-
toad.selection.drop_iv(frame, target='target', threshold=0.02, return_drop=False, return_iv=False, exclude=None)
Drop columns by IV (information value).
Parameters: - frame (DataFrame) – dataframe that will be used
- target (str) – name of the target column in frame
- threshold (float) – drop features whose IV is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- return_iv (bool) – whether to also return the IV of each feature
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True); Series: IV of each feature (only when return_iv=True)
Return type: DataFrame
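For a binned feature and a binary target, IV is commonly defined as the sum over bins of (bad% − good%) · ln(bad% / good%), where bad% and good% are each bin's share of all target = 1 and target = 0 rows. The sketch below computes that quantity with `pandas.crosstab` and applies the threshold. It assumes the feature is already discrete (continuous features would need binning first) and the target is coded 0/1, and it adds a small `eps` to avoid log(0). `information_value` and `drop_iv_sketch` are hypothetical helpers, not toad's API.

```python
import numpy as np
import pandas as pd


def information_value(feature: pd.Series, target: pd.Series, eps: float = 1e-10) -> float:
    """Hypothetical helper: IV of a discrete feature against a 0/1 target."""
    table = pd.crosstab(feature, target)
    good = table[0] / table[0].sum()  # each bin's share of target == 0 rows
    bad = table[1] / table[1].sum()   # each bin's share of target == 1 rows
    # eps avoids log(0) for bins that contain only one class
    woe = np.log((bad + eps) / (good + eps))
    return float(((bad - good) * woe).sum())


def drop_iv_sketch(frame, target="target", threshold=0.02, exclude=None):
    """Hypothetical helper: return (selected frame, dropped names, IV of every feature)."""
    exclude = set(exclude or [])
    features = [c for c in frame.columns if c != target]
    ivs = pd.Series({c: information_value(frame[c], frame[target]) for c in features})
    dropped = [c for c in features if ivs[c] < threshold and c not in exclude]
    return frame.drop(columns=dropped), dropped, ivs


demo = pd.DataFrame({
    "strong": ["a", "a", "a", "a", "b", "b", "b", "b"],  # separates the classes
    "noise": ["a", "b", "a", "b", "a", "b", "a", "b"],   # same mix in both classes
    "target": [0, 0, 0, 0, 1, 1, 1, 1],
})
selected, dropped, ivs = drop_iv_sketch(demo)
print(dropped)  # ['noise']
```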
-
toad.selection.drop_vif(frame, threshold=3, return_drop=False, exclude=None)
Drop columns by variance inflation factor (VIF).
Parameters: - frame (DataFrame) – dataframe that will be used
- threshold (float) – drop features until the VIF of every remaining feature is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; array: names of the dropped features (only when return_drop=True)
Return type: DataFrame
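VIF measures how well one feature is explained by the others: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing feature j on the remaining features. The sketch below computes it with NumPy least squares (intercept included) and removes the worst feature one at a time until every non-excluded feature is below threshold. The one-at-a-time removal and the handling of `exclude` (excluded columns still count toward the other features' VIF but are never dropped) are assumptions of this sketch; `vif` and `drop_vif_sketch` are hypothetical names.

```python
import numpy as np
import pandas as pd


def vif(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: VIF of each column, 1 / (1 - R^2), where R^2 comes
    from regressing that column on all other columns plus an intercept."""
    X = frame.to_numpy(dtype=float)
    n = X.shape[0]
    out = {}
    for j, col in enumerate(frame.columns):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[col] = np.inf if r2 >= 1 else 1 / (1 - r2)
    return pd.Series(out)


def drop_vif_sketch(frame, threshold=3, exclude=None):
    """Hypothetical helper: repeatedly drop the non-excluded feature with the
    largest VIF until all candidates are below threshold."""
    exclude = set(exclude or [])
    kept, dropped = list(frame.columns), []
    while len(kept) > 1:
        scores = vif(frame[kept])
        candidates = scores[[c for c in kept if c not in exclude]]
        if candidates.empty or candidates.max() < threshold:
            break
        worst = candidates.idxmax()
        kept.remove(worst)
        dropped.append(worst)
    return frame[kept], dropped


rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
demo = pd.DataFrame({"a": a, "b": b, "c": a + b + rng.normal(scale=0.01, size=200)})
print(drop_vif_sketch(demo)[1])  # ['c']
```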
-
toad.selection.select(frame, target='target', empty=0.9, iv=0.02, corr=0.7, return_drop=False, exclude=None)
Select features by empty rate, IV and correlation.
Parameters: - frame (DataFrame) – dataframe that will be used
- target (str) – name of the target column in frame
- empty (number) – drop features whose number of empty values is greater than this threshold; a float is interpreted as a fraction of the rows
- iv (float) – drop features whose IV is less than this threshold
- corr (float) – in each group of features whose correlation is greater than this threshold, drop the features with the smallest IV
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – names of features that will never be dropped
Returns: DataFrame: selected dataframe; dict: names of the dropped features in each step (only when return_drop=True)
Return type: DataFrame
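The sketch below chains the three filters in the order the description lists them (empty rate, then IV, then correlation) and records the dropped names per step, mirroring the dict return value. The step order, the dict keys `'empty'`, `'iv'` and `'corr'`, and the greedy correlation pass are assumptions; `select_sketch` is a hypothetical helper that always returns the dropped names, and its IV helper expects discrete features and a 0/1 target.

```python
import numpy as np
import pandas as pd


def _iv(feature, target, eps=1e-10):
    """IV of a discrete feature against a 0/1 target (eps avoids log(0))."""
    table = pd.crosstab(feature, target)
    good = table[0] / table[0].sum()
    bad = table[1] / table[1].sum()
    return float(((bad - good) * np.log((bad + eps) / (good + eps))).sum())


def select_sketch(frame, target="target", empty=0.9, iv=0.02, corr=0.7, exclude=()):
    """Hypothetical helper: chain empty -> iv -> corr filters.

    Returns (selected frame, {step name: dropped names}); step names are assumptions.
    """
    def droppable(cols):
        return [c for c in cols if c not in exclude]

    features = [c for c in frame.columns if c != target]
    dropped = {}

    # 1. Empty values: a float threshold is a fraction of rows, an int is a count.
    limit = empty * len(frame) if isinstance(empty, float) else empty
    n_empty = frame[features].isna().sum()
    dropped["empty"] = droppable(n_empty[n_empty > limit].index)
    features = [c for c in features if c not in dropped["empty"]]

    # 2. Information value against the binary target.
    ivs = pd.Series({c: _iv(frame[c], frame[target]) for c in features}, dtype=float)
    dropped["iv"] = droppable(ivs[ivs < iv].index)
    features = [c for c in features if c not in dropped["iv"]]

    # 3. Correlation: visit features by descending IV and drop any feature whose
    #    absolute correlation with an already-kept feature exceeds the threshold.
    matrix = frame[features].corr().abs()
    kept = []
    dropped["corr"] = []
    for c in ivs[features].sort_values(ascending=False).index:
        if c not in exclude and any(matrix.loc[c, k] > corr for k in kept):
            dropped["corr"].append(c)
        else:
            kept.append(c)
    features = [c for c in features if c not in dropped["corr"]]
    return frame[features + [target]], dropped


demo = pd.DataFrame({
    "f_empty": [None] * 8,
    "f_noise": [1, 2, 1, 2, 1, 2, 1, 2],
    "f_strong": [1, 1, 1, 1, 2, 2, 2, 2],
    "f_dup": [1, 1, 1, 2, 2, 2, 2, 2],
    "f_other": [1, 2, 2, 1, 1, 2, 2, 2],
    "target": [0, 0, 0, 0, 1, 1, 1, 1],
})
selected, dropped = select_sketch(demo)
print(dropped)  # {'empty': ['f_empty'], 'iv': ['f_noise'], 'corr': ['f_dup']}
```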