toad.stats module
-
toad.stats.gini(target)
Get the Gini index of a feature.
Parameters: target (array-like) – target values for which the Gini index is calculated
Returns: Gini value
Return type: number
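As an illustration of the quantity described above, here is a minimal sketch that assumes the Gini index is the usual Gini impurity, one minus the sum of squared class proportions. The helper name `gini_impurity` is hypothetical and is not part of toad; the library's own implementation may differ in details.

```python
from collections import Counter


def gini_impurity(target):
    """Hypothetical helper: 1 - sum(p_k ** 2) over the classes in target."""
    n = len(target)
    if n == 0:
        return 0.0
    counts = Counter(target)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Two classes in equal proportion give the two-class maximum.
balanced = gini_impurity([0, 1, 0, 1])
# A single class gives zero impurity.
pure = gini_impurity([1, 1, 1, 1])
```

Under this assumption, a lower value means the target is closer to containing a single class.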
-
toad.stats.gini_cond
Get the conditional Gini index of a feature.
Parameters:
- feature (array-like)
- target (array-like)
Returns: conditional Gini value. If the feature is continuous, the best Gini value obtained by binning the feature into two groups is returned.
Return type: number
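The two cases described above can be sketched as follows. This assumes the conditional Gini index is the size-weighted average of the Gini impurity within each group, and that "best" means the lowest value over all two-group splits of a continuous feature. All function names here are hypothetical and are not toad's API.

```python
from collections import Counter, defaultdict


def gini_impurity(target):
    """Hypothetical helper: 1 - sum(p_k ** 2) over the classes in target."""
    n = len(target)
    counts = Counter(target)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_cond_categorical(feature, target):
    """Weighted average of the impurity inside each feature value."""
    groups = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    n = len(target)
    return sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())


def gini_cond_best_split(feature, target):
    """Lowest weighted impurity over all splits x <= t versus x > t."""
    n = len(target)
    # With no possible split, fall back to the unconditional impurity.
    best = gini_impurity(target)
    # Skip the largest value so the right-hand group is never empty.
    for t in sorted(set(feature))[:-1]:
        left = [y for x, y in zip(feature, target) if x <= t]
        right = [y for x, y in zip(feature, target) if x > t]
        score = (len(left) / n) * gini_impurity(left) \
            + (len(right) / n) * gini_impurity(right)
        best = min(best, score)
    return best


cat_perfect = gini_cond_categorical(["a", "a", "b", "b"], [0, 0, 1, 1])
cat_mixed = gini_cond_categorical(["a", "a", "b", "b"], [0, 1, 0, 1])
split_perfect = gini_cond_best_split([1, 2, 3, 4], [0, 0, 1, 1])
```

A feature that separates the classes completely yields a conditional value of zero, while an uninformative feature leaves the value equal to the unconditional impurity.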
-
toad.stats.entropy(target)
Get the information entropy of a feature.
Parameters: target (array-like)
Returns: information entropy
Return type: number
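For reference, a minimal sketch of information entropy, assuming the Shannon definition with the natural logarithm. The logarithm base is an assumption, since this page does not state it, and the helper name `shannon_entropy` is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(target):
    """Hypothetical helper: -sum(p_k * ln(p_k)) over the classes in target."""
    n = len(target)
    counts = Counter(target)
    total = 0.0
    for c in counts.values():
        p = c / n
        total -= p * math.log(p)
    return total


# Two equally likely classes give ln(2) under the natural-log assumption.
balanced = shannon_entropy([0, 1, 0, 1])
# A single class carries no uncertainty.
pure = shannon_entropy([1, 1, 1, 1])
```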
-
toad.stats.entropy_cond
Get the conditional entropy of a feature.
Parameters:
- feature (array-like)
- target (array-like)
Returns: conditional information entropy. If the feature is continuous, the best entropy obtained by binning the feature into two groups is returned.
Return type: number
-
toad.stats.WOE(y_prob, n_prob)
Get the WOE (weight of evidence) of a group.
Parameters:
- y_prob – the proportion of all y samples that fall in the group
- n_prob – the proportion of all n samples that fall in the group
Returns: WOE value
Return type: number
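A minimal sketch of the weight-of-evidence calculation, assuming the common definition as the natural log of the ratio of the two proportions. The helper name `woe` is hypothetical, and the exact formula toad uses should be confirmed against its source.

```python
import math


def woe(y_prob, n_prob):
    """Hypothetical helper: ln(y_prob / n_prob) for one group."""
    return math.log(y_prob / n_prob)


# The group holds 20% of all y samples but only 10% of all n samples.
skewed = woe(0.2, 0.1)
# Equal shares of y and n carry no evidence either way.
neutral = woe(0.5, 0.5)
```

Under this definition the value is positive when the group is over-represented among y samples and negative when it is over-represented among n samples. A zero proportion makes the value undefined, which this sketch does not handle.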
-
toad.stats.IV
Get the IV (information value) of a feature.
Parameters:
- feature (array-like)
- target (array-like)
- n_bins (int) – number of groups the feature will be binned into
- method (str) – the strategy used to merge the feature, default is 'dt'
- **kwargs – other options for the merge function
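To show what the result represents, here is a minimal sketch that computes an information value for a feature that is already binned. It assumes the common definition, the sum over bins of `(y_prob - n_prob) * ln(y_prob / n_prob)`, and treats target value 1 as the y class. The helper `information_value` is hypothetical and does not perform the merging step controlled by `n_bins` and `method`.

```python
import math
from collections import defaultdict


def information_value(binned_feature, target):
    """Hypothetical helper: IV of a pre-binned feature, with 1 as the y class."""
    total_y = sum(1 for t in target if t == 1)
    total_n = len(target) - total_y
    counts = defaultdict(lambda: [0, 0])  # bin label -> [y count, n count]
    for b, t in zip(binned_feature, target):
        counts[b][0 if t == 1 else 1] += 1
    iv = 0.0
    for y_count, n_count in counts.values():
        y_prob = y_count / total_y
        n_prob = n_count / total_n
        # A bin with a zero count would need smoothing; not handled here.
        iv += (y_prob - n_prob) * math.log(y_prob / n_prob)
    return iv


# Bin "a" holds 3 of the 4 y samples; bin "b" holds 3 of the 4 n samples.
bins = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
iv = information_value(bins, labels)
```

Each bin contributes a non-negative term, so a larger total indicates a feature whose bins separate the two classes more strongly.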
-
toad.stats.badrate(target)
Calculate the bad rate.
Parameters: target (array-like) – target array in which 1 marks a bad sample
Returns: float
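A minimal sketch of a bad-rate calculation, assuming it is the share of entries equal to 1 in a 0/1 target. The helper name `bad_rate` is hypothetical.

```python
def bad_rate(target):
    """Hypothetical helper: fraction of samples labelled 1 (bad)."""
    return sum(1 for t in target if t == 1) / len(target)


# Two bad samples out of five.
rate = bad_rate([1, 0, 0, 1, 0])
```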
-
toad.stats.column_quality(feature, target, name='feature', iv_only=False, **kwargs)
Calculate the quality of a feature.
Parameters:
- feature (array-like)
- target (array-like)
- name (str) – the feature's name, which is set on the returned Series
- iv_only (bool) – whether to calculate only the IV
Returns: the quality metrics of the feature, labelled with the feature's name
Return type: Series
-
toad.stats.quality(dataframe, target='target', iv_only=False, **kwargs)
Get the quality of the features in a dataset.
Parameters:
- dataframe (DataFrame) – the dataframe whose features will be evaluated
- target (str) – the name of the target column in the dataframe
- iv_only (bool) – whether to calculate only the IV
Returns: quality of the features, with the feature names as row labels
Return type: DataFrame