toad.stats module

toad.stats.gini(target)

get gini index of a feature

Parameters:target (array-like) – target values for which the gini index is calculated
Returns:gini value
Return type:number
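As an illustration of what a gini index over a target array measures, here is a minimal sketch that assumes the standard Gini impurity definition (one minus the sum of squared class proportions). The name gini_sketch is hypothetical and the code does not call toad; the library's own implementation may differ in details.

```python
from collections import Counter


def gini_sketch(target):
    """Gini impurity: 1 minus the sum of squared class proportions (assumed definition)."""
    n = len(target)
    counts = Counter(target)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


balanced = gini_sketch([0, 0, 1, 1])  # two equally frequent classes
pure = gini_sketch([1, 1, 1, 1])      # a single class, so no impurity
```

Under this definition a pure target gives 0 and a balanced binary target gives 0.5.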
toad.stats.gini_cond

get conditional gini index of a feature

Parameters:
  • feature (array-like) –
  • target (array-like) –
Returns:

conditional gini value. If the feature is continuous, the best gini value obtained by binning the feature into two groups is returned

Return type:

number
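The conditional gini of a categorical feature can be sketched as the size-weighted average of the gini index inside each feature group. This is an assumption based on the usual definition, not a copy of toad's code, and gini_sketch and gini_cond_sketch are hypothetical names.

```python
from collections import Counter, defaultdict


def gini_sketch(values):
    """Gini impurity of a list of labels (assumed standard definition)."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def gini_cond_sketch(feature, target):
    """Weighted average of the per-group gini, grouping rows by feature value."""
    groups = defaultdict(list)
    for f, t in zip(feature, target):
        groups[f].append(t)
    n = len(target)
    return sum(len(g) / n * gini_sketch(g) for g in groups.values())


feature = ["a", "a", "b", "b"]
perfect = gini_cond_sketch(feature, [0, 0, 1, 1])  # feature separates the classes
useless = gini_cond_sketch(feature, [0, 1, 0, 1])  # feature carries no information
```

A feature that separates the classes completely drives the conditional value to 0, while an uninformative feature leaves it at the unconditional gini of the target.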

toad.stats.entropy(target)

get information entropy of a feature

Parameters:target (array-like) –
Returns:information entropy
Return type:number
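For reference, information entropy over a target array can be sketched as below. The natural logarithm is an assumption here (the text above does not state the base), and entropy_sketch is a hypothetical name that does not call toad.

```python
import math
from collections import Counter


def entropy_sketch(target):
    """Shannon entropy with natural log (assumed base): sum of p * ln(1 / p)."""
    n = len(target)
    return sum((c / n) * math.log(n / c) for c in Counter(target).values())


balanced = entropy_sketch([0, 0, 1, 1])  # ln(2) for two equally frequent classes
pure = entropy_sketch([1, 1, 1, 1])      # a single class has zero entropy
```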
toad.stats.entropy_cond

get conditional entropy of a feature

Parameters:
  • feature (array-like) –
  • target (array-like) –
Returns:

conditional information entropy. If the feature is continuous, the best entropy obtained by binning the feature into two groups is returned

Return type:

number
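The two-group behaviour for a continuous feature can be illustrated with an exhaustive threshold search: try every cut between distinct sorted values and keep the lowest weighted entropy. This is only a sketch of the idea under that assumption; how toad actually searches for the split is not stated here, and both function names are hypothetical.

```python
import math
from collections import Counter


def entropy_sketch(values):
    """Shannon entropy with natural log (assumed base)."""
    n = len(values)
    return sum((c / n) * math.log(n / c) for c in Counter(values).values())


def best_split_entropy_sketch(feature, target):
    """Lowest weighted entropy over all two-group splits of a numeric feature."""
    pairs = sorted(zip(feature, target))
    n = len(pairs)
    best = entropy_sketch(target)  # fallback when no split is possible
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cond = len(left) / n * entropy_sketch(left) + len(right) / n * entropy_sketch(right)
        best = min(best, cond)
    return best


separable = best_split_entropy_sketch([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
mixed = best_split_entropy_sketch([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1])
```

In the first call a cut between 2.0 and 3.0 separates the classes, so the result is 0. In the second call no single cut does, so the result stays above 0 but below the unconditional entropy ln(2).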

toad.stats.probability(target, mask=None)

get probability of target by mask

toad.stats.WOE(y_prob, n_prob)

get WOE of a group

Parameters:
  • y_prob – the probability of grouped y in total y
  • n_prob – the probability of grouped n in total n
Returns:

woe value

Return type:

number
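To make the two probabilities concrete, the sketch below derives them from a target array and a boolean mask selecting one group, then takes the log of their ratio. The formula ln(y_prob / n_prob) and its sign convention are assumptions based on the common WOE definition, and woe_sketch is a hypothetical name.

```python
import math


def woe_sketch(y_prob, n_prob):
    """Weight of evidence as ln(y_prob / n_prob) (assumed sign convention)."""
    return math.log(y_prob / n_prob)


target = [1, 1, 1, 0, 0, 0, 0, 0]
mask = [True, True, False, True, False, False, False, False]  # rows in the group

total_y = sum(1 for t in target if t == 1)  # 3 positives overall
total_n = sum(1 for t in target if t == 0)  # 5 negatives overall
group_y = sum(1 for t, m in zip(target, mask) if m and t == 1)  # 2 in the group
group_n = sum(1 for t, m in zip(target, mask) if m and t == 0)  # 1 in the group

y_prob = group_y / total_y  # share of all y that falls in the group
n_prob = group_n / total_n  # share of all n that falls in the group
woe = woe_sketch(y_prob, n_prob)
```

A group with no y or no n rows makes the ratio zero or undefined, so real code needs some smoothing or a guard. This sketch leaves that out.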

toad.stats.IV

get the IV of a feature

Parameters:
  • feature (array-like) –
  • target (array-like) –
  • return_sub (bool) – whether to return the IV of each group
  • n_bins (int) – number of groups the feature will be binned into
  • method (str) – the strategy used to merge the feature; default is ‘dt’
  • **kwargs – other options passed to the merge function
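Once a feature has been binned, IV is commonly computed as the sum over groups of (y_prob - n_prob) * WOE. The sketch below assumes that definition, an already-binned feature, a 0/1 target, and at least one y and one n row in every group. It does not reproduce toad's merging step, and iv_sketch is a hypothetical name.

```python
import math
from collections import defaultdict


def iv_sketch(feature, target):
    """Information value of a binned feature against a 0/1 target (assumed formula)."""
    total_y = sum(1 for t in target if t == 1)
    total_n = sum(1 for t in target if t == 0)
    counts = defaultdict(lambda: [0, 0])  # group -> [n_count, y_count]
    for f, t in zip(feature, target):
        counts[f][t] += 1
    iv = 0.0
    for n_count, y_count in counts.values():
        y_prob = y_count / total_y
        n_prob = n_count / total_n
        iv += (y_prob - n_prob) * math.log(y_prob / n_prob)
    return iv


feature = ["low"] * 4 + ["high"] * 4
informative = iv_sketch(feature, [0, 0, 0, 1, 0, 1, 1, 1])
uninformative = iv_sketch(feature, [0, 1, 0, 1, 0, 1, 0, 1])
```

Each term is non-negative, so IV grows as the groups' y and n shares diverge and is 0 when every group mirrors the overall mix.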
toad.stats.badrate(target)

calculate badrate

Parameters:target (array-like) – target array in which 1 marks a bad sample
Returns:float
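Read literally, the bad rate is the fraction of samples labeled 1. A minimal sketch under that reading (badrate_sketch is a hypothetical name):

```python
def badrate_sketch(target):
    """Fraction of samples whose label is 1 (bad)."""
    return sum(1 for t in target if t == 1) / len(target)


rate = badrate_sketch([1, 0, 0, 1, 0])  # 2 bad samples out of 5
```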
toad.stats.VIF(frame)

calculate vif

Parameters:frame (ndarray|DataFrame) –
Returns:Series
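The variance inflation factor of a column is usually defined as 1 / (1 - R²), where R² comes from regressing that column on all the others. With exactly two columns and an intercept, R² equals the squared Pearson correlation, which keeps a dependency-free sketch short. The two-column restriction and both function names are assumptions made for illustration; toad accepts a whole frame and may compute R² differently.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def vif_two_columns_sketch(x, y):
    """VIF shared by both columns of a two-column frame: 1 / (1 - r^2)."""
    r2 = pearson_r(x, y) ** 2
    return 1.0 / (1.0 - r2)


uncorrelated = vif_two_columns_sketch([1, 2, 3, 4], [1, -1, -1, 1])
collinear = vif_two_columns_sketch([1, 2, 3, 4], [1, 2, 3, 5])
```

Uncorrelated columns give a VIF of 1, and the value grows without bound as the columns approach perfect collinearity.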
class toad.stats.indicator(*args, is_class=False, **kwargs)

Bases: toad.utils.decorator.Decorator

indicator decorator

name = 'indicator'
need_merge = False
dtype = None
wrapper(*args, **kwargs)
toad.stats.column_quality(feature, target, name='feature', indicators=[], need_merge=False, **kwargs)

calculate quality of a feature

Parameters:
  • feature (array-like) –
  • target (array-like) –
  • name (str) – feature’s name that will be set in the returned Series
  • indicators (list) – list of indicator functions
  • need_merge (bool) – whether the feature needs to be merged
Returns:

quality indicator values, labeled with the feature’s name

Return type:

Series

toad.stats.quality(dataframe, target='target', cpu_cores=0, iv_only=False, indicators=['iv', 'gini', 'entropy', 'unique'], **kwargs)

get quality of features in data

Parameters:
  • dataframe (DataFrame) – dataframe whose features’ quality will be calculated
  • target (str) – the target’s name in dataframe
  • iv_only (bool) – deprecated; whether to calculate only IV
  • cpu_cores (int) – the maximum number of CPU cores to use; 0 means all CPUs are used, -1 means all CPUs but one are used.
Returns:

quality of features with the features’ name as row name

Return type:

DataFrame
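The overall shape of the result (one row per feature, one column per indicator) can be sketched without pandas by applying a list of indicator functions to every non-target column. Everything in this block is hypothetical: a plain dict stands in for the DataFrame, indicators are passed as functions rather than by name, only two indicators are included for brevity, and no parallelism or merging is attempted.

```python
from collections import Counter, defaultdict


def unique_sketch(feature, target):
    """Number of distinct values in the feature."""
    return len(set(feature))


def gini_cond_sketch(feature, target):
    """Size-weighted gini impurity of the target within each feature group."""
    groups = defaultdict(list)
    for f, t in zip(feature, target):
        groups[f].append(t)
    n = len(target)
    total = 0.0
    for g in groups.values():
        impurity = 1.0 - sum((c / len(g)) ** 2 for c in Counter(g).values())
        total += len(g) / n * impurity
    return total


def quality_sketch(columns, target="target", indicators=(unique_sketch, gini_cond_sketch)):
    """Map each feature name to a dict of indicator name -> value."""
    y = columns[target]
    return {
        name: {func.__name__: func(values, y) for func in indicators}
        for name, values in columns.items()
        if name != target
    }


data = {
    "city": ["a", "a", "b", "b"],
    "noise": ["x", "y", "x", "y"],
    "target": [0, 0, 1, 1],
}
table = quality_sketch(data)
```

Here table["city"] and table["noise"] play the role of rows keyed by feature name.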