toad.stats module

toad.stats.gini(target)[source]

get the Gini index of a feature

Parameters

target (array-like) – array of target values for which the Gini index is calculated

Returns

gini value

Return type

number
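As an illustration, the Gini index of a label array is commonly defined as one minus the sum of squared class proportions. The sketch below assumes that definition; `gini_sketch` is a hypothetical standalone helper, not toad's own implementation.

```python
from collections import Counter

def gini_sketch(target):
    """Gini index as 1 - sum(p_k ** 2) over the distinct values in target."""
    n = len(target)
    counts = Counter(target)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

balanced = gini_sketch([0, 0, 1, 1])  # two equally frequent classes
pure = gini_sketch([1, 1, 1, 1])      # a single class
```

Under this definition a pure array scores 0 and a balanced binary array scores 0.5.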

toad.stats.gini_cond(feature, target)[source]

get conditional gini index of a feature

Parameters
  • feature (array-like) –

  • target (array-like) –

Returns

conditional Gini value. If the feature is continuous, returns the best Gini value obtained when the feature is binned into two groups

Return type

number
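A minimal sketch of the idea, assuming the conditional Gini index is the size-weighted average of the Gini index within each feature value, and that "best" for a continuous feature means the lowest value over all two-group splits. All helper names are hypothetical and the code is not taken from toad.

```python
from collections import Counter, defaultdict

def gini_sketch(target):
    n = len(target)
    return 1.0 - sum((c / n) ** 2 for c in Counter(target).values())

def gini_cond_sketch(feature, target):
    """Size-weighted average of the Gini index within each feature value."""
    groups = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    n = len(target)
    return sum(len(ys) / n * gini_sketch(ys) for ys in groups.values())

def best_split_gini_sketch(feature, target):
    """Try each threshold x <= t and keep the lowest conditional Gini."""
    values = sorted(set(feature))
    if len(values) < 2:
        # Nothing to split on: fall back to the unconditional Gini index.
        return gini_sketch(target)
    return min(
        gini_cond_sketch([x <= t for x in feature], target)
        for t in values[:-1]
    )

perfect = gini_cond_sketch(["a", "a", "b", "b"], [0, 0, 1, 1])
mixed = gini_cond_sketch(["a", "a", "b", "b"], [0, 1, 0, 1])
split = best_split_gini_sketch([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

A feature that separates the classes perfectly gives 0, while an uninformative one leaves the Gini index unchanged.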

toad.stats.entropy(target)[source]

get the information entropy of a feature

Parameters

target (array-like) –

Returns

information entropy

Return type

number
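For reference, Shannon entropy over the class proportions can be sketched as follows. The natural logarithm is an assumption here, since the logarithm base is not stated above, and `entropy_sketch` is a hypothetical helper rather than toad's implementation.

```python
import math
from collections import Counter

def entropy_sketch(target):
    """Shannon entropy -sum(p_k * ln(p_k)) over the distinct values in target."""
    n = len(target)
    return sum(-(c / n) * math.log(c / n) for c in Counter(target).values())

balanced = entropy_sketch([0, 0, 1, 1])  # equals ln(2) with the natural log
pure = entropy_sketch([1, 1, 1, 1])      # a single class carries no uncertainty
```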

toad.stats.entropy_cond(feature, target)[source]

get conditional entropy of a feature

Parameters
  • feature (array-like) –

  • target (array-like) –

Returns

conditional information entropy. If the feature is continuous, returns the best entropy obtained when the feature is binned into two groups

Return type

number
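The discrete case can be sketched as the size-weighted average of the entropy within each feature value. This is an illustration under that assumption, using the natural logarithm and hypothetical helper names; the two-group search for continuous features is not shown.

```python
import math
from collections import Counter, defaultdict

def entropy_sketch(target):
    n = len(target)
    return sum(-(c / n) * math.log(c / n) for c in Counter(target).values())

def entropy_cond_sketch(feature, target):
    """Size-weighted average of the entropy within each feature value."""
    groups = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    n = len(target)
    return sum(len(ys) / n * entropy_sketch(ys) for ys in groups.values())

perfect = entropy_cond_sketch(["a", "a", "b", "b"], [0, 0, 1, 1])
mixed = entropy_cond_sketch(["a", "a", "b", "b"], [0, 1, 0, 1])
```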

toad.stats.probability(target, mask=None)[source]

get the probability of the target under a mask

toad.stats.WOE(y_prob, n_prob)[source]

get the weight of evidence (WOE) of a group

Parameters
  • y_prob – proportion of all y samples that fall in the group

  • n_prob – proportion of all n samples that fall in the group

Returns

WOE value

Return type

number
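A minimal sketch, assuming the conventional definition of WOE as the natural log of the ratio of the two proportions. Both the formula and the helper name `woe_sketch` are assumptions, not confirmed by the text above.

```python
import math

def woe_sketch(y_prob, n_prob):
    """Weight of evidence as ln(y_prob / n_prob); both inputs must be > 0."""
    return math.log(y_prob / n_prob)

# A group holding 30% of all y samples and 10% of all n samples.
woe_value = woe_sketch(0.3, 0.1)
```

Under this definition a group with equal shares of both classes has a WOE of 0, and the sign shows which class is over-represented.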

toad.stats.IV(feature, target, return_sub=False, **kwargs)[source]

get the information value (IV) of a feature

Parameters
  • feature (array-like) –

  • target (array-like) –

  • return_sub (bool) – whether to also return the IV of each group

  • n_bins (int) – number of groups the feature will be binned into

  • method (str) – the strategy used to merge the feature, default is ‘dt’

  • **kwargs – other options for the merge function
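To illustrate what IV measures, the sketch below computes it for a feature that is already binned, assuming the common definition: the sum over groups of (y_prob - n_prob) * WOE. The helper `iv_sketch` is hypothetical, skips the merge step, and assumes every group contains both classes so that the logarithm is defined.

```python
import math
from collections import defaultdict

def iv_sketch(bins, target):
    """Return (total IV, per-group IV) for an already-binned feature."""
    total_bad = sum(1 for y in target if y == 1)
    total_good = len(target) - total_bad
    counts = defaultdict(lambda: [0, 0])  # group -> [bad count, good count]
    for b, y in zip(bins, target):
        counts[b][0 if y == 1 else 1] += 1
    sub_iv = {}
    for b, (bad, good) in counts.items():
        y_prob = bad / total_bad    # share of all bad samples in this group
        n_prob = good / total_good  # share of all good samples in this group
        sub_iv[b] = (y_prob - n_prob) * math.log(y_prob / n_prob)
    return sum(sub_iv.values()), sub_iv

bins = ["low"] * 4 + ["high"] * 4
target = [1, 0, 0, 0, 1, 1, 1, 0]
total_iv, per_group = iv_sketch(bins, target)
```

Returning the per-group values alongside the total mirrors the idea behind `return_sub`, although the real return shape may differ.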

toad.stats.badrate(target)[source]

calculate the bad rate

Parameters

target (array-like) – target array in which 1 marks a bad sample

Returns

float
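As a quick illustration, the bad rate can be read as the share of samples labeled 1. The helper below is a hypothetical pure-Python sketch under that assumption.

```python
def badrate_sketch(target):
    """Fraction of entries in target equal to 1 (the bad label)."""
    return sum(1 for y in target if y == 1) / len(target)

rate = badrate_sketch([1, 0, 0, 1, 0])  # two bad samples out of five
```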

toad.stats.VIF(frame)[source]

calculate the variance inflation factor (VIF) of each column

Parameters

frame (ndarray|DataFrame) –

Returns

Series
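For intuition, the textbook VIF of a column is 1 / (1 - R²), where R² comes from regressing that column on the others. With only two columns, R² reduces to the squared Pearson correlation, which the hypothetical sketch below uses. toad's own computation may differ in detail (for example in how an intercept is handled), so treat this as an illustration only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_columns_sketch(x1, x2):
    """Textbook VIF for a two-column frame; both columns share this value."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

vif = vif_two_columns_sketch([1, 2, 3, 4], [1, 3, 2, 4])  # correlation is 0.8
```

Uncorrelated columns give a VIF of 1, and the value grows without bound as the columns approach perfect collinearity.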

class toad.stats.indicator(*args, is_class=False, **kwargs)[source]

Bases: Decorator

indicator decorator

toad.stats.column_quality(feature, target, name='feature', indicators=[], need_merge=False, **kwargs)[source]

calculate quality of a feature

Parameters
  • feature (array-like) –

  • target (array-like) –

  • name (str) – feature’s name that will be set in the returned Series

  • indicators (list) – list of indicator functions

  • need_merge (bool) – whether the feature needs to be merged

Returns

quality indicators labeled with the feature’s name

Return type

Series

toad.stats.quality(dataframe, target='target', cpu_cores=0, iv_only=False, indicators=['iv', 'gini', 'entropy', 'unique'], **kwargs)[source]

get quality of features in data

Parameters
  • dataframe (DataFrame) – dataframe whose feature quality will be calculated

  • target (str) – the target’s name in dataframe

  • iv_only (bool) – deprecated. whether to calculate only IV

  • indicators (list) – indicators to calculate; custom indicator functions may also be supplied. Default is [‘iv’, ‘gini’, ‘entropy’, ‘unique’]

  • cpu_cores (int) – the maximum number of CPU cores to use; 0 means all CPUs are used, -1 means all CPUs but one are used.

Returns

quality of features with the features’ name as row name

Return type

DataFrame

toad.stats.feature_bin_stats(df_bin, feature, target)[source]

calculate detailed statistics of a feature after binning

Parameters
  • df_bin (DataFrame) – dataframe containing the feature and target columns

  • feature (str) –

  • target (str) –

Returns

contains good, bad, badrate, prop, y_prop, n_prop, woe, iv

Return type

DataFrame
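The listed columns can be illustrated with a pure-Python sketch. The meaning assigned to each column here is an assumption: `prop` as the group's share of all samples, `y_prop` and `n_prop` as its share of bad and good samples, and `iv` as the per-group contribution. `bin_stats_sketch` is hypothetical, returns a plain dict instead of a DataFrame, and assumes every group contains both classes.

```python
import math
from collections import Counter

def bin_stats_sketch(bins, target):
    """Per-group counts, rates, WOE and IV contribution for a binned feature."""
    bad = Counter(b for b, y in zip(bins, target) if y == 1)
    good = Counter(b for b, y in zip(bins, target) if y != 1)
    total_bad, total_good = sum(bad.values()), sum(good.values())
    rows = {}
    for b in sorted(set(bins)):
        size = good[b] + bad[b]
        y_prop = bad[b] / total_bad
        n_prop = good[b] / total_good
        woe = math.log(y_prop / n_prop)
        rows[b] = {
            "good": good[b], "bad": bad[b],
            "badrate": bad[b] / size,
            "prop": size / len(target),
            "y_prop": y_prop, "n_prop": n_prop,
            "woe": woe, "iv": (y_prop - n_prop) * woe,
        }
    return rows

stats = bin_stats_sketch(["A"] * 4 + ["B"] * 4, [1, 0, 0, 0, 1, 1, 1, 0])
```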