Welcome to toad’s documentation!
Installation
via pip
pip install toad
via anaconda
conda install toad --channel conda-forge
via source code
python setup.py install
Contents
toad package
Submodules
toad.detector module
Command-line tool for detecting (profiling) CSV data
Team: ESC
Example
python detector.py -i xxx.csv -o report.csv
toad.detector.getTopValues(series, top=5, reverse=False)
Get the top (or bottom) n values of a series.
Parameters: - series (Series) – data series
- top (number) – number of top/bottom values to return
- reverse (bool) – if True, return the bottom n values instead of the top n
Returns: Series of the top/bottom n values and their percentages, in the form [‘value:percent’, None]
Return type: Series
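A minimal pure-Python sketch of the same idea, for illustration only: `top_values` is a hypothetical helper (not part of toad) that takes a plain list instead of a pandas Series, and the exact `'value:percent'` formatting with two decimal places is an assumption.

```python
from collections import Counter


def top_values(values, top=5, reverse=False):
    """Hypothetical helper: top (or bottom) n values as 'value:percent' strings."""
    counts = Counter(values)
    total = len(values)
    # Most frequent first, or least frequent first when reverse=True (stable for ties)
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=not reverse)
    return ["{}:{:.2%}".format(value, count / total) for value, count in ordered[:top]]
```

For example, `top_values(["a", "a", "a", "b", "b", "c"], top=2)` lists `a` and `b` with their shares of the six values.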
toad.detector.getDescribe(series, percentiles=[0.25, 0.5, 0.75])
Get descriptive statistics of a series.
Parameters: - series (Series) – data series
- percentiles – the percentiles to include in the output
Returns: descriptive statistics of the data, including mean, std, min, max and the requested percentiles
Return type: Series
toad.detector.countBlank(series, blanks=[None])
Count the number and percentage of blank values in a series.
Parameters: - series (Series) – data series
- blanks (list) – list of values treated as blank
Returns: the number of blank values (number) and the percentage of blank values (str)
Return type: number, str
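The blank-counting logic can be sketched as follows. `count_blank` is a hypothetical helper working on a plain list; returning the percentage as a `'50.00%'`-style string is an assumption about the format, and NaN handling is left out.

```python
def count_blank(values, blanks=(None,)):
    """Hypothetical helper: count blank values and format their share as a percentage."""
    n_blank = sum(1 for v in values if v in blanks)
    return n_blank, "{:.2%}".format(n_blank / len(values))
```

Passing `blanks=("", "NA")` treats empty strings and `"NA"` as blank instead of `None`.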
toad.merge module
toad.merge.ChiMerge
Merge by chi-square (ChiMerge).
Parameters: - feature (array-like) – feature to be merged
- target (array-like) – an array of target classes
- n_bins (int) – number of bins to merge into
- min_samples (number) – minimum number of samples in each group; if a float, it is treated as a proportion of all samples
- min_threshold (number) – minimum chi-square threshold
Returns: array of split points
Return type: array
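For intuition, here is a simplified, greedy bottom-up ChiMerge sketch in pure Python. It assumes every distinct value starts as its own interval and repeatedly merges the adjacent pair with the smallest chi-square statistic until `n_bins` intervals remain (or the smallest statistic reaches `min_threshold`). `chi_merge` and `chi2_adjacent` are hypothetical helpers; `min_samples` and toad's NaN handling are not modelled.

```python
def chi2_adjacent(a, b):
    """Chi-square statistic for two adjacent intervals given per-class counts."""
    total = sum(a) + sum(b)
    chi2 = 0.0
    for j in range(len(a)):
        class_total = a[j] + b[j]
        for row in (a, b):
            expected = sum(row) * class_total / total
            if expected > 0:  # skip empty cells to avoid division by zero
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def chi_merge(feature, target, n_bins, min_threshold=None):
    """Hypothetical helper: greedy ChiMerge returning split points (n_bins >= 1)."""
    classes = sorted(set(target))
    # One interval per distinct value: lower bound -> per-class counts
    counts = {v: [0] * len(classes) for v in sorted(set(feature))}
    for x, y in zip(feature, target):
        counts[x][classes.index(y)] += 1
    intervals = [[v, c] for v, c in counts.items()]

    while len(intervals) > n_bins:
        chis = [chi2_adjacent(intervals[i][1], intervals[i + 1][1])
                for i in range(len(intervals) - 1)]
        i = chis.index(min(chis))
        # Stop early once every adjacent pair differs enough
        if min_threshold is not None and chis[i] >= min_threshold:
            break
        intervals[i][1] = [p + q for p, q in zip(intervals[i][1], intervals[i + 1][1])]
        del intervals[i + 1]

    # Every interval after the first starts at a split point
    return [lower for lower, _ in intervals[1:]]
```

On `feature=[1, 2, 3, 4, 5, 6]` with `target=[0, 0, 0, 1, 1, 1]` and `n_bins=2`, the pure-class neighbours merge first, leaving a single split at 4.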
toad.merge.DTMerge
Merge by decision tree.
Parameters: - feature (array-like) – feature to be merged
- target (array-like) – target used to fit the decision tree
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- min_samples (int) – minimum number of samples in each leaf node
Returns: array of split points
Return type: array
toad.merge.KMeansMerge
Merge by KMeans.
Parameters: - feature (array-like) – feature to be merged
- target (array-like) – target used to fit the KMeans model
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- random_state (int) – random state used for the KMeans model
Returns: split points of feature
Return type: array
toad.merge.QuantileMerge
Merge by quantile.
Parameters: - feature (array-like) – feature to be merged
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- q (array-like) – list of quantiles (as proportions) at which to split
Returns: split points of feature
Return type: array
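A quantile split can be sketched with the standard library. This is an illustration under assumptions, not toad's code: `quantile_splits` is hypothetical, uses `statistics.quantiles` (Python 3.8+) with linear interpolation (`method="inclusive"`), and drops duplicate cut points so repeated values do not produce empty bins.

```python
import statistics


def quantile_splits(feature, n_bins=4, nan=-1):
    """Hypothetical helper: split points at evenly spaced quantiles."""
    # Fill missing values first, mirroring the documented `nan` parameter
    values = [nan if v is None else v for v in feature]
    # statistics.quantiles returns n_bins - 1 interior cut points
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    # Remove duplicate cut points caused by repeated values
    return sorted(set(cuts))
```

With the values 1 to 10 and `n_bins=4`, the cut points fall at 3.25, 5.5 and 7.75.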
toad.merge.StepMerge
Merge by equal-width steps.
Parameters: - feature (array-like) – feature to be merged
- nan (number) – value used to fill NaN
- n_bins (int) – number of groups to merge into
- clip_v (number | tuple) – min/max value for clipping
- clip_std (number | tuple) – min/max number of standard deviations for clipping
- clip_q (number | tuple) – min/max quantile for clipping
Returns: split points of feature
Return type: array
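Step merging amounts to equal-width binning between the (optionally clipped) minimum and maximum. The sketch below is a hypothetical helper that only supports `clip_v` as a `(min, max)` tuple; the number form and the `clip_std`/`clip_q` options are not modelled.

```python
def step_splits(feature, n_bins=10, clip_v=None):
    """Hypothetical helper: equal-width split points between min and max."""
    values = list(feature)
    if clip_v is not None:
        low, high = clip_v
        # Clip outliers so they do not stretch the step width
        values = [min(max(v, low), high) for v in values]
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + step * k for k in range(1, n_bins)]
```

Clipping first keeps a few extreme values from stretching every bin.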
toad.merge.merge
Merge a feature into groups.
Parameters: - feature (array-like) – feature to be merged
- target (array-like) – target values
- method (str) – ‘dt’, ‘chi’, ‘quantile’, ‘step’ or ‘kmeans’; the strategy used to merge the feature
- return_splits (bool) – whether to also return the split points
- n_bins (int) – number of groups to merge into
Returns: an array of merged labels with the same size as feature; array: list of split points (if return_splits is True)
Return type: array
toad.metrics module
toad.metrics.KS(score, target)
Calculate the KS (Kolmogorov-Smirnov) value.
Parameters: - score (array-like) – scores or probabilities predicted by the model
- target (array-like) – true target values
Returns: the maximum KS value
Return type: float
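A minimal sketch of the KS statistic, assuming `target` uses 1 for the positive (bad) class: sort by score, accumulate the share of positives and negatives seen so far, and take the largest gap between the two cumulative curves. `ks` is a hypothetical helper, not toad's implementation.

```python
def ks(score, target):
    """Hypothetical helper: maximum gap between cumulative positive/negative rates."""
    pairs = sorted(zip(score, target))
    total_pos = sum(target)
    total_neg = len(target) - total_pos
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Consume all samples sharing the same score before measuring the gap
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            i += 1
        best = max(best, abs(cum_pos / total_pos - cum_neg / total_neg))
    return best
```

A perfectly separating score yields 1.0; a score unrelated to the target stays near 0.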
toad.metrics.KS_bucket(score, target, bucket=10, method='quantile', return_splits=False, **kwargs)
Calculate KS values by bucket.
Parameters: - score (array-like) – scores or probabilities predicted by the model
- target (array-like) – true target values
- bucket (int) – number of buckets to bin score into
- method (str) – method used to bin score: quantile (default) or step
- return_splits (bool) – whether to also return the split points of the buckets
Returns: DataFrame
toad.metrics.AIC(y_pred, y, k, llf=None)
Akaike Information Criterion.
Parameters: - y_pred (array-like) – predicted values
- y (array-like) – true values
- k (int) – number of features
- llf (float) – value of the log-likelihood function
toad.metrics.BIC(y_pred, y, k, llf=None)
Bayesian Information Criterion.
Parameters: - y_pred (array-like) – predicted values
- y (array-like) – true values
- k (int) – number of features
- llf (float) – value of the log-likelihood function
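Both criteria follow the standard definitions AIC = 2k - 2·llf and BIC = k·ln(n) - 2·llf. The sketch below assumes that, when `llf` is not supplied, it comes from a linear model with Gaussian errors (maximum-likelihood variance); toad's own default computation may differ. All three functions are hypothetical helpers.

```python
import math


def gaussian_llf(y_pred, y):
    """Log-likelihood of a linear model with Gaussian errors (MLE variance)."""
    n = len(y)
    sse = sum((p - t) ** 2 for p, t in zip(y_pred, y))
    return -n / 2 * (math.log(2 * math.pi) + math.log(sse / n) + 1)


def aic(llf, k):
    # Penalise each extra feature by 2
    return 2 * k - 2 * llf


def bic(llf, k, n):
    # Penalty grows with the number of samples n
    return k * math.log(n) - 2 * llf
```

Lower values are better for both; BIC penalises extra features more heavily once n exceeds about 7.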
toad.metrics.F1(score, target, split='best', return_split=False)
Calculate the F1 score.
Parameters: - score (array-like) – predicted scores
- target (array-like) – true target values
Returns: best F1 score; float: best split threshold (if return_split is True)
Return type: float
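One way to find the "best" F1 is to scan every distinct score as a threshold. This is a sketch under assumptions: `best_f1` is hypothetical, predicts 1 when `score >= threshold`, and keeps the first threshold reaching the highest F1; toad may choose candidate thresholds differently.

```python
def best_f1(score, target):
    """Hypothetical helper: scan thresholds and return (best F1, threshold)."""
    best = (0.0, None)
    for threshold in sorted(set(score)):
        # Predict 1 when the score reaches the threshold
        tp = sum(1 for s, t in zip(score, target) if s >= threshold and t == 1)
        fp = sum(1 for s, t in zip(score, target) if s >= threshold and t == 0)
        fn = sum(1 for s, t in zip(score, target) if s < threshold and t == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best[0]:
            best = (f1, threshold)
    return best
```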
toad.metrics.AUC(score, target, return_curve=False)
AUC score.
Parameters: - score (array-like) – scores or probabilities predicted by the model
- target (array-like) – true target values
- return_curve (bool) – whether to also return the curve data for an ROC plot
Returns: AUC score
Return type: float
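AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative (the Mann-Whitney formulation, ties counting one half). The quadratic sketch below is a hypothetical helper meant for small inputs only; it is not toad's implementation.

```python
def auc(score, target):
    """Hypothetical helper: AUC via pairwise comparison of positives and negatives."""
    pos = [s for s, t in zip(score, target) if t == 1]
    neg = [s for s, t in zip(score, target) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```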
toad.metrics.PSI(test, base, combiner=None, return_frame=False)
Calculate PSI (population stability index).
Parameters: - test (array-like) – data to test for PSI
- base (array-like) – base data for calculating PSI
- combiner (Combiner|list|dict) – combiner used to combine (bin) the data
- return_frame (bool) – whether to also return a frame of proportions
Returns: float|Series
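PSI sums (test share - base share) · ln(test share / base share) over groups. The sketch assumes both inputs are already binned (the role the combiner plays) and floors empty groups at a small `eps` so the logarithm stays finite; `psi` and `eps` are hypothetical, and toad's zero handling may differ.

```python
import math
from collections import Counter


def psi(test, base, eps=1e-6):
    """Hypothetical helper: PSI between two already-binned samples."""
    test_counts, base_counts = Counter(test), Counter(base)
    total = 0.0
    for group in set(test_counts) | set(base_counts):
        # Floor empty groups at eps so the log stays finite
        t = max(test_counts[group] / len(test), eps)
        b = max(base_counts[group] / len(base), eps)
        total += (t - b) * math.log(t / b)
    return total
```

Identical distributions give 0; shifting group `a` from 50% to 75% gives roughly 0.27.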
toad.plot module
toad.plot.badrate_plot(frame, x=None, target='target', by=None, freq=None, format=None, return_counts=False, return_proportion=False, return_frame=False)
Plot badrate.
Parameters: - frame (DataFrame) – data to plot
- x (str) – column in frame used as the x axis
- target (str) – target column in frame
- by (str) – column in frame to group by when calculating badrate
- freq (str) – pandas offset alias string, see http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
- format (str) – format string for time
- return_counts (bool) – whether to also return the counts plot
- return_frame (bool) – whether to also return the frame
Returns: badrate plot; Axes: counts plot; Axes: proportion plot; DataFrame: grouping detail data
Return type: Axes
toad.plot.corr_plot(frame, figure_size=(20, 15))
Plot correlation.
Parameters: frame (DataFrame) – frame to draw the plot for
Returns: Axes
toad.plot.proportion_plot(x=None, keys=None)
Plot for comparing proportions across different datasets.
Parameters: - x (Series|list) – a series or a list of series to plot
- keys (str|list) – key for each series
Returns: Axes
toad.plot.roc_plot(score, target, compare=None)
Plot the ROC curve.
Parameters: - score (array-like) – predicted score
- target (array-like) – true target
- compare (array-like) – another score to compare with score
Returns: Axes
toad.plot.bin_plot(frame, x=None, target='target', iv=True, annotate_format='.2f')
Plot bins.
Parameters: - frame (DataFrame) – data to plot
- x (str) – column in frame used as the x axis
- target (str) – target column in frame
- iv (bool) – whether to show IV in the plot
- annotate_format (str) – format string for the chart annotations
Returns: plot of the bins’ proportion and badrate
Return type: Axes
toad.scorecard module
class toad.scorecard.ScoreCard(pdo=60, rate=2, base_odds=35, base_score=750, card=None, combiner={}, transer=None, **kwargs)
Bases: sklearn.base.BaseEstimator, toad.utils.mixin.RulesMixin, toad.utils.mixin.BinsMixin
__init__(pdo=60, rate=2, base_odds=35, base_score=750, card=None, combiner={}, transer=None, **kwargs)
Parameters: - combiner (toad.Combiner) – combiner used to bin the features
- transer (toad.WOETransformer) – WOE transformer
coef_
Coefficients of the LR model.
predict(X, return_sub=False)
Predict score.
Parameters: X (2D-DataFrame|dict) – X to predict
Returns: predicted score; DataFrame|dict: sub score for each feature (if return_sub is True)
Return type: array-like
get_reason(X, base_effect=None, threshold_score=None, keep=3)
Calculate the features with the largest effect on the score and return them as reasons.
Parameters: - X (2D DataFrame) – X to find reasons for
- base_effect (Series) – base effect score of each feature
- threshold_score (float) – threshold for selecting the top k features: when the predicted score is greater than threshold_score, the k features with the highest effect are shown; otherwise the k features with the lowest effect are shown. Defaults to the sum of the base_effect scores
- keep (int) – number of top reasons to keep, default 3
Returns: top k most important reasons for each row of X
Return type: DataFrame
predict_proba(X)
Predict probability.
Parameters: X (2D array-like) – X to predict
Returns: probability of all classes
Return type: 2d array
proba_to_score(prob)
Convert probability to score:
odds = (1 - prob) / prob
score = factor * log(odds) + offset
score_to_proba(score)
Convert score to probability.
Returns: the probability of 1
Return type: array-like|float
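The conversion can be sketched with the usual points-to-double-the-odds scaling. Assumption: `factor = pdo / ln(rate)` and `offset = base_score - factor * ln(base_odds)`, so a probability whose odds equal `base_odds` maps to `base_score`, and every time the odds are multiplied by `rate` the score rises by `pdo`. The module-level helpers below are hypothetical and use the ScoreCard defaults.

```python
import math


def make_scaling(pdo=60, rate=2, base_odds=35, base_score=750):
    """Return (factor, offset) for points-to-double-the-odds scaling."""
    factor = pdo / math.log(rate)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


FACTOR, OFFSET = make_scaling()


def proba_to_score(prob):
    # odds of 0 vs 1: (1 - prob) / prob
    return OFFSET + FACTOR * math.log((1 - prob) / prob)


def score_to_proba(score):
    # Invert the scaling to recover the odds, then the probability of 1
    odds = math.exp((score - OFFSET) / FACTOR)
    return 1 / (1 + odds)
```

With the defaults, a probability of 1/36 (odds 35) maps to 750 and 1/71 (odds 70) maps to 810.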
toad.selection module
toad.selection.stepwise(frame, target='target', estimator='ols', direction='both', criterion='aic', p_enter=0.01, p_remove=0.01, p_value_enter=0.2, intercept=False, max_iter=None, return_drop=False, exclude=None)
Select features stepwise.
Parameters: - frame (DataFrame) – dataframe used for selection
- target (str) – target name in frame
- estimator (str) – model used for the statistics
- direction (str) – direction of stepwise: ‘forward’, ‘backward’ or ‘both’ (‘both’ is recommended)
- criterion (str) – criterion used to evaluate the model: ‘aic’ or ‘bic’
- p_enter (float) – threshold used in ‘forward’ and ‘both’ to keep features
- p_remove (float) – threshold used in ‘backward’ to remove features
- intercept (bool) – whether to include an intercept
- p_value_enter (float) – threshold used in ‘both’ to remove features
- max_iter (int) – maximum number of iterations
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True)
Return type: DataFrame
toad.selection.drop_empty(frame, threshold=0.9, nan=None, return_drop=False, exclude=None)
Drop columns by their number of empty values.
Parameters: - frame (DataFrame) – dataframe to process
- threshold (number) – drop features whose number of empty values is greater than threshold; if threshold is a float, it is used as a proportion
- nan (any) – values to be treated as empty
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True)
Return type: DataFrame
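The threshold rule can be illustrated on a plain dict of columns. This is a hypothetical sketch: it treats `None` plus any values in `nan` as empty, interprets a float threshold as a proportion of rows and an int as an absolute count, and drops a column only when its empty count is strictly greater than the limit.

```python
def drop_empty(frame, threshold=0.9, nan=None):
    """Hypothetical helper on a dict of columns; returns (kept columns, dropped names)."""
    empties = [None] + list(nan or [])
    n_rows = len(next(iter(frame.values())))
    # A float threshold is a proportion of rows; otherwise it is an absolute count
    limit = threshold * n_rows if isinstance(threshold, float) else threshold
    kept, dropped = {}, []
    for name, col in frame.items():
        n_empty = sum(1 for v in col if v in empties)
        if n_empty > limit:
            dropped.append(name)
        else:
            kept[name] = col
    return kept, dropped
```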
toad.selection.drop_var(frame, threshold=0, return_drop=False, exclude=None)
Drop columns by variance.
Parameters: - frame (DataFrame) – dataframe to process
- threshold (float) – drop features whose variance is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True)
Return type: DataFrame
toad.selection.drop_corr(frame, target=None, threshold=0.7, by='IV', return_drop=False, exclude=None)
Drop columns by correlation.
Parameters: - frame (DataFrame) – dataframe to process
- target (str) – target name in dataframe
- threshold (float) – within each group of features whose correlation is greater than threshold, drop the features with the smallest weight
- by (str|array-like) – weight of the features, used to decide which features to drop (‘IV’ by default)
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True)
Return type: DataFrame
toad.selection.drop_iv(frame, target='target', threshold=0.02, return_drop=False, return_iv=False, exclude=None)
Drop columns by IV.
Parameters: - frame (DataFrame) – dataframe to process
- target (str) – target name in dataframe
- threshold (float) – drop features whose IV is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- return_iv (bool) – whether to also return the features’ IV
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True); Series: IV of each feature (if return_iv is True)
Return type: DataFrame
toad.selection.drop_vif(frame, threshold=3, return_drop=False, exclude=None)
Drop columns by variance inflation factor (VIF).
Parameters: - frame (DataFrame) – dataframe to process
- threshold (float) – drop features until every VIF is less than threshold
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; array: list of dropped feature names (if return_drop is True)
Return type: DataFrame
toad.selection.select(frame, target='target', empty=0.9, iv=0.02, corr=0.7, return_drop=False, exclude=None)
Select features by empty rate, IV and correlation.
Parameters: - frame (DataFrame) – dataframe to process
- target (str) – target’s name in dataframe
- empty (number) – drop features whose number of empty values is greater than threshold; if it is less than 1, it is used as a proportion
- iv (float) – drop features whose IV is less than threshold
- corr (float) – within each group of features whose correlation is greater than threshold, drop the features with the smallest IV
- return_drop (bool) – whether to also return the names of the dropped features
- exclude (array-like) – list of feature names that will not be dropped
Returns: selected dataframe; dict: list of dropped feature names in each step (if return_drop is True)
Return type: DataFrame
toad.stats module
toad.stats.gini(target)
Get the Gini index of a feature.
Parameters: target (array-like) – target values used to calculate Gini
Returns: Gini value
Return type: number
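The Gini index is 1 minus the sum of squared class proportions. A minimal hypothetical sketch on a plain list:

```python
from collections import Counter


def gini(target):
    """Hypothetical helper: Gini impurity 1 - sum(p_i ** 2)."""
    n = len(target)
    return 1 - sum((c / n) ** 2 for c in Counter(target).values())
```

A pure target gives 0; a 50/50 binary target gives the maximum of 0.5.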
toad.stats.gini_cond
Get the conditional Gini index of a feature.
Parameters: - feature (array-like) – feature values
- target (array-like) – target values
Returns: conditional Gini value. If the feature is continuous, returns the best Gini value obtained by splitting the feature into two groups
Return type: number
toad.stats.entropy(target)
Get the information entropy of a feature.
Parameters: target (array-like) – target values
Returns: information entropy
Return type: number
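Information entropy is -Σ p_i · log(p_i) over the class proportions. The hypothetical sketch below uses the natural logarithm; which base toad uses is an assumption here.

```python
import math
from collections import Counter


def entropy(target):
    """Hypothetical helper: Shannon entropy (natural log) of the class proportions."""
    n = len(target)
    return -sum((c / n) * math.log(c / n) for c in Counter(target).values())
```

A balanced binary target gives ln 2 (about 0.693); a pure target gives 0.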
toad.stats.entropy_cond
Get the conditional entropy of a feature.
Parameters: - feature (array-like) – feature values
- target (array-like) – target values
Returns: conditional information entropy. If the feature is continuous, returns the best entropy obtained by splitting the feature into two groups
Return type: number
toad.stats.WOE(y_prob, n_prob)
Get the WOE of a group.
Parameters: - y_prob – proportion of the group’s y in total y
- n_prob – proportion of the group’s n in total n
Returns: WOE value
Return type: number
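Assuming the common definition (natural logarithm, y over n; the sign convention varies between libraries), the WOE of a group and a worked example where the group holds 30% of all y and 5% of all n:

```latex
\mathrm{WOE} = \ln\frac{y_{prob}}{n_{prob}}, \qquad
y_{prob} = 0.3,\; n_{prob} = 0.05 \;\Rightarrow\; \mathrm{WOE} = \ln 6 \approx 1.792
```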
toad.stats.IV
Get the IV of a feature.
Parameters: - feature (array-like) – feature values
- target (array-like) – target values
- return_sub (bool) – whether to also return the IV of each group
- n_bins (int) – number of groups the feature will be binned into
- method (str) – the strategy used to merge the feature, default ‘dt’
- **kwargs – other options for the merge function
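IV sums (y_prob - n_prob) · WOE over the groups of a feature. The hypothetical sketch below skips the merging step (it assumes the feature is already binned) and floors empty proportions at a small `eps` to keep the logarithm finite; toad's handling of empty groups may differ.

```python
import math
from collections import defaultdict


def iv(feature, target, return_sub=False, eps=1e-6):
    """Hypothetical helper: IV of an already-binned feature (target 1 = y, 0 = n)."""
    ys, ns = defaultdict(int), defaultdict(int)
    for x, t in zip(feature, target):
        if t == 1:
            ys[x] += 1
        else:
            ns[x] += 1
    total_y, total_n = sum(ys.values()), sum(ns.values())
    sub = {}
    for group in set(ys) | set(ns):
        y_prob = max(ys[group] / total_y, eps)
        n_prob = max(ns[group] / total_n, eps)
        # Per-group contribution: (y_prob - n_prob) * WOE
        sub[group] = (y_prob - n_prob) * math.log(y_prob / n_prob)
    total = sum(sub.values())
    return (total, sub) if return_sub else total
```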
toad.stats.badrate(target)
Calculate badrate.
Parameters: target (array-like) – target array in which 1 means bad
Returns: float
class toad.stats.indicator(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Indicator decorator.
toad.stats.column_quality(feature, target, name='feature', indicators=[], need_merge=False, **kwargs)
Calculate the quality of a feature.
Parameters: - feature (array-like) – feature values
- target (array-like) – target values
- name (str) – feature name set in the returned Series
- indicators (list) – list of indicator functions
- need_merge (bool) – whether the feature needs to be merged
Returns: quality values, labelled with the feature’s name
Return type: Series
toad.stats.quality(dataframe, target='target', cpu_cores=0, iv_only=False, indicators=['iv', 'gini', 'entropy', 'unique'], **kwargs)
Get the quality of features in data.
Parameters: - dataframe (DataFrame) – dataframe whose feature quality will be calculated
- target (str) – the target’s name in dataframe
- iv_only (bool) – deprecated; whether to calculate IV only
- cpu_cores (int) – the maximum number of CPU cores to use: 0 means all CPUs, -1 means all CPUs but one
Returns: quality of features, with the features’ names as the row index
Return type: DataFrame
toad.transform module
class toad.transform.Transformer
Bases: sklearn.base.TransformerMixin, toad.utils.mixin.RulesMixin
Base class for transformers.
__init__
Initialize self. See help(type(self)) for accurate signature.
export(**kwargs)
Export rules to a dict or a JSON file.
Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray of shape (n_samples, n_features_new)
class toad.transform.WOETransformer
Bases: toad.transform.Transformer
WOE transformer.
fit_(X, y)
Fit the WOE transformer.
Parameters: - X (DataFrame|array-like) – features to fit
- y (str|array-like) – target data or name of target in X
- select_dtypes (str|numpy.dtypes) – ‘object’, ‘number’, etc.; only columns of the selected dtypes will be transformed
transform_(rule, X, default='min')
Transform function for a single feature.
Parameters: - X (array-like) – feature to transform
- default (str) – ‘min’ (default) or ‘max’; the strategy used for unknown groups
Returns: array-like
__init__
Initialize self. See help(type(self)) for accurate signature.
export(**kwargs)
Export rules to a dict or a JSON file.
Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray of shape (n_samples, n_features_new)
class toad.transform.Combiner
Bases: toad.transform.Transformer, toad.utils.mixin.BinsMixin
Combiner for merging data.
fit_(X, y=None, method='chi', empty_separate=False, **kwargs)
Fit the combiner.
Parameters: - X (DataFrame|array-like) – features to be combined
- y (str|array-like) – target data or name of target in X
- method (str) – the strategy used to merge X, same as in merge; default ‘chi’
- n_bins (int) – number of bins to combine into
- empty_separate (bool) – whether to put empty values into a separate group
transform_(rule, X, labels=False, ellipsis=16, **kwargs)
Transform X with the combiner.
Parameters: - X (DataFrame|array-like) – features to be transformed
- labels (bool) – whether to use labels for the resulting bins, False by default
- ellipsis (int) – maximum label length before labels are truncated with an ellipsis; None to disable truncation
Returns: array-like
set_rules(map, reset=False)
Set rules for the combiner.
Parameters: - map (dict|array-like) – map of splits
- reset (bool) – whether to reset the combiner
Returns: self
__init__
Initialize self. See help(type(self)) for accurate signature.
export(**kwargs)
Export rules to a dict or a JSON file.
Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray of shape (n_samples, n_features_new)
classmethod format_bins(bins, index=False, ellipsis=None)
Format bins as labels.
Parameters: - bins (ndarray) – bins to format
- index (bool) – whether to add an index prefix
- ellipsis (int) – maximum label length before labels are truncated with an ellipsis; None to disable truncation
Returns: array of labels
Return type: ndarray
class toad.transform.GBDTTransformer
Bases: toad.transform.Transformer
GBDT transformer.
fit_(X, y, **kwargs)
Fit the GBDT transformer.
Parameters: - X (DataFrame|array-like) – features to fit
- y (str|array-like) – target data or name of target in X
- select_dtypes (str|numpy.dtypes) – ‘object’, ‘number’, etc.; only columns of the selected dtypes will be transformed
transform_(rules, X)
Transform X with the fitted GBDT transformer.
Parameters: X (DataFrame|array-like) – features to transform
Returns: array-like
export(**kwargs)
Export rules to a dict or a JSON file.
Parameters: to_json (str|IOBase) – JSON file to save the rules to
Returns: dictionary of rules
Return type: dict
fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (array-like of shape (n_samples, n_features)) – Input samples.
- y (array-like of shape (n_samples,) or (n_samples, n_outputs), default=None) – Target values (None for unsupervised transformations).
- **fit_params (dict) – Additional fit parameters.
Returns: X_new – Transformed array.
Return type: ndarray of shape (n_samples, n_features_new)
toad.preprocessing module
toad.preprocessing.process module
class toad.preprocessing.process.Processing(data)
Bases: object
Examples:
>>> (Processing(data)
...     .groupby('id')
...     .partitionby(TimePartition(
...         'base_time',
...         'filter_time',
...         ['30d', '60d', '180d', '365d', 'all']
...     ))
...     .apply({'A': ['max', 'min', 'mean']})
...     .apply({'B': ['max', 'min', 'mean']})
...     .apply({'C': 'nunique'})
...     .apply({'D': {
...         'f': len,
...         'name': 'normal_count',
...         'mask': Mask('D').isin(['normal']),
...     }})
...     .apply({'id': 'count'})
...     .exec()
... )
toad.preprocessing.partition module
class toad.preprocessing.partition.TimePartition(base, filter, times)
Bases: toad.preprocessing.partition.Partition
Partition data by time delta.
Parameters: - base (str) – column name of the base time
- filter (str) – column name of the target time to be compared
- times (list) – list of time deltas
Example:
>>> TimePartition('apply_time', 'query_time', ['30d', '90d', 'all'])
toad.nn module
toad.nn.module module
class toad.nn.module.Module
Bases: torch.nn.modules.module.Module
Base module for every model.
device
Device of the model.
fit(loader, trainer=None, optimizer=None, early_stopping=None, **kwargs)
Train the model.
Parameters: - loader (DataLoader) – loader for training the model
- trainer (Trainer) – trainer for training the model
- optimizer (torch.optim.Optimizer) – the default optimizer is Adam(lr=1e-3)
- early_stopping (earlystopping) – defaults to loss_earlystopping; set it to False to disable early stopping
- epoch (int) – number of epochs in the training loop
- callback (callable) – function called every epoch
evaluate(loader, trainer=None)
Evaluate the model.
Parameters: - loader (DataLoader) – loader for evaluating the model
- trainer (Trainer) – trainer for evaluating the model
fit_step(batch, *args, **kwargs)
Step for fitting.
Parameters: batch (Any) – batch data from the dataloader
Returns: loss of this step
Return type: Tensor
class toad.nn.module.DistModule(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False, gradient_as_bucket_view=False)
Bases: torch.nn.parallel.distributed.DistributedDataParallel
Distributed module class.
toad.nn.functional module
toad.nn.functional.focal_loss(input, target, alpha=1.0, gamma=2.0, reduction='mean')
Focal loss.
Parameters: - input (Tensor) – N x C, where C is the number of classes
- target (Tensor) – N, each value is a class index
- alpha (Variable) – weighting factor of the balanced variant of focal loss, in the range [0, 1]
- gamma (float) – focal loss focusing parameter
- reduction (str) – ‘mean’, ‘sum’ or ‘none’; how the losses are reduced
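To show what `alpha` and `gamma` do, here is a pure-Python sketch of the standard focal-loss formula FL = -alpha · (1 - p_t)^gamma · log(p_t), with p_t taken from a softmax over each row of logits. It is hypothetical: toad's implementation works on tensors and may use a different activation, so treat this only as an illustration of the formula.

```python
import math


def focal_loss(logits, target, alpha=1.0, gamma=2.0, reduction="mean"):
    """Hypothetical pure-Python focal loss over softmax probabilities."""
    losses = []
    for row, cls in zip(logits, target):
        m = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in row]
        p_t = exps[cls] / sum(exps)
        # (1 - p_t) ** gamma down-weights easy, confident examples
        losses.append(-alpha * (1 - p_t) ** gamma * math.log(p_t))
    if reduction == "mean":
        return sum(losses) / len(losses)
    if reduction == "sum":
        return sum(losses)
    return losses  # reduction == "none"
```

With `gamma=0` and `alpha=1` it reduces to ordinary cross-entropy.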
toad.nn.trainer module
class toad.nn.trainer.callback(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Examples
>>> @callback
... def savemodel(model):
...     model.save("path_to_file")
...
>>> trainer.train(model, callback=savemodel)
class toad.nn.trainer.earlystopping(*args, delta=-0.001, patience=10, skip=0, **kwargs)
Bases: toad.utils.decorator.Decorator
Examples
>>> @earlystopping(delta=1e-3, patience=5)
... def auc(history):
...     return AUC(history['y_hat'], history['y'])
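The general patience/skip idea behind early stopping can be sketched as follows. This is not toad's `earlystopping` class: `PatienceStopper` is hypothetical, uses a positive `min_delta` as the required improvement, and flips the sign when `maximize=True` (for metrics such as AUC). toad's own sign convention for `delta` (default -0.001) is not described here.

```python
class PatienceStopper:
    """Hypothetical patience-based early stopping (not toad's earlystopping)."""

    def __init__(self, min_delta=1e-3, patience=10, skip=0, maximize=False):
        self.min_delta = min_delta
        self.patience = patience
        self.skip = skip
        self.sign = -1.0 if maximize else 1.0  # internally always minimize
        self.best = float("inf")
        self.bad_epochs = 0
        self.epoch = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        self.epoch += 1
        if self.epoch <= self.skip:
            return False  # ignore warm-up epochs
        value = self.sign * score
        if self.best - value > self.min_delta:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Call `step` once per epoch with the monitored score and break out of the training loop when it returns True.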
toad.utils module
toad.utils.func module
toad.utils.func.iter_df(dataframe, feature, target, splits)
Iterate over a dataframe by split points.
Returns: iterator of (df, splitter)
toad.utils.func.save_json(contents, file, indent=4)
Save contents to a JSON file.
Parameters: - contents (dict) – contents to save
- file (str|IOBase) – file to save to
toad.utils.func.clip(series, value=None, std=None, quantile=None)
Clip a series.
Parameters: - series (array-like) – series to be clipped
- value (number | tuple) – min/max value for clipping
- std (number | tuple) – min/max number of standard deviations for clipping
- quantile (number | tuple) – min/max quantile for clipping
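Clipping by value or by standard deviation can be sketched as below. Assumptions, all hypothetical: `value` is only accepted as a `(min, max)` tuple; `std` is either one multiplier used on both sides or a `(low, high)` pair of multipliers around the mean; the population standard deviation is used; and quantile clipping is not modelled.

```python
import statistics


def clip(series, value=None, std=None):
    """Hypothetical helper: clip by absolute bounds or by mean +/- k standard deviations."""
    values = list(series)
    if value is not None:
        low, high = value
    elif std is not None:
        k_low, k_high = (std, std) if isinstance(std, (int, float)) else std
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        low, high = mean - k_low * sd, mean + k_high * sd
    else:
        return values
    return [min(max(v, low), high) for v in values]
```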
toad.utils.func.flatten_columns(columns, sep='_')
Flatten multi-level columns into one-dimensional column names joined with sep (‘_’ by default).
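A minimal sketch of the flattening, assuming multi-level columns arrive as tuples (as pandas MultiIndex columns do) and single-level names pass through unchanged. `flatten_columns` here is a hypothetical stand-in, not toad's code.

```python
def flatten_columns(columns, sep="_"):
    """Hypothetical helper: join each multi-level column tuple into one string."""
    return [sep.join(str(level) for level in col) if isinstance(col, tuple) else str(col)
            for col in columns]
```

For example, `("A", "max")` becomes `"A_max"`, which matches the column names an aggregation like `{'A': ['max', 'min', 'mean']}` would need once flattened.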
toad.utils.func.bin_to_number(reg=None)
Returns: func(string) -> number
Return type: function
toad.utils.func.generate_target(size, rate=0.5, weight=None, reverse=False)
Generate target for reject inference.
Parameters: - size (int) – size of target
- rate (float) – rate of ‘1’ in target
- weight (array-like) – weight of ‘1’ used to generate target
- reverse (bool) – whether to reverse the weight
Returns: array
toad.utils.decorator module
class toad.utils.decorator.Decorator(*args, is_class=False, **kwargs)
Bases: object
Base decorator class.
class toad.utils.decorator.frame_exclude(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Decorator for excluding columns.
class toad.utils.decorator.select_dtypes(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Decorator for selecting frame columns by dtypes.
class toad.utils.decorator.save_to_json(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Supports saving the result to a JSON file.
class toad.utils.decorator.load_from_json(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Supports loading data from a JSON file.
class toad.utils.decorator.support_dataframe(*args, is_class=False, **kwargs)
Bases: toad.utils.decorator.Decorator
Decorator for supporting DataFrames.