toad.preprocessing.process module

class toad.preprocessing.process.Processing(data)[source]

Bases: object

Examples:

>>> from toad.preprocessing.process import Processing, Mask
>>> from toad.preprocessing.partition import TimePartition
>>> (Processing(data)
...     .groupby('id')
...     .partitionby(TimePartition(
...         'base_time',
...         'filter_time',
...         ['30d', '60d', '180d', '365d', 'all']
...     ))
...     .apply({'A': ['max', 'min', 'mean']})
...     .apply({'B': ['max', 'min', 'mean']})
...     .apply({'C': 'nunique'})
...     .apply({'D': {
...         'f': len,
...         'name': 'normal_count',
...         'mask': Mask('D').isin(['normal']),
...     }})
...     .apply({'id': 'count'})
...     .exec()
... )
__init__(data)[source]

Initialize self. See help(type(self)) for accurate signature.

groupby(name)[source]

Group the data by a column.

Parameters: name (str) – name of the column in data to group by
apply(f)[source]

Apply functions to the data.

Parameters: f (dict|function) – a config dict whose keys are column names and whose values are the functions to apply; each function receives the column's series as its argument. If f is a function, it receives the whole dataframe as its argument.
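
The two forms of f can be illustrated with a plain-Python sketch. This is not toad's implementation: `apply_config` is a hypothetical helper, a list of dicts stands in for the dataframe, callables stand in for string names such as 'max', and the `<column>_<function name>` result keys are an assumed naming scheme.

```python
from statistics import mean


def apply_config(rows, f):
    """Apply `f` to `rows`, a list of dicts standing in for a dataframe.

    Hypothetical helper for illustration only; not part of toad.
    """
    if callable(f):
        # Function form: the function receives the whole table.
        return f(rows)

    result = {}
    for column, funcs in f.items():
        # Dict form: each function receives the values of one column.
        series = [row[column] for row in rows]
        if callable(funcs):
            funcs = [funcs]
        for func in funcs:
            # Assumed naming scheme: "<column>_<function name>".
            result[f"{column}_{func.__name__}"] = func(series)
    return result


rows = [{"id": 1, "A": 1.0}, {"id": 1, "A": 3.0}]

per_column = apply_config(rows, {"A": [max, min, mean]})
whole_table = apply_config(rows, len)
```

Here `per_column` holds one value per (column, function) pair, while `whole_table` is whatever the function returns for the full table.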
partitionby(p)[source]

Partition the data into multiple pieces; subsequent processing is applied to every piece.

Parameters: p (Partition) – the partition used to split the data
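
As a rough illustration of partitioning by time windows such as '30d' or 'all', the following plain-Python sketch splits rows into one subset per window. Everything in it is an assumption rather than toad's behavior: `time_partitions` is a hypothetical helper, windows are parsed as whole days, and a row is assumed to fall in a window when its filter time is later than its base time minus the window length.

```python
from datetime import datetime, timedelta


def time_partitions(rows, base_key, filter_key, windows):
    """Yield (window, subset) pairs; hypothetical helper, not part of toad."""
    for window in windows:
        if window == "all":
            # "all" keeps every row.
            yield window, list(rows)
            continue
        # Assumed window format: an integer followed by "d" (days).
        delta = timedelta(days=int(window.rstrip("d")))
        # Assumed rule: keep rows whose filter time is later than
        # the base time minus the window length.
        yield window, [r for r in rows if r[filter_key] > r[base_key] - delta]


base = datetime(2024, 1, 31)
rows = [
    {"id": 1, "base_time": base, "filter_time": datetime(2024, 1, 20)},
    {"id": 1, "base_time": base, "filter_time": datetime(2023, 12, 15)},
    {"id": 1, "base_time": base, "filter_time": datetime(2023, 6, 1)},
]

# Number of rows that fall in each window.
sizes = {
    window: len(subset)
    for window, subset in time_partitions(
        rows, "base_time", "filter_time", ["30d", "60d", "all"]
    )
}
```

Each subset can then be aggregated independently, which is the sense in which processing applies to all pieces.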
class toad.preprocessing.process.Mask(column=None)[source]

Bases: object

A placeholder for a selection on a dataframe column; the selection is described first and applied to the data later.

__init__(column=None)[source]

Initialize self. See help(type(self)) for accurate signature.
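
In the Processing example, `Mask('D').isin(['normal'])` is built before any data is bound to it. One way to picture such a placeholder is an object that records conditions and evaluates them later. The sketch below is a plain-Python assumption of that idea; `DeferredMask` and its `replay` method are hypothetical names and do not describe toad's actual Mask.

```python
class DeferredMask:
    """Record conditions on a column now and evaluate them on data later.

    Hypothetical class for illustration only; not toad's Mask.
    """

    def __init__(self, column):
        self.column = column
        self.predicates = []

    def isin(self, values):
        allowed = set(values)
        # Store the condition instead of evaluating it; no data is bound yet.
        self.predicates.append(lambda value: value in allowed)
        return self  # returning self lets further conditions be chained

    def replay(self, rows):
        """Return one boolean per row: True where every recorded condition holds."""
        return [
            all(check(row[self.column]) for check in self.predicates)
            for row in rows
        ]


mask = DeferredMask("D").isin(["normal"])
flags = mask.replay([{"D": "normal"}, {"D": "overdue"}, {"D": "normal"}])
```

Because the mask holds only a column name and conditions, the same object can be evaluated against each partition of the data.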

class toad.preprocessing.process.F(f, name=None, mask=None)[source]

Bases: object

A function wrapper used in processing; it holds the function f with an optional name and an optional mask.

__init__(f, name=None, mask=None)[source]

Initialize self. See help(type(self)) for accurate signature.
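
The parameters f, name and mask match the keys of the dict passed to apply for column 'D' in the Processing example. A minimal sketch of what such a wrapper might do follows. `NamedFunc` is a hypothetical stand-in, the mask is simplified to a predicate on single values, and falling back to the function's own name is an assumption.

```python
class NamedFunc:
    """Bundle a function with an output name and an optional value filter.

    Hypothetical class for illustration only; not toad's F.
    """

    def __init__(self, f, name=None, mask=None):
        self.f = f
        # Assumption: fall back to the function's own name when none is given.
        self.name = name or f.__name__
        self.mask = mask  # here a plain predicate on a single value

    def __call__(self, series):
        if self.mask is not None:
            # Keep only the values selected by the mask before applying f.
            series = [value for value in series if self.mask(value)]
        return self.f(series)


normal_count = NamedFunc(len, name="normal_count", mask=lambda v: v == "normal")
count = normal_count(["normal", "overdue", "normal"])
```

With this shape, `count` is the number of values that pass the mask, and `normal_count.name` supplies the label for the resulting feature.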