toad.preprocessing.process module

class toad.preprocessing.process.Processing(data)

Bases: object

Example:

>>> (Processing(data)
...     .groupby('id')
...     .partitionby(TimePartition(
...         'base_time',
...         'filter_time',
...         ['30d', '60d', '180d', '365d', 'all']
...     ))
...     .apply({'A': ['max', 'min', 'mean']})
...     .apply({'B': ['max', 'min', 'mean']})
...     .apply({'C': 'nunique'})
...     .apply({'D': {
...         'f': len,
...         'name': 'normal_count',
...         'mask':  Mask('D').isin(['normal']),
...     }})
...     .apply({'id': 'count'})
...     .exec()
... )
groupby(name)

Group the data by a column.

Parameters: name (str) – name of the column in data to group by
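apply(f)

Apply functions to the data.

Parameters: f (dict|function) – either a config dict or a function. In a config dict, the keys are column names and the values are the functions to apply; each function receives the column series as its argument. As the example above shows, a value may be a function name, a list of function names, or a dict with 'f', 'name', and 'mask' keys. If f is a function, it receives the whole dataframe as its argument.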
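
To make the two argument forms concrete, here is a minimal sketch in plain pandas. It is not toad's implementation; the sample data and the names per_column and row_counts are made up for illustration. It only shows what "the function receives the column series" and "the function receives the whole dataframe" mean once the data is grouped.

```python
import pandas as pd

# Hypothetical sample data; only the column names echo the example above.
data = pd.DataFrame({
    "id": [1, 1, 2],
    "A": [10.0, 30.0, 5.0],
    "C": ["x", "x", "y"],
})

# Dict form: each key is a column, each value names the function(s)
# that receive that column as a Series within every group.
per_column = data.groupby("id").agg({"A": ["max", "min", "mean"], "C": "nunique"})

# Function form: the callable (here len) receives the whole
# sub-dataframe of each group rather than a single column.
row_counts = {key: len(group) for key, group in data.groupby("id")}
```

Because one of the dict values is a list, pandas returns columns as (column, function) pairs, for example ("A", "max").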
append_func(col, func)
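partitionby(p)

Partition the data into multiple pieces; the processing is then applied to every piece.

Parameters: p (Partition) – a partition object, such as the TimePartition in the example above, that defines how the data is split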
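
The idea of processing every piece can be sketched in plain pandas. This sketch assumes that a window such as '30d' keeps the rows whose filter time falls within that many days before the base time, and that 'all' keeps every row; the page does not confirm these semantics. The helper window_masks and the output column names are hypothetical.

```python
import pandas as pd

# Hypothetical sample data reusing the column names from the example above.
data = pd.DataFrame({
    "id": [1, 1, 1],
    "base_time": pd.to_datetime(["2020-12-31"] * 3),
    "filter_time": pd.to_datetime(["2020-12-20", "2020-11-15", "2020-01-10"]),
    "A": [1.0, 2.0, 4.0],
})


def window_masks(frame, base, filter_col, windows):
    """Yield (suffix, boolean mask) per window, under the assumed semantics."""
    for w in windows:
        if w == "all":
            # Assumed: 'all' applies no time restriction.
            mask = pd.Series(True, index=frame.index)
        else:
            # Assumed: keep rows newer than base time minus the window length.
            mask = frame[filter_col] > frame[base] - pd.Timedelta(w)
        yield "_" + w, mask


# Run the same aggregation on every piece and collect one column per piece.
pieces = {}
for suffix, mask in window_masks(data, "base_time", "filter_time", ["30d", "60d", "all"]):
    pieces["A_sum" + suffix] = data[mask].groupby("id")["A"].sum()

result = pd.DataFrame(pieces)
```

Each window contributes its own output column, so one pass over the configuration yields features for several look-back periods.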
exec()
process(data)
class toad.preprocessing.process.Mask(column=None)

Bases: object

A placeholder used to select part of a dataframe.

push(op, value)
replay(data)
isin(other)
isna()
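
The method names push and replay suggest a record-and-replay pattern: operations are stored when the mask is built and evaluated only once real data is supplied. The sketch below illustrates that pattern with a hypothetical LazyMask class. It is an assumption about the design, not toad's source code.

```python
import pandas as pd


class LazyMask:
    """Hypothetical stand-in: records Series method calls to replay later."""

    def __init__(self, column=None):
        self.column = column
        self.ops = []  # recorded (method name, argument) pairs

    def push(self, op, value):
        # Store the operation and return self so calls can be chained.
        self.ops.append((op, value))
        return self

    def isin(self, other):
        return self.push("isin", other)

    def isna(self):
        return self.push("isna", None)

    def replay(self, data):
        # Start from the chosen column, or from the whole dataframe.
        target = data if self.column is None else data[self.column]
        for op, value in self.ops:
            method = getattr(target, op)
            target = method() if value is None else method(value)
        return target


mask = LazyMask("D").isin(["normal"])  # nothing is evaluated yet
data = pd.DataFrame({"D": ["normal", "late", "normal", None]})
selected = mask.replay(data)           # boolean Series, one entry per row
normal_count = len(data[selected])
```

Deferring evaluation this way lets a mask be written once in a configuration and applied to whichever group or partition is processed later.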
class toad.preprocessing.process.F(f, name=None, mask=None)

Bases: object

A function wrapper used in processing.

name
is_buildin
need_filter
filter(data)