Quick Start

Setting things up

Target type and Carver selection

Depending on one’s desired modelling task, several Carvers are implemented:

In the following quick start example, we will consider a binary classification problem:

target = "binary_target"

Hence the use of BinaryCarver and ClassificationSelector in following code blocks.

Data Sampling

AutoCarver unables testing for robustness of carved modalities on X_dev while maximizing the association between X_train and y_train.

# defining training and testing sets
train_set = ...  # used to fit the AutoCarver and the model
dev_set = ...  # used to validate the AutoCarver's buckets and optimize the model's parameters/hyperparameters
test_set = ...  # used to evaluate the final model's performances

Setting up Features to Carve

from AutoCarver import Features

features = Features(
    quantitatives=['quantitative1', 'quantitative2', 'discrete1', 'discrete2_with_nan'],
    categoricals=['categorical1', 'categorical2', 'categorical3_with_nan'],
    ordinals={'ordinal1': ['low', 'medium', 'high'], 'ordinal2_with_nan': ['low', 'medium', 'high']},
)

Qualitative features will automatically be converted to str if necessary. Ordinal features are added, alongside there expected ordering.

To wrap already-instantiated feature objects (e.g. CategoricalFeature, OrdinalFeature, QuantitativeFeature) use Features.from_list() instead. Collection-level state (nan / default / ordinal_encoding / dropna) can be propagated to every feature via a FeaturesConfig.

Using AutoCarver

Fitting AutoCarver

from AutoCarver import BinaryCarver

# intiating AutoCarver
binary_carver = BinaryCarver(
    features=features,
    min_freq=0.02,  # minimum frequency per modality
    max_n_mod=5,  # maximum number of modality per Carved feature (mandatory)
)

# fitting on training sample, a dev sample can be specified to evaluate carving robustness
x_discretized = binary_carver.fit_transform(train_set, train_set[target], X_dev=dev_set, y_dev=dev_set[target])

Note

Behavioral toggles (copy, ordinal_encoding, dropna, verbose, n_jobs) are now grouped in DiscretizerConfig. Carvers default to DiscretizerConfig(dropna=True, ordinal_encoding=True).

To pick a different association metric, pass a pre-built combination evaluator via the combination_evaluator keyword (e.g. CramervCombinations for binary).

Applying AutoCarver

# transforming dev/test sample accordingly
dev_set_discretized = binary_carver.transform(dev_set)
test_set_discretized = binary_carver.transform(tes_set)

Saving AutoCarver

All Carvers can safely be serialized as a .json file.

binary_carver.save('my_carver.json')

Loading AutoCarver

Carvers can safely be loaded from a .json file.

from AutoCarver import BinaryCarver

binary_carver = BinaryCarver.load('my_carver.json')

Feature Selection

from AutoCarver.selectors import ClassificationSelector

# select the best 25 most target associated features
classification_selector = ClassificationSelector(
    features=features,  # features to select from
    n_best_per_type=25,  # number of features to select per data type
    verbose=True,  # displays statistics
)
best_features = classification_selector.select(train_set_discretized, train_set_discretized[target])