Quick Start

Setting things up

Target type and Carver selection

Several Carvers are implemented; pick the one that matches the modelling task:

Binary Classification

Multiclass Classification

Continuous Regression

In the following quick start example, we will consider a binary classification problem:

target = "binary_target"

Hence the use of BinaryCarver and ClassificationSelector in the following code blocks.

Data Sampling

AutoCarver enables testing the robustness of carved modalities on X_dev while maximizing the association between X_train and y_train.

# defining training and testing sets
train_set = ...  # used to fit the AutoCarver and the model
dev_set = ...  # used to validate the AutoCarver's buckets and optimize the model's parameters/hyperparameters
test_set = ...  # used to evaluate the final model's performances
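The three samples above are left for the reader to define. As a minimal sketch of one way to build them, the helper below shuffles row positions and cuts them into three index lists. The name split_indices, the 60/20/20 proportions, and the seed are assumptions made for illustration and are not part of AutoCarver.

```python
import random


def split_indices(n_rows, dev_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle row positions and cut them into train/dev/test index lists."""
    if n_rows < 0 or not 0 <= dev_frac + test_frac < 1:
        raise ValueError("fractions must leave room for a training set")
    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # seeded for reproducibility
    n_dev = int(n_rows * dev_frac)
    n_test = int(n_rows * test_frac)
    dev_idx = positions[:n_dev]
    test_idx = positions[n_dev:n_dev + n_test]
    train_idx = positions[n_dev + n_test:]  # everything left goes to training
    return train_idx, dev_idx, test_idx


train_idx, dev_idx, test_idx = split_indices(1000)
print(len(train_idx), len(dev_idx), len(test_idx))  # 600 200 200
```

Assuming the data sits in a pandas DataFrame named df (a hypothetical name), each sample could then be taken with df.iloc[train_idx], df.iloc[dev_idx] and df.iloc[test_idx].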

Picking columns to Carve

quantitative_features = ['Quantitative', 'Discrete_Quantitative_highnan', 'Discrete_Quantitative_lownan', 'Discrete_Quantitative', 'Discrete_Quantitative_rarevalue']
qualitative_features = ["Qualitative", "Qualitative_grouped", "Qualitative_lownan", "Qualitative_highnan", "Discrete_Qualitative_noorder", "Discrete_Qualitative_lownan_noorder", "Discrete_Qualitative_rarevalue_noorder"]

Qualitative features will automatically be converted to str if necessary. Ordinal features can be added, alongside their expected ordering. See Credit Scoring Example.
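Before handing the two lists to a Carver, it can help to check that they are consistent with each other. The sketch below is a hypothetical pre-check, not an AutoCarver function: check_feature_lists is an illustrative name, and the rules it applies (no duplicates, no feature listed under both types, target excluded) are assumptions about what a sensible input looks like.

```python
def check_feature_lists(quantitative, qualitative, target):
    """Return a list of human-readable problems found in the feature lists."""
    problems = []
    for name, features in (("quantitative", quantitative), ("qualitative", qualitative)):
        duplicates = sorted({f for f in features if features.count(f) > 1})
        if duplicates:
            problems.append(f"duplicated {name} features: {duplicates}")
        if target in features:
            problems.append(f"target {target!r} listed among {name} features")
    overlap = sorted(set(quantitative) & set(qualitative))
    if overlap:
        problems.append(f"features listed as both types: {overlap}")
    return problems


problems = check_feature_lists(
    ["Quantitative", "Discrete_Quantitative"],
    ["Qualitative", "Qualitative_grouped"],
    target="binary_target",
)
print(problems)  # [] when the lists are consistent
```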

Using AutoCarver

Fitting AutoCarver

from AutoCarver import BinaryCarver

# initializing AutoCarver
auto_carver = BinaryCarver(
    quantitative_features=quantitative_features,
    qualitative_features=qualitative_features,
    min_freq=0.02,  # minimum frequency per modality
    max_n_mod=5,  # maximum number of modalities per Carved feature
    sort_by='tschuprowt',  # measure used to select the best combination of modalities
    verbose=True,  # showing statistics
)

# fitting on the training sample; a dev sample can be specified to evaluate carving robustness
train_set_discretized = auto_carver.fit_transform(train_set, train_set[target], X_dev=dev_set, y_dev=dev_set[target])
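The sort_by='tschuprowt' setting refers to Tschuprow's T, a chi-square based measure of association between two categorical variables. The sketch below computes the textbook definition from a contingency table (rows are modalities, columns are target classes). It is meant to illustrate the measure, not to reproduce AutoCarver's internal implementation, and the function name tschuprows_t is an assumption.

```python
import math


def tschuprows_t(table):
    """Tschuprow's T for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    n_rows, n_cols = len(row_totals), len(col_totals)
    if n == 0 or n_rows < 2 or n_cols < 2:
        return 0.0  # no association can be measured
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # T = sqrt( (chi2 / n) / sqrt((rows - 1) * (cols - 1)) )
    return math.sqrt((chi2 / n) / math.sqrt((n_rows - 1) * (n_cols - 1)))


print(tschuprows_t([[10, 0], [0, 10]]))  # 1.0: modalities separate the target perfectly
print(tschuprows_t([[5, 5], [5, 5]]))  # 0.0: modalities carry no information
```

A grouping of modalities with a higher T is more strongly associated with the target, which is why such a measure can be used to rank candidate combinations.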

Applying AutoCarver

# transforming dev/test sample accordingly
dev_set_discretized = auto_carver.transform(dev_set)
test_set_discretized = auto_carver.transform(test_set)
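Once the samples are discretized, one simple way to inspect robustness by hand is to compare the target rate of each modality across two samples. The sketch below is an assumed, simplified check, not AutoCarver's own criterion: it only verifies that both samples contain the same modalities and rank them in the same order by target rate. The names target_rates and same_ranking are illustrative.

```python
from collections import defaultdict


def target_rates(modalities, targets):
    """Mean of a 0/1 target for each modality."""
    sums, counts = defaultdict(float), defaultdict(int)
    for modality, y in zip(modalities, targets):
        sums[modality] += y
        counts[modality] += 1
    return {m: sums[m] / counts[m] for m in counts}


def same_ranking(train_rates, dev_rates):
    """True when both samples hold the same modalities, ordered identically by target rate."""
    if set(train_rates) != set(dev_rates):
        return False
    return sorted(train_rates, key=train_rates.get) == sorted(dev_rates, key=dev_rates.get)


train_rates = target_rates(["low", "low", "high", "high"], [0, 0, 1, 0])
dev_rates = target_rates(["low", "high", "high", "low"], [0, 1, 1, 0])
print(same_ranking(train_rates, dev_rates))  # True
```

Modalities with exactly equal target rates are ordered by first appearance here, so ties would need explicit handling in real use.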

Saving AutoCarver

All Carvers can safely be stored as a .json file.

import json

# storing as json file
with open('my_carver.json', 'w') as my_carver_json:
    json.dump(auto_carver.to_json(), my_carver_json)

Loading AutoCarver

The AutoCarver can safely be loaded from a .json file.

import json

from AutoCarver import load_carver

# loading json file
with open('my_carver.json', 'r') as my_carver_json:
    auto_carver = load_carver(json.load(my_carver_json))
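The save and load steps can be sanity-checked with a round trip. The sketch below uses a placeholder dictionary in place of the output of auto_carver.to_json(), since the actual content of that output is not described here; the keys shown are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# placeholder standing in for the output of auto_carver.to_json();
# these keys are hypothetical and only serve the illustration
serialized = {"hypothetical_feature": ["a", "b"], "hypothetical_setting": 0.02}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "my_carver.json"
    # storing as json file
    with open(path, "w") as my_carver_json:
        json.dump(serialized, my_carver_json)
    # loading json file
    with open(path, "r") as my_carver_json:
        reloaded = json.load(my_carver_json)

print(reloaded == serialized)  # True
```

The equality holds because the placeholder only uses JSON-native types. JSON turns tuples into lists and non-string dictionary keys into strings, so a structure containing those would not compare equal after reloading.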

Feature Selection

from AutoCarver.selectors import ClassificationSelector

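# features to select from (assumed here: every carved column, all of them discretized at this point)
features = quantitative_features + qualitative_features

# select the 25 qualitative features most associated with the target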
feature_selector = ClassificationSelector(
    qualitative_features=features,  # features to select from
    n_best=25,  # number of features to select
    verbose=True,  # displays statistics
)
best_features = feature_selector.select(train_set_discretized, train_set_discretized[target])
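To give an intuition for what selecting the n_best most target-associated features means, the sketch below ranks categorical columns by Cramér's V with the target and keeps the top ones. This is a simplified stand-in: the measures and filters that ClassificationSelector actually applies are not described here, and cramers_v and select_n_best are illustrative names.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Cramér's V between two sequences of categorical values."""
    n = len(xs)
    x_counts, y_counts, pair_counts = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    k = min(len(x_counts), len(y_counts)) - 1
    if n == 0 or k < 1:
        return 0.0  # a constant column carries no association
    chi2 = 0.0
    for x, x_count in x_counts.items():
        for y, y_count in y_counts.items():
            expected = x_count * y_count / n
            chi2 += (pair_counts[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def select_n_best(columns, target, n_best):
    """Rank the columns (name -> list of values) by Cramér's V with the target."""
    scores = {name: cramers_v(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_best]


columns = {
    "informative": ["a", "a", "b", "b", "a", "b"],
    "noisy": ["x", "y", "x", "y", "y", "x"],
}
target = [1, 1, 0, 0, 1, 0]
print(select_n_best(columns, target, n_best=1))  # ['informative']
```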

In-depth examples

See Credit Scoring Example.