Benchmark: AutoCarver vs. optbinning vs. KBinsDiscretizer

This notebook runs the three binning libraries side-by-side on two public datasets:

German Credit — binary classification, mixed numeric / categorical features, 1,000 rows.
California Housing — regression, all-numeric features, 20,640 rows.

For each library and dataset, we report:

``fit`` and ``transform`` wall-clock (seconds)
Downstream-model score — AUC for binary, R² for regression — using a linear model (logistic regression / ridge) on the one-hot-encoded bin output
``train`` → ``test`` score drop as a coarse proxy for drift sensitivity (strictly, a generalization gap: train and test come from the same random split)

All three libraries see the same train + dev rows and are evaluated on the same held-out test set. AutoCarver uses the dev sample for its built-in robustness veto. optbinning and KBinsDiscretizer have no dev-set concept, so they are fit on the pooled train + dev set, which is the comparison practitioners actually run. In every case the downstream linear model is trained on the pooled train + dev rows.
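`fit_eval_binary` and `fit_eval_regression` pass X and y to scikit-learn, which pairs rows by position and ignores pandas indices. The comparison is therefore only fair if every binner returns the pooled train + dev rows in the same order as `pd.concat([y_train, y_dev])`. Below is a minimal sanity check. It assumes pandas objects with unique indices, and the helper name `check_row_alignment` is hypothetical (it is not part of the notebook):

```python
import pandas as pd


def check_row_alignment(X_binned: pd.DataFrame, y: pd.Series) -> None:
    """Raise if binned features and target are not in the same row order.

    Hypothetical helper: scikit-learn estimators pair X and y positionally,
    so a silently reordered X would corrupt every downstream score.
    """
    if len(X_binned) != len(y):
        raise ValueError(f"length mismatch: {len(X_binned)} rows vs {len(y)} labels")
    if not X_binned.index.equals(y.index):
        raise ValueError("row order of binned features differs from the target")


# Toy example: same rows in the same order, so the check passes silently
y = pd.Series([0, 1, 0], index=[10, 11, 12])
X_ok = pd.DataFrame({"bin": ["a", "b", "a"]}, index=[10, 11, 12])
check_row_alignment(X_ok, y)
```

In the benchmark loops, you would call it as `check_row_alignment(X_tr, y_train_full)` right after each `run()`.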

This is not an IV / Tschuprow's T leaderboard: each of those metrics structurally favours the library that optimises it directly. The downstream-model score is the metric a real scorecard team would use to pick a binner.
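The sketch below makes this concrete by computing both metrics on one carved feature from the AutoCarver log further down. The paragraph above implies that Tschuprow's T sits on the AutoCarver side. optbinning's `OptimalBinning` uses IV as its default `divergence`. The formulas are the textbook definitions, not code taken from either library. The counts are reconstructed from the rounded frequencies and target means printed in the log.

```python
import numpy as np


def information_value(bad: np.ndarray, good: np.ndarray) -> float:
    """IV = sum_i (g_i - b_i) * ln(g_i / b_i), g_i / b_i = share of goods / bads in bin i."""
    g = good / good.sum()
    b = bad / bad.sum()
    return float(np.sum((g - b) * np.log(g / b)))


def tschuprow_t(bad: np.ndarray, good: np.ndarray) -> float:
    """Tschuprow's T = sqrt(chi2 / (n * sqrt((r - 1) * (c - 1)))) on the bins x target table."""
    table = np.column_stack([good, bad]).astype(float)
    n = table.sum()
    # Expected counts under independence of bin and target
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * np.sqrt((r - 1) * (c - 1)))))


# Carved 'checking_status' on the 600 training rows (counts reconstructed from the log):
# bin 1 = {no checking, >=200}: 279 rows, 42 bad; bin 2 = {0<=X<200, <0}: 321 rows, 138 bad
bad = np.array([42, 138])
good = np.array([279 - 42, 321 - 138])
print(f"IV = {information_value(bad, good):.3f}, Tschuprow's T = {tschuprow_t(bad, good):.3f}")
```

The two numbers sit on different scales and reward different things. That is why a binner tuned for one of them can look best on it without fitting a better downstream model.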

Numbers come from a single run on a single machine with a fixed seed; treat them as illustrative, not as authoritative benchmark figures. Re-run on your own data before drawing conclusions.
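A percentile bootstrap over the test rows is one cheap way to gauge how much of a gap between libraries is noise. The sketch below assumes you have the model's predicted probabilities on the test set, for example `model.predict_proba(Xte)[:, 1]` inside `fit_eval_binary` (the function does not currently return them). The helper `bootstrap_auc_ci` is hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=42):
    """Point AUC plus a percentile bootstrap confidence interval on a fixed test set."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample test rows with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined when a resample holds only one class
        aucs.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, y_score), (float(lo), float(hi))


# Synthetic stand-in for a 200-row test set with a roughly 30 % bad rate
rng = np.random.default_rng(0)
y = (rng.random(200) < 0.3).astype(int)
score = y * 0.8 + rng.normal(size=200)  # informative but noisy scores
auc, (lo, hi) = bootstrap_auc_ci(y, score)
print(f"AUC = {auc:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

If two libraries' intervals overlap heavily, do not read the single-run ranking in the tables below as a winner. Reusing the same resampled `idx` for every library (a paired bootstrap) gives a sharper estimate of the difference itself.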

Setup

[1]:

import time
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing, fetch_openml
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer

from AutoCarver import BinaryCarver, ContinuousCarver, Features
from AutoCarver.discretizers.utils.base_discretizer import DiscretizerConfig

try:
    from optbinning import ContinuousOptimalBinning, OptimalBinning

    HAS_OPTBINNING = True
except ImportError:
    HAS_OPTBINNING = False
    print('optbinning is not installed \u2014 its rows will be skipped.')

SEED = 42
warnings.filterwarnings('ignore')
plt.rcParams['figure.figsize'] = (10, 3.5)

[2]:

def one_hot(df, drop_first=True):
    """Treat every bin label as a categorical level and one-hot encode it.

    Lets a linear downstream model consume any of the three libraries' outputs
    uniformly, without us computing WoE per bin.
    """
    return pd.get_dummies(df.astype(str), drop_first=drop_first).astype(float)


def fit_eval_binary(X_train, X_test, y_train, y_test):
    Xtr = one_hot(X_train)
    # Encode test without dropping levels, then keep only the training columns:
    # rows in a training baseline level (or an unseen level) become all zeros.
    Xte = one_hot(X_test, drop_first=False).reindex(columns=Xtr.columns, fill_value=0.0)
    model = LogisticRegression(max_iter=1000, random_state=SEED).fit(Xtr, y_train)
    return {
        'train_auc': roc_auc_score(y_train, model.predict_proba(Xtr)[:, 1]),
        'test_auc': roc_auc_score(y_test, model.predict_proba(Xte)[:, 1]),
    }


def fit_eval_regression(X_train, X_test, y_train, y_test):
    Xtr = one_hot(X_train)
    # Same alignment rule as in fit_eval_binary
    Xte = one_hot(X_test, drop_first=False).reindex(columns=Xtr.columns, fill_value=0.0)
    model = Ridge(random_state=SEED).fit(Xtr, y_train)
    return {
        'train_r2': r2_score(y_train, model.predict(Xtr)),
        'test_r2': r2_score(y_test, model.predict(Xte)),
    }


def plot_bars(results_df, score_cols, title):
    fig, axes = plt.subplots(1, len(score_cols), figsize=(4 * len(score_cols), 3.5))
    if len(score_cols) == 1:
        axes = [axes]
    for ax, col in zip(axes, score_cols):
        results_df.plot.bar(x='library', y=col, ax=ax, legend=False, color='#4C72B0')
        ax.set_title(col)
        ax.set_xlabel('')
        ax.tick_params(axis='x', rotation=0)
    fig.suptitle(title)
    fig.tight_layout()
    plt.show()
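The encoding step has one detail worth checking. `pd.get_dummies(..., drop_first=True)` drops the first level *present in the frame it is given*. Encoding the test set on its own can therefore drop a different level than the one dropped on the train set. The toy example below uses hypothetical data, not data from the benchmark. It shows the failure and a safe alternative: encode the test set without dropping any level, then reindex to the training columns.

```python
import pandas as pd

train = pd.DataFrame({"bin": ["a", "b", "c", "a"]})
test = pd.DataFrame({"bin": ["b", "c"]})  # level 'a' happens to be absent from test

# Training design matrix: baseline level 'a' dropped, columns bin_b, bin_c
Xtr = pd.get_dummies(train.astype(str), drop_first=True).astype(float)

# Risky: test drops its own first level ('b'), so after reindexing the 'b' row
# is all zeros, i.e. it is silently read as the baseline level 'a'
risky = pd.get_dummies(test.astype(str), drop_first=True).astype(float).reindex(
    columns=Xtr.columns, fill_value=0.0
)

# Safe: keep every test level, then keep only the columns the model was trained on
safe = pd.get_dummies(test.astype(str), drop_first=False).astype(float).reindex(
    columns=Xtr.columns, fill_value=0.0
)
print(risky)
print(safe)
```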

[3]:

MAX_N_MOD = 5
MIN_FREQ = 0.05

def bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, kind):
    Carver = BinaryCarver if kind == 'binary' else ContinuousCarver
    features = Features(categoricals=categoricals, quantitatives=quantitatives)
    config = DiscretizerConfig(verbose=True)  # showing statistics
    carver = Carver(features=features, min_freq=MIN_FREQ, max_n_mod=MAX_N_MOD, config=config)

    t0 = time.perf_counter()
    X_tr = carver.fit_transform(X_train.copy(), y_train, X_dev=X_dev.copy(), y_dev=y_dev)
    fit_t = time.perf_counter() - t0

    X_dv = carver.transform(X_dev.copy())
    t1 = time.perf_counter()
    X_te = carver.transform(X_test.copy())
    transform_t = time.perf_counter() - t1
    return pd.concat([X_tr, X_dv]), X_te, fit_t, transform_t


def bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, kind):
    Cls = OptimalBinning if kind == 'binary' else ContinuousOptimalBinning
    X_all = pd.concat([X_train, X_dev])
    y_all = pd.concat([y_train, y_dev])
    binners = {}
    train_binned = pd.DataFrame(index=X_all.index)
    test_binned = pd.DataFrame(index=X_test.index)

    t0 = time.perf_counter()
    for col in X_all.columns:
        dtype = 'categorical' if col in categoricals else 'numerical'
        binner = Cls(name=col, dtype=dtype, min_prebin_size=MIN_FREQ/2, max_n_bins=MAX_N_MOD)
        binner.fit(X_all[col].to_numpy(), y_all.to_numpy())
        binners[col] = binner
        train_binned[col] = binner.transform(X_all[col].to_numpy(), metric='bins')
    fit_t = time.perf_counter() - t0

    t1 = time.perf_counter()
    for col, b in binners.items():
        test_binned[col] = b.transform(X_test[col].to_numpy(), metric='bins')
    transform_t = time.perf_counter() - t1
    return train_binned, test_binned, fit_t, transform_t


def bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives, n_bins=MAX_N_MOD):
    X_all = pd.concat([X_train, X_dev])
    # Impute with the pooled training medians so test rows are treated as unseen data
    medians = X_all[quantitatives].median()
    num_train = X_all[quantitatives].fillna(medians)
    num_test = X_test[quantitatives].fillna(medians)
    kbd = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='quantile')

    t0 = time.perf_counter()
    binned_num_train = pd.DataFrame(
        kbd.fit_transform(num_train), columns=quantitatives, index=X_all.index
    )
    fit_t = time.perf_counter() - t0

    t1 = time.perf_counter()
    binned_num_test = pd.DataFrame(
        kbd.transform(num_test), columns=quantitatives, index=X_test.index
    )
    transform_t = time.perf_counter() - t1

    # KBins has no opinion on categoricals — pass them through as labels
    train = pd.concat([binned_num_train, X_all[categoricals].astype(str)], axis=1)
    test = pd.concat([binned_num_test, X_test[categoricals].astype(str)], axis=1)
    return train, test, fit_t, transform_t

Binary classification — German Credit

20 features (13 categorical, 7 numeric), 1,000 rows, target = 1 if class == 'bad' else 0. Train / dev / test split = 60 / 20 / 20 %, stratified on the target.

[4]:

credit = fetch_openml(data_id=31, as_frame=True)
df = credit.frame.copy()

y_binary = (df['class'] == 'bad').astype(int)
X_binary = df.drop(columns=['class'])

X_train, X_rest, y_train, y_rest = train_test_split(
    X_binary, y_binary, test_size=0.4, random_state=SEED, stratify=y_binary,
)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=SEED, stratify=y_rest,
)

categoricals = [c for c in X_binary.columns if X_binary[c].dtype == object or isinstance(X_binary[c].dtype, pd.CategoricalDtype)]
quantitatives = [c for c in X_binary.columns if c not in categoricals]

print(f'train={len(X_train)}, dev={len(X_dev)}, test={len(X_test)}')
print(f'categoricals={len(categoricals)}, quantitatives={len(quantitatives)}')
print(f'bad rate (train)={y_train.mean():.3f}, (test)={y_test.mean():.3f}')

train=600, dev=200, test=200
categoricals=13, quantitatives=7
bad rate (train)=0.300, (test)=0.300

[5]:

y_train_full = pd.concat([y_train, y_dev])

runs = [(
    'AutoCarver',
    lambda: bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'binary'),
)]
if HAS_OPTBINNING:
    runs.append((
        'optbinning',
        lambda: bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'binary'),
    ))
runs.append((
    'KBinsDiscretizer',
    lambda: bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives),
))

rows = []
for name, run in runs:
    X_tr, X_te, fit_t, transform_t = run()
    scores = fit_eval_binary(X_tr, X_te, y_train_full, y_test)
    rows.append({
        'library': name,
        'fit_s': round(fit_t, 3),
        'transform_s': round(transform_t, 4),
        'train_auc': round(scores['train_auc'], 4),
        'test_auc': round(scores['test_auc'], 4),
        'auc_drop': round(scores['train_auc'] - scores['test_auc'], 4),
    })

binary_results = pd.DataFrame(rows)
binary_results

------
--- [QuantitativeDiscretizer] Fit Features(['duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
 - [ContinuousDiscretizer] Fit Features(['duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
 - [OrdinalDiscretizer] Fit Features(['duration', 'installment_commitment', 'residence_since', 'existing_credits', 'num_dependents'])
------

------
--- [QualitativeDiscretizer] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker'])
 - [CategoricalDiscretizer] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker'])
------

---------
------ [BinaryCarver] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker', 'duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
--- [BinaryCarver] Fit Categorical('checking_status') (1/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
no checking	0.1317	0.4050
>=200	0.2778	0.0600
0<=X<200	0.3896	0.2567
<0	0.4671	0.2783

X_dev distribution
target_mean	frequency
0.0694	0.3600
0.0833	0.0600
0.3710	0.3100
0.5741	0.2700

Computing associations: 7it [00:00, 5284.40it/s]
Testing robustness    :   0%|          | 0/7 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
no checking, >=200	0.1505	0.4650
0<=X<200, <0	0.4299	0.5350

X_dev distribution
target_mean	frequency
0.0714	0.4200
0.4655	0.5800

--- [BinaryCarver] Fit Categorical('credit_history') (2/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
critical/other existing credit	0.1676	0.2883
existing paid	0.3185	0.5233
delayed previously	0.3621	0.0967
all paid	0.5455	0.0550
no credits/all paid	0.5455	0.0367

X_dev distribution
target_mean	frequency
0.2241	0.2900
0.2703	0.5550
0.3571	0.0700
0.7273	0.0550
0.6667	0.0300

Computing associations: 15it [00:00, 10330.80it/s]
Testing robustness    :   0%|          | 0/15 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
critical/other existing credit	0.1676	0.2883
existing paid, delayed previously	0.3253	0.6200
all paid, no credits/all paid	0.5455	0.0917

X_dev distribution
target_mean	frequency
0.2241	0.2900
0.2800	0.6250
0.7059	0.0850

--- [BinaryCarver] Fit Categorical('purpose') (3/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
used car	0.1875	0.1067
radio/tv	0.2303	0.2750
other, domestic appliance, retraining	0.2632	0.0317
furniture/equipment	0.3333	0.1700
new car	0.3401	0.2450
business	0.3729	0.0983
repairs	0.3750	0.0267
education	0.4643	0.0467

X_dev distribution
target_mean	frequency
0.1250	0.0800
0.2295	0.3050
0.2727	0.0550
0.3235	0.1700
0.4222	0.2250
0.2778	0.0900
0.0000	0.0100
0.4615	0.0650

Computing associations: 98it [00:00, 96015.37it/s]
Testing robustness    :   0%|          | 0/98 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
used car, radio/tv, other, domestic appliance, ret...	0.2218	0.4133
new car, furniture/equipment, business, education,...	0.3551	0.5867

X_dev distribution
target_mean	frequency
0.2159	0.4400
0.3661	0.5600

--- [BinaryCarver] Fit Categorical('savings_status') (4/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
>=1000	0.0667	0.0500
500<=X<1000	0.1622	0.0617
no known savings	0.1714	0.1750
100<=X<500	0.3333	0.1150
<100	0.3649	0.5983

X_dev distribution
target_mean	frequency
0.3333	0.0300
0.1250	0.0800
0.1667	0.1800
0.3889	0.0900
0.3468	0.6200

Computing associations: 15it [00:00, ?it/s]
Testing robustness    :   0%|          | 0/15 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
no known savings, >=1000, 500<=X<1000	0.1512	0.2867
<100, 100<=X<500	0.3598	0.7133

X_dev distribution
target_mean	frequency
0.1724	0.2900
0.3521	0.7100

--- [BinaryCarver] Fit Categorical('employment') (5/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
4<=X<7	0.1935	0.1550
>=7	0.2516	0.2650
1<=X<4	0.2911	0.3550
<1	0.4272	0.1717
unemployed	0.5000	0.0533

X_dev distribution
target_mean	frequency
0.2632	0.1900
0.2600	0.2500
0.3621	0.2900
0.3333	0.1800
0.2222	0.0900

Computing associations: 15it [00:00, ?it/s]
Testing robustness    :  60%|██████    | 9/15 [00:00<00:00, 220.01it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
>=7, 4<=X<7	0.2302	0.4200
unemployed, 1<=X<4, <1	0.3506	0.5800

X_dev distribution
target_mean	frequency
0.2614	0.4400
0.3304	0.5600

--- [BinaryCarver] Fit Categorical('personal_status') (6/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
male single	0.2679	0.5600
male mar/wid	0.2778	0.0900
female div/dep/mar	0.3559	0.2950
male div/sep	0.3636	0.0550

X_dev distribution
target_mean	frequency
0.2830	0.5300
0.2381	0.1050
0.3385	0.3250
0.3750	0.0400

Computing associations: 7it [00:00, 6363.27it/s]
Testing robustness    :   0%|          | 0/7 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
male single, male mar/wid	0.2692	0.6500
female div/dep/mar, male div/sep	0.3571	0.3500

X_dev distribution
target_mean	frequency
0.2756	0.6350
0.3425	0.3650

--- [BinaryCarver] Fit Categorical('other_parties') (7/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
guarantor	0.1786	0.0467
none	0.2996	0.9067
co applicant	0.4286	0.0467

X_dev distribution
target_mean	frequency
0.2500	0.0400
0.2989	0.9200
0.3750	0.0400

Computing associations: 3it [00:00, 3005.95it/s]
Testing robustness    : 100%|██████████| 3/3 [00:00<00:00, 520.41it/s]



WARNING: No robust combination for Categorical('other_parties'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Categorical('property_magnitude') (8/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
real estate	0.2130	0.2817
life insurance	0.3125	0.2133
car	0.3143	0.3500
no known property	0.4086	0.1550

X_dev distribution
target_mean	frequency
0.2182	0.2750
0.2600	0.2500
0.3281	0.3200
0.4516	0.1550

Computing associations: 7it [00:00, ?it/s]
Testing robustness    :   0%|          | 0/7 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
real estate	0.2130	0.2817
car, life insurance	0.3136	0.5633
no known property	0.4086	0.1550

X_dev distribution
target_mean	frequency
0.2182	0.2750
0.2982	0.5700
0.4516	0.1550

--- [BinaryCarver] Fit Categorical('other_payment_plans') (9/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
none	0.2619	0.8083
stores	0.4375	0.0533
bank	0.4699	0.1383

X_dev distribution
target_mean	frequency
0.2866	0.8200
0.4444	0.0450
0.3333	0.1350

Computing associations: 3it [00:00, 2997.36it/s]
Testing robustness    :   0%|          | 0/3 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
none	0.2619	0.8083
bank, stores	0.4609	0.1917

X_dev distribution
target_mean	frequency
0.2866	0.8200
0.3611	0.1800

--- [BinaryCarver] Fit Categorical('housing') (10/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
own	0.2558	0.7233
for free	0.3750	0.1067
rent	0.4412	0.1700

X_dev distribution
target_mean	frequency
0.2857	0.7350
0.4348	0.1150
0.2667	0.1500

Computing associations: 3it [00:00, ?it/s]
Testing robustness    :   0%|          | 0/3 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
own	0.2558	0.7233
for free, rent	0.4157	0.2767

X_dev distribution
target_mean	frequency
0.2857	0.7350
0.3396	0.2650

--- [BinaryCarver] Fit Categorical('job') (11/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
skilled	0.2898	0.6383
unskilled resident	0.2966	0.1967
high qualif/self emp/mgmt	0.3258	0.1483
unemp/unskilled non res	0.5000	0.0167

X_dev distribution
target_mean	frequency
0.2541	0.6100
0.3171	0.2050
0.4839	0.1550
0.1667	0.0300

Computing associations: 7it [00:00, ?it/s]
Testing robustness    :  57%|█████▋    | 4/7 [00:00<00:00, 363.24it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
skilled, unskilled resident	0.2914	0.8350
high qualif/self emp/mgmt, unemp/unskilled non res	0.3434	0.1650

X_dev distribution
target_mean	frequency
0.2699	0.8150
0.4324	0.1850

--- [BinaryCarver] Fit Categorical('own_telephone') (12/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
yes	0.2645	0.4033
none	0.3240	0.5967

X_dev distribution
target_mean	frequency
0.3125	0.4000
0.2917	0.6000

Computing associations: 1it [00:00, ?it/s]
Testing robustness    : 100%|██████████| 1/1 [00:00<00:00, 189.40it/s]



WARNING: No robust combination for Categorical('own_telephone'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Categorical('foreign_worker') (13/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
no	0.0435	0.0383
yes	0.3102	0.9617

X_dev distribution
target_mean	frequency
0.3333	0.0300
0.2990	0.9700

Computing associations: 1it [00:00, ?it/s]
Testing robustness    : 100%|██████████| 1/1 [00:00<00:00, 473.08it/s]



WARNING: No robust combination for Categorical('foreign_worker'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('duration') (14/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 8.00e+00	0.0980	0.0850
8.00e+00 < x <= 9.00e+00	0.2333	0.0500
9.00e+00 < x <= 1.10e+01	0.0870	0.0383
1.10e+01 < x <= 1.20e+01	0.2883	0.1850
1.20e+01 < x <= 1.50e+01	0.2273	0.0733
1.50e+01 < x <= 1.80e+01	0.3692	0.1083
1.80e+01 < x <= 2.20e+01	0.2381	0.0350
2.20e+01 < x <= 2.40e+01	0.3333	0.1950
2.40e+01 < x <= 2.80e+01	0.2222	0.0150
2.80e+01 < x <= 3.30e+01	0.3846	0.0433
3.30e+01 < x <= 3.60e+01	0.4727	0.0917
3.60e+01 < x <= 4.70e+01	0.2667	0.0250
4.70e+01 < x	0.4242	0.0550

X_dev distribution
target_mean	frequency
0.1000	0.1000
0.3077	0.0650
0.0000	0.0400
0.2432	0.1850
0.0714	0.0700
0.3043	0.1150
0.4444	0.0450
0.3548	0.1550
0.7500	0.0200
0.4286	0.0350
0.3529	0.0850
0.6667	0.0150
0.5714	0.0700

Computing associations: 793it [00:00, 113615.13it/s]
Testing robustness    :   0%|          | 0/793 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 1.10e+01	0.1346	0.1733
1.10e+01 < x <= 2.80e+01	0.3052	0.6117
2.80e+01 < x	0.4186	0.2150

X_dev distribution
target_mean	frequency
0.1463	0.2050
0.2966	0.5900
0.4634	0.2050

--- [BinaryCarver] Fit Quantitative('credit_amount') (15/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 6.18e+02	0.2000	0.0250
6.18e+02 < x <= 7.08e+02	0.4000	0.0250
7.08e+02 < x <= 7.97e+02	0.3333	0.0250
7.97e+02 < x <= 9.09e+02	0.4000	0.0250
9.09e+02 < x <= 1.03e+03	0.4000	0.0250
1.03e+03 < x <= 1.16e+03	0.2000	0.0250
1.16e+03 < x <= 1.21e+03	0.2667	0.0250
1.21e+03 < x <= 1.26e+03	0.2000	0.0250
1.26e+03 < x <= 1.31e+03	0.3333	0.0250
1.31e+03 < x <= 1.37e+03	0.4667	0.0250
1.37e+03 < x <= 1.41e+03	0.1250	0.0267
1.41e+03 < x <= 1.47e+03	0.1429	0.0233
1.47e+03 < x <= 1.53e+03	0.2667	0.0250
1.53e+03 < x <= 1.60e+03	0.2000	0.0250
1.60e+03 < x <= 1.82e+03	0.2000	0.0250
1.82e+03 < x <= 1.92e+03	0.5000	0.0267
1.92e+03 < x <= 1.98e+03	0.2857	0.0233
1.98e+03 < x <= 2.12e+03	0.3333	0.0250
2.12e+03 < x <= 2.21e+03	0.2667	0.0250
2.21e+03 < x <= 2.30e+03	0.2667	0.0250
2.30e+03 < x <= 2.38e+03	0.2000	0.0250
2.38e+03 < x <= 2.48e+03	0.4000	0.0250
2.48e+03 < x <= 2.62e+03	0.2667	0.0250
2.62e+03 < x <= 2.75e+03	0.3333	0.0250
2.75e+03 < x <= 2.92e+03	0.2000	0.0250
2.92e+03 < x <= 3.07e+03	0.2000	0.0250
3.07e+03 < x <= 3.35e+03	0.4000	0.0250
3.35e+03 < x <= 3.51e+03	0.1333	0.0250
3.51e+03 < x <= 3.63e+03	0.1333	0.0250
3.63e+03 < x <= 3.91e+03	0.0667	0.0250
3.91e+03 < x <= 4.24e+03	0.4667	0.0250
4.24e+03 < x <= 4.66e+03	0.4000	0.0250
4.66e+03 < x <= 5.08e+03	0.4667	0.0250
5.08e+03 < x <= 5.80e+03	0.2000	0.0250
5.80e+03 < x <= 6.36e+03	0.2667	0.0250
6.36e+03 < x <= 6.85e+03	0.4667	0.0250
6.85e+03 < x <= 7.48e+03	0.2000	0.0250
7.48e+03 < x <= 8.23e+03	0.4667	0.0250
8.23e+03 < x <= 9.57e+03	0.4000	0.0250
9.57e+03 < x	0.5333	0.0250

X_dev distribution
target_mean	frequency
0.2000	0.0250
0.5000	0.0200
0.5000	0.0300
0.0000	0.0100
0.3333	0.0300
0.1429	0.0350
0.5000	0.0100
0.3333	0.0600
0.0000	0.0100
0.2857	0.0350
0.0000	0.0150
0.3333	0.0300
0.2500	0.0200
0.0000	0.0150
0.3333	0.0300
0.2857	0.0350
0.2500	0.0200
0.0000	0.0400
0.5000	0.0100
0.5000	0.0100
0.0000	0.0150
0.0000	0.0050
0.6667	0.0150
0.0000	0.0200
0.0000	0.0200
0.3333	0.0150
0.2000	0.0500
0.5000	0.0400
0.0000	0.0300
0.1000	0.0500
0.2500	0.0200
0.8000	0.0250
0.3333	0.0150
0.4000	0.0250
0.2857	0.0350
0.0000	0.0200
0.6667	0.0150
0.6667	0.0150
0.6667	0.0150
0.6154	0.0650

Computing associations: 92170it [00:03, 25717.85it/s]
Testing robustness    :   0%|          | 0/92170 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 3.35e+03	0.2889	0.6750
3.35e+03 < x <= 3.91e+03	0.1111	0.0750
3.91e+03 < x	0.3867	0.2500

X_dev distribution
target_mean	frequency
0.2460	0.6300
0.2083	0.1200
0.4800	0.2500

--- [BinaryCarver] Fit Quantitative('installment_commitment') (16/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.00e+00	0.2436	0.1300
1.00e+00 < x <= 2.00e+00	0.2606	0.2367
2.00e+00 < x <= 3.00e+00	0.2979	0.1567
3.00e+00 < x	0.3357	0.4767

X_dev distribution
target_mean	frequency
0.1071	0.1400
0.2667	0.2250
0.2414	0.1450
0.3878	0.4900

Computing associations: 7it [00:00, ?it/s]
Testing robustness    :   0%|          | 0/7 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 2.0e+00	0.2545	0.3667
2.0e+00 < x	0.3263	0.6333

X_dev distribution
target_mean	frequency
0.2055	0.3650
0.3543	0.6350

--- [BinaryCarver] Fit Quantitative('residence_since') (17/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.00e+00	0.3117	0.1283
1.00e+00 < x <= 2.00e+00	0.2905	0.2983
2.00e+00 < x <= 3.00e+00	0.3000	0.1667
3.00e+00 < x	0.3033	0.4067

X_dev distribution
target_mean	frequency
0.2174	0.1150
0.3529	0.3400
0.3333	0.1500
0.2658	0.3950

Computing associations: 7it [00:00, ?it/s]
Testing robustness    : 100%|██████████| 7/7 [00:00<00:00, 187.60it/s]



WARNING: No robust combination for Quantitative('residence_since'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('age') (18/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 2.10e+01	0.4000	0.0250
2.10e+01 < x <= 2.20e+01	0.3684	0.0317
2.20e+01 < x <= 2.30e+01	0.4500	0.0333
2.30e+01 < x <= 2.40e+01	0.3333	0.0350
2.40e+01 < x <= 2.50e+01	0.5161	0.0517
2.50e+01 < x <= 2.60e+01	0.2500	0.0467
2.60e+01 < x <= 2.70e+01	0.2258	0.0517
2.70e+01 < x <= 2.80e+01	0.4091	0.0367
2.80e+01 < x <= 2.90e+01	0.3913	0.0383
2.90e+01 < x <= 3.00e+01	0.2143	0.0467
3.00e+01 < x <= 3.10e+01	0.2308	0.0433
3.10e+01 < x <= 3.20e+01	0.2500	0.0333
3.20e+01 < x <= 3.30e+01	0.3636	0.0367
3.30e+01 < x <= 3.40e+01	0.3636	0.0367
3.40e+01 < x <= 3.50e+01	0.1724	0.0483
3.50e+01 < x <= 3.60e+01	0.2083	0.0400
3.60e+01 < x <= 3.70e+01	0.3333	0.0250
3.70e+01 < x <= 3.80e+01	0.1875	0.0267
3.80e+01 < x <= 3.90e+01	0.2941	0.0283
3.90e+01 < x <= 4.10e+01	0.3182	0.0367
4.10e+01 < x <= 4.20e+01	0.2727	0.0183
4.20e+01 < x <= 4.40e+01	0.1905	0.0350
4.40e+01 < x <= 4.60e+01	0.2632	0.0317
4.60e+01 < x <= 4.70e+01	0.4000	0.0167
4.70e+01 < x <= 4.90e+01	0.1429	0.0233
4.90e+01 < x <= 5.10e+01	0.1429	0.0233
5.10e+01 < x <= 5.40e+01	0.2941	0.0283
5.40e+01 < x <= 5.70e+01	0.3333	0.0200
5.70e+01 < x <= 6.30e+01	0.4375	0.0267
6.30e+01 < x	0.2667	0.0250

X_dev distribution
target_mean	frequency
0.3333	0.0300
0.5000	0.0200
0.3333	0.0750
0.6364	0.0550
0.3333	0.0150
0.3333	0.0600
0.1538	0.0650
0.1429	0.0350
0.4000	0.0250
0.5000	0.0500
0.3333	0.0300
0.2000	0.0250
0.3750	0.0400
0.3333	0.0150
0.2500	0.0200
0.1429	0.0350
0.2500	0.0400
0.2500	0.0200
0.0000	0.0050
0.2308	0.0650
0.6000	0.0250
0.3333	0.0300
0.1250	0.0400
0.0000	0.0200
0.2000	0.0250
0.5000	0.0100
0.6000	0.0250
0.2500	0.0200
0.2500	0.0400
0.0000	0.0400

Computing associations: 27840it [00:00, 36613.59it/s]
Testing robustness    :   0%|          | 0/27840 [00:00<?, ?it/s]



 [BinaryCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 2.5e+01	0.4245	0.1767
2.5e+01 < x	0.2733	0.8233

X_dev distribution
target_mean	frequency
0.4359	0.1950
0.2671	0.8050

--- [BinaryCarver] Fit Quantitative('existing_credits') (19/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.00e+00	0.3061	0.6317
1.00e+00 < x <= 2.00e+00	0.2899	0.3450
2.00e+00 < x	0.2857	0.0233

X_dev distribution
target_mean	frequency
0.3000	0.6500
0.3016	0.3150
0.2857	0.0350

Computing associations: 3it [00:00, ?it/s]
Testing robustness    : 100%|██████████| 3/3 [00:00<00:00, 489.53it/s]



WARNING: No robust combination for Quantitative('existing_credits'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('num_dependents') (20/20)
 [BinaryCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.0e+00	0.2984	0.8433
1.0e+00 < x	0.3085	0.1567

X_dev distribution
target_mean	frequency
0.3000	0.8500
0.3000	0.1500

Computing associations: 1it [00:00, ?it/s]
Testing robustness    : 100%|██████████| 1/1 [00:00<00:00, 224.23it/s]



WARNING: No robust combination for Quantitative('num_dependents'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).

[5]:

	library	fit_s	transform_s	train_auc	test_auc	auc_drop
0	AutoCarver	6.196	0.0126	0.8321	0.7874	0.0447
1	optbinning	1.150	0.0131	0.8523	0.7931	0.0592
2	KBinsDiscretizer	0.003	0.0010	0.8401	0.7943	0.0458

[6]:

plot_bars(binary_results, ['fit_s', 'test_auc', 'auc_drop'], 'German Credit \u2014 binary classification')

[Figure: German Credit, binary classification. Bar charts of fit_s, test_auc and auc_drop per library.]

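Here, AutoCarver dropped 6 of the 20 features (other_parties, own_telephone, foreign_worker, residence_since, existing_credits, num_dependents) because no combination of their modalities passed the robustness check on the dev set. With the remaining 14 features it reaches a test AUC of 0.7874, close to optbinning (0.7931) and KBinsDiscretizer (0.7943), and it has the smallest train → test drop (0.0447). On a 200-row test set, differences of this size should not be over-interpreted.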
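The log above shows why `own_telephone` was rejected. On train, `yes` has the lower bad rate (0.2645 vs 0.3240), while on dev the order flips (0.3125 vs 0.2917). The sketch below is a simplified, hypothetical check in that spirit. It requires the same rank order of target rates on dev, and every bin at or above `min_freq` on both samples. It is not AutoCarver's implementation, whose exact criteria are not shown in this notebook.

```python
import numpy as np


def is_robust(train_rate, dev_rate, train_freq, dev_freq, min_freq=0.05):
    """Hypothetical stability check for one candidate binning of a feature.

    Passes if (1) the bins rank the same way by target rate on train and dev,
    and (2) every bin holds at least `min_freq` of the rows in both samples.
    """
    same_order = np.array_equal(np.argsort(train_rate), np.argsort(dev_rate))
    big_enough = min(np.min(train_freq), np.min(dev_freq)) >= min_freq
    return bool(same_order and big_enough)


# 'own_telephone' (bins: yes, none), figures copied from the log above
print(is_robust([0.2645, 0.3240], [0.3125, 0.2917], [0.4033, 0.5967], [0.40, 0.60]))
# Carved 'checking_status' (two bins), which AutoCarver kept
print(is_robust([0.1505, 0.4299], [0.0714, 0.4655], [0.4650, 0.5350], [0.42, 0.58]))
```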

Regression — California Housing

8 numeric block-group features (Latitude and Longitude included), 20,640 rows, target = median house value (in units of $100,000). Same 60 / 20 / 20 split, unstratified because the target is continuous.

[7]:

housing = fetch_california_housing(as_frame=True)
X_reg = housing.frame.drop(columns=['MedHouseVal'])
y_reg = housing.frame['MedHouseVal']

X_train, X_rest, y_train, y_rest = train_test_split(X_reg, y_reg, test_size=0.4, random_state=SEED)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=SEED)

quantitatives = list(X_reg.columns)
categoricals = []

print(f'train={len(X_train)}, dev={len(X_dev)}, test={len(X_test)}')
print(f'quantitatives={len(quantitatives)} ({quantitatives})')

train=12384, dev=4128, test=4128
quantitatives=8 (['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])

[8]:

y_train_full = pd.concat([y_train, y_dev])

runs = [(
    'AutoCarver',
    lambda: bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'continuous'),
)]
if HAS_OPTBINNING:
    runs.append((
        'optbinning',
        lambda: bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'continuous'),
    ))
runs.append((
    'KBinsDiscretizer',
    lambda: bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives),
))

rows = []
for name, run in runs:
    X_tr, X_te, fit_t, transform_t = run()
    scores = fit_eval_regression(X_tr, X_te, y_train_full, y_test)
    rows.append({
        'library': name,
        'fit_s': round(fit_t, 3),
        'transform_s': round(transform_t, 4),
        'train_r2': round(scores['train_r2'], 4),
        'test_r2': round(scores['test_r2'], 4),
        'r2_drop': round(scores['train_r2'] - scores['test_r2'], 4),
    })

regression_results = pd.DataFrame(rows)
regression_results

------
--- [QuantitativeDiscretizer] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
 - [ContinuousDiscretizer] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
 - [OrdinalDiscretizer] Fit Features(['HouseAge'])
------

---------
------ [ContinuousCarver] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
--- [ContinuousCarver] Fit Quantitative('MedInc') (1/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.335e+00	1.1984	0.0250
1.335e+00 < x <= 1.593e+00	1.0105	0.0250
1.593e+00 < x <= 1.740e+00	1.1133	0.0250
1.740e+00 < x <= 1.906e+00	1.1535	0.0252
1.906e+00 < x <= 2.029e+00	1.2090	0.0248
2.029e+00 < x <= 2.152e+00	1.2141	0.0251
2.152e+00 < x <= 2.243e+00	1.2417	0.0250
2.243e+00 < x <= 2.350e+00	1.3827	0.0249
2.350e+00 < x <= 2.468e+00	1.3614	0.0250
2.468e+00 < x <= 2.569e+00	1.4190	0.0250
2.569e+00 < x <= 2.655e+00	1.5264	0.0250
2.655e+00 < x <= 2.737e+00	1.5428	0.0250
2.737e+00 < x <= 2.862e+00	1.5708	0.0250
2.862e+00 < x <= 2.974e+00	1.6630	0.0250
2.974e+00 < x <= 3.054e+00	1.6270	0.0250
3.054e+00 < x <= 3.135e+00	1.7079	0.0250
3.135e+00 < x <= 3.216e+00	1.8554	0.0250
3.216e+00 < x <= 3.315e+00	1.8373	0.0250
3.315e+00 < x <= 3.423e+00	1.9121	0.0250
3.423e+00 < x <= 3.531e+00	1.9162	0.0251
3.531e+00 < x <= 3.633e+00	1.9678	0.0250
3.633e+00 < x <= 3.723e+00	2.0226	0.0250
3.723e+00 < x <= 3.839e+00	1.9891	0.0251
3.839e+00 < x <= 3.971e+00	2.0493	0.0249
3.971e+00 < x <= 4.073e+00	2.0538	0.0252
4.073e+00 < x <= 4.179e+00	2.2004	0.0249
4.179e+00 < x <= 4.315e+00	2.2417	0.0250
4.315e+00 < x <= 4.464e+00	2.2394	0.0250
4.464e+00 < x <= 4.611e+00	2.2577	0.0252
4.611e+00 < x <= 4.757e+00	2.4351	0.0248
4.757e+00 < x <= 4.946e+00	2.3482	0.0250
4.946e+00 < x <= 5.117e+00	2.4592	0.0250
5.117e+00 < x <= 5.308e+00	2.5784	0.0250
5.308e+00 < x <= 5.538e+00	2.6892	0.0250
5.538e+00 < x <= 5.828e+00	2.7867	0.0251
5.828e+00 < x <= 6.148e+00	3.0943	0.0249
6.148e+00 < x <= 6.599e+00	3.3031	0.0250
6.599e+00 < x <= 7.313e+00	3.6064	0.0250
7.313e+00 < x <= 8.433e+00	4.0191	0.0250
8.433e+00 < x	4.7343	0.0250

X_dev distribution
target_mean	frequency
1.2507	0.0247
1.0319	0.0262
1.1587	0.0257
1.0855	0.0252
1.2523	0.0225
1.2606	0.0293
1.2643	0.0208
1.3335	0.0274
1.4528	0.0257
1.4887	0.0305
1.5142	0.0237
1.6485	0.0208
1.5544	0.0293
1.6189	0.0257
1.7433	0.0233
1.6369	0.0213
1.7802	0.0276
1.9721	0.0283
1.8287	0.0279
1.8295	0.0242
1.9907	0.0300
1.9517	0.0216
2.0220	0.0269
2.1509	0.0269
2.0977	0.0291
2.2054	0.0225
2.2979	0.0274
2.3553	0.0274
2.2924	0.0184
2.4401	0.0213
2.2931	0.0250
2.4940	0.0237
2.6133	0.0250
2.7177	0.0189
2.9110	0.0276
3.0729	0.0213
3.0759	0.0271
3.5985	0.0228
4.0385	0.0206
4.6131	0.0264

Computing associations: 92170it [00:03, 27184.56it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 2.47e+00	1.2093	0.2250
2.47e+00 < x <= 3.13e+00	1.5796	0.1750
3.13e+00 < x <= 4.07e+00	1.9560	0.2251
4.07e+00 < x <= 5.83e+00	2.4238	0.2499
5.83e+00 < x	3.7524	0.1249

X_dev distribution
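	target_mean	frequency
x <= 2.47e+00	1.2323	0.2275
2.47e+00 < x <= 3.13e+00	1.5934	0.1747
3.13e+00 < x <= 4.07e+00	1.9604	0.2425
4.07e+00 < x <= 5.83e+00	2.4652	0.2372
5.83e+00 < x	3.6870	0.1182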

--- [ContinuousCarver] Fit Quantitative('HouseAge') (2/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 5.00e+00	2.2358	0.0271
5.00e+00 < x <= 8.00e+00	1.9727	0.0263
8.00e+00 < x <= 1.10e+01	1.8133	0.0352
1.10e+01 < x <= 1.30e+01	1.8358	0.0267
1.30e+01 < x <= 1.40e+01	1.8778	0.0200
1.40e+01 < x <= 1.60e+01	1.9355	0.0652
1.60e+01 < x <= 1.70e+01	1.8929	0.0319
1.70e+01 < x <= 1.80e+01	1.9455	0.0276
1.80e+01 < x <= 2.00e+01	1.9470	0.0470
2.00e+01 < x <= 2.10e+01	1.9630	0.0217
2.10e+01 < x <= 2.20e+01	2.0661	0.0195
2.20e+01 < x <= 2.30e+01	1.9593	0.0220
2.30e+01 < x <= 2.50e+01	2.1713	0.0480
2.50e+01 < x <= 2.60e+01	2.0937	0.0304
2.60e+01 < x <= 2.70e+01	2.0568	0.0245
2.70e+01 < x <= 2.80e+01	1.9827	0.0241
2.80e+01 < x <= 2.90e+01	2.0203	0.0232
2.90e+01 < x <= 3.00e+01	2.0515	0.0236
3.00e+01 < x <= 3.20e+01	2.0453	0.0484
3.20e+01 < x <= 3.30e+01	2.0343	0.0316
3.30e+01 < x <= 3.40e+01	2.1357	0.0320
3.40e+01 < x <= 3.50e+01	2.0004	0.0399
3.50e+01 < x <= 3.60e+01	2.1148	0.0437
3.60e+01 < x <= 3.70e+01	2.0004	0.0257
3.70e+01 < x <= 3.90e+01	2.0133	0.0355
3.90e+01 < x <= 4.10e+01	2.0306	0.0273
4.10e+01 < x <= 4.20e+01	1.9889	0.0167
4.20e+01 < x <= 4.40e+01	2.0742	0.0351
4.40e+01 < x <= 4.50e+01	2.2977	0.0132
4.50e+01 < x <= 4.70e+01	1.9517	0.0211
4.70e+01 < x	2.5848	0.0857

X_dev distribution
target_mean	frequency
2.0720	0.0245
1.9201	0.0269
1.9054	0.0344
1.8736	0.0216
1.8410	0.0196
1.8826	0.0606
1.8592	0.0375
1.8799	0.0283
1.8746	0.0436
1.9849	0.0206
2.2181	0.0170
2.1550	0.0201
2.0847	0.0579
2.0778	0.0296
2.1784	0.0216
2.2242	0.0208
1.7802	0.0213
1.7629	0.0233
2.0493	0.0504
1.9343	0.0259
2.0837	0.0349
2.1957	0.0417
2.0157	0.0431
2.2006	0.0296
2.0026	0.0351
1.9461	0.0305
1.9196	0.0194
2.0117	0.0312
2.1310	0.0155
2.0515	0.0225
2.5968	0.0911

Computing associations: 31930it [00:00, 33725.96it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 2.30e+01	1.9466	0.3703
2.30e+01 < x <= 2.60e+01	2.1412	0.0785
2.60e+01 < x <= 3.60e+01	2.0526	0.2909
3.60e+01 < x <= 4.70e+01	2.0381	0.1747
4.70e+01 < x	2.5848	0.0857

X_dev distribution
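	target_mean	frequency
x <= 2.30e+01	1.9316	0.3547
2.30e+01 < x <= 2.60e+01	2.0824	0.0875
2.60e+01 < x <= 3.60e+01	2.0383	0.2829
3.60e+01 < x <= 4.70e+01	2.0347	0.1839
4.70e+01 < x	2.5968	0.0911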

--- [ContinuousCarver] Fit Quantitative('AveRooms') (3/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 3.066e+00	1.9506	0.0250
3.066e+00 < x <= 3.432e+00	1.8880	0.0250
3.432e+00 < x <= 3.647e+00	1.8233	0.0250
3.647e+00 < x <= 3.792e+00	1.8292	0.0250
3.792e+00 < x <= 3.933e+00	1.7847	0.0250
3.933e+00 < x <= 4.052e+00	1.8499	0.0250
4.052e+00 < x <= 4.168e+00	1.8718	0.0250
4.168e+00 < x <= 4.276e+00	1.8333	0.0250
4.276e+00 < x <= 4.365e+00	1.7965	0.0250
4.365e+00 < x <= 4.454e+00	1.6952	0.0250
4.454e+00 < x <= 4.536e+00	1.7535	0.0250
4.536e+00 < x <= 4.621e+00	1.7952	0.0250
4.621e+00 < x <= 4.705e+00	1.8465	0.0250
4.705e+00 < x <= 4.794e+00	1.7486	0.0250
4.794e+00 < x <= 4.874e+00	1.7719	0.0250
4.874e+00 < x <= 4.941e+00	1.7219	0.0251
4.941e+00 < x <= 5.014e+00	1.7176	0.0249
5.014e+00 < x <= 5.088e+00	1.7707	0.0250
5.088e+00 < x <= 5.160e+00	1.7918	0.0250
5.160e+00 < x <= 5.233e+00	1.7791	0.0250
5.233e+00 < x <= 5.315e+00	1.8209	0.0250
5.315e+00 < x <= 5.384e+00	1.9107	0.0250
5.384e+00 < x <= 5.460e+00	1.7728	0.0250
5.460e+00 < x <= 5.532e+00	1.8996	0.0250
5.532e+00 < x <= 5.616e+00	1.8872	0.0250
5.616e+00 < x <= 5.694e+00	1.9905	0.0250
5.694e+00 < x <= 5.778e+00	2.0029	0.0250
5.778e+00 < x <= 5.858e+00	2.0107	0.0250
5.858e+00 < x <= 5.959e+00	2.1137	0.0250
5.959e+00 < x <= 6.059e+00	2.0469	0.0250
6.059e+00 < x <= 6.157e+00	2.1450	0.0250
6.157e+00 < x <= 6.270e+00	2.2477	0.0250
6.270e+00 < x <= 6.396e+00	2.3495	0.0250
6.396e+00 < x <= 6.543e+00	2.4232	0.0250
6.543e+00 < x <= 6.717e+00	2.6241	0.0250
6.717e+00 < x <= 6.946e+00	2.7573	0.0250
6.946e+00 < x <= 7.233e+00	3.0763	0.0250
7.233e+00 < x <= 7.637e+00	3.1118	0.0250
7.637e+00 < x <= 8.324e+00	3.5846	0.0250
8.324e+00 < x	2.7391	0.0250

X_dev distribution
target_mean	frequency
2.0908	0.0233
1.8579	0.0264
2.0031	0.0242
1.8060	0.0274
1.8137	0.0240
1.7725	0.0211
1.7723	0.0283
1.7839	0.0247
1.7902	0.0286
1.8121	0.0264
1.6265	0.0264
1.8349	0.0276
1.8339	0.0247
1.7725	0.0342
1.8188	0.0254
1.8480	0.0191
1.8333	0.0235
1.8191	0.0266
1.7419	0.0266
1.7642	0.0220
1.7645	0.0303
1.7917	0.0266
1.8651	0.0262
1.8645	0.0274
1.8082	0.0286
1.8483	0.0177
2.0778	0.0240
2.0005	0.0187
1.9724	0.0291
2.2623	0.0235
2.0818	0.0230
2.2889	0.0250
2.3280	0.0213
2.5373	0.0254
2.6787	0.0201
2.7457	0.0211
3.0108	0.0303
3.1596	0.0233
3.4340	0.0235
2.7568	0.0245

Computing associations: 92170it [00:03, 28430.03it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 3.65e+00	1.8874	0.0750
3.65e+00 < x <= 5.62e+00	1.8022	0.5500
5.62e+00 < x <= 6.16e+00	2.0516	0.1500
6.16e+00 < x <= 6.54e+00	2.3401	0.0750
6.54e+00 < x	2.9823	0.1500

X_dev distribution
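	target_mean	frequency
x <= 3.65e+00	1.9788	0.0739
3.65e+00 < x <= 5.62e+00	1.7962	0.5758
5.62e+00 < x <= 6.16e+00	2.0474	0.1359
6.16e+00 < x <= 6.54e+00	2.3886	0.0717
6.54e+00 < x	2.9752	0.1427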

--- [ContinuousCarver] Fit Quantitative('AveBedrms') (4/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 9.1220e-01	2.0511	0.0250
9.1220e-01 < x <= 9.4022e-01	2.1264	0.0250
9.4022e-01 < x <= 9.5595e-01	2.0638	0.0250
9.5595e-01 < x <= 9.6743e-01	2.0756	0.0251
9.6743e-01 < x <= 9.7590e-01	2.2562	0.0249
9.7590e-01 < x <= 9.8343e-01	2.1709	0.0250
9.8343e-01 < x <= 9.8987e-01	2.1450	0.0250
9.8987e-01 < x <= 9.9592e-01	2.1772	0.0250
9.9592e-01 < x <= 1.0019e+00	2.1915	0.0251
1.0019e+00 < x <= 1.0068e+00	2.0949	0.0249
1.0068e+00 < x <= 1.0112e+00	2.2440	0.0250
1.0112e+00 < x <= 1.0156e+00	2.1687	0.0250
1.0156e+00 < x <= 1.0204e+00	2.1723	0.0250
1.0204e+00 < x <= 1.0250e+00	2.2003	0.0254
1.0250e+00 < x <= 1.0290e+00	2.1324	0.0246
1.0290e+00 < x <= 1.0331e+00	2.1840	0.0250
1.0331e+00 < x <= 1.0369e+00	2.0321	0.0250
1.0369e+00 < x <= 1.0412e+00	2.1746	0.0250
1.0412e+00 < x <= 1.0453e+00	2.2536	0.0250
1.0453e+00 < x <= 1.0493e+00	2.1546	0.0250
1.0493e+00 < x <= 1.0534e+00	2.0738	0.0251
1.0534e+00 < x <= 1.0574e+00	2.1224	0.0249
1.0574e+00 < x <= 1.0615e+00	2.0414	0.0250
1.0615e+00 < x <= 1.0662e+00	2.1569	0.0251
1.0662e+00 < x <= 1.0712e+00	2.0972	0.0250
1.0712e+00 < x <= 1.0763e+00	2.0714	0.0249
1.0763e+00 < x <= 1.0816e+00	2.0244	0.0250
1.0816e+00 < x <= 1.0874e+00	2.0135	0.0252
1.0874e+00 < x <= 1.0933e+00	2.2239	0.0249
1.0933e+00 < x <= 1.1000e+00	2.0244	0.0262
1.1000e+00 < x <= 1.1071e+00	2.0077	0.0242
1.1071e+00 < x <= 1.1160e+00	1.9564	0.0245
1.1160e+00 < x <= 1.1267e+00	2.0077	0.0250
1.1267e+00 < x <= 1.1387e+00	1.9305	0.0250
1.1387e+00 < x <= 1.1538e+00	1.8130	0.0258
1.1538e+00 < x <= 1.1739e+00	1.8060	0.0242
1.1739e+00 < x <= 1.2074e+00	1.9109	0.0250
1.2074e+00 < x <= 1.2730e+00	1.8950	0.0250
1.2730e+00 < x <= 1.5018e+00	1.7962	0.0250
1.5018e+00 < x	1.4931	0.0250

X_dev distribution
target_mean	frequency
1.7961	0.0252
2.0098	0.0298
2.3039	0.0257
2.2390	0.0262
2.3293	0.0240
1.9318	0.0194
2.1575	0.0199
2.1740	0.0291
2.2207	0.0337
2.1811	0.0233
2.0475	0.0262
2.2743	0.0218
2.2627	0.0293
2.1068	0.0247
2.4459	0.0228
2.1280	0.0269
2.1193	0.0240
2.2280	0.0259
2.0336	0.0237
2.0195	0.0216
1.9898	0.0235
2.2270	0.0216
1.9244	0.0254
2.1509	0.0237
2.2223	0.0274
1.9654	0.0271
2.1085	0.0257
2.0332	0.0240
1.9262	0.0264
2.1139	0.0274
1.9025	0.0225
1.8628	0.0271
1.9501	0.0259
2.0231	0.0206
1.8622	0.0271
1.8137	0.0250
2.0399	0.0259
1.6392	0.0218
1.7221	0.0250
1.6019	0.0240

Computing associations: 92170it [00:03, 26708.78it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 1.049e+00	2.1535	0.5000
1.049e+00 < x <= 1.093e+00	2.0915	0.2250
1.093e+00 < x <= 1.139e+00	1.9857	0.1249
1.139e+00 < x <= 1.207e+00	1.8434	0.0750
1.207e+00 < x	1.7279	0.0750

X_dev distribution
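	target_mean	frequency
x <= 1.049e+00	2.1526	0.5029
1.049e+00 < x <= 1.093e+00	2.0582	0.2248
1.093e+00 < x <= 1.139e+00	1.9707	0.1235
1.139e+00 < x <= 1.207e+00	1.9057	0.0780
1.207e+00 < x	1.6558	0.0707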

--- [ContinuousCarver] Fit Quantitative('Population') (5/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 2.08e+02	1.9050	0.0251
2.08e+02 < x <= 3.53e+02	2.0277	0.0251
3.53e+02 < x <= 4.42e+02	2.0655	0.0250
4.42e+02 < x <= 5.12e+02	2.2067	0.0249
5.12e+02 < x <= 5.75e+02	2.1327	0.0250
5.75e+02 < x <= 6.27e+02	2.0731	0.0250
6.27e+02 < x <= 6.75e+02	2.3627	0.0249
6.75e+02 < x <= 7.16e+02	2.2006	0.0250
7.16e+02 < x <= 7.56e+02	2.0900	0.0253
7.56e+02 < x <= 7.94e+02	2.0191	0.0251
7.94e+02 < x <= 8.32e+02	2.3248	0.0251
8.32e+02 < x <= 8.67e+02	2.0763	0.0253
8.67e+02 < x <= 9.02e+02	2.0313	0.0247
9.02e+02 < x <= 9.40e+02	2.1185	0.0247
9.40e+02 < x <= 9.78e+02	2.1790	0.0253
9.78e+02 < x <= 1.02e+03	2.0746	0.0249
1.02e+03 < x <= 1.06e+03	1.9522	0.0247
1.06e+03 < x <= 1.09e+03	2.1186	0.0250
1.09e+03 < x <= 1.13e+03	2.0592	0.0252
1.13e+03 < x <= 1.17e+03	2.0640	0.0252
1.17e+03 < x <= 1.22e+03	2.0134	0.0249
1.22e+03 < x <= 1.26e+03	2.1690	0.0250
1.26e+03 < x <= 1.30e+03	2.0558	0.0248
1.30e+03 < x <= 1.35e+03	1.9711	0.0249
1.35e+03 < x <= 1.41e+03	2.0185	0.0250
1.41e+03 < x <= 1.46e+03	2.0004	0.0251
1.46e+03 < x <= 1.52e+03	2.0911	0.0248
1.52e+03 < x <= 1.59e+03	2.1322	0.0254
1.59e+03 < x <= 1.66e+03	1.9949	0.0246
1.66e+03 < x <= 1.73e+03	2.0233	0.0250
1.73e+03 < x <= 1.82e+03	1.8946	0.0253
1.82e+03 < x <= 1.91e+03	1.9504	0.0247
1.91e+03 < x <= 2.02e+03	2.0074	0.0250
2.02e+03 < x <= 2.16e+03	2.0213	0.0250
2.16e+03 < x <= 2.32e+03	2.0541	0.0250
2.32e+03 < x <= 2.56e+03	2.0757	0.0250
2.56e+03 < x <= 2.86e+03	2.0142	0.0250
2.86e+03 < x <= 3.28e+03	1.9196	0.0250
3.28e+03 < x <= 4.25e+03	2.0439	0.0250
4.25e+03 < x	2.0010	0.0250

X_dev distribution
target_mean	frequency
1.9895	0.0269
1.8189	0.0271
2.1479	0.0271
2.2434	0.0266
2.1281	0.0269
2.2908	0.0257
2.0926	0.0283
2.1757	0.0213
2.2182	0.0259
2.1433	0.0286
2.0769	0.0293
2.1889	0.0240
2.0488	0.0218
2.1585	0.0247
2.0699	0.0259
2.0396	0.0247
1.9843	0.0254
2.1062	0.0213
1.9823	0.0242
2.1353	0.0271
2.1132	0.0230
1.9696	0.0252
2.1243	0.0196
1.9774	0.0245
1.8002	0.0245
2.1500	0.0264
1.9471	0.0293
1.9535	0.0262
2.0915	0.0274
2.0390	0.0228
2.1380	0.0211
1.9706	0.0203
1.8717	0.0264
1.9082	0.0247
2.0895	0.0233
1.8131	0.0266
2.0019	0.0269
2.0234	0.0201
2.1558	0.0262
2.0339	0.0225

Computing associations: 92170it [00:03, 26163.59it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 3.53e+02	1.9663	0.0502
3.53e+02 < x <= 8.32e+02	2.1636	0.2253
8.32e+02 < x <= 1.73e+03	2.0604	0.4745
1.73e+03 < x <= 2.16e+03	1.9683	0.1000
2.16e+03 < x	2.0181	0.1500

X_dev distribution
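	target_mean	frequency
x <= 3.53e+02	1.9038	0.0540
3.53e+02 < x <= 8.32e+02	2.1659	0.2398
8.32e+02 < x <= 1.73e+03	2.0445	0.4680
1.73e+03 < x <= 2.16e+03	1.9639	0.0925
2.16e+03 < x	2.0169	0.1456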

--- [ContinuousCarver] Fit Quantitative('AveOccup') (6/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 1.699e+00	2.6141	0.0250
1.699e+00 < x <= 1.868e+00	2.7986	0.0250
1.868e+00 < x <= 1.976e+00	2.6979	0.0250
1.976e+00 < x <= 2.071e+00	2.5558	0.0250
2.071e+00 < x <= 2.161e+00	2.4582	0.0250
2.161e+00 < x <= 2.228e+00	2.2757	0.0250
2.228e+00 < x <= 2.288e+00	2.3592	0.0250
2.288e+00 < x <= 2.341e+00	2.2507	0.0250
2.341e+00 < x <= 2.388e+00	2.1371	0.0250
2.388e+00 < x <= 2.435e+00	2.2708	0.0250
2.435e+00 < x <= 2.475e+00	2.1989	0.0250
2.475e+00 < x <= 2.515e+00	2.1564	0.0250
2.515e+00 < x <= 2.557e+00	2.1279	0.0250
2.557e+00 < x <= 2.598e+00	2.2428	0.0250
2.598e+00 < x <= 2.639e+00	2.1116	0.0250
2.639e+00 < x <= 2.674e+00	2.2343	0.0250
2.674e+00 < x <= 2.712e+00	2.0489	0.0250
2.712e+00 < x <= 2.746e+00	2.2196	0.0250
2.746e+00 < x <= 2.784e+00	2.1211	0.0250
2.784e+00 < x <= 2.824e+00	2.2645	0.0250
2.824e+00 < x <= 2.861e+00	2.1565	0.0251
2.861e+00 < x <= 2.899e+00	2.2323	0.0250
2.899e+00 < x <= 2.943e+00	2.0714	0.0250
2.943e+00 < x <= 2.984e+00	2.0495	0.0250
2.984e+00 < x <= 3.026e+00	1.9917	0.0250
3.026e+00 < x <= 3.071e+00	1.9623	0.0250
3.071e+00 < x <= 3.117e+00	2.0491	0.0250
3.117e+00 < x <= 3.168e+00	1.9336	0.0250
3.168e+00 < x <= 3.221e+00	1.9472	0.0250
3.221e+00 < x <= 3.279e+00	1.8938	0.0250
3.279e+00 < x <= 3.344e+00	1.8804	0.0250
3.344e+00 < x <= 3.424e+00	1.8724	0.0250
3.424e+00 < x <= 3.508e+00	1.8000	0.0250
3.508e+00 < x <= 3.606e+00	1.6571	0.0250
3.606e+00 < x <= 3.719e+00	1.5624	0.0250
3.719e+00 < x <= 3.870e+00	1.5709	0.0250
3.870e+00 < x <= 4.089e+00	1.4854	0.0250
4.089e+00 < x <= 4.317e+00	1.4240	0.0250
4.317e+00 < x <= 4.705e+00	1.3233	0.0250
4.705e+00 < x	1.5280	0.0250

X_dev distribution
target_mean	frequency
2.7524	0.0220
2.7763	0.0293
2.6502	0.0257
2.5990	0.0242
2.4828	0.0296
2.4039	0.0247
2.2567	0.0281
2.4137	0.0230
2.3471	0.0211
2.2425	0.0300
2.0911	0.0252
2.2072	0.0259
2.1370	0.0262
2.0973	0.0281
2.0188	0.0230
2.0825	0.0225
2.2615	0.0247
2.0114	0.0213
2.2314	0.0257
2.0203	0.0233
2.0908	0.0286
1.8887	0.0233
1.9894	0.0250
2.2316	0.0228
2.0891	0.0291
1.9787	0.0223
2.0818	0.0279
1.8602	0.0203
1.9611	0.0189
1.7265	0.0230
1.7789	0.0259
1.8341	0.0274
1.6481	0.0211
1.6989	0.0247
1.6267	0.0271
1.5547	0.0250
1.4150	0.0293
1.5364	0.0220
1.4245	0.0262
1.5598	0.0266

Computing associations: 92170it [00:03, 26604.88it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 2.16e+00	2.6250	0.1250
2.16e+00 < x <= 2.90e+00	2.2005	0.4251
2.90e+00 < x <= 3.51e+00	1.9501	0.2749
3.51e+00 < x <= 3.87e+00	1.5968	0.0750
3.87e+00 < x	1.4402	0.1000

X_dev distribution
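	target_mean	frequency
x <= 2.16e+00	2.6484	0.1308
2.16e+00 < x <= 2.90e+00	2.1665	0.4247
2.90e+00 < x <= 3.51e+00	1.9311	0.2636
3.51e+00 < x <= 3.87e+00	1.6265	0.0768
3.87e+00 < x	1.4801	0.1042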

--- [ContinuousCarver] Fit Quantitative('Latitude') (7/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= 3.275e+01	1.5912	0.0287
3.275e+01 < x <= 3.284e+01	1.9471	0.0220
3.284e+01 < x <= 3.321e+01	2.1038	0.0246
3.321e+01 < x <= 3.365e+01	2.7833	0.0279
3.365e+01 < x <= 3.374e+01	2.4326	0.0268
3.374e+01 < x <= 3.379e+01	2.1829	0.0262
3.379e+01 < x <= 3.383e+01	2.4232	0.0229
3.383e+01 < x <= 3.387e+01	2.3003	0.0241
3.387e+01 < x <= 3.391e+01	2.1570	0.0279
3.391e+01 < x <= 3.394e+01	1.6300	0.0242
3.394e+01 < x <= 3.397e+01	1.8594	0.0225
3.397e+01 < x <= 3.400e+01	1.9482	0.0224
3.400e+01 < x <= 3.403e+01	2.1267	0.0277
3.403e+01 < x <= 3.406e+01	2.4021	0.0339
3.406e+01 < x <= 3.408e+01	2.2476	0.0214
3.408e+01 < x <= 3.410e+01	2.1003	0.0203
3.410e+01 < x <= 3.413e+01	2.3646	0.0242
3.413e+01 < x <= 3.417e+01	2.7771	0.0301
3.417e+01 < x <= 3.420e+01	2.5061	0.0174
3.420e+01 < x <= 3.427e+01	2.3463	0.0262
3.427e+01 < x <= 3.453e+01	2.4559	0.0240
3.453e+01 < x <= 3.532e+01	1.4914	0.0246
3.532e+01 < x <= 3.623e+01	0.9208	0.0250
3.623e+01 < x <= 3.672e+01	1.2441	0.0262
3.672e+01 < x <= 3.697e+01	1.3129	0.0253
3.697e+01 < x <= 3.729e+01	2.6241	0.0239
3.729e+01 < x <= 3.737e+01	2.6574	0.0258
3.737e+01 < x <= 3.753e+01	3.0105	0.0255
3.753e+01 < x <= 3.765e+01	2.4197	0.0243
3.765e+01 < x <= 3.772e+01	2.1174	0.0256
3.772e+01 < x <= 3.777e+01	2.5537	0.0286
3.777e+01 < x <= 3.781e+01	2.7647	0.0221
3.781e+01 < x <= 3.793e+01	2.6181	0.0238
3.793e+01 < x <= 3.800e+01	1.7622	0.0250
3.800e+01 < x <= 3.826e+01	1.5924	0.0243
3.826e+01 < x <= 3.850e+01	1.8570	0.0254
3.850e+01 < x <= 3.863e+01	1.3981	0.0241
3.863e+01 < x <= 3.898e+01	1.3962	0.0251
3.898e+01 < x <= 3.975e+01	1.1241	0.0255
3.975e+01 < x	0.8442	0.0244

X_dev distribution
target_mean	frequency
1.5761	0.0320
1.9445	0.0298
2.2318	0.0254
2.7115	0.0264
2.4368	0.0262
2.2910	0.0291
2.3528	0.0220
2.3233	0.0233
2.0937	0.0368
1.6319	0.0230
1.7992	0.0235
1.9408	0.0250
2.1292	0.0250
2.3261	0.0334
2.2713	0.0233
2.2817	0.0211
2.2228	0.0216
2.8224	0.0303
2.3178	0.0187
2.2778	0.0279
2.5025	0.0252
1.3719	0.0201
0.9336	0.0218
1.2516	0.0259
1.2597	0.0274
2.5507	0.0240
2.5351	0.0266
2.9827	0.0283
2.6519	0.0194
2.0869	0.0203
2.6145	0.0242
2.5272	0.0208
2.6246	0.0308
1.6630	0.0250
1.5156	0.0206
1.7549	0.0225
1.3101	0.0196
1.3997	0.0279
1.1114	0.0235
0.8671	0.0225

Computing associations: 92170it [00:03, 27314.34it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= 3.45e+01	2.2311	0.5254
3.45e+01 < x <= 3.70e+01	1.2415	0.1011
3.70e+01 < x <= 3.79e+01	2.5927	0.1997
3.79e+01 < x <= 3.85e+01	1.7393	0.0748
3.85e+01 < x	1.1907	0.0991

X_dev distribution
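	target_mean	frequency
x <= 3.45e+01	2.2111	0.5487
3.45e+01 < x <= 3.70e+01	1.2065	0.0952
3.70e+01 < x <= 3.79e+01	2.5902	0.1945
3.79e+01 < x <= 3.85e+01	1.6488	0.0681
3.85e+01 < x	1.1801	0.0935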

--- [ContinuousCarver] Fit Quantitative('Longitude') (8/8)
 [ContinuousCarver] Raw distribution

X distribution
	target_mean	frequency
x <= -1.2269e+02	1.4063	0.0259
-1.2269e+02 < x <= -1.2247e+02	2.8878	0.0259
-1.2247e+02 < x <= -1.2241e+02	3.2397	0.0245
-1.2241e+02 < x <= -1.2229e+02	2.1582	0.0262
-1.2229e+02 < x <= -1.2223e+02	2.3463	0.0260
-1.2223e+02 < x <= -1.2215e+02	2.2598	0.0216
-1.2215e+02 < x <= -1.2206e+02	2.5665	0.0263
-1.2206e+02 < x <= -1.2199e+02	2.6265	0.0253
-1.2199e+02 < x <= -1.2191e+02	2.6924	0.0237
-1.2191e+02 < x <= -1.2181e+02	2.2919	0.0255
-1.2181e+02 < x <= -1.2157e+02	1.7103	0.0242
-1.2157e+02 < x <= -1.2139e+02	1.1736	0.0252
-1.2139e+02 < x <= -1.2127e+02	1.3270	0.0263
-1.2127e+02 < x <= -1.2101e+02	1.4857	0.0238
-1.2101e+02 < x <= -1.2064e+02	1.4716	0.0245
-1.2064e+02 < x <= -1.2007e+02	1.3376	0.0254
-1.2007e+02 < x <= -1.1972e+02	1.2624	0.0258
-1.1972e+02 < x <= -1.1929e+02	1.3332	0.0239
-1.1929e+02 < x <= -1.1897e+02	1.3300	0.0250
-1.1897e+02 < x <= -1.1852e+02	2.7211	0.0258
-1.1852e+02 < x <= -1.1843e+02	3.1653	0.0284
-1.1843e+02 < x <= -1.1838e+02	3.4432	0.0238
-1.1838e+02 < x <= -1.1834e+02	2.7480	0.0249
-1.1834e+02 < x <= -1.1830e+02	2.3435	0.0271
-1.1830e+02 < x <= -1.1827e+02	1.8482	0.0207
-1.1827e+02 < x <= -1.1822e+02	1.6714	0.0273
-1.1822e+02 < x <= -1.1818e+02	1.8055	0.0227
-1.1818e+02 < x <= -1.1813e+02	2.1480	0.0287
-1.1813e+02 < x <= -1.1808e+02	2.2494	0.0243
-1.1808e+02 < x <= -1.1801e+02	2.4079	0.0245
-1.1801e+02 < x <= -1.1795e+02	2.1794	0.0252
-1.1795e+02 < x <= -1.1790e+02	2.2897	0.0216
-1.1790e+02 < x <= -1.1780e+02	2.4820	0.0266
-1.1780e+02 < x <= -1.1766e+02	2.2864	0.0248
-1.1766e+02 < x <= -1.1739e+02	1.6791	0.0237
-1.1739e+02 < x <= -1.1725e+02	1.6380	0.0290
-1.1725e+02 < x <= -1.1716e+02	2.0512	0.0229
-1.1716e+02 < x <= -1.1708e+02	1.5113	0.0249
-1.1708e+02 < x <= -1.1696e+02	1.6669	0.0235
-1.1696e+02 < x	1.1769	0.0245

X_dev distribution
target_mean	frequency
1.3927	0.0216
3.0129	0.0233
3.1899	0.0225
2.1911	0.0271
2.3576	0.0254
2.2342	0.0199
2.9862	0.0240
2.5471	0.0240
2.6969	0.0230
2.1464	0.0250
1.7105	0.0218
1.0959	0.0220
1.2918	0.0291
1.3781	0.0230
1.4767	0.0225
1.2441	0.0252
1.2810	0.0281
1.2813	0.0252
1.4223	0.0274
2.7081	0.0218
3.2548	0.0266
3.3604	0.0242
2.8064	0.0262
2.2395	0.0305
1.7551	0.0191
1.7695	0.0242
1.6175	0.0298
2.0881	0.0264
2.3487	0.0245
2.4322	0.0235
2.1831	0.0286
2.1875	0.0211
2.5202	0.0288
2.2701	0.0235
1.7464	0.0225
1.8748	0.0310
2.1466	0.0266
1.4479	0.0279
1.5746	0.0271
1.2465	0.0259

Computing associations: 92170it [00:03, 27465.39it/s]

 [ContinuousCarver] Carved distribution

X distribution
	target_mean	frequency
x <= -1.218e+02	2.4438	0.2509
-1.218e+02 < x <= -1.190e+02	1.3787	0.2242
-1.190e+02 < x <= -1.183e+02	3.0175	0.1029
-1.183e+02 < x <= -1.177e+02	2.1601	0.2735
-1.177e+02 < x	1.6155	0.1486

X_dev distribution
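	target_mean	frequency
x <= -1.218e+02	2.4780	0.2357
-1.218e+02 < x <= -1.190e+02	1.3487	0.2243
-1.190e+02 < x <= -1.183e+02	3.0414	0.0988
-1.183e+02 < x <= -1.177e+02	2.1328	0.2800
-1.177e+02 < x	1.6763	0.1611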

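Each carved table prints a bin's target mean and frequency on both X and X_dev. A quick reader-side sanity check is whether the dev sample ranks the bins the same way and how far the bin frequencies moved. The sketch below runs that check on the MedInc and Latitude numbers copied from the output above, summarising the frequency shift with the population stability index (PSI). PSI is a standard credit-scoring stability measure swapped in purely for illustration, and `same_ranking` / `psi` are hypothetical helpers. This is not AutoCarver's internal robustness test.

```python
import numpy as np

# Carved-bin statistics copied from the ContinuousCarver output above (X vs X_dev)
carved = {
    "MedInc": {
        "train_mean": [1.2093, 1.5796, 1.9560, 2.4238, 3.7524],
        "dev_mean": [1.2323, 1.5934, 1.9604, 2.4652, 3.6870],
        "train_freq": [0.2250, 0.1750, 0.2251, 0.2499, 0.1249],
        "dev_freq": [0.2275, 0.1747, 0.2425, 0.2372, 0.1182],
    },
    "Latitude": {
        "train_mean": [2.2311, 1.2415, 2.5927, 1.7393, 1.1907],
        "dev_mean": [2.2111, 1.2065, 2.5902, 1.6488, 1.1801],
        "train_freq": [0.5254, 0.1011, 0.1997, 0.0748, 0.0991],
        "dev_freq": [0.5487, 0.0952, 0.1945, 0.0681, 0.0935],
    },
}


def same_ranking(train_mean, dev_mean):
    """True if the bins are ordered identically by target mean on X and X_dev."""
    return np.array_equal(np.argsort(train_mean), np.argsort(dev_mean))


def psi(train_freq, dev_freq):
    """Population stability index between two bin-frequency vectors (0 = identical)."""
    p = np.asarray(train_freq, dtype=float)
    q = np.asarray(dev_freq, dtype=float)
    p, q = p / p.sum(), q / q.sum()  # renormalise: printed frequencies are rounded
    return float(np.sum((q - p) * np.log(q / p)))


for name, stats in carved.items():
    ranked = same_ranking(stats["train_mean"], stats["dev_mean"])
    stability = psi(stats["train_freq"], stats["dev_freq"])
    print(f"{name}: same bin ranking on dev = {ranked}, PSI = {stability:.4f}")
```

With these numbers both features keep their bin ranking on the dev sample, including Latitude's non-monotonic profile, and both PSI values stay well below 0.01.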
[8]:

	library	fit_s	transform_s	train_r2	test_r2	r2_drop
0	AutoCarver	33.499	0.0577	0.6633	0.6566	0.0067
1	optbinning	2.548	0.0086	0.5145	0.5077	0.0068
2	KBinsDiscretizer	0.007	0.0015	0.6181	0.6192	-0.0011
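The ``r2_drop`` column is ``train_r2 - test_r2``, and the fit times span several orders of magnitude. The small pandas sketch below uses the numbers copied from the table above. It checks the first point, up to the 4-decimal rounding of the printed scores, and expresses ``fit_s`` relative to the fastest library. ``fit_vs_fastest`` is a hypothetical column added for illustration.

```python
import pandas as pd

# Numbers copied from the California Housing results table above
results = pd.DataFrame(
    {
        "library": ["AutoCarver", "optbinning", "KBinsDiscretizer"],
        "fit_s": [33.499, 2.548, 0.007],
        "train_r2": [0.6633, 0.5145, 0.6181],
        "test_r2": [0.6566, 0.5077, 0.6192],
        "r2_drop": [0.0067, 0.0068, -0.0011],
    }
)

# r2_drop is train_r2 - test_r2; the scores are rounded, so allow a little slack
recomputed = results["train_r2"] - results["test_r2"]
assert ((recomputed - results["r2_drop"]).abs() < 1.5e-4).all()

# How many times slower each fit is than the fastest one
results["fit_vs_fastest"] = results["fit_s"] / results["fit_s"].min()
print(results[["library", "fit_vs_fastest", "test_r2", "r2_drop"]].to_string(index=False))
```

With these rounded inputs AutoCarver's fit is roughly 4,800 times slower than KBinsDiscretizer's, and optbinning's is about 360 times slower. Because 0.007 s is itself rounded, read the ratios as orders of magnitude.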

[9]:

plot_bars(regression_results, ['fit_s', 'test_r2', 'r2_drop'], 'California Housing \u2014 regression')

[Figure: California Housing, regression. Bar charts of fit_s, test_r2 and r2_drop per library.]

How to read these numbers

``fit_s`` / ``transform_s`` measure only the wall-clock time of the ``.fit`` / ``.transform`` calls: not data loading, not one-hot encoding, not the downstream model.
``test_auc`` / ``test_r2`` are the headline metrics. They reflect how well a simple linear downstream model (logistic regression / ridge) performs on each library’s binned output. A tree-based downstream model would tell a different, and less binning-sensitive, story.
``auc_drop`` / ``r2_drop`` are the train score minus the test score: a coarse proxy for how much each library’s bins overfit. Lower is more robust; a negative value, as for KBinsDiscretizer on California Housing, means the test score came out slightly above the train score. AutoCarver’s dev-set veto is designed to keep this gap small.
Same data, same seed and same downstream model across libraries, but a single run, on one machine, with one set of hyper-parameters. Treat the numbers as illustrative.
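The measurement protocol described in this list can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the notebook's own code, which is not shown in this section. `evaluate_binner` is a hypothetical name. The binner is assumed to expose scikit-learn style `fit(X, y)` / `transform(X)` returning one bin label per column. The example runs KBinsDiscretizer on synthetic `make_regression` data instead of California Housing so it needs no download. Only the `fit` and `transform` calls sit inside the timed regions, and here the transform of both splits is timed together; the notebook's exact choice is not shown.

```python
import time

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder


def evaluate_binner(binner, X_train, y_train, X_test, y_test):
    """Hypothetical helper: time fit/transform only, then score a ridge on one-hot bins."""
    start = time.perf_counter()
    binner.fit(X_train, y_train)  # KBinsDiscretizer ignores y; supervised binners use it
    fit_s = time.perf_counter() - start

    start = time.perf_counter()
    bins_train = binner.transform(X_train)
    bins_test = binner.transform(X_test)
    transform_s = time.perf_counter() - start

    # Not timed: one-hot encoding of the bin labels and the downstream linear model
    encoder = OneHotEncoder(handle_unknown="ignore")
    onehot_train = encoder.fit_transform(bins_train)
    onehot_test = encoder.transform(bins_test)
    model = Ridge(alpha=1.0).fit(onehot_train, y_train)

    train_r2 = r2_score(y_train, model.predict(onehot_train))
    test_r2 = r2_score(y_test, model.predict(onehot_test))
    return {
        "fit_s": fit_s,
        "transform_s": transform_s,
        "train_r2": train_r2,
        "test_r2": test_r2,
        "r2_drop": train_r2 - test_r2,  # > 0 means the pipeline scores worse on test
    }


X, y = make_regression(n_samples=2000, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

kbins = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
result = evaluate_binner(kbins, X_train, y_train, X_test, y_test)
print({key: round(value, 4) for key, value in result.items()})
```

Keeping encoding and the downstream model outside the timers is what makes ``fit_s`` / ``transform_s`` comparable across libraries whose output formats differ.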

When the results would change

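Different hyper-parameters. A bigger ``max_n_mod`` / smaller ``min_freq`` (or optbinning’s counterparts, e.g. a bigger ``max_n_bins`` / smaller ``min_bin_size``) tends to raise AutoCarver’s and optbinning’s in-sample scores at the cost of a larger ``*_drop``. KBinsDiscretizer never sees the target, so it is mostly insensitive to this trade-off.
Different downstream model. Gradient-boosted trees on the raw features typically beat a binning + linear pipeline on accuracy. The point of binning is interpretability, not raw accuracy.
Different dataset. German Credit is small; on a 10M-row credit-risk dataset, ``fit_s`` is likely to dominate the comparison. Even on California Housing’s 20,640 rows, AutoCarver’s fit takes about 33.5 s against 0.007 s for KBinsDiscretizer.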

See comparison.rst for the qualitative scope and algorithmic comparison.