Benchmark: AutoCarver vs. optbinning vs. KBinsDiscretizer
This notebook runs the three binning libraries side-by-side on two public datasets:
German Credit — binary classification, mixed numeric / categorical features, 1,000 rows.
California Housing — regression, all-numeric features, 20,640 rows.
For each library and dataset, we report:
``fit`` and ``transform`` wall-clock (seconds)
Downstream-model score — AUC for binary, R² for regression — using a linear model (logistic regression / ridge) on the one-hot-encoded bin output
``train`` → ``test`` score drop as a coarse proxy for drift sensitivity
All three libraries see the same train + dev data and are evaluated on the same held-out test. AutoCarver uses the dev sample for its built-in robustness veto; optbinning and KBinsDiscretizer don’t have a dev-set concept and so treat the union of train + dev as one pooled training set — which is the comparison practitioners actually run.
This is not an IV / Tschuprow’s T leaderboard. Those metrics structurally favour whichever library optimises them directly, so they cannot rank the libraries fairly. The downstream-model score is the metric a real scorecard team would use to pick a binner.
Numbers come from a single run on a single machine with a fixed seed; treat them as illustrative, not as authoritative benchmark figures. Re-run on your own data before drawing conclusions.
Setup
[1]:
import time
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing, fetch_openml
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import r2_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import KBinsDiscretizer
from AutoCarver import BinaryCarver, ContinuousCarver, Features
from AutoCarver.discretizers.utils.base_discretizer import DiscretizerConfig
try:
    from optbinning import ContinuousOptimalBinning, OptimalBinning
    HAS_OPTBINNING = True
except ImportError:
    HAS_OPTBINNING = False
    print('optbinning is not installed \u2014 its rows will be skipped.')
SEED = 42
warnings.filterwarnings('ignore')
plt.rcParams['figure.figsize'] = (10, 3.5)
[2]:
def one_hot(df, drop_first=True):
    """Treat every bin label as a categorical level and one-hot encode it.

    Lets a linear downstream model consume any of the three libraries' outputs
    uniformly, without us computing WoE per bin.
    """
    return pd.get_dummies(df.astype(str), drop_first=drop_first).astype(float)


def fit_eval_binary(X_train, X_test, y_train, y_test):
    Xtr = one_hot(X_train)
    # Encode test without dropping a level, then align to the train columns:
    # this way the reference level is always the one dropped on train.
    Xte = one_hot(X_test, drop_first=False).reindex(columns=Xtr.columns, fill_value=0.0)
    model = LogisticRegression(max_iter=1000, random_state=SEED).fit(Xtr, y_train)
    return {
        'train_auc': roc_auc_score(y_train, model.predict_proba(Xtr)[:, 1]),
        'test_auc': roc_auc_score(y_test, model.predict_proba(Xte)[:, 1]),
    }


def fit_eval_regression(X_train, X_test, y_train, y_test):
    Xtr = one_hot(X_train)
    # Same alignment as in fit_eval_binary.
    Xte = one_hot(X_test, drop_first=False).reindex(columns=Xtr.columns, fill_value=0.0)
    model = Ridge(random_state=SEED).fit(Xtr, y_train)
    return {
        'train_r2': r2_score(y_train, model.predict(Xtr)),
        'test_r2': r2_score(y_test, model.predict(Xte)),
    }


def plot_bars(results_df, score_cols, title):
    fig, axes = plt.subplots(1, len(score_cols), figsize=(4 * len(score_cols), 3.5))
    if len(score_cols) == 1:
        axes = [axes]
    for ax, col in zip(axes, score_cols):
        results_df.plot.bar(x='library', y=col, ax=ax, legend=False, color='#4C72B0')
        ax.set_title(col)
        ax.set_xlabel('')
        ax.tick_params(axis='x', rotation=0)
    fig.suptitle(title)
    fig.tight_layout()
    plt.show()
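The column-alignment step is easy to get subtly wrong, so here is a toy illustration, with made-up bin labels, of how `reindex` handles a bin label that appears only in the test data. It also shows why dropping the first level separately on each frame can be risky: the dropped level may differ between train and test. The frames and labels below are hypothetical.

```python
import pandas as pd


def one_hot(df, drop_first=True):
    """One-hot encode every column as a categorical level."""
    return pd.get_dummies(df.astype(str), drop_first=drop_first).astype(float)


# Hypothetical bin labels; "very_high" never occurs in train.
train = pd.DataFrame({"age_bin": ["low", "mid", "high", "mid"]})
test = pd.DataFrame({"age_bin": ["mid", "very_high"]})

Xtr = one_hot(train)  # reference level "high" is dropped

# Safer: encode test fully, then keep exactly the train columns.
Xte = one_hot(test, drop_first=False).reindex(columns=Xtr.columns, fill_value=0.0)

# Riskier: dropping the first level on test removes "mid" here, not "high",
# so the "mid" row silently becomes all zeros after alignment.
Xte_naive = one_hot(test).reindex(columns=Xtr.columns, fill_value=0.0)

print(Xte)
print(Xte_naive)
```

In the aligned frame, the unseen label `very_high` becomes an all-zero row, which is the same encoding as the reference level. That is a reasonable fallback for a linear model, but worth knowing about.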
[3]:
MAX_N_MOD = 5
MIN_FREQ = 0.05
def bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, kind):
    Carver = BinaryCarver if kind == 'binary' else ContinuousCarver
    features = Features(categoricals=categoricals, quantitatives=quantitatives)
    config = DiscretizerConfig(verbose=True)  # showing statistics
    carver = Carver(features=features, min_freq=MIN_FREQ, max_n_mod=MAX_N_MOD, config=config)
    t0 = time.perf_counter()
    X_tr = carver.fit_transform(X_train.copy(), y_train, X_dev=X_dev.copy(), y_dev=y_dev)
    fit_t = time.perf_counter() - t0
    X_dv = carver.transform(X_dev.copy())
    t1 = time.perf_counter()
    X_te = carver.transform(X_test.copy())
    transform_t = time.perf_counter() - t1
    return pd.concat([X_tr, X_dv]), X_te, fit_t, transform_t

def bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, kind):
    Cls = OptimalBinning if kind == 'binary' else ContinuousOptimalBinning
    X_all = pd.concat([X_train, X_dev])
    y_all = pd.concat([y_train, y_dev])
    binners = {}
    train_binned = pd.DataFrame(index=X_all.index)
    test_binned = pd.DataFrame(index=X_test.index)
    t0 = time.perf_counter()
    for col in X_all.columns:
        dtype = 'categorical' if col in categoricals else 'numerical'
        binner = Cls(name=col, dtype=dtype, min_prebin_size=MIN_FREQ/2, max_n_bins=MAX_N_MOD)
        binner.fit(X_all[col].to_numpy(), y_all.to_numpy())
        binners[col] = binner
        train_binned[col] = binner.transform(X_all[col].to_numpy(), metric='bins')
    fit_t = time.perf_counter() - t0
    t1 = time.perf_counter()
    for col, b in binners.items():
        test_binned[col] = b.transform(X_test[col].to_numpy(), metric='bins')
    transform_t = time.perf_counter() - t1
    return train_binned, test_binned, fit_t, transform_t

def bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives, n_bins=5):
    X_all = pd.concat([X_train, X_dev])
    # Impute with medians learned on the pooled training data only, so that no
    # test-set statistic leaks into preprocessing.
    medians = X_all[quantitatives].median()
    num_train = X_all[quantitatives].fillna(medians)
    num_test = X_test[quantitatives].fillna(medians)
    kbd = KBinsDiscretizer(n_bins=n_bins, encode='ordinal', strategy='quantile')
    t0 = time.perf_counter()
    binned_num_train = pd.DataFrame(
        kbd.fit_transform(num_train), columns=quantitatives, index=X_all.index
    )
    fit_t = time.perf_counter() - t0
    t1 = time.perf_counter()
    binned_num_test = pd.DataFrame(
        kbd.transform(num_test), columns=quantitatives, index=X_test.index
    )
    transform_t = time.perf_counter() - t1
    # KBins has no opinion on categoricals: pass them through as labels
    train = pd.concat([binned_num_train, X_all[categoricals].astype(str)], axis=1)
    test = pd.concat([binned_num_test, X_test[categoricals].astype(str)], axis=1)
    return train, test, fit_t, transform_t
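Unlike the two supervised binners, `KBinsDiscretizer` never sees the target: with `strategy='quantile'` it only places edges so that each bin holds roughly the same number of rows. A small sketch on a synthetic skewed feature (illustrative data, not from either benchmark dataset) makes that behaviour visible.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Skewed toy feature; values are an assumption for illustration.
rng = np.random.default_rng(0)
x = rng.lognormal(size=(1000, 1))

kbd = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
codes = kbd.fit_transform(x).ravel()  # ordinal bin index per row, as floats

edges = kbd.bin_edges_[0]                 # n_bins + 1 edges for the single feature
counts = np.bincount(codes.astype(int))   # rows per bin

print("bin edges:", np.round(edges, 3))
print("rows per bin:", counts)
```

The edges are unevenly spaced because the feature is skewed, while the row counts stay close to equal. No target information enters at any point, which is why this binner serves as the unsupervised baseline here.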
Binary classification — German Credit
20 features (numeric + categorical), 1,000 rows, target = class == 'bad'. Train / dev / test split = 60 / 20 / 20 %.
[4]:
credit = fetch_openml(data_id=31, as_frame=True)
df = credit.frame.copy()
y_binary = (df['class'] == 'bad').astype(int)
X_binary = df.drop(columns=['class'])
X_train, X_rest, y_train, y_rest = train_test_split(
    X_binary, y_binary, test_size=0.4, random_state=SEED, stratify=y_binary,
)
X_dev, X_test, y_dev, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=SEED, stratify=y_rest,
)
categoricals = [c for c in X_binary.columns if X_binary[c].dtype == object or isinstance(X_binary[c].dtype, pd.CategoricalDtype)]
quantitatives = [c for c in X_binary.columns if c not in categoricals]
print(f'train={len(X_train)}, dev={len(X_dev)}, test={len(X_test)}')
print(f'categoricals={len(categoricals)}, quantitatives={len(quantitatives)}')
print(f'bad rate (train)={y_train.mean():.3f}, (test)={y_test.mean():.3f}')
train=600, dev=200, test=200
categoricals=13, quantitatives=7
bad rate (train)=0.300, (test)=0.300
[5]:
y_train_full = pd.concat([y_train, y_dev])
runs = [(
    'AutoCarver',
    lambda: bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'binary'),
)]
if HAS_OPTBINNING:
    runs.append((
        'optbinning',
        lambda: bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'binary'),
    ))
runs.append((
    'KBinsDiscretizer',
    lambda: bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives),
))
rows = []
for name, run in runs:
    X_tr, X_te, fit_t, transform_t = run()
    scores = fit_eval_binary(X_tr, X_te, y_train_full, y_test)
    rows.append({
        'library': name,
        'fit_s': round(fit_t, 3),
        'transform_s': round(transform_t, 4),
        'train_auc': round(scores['train_auc'], 4),
        'test_auc': round(scores['test_auc'], 4),
        'auc_drop': round(scores['train_auc'] - scores['test_auc'], 4),
    })
binary_results = pd.DataFrame(rows)
binary_results
------
--- [QuantitativeDiscretizer] Fit Features(['duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
- [ContinuousDiscretizer] Fit Features(['duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
- [OrdinalDiscretizer] Fit Features(['duration', 'installment_commitment', 'residence_since', 'existing_credits', 'num_dependents'])
------
------
--- [QualitativeDiscretizer] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker'])
- [CategoricalDiscretizer] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker'])
------
---------
------ [BinaryCarver] Fit Features(['checking_status', 'credit_history', 'purpose', 'savings_status', 'employment', 'personal_status', 'other_parties', 'property_magnitude', 'other_payment_plans', 'housing', 'job', 'own_telephone', 'foreign_worker', 'duration', 'credit_amount', 'installment_commitment', 'residence_since', 'age', 'existing_credits', 'num_dependents'])
--- [BinaryCarver] Fit Categorical('checking_status') (1/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| no checking | 0.1317 | 0.4050 |
| >=200 | 0.2778 | 0.0600 |
| 0<=X<200 | 0.3896 | 0.2567 |
| <0 | 0.4671 | 0.2783 |
| target_mean | frequency |
|---|---|
| 0.0694 | 0.3600 |
| 0.0833 | 0.0600 |
| 0.3710 | 0.3100 |
| 0.5741 | 0.2700 |
Computing associations: 7it [00:00, 5284.40it/s]
Testing robustness : 0%| | 0/7 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| no checking, >=200 | 0.1505 | 0.4650 |
| 0<=X<200, <0 | 0.4299 | 0.5350 |
| target_mean | frequency |
|---|---|
| 0.0714 | 0.4200 |
| 0.4655 | 0.5800 |
--- [BinaryCarver] Fit Categorical('credit_history') (2/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| critical/other existing credit | 0.1676 | 0.2883 |
| existing paid | 0.3185 | 0.5233 |
| delayed previously | 0.3621 | 0.0967 |
| all paid | 0.5455 | 0.0550 |
| no credits/all paid | 0.5455 | 0.0367 |
| target_mean | frequency |
|---|---|
| 0.2241 | 0.2900 |
| 0.2703 | 0.5550 |
| 0.3571 | 0.0700 |
| 0.7273 | 0.0550 |
| 0.6667 | 0.0300 |
Computing associations: 15it [00:00, 10330.80it/s]
Testing robustness : 0%| | 0/15 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| critical/other existing credit | 0.1676 | 0.2883 |
| existing paid, delayed previously | 0.3253 | 0.6200 |
| all paid, no credits/all paid | 0.5455 | 0.0917 |
| target_mean | frequency |
|---|---|
| 0.2241 | 0.2900 |
| 0.2800 | 0.6250 |
| 0.7059 | 0.0850 |
--- [BinaryCarver] Fit Categorical('purpose') (3/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| used car | 0.1875 | 0.1067 |
| radio/tv | 0.2303 | 0.2750 |
| other, domestic appliance, retraining | 0.2632 | 0.0317 |
| furniture/equipment | 0.3333 | 0.1700 |
| new car | 0.3401 | 0.2450 |
| business | 0.3729 | 0.0983 |
| repairs | 0.3750 | 0.0267 |
| education | 0.4643 | 0.0467 |
| target_mean | frequency |
|---|---|
| 0.1250 | 0.0800 |
| 0.2295 | 0.3050 |
| 0.2727 | 0.0550 |
| 0.3235 | 0.1700 |
| 0.4222 | 0.2250 |
| 0.2778 | 0.0900 |
| 0.0000 | 0.0100 |
| 0.4615 | 0.0650 |
Computing associations: 98it [00:00, 96015.37it/s]
Testing robustness : 0%| | 0/98 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| used car, radio/tv, other, domestic appliance, ret... | 0.2218 | 0.4133 |
| new car, furniture/equipment, business, education,... | 0.3551 | 0.5867 |
| target_mean | frequency |
|---|---|
| 0.2159 | 0.4400 |
| 0.3661 | 0.5600 |
--- [BinaryCarver] Fit Categorical('savings_status') (4/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| >=1000 | 0.0667 | 0.0500 |
| 500<=X<1000 | 0.1622 | 0.0617 |
| no known savings | 0.1714 | 0.1750 |
| 100<=X<500 | 0.3333 | 0.1150 |
| <100 | 0.3649 | 0.5983 |
| target_mean | frequency |
|---|---|
| 0.3333 | 0.0300 |
| 0.1250 | 0.0800 |
| 0.1667 | 0.1800 |
| 0.3889 | 0.0900 |
| 0.3468 | 0.6200 |
Computing associations: 15it [00:00, ?it/s]
Testing robustness : 0%| | 0/15 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| no known savings, >=1000, 500<=X<1000 | 0.1512 | 0.2867 |
| <100, 100<=X<500 | 0.3598 | 0.7133 |
| target_mean | frequency |
|---|---|
| 0.1724 | 0.2900 |
| 0.3521 | 0.7100 |
--- [BinaryCarver] Fit Categorical('employment') (5/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| 4<=X<7 | 0.1935 | 0.1550 |
| >=7 | 0.2516 | 0.2650 |
| 1<=X<4 | 0.2911 | 0.3550 |
| <1 | 0.4272 | 0.1717 |
| unemployed | 0.5000 | 0.0533 |
| target_mean | frequency |
|---|---|
| 0.2632 | 0.1900 |
| 0.2600 | 0.2500 |
| 0.3621 | 0.2900 |
| 0.3333 | 0.1800 |
| 0.2222 | 0.0900 |
Computing associations: 15it [00:00, ?it/s]
Testing robustness : 60%|██████ | 9/15 [00:00<00:00, 220.01it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| >=7, 4<=X<7 | 0.2302 | 0.4200 |
| unemployed, 1<=X<4, <1 | 0.3506 | 0.5800 |
| target_mean | frequency |
|---|---|
| 0.2614 | 0.4400 |
| 0.3304 | 0.5600 |
--- [BinaryCarver] Fit Categorical('personal_status') (6/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| male single | 0.2679 | 0.5600 |
| male mar/wid | 0.2778 | 0.0900 |
| female div/dep/mar | 0.3559 | 0.2950 |
| male div/sep | 0.3636 | 0.0550 |
| target_mean | frequency |
|---|---|
| 0.2830 | 0.5300 |
| 0.2381 | 0.1050 |
| 0.3385 | 0.3250 |
| 0.3750 | 0.0400 |
Computing associations: 7it [00:00, 6363.27it/s]
Testing robustness : 0%| | 0/7 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| male single, male mar/wid | 0.2692 | 0.6500 |
| female div/dep/mar, male div/sep | 0.3571 | 0.3500 |
| target_mean | frequency |
|---|---|
| 0.2756 | 0.6350 |
| 0.3425 | 0.3650 |
--- [BinaryCarver] Fit Categorical('other_parties') (7/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| guarantor | 0.1786 | 0.0467 |
| none | 0.2996 | 0.9067 |
| co applicant | 0.4286 | 0.0467 |
| target_mean | frequency |
|---|---|
| 0.2500 | 0.0400 |
| 0.2989 | 0.9200 |
| 0.3750 | 0.0400 |
Computing associations: 3it [00:00, 3005.95it/s]
Testing robustness : 100%|██████████| 3/3 [00:00<00:00, 520.41it/s]
WARNING: No robust combination for Categorical('other_parties'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Categorical('property_magnitude') (8/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| real estate | 0.2130 | 0.2817 |
| life insurance | 0.3125 | 0.2133 |
| car | 0.3143 | 0.3500 |
| no known property | 0.4086 | 0.1550 |
| target_mean | frequency |
|---|---|
| 0.2182 | 0.2750 |
| 0.2600 | 0.2500 |
| 0.3281 | 0.3200 |
| 0.4516 | 0.1550 |
Computing associations: 7it [00:00, ?it/s]
Testing robustness : 0%| | 0/7 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| real estate | 0.2130 | 0.2817 |
| car, life insurance | 0.3136 | 0.5633 |
| no known property | 0.4086 | 0.1550 |
| target_mean | frequency |
|---|---|
| 0.2182 | 0.2750 |
| 0.2982 | 0.5700 |
| 0.4516 | 0.1550 |
--- [BinaryCarver] Fit Categorical('other_payment_plans') (9/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| none | 0.2619 | 0.8083 |
| stores | 0.4375 | 0.0533 |
| bank | 0.4699 | 0.1383 |
| target_mean | frequency |
|---|---|
| 0.2866 | 0.8200 |
| 0.4444 | 0.0450 |
| 0.3333 | 0.1350 |
Computing associations: 3it [00:00, 2997.36it/s]
Testing robustness : 0%| | 0/3 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| none | 0.2619 | 0.8083 |
| bank, stores | 0.4609 | 0.1917 |
| target_mean | frequency |
|---|---|
| 0.2866 | 0.8200 |
| 0.3611 | 0.1800 |
--- [BinaryCarver] Fit Categorical('housing') (10/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| own | 0.2558 | 0.7233 |
| for free | 0.3750 | 0.1067 |
| rent | 0.4412 | 0.1700 |
| target_mean | frequency |
|---|---|
| 0.2857 | 0.7350 |
| 0.4348 | 0.1150 |
| 0.2667 | 0.1500 |
Computing associations: 3it [00:00, ?it/s]
Testing robustness : 0%| | 0/3 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| own | 0.2558 | 0.7233 |
| for free, rent | 0.4157 | 0.2767 |
| target_mean | frequency |
|---|---|
| 0.2857 | 0.7350 |
| 0.3396 | 0.2650 |
--- [BinaryCarver] Fit Categorical('job') (11/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| skilled | 0.2898 | 0.6383 |
| unskilled resident | 0.2966 | 0.1967 |
| high qualif/self emp/mgmt | 0.3258 | 0.1483 |
| unemp/unskilled non res | 0.5000 | 0.0167 |
| target_mean | frequency |
|---|---|
| 0.2541 | 0.6100 |
| 0.3171 | 0.2050 |
| 0.4839 | 0.1550 |
| 0.1667 | 0.0300 |
Computing associations: 7it [00:00, ?it/s]
Testing robustness : 57%|█████▋ | 4/7 [00:00<00:00, 363.24it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| skilled, unskilled resident | 0.2914 | 0.8350 |
| high qualif/self emp/mgmt, unemp/unskilled non res | 0.3434 | 0.1650 |
| target_mean | frequency |
|---|---|
| 0.2699 | 0.8150 |
| 0.4324 | 0.1850 |
--- [BinaryCarver] Fit Categorical('own_telephone') (12/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| yes | 0.2645 | 0.4033 |
| none | 0.3240 | 0.5967 |
| target_mean | frequency |
|---|---|
| 0.3125 | 0.4000 |
| 0.2917 | 0.6000 |
Computing associations: 1it [00:00, ?it/s]
Testing robustness : 100%|██████████| 1/1 [00:00<00:00, 189.40it/s]
WARNING: No robust combination for Categorical('own_telephone'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Categorical('foreign_worker') (13/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| no | 0.0435 | 0.0383 |
| yes | 0.3102 | 0.9617 |
| target_mean | frequency |
|---|---|
| 0.3333 | 0.0300 |
| 0.2990 | 0.9700 |
Computing associations: 1it [00:00, ?it/s]
Testing robustness : 100%|██████████| 1/1 [00:00<00:00, 473.08it/s]
WARNING: No robust combination for Categorical('foreign_worker'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('duration') (14/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 8.00e+00 | 0.0980 | 0.0850 |
| 8.00e+00 < x <= 9.00e+00 | 0.2333 | 0.0500 |
| 9.00e+00 < x <= 1.10e+01 | 0.0870 | 0.0383 |
| 1.10e+01 < x <= 1.20e+01 | 0.2883 | 0.1850 |
| 1.20e+01 < x <= 1.50e+01 | 0.2273 | 0.0733 |
| 1.50e+01 < x <= 1.80e+01 | 0.3692 | 0.1083 |
| 1.80e+01 < x <= 2.20e+01 | 0.2381 | 0.0350 |
| 2.20e+01 < x <= 2.40e+01 | 0.3333 | 0.1950 |
| 2.40e+01 < x <= 2.80e+01 | 0.2222 | 0.0150 |
| 2.80e+01 < x <= 3.30e+01 | 0.3846 | 0.0433 |
| 3.30e+01 < x <= 3.60e+01 | 0.4727 | 0.0917 |
| 3.60e+01 < x <= 4.70e+01 | 0.2667 | 0.0250 |
| 4.70e+01 < x | 0.4242 | 0.0550 |
| target_mean | frequency |
|---|---|
| 0.1000 | 0.1000 |
| 0.3077 | 0.0650 |
| 0.0000 | 0.0400 |
| 0.2432 | 0.1850 |
| 0.0714 | 0.0700 |
| 0.3043 | 0.1150 |
| 0.4444 | 0.0450 |
| 0.3548 | 0.1550 |
| 0.7500 | 0.0200 |
| 0.4286 | 0.0350 |
| 0.3529 | 0.0850 |
| 0.6667 | 0.0150 |
| 0.5714 | 0.0700 |
Computing associations: 793it [00:00, 113615.13it/s]
Testing robustness : 0%| | 0/793 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.10e+01 | 0.1346 | 0.1733 |
| 1.10e+01 < x <= 2.80e+01 | 0.3052 | 0.6117 |
| 2.80e+01 < x | 0.4186 | 0.2150 |
| target_mean | frequency |
|---|---|
| 0.1463 | 0.2050 |
| 0.2966 | 0.5900 |
| 0.4634 | 0.2050 |
--- [BinaryCarver] Fit Quantitative('credit_amount') (15/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 6.18e+02 | 0.2000 | 0.0250 |
| 6.18e+02 < x <= 7.08e+02 | 0.4000 | 0.0250 |
| 7.08e+02 < x <= 7.97e+02 | 0.3333 | 0.0250 |
| 7.97e+02 < x <= 9.09e+02 | 0.4000 | 0.0250 |
| 9.09e+02 < x <= 1.03e+03 | 0.4000 | 0.0250 |
| 1.03e+03 < x <= 1.16e+03 | 0.2000 | 0.0250 |
| 1.16e+03 < x <= 1.21e+03 | 0.2667 | 0.0250 |
| 1.21e+03 < x <= 1.26e+03 | 0.2000 | 0.0250 |
| 1.26e+03 < x <= 1.31e+03 | 0.3333 | 0.0250 |
| 1.31e+03 < x <= 1.37e+03 | 0.4667 | 0.0250 |
| 1.37e+03 < x <= 1.41e+03 | 0.1250 | 0.0267 |
| 1.41e+03 < x <= 1.47e+03 | 0.1429 | 0.0233 |
| 1.47e+03 < x <= 1.53e+03 | 0.2667 | 0.0250 |
| 1.53e+03 < x <= 1.60e+03 | 0.2000 | 0.0250 |
| 1.60e+03 < x <= 1.82e+03 | 0.2000 | 0.0250 |
| 1.82e+03 < x <= 1.92e+03 | 0.5000 | 0.0267 |
| 1.92e+03 < x <= 1.98e+03 | 0.2857 | 0.0233 |
| 1.98e+03 < x <= 2.12e+03 | 0.3333 | 0.0250 |
| 2.12e+03 < x <= 2.21e+03 | 0.2667 | 0.0250 |
| 2.21e+03 < x <= 2.30e+03 | 0.2667 | 0.0250 |
| 2.30e+03 < x <= 2.38e+03 | 0.2000 | 0.0250 |
| 2.38e+03 < x <= 2.48e+03 | 0.4000 | 0.0250 |
| 2.48e+03 < x <= 2.62e+03 | 0.2667 | 0.0250 |
| 2.62e+03 < x <= 2.75e+03 | 0.3333 | 0.0250 |
| 2.75e+03 < x <= 2.92e+03 | 0.2000 | 0.0250 |
| 2.92e+03 < x <= 3.07e+03 | 0.2000 | 0.0250 |
| 3.07e+03 < x <= 3.35e+03 | 0.4000 | 0.0250 |
| 3.35e+03 < x <= 3.51e+03 | 0.1333 | 0.0250 |
| 3.51e+03 < x <= 3.63e+03 | 0.1333 | 0.0250 |
| 3.63e+03 < x <= 3.91e+03 | 0.0667 | 0.0250 |
| 3.91e+03 < x <= 4.24e+03 | 0.4667 | 0.0250 |
| 4.24e+03 < x <= 4.66e+03 | 0.4000 | 0.0250 |
| 4.66e+03 < x <= 5.08e+03 | 0.4667 | 0.0250 |
| 5.08e+03 < x <= 5.80e+03 | 0.2000 | 0.0250 |
| 5.80e+03 < x <= 6.36e+03 | 0.2667 | 0.0250 |
| 6.36e+03 < x <= 6.85e+03 | 0.4667 | 0.0250 |
| 6.85e+03 < x <= 7.48e+03 | 0.2000 | 0.0250 |
| 7.48e+03 < x <= 8.23e+03 | 0.4667 | 0.0250 |
| 8.23e+03 < x <= 9.57e+03 | 0.4000 | 0.0250 |
| 9.57e+03 < x | 0.5333 | 0.0250 |
| target_mean | frequency |
|---|---|
| 0.2000 | 0.0250 |
| 0.5000 | 0.0200 |
| 0.5000 | 0.0300 |
| 0.0000 | 0.0100 |
| 0.3333 | 0.0300 |
| 0.1429 | 0.0350 |
| 0.5000 | 0.0100 |
| 0.3333 | 0.0600 |
| 0.0000 | 0.0100 |
| 0.2857 | 0.0350 |
| 0.0000 | 0.0150 |
| 0.3333 | 0.0300 |
| 0.2500 | 0.0200 |
| 0.0000 | 0.0150 |
| 0.3333 | 0.0300 |
| 0.2857 | 0.0350 |
| 0.2500 | 0.0200 |
| 0.0000 | 0.0400 |
| 0.5000 | 0.0100 |
| 0.5000 | 0.0100 |
| 0.0000 | 0.0150 |
| 0.0000 | 0.0050 |
| 0.6667 | 0.0150 |
| 0.0000 | 0.0200 |
| 0.0000 | 0.0200 |
| 0.3333 | 0.0150 |
| 0.2000 | 0.0500 |
| 0.5000 | 0.0400 |
| 0.0000 | 0.0300 |
| 0.1000 | 0.0500 |
| 0.2500 | 0.0200 |
| 0.8000 | 0.0250 |
| 0.3333 | 0.0150 |
| 0.4000 | 0.0250 |
| 0.2857 | 0.0350 |
| 0.0000 | 0.0200 |
| 0.6667 | 0.0150 |
| 0.6667 | 0.0150 |
| 0.6667 | 0.0150 |
| 0.6154 | 0.0650 |
Computing associations: 92170it [00:03, 25717.85it/s]
Testing robustness : 0%| | 0/92170 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.35e+03 | 0.2889 | 0.6750 |
| 3.35e+03 < x <= 3.91e+03 | 0.1111 | 0.0750 |
| 3.91e+03 < x | 0.3867 | 0.2500 |
| target_mean | frequency |
|---|---|
| 0.2460 | 0.6300 |
| 0.2083 | 0.1200 |
| 0.4800 | 0.2500 |
--- [BinaryCarver] Fit Quantitative('installment_commitment') (16/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.00e+00 | 0.2436 | 0.1300 |
| 1.00e+00 < x <= 2.00e+00 | 0.2606 | 0.2367 |
| 2.00e+00 < x <= 3.00e+00 | 0.2979 | 0.1567 |
| 3.00e+00 < x | 0.3357 | 0.4767 |
| target_mean | frequency |
|---|---|
| 0.1071 | 0.1400 |
| 0.2667 | 0.2250 |
| 0.2414 | 0.1450 |
| 0.3878 | 0.4900 |
Computing associations: 7it [00:00, ?it/s]
Testing robustness : 0%| | 0/7 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.0e+00 | 0.2545 | 0.3667 |
| 2.0e+00 < x | 0.3263 | 0.6333 |
| target_mean | frequency |
|---|---|
| 0.2055 | 0.3650 |
| 0.3543 | 0.6350 |
--- [BinaryCarver] Fit Quantitative('residence_since') (17/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.00e+00 | 0.3117 | 0.1283 |
| 1.00e+00 < x <= 2.00e+00 | 0.2905 | 0.2983 |
| 2.00e+00 < x <= 3.00e+00 | 0.3000 | 0.1667 |
| 3.00e+00 < x | 0.3033 | 0.4067 |
| target_mean | frequency |
|---|---|
| 0.2174 | 0.1150 |
| 0.3529 | 0.3400 |
| 0.3333 | 0.1500 |
| 0.2658 | 0.3950 |
Computing associations: 7it [00:00, ?it/s]
Testing robustness : 100%|██████████| 7/7 [00:00<00:00, 187.60it/s]
WARNING: No robust combination for Quantitative('residence_since'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('age') (18/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.10e+01 | 0.4000 | 0.0250 |
| 2.10e+01 < x <= 2.20e+01 | 0.3684 | 0.0317 |
| 2.20e+01 < x <= 2.30e+01 | 0.4500 | 0.0333 |
| 2.30e+01 < x <= 2.40e+01 | 0.3333 | 0.0350 |
| 2.40e+01 < x <= 2.50e+01 | 0.5161 | 0.0517 |
| 2.50e+01 < x <= 2.60e+01 | 0.2500 | 0.0467 |
| 2.60e+01 < x <= 2.70e+01 | 0.2258 | 0.0517 |
| 2.70e+01 < x <= 2.80e+01 | 0.4091 | 0.0367 |
| 2.80e+01 < x <= 2.90e+01 | 0.3913 | 0.0383 |
| 2.90e+01 < x <= 3.00e+01 | 0.2143 | 0.0467 |
| 3.00e+01 < x <= 3.10e+01 | 0.2308 | 0.0433 |
| 3.10e+01 < x <= 3.20e+01 | 0.2500 | 0.0333 |
| 3.20e+01 < x <= 3.30e+01 | 0.3636 | 0.0367 |
| 3.30e+01 < x <= 3.40e+01 | 0.3636 | 0.0367 |
| 3.40e+01 < x <= 3.50e+01 | 0.1724 | 0.0483 |
| 3.50e+01 < x <= 3.60e+01 | 0.2083 | 0.0400 |
| 3.60e+01 < x <= 3.70e+01 | 0.3333 | 0.0250 |
| 3.70e+01 < x <= 3.80e+01 | 0.1875 | 0.0267 |
| 3.80e+01 < x <= 3.90e+01 | 0.2941 | 0.0283 |
| 3.90e+01 < x <= 4.10e+01 | 0.3182 | 0.0367 |
| 4.10e+01 < x <= 4.20e+01 | 0.2727 | 0.0183 |
| 4.20e+01 < x <= 4.40e+01 | 0.1905 | 0.0350 |
| 4.40e+01 < x <= 4.60e+01 | 0.2632 | 0.0317 |
| 4.60e+01 < x <= 4.70e+01 | 0.4000 | 0.0167 |
| 4.70e+01 < x <= 4.90e+01 | 0.1429 | 0.0233 |
| 4.90e+01 < x <= 5.10e+01 | 0.1429 | 0.0233 |
| 5.10e+01 < x <= 5.40e+01 | 0.2941 | 0.0283 |
| 5.40e+01 < x <= 5.70e+01 | 0.3333 | 0.0200 |
| 5.70e+01 < x <= 6.30e+01 | 0.4375 | 0.0267 |
| 6.30e+01 < x | 0.2667 | 0.0250 |
| target_mean | frequency |
|---|---|
| 0.3333 | 0.0300 |
| 0.5000 | 0.0200 |
| 0.3333 | 0.0750 |
| 0.6364 | 0.0550 |
| 0.3333 | 0.0150 |
| 0.3333 | 0.0600 |
| 0.1538 | 0.0650 |
| 0.1429 | 0.0350 |
| 0.4000 | 0.0250 |
| 0.5000 | 0.0500 |
| 0.3333 | 0.0300 |
| 0.2000 | 0.0250 |
| 0.3750 | 0.0400 |
| 0.3333 | 0.0150 |
| 0.2500 | 0.0200 |
| 0.1429 | 0.0350 |
| 0.2500 | 0.0400 |
| 0.2500 | 0.0200 |
| 0.0000 | 0.0050 |
| 0.2308 | 0.0650 |
| 0.6000 | 0.0250 |
| 0.3333 | 0.0300 |
| 0.1250 | 0.0400 |
| 0.0000 | 0.0200 |
| 0.2000 | 0.0250 |
| 0.5000 | 0.0100 |
| 0.6000 | 0.0250 |
| 0.2500 | 0.0200 |
| 0.2500 | 0.0400 |
| 0.0000 | 0.0400 |
Computing associations: 27840it [00:00, 36613.59it/s]
Testing robustness : 0%| | 0/27840 [00:00<?, ?it/s]
[BinaryCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.5e+01 | 0.4245 | 0.1767 |
| 2.5e+01 < x | 0.2733 | 0.8233 |
| target_mean | frequency |
|---|---|
| 0.4359 | 0.1950 |
| 0.2671 | 0.8050 |
--- [BinaryCarver] Fit Quantitative('existing_credits') (19/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.00e+00 | 0.3061 | 0.6317 |
| 1.00e+00 < x <= 2.00e+00 | 0.2899 | 0.3450 |
| 2.00e+00 < x | 0.2857 | 0.0233 |
| target_mean | frequency |
|---|---|
| 0.3000 | 0.6500 |
| 0.3016 | 0.3150 |
| 0.2857 | 0.0350 |
Computing associations: 3it [00:00, ?it/s]
Testing robustness : 100%|██████████| 3/3 [00:00<00:00, 489.53it/s]
WARNING: No robust combination for Quantitative('existing_credits'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
--- [BinaryCarver] Fit Quantitative('num_dependents') (20/20)
[BinaryCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.0e+00 | 0.2984 | 0.8433 |
| 1.0e+00 < x | 0.3085 | 0.1567 |
| target_mean | frequency |
|---|---|
| 0.3000 | 0.8500 |
| 0.3000 | 0.1500 |
Computing associations: 1it [00:00, ?it/s]
Testing robustness : 100%|██████████| 1/1 [00:00<00:00, 224.23it/s]
WARNING: No robust combination for Quantitative('num_dependents'). Consider increasing the size of X_dev or dropping the feature (X not representative of X_dev for this feature).
[5]:
| | library | fit_s | transform_s | train_auc | test_auc | auc_drop |
|---|---|---|---|---|---|---|
| 0 | AutoCarver | 6.196 | 0.0126 | 0.8321 | 0.7874 | 0.0447 |
| 1 | optbinning | 1.150 | 0.0131 | 0.8523 | 0.7931 | 0.0592 |
| 2 | KBinsDiscretizer | 0.003 | 0.0010 | 0.8401 | 0.7943 | 0.0458 |
[6]:
plot_bars(binary_results, ['fit_s', 'test_auc', 'auc_drop'], 'German Credit \u2014 binary classification')
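In this run, AutoCarver found no robust combination for 6 of the 20 features (other_parties, own_telephone, foreign_worker, residence_since, existing_credits, num_dependents) and dropped them because their bins were not stable on the dev set.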
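To give a feel for what "not stable on the dev set" can mean, the sketch below applies one simple consistency test to target rates copied from the verbose log above: bins must sort into the same order by target rate on train and on dev, with no ties. This is a simplified illustration under that assumption, not AutoCarver's actual veto, and the helper name `same_rank_order` is hypothetical.

```python
import pandas as pd


def same_rank_order(train_rates, dev_rates):
    """True if bins sort identically by target rate on both samples, with no ties."""
    tr = pd.Series(train_rates)
    dv = pd.Series(dev_rates).reindex(tr.index)
    if tr.duplicated().any() or dv.duplicated().any():
        return False  # a tie gives no strict ordering to compare
    return list(tr.sort_values().index) == list(dv.sort_values().index)


# (train rates, dev rates) per bin, copied from the verbose log above.
own_telephone = (
    {"yes": 0.2645, "none": 0.3240},
    {"yes": 0.3125, "none": 0.2917},
)
housing_carved = (
    {"own": 0.2558, "for free, rent": 0.4157},
    {"own": 0.2857, "for free, rent": 0.3396},
)
num_dependents = (
    {"x <= 1": 0.2984, "1 < x": 0.3085},
    {"x <= 1": 0.3000, "1 < x": 0.3000},
)

for name, (tr, dv) in {
    "own_telephone": own_telephone,
    "housing (carved)": housing_carved,
    "num_dependents": num_dependents,
}.items():
    print(f"{name}: consistent ordering = {same_rank_order(tr, dv)}")
```

The ordering flips for own_telephone and ties for num_dependents, both of which the log flags, while the carved housing bins keep their order. This check does not account for every warning, though: other_parties keeps its raw ordering on dev and is still rejected, so the library evidently applies further criteria.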
Regression — California Housing
8 numeric features (six demographic features plus Latitude and Longitude, which are kept), 20,640 rows, target = median house value. Same 60 / 20 / 20 split.
[7]:
housing = fetch_california_housing(as_frame=True)
X_reg = housing.frame.drop(columns=['MedHouseVal'])
y_reg = housing.frame['MedHouseVal']
X_train, X_rest, y_train, y_rest = train_test_split(X_reg, y_reg, test_size=0.4, random_state=SEED)
X_dev, X_test, y_dev, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=SEED)
quantitatives = list(X_reg.columns)
categoricals = []
print(f'train={len(X_train)}, dev={len(X_dev)}, test={len(X_test)}')
print(f'quantitatives={len(quantitatives)} ({quantitatives})')
train=12384, dev=4128, test=4128
quantitatives=8 (['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
[8]:
y_train_full = pd.concat([y_train, y_dev])
runs = [(
    'AutoCarver',
    lambda: bin_with_autocarver(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'continuous'),
)]
if HAS_OPTBINNING:
    runs.append((
        'optbinning',
        lambda: bin_with_optbinning(X_train, y_train, X_dev, y_dev, X_test, categoricals, quantitatives, 'continuous'),
    ))
runs.append((
    'KBinsDiscretizer',
    lambda: bin_with_kbins(X_train, X_dev, X_test, categoricals, quantitatives),
))
rows = []
for name, run in runs:
    X_tr, X_te, fit_t, transform_t = run()
    scores = fit_eval_regression(X_tr, X_te, y_train_full, y_test)
    rows.append({
        'library': name,
        'fit_s': round(fit_t, 3),
        'transform_s': round(transform_t, 4),
        'train_r2': round(scores['train_r2'], 4),
        'test_r2': round(scores['test_r2'], 4),
        'r2_drop': round(scores['train_r2'] - scores['test_r2'], 4),
    })
regression_results = pd.DataFrame(rows)
regression_results
------
--- [QuantitativeDiscretizer] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
- [ContinuousDiscretizer] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
- [OrdinalDiscretizer] Fit Features(['HouseAge'])
------
---------
------ [ContinuousCarver] Fit Features(['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude'])
--- [ContinuousCarver] Fit Quantitative('MedInc') (1/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.335e+00 | 1.1984 | 0.0250 |
| 1.335e+00 < x <= 1.593e+00 | 1.0105 | 0.0250 |
| 1.593e+00 < x <= 1.740e+00 | 1.1133 | 0.0250 |
| 1.740e+00 < x <= 1.906e+00 | 1.1535 | 0.0252 |
| 1.906e+00 < x <= 2.029e+00 | 1.2090 | 0.0248 |
| 2.029e+00 < x <= 2.152e+00 | 1.2141 | 0.0251 |
| 2.152e+00 < x <= 2.243e+00 | 1.2417 | 0.0250 |
| 2.243e+00 < x <= 2.350e+00 | 1.3827 | 0.0249 |
| 2.350e+00 < x <= 2.468e+00 | 1.3614 | 0.0250 |
| 2.468e+00 < x <= 2.569e+00 | 1.4190 | 0.0250 |
| 2.569e+00 < x <= 2.655e+00 | 1.5264 | 0.0250 |
| 2.655e+00 < x <= 2.737e+00 | 1.5428 | 0.0250 |
| 2.737e+00 < x <= 2.862e+00 | 1.5708 | 0.0250 |
| 2.862e+00 < x <= 2.974e+00 | 1.6630 | 0.0250 |
| 2.974e+00 < x <= 3.054e+00 | 1.6270 | 0.0250 |
| 3.054e+00 < x <= 3.135e+00 | 1.7079 | 0.0250 |
| 3.135e+00 < x <= 3.216e+00 | 1.8554 | 0.0250 |
| 3.216e+00 < x <= 3.315e+00 | 1.8373 | 0.0250 |
| 3.315e+00 < x <= 3.423e+00 | 1.9121 | 0.0250 |
| 3.423e+00 < x <= 3.531e+00 | 1.9162 | 0.0251 |
| 3.531e+00 < x <= 3.633e+00 | 1.9678 | 0.0250 |
| 3.633e+00 < x <= 3.723e+00 | 2.0226 | 0.0250 |
| 3.723e+00 < x <= 3.839e+00 | 1.9891 | 0.0251 |
| 3.839e+00 < x <= 3.971e+00 | 2.0493 | 0.0249 |
| 3.971e+00 < x <= 4.073e+00 | 2.0538 | 0.0252 |
| 4.073e+00 < x <= 4.179e+00 | 2.2004 | 0.0249 |
| 4.179e+00 < x <= 4.315e+00 | 2.2417 | 0.0250 |
| 4.315e+00 < x <= 4.464e+00 | 2.2394 | 0.0250 |
| 4.464e+00 < x <= 4.611e+00 | 2.2577 | 0.0252 |
| 4.611e+00 < x <= 4.757e+00 | 2.4351 | 0.0248 |
| 4.757e+00 < x <= 4.946e+00 | 2.3482 | 0.0250 |
| 4.946e+00 < x <= 5.117e+00 | 2.4592 | 0.0250 |
| 5.117e+00 < x <= 5.308e+00 | 2.5784 | 0.0250 |
| 5.308e+00 < x <= 5.538e+00 | 2.6892 | 0.0250 |
| 5.538e+00 < x <= 5.828e+00 | 2.7867 | 0.0251 |
| 5.828e+00 < x <= 6.148e+00 | 3.0943 | 0.0249 |
| 6.148e+00 < x <= 6.599e+00 | 3.3031 | 0.0250 |
| 6.599e+00 < x <= 7.313e+00 | 3.6064 | 0.0250 |
| 7.313e+00 < x <= 8.433e+00 | 4.0191 | 0.0250 |
| 8.433e+00 < x | 4.7343 | 0.0250 |
| target_mean | frequency |
|---|---|
| 1.2507 | 0.0247 |
| 1.0319 | 0.0262 |
| 1.1587 | 0.0257 |
| 1.0855 | 0.0252 |
| 1.2523 | 0.0225 |
| 1.2606 | 0.0293 |
| 1.2643 | 0.0208 |
| 1.3335 | 0.0274 |
| 1.4528 | 0.0257 |
| 1.4887 | 0.0305 |
| 1.5142 | 0.0237 |
| 1.6485 | 0.0208 |
| 1.5544 | 0.0293 |
| 1.6189 | 0.0257 |
| 1.7433 | 0.0233 |
| 1.6369 | 0.0213 |
| 1.7802 | 0.0276 |
| 1.9721 | 0.0283 |
| 1.8287 | 0.0279 |
| 1.8295 | 0.0242 |
| 1.9907 | 0.0300 |
| 1.9517 | 0.0216 |
| 2.0220 | 0.0269 |
| 2.1509 | 0.0269 |
| 2.0977 | 0.0291 |
| 2.2054 | 0.0225 |
| 2.2979 | 0.0274 |
| 2.3553 | 0.0274 |
| 2.2924 | 0.0184 |
| 2.4401 | 0.0213 |
| 2.2931 | 0.0250 |
| 2.4940 | 0.0237 |
| 2.6133 | 0.0250 |
| 2.7177 | 0.0189 |
| 2.9110 | 0.0276 |
| 3.0729 | 0.0213 |
| 3.0759 | 0.0271 |
| 3.5985 | 0.0228 |
| 4.0385 | 0.0206 |
| 4.6131 | 0.0264 |
Computing associations: 92170it [00:03, 27184.56it/s]
Testing robustness : 0%| | 0/92170 [00:00<?, ?it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.47e+00 | 1.2093 | 0.2250 |
| 2.47e+00 < x <= 3.13e+00 | 1.5796 | 0.1750 |
| 3.13e+00 < x <= 4.07e+00 | 1.9560 | 0.2251 |
| 4.07e+00 < x <= 5.83e+00 | 2.4238 | 0.2499 |
| 5.83e+00 < x | 3.7524 | 0.1249 |
| target_mean | frequency |
|---|---|
| 1.2323 | 0.2275 |
| 1.5934 | 0.1747 |
| 1.9604 | 0.2425 |
| 2.4652 | 0.2372 |
| 3.6870 | 0.1182 |
--- [ContinuousCarver] Fit Quantitative('HouseAge') (2/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 5.00e+00 | 2.2358 | 0.0271 |
| 5.00e+00 < x <= 8.00e+00 | 1.9727 | 0.0263 |
| 8.00e+00 < x <= 1.10e+01 | 1.8133 | 0.0352 |
| 1.10e+01 < x <= 1.30e+01 | 1.8358 | 0.0267 |
| 1.30e+01 < x <= 1.40e+01 | 1.8778 | 0.0200 |
| 1.40e+01 < x <= 1.60e+01 | 1.9355 | 0.0652 |
| 1.60e+01 < x <= 1.70e+01 | 1.8929 | 0.0319 |
| 1.70e+01 < x <= 1.80e+01 | 1.9455 | 0.0276 |
| 1.80e+01 < x <= 2.00e+01 | 1.9470 | 0.0470 |
| 2.00e+01 < x <= 2.10e+01 | 1.9630 | 0.0217 |
| 2.10e+01 < x <= 2.20e+01 | 2.0661 | 0.0195 |
| 2.20e+01 < x <= 2.30e+01 | 1.9593 | 0.0220 |
| 2.30e+01 < x <= 2.50e+01 | 2.1713 | 0.0480 |
| 2.50e+01 < x <= 2.60e+01 | 2.0937 | 0.0304 |
| 2.60e+01 < x <= 2.70e+01 | 2.0568 | 0.0245 |
| 2.70e+01 < x <= 2.80e+01 | 1.9827 | 0.0241 |
| 2.80e+01 < x <= 2.90e+01 | 2.0203 | 0.0232 |
| 2.90e+01 < x <= 3.00e+01 | 2.0515 | 0.0236 |
| 3.00e+01 < x <= 3.20e+01 | 2.0453 | 0.0484 |
| 3.20e+01 < x <= 3.30e+01 | 2.0343 | 0.0316 |
| 3.30e+01 < x <= 3.40e+01 | 2.1357 | 0.0320 |
| 3.40e+01 < x <= 3.50e+01 | 2.0004 | 0.0399 |
| 3.50e+01 < x <= 3.60e+01 | 2.1148 | 0.0437 |
| 3.60e+01 < x <= 3.70e+01 | 2.0004 | 0.0257 |
| 3.70e+01 < x <= 3.90e+01 | 2.0133 | 0.0355 |
| 3.90e+01 < x <= 4.10e+01 | 2.0306 | 0.0273 |
| 4.10e+01 < x <= 4.20e+01 | 1.9889 | 0.0167 |
| 4.20e+01 < x <= 4.40e+01 | 2.0742 | 0.0351 |
| 4.40e+01 < x <= 4.50e+01 | 2.2977 | 0.0132 |
| 4.50e+01 < x <= 4.70e+01 | 1.9517 | 0.0211 |
| 4.70e+01 < x | 2.5848 | 0.0857 |
| target_mean | frequency |
|---|---|
| 2.0720 | 0.0245 |
| 1.9201 | 0.0269 |
| 1.9054 | 0.0344 |
| 1.8736 | 0.0216 |
| 1.8410 | 0.0196 |
| 1.8826 | 0.0606 |
| 1.8592 | 0.0375 |
| 1.8799 | 0.0283 |
| 1.8746 | 0.0436 |
| 1.9849 | 0.0206 |
| 2.2181 | 0.0170 |
| 2.1550 | 0.0201 |
| 2.0847 | 0.0579 |
| 2.0778 | 0.0296 |
| 2.1784 | 0.0216 |
| 2.2242 | 0.0208 |
| 1.7802 | 0.0213 |
| 1.7629 | 0.0233 |
| 2.0493 | 0.0504 |
| 1.9343 | 0.0259 |
| 2.0837 | 0.0349 |
| 2.1957 | 0.0417 |
| 2.0157 | 0.0431 |
| 2.2006 | 0.0296 |
| 2.0026 | 0.0351 |
| 1.9461 | 0.0305 |
| 1.9196 | 0.0194 |
| 2.0117 | 0.0312 |
| 2.1310 | 0.0155 |
| 2.0515 | 0.0225 |
| 2.5968 | 0.0911 |
Computing associations: 31930it [00:00, 33725.96it/s]
Testing robustness : 1%| | 310/31930 [00:00<00:54, 584.35it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.30e+01 | 1.9466 | 0.3703 |
| 2.30e+01 < x <= 2.60e+01 | 2.1412 | 0.0785 |
| 2.60e+01 < x <= 3.60e+01 | 2.0526 | 0.2909 |
| 3.60e+01 < x <= 4.70e+01 | 2.0381 | 0.1747 |
| 4.70e+01 < x | 2.5848 | 0.0857 |
| target_mean | frequency |
|---|---|
| 1.9316 | 0.3547 |
| 2.0824 | 0.0875 |
| 2.0383 | 0.2829 |
| 2.0347 | 0.1839 |
| 2.5968 | 0.0911 |
--- [ContinuousCarver] Fit Quantitative('AveRooms') (3/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.066e+00 | 1.9506 | 0.0250 |
| 3.066e+00 < x <= 3.432e+00 | 1.8880 | 0.0250 |
| 3.432e+00 < x <= 3.647e+00 | 1.8233 | 0.0250 |
| 3.647e+00 < x <= 3.792e+00 | 1.8292 | 0.0250 |
| 3.792e+00 < x <= 3.933e+00 | 1.7847 | 0.0250 |
| 3.933e+00 < x <= 4.052e+00 | 1.8499 | 0.0250 |
| 4.052e+00 < x <= 4.168e+00 | 1.8718 | 0.0250 |
| 4.168e+00 < x <= 4.276e+00 | 1.8333 | 0.0250 |
| 4.276e+00 < x <= 4.365e+00 | 1.7965 | 0.0250 |
| 4.365e+00 < x <= 4.454e+00 | 1.6952 | 0.0250 |
| 4.454e+00 < x <= 4.536e+00 | 1.7535 | 0.0250 |
| 4.536e+00 < x <= 4.621e+00 | 1.7952 | 0.0250 |
| 4.621e+00 < x <= 4.705e+00 | 1.8465 | 0.0250 |
| 4.705e+00 < x <= 4.794e+00 | 1.7486 | 0.0250 |
| 4.794e+00 < x <= 4.874e+00 | 1.7719 | 0.0250 |
| 4.874e+00 < x <= 4.941e+00 | 1.7219 | 0.0251 |
| 4.941e+00 < x <= 5.014e+00 | 1.7176 | 0.0249 |
| 5.014e+00 < x <= 5.088e+00 | 1.7707 | 0.0250 |
| 5.088e+00 < x <= 5.160e+00 | 1.7918 | 0.0250 |
| 5.160e+00 < x <= 5.233e+00 | 1.7791 | 0.0250 |
| 5.233e+00 < x <= 5.315e+00 | 1.8209 | 0.0250 |
| 5.315e+00 < x <= 5.384e+00 | 1.9107 | 0.0250 |
| 5.384e+00 < x <= 5.460e+00 | 1.7728 | 0.0250 |
| 5.460e+00 < x <= 5.532e+00 | 1.8996 | 0.0250 |
| 5.532e+00 < x <= 5.616e+00 | 1.8872 | 0.0250 |
| 5.616e+00 < x <= 5.694e+00 | 1.9905 | 0.0250 |
| 5.694e+00 < x <= 5.778e+00 | 2.0029 | 0.0250 |
| 5.778e+00 < x <= 5.858e+00 | 2.0107 | 0.0250 |
| 5.858e+00 < x <= 5.959e+00 | 2.1137 | 0.0250 |
| 5.959e+00 < x <= 6.059e+00 | 2.0469 | 0.0250 |
| 6.059e+00 < x <= 6.157e+00 | 2.1450 | 0.0250 |
| 6.157e+00 < x <= 6.270e+00 | 2.2477 | 0.0250 |
| 6.270e+00 < x <= 6.396e+00 | 2.3495 | 0.0250 |
| 6.396e+00 < x <= 6.543e+00 | 2.4232 | 0.0250 |
| 6.543e+00 < x <= 6.717e+00 | 2.6241 | 0.0250 |
| 6.717e+00 < x <= 6.946e+00 | 2.7573 | 0.0250 |
| 6.946e+00 < x <= 7.233e+00 | 3.0763 | 0.0250 |
| 7.233e+00 < x <= 7.637e+00 | 3.1118 | 0.0250 |
| 7.637e+00 < x <= 8.324e+00 | 3.5846 | 0.0250 |
| 8.324e+00 < x | 2.7391 | 0.0250 |
| target_mean | frequency |
|---|---|
| 2.0908 | 0.0233 |
| 1.8579 | 0.0264 |
| 2.0031 | 0.0242 |
| 1.8060 | 0.0274 |
| 1.8137 | 0.0240 |
| 1.7725 | 0.0211 |
| 1.7723 | 0.0283 |
| 1.7839 | 0.0247 |
| 1.7902 | 0.0286 |
| 1.8121 | 0.0264 |
| 1.6265 | 0.0264 |
| 1.8349 | 0.0276 |
| 1.8339 | 0.0247 |
| 1.7725 | 0.0342 |
| 1.8188 | 0.0254 |
| 1.8480 | 0.0191 |
| 1.8333 | 0.0235 |
| 1.8191 | 0.0266 |
| 1.7419 | 0.0266 |
| 1.7642 | 0.0220 |
| 1.7645 | 0.0303 |
| 1.7917 | 0.0266 |
| 1.8651 | 0.0262 |
| 1.8645 | 0.0274 |
| 1.8082 | 0.0286 |
| 1.8483 | 0.0177 |
| 2.0778 | 0.0240 |
| 2.0005 | 0.0187 |
| 1.9724 | 0.0291 |
| 2.2623 | 0.0235 |
| 2.0818 | 0.0230 |
| 2.2889 | 0.0250 |
| 2.3280 | 0.0213 |
| 2.5373 | 0.0254 |
| 2.6787 | 0.0201 |
| 2.7457 | 0.0211 |
| 3.0108 | 0.0303 |
| 3.1596 | 0.0233 |
| 3.4340 | 0.0235 |
| 2.7568 | 0.0245 |
Computing associations: 92170it [00:03, 28430.03it/s]
Testing robustness : 0%| | 227/92170 [00:00<03:45, 407.92it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.65e+00 | 1.8874 | 0.0750 |
| 3.65e+00 < x <= 5.62e+00 | 1.8022 | 0.5500 |
| 5.62e+00 < x <= 6.16e+00 | 2.0516 | 0.1500 |
| 6.16e+00 < x <= 6.54e+00 | 2.3401 | 0.0750 |
| 6.54e+00 < x | 2.9823 | 0.1500 |
| target_mean | frequency |
|---|---|
| 1.9788 | 0.0739 |
| 1.7962 | 0.5758 |
| 2.0474 | 0.1359 |
| 2.3886 | 0.0717 |
| 2.9752 | 0.1427 |
--- [ContinuousCarver] Fit Quantitative('AveBedrms') (4/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 9.1220e-01 | 2.0511 | 0.0250 |
| 9.1220e-01 < x <= 9.4022e-01 | 2.1264 | 0.0250 |
| 9.4022e-01 < x <= 9.5595e-01 | 2.0638 | 0.0250 |
| 9.5595e-01 < x <= 9.6743e-01 | 2.0756 | 0.0251 |
| 9.6743e-01 < x <= 9.7590e-01 | 2.2562 | 0.0249 |
| 9.7590e-01 < x <= 9.8343e-01 | 2.1709 | 0.0250 |
| 9.8343e-01 < x <= 9.8987e-01 | 2.1450 | 0.0250 |
| 9.8987e-01 < x <= 9.9592e-01 | 2.1772 | 0.0250 |
| 9.9592e-01 < x <= 1.0019e+00 | 2.1915 | 0.0251 |
| 1.0019e+00 < x <= 1.0068e+00 | 2.0949 | 0.0249 |
| 1.0068e+00 < x <= 1.0112e+00 | 2.2440 | 0.0250 |
| 1.0112e+00 < x <= 1.0156e+00 | 2.1687 | 0.0250 |
| 1.0156e+00 < x <= 1.0204e+00 | 2.1723 | 0.0250 |
| 1.0204e+00 < x <= 1.0250e+00 | 2.2003 | 0.0254 |
| 1.0250e+00 < x <= 1.0290e+00 | 2.1324 | 0.0246 |
| 1.0290e+00 < x <= 1.0331e+00 | 2.1840 | 0.0250 |
| 1.0331e+00 < x <= 1.0369e+00 | 2.0321 | 0.0250 |
| 1.0369e+00 < x <= 1.0412e+00 | 2.1746 | 0.0250 |
| 1.0412e+00 < x <= 1.0453e+00 | 2.2536 | 0.0250 |
| 1.0453e+00 < x <= 1.0493e+00 | 2.1546 | 0.0250 |
| 1.0493e+00 < x <= 1.0534e+00 | 2.0738 | 0.0251 |
| 1.0534e+00 < x <= 1.0574e+00 | 2.1224 | 0.0249 |
| 1.0574e+00 < x <= 1.0615e+00 | 2.0414 | 0.0250 |
| 1.0615e+00 < x <= 1.0662e+00 | 2.1569 | 0.0251 |
| 1.0662e+00 < x <= 1.0712e+00 | 2.0972 | 0.0250 |
| 1.0712e+00 < x <= 1.0763e+00 | 2.0714 | 0.0249 |
| 1.0763e+00 < x <= 1.0816e+00 | 2.0244 | 0.0250 |
| 1.0816e+00 < x <= 1.0874e+00 | 2.0135 | 0.0252 |
| 1.0874e+00 < x <= 1.0933e+00 | 2.2239 | 0.0249 |
| 1.0933e+00 < x <= 1.1000e+00 | 2.0244 | 0.0262 |
| 1.1000e+00 < x <= 1.1071e+00 | 2.0077 | 0.0242 |
| 1.1071e+00 < x <= 1.1160e+00 | 1.9564 | 0.0245 |
| 1.1160e+00 < x <= 1.1267e+00 | 2.0077 | 0.0250 |
| 1.1267e+00 < x <= 1.1387e+00 | 1.9305 | 0.0250 |
| 1.1387e+00 < x <= 1.1538e+00 | 1.8130 | 0.0258 |
| 1.1538e+00 < x <= 1.1739e+00 | 1.8060 | 0.0242 |
| 1.1739e+00 < x <= 1.2074e+00 | 1.9109 | 0.0250 |
| 1.2074e+00 < x <= 1.2730e+00 | 1.8950 | 0.0250 |
| 1.2730e+00 < x <= 1.5018e+00 | 1.7962 | 0.0250 |
| 1.5018e+00 < x | 1.4931 | 0.0250 |
| target_mean | frequency |
|---|---|
| 1.7961 | 0.0252 |
| 2.0098 | 0.0298 |
| 2.3039 | 0.0257 |
| 2.2390 | 0.0262 |
| 2.3293 | 0.0240 |
| 1.9318 | 0.0194 |
| 2.1575 | 0.0199 |
| 2.1740 | 0.0291 |
| 2.2207 | 0.0337 |
| 2.1811 | 0.0233 |
| 2.0475 | 0.0262 |
| 2.2743 | 0.0218 |
| 2.2627 | 0.0293 |
| 2.1068 | 0.0247 |
| 2.4459 | 0.0228 |
| 2.1280 | 0.0269 |
| 2.1193 | 0.0240 |
| 2.2280 | 0.0259 |
| 2.0336 | 0.0237 |
| 2.0195 | 0.0216 |
| 1.9898 | 0.0235 |
| 2.2270 | 0.0216 |
| 1.9244 | 0.0254 |
| 2.1509 | 0.0237 |
| 2.2223 | 0.0274 |
| 1.9654 | 0.0271 |
| 2.1085 | 0.0257 |
| 2.0332 | 0.0240 |
| 1.9262 | 0.0264 |
| 2.1139 | 0.0274 |
| 1.9025 | 0.0225 |
| 1.8628 | 0.0271 |
| 1.9501 | 0.0259 |
| 2.0231 | 0.0206 |
| 1.8622 | 0.0271 |
| 1.8137 | 0.0250 |
| 2.0399 | 0.0259 |
| 1.6392 | 0.0218 |
| 1.7221 | 0.0250 |
| 1.6019 | 0.0240 |
Computing associations: 92170it [00:03, 26708.78it/s]
Testing robustness : 2%|▏ | 1722/92170 [00:02<02:08, 706.46it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.049e+00 | 2.1535 | 0.5000 |
| 1.049e+00 < x <= 1.093e+00 | 2.0915 | 0.2250 |
| 1.093e+00 < x <= 1.139e+00 | 1.9857 | 0.1249 |
| 1.139e+00 < x <= 1.207e+00 | 1.8434 | 0.0750 |
| 1.207e+00 < x | 1.7279 | 0.0750 |
| target_mean | frequency |
|---|---|
| 2.1526 | 0.5029 |
| 2.0582 | 0.2248 |
| 1.9707 | 0.1235 |
| 1.9057 | 0.0780 |
| 1.6558 | 0.0707 |
--- [ContinuousCarver] Fit Quantitative('Population') (5/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.08e+02 | 1.9050 | 0.0251 |
| 2.08e+02 < x <= 3.53e+02 | 2.0277 | 0.0251 |
| 3.53e+02 < x <= 4.42e+02 | 2.0655 | 0.0250 |
| 4.42e+02 < x <= 5.12e+02 | 2.2067 | 0.0249 |
| 5.12e+02 < x <= 5.75e+02 | 2.1327 | 0.0250 |
| 5.75e+02 < x <= 6.27e+02 | 2.0731 | 0.0250 |
| 6.27e+02 < x <= 6.75e+02 | 2.3627 | 0.0249 |
| 6.75e+02 < x <= 7.16e+02 | 2.2006 | 0.0250 |
| 7.16e+02 < x <= 7.56e+02 | 2.0900 | 0.0253 |
| 7.56e+02 < x <= 7.94e+02 | 2.0191 | 0.0251 |
| 7.94e+02 < x <= 8.32e+02 | 2.3248 | 0.0251 |
| 8.32e+02 < x <= 8.67e+02 | 2.0763 | 0.0253 |
| 8.67e+02 < x <= 9.02e+02 | 2.0313 | 0.0247 |
| 9.02e+02 < x <= 9.40e+02 | 2.1185 | 0.0247 |
| 9.40e+02 < x <= 9.78e+02 | 2.1790 | 0.0253 |
| 9.78e+02 < x <= 1.02e+03 | 2.0746 | 0.0249 |
| 1.02e+03 < x <= 1.06e+03 | 1.9522 | 0.0247 |
| 1.06e+03 < x <= 1.09e+03 | 2.1186 | 0.0250 |
| 1.09e+03 < x <= 1.13e+03 | 2.0592 | 0.0252 |
| 1.13e+03 < x <= 1.17e+03 | 2.0640 | 0.0252 |
| 1.17e+03 < x <= 1.22e+03 | 2.0134 | 0.0249 |
| 1.22e+03 < x <= 1.26e+03 | 2.1690 | 0.0250 |
| 1.26e+03 < x <= 1.30e+03 | 2.0558 | 0.0248 |
| 1.30e+03 < x <= 1.35e+03 | 1.9711 | 0.0249 |
| 1.35e+03 < x <= 1.41e+03 | 2.0185 | 0.0250 |
| 1.41e+03 < x <= 1.46e+03 | 2.0004 | 0.0251 |
| 1.46e+03 < x <= 1.52e+03 | 2.0911 | 0.0248 |
| 1.52e+03 < x <= 1.59e+03 | 2.1322 | 0.0254 |
| 1.59e+03 < x <= 1.66e+03 | 1.9949 | 0.0246 |
| 1.66e+03 < x <= 1.73e+03 | 2.0233 | 0.0250 |
| 1.73e+03 < x <= 1.82e+03 | 1.8946 | 0.0253 |
| 1.82e+03 < x <= 1.91e+03 | 1.9504 | 0.0247 |
| 1.91e+03 < x <= 2.02e+03 | 2.0074 | 0.0250 |
| 2.02e+03 < x <= 2.16e+03 | 2.0213 | 0.0250 |
| 2.16e+03 < x <= 2.32e+03 | 2.0541 | 0.0250 |
| 2.32e+03 < x <= 2.56e+03 | 2.0757 | 0.0250 |
| 2.56e+03 < x <= 2.86e+03 | 2.0142 | 0.0250 |
| 2.86e+03 < x <= 3.28e+03 | 1.9196 | 0.0250 |
| 3.28e+03 < x <= 4.25e+03 | 2.0439 | 0.0250 |
| 4.25e+03 < x | 2.0010 | 0.0250 |
| target_mean | frequency |
|---|---|
| 1.9895 | 0.0269 |
| 1.8189 | 0.0271 |
| 2.1479 | 0.0271 |
| 2.2434 | 0.0266 |
| 2.1281 | 0.0269 |
| 2.2908 | 0.0257 |
| 2.0926 | 0.0283 |
| 2.1757 | 0.0213 |
| 2.2182 | 0.0259 |
| 2.1433 | 0.0286 |
| 2.0769 | 0.0293 |
| 2.1889 | 0.0240 |
| 2.0488 | 0.0218 |
| 2.1585 | 0.0247 |
| 2.0699 | 0.0259 |
| 2.0396 | 0.0247 |
| 1.9843 | 0.0254 |
| 2.1062 | 0.0213 |
| 1.9823 | 0.0242 |
| 2.1353 | 0.0271 |
| 2.1132 | 0.0230 |
| 1.9696 | 0.0252 |
| 2.1243 | 0.0196 |
| 1.9774 | 0.0245 |
| 1.8002 | 0.0245 |
| 2.1500 | 0.0264 |
| 1.9471 | 0.0293 |
| 1.9535 | 0.0262 |
| 2.0915 | 0.0274 |
| 2.0390 | 0.0228 |
| 2.1380 | 0.0211 |
| 1.9706 | 0.0203 |
| 1.8717 | 0.0264 |
| 1.9082 | 0.0247 |
| 2.0895 | 0.0233 |
| 1.8131 | 0.0266 |
| 2.0019 | 0.0269 |
| 2.0234 | 0.0201 |
| 2.1558 | 0.0262 |
| 2.0339 | 0.0225 |
Computing associations: 92170it [00:03, 26163.59it/s]
Testing robustness : 1%| | 753/92170 [00:00<01:43, 885.21it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.53e+02 | 1.9663 | 0.0502 |
| 3.53e+02 < x <= 8.32e+02 | 2.1636 | 0.2253 |
| 8.32e+02 < x <= 1.73e+03 | 2.0604 | 0.4745 |
| 1.73e+03 < x <= 2.16e+03 | 1.9683 | 0.1000 |
| 2.16e+03 < x | 2.0181 | 0.1500 |
| target_mean | frequency |
|---|---|
| 1.9038 | 0.0540 |
| 2.1659 | 0.2398 |
| 2.0445 | 0.4680 |
| 1.9639 | 0.0925 |
| 2.0169 | 0.1456 |
--- [ContinuousCarver] Fit Quantitative('AveOccup') (6/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 1.699e+00 | 2.6141 | 0.0250 |
| 1.699e+00 < x <= 1.868e+00 | 2.7986 | 0.0250 |
| 1.868e+00 < x <= 1.976e+00 | 2.6979 | 0.0250 |
| 1.976e+00 < x <= 2.071e+00 | 2.5558 | 0.0250 |
| 2.071e+00 < x <= 2.161e+00 | 2.4582 | 0.0250 |
| 2.161e+00 < x <= 2.228e+00 | 2.2757 | 0.0250 |
| 2.228e+00 < x <= 2.288e+00 | 2.3592 | 0.0250 |
| 2.288e+00 < x <= 2.341e+00 | 2.2507 | 0.0250 |
| 2.341e+00 < x <= 2.388e+00 | 2.1371 | 0.0250 |
| 2.388e+00 < x <= 2.435e+00 | 2.2708 | 0.0250 |
| 2.435e+00 < x <= 2.475e+00 | 2.1989 | 0.0250 |
| 2.475e+00 < x <= 2.515e+00 | 2.1564 | 0.0250 |
| 2.515e+00 < x <= 2.557e+00 | 2.1279 | 0.0250 |
| 2.557e+00 < x <= 2.598e+00 | 2.2428 | 0.0250 |
| 2.598e+00 < x <= 2.639e+00 | 2.1116 | 0.0250 |
| 2.639e+00 < x <= 2.674e+00 | 2.2343 | 0.0250 |
| 2.674e+00 < x <= 2.712e+00 | 2.0489 | 0.0250 |
| 2.712e+00 < x <= 2.746e+00 | 2.2196 | 0.0250 |
| 2.746e+00 < x <= 2.784e+00 | 2.1211 | 0.0250 |
| 2.784e+00 < x <= 2.824e+00 | 2.2645 | 0.0250 |
| 2.824e+00 < x <= 2.861e+00 | 2.1565 | 0.0251 |
| 2.861e+00 < x <= 2.899e+00 | 2.2323 | 0.0250 |
| 2.899e+00 < x <= 2.943e+00 | 2.0714 | 0.0250 |
| 2.943e+00 < x <= 2.984e+00 | 2.0495 | 0.0250 |
| 2.984e+00 < x <= 3.026e+00 | 1.9917 | 0.0250 |
| 3.026e+00 < x <= 3.071e+00 | 1.9623 | 0.0250 |
| 3.071e+00 < x <= 3.117e+00 | 2.0491 | 0.0250 |
| 3.117e+00 < x <= 3.168e+00 | 1.9336 | 0.0250 |
| 3.168e+00 < x <= 3.221e+00 | 1.9472 | 0.0250 |
| 3.221e+00 < x <= 3.279e+00 | 1.8938 | 0.0250 |
| 3.279e+00 < x <= 3.344e+00 | 1.8804 | 0.0250 |
| 3.344e+00 < x <= 3.424e+00 | 1.8724 | 0.0250 |
| 3.424e+00 < x <= 3.508e+00 | 1.8000 | 0.0250 |
| 3.508e+00 < x <= 3.606e+00 | 1.6571 | 0.0250 |
| 3.606e+00 < x <= 3.719e+00 | 1.5624 | 0.0250 |
| 3.719e+00 < x <= 3.870e+00 | 1.5709 | 0.0250 |
| 3.870e+00 < x <= 4.089e+00 | 1.4854 | 0.0250 |
| 4.089e+00 < x <= 4.317e+00 | 1.4240 | 0.0250 |
| 4.317e+00 < x <= 4.705e+00 | 1.3233 | 0.0250 |
| 4.705e+00 < x | 1.5280 | 0.0250 |
| target_mean | frequency |
|---|---|
| 2.7524 | 0.0220 |
| 2.7763 | 0.0293 |
| 2.6502 | 0.0257 |
| 2.5990 | 0.0242 |
| 2.4828 | 0.0296 |
| 2.4039 | 0.0247 |
| 2.2567 | 0.0281 |
| 2.4137 | 0.0230 |
| 2.3471 | 0.0211 |
| 2.2425 | 0.0300 |
| 2.0911 | 0.0252 |
| 2.2072 | 0.0259 |
| 2.1370 | 0.0262 |
| 2.0973 | 0.0281 |
| 2.0188 | 0.0230 |
| 2.0825 | 0.0225 |
| 2.2615 | 0.0247 |
| 2.0114 | 0.0213 |
| 2.2314 | 0.0257 |
| 2.0203 | 0.0233 |
| 2.0908 | 0.0286 |
| 1.8887 | 0.0233 |
| 1.9894 | 0.0250 |
| 2.2316 | 0.0228 |
| 2.0891 | 0.0291 |
| 1.9787 | 0.0223 |
| 2.0818 | 0.0279 |
| 1.8602 | 0.0203 |
| 1.9611 | 0.0189 |
| 1.7265 | 0.0230 |
| 1.7789 | 0.0259 |
| 1.8341 | 0.0274 |
| 1.6481 | 0.0211 |
| 1.6989 | 0.0247 |
| 1.6267 | 0.0271 |
| 1.5547 | 0.0250 |
| 1.4150 | 0.0293 |
| 1.5364 | 0.0220 |
| 1.4245 | 0.0262 |
| 1.5598 | 0.0266 |
Computing associations: 92170it [00:03, 26604.88it/s]
Testing robustness : 0%| | 0/92170 [00:00<?, ?it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 2.16e+00 | 2.6250 | 0.1250 |
| 2.16e+00 < x <= 2.90e+00 | 2.2005 | 0.4251 |
| 2.90e+00 < x <= 3.51e+00 | 1.9501 | 0.2749 |
| 3.51e+00 < x <= 3.87e+00 | 1.5968 | 0.0750 |
| 3.87e+00 < x | 1.4402 | 0.1000 |
| target_mean | frequency |
|---|---|
| 2.6484 | 0.1308 |
| 2.1665 | 0.4247 |
| 1.9311 | 0.2636 |
| 1.6265 | 0.0768 |
| 1.4801 | 0.1042 |
--- [ContinuousCarver] Fit Quantitative('Latitude') (7/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.275e+01 | 1.5912 | 0.0287 |
| 3.275e+01 < x <= 3.284e+01 | 1.9471 | 0.0220 |
| 3.284e+01 < x <= 3.321e+01 | 2.1038 | 0.0246 |
| 3.321e+01 < x <= 3.365e+01 | 2.7833 | 0.0279 |
| 3.365e+01 < x <= 3.374e+01 | 2.4326 | 0.0268 |
| 3.374e+01 < x <= 3.379e+01 | 2.1829 | 0.0262 |
| 3.379e+01 < x <= 3.383e+01 | 2.4232 | 0.0229 |
| 3.383e+01 < x <= 3.387e+01 | 2.3003 | 0.0241 |
| 3.387e+01 < x <= 3.391e+01 | 2.1570 | 0.0279 |
| 3.391e+01 < x <= 3.394e+01 | 1.6300 | 0.0242 |
| 3.394e+01 < x <= 3.397e+01 | 1.8594 | 0.0225 |
| 3.397e+01 < x <= 3.400e+01 | 1.9482 | 0.0224 |
| 3.400e+01 < x <= 3.403e+01 | 2.1267 | 0.0277 |
| 3.403e+01 < x <= 3.406e+01 | 2.4021 | 0.0339 |
| 3.406e+01 < x <= 3.408e+01 | 2.2476 | 0.0214 |
| 3.408e+01 < x <= 3.410e+01 | 2.1003 | 0.0203 |
| 3.410e+01 < x <= 3.413e+01 | 2.3646 | 0.0242 |
| 3.413e+01 < x <= 3.417e+01 | 2.7771 | 0.0301 |
| 3.417e+01 < x <= 3.420e+01 | 2.5061 | 0.0174 |
| 3.420e+01 < x <= 3.427e+01 | 2.3463 | 0.0262 |
| 3.427e+01 < x <= 3.453e+01 | 2.4559 | 0.0240 |
| 3.453e+01 < x <= 3.532e+01 | 1.4914 | 0.0246 |
| 3.532e+01 < x <= 3.623e+01 | 0.9208 | 0.0250 |
| 3.623e+01 < x <= 3.672e+01 | 1.2441 | 0.0262 |
| 3.672e+01 < x <= 3.697e+01 | 1.3129 | 0.0253 |
| 3.697e+01 < x <= 3.729e+01 | 2.6241 | 0.0239 |
| 3.729e+01 < x <= 3.737e+01 | 2.6574 | 0.0258 |
| 3.737e+01 < x <= 3.753e+01 | 3.0105 | 0.0255 |
| 3.753e+01 < x <= 3.765e+01 | 2.4197 | 0.0243 |
| 3.765e+01 < x <= 3.772e+01 | 2.1174 | 0.0256 |
| 3.772e+01 < x <= 3.777e+01 | 2.5537 | 0.0286 |
| 3.777e+01 < x <= 3.781e+01 | 2.7647 | 0.0221 |
| 3.781e+01 < x <= 3.793e+01 | 2.6181 | 0.0238 |
| 3.793e+01 < x <= 3.800e+01 | 1.7622 | 0.0250 |
| 3.800e+01 < x <= 3.826e+01 | 1.5924 | 0.0243 |
| 3.826e+01 < x <= 3.850e+01 | 1.8570 | 0.0254 |
| 3.850e+01 < x <= 3.863e+01 | 1.3981 | 0.0241 |
| 3.863e+01 < x <= 3.898e+01 | 1.3962 | 0.0251 |
| 3.898e+01 < x <= 3.975e+01 | 1.1241 | 0.0255 |
| 3.975e+01 < x | 0.8442 | 0.0244 |
| target_mean | frequency |
|---|---|
| 1.5761 | 0.0320 |
| 1.9445 | 0.0298 |
| 2.2318 | 0.0254 |
| 2.7115 | 0.0264 |
| 2.4368 | 0.0262 |
| 2.2910 | 0.0291 |
| 2.3528 | 0.0220 |
| 2.3233 | 0.0233 |
| 2.0937 | 0.0368 |
| 1.6319 | 0.0230 |
| 1.7992 | 0.0235 |
| 1.9408 | 0.0250 |
| 2.1292 | 0.0250 |
| 2.3261 | 0.0334 |
| 2.2713 | 0.0233 |
| 2.2817 | 0.0211 |
| 2.2228 | 0.0216 |
| 2.8224 | 0.0303 |
| 2.3178 | 0.0187 |
| 2.2778 | 0.0279 |
| 2.5025 | 0.0252 |
| 1.3719 | 0.0201 |
| 0.9336 | 0.0218 |
| 1.2516 | 0.0259 |
| 1.2597 | 0.0274 |
| 2.5507 | 0.0240 |
| 2.5351 | 0.0266 |
| 2.9827 | 0.0283 |
| 2.6519 | 0.0194 |
| 2.0869 | 0.0203 |
| 2.6145 | 0.0242 |
| 2.5272 | 0.0208 |
| 2.6246 | 0.0308 |
| 1.6630 | 0.0250 |
| 1.5156 | 0.0206 |
| 1.7549 | 0.0225 |
| 1.3101 | 0.0196 |
| 1.3997 | 0.0279 |
| 1.1114 | 0.0235 |
| 0.8671 | 0.0225 |
Computing associations: 92170it [00:03, 27314.34it/s]
Testing robustness : 0%| | 1/92170 [00:00<12:41:40, 2.02it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= 3.45e+01 | 2.2311 | 0.5254 |
| 3.45e+01 < x <= 3.70e+01 | 1.2415 | 0.1011 |
| 3.70e+01 < x <= 3.79e+01 | 2.5927 | 0.1997 |
| 3.79e+01 < x <= 3.85e+01 | 1.7393 | 0.0748 |
| 3.85e+01 < x | 1.1907 | 0.0991 |
| target_mean | frequency |
|---|---|
| 2.2111 | 0.5487 |
| 1.2065 | 0.0952 |
| 2.5902 | 0.1945 |
| 1.6488 | 0.0681 |
| 1.1801 | 0.0935 |
--- [ContinuousCarver] Fit Quantitative('Longitude') (8/8)
[ContinuousCarver] Raw distribution
| | target_mean | frequency |
|---|---|---|
| x <= -1.2269e+02 | 1.4063 | 0.0259 |
| -1.2269e+02 < x <= -1.2247e+02 | 2.8878 | 0.0259 |
| -1.2247e+02 < x <= -1.2241e+02 | 3.2397 | 0.0245 |
| -1.2241e+02 < x <= -1.2229e+02 | 2.1582 | 0.0262 |
| -1.2229e+02 < x <= -1.2223e+02 | 2.3463 | 0.0260 |
| -1.2223e+02 < x <= -1.2215e+02 | 2.2598 | 0.0216 |
| -1.2215e+02 < x <= -1.2206e+02 | 2.5665 | 0.0263 |
| -1.2206e+02 < x <= -1.2199e+02 | 2.6265 | 0.0253 |
| -1.2199e+02 < x <= -1.2191e+02 | 2.6924 | 0.0237 |
| -1.2191e+02 < x <= -1.2181e+02 | 2.2919 | 0.0255 |
| -1.2181e+02 < x <= -1.2157e+02 | 1.7103 | 0.0242 |
| -1.2157e+02 < x <= -1.2139e+02 | 1.1736 | 0.0252 |
| -1.2139e+02 < x <= -1.2127e+02 | 1.3270 | 0.0263 |
| -1.2127e+02 < x <= -1.2101e+02 | 1.4857 | 0.0238 |
| -1.2101e+02 < x <= -1.2064e+02 | 1.4716 | 0.0245 |
| -1.2064e+02 < x <= -1.2007e+02 | 1.3376 | 0.0254 |
| -1.2007e+02 < x <= -1.1972e+02 | 1.2624 | 0.0258 |
| -1.1972e+02 < x <= -1.1929e+02 | 1.3332 | 0.0239 |
| -1.1929e+02 < x <= -1.1897e+02 | 1.3300 | 0.0250 |
| -1.1897e+02 < x <= -1.1852e+02 | 2.7211 | 0.0258 |
| -1.1852e+02 < x <= -1.1843e+02 | 3.1653 | 0.0284 |
| -1.1843e+02 < x <= -1.1838e+02 | 3.4432 | 0.0238 |
| -1.1838e+02 < x <= -1.1834e+02 | 2.7480 | 0.0249 |
| -1.1834e+02 < x <= -1.1830e+02 | 2.3435 | 0.0271 |
| -1.1830e+02 < x <= -1.1827e+02 | 1.8482 | 0.0207 |
| -1.1827e+02 < x <= -1.1822e+02 | 1.6714 | 0.0273 |
| -1.1822e+02 < x <= -1.1818e+02 | 1.8055 | 0.0227 |
| -1.1818e+02 < x <= -1.1813e+02 | 2.1480 | 0.0287 |
| -1.1813e+02 < x <= -1.1808e+02 | 2.2494 | 0.0243 |
| -1.1808e+02 < x <= -1.1801e+02 | 2.4079 | 0.0245 |
| -1.1801e+02 < x <= -1.1795e+02 | 2.1794 | 0.0252 |
| -1.1795e+02 < x <= -1.1790e+02 | 2.2897 | 0.0216 |
| -1.1790e+02 < x <= -1.1780e+02 | 2.4820 | 0.0266 |
| -1.1780e+02 < x <= -1.1766e+02 | 2.2864 | 0.0248 |
| -1.1766e+02 < x <= -1.1739e+02 | 1.6791 | 0.0237 |
| -1.1739e+02 < x <= -1.1725e+02 | 1.6380 | 0.0290 |
| -1.1725e+02 < x <= -1.1716e+02 | 2.0512 | 0.0229 |
| -1.1716e+02 < x <= -1.1708e+02 | 1.5113 | 0.0249 |
| -1.1708e+02 < x <= -1.1696e+02 | 1.6669 | 0.0235 |
| -1.1696e+02 < x | 1.1769 | 0.0245 |
| target_mean | frequency |
|---|---|
| 1.3927 | 0.0216 |
| 3.0129 | 0.0233 |
| 3.1899 | 0.0225 |
| 2.1911 | 0.0271 |
| 2.3576 | 0.0254 |
| 2.2342 | 0.0199 |
| 2.9862 | 0.0240 |
| 2.5471 | 0.0240 |
| 2.6969 | 0.0230 |
| 2.1464 | 0.0250 |
| 1.7105 | 0.0218 |
| 1.0959 | 0.0220 |
| 1.2918 | 0.0291 |
| 1.3781 | 0.0230 |
| 1.4767 | 0.0225 |
| 1.2441 | 0.0252 |
| 1.2810 | 0.0281 |
| 1.2813 | 0.0252 |
| 1.4223 | 0.0274 |
| 2.7081 | 0.0218 |
| 3.2548 | 0.0266 |
| 3.3604 | 0.0242 |
| 2.8064 | 0.0262 |
| 2.2395 | 0.0305 |
| 1.7551 | 0.0191 |
| 1.7695 | 0.0242 |
| 1.6175 | 0.0298 |
| 2.0881 | 0.0264 |
| 2.3487 | 0.0245 |
| 2.4322 | 0.0235 |
| 2.1831 | 0.0286 |
| 2.1875 | 0.0211 |
| 2.5202 | 0.0288 |
| 2.2701 | 0.0235 |
| 1.7464 | 0.0225 |
| 1.8748 | 0.0310 |
| 2.1466 | 0.0266 |
| 1.4479 | 0.0279 |
| 1.5746 | 0.0271 |
| 1.2465 | 0.0259 |
Computing associations: 92170it [00:03, 27465.39it/s]
Testing robustness : 0%| | 1/92170 [00:00<4:52:24, 5.25it/s]
[ContinuousCarver] Carved distribution
| | target_mean | frequency |
|---|---|---|
| x <= -1.218e+02 | 2.4438 | 0.2509 |
| -1.218e+02 < x <= -1.190e+02 | 1.3787 | 0.2242 |
| -1.190e+02 < x <= -1.183e+02 | 3.0175 | 0.1029 |
| -1.183e+02 < x <= -1.177e+02 | 2.1601 | 0.2735 |
| -1.177e+02 < x | 1.6155 | 0.1486 |
| target_mean | frequency |
|---|---|
| 2.4780 | 0.2357 |
| 1.3487 | 0.2243 |
| 3.0414 | 0.0988 |
| 2.1328 | 0.2800 |
| 1.6763 | 0.1611 |
[8]:
| | library | fit_s | transform_s | train_r2 | test_r2 | r2_drop |
|---|---|---|---|---|---|---|
| 0 | AutoCarver | 33.499 | 0.0577 | 0.6633 | 0.6566 | 0.0067 |
| 1 | optbinning | 2.548 | 0.0086 | 0.5145 | 0.5077 | 0.0068 |
| 2 | KBinsDiscretizer | 0.007 | 0.0015 | 0.6181 | 0.6192 | -0.0011 |
[9]:
plot_bars(regression_results, ['fit_s', 'test_r2', 'r2_drop'], 'California Housing \u2014 regression')
How to read these numbers
``fit_s`` / ``transform_s`` measure only ``.fit`` / ``.transform`` wall-clock: not data loading, not one-hot encoding, not the downstream model.
``test_auc`` / ``test_r2`` are the headline metric. They reflect how well a simple downstream model performs on each library’s binned output. A tree-based downstream model would tell a different (and less binning-sensitive) story.
``auc_drop`` / ``r2_drop`` are ``train - test`` and measure how much each library’s bins overfit. Lower is more robust; a slightly negative value, as for KBinsDiscretizer above, means the test score came out marginally higher than the train score. AutoCarver’s dev-set veto is designed to keep this small.
Same data, same seed, same downstream model across libraries, but a single run, on one machine, with one set of hyper-parameters. Treat as illustrative.
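As a minimal sketch of what the timing and drop columns capture, the snippet below times only ``.fit`` and ``.transform`` and computes ``r2_drop`` as train minus test. The helper ``timed_fit_transform`` and the synthetic data are assumptions made for illustration; they are not the notebook's ``bin_with_*`` helpers, whose internals may differ.

```python
import time

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.preprocessing import KBinsDiscretizer


def timed_fit_transform(binner, X_train, X_test):
    """Hypothetical helper: time only .fit and .transform, nothing else."""
    start = time.perf_counter()
    binner.fit(X_train)
    fit_s = time.perf_counter() - start

    start = time.perf_counter()
    X_train_binned = binner.transform(X_train)
    X_test_binned = binner.transform(X_test)
    transform_s = time.perf_counter() - start
    return X_train_binned, X_test_binned, fit_s, transform_s


# Synthetic stand-in data (an assumption, not the notebook's datasets).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=2000)
X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]

# One-hot-encoded quantile bins, so a linear model can use them directly.
binner = KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile")
X_tr, X_te, fit_s, transform_s = timed_fit_transform(binner, X_train, X_test)

# The downstream model is fitted outside the timed region.
model = Ridge().fit(X_tr, y_train)
train_r2 = r2_score(y_train, model.predict(X_tr))
test_r2 = r2_score(y_test, model.predict(X_te))
r2_drop = train_r2 - test_r2  # positive when the model scores better in-sample

print(f"fit_s={fit_s:.4f} transform_s={transform_s:.4f} r2_drop={r2_drop:.4f}")
```

Keeping the model fit and scoring outside the timed region is what makes ``fit_s`` and ``transform_s`` comparable across binners with very different downstream feature counts.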
When the result will move
Bigger ``max_n_mod`` / smaller ``min_freq`` will improve AutoCarver and optbinning’s in-sample scores at the cost of ``*_drop``. KBinsDiscretizer doesn’t use the target, so it’s mostly insensitive.
Different downstream model. Gradient-boosted trees on the raw features will typically beat a binning + linear pipeline. The point of binning is interpretability, not raw accuracy.
Different dataset. German Credit is small; on a 10M-row credit-risk dataset, ``fit_s`` is what dominates the comparison.
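The granularity trade-off can be explored with a rough sketch like the one below. It uses ``KBinsDiscretizer``'s ``n_bins`` as a stand-in for bin granularity, on synthetic data with a deliberately small training set; both choices are assumptions for illustration, and ``n_bins`` is not equivalent to AutoCarver's ``max_n_mod`` or ``min_freq``. The idea is only to show how train score, test score and their gap can be tabulated as the number of bins grows.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in data (an assumption): small train set, larger test set.
rng = np.random.default_rng(0)
X = rng.normal(size=(1200, 4))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=1200)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]

rows = []
for n_bins in (3, 10, 40):
    # More bins means more one-hot columns for the linear model to fit.
    binner = KBinsDiscretizer(n_bins=n_bins, encode="onehot-dense", strategy="quantile")
    X_tr = binner.fit_transform(X_train)
    X_te = binner.transform(X_test)

    model = Ridge(alpha=1.0).fit(X_tr, y_train)
    train_r2 = r2_score(y_train, model.predict(X_tr))
    test_r2 = r2_score(y_test, model.predict(X_te))
    rows.append({
        "n_bins": n_bins,
        "train_r2": train_r2,
        "test_r2": test_r2,
        "r2_drop": train_r2 - test_r2,
    })

sensitivity = pd.DataFrame(rows)
print(sensitivity.round(4))
```

Re-running the same loop with a different seed or training size is a cheap way to check how stable the gap is before reading much into any single row.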
See comparison.rst for the qualitative scope and algorithmic comparison.