Combinations

Combinations are at the core of Carvers. For each feature, a carver searches all possible groupings of its modalities into at most max_n_mod groups and keeps the best one.

A pre-built CombinationEvaluator instance can be passed to any carver via the combination_evaluator keyword. Each subclass defaults to a task-appropriate metric: TschuprowtCombinations for BinaryCarver / OneVsRestCarver, TschuprowtMulticlassCombinations for the joint MulticlassCarver, KruskalCombinations for ContinuousCarver, and KendallTauCCombinations for OrdinalCarver.

The animation below starts from the six ordered bins a QuantitativeDiscretizer produces (its final state — see Complete pipeline for continuous and discrete features) and shows the core step: every consecutive grouping into max_n_mod groups is scored by its association with the binary target (Tschuprow’s T) and the table fills best-first in growing top_k batches (the progressive top-K DP search). The highest-scoring grouping that passes the viability filter is kept (gold row). Adjacent bins sharing a colour in a row are merged into one group.

class AutoCarver.combinations.CombinationEvaluator(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

CombinationEvaluator class to evaluate the best combination of modalities for a feature.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

Computes the best combination of modalities for the feature.

Parameters:

feature (BaseFeature) – Feature being carved.
xagg (pd.Series | pd.DataFrame | None) – Train-sample aggregation (per-modality y-lists for continuous, crosstab for binary).
xagg_dev (pd.Series | pd.DataFrame | None, optional) – Dev-sample aggregation, used for the robustness veto.
max_n_mod (int) – Maximum number of modalities allowed in the returned combination.
min_freq (float) – Minimum per-modality frequency, tested via a Wilson upper bound at significance min_freq_alpha (see Minimum-frequency test (Wilson score interval)).
dropna (bool) – Whether to group NaN as another modality (fans out NaN placements after the non-NaN DP, see Search strategy — interval dynamic programming (DP) with progressive top-K).
min_freq_alpha (float, default 0.05) – Two-sided significance level of the Wilson score interval. Smaller → wider CI → fewer rejections. alpha=1 recovers the legacy strict-threshold behaviour.
folds_xagg (list of (pd.Series | pd.DataFrame), optional) – Cross-validation fold aggregations. Each is an additional robustness view: the returned combination must stay viable on xagg_dev and every fold. Ranks are still determined on xagg (full train) only.
rescue (bool, default False) – When the normal search finds nothing viable and a validation view exists (dev and/or CV folds), rerun the search with the min_freq veto waived; distinct-rates and train/dev ordering vetoes stay enforced on every validation view.

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

The highest-scoring grouping is not necessarily the one that is kept: each candidate must clear the viability filter (minimum frequency via a Wilson score interval, distinct consecutive target rates, and train/dev rank preservation). That filter is documented on its own page.

Search strategy — interval dynamic programming (DP) with progressive top-K

For fixed min_freq, max_n_mod and association metric, AutoCarver returns the partition that maximises the metric among admissible candidates. The DP described below is a search-strategy optimisation; it does not prune the candidate set and does not change the statistical claim. Bit-exact agreement with the legacy enumerate-and-score path is pinned by parity tests (tests/combinations/binary/test_dp_chi2_parity.py, tests/combinations/continuous/test_dp_kruskal_parity.py).

The search problem

For a feature with raw modalities \(m_0, \dots, m_{n-1}\) already ordered (by ordinal rank, target rate, or numeric quantile), the carver searches over consecutive segmentations with at most max_n_mod groups: a partition is fully determined by integer split positions \(0 = s_0 < s_1 < \dots < s_k = n\) with \(k \le \text{max_n_mod}\). The chosen partition maximises the association metric subject to the viability filter (Wilson min_freq on train + dev, distinct target rates, preserved rank between train and dev when dev is provided).

The legacy path enumerated every admissible partition, scored each, then walked them in descending metric order. This is correct but wasteful: the viability walk rarely needs to look beyond the top handful of candidates.

The DP idea

The DP exploits two properties shared by both supported metrics (Kruskal-Wallis \(H\) for continuous targets, Pearson \(\chi^2\) for binary targets):

Segmentation structure. A partition is a sequence of disjoint consecutive intervals \([s_g, s_{g+1})\). Sub-problems factorise over the right boundary \(j\) and the number of groups \(k\).
Additive decomposability of the metric over groups, given fixed \(k\). Both \(H\) and \(\chi^2\) reduce — at fixed \(k\) and after factoring out \(k\)-dependent normalising constants — to a sum over groups of a quantity that depends only on a single interval \([i, j)\) of raw modalities.

We therefore run an interval DP indexed by \((k, j)\) whose state is the top-K prefixes (by partial score) ending at split \(j\) with \(k\) groups:

\[\text{dp}[k][j] = \operatorname*{top\text{-}K}_{i \in [k-1,\, j)} \big\{\, \text{dp}[k-1][i] \oplus \text{seg_cost}(i,\, j)\, \big\}\]

The final candidate list is \(\bigcup_k \text{dp}[k][n]\), sorted in descending score order and truncated to top_k.
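The recurrence above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AutoCarver's implementation: the function name top_k_interval_dp, the toy data and the toy seg_cost are hypothetical, \(\oplus\) is taken to be plain addition, and seg_cost is assumed not to depend on \(k\) (for \(\chi^2\) the text runs one DP per \(k\) instead).

```python
import heapq
from typing import Callable, List, Tuple

Candidate = Tuple[float, Tuple[int, ...]]  # (score, splits 0 = s_0 < ... < s_k)


def top_k_interval_dp(
    n: int, max_groups: int, seg_cost: Callable[[int, int], float], top_k: int
) -> List[Candidate]:
    """Top-K consecutive partitions of n ordered modalities into 1..max_groups groups."""
    # dp[k][j]: best top_k prefixes covering modalities [0, j) with exactly k groups
    dp: List[List[List[Candidate]]] = [
        [[] for _ in range(n + 1)] for _ in range(max_groups + 1)
    ]
    dp[0][0] = [(0.0, (0,))]  # empty prefix: zero groups, nothing covered
    for k in range(1, max_groups + 1):
        for j in range(k, n + 1):  # k groups need at least k modalities
            extended = [
                (score + seg_cost(i, j), splits + (j,))
                for i in range(k - 1, j)  # i = start of the last group [i, j)
                for score, splits in dp[k - 1][i]
            ]
            dp[k][j] = heapq.nlargest(top_k, extended)
    finals = [cand for k in range(1, max_groups + 1) for cand in dp[k][n]]
    return heapq.nlargest(top_k, finals)


# Toy additive cost on hypothetical data: (sum of s)^2 / (sum of c) over [i, j)
sums = [1.0, 2.0, 6.0, 7.0, 15.0, 16.0]
counts = [2, 2, 2, 2, 2, 2]
S, C = [0.0], [0]
for s, c in zip(sums, counts):
    S.append(S[-1] + s)
    C.append(C[-1] + c)


def seg_cost(i: int, j: int) -> float:
    """Closed-form contribution of the interval [i, j) from prefix sums."""
    return (S[j] - S[i]) ** 2 / (C[j] - C[i])


best = top_k_interval_dp(n=len(sums), max_groups=3, seg_cost=seg_cost, top_k=5)
```

Because the score is a sum over groups, any partition in the global top-K must extend a prefix that is itself in the top-K of its cell, which is why keeping only K prefixes per cell loses nothing.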

How the DP fills its table — a worked sketch

Why it beats enumerate-and-score, in one picture.

1. The search space. Take a feature with 6 ordered raw modalities. A partition with k = 3 groups is fully determined by 2 internal split positions. Two candidate partitions:

      A:  m₀ m₁ │ m₂ m₃ │ m₄ m₅

      B:  m₀ m₁ │ m₂ m₃ m₄ │ m₅


2. The shared-prefix insight. A and B share the first group m₀ m₁. The naïve path re-scores that prefix once per candidate that contains it; for n = 6, k = 3 that's C(5, 2) = 10 partitions and a lot of wasted work. The DP scores each prefix once and reuses it.

3. The DP table. dp[k][j] holds the top-K best scores of partitioning the first j modalities into exactly k groups. We fill it row-by-row, left-to-right:

k \ j	1	2	3	4	5	6
k=1	●	●	●	●	●	●
k=2	—	●	●	●	●	●
k=3	—	—	●	●	●	★

The answer for k = 3 lives at the ★ cell: dp[3][6]. Cells marked — are infeasible (can't make k groups out of fewer than k modalities).

4. The recurrence — one cell at a time. To fill dp[k][j], try every possible last split position i and combine:

    dp[k][j] = top-K over i of { dp[k−1][i] ⊕ seg_cost(i, j) }


dp[k−1][i] is already computed (previous row). seg_cost(i, j) is the contribution of the final group [i, j), closed-form from prefix sums (no inner enumeration). So each cell tries O(j) split positions, each carrying at most K stored prefixes, and the whole table costs O(K · max_n_mod · n²) instead of the O(2ⁿ) worst case of the enumerate-and-score path.
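To make the size of the naive search space in step 2 concrete, the sketch below enumerates every consecutive partition of 6 modalities into 3 groups and checks the count against the closed form C(n − 1, k − 1). The helper name consecutive_partitions is hypothetical; partitions are written as split tuples (0, s₁, …, n).

```python
from itertools import combinations
from math import comb


def consecutive_partitions(n: int, k: int):
    """Yield every split tuple (0, s_1, ..., n) with exactly k consecutive groups."""
    for cuts in combinations(range(1, n), k - 1):
        yield (0, *cuts, n)


n, k = 6, 3
partitions = list(consecutive_partitions(n, k))
# Closed-form count: choose k - 1 cut points among the n - 1 gaps
assert len(partitions) == comb(n - 1, k - 1)

# Partitions A and B from the sketch share the first group m0 m1, i.e. the prefix (0, 2)
A, B = (0, 2, 4, 6), (0, 2, 5, 6)
shared_prefix = [p for p in partitions if p[:2] == (0, 2)]
```

Every partition in shared_prefix would have its first group re-scored by the naive path, whereas the DP scores that prefix once.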

Why "progressive" top-K? The DP returns the top top_k partitions by score, then the viability filter (Wilson min_freq, distinct target rates, train/dev rank) walks them in order. If none pass, top_k doubles and the DP re-runs — keeping the common case (a viable winner in the first batch) cheap, while preserving the optimality guarantee in the worst case.

Progressive top-K

The DP returns the top-K scored partitions, not all of them. The viability walk consumes that list in descending metric order; if no candidate is viable in the current top-K we double top_k and re-run the DP, walking only the newly-appeared entries. This repeats until either:

a viable candidate is found, or
the DP returns fewer than top_k entries: every consecutive partition has been emitted, so no viable candidate exists for this feature.

Doubling guarantees the search is exhaustive in the worst case: the same admissible candidate set is eventually considered as in the legacy enumerate-and-score path. Total work is bounded by \(\sim 2 \times\) a single DP run at the final top_k. The common case (viable found in the initial top-K) costs \(O(\text{max_n_mod} \cdot n^2 \cdot \text{top\_k} \cdot \log \text{top\_k})\) ops, independent of the total combination count, which scales combinatorially in \(n\) and max_n_mod and previously reached \(\sim 8\text{M}\) at \(n=40,\,\text{max_n_mod}=7\).

The initial top-K is configurable via the class attribute CombinationEvaluator.dp_top_k_initial (default 1000).
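The doubling loop described in this section can be sketched as follows. This is an illustration only: progressive_top_k, run_dp and is_viable are hypothetical names standing in for the DP and the viability filter, and the toy candidate list replaces a real DP run.

```python
from typing import Callable, Hashable, List, Optional, Tuple

Scored = Tuple[float, Hashable]  # (score, partition key)


def progressive_top_k(
    run_dp: Callable[[int], List[Scored]],
    is_viable: Callable[[Hashable], bool],
    top_k_initial: int = 1000,
) -> Optional[Scored]:
    """Walk DP batches in descending score order, doubling top_k until a viable one is found."""
    top_k = top_k_initial
    seen = set()  # partitions already tested in a smaller batch
    while True:
        candidates = run_dp(top_k)  # assumed sorted by descending score
        for score, partition in candidates:
            if partition in seen:
                continue  # only newly-appeared entries are tested
            seen.add(partition)
            if is_viable(partition):
                return score, partition
        if len(candidates) < top_k:
            return None  # every partition was emitted and none is viable
        top_k *= 2


# Toy stand-in for the DP: ten pre-sorted candidates, only one of them viable
all_candidates = [(10.0 - rank, f"partition_{rank}") for rank in range(10)]
calls: List[int] = []


def run_dp(top_k: int) -> List[Scored]:
    calls.append(top_k)
    return all_candidates[:top_k]


found = progressive_top_k(run_dp, lambda p: p == "partition_6", top_k_initial=2)
```

In this toy run the DP is called with top_k = 2, 4 and 8 before the viable candidate appears, so the total work stays within about twice the cost of the final run.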

NaN fan-out path

When dropna=True and the feature has NaNs, the DP runs on the non-NaN sub-index to produce base partitions. Each base is then fanned out across NaN placements:

NaN folded into each existing group;
NaN as its own group when len(base) < max_n_mod;
plus the degenerate [all_non_nan, [NaN]] partition.

Each variant is scored in closed form (_kruskal_h_for_combination / _chi2_assoc_for_combination) against the full per-modality stats: the NaN row is still in the aggregated sample because _apply_best_combination rebuilt it from raw after the non-NaN DP. Variants are walked in descending score order and deduplicated by partition key across progressive iterations, so combinations carried over from a smaller top_k are not re-tested.
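The three NaN placements listed above can be enumerated with a short helper. This is a sketch under assumptions: fan_out_nan is a hypothetical name, partitions are plain lists of lists, and placing the NaN group last is an arbitrary choice made here for illustration.

```python
from typing import List

Partition = List[List[str]]


def fan_out_nan(base: Partition, nan_label: str, max_n_mod: int) -> List[Partition]:
    """Enumerate NaN placements for one base partition of the non-NaN modalities."""
    variants: List[Partition] = []
    # NaN folded into each existing group in turn
    for target in range(len(base)):
        variants.append(
            [grp + [nan_label] if g == target else list(grp) for g, grp in enumerate(base)]
        )
    # NaN as its own group, only when that stays within max_n_mod groups
    if len(base) < max_n_mod:
        variants.append([list(grp) for grp in base] + [[nan_label]])
    # Degenerate partition: all non-NaN modalities together, NaN alone
    degenerate = [[m for grp in base for m in grp], [nan_label]]
    if degenerate not in variants:
        variants.append(degenerate)
    return variants


base = [["a", "b"], ["c"]]
variants = fan_out_nan(base, "NaN", max_n_mod=3)
```

Each returned variant would then be scored in closed form and walked in descending score order, as the paragraph above describes.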

What does not change

The admissible candidate set: consecutive segmentations with \(k \le \text{max_n_mod}\).
The viability filter: Wilson min_freq on train + dev, distinct target rates, rank preservation.
The optimality claim: for fixed min_freq, max_n_mod, and metric, no admissible combination scores higher than the one returned.

The DP is a search-strategy optimisation, not a statistical change.

Modality ordering and target rates

The DP above only ever scores consecutive segmentations: once the raw modalities are laid out on a line, every candidate group is an interval of that line. For quantitative features that line is the numeric order, and for ordinal features the user-declared ranking. A categorical feature has no intrinsic order, so one must be built from the data — and because only consecutive groups are ever considered, this pre-search ordering decides which groupings are reachable at all. It is computed once, on the train sample, by the Categorical Discretizer step embedded in every carver pipeline.

The scalar built for that pre-sort does not disappear once the search starts. Every combination evaluator carries a target rate — the per-modality summary of the target that the carver reports and, crucially, the scalar the viability filter orders by — re-computed for every candidate grouping the search proposes. It is passed via the target_rate keyword and defaults to a task-appropriate choice, documented per target type below. A target rate plays two distinct roles:

Display statistic. One value per modality, stored on feature.statistics and surfaced in the carved-feature summary, so the grouping can be read off in interpretable units (an event rate, an odds ratio, a mean target, …).
Ordering key for viability. The same per-modality value is what the distinct-rate test requires to differ between consecutive modalities, and what the train/dev rank-preservation veto sorts on. A combination whose target-rate ordering collapses or flips is rejected.

Because of this dual role, two properties decide whether a candidate rate is a good fit:

It must be an orderable scalar. The viability checks need a single value per modality with a meaningful monotone ordering. A symmetric measure (e.g. a Gini-style impurity, maximal at \(p = 0.5\)) is fine as a display statistic but a poor ordering key.
Decomposability buys a fast path. When a rate can be reconstructed from per-raw-modality sufficient statistics it can opt into a closed-form path (compute_from_stats) that costs \(O(k)\) per combination instead of re-aggregating the raw sample on every candidate. The continuous mean does this; rates that need the full value multiset (median, quantiles) cannot and fall back to the general aggregation path.

Each target type below tells the two-stage story in one place: how the pre-search order is built, and which target rates carry that ordering through the viability filter.

Target-mean ordering (binary, continuous and ordinal targets)

When the target is numeric — binary \(\{0, 1\}\), continuous, or integer-encoded ordinal levels — each modality \(m\) is scored by its mean target value over the train sample,

\[\bar{y}_m = \frac{1}{n_m} \sum_{i:\, x_i = m} y_i\]

(a y.groupby(x).mean()), and modalities are sorted by ascending \(\bar{y}_m\). For a binary target this is the per-modality event rate \(P(y = 1 \mid x = m)\).
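As a small illustration of this pre-sort, the sketch below computes \(\bar{y}_m\) per modality and sorts modalities by it, using only the standard library instead of pandas. The function name order_by_target_mean and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def order_by_target_mean(
    x: Sequence[Hashable], y: Sequence[float]
) -> Tuple[List[Hashable], Dict[Hashable, float]]:
    """Return modalities sorted by ascending mean target, plus the means themselves."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for modality, target in zip(x, y):
        sums[modality] += target
        counts[modality] += 1
    means = {m: sums[m] / counts[m] for m in sums}
    return sorted(means, key=means.get), means


# Binary target: the mean is the per-modality event rate P(y = 1 | x = m)
x = ["red", "blue", "green", "red", "blue", "green", "red", "blue"]
y = [1, 0, 0, 1, 0, 1, 0, 0]
order, means = order_by_target_mean(x, y)
```

Here the event rates are 0 for blue, 0.5 for green and 2/3 for red, so the consecutive groups the search can form are {blue}, {blue, green}, {green, red} and so on, but never {blue, red} without green.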

This choice is what makes the consecutive-only restriction statistically harmless rather than limiting: sorting by target mean puts modalities of similar risk next to each other, so the intervals the DP can form are exactly the merges of comparable-risk modalities — and the resulting carved feature has monotone target rates across its bins, the layout scorecard-style models expect.

For an ordinal target a raw mean over the integer encoding of the levels would depend on the levels’ spacing, not only their order: encoding the same four levels as \(\{1, 2, 3, 4\}\) or \(\{1, 2, 3, 10\}\) could produce different modality orderings, even though the tau statistics that score the combinations are rank-based and encoding-invariant. OrdinalCarver therefore maps the levels through the scale resolved from its target_scale mode before taking the mean — train ridits by default (see Ridit scoring below), the raw encoding with target_scale="level" (count targets), or user-declared representative values with a {level: value} dict.

Binary target rates

For binary targets (and One-vs-Rest Classification’s per-class binary sub-fits) the per-modality input is a two-column crosstab \((n_0, n_1)\), so every rate below is closed-form. The default is the event rate \(p = n_1 / (n_0 + n_1)\) (TargetMean) — the same scalar the pre-sort ordered the raw modalities by.

class AutoCarver.combinations.binary.binary_target_rates.TargetMean: Mean target rate class.

Continuous target rates

For continuous targets the per-modality input is the multiset of target values. The default TargetMean is decomposable from per-modality \((n, \sum y)\) aggregates and therefore implements the closed-form compute_from_stats fast path; TargetMedian is not decomposable from sums and uses the general aggregation path.

class AutoCarver.combinations.continuous.continuous_target_rates.TargetMean: Mean target rate class.

class AutoCarver.combinations.continuous.continuous_target_rates.TargetMedian: Median of target per class.

Ordinal target rates

For ordinal targets the per-modality input is the ordered contingency table (feature groups × ordinal target levels) — the binary crosstab generalised from two columns to one column per ordinal level. The default TargetMeanRidit is the per-group count-weighted mean train-ridit of the levels (see Ridit scoring below); TargetMeanLevel is the per-group mean ordinal level \(\sum_j \text{level}_j \cdot n_{gj} / n_{g+}\), read from the (integer) crosstab columns or from user-declared level_values. Both are monotone in the target’s order, so they drive the min_freq viability test and the train/dev rank-preservation veto exactly like the binary/continuous target means. Rather than instantiating these directly, declare the scale through OrdinalCarver’s target_scale — the carver derives the modality pre-sort from the same resolved rate, so the two stages can never disagree.

class AutoCarver.combinations.ordinal.ordinal_target_rates.TargetMeanRidit

Mean train-ridit per modality (the ordinal default).

The per-group rate is the count-weighted mean of the train ridits of the crosstab’s columns (see AutoCarver.discretizers.utils.ridits): the owning evaluator fixes the reference marginal once, from the feature’s raw (un-grouped) train crosstab (fit_reference()), and every later call — a train candidate grouping, or a dev-sample grouping — scores against that same reference (levels unseen in train get the natural CDF extension). Invariant under any strictly increasing re-encoding of the target levels, bounded in [0, 1], and monotone in the target’s order, so it drives both the min_freq viability test and the train/dev rank-preservation veto exactly like the binary/continuous target means.

class AutoCarver.combinations.ordinal.ordinal_target_rates.TargetMeanLevel(level_values: dict | None = None)

Mean ordinal level per modality.

The per-group rate is Σ_j level_j · n_gj / n_g+ where level_j is read from the (integer) crosstab columns — or, when level_values is given, from the user’s per-level representative values. It is monotone in the target’s order, so it drives both the min_freq viability test and the train/dev rank-preservation veto exactly like the binary/continuous target means.

Parameters: level_values (dict, optional) – {level: value} representative value per target level (e.g. a calibrated default probability per rating grade). Values must be strictly increasing when levels are sorted ascending. None (default) reads the levels themselves from the crosstab columns.

Ridit scoring

As noted above, a raw-encoding ordinal pre-sort (like the TargetMeanLevel viability rate) consumes the integer encoding of the levels numerically, while the tau statistics scoring the combinations are purely rank-based. The default target_scale="ridit" instead scores each level by its ridit against the train marginal:

\[r_j = F(j - 1) + \tfrac{1}{2} f_j,\]

where \(f_j\) is the train frequency of level \(j\) and \(F(j-1)\) the cumulative frequency of all lower levels — i.e. the level’s mean midrank rescaled to \([0, 1]\). A group’s mean ridit is exactly the per-group quantity concordance statistics (Kendall’s taus, Somers’ D) respond to, and it is invariant under any strictly increasing re-encoding of the levels: \(\{1, 2, 3, 4\}\) and \(\{1, 2, 3, 10\}\) carve identically.
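The ridit formula and its encoding invariance can be checked numerically. The sketch below is a standalone illustration with hypothetical names (train_ridits, mean_ridit) and toy counts; it does not call AutoCarver.discretizers.utils.ridits.

```python
from typing import Dict


def train_ridits(level_counts: Dict[int, int]) -> Dict[int, float]:
    """r_j = F(j - 1) + f_j / 2, computed from train counts per ordinal level."""
    total = sum(level_counts.values())
    below = 0  # observations in strictly lower levels
    ridits: Dict[int, float] = {}
    for level in sorted(level_counts):
        count = level_counts[level]
        ridits[level] = (below + 0.5 * count) / total
        below += count
    return ridits


def mean_ridit(group_counts: Dict[int, int], ridits: Dict[int, float]) -> float:
    """Count-weighted mean ridit of one group of observations."""
    n_group = sum(group_counts.values())
    return sum(ridits[level] * n for level, n in group_counts.items()) / n_group


train = {1: 10, 2: 20, 3: 30, 4: 40}     # levels encoded 1, 2, 3, 4
recoded = {1: 10, 2: 20, 3: 30, 10: 40}  # same data, top level re-encoded as 10
r = train_ridits(train)
r_recoded = train_ridits(recoded)

group = {1: 5, 2: 5, 3: 0, 4: 10}        # one candidate group's level counts
group_rate = mean_ridit(group, r)
```

Both encodings yield the same ridits in level order, and the mean ridit of the full train sample is 0.5 by construction, which gives a natural reference point when reading per-group rates.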

Not every integer-encoded ordered target is order-only, though, so the scale is a user-declared target_scale mode driving both the pre-sort and the viability rate from a single resolved scale:

"ridit" (default) — order-only levels (Poor / Fair / Good): encoding-invariant, semantically honest for “ordinal”.
"level" — count targets (e.g. 0–5 claims), where the encoding is the scale and the mean level (expected count) is the right summary.
{level: value} — known representative values per level (e.g. a calibrated default probability per rating grade), strictly increasing.

When individual continuous target values are available, use ContinuousCarver instead.

Correspondence-analysis ordering (multiclass targets)

A multiclass (unordered) target has no numeric mean: with classes {"car", "bike", "train"} there is no per-modality scalar to sort by. The joint MulticlassCarver instead orders modalities along the first axis of a correspondence analysis (CA) of the raw modalities × classes crosstab — the 1-D embedding of the table that captures the largest share of its \(\chi^2\) inertia (AutoCarver.discretizers.utils.correspondence_analysis).

Fitting the axis:

Normalise the crosstab to proportions \(P\), with row masses \(r_m\) (modality frequencies) and column masses \(c_k\) (class frequencies).
Form the standardized residuals \(S_{mk} = (P_{mk} - r_m c_k) / \sqrt{r_m c_k}\) — the signed cell-wise contributions to \(\chi^2 / n\), zero everywhere iff feature and target are independent.
Take the SVD of \(S\); the first right singular vector \(v_1\) is the class-side axis.

Each modality is then scored by projecting its row profile \(p_{m\cdot}\) (its own distribution across classes) onto that fixed axis, via the CA transition formula

\[\text{score}(m) = \sum_k \frac{p_{mk} - c_k}{\sqrt{c_k}} \; v_{1k},\]

and modalities are sorted by ascending score. Four properties matter for carving:

Chi²-optimal. The first CA axis is the 1-D layout that captures the most of the table’s \(\chi^2\) association — the natural companion to the \(\chi^2\)-derived Cramér’s V / Tschuprow’s T the multiclass evaluators maximise, playing the role the target-mean sort plays for a binary target.
Train-only, fit once. The axis is fit on the feature’s raw train crosstab and never refit: because the transition formula only needs a row’s own profile plus the fixed column masses and axis, the same axis scores candidate grouped tables and the dev-sample projection used by the rank-preservation veto.
Label-independent and deterministic. Scores depend only on the counts, never on the modalities’ or classes’ text, and the axis sign is anchored on the largest-mass row (content-based tie-breaks), so any row permutation of the same table yields the same ordering.
Degenerate fallback. With \(\le 2\) modalities, fewer than 2 classes, or no \(\chi^2\) structure (near-zero first singular value), the CA axis is meaningless and modalities fall back to a deterministic frequency-descending order.
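The fitting steps and the transition formula can be traced on a small table. The sketch below is a simplified stand-in, not AutoCarver.discretizers.utils.correspondence_analysis: the name ca_first_axis_scores is hypothetical, power iteration on \(S^\top S\) replaces the SVD to stay within the standard library, the axis sign is left unanchored, empty rows or columns are not handled, and the degenerate case simply returns None.

```python
import math
from typing import List, Optional


def ca_first_axis_scores(table: List[List[float]], n_iter: int = 200) -> Optional[List[float]]:
    """First-axis CA score of each row (modality) of a modalities x classes crosstab."""
    n = sum(sum(row) for row in table)
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    r = [t / n for t in row_tot]                                       # row masses
    c = [sum(row[k] for row in table) / n for k in range(n_cols)]     # column masses
    # Standardized residuals S_mk = (P_mk - r_m c_k) / sqrt(r_m c_k)
    S = [
        [(table[m][k] / n - r[m] * c[k]) / math.sqrt(r[m] * c[k]) for k in range(n_cols)]
        for m in range(n_rows)
    ]
    # Power iteration on S^T S converges to the first right singular vector v_1
    M = [
        [sum(S[m][a] * S[m][b] for m in range(n_rows)) for b in range(n_cols)]
        for a in range(n_cols)
    ]
    v = [float(k + 1) for k in range(n_cols)]  # arbitrary deterministic start
    for _ in range(n_iter):
        w = [sum(M[a][b] * v[b] for b in range(n_cols)) for a in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return None  # no chi-square structure: the axis is meaningless
        v = [x / norm for x in w]
    # Transition formula: project each row profile onto the fixed axis
    return [
        sum((table[m][k] / row_tot[m] - c[k]) / math.sqrt(c[k]) * v[k] for k in range(n_cols))
        for m in range(n_rows)
    ]


table = [[30, 10], [20, 20], [5, 35]]  # 3 modalities x 2 classes (toy counts)
scores = ca_first_axis_scores(table)
order = sorted(range(len(table)), key=scores.__getitem__)
```

With only two classes the first axis reduces to the class-share contrast, so the resulting order matches a sort by the first class's share, up to the unanchored sign.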

The same fitted axis doubles as the multiclass target rate: the default CAScoreRate projects each row of a candidate grouping’s crosstab (feature groups × classes) onto it, giving the scalar the viability filter orders by — monotone along the axis by construction, so it drives the min_freq distinct-rate test and the train/dev rank-preservation veto exactly like the binary/ordinal target means. Because the axis is never refit on dev, the veto compares train and dev against one shared yardstick rather than two independently-defined axes — and the search-space ordering and the viability ordering can never disagree.

class AutoCarver.combinations.multiclass.multiclass_target_rates.CAScoreRate

Correspondence-analysis first-axis score per modality.

The chi²-optimal 1-D embedding of a group’s row profile, projected onto the fixed train axis (see MulticlassTargetRate). Monotone along the axis by construction, so it drives both the min_freq viability test and the train/dev rank-preservation veto exactly like the binary/ordinal target rates.

Classification tasks

Pearson \(\chi^2\) (binary targets)

For a 2-column contingency table (binary target), each group \(g\) contributes counts \((n_{0,g},\, n_{1,g})\). With row marginals \(R_g = n_{0,g} + n_{1,g}\), column marginals \(C_c = \sum_g n_{c,g}\), and grand total \(N = \sum_g R_g\), Pearson’s statistic is

\[\chi^2 = \sum_{g, c} \frac{(O_{g, c} - E_{g, c})^2}{E_{g, c}}, \quad E_{g, c} = \frac{R_g \cdot C_c}{N}.\]

Two key observations:

Given a fixed number of groups \(k\), the column marginals \(C_c\) and total \(N\) depend only on \(k\) (and a constant tol shift applied to every cell — matching the legacy chi2_contingency(xagg + tol) call): \(C_c = N_c + k\cdot\text{tol}\), \(N = N_0 + N_1 + 2k\cdot\text{tol}\). They are invariant under re-partitioning at fixed \(k\).
Therefore, at fixed \(k\), \(\chi^2\) is additive over groups: each group contributes \((O - E)^2 / E\) summed over its two cells, with \(E\) derivable from \((n_{0,g},\, n_{1,g})\) and the constants \((C_0, C_1, N)\). The Yates correction (subtract \(0.5\) from \(|O - E|\) iff the table is exactly \(2 \times 2\), matching scipy’s default) is applied only when \(k = 2\), which is again known at the DP level.

The DP is therefore run once per \(k \in [2,\, \text{max_n_mod}]\) with the constants \((C_0, C_1, N, \text{yates_flag})\) fixed; per-\(k\) top-K lists are merged and re-truncated:

\[\text{seg_cost}_k(i,\, j) = \chi^2\text{ contribution of }[i,\, j)\text{ under } (C_0,\, C_1,\, N,\, \text{yates_flag} = (k = 2)).\]

Cramér’s \(V = \sqrt{\chi^2 / N_{obs}}\) and Tschuprow’s \(T = V / \sqrt[4]{k - 1}\) are monotone transforms of \(\chi^2\) at fixed \(k\), so sorting by either is equivalent to sorting by \(\chi^2\) within each \(k\) slice. The cross-\(k\) merge re-applies the configured sort_by so the global top-K is correct under either metric. Statistical equivalence to scipy.stats.chi2_contingency() is bit-exact (parity tests pin the \(+\text{tol}\) shift, the Yates handling, and the \(\text{round}(x / \text{tol}) \cdot \text{tol}\) quantisation).
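The additivity claim can be verified on a toy table. The sketch below uses hypothetical counts and deliberately omits the tol shift and the Yates correction (the partition has \(k = 3\) groups, so Yates would not apply anyway); seg_cost is an illustrative name for the per-interval contribution.

```python
import math

# Hypothetical raw crosstab: (n_0, n_1) per ordered raw modality
raw = [(40, 10), (35, 15), (30, 20), (20, 30), (10, 40), (5, 45)]

# Prefix sums of each column over the raw modalities
P0, P1 = [0], [0]
for n0, n1 in raw:
    P0.append(P0[-1] + n0)
    P1.append(P1[-1] + n1)
C0, C1 = P0[-1], P1[-1]  # column marginals, independent of the partition
N = C0 + C1


def seg_cost(i: int, j: int) -> float:
    """Chi-square contribution of the group made of raw modalities [i, j)."""
    o0, o1 = P0[j] - P0[i], P1[j] - P1[i]
    row = o0 + o1
    cost = 0.0
    for observed, col in ((o0, C0), (o1, C1)):
        expected = row * col / N
        cost += (observed - expected) ** 2 / expected
    return cost


splits = (0, 2, 4, 6)  # three groups of two raw modalities each
k = len(splits) - 1
chi2 = sum(seg_cost(a, b) for a, b in zip(splits, splits[1:]))
cramers_v = math.sqrt(chi2 / N)
tschuprow_t = cramers_v / (k - 1) ** 0.25
```

Each group's contribution needs only its own counts plus the constants \((C_0, C_1, N)\), which is exactly what lets the DP treat it as a per-interval cost.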

Cramér’s V Combinations

See Cramér’s V for more details on the metric.

class AutoCarver.combinations.CramervCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Cramér’s V based combination evaluation toolkit.

Same DP search as TschuprowtCombinations (see Pearson \(\chi^2\) (binary targets)); only the sort_by key differs. \(V = \sqrt{\chi^2 / N_{obs}}\) is a monotone transform of \(\chi^2\) at fixed \(k\).

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Tschuprow’s T Combinations

See Tschuprow’s T for more details on the metric.

class AutoCarver.combinations.TschuprowtCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Tschuprow’s T based combination evaluation toolkit.

Search uses progressive top-K interval DP over the closed-form Pearson \(\chi^2\) decomposition (per-k DP with constant column marginals, Yates correction iff k == 2). Statistically equivalent to scipy.stats.chi2_contingency() — bit-exact agreement pinned by parity tests.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Pearson \(\chi^2\) generalised to K classes (multiclass targets)

The joint MulticlassCarver generalises Pearson \(\chi^2\) (binary targets) from a 2-column table to a \((B, K)\) one, with \(B\) feature groups and \(K\) target classes. With row marginals \(R_g\), column marginals \(C_c\) and total \(N\) defined exactly as before,

\[\chi^2 = N \left(\sum_{g, c} \frac{O_{g,c}^2}{R_g\, C_c} - 1\right), \qquad V = \sqrt{\frac{\chi^2}{N\,(\min(B, K) - 1)}}, \qquad T = \sqrt{\frac{\chi^2}{N\,\sqrt{(B-1)(K-1)}}}.\]

The same two observations that make the binary DP possible still hold: at fixed \(k\) groups, the column marginals and total depend only on \(k\) (plus the constant tol shift), so \(\chi^2\) is additive over groups and the DP in Pearson \(\chi^2\) (binary targets) carries over unchanged, generalised from a (2,) to a (K,) per-segment observed vector. Yates’ correction still applies only when the table is exactly \(2\times 2\) (\(k=2\) and \(K=2\)). At \(K=2\) both \(V\) and \(T\), and the DP itself, are numerically identical to the binary evaluator, pinned bit-for-bit by a parity test.
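The three formulas above translate directly into code. The sketch below is an illustration on toy counts: chi2_v_t is a hypothetical name, the tol shift and Yates' correction are omitted, and \(\chi^2\) is clamped at zero to absorb floating-point noise on independent tables.

```python
import math
from typing import List, Tuple


def chi2_v_t(table: List[List[int]]) -> Tuple[float, float, float]:
    """Pearson chi-square, Cramér's V and Tschuprow's T of a (B, K) table of counts."""
    B, K = len(table), len(table[0])
    N = sum(sum(row) for row in table)
    R = [sum(row) for row in table]                          # row marginals
    C = [sum(row[c] for row in table) for c in range(K)]     # column marginals
    inertia = sum(
        table[g][c] ** 2 / (R[g] * C[c]) for g in range(B) for c in range(K)
    )
    chi2 = max(N * (inertia - 1.0), 0.0)  # clamp tiny negative rounding errors
    v = math.sqrt(chi2 / (N * (min(B, K) - 1)))
    t = math.sqrt(chi2 / (N * math.sqrt((B - 1) * (K - 1))))
    return chi2, v, t


square = [[30, 10, 5], [10, 30, 10], [5, 10, 30]]   # B = K = 3
binary = [[75, 25], [50, 50], [15, 85]]             # B = 3, K = 2
independent = [[10, 20, 30], [20, 40, 60]]          # rows proportional

chi2_sq, v_sq, t_sq = chi2_v_t(square)
chi2_bin, v_bin, t_bin = chi2_v_t(binary)
chi2_ind, _, _ = chi2_v_t(independent)
```

On a square table the two normalisers coincide, so \(V = T\); on the two-class table \(T = V / \sqrt[4]{B - 1}\), the relation quoted for the binary evaluator.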

Unlike binary/ordinal features, a multiclass target gives qualitative modalities no numeric rate to order by before the DP walks them. They are instead ordered by their correspondence-analysis (CA) first-axis score — the 1-D embedding that maximises chi² association — computed once from the raw (un-grouped) crosstab and fixed for the rest of the search. How the axis is fit, why its ordering is deterministic and label-independent, and how the same axis doubles as the multiclass target rate (CAScoreRate) is detailed in Correspondence-analysis ordering (multiclass targets).

Cramér’s V Multiclass Combinations

class AutoCarver.combinations.CramervMulticlassCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Cramér’s V based combination evaluation toolkit.

Same DP search as TschuprowtMulticlassCombinations (see Pearson \(\chi^2\) (binary targets)); only the sort_by key differs.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Tschuprow’s T Multiclass Combinations (default)

class AutoCarver.combinations.TschuprowtMulticlassCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Tschuprow’s T based combination evaluation toolkit (multiclass default).

Search uses progressive top-K interval DP over the closed-form Pearson \(\chi^2\) decomposition, generalised to a (B, K) table. Statistically equivalent to scipy.stats.chi2_contingency() — bit-exact agreement pinned by parity tests; for a 2-class target, numerically identical to TschuprowtCombinations.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Regression tasks

Kruskal-Wallis H (continuous targets)

Given a partition with \(n_g\) observations per group, rank sum \(R_g\), total \(N = \sum_g n_g\), and tie correction \(T = 1 - \sum_i (t_i^3 - t_i) / (N^3 - N)\) (depends only on the pooled \(y\) multiset), the Kruskal-Wallis statistic is

\[H = \frac{1}{T}\left[\,\frac{12}{N(N+1)} \sum_g \frac{R_g^2}{n_g} - 3(N+1)\,\right].\]

Two key observations:

Per-modality \((R_i, n_i)\), the total \(N\), and the tie correction \(T\) depend only on the raw feature ranking — not on the partition. They are computed once by ranking \(y\) once over the pooled sample (see _modality_rank_stats).
\(\sum_g R_g^2 / n_g\) is additive over groups. With prefix sums R_prefix and n_prefix over the raw modalities, a single interval’s contribution is closed-form:

\[\text{seg_cost}(i, j) = \frac{\big(\text{R_prefix}[j] - \text{R_prefix}[i]\big)^2} {\text{n_prefix}[j] - \text{n_prefix}[i]}.\]

The DP maximises \(\sum_g \text{seg_cost}\) over partitions; \(H\) is recovered at the end by applying the constant prefactor \(12 / (N(N+1))\), the constant offset \(-3(N+1)\), and dividing by \(T\). Statistical equivalence to scipy.stats.kruskal() is bit-exact — the DP only re-orders the search.
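The prefix-sum route to \(H\) can be followed end to end on a toy sample. The sketch below uses hypothetical data and names (it does not call _modality_rank_stats or scipy): it ranks the pooled \(y\) once with midranks, builds R_prefix and n_prefix, and recovers \(H\) for one partition from the closed-form seg_cost.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical target values per ordered raw modality
y_by_modality: List[List[float]] = [[1.0, 2.0, 2.0], [3.0, 4.0], [4.0, 6.0, 7.0], [8.0, 9.0]]
pooled = sorted(v for ys in y_by_modality for v in ys)
N = len(pooled)

# Rank the pooled sample once; tied values share their midrank
first_pos: Dict[float, int] = {}
for pos, value in enumerate(pooled, start=1):
    first_pos.setdefault(value, pos)
ties = Counter(pooled)
midrank = {v: first_pos[v] + (t - 1) / 2 for v, t in ties.items()}

# Tie correction T depends only on the pooled multiset, not on the partition
tie_correction = 1 - sum(t ** 3 - t for t in ties.values()) / (N ** 3 - N)

# Prefix sums of per-modality rank sums and counts
R_prefix, n_prefix = [0.0], [0]
for ys in y_by_modality:
    R_prefix.append(R_prefix[-1] + sum(midrank[v] for v in ys))
    n_prefix.append(n_prefix[-1] + len(ys))


def seg_cost(i: int, j: int) -> float:
    """R_g^2 / n_g for the group made of raw modalities [i, j)."""
    return (R_prefix[j] - R_prefix[i]) ** 2 / (n_prefix[j] - n_prefix[i])


def kruskal_h(splits: Tuple[int, ...]) -> float:
    """Recover H from the additive part via the constant prefactor, offset and T."""
    total = sum(seg_cost(a, b) for a, b in zip(splits, splits[1:]))
    return (12 / (N * (N + 1)) * total - 3 * (N + 1)) / tie_correction


h = kruskal_h((0, 2, 4))  # two groups: modalities {0, 1} and {2, 3}
```

Because the prefactor, offset and \(T\) are constants, ranking partitions by the sum of seg_cost gives the same order as ranking them by \(H\).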

Kruskal’s H Combinations

See Kruskal-Wallis’ H test statistic for more details on the metric.

class AutoCarver.combinations.KruskalCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Kruskal-Wallis’ H based combination evaluation toolkit.

Search uses progressive top-K interval DP over the closed-form Kruskal-Wallis H decomposition (rank once over pooled y, prefix-sum per-modality rank stats). Statistically equivalent to scipy.stats.kruskal() — bit-exact agreement pinned by parity tests.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Ordinal tasks

For an ordinal target (integer-encoded ordered levels), a combination is scored by a rank-association statistic on the ordered contingency table \((r \times c)\) — \(r\) feature groups (rows, target-rate order) × \(c\) ordinal target levels (cols, ascending). All three statistics below are built from the same pair counts:

\(C\) — concordant pairs (both members order the same way on the feature and on the target);
\(D\) — discordant pairs (members order oppositely);
\(P_0 = n(n-1)/2\) — all pairs, with \(n\) the number of observations;
\(T_X\), \(T_Y\) — pairs tied on the feature / on the target (equal row / equal column); \(P_0 - T_X\) and \(P_0 - T_Y\) are the pairs untied on each margin;
\(m = \min(r', c')\) — the smaller of the number of non-empty grouped rows \(r'\) and target levels \(c'\).

The concordant-minus-discordant count \(C - D\) is computed in closed form from the table’s cumulative cell sums (_concordant_minus_discordant); the three measures share this numerator and differ only in how they normalise it. Each measure is None for a degenerate table (its denominator vanishes), mirroring the continuous evaluator’s None convention. Parity against scipy.stats.kendalltau() (tau-b) and scipy.stats.somersd() is pinned by tests/combinations/ordinal/test_ordinal_associations.py and the property suite tests/properties/combinations/test_ordinal_combinations_properties.py.
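As a rough illustration of counting pairs from cumulative cell sums, the sketch below builds a suffix-sum table and reads \(C\) and \(D\) off it in one pass. The function name pair_counts and the toy table are hypothetical; this is not the library’s _concordant_minus_discordant.

```python
def pair_counts(table):
    """Concordant and discordant pair counts of an ordered r x c table."""
    r, c = len(table), len(table[0])
    # S[i][j] = total count in rows >= i and columns >= j (zero-padded border).
    S = [[0] * (c + 1) for _ in range(r + 1)]
    for i in range(r - 1, -1, -1):
        for j in range(c - 1, -1, -1):
            S[i][j] = table[i][j] + S[i + 1][j] + S[i][j + 1] - S[i + 1][j + 1]
    C = D = 0
    for i in range(r):
        for j in range(c):
            lower_right = S[i + 1][j + 1]            # later row, higher level
            lower_left = S[i + 1][0] - S[i + 1][j]   # later row, lower level
            C += table[i][j] * lower_right
            D += table[i][j] * lower_left
    return C, D


# Hypothetical table: 3 feature groups (rows) x 3 ordinal target levels.
table = [[8, 3, 1], [4, 6, 2], [1, 3, 7]]
C, D = pair_counts(table)
print(C, D, C - D)  # 253 43 210
```

Each cell is paired only with cells in later rows, so every pair is counted exactly once and pairs tied on either margin are left out.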

Kendall/Stuart’s \(\tau_c\) (ordinal default)

Stuart’s tau-c applies a \(\min(r', c')\) correction (the \(m\) defined above) tailored to rectangular tables, which is exactly our shape (few feature groups × many target levels):

\[\tau_c = \frac{2 \, m \, (C - D)}{n^2 \, (m - 1)}.\]

Because the denominator depends only on \((n, m)\) and not on how observations distribute across groups, the magnitude of \(\tau_c\) stays comparable across combinations with different group counts. Like Tschuprow’s T and the Kruskal effect sizes, it self-balances toward fewer, robust modalities and adds one only when a split is genuinely discriminative. This is the default for OrdinalCarver.

Kendall’s \(\tau_b\)

Kendall’s tau-b normalises \(C - D\) by the geometric mean of the two margins’ untied pairs:

\[\tau_b = \frac{C - D}{\sqrt{(P_0 - T_X)(P_0 - T_Y)}}.\]

It is bit-exact with the tau-b variant of scipy.stats.kendalltau() on the grouped table and tends to retain more modalities on smoothly monotone signals than \(\tau_c\).

Somers’ D

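The original asymmetric Somers’ D, \(D(Y \mid X)\), is the number of concordant minus discordant pairs divided by the number of pairs untied on the feature \(X\) (in the numerator below, \(D\) is the discordant count defined above):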

\[D(Y \mid X) = \frac{C - D}{P_0 - T_X}.\]

It matches scipy.stats.somersd(table).statistic. Being asymmetric, it leans strongly toward the coarsest split (its maximum over groupings is typically reached with two modalities); it is offered for users who specifically want raw Somers’ D rather than the self-balancing Kendall taus.
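The three formulas above can be evaluated side by side on one grouped table. The sketch below is a minimal standard-library illustration; the function names and the toy table are hypothetical, and \(C - D\) is counted with a direct loop rather than with cumulative sums to keep the block short.

```python
from math import comb, sqrt


def c_minus_d(table):
    """Concordant minus discordant pairs of an ordered contingency table."""
    total = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower in table[i + 1:]:
                for l, n_kl in enumerate(lower):
                    # +1 when the later row has the higher level, -1 when lower.
                    total += n_ij * n_kl * ((l > j) - (l < j))
    return total


def ordinal_measures(table):
    """Return (tau_c, tau_b, somers_d); None where the denominator vanishes."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    num = c_minus_d(table)
    p0 = comb(n, 2)                       # all pairs
    t_x = sum(comb(r, 2) for r in rows)   # pairs tied on the feature
    t_y = sum(comb(c, 2) for c in cols)   # pairs tied on the target
    m = min(sum(r > 0 for r in rows), sum(c > 0 for c in cols))
    tau_c = 2 * m * num / (n**2 * (m - 1)) if m > 1 else None
    denom_b = (p0 - t_x) * (p0 - t_y)
    tau_b = num / sqrt(denom_b) if denom_b > 0 else None
    somers_d = num / (p0 - t_x) if p0 > t_x else None
    return tau_c, tau_b, somers_d


# Hypothetical table: 3 feature groups (rows) x 3 ordinal target levels.
table = [[8, 3, 1], [4, 6, 2], [1, 3, 7]]
tau_c, tau_b, somers_d = ordinal_measures(table)
print(tau_c, tau_b, somers_d)

# A single-row table is degenerate for all three measures.
print(ordinal_measures([[5, 5]]))  # (None, None, None)
```

On this table the numerator is \(C - D = 210\) with \(n = 35\) and \(m = 3\); only the denominators differ between the three values.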

Search strategy — additive \(C - D\) interval DP

The ordinal evaluators reuse the progressive top-K interval DP pattern. The key fact is that the numerator \(C - D\) of a consecutive grouping decomposes additively over groups:

\[(C - D)(\text{grouping}) = \text{TotalBetween} - \sum_g \text{WithinSegment}(g),\]

where TotalBetween (the \(C - D\) of the fully-split table) is constant and WithinSegment — the concordant−discordant pairs removed by merging the rows of a segment — is prefix-summable. An interval DP that keeps, per number of groups \(k\), the partitions with the largest numerator therefore enumerates the best candidates without materialising every consecutive partition.

For \(\tau_c\) the per-\(k\) denominator \(n^2 (m-1) / (2m)\) is constant, so numerator-optimal \(=\) metric-optimal: the DP is exact for tau-c even at the smallest top_k.
For \(\tau_b\) and Somers’ D the denominator depends on the group sizes (through \(T_X\)), so the kept top-K candidates are re-scored with their true denominators and ranked — exact once top_k is exhaustive, a top-K approximation otherwise (progressive doubling makes it exhaustive in the worst case, exactly as in the binary/continuous DPs).
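The additive decomposition can be checked numerically on a small table. In the sketch below the table and the helper names (within_segment, numerator, groupings) are hypothetical; WithinSegment is recomputed directly for each segment instead of being read from prefix sums, and exhaustive enumeration stands in for the DP.

```python
from itertools import combinations


def c_minus_d(table):
    """Concordant minus discordant pairs of an ordered contingency table."""
    total = 0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            for lower in table[i + 1:]:
                for l, n_kl in enumerate(lower):
                    total += n_ij * n_kl * ((l > j) - (l < j))
    return total


def merge_rows(table, bounds):
    """Sum the consecutive rows table[a:b] of every segment (a, b) in bounds."""
    return [[sum(col) for col in zip(*table[a:b])] for a, b in zip(bounds, bounds[1:])]


# Hypothetical table: 5 ordered raw modalities (rows) x 3 ordinal target levels.
raw = [[9, 3, 1], [7, 4, 2], [3, 6, 3], [2, 4, 6], [1, 2, 9]]
total_between = c_minus_d(raw)  # C - D of the fully-split table


def within_segment(a, b):
    """C - D removed when raw rows a..b-1 are merged into a single group."""
    return c_minus_d(raw[a:b])


def numerator(bounds):
    """C - D of a consecutive grouping, via the additive decomposition."""
    return total_between - sum(within_segment(a, b) for a, b in zip(bounds, bounds[1:]))


def groupings(n_groups):
    """Every consecutive grouping of the raw rows into n_groups segments."""
    m = len(raw)
    return [[0, *cuts, m] for cuts in combinations(range(1, m), n_groups - 1)]


# For a fixed number of groups the tau-c denominator is constant, so the
# grouping with the largest numerator is also the tau-c optimum for that k.
for k in (2, 3):
    best = max(groupings(k), key=numerator)
    print(k, best, numerator(best))
```

The identity holds because merging rows only turns pairs that straddle two rows of the same segment into ties on the feature; pairs between different segments are unaffected.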

The NaN fan-out is not re-implemented: it runs after this DP has applied the best non-NaN grouping, over the already-small grouped label set, so the inherited enumerate-and-score path is cheap there.

Kendall’s tau-c Combinations

See Kendall/Stuart’s \(\tau_c\) (ordinal default) for more details on the metric.

class AutoCarver.combinations.KendallTauCCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Kendall’s tau-c based combination evaluation toolkit (ordinal default).

Stuart’s tau-c applies a min(r, c) correction tailored to rectangular tables — exactly our shape (few feature groups × many target levels) — so its magnitude stays comparable across combinations with different group counts and it leans toward fewer, robust modalities, only adding one when a split is genuinely meaningful.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Kendall’s tau-b Combinations

See Kendall’s \(\tau_b\) for more details on the metric.

class AutoCarver.combinations.KendallTauBCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Kendall’s tau-b based combination evaluation toolkit.

Bit-exact with scipy.stats.kendalltau() (the tau-b variant) on the grouped contingency table — pinned by parity tests. Normalised by the geometric mean of both margins’ untied pairs; tends to retain more modalities on smoothly monotone signals than KendallTauCCombinations.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.

Somers’ D Combinations

See Somers’ D for more details on the metric.

class AutoCarver.combinations.SomersDCombinations(*, verbose: bool = False, target_rate: TargetRate[XAgg] | None = None)

Somers’ D based combination evaluation toolkit.

Uses the original asymmetric Somers’ D, D(Y|X): concordant minus discordant pairs over pairs untied on the feature X, matching scipy.stats.somersd(table).statistic. Being asymmetric, it leans strongly toward the coarsest split (its maximum over groupings is typically reached with two modalities); it is offered for users who specifically want raw Somers’ D rather than the self-balancing Kendall taus.

Parameters:

verbose (bool, optional) – Whether to print progress / statistics, by default False.
target_rate (TargetRate, optional) – Target rate strategy. If None, each evaluator subclass picks its own default in _init_target_rate().

classmethod load(file: Path | dict) → CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters: file (Path | dict) – pathlib.Path of the .json file or its already-parsed content.
Returns: A ready-to-use CombinationEvaluator
Return type: CombinationEvaluator

save(file_name: Path) → None

Saves CombinationEvaluator to .json file.

Parameters: file_name (Path) – pathlib.Path of the .json file to write.