Combinations

Combinations are at the core of Carvers. They evaluate all possible combinations of a feature's modalities, with up to max_n_mod modalities each, and identify the one with the best association.
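To make "all possible combinations with up to max_n_mod modalities" concrete, here is a minimal sketch that enumerates groupings of an ordered list of modalities. It assumes that only consecutive modalities may be merged; the helper name consecutive_groupings is hypothetical and is not part of AutoCarver.

```python
from itertools import combinations


def consecutive_groupings(modalities, max_n_mod):
    """Yield every split of ordered modalities into 1..max_n_mod consecutive groups.

    Hypothetical helper for illustration only, not AutoCarver's implementation.
    """
    n = len(modalities)
    for n_groups in range(1, min(max_n_mod, n) + 1):
        # choose n_groups - 1 cut points among the n - 1 gaps between modalities
        for cuts in combinations(range(1, n), n_groups - 1):
            bounds = (0, *cuts, n)
            yield [modalities[start:end] for start, end in zip(bounds, bounds[1:])]


groupings = list(consecutive_groupings(["A", "B", "C", "D"], max_n_mod=2))
for grouping in groupings:
    print(grouping)
```

Each yielded grouping is one candidate; a combination evaluator would then score every candidate with its association metric and keep the best one.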

Classification tasks

Cramér’s V Combinations

See Cramér’s V for more details on the metric.
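For reference, the metric can be sketched in a few lines of plain Python. This uses the textbook definition of Cramér's V, computed from a contingency table of grouped modalities (rows) against target classes (columns), without any bias correction. The function names are illustrative, and this is not necessarily how AutoCarver computes it internally.

```python
from math import sqrt


def chi2_statistic(table):
    """Pearson chi-squared statistic and sample size of a contingency table.

    Assumes no row or column of the table is entirely zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    chi2, n = chi2_statistic(table)
    n_rows, n_cols = len(table), len(table[0])
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# rows: three grouped modalities, columns: binary target (0 / 1)
table = [[30, 10], [20, 20], [10, 30]]
v = cramers_v(table)
print(round(v, 4))
```

Values close to 1 indicate a strong association between the grouped feature and the target, values close to 0 a weak one.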

class AutoCarver.combinations.CramervCombinations(max_n_mod: int = 5, **kwargs)

Cramér’s V based combination evaluation toolkit

Parameters:

max_n_mod (int, optional) –

Maximum number of modalities per feature, by default 5

  • The combination with the best association will be selected.

  • All combinations of sizes from 1 to max_n_mod are tested out.

Tip

Set between 3 (faster, more robust) and 7 (slower, less robust)
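The speed side of this tip can be illustrated by counting candidates. The sketch below assumes that only consecutive modalities of an ordered feature are merged, in which case splitting n modalities into k groups can be done in C(n - 1, k - 1) ways. The function name and the choice of 20 starting modalities are illustrative assumptions, not AutoCarver internals.

```python
from math import comb


def n_consecutive_groupings(n_modalities, max_n_mod):
    """Number of ways to split ordered modalities into 1..max_n_mod consecutive groups."""
    largest = min(max_n_mod, n_modalities)
    return sum(comb(n_modalities - 1, k - 1) for k in range(1, largest + 1))


# number of candidates for a feature that starts with 20 modalities
counts = {m: n_consecutive_groupings(20, m) for m in (3, 5, 7)}
print(counts)
```

Under this assumption the number of candidates to evaluate grows quickly with max_n_mod, which is why larger values are slower.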

Keyword Arguments:
  • min_freq (float, optional) –

    Minimum frequency per modality per feature, by default None

    • Features need at least one modality more frequent than min_freq

    • Defines number of quantiles of continuous features

    • Minimum frequency of modality of quantitative features

    Tip

    Set between 0.01 (slower, less robust) and 0.2 (faster, more robust)

  • dropna (bool, optional) –

    • True, try to group nan with other modalities.

    • False, nan are ignored (not grouped), by default False

  • verbose (bool, optional) –

    • True, without IPython: prints raw statistics

    • True, with IPython: prints HTML statistics, by default False

classmethod load(file: str | dict) CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters:

file (str | dict) – String of .json file name or content of the file.

Returns:

A ready-to-use CombinationEvaluator

Return type:

CombinationEvaluator

save(file_name: str) None

Saves CombinationEvaluator to .json file.

Parameters:

file_name (str) – String of .json file name

Tschuprow’s T Combinations

See Tschuprow’s T for more details on the metric.
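As a quick reference, here is a minimal plain-Python sketch of the textbook definition of Tschuprow's T, computed from a contingency table of grouped modalities (rows) against target classes (columns). The function name is illustrative and the code is not necessarily AutoCarver's internal implementation.

```python
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's T = sqrt(chi2 / (n * sqrt((rows - 1) * (cols - 1)))).

    Assumes no row or column of the table is entirely zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected
    chi2 = sum(
        (observed - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, observed in enumerate(row)
    )
    n_rows, n_cols = len(table), len(table[0])
    return sqrt(chi2 / (n * sqrt((n_rows - 1) * (n_cols - 1))))


# rows: three grouped modalities, columns: binary target (0 / 1)
table = [[30, 10], [20, 20], [10, 30]]
t = tschuprow_t(table)
print(round(t, 4))
```

For square tables T coincides with Cramér's V. For non-square tables its denominator is larger, so for the same chi-squared statistic a table with more rows scores lower.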

class AutoCarver.combinations.TschuprowtCombinations(max_n_mod: int = 5, **kwargs)

Tschuprow’s T based combination evaluation toolkit

Parameters:

max_n_mod (int, optional) –

Maximum number of modalities per feature, by default 5

  • The combination with the best association will be selected.

  • All combinations of sizes from 1 to max_n_mod are tested out.

Tip

Set between 3 (faster, more robust) and 7 (slower, less robust)

Keyword Arguments:
  • min_freq (float, optional) –

    Minimum frequency per modality per feature, by default None

    • Features need at least one modality more frequent than min_freq

    • Defines number of quantiles of continuous features

    • Minimum frequency of modality of quantitative features

    Tip

    Set between 0.01 (slower, less robust) and 0.2 (faster, more robust)

  • dropna (bool, optional) –

    • True, try to group nan with other modalities.

    • False, nan are ignored (not grouped), by default False

  • verbose (bool, optional) –

    • True, without IPython: prints raw statistics

    • True, with IPython: prints HTML statistics, by default False

classmethod load(file: str | dict) CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters:

file (str | dict) – String of .json file name or content of the file.

Returns:

A ready-to-use CombinationEvaluator

Return type:

CombinationEvaluator

save(file_name: str) None

Saves CombinationEvaluator to .json file.

Parameters:

file_name (str) – String of .json file name

Regression tasks

Kruskal’s H Combinations

See Kruskal-Wallis’ H test statistic for more details on the metric.
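For reference, the H statistic can be sketched in plain Python as follows. Each group holds the continuous target values observed for one grouped modality. The function name is illustrative, tied values receive their average rank, and the usual tie correction is left out for brevity, so this is a simplified sketch rather than AutoCarver's internal computation.

```python
def kruskal_h(groups):
    """Kruskal-Wallis H statistic (average ranks for ties, no tie correction)."""
    # pool all observations, remembering which group each one came from
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # find the run of tied values starting at position i
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i + 1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    weighted = sum(r * r / len(values) for r, values in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# target values observed for three grouped modalities
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_h(groups)
print(round(h, 4))
```

A larger H means the target distributions differ more between groups, which is what makes a combination of modalities more discriminating for a regression target.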

class AutoCarver.combinations.KruskalCombinations(max_n_mod: int = 5, **kwargs)

Kruskal-Wallis’ H based combination evaluation toolkit

Parameters:

max_n_mod (int, optional) –

Maximum number of modalities per feature, by default 5

  • The combination with the best association will be selected.

  • All combinations of sizes from 1 to max_n_mod are tested out.

Tip

Set between 3 (faster, more robust) and 7 (slower, less robust)

Keyword Arguments:
  • min_freq (float, optional) –

    Minimum frequency per modality per feature, by default None

    • Features need at least one modality more frequent than min_freq

    • Defines number of quantiles of continuous features

    • Minimum frequency of modality of quantitative features

    Tip

    Set between 0.01 (slower, less robust) and 0.2 (faster, more robust)

  • dropna (bool, optional) –

    • True, try to group nan with other modalities.

    • False, nan are ignored (not grouped), by default False

  • verbose (bool, optional) –

    • True, without IPython: prints raw statistics

    • True, with IPython: prints HTML statistics, by default False

classmethod load(file: str | dict) CombinationEvaluator

Allows one to load a CombinationEvaluator saved as a .json file.

Parameters:

file (str | dict) – String of .json file name or content of the file.

Returns:

A ready-to-use CombinationEvaluator

Return type:

CombinationEvaluator

save(file_name: str) None

Saves CombinationEvaluator to .json file.

Parameters:

file_name (str) – String of .json file name