{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Setting things up\n", "\n", "## About this notebook\n", "\n", "This notebook preprocesses the Titanic dataset with the ``BinaryCarver`` pipeline. ``BinaryCarver`` is a Python tool that discretizes quantitative and qualitative features so as to maximize their association with a binary target. Here, the target is passenger survival.\n", "\n", "The Titanic dataset describes the passengers of the 1912 voyage, with features such as ticket class, age and fare. We use ``BinaryCarver`` to group the modalities of qualitative features and to bucketize quantitative features, keeping for each feature the grouping that is most associated with survival.\n", "\n", "The following sections walk through each step of the discretization pipeline, from the raw distributions of features such as passenger age and fare to the carved features used to fit a binary classifier.\n", "\n", "\n", "## Installation" ] },
{ "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# %pip install AutoCarver[jupyter]" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Titanic Data\n", "\n", "In this example notebook, we will use the Titanic dataset.\n", "\n", "The Titanic dataset is widely used in machine learning and data science. It provides information about the passengers on board the Titanic, the ship that sank on its maiden voyage in 1912.\n", "\n", "The dataset includes features such as passengers' names, ages, genders, ticket classes and fares, and whether they survived. The usual goal is to build a model that predicts whether a passenger survived from their individual characteristics (binary classification)." ] },
{ "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Survived | Pclass | Name | Sex | Age | Siblings/Spouses Aboard | Parents/Children Aboard | Fare |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | Mr. Owen Harris Braund | male | 22.0 | 1 | 0 | 7.2500 |
| 1 | 1 | 1 | Mrs. John Bradley (Florence Briggs Thayer) Cum... | female | 38.0 | 1 | 0 | 71.2833 |
| 2 | 1 | 3 | Miss. Laina Heikkinen | female | 26.0 | 0 | 0 | 7.9250 |
| 3 | 1 | 1 | Mrs. Jacques Heath (Lily May Peel) Futrelle | female | 35.0 | 1 | 0 | 53.1000 |
| 4 | 0 | 3 | Mr. William Henry Allen | male | 35.0 | 0 | 0 | 8.0500 |

| | Survived | Pclass | Name | Sex | Age | Siblings/Spouses Aboard | Parents/Children Aboard | Fare |
|---|---|---|---|---|---|---|---|---|
| 617 | 0 | 3 | Mr. Antoni Yasbeck | male | 27.0 | 1 | 0 | 14.4542 |
| 489 | 0 | 1 | Mr. Harry Markland Molson | male | 55.0 | 0 | 0 | 30.5000 |
| 871 | 1 | 3 | Miss. Adele Kiamie Najib | female | 15.0 | 0 | 0 | 7.2250 |
| 654 | 0 | 3 | Mrs. John (Catherine) Bourke | female | 32.0 | 1 | 1 | 15.5000 |
| 653 | 0 | 3 | Mr. Alexander Radeff | male | 27.0 | 0 | 0 | 7.8958 |

| | target_mean | frequency | count |
|---|---|---|---|
| male | 0.1878 | 0.6364 | 378 |
| female | 0.7315 | 0.3636 | 216 |

| target_mean | frequency | count |
|---|---|---|
| 0.1949 | 0.6655 | 195 |
| 0.7653 | 0.3345 | 98 |
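
The three columns of the tables above can be recomputed by hand. The block below is a minimal sketch, not AutoCarver's implementation: `modality_stats` is a hypothetical helper, and the input is just the `Sex` and `Survived` values of the first five passengers displayed earlier.

```python
from collections import defaultdict


def modality_stats(values, targets):
    """Return {modality: (target_mean, frequency, count)} for a binary target.

    Hypothetical helper written for illustration only.
    """
    counts = defaultdict(int)
    positives = defaultdict(int)
    for value, target in zip(values, targets):
        counts[value] += 1
        positives[value] += target
    total = len(values)
    return {
        value: (positives[value] / count, count / total, count)
        for value, count in counts.items()
    }


# Sex and Survived of the first five passengers displayed earlier
sex = ["male", "female", "female", "female", "male"]
survived = [0, 1, 1, 1, 0]

stats = modality_stats(sex, survived)
for modality, (target_mean, frequency, count) in stats.items():
    print(f"{modality}: target_mean={target_mean:.4f}, "
          f"frequency={frequency:.4f}, count={count}")
```

`target_mean` is the share of survivors within a modality, `frequency` is the share of passengers that fall in it, and `count` is the number of passengers.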

| | target_mean | frequency | count |
|---|---|---|---|
| male | 0.1878 | 0.6364 | 378 |
| female | 0.7315 | 0.3636 | 216 |

| target_mean | frequency | count |
|---|---|---|
| 0.1949 | 0.6655 | 195 |
| 0.7653 | 0.3345 | 98 |

| | target_mean | frequency | count |
|---|---|---|---|
| 1 | 0.6197 | 0.2391 | 142 |
| 2 | 0.4683 | 0.2121 | 126 |
| 3 | 0.2515 | 0.5488 | 326 |

| target_mean | frequency | count |
|---|---|---|
| 0.6486 | 0.2526 | 74 |
| 0.4828 | 0.1980 | 58 |
| 0.2298 | 0.5495 | 161 |

| | target_mean | frequency | count |
|---|---|---|---|
| 1 to 2 | 0.5485 | 0.4512 | 268 |
| 3 | 0.2515 | 0.5488 | 326 |

| target_mean | frequency | count |
|---|---|---|
| 0.5758 | 0.4505 | 132 |
| 0.2298 | 0.5495 | 161 |
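
The carved `Pclass` table above can be derived from the raw one by summing the merged modalities. The sketch below is illustrative only: `merge_modalities` and the `combination` mapping are hypothetical names, and the survivor counts are an assumption, reconstructed as `count * target_mean` from the raw table and rounded to the nearest integer.

```python
# (count, survivors) per ticket class on the first (labelled) table.
# Survivor counts are reconstructed as count * target_mean (assumption).
raw = {"1": (142, 88), "2": (126, 59), "3": (326, 82)}

# Illustrative grouping: classes 1 and 2 are merged, class 3 is kept alone.
combination = {"1": "1 to 2", "2": "1 to 2", "3": "3"}


def merge_modalities(raw_counts, mapping):
    """Sum counts of grouped modalities, then recompute the statistics."""
    merged = {}
    for modality, (count, positives) in raw_counts.items():
        group = mapping[modality]
        group_count, group_positives = merged.get(group, (0, 0))
        merged[group] = (group_count + count, group_positives + positives)
    total = sum(count for count, _ in merged.values())
    return {
        group: (round(positives / count, 4), round(count / total, 4), count)
        for group, (count, positives) in merged.items()
    }


carved = merge_modalities(raw, combination)
for group, (target_mean, frequency, count) in carved.items():
    print(group, target_mean, frequency, count)
```

Because counts and survivors add up, no access to the individual rows is needed to evaluate a candidate grouping.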

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 1.00e+00 | 0.8333 | 0.0202 | 12 |
| 1.00e+00 < x <= 2.00e+00 | 0.5000 | 0.0067 | 4 |
| 2.00e+00 < x <= 4.00e+00 | 0.7143 | 0.0236 | 14 |
| 4.00e+00 < x <= 6.00e+00 | 0.5714 | 0.0118 | 7 |
| 6.00e+00 < x <= 8.00e+00 | 0.2857 | 0.0118 | 7 |
| 8.00e+00 < x <= 1.00e+01 | 0.0000 | 0.0101 | 6 |
| 1.00e+01 < x <= 1.40e+01 | 0.3333 | 0.0152 | 9 |
| 1.40e+01 < x <= 1.60e+01 | 0.5000 | 0.0303 | 18 |
| 1.60e+01 < x <= 1.70e+01 | 0.3000 | 0.0168 | 10 |
| 1.70e+01 < x <= 1.80e+01 | 0.3333 | 0.0354 | 21 |
| 1.80e+01 < x <= 1.90e+01 | 0.3913 | 0.0387 | 23 |
| 1.90e+01 < x <= 2.05e+01 | 0.1111 | 0.0303 | 18 |
| 2.05e+01 < x <= 2.10e+01 | 0.1905 | 0.0354 | 21 |
| 2.10e+01 < x <= 2.20e+01 | 0.4242 | 0.0556 | 33 |
| 2.20e+01 < x <= 2.35e+01 | 0.4000 | 0.0168 | 10 |
| 2.35e+01 < x <= 2.40e+01 | 0.5417 | 0.0404 | 24 |
| 2.40e+01 < x <= 2.50e+01 | 0.1333 | 0.0253 | 15 |
| 2.50e+01 < x <= 2.60e+01 | 0.4000 | 0.0168 | 10 |
| 2.60e+01 < x <= 2.70e+01 | 0.5000 | 0.0337 | 20 |
| 2.70e+01 < x <= 2.85e+01 | 0.2500 | 0.0337 | 20 |
| 2.85e+01 < x <= 2.90e+01 | 0.4444 | 0.0303 | 18 |
| 2.90e+01 < x <= 3.05e+01 | 0.2692 | 0.0438 | 26 |
| 3.05e+01 < x <= 3.10e+01 | 0.4545 | 0.0185 | 11 |
| 3.10e+01 < x <= 3.25e+01 | 0.5000 | 0.0303 | 18 |
| 3.25e+01 < x <= 3.30e+01 | 0.3636 | 0.0185 | 11 |
| 3.30e+01 < x <= 3.45e+01 | 0.2857 | 0.0236 | 14 |
| 3.45e+01 < x <= 3.50e+01 | 0.5455 | 0.0185 | 11 |
| 3.50e+01 < x <= 3.60e+01 | 0.4375 | 0.0269 | 16 |
| 3.60e+01 < x <= 3.70e+01 | 0.2222 | 0.0152 | 9 |
| 3.70e+01 < x <= 3.80e+01 | 0.6250 | 0.0135 | 8 |
| 3.80e+01 < x <= 3.90e+01 | 0.3333 | 0.0202 | 12 |
| 3.90e+01 < x <= 4.00e+01 | 0.4615 | 0.0219 | 13 |
| 4.00e+01 < x <= 4.10e+01 | 0.3333 | 0.0101 | 6 |
| 4.10e+01 < x <= 4.20e+01 | 0.4615 | 0.0219 | 13 |
| 4.20e+01 < x <= 4.40e+01 | 0.3333 | 0.0202 | 12 |
| 4.40e+01 < x <= 4.50e+01 | 0.4545 | 0.0185 | 11 |
| 4.50e+01 < x <= 4.60e+01 | 0.2500 | 0.0067 | 4 |
| 4.60e+01 < x <= 4.70e+01 | 0.2500 | 0.0135 | 8 |
| 4.70e+01 < x <= 4.80e+01 | 0.7778 | 0.0152 | 9 |
| 4.80e+01 < x <= 5.00e+01 | 0.4545 | 0.0185 | 11 |
| 5.00e+01 < x <= 5.10e+01 | 0.2000 | 0.0084 | 5 |
| 5.10e+01 < x <= 5.40e+01 | 0.4000 | 0.0168 | 10 |
| 5.40e+01 < x <= 5.60e+01 | 0.3750 | 0.0135 | 8 |
| 5.60e+01 < x <= 5.80e+01 | 0.3333 | 0.0101 | 6 |
| 5.80e+01 < x <= 6.10e+01 | 0.0000 | 0.0118 | 7 |
| 6.10e+01 < x <= 6.50e+01 | 0.2857 | 0.0118 | 7 |
| 6.50e+01 < x | 0.1250 | 0.0135 | 8 |

| target_mean | frequency | count |
|---|---|---|
| 1.0000 | 0.0068 | 2 |
| 0.2857 | 0.0239 | 7 |
| 0.7500 | 0.0137 | 4 |
| 1.0000 | 0.0068 | 2 |
| 0.5000 | 0.0137 | 4 |
| 0.5000 | 0.0137 | 4 |
| 0.6667 | 0.0205 | 6 |
| 0.2500 | 0.0273 | 8 |
| 0.5000 | 0.0205 | 6 |
| 0.4000 | 0.0512 | 15 |
| 0.2000 | 0.0341 | 10 |
| 0.3333 | 0.0205 | 6 |
| 0.1538 | 0.0444 | 13 |
| 0.1667 | 0.0205 | 6 |
| 0.1875 | 0.0546 | 16 |
| 0.5000 | 0.0341 | 10 |
| 0.5000 | 0.0341 | 10 |
| 0.2727 | 0.0375 | 11 |
| 0.5000 | 0.0205 | 6 |
| 0.2632 | 0.0648 | 19 |
| 0.4286 | 0.0239 | 7 |
| 0.3333 | 0.0307 | 9 |
| 0.6250 | 0.0273 | 8 |
| 0.4000 | 0.0171 | 5 |
| 0.6667 | 0.0205 | 6 |
| 0.7500 | 0.0137 | 4 |
| 0.6000 | 0.0341 | 10 |
| 0.5714 | 0.0239 | 7 |
| 0.2500 | 0.0137 | 4 |
| 0.2500 | 0.0137 | 4 |
| 0.1667 | 0.0205 | 6 |
| 0.2000 | 0.0171 | 5 |
| 0.2000 | 0.0171 | 5 |
| 0.5000 | 0.0137 | 4 |
| 0.0000 | 0.0137 | 4 |
| 0.3333 | 0.0102 | 3 |
| 0.2500 | 0.0137 | 4 |
| 0.0000 | 0.0068 | 2 |
| 0.6667 | 0.0102 | 3 |
| 0.5714 | 0.0239 | 7 |
| 0.5000 | 0.0068 | 2 |
| 0.5000 | 0.0205 | 6 |
| nan | 0.0000 | 0 |
| 0.5000 | 0.0068 | 2 |
| 0.6667 | 0.0102 | 3 |
| 0.3333 | 0.0205 | 6 |
| 0.0000 | 0.0068 | 2 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 6.00e+00 | 0.7027 | 0.0623 | 37 |
| 6.00e+00 < x <= 5.80e+01 | 0.3738 | 0.9007 | 535 |
| 5.80e+01 < x | 0.1364 | 0.0370 | 22 |

| target_mean | frequency | count |
|---|---|---|
| 0.6000 | 0.0512 | 15 |
| 0.3745 | 0.9113 | 267 |
| 0.3636 | 0.0375 | 11 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 0.0000e+00 | 0.0000 | 0.0135 | 8 |
| 0.0000e+00 < x <= 6.8583e+00 | 0.0000 | 0.0135 | 8 |
| 6.8583e+00 < x <= 7.0500e+00 | 0.1111 | 0.0152 | 9 |
| 7.0500e+00 < x <= 7.2250e+00 | 0.2143 | 0.0236 | 14 |
| 7.2250e+00 < x <= 7.2292e+00 | 0.2727 | 0.0185 | 11 |
| 7.2292e+00 < x <= 7.2500e+00 | 0.0909 | 0.0185 | 11 |
| 7.2500e+00 < x <= 7.6500e+00 | 0.2222 | 0.0152 | 9 |
| 7.6500e+00 < x <= 7.7500e+00 | 0.3871 | 0.0522 | 31 |
| 7.7500e+00 < x <= 7.7750e+00 | 0.3333 | 0.0152 | 9 |
| 7.7750e+00 < x <= 7.8292e+00 | 0.4000 | 0.0084 | 5 |
| 7.8292e+00 < x <= 7.8542e+00 | 0.3000 | 0.0168 | 10 |
| 7.8542e+00 < x <= 7.8875e+00 | 0.8000 | 0.0084 | 5 |
| 7.8875e+00 < x <= 7.8958e+00 | 0.0000 | 0.0387 | 23 |
| 7.8958e+00 < x <= 8.0292e+00 | 0.5000 | 0.0269 | 16 |
| 8.0292e+00 < x <= 8.0500e+00 | 0.0968 | 0.0522 | 31 |
| 8.0500e+00 < x <= 8.4583e+00 | 0.0000 | 0.0051 | 3 |
| 8.4583e+00 < x <= 8.6625e+00 | 0.1111 | 0.0152 | 9 |
| 8.6625e+00 < x <= 9.3500e+00 | 0.2857 | 0.0118 | 7 |
| 9.3500e+00 < x <= 9.5000e+00 | 0.3333 | 0.0101 | 6 |
| 9.5000e+00 < x <= 1.0500e+01 | 0.3684 | 0.0320 | 19 |
| 1.0500e+01 < x <= 1.2287e+01 | 0.6250 | 0.0135 | 8 |
| 1.2287e+01 < x <= 1.3000e+01 | 0.4839 | 0.0522 | 31 |
| 1.3000e+01 < x <= 1.4000e+01 | 0.5000 | 0.0135 | 8 |
| 1.4000e+01 < x <= 1.4458e+01 | 0.1111 | 0.0152 | 9 |
| 1.4458e+01 < x <= 1.5100e+01 | 0.3333 | 0.0101 | 6 |
| 1.5100e+01 < x <= 1.5550e+01 | 0.1250 | 0.0135 | 8 |
| 1.5550e+01 < x <= 1.6100e+01 | 0.5556 | 0.0152 | 9 |
| 1.6100e+01 < x <= 1.8750e+01 | 0.6250 | 0.0135 | 8 |
| 1.8750e+01 < x <= 1.9500e+01 | 1.0000 | 0.0084 | 5 |
| 1.9500e+01 < x <= 2.1000e+01 | 0.1111 | 0.0152 | 9 |
| 2.1000e+01 < x <= 2.3000e+01 | 0.6250 | 0.0135 | 8 |
| 2.3000e+01 < x <= 2.4150e+01 | 0.3750 | 0.0135 | 8 |
| 2.4150e+01 < x <= 2.6000e+01 | 0.3214 | 0.0471 | 28 |
| 2.6000e+01 < x <= 2.6387e+01 | 0.8000 | 0.0168 | 10 |
| 2.6387e+01 < x <= 2.6550e+01 | 0.3333 | 0.0152 | 9 |
| 2.6550e+01 < x <= 2.7900e+01 | 0.2500 | 0.0202 | 12 |
| 2.7900e+01 < x <= 2.9000e+01 | 0.6667 | 0.0051 | 3 |
| 2.9000e+01 < x <= 3.0000e+01 | 0.4000 | 0.0168 | 10 |
| 3.0000e+01 < x <= 3.0696e+01 | 0.3333 | 0.0101 | 6 |
| 3.0696e+01 < x <= 3.1387e+01 | 0.3333 | 0.0152 | 9 |
| 3.1387e+01 < x <= 3.3500e+01 | 0.4000 | 0.0084 | 5 |
| 3.3500e+01 < x <= 3.7004e+01 | 0.2500 | 0.0135 | 8 |
| 3.7004e+01 < x <= 3.9600e+01 | 0.6250 | 0.0135 | 8 |
| 3.9600e+01 < x <= 4.1579e+01 | 0.2857 | 0.0118 | 7 |
| 4.1579e+01 < x <= 4.6900e+01 | 0.0000 | 0.0118 | 7 |
| 4.6900e+01 < x <= 5.1862e+01 | 0.2857 | 0.0118 | 7 |
| 5.1862e+01 < x <= 5.2554e+01 | 0.7143 | 0.0118 | 7 |
| 5.2554e+01 < x <= 5.6496e+01 | 0.7273 | 0.0185 | 11 |
| 5.6496e+01 < x <= 5.7979e+01 | 1.0000 | 0.0067 | 4 |
| 5.7979e+01 < x <= 6.9550e+01 | 0.3636 | 0.0185 | 11 |
| 6.9550e+01 < x <= 7.3500e+01 | 0.1667 | 0.0101 | 6 |
| 7.3500e+01 < x <= 7.7287e+01 | 0.6667 | 0.0101 | 6 |
| 7.7287e+01 < x <= 7.9650e+01 | 0.7500 | 0.0135 | 8 |
| 7.9650e+01 < x <= 8.3158e+01 | 0.8571 | 0.0118 | 7 |
| 8.3158e+01 < x <= 9.0000e+01 | 0.8571 | 0.0118 | 7 |
| 9.0000e+01 < x <= 1.1088e+02 | 0.7143 | 0.0118 | 7 |
| 1.1088e+02 < x <= 1.3365e+02 | 0.8571 | 0.0118 | 7 |
| 1.3365e+02 < x <= 1.5155e+02 | 0.8889 | 0.0152 | 9 |
| 1.5155e+02 < x <= 2.1134e+02 | 0.8571 | 0.0118 | 7 |
| 2.1134e+02 < x | 0.5714 | 0.0118 | 7 |

| target_mean | frequency | count |
|---|---|---|
| 0.1429 | 0.0239 | 7 |
| 0.0000 | 0.0068 | 2 |
| 0.0000 | 0.0068 | 2 |
| 0.2000 | 0.0171 | 5 |
| 0.2500 | 0.0137 | 4 |
| 0.0000 | 0.0068 | 2 |
| 0.2000 | 0.0171 | 5 |
| 0.2727 | 0.0375 | 11 |
| 0.0000 | 0.0239 | 7 |
| 0.4000 | 0.0171 | 5 |
| 0.0000 | 0.0102 | 3 |
| 0.0000 | 0.0034 | 1 |
| 0.0769 | 0.0444 | 13 |
| 0.3333 | 0.0102 | 3 |
| 0.1667 | 0.0410 | 12 |
| 0.2000 | 0.0171 | 5 |
| 0.1667 | 0.0205 | 6 |
| 0.0000 | 0.0102 | 3 |
| 0.0000 | 0.0171 | 5 |
| 0.2667 | 0.0512 | 15 |
| 0.4000 | 0.0171 | 5 |
| 0.3810 | 0.0717 | 21 |
| 1.0000 | 0.0034 | 1 |
| 0.0000 | 0.0137 | 4 |
| 0.0000 | 0.0171 | 5 |
| 1.0000 | 0.0171 | 5 |
| 0.5000 | 0.0341 | 10 |
| 0.3333 | 0.0102 | 3 |
| 0.6667 | 0.0102 | 3 |
| 0.6250 | 0.0273 | 8 |
| 0.4000 | 0.0171 | 5 |
| 0.1667 | 0.0205 | 6 |
| 0.7273 | 0.0375 | 11 |
| 1.0000 | 0.0034 | 1 |
| 0.8333 | 0.0205 | 6 |
| 0.2000 | 0.0171 | 5 |
| 0.0000 | 0.0034 | 1 |
| 0.5000 | 0.0137 | 4 |
| 1.0000 | 0.0102 | 3 |
| 0.4000 | 0.0171 | 5 |
| 1.0000 | 0.0034 | 1 |
| 0.4286 | 0.0239 | 7 |
| nan | 0.0000 | 0 |
| 0.0000 | 0.0102 | 3 |
| nan | 0.0000 | 0 |
| 1.0000 | 0.0068 | 2 |
| 0.3333 | 0.0102 | 3 |
| 0.6667 | 0.0205 | 6 |
| 1.0000 | 0.0068 | 2 |
| 0.4286 | 0.0239 | 7 |
| 0.5000 | 0.0068 | 2 |
| 1.0000 | 0.0034 | 1 |
| 0.6667 | 0.0205 | 6 |
| 1.0000 | 0.0034 | 1 |
| 0.7500 | 0.0137 | 4 |
| 0.8000 | 0.0171 | 5 |
| 1.0000 | 0.0068 | 2 |
| 0.0000 | 0.0068 | 2 |
| 1.0000 | 0.0034 | 1 |
| 0.7000 | 0.0341 | 10 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 5.2e+01 | 0.3184 | 0.8249 | 490 |
| 5.2e+01 < x | 0.7019 | 0.1751 | 104 |

| target_mean | frequency | count |
|---|---|---|
| 0.3278 | 0.8225 | 241 |
| 0.6538 | 0.1775 | 52 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 0.00e+00 | 0.3614 | 0.6801 | 404 |
| 0.00e+00 < x <= 1.00e+00 | 0.5000 | 0.2323 | 138 |
| 1.00e+00 < x <= 2.00e+00 | 0.5500 | 0.0337 | 20 |
| 2.00e+00 < x <= 3.00e+00 | 0.1111 | 0.0152 | 9 |
| 3.00e+00 < x <= 4.00e+00 | 0.1667 | 0.0202 | 12 |
| 4.00e+00 < x | 0.0000 | 0.0185 | 11 |

| target_mean | frequency | count |
|---|---|---|
| 0.3200 | 0.6826 | 200 |
| 0.6056 | 0.2423 | 71 |
| 0.2500 | 0.0273 | 8 |
| 0.4286 | 0.0239 | 7 |
| 0.1667 | 0.0205 | 6 |
| 0.0000 | 0.0034 | 1 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 0.00e+00 | 0.3614 | 0.6801 | 404 |
| 0.00e+00 < x <= 2.00e+00 | 0.5063 | 0.2660 | 158 |
| 2.00e+00 < x | 0.0938 | 0.0539 | 32 |

| target_mean | frequency | count |
|---|---|---|
| 0.3200 | 0.6826 | 200 |
| 0.5696 | 0.2696 | 79 |
| 0.2857 | 0.0478 | 14 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 0.00e+00 | 0.3447 | 0.7374 | 438 |
| 0.00e+00 < x <= 1.00e+00 | 0.5057 | 0.1465 | 87 |
| 1.00e+00 < x <= 2.00e+00 | 0.5167 | 0.1010 | 60 |
| 2.00e+00 < x | 0.3333 | 0.0152 | 9 |

| target_mean | frequency | count |
|---|---|---|
| 0.3475 | 0.8055 | 236 |
| 0.6774 | 0.1058 | 31 |
| 0.4500 | 0.0683 | 20 |
| 0.1667 | 0.0205 | 6 |

| | target_mean | frequency | count |
|---|---|---|---|
| x <= 0.0e+00 | 0.3447 | 0.7374 | 438 |
| 0.0e+00 < x | 0.5000 | 0.2626 | 156 |

| target_mean | frequency | count |
|---|---|---|
| 0.3475 | 0.8055 | 236 |
| 0.5439 | 0.1945 | 57 |

| feature | count | tschuprowt | n_mod | label | content | target_mean | frequency | dropped | dropped_reason |
|---|---|---|---|---|---|---|---|---|---|
| Categorical('Sex') | 378.0 | 0.533719 | 2 | 0 | male | 0.187831 | 0.636364 | False | None |
| | 216.0 | 0.533719 | 2 | 1 | female | 0.731481 | 0.363636 | False | None |
| Ordinal('Pclass') | 268.0 | 0.300144 | 2 | 0 | [2, 1] | 0.548507 | 0.451178 | False | None |
| | 326.0 | 0.300144 | 2 | 1 | 3 | 0.251534 | 0.548822 | False | None |
| Numerical('Age') | 37.0 | 0.161045 | 3 | 0 | x <= 6.00e+00 | 0.702703 | 0.062290 | False | None |
| | 535.0 | 0.161045 | 3 | 1 | 6.00e+00 < x <= 5.80e+01 | 0.373832 | 0.900673 | False | None |
| | 22.0 | 0.161045 | 3 | 2 | 5.80e+01 < x | 0.136364 | 0.037037 | False | None |
| Numerical('Fare') | 490.0 | 0.294937 | 2 | 0 | x <= 5.2e+01 | 0.318367 | 0.824916 | False | None |
| | 104.0 | 0.294937 | 2 | 1 | 5.2e+01 < x | 0.701923 | 0.175084 | False | None |
| Numerical('Siblings/Spouses Aboard') | 404.0 | 0.162663 | 3 | 0 | x <= 0.00e+00 | 0.361386 | 0.680135 | False | None |
| | 158.0 | 0.162663 | 3 | 1 | 0.00e+00 < x <= 2.00e+00 | 0.506329 | 0.265993 | False | None |
| | 32.0 | 0.162663 | 3 | 2 | 2.00e+00 < x | 0.093750 | 0.053872 | False | None |
| Numerical('Parents/Children Aboard') | 438.0 | 0.136439 | 2 | 0 | x <= 0.0e+00 | 0.344749 | 0.737374 | False | None |
| | 156.0 | 0.136439 | 2 | 1 | 0.0e+00 < x | 0.500000 | 0.262626 | False | None |

| | info | cramerv | tschuprowt | combination | n_mod | dropna | train | viable | dev |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Raw distribution | 0.321044 | 0.269965 | {'1': '1', '2': '2', '3': '3'} | 3 | False | NaN | NaN | NaN |
| 1 | Best for tschuprowt and max_n_mod=5 | 0.300144 | 0.300144 | {'1': '1', '2': '1', '3': '3'} | 2 | False | {'viable': True, 'info': ''} | True | {'viable': True, 'info': ''} |
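
The `cramerv` and `tschuprowt` values of the raw distribution can be checked against the textbook definitions of Cramér's V and Tschuprow's T. The sketch below assumes those standard definitions rather than AutoCarver's own code; `association_measures` is a hypothetical helper, and the contingency table is an assumption reconstructed from the raw `Pclass` counts and target means shown earlier.

```python
import math


def association_measures(table):
    """Cramér's V and Tschuprow's T of a contingency table (list of rows)."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    # Pearson chi-square statistic, without continuity correction
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected

    cramerv = math.sqrt(chi2 / (total * min(n_rows - 1, n_cols - 1)))
    tschuprowt = math.sqrt(
        chi2 / (total * math.sqrt((n_rows - 1) * (n_cols - 1)))
    )
    return cramerv, tschuprowt


# Rows: Pclass 1, 2, 3. Columns: (died, survived).
# Reconstructed from count and target_mean of the raw table (assumption).
pclass_table = [[54, 88], [67, 59], [244, 82]]

cramerv, tschuprowt = association_measures(pclass_table)
print(f"cramerv={cramerv:.4f}, tschuprowt={tschuprowt:.4f}")
```

Tschuprow's T divides by the square root of `(rows - 1) * (columns - 1)`, so it penalizes tables with many modalities more than Cramér's V does; the two coincide for a 2x2 table. For 2x2 tables, some implementations (for example SciPy's `chi2_contingency` with its default `correction=True`) apply a continuity correction, which this sketch omits.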

| | Sex | Pclass | Age | Fare | Siblings/Spouses Aboard | Parents/Children Aboard |
|---|---|---|---|---|---|---|
| 0.0 | 0.665529 | 0.450512 | 0.051195 | 0.822526 | 0.682594 | 0.805461 |
| 1.0 | 0.334471 | 0.549488 | 0.911263 | 0.177474 | 0.269625 | 0.194539 |
| 2.0 | NaN | NaN | 0.037543 | NaN | 0.047782 | NaN |

| | feature | Nan | Mode | TschuprowtMeasure | TschuprowtRank | TschuprowtFilter | TschuprowtWith |
|---|---|---|---|---|---|---|---|
| 0 | Categorical('Sex') | 0.0000 | 0.6364 | 0.5373 | 0.0000 | 0.0000 | itself |
| 1 | Ordinal('Pclass') | 0.0000 | 0.5488 | 0.3036 | 1.0000 | 0.0988 | Sex |
| 3 | Numerical('Fare') | 0.0000 | 0.8249 | 0.2995 | 2.0000 | 0.4057 | Pclass |
| 4 | Numerical('Siblings/Spouses Aboard') | 0.0000 | 0.6801 | 0.1627 | 3.0000 | 0.2383 | Pclass |
| 2 | Numerical('Age') | 0.0000 | 0.9007 | 0.1610 | 4.0000 | 0.2576 | Siblings/Spouses Aboard |
| 5 | Numerical('Parents/Children Aboard') | 0.0000 | 0.7374 | 0.1404 | 5.0000 | 0.4257 | Siblings/Spouses Aboard |

| | Sex | Pclass | Fare | Siblings/Spouses Aboard |
|---|---|---|---|---|
| 617 | 0 | 1 | 0.0 | 1 |
| 489 | 0 | 0 | 0.0 | 0 |
| 871 | 1 | 1 | 0.0 | 0 |
| 654 | 1 | 1 | 0.0 | 1 |
| 653 | 0 | 1 | 0.0 | 0 |
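
The labels in the table above follow from the carved buckets. The block below is a minimal sketch of that mapping, not the library's transform: `discretize`, `thresholds` and `labels` are hypothetical names, the thresholds are read from the carved tables, and the `Fare` bound (displayed rounded as `5.2e+01`) is taken as 52.0.

```python
from bisect import bisect_left

# Upper bounds of the carved buckets, read from the carved tables.
# The Fare bound is displayed rounded as 5.2e+01; 52.0 is an assumption.
thresholds = {"Fare": [52.0], "Siblings/Spouses Aboard": [0, 2]}

# Label of each grouped modality, in the order of the summary table.
labels = {"Sex": {"male": 0, "female": 1}, "Pclass": {1: 0, 2: 0, 3: 1}}


def discretize(row):
    """Map raw feature values to the label of their carved bucket."""
    out = {}
    for feature, mapping in labels.items():
        out[feature] = mapping[row[feature]]
    for feature, bounds in thresholds.items():
        # bisect_left counts the bounds strictly below the value, so a value
        # equal to a bound stays in the lower bucket (x <= bound).
        out[feature] = bisect_left(bounds, row[feature])
    return out


# Passenger 617 from the sampled rows displayed earlier
passenger = {
    "Sex": "male",
    "Pclass": 3,
    "Fare": 14.4542,
    "Siblings/Spouses Aboard": 1,
}
print(discretize(passenger))
```

Using `bisect_left` keeps the intervals closed on the right, which matches the `x <= bound` labels of the carved tables.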

| | Sex | Pclass | Fare | Siblings/Spouses Aboard |
|---|---|---|---|---|
| 0.0 | 378.0 | 268.0 | 490.0 | 404 |
| 1.0 | 216.0 | 326.0 | 104.0 | 158 |
| 2.0 | NaN | NaN | NaN | 32 |

XGBClassifier(base_score=None, booster=None, callbacks=None,
              colsample_bylevel=None, colsample_bynode=None,
              colsample_bytree=None, device=None, early_stopping_rounds=None,
              enable_categorical=False, eval_metric=None, feature_types=None,
              feature_weights=None, gamma=None, grow_policy=None,
              importance_type=None, interaction_constraints=None,
              learning_rate=None, max_bin=None, max_cat_threshold=None,
              max_cat_to_onehot=None, max_delta_step=None, max_depth=None,
              max_leaves=None, min_child_weight=None, missing=nan,
              monotone_constraints=None, multi_strategy=None, n_estimators=None,
              n_jobs=None, num_parallel_tree=None, ...)