{ "cells": [ { "cell_type": "markdown", "id": "3d6bfa70", "metadata": {}, "source": [ "# Setting things up\n", "\n", "## About this notebook\n", "\n", "In this notebook we use [`RegressionSelector`](https://autocarver.readthedocs.io/en/latest/selectors.html#regression-tasks) to quickly rank and select the features most associated with a **continuous** target — here the median house value of the California Housing dataset. Unlike a full carving pass, the selector is a lightweight, association-centric step: it scores every feature against the target, ranks them, and drops those too correlated with a better-ranked feature.\n", "\n", "`RegressionSelector` scores **every** feature exactly (no sampling) yet stays fast: each measure is computed for all features of a type in a single vectorized pass. By default it uses **Spearman's rho** for quantitative features and the **Kruskal-Wallis eta-squared** effect size for qualitative ones (the latter via a reversed test, since the target is continuous)." ] }, { "cell_type": "markdown", "id": "25639069", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": 1, "id": "64e9d727", "metadata": { "execution": { "iopub.execute_input": "2026-06-14T00:59:14.027810Z", "iopub.status.busy": "2026-06-14T00:59:14.026764Z", "iopub.status.idle": "2026-06-14T00:59:14.033781Z", "shell.execute_reply": "2026-06-14T00:59:14.032779Z" } }, "outputs": [], "source": [ "# %pip install AutoCarver[jupyter]" ] }, { "cell_type": "markdown", "id": "841d8358", "metadata": {}, "source": [ "## California Housing data\n", "\n", "The [California Housing dataset](https://scikit-learn.org/stable/datasets/real_world.html#california-housing-dataset) ships with `scikit-learn`. Each row is a census block group; the target `MedHouseVal` is the median house value (in $100,000s) — a continuous regression target." 
] }, { "cell_type": "code", "execution_count": 1, "id": "d737ce96", "metadata": { "execution": { "iopub.execute_input": "2026-06-14T00:59:14.038284Z", "iopub.status.busy": "2026-06-14T00:59:14.037241Z", "iopub.status.idle": "2026-06-14T00:59:15.479375Z", "shell.execute_reply": "2026-06-14T00:59:15.479375Z" } }, "outputs": [ { "data": { "text/html": [ "
|  | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |