Features
The AutoCarver.features module provides the classes and functions AutoCarver uses to describe typed features: qualitative (categorical and ordinal) and quantitative (including datetime) features.
Features
- class AutoCarver.features.Features(categoricals: list[str] | None = None, quantitatives: list[str] | None = None, ordinals: dict[str, list[str]] | None = None, datetimes: list[tuple[str, str]] | None = None, config: FeaturesConfig | None = None)
A set of typed features
Build a Features collection from column names.
- Parameters:
categoricals (list[str], optional) – Categorical column names, by default None.
quantitatives (list[str], optional) – Quantitative column names, by default None.
ordinals (dict[str, list[str]], optional) – Ordinal column names mapped to their ordered value list, by default None.
datetimes (list[tuple[str, str]], optional) – Datetime features as (column name, reference_date) pairs, by default None. Values are discretized as the number of seconds elapsed since reference_date.
config (FeaturesConfig, optional) – Collection-level config propagated to each feature, by default None.
Warning
At least one of categoricals, quantitatives, ordinals or datetimes must be provided. To build a Features from already-instantiated feature objects, use Features.from_list() instead.
- property categoricals: list[CategoricalFeature]
Returns all categorical features
- property datetimes: list[DatetimeFeature]
Returns all datetime features (also part of quantitatives)
- classmethod from_list(features: Iterable[BaseFeature] | Features, config: FeaturesConfig | None = None) Features
Build a Features from already-instantiated feature objects.
- Parameters:
features (Iterable[BaseFeature] | Features) – Feature instances to wrap. Iterating an existing Features is supported.
config (FeaturesConfig, optional) – Collection-level config propagated to each feature, by default None.
- property history: DataFrame
Combined history of all features (concatenated, with a feature column).
- property names: list[str]
Returns names of all features
- property ordinals: list[OrdinalFeature]
Returns all ordinal features
- property qualitatives: list[OrdinalFeature | CategoricalFeature]
Returns all qualitative features
- property quantitatives: list[QuantitativeFeature]
Returns all quantitative features (datetimes included)
- property summary: DataFrame
Summary of discretization process for all features
- to_json(light_mode: bool = False) dict
Serializes Features for JSON saving
- Parameters:
light_mode (bool, optional) – Whether or not to serialize in light mode (without statistics and history), by default False
- property versions: list[str]
Returns versions of all features
Note
Use the default constructor when you only have column names; use
Features.from_list() to wrap already-instantiated feature objects.
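As a rough illustration of the behavior documented above, the standalone sketch below mimics a typed collection that requires at least one group of columns and exposes datetimes through its quantitative view as well. FeaturesSketch and its internals are hypothetical stand-ins written for this example. They only mirror the documented constructor arguments and are not AutoCarver's implementation; the ordering inside each view is an arbitrary choice of the sketch.

```python
from __future__ import annotations


class FeaturesSketch:
    """Hypothetical stand-in for a typed feature collection (not AutoCarver code)."""

    def __init__(
        self,
        categoricals: list[str] | None = None,
        quantitatives: list[str] | None = None,
        ordinals: dict[str, list[str]] | None = None,
        datetimes: list[tuple[str, str]] | None = None,
    ) -> None:
        # Mirror the documented warning: at least one group must be provided.
        if not any([categoricals, quantitatives, ordinals, datetimes]):
            raise ValueError(
                "provide at least one of categoricals, quantitatives, "
                "ordinals or datetimes"
            )
        self._categoricals = list(categoricals or [])
        self._numericals = list(quantitatives or [])
        self._ordinals = dict(ordinals or {})
        self._datetimes = list(datetimes or [])

    @property
    def datetimes(self) -> list[str]:
        # Keep only the column name of each (column name, reference_date) pair.
        return [name for name, _reference in self._datetimes]

    @property
    def quantitatives(self) -> list[str]:
        # Datetimes are also reported as quantitatives, as documented.
        return self._numericals + self.datetimes

    @property
    def qualitatives(self) -> list[str]:
        return self._categoricals + list(self._ordinals)

    @property
    def names(self) -> list[str]:
        return self.qualitatives + self.quantitatives


sketch = FeaturesSketch(
    categoricals=["country"],
    quantitatives=["age"],
    ordinals={"grade": ["low", "medium", "high"]},
    datetimes=[("signup_date", "2020-01-01")],
)
```

In this sketch, signup_date appears in both sketch.datetimes and sketch.quantitatives, and calling FeaturesSketch() with no arguments raises a ValueError.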
FeaturesConfig
Collection-level state propagated to every feature in a Features. Internal
feature attributes (nan, default, ordinal_encoding, has_nan,
has_default, dropna, is_fitted) are not part of the public
BaseFeature constructor — set them via FeaturesConfig and pass
the instance to Features or Features.from_list().
- class AutoCarver.features.FeaturesConfig(nan: str | None = None, default: str | None = None, ordinal_encoding: bool = False, is_fitted: bool = False, has_nan: bool = False, has_default: bool = False, dropna: bool = False)
Collection-level config applied to each feature in a Features.
Internal feature state (nan/default/ordinal_encoding/…) is not part of the public BaseFeature constructor; pass them via this dataclass to Features or Features.from_list and they are propagated to each constituent feature.
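The propagation described above can be pictured with a small standalone sketch. ConfigSketch copies the field names and defaults from the documented FeaturesConfig signature, while FeatureSketch, propagate() and the "__NAN__" placeholder are hypothetical names invented for this example. How AutoCarver actually applies the config internally is not shown in this page, so treat this only as one plausible reading.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class ConfigSketch:
    """Hypothetical mirror of the documented FeaturesConfig fields."""

    nan: str | None = None
    default: str | None = None
    ordinal_encoding: bool = False
    is_fitted: bool = False
    has_nan: bool = False
    has_default: bool = False
    dropna: bool = False


class FeatureSketch:
    """Hypothetical feature: only the name is a constructor argument."""

    def __init__(self, name: str) -> None:
        self.name = name


def propagate(features: list[FeatureSketch], config: ConfigSketch) -> None:
    # Copy every collection-level setting onto each feature.
    for feature in features:
        for key, value in asdict(config).items():
            setattr(feature, key, value)


features = [FeatureSketch("age"), FeatureSketch("country")]
# "__NAN__" is an arbitrary placeholder chosen for this sketch.
propagate(features, ConfigSketch(nan="__NAN__", dropna=True))
```

After the call, every feature in the list carries the same nan and dropna values, while the constructor of each feature still only takes a name.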
Qualitative features
- class AutoCarver.features.CategoricalFeature(name: str, *, max_n_chars: int = 50)
Defines a categorical feature
- property has_default: bool
Whether the feature has default values.
- property history: DataFrame
Combination history as a DataFrame (empty when no history yet).
Stored internally as a list of dicts in _history for JSON serialization; the DataFrame is rebuilt on access. Append entries with historize().
- property summary: list[dict]
Summary of feature’s discretization process.
- class AutoCarver.features.OrdinalFeature(name: str, values: list[str])
Defines an ordinal feature
- Parameters:
values (list[str]) – Ordered list of all unique values for the feature
- property has_default: bool
Whether the feature has default values.
- property history: DataFrame
Combination history as a DataFrame (empty when no history yet).
Stored internally as a list of dicts in _history for JSON serialization; the DataFrame is rebuilt on access. Append entries with historize().
- property summary: list[dict]
Summary of feature’s discretization process.
Quantitative features
- class AutoCarver.features.QuantitativeFeature(name: str)
Defines a quantitative feature
- property has_default: bool
Whether the feature has default values.
- property history: DataFrame
Combination history as a DataFrame (empty when no history yet).
Stored internally as a list of dicts in _history for JSON serialization; the DataFrame is rebuilt on access. Append entries with historize().
- property summary: list[dict]
Summary of feature’s discretization process.
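The history property documented for the feature classes above follows a common pattern: keep plain dicts internally so the state stays JSON-serializable, and rebuild the tabular view on access. The sketch below illustrates that pattern with the standard library only, using a list of row dicts as a stand-in for the DataFrame. HistorySketch, combined_history() and the entry keys (step, combination) are hypothetical; the real signature of historize() is not given in this page.

```python
import json


class HistorySketch:
    """Hypothetical stand-in for a feature that records its combination history."""

    def __init__(self, name):
        self.name = name
        self._history = []  # plain dicts keep the state JSON-friendly

    def historize(self, entry):
        # Append a copy so later changes by the caller do not alter the record.
        self._history.append(dict(entry))

    @property
    def history(self):
        # Rebuilt on every access; an empty list when nothing was recorded yet.
        return [dict(entry) for entry in self._history]

    def to_json(self):
        return json.dumps({"name": self.name, "history": self._history})


def combined_history(features):
    # Concatenate per-feature rows, adding a "feature" column to each row.
    return [{"feature": f.name, **row} for f in features for row in f.history]


feature = HistorySketch("age")
empty_view = feature.history
feature.historize({"step": 1, "combination": "merged two bins"})
restored = json.loads(feature.to_json())
```

Because the internal list holds only dicts of basic types, json.dumps() works without a custom encoder, which is the motivation the documentation gives for not storing the DataFrame itself.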
Datetime features
A DatetimeFeature is a quantitative feature backed by a datetime column. It is
discretized as the number of seconds elapsed since a user-provided reference_date
(see DatetimeFeature.to_timedelta()), after which it behaves exactly like any other
quantitative feature (quantile bucketization, carving, …).
reference_date may be either a fixed date literal or the name of another
datetime column in X. The two are disambiguated at fit time: if reference_date
matches a column of the fitted X, the elapsed seconds are computed row-wise against
that column; otherwise it is parsed as a fixed date. A row whose reference column value is
missing (NaT) yields NaN.
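The disambiguation rule above can be sketched without any dependency. In this illustration X is a plain dict of lists standing in for a DataFrame, None stands in for NaT, and seconds_since() is a hypothetical helper, not an AutoCarver function. Returning NaN when the value itself is missing is an assumption of the sketch; the paragraph above only states it for a missing reference.

```python
import math
from datetime import datetime


def seconds_since(X, column, reference_date):
    """Elapsed seconds for `column`, against a column or a fixed date (sketch)."""
    if reference_date in X:
        # reference_date names a column of X: subtract row by row.
        references = X[reference_date]
    else:
        # Otherwise parse it once as a fixed date and reuse it for every row.
        references = [datetime.fromisoformat(reference_date)] * len(X[column])

    elapsed = []
    for value, reference in zip(X[column], references):
        if value is None or reference is None:
            # A missing reference (or, by assumption, a missing value) gives NaN.
            elapsed.append(math.nan)
        else:
            elapsed.append((value - reference).total_seconds())
    return elapsed


X = {
    "signup_date": [datetime(2020, 1, 2), datetime(2020, 1, 3), None],
    "churn_date": [datetime(2020, 1, 3), None, datetime(2020, 2, 1)],
}
fixed = seconds_since(X, "signup_date", "2020-01-01")  # fixed date literal
row_wise = seconds_since(X, "churn_date", "signup_date")  # column reference
```

Here the first row of fixed is 86400.0 (one day after 2020-01-01), and the third row of row_wise is NaN because its signup_date reference is missing.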
Datetimes can be declared from the Features constructor as
(column name, reference_date) pairs:
from AutoCarver.features import Features

features = Features(
    quantitatives=["age"],
    datetimes=[
        ("signup_date", "2020-01-01"),  # seconds since a fixed date
        ("churn_date", "signup_date"),  # seconds since another column
    ],
)
They are tracked under Features.datetimes and are also part of
Features.quantitatives (so the quantitative pipeline processes them transparently).
The datetime-to-seconds conversion is performed by the Timedelta Discretizer.
- class AutoCarver.features.DatetimeFeature(name: str, reference_date: str)
Defines a datetime feature.
A datetime feature is processed as a QuantitativeFeature after its values have been converted to a number of seconds elapsed since reference_date (see to_timedelta()). The conversion is applied by the TimedeltaDiscretizer before continuous discretization.
reference_date may be either a fixed date literal (e.g. "2020-01-01") or the name of another datetime column in X. The two are disambiguated at fit time: if reference_date matches a column of the fitted X, the conversion is computed row-wise against that column; otherwise it is parsed as a fixed date.
- property history: DataFrame
Combination history as a DataFrame (empty when no history yet).
Stored internally as a list of dicts in _history for JSON serialization; the DataFrame is rebuilt on access. Append entries with historize().
- property summary: list[dict]
Summary of feature’s discretization process.
- to_timedelta(series: Series, reference: Series | None = None) Series
Converts datetime values to a float number of seconds since the reference.
When reference is None the fixed reference_date literal is used; otherwise reference is a datetime Series subtracted row-wise (column reference). Non-datetime entries (numpy.nan, the nan placeholder, unparseable values) are coerced to numpy.nan so the result is a plain float Series.
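To make the coercion rule concrete, here is a minimal standard-library sketch of a conversion that always returns floats. Plain lists stand in for pandas Series, and to_timedelta_sketch(), parse_or_none() and the "__NAN__" placeholder are hypothetical names chosen for this example; the library's own method operates on Series and may differ in its details.

```python
import math
from datetime import datetime

NAN_PLACEHOLDER = "__NAN__"  # arbitrary placeholder chosen for this sketch


def parse_or_none(value):
    """Return a datetime, or None for anything that is not a usable datetime."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, str) and value != NAN_PLACEHOLDER:
        try:
            return datetime.fromisoformat(value)
        except ValueError:
            return None  # unparseable string
    return None  # float NaN, the placeholder, None, ...


def to_timedelta_sketch(values, reference_date, reference=None):
    """Float seconds since the reference; unusable entries become NaN."""
    if reference is None:
        # No reference values given: use the fixed date literal for every row.
        references = [datetime.fromisoformat(reference_date)] * len(values)
    else:
        # Reference values given: subtract them row by row.
        references = reference

    result = []
    for value, ref in zip(values, references):
        parsed, parsed_ref = parse_or_none(value), parse_or_none(ref)
        if parsed is None or parsed_ref is None:
            result.append(math.nan)
        else:
            result.append((parsed - parsed_ref).total_seconds())
    return result


values = ["2020-01-01 00:01:00", math.nan, NAN_PLACEHOLDER, "not a date"]
seconds = to_timedelta_sketch(values, "2020-01-01")
row_wise = to_timedelta_sketch(["2020-01-02"], "signup_date", reference=["2020-01-01"])
```

Only the first entry of values is a usable datetime, so seconds holds 60.0 followed by three NaN values, and every element is a float regardless of what the input contained.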