Iris Pipeline¶
An end-to-end pipeline that loads iris data from a CSV, engineers features,
and predicts species. Demonstrates schema inheritance, parse() at boundaries,
and @contract for typed transformations.
The full source is at
examples/iris_pipeline.py.
Schemas¶
The pipeline has three stages, each with its own schema. Inheritance adds columns at each stage — no duplication:
```python
import ibis
import tacit


class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float


class IrisPrediction(IrisFeatures):
    predicted_species: str
```
Iris has 5 columns. IrisFeatures adds 3 (8 total). IrisPrediction adds 1
(9 total). The chain mirrors the pipeline — each stage enriches the previous
one's output.
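To see why the counts work out, here is the same chain with plain Python classes (no `tacit` involved); Python accumulates type annotations down the inheritance chain, which is what lets each schema add columns without repeating the earlier ones:

```python
from typing import get_type_hints

# Plain stand-ins for the tacit schemas, used only to count annotations.
class Iris:
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str

class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float

class IrisPrediction(IrisFeatures):
    predicted_species: str

# get_type_hints merges annotations across the inheritance chain.
print(len(get_type_hints(Iris)))            # 5
print(len(get_type_hints(IrisFeatures)))    # 8
print(len(get_type_hints(IrisPrediction)))  # 9
```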
Transformations¶
Two functions transform data between stages. @contract reads the type
annotations and calls cast() on inputs and outputs automatically:
```python
@tacit.contract
def engineer_features(df: tacit.DataFrame[Iris]) -> tacit.DataFrame[IrisFeatures]:
    return df.mutate(
        sepal_ratio=df.sepal_length / df.sepal_width,
        petal_ratio=df.petal_length / df.petal_width,
        petal_area=df.petal_length * df.petal_width,
    )


@tacit.contract
def predict(df: tacit.DataFrame[IrisFeatures]) -> tacit.DataFrame[IrisPrediction]:
    return df.mutate(
        predicted_species=ibis.cases(
            (df.petal_length < 2.5, "setosa"),
            (df.petal_length < 4.8, "versicolor"),
            else_="virginica",
        )
    )
```
Each function body is pure ibis (.mutate(), .cases()). The decorator
handles the schema verification, so the function body only does the transformation.
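The case ladder in predict() is an ordered threshold rule: branches are tried top to bottom and the first match wins. A dependency-free sketch of the same logic (the helper name `classify` is ours, not part of the example):

```python
def classify(petal_length: float) -> str:
    # Same ordered thresholds as the ibis.cases() ladder: the first
    # matching branch wins, so 2.0 classifies as "setosa", never reaching
    # the "versicolor" branch.
    if petal_length < 2.5:
        return "setosa"
    if petal_length < 4.8:
        return "versicolor"
    return "virginica"

print(classify(1.4))  # setosa
print(classify(4.0))  # versicolor
print(classify(5.5))  # virginica
```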
Pipeline¶
The pipeline() function ties the stages together. parse() validates data
at the boundaries:
```python
def pipeline(path: str) -> tacit.DataFrame[IrisPrediction]:
    con = ibis.duckdb.connect()
    raw = con.read_csv(path)
    iris = Iris.parse(raw)
    features = engineer_features(iris)
    predictions = predict(features)
    return IrisPrediction.parse(predictions)
```
Iris.parse(raw) at the top coerces CSV string columns to the declared types
and validates the schema. IrisPrediction.parse(predictions) at the end is a
final check that the full pipeline produced the expected output.
Between those boundaries, @contract on each transformation does lightweight
structural checks via cast().
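As a rough illustration of what boundary parsing involves (this is not tacit's actual implementation), a parse step checks the column set against the schema and coerces each CSV string to its declared type, failing loudly on a mismatch:

```python
import csv
import io

# Hypothetical hand-rolled equivalent of Iris.parse() for CSV input.
SCHEMA = {
    "sepal_length": float,
    "sepal_width": float,
    "petal_length": float,
    "petal_width": float,
    "species": str,
}

def parse_rows(text: str) -> list[dict]:
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        if set(raw) != set(SCHEMA):
            raise ValueError(f"column mismatch: {sorted(raw)}")
        # Coerce each string cell to its declared type; a bad cell
        # (e.g. "abc" in a float column) raises ValueError here.
        rows.append({col: typ(raw[col]) for col, typ in SCHEMA.items()})
    return rows

sample = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
)
print(parse_rows(sample)[0]["sepal_length"])  # 5.1
```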
Running it¶
The pipeline runs against DuckDB via ibis: read the CSV, coerce types, compute features, predict species, and validate the output, with schema checks at every stage.
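The data flow from raw row to prediction can be traced with a miniature, dependency-free version of the same stages (illustrative only; the real example runs through ibis and DuckDB). One input row with 5 columns picks up 3 feature columns and 1 prediction column, matching the 9 columns of IrisPrediction:

```python
# Stage 1 equivalent: add the three ratio/area features to a row.
def engineer(row: dict) -> dict:
    return {
        **row,
        "sepal_ratio": row["sepal_length"] / row["sepal_width"],
        "petal_ratio": row["petal_length"] / row["petal_width"],
        "petal_area": row["petal_length"] * row["petal_width"],
    }

# Stage 2 equivalent: the same threshold rule as the ibis.cases() ladder.
def predict_row(row: dict) -> dict:
    pl = row["petal_length"]
    species = "setosa" if pl < 2.5 else "versicolor" if pl < 4.8 else "virginica"
    return {**row, "predicted_species": species}

row = {"sepal_length": 5.1, "sepal_width": 3.5,
       "petal_length": 1.4, "petal_width": 0.2, "species": "setosa"}
out = predict_row(engineer(row))
print(len(out), out["predicted_species"])  # 9 setosa
```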