Schemas

A schema is a Python class that declares the shape of a DataFrame — column names and their types. It's the single source of truth that your editor, type checker, and runtime validation all read from.

import tacit

class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str

Field types

Schema fields use Python types, which tacit maps to ibis types:

Python type    ibis type
-----------    ---------
float          float64
int            int64
str            string
bool           boolean

These are the types that parse() coerces to and cast() checks against. If your CSV has string columns where you declared float, parse() coerces them to float automatically, whereas cast() rejects the mismatch.
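The difference between coercing and checking can be sketched in plain Python, without tacit or ibis. The names PY_TO_IBIS, coerce_values, and check_values are hypothetical stand-ins for illustration only, not part of tacit's API; the dictionary simply restates the table above.

```python
# Hypothetical stand-ins illustrating coerce-vs-check; not tacit's implementation.
PY_TO_IBIS = {float: "float64", int: "int64", str: "string", bool: "boolean"}


def coerce_values(values, py_type):
    """parse()-style behaviour: convert each value to the declared type.

    Naive on purpose: bool("False") is True in Python, so real coercion
    needs more care than calling the type constructor.
    """
    return [py_type(v) for v in values]


def check_values(values, py_type):
    """cast()-style behaviour: accept only values that already have the declared type."""
    for v in values:
        if type(v) is not py_type:
            raise TypeError(
                f"expected {PY_TO_IBIS[py_type]}, got {type(v).__name__}: {v!r}"
            )
    return values


raw = ["5.1", "4.9"]  # strings, as they might arrive from a CSV
coerced = coerce_values(raw, float)  # strings become floats
```

Calling check_values(raw, float) on the uncoerced strings raises TypeError, while check_values(coerced, float) passes the values through unchanged.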

Inheritance

Schemas compose via inheritance. Each pipeline stage typically adds columns to the previous stage's schema:

class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float


class IrisPrediction(IrisFeatures):
    predicted_species: str

IrisFeatures has eight columns (five from Iris, three new). IrisPrediction has nine. Nothing is duplicated: if you rename a column in Iris, the rename propagates to every schema that inherits from it.

This mirrors how pipelines actually work: each stage takes a DataFrame, transforms it, and produces a wider (or differently shaped) DataFrame. Schema inheritance expresses that directly.
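The column counts can be checked with the standard library alone. This sketch uses plain classes in place of tacit.Schema subclasses, on the assumption that only the annotations matter for the illustration; how tacit itself collects fields is not shown here.

```python
from typing import get_type_hints


# Plain classes standing in for tacit.Schema subclasses (an assumption made
# so the example runs without tacit installed).
class Iris:
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


class IrisFeatures(Iris):
    sepal_ratio: float
    petal_ratio: float
    petal_area: float


class IrisPrediction(IrisFeatures):
    predicted_species: str


# get_type_hints merges annotations from every class in the hierarchy,
# so inherited fields are included alongside the newly declared ones.
for schema in (Iris, IrisFeatures, IrisPrediction):
    columns = get_type_hints(schema)
    print(schema.__name__, len(columns))
```

The loop prints 5, 8, and 9 columns for the three classes respectively.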

Strict mode

By default, tacit is strict — both parse() and cast() reject DataFrames with extra columns that aren't declared in the schema:

import ibis

table = ibis.memtable({
    "sepal_length": [5.1],
    "sepal_width": [3.5],
    "petal_length": [1.4],
    "petal_width": [0.2],
    "species": ["setosa"],
    "row_id": [1],  # not in the schema
})

Iris.cast(table)
# ValueError: Extra columns: ['row_id']

This is intentional. Silent extra columns are a common source of bugs in data pipelines — a join produces more columns than expected, or an upstream change adds a column that shadows a downstream computation. Strict mode catches this immediately.
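The core of a strict check is a comparison between declared and actual column names. The sketch below works on plain lists of names; find_extra_columns and strict_check are hypothetical helpers written for illustration, not tacit functions, and the error message is modelled on the one shown above.

```python
def find_extra_columns(declared, actual):
    """Return columns present in the data but absent from the schema, in data order."""
    declared_set = set(declared)
    return [name for name in actual if name not in declared_set]


def strict_check(declared, actual):
    """Raise ValueError if the data carries any undeclared column."""
    extra = find_extra_columns(declared, actual)
    if extra:
        raise ValueError(f"Extra columns: {extra}")


iris_columns = [
    "sepal_length",
    "sepal_width",
    "petal_length",
    "petal_width",
    "species",
]
table_columns = iris_columns + ["row_id"]  # row_id is not declared in the schema
```

Here strict_check(iris_columns, table_columns) raises ValueError, whereas strict_check(iris_columns, iris_columns) returns without error.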

Adding constraints

Schemas can also declare constraints on individual columns using Annotated and tacit.Check:

from typing import Annotated

class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: Annotated[float, tacit.Check.gt(0)]
    species: str

Constraints are validated by parse() and ignored by cast(). See Constraints for the full details.
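For readers unfamiliar with Annotated, the following sketch shows how metadata attached to a field can be read back with the standard library. GreaterThan is a hypothetical stand-in for whatever tacit.Check.gt(0) returns, and the plain Iris class replaces tacit.Schema so the example runs on its own; tacit's actual mechanism may differ.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class GreaterThan:
    """Hypothetical stand-in for the object produced by tacit.Check.gt(...)."""

    bound: float

    def validate(self, value) -> bool:
        return value > self.bound


class Iris:  # plain class, not tacit.Schema (assumption for illustration)
    sepal_length: float
    sepal_width: Annotated[float, GreaterThan(0)]
    species: str


# include_extras=True keeps the Annotated metadata instead of stripping it.
hints = get_type_hints(Iris, include_extras=True)

# Map each field to its list of constraint objects (empty when unannotated).
constraints = {
    name: list(getattr(hint, "__metadata__", ()))
    for name, hint in hints.items()
}
```

In this sketch only sepal_width carries a constraint; a parse()-style validator would call validate on each value, while a cast()-style check would look at the declared types and skip the metadata.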