Why tacit

Implicit assumptions

Every DataFrame operation makes assumptions about the data. When you write df.sepal_length / df.sepal_width, you're assuming that both columns exist, that they're numeric, and probably that sepal_width is never zero. When you filter on df.species == "setosa", you're assuming species is a string column with that value in it.

These assumptions are usually invisible. They live in your head, in a Slack message, maybe in a wiki page that's three versions behind. When an assumption breaks — a column gets renamed upstream, a type changes from int to string, a join produces unexpected nulls — you find out three stages downstream when something produces garbage results. Or worse, you don't find out at all.

Schemas make assumptions explicit

A schema is a Python class that declares exactly what a DataFrame looks like — column names, types, and constraints:

from typing import Annotated
import tacit

class Iris(tacit.Schema):
    sepal_length: float
    sepal_width: Annotated[float, tacit.Check.gt(0)]
    petal_length: float
    petal_width: float
    species: str

This says: the DataFrame has these five columns, with these types, and sepal_width must be positive. Anyone — human or coding agent — can "go to definition" and understand the full contract without running anything.

At pipeline boundaries, Iris.parse(table) coerces types and validates every constraint. If the data doesn't match, you get a clear error at the point where the bad data entered, not deep inside your pipeline logic. Between internal steps, Iris.cast(table) does a lightweight structural check — column names and types only, with zero execution cost.
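
For example (a minimal sketch, assuming an ibis in-memory table; the exact error text is illustrative):

import ibis

raw = ibis.memtable({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
    "species": ["setosa", "setosa"],
})

# Boundary: coerce types and run every constraint, including Check.gt(0).
iris = Iris.parse(raw)

# Internal step: check column names and types only, no data scan.
iris = Iris.cast(iris)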

Once parsed, your code can safely assume the data is correct. No defensive checks scattered through your transformations. No silent failures.

Contracts enforce them at function boundaries

A contract ties schemas to the functions that transform data. The type signature is the contract — what goes in, what comes out:

@tacit.contract
def engineer_features(df: tacit.DataFrame[Iris]) -> tacit.DataFrame[IrisFeatures]:
    return df.mutate(
        sepal_ratio=df.sepal_length / df.sepal_width,
        petal_area=df.petal_length * df.petal_width,
    )
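
IrisFeatures isn't defined above; a plausible definition (assumed here for illustration) is the five Iris columns plus the two derived ones:

class IrisFeatures(tacit.Schema):
    sepal_length: float
    sepal_width: Annotated[float, tacit.Check.gt(0)]
    petal_length: float
    petal_width: float
    species: str
    sepal_ratio: float
    petal_area: float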

The @contract decorator enforces the schema on inputs and outputs at runtime. Type checkers verify it statically. "Find all references" on Iris shows every function that consumes that schema — rename a column and your type checker flags every site that needs updating, across teams, across repos.
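
In practice (a sketch; the rejected call shows the kind of error a type checker reports, not verbatim tool output):

features = engineer_features(iris)  # OK: iris is a DataFrame[Iris]
engineer_features(features)         # type error: DataFrame[IrisFeatures]
                                    # is not DataFrame[Iris]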

Built on ibis and pandera, not replacing them

Tacit is not a new DataFrame library or a new validation framework. It builds on ibis for the DataFrame API and query execution, and pandera for data validation. Tacit provides a unified interface with type safety on top.

A tacit.DataFrame[S] is an ibis Table — you write transformations with the full ibis expression API, execute against any ibis backend, and get autocomplete on column names. Validation constraints are pandera Check objects — anything pandera can validate, tacit can validate. You can drop down to raw ibis or pandera at any point.
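
For instance, plain ibis expressions work directly on a parsed table (a sketch, reusing iris from above):

# A DataFrame[Iris] is an ibis Table, so the full expression API applies.
setosa = iris.filter(iris.species == "setosa")
summary = setosa.aggregate(mean_length=setosa.sepal_length.mean())
print(summary.execute())  # executes on whatever ibis backend backs the table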

This also means tacit inherits some of their current limitations — for example, ibis's type stubs don't cover every dynamic API, so some type checker warnings may require annotations. We document workarounds as we find them.

How is tacit different from...

Raw pandera — pandera is excellent at runtime validation, and tacit builds on it — all constraint checking is pandera under the hood. But pandera alone validates and hands you back an untyped DataFrame. Your editor doesn't know what columns exist after validation, your type checker can't verify that one pipeline stage feeds correctly into the next, and there's no static safety between validation points. Tacit is the missing piece: it extends pandera with DataFrame[S] so the schema stays visible to your editor and type checker throughout the entire pipeline, not just at the validation boundary.

Great Expectations — a test-suite approach: you write expectations separately from your code and run them as a validation step. Tacit integrates validation into the code itself — the schema is the source of truth, not a parallel test suite that can drift.

dbt tests — SQL-only, post-hoc. You write tests in YAML or SQL that run after transformations. Tacit validates at the Python layer, at the boundary where data enters your pipeline, with the same schema that gives you type safety and editor support.

Design principles

  • Strict by default. Unexpected columns are an error, not silently ignored. You can opt into loose mode when you need it (see the sketch after this list).
  • Parsing is the gateway. Schema.parse() coerces types and validates in one call. No separate coercion step.
  • Library, not framework. Use tacit with Dagster, Airflow, a script, a notebook — anything that runs Python and ibis. It provides tools for writing type-safe transformations, not an execution environment.
  • Ibis-native. Transformations use ibis's expression API directly. Tacit handles contracts, ibis handles the DataFrame API and query execution.
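
A sketch of the strict default (the error wording is illustrative, and the loose-mode opt-in isn't shown):

extra = raw.mutate(extra_column=0)  # add a column the schema doesn't declare
Iris.parse(extra)  # error: unexpected column "extra_column"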

Who is this for

Tacit is for data engineers and ML engineers building DataFrame pipelines in Python who want their data assumptions to be explicit, checkable, and enforced by the type system.

It's a good fit if you:

  • Use ibis (or want to) for backend-portable DataFrame operations
  • Want Pydantic-style schemas for your pipeline data
  • Care about type safety and editor support in data code
  • Work in teams where schema changes need to be traceable

It's probably not for you if:

  • You don't use DataFrames (tacit is specifically for tabular data pipelines)
  • Your DataFrame library isn't supported by ibis — tacit is built on ibis and requires an ibis backend