Skip to content

Manifest Files

The manifests files are the soul of your tidylake project. They serve a dual purpose: it is the documentation for your business assets and the technical data contract enforced in your script file.

By centralizing this information, you enable several automated workflows:

Manifest File Schema

At a minimum, a manifest must include the data product's name and the path to its corresponding script file.

data_product:
  name: 'bronze_customers' # (1)!
  script: 'bronze_customers' # (2)!
  1. Name: The unique identifier used to reference this product across the project.
  2. Script: The Python file containing the transformation logic.

Defining the Schema

While the schema section is technically optional, we strongly recommend adding it to every data product. Even if you don't use the automated compute plugins, maintaining a schema manifest is a "gold standard" practice for documentation. It ensures that any team member—or any downstream tool—understands exactly what the data contains without having to read through lines of transformation code.

Currently, tidylake uses a superset of jsonschema to define these structures.

A Note on JSON Schema

We are actively evaluating alternative formats to JSON Schema. While powerful, it can sometimes be challenging to map perfectly across every compute engine (like Spark vs. Snowflake) and storage format.

data_product:
  name: 'bronze_customers'
  script: 'bronze_customers'
  schema:
    type: 'object'
    properties:
      customer__id:
        type: 'string'
      customer__name:
        type: 'string'
      customer__active:
        type: 'boolean'
      customer__city:
        type: 'string'