Data

dsdiff

A git-style diff between two datasets, with distribution drift.

Install

pip install dsdiff

The command above works once the package is published to PyPI. Until then, install directly from GitHub:

pip install git+https://github.com/jmweb-org/dsdiff

What it does

When a dataset is regenerated, columns can quietly be renamed or retyped, gain nulls, or shift in distribution. The pipeline keeps running while the model degrades. dsdiff compares two files and reports what changed, ranked by severity.
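To make the kinds of change concrete, here is a minimal sketch in plain Python. It is not dsdiff's implementation or API: `schema_changes` and `null_rate` are hypothetical helpers, and the schemas are assumed to be simple column-to-type mappings.

```python
def schema_changes(old, new):
    """Compare two {column: dtype} mappings and list what changed."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


def null_rate(values):
    """Fraction of entries that are None; 0.0 for an empty column."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


# Hypothetical schemas, for illustration only.
old_schema = {"user_id": "int64", "age": "int64", "country": "string"}
new_schema = {"user_id": "int64", "age": "float64", "region": "string"}
changes = schema_changes(old_schema, new_schema)
```

In this sketch a renamed column shows up as one removal plus one addition (`country` and `region` here), which is why a rename can slip through unnoticed when nothing compares the two versions.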

Features

  • Schema changes: added, removed or retyped columns.
  • Per-column distribution drift with PSI.
  • Null-rate and cardinality jumps.
  • CI gate and JSON output; reads CSV, Parquet and JSONL.
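For readers unfamiliar with PSI (Population Stability Index), the sketch below shows one common way to compute it for a numeric column. It is an illustration only and may differ from what dsdiff does internally: the `psi` function, the equal-width binning over the baseline range, and the `eps` floor for empty bins are all assumptions.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bin edges are equal-width over the range of `expected` (the baseline);
    values of `actual` outside that range are clamped into the end bins.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant columns

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: a baseline column and a shifted regeneration of it.
baseline = list(range(100))
shifted = [v + 30 for v in baseline]
drift = psi(baseline, shifted)
```

A commonly cited rule of thumb treats PSI below 0.1 as little or no shift and above 0.25 as a significant shift. Those thresholds are a convention, not something this page specifies.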

View the code on GitHub: https://github.com/jmweb-org/dsdiff

hola@jmwebsoluciones.com