Data

pii-sweep

Scan datasets for PII before sharing them.

Install

pip install pii-sweep

The command above will work once the package is published to PyPI. Until then, install directly from GitHub:

pip install git+https://github.com/jmweb-org/pii-sweep

What it does

Before a dataset is shared, copied to a notebook or pushed to a bucket, it is worth knowing whether a column holds emails, card numbers or IDs. pii-sweep samples each column and reports which look like PII and how strongly.
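To make the idea of sampling a column and scoring it concrete, here is a minimal sketch in plain Python. It is not pii-sweep's code or API: the names `column_confidence` and `flag_columns`, the email pattern, the sample size and the threshold are all assumptions chosen for illustration. Confidence here is simply the fraction of sampled non-empty values that match a detector.

```python
import random
import re

# Hypothetical, deliberately loose email pattern; the real detectors may differ.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def column_confidence(values, matcher, sample_size=1000, seed=0):
    """Return the fraction of sampled non-empty values that the matcher accepts."""
    non_empty = [str(v).strip() for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return 0.0
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    if len(non_empty) > sample_size:
        sample = rng.sample(non_empty, sample_size)
    else:
        sample = non_empty
    hits = sum(1 for v in sample if matcher(v))
    return hits / len(sample)


def flag_columns(table, matcher, threshold=0.8):
    """Map column name -> confidence for columns at or above the threshold."""
    scores = {name: column_confidence(col, matcher) for name, col in table.items()}
    return {name: score for name, score in scores.items() if score >= threshold}


# Toy table as a dict of columns (assumed shape, for illustration only).
table = {
    "contact": ["ana@example.com", "luis@example.org", "n/a", "eva@example.net"],
    "city": ["Madrid", "Lima", "Quito", "Bogota"],
}
flagged = flag_columns(table, lambda v: EMAIL_RE.match(v) is not None, threshold=0.7)
print(flagged)  # {'contact': 0.75}
```

Three of the four values in `contact` match, so it scores 0.75 and clears the 0.7 threshold; `city` scores 0.0 and is not reported.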

Features

  • Checksum-backed detectors for card numbers (Luhn) and IBANs (mod-97), plus SSN detection.
  • Email, phone and IPv4 detectors, with findings grouped by severity.
  • A confidence score per column and a configurable threshold.
  • Can be used as a CI gate.
  • Reads CSV, Parquet and JSONL.
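
The two checksums named in the feature list are standard algorithms, and a short sketch shows why they cut false positives compared with matching digits alone. The functions below are illustrative only and are not taken from pii-sweep; the names `luhn_valid` and `iban_valid` and the length limits are assumptions. The IBAN check covers only the mod-97 step, not per-country length or format rules.

```python
def luhn_valid(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed lower bound for card-like numbers
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """IBAN mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and test remainder 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not s.isascii() or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))          # True (well-known test number)
print(luhn_valid("4111 1111 1111 1112"))          # False
print(iban_valid("GB82 WEST 1234 5698 7654 32"))  # True (widely used example IBAN)
print(iban_valid("GB82 WEST 1234 5698 7654 33"))  # False
```

A random 16-digit string passes Luhn only about one time in ten, so a column where nearly every value passes is a much stronger signal than one that merely contains long digit runs.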
View the code on GitHub

hola@jmwebsoluciones.com