
Evaluation

evalgate

Decide whether a metric change is a real regression or noise.

Install

pip install evalgate-cli

The command above works once the package is published to PyPI. Until then, install it directly from GitHub:

pip install git+https://github.com/jmweb-org/evalgate

What it does

An eval score dropping from 90.0% to 89.4% on 1,000 examples looks like a regression, but at that sample size a 0.6-point drop is well within sampling noise. evalgate runs a significance test suited to the shape of your results and fails only when the candidate is significantly worse than the baseline.
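To see why the example above counts as noise, here is a minimal sketch of a two-sided two-proportion z-test with a pooled standard error, using only the standard library. It is an illustration of the statistics, not evalgate's own code or API; the function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """Two-sided two-proportion z-test (hypothetical helper, not evalgate's API)."""
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Under H0 both runs share one accuracy, so pool the counts to estimate it.
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Baseline: 900/1000 correct (90.0%); candidate: 894/1000 correct (89.4%).
z, p = two_proportion_z_test(900, 1000, 894, 1000)
print(f"z = {z:.2f}, p = {p:.2f}")  # z = -0.44, p = 0.66
```

The standard error of the difference is about 1.4 points, so a 0.6-point drop is less than half a standard error, and p is far above any usual alpha such as 0.05.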

Features

  • Two-proportion test over aggregate accuracies.
  • McNemar's test for paired per-example results.
  • Verdict: improvement, unchanged, noise or regression.
  • CI gate with a configurable significance level (alpha).
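When both models were scored on the same examples, a paired test such as McNemar's uses that pairing. The sketch below implements the exact (binomial) form of McNemar's test and maps its result onto the four verdicts listed above. The verdict rules, the `exit_code` convention, and all function names are assumptions made for illustration; evalgate's actual definitions of "unchanged" versus "noise" may differ.

```python
import math
from typing import Sequence, Tuple


def mcnemar_exact(baseline: Sequence[bool], candidate: Sequence[bool]) -> Tuple[int, int, float]:
    """Exact McNemar test on paired per-example correctness (illustrative sketch)."""
    if len(baseline) != len(candidate):
        raise ValueError("results must be paired example-by-example")
    # Only discordant pairs carry information about which model is better.
    lost = sum(1 for a, b in zip(baseline, candidate) if a and not b)    # baseline right, candidate wrong
    gained = sum(1 for a, b in zip(baseline, candidate) if b and not a)  # baseline wrong, candidate right
    n = lost + gained
    if n == 0:
        return lost, gained, 1.0
    # Under H0 each discordant pair is a fair coin flip: two-sided exact binomial p-value.
    k = min(lost, gained)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return lost, gained, min(1.0, 2 * tail)


def verdict(baseline, candidate, alpha=0.05):
    """Hypothetical mapping onto improvement / unchanged / noise / regression."""
    lost, gained, p = mcnemar_exact(baseline, candidate)
    if lost == gained:  # assumption: same net accuracy counts as "unchanged"
        return "unchanged", p
    if p >= alpha:  # a difference exists but is not significant
        return "noise", p
    return ("regression" if lost > gained else "improvement"), p


def exit_code(v: str) -> int:
    """Hypothetical CI convention: fail the job only on a significant regression."""
    return 1 if v == "regression" else 0


# Baseline gets the first 900 of 1,000 examples right.
baseline = [i < 900 for i in range(1000)]

# Candidate A: loses 20 examples, gains 14 (894 correct, same as the 89.4% example).
noisy = list(baseline)
for i in range(20):
    noisy[i] = False
for i in range(900, 914):
    noisy[i] = True

# Candidate B: loses 40 examples, gains 5 (865 correct).
regressed = list(baseline)
for i in range(40):
    regressed[i] = False
for i in range(900, 905):
    regressed[i] = True

for name, cand in [("A", noisy), ("B", regressed)]:
    v, p = verdict(baseline, cand)
    print(f"candidate {name}: {v} (p = {p:.3g}), exit code {exit_code(v)}")
```

Because concordant examples (both right or both wrong) cancel out, the paired test only needs to explain the 34 or 45 disagreements, which usually makes it more sensitive than comparing aggregate accuracies when the eval set is shared.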


hola@jmwebsoluciones.com