Blog·21/04/2026

MLOps: the difference between a model in a notebook and one in production

A notebook is an exploration environment, not a system. Here is what it takes for a model to work in production, stay maintainable and not degrade silently.

The phrase I repeat most is that a model that only works in a notebook is not a production model; it is an experiment. The distance between the two is not a matter of modelling technique but of engineering. And that engineering has a name: MLOps.

The pipeline must be the same for training and serving. The most common production mistake is that the preprocessing done in the notebook (normalisation, categorical encoding, missing value imputation) gets re-implemented by hand in the inference API. Sooner or later the two versions drift apart. The solution is for preprocessing to live inside the sklearn pipeline, so that the serialised object produced by training is the same one that serves predictions. There are never two versions of the same code.
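As a minimal sketch of that idea, the snippet below puts imputation, scaling and one-hot encoding inside a single scikit-learn `Pipeline` and serialises the fitted object with `pickle`. The toy columns (an age and a plan name), the labels and the choice of `LogisticRegression` are assumptions made purely for illustration.

```python
import pickle

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: column 0 is numeric (age, with one gap),
# column 1 is categorical (plan).
X_train = np.array(
    [[25.0, "basic"], [40.0, "pro"], [np.nan, "basic"],
     [58.0, "pro"], [31.0, "basic"], [47.0, "pro"]],
    dtype=object,
)
y_train = np.array([0, 1, 0, 1, 0, 1])

# Preprocessing for the numeric column: fill gaps, then normalise.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Route each column to its own preprocessing step.
preprocess = ColumnTransformer([
    ("num", numeric, [0]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [1]),
])
# Preprocessing and estimator form one object.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)

# Training side: serialise the whole pipeline as a single artefact.
artefact = pickle.dumps(model)

# Serving side: load the artefact and predict directly on raw rows,
# including a missing value and a category never seen in training.
served = pickle.loads(artefact)
X_new = np.array([[np.nan, "pro"], [29.0, "enterprise"]], dtype=object)
predictions = served.predict(X_new)
```

The serving code never touches a scaler or an encoder by hand; it only calls `predict` on raw input, so there is nothing left to re-implement in the API.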

Version registry: knowing what is in production. Every trained model should be registered with its validation metrics, the hyperparameters used and the identifier of the data it was trained on. If tomorrow the model gives strange results, it must be possible to know exactly what model is serving and what data it was trained on. A version registry does not have to be a complex system: a directory with JSON metadata files and a pointer to the active version is enough to start.
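A directory-based registry of that kind could look roughly like the sketch below. The function names (`register_model`, `promote`, `active_model`), the `ACTIVE` pointer file and the example metric values and data identifiers are all hypothetical choices for illustration.

```python
import json
import tempfile
from pathlib import Path


def register_model(registry: Path, version: str, metrics: dict,
                   hyperparameters: dict, data_id: str) -> Path:
    """Write one JSON metadata file for a trained model version."""
    registry.mkdir(parents=True, exist_ok=True)
    record = {
        "version": version,
        "metrics": metrics,
        "hyperparameters": hyperparameters,
        "data_id": data_id,
    }
    path = registry / f"{version}.json"
    path.write_text(json.dumps(record, indent=2))
    return path


def promote(registry: Path, version: str) -> None:
    """Point the ACTIVE file at an already registered version."""
    if not (registry / f"{version}.json").exists():
        raise ValueError(f"unknown version: {version}")
    (registry / "ACTIVE").write_text(version)


def active_model(registry: Path) -> dict:
    """Return the metadata of the version currently marked as active."""
    version = (registry / "ACTIVE").read_text().strip()
    return json.loads((registry / f"{version}.json").read_text())


# Example run in a temporary directory, with made-up values.
registry = Path(tempfile.mkdtemp()) / "registry"
register_model(registry, "v1", {"auc": 0.81}, {"C": 1.0}, "customers-2026-03")
register_model(registry, "v2", {"auc": 0.84}, {"C": 0.5}, "customers-2026-04")
promote(registry, "v2")
current = active_model(registry)
```

Here `current` holds the full record for `v2`, which is exactly the question the registry exists to answer: which model is serving, and what data it was trained on.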

Drift monitoring: the model degrades even if nobody touches it. Production data changes over time. A marketing campaign changes the profile of incoming customers. A seasonal change affects features. A new regulation changes user behaviour. If nobody is watching the distribution of input data, the model can produce poor predictions for weeks without anyone noticing. PSI (Population Stability Index) on key features, calculated against the training reference profile, detects this before the business does.
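One common way to compute PSI for a numeric feature is to bin the reference (training) values by quantile, measure the share of reference and current values in each bin, and sum (current − reference) × ln(current / reference) over the bins. The sketch below follows that recipe; the ten bins, the small `eps` floor that avoids division by zero, and the simulated age data are assumptions.

```python
import numpy as np


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index of `current` against `reference`."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Bin edges come from the quantiles of the reference distribution.
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    # Keep interior edges only, so values outside the reference range
    # land in the first or last bin instead of being dropped.
    inner = edges[1:-1]
    ref_idx = np.searchsorted(inner, reference, side="right")
    cur_idx = np.searchsorted(inner, current, side="right")
    ref_pct = np.bincount(ref_idx, minlength=n_bins) / reference.size
    cur_pct = np.bincount(cur_idx, minlength=n_bins) / current.size
    # Floor the proportions so empty bins do not produce log(0).
    ref_pct = np.clip(ref_pct, eps, None)
    cur_pct = np.clip(cur_pct, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


# Simulated feature: customer age at training time vs. in production.
rng = np.random.default_rng(0)
training_ages = rng.normal(40, 10, 5000)
same_population = rng.normal(40, 10, 5000)
shifted_population = rng.normal(48, 10, 5000)

psi_stable = psi(training_ages, same_population)
psi_shifted = psi(training_ages, shifted_population)
```

A frequently quoted rule of thumb reads a PSI below 0.1 as stable and above 0.25 as a significant shift, but those thresholds are a convention rather than a guarantee and are worth calibrating per feature.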

The inference API as a contract. The API that serves the model must validate input against a schema (Pydantic, for example), reject malformed requests before they reach the model, tag every response with the identifier of the model that produced it, and expose metrics to an observability system. Without that, debugging a production failure is blind guesswork.
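The contract can be sketched without any web framework, as a plain function that takes a raw payload and returns a status code and a body. The request fields (`age`, `plan`), the `score` placeholder, the `MODEL_VERSION` constant and the in-process `Counter` standing in for a real metrics exporter are all hypothetical.

```python
from collections import Counter

from pydantic import BaseModel, Field, ValidationError

MODEL_VERSION = "v2"  # hypothetical identifier of the model being served
REQUESTS_BY_STATUS = Counter()  # stand-in for a real metrics exporter


class PredictionRequest(BaseModel):
    """Input schema: anything that does not fit is rejected."""
    age: float = Field(ge=0, le=120)
    plan: str = Field(min_length=1)


def score(request: PredictionRequest) -> float:
    """Placeholder for the real model call (made-up logic)."""
    return 0.9 if request.plan == "basic" else 0.2


def handle(payload: dict):
    """Validate one raw payload and return (status_code, body)."""
    try:
        request = PredictionRequest(**payload)
    except ValidationError as exc:
        # Malformed input never reaches the model.
        fields = [str(error["loc"][0]) for error in exc.errors()]
        status, body = 422, {"errors": fields}
    else:
        # Every successful response names the model that produced it.
        status, body = 200, {
            "churn_probability": score(request),
            "model_version": MODEL_VERSION,
        }
    REQUESTS_BY_STATUS[status] += 1
    return status, body


ok = handle({"age": 41, "plan": "pro"})
bad_value = handle({"age": -3, "plan": "pro"})
missing_field = handle({"age": 30})
```

In a FastAPI service the same schema class would typically be used as the request body type, and the framework would perform the rejection step itself.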

When to set up full infrastructure and when not to. A first MVP can be a training script that saves the model to S3 and a FastAPI on a small instance. You do not need Kubernetes, MLflow and Grafana from day one. Infrastructure grows when the problem justifies it: when prediction volume is high, when several models run in parallel, when retraining is frequent. Start simple and add complexity when the pain is real.
