ML Models · Rigorous evaluation · Madrid
Machine learning models in Madrid
I train and evaluate machine learning models with rigour: a leakage-free pipeline, cross-validation, calibrated probabilities and a decision threshold set by business cost. A model that passes an honest validation test is one you can trust in production.

Leakage-free validation
Preprocessing (scaling, encoding, imputation) lives inside the pipeline, so in each cross-validation fold it is fit only on that fold's training data. The validation metric is then an honest estimate of what I can expect in production, not an inflated one.
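A minimal sketch of that setup, assuming scikit-learn and synthetic data. The column layout, the logistic regression and the ROC AUC metric are illustrative assumptions, not a description of any specific project:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data (made up): three numeric columns with some missing
# values, plus one categorical column stored as integer codes.
rng = np.random.default_rng(0)
n = 400
X_num = rng.normal(size=(n, 3))
X_num[rng.random(size=(n, 3)) < 0.05] = np.nan
X_cat = rng.integers(0, 4, size=(n, 1)).astype(float)
X = np.hstack([X_num, X_cat])
signal = np.nan_to_num(X_num[:, 0]) + 0.5 * (X_cat[:, 0] == 2)
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

# All preprocessing is part of the estimator, so cross-validation
# refits the imputer, scaler and encoder on each training fold only.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, [0, 1, 2]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [3]),
])
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC per fold: {scores.round(3)}, mean: {scores.mean():.3f}")
```

The leaky alternative would be to scale or impute the full dataset first and cross-validate afterwards; passing the whole pipeline to cross_val_score avoids that.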
Calibrated probabilities
If the model is going to drive decisions based on probability, that probability has to be reliable. I calibrate the model (Platt scaling or isotonic regression) and check the result with the Brier score and the reliability curve. Among cases scored 0.7, about 70% should turn out positive.
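As a rough illustration of the checking step only (not the calibration fit itself), here is a plain-Python sketch of the Brier score and a binned reliability table. The function names and the toy data are hypothetical:

```python
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true, y_prob, n_bins=5):
    """Return (mean predicted, observed rate, count) for each non-empty equal-width bin."""
    bins = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        mean_pred = sum(p for _, p in pairs) / len(pairs)
        observed = sum(y for y, _ in pairs) / len(pairs)
        rows.append((mean_pred, observed, len(pairs)))
    return rows


# Toy data (made up): ten cases scored 0.7 with seven positives,
# ten cases scored 0.2 with two positives.
y_prob = [0.7] * 10 + [0.2] * 10
y_true = [1] * 7 + [0] * 3 + [1] * 2 + [0] * 8

print("Brier score:", round(brier_score(y_true, y_prob), 3))
for mean_pred, observed, count in reliability_table(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

In a well-calibrated model the predicted and observed columns stay close in every bin; a systematic gap is the signal to recalibrate.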
The threshold matters as much as the model
The decision threshold is not 0.5 by default. I choose it by minimising the real business cost: how much a false negative costs versus a false positive. That adjustment usually changes the operational result more than switching algorithms.
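One simple way to sketch this is a grid search over candidate thresholds on validation data. The per-error costs, the candidate grid and the toy scores below are assumptions for illustration:

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Cost of the errors made when flagging every case with p >= threshold."""
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, y_prob, cost_fn, cost_fp, candidates=None):
    """Return the candidate threshold with the lowest total cost (first one on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: total_cost(y_true, y_prob, t, cost_fn, cost_fp))


# Toy validation scores (made up). A false negative is assumed to cost
# 5 units and a false positive 1 unit.
y_prob = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8]
y_true = [0, 0, 1, 0, 1, 1]

chosen = best_threshold(y_true, y_prob, cost_fn=5, cost_fp=1)
print("chosen threshold:", chosen)
print("cost at chosen threshold:", total_cost(y_true, y_prob, chosen, 5, 1))
print("cost at 0.5:", total_cost(y_true, y_prob, 0.5, 5, 1))
```

On this toy data the cost-minimising threshold lands well below 0.5, because missing a positive is assumed to be five times more expensive than a false alarm.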
Per-segment error analysis
Global metrics hide problems. I analyse error by relevant subgroups (tenure, category, region) to understand where the model fails and why, and adjust if needed.
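A small sketch of the idea in plain Python. The record layout (y_true, y_pred and a region field) and the numbers are hypothetical:

```python
from collections import defaultdict


def error_rate_by_segment(records, segment_key):
    """Return {segment value: share of misclassified records in that segment}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for record in records:
        segment = record[segment_key]
        totals[segment] += 1
        errors[segment] += int(record["y_true"] != record["y_pred"])
    return {segment: errors[segment] / totals[segment] for segment in totals}


# Toy predictions (made up): the overall error rate is 25%,
# but every error comes from one region.
records = [
    {"region": "north", "y_true": 1, "y_pred": 1},
    {"region": "north", "y_true": 0, "y_pred": 0},
    {"region": "north", "y_true": 1, "y_pred": 1},
    {"region": "north", "y_true": 0, "y_pred": 0},
    {"region": "south", "y_true": 1, "y_pred": 0},
    {"region": "south", "y_true": 0, "y_pred": 1},
    {"region": "south", "y_true": 1, "y_pred": 1},
    {"region": "south", "y_true": 0, "y_pred": 0},
]

overall = sum(r["y_true"] != r["y_pred"] for r in records) / len(records)
by_region = error_rate_by_segment(records, "region")
print("overall error rate:", overall)
print("error rate by region:", by_region)
```

The same function can be run with any other grouping field, such as tenure band or product category.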
Have data and want a model that actually works?
Write to me with the problem: what you want to predict, what data you have, what decisions you will make with the result. Free first call.
Start a project
FAQ
What types of ML models do you build?
Mainly tabular data models: classification (default, churn, fraud, conversion) and regression (demand, price, time). I also build NLP models with transformers for text classification, information extraction and semantic search. For LLMs, I handle integration and fine-tuning.
What does it mean for a model to be well calibrated?
It means that when the model says '70% probability of default', default actually occurs in around 70% of those cases. Without calibration, scores are a ranking but not real probabilities, which makes it hard to make business decisions with them.
Why not always use a 0.5 threshold?
Because the cost of being wrong is not symmetric. In credit risk, a false negative (giving credit to someone who defaults) typically costs 5 times more than a false positive (rejecting someone who would have paid). The optimal threshold reflects that asymmetry, not a convention.
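As a worked illustration of that asymmetry, under simplifying assumptions: probabilities are calibrated, each error has a fixed cost, and correct decisions cost nothing. Here p is the predicted probability of default, and C_FN and C_FP are the costs of a false negative and a false positive. Rejecting is the cheaper decision when its expected cost is lower than that of approving:

```latex
(1 - p)\, C_{FP} < p\, C_{FN}
\quad\Longleftrightarrow\quad
p > \frac{C_{FP}}{C_{FP} + C_{FN}}

% With C_{FN} = 5\, C_{FP}:
p > \frac{1}{1 + 5} = \frac{1}{6} \approx 0.17
```

With a 5-to-1 cost ratio, the break-even threshold under these assumptions is about 0.17 rather than 0.5.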
How much does it cost to train an ML model?
It depends on the problem type and data complexity. A tabular model with a complete pipeline and rigorous evaluation starts around €1,500. More complex problems (heterogeneous data, explainability requirements, integration with existing systems) are quoted case by case.
Can you work with data we already have?
Yes, and it is the most common situation. You provide the data (database export, CSV files, connection to a data warehouse) and I design the preprocessing pipeline, model and evaluation. The first step is reviewing data quality and defining the target variable.