ML Models · Rigorous evaluation · Madrid

Machine learning models in Madrid

I train and evaluate machine learning models with rigour: a leakage-free pipeline, cross-validation, calibrated probabilities and a decision threshold set by business cost. A model that passes an honest validation test is one you can trust in production.

Leakage-free validation

Preprocessing (scaling, encoding, imputation) lives inside the pipeline, so it is fit only on the training data of each cross-validation fold. The validation metric is then an honest estimate of what to expect in production, not a figure inflated by leakage.
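
A minimal sketch of that setup, assuming scikit-learn is available. The synthetic dataset, the choice of logistic regression and the ROC AUC metric are illustrative assumptions, not the pipeline used on any particular project.

```python
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a real tabular dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Imputation and scaling are pipeline steps, so every fold refits them
# on that fold's training rows only; held-out rows never influence them.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Fitting the scaler on the full dataset before splitting would let statistics from the held-out rows leak into training, which is the inflation the pipeline avoids.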

Calibrated probabilities

If the model is going to drive decisions based on probability, that probability has to be reliable. I calibrate the model (Platt scaling or isotonic regression) and check the result with the Brier score and the reliability curve. A predicted 0.7 should mean the event happens about 70% of the time.
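
The two checks named above can be sketched in plain Python. This only measures calibration; it does not perform it (scikit-learn's `CalibratedClassifierCV` offers `method="sigmoid"` for Platt scaling and `method="isotonic"`). The function names and the toy data are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true, y_prob, n_bins=5):
    """Per non-empty bin: (mean predicted probability, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:  # skip empty bins
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, observed, len(members)))
    return rows


# Toy data (invented): ten cases scored 0.7, seven of which are positive.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
y_prob = [0.7] * 10

print(brier_score(y_true, y_prob))
print(reliability_table(y_true, y_prob))
```

In a well-calibrated model the first two columns of each row stay close to each other; plotting them against the diagonal gives the reliability curve.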

The threshold matters as much as the model

The decision threshold is not 0.5 by default. I choose it by minimising the real business cost: how much a false negative costs versus a false positive. That adjustment usually changes the operational result more than switching algorithms.
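
As a rough illustration, the threshold can be picked by scanning candidates and keeping the cheapest one on validation data. The cost values, the grid and the toy scores below are assumptions made up for the example.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Total cost of flagging every case whose probability is >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 0:
            cost += cost_fn  # missed positive
        elif y == 0 and predicted == 1:
            cost += cost_fp  # false alarm
    return cost


def best_threshold(y_true, y_prob, cost_fn, cost_fp):
    """Scan thresholds 0.01..0.99 and return (threshold, cost) of the cheapest."""
    grid = [i / 100 for i in range(1, 100)]
    return min(
        ((t, total_cost(y_true, y_prob, t, cost_fn, cost_fp)) for t in grid),
        key=lambda pair: pair[1],
    )


# Toy validation set (invented); a false negative costs 5 units, a false positive 1.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]

print(total_cost(y_true, y_prob, 0.5, cost_fn=5, cost_fp=1))
print(best_threshold(y_true, y_prob, cost_fn=5, cost_fp=1))
```

On this toy data the 0.5 threshold misses one positive and costs 5 units, while a threshold just above 0.20 accepts one false alarm and costs 1 unit, with the model itself unchanged.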

Per-segment error analysis

Global metrics hide problems. I analyse error by relevant subgroups (tenure, category, region) to understand where the model fails and why, and adjust if needed.
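
A small sketch of the idea, using an invented `region` column and hard 0/1 predictions; the helper name is hypothetical and the numbers are made up.

```python
from collections import defaultdict


def error_rate_by_segment(segments, y_true, y_pred):
    """Return {segment: (error_rate, n_cases)} for hard 0/1 predictions."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [errors, total]
    for seg, y, pred in zip(segments, y_true, y_pred):
        counts[seg][0] += int(y != pred)
        counts[seg][1] += 1
    return {seg: (errors / total, total) for seg, (errors, total) in counts.items()}


# Toy data (invented): region is a hypothetical segmenting column.
regions = ["north", "north", "north", "north", "south", "south", "south", "south"]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]

report = error_rate_by_segment(regions, y_true, y_pred)
# Worst segments first.
for seg, (rate, n) in sorted(report.items(), key=lambda kv: -kv[1][0]):
    print(f"{seg}: error rate {rate:.2f} over {n} cases")
```

Here the global error rate is 3 out of 8, yet every error sits in one region, which is the kind of pattern a single global metric hides.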

Have data and want a model that actually works?

Write to me with the problem: what you want to predict, what data you have, what decisions you will make with the result. Free first call.

Start a project

FAQ

  • What types of ML models do you build?

    Mainly tabular data models: classification (default, churn, fraud, conversion) and regression (demand, price, time). I also build NLP models with transformers for text classification, information extraction and semantic search. For LLMs, I handle integration and fine-tuning.

  • What does it mean for a model to be well calibrated?

    It means that when the model says '70% probability of default', default actually occurs in around 70% of those cases. Without calibration, scores are a ranking but not real probabilities, which makes it hard to make business decisions with them.

  • Why not always use a 0.5 threshold?

    Because the cost of being wrong is not symmetric. In credit risk, a false negative (granting credit to someone who defaults) can cost around 5 times more than a false positive (rejecting someone who would have paid). The optimal threshold reflects that asymmetry, not a convention.

  • How much does it cost to train an ML model?

    It depends on the problem type and data complexity. A tabular model with a complete pipeline and rigorous evaluation starts around €1,500. More complex problems (heterogeneous data, explainability requirements, integration with existing systems) are quoted case by case.

  • Can you work with data we already have?

    Yes, and it is the most common situation. You provide the data (database export, CSV files, connection to a data warehouse) and I design the preprocessing pipeline, model and evaluation. The first step is reviewing data quality and defining the target variable.
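
A worked version of the threshold answer above, as a sketch under stated assumptions: the probability $p$ of default is calibrated, correct decisions cost nothing, and $c_{FN}$ and $c_{FP}$ (notation introduced here) are the costs of a false negative and a false positive. Flagging a case is worthwhile when its expected cost is lower than that of not flagging it:

```latex
\begin{aligned}
(1 - p)\,c_{FP} &< p\,c_{FN} \\
p &> \frac{c_{FP}}{c_{FP} + c_{FN}} \\
c_{FN} = 5\,c_{FP} \;\Rightarrow\; p &> \frac{1}{6} \approx 0.17
\end{aligned}
```

With a 5 to 1 cost ratio the break-even threshold sits near 0.17, well below 0.5. In practice the threshold is still checked against validation data, since real costs and calibration are rarely exact.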

hola@jmwebsoluciones.com