Question 1

What types of ML models do you build?

Accepted Answer

I mainly build models for tabular data: classification (default, churn, fraud, conversion) and regression (demand, price, time). I also build transformer-based NLP models for text classification, information extraction and semantic search. For LLMs, I handle integration and fine-tuning.

Question 2

What does it mean for a model to be well calibrated?

Accepted Answer

It means that when the model says "70% probability of default", default actually occurs in around 70% of those cases. Without calibration, scores give a ranking but not real probabilities, which makes it hard to base business decisions on them.
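One way to check this in practice is to bin the predicted probabilities and compare, per bin, the average prediction with the observed rate. The sketch below is a minimal illustration, not a full evaluation: the function names, the equal-width binning and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def reliability_table(y_true, y_prob, n_bins=10):
    """Per-bin comparison of mean predicted probability and observed positive rate."""
    bins = defaultdict(list)
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        mean_pred = sum(p for _, p in pairs) / n
        observed = sum(y for y, _ in pairs) / n
        rows.append((idx, n, mean_pred, observed))
    return rows


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Average gap between predicted and observed rates, weighted by bin size."""
    rows = reliability_table(y_true, y_prob, n_bins)
    total = sum(n for _, n, _, _ in rows)
    return sum(n * abs(mean_pred - observed) for _, n, mean_pred, observed in rows) / total


# Toy data (made up): the same scores paired with two different sets of outcomes.
scores = [0.75] * 8 + [0.25] * 8
calibrated_outcomes = [1] * 6 + [0] * 2 + [1] * 2 + [0] * 6      # 75% and 25% positives
overconfident_outcomes = [1] * 3 + [0] * 5 + [1] * 2 + [0] * 6   # 37.5% and 25% positives

ece_calibrated = expected_calibration_error(calibrated_outcomes, scores)
ece_overconfident = expected_calibration_error(overconfident_outcomes, scores)
```

Both sets of outcomes give the same ranking of cases, but only the first matches the stated probabilities; the second shows a gap in the 0.75 bin, which is what a calibration check is meant to surface.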

Question 3

Why not always use a 0.5 threshold?

Accepted Answer

Because the cost of being wrong is not symmetric. In credit risk, a false negative (giving credit to someone who defaults) typically costs 5 times more than a false positive (rejecting someone who would have paid). The optimal threshold reflects that asymmetry, not a convention.
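The effect of that asymmetry on the threshold can be sketched in a few lines. This example assumes calibrated probabilities, zero cost for correct decisions, and the 5:1 cost ratio mentioned above; the function names and the toy portfolio are hypothetical. Under those assumptions, flagging a case is worthwhile when p * cost_fn > (1 - p) * cost_fp, which gives a threshold of cost_fp / (cost_fp + cost_fn).

```python
def cost_based_threshold(cost_fp, cost_fn):
    """Threshold that minimises expected cost when probabilities are calibrated."""
    return cost_fp / (cost_fp + cost_fn)


def total_cost(y_true, y_prob, threshold, cost_fp, cost_fn):
    """Sum the cost of false positives and false negatives at a given threshold."""
    cost = 0.0
    for y, p in zip(y_true, y_prob):
        flagged = p >= threshold  # flagged = predicted default, credit rejected
        if flagged and y == 0:
            cost += cost_fp       # rejected someone who would have paid
        elif not flagged and y == 1:
            cost += cost_fn       # gave credit to someone who defaulted
    return cost


def best_empirical_threshold(y_true, y_prob, cost_fp, cost_fn):
    """Pick the observed score that gives the lowest total cost on this data."""
    candidates = sorted(set(y_prob))
    return min(candidates, key=lambda t: total_cost(y_true, y_prob, t, cost_fp, cost_fn))


# Toy portfolio (made up): 1 = default, 0 = repaid.
y_prob = [0.05, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 0.9]
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
COST_FP, COST_FN = 1, 5

threshold = cost_based_threshold(COST_FP, COST_FN)  # 1 / 6
cost_at_half = total_cost(y_true, y_prob, 0.5, COST_FP, COST_FN)
cost_at_cost_based = total_cost(y_true, y_prob, threshold, COST_FP, COST_FN)
empirical = best_empirical_threshold(y_true, y_prob, COST_FP, COST_FN)
```

On this toy data the 0.5 threshold misses one defaulter, while the cost-based threshold accepts two extra rejections to avoid that miss and ends up cheaper overall. The empirical search is included only as a sanity check; on a sample this small it would overfit.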

Question 4

How much does it cost to train an ML model?

Accepted Answer

It depends on the problem type and data complexity. A tabular model with a complete pipeline and rigorous evaluation starts around €1,500. More complex problems (heterogeneous data, explainability requirements, integration with existing systems) are quoted case by case.

Question 5

Can you work with data we already have?

Accepted Answer

Yes, and it is the most common situation. You provide the data (database export, CSV files, connection to a data warehouse) and I design the preprocessing pipeline, model and evaluation. The first step is reviewing data quality and defining the target variable.
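As a rough idea of what that first review can look like, the sketch below profiles a CSV export for missing values per column and for the balance of the target variable. The column names, the sample rows and the `profile_rows` helper are hypothetical, and a real review would cover much more (duplicates, date ranges, leakage-prone columns).

```python
import csv
import io


def profile_rows(rows, target):
    """Report row count, share of empty values per column, and target value counts."""
    rows = list(rows)
    n = len(rows)
    columns = list(rows[0].keys()) if rows else []
    missing_rate = {
        c: sum(1 for r in rows if (r[c] or "").strip() == "") / n for c in columns
    }
    target_counts = {}
    for r in rows:
        target_counts[r[target]] = target_counts.get(r[target], 0) + 1
    return {"n_rows": n, "missing_rate": missing_rate, "target_counts": target_counts}


# Made-up export with a hypothetical churn target.
SAMPLE = """customer_id,income,tenure_months,churned
1,32000,12,0
2,,30,1
3,41000,,0
4,28000,5,0
"""

report = profile_rows(csv.DictReader(io.StringIO(SAMPLE)), target="churned")
```

Here `report["missing_rate"]` shows which columns need imputation or exclusion, and `report["target_counts"]` shows how imbalanced the target is, which feeds into both the validation design and the choice of threshold.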

Machine learning models in Madrid

Leakage-free validation

Calibrated probabilities

The threshold matters as much as the model

Per-segment error analysis

Have data and want a model that actually works?
