Blog·01/05/2026

Why the decision threshold matters more than the model

Moving the decision threshold from 0.5 to the value that reflects your business costs can improve the operational result more than switching from logistic regression to XGBoost. Here is why, even though it is rarely explained that way.

A classification model does not directly predict 'approve' or 'reject'; it predicts a probability. The decision comes afterwards, when you apply a threshold: if the score is at or above X, predict the positive class. By default, X is 0.5, and that default is almost never the right choice.
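Here is a minimal Python sketch of that two-step process. The `decide` helper and its 'approve'/'reject' labels are hypothetical. The positive class is taken to be "will default", as in the credit example below. In scikit-learn, for instance, `predict_proba` returns the probability, while `predict` on a binary `LogisticRegression` effectively applies the 0.5 cut for you.

```python
def decide(p_default: float, threshold: float = 0.5) -> str:
    """Turn a predicted probability of default into a credit decision.

    The model only supplies p_default; the threshold is a separate,
    business-level choice (0.5 is just the conventional default).
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be a probability in [0, 1]")
    # Positive class = "will default", so scores at or above the threshold are rejected.
    return "reject" if p_default >= threshold else "approve"


print(decide(0.30))                  # approve: below the default 0.5 cut
print(decide(0.30, threshold=0.17))  # reject: same score, stricter cut
```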

The cost of being wrong is not symmetric. In most real problems, one type of error costs more than the other. In credit risk, a false negative (extending credit to someone who will default) can cost five times as much as a false positive (rejecting someone who would have paid). In fraud detection, the ratio can reach 20 to 1. In medical diagnosis, it depends on the cost of unnecessary treatment versus the cost of missing the disease.

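How to optimise the threshold for real cost. Let C_FN be the cost of a false negative and C_FP the cost of a false positive, with correct decisions costing nothing. Suppose a case has calibrated probability p. Predicting positive then has expected cost (1 - p) · C_FP, and predicting negative has expected cost p · C_FN. Predicting positive is therefore cheaper whenever p > C_FP / (C_FP + C_FN), and that ratio is the threshold that minimises expected cost. With C_FN = 5 and C_FP = 1, it is 1/6, about 0.17, instead of 0.5: you accept more false positives to cut the more costly false negatives. The ROC curve shows the tradeoff at every possible threshold. In practice you can also pick the threshold empirically, as the one that minimises total cost on held-out validation data whose class balance matches what you expect in production.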
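The following Python sketch covers both routes under stated assumptions. The first is the closed-form rule C_FP / (C_FP + C_FN), which presumes calibrated probabilities and zero cost for correct decisions. The second is an empirical grid search that keeps the threshold with the lowest total cost on held-out data. The helper names and the synthetic data (calibrated by construction) are illustrative only; this is not the article's dataset.

```python
import random


def cost_optimal_threshold(c_fp: float, c_fn: float) -> float:
    """Closed-form threshold; assumes calibrated probabilities and free correct decisions."""
    return c_fp / (c_fp + c_fn)


def total_cost(y_true, p_pred, threshold, c_fp, c_fn):
    """Total misclassification cost when predicting positive for p >= threshold."""
    cost = 0.0
    for y, p in zip(y_true, p_pred):
        predicted_positive = p >= threshold
        if predicted_positive and y == 0:
            cost += c_fp  # false positive, e.g. a good customer rejected
        elif not predicted_positive and y == 1:
            cost += c_fn  # false negative, e.g. a future default approved
    return cost


def best_threshold(y_true, p_pred, c_fp, c_fn, n_steps=100):
    """Empirical search: try thresholds 0, 1/n_steps, ..., 1 and keep the cheapest."""
    grid = [i / n_steps for i in range(n_steps + 1)]
    return min(grid, key=lambda t: total_cost(y_true, p_pred, t, c_fp, c_fn))


# Synthetic validation set, calibrated by construction: each label is drawn
# with exactly the probability the "model" reports.
rng = random.Random(0)
p_val = [rng.betavariate(1.0, 3.0) for _ in range(5000)]
y_val = [1 if rng.random() < p else 0 for p in p_val]

t_closed = cost_optimal_threshold(c_fp=1.0, c_fn=5.0)
t_search = best_threshold(y_val, p_val, c_fp=1.0, c_fn=5.0)
print(round(t_closed, 3))  # 0.167
print(t_search,
      total_cost(y_val, p_val, 0.5, 1.0, 5.0),
      total_cost(y_val, p_val, t_search, 1.0, 5.0))
```

The grid includes 0.5, so the searched threshold can never do worse than the default on the validation data itself. Whether that advantage carries over to new data depends on the validation set resembling production.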

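A concrete example. Credit default dataset: 22% of the portfolio defaults. Cost of lost capital (a default that gets approved): 5 points. Cost of rejecting a good customer: 1 point. With threshold 0.5, the model has 83% accuracy but lets through 45% of defaults. With threshold 0.17, accuracy drops to 78%, but only 11% of defaults get through. Per 1,000 applicants, the 0.5 threshold means about 99 missed defaults and 71 rejected good customers, for a cost of roughly 566 points. At 0.17 the figures are about 24 and 196, for roughly 317 points, so total portfolio cost drops by about 44%. Accuracy is worse; the business does much better.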

Why the threshold can matter more than the model. Switching from logistic regression to XGBoost might gain 2 to 3 points of ROC-AUC. But ROC-AUC is measured across all thresholds, so if both models are still cut at 0.5, neither decision rule reflects the cost asymmetry. Keeping logistic regression and setting the threshold to the cost-optimal value can produce a larger operational improvement, because it directly reduces the errors that cost the most money. This does not mean the model is irrelevant, since it determines how well the scores separate the classes. The threshold, however, is what translates that discrimination into decisions with real value.
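The small, hand-made Python example below illustrates the distinction. ROC-AUC depends only on how the scores are ranked, so it is identical whichever threshold you later apply, while the business cost changes with the cut. The six labels and scores are invented for illustration, and the costs are the same assumed values as above (C_FP = 1, C_FN = 5).

```python
def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def cost_at(y_true, scores, threshold, c_fp=1.0, c_fn=5.0):
    """Misclassification cost when predicting positive (default) for score >= threshold."""
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    return fp * c_fp + fn * c_fn


y = [1, 1, 0, 0, 0, 0]
scores = [0.90, 0.30, 0.40, 0.20, 0.10, 0.05]

print(roc_auc(y, scores))        # 0.875, no threshold involved
print(cost_at(y, scores, 0.5))   # 5.0: one missed default
print(cost_at(y, scores, 0.17))  # 2.0: two good customers rejected, no missed defaults
```

The model, and therefore the AUC, is the same in both cost lines; only the decision rule changed.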

What you need to do this right. The probabilities must be calibrated. If the model says 0.3 but the real risk is 0.7, a threshold calculated from costs will be wrong. That is why calibration (Platt scaling or isotonic regression) and a check against the reliability curve come before threshold optimisation.
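Below is a sketch of a reliability-curve check in plain Python. It bins the predicted probabilities, then compares the mean prediction in each bin with the observed default rate; a calibrated model gives rows where the two numbers are close. The helper name and the deliberately miscalibrated synthetic model, which reports half the true risk, are assumptions for illustration. In scikit-learn the same check is `sklearn.calibration.calibration_curve`. To fit the correction itself, `CalibratedClassifierCV` offers `method="sigmoid"` (Platt scaling) or `method="isotonic"`.

```python
import random


def reliability_table(y_true, p_pred, n_bins=10):
    """Equal-width bins of predicted probability: (mean predicted, observed positive rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, p_pred):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the last bin
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if not members:
            continue  # empty bins carry no information
        mean_pred = sum(p for _, p in members) / len(members)
        observed = sum(y for y, _ in members) / len(members)
        rows.append((mean_pred, observed, len(members)))
    return rows


# Hypothetical miscalibrated model: the true risk is uniform on [0, 1),
# but the model reports only half of it.
rng = random.Random(42)
true_risk = [rng.random() for _ in range(10000)]
y = [1 if rng.random() < r else 0 for r in true_risk]
reported = [r / 2 for r in true_risk]

for mean_pred, observed, count in reliability_table(y, reported):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={count}")
```

In every row the observed rate comes out at roughly twice the prediction. With a model like this, a cost-derived cut at a reported 0.17 would actually fall at a true risk of about 0.34, which is exactly the failure the paragraph above warns about.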
