AI Glossary

Model bias

bias, model bias, AI bias, AI partiality

Model bias is a systematic skew in an AI's outputs, inherited from the training data or design assumptions. It leads to unfair or wrong outcomes for some groups or cases.

It's a repeatable, non-random skew in outputs that disadvantages specific groups or situations.
The most common sources are imbalances and historical patterns in the training data.
It's detected by testing each subgroup separately, not by a single overall-accuracy score.

Model bias is a systematic, repeatable error in an AI system's outputs that works against particular groups, attributes, or situations. It isn't a single mistake but a persistent pattern: the model errs more often on one category of cases, or treats it worse, than another. The most common source is the makeup of the training data: if it reflects historical inequalities or leaves out part of the population, the model learns those patterns as if they were rules.

Bias differs from hallucination in that it isn't random invention but a consistent skew in one direction. That's why an overall-accuracy score alone won't detect it: a model can have high average accuracy and still fail for a narrow group of users.

In deployment practice, bias is probed by evaluating model quality separately for each subgroup, and the results are documented as part of an AI audit. For a company this has a legal and reputational dimension: biased decisions in recruitment, scoring, or customer service create real risk, which is why bias is measured before deployment and monitored afterward, rather than assessed once.
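
A per-group evaluation can be sketched in a few lines of Python. This is a minimal illustration, not a standard audit procedure: the function name `accuracy_by_group`, the group labels "A" and "B", and the toy data are all made up for the example, and accuracy is only one of several metrics that could be compared across groups.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: accuracy}) for parallel lists.

    Hypothetical helper written for this example; not part of any library.
    """
    if not y_true:
        raise ValueError("no examples to evaluate")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Made-up data: 90 cases from group "A" and 10 from group "B".
# The model is right on every "A" case but on only 4 of the 10 "B" cases.
y_true = [1] * 100
y_pred = [1] * 90 + [1] * 4 + [0] * 6
groups = ["A"] * 90 + ["B"] * 10

overall, per_group = accuracy_by_group(y_true, y_pred, groups)

# Gap between the best- and worst-served group.
gap = max(per_group.values()) - min(per_group.values())

print(f"overall accuracy: {overall:.2f}")
for name, acc in sorted(per_group.items()):
    print(f"group {name}: {acc:.2f}")
print(f"largest gap: {gap:.2f}")
```

In this toy data the overall score looks healthy while the smaller group is served far worse, which is the pattern a single aggregate number hides. Running the same comparison on fresh data after deployment is one way to monitor the gap over time.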

Related terms