Model evaluation
Also known as: model assessment, evaluation, AI evaluation
Model evaluation is the systematic measurement of a model's answer quality against a fixed set of test cases, using defined metrics. It lets you compare versions and catch regressions instead of judging by gut feel.
- It rests on a fixed set of test cases and clear metrics.
- It lets you compare prompt or model versions and catch regressions.
- It combines automatic metrics with human judgment where accuracy really matters.
In model evaluation you build a fixed set of test cases and metrics, then run every change to a prompt, model, or configuration against it. That replaces "gut feel" assessment, where a single successful example proves nothing, with a repeatable measurement of the whole solution's quality.
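A minimal sketch of the idea, under stated assumptions: the test cases, the exact-match metric, and the two stand-in "models" (`model_v1`, `model_v2`) are all hypothetical and exist only for illustration. A real setup would call an actual model and would likely use richer metrics.

```python
from typing import Callable

# Hypothetical fixed test set: (input, expected answer) pairs.
# It stays the same across every version being compared.
TEST_CASES = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]


def exact_match_rate(model: Callable[[str], str], cases) -> float:
    """Fraction of cases where the answer equals the expected answer
    (case-insensitive, surrounding whitespace ignored)."""
    hits = sum(
        1
        for prompt, expected in cases
        if model(prompt).strip().lower() == expected.lower()
    )
    return hits / len(cases)


# Stand-in "models": plain lookup tables, purely for illustration.
def model_v1(prompt: str) -> str:
    return {"2 + 2": "4", "capital of France": "Lyon"}.get(prompt, "")


def model_v2(prompt: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "cold"}
    return answers.get(prompt, "")


# Both versions are scored on the same cases with the same metric.
score_v1 = exact_match_rate(model_v1, TEST_CASES)
score_v2 = exact_match_rate(model_v2, TEST_CASES)
print(f"v1: {score_v1:.2f}  v2: {score_v2:.2f}")
```

Because the cases and the metric are held fixed, the difference between the two scores can be attributed to the change in version rather than to a lucky example.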
In practice, automatic metrics are combined with human judgment, because some qualities (accuracy, tone, the risk of hallucination) are hard to capture in a number. Run this way, evaluation shows whether fine-tuning or a new version actually improved the result, or merely shifted the errors somewhere else.
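One way to check whether a change improved results or only moved the errors is to compare runs case by case rather than by aggregate score alone. The sketch below assumes each run is recorded as a mapping from a case id to pass/fail; the case ids, the results, and the policy of sending regressed cases to a human reviewer are hypothetical.

```python
def pass_rate(results: dict[str, bool]) -> float:
    """Aggregate score: share of cases that passed."""
    return sum(results.values()) / len(results)


def compare_runs(baseline: dict[str, bool], candidate: dict[str, bool]):
    """Return (fixed, regressed) case ids between two runs over the same cases."""
    if baseline.keys() != candidate.keys():
        raise ValueError("both runs must cover the same fixed set of cases")
    fixed = sorted(c for c in baseline if not baseline[c] and candidate[c])
    regressed = sorted(c for c in baseline if baseline[c] and not candidate[c])
    return fixed, regressed


# Hypothetical per-case results for two versions on the same test set.
baseline = {"case-1": True, "case-2": False, "case-3": True, "case-4": False}
candidate = {"case-1": True, "case-2": True, "case-3": False, "case-4": False}

fixed, regressed = compare_runs(baseline, candidate)

# Assumed policy: anything that regressed goes to a human reviewer,
# since an automatic pass/fail may not capture tone or subtle inaccuracy.
needs_human_review = regressed

print(f"baseline: {pass_rate(baseline):.2f}, candidate: {pass_rate(candidate):.2f}")
print(f"fixed: {fixed}, regressed: {regressed}")
```

In this made-up example the aggregate pass rate is identical for both runs, yet one case was fixed and another broke, which is the "shifted errors" situation that a single summary number hides.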