
AI Glossary

Model evaluation

Also known as: model assessment, evaluation, AI evaluation

Model evaluation is the systematic measurement of answer quality on a fixed set of cases and metrics. It lets you compare versions and catch regressions instead of judging by gut feel.

In model evaluation you build a fixed set of test cases and metrics against which you check every change to a prompt, model, or configuration. That turns "gut feel" assessment — where a single successful example proves nothing — into a repeatable measurement of the whole solution's quality.
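As a rough illustration of a fixed test set checked against every change, here is a minimal Python sketch. The test cases, the `exact_match` metric, and the two stand-in functions `version_a` and `version_b` are all hypothetical; in a real setup they would be replaced by your own cases, metrics, and calls to the actual system under test.

```python
from typing import Callable

# Hypothetical fixed test set: (input, expected answer) pairs.
# The same cases are reused for every version so that scores are comparable.
TEST_CASES = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]


def exact_match(predicted: str, expected: str) -> bool:
    """Illustrative metric: case-insensitive exact match."""
    return predicted.strip().lower() == expected.strip().lower()


def evaluate(system: Callable[[str], str]) -> float:
    """Run every test case through `system` and return the fraction that pass."""
    passed = sum(
        exact_match(system(question), expected)
        for question, expected in TEST_CASES
    )
    return passed / len(TEST_CASES)


# Stand-ins for two versions of a prompt, model, or configuration (assumed).
def version_a(question: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(question, "unknown")


def version_b(question: str) -> str:
    answers = {
        "2 + 2": "4",
        "capital of France": "paris",
        "opposite of hot": "cold",
    }
    return answers.get(question, "unknown")


score_a = evaluate(version_a)
score_b = evaluate(version_b)
print(f"version A: {score_a:.2f}, version B: {score_b:.2f}")
```

Because both versions are scored on the same cases with the same metric, the difference between the two numbers reflects the change itself rather than a lucky example.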

In practice, automatic metrics are combined with human judgment, because some qualities (factual accuracy, tone, the risk of hallucination) are hard to capture in a single number. Evaluated this way, a fine-tuned or new version can be shown to have actually improved the result, or merely to have shifted the errors somewhere else.
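One way to see whether a change improved results or only moved the errors is to compare per-case outcomes, not just the aggregate score. The sketch below assumes each run is recorded as a mapping from a case ID to a pass/fail flag, with both runs covering the same case IDs; the names `compare_runs`, `baseline`, and `candidate` and the sample data are illustrative only.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, bool]) -> float:
    """Fraction of cases that passed in a single run."""
    return sum(results.values()) / len(results)


def compare_runs(
    baseline: Dict[str, bool], candidate: Dict[str, bool]
) -> Dict[str, List[str]]:
    """List the cases the candidate fixed and the cases it broke.

    Assumes both runs cover exactly the same case IDs.
    """
    fixed = sorted(c for c in baseline if not baseline[c] and candidate[c])
    regressed = sorted(c for c in baseline if baseline[c] and not candidate[c])
    return {"fixed": fixed, "regressed": regressed}


# Hypothetical per-case outcomes (True = pass) for two versions.
baseline = {"case-1": True, "case-2": False, "case-3": True, "case-4": False}
candidate = {"case-1": True, "case-2": True, "case-3": False, "case-4": False}

report = compare_runs(baseline, candidate)
print(pass_rate(baseline), pass_rate(candidate))
print(report)
```

In this made-up data the overall pass rate is identical for both versions, yet one case was fixed and another regressed. The aggregate number alone would hide that shift, which is also where a human review of the regressed cases is most useful.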
