
AI Glossary

Data labeling

Also known as: data annotation, labeling, annotation

Data labeling is the practice of attaching labels or annotations to raw data to describe the correct answer, so the data can train or evaluate a model. It is the basis of supervised learning and reliable evaluation.

Data labeling is the process of annotating raw examples with information about the correct outcome for each one. That might mean assigning a category to a piece of text, marking the sentiment of a review, outlining an object in an image, or recording the reference answer to a question. A dataset annotated this way becomes training data for supervised learning: the model learns to map an input to the label assigned by a human or another trusted process.
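To make the idea concrete, here is a minimal sketch of how labeled examples for the four cases above might be stored and turned into input/label pairs. The `LabeledExample` class, the `task` tags, and all sample values are hypothetical and chosen only for illustration; real labeling tools use their own schemas.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class LabeledExample:
    raw: Any    # the unlabeled input (text, image path, question, ...)
    label: Any  # the annotation describing the correct outcome
    task: str   # hypothetical tag for the kind of annotation


# Illustrative, made-up records covering the four kinds of labels mentioned above.
examples = [
    LabeledExample("Invoice overdue, second reminder", "billing", "text_category"),
    LabeledExample("Great product, fast delivery", "positive", "sentiment"),
    LabeledExample("photo_001.jpg", {"object": "car", "box": (34, 50, 210, 180)}, "object_outline"),
    LabeledExample("What is the capital of France?", "Paris", "reference_answer"),
]


def to_training_pairs(items, task):
    """Return (input, label) pairs for one task, the form supervised learning consumes."""
    return [(ex.raw, ex.label) for ex in items if ex.task == task]


sentiment_pairs = to_training_pairs(examples, "sentiment")
```

Keeping the raw input and the label side by side in one record is a design choice that makes it easy to audit who labeled what; the pairs themselves are all a supervised learner needs.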

Labeling is distinct from training data itself: training data is the whole body of material a model learns from, whereas labeling is the specific act of adding the correct answers to it. Labels are also used beyond training. In model evaluation, the model's answers are compared against a previously labeled reference set; in fine-tuning, a pretrained model is adapted on a smaller, carefully labeled set for a specific task.
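The evaluation use of labels can be sketched as a simple accuracy check: model outputs are compared, item by item, against the labeled reference set. The function name and the sample labels below are assumptions for illustration, and accuracy is only one of several metrics that could be used here.

```python
def accuracy(predictions, reference_labels):
    """Fraction of predictions that match the labeled reference set."""
    if len(predictions) != len(reference_labels):
        raise ValueError("predictions and reference labels must have the same length")
    if not reference_labels:
        raise ValueError("reference set is empty")
    correct = sum(p == r for p, r in zip(predictions, reference_labels))
    return correct / len(reference_labels)


# Made-up reference labels and model outputs for four reviews.
reference = ["positive", "negative", "negative", "positive"]
model_output = ["positive", "negative", "positive", "positive"]

score = accuracy(model_output, reference)  # 3 of 4 match, so 0.75
```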

Labeling can be expensive and labor-intensive because it usually requires human annotators working from clear instructions. Inconsistent or wrong labels carry straight through into model errors, which is why organizations measure agreement between annotators and run quality control. Synthetic data, generated automatically, can partially supplement labeled data, but where fidelity to reality matters, manual or human-verified labeling remains the point of reference.
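One common way to quantify agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The text does not name a specific measure, so this is one possible choice shown as a minimal sketch; the annotator labels are made up.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both annotators must label the same non-empty set of items")

    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: probability both pick the same label given each one's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


annotator_1 = ["spam", "spam", "ham", "ham"]
annotator_2 = ["spam", "ham", "ham", "ham"]

kappa = cohens_kappa(annotator_1, annotator_2)  # observed 0.75, chance 0.5, kappa 0.5
```

A kappa near 1 indicates strong agreement, while a value near 0 means the annotators agree about as often as chance would predict, which usually points to unclear labeling instructions.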
