AI Glossary
Training data
Also called: learning data, training set
Training data is the set of examples a model learns patterns from during training. Its quality, quantity and representativeness largely determine how well the model performs on new data.
- It's the material a model draws its patterns from during training.
- Errors and gaps in the data carry over into the model's behavior.
- Representative data limits the risk of biased results.
Training data is the foundation of every machine learning model. A model knows nothing of the world beyond what it finds in these examples, so gaps, errors or the overrepresentation of one group carry straight over into its later decisions.
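One way to spot overrepresentation before training is to measure what share of the examples each group contributes. The sketch below is only an illustration: the `region` field, the toy records and the 25% threshold are all assumptions, not values taken from any real project.

```python
from collections import Counter


def group_shares(examples, key):
    """Return each group's share of the examples as a fraction of the total."""
    counts = Counter(example[key] for example in examples)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def underrepresented(shares, min_share):
    """List the groups whose share falls below min_share."""
    return sorted(group for group, share in shares.items() if share < min_share)


# Hypothetical toy dataset: four examples from one region, one from another.
examples = [
    {"text": "order arrived late", "region": "north"},
    {"text": "great service", "region": "north"},
    {"text": "package was damaged", "region": "north"},
    {"text": "fast delivery", "region": "north"},
    {"text": "wrong item sent", "region": "south"},
]

shares = group_shares(examples, "region")
# The 0.25 threshold is an arbitrary value chosen for this illustration.
flagged = underrepresented(shares, min_share=0.25)
print(shares)
print(flagged)
```

A check like this only reveals imbalance along attributes that are actually recorded in the data; groups that are missing entirely, or not labeled, will not show up.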
In project practice, much of the work goes into preparing and cleaning the data rather than into the training itself. Training data is also needed in fine-tuning, when a pre-trained model is adapted to a narrower task, and its effect on the model's quality is checked later in model evaluation.
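The preparation step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the examples are hypothetical `(text, label)` pairs, "cleaning" here means only dropping incomplete rows and exact duplicates, and setting aside a held-out portion for later evaluation is one common practice rather than something the entry prescribes.

```python
import random


def clean(examples):
    """Drop incomplete examples and exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for text, label in examples:
        # Skip rows with a missing label or empty text.
        if label is None or not text or not text.strip():
            continue
        # Treat texts that differ only in case or outer whitespace as duplicates.
        key = (text.strip().lower(), label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((text.strip(), label))
    return cleaned


def split(examples, holdout_fraction=0.2, seed=0):
    """Shuffle reproducibly and set aside a held-out portion for evaluation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical raw data with a duplicate, an empty text and a missing label.
raw = [
    ("Great product", "positive"),
    ("great product ", "positive"),
    ("", "negative"),
    ("Arrived broken", None),
    ("Arrived broken", "negative"),
    ("Works fine", "positive"),
    ("Too slow", "negative"),
    ("Love it", "positive"),
]

cleaned = clean(raw)
train, holdout = split(cleaned)
print(len(raw), len(cleaned), len(train), len(holdout))
```

Keeping the held-out examples out of training is what makes the later evaluation meaningful: the model is then measured on data it has not seen.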