AI Glossary

Inference

inference, model inference

Inference is the phase in which a trained model produces a result for new input, for example answering a question or classifying an image. Unlike training, inference leaves the model's parameters unchanged.

It is the stage of using the model, separate from training.
The model's parameters stay fixed; the model only computes a result.
The cost and latency of inference are often the deciding factor in a deployment.

A model's lifecycle splits into training, in which we tune the parameters, and inference, the actual use. During inference the model takes in input and computes a result, but it no longer learns anything.
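The split between the two phases can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: `LinearModel`, `train_step`, and `predict` are hypothetical names invented here, and a one-variable linear model stands in for a real network. The point is that the training method writes to the parameters while the inference method only reads them.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Hypothetical toy model: y = weight * x + bias."""
    weight: float = 0.0
    bias: float = 0.0

    def predict(self, x: float) -> float:
        # Inference: read the parameters, compute a result, change nothing.
        return self.weight * x + self.bias

    def train_step(self, x: float, y: float, lr: float = 0.1) -> None:
        # Training: nudge the parameters to reduce the squared error on (x, y).
        error = self.predict(x) - y
        self.weight -= lr * error * x
        self.bias -= lr * error


# Training phase: the parameters are tuned on examples of y = 2x + 1.
model = LinearModel()
for _ in range(200):
    for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
        model.train_step(x, y)

# Inference phase: the model computes a result for new input,
# and the parameters are the same before and after the call.
params_before = (model.weight, model.bias)
prediction = model.predict(3.0)
params_after = (model.weight, model.bias)

print(f"prediction for x=3.0: {prediction:.3f}")
print("parameters unchanged:", params_before == params_after)
```

Calling `predict` any number of times leaves `weight` and `bias` as they were; only `train_step` modifies them.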

In production, it is inference that drives ongoing costs, because every request to the model consumes compute. That is why, when deploying solutions built on an LLM, you plan for response time and the cost of a single call — not just the quality of the results themselves.
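As a rough planning aid, the cost of a single call and its response time can be estimated before deployment. The sketch below rests on assumptions not stated in this entry: it assumes per-token pricing, the prices and token counts are made-up placeholders, and `fake_model` is a hypothetical stand-in for a real model call.

```python
import time


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Estimate the cost of one call, assuming prices are quoted per 1,000 tokens."""
    return (input_tokens / 1000 * input_price_per_1k
            + output_tokens / 1000 * output_price_per_1k)


def timed_call(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; sleeps to simulate latency.
    time.sleep(0.05)
    return prompt.upper()


# Placeholder prices and volumes, chosen only for illustration.
INPUT_PRICE_PER_1K = 0.0005
OUTPUT_PRICE_PER_1K = 0.0015
CALLS_PER_MONTH = 100_000

single_call = cost_per_call(800, 200, INPUT_PRICE_PER_1K, OUTPUT_PRICE_PER_1K)
monthly = single_call * CALLS_PER_MONTH

reply, elapsed = timed_call(fake_model, "hello")

print(f"estimated cost per call: {single_call:.6f}")
print(f"estimated monthly cost:  {monthly:.2f}")
print(f"measured latency:        {elapsed * 1000:.0f} ms")
```

Because every request incurs this cost, the monthly figure scales linearly with traffic, which is why the per-call numbers are worth estimating early.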
