What is a small language model (SLM)?

Small language model (SLM)

Small Language Model, SLM, small LLM

A small language model (SLM) is a compact language model that can run locally or on modest hardware. It has fewer parameters than a large model and trades some breadth of knowledge for lower cost and greater control.

Has far fewer parameters than frontier LLMs, at the cost of breadth of general knowledge.
Can run locally or on-premise, on a laptop or a single graphics card.
Excels at narrow, repeatable tasks, especially after fine-tuning on a company's own data.

A small language model (SLM) is a compact variant of a large language model, designed to run on modest hardware: a single graphics card, a company server, sometimes even a laptop. The line between an SLM and an LLM is a matter of convention and shifts over time, but the term generally refers to models with one or two orders of magnitude fewer parameters than the largest systems.

Unlike a large model, which prioritizes broad general knowledge and the ability to handle arbitrary tasks, a small model gives up some of that versatility in exchange for low cost, speed and the ability to operate without an external API. Fine-tuning on a specific company's data improves its effectiveness on that company's tasks, and quantization reduces hardware requirements even more.
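
As a rough illustration of why quantization lowers hardware requirements, the memory needed for a model's weights can be approximated as the parameter count times the bits stored per weight. The sketch below is a back-of-the-envelope estimate only: the function name `weight_memory_gb` and the 7-billion-parameter model are hypothetical, chosen for illustration, and the figure covers the weights alone, ignoring activations, caches and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 1e9


# Hypothetical 7-billion-parameter model, used for illustration only.
PARAMS = 7e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.1f} GB")
# 16-bit weights: ~14.0 GB
# 8-bit weights: ~7.0 GB
# 4-bit weights: ~3.5 GB
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the kind of reduction that can bring a model within reach of a single graphics card or a laptop.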

From a deployment perspective, an SLM is the natural choice wherever data privacy, predictable inference cost and independence from a vendor matter. It works well for narrow, repeatable tasks — classifying documents, extracting data, handling routine queries — where broad general knowledge is not needed and the priorities are control and cost across a high volume of calls.
