AI Glossary

Guardrails

Also known as: safety guardrails, model safeguards

Guardrails are rules and filters that constrain what a model may accept as input and return as output. They block disallowed content, enforce the answer's format, and keep behavior within set limits.

They act on the input (filtering requests) and on the output (controlling answers).
They enforce the permitted topic scope, the answer format, and the rules the model must stay within.
They do not replace human oversight; they reduce the number of cases that reach it.

Guardrails are a layer of rules wrapped around a model. They check what the user sends in and what the model sends back: they can reject an out-of-scope request, block sensitive data, enforce a set structure for the answer, or halt an action that goes beyond the permitted operations.
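The input and output checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the topic list, the email pattern standing in for "sensitive data", the permitted actions, and the function names `check_input` and `check_output` are all hypothetical, and the request topic is assumed to have been classified by an earlier step.

```python
import json
import re
from typing import Tuple

# Hypothetical policy values, chosen only for illustration.
ALLOWED_TOPICS = {"billing", "shipping"}
PERMITTED_ACTIONS = {"lookup_order", "send_receipt"}
# A simple email pattern stands in for "sensitive data" in this sketch.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(topic: str, text: str) -> Tuple[bool, str]:
    """Input guardrail: reject out-of-scope topics and requests carrying sensitive data."""
    if topic not in ALLOWED_TOPICS:
        return False, "out_of_scope"
    if EMAIL_PATTERN.search(text):
        return False, "sensitive_data"
    return True, "ok"


def check_output(raw: str) -> Tuple[bool, str]:
    """Output guardrail: require a JSON object with an 'answer' string
    and, if an action is requested, one from the permitted set."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "bad_format"
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        return False, "bad_format"
    action = data.get("action")
    if action is not None and action not in PERMITTED_ACTIONS:
        return False, "action_not_permitted"
    return True, "ok"
```

Each check returns a pass/fail flag and a reason code, so the calling application can decide whether to refuse, retry, or log the case. Real deployments usually use classifiers or dedicated detectors rather than a single regular expression.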

In practice, guardrails are combined with other mechanisms. They filter routine traffic and contain known classes of errors, but they do not understand context the way a person does. That is why, for higher-risk decisions, they are paired with a human in the loop and with broader oversight of the system.
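One way to picture the pairing with a human in the loop is a small routing step that runs after the guardrail checks. The sketch below is an assumption-laden example: the `risk_score` input, the `REVIEW_THRESHOLD` value, and the route names are hypothetical and would come from a system's own risk policy.

```python
from dataclasses import dataclass

# Hypothetical threshold; a real system would derive it from its risk policy.
REVIEW_THRESHOLD = 0.7


@dataclass
class Decision:
    route: str   # "blocked", "human_review", or "auto"
    reason: str


def route_request(passed_guardrails: bool, risk_score: float) -> Decision:
    """Block guardrail failures, escalate higher-risk cases, automate the rest."""
    if not passed_guardrails:
        return Decision("blocked", "failed guardrail check")
    if risk_score >= REVIEW_THRESHOLD:
        # Passing the guardrails is not sufficient for higher-risk decisions.
        return Decision("human_review", "risk at or above threshold")
    return Decision("auto", "routine traffic")
```

The point of the design is that guardrails reduce the volume reaching reviewers without removing the review step itself.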
