AI Glossary
Guardrails
safety guardrails, model safeguards
Guardrails are rules and filters that constrain what a model may accept as input and return as output. They block disallowed content, enforce the format of answers, and keep the model's behavior within defined limits.
- They act on the input (filtering requests) and on the output (controlling answers).
- They enforce the scope of topics, the format, and the rules the model must stay within.
- They do not replace human oversight; they reduce the number of cases that reach it.
Guardrails are a layer of rules wrapped around a model. They check what the user sends in and what the model sends back: they can reject an out-of-scope request, block sensitive data, require a fixed structure for the answer, or halt an action that goes beyond the permitted operations.
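The checks listed above can be sketched as two small functions, one run on the request before it reaches the model and one run on the model's reply before it is returned or acted on. This is a minimal sketch under stated assumptions: each request arrives with a topic label, and the model is asked to reply as a JSON object with an `answer` string and an optional `action`. The topic list, action names, card-number pattern and the `Verdict` type are all hypothetical. A real system would use its own policies and usually stronger detectors than a single regular expression.

```python
import json
import re
from dataclasses import dataclass

# Hypothetical policy: topics the assistant is allowed to handle.
ALLOWED_TOPICS = {"billing", "shipping", "returns"}
# Hypothetical allowlist of operations the model may trigger.
PERMITTED_ACTIONS = {"lookup_order", "send_return_label"}
# Rough detector for card-like numbers: 13-16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_input(topic: str, text: str) -> Verdict:
    """Input guardrail: reject out-of-scope requests and requests carrying sensitive data."""
    if topic not in ALLOWED_TOPICS:
        return Verdict(False, f"out of scope: {topic}")
    if CARD_PATTERN.search(text):
        return Verdict(False, "sensitive data: card number")
    return Verdict(True)


def check_output(raw: str) -> Verdict:
    """Output guardrail: enforce the JSON structure and stop actions outside the allowlist."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return Verdict(False, "malformed output: not JSON")
    if not isinstance(data, dict) or not isinstance(data.get("answer"), str):
        return Verdict(False, "malformed output: missing 'answer'")
    action = data.get("action")
    if action is not None and action not in PERMITTED_ACTIONS:
        return Verdict(False, f"action not permitted: {action}")
    return Verdict(True)
```

For example, `check_input("weather", "Will it rain?")` is rejected as out of scope, and a reply such as `{"answer": "Done", "action": "delete_account"}` is stopped because `delete_account` is not on the allowlist. Both checks return a reason string so that rejections can be logged and reviewed.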
In practice, guardrails are combined with other mechanisms. They filter routine traffic and contain a known class of errors, but they do not understand context the way a person does. That is why, for higher-risk decisions, they are paired with a human in the loop and with broader oversight of the system.
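One way to picture this pairing is a routing step placed after the guardrails. Requests that fail a check are blocked, routine requests proceed automatically, and high-risk decisions go to a person. The sketch below assumes that some upstream component supplies a risk score between 0 and 1. The score, the `RISK_THRESHOLD` value and the `Route` names are hypothetical, and real systems define "high risk" in their own terms.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTO = "auto"           # guardrails passed and risk is low: proceed automatically
    HUMAN_REVIEW = "human"  # guardrails passed but risk is high: a person decides
    BLOCK = "block"         # a guardrail rejected the request or reply: do not act


@dataclass
class Decision:
    guardrails_passed: bool
    risk_score: float  # hypothetical 0.0-1.0 estimate from an upstream scorer


# Hypothetical cut-off above which a decision counts as high-risk.
RISK_THRESHOLD = 0.7


def route(decision: Decision, threshold: float = RISK_THRESHOLD) -> Route:
    """Send blocked cases nowhere, high-risk cases to a human, and the rest onward."""
    if not decision.guardrails_passed:
        return Route.BLOCK
    if decision.risk_score >= threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTO
```

In this setup the guardrails handle the bulk of routine traffic, and human attention is reserved for the smaller set of cases routed to `HUMAN_REVIEW`.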