Aurora AITell us your case

Offering

ServicesProductsCase studies

For whom

Private EquityEnterpriseSMB
ServicesProductsCase studiesAboutBlogContact

Knowledge base

Start hereWikiGlossaryGuides

AI Glossary

AI red teaming

red teaming, adversarial AI testing, offensive AI testing

AI red teaming is deliberately adversarial testing of a system, meant to find its weak points, safeguard bypasses and harmful outputs before it reaches users.

AI red teaming is a testing method in which a team deliberately acts adversarially toward a system in order to provoke undesirable behavior. Instead of checking whether the model performs well on typical tasks, red teaming probes the edges: attempts to bypass its rules, susceptibility to prompt injection, data leakage, and the generation of harmful content. The name comes from security practice, where the "red team" plays the role of the attacker.

The difference from standard model quality evaluation is significant: evaluation measures effectiveness on planned cases, while red teaming checks how the system behaves under pressure and against a user acting in bad faith. One answers the question "does it work well," the other "how can it be broken."

In a company deployment, red teaming precedes making the system available and is repeated after major changes. Its findings feed directly into the design of guardrails — every gap found points to where additional protection is needed. It is often combined with automated tests and human work, because some vulnerabilities only surface under a creative, non-obvious attack.

Related terms