Governed agents in the enterprise: data classification, human-in-the-loop, audit trail

Governed agents are the unglamorous but decisive part of today's conversation about AI in the enterprise. The pilot phase, when the team shows in a demo that the model "can answer from the knowledge base," is slowly coming to an end in most large organizations. What remains are systems that read client data every day, call tools, make decisions and leave traces, often with no owner, no classification and no register. In our view, the deployments that survive in production over the next eighteen months will be the ones where data classification, human oversight and the audit trail are part of the runtime, not a document bolted on after an incident has reached management.

This text isn't an implementation playbook or a guide to a specific product. It's an architecture note for the people who will have to explain to an auditor, a regulator or the board who let the agent do what it did. We describe three pillars that, in our view, belong in the reference architecture of every organization deploying agents in a regulated environment, and five anti-patterns that, in our practice, come back with interest.

Three pillars of governing agents

1. Data classification: which class of information the agent may read, write and pass on. "Good" looks like data classification that doesn't stop at the mailbox and the SharePoint file but follows the data into the model's context and every tool call. Every document, knowledge-base chunk and CRM record field has a sensitivity level (public, internal, confidential, personal data, regulated data), and at runtime the agent knows what it may and may not place in the prompt. Classification propagates through the call chain: if tool A returns confidential data, tool B and the model know it and operate in the appropriate mode. The failure mode to avoid: regulated data lands in a public model endpoint because "RAG just happened to pull everything." The concrete artifact the team should own: a data-classification matrix for agents, that is, a table of sources, levels and permitted paths, with a business owner and versioning.
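To make the propagation rule concrete, here is a minimal sketch in Python. It assumes the five levels named above can be put in a single order from least to most restricted (the ordering is our assumption, not a standard), and it uses hypothetical names throughout: `Labeled`, `ENDPOINT_CEILING`, `derive`, `build_prompt`, and the two endpoint names. It illustrates the idea; it is not a reference implementation.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    # Levels from the text; the numeric ordering is an assumption for this sketch.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    PERSONAL = 3
    REGULATED = 4


@dataclass(frozen=True)
class Labeled:
    """A piece of content that carries its sensitivity label with it."""
    content: str
    level: Sensitivity
    source: str


# Hypothetical slice of the classification matrix:
# the highest level each model endpoint is allowed to receive.
ENDPOINT_CEILING = {
    "public-model": Sensitivity.INTERNAL,
    "private-model": Sensitivity.REGULATED,
}


def context_level(items: list[Labeled]) -> Sensitivity:
    """A context is as sensitive as its most sensitive item."""
    return max((i.level for i in items), default=Sensitivity.PUBLIC)


def derive(content: str, inputs: list[Labeled], source: str) -> Labeled:
    """Output of a tool or model inherits the highest level among its inputs."""
    return Labeled(content, context_level(inputs), source)


def build_prompt(items: list[Labeled], endpoint: str) -> tuple[str, list[Labeled]]:
    """Keep only items the endpoint may see; also return what was withheld."""
    ceiling = ENDPOINT_CEILING[endpoint]
    allowed = [i for i in items if i.level <= ceiling]
    withheld = [i for i in items if i.level > ceiling]
    return "\n".join(i.content for i in allowed), withheld
```

The design choice worth noting is that the label travels with the data rather than living in a separate lookup: anything derived from confidential input is confidential by construction, so "RAG just happened to pull everything" turns into an explicit list of withheld items instead of a silent leak.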

2. Human-in-the-loop: where a human actually approves, and where they only watch. "Good" looks like a deliberate design of decision points: the agent autonomously executes what is reversible and cheap to get wrong, stops before actions that are irreversible, regulated or high-value, and asks a specific role to approve, with full context. The failure mode to avoid: human-in-the-loop reduces to clicking "OK" in an inbox, because the approver has neither the time, the context nor the data to refuse sensibly, and in practice it becomes a rubber stamp. The metric that tells you whether the loop is real is simple and observable: the share of rejections and modifications in approval decisions. If that share has been zero for months, then in our view you don't have human-in-the-loop; you have oversight theater. The concrete artifact: an approval-points register, a list of the places in the flow where human acceptance is required, with the role, the response time and the decision history.
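The decision rule and the metric from this pillar can be sketched in a few lines of Python. Everything named here is hypothetical: the `Action` fields, the `HIGH_VALUE_THRESHOLD` figure and the function names are assumptions made for illustration, and the real thresholds belong in the approval-points register.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    APPROVED = "approved"
    MODIFIED = "modified"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool
    regulated: bool
    value: float  # exposure in whatever unit the organization uses


# Hypothetical threshold, chosen only for this sketch.
HIGH_VALUE_THRESHOLD = 10_000.0


def needs_approval(action: Action) -> bool:
    """Stop before actions that are irreversible, regulated or high-value."""
    return (
        not action.reversible
        or action.regulated
        or action.value >= HIGH_VALUE_THRESHOLD
    )


def pushback_share(history: list[Outcome]) -> float:
    """Share of approval decisions that ended in a rejection or a modification."""
    if not history:
        return 0.0
    pushed_back = sum(1 for o in history if o is not Outcome.APPROVED)
    return pushed_back / len(history)
```

A `pushback_share` that stays at 0.0 over a long decision history is the signal described above: the approval point exists on paper, but nobody is exercising it.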

3. Audit trail: what gets recorded, for how long, and how it holds up before a regulator. "Good" looks like a log that, for each agent decision, keeps five elements: the prompt (with classification metadata), the model and its version, the tools called along with their parameters and results, the decision taken, and, where approval was required, the identity of the approver and a timestamp. The retention policy is matched to the legal regime of the data (one for personal data, another for accounting documentation, another for credit decisions), and the log is immutable for the retention period. The failure mode to avoid: we log the prompt and the response, but not the context retrieved by RAG or the parameters of the tools called, and on audit day we can't reconstruct why the agent made that particular decision. The concrete artifact: an AI systems register in the spirit of the EU AI Act risk classification, with a reference to the audit sink and the per-system retention policy.
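As an illustration of the five elements, here is a minimal Python sketch of an audit record and an append-only log. The field names and the `AuditLog` class are hypothetical. The hash chain is a technique we are swapping in for the sketch: it makes tampering detectable, which is weaker than the immutability described above, and a production system would likely pair it with write-once storage.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    prompt: str
    prompt_classification: str   # classification metadata for the prompt
    retrieved_context: list      # ids of the chunks placed in the prompt
    model: str
    model_version: str
    tool_calls: list             # each: {"tool": ..., "params": ..., "result": ...}
    decision: str
    approver: Optional[str]      # None when no approval was required
    timestamp: str               # ISO 8601


class AuditLog:
    """Append-only log; each hash covers the previous one, so edits are detectable."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []  # (payload_json, entry_hash)

    @staticmethod
    def _digest(prev_hash: str, payload: str) -> str:
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, record: AuditRecord) -> str:
        prev = self._entries[-1][1] if self._entries else self.GENESIS
        payload = json.dumps(asdict(record), sort_keys=True)
        entry_hash = self._digest(prev, payload)
        self._entries.append((payload, entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any altered or reordered entry breaks it."""
        prev = self.GENESIS
        for payload, entry_hash in self._entries:
            if self._digest(prev, payload) != entry_hash:
                return False
            prev = entry_hash
        return True
```

Note that `retrieved_context` and `tool_calls` are first-class fields: they are exactly the parts that the failure mode above leaves out, and without them a decision cannot be reconstructed.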

Reference architecture (a short overview)

In our view, a governed agent in the enterprise looks roughly like this end-to-end. A request comes in through a classification gateway that attaches to the context the sensitivity level of the user's data and of the query itself. Then a policy engine decides what's allowed downstream: which models, which tools and which data sources are permitted for this class. The model call happens with a context built only from permitted sources, and the entire context placed in the prompt is logged together with the response. A tool gateway mediates between the model and the world: no tool is called directly, and each call passes through a layer of authorization, limits and logs. Where a decision is irreversible, regulated or high-value, an approval gateway halts execution and routes the matter to the right human role with full decision context. All of these steps, from classification to the result, land in an audit sink with a retention policy and immutability. This isn't a product description; it's the target architecture the architecture team should be able to draw on a single page before choosing vendors.
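The middle of that flow (policy engine, tool gateway, approval gateway, audit sink) can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the `POLICIES` table, the tool names, the `APPROVAL_REQUIRED` set and the `ToolGateway` class are all hypothetical, and an in-memory list stands in for the audit sink.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Policy:
    models: frozenset
    tools: frozenset
    sources: frozenset


# Hypothetical policy table keyed by sensitivity class.
POLICIES = {
    "internal": Policy(
        models=frozenset({"hosted-model", "private-model"}),
        tools=frozenset({"search", "crm.read", "payments.transfer"}),
        sources=frozenset({"kb", "crm"}),
    ),
    "regulated": Policy(
        models=frozenset({"private-model"}),
        tools=frozenset({"crm.read"}),
        sources=frozenset({"crm"}),
    ),
}

# Hypothetical set of tools that are irreversible, regulated or high-value.
APPROVAL_REQUIRED = {"payments.transfer"}


class PolicyViolation(Exception):
    """Raised when a tool is not permitted for the current class."""


@dataclass
class ToolGateway:
    policy: Policy
    audit_sink: list = field(default_factory=list)  # stand-in for the real sink

    def call(self, tool: str, params: dict, impl: Callable[..., Any],
             approved_by: Optional[str] = None) -> dict:
        """Authorize, optionally halt for approval, execute, and log every outcome."""
        event = {"tool": tool, "params": params, "approved_by": approved_by}
        if tool not in self.policy.tools:
            event["status"] = "denied"
            self.audit_sink.append(event)
            raise PolicyViolation(f"tool {tool!r} is not permitted for this class")
        if tool in APPROVAL_REQUIRED and approved_by is None:
            event["status"] = "pending_approval"  # halt; route to a human role
            self.audit_sink.append(event)
            return event
        event["result"] = impl(**params)
        event["status"] = "executed"
        self.audit_sink.append(event)
        return event
```

A short usage example: `ToolGateway(POLICIES["regulated"]).call("crm.read", {"customer_id": 7}, lookup)` would execute and log the call for some `lookup` function, while the same gateway raises `PolicyViolation` for `"search"`. The point of the design is that denied and halted calls are logged as well as executed ones.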

Five anti-patterns that come back with interest

1. The security policy exists in a document, not in the runtime. The rules are written down in an information-governance PDF, but at runtime nothing enforces them; the agent follows the policy only when someone remembered to code it in.
2. The agent has permissions like a human, but no one approves its decisions. The agent's service account inherits the permissions of an "experienced operator," except that the human operator works under the team's supervision, while the agent runs at night with no one on the other side.
3. We log the prompt but not the context and the tools. We have a record of the question and the answer, but we don't know which knowledge-base chunks were placed in the prompt, which tools the agent called and with what parameters, which means we can't reconstruct the decision.
4. Human-in-the-loop = clicking "OK" without context. The approver sees a notification but doesn't see the data the agent based its recommendation on, and refusing becomes harder in practice than accepting.
5. No owner for the AI systems register. No one in the organization maintains an up-to-date list of running agents, their risk class and business owner; on audit day the register gets created retroactively, and everyone knows it.

The next 90 days — what the CIO / CTO can do

Regardless of the maturity of the current AI portfolio, it's worth authorizing three moves over the next quarter that, in our view, require a management decision rather than a new investment budget. First: an AI systems register with a business owner, a risk class (in the spirit of the EU AI Act), classification of input data and a reference to the audit sink; if there's no register today, have a minimum version created within thirty days and maintained like a risk register. Second: a review of human-in-the-loop points in the running agent flows, covering where a human should approve, where they only watch, where the loop is theater, and what modifications to introduce in the next sprint. Third: an audit of the decision logs of one chosen system. Can we reconstruct today a single decision from thirty days ago with full context? If not, that's the first gap to close. This isn't a full AI governance methodology; it's the minimum version you can present to the board at the next architecture review.
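A minimum register does not need tooling to get started. As a hedged illustration in Python, one entry and a gap check might look like the sketch below; the field names follow the list above, while `RegisterEntry`, `gaps` and the example system name are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RegisterEntry:
    system: str
    business_owner: Optional[str] = None
    risk_class: Optional[str] = None           # a tier in the spirit of the EU AI Act
    input_data_classes: Optional[list] = None  # sensitivity levels of input data
    audit_sink: Optional[str] = None           # reference to where decisions are logged
    retention_policy: Optional[str] = None


def gaps(entry: RegisterEntry) -> list:
    """Names of register fields that are still empty for this system."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


entry = RegisterEntry(
    system="claims-triage-agent",   # hypothetical system name
    business_owner="Head of Claims",
    risk_class="high",
)
missing = gaps(entry)  # the fields someone still has to own before the review
```

Run across every agent in production, a check like this gives a one-page view of which systems have no owner, no risk class or no audit sink.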

Bring the architecture

If you're designing governance over agents in the enterprise, or you're just trying to close the pilot phase without an incident reaching management in the first months of production, bring the architecture. We work from something concrete: the three pillars above and an AI systems register are a good starting point for a conversation. Describe your case: mailto:[email protected]?subject=Rozmowa%20z%20Aurora%20AI.