The EU AI Act entered into force in August 2024 and applies in stages. By the time most European teams had read the guidance documents, the prohibited-practices provisions were already applicable; the high-risk system requirements follow through 2026 and 2027. The compliance conversation across most organisations has followed a predictable pattern: legal flags it, operations panics, IT produces a spreadsheet, someone drafts a policy. A document is created. The system is unmodified.
For conventional software, this is often enough. For agent systems, it is almost never enough, and the gap between “we have a compliance document” and “we are actually compliant” will surface at exactly the wrong moment.
The obligations that matter for agent systems under the Act are not documentation obligations. They are behaviour obligations. The Act requires that high-risk systems maintain meaningful human oversight, generate audit trails sufficient to reconstruct any decision, allow humans to intervene and override in real time, and operate in a way that is transparent to the people affected. These requirements are either true of your architecture or they are not. A document cannot make them true.
What “high-risk” actually means for agent systems
The Act stratifies AI systems into risk tiers. Unacceptable-risk systems are prohibited outright: social scoring, subliminal manipulation, real-time biometric surveillance in most contexts. High-risk systems face the substantive obligations: technical documentation, risk management, human oversight, data governance, accuracy and robustness requirements.
The high-risk list is specific. It covers AI used in employment (recruitment, performance assessment, task allocation), access to essential services (credit, insurance, health, education), critical infrastructure management, law enforcement, migration processing, and administration of justice. It also covers AI components in safety systems, biometric identification, and certain autonomous vehicle contexts.
Teams often assume their systems are not high-risk because they do not appear on that list at first glance. The assumption deserves scrutiny. An agent system that triages customer support escalations at a financial services firm is probably touching credit and essential services. An agent that assists HR teams in reviewing applications is in the employment bucket. An agent that routes insurance claims is in essential services. The classification is about what the system does and who it affects, not what the vendor calls it.
Teams building general-purpose agent orchestration layers that clients then deploy in regulated contexts face a further nuance: the downstream operator often carries the compliance obligation even when the upstream builder does not. If you are building an orchestration layer that clients will deploy in regulated environments, your clients’ obligations are your architectural problem.
The three requirements that are architecture problems
Of the Act's full set of obligations, three are almost entirely architectural for agent systems. These are the ones that cannot be satisfied after the fact.
Human oversight. The Act requires that high-risk systems allow humans to “fully understand” the system’s behaviour, “override or reverse” its outputs, and “monitor” it during operation. These are not UI requirements; they are orchestration requirements. A system that can be monitored, overridden, and reversed is one that was designed to expose its state, accept interrupt signals, and either roll back actions or hold them for human confirmation before they execute. A system that fires actions autonomously, with no operator surface for monitoring and intervention, is non-compliant, regardless of what the policy document says.
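What this looks like in an orchestration layer can be made concrete. The sketch below is illustrative, not a prescribed API: the RiskTier levels, the OversightGate class, and the approval callback are hypothetical names. The structural point is that the agent never fires an action directly; it submits the action to a gate that can block it, surface it to an operator, or reverse it afterward.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class RiskTier(Enum):
    LOW = 1       # execute autonomously, log only
    ELEVATED = 2  # execute, but flag for post-hoc review
    HIGH = 3      # block until a human approves


@dataclass
class Action:
    name: str
    tier: RiskTier
    execute: Callable[[], None]
    rollback: Callable[[], None]  # every gated action must be reversible


class OversightGate:
    """Checkpoint between an agent's decision and its execution.

    The agent submits actions here instead of firing them itself,
    so monitoring, confirmation, and reversal are properties of
    the execution path rather than features bolted on afterward.
    """

    def __init__(self, request_approval: Callable[[Action], bool]):
        self._request_approval = request_approval  # the operator surface
        self._executed: list[Action] = []

    def submit(self, action: Action) -> bool:
        if action.tier is RiskTier.HIGH:
            if not self._request_approval(action):
                return False  # human declined: the action never fires
        action.execute()
        self._executed.append(action)
        return True

    def reverse_last(self) -> None:
        """Operator-triggered override: undo the most recent action."""
        if self._executed:
            self._executed.pop().rollback()
```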
This is not a novel requirement from our perspective. It is exactly what we build: observable orchestration layers with human-in-the-loop tiers for any action above a defined risk threshold. What the Act does is make these patterns legally mandatory for a defined scope of systems, rather than just operationally prudent for all of them.
Audit trails. The Act requires that high-risk systems automatically generate logs sufficient to allow the system’s operation to be assessed throughout its lifecycle. Automatic here means the system generates the logs itself, in real time, not that a human reconstructs what happened afterward from screenshots and memory. The log must cover inputs, outputs, decisions, and the conditions that produced them. For an agent system, this means every routing decision, every action taken, every tool call made, every handoff between agents, every escalation to a human, and every policy decision that shaped the above.
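A hedged sketch of what “architected to generate the trail” can mean in practice, with hypothetical names throughout (AuditEvent, AuditLog, the event_type values): every decision point emits one structured, append-only record at the moment it happens, so the trail exists because the system ran, not because someone reconstructed it.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEvent:
    """One immutable record per decision point, written as it happens."""
    event_id: str
    timestamp: float
    run_id: str      # ties the event to a single agent run
    agent: str       # which agent acted
    event_type: str  # e.g. "routing", "tool_call", "handoff", "escalation"
    inputs: dict     # what the agent saw
    output: dict     # what it decided or produced
    policy: str      # the rule or threshold that shaped the decision


class AuditLog:
    """Append-only JSONL sink: the system writes, nothing edits."""

    def __init__(self, path: str):
        self._path = path

    def record(self, run_id: str, agent: str, event_type: str,
               inputs: dict, output: dict, policy: str) -> None:
        event = AuditEvent(
            event_id=str(uuid.uuid4()),
            timestamp=time.time(),
            run_id=run_id,
            agent=agent,
            event_type=event_type,
            inputs=inputs,
            output=output,
            policy=policy,
        )
        with open(self._path, "a") as f:
            f.write(json.dumps(asdict(event)) + "\n")
```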
A system that was not architected to generate this trail cannot produce it on demand. A system that was can produce it for any incident, any audit, any regulatory inspection, without anyone working backward from memory.
Transparency to those affected. The Act requires that people interacting with a high-risk system be meaningfully informed they are interacting with one. For agent systems handling customer-facing work, this means the system must be identifiable as AI, must be able to explain the nature of its operation in accessible terms, and must not obscure the fact that decisions affecting people are being produced by an automated system. Operationally: every agent has a disclosed identity and function, every customer-facing action carries a disclosure, and no agent masquerades as a human.
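Operationally, the pattern is small. A minimal sketch, assuming a hypothetical AgentIdentity type: the disclosure is attached at the orchestration layer rather than left to individual agents, so no agent can omit it or pass itself off as a person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    name: str      # the disclosed agent name, never a human alias
    function: str  # plain-language description of what it does


def with_disclosure(identity: AgentIdentity, message: str) -> str:
    """Attach the AI disclosure to every customer-facing message.

    Applied at the orchestration layer, so no individual agent
    can drop it or masquerade as a human.
    """
    return (
        f"{message}\n\n--\n"
        f"You are interacting with {identity.name}, an automated AI "
        f"assistant that {identity.function}."
    )
```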
Why retrofitting does not work
The compliance failure mode we see in practice is not organisations that know they are non-compliant. It is organisations that believe their documentation closes the gap. The belief is understandable. The Act is legislation, and legislation is usually addressed with documents. The problem is that the specific requirements above are not satisfied by agreeing to comply with them; they are satisfied by having built the mechanisms that make compliance possible.
A system built without observable state cannot be retrofitted with monitoring without changes to the core architecture. A system that fires actions without checkpoint gates cannot be retrofitted with human-in-the-loop without changes to the execution model. A system with no log schema cannot be retrofitted with audit trails without rebuilding the state layer. In each case, the document that says “we will do this” is contradicted by the system that does not.
We have been in enough post-hoc compliance exercises to have a view: they are expensive, they are slow, they produce systems that meet the letter of the requirement while routinely violating its spirit, and they consume engineering time that could have been spent on the architectural decisions that would have made the whole exercise unnecessary.
The economic case for building compliant architecture from the start is straightforward. The cost of adding human-in-the-loop checkpoints to an orchestration layer designed for them is near zero: it is a configuration decision. The cost of retrofitting them into a system that was not is weeks or months of re-architecture. For teams building in regulated sectors, the calculus is simple. The compliance requirements are not going away, and the gap between “designed for compliance” and “retrofitted for compliance” is measured in engineering weeks and operational risk.
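To see why it is a configuration decision rather than an engineering project, consider a hypothetical deployment config: the compliance posture lives in data, and moving an action between tiers is a one-line change precisely because the gate was designed in from the start.

```python
# Hypothetical deployment config; all names are illustrative.
ORCHESTRATION_CONFIG = {
    "audit_log_path": "/var/log/agents/audit.jsonl",
    "disclosure_required": True,
    "action_tiers": {
        "draft_reply": "LOW",             # autonomous, logged
        "update_crm_record": "ELEVATED",  # autonomous, flagged for review
        "issue_refund": "HIGH",           # blocked pending human approval
        "close_account": "HIGH",
    },
}
```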
What this means for how you start
There are two practical implications for teams building agent systems in regulated environments.
The first is to classify your system honestly before you build it, not after. Run the high-risk checklist against your intended use case, your target users, and the decisions the system will make or influence. If any part of the system is likely to be high-risk, treat the full set of requirements as design constraints from day one. The Diagnose tier is the right entry point for this: it surfaces the regulatory exposure alongside the operational opportunity, so the recommended first build already incorporates the constraints rather than discovering them at launch.
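One way to force the question early is to make the screening step executable. The sketch below is deliberately naive and is not legal advice: the domain categories are paraphrased from the Act's high-risk list and the keyword matching is crude. Its only job is to make the classification conversation happen before the build starts.

```python
# Paraphrased high-risk domains; a prompt for legal review, not a verdict.
HIGH_RISK_DOMAINS = {
    "employment": ["recruitment", "performance assessment", "task allocation"],
    "essential_services": ["credit", "insurance", "health", "education"],
    "critical_infrastructure": ["energy", "water", "transport"],
    "law_enforcement": ["policing", "evidence"],
    "migration": ["visa", "asylum", "border"],
    "justice": ["judicial", "sentencing"],
}


def screen_use_case(description: str) -> list[str]:
    """Return the high-risk domains a use-case description touches."""
    text = description.lower()
    return [
        domain
        for domain, terms in HIGH_RISK_DOMAINS.items()
        if any(term in text for term in terms)
    ]


# screen_use_case("agent triages credit-card support escalations")
# -> ["essential_services"]
```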
The second is to treat the three architectural requirements above as defaults, not special cases. Every agent system we build has observable state, human-in-the-loop tiers for actions above a defined risk level, and a full audit trail as standard. For systems in clearly non-regulated contexts, these features add modest overhead. For systems in or near regulated contexts, they are what allows the system to operate at all. The cost of providing them by default is low; the cost of discovering you need them after launch is not.
The EU AI Act will not be the last regulatory touchpoint for agent systems operating in Europe. It will be the first. The organisations that treat it as an architecture problem, and build accordingly, will have a foundation for everything that follows. The organisations that treat it as a documentation problem will be rebuilding that foundation under pressure, probably at the worst possible moment.