7 EU AI Act Architecture Traps in Internal AI Workflows · Field notes

Someone at your company got the AI mandate. Maybe it was you. The CEO said "we need to be doing AI," looked at the person closest to the stack, and said "figure it out." No budget line for compliance. No playbook. Just a deadline that's already making them anxious.

A few weeks later, Claude is triaging recruiting applications before they hit the ATS. Or scoring inbound leads before they land in HubSpot. Or summarizing support tickets before a human reads them.

Nobody called this an "AI product." It's an internal workflow, wired together to save someone's Tuesday, which is exactly why almost nobody checks it against the EU AI Act.

Here's the part that surprises most of the mandate carriers we talk to: the Act doesn't care whether you're shipping a copilot feature to paying customers or duct-taping a Claude skill into your own hiring pipeline. If the workflow touches employment, creditworthiness, or another Annex III category, the same obligations apply either way. High-risk system requirements become enforceable on August 2, 2026, whether or not the workflow has a price tag attached to it.

Key Takeaways

EU AI Act high-risk rules apply to internal AI workflows (HR screening, credit decisions, customer eligibility calls) exactly as much as to a product you sell. Annex III is defined by use case, not by whether you have paying customers.
Penalties come in tiers: up to €35M or 7% of global turnover for prohibited practices, up to €15M or 3% of turnover for high-risk system non-compliance (Article 99, EU AI Act).
As of March 2026, no notified bodies were formally designated in the EU's NANDO database for AI Act conformity assessments, so the audit queue you're counting on may not exist yet.
The fix isn't a bigger GRC tool. It's four habits: Delegation, Description, Discernment, and Diligence, the same habits that make any AI workflow trustworthy, not just a compliant one.

We didn't invent those four habits. They come from Anthropic's AI Fluency framework, and they're the spine of every implementation we build. Below are the seven traps we keep finding in internal AI workflows at growing companies, organized around the D each one is really about.

Delegation: Deciding What Needs Governance Before You Build It

Delegation means thoughtfully deciding what work to hand to AI and what stays with a human, including which workflows carry enough risk to need a governance layer before anyone touches them.

The trap: You built a "summarize and route" skill for your ops team. Legal signed off because it "just summarizes text." Then someone on the recruiting side started feeding it resumes to pre-screen candidates before a human looked at them. In a single afternoon, your low-risk internal tool became a high-risk system under Annex III, and nobody updated the risk assessment because nobody knew the use case had changed.

If every workflow gets treated the same regardless of what it actually touches, you end up choosing between applying heavy governance to everything (slow, expensive, kills adoption) or missing the one workflow that actually needed it (the one a regulator or a rejected candidate asks about).

The fix: Tag risk at the workflow level, not the tool level. Every Claude skill or connector we build carries a risk_profile flag set at creation and re-checked whenever its scope changes. A workflow flagged high_risk (recruiting, credit, access-to-services) routes through stricter logging and a mandatory human-review gate before output reaches anyone. The question to ask your own stack this week: can you name every internal workflow that touches an Annex III category, in one sitting, without opening five tools first?
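
A minimal sketch of workflow-level tagging in Python. Every name here (Workflow, RiskProfile, HIGH_RISK_USE_CASES, deliver) is hypothetical, and the use-case tags are illustrative stand-ins, not the Annex III list itself. One assumption worth flagging: the sketch derives the flag from the workflow's declared use cases instead of storing it, so any scope change re-evaluates it automatically.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class RiskProfile(Enum):
    LOW = "low"
    HIGH_RISK = "high_risk"


# Hypothetical tags standing in for Annex III categories; not an official list.
HIGH_RISK_USE_CASES = {"recruiting", "credit", "access_to_services"}


@dataclass
class Workflow:
    name: str
    use_cases: Set[str] = field(default_factory=set)

    @property
    def risk_profile(self) -> RiskProfile:
        # Computed from what the workflow touches, so it cannot go stale
        # when someone adds a new use case.
        if self.use_cases & HIGH_RISK_USE_CASES:
            return RiskProfile.HIGH_RISK
        return RiskProfile.LOW

    def add_use_case(self, use_case: str) -> None:
        self.use_cases.add(use_case)


def deliver(workflow: Workflow, output: str, reviewed_by: Optional[str] = None) -> str:
    """Release output only if the review gate for this risk profile is satisfied."""
    if workflow.risk_profile is RiskProfile.HIGH_RISK and reviewed_by is None:
        raise PermissionError(f"{workflow.name}: human review required before release")
    return output


# The "summarize and route" skill starts out low-risk...
skill = Workflow("summarize_and_route", {"ops_summaries"})
deliver(skill, "Ticket summary")  # passes without review

# ...until recruiting starts using it. The same call now needs a reviewer.
skill.add_use_case("recruiting")
deliver(skill, "Candidate shortlist", reviewed_by="recruiter@example.com")
```

The point of the design is that the gate lives in the delivery path, not in a policy document: once the use case changes, unreviewed output simply stops shipping.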

Description: Writing Down What the System Actually Does, and Keeping It Current

Description is communicating clearly and precisely about what a system does. Applied to compliance, that means your documentation has to describe the system as it exists today, not as it existed the week you wrote the PDF.

The trap: Someone spends a week producing a beautiful technical write-up: the model version, the retrieval pipeline, the guardrails. Two weeks later, the model provider deprecates that version, and your team ships a hotfix. The document is now describing a system that no longer exists. Under the AI Act, outdated documentation for a high-risk system isn't sloppy. It's non-compliant.

The fix: Treat documentation as code, not as a deliverable. Every deployment should generate a small structured artifact (model ID, system prompt hash, temperature, retrieval source version) versioned alongside the code that produced it. If your compliance documentation doesn't update itself every time you ship, it's already lying to you.
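
As a rough illustration, the deployment artifact can be a few lines of Python run from the deploy script. The function names, field names, and the file name deployment_manifest.json are assumptions for this sketch; the four recorded values are the ones listed above.

```python
import hashlib
import json
from datetime import datetime, timezone


def build_manifest(model_id: str, system_prompt: str, temperature: float,
                   retrieval_source_version: str) -> dict:
    """Describe the deployed configuration as structured data."""
    return {
        "model_id": model_id,
        # Hash the prompt so the record proves which text was live
        # without copying the prompt into every artifact.
        "system_prompt_sha256": hashlib.sha256(system_prompt.encode("utf-8")).hexdigest(),
        "temperature": temperature,
        "retrieval_source_version": retrieval_source_version,
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }


def write_manifest(manifest: dict, path: str = "deployment_manifest.json") -> None:
    # sort_keys keeps the file stable so changes show up as clean diffs.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
        fh.write("\n")


# Placeholder values for illustration only.
manifest = build_manifest(
    model_id="example-model-v1",
    system_prompt="You summarize support tickets for the ops team.",
    temperature=0.2,
    retrieval_source_version="kb-2026-03",
)
```

Calling write_manifest(manifest) in the deploy step and committing the result alongside the code means the hotfix that swaps the model version also changes the record, in the same commit.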

Discernment: Judging the Output, Not Just Trusting the Vendor's Safety Filter

Discernment is evaluating AI outputs critically: knowing when to trust them, when to question them, and when to override them. Two traps live here, and they're really the same failure at different layers.

The trap (bias): You rely on your model provider's built-in safety filters and assume that makes your application safe by extension. It doesn't. The Act regulates you as the system provider, not just the underlying model. If you wrap a foundation model into "rank these applicants" or "flag this transaction," you own the bias your prompt structure and retrieval logic introduce. The provider's filter was never built to catch that your specific prompt quietly downranks non-native English speakers.

The trap (tooling): You have a strong DevOps culture: unit tests, integration tests, alerting. And you tell an auditor your Quality Management System is your CI/CD pipeline. Article 17 requires a QMS that explicitly addresses risk management and incident reporting. A failing unit test is a code problem. A model that hallucinates advice on a customer's eligibility is a risk problem. Standard DevOps tooling doesn't know the difference, which means you have no audit trail of risk decisions at all.

The fix: Build a small "golden dataset" of edge cases designed to surface bias, and run it against any prompt change before it reaches production. Layer a lightweight risk registry on top of your existing issue tracker: tag incidents by risk impact, and link every one to the specific model version and prompt version involved. You don't need a new tool. You need a habit of asking "is this a bug, or is this a risk decision?" every time something goes wrong.
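
One possible shape for the golden-dataset check, sketched in Python. It assumes the dataset is built from paired inputs that carry the same substance and differ only in a trait that should not affect the outcome, and that your workflow can be wrapped as a function returning a numeric score. GoldenPair, run_golden_dataset, and the 0.05 tolerance are all hypothetical choices, not a standard.

```python
from typing import Callable, List, NamedTuple


class GoldenPair(NamedTuple):
    label: str
    baseline: str  # reference input
    variant: str   # same substance, differs only in a trait that should not matter


def run_golden_dataset(score: Callable[[str], float], pairs: List[GoldenPair],
                       tolerance: float = 0.05) -> List[str]:
    """Return the labels of pairs whose scores diverge by more than the tolerance."""
    failures = []
    for pair in pairs:
        gap = abs(score(pair.baseline) - score(pair.variant))
        if gap > tolerance:
            failures.append(pair.label)
    return failures


# Illustrative pair: identical experience, different English fluency.
pairs = [
    GoldenPair(
        label="non_native_phrasing",
        baseline="Led a team of five engineers to ship a billing system.",
        variant="I was leading team of five engineers for shipping billing system.",
    ),
]


def stub_score(text: str) -> float:
    # Stand-in for the real prompt-plus-model call; replace with your own.
    return 0.8


failures = run_golden_dataset(stub_score, pairs)
```

Run it in CI on every prompt or retrieval change and fail the build when failures is non-empty. A non-empty list is a risk finding to log in the registry, not just a red test.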

Diligence: Building the Audit Trail Before Someone Asks For It

Diligence is understanding what happens to data, what leaves the building, and what the governance layer actually looks like, not as an afterthought but as a first-class part of the build. Three traps live here, and together they're the biggest gap we see.

The trap (vendor chain): Your architecture depends on a chain of providers: cloud hosting, a vector database, a model API. Article 25 covers the AI value chain: if one of those providers has an outage or changes terms and your system fails as a result, you're the one facing the regulator, not them. Pointing upstream doesn't work if you have no independent record of what you sent and what came back.

The trap (timeline): You've pencilled in a "compliance sprint" for early 2026. As of March 2026, no notified bodies (the third-party auditors that high-risk systems are meant to use) were formally designated in the EU's NANDO database for AI Act assessments, according to reporting on the notified-body gap. If the queue doesn't exist yet, showing up audit-ready with months of clean logs is what gets you to the front of it once it does.

The trap (scattered logs): A regulator, or a candidate who was rejected, asks you to reconstruct exactly why the system decided what it decided on a specific date. You start checking five different dashboards (hosting logs, the database, the vector store, the model provider's console, and so on) just to stitch together one decision. That's not just slow. Article 12 requires automatic, unified record-keeping. Manual archaeology across five systems is itself the compliance failure.

The fix: Log independently of your providers: keep your own record of every request and response rather than relying on logs you don't control. Centralize the full "decision DNA" of every AI transaction (input, retrieved context, system prompt, model parameters, output, latency, cost), bound together by one trace ID.
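
A minimal sketch of such a record in Python, written as one JSON line per transaction. DecisionRecord and log_decision are hypothetical names, the field list mirrors the one above, and the timestamp field and the JSON Lines file are assumptions added for the example, not a required format.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import IO, List


@dataclass
class DecisionRecord:
    input: str
    retrieved_context: List[str]
    system_prompt: str
    model_parameters: dict
    output: str
    latency_ms: float
    cost_usd: float
    # One ID binds every part of the transaction together.
    trace_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def log_decision(record: DecisionRecord, sink: IO[str]) -> str:
    """Append one JSON line per AI transaction and return its trace ID."""
    sink.write(json.dumps(asdict(record), sort_keys=True) + "\n")
    return record.trace_id


# Placeholder values for illustration only.
record = DecisionRecord(
    input="Resume text for applicant 1042",
    retrieved_context=["Job description v3"],
    system_prompt="Summarize the applicant's relevant experience.",
    model_parameters={"model_id": "example-model-v1", "temperature": 0.2},
    output="Five years of backend experience; no management history.",
    latency_ms=812.0,
    cost_usd=0.0041,
)
```

In use, open an append-only file you control (for example, with open("decisions.jsonl", "a") as sink) and call log_decision(record, sink) on every request. Answering "why did the system decide this on that date?" then becomes a lookup by trace ID in one place.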

We've written before about what happens when your architecture leans on a single provider for this, and about treating compliance like technical debt you pay down in sprints instead of all at once. The same logging gap shows up whenever an AI system fails silently in front of a user, and nobody can reconstruct why.

This is the part I've actually built before, not just read about. Before PromptMetrics existed in its current form, we built an LLM observability tool that logged exactly this kind of decision trail, and eventually open-sourced it.

The lesson that stuck: centralized logging isn't a compliance checkbox. It's the difference between debugging a hallucination in minutes versus losing an afternoon to archaeology across five dashboards. We build that same audit-trail layer into every implementation we ship now, not as a separate product, but as a first-class part of the build itself.

Governance as Care, Not Just Compliance

Here's the reframe that actually matters, past the fines and the audit dates: every approval gate, every human-review step, every piece of this that feels like paperwork is the same requirement as "keep a human in charge of the decision." Article 12 traceability and "the human is still in charge here" are the same idea, described two different ways.

We build this not because a regulator is watching, but because we believe the point of AI is to make work more human, not less. If you fix the logging (the Diligence habit above), you don't just satisfy an auditor. You gain the visibility to catch a bad output before it reaches a candidate, a customer, or a regulator asking why.

FAQ

Does the EU AI Act apply to AI tools we only use internally, never sell to customers? Yes. Annex III high-risk categories (employment, creditworthiness, access to essential services, among others) are defined by what the system does, not by whether it's sold externally. An internal recruiting-screening workflow carries the same obligations as a commercial one.

What's the actual fine for an EU AI Act violation? It depends on the violation type. Prohibited AI practices under Article 5 carry fines up to €35M or 7% of global turnover, whichever is higher. High-risk system non-compliance, the category most internal workflows fall into, carries fines up to €15M or 3% of turnover (Article 99). SMEs face the lower of the two figures, not the higher.
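
The "whichever is higher" rule, and the SME exception, are easier to see as arithmetic. A small Python sketch, using the figures quoted above; the function name, parameters, and example turnovers are made up for illustration.

```python
def fine_cap_eur(fixed_cap_eur: int, turnover_percent: int,
                 global_turnover_eur: int, is_sme: bool = False) -> float:
    """Upper bound of a fine: the higher of the two figures, or the lower for SMEs."""
    percent_amount = global_turnover_eur * turnover_percent / 100
    pick = min if is_sme else max
    return pick(fixed_cap_eur, percent_amount)


# High-risk non-compliance tier (up to EUR 15M or 3%), EUR 100M global turnover:
large_company_cap = fine_cap_eur(15_000_000, 3, 100_000_000)          # 15,000,000
sme_cap = fine_cap_eur(15_000_000, 3, 100_000_000, is_sme=True)       # 3,000,000

# Prohibited-practice tier (up to EUR 35M or 7%), EUR 1B global turnover:
prohibited_cap = fine_cap_eur(35_000_000, 7, 1_000_000_000)           # 70,000,000
```

These are ceilings, not predictions: the actual fine in any case is set by the authority within that cap.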

What does Article 12 record-keeping actually require me to log? Automatic, unified traceability of AI-driven decisions: enough that you can reconstruct why a specific output happened on a specific date without manually cross-referencing multiple systems. In practice: input, retrieved context, system prompt, model parameters, output, and a trace ID binding them together.

Are there enough notified bodies to get us audited before August 2026? Not yet, as of the most recent reporting. No bodies were formally designated in the EU's NANDO database for AI Act conformity assessments as of March 2026. Building a clean audit trail now is what gets you to the front of that queue once capacity exists.

We haven't started. What's the first thing to do this quarter? Audit your logging first. It's the trap with the widest blast radius and the one every other fix depends on. Then map which internal workflows touch an Annex III category. Then move your technical documentation into your deployment pipeline instead of a PDF.

Where we are, honestly: we're building this exact audit-trail and human-review layer into every implementation we ship right now, starting with our first customers. If you want a second set of eyes on whether your current internal AI workflows would survive an Article 12 request, that conversation is free.

Book a free 30-minute gap-check call →
