What Your Big Four Audit Will Miss About the AI Act
Big Four firms are good at many things. Today, AI Act specifics are not yet one of them — and the gap shows up in surprisingly consistent ways. Here's what we see when auditing their AI Act work for our clients.
By Mehmet Köse · Founder, Synapsrix
We respect the global audit firms’ work on cybersecurity, financial controls, and privacy. We also review their AI Act deliverables when clients ask for a second opinion — and the same gaps recur. This article names those gaps so you can use it as a QA checklist, whether you hired a Big Four team or not. Nothing here names a firm; examples are anonymised and composite.
If you want a regulation-native pass on an existing pack, email Mehmet with your deliverable — we will mark it up against the Articles once, free, for qualified enterprise prospects. That is not altruism; it is how we calibrate whether we should work together long term.
Contact: use the website contact flow or your existing Mehmet thread — do not send restricted law-enforcement documentation without a signed NDA; we will refuse classified material.
Why we're writing this
Big Four AI Act audit engagements often inherit GDPR programme staffing and ISO templates. The AI Act is not ISO, and it is not GDPR. When methodology does not update, you get consistent blind spots: classification, value-chain mapping, post-market monitoring, and incident pathways.
Keyword note: if you searched “Big Four AI Act audit” or “AI Act audit cost”, the market reality is simple — initial AI Act readiness projects from top-tier firms are often quoted in six figures for mid-market multinationals; scoping matters more than hourly rates.
Quotes for mid-sized SMEs often land in the €40k–€150k range for first-wave assessments when multiple entities and languages are involved — your quote may differ; negotiate scope, not page count.
Gap 1: GDPR-frame bleed
Symptom: Annex III classification treated like DPIA triggering — “we process personal data, therefore high risk.”
Why it fails: High-risk classification under Article 6(2) and Annex III turns on the system's intended purpose and the Annex III use-case areas, not on GDPR processing alone. A personal-data-heavy system can still be non–high-risk if Article 6(3) applies and profiling is absent; conversely, a non-personal critical-infrastructure system can be high-risk without a DPIA.
What good looks like: a two-axis worksheet — GDPR processing (yes/no, lawful basis) and AI Act class (prohibited / high / limited / minimal) — with an explicit “no automatic link” rule between axes.
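A minimal sketch of that worksheet as a data structure, assuming hypothetical field and class names: the property it encodes is that the AI Act class is derived from intended purpose, and the GDPR fields never feed the classification.

```python
from dataclasses import dataclass
from enum import Enum

class AIActClass(Enum):
    PROHIBITED = "prohibited"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"

@dataclass
class SystemRecord:
    name: str
    # GDPR axis: tracked, but never used to derive the AI Act class
    processes_personal_data: bool
    lawful_basis: str | None
    # AI Act axis: intended purpose mapped to an Annex III use-case area
    annex_iii_area: str | None
    performs_profiling: bool

def ai_act_class(record: SystemRecord) -> AIActClass:
    """No automatic link: the GDPR fields do not appear in this function."""
    if record.annex_iii_area is None:
        return AIActClass.MINIMAL  # still run the Article 50 transparency test separately
    if record.performs_profiling:
        return AIActClass.HIGH     # the Article 6(3) derogation is unavailable
    # Assess Article 6(3)(a)-(d) here; default to HIGH until the 6(4) memo exists
    return AIActClass.HIGH
```

The useful check is negative: a reviewer can confirm at a glance that the personal-data fields never appear in the classification function.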
Training symptom: workshops that only teach GDPR vocabulary will not produce Article 9 risk-management artefacts. Ask whether your Big Four curriculum included Annex III case studies without personal data — if not, add internal training.
Anonymised example: a retailer’s fraud-only scoring engine was flagged as Annex III credit risk because “finance owns it.” Legal review showed the fraud-detection pathway falls outside the Annex III creditworthiness point — the system still needed risk analysis, but the classification error would have forced unnecessary Chapter III scope.
Gap 2: Article 6(3) under-claimed
Symptom: Every Annex III tool classified as high-risk, including narrow workflow assistants.
Why it fails: Article 6(3) exists precisely to prevent over-classification where significant risk is absent and a (a)–(d) case applies — unless profiling exists (Article 6(3) second subparagraph).
Cost of over-classification: duplicate documentation, board fatigue, and vendor disputes. If you claim non-high-risk, you must still document the assessment under Article 6(4) — documentation does not disappear; it shifts.
Conservative lawyers sometimes prefer over-classification to avoid misclassification liability. That is understandable but expensive. The middle path is a written Article 6(4) memo with explicit profiling analysis — if you cannot defend the memo, upgrade to high-risk controls.
Anonymised example — internal ticketing assistant
A service desk bot suggested knowledge-base articles after humans triaged tickets — the Article 6(3)(b) pathway (improving “the result of a previously completed human activity”) was plausible. The Big Four report ignored it and demanded full Annex III workflows. After reclassification, the client adopted Article 50 transparency plus light logging — not zero work, but proportionate work.
Gap 3: Annex IV treated as a template exercise
Symptom: Forty-page PDF where Annex IV sections repeat boilerplate and the post-market monitoring plan is one paragraph.
Why it fails: Annex IV is structured for auditable lifecycle evidence — monitoring, control, and lifecycle changes appear as separate headings in the official Annex. Boilerplate that repeats risk language without operational monitoring detail fails the spirit of the Annex even when the headings exist, and Article 72 requires providers to run continuous post-market monitoring. If the monitoring content is thin, the technical documentation is pretty but not testable.
Anonymised example: a provider pack copied accuracy metrics from a brochure without linking them to update cadence — after a monthly model change, the metrics were stale with no ticket proving re-measurement.
Reviewer heuristic: search the PDF for the words “rollback,” “retrain,” “incident,” and “field action.” If all four counts are zero, the document may be literature, not an operational system description.
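A minimal sketch of that search, assuming a text-based PDF and the open-source pypdf library; the filename is a hypothetical placeholder.

```python
# Count operational vocabulary in a technical documentation pack.
# Requires: pip install pypdf
from pypdf import PdfReader

TERMS = ["rollback", "retrain", "incident", "field action"]

def term_counts(path: str) -> dict[str, int]:
    text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
    lowered = text.lower()
    return {term: lowered.count(term) for term in TERMS}

# "annex_iv_pack.pdf" is a hypothetical filename
for term, n in term_counts("annex_iv_pack.pdf").items():
    flag = "  <- zero hits: literature, not operations?" if n == 0 else ""
    print(f"{term}: {n}{flag}")
```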
Gap 4: No continuous monitoring plan
Symptom: One-shot “AI risk assessment” deck, no owner, no cadence.
Why it fails: Article 9 requires a risk management system that is continuous. Provider-side monitoring and deployer-side Article 26(5) monitoring must connect to real operations — incidents, tickets, rollback events.
What to ask in a review: show me last quarter’s monitoring notes tied to ticket IDs. If the answer is a slide, you do not yet have Article 9-grade evidence — you have a project artefact.
Model-update churn
Most LLM providers ship monthly changes. A static “model card” dated six months ago is a warning sign, not a comfort. Continuous monitoring means version pins, regression tests, and documented acceptance when prompt or weight updates roll out.
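As a sketch of what “documented acceptance” can look like in practice, assuming an append-only JSONL log; the model ID, threshold, and log filename are illustrative placeholders, not recommendations.

```python
import datetime
import json

PINNED_MODEL = "gpt-4o-2024-08-06"  # hypothetical pin: a dated version, never "latest"
ACCEPTANCE_FLOOR = 0.92             # hypothetical accuracy floor on a frozen eval set

def accept_model_update(candidate: str, eval_score: float, ticket_id: str) -> bool:
    """Gate a provider model update behind a recorded acceptance decision."""
    decision = {
        "pinned_model": PINNED_MODEL,
        "candidate_model": candidate,
        "eval_score": eval_score,
        "accepted": eval_score >= ACCEPTANCE_FLOOR,
        "ticket": ticket_id,  # ties the decision to real operations
        "reviewed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    with open("model_acceptance_log.jsonl", "a") as log:  # append-only audit trail
        log.write(json.dumps(decision) + "\n")
    return decision["accepted"]
```

The point is the trail, not the specific threshold: every update event leaves a dated, ticket-linked record a reviewer can sample.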
Gap 5: Post-market Article 73 path undefined
Symptom: Report lists Article 73 but does not name who receives the first call, what evidence triggers the serious incident label, or how deployers inform providers under Article 26(5).
Why it fails: Article 73 reporting is time-sensitive; the deadlines run in days from awareness, not quarters. If your RACI is blank, you will miss them, along with the cooperation expectations under Article 26(5) and Article 26(12).
Cross-link: Article 72 is post-market monitoring; Article 73 is serious incident reporting. Reports that merge them into a single “incidents” bullet without separate playbooks usually hide ownership gaps.
Anonymised example: a client’s runbook sent personal-data breach emails to the DPO but routed model-safety failures only to IT — Article 73 escalation never fired because nobody owned the AI Act label.
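A minimal sketch of the fix, with hypothetical labels and addresses: the AI Act incident labels get named owners in the same routing table the DPO path already lives in.

```python
# Routing table in which the AI Act labels have named owners.
# Labels and addresses are hypothetical placeholders.
INCIDENT_ROUTES: dict[str, list[str]] = {
    "personal_data_breach": ["dpo@example.com"],  # existing GDPR path
    "model_safety_failure": ["ai-governance@example.com", "it-oncall@example.com"],
    "serious_incident_art73": ["ai-governance@example.com", "legal@example.com"],
}

def route(label: str) -> list[str]:
    owners = INCIDENT_ROUTES.get(label, [])
    if not owners:
        # An unknown label is itself a finding: the RACI has a hole.
        raise LookupError(f"No owner for incident label '{label}'")
    return owners
```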
Gap 6: Subprocessor chain unmapped
Symptom: “Vendor: Microsoft” on a slide — no foundation-model chain, no region, no failover path.
Why it fails: Article 25 value-chain responsibilities matter when incidents propagate. A Copilot deployment may rely on Azure + OpenAI APIs; your contracts and monitoring must reflect both.
Customer-support AI stacks compound this: Intercom, Zendesk, or bespoke bots often call Anthropic, OpenAI, or Google models through opaque routing. Deployers still owe Article 26 duties — “we didn’t know the subprocessor” is not a defence; it is a procurement failure.
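One way to make the chain inspectable is a machine-readable map per deployment. The sketch below uses illustrative vendors and marks unknowns explicitly, since every “unknown” is a procurement action item rather than a gap to hide.

```python
# Value-chain map per AI deployment; entries are illustrative, not real contracts.
SUBPROCESSOR_CHAIN = {
    "copilot_rollout": {
        "saas_vendor": "Microsoft",
        "platform": "Azure OpenAI Service",
        "foundation_models": ["gpt-4o (pinned version in contract)"],
        "region": "EU",
        "failover": "documented secondary EU region",
    },
    "support_bot": {
        "saas_vendor": "Zendesk",
        "platform": "vendor-managed model routing",  # opaque: flag for contract review
        "foundation_models": ["unknown"],            # procurement action item
        "region": "unknown",
        "failover": "unknown",
    },
}

# Surface the gaps a slide that says "Vendor: Microsoft" would hide
for deployment, chain in SUBPROCESSOR_CHAIN.items():
    gaps = [k for k, v in chain.items() if v == "unknown" or v == ["unknown"]]
    if gaps:
        print(f"{deployment}: unresolved chain fields -> {gaps}")
```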
Cost realism without fake statistics
We are not publishing a survey. In 2025–2026 commercial practice, six-figure first-wave AI governance programmes at large enterprises are common; SME budgets cluster lower but can still spike when multilingual rollouts and sector regulators are involved. Treat quotes as commercial facts to verify, not benchmarks to cite in court.
Gap 7: “Policy attestation” without artefacts
Symptom: a board policy that says “we comply with the AI Act” with no system list, no logs, no instructions for use on file.
Why it fails: Regulators inspect operational controls, not intent. Article 26 expects named evidence — human oversight, monitoring, logs — not a mission statement.
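A minimal sketch of what “named evidence” can mean, with a hypothetical system, SOP, and ticket IDs: each control points at an artefact a reviewer can fetch, not at intent.

```python
# One Article 26 evidence register entry; all identifiers are illustrative.
EVIDENCE_REGISTER = [
    {
        "system": "cv-screening-assistant",
        "human_oversight": "HR reviewer approves every shortlist (SOP-112)",
        "instructions_for_use": "vendor manual v3 on file, reviewed 2026-01",
        "logs": "application logs, 6-month retention, owner: platform team",
        "monitoring": "quarterly review, last ticket GOV-4821",
    },
]
```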
Gap 8: Limited-risk ignored
Symptom: Article 50 chatbot transparency treated as optional, a “legal will handle it later” item.
Why it fails: Article 50 imposes UX and disclosure duties for in-scope interactions. Even if you are not high-risk, limited-risk transparency is mandatory where the provision applies — separate test, separate evidence pack.
What to do if you've already paid for a Big Four audit
Do not throw the work away. Use it as a baseline inventory and gap list. Then:
- Fix classification with an Annex III decision tree (see Annex III article).
- Map Article 26 evidence for each high-risk deployment (checklist).
- Attach monitoring tickets to risk reviews quarterly.
- Re-baseline when providers ship material model updates.
Internal politics
Second-line risk functions sometimes resist revising paid work. Frame the follow-up as delta remediation, not audit failure. The Big Four team is rarely wrong on intent; the regulation is simply new and dense.
Pragmatic next step
Run the scanner internally, then decide whether you need tooling or legal hours. If you already have a Big Four PDF and want a second regulatory read, send it to Mehmet — we will mark it up against the Articles, free, for enterprise leads who may become customers.
The offer is simple: one pass, regulation-grounded comments, no redesign of your entire programme — enough to see whether you are defensible or decorated.
How to brief the review
Include: inventory, classification memo, vendor list, and incident runbook. PDF is fine; editable comments come back as annotated files or tracked suggestions depending on format.
Even-handed closing
Audit firms will mature their AI Act methodologies — this post describes today’s recurring gaps, not permanent competence limits. Use it to raise the right questions in your next steering committee.
Tie-in: deployer checklist
If gaps 2 or 5 showed up in your review, start with the Article 26 checklist — it is where most SMEs actually fail audits, not in Annex IV font choice.
Regulatory honesty
We sell Synapsrix software. We still recommend independent legal counsel for borderline Annex III cases and for Article 27 FRIA scoping — tools accelerate evidence collection; they do not replace legal judgement.
Link back to the SME survival guide
If you need deadline and tier context before diving into audit gaps, read EU AI Act for Dutch SMEs: 2026 Survival Guide — it frames 2 August 2026 and deployer reality without vendor hype.
Final QA questions to ask your auditor
- Show me Article 6(3) memos for each Annex III system you marked non–high-risk.
- Show me ticket links for post-market reviews after the last model update.
- Show me serious-incident drills covering the Article 26(5) and Article 73 pathways, completed in the last year.
- Show me subprocessor mapping to foundation-model endpoints.
If answers are thin, you still have time to fix evidence before August 2026 — start with classification, not branding.
What we are not claiming
We are not claiming Big Four teams lack integrity or skill. We are claiming that early AI Act deliverables often inherit formats from adjacent programmes — that shows up as consistent structural gaps you can fix with checklists and operational owners, not prettier covers.
Reading order for your committee
- Survival guide — deadlines and deployer economics.
- Annex III decoded — classification rigour.
- Article 26 checklist — evidence you control.
- This article — QA the consulting output before you sign the year-two statement.
If you want the comparison between regimes while you read, open AI Act vs GDPR in another tab — it prevents privacy teams and AI teams from talking past each other in the same meeting.
One-line takeaway: Big Four deliverables are useful inputs — they are not a substitute for regulation-native classification and deployer evidence under Article 26.
Treat external audits like financial audits: materiality matters — fix high-impact gaps first, document accepted risks explicitly, and revisit when models change. That is enough for today; tomorrow needs continuous monitoring.