There is a reliable way to make a room full of cloud engineers go quiet: mention TOGAF. The framework has accumulated a reputation as the domain of enterprise architects who produce wall-sized diagrams nobody reads, in PowerPoints nobody updates, for governance processes that slow everything down and add nothing to the system that ships.
I hold a TOGAF 10 certification. I have used the ADM (Architecture Development Method) in anger on real projects. And I want to defend it — not because it's perfect, but because the alternative to structured architecture in an enterprise AI context is not "moving fast." It's building systems that can't be audited, can't be governed, can't be explained to a regulator, and can't be handed over to a team that didn't build them.
What actually fails in enterprise AI — and why
The pattern I have observed across failed enterprise AI projects is consistent enough that I'd call it a syndrome. It goes like this: a motivated data science team gets executive sponsorship for an AI initiative. They move fast. They build something impressive. The demo lands well. Six months later, the system is in production but nobody trusts it, the compliance team has raised concerns, the original team has moved on, and the business owners have reverted to the spreadsheet because at least the spreadsheet is auditable.
What went wrong? Almost always, the same things:
- No documented architecture — so nobody knows why decisions were made, and nobody can change the system without risking breaking it
- No traceability from business requirements to deployed components — so when a requirement changes, nobody knows what to update
- No governance model — so when the system makes a decision that's wrong or anomalous, there's no defined process for reviewing and correcting it
- No integration with the enterprise data estate — so the system runs on a data island that doesn't stay in sync with production systems
These are not ML problems. They're architecture problems. And they're the exact problems that TOGAF ADM is designed to prevent.
The teams that dismiss enterprise architecture frameworks as overhead are often the same teams whose AI projects fail at the governance gate. Not because the AI was bad — but because they couldn't explain it, couldn't audit it, and couldn't hand it over.
What TOGAF actually provides — translated for an AI context
TOGAF ADM is a cycle of architecture phases, each producing a specific set of artifacts. When applied to an enterprise AI initiative, those phases map cleanly onto the questions you need to answer before and during delivery. Let me translate them:
| ADM Phase | What it produces for an AI project | Why it matters |
|---|---|---|
| Phase A — Architecture Vision | Statement of Architecture Work, stakeholder map, AI readiness assessment, high-level solution concept | Forces alignment on what the AI system is actually supposed to do before anyone writes code |
| Phase B — Business Architecture | Business process maps, capability gap analysis, AI use case catalogue, regulatory constraint catalogue | Identifies where automation is appropriate and where human judgment must be preserved |
| Phase C — Data & Application Architecture | Data flow diagrams, feature store design, agent topology, API contracts, integration specifications | Defines the data fabric the ML models and agents depend on — the piece most teams skip |
| Phase D — Technology Architecture | GCP reference architecture, Terraform module structure, security architecture, MLOps pipeline design | Maps every component to a named GCP service so the architecture is buildable, not aspirational |
| Phases E & F — Opportunities & Solutions, Migration Planning | Phased delivery plan, Architecture Decision Records, ADK agent specifications, HITL checkpoint design | The governance artifacts that allow the system to be delivered by a team, not just built by a person |
Notice that none of this is bureaucratic ceremony. Every artifact is something a delivery team needs. The Architecture Decision Records tell the next engineer why the system is built the way it is. The data flow diagrams tell the compliance team what data moves where. The agent topology tells the business owner what the AI can do autonomously and what it will always ask a human about.
Architecture Decision Records — the single most valuable artifact
If I could preserve only one practice from the TOGAF ADM for enterprise AI projects, it would be Architecture Decision Records. An ADR is a short document that captures a significant architectural decision: what was decided, what alternatives were considered, and the reasoning behind the choice.
For an agentic AI system on GCP, the decisions that warrant ADRs include: why Firestore over Spanner for agent state (answer: agent state is document-shaped and requires sub-10ms reads, which Firestore delivers natively); why Cloud Run over GKE for stateless inference modules (answer: Cloud Run's lower cost and operational overhead at the use case's volume and latency profile); why SHAP over LIME for the XAI layer (answer: SHAP's additive feature attributions map to the explanation contract the Finance team can actually act on).
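As a concrete illustration, the Firestore decision above might be captured like this. This is a minimal sketch in the common Nygard ADR style; the ADR number is arbitrary, and the Memorystore alternative is an illustrative addition rather than something from the project itself:

```markdown
# ADR-007: Firestore for agent state persistence

## Status
Accepted

## Context
Agent state is document-shaped (conversation context, tool-call history,
checkpoint flags) and must be readable in under 10ms to keep agent turn
latency within budget.

## Decision
Use Firestore as the agent state store.

## Alternatives considered
- Cloud Spanner: relational guarantees we don't need, at a cost and
  operational profile we can't justify for document-shaped state.
- Memorystore (Redis): meets the latency target but adds a durability
  and backup burden that Firestore handles natively.

## Consequences
- State schema evolves without migrations, so the state contract must be
  versioned inside the document itself.
- Cross-document transactions are limited, so multi-agent state handoffs
  must be designed around single-document writes.
```

Half a page, ten minutes to write, and it answers the three questions every successor will ask: what, instead of what, and why.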
These decisions sound technical. But their consequences are organisational. When the system behaves unexpectedly, the ADR tells you whether the behaviour is a bug or a documented tradeoff. When a new regulation requires a change, the ADR tells you which components are affected. When the original architect leaves, the ADR tells the next person what they're working with and why.
What TOGAF ADM needs for agentic systems specifically
I want to be honest: TOGAF was designed for an era of monolithic enterprise systems, and applying it to multi-agent AI requires some translation. The ADM doesn't have native concepts for agent topology, autonomy boundaries, or HITL checkpoint design. Those are additions I've had to layer in.
The additions that I've found most valuable are:
An agent topology diagram in Phase C, showing each agent's domain, tool manifest, autonomy boundary, and communication protocol. This is not a UML class diagram — it's a contract specification that the delivery team can implement against.
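In practice I find this contract easiest to express as structured data rather than a drawing. The sketch below shows one hypothetical entry; every field name, agent name, and threshold is illustrative, not an ADK or TOGAF standard:

```yaml
# One entry in a hypothetical agent topology specification.
# All names and values are illustrative; adapt to your team's contract.
agents:
  - name: invoice-triage-agent
    domain: accounts-payable
    tool_manifest:
      - read_invoice_document     # read-only access to the invoice store
      - query_vendor_master       # lookup against the ERP vendor record
      - propose_gl_coding         # suggestion only, never a direct write
    autonomy_boundary:
      may_act_without_approval:
        - classify invoice type
        - match invoice to purchase order
      requires_hitl_checkpoint:
        - any payment above EUR 10,000
        - vendor not found in master data
    communication:
      protocol: pubsub            # async handoff to downstream agents
      topic: ap-triage-events
```

A document like this is reviewable by compliance, implementable by engineers, and diffable in version control — which a diagram is not.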
An autonomy boundary specification for each agent: a formal definition of what the agent can do without human approval, what confidence threshold triggers a HITL checkpoint, and what the presentation contract for that checkpoint looks like. This is what makes EU AI Act Article 14 compliance designable rather than aspirational.
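To show that such a boundary is genuinely designable, here is a minimal sketch of how it might be enforced at runtime. All class names, actions, and thresholds are illustrative assumptions, not ADK APIs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AutonomyBoundary:
    """Formal autonomy boundary for a single agent (illustrative sketch)."""
    autonomous_actions: frozenset    # actions the agent may take unaided
    confidence_threshold: float      # below this, escalate to a human

@dataclass(frozen=True)
class Decision:
    action: str
    confidence: float    # model confidence for this proposed action
    rationale: str       # feeds the HITL presentation contract

def requires_hitl(boundary: AutonomyBoundary, decision: Decision) -> bool:
    """Return True when a human checkpoint is required before acting."""
    if decision.action not in boundary.autonomous_actions:
        return True  # outside the autonomy boundary: always escalate
    return decision.confidence < boundary.confidence_threshold

# Example: an agent that may classify autonomously, but only confidently.
boundary = AutonomyBoundary(
    autonomous_actions=frozenset({"classify_invoice"}),
    confidence_threshold=0.85,
)
print(requires_hitl(boundary, Decision("classify_invoice", 0.92, "clear match")))  # False
print(requires_hitl(boundary, Decision("approve_payment", 0.99, "high value")))    # True
```

The point is not the twenty lines of code; it is that the boundary exists as a reviewable artifact before the agent does.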
An ML model card template as a standard Phase C artifact. The model card documents intended use, known limitations, bias analysis, and the XAI contract — specified before training begins, not after. This is the difference between a model that can be audited and one that has to be rebuilt to satisfy an audit.
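A model card artifact can start very small. This hypothetical skeleton loosely follows the model card idea popularised by Google; every field and value here is a placeholder to be filled in before training:

```yaml
# Hypothetical model card skeleton, completed before training begins.
model_card:
  model_name: <to be assigned>
  intended_use:
    in_scope: []          # business processes the model may serve
    out_of_scope: []      # uses the business has explicitly ruled out
  known_limitations: []   # data gaps, domain drift, edge cases
  bias_analysis:
    protected_attributes: []   # attributes evaluated for disparate impact
    evaluation_method: null    # agreed with compliance before training
  xai_contract:
    method: SHAP               # per the ADR on the XAI layer
    audience: Finance          # who must be able to act on the explanation
    granularity: per-decision feature attribution
```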
The combination of the TOGAF ADM, the Google Agent Development Kit (ADK), and TOGAF 10's updated guidance on digital transformation is not legacy architecture with AI bolted on. It's a complete framework for designing systems that are intelligent, governable, and deliverable. The enterprises that crack this combination are the ones that will ship AI that stays in production.
The practical ask for enterprise architecture leaders
If you're an Enterprise Architect at an organisation with active AI initiatives, here's what I'd suggest:
Stop being defensive about architecture governance and start framing it as delivery enablement. The data science team that is frustrated by the governance gate is frustrated because the gate is producing the wrong artifacts — compliance theatre instead of decision support. ADRs and agent topology diagrams are not overhead. They're the documentation that lets the AI system be maintained, extended, and trusted by people who weren't in the room when it was built.
Get involved at Phase A, not Phase E. The most expensive architectural decisions in an AI project are made in the first two weeks — which data sources to use, which agent boundaries to draw, which compliance obligations to satisfy structurally versus procedurally. An EA who arrives at Phase E to review the deployment is too late. The structural decisions are already made, and changing them means rebuilding.
Learn the GCP stack. TOGAF without the technology architecture is half a framework. If you can't map a TOGAF Phase D artifact directly to Terraform resources, Vertex AI pipelines, and ADK agent definitions, the architecture will stay aspirational. The intersection of TOGAF ADM and GCP production engineering is where the value is — and it's still a relatively uncrowded space.
That's the opportunity. Enterprise AI architecture done properly is not slow. It's the thing that makes everything else go faster — because it prevents the rebuilds, the compliance failures, and the systems that get shelved because nobody can explain them.