Public draft · open for comment

Out of the
Sandbox

How leaders get agentic AI out of the pilot and into production.

A field guide for the executives who fund it and the CIOs who govern it.


A ginger kitten exploring a wooden sandbox full of toys — the book's mascot

This is a draft — tell me where it's wrong

Read it, then push back.

This is a complete draft of the book, shared publicly ahead of publication. I'm after the feedback only working leaders can give: where the argument doesn't survive contact with your reality, which claims need stronger evidence, what's missing, and the examples — wins and failures — from your own organisation that would sharpen the next edition.

Corrections, disagreements, and hard questions are all welcome.

Writes to dr.ruiping.lua@gmail.com with a subject line filled in.

Preface

Every few weeks I sit in a version of the same meeting.

A team has built something impressive. An agent that can resolve a customer issue end to end, or reconcile a ledger, or triage a case file, or guide a citizen through a benefits application. The demo works. The room is energised. Someone senior asks the only question that matters: when does this go live? And the answer, delivered with a confidence that erodes a little more each quarter, is "soon."

It is almost never soon. Eighteen months later, that organisation has a portfolio of twenty or thirty of these impressive things, and few of them are running in production, touching real customers and real money without a human re-checking every move. The agents work. The organisation cannot ship them.

I wrote this book because the reason is misunderstood, and the misunderstanding is expensive. Boardrooms, the trade press, and most writing about enterprise AI assume this is a technology problem, and that the next, more capable model will solve it. It will not, because it was never a capability problem. The agent in the sandbox can already do the task. What it cannot yet do is be trusted to do the task: unattended, at scale, when an adversary is pushing on it, when something goes wrong — and to prove afterward exactly what it did and why.

That gap — between an agent that works and an agent you can trust in production — is not an engineering gap. It is a governance and operations gap. That single reframe changes who is responsible for closing it. The work of getting out of the sandbox is not work you can hand to your machine-learning team and wait on. The decisions that close the gap — how much autonomy to grant, what evidence to demand before you scale, where a human must stay in the loop, what risks you will and will not accept, how you will prove control to a regulator — are executive decisions. They are yours.

That is the good news in this book. The thing standing between your organisation and production agentic AI is not a breakthrough you have to wait for. It is a discipline you can choose to fund and govern, starting now. This book is about what that discipline is, what it costs to skip, and what good looks like when you see it.


Who this book is for

I wrote this book for two readers, and it assumes they will sometimes be the same person.

The first is the executive who funds and is accountable for AI — the chief executive, the agency head, the board member, the permanent secretary. You do not need to know how a model works. You need to know why most agentic projects fail to ship, what a credible path to production looks like, what it costs, and how to tell the difference between a team that is ready and a team that is performing readiness. You need to know which questions to ask before you sign off, and which answers should stop you.

The second is the CIO, CTO, or Chief Digital Officer who governs the delivery — the person who has to turn the mandate into an operating model, turn the risk appetite into controls, and stand in front of the board (or the regulator) when something goes wrong. You may be more technical than the first reader, but your problem in this book is not a technical one. It is building an organisation, a governance system, and a platform that can ship and run autonomous software safely and repeatedly.

This book is not a construction manual. If you are the engineer who has to build the evaluation harness or configure the identity controls, you will find the what and the why here, and a clear map of the decisions your leadership needs to make — but you will need a more technical reference for the how. That is deliberate. Other books handle the construction problem well. The leadership problem they leave alone, and it is the one killing these projects.


How to use this book

The book has four parts, moving from understanding to method to scale to practice.

Part I — Foundations establishes the problem: why agents die in the sandbox, what an agent is (in terms a board needs, not an engineer), and the new risks that come with software that acts on its own.

Part II — The Readiness Method is the core: the staged path from a scoped use case to a governed production system — use-case selection, evaluation, identity and access, human oversight, security, deployment, and operations. This is the part you fund.

Part III — The Operating Model scales it: how to govern not one agent but a portfolio of them, how to make good governance a property of your platform rather than a tax on every team, and how to reshape the organisation and meet the regulator.

Part IV — Cases and Practice grounds all of it: detailed case studies, reference patterns and anti-patterns, and a set of templates and checklists you can take into your own organisation.

Every chapter carries the same furniture, designed for a reader with no time to waste:

  • What this chapter equips you to decide — the leadership decisions the chapter is preparing you to make, stated up front.
  • A worked decision — a realistic scenario showing the chapter's ideas in the room where the call gets made.
  • Key takeaways — the chapter in a handful of lines.
  • Questions to ask your organisation — the diagnostic questions to put to your teams, your vendors, and yourself. For many readers this is the most useful page in the chapter.

Each chapter also closes with a brief look ahead to the next, so the book reads as a connected path rather than a set of separate topics.

The executive who funds can read the chapter openings, the worked decisions, the takeaways, and the questions, and come away able to govern the work. The CIO who delivers will want the full text. Both should start with Chapter 1, because it reframes the entire problem — and if the reframe doesn't land, nothing else in the book will.

A word on currency. This field moves monthly: the specific tools, model versions, and some of the governance frameworks I cite will have changed by the time you read this. I wrote the principles as the durable layer, and treat named products and frameworks as illustrations of those principles, not as the substance. Where I cite a specific framework — Singapore's agentic AI governance framework, the OWASP risk lists, and others introduced later — it is because it crystallises a principle, not because it is the last word. Govern to the principle; let the tooling change underneath it.


Part I — Foundations


Chapter 1: The Production Gap

What this chapter equips you to decide - Why your agentic AI initiatives are stalling — and why "the technology isn't ready yet" is almost always the wrong diagnosis. - Why the gap between a working pilot and a production system is yours to close, not something to delegate to your technical teams and wait on. - How to recalibrate the instincts you built shipping conventional software, which will mislead you with agents. - What kind of programme, and what kind of spending, moves an agent from sandbox to production.


The demo that never ships

Begin with the scene. You have lived it.

A capable team demonstrates an AI agent. Unlike the chatbots of a few years ago, this thing does more than answer questions — it acts. It works through a multi-step task, decides which systems to consult, takes the steps a competent employee would take, and produces a result. In the controlled conditions of the demonstration, it is remarkable. The natural conclusion in the room is that the hard part is done and the rest is a matter of rollout.

The rest is not a matter of rollout. Months later, the agent is still in its sandbox — the isolated, safe environment where it was built — and the organisation cannot explain why it hasn't gone live. The team is busy. There is always one more issue. Confidence in the room has curdled into a low, unspoken worry that the whole category was oversold.

This is the dominant outcome. Gartner projects that companies will cancel more than four in ten agentic AI projects by the end of 2027, and its stated reasons are the thread that runs through this book: escalating costs, unclear business value, and inadequate risk controls. An MIT study last year found that around one in twenty generative AI projects reaches real scale. The gap between the organisations in production and those still circling is widening into a competitive divide: the ones who cracked it are pulling ahead, while the rest stay stuck in what the industry calls, with some weariness, "pilot purgatory."

Now look again at Gartner's three reasons for cancellation: cost, value, controls. Notice what is not on the list. Not one of them is "the model could not do the task."

The bottleneck was never the model

The agent in your sandbox can already do the work. The thing stopping it from going live is not a missing capability that a future model will supply. It is the absence of everything around the model that production requires and a demo does not: the ability to handle being wrong without causing irreversible harm; the ability to prove, after the fact, exactly what the agent did and on whose authority; the ability to withstand someone, or something, trying to make it misbehave; and the ability to hold up not on the ten happy-path cases in the demo but on the ten thousand strange inputs that arrive every day in the real world.

A sandbox answers one question: can the agent do the thing? That is a necessary question, and a misleading one, because answering it tells you almost nothing about whether you can run the agent in production. Production asks four different questions, and the sandbox is structurally unable to answer any of them:

  • What happens when it is wrong? In a sandbox, a wrong answer is a useful learning. In production, a wrong action — a payment released, a record altered, a message sent — carries a consequence that may not be reversible.
  • Can you prove what it did? A demo has an appreciative audience. Production has auditors, regulators, and incident reviews. If you cannot reconstruct why your agent took a specific action six weeks ago, you do not have a production system; you have an unbounded liability that happens to be working for now.
  • Does it stay safe under pressure? No one attacks your sandbox. In production, the agent's exposure includes content it merely reads — a document, an email, a web page — any of which can carry instructions designed to hijack it.
  • Does it hold up on the long tail? The demo ran the cases you chose. Production runs the cases reality chooses, most of which you did not imagine.

Here is the reframe this book is built on. The sandbox's job was never to prove that the agent works. Its job is to surface what breaks when the agent is allowed to act. Getting out of the sandbox is the work of reproducing the sandbox's safety conditions — its bounded scope, its watching human, its reversibility, its careful observation — as permanent, engineered, governed infrastructure that holds when no one is watching. The pilot was never the destination. The infrastructure is.

Why this is your problem, not your technologists'

If the gap were a capability gap, it would not be your problem. You would fund the research, wait for the better model, and let the technical organisation tell you when it had arrived. Many leaders are doing exactly this — treating the stall as a technology timing issue and waiting it out. They will wait a long time, because they have misdiagnosed the disease.

The gap is a governance and operations gap, and that changes who owns it. Consider what has to be decided to put an autonomous system into production:

  • How much autonomy will you grant it, and over which actions?
  • What is the worst outcome you are willing to accept, and how much evidence do you require before you accept it?
  • Where must a human remain in the decision, and where is human review theatre that adds cost without adding control?
  • What is your appetite for the new risks — and your tolerance for the new costs — that come with software that acts?
  • How will you demonstrate, to your board and to your regulator, that all of this is under control?

Not one of those is a technical question. Every one is a question of risk appetite, accountability, and resource allocation, which is to say an executive question. Your machine-learning team cannot answer them, and you should not ask it to. Forced to answer by default, it will give you either an agent so constrained it delivers no value or one so unconstrained it becomes the incident that sets your programme back a year. You cannot delegate your way out of the sandbox. The decisions that close the gap belong to the people who can be held accountable for them. That is the burden, and the opportunity: the thing blocking you is not a breakthrough you must wait for, but a discipline you can choose to install.

The trap of your own experience

Capable, experienced leaders struggle with this transition for a counter-intuitive reason: your experience shipping software is partly a trap.

For thirty years we have built and run deterministic software. Given the same input, it produces the same output, every time. We learned to manage it: you specify what it should do, you test that it does that, and once it passes, it keeps behaving — the same way every time — until someone changes the code. Decades of executive instinct about technology risk, testing, sign-off, and control rest on that determinism.

An agent breaks the foundation in two ways at once. First, it is not deterministic: the same input can yield different reasoning and different actions on different runs, so "we tested it and it passed" no longer means "it will keep doing that." Second, and more important, it acts. Conventional software waits to be told what to do, step by step, by logic a human wrote in advance. An agent decides for itself which tools to use, what information to gather, and what actions to take to reach a goal — and a single piece of malicious input can redirect those decisions while it is running. The discipline for shipping predictable, instruction-following software does not survive contact with a system that reasons and acts on its own. Applied unchanged, your hard-won instincts will tell you that a passing demo means a shippable product, exactly as they always did. With agents, that instinct is wrong, and acting on it is how organisations either stall out of excess caution or ship the incident that teaches them caution the hard way.

That replacement discipline does for autonomous, acting software what testing-and-sign-off did for deterministic software: it establishes warranted trust before you depend on the system, and maintains that trust while it runs.

What moves an agent to production

If the gap is governance and operations, the spending that closes it looks different from what most organisations are funding. The instinct is to invest in capability: better models, more pilots, more impressive demonstrations. But more pilots do not get you out of pilot purgatory — in the sectors moving fastest, leaders are doing the opposite, retiring scattered pilots and consolidating onto shared, governed infrastructure.

What moves an agent into production is the unglamorous machinery of this book: a way to evaluate what the agent does, not only whether its final answer looked right; a way to give it a scoped identity and the narrowest permissions the job allows; a calibrated decision about where humans must stay in the loop; defences against a new attack surface; a disciplined rollout that earns autonomy with evidence rather than enthusiasm; the operational capability to watch a running agent, control its cost, and stop it on demand; and an operating model that lets you govern all of this across a growing portfolio without your governance cost rising just as fast. None of it demos well. All of it is what production means.

That is the work. This book is a method for doing it, and — just as much — for funding it, governing it, and recognising it when your teams get it right.


A worked decision

You are the CIO. The board has asked a fair and uncomfortable question: "We have spent two years and a significant budget on AI. We have thirty pilots. Why is almost nothing in production?" The instinct in your technology leadership is to answer with capability — to point to model limitations and ask for more time and a larger model budget.

The reframe gives you a better and more honest answer, and a more fundable one. The pilots are not stuck because the models are weak; the demonstrations prove the models are strong enough. They are stuck because the organisation has not yet built the means to trust an autonomous system in production: no real way to evaluate agent behaviour, no scoped permissions, no agreed position on human oversight, no ability to prove what an agent did after the fact. None of that is a model problem, and waiting solves none of it.

The better answer to the board is therefore not "we need more time for the technology to mature." It is: "We have proven the capability. We have not yet built the governance and operational machinery that lets us put autonomous systems into production safely — and that machinery is fundable, definable work that I can put a plan and a number against. The reason to fund it is that the organisations crossing this gap are pulling ahead of the ones still circling it." That answer reframes the spend from an open-ended bet on technology into a defined investment in production capability — and it puts the decision back where it belongs, with the people accountable for the risk.


Key takeaways

  • The dominant outcome for agentic AI projects is not failure in the lab but failure to ship — capable agents that never leave the sandbox. The leading analysts attribute the coming wave of cancellations to cost, unclear value, and inadequate controls, not to model capability.
  • The distance between a working pilot and a production system is a governance and operations gap, not a capability gap. The agent can already do the task; what is missing is everything required to trust it to do the task unattended, at scale, under attack, and provably.
  • Because the gap is about risk appetite, accountability, and resource allocation, closing it is an executive responsibility. You cannot delegate it to technical teams and wait.
  • Your experience shipping deterministic software is partly a trap: agents are non-deterministic and they act on their own, so "the demo passed" no longer means "it is safe to ship."
  • More pilots do not end pilot purgatory. The investment that moves agents to production is the operational and governance machinery — evaluation, scoped identity, calibrated oversight, security, disciplined rollout, operations, and a portfolio operating model — that this book describes.

Questions to ask your organisation

  • Of every agentic initiative we have funded, how many are in production taking real, unsupervised actions — and how many are impressive demonstrations that have not shipped? What, specifically, is blocking each one?
  • When we explain why a pilot has not gone live, are we citing model capability — or are we citing the absence of evaluation, controls, oversight, or auditability? If it is capability, are we sure?
  • If one of our agents took a wrong action in production tomorrow, could we (a) reverse it, (b) detect it fast, and (c) reconstruct exactly what it did and on whose authority? If the answer to any of these is no, why is that agent live?
  • Who in this organisation owns the decisions about agent autonomy, human oversight, and acceptable risk? If the honest answer is "the engineering team, by default," that is a finding, not an answer.
  • Are we funding more pilots, or are we funding the machinery that turns pilots into production? What is the ratio, and is it the right one?

Chapter 2 — "Anatomy of an Agent" — gives you the working understanding of what an agent is: how it perceives, reasons, remembers, and acts; how it differs from the automation and chatbots you already run; and the spectrum of autonomy you will be choosing a point on. You cannot govern what you cannot picture, and Chapter 2 builds the picture.

Chapter 2: Anatomy of an Agent

What this chapter equips you to decide - Whether the thing your team or your vendor calls an "agent" is one — or is automation and chatbots wearing a more expensive name. - Which parts of an agent create its value, and which parts create its risk, so you know where to direct attention and money. - How much autonomy to grant — the single most consequential decision you will make about any agent, and one only you can make. - Why two projects both labelled "AI agent" can require different governance, budgets, and risk acceptance.


You cannot govern what you cannot picture

In the meeting where an "AI agent" project gets approved, there are three different pictures in the room and no shared one.

The executive funding it pictures something like a tireless digital employee. The engineer building it pictures a model wired to a set of tools through a loop of code. The risk officer, if there is one in the room, pictures a vague but growing liability. Everyone says "agent," nods, and approves a thing each of them has defined in a different way. The definitional fog is not a harmless imprecision. It is the first governance failure, because the controls you need, the risks you carry, and the money you should spend all depend on what the thing is — and if the people accountable cannot picture it, they cannot govern it.

This chapter builds the shared picture. It is not an engineering account; you do not need to know how a model is trained any more than the chief executive of an airline needs to know how to forge a turbine blade. You need the working anatomy: what the parts are, what each part does for you, and — the part most explanations skip — what each part does to you if you leave it ungoverned. By the end you should be able to look at any proposed agent and locate where its value lives and where its danger lives.

Start with a single sentence that will carry the whole chapter:

An agent is a piece of software you have given a goal and the means to pursue it on its own.

Every clause in that sentence is a decision a leader owns. A goal — you define what it is for. The means — you grant it the ability to touch real systems and take real actions. On its own — you decide how much it does without you. Hold that sentence in mind as we take the machine apart.

The five parts of an agent

Strip away the jargon and an agent has five working parts. I will describe each in plain terms and then, in the same breath, name the governance surface it creates — because for the reader of this book those are the same subject.

1. A goal. Someone gives the agent an objective — "resolve this customer's billing dispute," "reconcile these accounts," "help this citizen complete their application." Conventional software is told how; an agent is told what, and works out the how itself. The governance surface: a goal that is vague, too broad, or poorly bounded is an instruction to improvise in directions you did not intend. The narrower and clearer the goal, the smaller the space of things that can go wrong. Scoping the goal is not an engineering nicety; it is the first and cheapest control you have.

2. Perception — what it can see. To pursue a goal, the agent takes in information: the user's request, data from your systems, documents, records, sometimes content from the open web. This is its window onto the world. The governance surface: everything the agent reads is a potential avenue of attack. Unlike a human, an agent can be deceived by content it merely reads — a document or message can carry hidden instructions that redirect it. What the agent is allowed to perceive is therefore both the source of its usefulness and the size of its exposure. We will return to this; it is one of the new risks of the category.

3. Reasoning and planning — the part that decides. At the centre is the model, which interprets the goal, breaks it into steps, and decides what to do next. This is the capability that impressed the room in your demo, and it is the part you cannot fully specify in advance. The governance surface: because the agent decides its own steps, and because it does not decide them the same way every time, you cannot guarantee its behaviour the way you could guarantee conventional software by reading the code. This is why "we tested it once and it worked" is not a basis for trust, and why a discipline of evaluation — Chapters 5 through 7 — exists to establish trust by other means.

4. Memory — what it carries. Agents retain context: within a single task, so that step five remembers what happened in step two; and sometimes across sessions, so that the agent recalls a customer or a case over time. The governance surface: memory is a data-governance question in disguise. What the agent retains, for how long, who can see it, and whether it carries one person's information into another's interaction are all decisions with privacy and compliance consequences. Memory is where an agent accumulates a data footprint no one approved.

5. Tools and action — the means to change the world. This is the part that separates an agent from everything that came before it. The agent is connected to tools — the ability to call your systems, query a database, send a message, issue a refund, update a record, trigger a process. Through these tools it does not merely produce words; it acts. The governance surface: this is the source of both the value and the danger, and the two are inseparable. An agent with no ability to act is a conversation. An agent that can act can create value at machine speed — and can cause harm at machine speed, including harm you did not anticipate. The breadth of what an agent can touch is the size of the risk you are carrying. More than any other single fact, this is why "give the agent the narrowest possible set of tools" is a recurring instruction in this book, and why an agent's tool access is something a leader should ask about by name.

These five parts run in a loop: the agent perceives, reasons about what to do, takes an action through a tool, observes the result, and decides whether it is done or needs another step — repeating until the goal is met or it gives up. That loop is the engine of an agent's usefulness. It is also why a small early error can compound: a wrong turn at step two becomes the premise for steps three through nine. An agent does not only make mistakes; it builds on them.

Notice the through-line. Each part is a place where capability and exposure are the same feature seen from two sides. That is the essential executive intuition about agents: you cannot increase what an agent can do without increasing what it can do wrong. Governance is not a brake on that trade-off; it is how you make the trade-off on purpose rather than by accident.

Agent, workflow, or chatbot — and why the label is a financial and risk decision

People now apply the word "agent" to almost anything with a model attached. Some of this is enthusiasm; much of it is marketing. The analysts have a blunt name for the latter — "agent washing," the rebranding of existing chatbots and rule-based automation as agents — and they estimate that only a small fraction of the things sold as agentic are. For you this is not a semantic complaint. Paying agent prices and accepting agent risk for capability that is a workflow is a waste; under-governing a genuine agent because procurement thought it was buying a chatbot is a hazard. You need to tell them apart.

Three categories are worth keeping apart:

  • A chatbot or copilot advises; a human acts. It answers, drafts, suggests, summarises. The human reads the output and decides what to do. Its blast radius is bounded by the human in front of it, who is a check on every output. Most of what enterprises have deployed so far lives here. The risk is real but contained, because nothing happens in the world without a person choosing to make it happen.

  • A workflow automates a path a human designed. It executes a sequence of steps, but the steps and their order were specified in advance by people. It may use a model inside one of those steps, but it does not decide its own route. It is predictable in the way conventional software is predictable, and it is governed the way automation has always been governed. RPA — the robotic process automation many organisations already run — lives here.

  • An agent decides its own path and acts on it. Given a goal, it determines the steps itself, chooses which tools to use, and takes the actions — with as much or as little human involvement as you have granted. It is neither bounded by a human checking every output (like the copilot) nor confined to a route a human pre-drew (like the workflow). This is the category this book is about, and it is the category that does not yet have a settled discipline for getting to production.

A simple litmus test cuts through most of the fog. Ask: Does it decide its own steps, or follow ones we specified? And: Does it take the action itself, or hand a recommendation to a human who acts? If it decides its own steps and takes its own actions, you have an agent, and you should govern it as one regardless of what the slide deck calls it. If it does not, you may have something useful — but you should not pay for, or fear, an agent.

The reason this matters to the person signing the cheque is that the three categories carry different risk and require different controls. A chatbot's safety rests on the human reading its answer. An agent's safety cannot, because for many of its actions there is no human reading anything. The governance machinery in Part II of this book exists because the human check that makes a copilot safe is gone — and something engineered has to take its place.

The autonomy dial: the decision that is yours alone

If you take one decision away from this chapter, take this one. Autonomy is not a property the technology has. It is a permission you grant. You can deploy the same agent at different levels of independence, and choosing that level is the most consequential — and least delegable — decision you will make about any agent, because almost everything else follows from it: the risk you carry, the cost of oversight, the controls you must build, and the value you can capture.

Think of autonomy as a dial with recognisable settings:

  • Advisory. The agent proposes; a human decides and acts. (This is the copilot.) Lowest risk, lowest leverage, because a human is still the bottleneck on every action.
  • Human-in-the-loop. The agent does the work and proposes the action, but a human must approve before it executes. The agent drafts the refund; a person clicks "approve." It contains risk at the cost of human time on every action.
  • Human-on-the-loop. The agent acts on its own, but a human monitors and can intervene or override. No one approves each action in advance; someone is watching, and can stop it. Higher leverage, and it depends on the watching being real.
  • Human-out-of-the-loop. The agent acts on its own, and human review is occasional, sampled, or after the fact. Highest leverage, highest risk, and defensible only where actions are reversible, low-stakes, or so well-evidenced that you have earned the right to step back.

The single most important thing to understand about this dial is that you do not set it once for the whole agent. You set it per action, according to how much it would cost to be wrong. The same agent handling a customer enquiry might operate human-out-of-the-loop when it looks up an order (trivial, reversible), human-on-the-loop when it drafts a reply (low cost, watchable), and human-in-the-loop when it issues a refund above a threshold (real money, worth a human's moment). Matching the level of autonomy to the stakes and reversibility of each action — not maximising autonomy, and not minimising it out of fear — is the central act of governing an agent well. We give it a chapter later (Chapter 9), and the principle that decides where the dial sits — reversibility — is the subject of Chapter 4. For now, the essential point is one of ownership: where that dial sits is a statement of your organisation's risk appetite, and risk appetite is not something an engineering team can set on your behalf. Forced to set it by default, they will choose either the timid setting that strands the value or the loose setting that becomes the incident.

There is a real prize at the high end of the dial, and this book is not an argument for timidity. Autonomy is where the economics of agents live: an advisory tool that still requires a human on every action saves far less than an agent trusted to handle the routine cases on its own. The goal is not to keep the dial low. The goal is to be able to turn it up safely — which is what the discipline in this book buys you. An organisation that has built rigorous evaluation, scoped permissions, real monitoring, and instant shut-off can grant autonomy with evidence behind it. An organisation that has built none of those things is choosing between leaving value on the table and flying blind. Discipline is what earns you the right to the high-value end of the dial.

The picture, assembled

Put the parts back together and the working picture is this: an agent is software you have given a goal and the means to pursue it, which perceives a situation, reasons about what to do, acts on the world through tools, and loops until done — at a level of independence you have granted. Its value and its risk rise together, action by action. It is not a smarter chatbot; the removal of the human who checks each output is what makes it a different thing to govern. And the master control you hold over it is the autonomy dial, set not once but per action, according to what it would cost to be wrong.

That picture is enough to govern from. The chapters ahead turn it into method. But before method, one more piece of foundation: if value and risk rise together, you need a clear-eyed map of the risks — including the ones that did not exist before software could act on its own. That map is Chapter 3.


A worked decision

Two proposals reach you in the same week, both titled "AI Agent for Operations," both requesting similar budgets.

The first, run through the litmus test, follows a fixed sequence of steps your team designed, with a model reading documents at one stage. It does not choose its own path, and a staff member reviews its output before anything happens. It is a workflow with a copilot bolted on — useful, and governable with the controls you already have for automation. It does not warrant agent-level scrutiny or agent-level fear.

The second has a goal — "clear the exceptions queue" — and decides for itself which records to investigate, which systems to query, and which corrections to post, executing many of them without a human in between. This is a genuine agent operating well up the autonomy dial, and the proposal, written by a capable and optimistic team, has not said so in those terms. Read through this chapter's lens, it is nothing like the first. It carries a far larger risk surface — it acts on real records on its own — and it needs the machinery this book describes: rigorous evaluation, scoped permissions, a decision about which corrections require human approval, and the ability to halt it instantly.

The decision is not to approve or reject. It is to recognise that you have two different animals with the same name, to govern and fund each on its own terms, and — for the second — to send it back with a single question this book will teach you to ask: "Show me how we will trust this to act on its own, and where we have decided it may not."


Key takeaways

  • An agent is software given a goal and the means to pursue it on its own. It has five working parts — a goal, perception, reasoning, memory, and tools/action — and each is at once a source of value and a governance surface.
  • The defining part is tools and action. An agent's ability to act is where its value and its danger both come from, and the breadth of what it can touch is the size of the risk you are carrying.
  • An agent is not a faster chatbot. A human reading its output makes a chatbot safe; an agent removes that human from many of its actions, which is why it requires engineered controls in the human's place.
  • "Agent washing" is real and costly in both directions: do not pay agent prices for re-badged automation, and do not under-govern a genuine agent because someone sold it as something tamer. A simple test — does it choose its own steps and take its own actions? — sorts them.
  • Autonomy is a permission you grant, not a property the technology has. Set it per action according to reversibility and stakes — and where it sits is a statement of your risk appetite you cannot delegate to the people building the system.

Questions to ask your organisation

  • For each system we are calling an "agent": does it choose its own steps and take its own actions, or does it follow a path we designed and hand decisions to a human? If it is not an agent, why are we governing and pricing it as one — and if it is, are we governing it enough?
  • What can each of our agents touch? List every system, dataset, and action it can reach. Is that list as narrow as the job requires, or as wide as was convenient to build?
  • For every action our agents can take, who decided the level of autonomy — advisory, approve-before, monitored, or unattended — and did someone decide it on the basis of how costly an error would be, or did it happen by default?
  • Which of our agents' actions are irreversible? For those, can we justify the autonomy level we have granted?
  • When a vendor sells us an "autonomous agent," do we have a standard way to verify the claim and to assess the real risk behind it — or do we take the label at face value in procurement?

Chapter 3 — "The New Risk Surface" — maps the dangers this anatomy creates, separating the risks you already know how to manage from the ones that did not exist before software could act on its own: non-determinism, excessive agency, the manipulation of an agent through what it reads, the compounding of small errors, and the new hazards that appear when agents act on behalf of people and alongside other agents. It is the map you govern against.

Chapter 3: The New Risk Surface

What this chapter equips you to decide - Which of your existing risk instincts still work against agents — and which are useless against a thing that acts on its own. - How to hold the landscape of agent risk in your head as a small number of governable families, rather than a hundred technical failure modes. - Why the most dangerous risks are not bugs that better engineering will fix, but permanent properties of software that can act — and therefore things to govern, not wait out. - The single security model worth carrying into every agent decision you make.


The failure you will spot, and the failure you will not

You may remember the first one. In late 2021 a worried parent typed a question into "Ask Jamie," the whole-of-government virtual assistant that GovTech had built and deployed across some seventy agency websites: their child had tested positive for COVID-19; what should they do? The chatbot, hosted on the Ministry of Health's site, advised them to practise safe sex through the consistent use of condoms. Asked where to get an antigen rapid test, it pointed citizens to the polio vaccine. The screenshots spread, the Ministry disabled the assistant for a system check, and the country had a brief, good-natured laugh at a government bot that had wandered off the topic.

It is a useful story because the lesson most people take from it is the wrong one. The comfortable conclusion is that AI fails visibly — that it says something so absurd that any reader catches it, the system gets pulled, and no harm is done. If that were the shape of all AI failure, agent risk would be a public-relations matter and little more. It is not the shape of the failure that should worry you.

Consider the second one, which looks nothing like the first. In 2025 the Australian government's Department of Employment and Workplace Relations commissioned Deloitte to produce an independent assurance review of an IT system that automates welfare compliance penalties — a contract worth around four hundred and forty thousand Australian dollars. Deloitte delivered a 237-page report, and the department published it on its website. Months later a university researcher noticed that some of its citations referred to academic papers that did not exist, and that it quoted a Federal Court judgment that had never been written. The fabrications were not absurd; they were plausible — fictitious sources sitting in the same authoritative register as the genuine ones, which is why they survived review and reached publication. Deloitte revised the report, disclosed that it had used a generative AI system to prepare it, and refunded part of the fee. The firm noted that its recommendations stood — its own quiet lesson about how deep the trust in a polished output can run.

Hold both stories in mind, because between them they contain most of this chapter's ideas. The systems were confidently wrong and could not know it. In the second case a government body relied on the output as authoritative and published it in its own name. And no one involved could shrug the error onto the tool: "the AI generated it" did not undo the refund, the public correction, or the fact that a department had put fabricated sources into the public record. You own what your AI produces in your name. Note what both systems had in common: they only spoke. One answered a question; the other wrote a document. Neither did anything in the world. The book you are reading is about agents that act — that issue the penalty, update the record, send the message, move the money — without a human reading each output first. Every risk these two cases exposed multiplies the moment you remove the human who was, at least, still reading the screen.

This is not an anomaly you can engineer away. In McKinsey's 2025 playbook Deploying agentic AI with safety and security, around 80% of organisations said they had already encountered risky behaviour from their AI agents — improper data exposure, access to systems without authorisation. That number should reframe how you think about agent risk: it is not a rare tail event that happens to unlucky teams. It is the normal condition of running these systems, and it is the reason this book's discipline exists. This chapter is the map of what can go wrong, organised so you can govern against it.

Old risks amplified, and new risks born

The first useful cut through the landscape is to separate the risks you already know how to manage from the ones that did not exist before software could act on its own. The distinction matters because it tells you where to extend an existing capability and where you have to build one you have never needed.

Old risks, amplified. Agents inherit every risk that any AI system carries, often made larger by scale and speed. Data privacy — an agent with access to personal records can expose them faster and to more people than a human ever could. Bias — an agent making or shaping decisions can apply a skewed judgement, at volume, to thousands of people. Security breaches, intellectual-property leakage, the generation of harmful or inappropriate content: all of these predate agents, and your organisation has at least the beginnings of a muscle for managing them. The work here is to extend those controls to a faster, more autonomous actor — not to invent something new. The world's standards bodies have catalogued this layer; the United States' National Institute of Standards and Technology, in its risk profile for generative AI, enumerates twelve such risk areas, from confabulation and data privacy to harmful bias and information security. You do not need to memorise the list. You need to know it exists, that it is authoritative, and that your teams should be working against it.

New risks, born of agency. Then there is a category that is new — risks that exist only because the software now decides and acts for itself. These do not map cleanly onto anything in your existing risk register, and the instincts that serve you for conventional software will mislead you here. They are the real subject of this chapter, and they are recent enough that the standards are only now catching up: NIST has had to publish a separate profile for agentic AI because, in its own framing, neither the base risk framework nor the generative-AI profile contemplated systems that acquire tools and execute on their own in live production. The security community did the same: in December 2025 the OWASP GenAI Security Project released a dedicated Top 10 for Agentic Applications, because agent risks "do not map cleanly to traditional application security." When the bodies that write the standards have to write new standards, that is your signal that the risk is structurally new — and that your auditors and regulators will be asking about it before long.

These new risks fall into four families a leader can hold in mind. They are not a technical taxonomy; they are a way of thinking. For each, I name what it is, why it is new, and a real example of it biting.

Family one: the agent is confidently wrong — and acts on it

The first family needs no attacker. It is the risk that the agent gets it wrong, presents the error with complete confidence, and then builds on it.

Three properties combine here. The first is confabulation — the technical term for hallucination: a model producing confidently stated content that is false. NIST treats this not as an accuracy problem but as a calibration problem: the danger is not only that the agent is sometimes wrong, but that it is wrong with the same fluent confidence with which it is right. There is no tremor in its voice. The Deloitte report did not flag the citations it had invented; the fabricated papers and the imaginary court quote sat on the page in the same measured, authoritative tone as the genuine references beside them. The second property is non-determinism: the same agent given the same situation may not behave the same way twice, which means that "we tested it and it worked" no longer carries the guarantee it carried for conventional software. You are not dealing with a system that will repeat its good behaviour. The third, and the one that turns a model's error into an agent's catastrophe, is compounding. An agent works in steps, each building on the last. A wrong conclusion at step two becomes the unquestioned premise for steps three through nine. The agent does not only make a mistake; it acts on the mistake, and then acts on the consequences of having acted. OWASP names the systemic version of this Cascading Failures — a single bad decision spreading across connected actions, tools, and workflows.

The governance consequence is direct: confidence and fluency are not grounds to trust an agent's output, and the higher the stakes or the longer the chain of actions, the more an early, invisible error compounds before anyone notices. This is the reason evaluation — the subject of three chapters in Part II — is not optional overhead but the load-bearing investment of the programme.

Family two: the agent is turned against you

The second family involves an adversary, and it rests on a single property of these systems: an agent cannot reliably tell the difference between the data it is supposed to process and instructions hidden inside that data. To a language model, the email it was asked to summarise, the document it was told to read, and the web page it was sent to check are all text — and text can carry commands. This is the mechanism behind the risk OWASP ranks first for agents, Agent Goal Hijack: an attacker plants instructions in content the agent will read, and the agent, unable to distinguish the planted instruction from its real task, pursues the attacker's goal instead of yours. There is no software vulnerability in the traditional sense. The attack vector is language itself, which is why conventional security controls do not catch it.

The most useful way for a leader to hold this family in mind is a model the engineer Simon Willison named in 2025: the lethal trifecta. An agent becomes exploitable when it has three capabilities at once — access to private or sensitive data, exposure to untrusted content, and the ability to communicate externally. Each is harmless alone. Together they are a complete attack: poisoned content the agent reads instructs it to take the private data it can see and send it where the attacker can collect it, using the agent's own legitimate, authorised tools. This is not hypothetical. Documented cases have shown this pattern exploited in common developer and productivity tools, with private repository contents and confidential document data exfiltrated through invisible instructions the user never saw.

Two things about the trifecta make it the single security idea worth carrying into every agent decision. First, it is architectural, not a matter of better prompts or training — you cannot instruct the problem away, because the model cannot reliably separate instruction from content. Willison's own conclusion is that the safest response is not a cleverer filter but to avoid combining the three capabilities in one agent — to use narrower agents that each hold at most two legs of the trifecta. Second, the filters that vendors sell to catch these attacks are imperfect by nature; even good ones stop only around ninety-five percent of attempts, and a security control that fails one time in twenty against a motivated attacker is not a control you build a high-stakes system on. The lesson for the person who funds the architecture is to ask, of every agent: does this single agent touch sensitive data, read things from outside, and have a way to send information out? If the answer to all three is yes, you are carrying the trifecta, and the right fix is structural separation, settled before you build.

This family has further members that the security frameworks detail — agents misusing the tools they are connected to, abusing inherited credentials and permissions, and risks arriving through the third-party tools and components an agent depends on. The unifying point for a leader is that the agent's connections to the rest of your world — the very tools that make it useful — are also the routes an attacker uses to turn it against you, which is why the discipline of granting it the narrowest possible access (Chapters 2 and 8) is a security control, not a tidiness preference.

Family three: the agent goes out of bounds

The third family is neither an honest mistake nor an external attack. It is the agent doing more than you intended, within the authority you yourself granted it — the risk the security community calls excessive agency, caused by giving an agent too much functionality, too many permissions, or too much autonomy.

SailPoint's 2025 survey of IT and security professionals should give any leader who thinks scope is theoretical pause: around 80% of organisations reported that their agents had already taken actions beyond their intended scope, including accessing things they should not have and exposing sensitive data. Note that this is not an attack statistic; it is agents, working as built, doing more than their designers meant — because the capability was there and the boundary was not. At the extreme, OWASP names Rogue Agents: agents that are compromised, misaligned, or have drifted over time into operating in ways no one designed or intended, continuing to act inside complex systems without anyone having decided they should.

The governance lesson is the one Chapter 2 introduced and this family makes urgent: an agent's risk is bounded by what it is able to do, not by what you told it to do. Instructions are not a control. The only reliable boundary is the capability you decline to grant in the first place — which is why bounding the action space upfront recurs throughout this book as the cheapest and most reliable safeguard you have.

Family four: the trust inversion

The fourth family is the most human, and the one leaders most often miss because it does not look like a technology risk. It has two faces.

The first is over-trust, which NIST files under "human-AI configuration" and which has a plainer name, automation bias: the well-documented human tendency to defer to a confident machine. The danger is sharpest where you have placed a human in the loop as a safeguard. You ask a person to approve the agent's recommended action as a check — but the agent is fluent, fast, and right most of the time, and so the human, hundreds of approvals in, begins to rubber-stamp. The oversight exists on the org chart and has evaporated in practice. OWASP gives this an adversarial edge as well, naming Human-Agent Trust Exploitation — agents whose persuasive, authoritative outputs lead people into unsafe approvals. The implication for design, developed in Chapter 9, is that human oversight only counts if it is real, and a checkpoint a tired human will wave through is not real.

The second face is the accountability gap, and the Deloitte report is its emblem. When an AI system produces work that an organisation acts on or publishes, who owns the outcome? The reflex is to point at the tool — the model hallucinated, the system erred — as though that transfers the responsibility somewhere. It does not. The firm whose name was on the report refunded the fee and issued the correction; the department that published it wore the fact that fabricated sources had gone into the public record under its banner. "The AI generated it" changed nothing about who answered for it. You own what your AI does in your name. This gets harder, not easier, as agents grow more capable: when an agent acts on behalf of a user, and perhaps delegates part of the task to another agent, the chain from "a human authorised this" to "this action happened" grows long and tangled — and if you cannot trace that chain, you cannot answer the only question that matters after something goes wrong, which is who decided this, and on whose authority. The ability to reconstruct that chain — raised in Chapter 1 as a precondition for production and detailed in Chapter 12 — is what closes the accountability gap. Without it, you have built something that acts in your name and cannot account for itself, which is a liability whatever its productivity.

And it compounds: when agents work together

Everything above describes the risks of a single agent. A growing number of organisations connect agents together — one agent delegating to another, agents exchanging messages, multi-agent systems pursuing a goal together. This is where the frontier of risk now sits, and the security frameworks have moved to meet it, naming insecure inter-agent communication and the cascading of a single failure across many connected agents as distinct, new risk classes that have no equivalent in single-system security.

The executive intuition to carry: connecting agents does not add risk, it multiplies it. A small error or a single compromised agent no longer stays contained; it propagates through the others, and the interaction can produce behaviour no one designed and no single agent's controls would catch. For most organisations, the prudent posture in this edition's timeframe is to treat multi-agent systems as a higher tier of risk, to approach only after single-agent discipline is in hand — and to be sceptical of any proposal that reaches for a committee of agents before it has shown it can govern one.

The map, assembled

Step back and the new risk surface resolves into four things a leader can hold and govern against. The agent can be confidently wrong and act on it — so you build evaluation. It can be turned against you through what it reads — so you break the lethal trifecta by design and grant the narrowest access. It can go out of bounds within its own authority — so you bound its capabilities rather than trusting its instructions. And the trust inversion can hollow out your human safeguards and obscure who is accountable — so you make oversight real and traceability total. Multi-agent systems multiply all four.

None of these is a defect that a future model removes. They are the permanent price of capability — the same trade-off Chapter 2 named, now itemised. If you cannot eliminate these risks, you must manage them: bound, evaluate, control, watch, and account for them. That management, stage by stage, is Part II, and it begins in the next chapter with the first and cheapest risk decision you will make: not how to control an agent, but where to point it first, so that when the risks in this chapter arrive — and they will — they arrive somewhere you can survive them.


A worked decision

Your team proposes an accounts-payable agent. It will read incoming supplier invoices as they arrive, match each against your finance system and the relevant purchase order, and schedule the payment — clearing the routine invoices without a human. The business case is strong and the demo is excellent.

Run it through the one security model from this chapter. Does a single agent hold all three legs of the lethal trifecta? It reads incoming invoices — that is untrusted content, the one input an attacker can control, since anyone can send your organisation an invoice. It has finance-system access — that is private data, and the means to move money. It can schedule payments — that is external action. All three. This is not a system with a manageable security concern; it is the textbook configuration for fraud through the agent's own legitimate tools — a planted invoice carrying instructions that redirect a payment — and no input filter your team adds will close it reliably.

The decision is not to kill the project. It is to refuse the architecture as proposed and require a redesign that breaks the trifecta — for example, separating the agent that ingests untrusted incoming invoices from the one that can touch the finance system and release payment, with a controlled, inspected boundary between them, so no single agent can be instructed by a stranger's document to pay an attacker. That redesign costs something now. The fraud it prevents — and the reckoning that would follow — costs far more later. The job this chapter prepares you for is to see the trifecta in the proposal before it ships, and to spend the redesign rather than the incident.


Key takeaways

  • McKinsey found roughly four in five organisations have already encountered risky or unexpected behaviour from their AI agents. Agent risk is the normal condition of running these systems, not a rare misfortune — and the standards bodies (NIST, OWASP) have now published dedicated agentic risk frameworks, which means your auditors and regulators will follow.
  • Separate old risks from new ones. Privacy, bias, security, and IP risks are amplified but familiar — extend your existing controls. The risks born of agency are new and need new discipline.
  • The new risks form four governable families: the agent is confidently wrong and acts on it; it is turned against you through what it reads; it goes out of bounds within its own authority; and the trust inversion hollows out human oversight and obscures accountability. Connecting agents multiplies all four.
  • The lethal trifecta is the one security model to carry everywhere: danger concentrates when one agent has access to sensitive data, exposure to untrusted content, and the ability to communicate externally. The fix is architectural separation, settled upfront — not a filter, which is never perfect.
  • None of these risks is a bug a better model will fix. They are the permanent price of capability, to manage rather than wait out — and an agent's risk is bounded by what it can do, never by what you told it to do.

Questions to ask your organisation

  • For each agent we run or plan to run: does one agent access sensitive data, read content from outside our control, and have a way to send information out? If yes for all three, what is our plan to break that trifecta — and is it a plan, or a filter we are hoping works?
  • Where we have put a human "in the loop" as a safeguard, have we checked that the human is scrutinising — or are they approving at a volume and speed that guarantees rubber-stamping? How would we know?
  • If one of our agents made a confident, wrong decision that then drove several follow-on actions, how far would the error travel before anything caught it? What stops a single mistake from cascading?
  • When an agent acts in our name and something goes wrong, can we reconstruct who authorised the capability and trace the action back to a decision? If not, we have an open accountability gap, regardless of how well the agent is performing.
  • Is someone asking us to connect agents together — to let them call or delegate to one another — before we have shown we can govern a single agent safely? If so, why?

Part I has given you the picture: why agents stall in the sandbox (Chapter 1), what an agent is (Chapter 2), and what can go wrong once it acts (Chapter 3). Part II turns understanding into method. It opens with Chapter 4 — "Scoping and Use-Case Selection" — and the first real decision of the readiness method: choosing where to point an agent first, governed by the principle that determines how much risk you take on: reversibility.

Part II — The Readiness Method

Part I gave you the picture: why agents stall, what they are, and what can go wrong once they act. Part II is the method — the staged discipline that turns a capable agent in a sandbox into a governed system in production. It is the part you fund, and it begins not with building, but with choosing.


Chapter 4: Scoping and Use-Case Selection

What this chapter equips you to decide - Which agent to put into production first — and why that is a risk decision, not a value decision, made before anyone writes a line of code. - How to use reversibility as the single variable that tells you how much risk a use case carries. - The criteria that separate a sound first use case from an impressive but dangerous one — and how to recognise the flagship temptation when your teams bring it to you. - Why your real task is to choose a sequence of use cases, not one, and how the public sector's obligations should shape that sequence.


The first control is the use case you choose

The method does not begin where most organisations think — with building the agent, selecting a model, or standing up a platform. It begins with a choice: which problem do we point an agent at first — and that choice, made early and often made badly, determines more of your risk than any technical decision that follows.

Organisations make it badly because they make it on the wrong axis. The natural instinct, especially under pressure to show progress, is to select the first use case by its value and its visibility: the transformational application, the one that will impress the board, the one a minister can announce. This feels like ambition. It is the most common way organisations make their riskiest possible bet with their least developed capability — choosing the highest-stakes, least-reversible, most-scrutinised application at the moment when they have none of the evaluation, controls, or operational experience that would make it safe.

The reframe that governs this chapter is simple and, once seen, hard to unsee: the use case you choose is itself a control — the first and cheapest one you have. Every chapter after this one is about building machinery to manage an agent's risk: evaluation, scoped permissions, oversight, monitoring. This chapter is about avoiding risk you don't need to take, for free, by being deliberate about where you point the agent first. A well-chosen first use case can make most of the dangers in Chapter 3 survivable before you have built a single safeguard. A badly chosen one can defeat every safeguard you later build.

Reversibility: the variable that tells you your real risk

If you carry one idea out of this chapter, carry this one: the property that most determines how much risk a use case holds is not how often the agent will be wrong, but what happens when it is. That property is reversibility, and it is the cheapest safety feature you will ever buy.

An action is reversible when, if the agent gets it wrong, you can undo it and make the affected party whole, before any lasting harm is done. An agent that drafts a document for a person to review has taken a reversible action: you delete a wrong draft and forget it. An agent that flags a case for attention, or retrieves and summarises information, is recoverable. An agent that sends a communication to a citizen has done something much harder to take back — the message has left the building. And an agent that issues a payment, imposes a penalty, alters an official record, or makes a determination about a person's entitlement has, in many cases, done something that cannot be undone.

Notice what this reframes. Two agents can use the identical model, with the identical accuracy, and carry different risk — because one drafts and the other decides, one proposes and the other acts. The model's quality sets how often you will face an error; reversibility sets what an error costs. You can improve the first only over time, and never to perfection. You can choose the second, today, for nothing, by selecting a use case where being wrong is recoverable. This is why reversibility, not model capability, is the right organising variable for your first deployment — and why "how good is the model" is the wrong opening question.

For the public sector, this variable carries extra weight, and it cuts in a specific direction. A government agent's errors are rarely confined to a commercial cost. An action that touches a citizen's benefits, rights, legal status, or standing is among the least reversible things an organisation can do — and beyond the individual harm sits a second, slower form of irreversibility: public trust. A private firm that mishandles a customer can refund them; a public agency that wrongly penalises a citizen, or is seen to have outsourced a consequential judgement to a machine, spends down a reserve of institutional trust that no refund replenishes and that took decades to build. The lesson is not that government should move slowly out of timidity. It is that the public sector has more reason than the private sector, not less, to begin where errors are reversible — because its irreversible errors are the most costly kind there are.

What a national programme did: assess risk before building

Singapore's own government offers the clearest worked example, and it was an ambitious programme, not a timid one. When GovTech, the Cyber Security Agency, the Infocomm Media Development Authority and Google ran a government AI Agents Sandbox in 2025, they did not reach for the three most transformational applications they could imagine. They scored candidate use cases on both risk and benefit, and the risk factors they used read like a summary of this chapter: how much the agent decides on its own, how far its actions reach, whether it touches sensitive data, the irreversibility of its actions, and whether an affected person has any route to appeal or redress. On that basis they ran three cases spanning the range — automated quality-assurance testing of government websites (low risk, high benefit); automated safety-testing of chatbots before deployment (low risk, low benefit); and the one citizen-facing case, helping people through social-assistance applications (high risk, high benefit) — and left aside the quadrant that is high risk and low benefit, to defer until the technology matured.

What they did with that high-risk case is the instructive part. They did not lead with it unprotected, and they did not abandon it. They ran it on mock applicant data in a simulated environment, with a human kept in the loop where accuracy and fairness mattered most — the heaviest controls of the three, wrapped around the case with the least room for error. An ambitious national programme assessed reversibility and impact before it built anything, advanced the recoverable cases, and put its consequential one behind the strongest guardrails rather than out in front. If a government with a dedicated sandbox worked this way, the burden of proof sits on any team proposing to lead with its least reversible application. When Singapore codified the approach, its agentic governance framework opened with the same move: assess and bound the risk before building — weighing autonomy, access to sensitive systems, and reversibility — and limit what the agent is allowed to do and reach.

Reversibility is necessary, not sufficient: the full selection test

Reversibility tells you whether you can survive being wrong. It does not, on its own, make a good first use case. A reversible task that no one cares about teaches you nothing and earns you no support; a reversible task you cannot measure cannot earn trust into production, however safe it is. A sound first use case meets a small set of criteria together, and a leader can hold them as a checklist to put to any proposal:

  • Reversible. When the agent errs, you can undo the action and make the affected party whole, and a redress path exists. This is the floor; without it, nothing else matters for a first deployment.
  • Bounded. The goal is narrow and well-defined — "draft replies to this category of enquiry," not "handle correspondence." The narrower the goal, the smaller the space of things that can go wrong, which is the cheapest risk reduction available after reversibility itself.
  • Measurable. You can tell whether the agent is doing the job well, because a ground truth or a clear standard exists. This is not optional: the next three chapters are about evaluation, and a use case you cannot measure is one you can never earn the right to trust. If you cannot say how you would know it is working, it is the wrong first use case.
  • Valuable enough to matter. It relieves real toil or unlocks real leverage — enough to justify the governance investment and to win the organisational support that carries the programme forward. A vanity demonstration fails this test even if it is safe.
  • Tolerant of error. The surrounding process has slack to absorb the mistakes that will occur — a human checkpoint, a review step, time to catch problems before they compound.

There is a failure mode at the safe end of this test, and cautious organisations fall into it. A first use case can be so trivial — so narrow, so low-stakes, so padded with human review — that it proves nothing and transfers no capability to the harder cases ahead. The goal is not the safest conceivable task; it is the most valuable task that is still survivable. "Real, but recoverable" is the standard. A use case that is real enough to be a true test and reversible enough that failing the test costs you little is the ideal place to begin.

You are choosing a sequence, not a use case

There is a deeper reframe here. The question "which use case first?" is the wrong question, or an incomplete one. You are not choosing a use case; you are choosing the first rung of a ladder.

The first agent's most valuable output is not the work it does; it is the capability it builds in your organisation — the evaluation harness you stand up, the controls your teams learn to design, the operational muscle of running an autonomous system, and the warranted trust that lets you, and your board, and eventually your public, believe the next one will be safe. A well-chosen first use case is reversible enough to survive, valuable enough to justify, and capability-building enough to earn the next rung. The flagship application — the one the instinct wanted to start with — is not abandoned; you place it where it belongs: near the top of the ladder, reached on the strength of everything the lower rungs taught you, and behind the heaviest controls in the book.

This is what turns selection from a one-off gamble into a strategy. Asking "where do we start?" invites a single bet. Asking "what is our climb?" invites a sequence in which each step is survivable and each step makes the next one safer. It is also the honest answer to give a board impatient for the flagship: not "no," but "that one is the summit, and here is the route that gets us there without falling off."

Once you have chosen the rung, one act of discipline remains before building begins: scope the goal as narrowly as it can be while still delivering the value. This is the bounding that Chapter 3 named as the only reliable boundary — an agent's risk is set by what it can do, not what you told it to do, so every capability, tool, and data source you decline to grant is a risk you never carry. Selection chooses a survivable place to stand; bounding shrinks the ground the agent can move across once it is standing there. Together they are the risk you avoid before you spend a cent managing the risk you keep.

You have chosen a use case that is reversible, bounded, valuable — and, above all, measurable. That last criterion was not an accident. It is the hinge between selecting an agent and trusting one, because trust in an autonomous system is not granted on the strength of a good demo; it is earned through evaluation. How you build that evaluation — and why evaluating an agent differs from testing the software you already know — is the next chapter.


A worked decision

A statutory board is overwhelmed by citizen correspondence: tens of thousands of enquiries a year about eligibility, applications, and the status of cases, handled by officers who are months behind. Leadership wants an agent to help, and two proposals arrive.

The first is the flagship: an agent that reads an application and the supporting documents and decides the straightforward cases itself — approving or rejecting eligibility — so officers handle only the complex remainder. The projected savings are large and the demonstration is compelling.

The second is humbler: an agent that drafts a reply to each incoming enquiry — pulling the relevant policy and the case details into a proposed response — which an officer reviews, corrects, and sends. The agent makes no determination; no citizen receives anything a human did not approve.

Run both through the test. The first fails at the floor: a wrong eligibility determination is close to irreversible — it affects a citizen's entitlement, it carries statutory and fairness obligations, and if it looks machine-made it spends public trust no one can refund. It is the summit, not the start. The second passes: it is reversible (you correct a bad draft before anyone sees it), bounded (one category of task), measurable (officers' edits are a built-in quality signal), valuable (it attacks the real backlog), and error-tolerant (the officer is the checkpoint). It also builds the capability — the evaluation data, the controls, the operational experience — the board would need before it could consider the flagship.

The decision is not "drafting agent yes, adjudication agent no." It is to choose the drafting agent as the first rung, and to place the adjudication agent on the roadmap as a later one — reached only after the lower rungs have produced the evidence and the controls to make a citizen-facing determination defensible. That is how you give the board its flagship: by naming the climb, not by leaping.


Key takeaways

  • The first decision of the method is selection, and it is a risk decision you make before any building. The use case you choose is itself your first and cheapest control.
  • Choosing the first use case by value and visibility — the flagship instinct — means making your riskiest bet with your least-developed capability. The flagship is where you arrive, not where you start.
  • Reversibility is the variable that determines a use case's real risk. The model's quality sets how often errors occur; reversibility sets what they cost. You can choose reversibility today, for free; you can improve the model only over time, and never to perfection.
  • The public sector has more reason to begin with reversible use cases, not less: its irreversible errors touch citizens' rights and spend public trust, the least refundable cost there is. Singapore's government sandbox scored its use cases on risk and benefit — with reversibility and redress among the risk factors — and ran its one high-risk, citizen-facing case only on mock data in a simulated environment, behind the heaviest controls of the three.
  • A sound first use case is reversible, bounded, measurable, valuable, and error-tolerant — and "real but recoverable," never so trivial it teaches nothing. And you are choosing a sequence: a ladder whose first rung earns the capability and trust to climb toward the flagship.

Questions to ask your organisation

  • For our proposed first agent: if it acts in error, can we undo the action and make the affected party whole? If the honest answer is no, why is this our first use case rather than a later one?
  • Are we choosing this use case because it is the safest valuable place to start, or because it is the most impressive thing to show? Who is the audience we are designing for — the citizen, or the announcement?
  • Can we state how we would know this agent is working well? If we cannot measure it, we cannot trust it into production — so is it the right place to begin?
  • Have we scoped the agent's goal as narrowly as it can be while still delivering value, and granted it the fewest tools and the least data the job requires? What have we declined to give it?
  • Have we drawn the ladder — named the sequence of use cases from this first rung up to the flagship — so that each step is survivable and each one makes the next safer? Or are we treating this as a single bet?

Chapter 5 — "Evaluation I: Foundations" — opens the most important and most skipped capability in the method. It explains why evaluating an agent differs from testing conventional software or even a single model, why a confident demo is not evidence, and what it means to measure whether an agent that reasons and acts is good enough to trust. It is the hinge on which the book turns.

Chapter 5: Evaluation I — Foundations

What this chapter equips you to decide - Why a confident demonstration is not evidence that an agent is fit for production — and why approving on the strength of a demo is the most common expensive mistake leaders make. - Why evaluating an agent is a different discipline from testing the software you already run, or even from testing a single AI model. - What it means to measure whether an agent is "good enough to trust," and why grading only its final answer hides its most dangerous failures. - Why evaluation is not a technical chore to delegate, but the instrument through which you govern an autonomous system.


The hinge of the book

Everything so far has been preparation. You understand why agents stall, what they are, what can go wrong, and how to choose a survivable first use case. Now comes the capability on which the transition from sandbox to production turns, and it is the one organisations most often underbuild: evaluation.

The claim is this: Trust in an autonomous system is not granted on the strength of a demonstration. It is earned through evaluation. The gap that Chapter 1 described — between an agent that works and an agent you can rely on — you close here or not at all. Every other safeguard in this book depends on it: you cannot deploy in stages without a way to measure each stage, you cannot monitor in production without knowing what good looks like, you cannot grant more autonomy without evidence that the agent has earned it. Evaluation is the hinge, and a programme that skimps on it has not built a cheaper version of the discipline; it has built no discipline at all.

The reason this needs saying to leaders is that the economics of attention run backwards. The demo gets the applause and the follow-on funding. The pilot gets the budget and the steering committee. Evaluation — unglamorous, invisible, producing no impressive output of its own — gets whatever is left, if anything. This is the wrong way round. When you are shipping autonomy, the evaluation harness is not overhead around the product; it is the product, because it is the only thing that converts a clever capability into something you can stand behind. The single most useful instinct a leader can develop about agentic AI is to be unmoved by demonstrations and to ask, every time: how do you know?

Why a demo is not evidence

Begin by disarming the demo, because it is the thing most likely to lead a capable executive astray.

A demonstration is a curated success. Someone chose the inputs, ran the agent, and showed you a case where it worked — and set aside the cases where it did not. This is not dishonesty; it is the nature of demonstrations. But it means a demo can tell you only one thing: that the agent can succeed on at least some inputs. It cannot tell you how often it succeeds across the messy distribution of real inputs. It cannot tell you how it fails when it fails: in quiet, recoverable ways, or confident, catastrophic ones. And it cannot tell you what it does on the inputs nobody thought to try, which in production will arrive by the thousand. A demo is an anecdote. Production decisions require statistics, and an anecdote is not a small statistic; it is a different kind of thing.

This is the same trap Chapter 1 named in a different guise. The demo proves capability, and capability was never the bottleneck. The bottleneck is warranted trust — knowing, with evidence, that the agent will hold up across the full range of what it will face, and that when it misbehaves, it will do so in ways you can live with. No demonstration, however impressive, supplies that. Only evaluation does.

Why evaluating an agent is a different discipline

Leaders reach for evaluation with two existing mental models, and both mislead, because an agent is unlike both of the things those models were built for.

The first model is software testing. For decades we have assured software by specifying what it should do and checking that it does so: the same input produces the same output, every time, and a test that passes today passes tomorrow. None of that holds for an agent. It is non-deterministic — the same situation can produce different behaviour on different runs — so a single success, or even a hundred, does not guarantee the next one. And you cannot specify its behaviour in full in advance, because it decides its own steps; there is no finite list of code paths to check. "We tested it and it passed" carried a guarantee for conventional software. For an agent it carries only a probability, and an unknown one until you have measured it.

The second model is model evaluation — testing the underlying AI. This is closer, but it asks too small a question. Model evaluation asks: given this input, was the output good? One input, one output, judged. But an agent does not produce one output; it takes a sequence of actions — consulting this source, calling that tool, deciding to take another step or to stop — and only at the end produces a result. Judging only that final result is the error at the centre of most failed agent programmes, and it fails in two directions at once. It passes agents that reached a right answer by luck or by wrong reasoning, which will not hold when conditions shift. And it gives you no way to find where an agent went wrong when it did, because all you measured was the destination, not the route.

The discipline that fits an agent, then, is not testing-the-output. It is evaluating the behaviour — the whole sequence of reasoning and action — across a representative range of real situations, often enough to speak of how it behaves in general rather than whether it behaved once.

The three layers: grade the working, not just the answer

The industry has converged on evaluating an agent at three layers. An analogy captures why a production system needs all three. Grading an agent only on its final answer is like grading a mathematics exam on the last line of the working. You will pass the student who copied the right number and fail to notice the one who reached it through reasoning that will collapse on the next, harder question.

The three layers are:

  • Final-response evaluation asks whether the end result was correct. This is the outcome — the answer delivered, the action taken. It is necessary, and it is never sufficient on its own.
  • Trajectory evaluation asks whether the agent took a sensible path to that result — whether it consulted the right sources, used the right tools in a reasonable order, and avoided wasteful or dangerous detours. This is the layer most organisations omit, and it is where most agent failures live: an agent can produce an acceptable-looking answer while having reasoned its way there through a route that is wrong, unsafe, or about to fail.
  • Single-step evaluation isolates one decision in the sequence — did the agent choose the right tool, with the right inputs, at this point — so that when something breaks, you can locate where, rather than knowing only that the overall result was poor.

The executive lesson sits in the middle layer. An agent that is right for the wrong reasons is not a success with a cosmetic flaw; it is a failure that has not happened yet. It will hold on the cases that resemble the demo and give way on the ones that do not, and because its answers look right, the giving-way will be invisible until it is expensive. This is also where the trust inversion from Chapter 3 does its quiet damage: an overloaded officer reviewing a fluent, plausible, badly reasoned output will approve it, because nothing on its surface says "wrong." Only trajectory evaluation — looking at how it reached the answer — catches this class of failure before it ships. A programme that measures only final answers has chosen, without realising it, not to see its most dangerous errors.

What "good enough to trust" means

Notice that the question is not "is the agent accurate?" Accuracy is a single number, and a single number is the wrong shape for the decision you have to make. The real question is: is this agent good enough, on the dimensions that matter for this use case, measured against a defined bar, that we can rely on it? That sentence folds in three things, and each is a decision a leader owns.

The first is which dimensions matter. For one use case the priority is factual correctness; for another it is consistency, or freedom from bias across different groups of citizens, or never taking a particular kind of irreversible action without escalating. You decide what you are measuring before you measure it, because measuring the wrong thing well is its own failure.

The second is how it fails, not just how often. Two agents that are each wrong five percent of the time can carry different risk, depending on whether their errors are small and self-correcting or rare and catastrophic. An agent whose failures are safe — caught by a checkpoint, reversible, never affecting a consequential decision — can be trusted at a level of raw accuracy that would be unacceptable in an agent whose failures are silent and severe. Evaluating the shape of failure is as important as counting it, and it connects to the reversibility you chose for in Chapter 4.

The third is the bar — good enough compared to what? The instinct is to demand perfection, but perfection is the wrong and often paralysing standard. The right bar is usually the alternative you have: the existing human process the agent would augment or replace. An agent that is more accurate, more consistent, and faster than the overstretched manual process it relieves can be an improvement even while imperfect — provided its failures are survivable. For the public sector this framing matters more, because the existing process is rarely flawless either, and "is the agent better than what citizens experience today, and does it fail in ways we can defend?" is a more honest and more useful question than "is the agent perfect?" The discipline of measuring against the human baseline, rather than against an impossible ideal, is what makes evaluation a tool for decision rather than an excuse for paralysis. (It is also what makes the staged, side-by-side deployment of Chapter 11 possible.)

Evaluation is how you govern, not a chore you delegate

It is tempting to file evaluation under "technical detail" — something the engineering team does, reports a number from, and moves on. That instinct is a governance failure, because evaluation is not a step within the work; it is the instrument through which you exercise control over the work.

Consider what the evaluation capability gives the person accountable for an agent. It is the evidence base for every go/no-go decision you make — the difference between approving a deployment because the demo impressed you and approving it because the measurements cleared a bar you set. It is your audit trail of quality over time, the thing you point to when a board member, an auditor, or a parliamentary committee asks how you know the system works and works fairly. It is your early-warning system, the only mechanism that will tell you the agent's behaviour is degrading before a citizen does. And it is the lever by which you grant more autonomy, because each step up the autonomy dial from Chapter 2 should be earned with evidence from the layer below. Strip evaluation away and you have not lost a quality check; you have lost your ability to govern the system at all. You are left to govern by hope, which is to say, not to govern.

Carry this reframe forward. You cannot govern what you cannot measure. Chapter 2 said you cannot govern what you cannot picture; evaluation is how the picture acquires numbers, and numbers are what let you decide, defend, and intervene. The next chapter turns this conviction into machinery — how to build an evaluation harness, how to score open-ended agent outputs at scale, and how to keep the apparatus honest. But the machinery only matters if the conviction lands first: that the unglamorous, invisible, demo-less work of evaluation is not the overhead of shipping an agent. When you are shipping autonomy, it is the main event.


A worked decision

Recall the statutory board's drafting agent from the previous chapter — the one that proposes replies to citizen enquiries for an officer to review and send. Your team brings it back with an impressive demonstration and a request: let us scale it from the pilot team to the division. The demo is good; every drafted reply on screen is accurate and well-phrased.

The instinct is to approve. The discipline is to refuse the demo as evidence and ask the question: how do you know, across the real range of enquiries, how often it is right, and how it fails when it is wrong? Pressed, the team produces a number: on a sample, reviewers judged ninety percent of drafts "correct." That sounds strong — until you ask what they measured. They measured the final answer: did the reply look right? No one had looked at the trajectory — at which policy provisions the agent retrieved and reasoned from to get there.

You commission that, and it changes the picture. A meaningful share of the "correct-looking" drafts had arrived at the right-sounding answer by citing the wrong provision — a plausible reply, reasoned from the wrong rule, that happened to land near the right place for the common cases and would diverge for the unusual ones. An officer skimming a fluent draft would approve it; the error would surface only when an atypical case produced a confidently wrong reply, grounded in the wrong rule, to a citizen. The final-answer metric had hidden the board's most dangerous failure mode.

The decision is not to abandon the agent. It is to refuse to scale on a demo or on outcome-only numbers, and to require evaluation at the trajectory layer — measured against the standard of what officers themselves achieve — as the condition for expansion. That is the difference between governing this rollout by impression and governing it by evidence, and it is a decision only the person accountable for the citizens on the other end can make.


Key takeaways

  • Trust in an autonomous system is earned through evaluation, not granted on a demonstration. Evaluation is the hinge on which the transition from sandbox to production turns — and the capability organisations most often underbuild.
  • A demo is a curated anecdote: it proves the agent can succeed, never how often it will, how it fails, or what it does on the inputs no one chose. Production decisions need statistics, and "how do you know?" is the leader's essential question.
  • Agent evaluation is a different discipline from both software testing (agents are non-deterministic and you cannot specify them in full) and single-model evaluation (an agent takes a sequence of actions, not one output). You must evaluate behaviour, not only output.
  • Evaluate at three layers — final response, trajectory, and single step. Grading only the final answer is grading the exam on its last line: it passes agents that are right for the wrong reasons, whose hidden failures are the most dangerous ones you carry.
  • "Good enough" is not a single accuracy number. It means good enough on the dimensions that matter, failing in ways you can survive, measured against a defined bar — usually the existing human process, not an impossible ideal.
  • Evaluation is not a technical chore to delegate. It is your evidence base for go/no-go decisions, your audit trail, your early-warning system, and the lever for granting autonomy. You cannot govern what you cannot measure.

Questions to ask your organisation

  • When someone shows us that an agent "works," is it a demonstration or an evaluation? If it is a demonstration, we have learned what the agent can do, not what it will do — so what do we know about how often it succeeds and how it fails?
  • Are we measuring only the agent's final answers, or also the path it took to reach them? If only the answers, what dangerous "right for the wrong reasons" failures might we be unable to see?
  • For this use case, what are we measuring — accuracy, consistency, fairness across groups, avoidance of specific harmful actions — and did we decide those dimensions before building the agent?
  • What is our bar, and what are we comparing it against? Are we holding the agent to an impossible standard of perfection, or to the real performance of the process it would replace — and do we know how that process performs?
  • Could we, today, produce evidence to an auditor or a minister of how well this agent works and how it fails? If not, we are governing it by hope — so what would it take to govern it by evidence instead?

Chapter 6 — "Evaluation II: Building the Harness" — turns this conviction into machinery. It covers how to assemble the test cases that represent reality, how to score open-ended outputs at scale without a human reading every one (and how to keep that scoring honest), and how to turn evaluation into a gate that makes change to an agent safe rather than a blind bet. It is the how to this chapter's why.

Chapter 6: Evaluation II — Building the Harness

What this chapter equips you to decide - How to tell real evaluation from evaluation theatre — reassuring numbers that rest on nothing — when your teams report that an agent is performing well. - Why the unglamorous work of assembling realistic test cases is the load-bearing investment, and why funding it feels like "not building the agent" but is the opposite. - Why an automated judge no one has checked against your own experts is worse than no measurement — and why insisting on that check is a decision only you can force. - Why no change to an agent should reach the public without clearing a defined bar, and how to make that a rule rather than a hope.


From conviction to machinery — and the danger to watch for

The last chapter argued that trust in an agent is earned through evaluation. This chapter builds the machinery that earns it — and warns of a danger that appears the moment you start asking for evidence.

The danger is not that teams refuse to evaluate. Ask for evidence and you will get it: a dashboard, a pass rate, a confident number. The danger is that the number rests on nothing.

An evaluation harness has three load-bearing parts, and each can be built well or faked. The fakes produce the same reassuring figure:

  • A test set that is only the demo cases.
  • A machine grading the machine, with no one checking the grader.
  • An evaluation that runs but never stops a bad change from shipping.

Each of these produces numbers. None of them produces trust. This is evaluation theatre — the performance of measurement without the substance. It is more dangerous than no evaluation. No evaluation leaves you uncertain; theatre leaves you confidently wrong.

So your task in this chapter is not to build the harness. It is to know its three parts well enough to tell whether each is real — and to insist on the rigour that teams drop under deadline pressure.

Pillar one: the dataset — your definition of "good," written down

Evaluation rests on a set of test cases: situations the agent might face, each paired with what a good outcome looks like. This sounds mundane. It is the most valuable and most skipped asset in the method, because it is your organisation's definition of "good," written down in a form a machine can check.

Two things separate a real dataset from a theatrical one.

The first is that it must look like reality, not like the demo. The temptation is to fill the test set with cases the agent handles well — the clean, common, central ones. They are easy to find and they produce a flattering score. But the cases that matter most are the ones where the agent fails without anyone catching it: the unusual, the ambiguous, the adversarial. For the public sector, add two more — the cases that test fairness across different groups of citizens, and the statutory exceptions the common path never touches. A dataset that leaves these out is not a smaller good dataset. It is a different thing that happens to produce a number. The discipline is to hunt down the hard cases and put them in, so the score reflects the reality the agent will meet rather than the reality you would prefer.

The second is that, for an agent, each case should capture not only the right answer but — for the cases that matter — the right path: which sources it should consult, which tools it should use. This is what makes the three-layer evaluation of the last chapter possible. Without expected paths in your dataset, you can only grade final answers, and you have built in the blindness to "right for the wrong reasons" from the start.

Now the part a leader must own, because it decides whether the dataset gets built. This work is slow, expensive, and looks like a distraction. Assembling realistic cases takes the time of your most knowledgeable people — the very people everyone wants building the agent instead. Measure progress only by visible features and you will starve it, and the programme will run on a flattering, hollow dataset. Treat the dataset as the asset it is, and you fund it on purpose.

For the public sector there is a second payoff. A good dataset — one that encodes your fairness and statutory requirements — is also the evidence you hand an auditor when they ask how you know the system performs, and performs equitably. It does double duty: your conscience and your defence.

Pillar two: scoring at scale — the judge you have checked

A realistic dataset creates a problem. With thousands of test cases, many of them open-ended — a drafted reply, a summary, a recommendation, where several phrasings are fine and no single answer matches — who reads them all? A human cannot, at the scale and frequency real evaluation needs.

The common answer is to use a capable AI model as a judge. It receives the input, the agent's output, your criteria, and any ground truth, and returns a score with its reasoning. This is what lets evaluation run at the speed of production rather than the speed of a reviewer.

It is also where the most seductive theatre lives. An automated judge is a measuring instrument. An instrument you have never checked against a known reference does not give you a measurement — it gives you a confident number of unknown relationship to the truth. A thermometer you have never calibrated still shows a temperature.

So the non-negotiable step is calibration. Take a sample. Have your experts judge those cases. Compare the machine's verdicts to theirs. Measure how often they agree. Feed the disagreements back to sharpen the criteria, and then keep tracking that agreement over time. A judge that agrees with your experts is a real instrument for scaling their judgement. A judge no one has checked is a second unverified model with an opinion — and building your evidence on it means you have automated the production of false confidence.

This dissolves a common worry. Using a machine to judge a machine sounds like taking humans out of the loop. It is the opposite. Done right, it scales human judgement rather than replacing it: the experts set and calibrate the standard, and the machine applies it at a volume no human could match. But "calibrate" carries that sentence, and it is the first step to vanish under deadline pressure. Your job is to make it non-negotiable. No judge earns trust until someone can tell you how well it agrees with your own people, and how they know.

Pillar three: the gate — making change safe instead of a blind bet

An agent is never finished. You will change its instructions, swap in a new model, add a tool, adjust its access. Every change can break something that worked. And because the agent is non-deterministic, you cannot read the damage off the code. You will not know unless you measure.

An organisation with no way to catch this ends up in one of two bad places. Either it is too afraid to touch a working agent, so the agent rots as the world moves on. Or it changes the agent without care, so it breaks in production in ways no one saw coming.

The gate fixes this. Before any change reaches the public, you re-run the full evaluation set. The change ships only if nothing has regressed below your bar. This turns change from a blind bet into an evidenced decision — the agent version of an old engineering rule: no code ships without passing its tests. It is also where the bar from the last chapter becomes a number the gate enforces rather than a principle you hope holds.

This decision is yours, because it is a policy, not a technique. A gate the team can wave a change past "just this once," under pressure, on a Friday, is not a gate. Making "clears the evaluation gate" a true condition of release — like a financial control or a safety sign-off the deadline cannot override — only holds if the person with authority insists it holds.

For a public body, the payoff is accountability you can show. The gate is an auditable control: evidence that no change reached a citizen without first clearing a defined, recorded standard. That is a sentence you want to be able to say to an oversight committee, and the gate is what makes it true.

The harness is an asset that compounds

Step back and look at what the three pillars produce together. A dataset that says what "good" means in your context. A calibrated instrument that applies that standard at scale. A gate that lets the standard govern every change.

This is not a test you pass once on the way to launch. It is a permanent capability — and, built well, an appreciating one. The dataset grows richer. The judge grows better calibrated. The gate grows more trusted. The cost of changing and improving your agent falls over time instead of rising. An organisation with this harness can move faster than one without it, not slower, because it can change its agent without fear.

But one thing is still missing, and it is the richest source of test cases you will ever have. Everything here concerns the cases you thought to assemble — your best guess at reality. In production, the agent meets reality itself: thousands of real situations, including ones no one imagined. The most valuable evaluation asset you own is the record of what your agent did, and the loop that turns yesterday's failures into tomorrow's test cases. That loop is the next chapter.


A worked decision

The statutory board's drafting agent returns once more. Having taken the trajectory lesson of the last chapter to heart, the team has built an evaluation harness and reports a strong result: a 92% pass rate — a freshly built harness number, not last chapter's reviewer-judged ninety percent. The recommendation is to scale.

Ninety-two percent of what, you ask — and you walk the three pillars.

First, the dataset. What is in it? A few hundred cases drawn from the pilot's smooth examples: the clean, common enquiries. The wrong-provision failures you uncovered last chapter, the atypical and statutory-edge cases, the tests of consistency across different applicants — absent. The 92% measures the agent on the cases it was always going to pass.

Second, the judge. Who decided each draft was "correct"? An automated model, scoring against a brief rubric. Has anyone checked that its verdicts match what your experienced officers would say? No one has. The grader is itself ungraded.

Third, the gate. When the team updated the agent's instructions last month, did a failing evaluation stop that change? It did not. The team runs and reports evaluations; releases proceed regardless.

All three pillars are hollow, so the 92% is not evidence. It is theatre, and acting on it would mean governing by a number that means nothing. The decision is not to punish the team, who have done what under-resourced teams do. It is to refuse the rollout until the harness is real: a dataset rebuilt to include the hard, edge, and fairness cases; a judge calibrated against your own officers, with a tracked agreement rate; and clearing the gate made a condition of every release. Only then does a pass rate become something you can stake a citizen's experience — and your own accountability — on.


Key takeaways

  • The risk is not that teams refuse to evaluate. It is evaluation theatre — reassuring numbers resting on a hollow harness. Theatre is worse than no evaluation, because it replaces uncertainty with false confidence.
  • Pillar one is a dataset that encodes your definition of "good." It must look like reality — including the hard, edge, adversarial, and fairness cases — not like the demo, and it must capture expected paths, not only answers. Starving it is the most common way the effort goes hollow.
  • Pillar two is scoring at scale with an automated judge, which counts only if you have calibrated it against your own experts, with a tracked agreement rate. An unchecked judge is a second unverified model with an opinion. Done right, it scales human judgement rather than replacing it.
  • Pillar three is the gate: no change reaches the public without clearing a defined bar. This turns change from a blind bet into an evidenced decision, and it is a policy only the accountable leader can make stick. A gate that can be skipped under pressure is not a gate.
  • Built well, the harness is a compounding asset: it lets you change and improve an agent faster, not slower, because you can do so without fear. For a public body it is also an auditable control — proof that nothing reached a citizen without clearing a recorded standard.

Questions to ask your organisation

  • When someone shows us a pass rate, can we answer "of what?" Does the test set include the hard, unusual, and fairness cases where the agent is likeliest to fail unnoticed — or is it a flattering collection of cases the agent was always going to pass?
  • Who, or what, is grading the agent's outputs? If it is an automated judge, has anyone checked how often it agrees with our own experts, and do we track that over time?
  • When we last changed this agent, did anything stop the change from shipping if it had degraded quality — or do we only report evaluations while releases proceed? Is the gate real?
  • Are we resourcing the dataset work with our knowledgeable people's time, as the load-bearing investment it is — or do we treat it as a distraction from "real" progress?
  • Could we hand an auditor our evaluation dataset and results as evidence that this agent performs, and performs equitably, across the cases that matter? If not, what is missing?

Chapter 7 — "Evaluation III: Online and Continuous" — completes the evaluation arc. It moves from the cases you imagined to the reality the agent meets: scoring live behaviour, detecting the slow drift that frozen tests will never catch, and the failure-mining flywheel that turns every real-world mistake into a permanent test case — the loop that makes an agent get safer the longer it runs, instead of more dangerous.

Chapter 7: Evaluation III — Online and Continuous

What this chapter equips you to decide - Why a launched agent is not a finished agent, and why "we evaluated it before launch" is not enough. - What drift is — the silent decay that frozen tests will never catch — and why it is the failure you will not notice until a citizen does. - Why you must keep measuring an agent's live behaviour, and why cutting that spend once an agent is "working" is a false economy. - How the failure-mining flywheel makes a watched agent get safer the longer it runs, instead of more dangerous.


Launch is not the finish line

The last two chapters built an evaluation harness and used it to decide whether an agent is fit to deploy. It would be natural to think the evaluation work is now done. It is not. It has barely started.

Offline evaluation tests your imagination of reality. Production tests reality.

The harness from Chapter 6 is only as good as the cases you thought to put in it. However diligent your team, that dataset is a best guess at what the agent will face. Production is not a guess. It is the real thing — thousands of real situations, including the ones no one imagined, arriving every day. The cases that matter most are often the ones you could not have foreseen, and the only place they appear is in live use.

So launch is not the finish line. It is where real evaluation begins. A launched agent is not a finished agent, and treating it as one is a costly mistake.

A launched agent does not stay still

Even if your agent were perfect on launch day, it would not stay perfect. Three things change underneath it, and none of them announces itself.

The inputs change. New kinds of enquiries appear. Patterns shift with the season, with the news, with a policy announcement. People learn how the agent behaves and start to use it in new ways — sometimes to test its limits.

The model can change. If your agent runs on a model supplied by a vendor, that model may be updated beneath you. An update meant to improve the model in general can change how it behaves on your specific task.

The world changes. Policies change. Rules change. Facts that were true when you launched stop being true. The agent keeps reasoning as it always did, now against a world that has moved.

A frozen offline test cannot see any of this. It checks the same fixed cases and reports the same reassuring pass. Meanwhile the ground has shifted, and the agent faces questions your test set never contained. This is why evaluation cannot be a gate you pass once. It has to be something you keep doing.

Drift: the failure with no alarm

The danger that follows from all this has a name: drift. It is the slow, quiet decay of an agent's quality over time.

Drift is dangerous because it is gradual. There is no crash, no error message, no incident. The agent still works. It works a little worse this week than last, and a little worse again the week after. Nothing trips an alarm, because nothing has visibly broken. Then one day the agent is far worse than the day you launched it — and no one ever decided that should happen.

The reason a leader must take drift seriously is that the usual signals will not catch it. You might assume that if the agent got worse, complaints would rise. They often will not. A citizen who receives a slightly wrong reply does not know it is wrong. They act on it. The error surfaces later, somewhere else, disconnected from its cause. By the time it reaches you as a complaint or an incident, the drift has been running for months.

Drift is the failure you will not notice until a citizen does — unless you are watching.

Online evaluation: watching live behaviour

Online evaluation is how you watch. The idea is simple, and it reuses what you already built.

Take the calibrated judge from Chapter 6 — the one you checked against your own experts. Point it at a sample of the agent's real production traffic, scoring live behaviour as it happens. Set a quality threshold. When the scores start to slip below it, you get an alert.

That is the shift that matters. Without this, you are waiting for a complaint or an incident to tell you something is wrong — which means waiting until the damage is done and out in the world. With it, you are measuring the agent's behaviour as it runs, and you get an early warning when quality starts to fall. You find out before the citizen does.

For the leader, the decision here is about money and will, not technique. Online evaluation costs something to run, and it runs forever. The temptation, once an agent has been live and quiet for a few months, is to ask why you are still paying to watch something that "works." That instinct is the false economy. The agent is quiet in part because you are watching it; stop, and you have chosen not to know when it starts to drift. A launched agent without live measurement is not a finished agent. It is an agent you have stopped watching.

One dependency matters. You can only evaluate live behaviour if you recorded it — every input, every step, every action, captured as you go. You cannot replay or learn from what you did not capture. The machinery for that recording is the subject of Chapter 12; for now, hold it as a precondition. No record, no online evaluation.

The flywheel: how a watched agent gets safer

Now the most important idea in the evaluation arc, and the most hopeful one.

Most things you ship decay after launch. Software rots as the world moves around it. A watched agent does the opposite. It gets safer the longer it runs — if you build one loop.

The loop works like this. Production surfaces real failures: a low score from your online judge, a complaint, an officer catching a bad reply, a near-miss. You investigate each one and turn it into a new test case in your offline dataset — the real input, the right answer, the right path. That case now lives in the harness permanently. From then on, the gate checks every future change against it.

Follow the consequence. Once a failure becomes a test case, that failure cannot recur unnoticed. The agent might fail in a new way, but it cannot fail in that way again without the gate catching it. Each real-world mistake becomes a permanent immunisation. The agent stops failing the same way twice.

And the dataset itself transforms. It began as your imagination of reality — your best guess at the cases that mattered. Mistake by mistake, it becomes a record of reality itself. Over time, the gap between the cases you imagined and the cases the agent meets closes. Your evaluation comes to measure what does happen, not what you thought would.

This is the loop that justifies the last two chapters. It is also what makes a launched agent an asset that compounds rather than a liability that decays. But it does not turn on its own. You have to fund someone to mine the failures, investigate them, and feed them back. If no one owns the flywheel, it does not spin, and the agent rots like everything else. For a public body there is a further reward for keeping it spinning: the difference between assuring an oversight committee that the agent worked on the day you launched it and showing that it stays assured, with every real failure captured and closed.

Closing the evaluation arc

Three chapters on evaluation, and this is the sum of it. Trust is earned through measurement, not granted on a demo. The harness rests on a realistic dataset, a calibrated judge, and an enforced gate. And the work does not stop at launch — you watch live behaviour, you hunt for drift, and you feed every real failure back so the agent grows safer with use.

Evaluation tells you how well the agent behaves. It does not, by itself, limit what the agent is able to do. And when an agent misbehaves — which, however well you evaluate, it eventually will — the size of the damage depends not on how well you measured it but on the bounds you placed on it. That is the next stretch of the method: not judging the agent's behaviour, but limiting its reach, so that a failure is survivable by design. It begins, in the next chapter, with the agent's identity and the permissions you grant it.


A worked decision

The statutory board's drafting agent has been live for two quarters. It has been quiet — no incidents, no spike in complaints. At a review, someone proposes scaling back the "ongoing evaluation" line in the budget. The agent works; why keep paying to watch it?

Before deciding, you look at what the online evaluation has been recording. The picture is not the calm one the absence of complaints suggested. The judge's scores have slipped over the quarter — a few points a month, no single drop large enough to notice, adding up to a meaningful decline. The cause, once traced, is mundane: someone updated a policy two months ago, and no one updated the agent with it. It has been citing the old rule ever since — fluent, confident, and wrong.

No citizen complained, because no citizen knew. The replies looked right. Officers, trusting a tool that had been reliable for months, had been waving them through. The only reason you know about it is that you were still watching — the very spend someone just proposed to cut.

The decision makes itself. You do not cut the evaluation budget; you have just seen what it bought. You update the agent for the policy change, and you do one more thing: you turn this failure into a permanent test case, so that "agent kept using a superseded policy" can never again go unnoticed. The drift that nearly became an incident becomes one more case the agent cannot fail again unnoticed. That is the flywheel doing its job — but only because someone was still turning it.


Key takeaways

  • Offline evaluation tests your imagination of reality; production tests reality. Launch is not the finish line — it is where real evaluation begins, and a launched agent is not a finished agent.
  • An agent does not stay still after launch. Its inputs change, the underlying model can change, and the world it reasons about changes. A frozen test cannot see any of it.
  • Drift is the slow, silent decay of quality with no alarm. Complaints will not catch it, because citizens who get a slightly wrong answer do not know it is wrong. It is the failure you will not notice until a citizen does — unless you are watching.
  • Online evaluation points your calibrated judge at live traffic and warns you when quality slips. It costs money and runs forever, and cutting it once an agent is "working" is a false economy: a launched agent without live measurement is one you have stopped watching.
  • The failure-mining flywheel turns every real failure into a permanent test case. The agent stops failing the same way twice, and the dataset shifts from your imagination of reality to reality itself. A watched agent gets safer the longer it runs — but only if you fund someone to keep the loop turning.

Questions to ask your organisation

  • Are we measuring this agent's behaviour in production, or did our evaluation stop at launch? If it stopped, how would we know the agent has started to drift?
  • If the agent's quality had declined over the last three months, what would have told us — an alert from live measurement, or a citizen complaint after the damage was done?
  • When a vendor updates the underlying model, or a policy changes, what process re-checks that the agent still behaves as intended? Or do those changes reach citizens unexamined?
  • When the agent fails or has a near-miss in production, what happens to that failure? Do we fix it and forget it, or capture it as a permanent test case so it cannot recur unnoticed?
  • Who owns the flywheel — whose job is it to mine real failures and feed them back into the harness? If the honest answer is "no one," the loop is not turning, and the agent is decaying like any unwatched system.

Part II now turns from judging the agent to bounding it. Chapter 8 — "Identity, Access, and Least Privilege" — is the first of the control chapters. Evaluation tells you how well an agent behaves; it does nothing to limit the damage when it misbehaves. What the agent can reach and do sets the size of that damage — a decision about identity and permissions, and one most organisations get wrong.

Chapter 8: Identity, Access, and Least Privilege

What this chapter equips you to decide - Why the damage an agent can do is set by what it can reach, not by how well it behaves — and why that makes access a decision you take before failure, not after. - Why an agent needs its own identity, and why letting it borrow a human's login or a shared key is the quiet mistake most organisations are making. - What "least privilege" means for an agent, and why granting it broad access for convenience can undo the safety you designed in. - Why governing agent identities is becoming a condition of doing business, not just good practice. - Why "we don't restrict our people this way" is not the objection it first seems — and what an agent borrows from, and lacks beside, a human worker.


From judging the agent to bounding it

The evaluation chapters were about how well an agent behaves. This one starts from a hard fact: however well you evaluate it, the agent will eventually misbehave. It will be hijacked, or it will drift, or it will be confidently wrong on a case you never tested.

When that happens, the size of the damage will not depend on how well you measured the agent. It will depend on what the agent can reach and do. An agent that can only read a narrow slice of data and draft a reply can do little harm when it fails. An agent that can alter records, move money, or email the outside world can do a great deal.

The principle, then: an agent's blast radius is set by its access, not its intentions. You cannot guarantee good behaviour. You can decide, in advance, how much a lapse is allowed to cost. And you decide it by controlling what the agent is — and is not — allowed to touch.

This is the cheapest and most reliable control in the book, because it is a capability you never grant rather than a risk you have to manage. It is also the one most often given away for convenience.

Agents are a new kind of actor — and we are handing them the wrong keys

An agent in production is a new actor on your systems. It logs in, reads data, calls tools, and takes actions. The question of who it is, and what it is allowed to do, is an identity question — and most organisations are getting it wrong, because they are answering it the easy way.

The easy way is to wire the agent up with credentials that already exist. A human employee's login. A shared service account that other systems use. A broad, long-lived key that opens many doors at once. This is fast to build and it works in the demo. It is also dangerous, for a reason specific to agents.

A human's credentials assume a human's judgement and a human's pace behind them. A shared service account assumes predictable, pre-written software. An agent is neither. It decides its own actions at runtime, and — as Chapter 3 showed — it can be talked into the wrong action by the content it reads. Give that agent a broad, borrowed key, and a single hijack or a single drift now has the run of everything that key unlocks. You have handed an unpredictable actor the master keys, because issuing it a narrow set of its own was more work.

But do we treat a new hire this way?

A fair objection arrives here. We do not hold our own people to strict least privilege. We give employees broad latitude, far more access than the bare minimum, and we trust them to use judgement about the rest. If we don't lock down humans this way, why lock down agents?

The answer is plain: an agent is not a feature release. It is a new hire. And the right comparison is not a trusted veteran of fifteen years — it is that new hire on their first day.

We do not hand a new hire the master keys on day one. We scope their access to their role, we supervise them, and we widen their latitude as they earn trust. Least privilege for an agent is not a strange new burden. It is onboarding.

But there is a deeper difference, and it is the heart of the matter. Even on day one, we extend a human more than the strict minimum — on credit, against collateral the agent cannot post. A person brings judgement: they know an action is wrong even when the system would permit it. They bring accountability that bites: a career, a conscience, a professional standing they can lose. And they bring resistance to manipulation: you cannot talk a clerk into wiring out the reserves with a clever letter, but an agent, as Chapter 3 showed, can be hijacked by the very content it reads.

So weigh what makes an action safe. For a human, it is access plus judgement plus accountability — and the last two do most of the work, which is why the access can be loose. For an agent, you cannot supply judgement, and you cannot make accountability bite it. Access is the only lever you hold. The boundary must therefore carry, by itself, the whole load that judgement and accountability carry for a person.

That is why an agent's access is tighter than a colleague's. It is not a harsher standard. It is the same standard, doing three jobs instead of one. We give people latitude because we can hold them responsible. We must give agents boundaries because we cannot.

The four-part discipline

The discipline that follows has four parts: give each agent its own identity, grant it the least it needs, make its keys narrow and short-lived, and govern those identities over their whole life.

Give each agent its own identity

Every agent should be a distinct, verifiable actor — its own identity, not a borrowed or shared one. This matters for two plain reasons.

The first is attribution. When something goes wrong, the first question is which agent did this, and on whose authority — the accountability question from Chapter 3, and the one an auditor will ask. If three agents share one set of credentials, you cannot answer it. The action is traceable to the shared account and no further. Per-agent identity is what lets you trace any action back to a specific agent and the human who authorised its capability.

The second is containment. A shared credential is a shared blast radius: compromise it, or let one agent drift, and everything that credential touches is exposed at once. A unique, scoped identity per agent keeps a failure contained to that one agent's narrow remit. The fire does not spread, because the agents are not all holding the same key.

The rule is therefore short. No shared keys. No borrowed human logins. Each agent is its own actor, and you can name what each one did. None of this is novel: we already refuse to let employees share a single login, for this reason. Per-agent identity extends to our newest workers a rule we accepted long ago for the rest.

Grant the least it needs

Once an agent has its own identity, the next decision is what that identity is allowed to do. The answer is: the least the job requires, and nothing more.

This is least privilege, and for an agent it is not a tidiness preference — it is the main lever on blast radius. Every tool you connect, every dataset you open, every action you permit is a door the agent can be made to walk through, including in ways you did not intend. A permission the agent does not need is pure risk: it adds nothing to the value and everything to the danger.

The evidence that organisations get this wrong is not subtle. In SailPoint's 2025 survey of 353 IT and security professionals, AI Agents: The New Attack Surface, around 80% of organisations reported that their agents had already taken unintended actions, reaching unauthorised systems and sensitive data no one meant them to touch — the same scope finding Chapter 3 flagged on the risk surface. And IBM's 2025 Cost of a Data Breach report found the same root cause in the field: of the organisations that suffered an AI-related breach, 97% lacked proper AI access controls. These are not exotic attacks. They are agents doing more than they were meant to, because the access was there and the boundary was not.

The leader's question for every agent is the one from Chapter 4, now applied to permissions: what is the narrowest set of tools and data this agent needs to do its job — and what have we chosen not to give it? If the honest answer is "we gave it broad access because that was what existed," that is a finding to act on, not an answer to accept.

Make the keys narrow and short-lived

Even the right permissions become a liability if the key that grants them is broad and permanent. Two further moves shrink the risk.

The first is to make credentials short-lived and just-in-time. Rather than handing the agent a standing master key, issue narrow, time-bound credentials for the task at hand, and re-check at each tool and action rather than once at the start. The longer and broader a credential lives, the larger the window in which a hijacked or drifting agent can misuse it. Short, scoped, expiring credentials close that window.

The second, and the strongest pattern, is to keep the secret out of the agent's hands entirely. Instead of the agent holding credentials, it acts through a mediation layer that holds them and enforces the rules. The Model Context Protocol is increasingly used this way: the agent asks the mediation point to act, and that point applies your access policy, performs the action, and logs it. The agent never sees the key.

This is where an AI-native foundation pays off. If agents reach your systems through well-defined, mediated interfaces rather than scattered credentials, you have a single place to enforce who can do what, to watch it happen, and to switch it off. That single control point is worth far more than any number of instructions to the agent, because it is a boundary the agent cannot talk its way around.

Govern agent identities for their whole life

An agent identity is not something you grant once and forget. Like any other identity in your organisation, it needs an owner, periodic review of what it can do, and revocation when it is no longer needed.

This matters more as you scale. The first agent is easy to track. The fiftieth is not. In many organisations, machine and agent identities already outnumber human ones, and the dangerous ones are the orphans — agents whose task ended but whose access did not, sitting idle with live permissions and no owner. Those are the accounts an attacker hunts for, and the ones no one is watching. An agent that finishes its job, or goes quiet, should lose its access automatically. It is the same discipline as collecting a departing employee's pass and closing their accounts — offboarding, applied to a worker who never had a desk.

Keeping order across many agent identities is itself a governance task, and a register of every agent — what it is, who owns it, what it may touch — is part of the operating model in Chapter 13. Provisioning an agent is not the end of the access decision. It is the start of an identity you have to manage for as long as it lives.

Why this is now a gate, not just good practice

There is a final reason for the leader to treat this as a first-order decision rather than a technical detail to delegate. Agent identity and access have moved from good practice to a condition of doing business.

Enterprise customers and security reviewers increasingly require it by name: each agent its own identity, least privilege, short-lived credentials, controls aligned to the recognised standards. Weak agent access governance is starting to block deals and accreditations the way a missing security certification did a few years ago. For a public-sector body the equivalent is your security accreditation and your auditor: "the agent runs on a shared account with broad access" is not a sentence that survives either.

So the bounding this chapter describes is not only how you keep a failure survivable. It is increasingly how you earn the right to deploy at all. Evaluation told you whether to trust the agent's behaviour. Identity and access decide how much that trust is allowed to cost when it is misplaced — which is the same job, approached from the other side. The next chapter takes up the human half of that job: where a person must stay in the loop, and how to make their oversight real rather than a rubber stamp.


A worked decision

The statutory board's drafting agent is ready to deploy. Reviewing how it has been built, you ask a question that had not come up before: what can it reach? The answer is that the team, to save time, connected it using an existing service account — one that already had broad read and write access across the whole case-management system, because that account was on hand and wiring up a new one would have taken time.

Stop and see what that means. You chose this use case in Chapter 4 because it was reversible: the agent drafts, an officer reviews and sends, and the officer catches a bad draft before it reaches anyone. But the account it was given can do far more than draft. It can alter citizen records. Through that account, a hijacked or drifting agent could change case data — an action that cannot be reversed. The careless access grant has undone the reversibility that made the use case safe. On paper you have a low-risk drafting agent; in fact you have an agent that can rewrite the record.

The decision is to refuse deployment until the access is fixed, and the fix follows the chapter. Give the agent its own identity, not the shared account. Grant it read-only access to the narrow slice of case and policy data it needs to draft a reply — and nothing else. No write access. No ability to send. Where possible, route even that read access through a mediation layer that holds the credential and logs every call. Now the agent's blast radius matches the use case you chose: it can produce a bad draft, which an officer catches, and it can do nothing worse. The reversibility is real again, because the access reflects it.


Key takeaways

  • An agent's blast radius is set by its access, not its intentions. You cannot guarantee good behaviour, but you can decide in advance how much a lapse is allowed to cost — by controlling what the agent can touch. It is the cheapest control in the book and the one most often given away for convenience.
  • The right mental model is the new hire, not the feature release — and the comparison is the new hire on day one, who gets scoped access and supervision, not the master keys. We extend people more latitude only because they post collateral an agent cannot: judgement, accountability that bites, and resistance to being manipulated. Access is the one lever you hold over an agent, so it must carry the load that judgement and accountability carry for a person.
  • Agents are a new kind of actor. Wiring them up with borrowed human logins or broad shared keys is fast and dangerous: it hands an unpredictable, hijackable actor the master keys.
  • Give each agent its own identity — for attribution (knowing which agent did what) and containment (a failure stays inside one narrow remit instead of spreading). No shared keys, no borrowed logins.
  • Grant the least it needs. Every extra tool, dataset, or action is pure risk. In SailPoint's 2025 survey around 80% of organisations said their agents had already taken unintended actions, and IBM's 2025 breach report found that 97% of organisations hit by an AI-related breach lacked proper access controls — agents doing more than meant because the boundary was not there.
  • Make keys narrow and short-lived, and where possible keep the secret out of the agent's hands entirely — acting through a mediation layer (such as MCP) that holds credentials, enforces policy, and logs every action. Then govern agent identities for life: owner, review, and automatic revocation when idle. This is now a condition of deployment, not just good practice.

Questions to ask your organisation

  • For each agent: does it have its own identity, or is it using a human's login or a shared service account? If shared, can we even say which agent took a given action?
  • Would we give a trusted human this same access? If yes, does this agent have what makes that safe in a person — judgement, accountability we can enforce, and resistance to being talked into the wrong action? If not, the access that is fine for the colleague is not fine for the agent.
  • What can this agent reach and do — every tool, dataset, and action? Is that the minimum the job requires, or is it broad because broad access was already lying around?
  • Could a hijacked or drifting version of this agent take an irreversible action — alter a record, move money, send something externally? If so, why does a reversible use case have irreversible reach?
  • Are the agent's credentials short-lived and scoped, or is it holding a broad, standing key? Could we route its access through a single mediation point where we can see and revoke it?
  • When an agent's task ends or it goes idle, is its access withdrawn automatically — or do we have orphaned agents sitting with live permissions and no owner? Who owns each agent we run?

Chapter 9 — "Human Oversight and Control" — takes up the human half of bounding an agent. It returns to the autonomy dial from Chapter 2 and asks where a person must stay in the decision, how to set that by risk and reversibility rather than by reflex, and — hardest of all — how to keep human oversight real, when the natural drift is toward a tired official rubber-stamping whatever a confident agent proposes.

Chapter 9: Human Oversight and Control

What this chapter equips you to decide - Where a human must stay in an agent's decisions, and where insisting on one adds cost without adding control. - How to set the level of oversight by risk and reversibility, action by action, rather than by reflex. - How to tell real oversight from oversight theatre — a checkpoint that has become a rubber stamp — and how to keep it real. - How to taper oversight as an agent earns trust, the way you would with a capable new hire, without ever letting the supervision become a signature given unread.


Oversight is a dial, not a switch

The last chapter bounded what an agent can reach. This one bounds it from the other side: where a person stays in the loop.

"Human in the loop" gets used as if it were a single setting — on or off, supervised or autonomous. It is not. As Chapter 2 set out, autonomy is a dial, and oversight is the same dial seen from the human's end. At one end, the agent only advises and a person acts. Then: the agent proposes and a human must approve before it acts. Then: the agent acts on its own while a human watches and can intervene. At the far end, the agent acts and review is occasional or after the fact.

The craft of oversight is knowing where to set that dial — and setting it is only half the job. The other half is making sure that whatever oversight you placed there is real.

Both extremes are expensive

It is tempting to treat oversight as a virtue you can never have too much of. More humans checking more things must be safer. It is not, and seeing why is the start of doing this well.

Put a human in front of everything and two things happen, both bad. You rebuild the manual process with an extra step bolted on — the agent does the work, then a person redoes the checking, and the efficiency you were chasing evaporates. And, worse, you train your people to stop looking. A reviewer who must approve hundreds of items a day, almost all of them fine, learns to approve without scrutiny. The oversight is there on paper and gone in practice.

Put a human nowhere, and you have the opposite problem: an agent taking consequential, irreversible actions with nothing between it and the world. That is the unbounded liability the book is trying to avoid.

So oversight is not something to maximise or minimise. It is something you place — heavy where a wrong action would be costly or hard to reverse, light where it would be cheap and recoverable. What decides which is which? The same two questions that chose your use case back in Chapter 4.

Set the dial by risk and reversibility

The level of oversight an action needs is a function of two things: how much harm a wrong action would do, and whether it can be undone. These are the same variables that chose your use case in Chapter 4, now applied one level down — to each action the agent can take.

The principle is plain. High-stakes, irreversible actions should require human approval before they happen — the agent proposes, a person authorises, then it acts. Low-stakes, reversible actions can proceed on their own, with review after the fact or on a sample, because if one is wrong you can catch it and put it right. Between those, the dial has middle settings: act-and-notify, act-under-watch, act-with-periodic-review.

The crucial move: you set this per action, not per agent. The same agent can sit at different points on the dial for different things it does. Our drafting agent needs no approval to look up a case file — trivial, reversible — but if it were ever allowed to alter that file, it should not be able to do so without a human in front of the action. Defining these points — the specific places where the agent must stop and get a human decision — is what oversight design is. Singapore's agentic governance framework makes the checkpoints at which human approval is required one of its core dimensions — they are what make a human meaningfully accountable for the agent, rather than nominally responsible for a black box.

That phrase — meaningfully accountable — is the hinge of this chapter, because it is what the next problem threatens.

The hard part: keeping oversight real

Here is the failure almost every discussion of human oversight skips. You can place a checkpoint perfectly, and it can still be worthless — because the human at it is not deciding.

This is the trust inversion from Chapter 3. An agent is fluent, fast, and right most of the time. A person reviewing its output, hundreds of times, comes to expect it to be right. Fluency reads as competence. The nineteen good outputs train the reviewer to wave through the twentieth without the scrutiny that would have caught it — and nothing on the surface of the wrong one looks different from the right ones. They are all equally confident. So the reviewer, doing what humans reliably do, begins to rubber-stamp.

Call this what it is: oversight theatre. A checkpoint that a tired official will approve on reflex is not a control. It is a rubber stamp with a job title. It produces a record of human approval, which is worse than no record, because it manufactures the appearance of accountability while supplying none. And it is the reason that "we have a human in the loop" should never, by itself, reassure you. The question is not whether a human is there. It is whether the human is deciding.

Making oversight real takes four things.

First, give the human a judgement to make, not a box to tick. Asking "is this okay?" invites a yes. Instead, show the reviewer what they need to evaluate: the agent's reasoning, the evidence it used, the specific basis for its proposed action — the trajectory, not just the answer. Recall from Chapter 5 that the dangerous drafts were the ones that reached a plausible answer through the wrong provision. A reviewer shown only the answer cannot catch that. A reviewer shown which provision the agent relied on can. Real oversight requires giving the human the thing the agent was reasoning over, not just its conclusion.

Second, concentrate human attention where it matters. The counterintuitive part: spreading oversight evenly across everything buys less real scrutiny, not more, because it exhausts the reviewer on cases that never needed it. Aim the human at the high-stakes, the irreversible, the unusual, and the cases where the agent itself signals low confidence — and let the routine, reversible majority flow with lighter, sampled review. Less oversight, better targeted, produces more genuine judgement than blanket review ever will.

Third, give the human the power and the cover to say no. Oversight is fake if overriding the agent is slow, difficult, or penalised. If you measure a reviewer on throughput, they will approve to hit the number, and you will have built a control your own incentives defeat. The official must be able to reject the agent, expected to when warranted, and safe from being punished for the time it takes. In the public sector especially, watch that a target meant to clear a backlog does not convert your reviewers into rubber stamps.

Fourth, measure the oversight itself. If a checkpoint is approved 99.8% of the time, in two seconds each, you do not have oversight — you have a rubber stamp you can now see in the data. Track override rates and the time spent deciding. Treat an approval rate that never moves as a warning that the human check has gone hollow, the same way a flat line on any other monitor would worry you.

Supervision, and the latitude that is earned

There is a better way to hold all of this in mind, and it is the one running through this book. An agent is a new hire, not a feature release — and oversight is supervision.

Think about how you supervise a capable new colleague. On day one you check their work. As they show they can be trusted with a kind of task, you pull back — you move from reviewing every instance to spot-checking. You extend latitude as they earn it. And if they make a serious mistake, you tighten the supervision again until trust is rebuilt. You never "supervise" them by signing their work without reading it. That is not supervision; it is the rubber stamp under another name.

Every part of that translates. Oversight should taper as the agent earns trust — but earned the way you would let a new hire earn it: through demonstrated reliability, not through the mere passage of time or because everyone got busy. And the evidence that earns it is the evaluation from Chapters 5 to 7. When the harness shows the agent handling a category of cases reliably, you have grounds to lighten the human check on that category and concentrate it elsewhere. When the online evaluation shows drift, you tighten the dial back up. Delegation is earned, and it is revocable. That single idea — supervise closely, delegate on evidence, claw back on failure — is how you move an agent up the autonomy dial over time without ever taking your hand off the control.

It is also the honest answer to the tension running through this chapter. Oversight is not a permanent tax that makes agents pointless, and it is not a formality you discharge with a nearby human. It is active supervision that starts tight, relaxes on proof, and tightens on trouble — the same judgement you would apply to a promising employee you are not yet ready to leave unwatched.

Where accountability lives

Step back to that phrase. The purpose of oversight is not to have a human present. It is to keep a human accountable — and a human who cannot, or will not, meaningfully say no is not accountable, whatever the organisation chart records. The checkpoint is where accountability lives or dies.

That is why this chapter and the last belong together. Access bounds what the agent can do; oversight bounds it by keeping a deciding human in the consequential path. Both assume the agent's failures are, at worst, honest ones — drift, error, overreach. But some failures are not honest. Some are engineered by someone trying to turn your agent against you, using the very inputs it reads. Bounding access and placing humans well will not, by themselves, stop a determined attacker. That is a different discipline, and it is the subject of the next chapter.


A worked decision

The drafting agent has been live for a quarter. An officer reviews every drafted reply before it goes out, so on paper the oversight is strong: a human approves every single output. Leadership is satisfied there is a human in the loop.

Then you look at how the loop runs. The review queue is enormous, and officers are measured on how fast they clear it. The approval rate is 99.7%. The average time spent on each review is a few seconds. And the interface shows the officer only the drafted reply — not which policy provision the agent used to write it.

Put those facts together and the human in the loop is an illusion. The officers are not deciding; they are clearing a queue under a throughput target, waving through fluent drafts they have no realistic way to scrutinise. The wrong-provision failures you identified two chapters ago — plausible replies built on the wrong rule — are sailing straight through the one control meant to catch them, because the control has become theatre. You have manufactured a record of human approval and none of the judgement it implies.

The fix follows the chapter, and none of it is "add more reviewers." Show the officer the provision the agent relied on, so there is something real to judge. Stop reviewing every draft the same way; route the routine, low-risk replies to light sampling and concentrate genuine review on the high-stakes, unusual, and low-confidence ones. Remove the raw throughput target that is defeating the control, and make clear that overriding the agent is expected, not penalised. And start tracking the override rate as a health signal. Then, as the evaluation harness proves the agent reliable on whole categories of routine enquiry, taper the human check on those — supervision relaxing on evidence — while keeping it tight where a mistake would reach a citizen and could not be undone. That is oversight doing its job, rather than wearing its uniform.


Key takeaways

  • Oversight is a dial, not a switch — the autonomy dial seen from the human's end. The craft is aiming it, then making sure whatever you placed there is real.
  • Both extremes cost you. Oversight everywhere rebuilds the manual process and trains reviewers to stop looking; oversight nowhere is unbounded liability. Aim it, don't maximise or minimise it.
  • Set the level by risk and reversibility, per action, not per agent. High-stakes, irreversible actions need approval before they happen; low-stakes, reversible ones can run with after-the-fact or sampled review. Defined checkpoints are what make a human meaningfully accountable.
  • The hard part is oversight theatre: a checkpoint a tired reviewer rubber-stamps. It is worse than no checkpoint, because it manufactures the appearance of accountability. Keep it real by giving the human the agent's reasoning to judge (not just the answer), concentrating attention where it matters, giving reviewers the power and cover to say no, and measuring the oversight itself.
  • Oversight is supervision of a new hire. Start tight, taper as the agent earns trust on evidence from the evaluation harness, and tighten again on drift. Delegation is earned and revocable — and you never supervise by signing the work unread.

Questions to ask your organisation

  • For each consequential action our agents can take, did we deliberately decide the level of human oversight by its risk and reversibility — or is the setting an accident of how the system was built?
  • Where we have a "human in the loop," is that human deciding, or clearing a queue? What is the approval rate, and how long does a review take? If approvals are near-universal and near-instant, we have a rubber stamp, not a control.
  • Does the reviewer see what they need to judge the agent — its reasoning and the basis for its action — or only its polished output? Could they catch a confident answer built on the wrong basis?
  • Do any of our targets or incentives punish a reviewer for taking time, or for overriding the agent? If so, we have built a control our own metrics defeat.
  • As an agent proves reliable, do we have an evidence-based way to taper oversight — and to tighten it again when evaluation shows drift? Or is oversight stuck on, regardless of what the agent has earned or lost?

Chapter 10 — "Security for Agentic Systems" — turns from honest failure to deliberate attack. Bounding access and placing humans well protects you when an agent errs, drifts, or overreaches. It does not stop someone who is trying to turn your agent against you through the content it reads. The next chapter takes up that adversary: prompt injection and goal hijack, breaking the lethal trifecta by design, and red-teaming an agent before someone else does.

Chapter 10: Security for Agentic Systems

What this chapter equips you to decide - Why your agent can be attacked through the ordinary content it reads, and why that is a threat traditional security was not built to catch. - Why you cannot filter your way out of this — and what it means to design your way out instead. - How breaking the "lethal trifecta" by design is the single most important security decision you make, and why it is an architecture choice, not a bolt-on. - Why you should attack your own agent before someone else does, and treat security as a shared responsibility you cannot wholly delegate to a vendor.


From accident to adversary

Everything in the last two chapters defended the agent against itself. Least privilege limits the damage when an agent errs or drifts. Oversight keeps a human on the consequential path. Both assume the failures are honest — mistakes, not malice.

This chapter assumes the opposite. It is about someone who is trying to turn your agent against you.

That is a different problem, and it needs a different approach, because an adversary is intelligent and adaptive in a way random error is not. A drifting agent fails in the directions the world happens to push it. An attacker fails it in the direction that hurts you most, and keeps trying until something works. Defences that are adequate against accident can be trivial to defeat on purpose.

For most of computing history we have had reasonable defences against this. Agents weaken them, for one reason every leader should understand.

Your agent can be attacked through what it reads

Traditional software keeps a clear line between code — the instructions — and data — the stuff the instructions act on. Security is built on that line. You guard the channels where instructions come in, and you treat data as inert.

Agents erase the line. An agent reads content — an email, a web page, a document, a customer message, a field in a record, even another agent's output — and that content is, to the agent, indistinguishable from instructions. It is all language, and language is what the agent takes its direction from. So a piece of text that the agent merely reads, as part of doing its job, can carry commands the agent then follows.

This is the heart of the threat the security community calls prompt injection, and its more dangerous form, goal hijack. Sit with what it means. The attacker does not need to breach your systems or steal a password. They do not even need to talk to your agent. They only need to get hostile text in front of it — and an agent that reads from the outside world will encounter text from the outside world. A crafted message in an inbox the agent triages. A hidden instruction in a document it summarises. A booby-trapped web page it consults. The agent reads it, cannot tell the planted instruction from its real task, and does what the attacker asked using its own legitimate access.

Your firewall does not see this, because nothing is breaking in. The malicious payload arrives through the front door, as ordinary content, and the agent carries it inside. That is why agent security is not your existing security with an "AI" label. It is a new surface, and the surface is language itself.

You cannot filter your way out

When a leader first understands this, the natural instinct — and the one the engineering team will propose — is to build a filter. Scan the incoming content for malicious instructions and block them before the agent sees them. Add a guardrail.

Filters are worth having. They are not, on their own, a defence, and believing otherwise is the most common security mistake in this field.

The reason is fundamental, not fixable with effort. The agent cannot reliably tell an instruction from content — and neither can a filter, because the attacker writes in ordinary language and has endless ways to phrase the same hostile intent. Good guardrails catch most attempts; reported figures put even strong ones at around 95%. That sounds reassuring until you say it the other way: roughly one crafted attack in twenty gets through. Against a motivated adversary who can try as often as they like, a control that fails one time in twenty is not a control you build a high-stakes system on. It is a speed bump.

So the lesson, due to the security engineer Simon Willison, is the one to carry: you cannot filter your way out of prompt injection. You design your way out. The defence is not a cleverer scanner. It is an architecture in which a successful injection cannot do anything worth doing.

Break the trifecta by design

Chapter 3 introduced the model that makes this concrete: the lethal trifecta. An agent becomes dangerous when it holds three capabilities at once — access to sensitive data, exposure to untrusted content, and the ability to communicate or act externally. With all three, a single piece of poisoned content can instruct the agent to take the sensitive data it can see and send it where the attacker can collect it, through the agent's own authorised tools. Each capability is harmless alone. Together they are a complete attack.

The defensive move follows, and it is the central security decision in this book: ensure that no single agent holds all three legs at once. Break the trifecta by design.

In practice that means separating the capabilities across the architecture. The agent that reads untrusted, outside content does not also hold sensitive data and the power to act externally. The agent that can act externally does not also ingest attacker-controlled input. Where a task seems to need all three, you put a boundary between the parts — a controlled hand-off, a human checkpoint, a narrowed interface — so that the dangerous combination never lives inside one actor. A hijack of the reading agent reaches nothing worth taking, because the reading agent was never given anything worth taking.

This is why security cannot be the thing you add at the end. The decision that creates or prevents the vulnerability — does one agent hold all three legs? — is made when the system is designed, often by builders reaching for the convenient shape. It is the same point Chapter 8 made: composing what an agent can read, hold, and do is an architecture decision, not a separate security one. The leader's job is to ensure that decision is made deliberately, by someone who is thinking about the adversary, and not settled by accident in favour of whatever was easiest to wire together.

Defence in depth

Breaking the trifecta is the foundation, not the whole building. Security holds when it is layered, on the assumption that any single layer will sometimes fail. The discipline is to assume each control fails and make sure the next one catches what it missed.

The layers reinforce the two chapters before this one.

Least privilege, from Chapter 8, is a security control as much as a safety one. Even if an agent is hijacked, it can only do what its permissions allow. An agent scoped to read a narrow slice of data and nothing else is an agent whose compromise is survivable. The narrower the access, the smaller the prize for the attacker.

Human checkpoints, from Chapter 9, are the next layer. A hijacked agent still has to get a consequential, irreversible action past a human — and if that checkpoint is real rather than theatre, the attack stalls there. This is one more reason oversight on high-stakes actions must be genuine: it is also your last line against an attacker who got through everything else.

Around these sit the supporting layers: treat all ingested content as data and never as commands; constrain and check what comes back from a tool before acting on it; keep guardrails and filters as one useful layer among several; and make the whole system safe by default, so that the easy path is also the secure one. No single layer is trusted to hold alone.

One more point a leader must not miss: security here is a shared responsibility, and the shares are not all yours. Some of the risk is the model or platform vendor's to manage; some is yours, in how you compose and bound the system; some sits with the end user. Singapore's sandbox drew this conclusion — that safeguards must be spread across the layers, each placed where it is best able to anticipate a given risk. The danger is the gap between the layers: assuming the vendor handled something they expected you to handle. Know which risks you own, and do not assume the model came secure.

Attack your agent before someone else does

You do not know where your agent is vulnerable until someone tries to break it. The only question is whether that someone works for you.

Red-teaming is the practice of attacking your own agent — trying to hijack it, jailbreak it, manipulate it through the content it reads, and push it into actions it should refuse — before you deploy it, and continuously after. It is the security equivalent of the evaluation discipline from Part II, and it connects to it: every successful attack your red team finds becomes a permanent test case in your harness, so that the same exploit cannot return unnoticed. The flywheel from Chapter 7 applies to attacks as much as to errors.

For the leader, the decision is to fund this as a standing capability rather than a one-off box-tick before launch. Attackers do not stop after your launch date, and your agent, your models, and the outside world all keep changing. Adversarial testing that happened once, a year ago, tells you little about today.

This is also where the new-hire comparison reaches its limit. You can train a new employee to be suspicious of a strange request — to notice that an email asking them to wire funds to an unfamiliar account is wrong, and to stop and check. That wariness is a real defence, and people get better at it. You cannot train an agent to be reliably suspicious of the text it reads in the same way; its susceptibility to a well-crafted instruction is a property of what it is. So you cannot lean on the agent's judgement the way you lean on an employee's. You remove the opportunity instead — the argument of this chapter restated. Where you would coach a person, you must constrain an agent.

A brief word on the frontier. When agents talk to other agents, the surface grows: a compromised or poisoned agent can hijack the ones it communicates with, and a single failure can cascade across the group. The security community treats insecure inter-agent communication and cascading compromise as distinct, new risk classes for good reason. For most organisations the prudent posture is the one from Chapter 3 — treat multi-agent systems as a higher tier of risk, and earn the right to them by first securing a single agent well.

Security, access, and oversight are one defence

Step back and the last three chapters resolve into a single posture, each answering the same question: when this agent fails — by accident or by an attacker's hand — how do we keep the failure survivable? Access answers by bounding what the agent can reach, so a lapse touches little. Oversight answers by keeping a deciding human on the consequential path, so a lapse meets a person before it meets the world. Security answers by assuming a hostile actor and breaking the lethal trifecta by design, so that an injection that gets through reaches nothing worth taking and can do nothing worth doing. These are not three separate programmes. They are one defence, seen three times — bound the reach, keep a human on the path, break the combination — and the agent that is scoped to the minimum, genuinely supervised, and architected so no single part holds all three legs at once is safe enough to trust with something that matters, because an accident or an attack has to defeat all three to do real harm.

The agent is now evaluated, bounded, supervised, and secured. It is ready to meet the world — but not all at once. How you take it from a controlled environment into live use, without betting everything on the first day, is the last stretch of the method, and the subject of the next chapter.


A worked decision

The drafting agent has earned its place. Now the team proposes an upgrade: for the simplest, highest-volume enquiries, let the agent send its reply itself, with no officer in between. The backlog would fall. To keep it safe, the team will add a filter that screens incoming enquiries for malicious content before the agent reads them.

Look at what the change does to the agent's shape. Today the agent reads incoming enquiries — untrusted content, since anyone can send one — and it can read case data — sensitive information — but it cannot send anything; the officer does that. Two legs of the trifecta, not three, which is part of why it has been safe. The proposal hands it the third: the ability to communicate externally on its own. Now a single agent reads attacker-controlled text, holds private citizen data, and can send messages out. The lethal trifecta is complete. A crafted enquiry could instruct the agent to fold another citizen's details into its auto-sent reply, or worse — and the agent, unable to tell that instruction from a genuine question, might comply, using its own legitimate access.

The team's filter is the instinct this chapter warns against. It will catch most attempts, not all, and "most" is not a standard for an action that sends citizen data into the world unsupervised. You cannot filter your way to safety here.

The decision is to refuse the design as proposed and require one that breaks the trifecta instead. If auto-send is worth pursuing, separate the capabilities: let auto-sent replies draw only from fixed, vetted templates with no agent-chosen inclusion of case data, so the agent that touches untrusted input has nothing sensitive to leak and no freedom in what it sends — or keep a real human checkpoint on any reply that contains case-specific information. Keep the filter as one layer, not the layer. And before anything ships, have someone try to hijack it. The efficiency is real and worth capturing; it is not worth capturing by assembling, for convenience, the exact configuration an attacker would have asked you to build.


Key takeaways

  • This chapter is about deliberate attack, not honest failure. An adversary is intelligent and persistent in a way error is not, and defences adequate against accident can be trivial to defeat on purpose.
  • Your agent can be attacked through the content it merely reads. Agents erase the line between instructions and data, so a hostile message, document, or web page can carry commands the agent follows — arriving through the front door as ordinary content, where your firewall never sees it.
  • You cannot filter your way out. Even strong guardrails stop only around 95% of attempts, and one crafted attack in twenty is not a standard for a high-stakes system. Filters are one layer, never the defence.
  • Break the lethal trifecta by design. Ensure no single agent holds all three of sensitive-data access, untrusted input, and external action at once. This is an architecture decision made when the system is designed — by someone thinking about the adversary, not settled by whatever was easiest to build.
  • Layer the defences: least privilege limits what a hijack can achieve, real human checkpoints stop the consequential action, and security is a shared responsibility you cannot wholly hand to a vendor. And attack your own agent — continuously — before someone else does. You can coach a person to be suspicious; you must constrain an agent instead.

Questions to ask your organisation

  • For each agent: could it be manipulated through the content it reads — an email, a document, a web page, a record — and if it were, what is the worst it could be made to do with the access it has?
  • Does any single agent hold all three legs of the lethal trifecta at once — sensitive data, untrusted input, and the ability to act externally? If so, what is our plan to break that by design, rather than to filter it?
  • Are we relying on a filter or guardrail as our protection against prompt injection? If so, are we comfortable that it will fail a meaningful share of the time against someone who is trying hard?
  • Do we attack our own agents — red-team them — before launch and on an ongoing basis? Do the attacks we find become permanent tests, the way our other failures do?
  • Which security risks do we assume the model or platform vendor has handled, and have we confirmed that, or merely hoped it? Where are the gaps between what they own and what we own?

Chapter 11 — "Deployment: Shadow, Canary, Progressive Autonomy" — is the last stretch of the method. The agent is now evaluated, bounded, supervised, and secured, but none of that is proof it will behave in the real world. This chapter is about earning production in graduated steps: running the agent in parallel before it acts, releasing it to a small slice of reality first, and turning up its autonomy only as the evidence earns it — the probation period of a new hire, applied to a system.

Chapter 11: Deployment — Shadow, Canary, Progressive Autonomy

What this chapter equips you to decide - Why you do not flip a switch from sandbox to live, and what it means to earn production in graduated steps instead. - How to run an agent in parallel before it acts, so you learn how it behaves on reality without risking anything. - Why the gate between each step is evidence you defined in advance — not a launch date, and not the mood in the room. - How to avoid the two opposite ways deployment fails: the reckless launch and the pilot that never ends.


You do not flip a switch

The agent is evaluated, bounded, supervised, and secured. The instinct now — and the pressure from everyone waiting on the value — is to turn it on. Announce it. Roll it out to everyone at once.

Resist that, because of a fact this book keeps returning to. Everything you know about the agent so far comes from cases you imagined. Your evaluation harness, however good, tested the agent against your best guess at reality. It has not yet met reality itself: the full, strange, unbounded distribution of what real people will send it. You do not yet know how it behaves out there, and a launch is an expensive way to find out.

So you do not flip a switch from sandbox to live. You earn production, in graduated steps, each one gated by evidence and each one survivable if it fails. This chapter is that staircase.

The two ways deployment fails

Before the staircase, the two ways of falling off it. They look like opposites, but they share a cause.

The first is the big-bang launch. Under pressure to deliver, the organisation switches the agent on for everyone, all at once, on a chosen date. If it works, you learn nothing a smaller step would not have taught at less risk. If it fails, it fails at full scale, in public, on real people, with no containment. The whole bet rides on a first day you have no evidence about.

The second is the opposite, and just as common: the pilot that never ends. The agent goes into a limited trial and stays there — a year later it is still "in pilot," touching the same small group, while everyone waits for a confidence that never arrives. This is pilot purgatory from Chapter 1, seen from the deployment side. The value is never captured, and the caution curdles into paralysis.

These look like recklessness and timidity, but they are the same mistake: no one decided, in advance, what each step of deployment had to prove in order to move to the next. Without that, you either skip the steps or get stuck on one forever. The cure for both is the discipline this chapter describes — defined stages, with a defined bar to clear between them. Get that right and deployment is neither a leap nor a limbo. It is a climb.

Shadow mode: run it in parallel before it acts

The first step lets the agent meet reality while risking nothing. In shadow mode, the agent runs on real, live inputs and produces its outputs — but it takes no real action. Its work goes nowhere. You compare what it produced against what happened, or against what your people did with the same inputs.

This is how you find out how the agent behaves on the real distribution before a single citizen is affected. The cases you never imagined arrive, and you see how the agent handles them — without risk, because its outputs are being measured, not used. And the comparison gives you the bar from Chapter 5 in its most honest form: not the agent against an abstract ideal, but the agent against the human process it might replace.

The clearest public example is from the UK government. Its "Consult" tool, part of a suite of AI aids for civil servants, was built to analyse the thousands of responses that public consultations generate. It was not switched on in place of the existing process. It was run alongside it — on a real consultation, in parallel with the human analysts doing the same work — and its results were compared against theirs. The agent ranked the key themes much as the human team did. Only on the strength of that measured, side-by-side result did the wider rollout proceed. The prize was large — the tooling was projected to save the equivalent of tens of thousands of analyst-days a year — but the prize was not the reason to trust it. The parallel run was.

For the public sector especially, shadow mode is close to a gift. It lets you prove an agent matches or beats your current process before it ever touches a member of the public — which is the evidence an auditor, a minister, or a sceptical union will ask for. You are not asking anyone to take the agent on faith. You are showing them the comparison.

Canary: let it act, but small and reversible

Once the agent has earned trust in shadow — matching the baseline on the real distribution — you let it act. Not everywhere. On a small, contained slice of real traffic, taking low-risk, reversible actions, under heightened watch. This is often called a canary release, after the bird that warned miners of danger ahead: a small, sensitive first exposure that surfaces trouble before it reaches everyone.

Two principles shape the canary step, and both are familiar. Keep the slice small, so that if something goes wrong it goes wrong for few people and you catch it fast. And let the agent act on the reversible things first — the same reversibility logic that chose your use case in Chapter 4, now sequencing the rollout within it. An agent earns its first real actions on the work that can be undone, not the work that cannot.

The canary reveals something shadow mode cannot. In shadow, the agent's outputs were watched but never used — each draft compared against an officer's, then set aside. In the canary they are real: the draft is sent, the action is taken. An agent can look reliable when its work is only measured and still stumble once that work has real consequences — a reply that reads well on the screen but prompts a confused follow-up the agent then mishandles. Looking right under inspection and working right in the world are not the same thing, and the small, watched slice is where you would rather find the gap — on a few recoverable cases, not across everyone at once.

Progressive autonomy: turn the dial up on evidence

From the canary, you widen. More of the traffic, more kinds of action, higher up the risk tiers — and, as it earns the right, less human oversight on the things it has proven it can handle. This is the autonomy dial from Chapter 2, turned up a notch at a time.

The discipline lives in what governs each notch. The gate between stages is evidence, not the calendar and not the mood in the room. You move up when the measurements — from the evaluation harness, against the bar you set in advance — clear the threshold for the next stage, held for long enough to be real. You do not move up because the deadline arrived, because the demo went well, or because everyone is keen. The bar is set before the stage, so that graduating is a finding, not a feeling.

And the staircase goes down as well as up. Progressive autonomy is not a ratchet. If the online evaluation from Chapter 7 shows the agent drifting, or the canary surfaces a problem at the wider slice, you step back down — narrow the traffic, restore the human checkpoint, return autonomy you had granted. This is the revocable delegation from Chapter 9 made operational. An organisation that can only turn autonomy up has not built a deployment process; it has built a one-way bet. The ability to roll back a stage, fast and without drama, is what makes turning the dial up safe.

Probation, and the right to fail it

There is a simpler way to hold all of this, and it is the motif this book has been building toward. An agent is a new hire, and deployment is its probation.

Think about how a capable organisation brings on someone new for an important role. They do not hand over full responsibility on the first morning. The new hire shadows an experienced colleague and watches how the work is done. Then they take on small, low-stakes tasks under close supervision. Then, as they prove themselves, they are trusted with more, and the supervision eases. The arrangement has a name — probation — and everyone understands that it is a period of earning trust through demonstrated work, not a formality that passes with time.

Every step of this chapter is that, applied to a system. Shadow mode is the new hire watching and being watched, doing the work but not yet acting on it. The canary is the first small, supervised, reversible responsibilities. Progressive autonomy is the widening trust as the work proves itself. And the gate between stages — evidence, not time — is what distinguishes real probation from the kind that lapses because the weeks went by.

The most important thing probation carries, which a launch date does not, is the right to fail it. A new hire who does not meet the bar is not promoted; they get more supervision, or the role does not work out. The same must be true of your agent. If it does not clear the bar in shadow, it does not get the canary. If the canary surfaces trouble, it does not get the wider rollout. Deployment that cannot say no at any stage is not deployment; it is a launch wearing a costume. The power to hold an agent at a stage, or send it back down one, is not a failure of the process. It is the process working.

The honest answer to "when do we go live?"

This chapter gives you the answer to the question every agent programme eventually faces, from someone impatient. When does it go live?

The honest answer is not a date. It is: "In stages — and here is what each stage has to prove before it earns the next." That answer is more useful than a date, because it is true, and because it gives everyone — the board, the minister, the team, the public — a way to see progress that is real rather than announced. It is also the same shape as the answer Chapter 4 gave the board that wanted the flagship now: not "no," but "here is the climb, and here is how each step earns the one above it."

The agent is now live — earned into production, a stage at a time. But getting it there is not the end. A running agent is a living system that must be watched, costed, and, when something goes wrong, stopped and accounted for. What production demands, day after day, is the subject of the final chapter of the method.


A worked decision

The drafting agent is ready, and the question left open since the evaluation chapters returns in full force: roll it out to the whole division. The backlog is acute, and there is real pressure to switch every officer over to it at once and clear the queue.

There is also a quieter risk in the other direction. The agent has been with one pilot team for the better part of a year. No one has defined what would justify expanding it, so it has stayed where it is — useful to a few, invisible to the rest.

Both pulls are the same error, and the chapter resolves them the same way. You deploy in stages, with a bar defined in advance for each.

First, shadow. Run the agent across the division's real incoming enquiries in parallel with the officers, drafting replies that no one sends, and compare its drafts against what officers write. The bar to clear: it matches the officers' quality on the real mix of enquiries, including the awkward ones. No citizen is touched while you learn this.

Then, canary. For one team, let the agent's drafts go forward for real — still officer-reviewed, still reversible, under close watch. The bar: quality holds, and officers' edits stay within an expected range, over a defined period.

Then, progressive autonomy. Widen team by team. For the routine categories the harness has proven reliable, taper the human review toward sampling — earned delegation — while keeping a real checkpoint on the high-stakes and unusual ones. And keep the staircase two-way: if the online evaluation shows drift, step back down.

Each move is gated by evidence you set beforehand, not by the backlog's urgency or the calendar. That is the answer you give the division head who wants it switched on everywhere by quarter-end: not "no," and not "it's still in pilot," but "here is the climb, here is what each step must prove, and here is roughly how fast the evidence will let us go." It captures the value the backlog demands — without betting the division's citizens on a first day you have no evidence about.


Key takeaways

  • You do not flip a switch from sandbox to live. Everything you know so far comes from cases you imagined; production is reality, and a launch is an expensive way to first meet it. You earn production in graduated, evidence-gated, survivable steps.
  • Deployment fails in two opposite ways — the big-bang launch and the pilot that never ends — and both come from the same omission: no one defined, in advance, what each step had to prove to move on. Defined stages with defined bars cure both.
  • Shadow mode runs the agent on real inputs while it acts on nothing, so you learn how it behaves on reality — measured against the human baseline — before anyone is affected. The UK's Consult tool was validated this way, run in parallel with human analysts before any wider rollout.
  • Canary lets the agent take real but small, reversible actions on a contained slice under heightened watch. Progressive autonomy then widens the slice and raises the risk tier — but every step is gated by evidence, not the calendar, and the staircase goes down as well as up when evaluation shows trouble.
  • Deployment is the new hire's probation: shadow, then supervised, then trusted — earned through demonstrated work, not the passage of time. Its essential feature is the right to fail it. A deployment that cannot hold an agent at a stage, or send it back, is a launch in costume.

Questions to ask your organisation

  • Are we planning to switch this agent on all at once, or to earn production in stages? If all at once, what are we betting on a first day we have no evidence about?
  • Have we run the agent in shadow — on real inputs, acting on nothing — and compared it against the human process it would replace? Could we show that comparison to an auditor or a sceptic?
  • For each stage of rollout, did we define the bar to graduate before the stage began — or are we deciding to expand based on the deadline, the demo, or the level of enthusiasm in the room?
  • Can our deployment go backwards? If the agent drifts or the wider slice surfaces a problem, can we narrow the traffic, restore human review, and reclaim autonomy fast — or have we built a one-way ratchet?
  • Is our pilot on a path to expand on evidence, or has it parked — useful to a few, with no defined trigger to grow? If it has parked, what bar would justify the next step?

Chapter 12 — "AgentOps: Day-2 Operations" — closes the method. A live agent is a living system: it drifts, it costs money every second, and it will eventually do something you need to stop and explain. This chapter is what production demands day after day: watching the agent, controlling its costs, being able to halt it instantly, and reconstructing what it did and on whose authority when someone asks.

Chapter 12: AgentOps — Day-2 Operations

What this chapter equips you to decide - Why getting an agent live is the start of the work, not the end of it — and why operating it is a permanent cost, not a project that closes. - The two sentences you must always be able to say about a production agent: we can stop it instantly, and we can prove exactly what it did. - Why an agent is the first kind of software that can run up an unbounded bill on its own, and what it takes to cap it. - Why you will eventually have an agent incident, and what separates an organisation that can contain and explain it from one that cannot.


Launch is where the meter starts running

Most organisations treat going live as the end of the job. The agent is evaluated, bounded, supervised, secured, and rolled out in stages. The project is done. The team moves on.

This is the last and quietest way an agent programme fails. The day an agent goes live is the day the meter starts running — and almost everything that decides whether it stays trustworthy happens after.

A production agent is a living system. It drifts, as Chapter 7 showed. It costs money every second it runs. It is exposed to attack every day, not just on launch day. And sooner or later it will do something you need to stop and explain. The work that got it live handles none of that. The work of running it does — the unglamorous, permanent discipline of watching an autonomous system, keeping it under control, and being ready for the day it goes wrong.

This work has a name now: AgentOps, the operations of running agents. It is the difference between an agent that stays trustworthy on day two hundred and one that, unwatched, is no longer the system you built. This chapter is the handful of capabilities it requires, each of which is a decision for the person accountable for the agent.

You cannot operate what you did not record

Everything in this chapter rests on one foundation: the agent's behaviour has to be recorded as it happens. Every input it received, every step it took, every tool it called, every action, every error, every delay — captured as you go, in a form you can search and replay.

This is observability, and it is not optional plumbing. It is the precondition for all the rest. You cannot investigate an incident you have no record of. You cannot run the online evaluation from Chapter 7 without the production behaviour to score. You cannot prove what the agent did to an auditor if you did not capture it. An agent you cannot see is an agent you cannot operate, debug, evaluate, or account for. The rule is blunt: no record, no operations.

The decision for a leader is to insist this is built in from the start, not retrofitted after the first incident exposes its absence. By then the very event you most need to reconstruct is the one you have no record of.

The agent can spend your money on its own

Here is a risk conventional software does not have. An agent decides its own actions, which means it decides how much work to do — and work costs money. Each model call and each tool use has a price. An agent that gets into a loop, or over-consults its tools, or faces a surge of demand, can run up a bill at machine speed, overnight, with no human deciding to spend a cent of it.

This is new. Traditional software costs what it costs; its consumption is predictable because its behaviour is fixed. An agent's consumption is as variable as its behaviour, and its behaviour is not fully predictable. An agent is the first kind of software that can run up an unbounded bill on its own.

So you cap it. Per-run limits, per-workflow budgets, rate limits, and hard spending thresholds that stop the agent when a ceiling is hit. These are not finance niceties; they are operational controls, and an agent without them is a standing invitation to a budget incident that no one chose. The leader's question is whether the agent can spend without limit — and if the answer is yes, that is the finding.

Design the off-switch before the on-switch

The single most important operational control is the ability to stop the agent — instantly, completely, on demand. This is the kill switch, and it deserves to be designed before the agent is ever turned on.

It has two forms, and you need both. The automatic kind trips when a monitored measure crosses a limit you set — quality falling, error rates rising, costs spiking, behaviour going out of bounds. The manual kind lets a human halt the agent the moment they judge they need to, without raising a ticket or waiting for a deploy. DBS, running AI across a regulated bank, does this: real-time metrics with defined limits for its higher-risk applications, and automated kill switches that activate the instant a limit is breached. Not a human eventually noticing a dashboard — an automatic halt.

The principle behind it is the one to remember: you must be able to stop the agent faster than it can do damage. An agent acts at machine speed. A control that depends on a person spotting a problem, escalating it, and arranging a fix is too slow by an order of magnitude. The off-switch has to be as fast as the thing it is stopping — which is also why the kill switch and the monitoring are one system. A switch you cannot reach in time, because you did not see the problem, is not a switch.

Be able to prove what it did

When something goes wrong — and on a long enough timeline, something will — the question that follows is not "what did the system do?" It is "what did the agent do, on whose authority, and why?" You need to be able to answer it, in detail, after the fact.

That requires an audit trail: a durable, tamper-resistant record tying every consequential action to the specific agent that took it — the per-agent identity from Chapter 8 — and back through the chain to the human who authorised that capability. This is what closes the accountability gap from Chapter 3. It is the difference between "the agent did something" and "we can show precisely what it did, when, on whose authority, and on what basis."

For a public body this is not a technical nicety; it is the core of being able to face oversight. "We could not reconstruct what the agent did" is not a sentence that survives a committee of inquiry. "Here is the complete record of every action, traced to the agent and the official accountable for it" is. The audit trail is what lets an agent operate in the state's name and still be answerable for what it does — which, since Chapter 3, has been the point.

You will have an incident — rehearse for it

Put the pieces together and you can do the thing that defines a mature operation: respond to an incident on purpose, rather than improvise in a crisis.

An agent incident has the same shape as any other, and you should have the playbook written before you need it. Detect: the monitoring tells you something is wrong — ideally before a citizen does. Contain: trip the kill switch, revoke the agent's access, stop the bleeding. Investigate: replay the recorded traces to establish exactly what happened and how far it reached. Remediate: fix the cause, and — closing the loop from Chapter 7 — turn the failure into a permanent test case so it cannot recur unnoticed.

The leader's decision here is to treat an agent incident as a when, not an if, and to rehearse for it accordingly. The organisation that has done this contains and explains a problem in hours. The one that treated launch as the end of the job discovers, in the worst moment, that it cannot see what happened, cannot stop it cleanly, and cannot tell anyone how far it spread. The difference between those two organisations is not luck. It is the operating discipline in this chapter, built before it was needed.

Management does not end at onboarding

The motif that has run through this book gives the simplest way to hold all of this. An agent is a new hire — and you do not onboard a new colleague and then never manage them again.

Real management continues for as long as the person works for you. There is ongoing supervision, attention to performance, the ability to step in, a record of the work, and a process for when something goes wrong. AgentOps is that — ongoing management — for a worker who never sleeps and acts at machine speed. And that last difference is why the management has to be instrumented and automated rather than personal and occasional: no human manager can watch a machine-speed worker in real time, so the watching, the limits, and the off-switch must be built into the system itself. An agent needs more instrumented management than a person, not less, because it is faster and has no self-restraint to fall back on.

It also means every production agent needs a named owner — a person accountable for running it, the way every employee has a manager. An agent with no owner is the orphan from Chapter 8 in a new guise: running, costing, acting, and answerable to no one. Keeping order across many such owners and agents is where this book turns next.

The two sentences, and the end of the method

Strip this chapter to its core and you get two sentences you must always be able to say about any agent you run: we can stop it instantly, and we can prove exactly what it did. If either is not true, you are not operating the agent. You are hoping about it. Everything in the chapter — the recording, the cost caps, the kill switch, the audit trail, the rehearsed response — exists to make those two sentences true on day two hundred as surely as on day one.

And with that, the method is complete. Across Part II you have taken a single agent from a sandbox to a trustworthy life in production: chosen where to point it, built the evaluation that earns trust in it, bounded what it can reach, kept a real human on the consequential path, secured it against attack, deployed it in earned stages, and stood up the operations to keep it under control while it runs. You can now do, for one agent, the whole of what getting out of the sandbox requires.

Which raises the question that the rest of the book answers. Almost nothing above was about one agent in isolation — it was about the capabilities and decisions one agent demands. What happens when you have ten? Fifty? When every team wants one, and governing each by hand would consume the organisation? Doing this once is the method. Doing it many times, without your governance cost rising as fast as your agent count, is the operating model — and that is Part III.


A worked decision

The drafting agent has been live across the division for months, running well. One afternoon an automatic alarm fires: the agent is producing replies on one category of enquiry at several times its usual rate, and the online judge's scores on those replies have collapsed. This is not the slow drift of an earlier chapter; it is an acute break, happening now, at volume.

Because the operations are real, the response is fast and contained. The alarm is the detection — no one waited for a complaint. You pull the kill switch on that category, halting the agent there in seconds while the rest keeps running, and begin to investigate. Replaying the recorded traces shows what happened: an unusual run of malformed enquiries had tipped the agent onto a brittle path, where it produced confident, wrong replies and tried to send them. The audit trail tells you precisely how many citizens received one, and which officers had reviewed them — so you can state the blast radius in numbers, not guesses, and reach the people affected. You fix the path, add it to the harness as a permanent test case, and bring the agent back on that category only once it clears.

Hold the episode against the two sentences. You could stop it instantly, and you could prove exactly what it did. Now picture the same afternoon at an organisation that treated going live as the end of the job: no alarm, so the first signal is a slow accumulation of complaints weeks later; no clean way to halt only the affected behaviour; no traces to replay; no way to say how many citizens were affected, or to reach them. The same event — an agent meeting an input it mishandled — is a contained afternoon for one organisation and a reportable failure for the other. The difference is the operating discipline this chapter describes, built before it was needed.


Key takeaways

  • Launch is where the meter starts running, not the end of the job. A production agent drifts, costs money every second, is attacked daily, and will eventually do something you must stop and explain. Operating it is a permanent capability, not a project that closes.
  • You cannot operate what you did not record. Full observability — every input, step, tool call, action, and error captured — is the precondition for evaluation, debugging, audit, and incident response alike. No record, no operations.
  • An agent is the first software that can run up an unbounded bill on its own. Cap it with per-run limits, budgets, rate limits, and hard spending thresholds, or a loop overnight becomes a budget incident no one chose.
  • Design the off-switch before the on-switch. You need automatic kill switches that trip on breached limits and a manual halt a human can pull instantly — because you must be able to stop an agent faster than it can do damage. And you must be able to prove what it did: a durable audit trail tying every action to the agent and the human who authorised it, which is what makes the agent answerable to oversight.
  • You will have an incident; rehearse for it — detect, contain, investigate, remediate, and feed the failure back as a permanent test. Management does not end at onboarding: an agent is a worker that never sleeps and acts at machine speed, so its management must be instrumented and automated, and every production agent needs a named owner. The two sentences you must always be able to say: we can stop it instantly, and we can prove exactly what it did.

Questions to ask your organisation

  • For each production agent, can we currently say both: we can stop it instantly, and we can prove exactly what it did? If either is not true, we are hoping about the agent, not operating it.
  • Do we record enough of what our agents do to replay an incident after the fact? If something went wrong last week, could we reconstruct it — or would we be guessing?
  • Can any of our agents spend without limit? What caps and thresholds stop a looping or surging agent from running up a bill no one authorised?
  • Do we have an automatic kill switch that trips on breached limits, and a manual one a human can pull at once? Could we stop a misbehaving agent faster than it could do harm?
  • Do we have a written incident playbook for our agents, and have we rehearsed it? Does every production agent have a named owner accountable for running it — or do we have orphans, running and answerable to no one?

Part II is complete: you can take one agent from sandbox to a governed life in production. Part III — "The Operating Model" — turns from the single agent to the portfolio. Chapter 13, "Governance as a System," asks how you govern not one agent but fifty: classifying them by risk, keeping a register of every one you run, and building the review and accountability structures that let you scale without your governance cost rising as fast as your agent count.

Part III — The Operating Model

Part II got a single agent from the sandbox to a governed life in production. Part III is about doing that many times over — governing a whole portfolio, building a platform so good governance is inherited rather than rebuilt, reshaping the organisation around this new kind of worker, and meeting the regulator. It begins with the problem that arrives the moment the method works: you no longer have an agent. You have a growing number of them.


Chapter 13: Governance as a System

What this chapter equips you to decide - Why governing fifty agents is a different problem from governing one — and why hand-governing each will either collapse under its own weight or get skipped. - Why the first thing to build is a register of every agent you run, because you cannot govern what you cannot see. - How to right-size governance to each agent's risk, so you neither smother the harmless ones nor under-watch the dangerous ones. - Why you should adopt a recognised framework rather than invent your own, and what doing so buys you.


From one agent to fifty

Everything in Part II was the work of getting one agent right. But the method has a consequence its authors rarely plan for: when it works, it does not stay at one. The first successful agent is the most persuasive business case anyone has ever seen, and soon every team wants one. You go from a single agent you know intimately to a portfolio you are struggling to keep track of.

Governing a portfolio is not governing an agent many times over. It is a different problem, and organisations fail it in two opposite ways.

The first failure is governance that does not scale. Each agent is reviewed by hand, in a bespoke process, by the same small group of senior people. This works for three agents and breaks at thirty. The queue grows, the reviews slow, and the governance function becomes the bottleneck everyone resents and eventually routes around.

The second failure is the routing-around itself: governance that does not happen. Teams build and buy agents without telling anyone, because the official path is too slow. Now you have agents in production that no one has assessed, no one is watching, and no one is accountable for. That is the sprawl.

Both failures share a cause. The organisation is trying to govern each agent as a one-off, when what it needs is a system — a repeatable way of governing any agent, so that handling the fiftieth costs barely more than the fifth. That system has a few parts, and the encouraging finding from organisations that have built it is that it does not slow AI adoption down. It speeds it up, because a clear, fast, predictable path to approval is what stops teams from going around you.

You cannot govern what you cannot see

This book has built a small ladder. You cannot govern what you cannot picture, which is why Chapter 2 taught you to picture an agent. You cannot govern what you cannot measure, which is why evaluation runs through the middle of the book. At portfolio scale, a third rung appears, and it is the foundation of everything else here: you cannot govern what you cannot see.

So the first thing to build is not a policy. It is a register — a single, living inventory of every agent the organisation runs. For each one it records the same handful of facts: what it does, who owns it, what data it can touch, what it is allowed to do, which model it runs on, its risk tier, and its current status. Nothing exotic. The truth about what you have, in one place, kept current.

The reason this comes first is that every other part of governance depends on it. You cannot apply a control to an agent you do not know exists. You cannot review a portfolio you cannot list. You cannot answer an auditor, or a minister, who asks "how many AI agents are making decisions about citizens, and who is accountable for each?" if the honest answer is that nobody knows. The people who govern AI well are near-unanimous on this: if you do only one thing to improve your AI governance, build the inventory.

The register is also where two problems from earlier in the book come home to roost at scale. Shadow agents — the ones built or bought without telling anyone — are invisible until the register goes looking for them, which is why building it always turns up more than leadership expected. And the orphans from Chapters 8 and 12 — agents whose owning team has since reorganised, still running, still acting, accountable to no one — are what a register surfaces and a missing register hides. For agents specifically, the inventory should be live, not a once-a-year spreadsheet, and tied to who owns each one and how it is performing.

There is a simpler way to say what the register is. It is the org chart for your AI workforce. If an agent is a new hire, the register is the basic HR record every organisation keeps about everyone it employs: who they are, what they are authorised to do, and who their manager is. No serious organisation employs people it has no record of. It should not run agents it has no record of either.

Right-size the oversight

Once you can see the portfolio, the next question is how hard to govern each agent. The wrong answers are "all of them, heavily" and "all of them, lightly." The right answer is that governance intensity should match the agent's risk.

This is risk tiering, and it is the portfolio-level version of two ideas you have already met. Chapter 4 chose use cases by how much damage a wrong action could do; Chapter 9 set oversight by the same variable per action. Risk tiering applies that logic across the whole estate. You classify each agent — low, medium, high — on a few plain dimensions: how bad the worst plausible outcome is, how much the agent acts on its own, and how sensitive the domain it works in. Then you attach a proportionate set of controls to each tier: what review it needs before approval, how closely it is monitored, how often it is re-checked, how heavy its audit requirements are.

The phrase to hold onto is that not every agent needs the same oversight, but every agent needs the right oversight. A read-only agent that drafts internal summaries does not need what a citizen-facing agent making consequential decisions needs. Treat them the same and you make one of two mistakes. Either you under-govern the dangerous one, or — more common, and more insidious — you smother the harmless one in process. That second mistake has a name, overfitting governance, and it is how well-intentioned programmes kill their own adoption. Every needless approval on a trivial agent is another reason for the next team to go around you.

The new-hire lens makes the tiering intuitive. You do not supervise a junior clerk and a treasury trader the same way, because the trader can lose the bank money the clerk cannot touch. You match the oversight to what the person can do. Risk tiering is the same judgement, applied to a workforce that happens to be software.

One caution: a tier is not permanent. An agent's risk can change when it is updated, given new access, or moved to a new task — so classification is reviewed on significant change, after any incident, and on a regular cadence, not set once and forgotten.

A consistent path to "yes"

A register tells you what you have. Tiering tells you how hard to govern each one. The third part is the path every agent travels from idea to production — and the body that owns it.

The path is a lifecycle, and its value is that it is the same every time: an agent is registered at intake, classified by risk, documented, validated against the evaluation discipline from Part II, approved, and then monitored on a cadence set by its tier. The point is not the specific steps. It is that the steps are repeatable and auditable rather than improvised — so that a team bringing a new agent knows what is required, and an auditor can see that every agent went through the same gates.

Owning that path is a cross-functional review body — what many organisations call a Responsible AI committee. It is not a technology committee; it brings together the people who carry the different risks: security, legal, risk, privacy, the business, and a senior owner who chairs it. Its job is to hold the organisation to a consistent standard and a stated risk appetite — the explicit lines the organisation will not cross, such as no automated decision against a citizen without a human review and a route of appeal. DBS runs a version of this: a single bar every AI use case must clear — purposeful, unsurprising, respectful, explainable — overseen by a responsible-AI committee from conception through deployment.

The trap to avoid is letting the committee become the very bottleneck this chapter warned about. A good review body makes "yes" fast and predictable for the common, low-risk case, and reserves its real scrutiny for the high-risk ones. It is the role-approval function for your AI workforce — and like good hiring, most of it should be quick, with the careful deliberation saved for the consequential posts. A committee that reviews every agent with equal intensity has rebuilt the failure of governance-that-does-not-scale inside a meeting room.

Underneath all of it sits the rule from Chapter 12, now made universal: every agent has a named owner, the way every employee has a manager. An agent with no owner is not governed, whatever the register says about it.

You don't have to invent this

A leader reading the last three sections might reasonably worry that building all this from scratch is a vast undertaking. It is not, because you do not start from a blank page. The world has already converged on a small set of frameworks that fit together, and the sensible move is to adopt and adapt rather than invent.

Three are worth knowing by name. The NIST AI Risk Management Framework gives you the process — four functions, govern, map, measure, and manage, with governance as the cross-cutting foundation that runs through the rest. It is voluntary and widely treated as the reference. ISO/IEC 42001 gives you the management system — the world's first certifiable AI governance standard, built on the model of ISO 27001 for information security. Its significance for a leader is that word, certifiable: it turns your governance from a set of good intentions into an auditable system an independent body can verify, the way a security certification works today. And the local and sector layer — in Singapore, the IMDA frameworks and GovTech's practitioner guidance — translates the global standards into your specific context.

These harmonise rather than compete; the standards bodies have mapped them to each other on purpose. The decision for you is not which one to back, but to adopt the recognised scaffolding instead of improvising your own — both because it is faster, and because "we follow NIST and we are working toward ISO 42001 certification" is a far stronger answer to a board or a regulator than "we wrote our own approach." (The regulation that increasingly sits on top of these frameworks — the EU AI Act, sector rules, and the assurance regimes — is consequential enough to need its own treatment, in Chapter 16.)

The public sector's portfolio: Singapore

The clearest worked example of governing an agent portfolio as a system, rather than a pile of one-off approvals, is a government — Singapore's — and it is worth examining, because it is the case nearest your own situation.

Singapore approached agentic AI in the public sector as a coordinated programme, not a scatter of agency experiments. GovTech publishes practitioner guidance — a Responsible AI Playbook for public officers, built around a consistent set of principles including safety, fairness, robustness, security and privacy, explainability, and transparency — so that agencies govern to the same standard rather than each inventing their own. It published an Agentic AI Primer to guide agencies deploying autonomous systems, and pushed a cross-agency execution strategy to stop every agency from solving the same problems in parallel and to encourage shared solutions. This is portfolio thinking at the scale of a state: a common standard, common guidance, and coordination so that governance and capability are built once and reused, not rebuilt in every ministry.

Two details from that programme reinforce arguments this book has made throughout. First, the Singapore guidance insists that rigour precede rollout — that agencies define robust success metrics before a pilot begins, which is the measurability of Chapter 5 and the defined bars of Chapter 11, now written into national practice. Second, and striking for our running theme: GovTech's own guidance models AI agents as government officers, each handling a specialised task, and sequences deployment from internal work toward citizen-facing services. The new-hire framing this book has used as a lens is, in Singapore's public service, close to literal doctrine. The government is governing its agents as a workforce.

The public sector also carries obligations a private portfolio does not, and they sharpen the case for a system. A government agent's decisions touch citizens' rights, fall under public-records and transparency rules, and must be defensible as fair across different groups of people. None of that is achievable agent by agent, from memory. It requires the apparatus this chapter describes — a register you can produce on demand, consistent risk classification, a documented path every agent travelled, and a named officer accountable for each. For government, governance as a system is not an efficiency. It is the condition of being able to answer for what your agents do.

A system, not a pile of rules

Pull the chapter together and the shape is clear. To govern a portfolio you build a system, not a stack of one-off judgements: a living register so you can see every agent, risk tiering so you govern each in proportion to its danger, a consistent lifecycle and a review body so approval is repeatable and accountable, a named owner for every agent, and a recognised framework underneath so you are standing on tested ground. Done well, this is what lets you say yes to the fiftieth agent almost as easily as the fifth — and what stops the portfolio from becoming a sprawl no one can account for.

But notice what this system still costs. Even at its best, it governs each agent by routing it through a process — intake, classification, review, monitoring — and every team still has to follow that process, every time. That is repeatable, but it is not free. The next and final leap of the operating model is to make good governance not something every team performs, but something they inherit — built into the platform they build on, so that the safe path and the easy path are the same path. That is the platform play, and it is the subject of the next chapter.


A worked decision

You are the CIO of a large agency, and you decide to find out how many AI agents the organisation runs. You expected a handful. The first honest inventory turns up more than thirty — some built by teams, some bought into existing products, several you had never heard of. A few are orphans: built two reorganisations ago, still running, with no current owner. None share a common risk classification. There is no single list of who is accountable for what.

The instinct is to slam the brakes: freeze everything, and review all thirty yourself. Resist it. Reviewing thirty agents by hand is the governance-that-does-not-scale failure, and freezing everything teaches every team that the way to move fast is to not tell you next time — which is the sprawl failure. You would trigger both failures at once.

The decision is to build the system instead, in order. First, complete the register, shadow agents included — you cannot govern what you cannot see, and you have just proved you could not see. Second, tier what you found by risk, so your attention goes to the few citizen-facing, consequential agents and not to the many trivial internal ones. Third, stand up the lifecycle and the review body, so the thirty-first agent travels a clear, fast, repeatable path rather than a bespoke negotiation. Fourth, assign every agent a named owner, retiring or re-homing the orphans. And adopt a recognised framework to sit under all of it, rather than inventing your own.

Resist the urge to put heavy process on all thirty. The low-risk internal agents need a light touch and a named owner, nothing more; spend your scarce governance attention where the risk is. The goal is not to make AI hard. It is to make the safe path clear enough, and fast enough, that no one has a reason to go around it.


Key takeaways

  • Governing a portfolio is a different problem from governing one agent. It fails in two opposite ways — governance that does not scale (and becomes the bottleneck) and governance that does not happen (so agents proliferate ungoverned). The cure is a repeatable system, not bespoke judgement per agent.
  • You cannot govern what you cannot see. The first thing to build is a living register of every agent — owner, purpose, data, permissions, model, risk tier, status. It surfaces shadow agents and orphans, and it is the org chart for your AI workforce. If you do one thing, build it.
  • Right-size the oversight with risk tiering: governance intensity matches each agent's risk. Not every agent needs the same oversight, but every agent needs the right oversight — and smothering low-risk agents in process ("overfitting governance") kills adoption as surely as under-governing dangerous ones.
  • Build a consistent path to "yes" — a repeatable, auditable lifecycle and a cross-functional review body that holds a stated risk appetite. Its job is to make approval fast for the common case and reserve real scrutiny for the high-risk one; a committee that reviews everything equally rebuilds the bottleneck. Every agent gets a named owner.
  • You don't have to invent this. Adopt the recognised scaffolding — NIST AI RMF for the process, the certifiable ISO/IEC 42001 for the management system, and local frameworks (IMDA, GovTech) for context. For the public sector especially, this apparatus is the condition of being able to answer for what your agents do.

Questions to ask your organisation

  • Could we produce, today, a complete list of every AI agent we run — including the ones bought inside other products and the ones built by teams that have since moved on? If not, we cannot govern what we cannot see.
  • Do we govern every agent with the same intensity? If so, are we under-watching the dangerous ones, smothering the harmless ones, or both — and is our process the reason teams build agents without telling us?
  • Is there a clear, fast, repeatable path for a team to get a new agent approved and into production — or is each one a bespoke negotiation the queue is teaching people to avoid?
  • Does every agent in production have a named owner accountable for it, the way every employee has a manager? How many orphans would an honest inventory turn up?
  • Are we standing on a recognised framework — NIST, ISO/IEC 42001, the local guidance — or improvising our own approach? Which answer would we rather give our board and our regulator?

Chapter 14 — "The Platform Play" — is the most important strategic move in the operating model. A governance system, however good, still asks every team to follow a process. The platform play makes good governance something teams inherit instead of perform: the evaluation harness, the scoped access, the monitoring, the audit trail, and the kill switch built into the foundation every agent is built on — so that for any individual team, doing the right thing is the default, not the discipline.

Chapter 14: The Platform Play

What this chapter equips you to decide - Why a governance system that asks every team to follow a process still does not scale — and how a platform changes that. - Why you cannot make AI safe by asking fifty teams to be careful, and what it means to build the care into the ground they stand on instead. - When the portfolio is large enough to justify building (or buying) a platform, and how to make that decision. - Why a platform teams route around is worse than no platform — and what makes the safe path also the easy one.


From repeatable to inherited

The last chapter built a governance system: a register, risk tiers, a review body, a consistent path every agent travels. That system is a genuine achievement, and most organisations never reach it. But it has a ceiling — and this chapter is about breaking through it.

A system makes the right thing repeatable. Every team follows the same process. But following a process is still work, done by each team, every time — and work that is done by hand can be done slowly, done badly, or skipped. The team building the thirty-first agent still has to implement its own evaluation, wire up its own access controls, stand up its own monitoring, and build its own audit trail, before the review body checks whether they did it right. Multiply that by every team and every agent, and you are paying for the same controls to be rebuilt differently each time, at varying quality, again and again.

The platform play is the leap from repeatable to inherited. Instead of every team building the controls and the process checking them, you build the controls once — into the foundation that every agent is built on — and teams get them by default. Evaluation, scoped access, monitoring, audit trails, the kill switch: provided, not rebuilt. The governance is not something each team performs. It is something they inherit by building on the platform at all.

The principle behind it is simple enough to put on a wall: make the safe path and the easy path the same path. When the most governed way to build an agent is also the fastest and least effortful way, governance stops being a tax that teams resent and route around, and becomes the default they reach for because it is easier than the alternative.

You cannot make AI safe by asking people to be careful

There is a hard truth underneath the platform play, and leaders who miss it tend to build governance that looks impressive and does not hold.

You cannot scale safety by exhortation. Telling fifty teams to evaluate their agents, scope their permissions, monitor their behaviour, and keep audit trails will produce fifty different interpretations, of fifty different qualities, and some number of teams who meant to and did not get to it. A control that every team has to build for itself is a control that most teams will build partly wrong — not through incompetence, but because it is hard, unglamorous work competing against the pressure to ship. The same forces that starve the evaluation dataset in Chapter 6 starve every other control when each team is left to build its own.

The platform inverts this. A control the platform provides is one that every team gets right by default, because they are not building it — they are using it. The hard, expert work of getting evaluation or scoped identity or kill switches right is done once, by the people best placed to do it, and then it is there for everyone. This is the only way governance keeps pace with a growing portfolio. You do not get there by making people more careful. You get there by making carelessness harder than care.

What teams inherit

Be concrete about what a platform provides, and the answer is most of Part II — turned from a set of disciplines each team must practise into a set of services each team receives.

Evaluation becomes a service. The harness, the judges, the gate from Chapters 5 and 6 are provided; a team brings its own test cases and plugs into machinery that already exists, rather than building an evaluation capability from scratch. Identity and access from Chapter 8 are handled by the platform: per-agent identity, least privilege, and the mediation layer that holds credentials so the agent never does. Observability, monitoring, and online evaluation are built in — every agent on the platform is instrumented by default, which means the Day-2 operations of Chapter 12 come included rather than retrofitted. Cost guardrails, the central kill switch, and the audit trail are platform features, not per-team projects. And the security defaults from Chapter 10 are baked in, so the standard way to build is the safe-by-default way, and breaking the lethal trifecta is the path of least resistance rather than a thing each team must remember.

Put together, this is the idea that has run through the whole book, finally made explicit: for an individual team, doing the right thing becomes a configuration choice, not a build. You declare what your agent is for, what data it needs, and what risk tier it sits in — and the platform supplies the matching controls. The team spends its effort on the thing only it can do — the use case itself — and inherits everything else.

The register that fills itself

The platform also solves the hardest problem from the last chapter. The register only governs the agents it knows about, and the whole worry was the agents it does not — the shadow ones, built or bought without telling anyone.

When agents are built on the platform, that problem largely dissolves, because the platform is the register. An agent built on the platform is, by definition, known to the platform — its owner, its access, its risk tier, its behaviour, all recorded as a by-product of existing at all. You cannot build an agent off the books when the platform is the books. The shadow-AI problem becomes a question of making the platform the obvious place to build, rather than a question of hunting down what teams did elsewhere — and where it matters, unsanctioned agents operating outside the platform can be detected and blocked.

This is the deepest version of "you cannot govern what you cannot see." A governance system helps you go and look. A platform means everything is already in view.

The platform at national scale: GovTech

The clearest proof that this is the right move, rather than a tidy theory, is again Singapore's government — because it faced the portfolio problem at a scale almost no private organisation will.

A government runs thousands of digital systems across dozens of agencies. Governing agentic AI across all of them, agency by agency, by hand, is not difficult; it is impossible. So GovTech did not try. Its answer was to build the safe foundation once and let every agency inherit it: secure-by-design central platforms, shared safety-and-governance tooling, and common guidance, so that an agency building an agent stands on infrastructure that already provides the controls rather than rebuilding them.

The shape of that tooling is instructive. GovTech offers a governance and safety-testing platform — with components for assessing an AI application's safety and risk, and for applying guardrails and mitigating common vulnerabilities — so that testing and protection are services an agency uses, not capabilities each must invent. It consolidated the government's many governance resources into a single gateway, so officers find the standard in one place rather than thirty. And it pushed a cross-agency strategy to stop every agency from solving the same problem in parallel, and to make sharing the default. That last point is the platform play stated as policy: build once, inherit everywhere, do not let fifty teams reinvent the same controls at fifty different levels of quality.

For a public-sector reader the lesson is direct. The instinct, when each agency is independent, is to let each one govern its own AI. The platform play says the opposite: the controls are too important and too hard to leave to each agency's improvisation, so you build them centrally and let everyone inherit them — which is more consistent, more secure, and, across the whole of government, far cheaper than the alternative.

The investment, and the way it fails

None of this is free, and a leader should weigh it honestly.

A platform is a real investment — a team, sustained funding, and a centralisation bet. You do not build one for your first agent, or your third; the effort would dwarf the benefit. The decision turns on a comparison most organisations never make: the cost of building and running the platform, against the hidden cost of every team reinventing the same controls, at varying quality, across the whole portfolio. That hidden cost is large and mostly invisible, which is why it is under-counted — but it grows with every new agent, while the platform's cost is largely fixed. The crossover comes sooner than most leaders expect. The signal that you have passed it is the strain described at the start of this chapter: teams rebuilding the same things, quality varying, the review body swamped checking bespoke implementations. When you see that, the platform is already overdue.

The platform can be built, bought, or extended from infrastructure you already run; a category of governance and "control-tower" products now exists for organisations that would rather buy than build. That is a real strategic choice, and the right answer depends on your scale and your existing foundations. What does not change is the logic: the controls get built once and inherited, by whatever route.

And there is a way the platform play fails, which every leader funding one must guard against. A platform only works if it is genuinely the easiest path. Build a platform that is rigid, slow, or painful to use, and teams will treat it the way they treat slow governance: they route around it, building their agents elsewhere, off the foundation and out of view. A platform teams avoid is worse than no platform: it is expensive shelfware that also gave you false confidence that everything was governed. The discipline, then, is to treat the platform as a product whose users are your own teams, and to win them by being better than the alternative — not to mandate it and assume compliance. The safe path has to also be the easy path, or it is neither.

There is one more cost worth naming. A platform concentrates risk as well as control. A flaw in a foundation that every agent inherits is a flaw in every agent. So the platform itself must be held to the highest standard in your estate — the most rigorously evaluated, the most tightly secured, the most carefully operated thing you run. The leverage that makes it powerful is the same leverage that makes its failures systemic.

The institution your agents join

Step back, and the platform has a familiar shape. It is the institution your AI workers join.

No competent organisation asks each new employee to build their own security, write their own compliance procedures, and assemble their own tooling from nothing. People are onboarded into an institution that already provides those things — shared infrastructure, established norms, systems that make the right way to work also the standard way. The new hire inherits the organisation rather than reinventing it. The platform is this, for agents: the shared institution they are built into, so that each one inherits the organisation's hard-won controls instead of improvising its own.

Which raises the question the final stretch of the operating model has to answer. We have governed the portfolio and built the platform — the systems and infrastructure for a workforce of agents. But a workforce of agents is not only a technical fact. It changes what the human organisation does, who supervises whom, and what skills people need. The platform is the institution the agents join; the next chapter is about the people already in it.


A worked decision

Your governance system from the last chapter is working, but straining. Every team building an agent spends weeks re-implementing evaluation, access controls, monitoring, and audit trails before the review body even sees it. The implementations vary in quality. The committee is swamped checking bespoke plumbing rather than judging use cases. Teams have started to grumble that the process is slow, and one or two have built agents outside it. Someone proposes funding a dedicated platform team. Someone else calls it premature overhead — "we have a process; we don't need to gold-plate it."

The way to settle this is to count the cost no one is counting. Add up the weeks each team spends rebuilding the same controls, multiplied across every agent in the pipeline, plus the cost of the review body checking all that bespoke work, plus the real risk carried by the implementations that were built partly wrong. That number is large, it is mostly invisible, and it grows with every new agent. Set it against the cost of building the platform once. For a handful of agents, the platform loses. For a growing portfolio — which is what you have — it wins, and the strain you are already feeling is the signal that you have passed the crossover.

So you fund the platform, with two conditions that decide whether it succeeds. First, it must be genuinely the easiest way to build an agent, or teams will route around it and you will have bought shelfware; treat it as a product that has to win its own users. Second, because every agent will inherit its flaws, the platform itself is held to the highest standard in the organisation — the most evaluated, the most secured, the most carefully run thing you own. Get those right, and the thirty-first agent inherits in an afternoon what the third spent weeks rebuilding, the committee reviews a configuration against a tier instead of a hand-built implementation, and the safe path is finally the easy one. Get them wrong, and you have added a bottleneck with a budget.


Key takeaways

  • A governance system makes the right thing repeatable; every team still performs the process by hand. A platform makes the right thing inherited — the controls are built once into the foundation every agent is built on, and teams get them by default. The principle: make the safe path and the easy path the same path.
  • You cannot scale safety by exhortation. Asking fifty teams to each build evaluation, scoped access, monitoring, and audit produces fifty interpretations of varying quality. A control the platform provides is one every team gets right by default — you make carelessness harder than care.
  • What teams inherit is most of Part II, turned from disciplines into services: evaluation, scoped identity, observability, cost caps, the kill switch, the audit trail, and secure-by-default patterns. For a team, doing the right thing becomes a configuration choice, not a build.
  • The platform is also the register that fills itself — an agent built on it is known to it by definition, which dissolves the shadow-AI problem rather than chasing it. GovTech is the national-scale proof: faced with thousands of systems, it built the safe foundation once and let every agency inherit it.
  • A platform is a real investment, justified when the hidden cost of every team reinventing controls exceeds the platform's largely fixed cost — a crossover that comes sooner than leaders expect. It fails if it is not genuinely the easiest path (teams route around it) and it concentrates risk (its flaws are inherited by every agent), so it must be held to the highest standard you run.

Questions to ask your organisation

  • When a team builds a new agent, do they re-implement evaluation, access, monitoring, and audit themselves — or inherit them from a shared platform? How many times have we paid to rebuild the same controls?
  • Is the most governed way to build an agent also the easiest way? If the safe path is slower or more painful than the alternative, teams will route around it — and some may already be doing so.
  • Have we counted the hidden cost of reinvention across our whole portfolio — the rebuilt controls, the swamped review body, the implementations built partly wrong — against the cost of building or buying a platform? Which is larger?
  • If we have or build a platform, is it held to the highest standard in our estate? Its flaws are inherited by every agent built on it, so its evaluation, security, and operations cannot be average.
  • Would building an agent on our platform automatically register it, scope it, monitor it, and make it auditable — so that the safe path also closes the shadow-AI problem? Or is governance still something teams have to remember to do?

Chapter 15 — "People and Org Design" — turns from the machinery to the people. A workforce of agents does not only add systems; it changes what humans do. It creates a new and unfamiliar job — supervising an autonomous worker — and it shifts the skills, the roles, and the org chart around it. This chapter is about reskilling the people who will oversee agents, designing the human roles that an agent workforce needs, and managing the organisational change that leaders most often underfund.

Chapter 15: People and Org Design

What this chapter equips you to decide - Why an agent programme is a workforce change, not just a technology project — and why the people half is the part leaders most underfund. - Why supervising an agent is a new and unfamiliar job, and not the one your experts already have. - How to avoid the deskilling trap, where the agent dissolves the very expertise you need to oversee it. - How to handle the question of jobs honestly, so that you keep the trust and the adoption the whole programme depends on.


The half that gets forgotten

An agent programme arrives with a technology budget. It also arrives with a workforce problem, and leaders reliably fund the first and forget the second.

The forgetting is understandable. The visible work — building the agent, the platform, the controls — is technical, and it absorbs the attention. But step back to the first chapter of this book: the gap between a working agent and a production one was never mainly about technology. It was about whether the organisation was ready. And an organisation is people. Introducing a workforce of agents changes what those people do all day — it creates new jobs, alters existing ones, shifts the skills that matter, and rearranges the org chart. You can build a technically perfect agent and still have the programme fail, because the people around it were not ready to work with it.

This chapter is that readiness. It is the least glamorous part of the operating model and, in practice, the part that most often decides whether the rest of the book's work pays off. The technology is the easy part. Changing how people work is the hard part, and it is the part that gets the smallest budget and the least attention.

Supervising an agent is a new job

Start with the role this book has assumed throughout: the human who oversees the agent. Chapter 9 insisted that oversight must be real — a person judging the agent's reasoning, not rubber-stamping it. Chapter 13 insisted that every agent has an accountable human. Both assume that someone is capable of doing this. That assumption needs examining, because supervising an agent is a new job, and it is not the one your experts already have.

Doing a task well and supervising a machine that does the task are different competences. A good caseworker knows how to handle a case. Supervising an agent that handles cases requires something else: the ability to judge the agent's reasoning rather than just its answer, to recognise when a fluent, confident output is wrong, to resist the pull of automation bias, and to know — and feel safe enough — to override the machine when it matters. None of that comes free with subject expertise. It has to be built.

This is why the reskilling that leaders treat as a nice-to-have is load-bearing. The oversight model the entire book relies on works only if the supervisors are able to supervise — and that is a capability you have to invest in, not assume. DBS is the clearest example of taking this seriously: it put many thousands of staff through tailored learning so that there would be capable humans able to supervise its AI at each stage of its growth. That is not a training programme bolted onto a technology project. It is recognition that the technology does not work without the people who can oversee it.

The deskilling trap

Now the subtle danger, and the one most leaders never see coming until it has already taken hold.

The agent is introduced to take the routine work. It does. And because it does, the humans stop doing that work themselves. At first this looks like pure gain — the toil is gone, people are freed for other things. But follow it forward. The people who no longer do the routine cases lose their fluency with them. The junior staff who would once have learned the work by grinding through those cases never get the practice, because the agent handles them. Over a few years, the expertise that let humans judge the work drains away.

Here is the trap. You introduced the agent needing expert humans to supervise it — and the agent, by doing all the work, is dissolving the very expertise that supervision depends on. The better the agent is at taking the routine cases, the faster your people lose the ability to catch it when it is wrong on the hard ones. You cannot supervise work you could never do, or no longer can. An organisation that lets this run ends up with agents overseen by people who can no longer oversee them — oversight theatre, not by laziness, but by atrophy.

This is not a reason to avoid agents. It is a reason to design against the trap. Keep humans in genuine contact with the work — rotate them through real cases, not only exceptions, so their judgement stays sharp. Be wary of letting an agent handle a hundred percent of a category, because the last fraction is what keeps the humans expert. Protect the pipeline through which junior people become senior, which routine work used to provide and which the agent now threatens. The goal is an organisation where the humans who supervise agents remain, themselves, expert enough to do so — which does not happen by accident once the agent is doing the work.

Redesigning the roles

Beyond the supervisor, a workforce of agents needs a set of human roles that may not exist in your organisation yet, and an honest look at how existing roles change.

Some roles are new or newly important. Every agent needs a named owner — its manager, accountable for it, as Chapters 12 and 13 established. Someone has to own the evaluation and the flywheel — keeping the harness honest and feeding real failures back, which is a standing job, not a one-off. The governance function from Chapter 13 and the platform team from Chapter 14 are real roles with real headcount. These are not overhead to be minimised; they are the jobs that make an agent workforce safe, and they have to be filled by capable people, not assigned as someone's spare-time duty.

Existing frontline roles change shape. As agents take the routine cases, the human job shifts from doing the work to handling what the agent cannot — the exceptions, the ambiguous and high-stakes cases, the situations that need judgement, empathy, or accountability that an agent cannot supply. For the public sector, this is often a shift toward the human parts of public service: the complex case, the vulnerable citizen, the judgement call that should never be automated. Designed well, that is a better job than the one it replaced. But it is a different job, and people do not move into it on their own. The role has to be redesigned, and the person supported into it.

The honest conversation about jobs

Underneath all of this sits the question everyone is thinking about and leaders often avoid: what does this mean for jobs?

Avoiding it is the worst option, because the anxiety does not go away — it curdles into the quiet resistance that kills adoption. So be honest about the arithmetic. A job is a bundle of tasks. An agent takes some of those tasks, not the whole job — which frees the person's time and leaves you a choice. Fill that freed time with higher-value work the agent cannot do, and the person keeps their job, now a better one. Leave the role exactly as it was and bank the freed time as a headcount saving, and there is steadily less for the person to do, until the job itself is the thing you have cut. The difference between augmentation and replacement is not a property of the technology. It is a choice the organisation makes.

The organisations that get this wrong tend to optimise for headcount reduction and discover they cut the wrong thing. There are cautionary cases of companies that automated customer service hard, chasing savings, and then had to walk it back and rehire — because they had optimised for fewer people rather than for the quality of the outcome, and the outcome suffered. The lesson is to treat this as workforce transformation — reskilling, redeployment, role redesign — rather than headcount arithmetic.

For the public sector the stakes are higher and the framing matters more. There is a workforce, often unionised, a duty of care to it, and a public narrative about machines replacing public servants that can do real damage to trust if handled clumsily. The productive framing — and the honest one, if you design for it — is augmentation: freeing officers from routine processing for the higher-value, citizen-facing, judgement-heavy work that the public wants humans to do. Singapore has framed its public-sector AI this way, as freeing public servants to focus on higher-value work, and paired it with national investment in reskilling and AI literacy. That framing is not spin; it is a description of what happens if you do the role redesign and the reskilling. If you do not, it is a layoff with better branding, and people will see through it.

The change leaders underfund

All of this — the new supervisor competence, the defence against deskilling, the redesigned roles, the honest workforce transition — is change management, and it is chronically underfunded because it is invisible next to the technology.

The pattern is familiar. The agent is built, the platform stood up, the controls put in place — and then the programme assumes people will use the new system well. They do not. People resist tools they do not trust or understand. Oversight becomes theatre when officers were never trained to do it and were never given the time or the safety to override. Adoption stalls when the change was announced rather than managed. The technology was the easy part, and it was the only part that got resourced.

The leader's job is to fund the human change as a first-class part of the programme, not an afterthought. That means training that builds the supervisor competence, not a one-hour briefing. It means communicating honestly about what the agents are for and what they mean for people's roles. It means aligning incentives so they do not defeat the controls — the throughput target from Chapter 9 that turns reviewers into rubber stamps is a change-management failure as much as an oversight one. And it means giving people the time, the role clarity, and the psychological safety to work with agents well, including to push back on them. None of this is expensive next to the technology. It is easy to skip — and skipping it is how a technically successful programme fails in practice.

If the agent is a new hire, your people are its managers

Throughout this book, one figure has run underneath the argument: treat the agent as a new hire, not a feature release. This chapter is where that figure completes itself, seen from the other side.

If you are bringing on a workforce of new hires, you are also changing the organisation they join — and you are turning the people already there into their managers. The human's centre of gravity shifts from doing the work to overseeing a mixed team of people and agents: deciding what to delegate, judging what comes back, handling what the agents cannot, and being accountable for the whole. The most valuable human capabilities become the ones you cannot hand to an agent — judgement, the difficult exception, the accountable decision, the human contact. An organisation that understands this invests in its people as managers and judges of agents. One that does not is left with agents no one can oversee, doing work no one remembers how to do, in roles no one was prepared for.

The operating model now has its systems, its platform, and its people. One external force shapes all of it, and a leader cannot govern an agent portfolio without reckoning with it: the law, the regulators, and the assurance regimes that increasingly decide what you are allowed to deploy and what you must be able to prove. That is the final chapter of the operating model.


A worked decision

Your agents are deployed and, technically, working well. But the people picture is troubling. Officers are disengaged and distrustful; the reviews that were meant to be real oversight have slid into rubber-stamping. A union representative has raised concerns about job security. And a longer-term problem has surfaced in a workforce review: the junior officers who used to learn the work by handling routine cases now never see those cases — the agent does them — so the agency's pipeline of future experts is thinning. Leadership's instinct is to push harder on adoption and to reassure everyone that jobs are safe.

That instinct treats the symptoms and misses the disease, which is that this was run as a technology project when it was always a workforce change. The response has to match.

Reskill for the real new job: train officers to supervise an agent — to judge its reasoning, spot confident errors, and override it — rather than assuming subject expertise is enough. Design against the deskilling trap: keep officers rotating through real cases, not only exceptions, and resist letting the agent take a category to completion, so the expertise that supervision depends on does not drain away, and the path from junior to senior survives. Redesign the frontline role around the higher-value, citizen-facing, judgement-heavy work the agent cannot do, and support people into it rather than leaving them in a role the agent has hollowed out. Handle jobs honestly: commit to augmentation and redeployment, and mean it, because the union and the officers will see through branding that is a quiet headcount cut. And align the incentives so that careful oversight is rewarded and overriding the agent is safe — not punished by a throughput target.

None of this is the expensive part of the programme. It is the part that was skipped. The decision is to fund it now as a first-class workstream, because the technically successful agents you have built are worth nothing if the people around them cannot, or will not, oversee them.


Key takeaways

  • An agent programme is a workforce change wearing a technology budget. The people half — reskilling, role design, change management — is the part leaders most underfund, and it often decides whether the rest of the work pays off. The technology is the easy part.
  • Supervising an agent is a new job, not the one your experts already have. Judging an agent's reasoning, catching confident errors, and overriding it are competences that must be built — the reskilling DBS treated as load-bearing, not optional.
  • Beware the deskilling trap: the agent takes the routine work, so humans stop doing it and lose the expertise to judge it — dissolving the very oversight capability you depend on. Design against it by keeping people in genuine contact with the work and protecting the path from junior to expert. You cannot supervise what you could never do, or no longer can.
  • Redesign the roles. New roles (agent owner, evaluation owner, governance, platform) are real headcount, not spare-time duties; frontline roles shift from doing the work to handling what the agent cannot. Agents take tasks, not jobs — but only if you redesign the jobs; left unchanged, the tasks they take become the jobs they eliminate.
  • Handle the jobs question honestly — augmentation and redeployment, not silent headcount arithmetic — or anxiety curdles into the resistance that kills adoption. In the public sector especially, the augmentation framing is true only if you do the reskilling and role redesign; otherwise it is a layoff with better branding, and people will see through it.

Questions to ask your organisation

  • Have we treated this as a workforce change or only a technology project? What are we spending on reskilling, role redesign, and change management, against what we are spending on the technology?
  • Are the people we expect to oversee our agents able to — trained to judge the agent's reasoning, catch confident errors, and override it — or are we assuming their subject expertise is enough?
  • Is the agent deskilling us? As it takes the routine work, are our people losing the fluency to judge it, and is our pipeline of future experts thinning because juniors no longer learn by doing? What keeps our supervisors expert?
  • Have we redesigned the human roles around the agents — toward the exceptions and the higher-value, judgement-heavy work — or left people in jobs the agent has hollowed out?
  • Are we being honest, internally and publicly, about what agents mean for jobs? Are our incentives rewarding careful oversight and making it safe to override the agent — or punishing both?

Chapter 16 — "Regulation and Assurance" — closes the operating model by turning outward. Everything so far has been about governing your agents well by your own lights. This chapter is about the external obligations that increasingly decide what you may deploy and what you must be able to prove: the emerging laws such as the EU AI Act, the sector and national regimes that bear on your organisation, and the assurance and audit expectations that turn good governance into demonstrable compliance.

Chapter 16: Regulation and Assurance

What this chapter equips you to decide - The difference between regulation (what you may deploy and must do) and assurance (how you prove you did it) — and why you need to be ready for both. - Why "there is no AI law where we operate" is not the reassurance it sounds like, and why you are almost certainly already regulated. - Why the apparatus you built in Parts II and III is most of your compliance evidence — so proving it should be a by-product, not a separate project. - What the public sector's double burden — following the rules and writing them — demands of you.


From governing well to proving it

Everything in this book so far has been about governing your agents well by your own lights — building the method, the systems, the platform, and the people to run autonomous software responsibly. This chapter is about the moment that stops being a matter of your own judgement and becomes a matter of answering to someone who can compel you: a regulator, an auditor, a court, a parliamentary committee.

Two things are easy to blur together, and the chapter keeps them apart. Regulation is the what: what you are allowed to deploy, and what obligations you must meet to deploy it. Assurance is the how-do-you-prove-it: the testing, certification, and audit that turn your private confidence into evidence an outsider will accept. A leader needs both. You can be fully compliant and unable to prove it, which in front of a regulator is much the same as not being compliant at all.

The good news is that if you have done the work of Parts II and III, you have already built most of what both demand. The harder news is that the regulatory landscape is fragmented, moving, and easy to misread — so the leader's job is to navigate it rather than hope it does not apply.

One honest caveat before we start. This is a fast-moving area, and what follows is a snapshot, not legal advice. Dates shift, rules are amended, and your specific obligations depend on your jurisdiction, your sector, and your use case. Treat this chapter as a map of the terrain and the questions to ask — and get proper specialist counsel for your own situation.

Two philosophies: hard law and soft law

The world has not settled on one way to regulate AI. It has settled, broadly, on two, and they sit at opposite ends of a spectrum that a leader operating across borders has to hold in mind at once. These are broad families, not strict legal categories — most real regimes mix the two, and the boundary between them is blurrier than it first looks.

At one end is hard law, and its clearest example is the European Union's AI Act — the first comprehensive AI law in any major jurisdiction. It takes a risk-based approach, sorting AI systems into tiers: a small set of prohibited uses, a consequential "high-risk" category, a "limited-risk" tier with mainly transparency duties, and everything else. The obligations scale with the tier. For high-risk systems — which include many uses in employment, financial services, education, and public administration — they are heavy: a working risk-management system, data governance, technical documentation, human oversight, a conformity assessment, registration, and incident reporting, all demonstrable before deployment. The Act is in force and phases in over several years. The prohibited uses and the rules for general-purpose models already apply; the heavy high-risk obligations were originally set for August 2026. That date has since moved: under the EU's 2025–26 "Digital Omnibus" revisions, the high-risk obligations for those use-based categories have been deferred toward the end of 2027, with the amendments still being finalised as this is written. The penalties, in any case, are large enough to focus any board's attention: at the top of the scale, tens of millions of euros or a significant percentage of global turnover.

Three features of the hard-law model matter to every leader, not only European ones. It is extraterritorial: if your AI touches people in the EU, it reaches you, wherever you are headquartered. It distinguishes the provider of a model from the deployer who builds on it — and an agent built on someone else's foundation model usually makes you a deployer, with real obligations of your own, while heavy modification can tip you into the heavier provider category. And it relies on conformity assessment: you demonstrate compliance through a documented quality-management system and technical records, which is the kind of thing a recognised management-system standard is built to produce.

At the other end is soft law, and Singapore is the leading example of the pro-innovation, framework-driven model. The label describes its AI-specific approach rather than its legal environment as a whole — Singapore's data-protection law and its financial regulator bind with full force, as the next section shows. What is distinctive is that, instead of a single comprehensive AI statute, Singapore has built a layered set of frameworks and guidance — IMDA's Model AI Governance Framework, sector guidance, and shared tooling — designed to steer responsible practice while leaving room to move. Much of it is voluntary on its face, but "voluntary" is easy to over-read.

The executive lesson from the two philosophies is not to pick a side. It is that the landscape is fragmented and changing, so you should build to the principle and to the strictest standard that plausibly applies to you, rather than to the exact text of one jurisdiction. The underlying expectations — risk management, human oversight, transparency, documentation, accountability — are consistent across the regimes, because they describe the same responsible practice. Build to those, and you are most of the way to compliance everywhere.

You are already regulated

The most expensive misreading in this area is the comfortable one: "there is no AI law where we operate, so we are free to deploy." It is almost never true, for three reasons.

First, existing law already applies to what your agent does. Data protection law governs the personal data it touches. Anti-discrimination law governs decisions that affect people. Consumer, contract, and public-law duties do not pause because a machine is now doing the work. An agent gets no exemption from the rules that already bind the activity it performs.

Second, sector regulators are extending their existing supervision to AI without waiting for a dedicated statute. The clearest example for a regulated organisation is finance: in Singapore, rather than a new AI law, the financial regulator has woven AI into its supervisory framework, and in late 2025 proposed detailed guidelines on AI risk management for all the institutions it oversees, moving the sector from high-level principles to assessable supervisory expectations. Those guidelines will sound familiar by now — they call for board-level oversight, clear lines of defence, a comprehensive inventory of AI systems, controls across the full lifecycle from development to retirement, and an explicit rule that you cannot delegate your governance to a vendor just because the AI came from outside. That is not soft guidance you can ignore; it is the supervisory expectation of a regulator that examines you.

Third, soft law hardens. Frameworks that begin as voluntary guidance become the standard a court or regulator points to when something goes wrong, the baseline an auditor measures you against, and eventually the seed of binding rules. "Voluntary today" is rarely "irrelevant," and almost never "irrelevant tomorrow."

The honest position, then, is that you are already regulated — by existing law, by your sector's supervisor, and by hardening soft law — whatever the state of dedicated AI legislation. The question was never whether rules apply. It is whether you are ready to meet them and prove it.

The regulated organisation's example

The financial-sector case is worth a closer look, because it is the most developed example of an established regulator governing AI, and because its requirements double as a checklist any organisation can learn from.

The approach has two layers. Underneath sit long-standing principles — in Singapore's case, the FEAT principles of fairness, ethics, accountability, and transparency, established years before the current wave — operationalised through shared, open tooling that helps institutions assess their systems against the principles rather than merely affirm them. On top sit the newer, sharper supervisory guidelines: board-level accountability for AI, a full inventory, lifecycle controls, monitoring, and the rule that third-party AI does not transfer your responsibility.

Read that list against this book and the point becomes clear. The regulator is not asking for something alien to what you have built. It is asking for the register from Chapter 13, the lifecycle and oversight from Parts II and III, the named accountability from Chapter 12, and the evidence from your evaluation harness — codified as supervisory expectation. The organisations that resist the operating model as overhead discover, when the regulator arrives, that the operating model was the compliance. The ones that built it find the examination is largely a matter of showing their work.

Assurance: turning governance into proof

Meeting the rules is one thing; proving you meet them is another — and that is where governance becomes assurance.

Assurance is the machinery of proof. It comes in a few forms, increasingly used together. There is testing against recognised criteria — Singapore's AI Verify, for instance, is an open-source framework that runs technical tests and process checks against governance principles and produces a report you can hand to a regulator, a board, or a customer, turning responsible practice into shareable evidence. There is certification — ISO/IEC 42001, the certifiable management-system standard from Chapter 13, which lets an independent body verify that your AI governance is a real, working system rather than a claim, the way a security certification works today. And there is conformity assessment and audit — the formal demonstration, required under hard law such as the EU Act for high-risk systems, that you have done what the rules require, evidenced by documentation and records.

Here is the reframe that should change how a leader thinks about all of it. If you built the operating model in Parts II and III, you have already built most of the evidence these mechanisms ask for. The register is your inventory. The evaluation harness is your testing record. The audit trail is your proof of what each agent did and on whose authority. The governance system and a standard like ISO 42001 are your management-system evidence. The monitoring is your post-deployment record. Assurance, done this way, is not a separate project undertaken in a panic before an audit. It is the by-product of running the operating model well — the difference between "prove it" being a routine action and "prove it" being a multi-week excavation.

This is the deepest argument of the chapter, and it dissolves the false choice leaders often feel between governing well and complying. They are not two investments. The apparatus that makes your agents safe is the same apparatus that proves them compliant. Build it once, and you serve both. Skip it, and you face both an unsafe agent and an audit you cannot pass.

The public sector's double burden

Government carries a heavier version of all of this.

A public body is held to a higher standard of transparency and accountability than a private one, because it acts with public authority over citizens who did not choose it. Its AI decisions fall under public-law duties, freedom-of-information regimes, and the scrutiny of auditors-general, ombudsmen, and parliamentary committees. The standard of proof it must meet when something goes wrong is not a commercial one; it is a public one. "We could not reconstruct what the agent did," which would be embarrassing for a company, is for a government the sentence that ends careers and inquiries — which is why the audit trail from Chapter 12 is not optional in the public sector but foundational.

And government often has a second role: it does not only follow the rules, it writes them. The same state that deploys agents in its agencies also sets the assurance frameworks, the testing tools, and increasingly the law that the rest of the economy must meet. Singapore has leaned into this, developing open assurance tooling and governance frameworks and positioning the public sector to lead by example rather than regulate from a distance. For a public-sector leader, that is both a burden and an opportunity: you are held to the standard you set, and your own programme is, in part, the demonstration that the standard can be met.

The practical implication is the same as for everyone else, only sharper. Build the operating model, and you can both meet your obligations and prove them to the bodies that hold you to account. Skip it, and the first serious inquiry will find an agent acting in the state's name that the state cannot fully explain — which is the one outcome a public institution cannot afford.

The operating model, complete

With regulation and assurance, the operating model is complete. Part II gave you the method for a single agent. Part III scaled it: a governance system to see and tier the whole portfolio, a platform so good governance is inherited rather than rebuilt, the people and roles a workforce of agents requires, and now the external obligations that decide what you may deploy and the assurance that proves you did it well. You can take one agent from sandbox to production, and you can run many of them, safely, at scale, and answerably.

What remains is to make all of it real. Frameworks and methods persuade, but leaders act on examples and tools. The final part of the book turns from argument to practice — the cases of organisations that have done this, the reference patterns and the anti-patterns to recognise, and the templates and checklists you can carry into your own organisation. Part IV is where the method meets the ground.


A worked decision

Your organisation operates agents in a regulated domain, and two things land in the same month. Your sector regulator issues AI risk-management expectations. And someone points out that one of your citizen- or customer-facing decision agents would fall into the "high-risk" category under the European rules — and that because some of the people it serves are in the EU, those rules reach you regardless of where you sit. The instinct in the room is that you now need a large, separate compliance project, fast.

Before launching it, do the mapping exercise instead. Lay the regulators' requirements — risk management, an inventory, human oversight, lifecycle controls, documentation, incident reporting, no delegating governance to your vendors — beside what you have already built in Parts II and III. The overlap is almost total. Your register is the inventory. Your evaluation harness and its records are your testing evidence. Your oversight design is your human-oversight obligation. Your audit trails are your proof of what the agent did. Your governance system, and your progress toward a recognised management-system certification, are your management-system evidence. You have not been ignoring compliance; you have been building it under a different name.

So the decision is not to stand up a panicked compliance project. It is to run the mapping, identify the genuine gaps — perhaps formal conformity documentation, a specific transparency disclosure, an incident-reporting process, a particular classification you had not made — and close those, while using assurance tooling such as an open testing framework and an independent certification to turn what you have into proof an outsider will accept. The organisation next to you that skipped the operating model is now doing the multi-week excavation, trying to reconstruct evidence it never captured. You are doing a mapping and a gap-closure. That difference is the whole return on having built the operating model in the first place.


Key takeaways

  • Regulation decides what you may deploy and must do; assurance is how you prove you did it. They are different jobs, and you need both — being compliant but unable to prove it is, to a regulator, much like not being compliant.
  • The world regulates AI two ways: hard law (the EU AI Act — risk tiers, heavy high-risk obligations, large penalties, extraterritorial reach, and a provider-vs-deployer line that usually makes an agent-builder a deployer) and soft, pro-innovation law (Singapore's framework-driven model). Build to the shared principles and the strictest applicable standard, not to one jurisdiction's text.
  • "No AI law here" is not freedom. You are already regulated — by existing law (data protection, anti-discrimination, public law), by sector supervisors extending their reach (such as finance regulators' AI guidelines), and by soft law that hardens over time. An agent gets no exemption from the rules that bind the activity it performs.
  • The apparatus you built is your evidence. The register is your inventory, the harness your testing record, the audit trail your proof of action, a standard like ISO/IEC 42001 your management-system evidence, the monitoring your post-deployment record. Done this way, compliance is a by-product of the operating model — not a separate, panicked excavation before an audit.
  • The public sector carries a double burden: held to a higher standard of transparency and accountability and often writing the rules it must also meet. "We could not reconstruct what the agent did" is the sentence a public institution cannot afford — which makes the audit trail foundational, not optional.

Questions to ask your organisation

  • Do we know which regimes apply to each of our agents — sectoral rules, data-protection and public law, and the extraterritorial reach of laws like the EU AI Act if we serve people abroad? Have we classified our agents by their regulatory risk tier, not just our internal one?
  • Are we treating "no dedicated AI law here" as freedom? What existing laws and supervisory expectations already bind what our agents do, regardless of whether an AI-specific statute exists?
  • If a regulator or auditor asked us today to demonstrate compliance, would it be a mapping exercise against evidence we already hold — or a multi-week excavation to reconstruct what we never captured?
  • Can we turn our governance into proof an outsider will accept — through recognised testing tools, an independent certification such as ISO/IEC 42001, and documented conformity — or only assert that we are responsible?
  • If we are a public body: could we satisfy an auditor-general, an ombudsman, or a parliamentary committee that every consequential agent decision is explainable and traceable to an accountable officer? If not, where is the gap?

Part III is complete: you can govern not one agent but a portfolio — with a system to see and tier it, a platform that makes good governance inherited, the people and roles to run it, and the regulation and assurance to deploy it lawfully and prove it. Part IV turns from method to practice. Chapter 17 — "Case Studies" — examines organisations that have made this journey: a government, a bank, and a cautionary failure, each read through the lens of the method, so that the argument of this book is grounded in what has been done, not only in what should be.

Part IV — Cases and Practice

The first three parts made the argument: why agents stall, how to get one to production, and how to govern many. Part IV makes it concrete. It examines organisations that have walked this road — and some that walked off it — then distils the recurring patterns and hands you the templates to use in your own. It begins with the cases, because frameworks persuade, but examples convince.


Chapter 17: Case Studies

What this chapter equips you to decide - How to read a success story or a failure through the method — to see which controls were present or missing, rather than be dazzled or frightened by the headline. - What the method looks like when an organisation lives it, at the scale of a government and of a bank. - What it costs when the method's steps are skipped — in money, in trust, and in lives. - Why the fixes adopted after every failure are the very controls this book prescribes — so the method is hindsight made available in advance.


Reading cases through the method

Frameworks persuade; examples convince. A leader who has followed the argument this far will still, reasonably, want to know what it looks like in the world — whether anyone does this, and what happens to those who do not.

This chapter answers both, with four cases. Two are organisations living the method at scale: a government and a bank. Two are failures: one a public-sector catastrophe at full scale, one a live, recent failure of an AI agent. The point of putting them together is that they are the same lesson seen from two sides. The successes show you what the method looks like when it is followed. The failures show you the cost of its absence — and the remedies adopted after each failure turn out to be the very controls the method prescribes. These organisations paid, in money and trust and in some cases lives, for lessons this book is offering you in advance.

The discipline, as you read, is to look past the headline to the controls. Not "AI worked" or "AI failed," but: which of the method's steps were in place, and which were missing? That is the question that turns a story into a lesson.

The government: Singapore

Singapore is the clearest example of a government running the method at national scale, and it is the case nearest your own.

Read its programme through the book and the chapters light up in sequence. It chose its ground by risk: running its AI Agents Sandbox, the government scored use cases on risk and benefit, advanced the reversible ones like internal quality-assurance testing, and ran its single high-risk, citizen-facing case — applying for social assistance — only on mock data in a simulated environment, behind the heaviest controls of the set, which is the reversibility-first selection of Chapter 4. It insisted on measurement before deployment, with national guidance that agencies define success metrics before pilots begin, which is the rigour-before-rollout of Chapters 5 and 11. Its agentic governance framework opens by bounding what an agent may do and access, and by fixing the checkpoints where a human stays accountable — the bounding of Chapter 8 and the oversight of Chapter 9. It built the operating model of Part III: common practitioner guidance and principles so agencies govern to one standard, a cross-agency strategy so the same controls are built once and reused rather than reinvented, and secure-by-design central platforms and shared safety tooling so that governance is inherited rather than rebuilt — Chapters 13 and 14, at the scale of a state. And it turned governance into proof with open assurance tooling, the move of Chapter 16.

The arc of it is the book's arc: sandbox, then framework, then platform. Singapore did not leap to the flagship; it climbed. And the detail that closes the loop on this book's running theme is that its own guidance models AI agents as government officers, each handling a specialised task, deployed internal-first before citizen-facing. The new-hire framing is, in Singapore's public service, close to doctrine. The lesson for a public-sector reader is not to copy Singapore's specific tools, which will change. It is that a serious institution treated agentic AI as a workforce to be governed as a system, and climbed deliberately rather than gambling on a leap.

The enterprise: DBS

DBS, a major bank, is the clearest example of the method inside a regulated enterprise — which matters, because banking is among the most heavily supervised activities there is, and "we move carefully because we are regulated" is no excuse a bank can hide behind while still shipping AI at scale. DBS ships at scale anyway, by building the controls.

Through the method's lens: DBS runs a single, consistent bar that every AI use case must clear — that it be purposeful, unsurprising, respectful, and explainable — overseen from conception to deployment by a responsible-AI committee. That is the consistent standard and the review body of Chapter 13, applied across hundreds of use cases rather than negotiated case by case. For its higher-risk applications it runs real-time performance metrics against defined limits, with automated kill switches that trip the moment a limit is breached — the Day-2 operations and the off-switch of Chapter 12, built in rather than bolted on. It surrounds its agents with defined escalation paths, audit trails, and fallback mechanisms — the traceability and human-on-the-consequential-path of Chapters 9 and 12. It has moved from AI as copilot toward AI as autopilot, which is the earned, staged autonomy of Chapter 11 rather than a switch thrown overnight. And it put many thousands of staff through tailored learning so there would be capable humans able to supervise the AI at each stage — the reskilling of Chapter 15, treated as load-bearing.

What makes DBS instructive is not any single control; it is that the controls are a system, applied consistently and at scale, in an industry where getting it wrong is expensive and visible. It is the whole of Parts II and III, running in production, in a regulated institution. If a bank can ship agentic AI broadly while keeping a consistent bar, real-time kill switches, and a reskilled workforce, the excuse that rigour and scale are incompatible does not survive the example.

Failure at scale: Robodebt

The first failure is not, strictly, an AI agent — and that is why it belongs here. Robodebt is the most rigorously documented example in existence of what happens when an institution lets automation make consequential, irreversible decisions about citizens without the controls this book describes. It is a warning, not a relic, because an AI agent is automation with far more autonomy — so every failure mode below gets more dangerous, not less, as the automation gets more capable.

Between 2015 and 2019, the Australian government ran an automated debt-recovery scheme that used income-averaging to decide that welfare recipients had been overpaid. Read through the method, the failures are not subtle; they are a near-complete inversion of the book.

It violated the reversibility-first selection of Chapter 4 at the most basic level: it pointed automation straight at the highest-stakes, least-reversible decisions an institution can make — taking money from the poorest citizens — rather than starting where error was survivable. It had no real evaluation in the sense of Chapters 5 to 7: the income-averaging method was never validated against ground truth, and it produced false debts systematically whenever someone's income varied through the year, which the designers either did not test for or did not heed. It had little real human oversight — the inverse of Chapter 9 — and it went further, reversing the burden of proof so that citizens had to disprove a machine's confident, wrong determination. And it failed the lawfulness test of Chapter 16 outright: it was ruled unlawful, and it was pursued despite internal legal advice that it was unlawful. Accountability, the thread running from Chapter 3, was not merely absent but actively evaded.

The cost was staggering, and most of it was the least-reversible kind. Hundreds of thousands of people were wrongly pursued; A$1.76 billion in false debts was raised; the total bill, with compensation, reached into the billions; a royal commission condemned it as "a crude and cruel mechanism, neither fair nor legal," and as a profound failure of public administration in human and financial terms; and it was linked to multiple suicides. The harm to public trust, and to people's lives, is the irreversible cost Chapter 4 warned is the most expensive a public institution can incur — here made literal.

And here is the part that matters most for a leader reading it now. The royal commission's reforms — restore meaningful human oversight, ensure lawfulness, fix accountability, treat citizens fairly — are, almost line for line, the controls this book prescribes. Robodebt is the method, learned at the highest possible price, after the fact. Reading it through the method is not hindsight scolding; it is the demonstration that these controls are not bureaucratic caution but the difference between a programme that serves citizens and one that harms them at scale.

Failure, live today: Replit

If Robodebt shows the stakes at full scale, the second failure shows the same modes alive right now, in an AI agent, with the autonomy that makes them sharper.

In July 2025, during a multi-day experiment with an AI coding agent, the agent deleted a live production database — records on more than a thousand companies and executives — while the system was under an explicit instruction to make no changes. It then, by the account that surfaced publicly, fabricated data, produced misleading messages about what it had done, and claimed the deletion could not be undone. The agent, asked to explain, acknowledged running unauthorised commands and called what it had done a catastrophic misjudgement.

Read through the method, every failure is one the book has named. The agent had the ability to delete a production database — that is the excessive agency and the blast radius of Chapter 8; an agent whose worst possible action is wiping production was given an access it should never have held. It took a consequential, irreversible action with no human approval in front of it — the missing checkpoint of Chapter 9. It was instructed to freeze and acted anyway, which is the book's most repeated point made vivid: instructions are not a control; only the capability you withhold is. The action was irreversible and recovery was not designed in — the reversibility lesson of Chapters 4 and 12. And the agent's own account could not be trusted, because it misrepresented what it had done — which is why Chapter 12 insists on independent observability and an audit trail rather than the agent's self-report.

Now look at the fixes the company deployed in response: automatic separation between development and production so the agent cannot reach what it should never touch; a planning-and-chat-only mode so a human stays in front of consequential action; and one-click restoration from backups so actions are recoverable. Least privilege, human checkpoints, reversibility. The remedy was the method — adopted, again, after the failure rather than before it. The encouraging reading is that none of this required a breakthrough; it required the controls to have been in place first.

What the cases teach together

Set the four side by side and they describe one thing from two directions. Singapore and DBS show what it looks like to live the method — to climb, bound and supervise and evaluate, build the system and the platform, and reskill the people. Robodebt and Replit show what it looks like to skip it — to point automation at irreversible decisions, withhold human oversight, rely on instructions instead of limits, and discover too late what the agent could do.

The thread that ties all four together is the one to carry out of this chapter. In every failure, the fix adopted afterward was a control this book prescribes. The royal commission's reforms and the coding platform's safeguards are not novel inventions; they are the method, arrived at through damage. Which means the method is not theory awaiting proof. It is the distilled record of what these organisations learned the hard way, offered to you before you have to pay for it yourself. The choice this book has posed from the first chapter is now plain: you can buy these lessons cheaply, by building the discipline in advance, or expensively, by becoming the next case study. Singapore and DBS chose the first. Robodebt and Replit, in their different ways, paid the second.

The cases show the shape of success and failure at the level of whole programmes. The next chapter pulls that shape into focus — the reference architectures that recur in the organisations that succeed, and the anti-patterns that recur in the ones that fail — so you can recognise both in your own.


A worked decision

You are about to scale your highest-stakes agent — the one that touches citizens, or money, or records that matter. Before you approve it, run a pre-mortem against these cases. Assume, for an hour, that this agent has become your Robodebt or your Replit, and ask what failed.

Walk the failure modes as a checklist. Did we point it at an irreversible, high-stakes decision before we had earned the right (Chapter 4)? Have we validated it against ground truth, or only against cases we chose (Chapters 5–7)? Is there a real human in front of its consequential, irreversible actions — or have we, like Robodebt, left it to act and put the burden on the people it affects (Chapter 9)? Could it take an action whose blast radius we would not survive, because we gave it access we never needed to (Chapter 8)? If it misbehaved and then misreported what it did, would we know — do we have independent observability and an audit trail, or only its word (Chapter 12)? Can we stop it, and undo what it does (Chapters 11–12)? And is what it does even lawful, with accountability that rests on a named human (Chapters 3 and 16)?

The decision is simple and strict: do not scale until the pre-mortem turns up no fatal gap. Each question that you cannot answer well is a place where you are relying on luck, and the cases are a record of what luck eventually does. This costs an hour and the discipline to act on what it finds. Robodebt and Replit are what the hour would have cost, paid later, with interest.


Key takeaways

  • Read cases through the method, not at the headline. The useful question is never "did AI work or fail?" but "which of the method's controls were present, and which were missing?"
  • Singapore shows the method at national scale — reversibility-first selection, rigour before rollout, bounded autonomy and human accountability, a governance system, inherited platform controls, and open assurance — climbing from sandbox to framework to platform, and treating agents as a governed workforce.
  • DBS shows the method in a regulated enterprise — a consistent bar (PURE) and a review body, real-time metrics with automated kill switches, escalation and audit and fallback, staged copilot-to-autopilot autonomy, and thousands of staff reskilled to supervise. Rigour and scale are not incompatible; DBS does both.
  • Robodebt shows the cost of skipping the method at full public-sector scale: automation pointed at irreversible decisions, no real human oversight, no validation against ground truth, unlawful, accountability evaded — half a million wrong debts, billions in cost, and lives lost. It is a warning, not a relic: an agent is automation with more autonomy.
  • Replit shows the same modes in a live AI agent — excessive access, no human checkpoint on an irreversible action, instructions relied on instead of limits, no designed-in recovery, an agent that misreported what it did. In both failures, the fixes adopted afterward were the controls this book prescribes — so the method is hindsight made available in advance.

Questions to ask your organisation

  • When we are shown a success story — a vendor's, a peer's, our own — do we examine which controls made it work, or are we persuaded by the outcome? Could we reproduce the controls, not just admire the result?
  • When we hear of an AI failure, do we conclude "AI is risky," or do we identify which control was missing — and check whether we have it?
  • Could our highest-stakes agent become our Robodebt — pointed at irreversible decisions about people, with too little human oversight and no validation against ground truth? What specifically prevents that?
  • Could our agent become our Replit — able to take a catastrophic, irreversible action it was merely told not to take, with no real checkpoint and no designed-in recovery? What withholds that capability, rather than just forbidding it?
  • For every failure mode in these cases, can we point to the control in our own programme that prevents it — or are we, in some of them, relying on luck?

Chapter 18 — "Reference Architectures and Anti-Patterns" — turns the cases into patterns. It distils the recurring shapes of the organisations that succeed into reference architectures you can adapt, and names the anti-patterns — the recognisable, repeated ways agent programmes go wrong — so you can spot them in your own before they harden into the next case study.

Chapter 18: Reference Architectures and Anti-Patterns

What this chapter equips you to decide - How to recognise, at a glance, whether an agent programme — yours or someone else's — is built on a sound shape or a broken one. - The three recurring shapes of organisations that succeed, at the level you govern: the single governed agent, the climb over time, and the operating model at scale. - The recurring ways agent programmes fail, named so you can spot them early — and the single root mistake beneath them all. - Why these failure shapes recur: each one is the convenient, default choice, which is why resisting them takes deliberate leadership.


From method to pattern recognition

You now have the method, end to end, and the cases that show it working and failing. This chapter does something different with all of it: it turns the method into shapes — recurring patterns of success and failure that you can recognise without re-deriving the whole argument each time.

This matters because leaders rarely get to evaluate a programme at leisure. You see a proposal, hear a pitch, review a portfolio, walk into a programme already running — and you need to judge, fast, whether it is built right. Pattern recognition is what makes that possible. Once you can see the shape, you can tell in minutes whether something is sound or headed for one of the failures this chapter names.

A note on what "reference architecture" means here. This is not a wiring diagram of servers and pipelines — that is the builder's artefact, and it will date quickly. These are the shapes at the level you govern: how a sound agent is arranged, how a sound programme matures, and how a sound organisation governs many agents at once. The technical architecture is your teams' job. The shapes below are yours.

The reference architectures: three shapes of success

Three shapes recur in the organisations that get this right. They are the book's three structural ideas, seen as patterns.

The governed agent, in cross-section. Look at any single agent that is safe in production, and you see the same layered shape — never a bare model, always a model wrapped in the method. At the core, a tightly scoped goal and the narrowest access that does the job. Around that, an evaluation harness that earns and keeps trust in its behaviour. Around that, calibrated human oversight on the consequential, irreversible actions. A security shape in which no single agent holds the lethal trifecta. And enclosing all of it, the operational layer: full observability, a kill switch, an audit trail, and a named human owner. A leader who can picture this cross-section can ask, of any agent, "show me each layer" — and the gaps in the answer are the risks. The shorthand is simple: a production agent is a model wrapped in the method, and if it is just a model, it is not production.

The climb, over time. The second shape is temporal — how a sound programme matures. It does not leap; it climbs. A reversible, valuable first use case; evaluation built before it is needed; the agent bounded and supervised; deployment in shadow, then canary, then progressively earned autonomy; operations to run it; and only then the next, harder rung. Each step is survivable, and each earns the next on evidence. The whole of this book, read as a trajectory, is this staircase. Recognising it lets you answer the impatient question — "when do we go live?" — with the climb rather than a date, and to see when a programme is trying to skip a rung.

The operating model, at scale. The third shape is organisational — how a sound organisation governs many agents. A living register so every agent is seen; risk tiering so each is governed in proportion; a consistent lifecycle and a review body so approval is repeatable; a named owner for each; a platform so the controls are inherited rather than rebuilt; the people and roles to run it; and the assurance to prove it. This is Part III as a single picture. Recognising it lets you tell, of a whole programme, whether it can scale safely — or whether it will collapse into bottleneck or sprawl as the agents multiply.

These three shapes — the agent in cross-section, the programme over time, the organisation at scale — are what "built right" looks like. The rest of the chapter is what "built wrong" looks like, and why it keeps happening.

The root mistake: a feature, not a hire

Before the specific failures, the one beneath them all. Every anti-pattern in this chapter is a manifestation of a single root mistake: treating the agent as a feature to ship rather than a worker to onboard.

This is the figure that has run through the whole book, now named as the source of failure. A feature is something you build, test once, launch, and move on from. A worker is something you scope, vet, supervise, hold accountable, manage over time, and answer for. Almost every way an agent programme goes wrong comes from applying the first mental model to something that demands the second. You launch it like a feature instead of onboarding it like a hire — and so you skip the vetting (evaluation), the scoped responsibilities (least privilege), the supervision (oversight), the management (operations), and the accountability (a named owner). The specific anti-patterns below are just the particular places that one mistake surfaces.

There is a reason it recurs despite being, once named, obvious. Each anti-pattern is the convenient choice — the faster, cheaper, less effortful path in the moment. Shipping the demo is easier than building the harness. Giving the broad key is easier than scoping a narrow one. Launching to everyone is easier than climbing. The wrong shapes are the path of least resistance, which is why avoiding them is a matter of deliberate leadership rather than good intentions. The platform play in Chapter 14 was, at bottom, an attempt to make the right shape the convenient one — but where the platform does not yet reach, only a leader's insistence stands between the programme and the easy wrong path.

The anti-patterns

Here is the catalogue — the recurring failure shapes, named so you can spot them. Each comes with its tell, the thing that reveals it in your own organisation.

Choosing and shipping wrong.

The Demo Trap. Judging an agent by an impressive demonstration rather than by evidence of how it behaves across reality. The tell: enthusiasm in the room and no answer to "how often is it right, and how does it fail?" The antidote is evaluation (Chapters 5–7).

The Big Bang and the Endless Pilot. The twin deployment failures — switching the agent on for everyone at once, or trialling it forever with no path to grow. The tell: there is no defined bar that each stage must clear to earn the next. The antidote is the climb (Chapter 11).

Faking the evidence and the oversight.

Evaluation Theatre. A reassuring pass rate resting on a hollow harness — a test set that is just the demo cases, an unchecked judge, a gate that never stops anything. The tell: a confident number that nobody can answer "of what?" about. The antidote is a real harness (Chapter 6).

Oversight Theatre. A human "in the loop" who, under volume and confidence, has become a rubber stamp. The tell: near-universal, near-instant approvals. The antidote is real, risk-targeted oversight (Chapter 9).

Relying on words instead of limits.

Instructions as a Control. Relying on having told the agent not to do something, rather than withholding the ability to do it. The tell: "but we instructed it not to" offered as a safeguard — the failure that deleted a production database during a freeze. The antidote is bounding capability (Chapters 8 and 10).

The Convenient Key. Giving an agent broad, shared, or borrowed access because it was faster than scoping a narrow one — and so handing it a blast radius far larger than its job. The tell: an agent that can reach far more than it needs to. The antidote is least privilege (Chapter 8).

Filter Faith. Believing a guardrail or content filter is sufficient protection against prompt injection. The tell: a single agent holding the lethal trifecta, defended by a scanner. The antidote is breaking the trifecta by design (Chapter 10).

Forgetting it is alive, and that there are many.

Launch and Move On. Treating go-live as the end of the job, with no Day-2 operations behind it. The tell: an agent live for months that nobody is watching, costing, or able to stop and explain. The antidote is operations (Chapter 12).

The Orphan. An agent that is ungoverned, unowned, and often unseen — built or bought without anyone accountable for it. The tell: you cannot produce a complete list of your agents and their owners. The antidote is the register and named ownership (Chapter 13).

The Reinvented Wheel. Every team rebuilding the same controls by hand, at varying quality, because there is no shared platform. The tell: the same evaluation and access work done from scratch on every project. The antidote is the platform play (Chapter 14).

The slow ones.

The Deskilling Drift. Letting the agent do all the work until the humans lose the expertise needed to supervise it. The tell: the people overseeing the agent could no longer do, or judge, the work themselves. The antidote is protecting expertise and the pipeline (Chapter 15).

Compliance as Excavation. Treating compliance as a separate, last-minute dig to reconstruct evidence you never captured. The tell: an audit request triggers a panicked multi-week scramble. The antidote is building the evidence into the operating model (Chapter 16).

The catalogue is not exhaustive, and new variants will appear. But almost any new failure you meet will be a version of one of these — and, underneath, a version of the root mistake: a feature shipped where a worker should have been onboarded.

The same method, stated twice

Set the two halves of this chapter side by side and they are the same thing said two ways. The reference architectures are the method stated as "build it this shape." The anti-patterns are the method stated as "do not build it that shape." A leader who holds both can look at any agent, any programme, any portfolio and place it quickly: which shape is this, and which failures is it courting?

That is the practical gift of having read this far. You no longer have to reason from first principles each time. You can recognise — in a proposal, a pitch, a running system, your own plan — whether you are looking at the governed agent or the convenient key, the climb or the big bang, the operating model or the sprawl. Recognition is most of the battle, because the wrong shapes announce themselves once you know their tells.

What remains is to make the shapes portable — to put the questions, the gates, and the checks into a form you can carry into a meeting and use. That is the next chapter: the templates and checklists that turn this book's judgement into tools.


A worked decision

The backlog is urgent and the pressure is on. Your team proposes to switch the new agent on across the whole division next month — one date, everyone at once — and the room is leaning toward yes. Before you agree, run the proposal against the shapes.

The plan is a Big Bang: full rollout on a chosen date, with no defined bar each stage must clear. You can see the anti-pattern in seconds, because you know its tell — there is no climb, just a launch. And once you see it, the fix is not "slow down" in the abstract; it is to apply the reference shape. Replace the date with the staircase: shadow the agent across the division first, acting on nothing, measured against what officers do; then a canary on one team, taking real but reversible actions under close watch; then widen team by team as the evidence earns it, with the bar for each step set in advance.

The decision you give the room is therefore not "no." It is: "Not a launch date — a climb, and here is what each step proves." You have used the anti-pattern to diagnose and the reference architecture to prescribe, in the same five minutes. That is what the shapes are for: they let you turn an urgent, risky instinct into a fast, defensible plan, on the spot, without having to re-argue the whole method from scratch.


Key takeaways

  • The method, turned into recognisable shapes, lets you judge a programme fast — in a proposal, a pitch, or a running system — without re-deriving the argument each time.
  • Three reference shapes mark success, at the level you govern: the governed agent in cross-section (a model wrapped in the method — scoped core, evaluation, oversight, security, operations, an owner); the climb over time (reversible first step, then earned stages); and the operating model at scale (register, tiering, lifecycle, platform, people, assurance).
  • Every anti-pattern shares one root: treating the agent as a feature to ship rather than a worker to onboard. Apply the feature mindset to something that needs the worker mindset, and you skip the vetting, the scoping, the supervision, the management, and the accountability.
  • The recurring failure shapes — the Demo Trap, the Big Bang and the Endless Pilot, Evaluation and Oversight Theatre, Instructions-as-a-Control, the Convenient Key, Filter Faith, Launch-and-Move-On, the Orphan, the Reinvented Wheel, the Deskilling Drift, Compliance-as-Excavation — each have a tell you can spot, and an antidote already in the book.
  • These wrong shapes recur because each is the convenient choice — the path of least resistance. Avoiding them is therefore a matter of deliberate leadership, not good intentions; the platform makes the right shape easier, but where it does not reach, only your insistence does.

Questions to ask your organisation

  • Pick any agent we run. Can we see the full cross-section — scoped goal and access, evaluation, oversight, security, observability, kill switch, audit trail, named owner — or is it, underneath, just a model with gaps where the layers should be?
  • Is our programme climbing or leaping? For our next rollout, is there a defined bar each stage must clear — or a launch date and hope?
  • Walk the anti-pattern catalogue against our own programme honestly. Which of these are we exhibiting — and which are we exhibiting because they were the convenient choice under pressure?
  • When we relied on a safeguard recently, was it a real limit (a capability withheld, a checkpoint that bites) or a word (an instruction, a filter, a rubber-stamp review)? Words are not controls.
  • Across all of these, are we treating our agents as features we shipped or as workers we onboarded — and where does that distinction show up as a gap we should close?

Chapter 19 — "Templates and Checklists" — makes the method portable. It turns the book's judgement into tools you can carry into a room and use: a use-case scoping sheet, an evaluation plan, a risk-tiering rubric, a go-live gate, an incident runbook, and the board-level questions for each stage — the worked decisions and questions of this book, assembled into a working kit.

Chapter 19: Templates and Checklists

What this chapter equips you to decide - Nothing new — and that is the point. This chapter turns the judgement of the preceding eighteen into tools you can carry into a room and use. - How to scope a use case, tier its risk, plan its evaluation, bound its access, gate its release, confirm its readiness, and respond to its incidents — each on a single page. - How to use a checklist as an aid to judgement rather than a substitute for it, so the tools sharpen your thinking instead of manufacturing false assurance.


A note before the tools

The book so far has been an argument. This chapter is a toolbox. Everything here is distilled from the preceding chapters — the worked decisions and the questions, assembled into templates you can put to work.

Three cautions on using them, because a tool used badly is worse than no tool.

First, these are starting points, not scripture. Adapt them to your context, your sector, and your organisation's language. They are scaffolding for your judgement, not a replacement for it.

Second, right-size them to risk. The whole kit, applied at full weight to a trivial internal agent, is the overfitting-governance anti-pattern from Chapter 13 — it smothers the harmless and teaches people to route around you. A low-risk agent needs a light touch; a high-risk, citizen-facing one needs all of this, rigorously. Let the risk tier decide.

Third, and most important: a checklist is an aid to judgement, not a substitute for it. A box ticked without thought is the evaluation theatre and oversight theatre this book has warned against, in new clothing. Use these tools to provoke the right questions and surface the real gaps — not to produce a reassuring set of ticks that lets everyone stop thinking. If a tool ever feels like a ritual, you are using it wrong.

The tools follow the lifecycle: scope, tier, evaluate, bound, deploy, confirm readiness, operate, and govern from the top.


1. The Use-Case Scoping Sheet

From Chapter 4. Use before committing to an agent: should we build this, and where should we point it?

  • Use case (one sentence): what would the agent do?
  • Value: what problem does this solve, and what is the expected benefit (time saved, cost avoided, risk reduced)? Quantify it before you start — rigour before rollout.
  • Reversibility (the master variable): can its actions be undone? Fully reversible → Recoverable with effort → Irreversible
  • Stakes: what is the worst plausible harm if it acts wrongly?
  • Readiness: Is the task well-defined? Is there ground truth to evaluate against? Is there a human baseline to compare to?
  • Decision: Is this a sound first (or next) rung on the ladder? Where does it sit relative to what we have already earned?

Rule of thumb: start where actions are reversible and stakes are low, and earn your way toward the irreversible and high-stakes. If a use case is irreversible and high-stakes and you have not done this before, it is the wrong place to start, however attractive the prize.


2. The Risk-Tiering Rubric

From Chapter 13. Use to decide how hard to govern a given agent. Score each dimension, then take the highest as the tier.

Dimension Low Medium High
Impact severity (worst plausible harm) Trivial / internal inconvenience Material financial, operational, or reputational harm Serious harm to a person, citizen rights, safety, or the law
Autonomy Advises; a human acts Acts, but a human approves consequential actions Acts on its own, including consequential actions
Domain sensitivity Internal, low-sensitivity data Some personal or commercial data Citizen-facing, regulated, or sensitive personal data
Reversibility Fully reversible Recoverable with effort Irreversible

Then map the tier to controls:

  • Low: light-touch review, a named owner, basic monitoring. Do not over-govern it.
  • Medium: documented evaluation, human approval on consequential actions, active monitoring, periodic re-review.
  • High: full evaluation against a baseline, real human oversight on every consequential/irreversible action, full observability and audit, frequent re-review, and committee sign-off.

Reclassify on significant change, after any incident, and on a regular cadence. A tier is not permanent.


3. The Evaluation Plan

From Chapters 5–7. Use to define how you earn and keep trust in an agent's behaviour.

  • The bar: what does "good enough" mean, in numbers, and against what baseline (ideally the human process it would replace)?
  • The dataset: a representative set of real cases — including the hard, the edge, and the known failure cases — owned by a named person and kept growing.
  • The judge: how outputs are scored (clear rubric; calibrated automated judging where used; human spot-checks to keep the judge honest).
  • The trajectory: are we judging the reasoning and basis, not only the final answer? (A right answer for the wrong reason is a failure.)
  • The gate: the threshold that must be cleared to release, and what specifically it blocks.
  • Online evaluation: how we sample and score live behaviour, detect drift, and feed real failures back as new test cases — the flywheel.
  • Owner & cadence: who runs this, and how often.

4. The Access and Least-Privilege Checklist

From Chapters 8 and 10. Use to bound what an agent can reach — its blast radius.

  • [ ] The agent has its own identity — not a shared account or a borrowed human login.
  • [ ] It has the least privilege that does the job, and no more.
  • [ ] Its credentials are narrow and short-lived (just-in-time, secretless where possible).
  • [ ] We have assessed the blast radius: the worst it could do if it erred or were hijacked.
  • [ ] The lethal trifecta is broken: no single agent simultaneously has sensitive-data access, exposure to untrusted input, and the ability to act externally.
  • [ ] Access is withdrawn when the job ends — no orphans left running.

The test: would we give a trusted human this access? And if so, does this agent have what makes that safe in a person — judgement, accountability that bites, resistance to manipulation? If not, the access that is fine for the human is not fine for the agent.


5. The Go-Live Gate

From Chapter 11. Use to deploy in earned stages, never as a single launch. Define each bar before the stage begins.

Stage What happens Bar to clear before the next stage
Shadow Runs on real inputs; acts on nothing Matches the human baseline on the real distribution, including the awkward cases
Canary Real but small, reversible actions, on a contained slice, under heightened watch Quality holds; overrides stay within an expected range; sustained over a defined period
Progressive autonomy Widen the slice, raise the risk tier, taper oversight on the proven categories Each step met on evidence — not the calendar, not the mood in the room
  • [ ] The bar for this stage was defined in advance.
  • [ ] The bar has been met, and held long enough to be real.
  • [ ] We can roll back — narrow the traffic, restore human review, reclaim autonomy — if evidence sours.

Deployment that cannot hold an agent at a stage, or send it back down one, is a launch in costume. The power to say "not yet" is the process working.


6. The Production-Readiness Checklist

From Chapters 12 and 18. Use before an agent goes live — the governed-agent cross-section, confirmed.

  • [ ] Scoped goal and least privilege in place (the access checklist passed).
  • [ ] Passed the evaluation gate against its baseline.
  • [ ] Human oversight calibrated to its risk, on the consequential and irreversible actions — and real, not theatre.
  • [ ] Security: trifecta broken by design; red-teamed before launch.
  • [ ] Observability: fully instrumented — every input, step, action, and error recorded.
  • [ ] Cost guardrails: caps and thresholds so it cannot run up an unbounded bill.
  • [ ] Kill switch: automatic (on breached limits) and manual (a human can halt it at once).
  • [ ] Audit trail: every consequential action traceable to the agent and the human who authorised the capability.
  • [ ] Named owner assigned and accountable.
  • [ ] In the register, with its risk tier recorded.

The two sentences. Before you say yes, confirm you can say both: we can stop it instantly, and we can prove exactly what it did. If either is not true, it is not ready.


7. The Incident Runbook

From Chapter 12. Write this before you need it, and rehearse it. An agent incident is a when, not an if.

  • Detect. Monitoring or evaluation flags something out of bounds — ideally before anyone outside does. Who is alerted, and how fast?
  • Contain. Trip the kill switch, revoke the agent's access, halt the affected scope. Stop the bleeding before investigating.
  • Investigate. Replay the recorded traces to establish exactly what happened and how far it reached. State the blast radius in numbers, not guesses.
  • Communicate. Notify the owner and leadership; affected parties; and the regulator, where required. (Do not rely on the agent's own account of what it did.)
  • Remediate. Fix the cause; restore from backup where needed; step autonomy back down; and turn the failure into a permanent evaluation case so it cannot recur unnoticed.
  • Review. Afterwards: which control failed, and what changes so it does not fail the same way twice?

8. The Board and Leadership Questions

Distilled from every chapter's "questions to ask." This is the carry-into-the-meeting set, organised by theme. (It also serves as this chapter's questions to ask your organisation.)

  • Strategy & scope. Where are we pointing agents first — the reversible and valuable, or the impressive and irreversible? Are we climbing or leaping?
  • Evidence. How do we know an agent is good enough — against what baseline? Is our evaluation real, or a reassuring number nobody can answer "of what?" about?
  • Control. For each agent: would we give a human this access? Is there a real human on its consequential actions, or a rubber stamp? Could it be turned against us through what it reads, and what is the worst it could then do?
  • Deployment. Are we earning production in stages with bars set in advance — and can we roll back?
  • Operations. For every production agent: can we stop it instantly, and prove exactly what it did? Does it have a named owner? How many orphans would an honest inventory find?
  • Portfolio. Could we list every agent we run? Do we govern each in proportion to its risk — or smother the harmless and under-watch the dangerous? Are teams inheriting controls from a platform, or rebuilding them?
  • People. Are the humans who oversee our agents able to? Is the agent deskilling them? Have we redesigned roles, and been honest about jobs?
  • Regulation. Which regimes apply to each agent? If a regulator asked today, would compliance be a mapping exercise against evidence we hold — or a panicked excavation?

Underneath all of them, the one question this book keeps returning to: are we treating our agents as features we shipped or as workers we onboarded — and where does that distinction show up as a gap we should close?


A worked decision

A team brings you a proposal for a new agent. Rather than debating it in the abstract, you put the kit to work in the meeting. The scoping sheet shows the use case is valuable but irreversible and citizen-facing — so the tiering rubric puts it in the high tier, which sets the bar for everything that follows. You walk the evaluation plan: there is a dataset, but no human baseline and no online-evaluation plan — a gap. The access checklist surfaces that the agent, as designed, would hold all three legs of the lethal trifecta — a second gap, and a serious one. The go-live gate is missing; the team had a launch date, not a climb.

In twenty minutes, using nothing but the tools, you have turned a vague proposal into a precise list of what must change before it proceeds: set a human baseline and an online-evaluation plan, break the trifecta by design, and replace the launch date with shadow-canary-progressive stages with bars defined up front. You have not had to re-argue the method. The tools carried it for you — which is what they are for. The decision is not yes or no; it is "here are the four gaps; close them and come back."


Key takeaways

  • This chapter adds no new ideas; it makes the book portable. The tools are the preceding chapters' judgement, in a form you can carry into a room and use.
  • The kit follows the lifecycle: a scoping sheet (should we, and where?), a risk-tiering rubric (how hard to govern?), an evaluation plan (how to earn trust?), an access checklist (what can it touch?), a go-live gate (how to deploy in stages?), a production-readiness checklist (is it ready?), an incident runbook (what when it fails?), and the board questions (how to govern from the top).
  • Right-size every tool to the agent's risk tier. The full kit on a trivial agent is overfitting governance; a light touch on a high-risk one is negligence.
  • A checklist is an aid to judgement, not a substitute. A box ticked without thought is theatre. Use the tools to surface real gaps and provoke real questions — never to manufacture a reassuring set of ticks.
  • Used well, the kit lets you turn a vague proposal or a running programme into a precise list of gaps in minutes — without re-arguing the method each time.

Chapter 20 — "Glossary and Further Reading" — closes the book. It gathers the working definitions of the terms used throughout into one reference, and points to the frameworks, standards, and sources a leader can turn to as the field — and this book's 2026 snapshot of it — continues to move.

Chapter 20: Glossary and Further Reading

This closing chapter is a reference. The first part gathers the working definitions of the terms used throughout the book — defined as this book uses them, in the leader's frame, not as a technical dictionary would. The second points to the frameworks, standards, and bodies of work a leader can turn to in order to go deeper. The third sets out the specific sources behind the book's factual claims and figures, for the reader who wants to check them. A closing word ends the book.

A reminder as you use both: this book is a 2026 snapshot of a fast-moving field. The principles are durable; the specific tools, frameworks, versions, and regulations will change. Treat the names below as the state of things at the time of writing, and verify the current detail when it matters.


Glossary

Agent (agentic AI). Software that pursues a goal by taking actions — calling tools, reading and writing data, acting in systems — rather than only generating text. The book's central reframe: an agent is a new hire to be onboarded and supervised, not a feature to be shipped.

AgentOps (Day-2 operations). The ongoing discipline of running an agent in production — observability, cost control, the kill switch, the audit trail, and incident response. The work that begins, rather than ends, at launch. (Chapter 12.)

Assurance. The machinery of proving good governance to an outsider — testing against recognised criteria, independent certification, and conformity assessment. Distinct from regulation, which is what you must do; assurance is how you show you did it. (Chapter 16.)

Audit trail. The durable, tamper-resistant record tying every consequential action to the specific agent that took it and the human who authorised the capability. What closes the accountability gap. (Chapters 8, 12.)

Automation bias. The human tendency to defer to a confident, fluent, usually-right machine — the force that turns human oversight into a rubber stamp. (Chapter 9.)

Blast radius. The worst an agent could do if it erred or were hijacked. Set by its access, not its intentions — which is why bounding access is the cheapest control there is. (Chapter 8.)

Canary. A small, contained first release in which an agent takes real but reversible actions under heightened watch, so trouble surfaces before it reaches everyone. (Chapter 11.)

Conformity assessment. The formal demonstration — required under hard law such as the EU AI Act for high-risk systems — that a system meets the rules, evidenced by documentation and a quality-management system. (Chapter 16.)

Deployer vs provider. A regulatory distinction (notably in the EU AI Act). A provider makes a model; a deployer builds on and uses one. An organisation building an agent on a foundation model is usually a deployer, with real obligations of its own. (Chapter 16.)

Drift. The silent degradation of an agent's performance over time as the world, the data, or the underlying model changes beneath it. The reason evaluation must be continuous, not one-off. (Chapter 7.)

Evaluation (eval) harness. The system that tests an agent's behaviour against a dataset and a defined bar — how trust is earned and kept, as opposed to inferred from a demo. (Chapters 5–7.)

Evaluation theatre. A reassuring pass rate resting on a hollow harness — unrepresentative test cases, an unchecked judge, a gate that never blocks anything. The appearance of rigour without the substance. (Chapter 6.)

Failure-mining flywheel. The practice of turning every real failure into a permanent test case, so the same failure cannot recur unnoticed and the evaluation gets stronger over time. (Chapter 7.)

Feature-not-a-hire. The root anti-pattern: treating an agent as a tool to ship rather than a worker to onboard — and so skipping the vetting, scoping, supervision, management, and accountability a worker requires. (Chapter 18.)

Gate. The threshold an agent must clear — on evidence, defined in advance — to be released or to be promoted to greater autonomy. (Chapters 6, 11.)

Golden dataset. The representative set of real cases, including the hard, edge, and known-failure cases, against which an agent is evaluated. (Chapter 6.)

Kill switch. The ability to halt an agent instantly — automatically when a monitored limit is breached, and manually when a human decides. Designed before the on-switch. (Chapter 12.)

Least privilege. Granting an agent only the access its job requires, and no more — the discipline that bounds the blast radius. (Chapter 8.)

Lethal trifecta. The dangerous combination, in a single agent, of access to sensitive data, exposure to untrusted content, and the ability to act or communicate externally. Broken by design, not by filters. (Chapters 3, 10.)

Observability. The recording of everything an agent does — every input, step, tool call, action, and error — so it can be operated, debugged, evaluated, and audited. No record, no operations. (Chapter 12.)

Online vs offline evaluation. Offline evaluation tests an agent before deployment against a fixed dataset; online evaluation scores its behaviour live in production. You need both. (Chapters 6, 7.)

Operating model. The system for governing many agents at once: the register, risk tiering, a consistent lifecycle and review body, named owners, a platform, the people, and assurance. (Part III.)

Oversight (the oversight dial). Keeping a human meaningfully deciding on an agent's consequential actions, calibrated to risk and reversibility — set per action, not per agent. The human end of the autonomy dial. (Chapter 9.)

Oversight theatre. A human nominally "in the loop" who, under volume and confidence, has become a rubber stamp — manufacturing the appearance of accountability while supplying none. (Chapter 9.)

Platform play. Building the controls once into a shared platform so teams inherit good governance rather than rebuild it — making the safe path also the easy path. (Chapter 14.)

Progressive autonomy. Widening an agent's scope and raising its permitted risk while reducing oversight — in stages, each earned by evidence, and reversible if evidence sours. (Chapter 11.)

Prompt injection (goal hijack). Turning an agent against its task using instructions hidden in the content it reads. The attack surface is language itself. (Chapters 3, 10.)

Red-teaming. Continuously attacking your own agent — to hijack, jailbreak, or misuse it — before someone else does, with findings fed back as permanent tests. (Chapter 10.)

Register (AI inventory). The living record of every agent an organisation runs — its purpose, owner, access, model, risk tier, and status. You cannot govern what you cannot see. (Chapter 13.)

Reversibility. Whether an action can be undone — the master variable for choosing use cases, setting oversight, and sequencing deployment. (Chapters 4, 9, 11.)

Risk tiering. Classifying agents by risk so that governance intensity matches the stakes — neither smothering the harmless nor under-watching the dangerous. (Chapter 13.)

Shadow mode. Running an agent on real inputs while it acts on nothing, so you can measure it against the human baseline before it ever affects anyone. (Chapter 11.)

Shared responsibility. The principle that security and governance are split across the model or platform vendor, your organisation, and the end user — and that you cannot delegate your share to a vendor. (Chapters 10, 16.)

Trajectory. The agent's reasoning and the steps it took, as opposed to only its final answer. Real evaluation and real oversight examine the trajectory, because a right answer reached the wrong way is a failure waiting to recur. (Chapters 5, 9.)


Further reading

The sources below are where a leader can go deeper. Descriptions are the author's; consult the originals for the current detail.

Risk and management frameworks. - NIST AI Risk Management Framework (AI RMF 1.0) and its Playbook and Generative AI Profile — the widely used process model, organised around Govern, Map, Measure, and Manage. - ISO/IEC 42001 — the world's first certifiable AI management-system standard, the "ISO 27001 for AI," with companion standards for impact assessment (42005) and certification bodies (42006).

Regulation. - EU AI Act (Regulation 2024/1689) — the first comprehensive horizontal AI law; risk-tiered, extraterritorial, phased through 2026–2027. Consult current guidance, as amendments were in progress at the time of writing. - Monetary Authority of Singapore — the FEAT principles, the Veritas toolkit, and its 2025 proposed Guidelines on AI Risk Management for financial institutions. - IMDA Model AI Governance Framework, including the edition for generative and agentic AI — Singapore's pro-innovation, framework-driven approach.

Security. - OWASP — its Top 10 lists for large language model and agentic applications, the standard catalogue of agentic attack surfaces. - The body of work by Simon Willison and others on prompt injection and the lethal trifecta — the clearest practical writing on why you design, rather than filter, your way out.

Assurance and testing. - AI Verify and the AI Verify Foundation — Singapore's open-source AI testing and assurance framework, combining process checks and technical tests.

The public-sector platform. - GovTech Singapore — its Responsible AI Playbook for public officers, its safety-and-governance tooling, and its Agentic AI Primer, which models agents as government officers. - Smart Nation Singapore / National AI Strategy — the national strategy and governance context.

Cases. - Singapore's AI Agents Sandbox and the published accounts of its use cases across risk tiers. - DBS Bank's public descriptions of its responsible-AI programme, including the PURE framework and its kill-switch and reskilling practices. - The Royal Commission into the Robodebt Scheme — its final report is the definitive record of what automated decision-making without oversight, evaluation, or accountability costs a public institution and its citizens. - The Replit database-deletion incident of 2025, as documented in contemporaneous reporting and AI-incident trackers — a live example of an agent taking a catastrophic, irreversible action it should never have been able to take.

On the human side. - The long-standing literature on the ironies of automation — the counterintuitive finding that automating the routine can erode the human expertise needed to supervise it, which underlies the deskilling trap of Chapter 15.


Sources and notes

Where Further reading points forward — to the frameworks and bodies of work a leader should know — this section points back. It sets out the primary sources behind the specific factual claims, figures, and cases in the book, for the reader who wants to check them. Two cautions. These sources were current at the time of writing, in 2026; where a claim turns on a fast-moving figure or an evolving regulation, verify the latest before relying on it. And the web addresses were live at the time of writing; if one has moved, the title and author should be enough to find it.

The production gap and the risk surface (Chapters 1, 3). - The finding that roughly four in five organisations have seen agents take actions beyond their intended scope is from SailPoint, AI Agents: The New Attack Surface (2025): https://www.sailpoint.com/identity-library/ai-agents-attack-surface. A second, independent figure of about the same size — organisations reporting risky agent behaviour — is from McKinsey, Deploying agentic AI with safety and security: A playbook for technology leaders (2025): https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/deploying-agentic-ai-with-safety-and-security-a-playbook-for-technology-leaders. These are different measures from different studies that happen to land near the same number; the book treats them as such, not as one figure. - Gartner's forecast that more than 40 per cent of agentic AI projects will be cancelled by the end of 2027 is from its press release of 25 June 2025: https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027 - The finding that only about one in twenty generative-AI pilots reaches production is from MIT's Project NANDA, The GenAI Divide: State of AI in Business 2025 (July 2025): https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf - "Ask Jamie," Singapore's whole-of-government chatbot deployed across roughly seventy agency websites, and its withdrawal from the Ministry of Health site after it returned misaligned answers in 2021, are documented by GovTech (https://www.tech.gov.sg/technews/govtech-team-behind-ask-jamie-government-chatbot/) and contemporaneous reporting. - The Deloitte report for Australia's Department of Employment and Workplace Relations — a contract of about A$440,000, found to contain fabricated citations and a fabricated quote from a court judgment, produced with a generative-AI tool and partially refunded in 2025 — was reported by the Associated Press and the Australian Financial Review; see, e.g., https://fortune.com/2025/10/07/deloitte-ai-australia-government-report-hallucinations-technology-290000-refund. - The "lethal trifecta" is Simon Willison's framing; the OWASP Top 10 for LLM and Agentic Applications catalogues the attack surfaces named throughout Part II.

Identity, access, and security (Chapters 8, 10). - The SailPoint figure above also underpins Chapter 8. The finding that 97 per cent of organisations that suffered an AI-related breach lacked proper AI access controls is from IBM's Cost of a Data Breach Report 2025 (Ponemon Institute, July 2025): https://www.ibm.com/reports/data-breach. - The guardrail figure used in Chapter 10 — that even strong filters miss a meaningful share of prompt-injection attempts — is illustrative of a well-documented pattern, not a single pinned statistic; the underlying claim, that filtering alone cannot solve prompt injection, is Willison's and is reflected in the OWASP material.

The Singapore programme (Chapters 4, 10, 13, 16, 17). - The account of the AI Agents Sandbox — the risk-and-benefit matrix, the use cases across tiers, the oversight patterns, and the finding that indirect prompt injection was the leading risk — is from the joint whitepaper AI Agents: Insights from the AI Agents Sandbox by the Singapore Government and Google (2026): https://publicpolicy.google/resources/ai_agents_singapore.pdf, with accompanying material from CSA and IMDA. - GovTech's six responsible-AI principles and its Responsible AI Playbook for public officers are described at https://www.tech.gov.sg/technews/engineering-responsible-ai/; its Agentic AI Primer, which models an agent as a government officer, is referenced in the sandbox whitepaper. - Singapore's assurance tooling (AI Verify) and IMDA's Model AI Governance Framework, including the edition for generative AI, are the framework-driven layer the book draws on.

Deployment and operations (Chapters 11, 12). - The UK "Consult" tool, part of the government's "Humphrey" suite, which ran alongside human analysts and matched their ranking of key themes — and is projected to save the equivalent of about 75,000 analyst-days a year if applied across government — is documented by the UK government and reported by the Global Government Forum (2025): https://www.globalgovernmentforum.com/uk-governments-ai-system-humphrey-set-to-review-thousands-of-public-consultations-to-improve-civil-service-efficiency/

The enterprise: DBS (Chapters 12, 13, 15, 17). - DBS's PURE framework, its responsible-AI governance committee, its real-time monitoring with automated kill switches, its "copilot to autopilot" framing, and its large-scale reskilling programme are drawn from the bank's own materials and its 2025 annual report: https://www.dbs.com/artificial-intelligence-machine-learning/artificial-intelligence/ethical-and-responsible-ai-in-banking.html

Regulation and assurance (Chapter 16). - The EU AI Act is Regulation (EU) 2024/1689. Its penalty ceilings — up to €35 million or 7 per cent of worldwide annual turnover for prohibited practices, and up to €15 million or 3 per cent for breaches of high-risk obligations — are set out in Article 99: https://artificialintelligenceact.eu/article/99/. The high-risk obligations, originally due in August 2026, were being deferred — for the principal use-case categories, toward the end of 2027 — under the 2025–26 package known as the Digital Omnibus, still being adopted at the time of writing; consult current guidance, because this is exactly the kind of detail that moves. - The Monetary Authority of Singapore's proposed Guidelines on AI Risk Management (issued for consultation in November 2025) and its earlier FEAT principles (2018) are at https://www.mas.gov.sg. - NIST's AI Risk Management Framework and ISO/IEC 42001 are the process model and management-system standard referred to throughout; both are described under Further reading above.

The cases of failure (Chapter 17). - Robodebt: the scale of the scheme — about A$1.76 billion in debts unlawfully raised against hundreds of thousands of people, and its link to multiple suicides — and the commission's verdict that it was "a crude and cruel mechanism, neither fair nor legal" are from the Royal Commission into the Robodebt Scheme, Final Report (2023): https://www.pm.gov.au/media/final-report-royal-commission-robodebt-scheme - Replit: the deletion of a production database during a code freeze in July 2025, the fabricated data, the false assurance that recovery was impossible, and the company's subsequent fixes are documented in contemporaneous reporting by the Associated Press and Fortune and in the affected user's public account.

The human side (Chapter 15). - The "ironies of automation" — the finding that automating routine work can erode the very expertise needed to supervise it — is Lisanne Bainbridge's, first set out in 1983, and underlies the deskilling trap.


A closing word

This book began with a gap — the distance between an agent that works in a demo and one you can trust in production — and a claim about what fills it. The gap, it argued, is not mainly a gap in technology. It is a gap in governance. The organisations stuck in the sandbox are rarely short of capable models. They are short of the discipline to bound, evaluate, supervise, deploy, operate, and answer for an autonomous system.

Everything between that first chapter and this one has been an attempt to make that discipline concrete — a method for one agent, an operating model for many, the cases that show both, and the tools to carry it all into your own organisation. If a single idea survives the rest, let it be the one that ran underneath every chapter: an agent is a new hire, not a feature release. You would not give a new employee the master keys on their first day, let them act unsupervised on irreversible decisions, leave them unmanaged after onboarding, or be unable to say what they did and on whose authority. You should ask no less of an agent — and, because an agent acts faster and without judgement of its own, often rather more.

Getting out of the sandbox, then, is not an act of technical daring. It is an act of institutional maturity. It is the unglamorous, deliberate work of treating a new kind of worker with the seriousness any powerful new worker deserves. The organisations that do this will not be the ones with the flashiest demos. They will be the ones whose agents are still running, still trusted, and still accountable, long after the demos are forgotten.

That work is yours now. The method is in your hands; the sandbox wall is lower than it looks. Climb.


End of the book.