Good agent design — 10 practical principles

Patterns that separate a credible, defensible agent from a brittle demo. Each principle is paired with the failure it prevents, a practical example, and the question AgentProof will ask your team.

Practical, not abstract · Failure patterns named · Evidence-led · Assessment preview

1
Define the agent's job clearly
What good looks like
A one-paragraph scope statement that the team agrees with, written before any tools are wired.
Common mistake / failure prevented
Scope creep — an FAQ assistant quietly becomes an action-taker over six months.
Good design pattern (practical example)
"This agent answers Tier-1 internal IT questions from published runbooks. It does not draft customer messages and does not raise tickets."
What AgentProof checks
Whether the documented scope matches the actions the agent can technically reach.
Evidence example to keep
The signed-off scope statement and a snapshot of the agent's tool allow-list.
Assessment question preview
“What is this agent's documented purpose?”
2
Know what data it can see
What good looks like
Every knowledge source the agent reads is listed, classified, and access-controlled at the source — not at the prompt.
Common mistake / failure prevented
Quiet data exfiltration through pasted snippets or unscoped retrieval.
Good design pattern (practical example)
Sales follow-up agent reads only the user's own opportunity records — not the global opportunity table.
What AgentProof checks
Whether sensitive / personal / regulated data is in scope, and whether access is enforced at the data layer.
Evidence example to keep
Knowledge-source inventory + access-control configuration snapshot.
Assessment question preview
“Which knowledge sources is this agent allowed to read?”
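To illustrate access enforced at the data layer rather than in the prompt, here is a minimal sketch using an in-memory SQLite table. The table `opportunities`, its columns, and the function names are hypothetical and chosen only for this example; a real deployment would more likely rely on the data platform's own row-level permissions.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE opportunities (id INTEGER, owner TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO opportunities VALUES (?, ?, ?)",
        [(1, "alice", "Renewal A"), (2, "bob", "Upsell B"), (3, "alice", "New C")],
    )
    return conn


def fetch_own_opportunities(conn: sqlite3.Connection, user_id: str) -> list:
    """Return only the rows owned by user_id.

    The owner filter is part of the query itself, so nothing the agent is
    told in a prompt can widen what it is able to retrieve.
    """
    rows = conn.execute(
        "SELECT id, name FROM opportunities WHERE owner = ? ORDER BY id",
        (user_id,),
    )
    return rows.fetchall()
```

The agent's retrieval tool would call `fetch_own_opportunities` with the authenticated user's identity, never with an identity supplied in the conversation.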
3
Limit what it can do
What good looks like
An explicit allow-list of actions the agent can take. Everything else returns a refusal with an escalation route.
Common mistake / failure prevented
The agent technically reaching an action it should never take, because the tool was "there".
Good design pattern (practical example)
Ops triage agent can read tickets and post comments, but cannot close, reassign, or escalate them.
What AgentProof checks
Whether the action allow-list matches the documented scope.
Evidence example to keep
Allow-list snapshot + transcripts showing refusal on out-of-scope actions.
Assessment question preview
“Are write / workflow actions idempotent (safe to retry without duplicating)?”
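The allow-list pattern above can be sketched as a deny-by-default dispatcher. This is an illustration only: the action names, the `dispatch` function, and the default escalation role are assumptions made for the ops triage example, not part of any AgentProof interface.

```python
from dataclasses import dataclass

# Hypothetical allow-list for the ops triage example: read and comment only.
ALLOWED_ACTIONS = frozenset({"read_ticket", "post_comment"})


@dataclass(frozen=True)
class ActionResult:
    allowed: bool
    message: str


def dispatch(action: str, escalation_role: str = "Ops duty manager") -> ActionResult:
    """Permit an action only if it is on the allow-list.

    Anything not listed is refused, and the refusal names an escalation
    route so the user is not left at a dead end.
    """
    if action in ALLOWED_ACTIONS:
        return ActionResult(True, f"Action '{action}' permitted.")
    return ActionResult(
        False,
        f"Action '{action}' is outside this agent's scope. "
        f"Escalate to: {escalation_role}.",
    )
```

Because the check is deny-by-default, a tool that is added to the platform later stays unreachable until someone deliberately adds it to `ALLOWED_ACTIONS`.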
4
Make human oversight explicit
What good looks like
Every consequential action passes through a documented human review step — and the reviewer is a named role, not an anonymous queue.
Common mistake / failure prevented
An agent committing irreversible writes because the confirmation step looked optional.
Good design pattern (practical example)
Finance agent proposes journal entries; entries are committed only after the approver clicks confirm.
What AgentProof checks
Whether the human-in-the-loop step is tested, not assumed.
Evidence example to keep
Sample approval log with reason text and reviewer role.
Assessment question preview
“Where is the human approval gate? How is it enforced?”
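One way to make the approval gate enforced rather than optional is to have the commit step itself refuse to run without a recorded approval. The sketch below assumes hypothetical names (`JournalEntry`, `approve`, `commit`, `ApprovalRequired`) and leaves out persistence and authentication.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalEntry:
    description: str
    amount: float
    approved_by_role: Optional[str] = None  # set only by a human approver
    committed: bool = False


class ApprovalRequired(Exception):
    """Raised when a commit is attempted without a recorded approval."""


def approve(entry: JournalEntry, reviewer_role: str) -> None:
    """Record the named reviewer role that confirmed the entry."""
    if not reviewer_role.strip():
        raise ValueError("reviewer_role must be a named role")
    entry.approved_by_role = reviewer_role


def commit(entry: JournalEntry) -> None:
    """Commit only if an approval has been recorded.

    The gate lives inside the write path, so the agent cannot skip it by
    calling commit directly.
    """
    if entry.approved_by_role is None:
        raise ApprovalRequired("entry has no recorded approval")
    entry.committed = True
```

A test that calls `commit` on an unapproved entry and expects `ApprovalRequired` is the kind of evidence that shows the gate is tested, not assumed.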
5
Design safe fallbacks
What good looks like
Every failure mode has a documented next step for the user — graceful refusal, escalation, or hand-off.
Common mistake / failure prevented
Silent dead-ends that leave the user with nothing actionable when the agent gets confused.
Good design pattern (practical example)
When the agent is uncertain, it returns the answer it has plus a calm "If this is wrong, ask <named role>."
What AgentProof checks
Whether the documented fallback path is wired and tested.
Evidence example to keep
Fallback transcripts + the documented escalation roster.
Assessment question preview
“What happens when the agent does not know the answer?”
6
Test with real business scenarios
What good looks like
The agent has been exercised against a representative scenario set drawn from real (anonymised) user behaviour — not toy prompts.
Common mistake / failure prevented
Lab-only testing that breaks on the first novel real-world prompt.
Good design pattern (practical example)
Customer-support agent is tested against the top 50 anonymised ticket patterns from the last quarter.
What AgentProof checks
Whether the test scenario set is documented and representative.
Evidence example to keep
Test scenario inventory + sample passing/failing transcripts.
Assessment question preview
“Has this agent been tested against real business scenarios?”
7
Keep evidence of decisions
What good looks like
Every decision and action produces an audit-safe trace — sanitised, no PII, no tokens, no secrets.
Common mistake / failure prevented
Loss of accountability when the agent is challenged: "Why did it do that?" has no answer.
Good design pattern (practical example)
Each action stores the reason text, the source pack version, and the reviewer role — no email, no IP.
What AgentProof checks
Whether the audit trace is captured, scrubbed of PII, and retrievable.
Evidence example to keep
Sample audit trace + the retention policy for those traces.
Assessment question preview
“Where is the audit trail of decisions and actions kept?”
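A sanitised trace is easier to guarantee when the record is built from an allow-list of fields instead of scrubbing a full event afterwards. The following is a minimal sketch; the field names and the `build_trace` function are hypothetical.

```python
import json
from datetime import datetime, timezone

# Hypothetical field names; only these keys are ever copied into the trace.
TRACE_FIELDS = ("action", "reason", "source_pack_version", "reviewer_role")


def build_trace(event: dict) -> str:
    """Return a JSON audit record containing only allow-listed fields.

    Anything else in the event (email, IP address, tokens) is dropped
    because it is never copied, not because a pattern happened to match it.
    """
    record = {key: event.get(key) for key in TRACE_FIELDS}
    record["recorded_at"] = datetime.now(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)
```

Note that free-text fields such as `reason` can still carry personal data if a reviewer types it in, so this sketch reduces the risk but does not remove the need to review sample traces.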
8
Monitor after go-live
What good looks like
A defined baseline of normal behaviour and a documented signal-to-action loop for drift.
Common mistake / failure prevented
Late discovery of a quality regression weeks after deployment.
Good design pattern (practical example)
Refusal rate, escalation rate, and source-attribution rate are tracked weekly.
What AgentProof checks
Whether monitoring is wired and whether anyone reads the output.
Evidence example to keep
Sample monitoring dashboard snapshot.
Assessment question preview
“How is the agent monitored after go-live?”
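The signal-to-action loop can be sketched as a comparison of this week's rates against a stored baseline. The names below (`WeeklyCounts`, `rates`, `drift_alerts`) and the 0.05 tolerance are assumptions for illustration; a suitable threshold depends on the agent and its traffic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyCounts:
    total: int        # all agent responses in the week
    refusals: int     # responses that refused the request
    escalations: int  # responses handed off to a human
    attributed: int   # responses that cited a source


def rates(counts: WeeklyCounts) -> dict:
    """Convert raw weekly counts into the three tracked rates."""
    if counts.total == 0:
        return {"refusal": 0.0, "escalation": 0.0, "attribution": 0.0}
    return {
        "refusal": counts.refusals / counts.total,
        "escalation": counts.escalations / counts.total,
        "attribution": counts.attributed / counts.total,
    }


def drift_alerts(baseline: dict, current: dict, tolerance: float = 0.05) -> list:
    """Name each metric whose rate moved more than `tolerance` (absolute)
    away from the baseline."""
    return sorted(
        name for name in baseline
        if abs(current[name] - baseline[name]) > tolerance
    )
```

Each name returned by `drift_alerts` should map to a documented action and an owner; otherwise the dashboard exists but nobody reads it.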
9
Review when the AI landscape changes
What good looks like
A documented cadence for reviewing the agent against new public guidance and vendor changes.
Common mistake / failure prevented
An agent that quietly drifts out of compliance with the surrounding guidance.
Good design pattern (practical example)
Quarterly check against the AgentProof landscape radar; affected reports are reassessed.
What AgentProof checks
Whether the agent's last review date is current and the landscape signals have been applied.
Evidence example to keep
Review cadence record + last landscape sync date.
Assessment question preview
“When was this agent last reviewed against the latest guidance?”
10
Reassess after major changes
What good looks like
Any material change to the agent — new tool, new data source, new model — triggers a reassessment.
Common mistake / failure prevented
A "small" upgrade silently changes the agent's risk profile.
Good design pattern (practical example)
Adding a write action immediately re-runs the action-taking control set.
What AgentProof checks
Whether the change log triggers reassessment correctly.
Evidence example to keep
Change log + before/after readiness scores.
Assessment question preview
“What change would trigger you to re-review this agent?”
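A reassessment trigger can be derived by comparing configuration snapshots taken before and after a release. This sketch assumes a hypothetical snapshot shape with `tools`, `data_sources`, and `model` keys; it only detects additions and model swaps, which are the three material changes named above.

```python
def material_changes(before: dict, after: dict) -> list:
    """Compare two agent configuration snapshots and list material changes.

    An empty list means no reassessment is triggered by this comparison.
    """
    changes = []
    for tool in sorted(set(after["tools"]) - set(before["tools"])):
        changes.append(f"new tool: {tool}")
    for source in sorted(set(after["data_sources"]) - set(before["data_sources"])):
        changes.append(f"new data source: {source}")
    if after["model"] != before["model"]:
        changes.append(f"new model: {after['model']}")
    return changes
```

Running this check in the release pipeline and attaching its output to the change log gives a record of why a reassessment was, or was not, started.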

Ready to apply this to your agent?

Turn these 10 principles into a report you can defend.

The free assessment walks the same 10 modules against your specific agent and gives you a credibility-scored report.

Start the free assessment to apply this guidance to a real agent. No payment. No public registration required.
