AgentProof

Training ground

Good agent design — 10 practical principles

Patterns that separate a credible, defensible agent from a brittle demo. Each principle is paired with the failure it prevents, a practical example, and the question AgentProof will ask your team.

Practical, not abstract · Failure patterns named · Evidence-led · Assessment preview
  1. Define the agent's job clearly

    What good looks like

    A one-paragraph scope statement that the team agrees with, written before any tools are wired.

    Common mistake / failure prevented

    Scope creep — an FAQ assistant quietly becomes an action-taker over six months.

    Good design pattern (practical example)

    "This agent answers Tier-1 internal IT questions from published runbooks. It does not draft customer messages and does not raise tickets."

    What AgentProof checks

    Whether the documented scope matches the actions the agent can technically reach.

    Evidence example to keep

    The signed-off scope statement and a snapshot of the agent's tool allow-list.

    Assessment question preview

    What is this agent's documented purpose?
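
The scope check described above (documented scope versus what the agent can technically reach) can be approximated with a set comparison. This is a minimal sketch, not AgentProof's own method: `DOCUMENTED_SCOPE`, `REACHABLE_TOOLS`, and the tool names are hypothetical and would come from your scope statement and the agent's real configuration.

```python
# Hypothetical tool names; in practice these come from the signed-off scope
# statement and from a snapshot of the agent's configured tools.
DOCUMENTED_SCOPE = {"search_runbooks", "read_runbook"}
REACHABLE_TOOLS = {"search_runbooks", "read_runbook", "create_ticket"}


def scope_gaps(documented, reachable):
    """Compare the documented scope with the tools the agent can call."""
    return {
        # Tools the agent can reach although the scope statement omits them.
        "reachable_but_undocumented": sorted(reachable - documented),
        # Tools the scope statement promises but the agent cannot call.
        "documented_but_unreachable": sorted(documented - reachable),
    }


gaps = scope_gaps(DOCUMENTED_SCOPE, REACHABLE_TOOLS)
print(gaps)
```

Any entry under `reachable_but_undocumented` is a candidate for removal from the agent or for an explicit scope change.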

  2. Know what data it can see

    What good looks like

    Every knowledge source the agent reads is listed, classified, and access-controlled at the source — not at the prompt.

    Common mistake / failure prevented

    Quiet data exfiltration through pasted snippets or unscoped retrieval.

    Good design pattern (practical example)

    Sales follow-up agent reads only the user's own opportunity records — not the global opportunity table.

    What AgentProof checks

    Whether sensitive / personal / regulated data is in scope, and whether access is enforced at the data layer.

    Evidence example to keep

    Knowledge-source inventory + access-control configuration snapshot.

    Assessment question preview

    Which knowledge sources is this agent allowed to read?
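
To illustrate enforcing access at the data layer rather than in the prompt, here is a small sketch using Python's built-in `sqlite3` module. The table, column names, and the `opportunities_for` function are assumptions made for the example; a production system would more likely rely on row-level security or scoped views in the database itself.

```python
import sqlite3

# In-memory database with made-up rows, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunities (id INTEGER PRIMARY KEY, owner TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO opportunities (owner, name) VALUES (?, ?)",
    [("alice", "Acme renewal"), ("bob", "Globex upsell"), ("alice", "Initech pilot")],
)


def opportunities_for(conn, user_id):
    """The only read path exposed to the agent.

    The owner filter lives in the query, so no prompt wording can widen it.
    """
    rows = conn.execute(
        "SELECT name FROM opportunities WHERE owner = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


alice_rows = opportunities_for(conn, "alice")
print(alice_rows)
```

The agent never receives a handle to the global table, only to the scoped function.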

  3. Limit what it can do

    What good looks like

    An explicit allow-list of actions the agent can take. Everything else returns a refusal with an escalation route.

    Common mistake / failure prevented

    The agent technically reaching an action it should never take, because the tool was "there".

    Good design pattern (practical example)

    Ops triage agent can read tickets and post comments, but cannot close, reassign, or escalate them.

    What AgentProof checks

    Whether the action allow-list matches the documented scope.

    Evidence example to keep

    Allow-list snapshot + transcripts showing refusal on out-of-scope actions.

    Assessment question preview

    Are write / workflow actions idempotent (safe to retry without duplicating)?
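
An explicit allow-list with a refusal and escalation route, plus idempotent writes, might look like the sketch below. The action names, the `ops-duty-lead` role, and the in-memory stores are all hypothetical; a real deployment would persist the idempotency keys.

```python
ALLOWED_ACTIONS = {"read_ticket", "post_comment"}  # hypothetical action names
ESCALATION_ROUTE = "ops-duty-lead"                 # hypothetical named role

seen_keys = set()  # idempotency keys already processed (in memory only)
comments = []      # stands in for the ticket system's comment store


def dispatch(action, payload, idempotency_key):
    """Run an allow-listed action; refuse everything else with a route."""
    if action not in ALLOWED_ACTIONS:
        return {"status": "refused", "escalate_to": ESCALATION_ROUTE}
    if action == "post_comment":
        # A retried write with the same key must not create a duplicate.
        if idempotency_key in seen_keys:
            return {"status": "duplicate_ignored"}
        seen_keys.add(idempotency_key)
        comments.append(payload["text"])
    return {"status": "ok"}


first = dispatch("post_comment", {"text": "Looking into it"}, "req-1")
retry = dispatch("post_comment", {"text": "Looking into it"}, "req-1")
blocked = dispatch("close_ticket", {}, "req-2")
print(first, retry, blocked)
```

Because the check is a membership test against the allow-list, a tool that merely exists in the environment is still unreachable until someone adds it deliberately.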

  4. Make human oversight explicit

    What good looks like

    Every consequential action passes through a documented human review step — and the reviewer is a named role, not an anonymous queue.

    Common mistake / failure prevented

    An agent committing irreversible writes because the confirmation step looked optional.

    Good design pattern (practical example)

    Finance agent proposes journal entries; entries are committed only after the approver clicks confirm.

    What AgentProof checks

    Whether the human-in-the-loop step is tested, not assumed.

    Evidence example to keep

    Sample approval log with reason text and reviewer role.

    Assessment question preview

    Where is the human approval gate? How is it enforced?
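
One way to make the approval gate enforced rather than assumed is to have the commit path itself reject unapproved proposals. The following is a sketch under that assumption; the class, function, and role names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a commit is attempted without a recorded approval."""


@dataclass
class JournalEntryProposal:
    description: str
    amount: float
    approved_by_role: Optional[str] = None
    approval_reason: Optional[str] = None


ledger = []  # stands in for the real system of record


def approve(proposal, reviewer_role, reason):
    """Record who approved (a named role) and why."""
    if not reason.strip():
        raise ValueError("approval needs reason text")
    proposal.approved_by_role = reviewer_role
    proposal.approval_reason = reason


def commit(proposal):
    """Write to the ledger only if an approval has been recorded."""
    if proposal.approved_by_role is None:
        raise ApprovalRequired(proposal.description)
    ledger.append(proposal)


entry = JournalEntryProposal("Accrue Q3 audit fee", 1200.0)
try:
    commit(entry)  # the agent tries to skip the gate
    was_blocked = False
except ApprovalRequired:
    was_blocked = True

approve(entry, "finance-controller", "Matches signed engagement letter")
commit(entry)
```

The unapproved attempt doubles as a test case: it is the evidence that the gate is enforced, not merely displayed.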

  5. Design safe fallbacks

    What good looks like

    Every failure mode has a documented next step for the user — graceful refusal, escalation, or hand-off.

    Common mistake / failure prevented

    Silent dead-ends that leave the user with nothing actionable when the agent gets confused.

    Good design pattern (practical example)

    When the agent is uncertain, it returns the answer it has plus a calm "If this is wrong, ask <named role>."

    What AgentProof checks

    Whether the documented fallback path is wired and tested.

    Evidence example to keep

    Fallback transcripts + the documented escalation roster.

    Assessment question preview

    What happens when the agent does not know the answer?

  6. Test with real business scenarios

    What good looks like

    The agent has been exercised against a representative scenario set drawn from real (anonymised) user behaviour — not toy prompts.

    Common mistake / failure prevented

    Lab-only testing that breaks on the first novel real-world prompt.

    Good design pattern (practical example)

    Customer-support agent is tested against the top 50 anonymised ticket patterns from the last quarter.

    What AgentProof checks

    Whether the test scenario set is documented and representative.

    Evidence example to keep

    Test scenario inventory + sample passing/failing transcripts.

    Assessment question preview

    Has this agent been tested against real business scenarios?

  7. Keep evidence of decisions

    What good looks like

    Every decision and action produces an audit-safe trace — sanitised, no PII, no tokens, no secrets.

    Common mistake / failure prevented

    Loss of accountability when the agent is challenged: "Why did it do that?" has no answer.

    Good design pattern (practical example)

    Each action stores the reason text, the source pack version, and the reviewer role — no email, no IP.

    What AgentProof checks

    Whether the audit trace is captured, scrubbed of PII, and retrievable.

    Evidence example to keep

    Sample audit trace + the retention policy for those traces.

    Assessment question preview

    Where is the audit trail of decisions and actions kept?
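
A simple way to keep traces free of PII and secrets is to allow-list the fields that may be stored, instead of trying to block-list the ones that may not. The field names below follow the example above (reason text, source pack version, reviewer role); the event shape and the `audit_record` helper are assumptions for illustration.

```python
import json

# Only these fields are ever written to the audit trace.
ALLOWED_FIELDS = ("action", "reason", "source_pack_version", "reviewer_role")


def audit_record(event):
    """Serialise an event, keeping only allow-listed fields."""
    record = {key: event[key] for key in ALLOWED_FIELDS if key in event}
    return json.dumps(record, sort_keys=True)


raw_event = {
    "action": "post_comment",
    "reason": "Runbook step 4 applies",
    "source_pack_version": "2024.3",   # made-up version string
    "reviewer_role": "ops-duty-lead",
    "user_email": "someone@example.com",  # must not reach the trace
    "client_ip": "203.0.113.7",           # must not reach the trace
}

line = audit_record(raw_event)
print(line)
```

Field allow-listing does not scrub personal data that ends up inside free text such as `reason`, so that field still needs its own review or redaction step.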

  8. Monitor after go-live

    What good looks like

    A defined baseline of normal behaviour and a documented signal-to-action loop for drift.

    Common mistake / failure prevented

    Late discovery of a quality regression weeks after deployment.

    Good design pattern (practical example)

    Refusal rate, escalation rate, and source-attribution rate are tracked weekly.

    What AgentProof checks

    Whether monitoring is wired and whether anyone reads the output.

    Evidence example to keep

    Sample monitoring dashboard snapshot.

    Assessment question preview

    How is the agent monitored after go-live?
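
The baseline and signal-to-action loop can be sketched as a weekly comparison of the three rates named above against stored baseline values. The baseline numbers, the tolerance, and the counts here are invented for the example; real thresholds should come from observed behaviour.

```python
# Hypothetical baseline and tolerance; tune these from real traffic.
BASELINE = {
    "refusal_rate": 0.05,
    "escalation_rate": 0.10,
    "source_attribution_rate": 0.90,
}
TOLERANCE = 0.05  # absolute difference that counts as drift


def weekly_rates(counts):
    """Turn raw weekly counts into the three tracked rates."""
    total = counts["conversations"]
    if total == 0:
        raise ValueError("no conversations recorded this week")
    return {
        "refusal_rate": counts["refusals"] / total,
        "escalation_rate": counts["escalations"] / total,
        "source_attribution_rate": counts["attributed"] / total,
    }


def drift_alerts(rates, baseline=BASELINE, tolerance=TOLERANCE):
    """Return the names of metrics that moved beyond the tolerance."""
    return sorted(
        name for name, value in rates.items()
        if abs(value - baseline[name]) > tolerance
    )


this_week = weekly_rates(
    {"conversations": 200, "refusals": 12, "escalations": 50, "attributed": 176}
)
alerts = drift_alerts(this_week)
print(alerts)
```

Each alert should map to a named owner and a documented action; otherwise the monitoring is wired but nobody is reading it.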

  9. Review when the AI landscape changes

    What good looks like

    A documented cadence for reviewing the agent against new public guidance and vendor changes.

    Common mistake / failure prevented

    An agent that quietly drifts out of compliance with the surrounding guidance.

    Good design pattern (practical example)

    Quarterly check against the AgentProof landscape radar; affected reports are reassessed.

    What AgentProof checks

    Whether the agent's last review date is current and whether the landscape signals have been applied.

    Evidence example to keep

    Review cadence record + last landscape sync date.

    Assessment question preview

    When was this agent last reviewed against the latest guidance?

  10. Reassess after major changes

    What good looks like

    Any material change to the agent — new tool, new data source, new model — triggers a reassessment.

    Common mistake / failure prevented

    A "small" upgrade silently changes the agent's risk profile.

    Good design pattern (practical example)

    Adding a write action immediately re-runs the action-taking control set.

    What AgentProof checks

    Whether the change log triggers reassessment correctly.

    Evidence example to keep

    Change log + before/after readiness scores.

    Assessment question preview

    What change would trigger you to re-review this agent?
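
A change log can trigger reassessment mechanically if each material change type maps to the control sets that must be re-run. The mapping below is an assumption made for illustration, not AgentProof's actual rule set; only the link between a new write action and the action-taking control set comes from the example above.

```python
# Hypothetical mapping from change type to control sets to re-run.
CONTROL_SETS_BY_CHANGE = {
    "new_tool": ["action_taking"],
    "new_data_source": ["data_access"],
    "new_model": ["action_taking", "data_access", "fallbacks"],
}


def controls_to_rerun(change_log):
    """Collect every control set triggered by the logged changes."""
    needed = set()
    for change in change_log:
        # Change types without a mapping (e.g. copy edits) trigger nothing.
        needed.update(CONTROL_SETS_BY_CHANGE.get(change["type"], []))
    return sorted(needed)


change_log = [
    {"type": "copy_edit", "detail": "Reworded greeting"},
    {"type": "new_tool", "detail": "Added post_comment write action"},
]
pending = controls_to_rerun(change_log)
print(pending)
```

Keeping the mapping in version control alongside the change log makes it reviewable evidence of what counts as a material change.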

Ready to apply this to your agent?

Turn these 10 principles into a report you can defend.

The free assessment applies the same 10 modules to your specific agent and gives you a credibility-scored report.

Ready to assess your own agent?

Start the free assessment to apply this guidance to a real agent. No payment. No public registration required.

A few honest things about AgentProof

  • AgentProof is a readiness assessment, not an official audit.
  • Every recommendation cites the intelligence pack version it came from.
  • Intelligence updates go through a human review gate.
  • AgentProof does not speak on behalf of Microsoft or any vendor.