AgentProof

Admin preview · Product blueprint

What AgentProof is meant to become

The full product framework, the complete customer journey, and a readiness matrix. Every status label is honest. No fake operational claims. No buried defects. Pick a journey, build it until operational, prove it, only then move on.

Active defect audit · review_answer_preservation

Why reviewed agents still show "Not captured in this review" on the live URL

Fix applied — awaiting live proof

Symptom: After a buyer scores an agent, signs out, and signs back in, the previously-captured answers render as the "Legacy answer unavailable. Not captured in this review." badge instead of the saved values. Affects both NEW (1.1.0) and OLD (1.0.0) reviews, but for different reasons.

Audited modules

  • lib/agentproof/review/answer_recovery_v1.ts
  • lib/agentproof/review/legacy_answer_recovery_v1.ts (S34AH legacy cascade)
  • lib/agentproof/review_persistence.ts (CapturedAnswerSnapshot)
  • components/form/JsonPasteScoreCard.tsx (LOAD effect + render-time recovery + legacy fallback wiring)

Findings (7)

  1. F1_recovery_module_correct. The recovery module's six strategies (exact_id → canonical_id → raw_id_reverse → deduped_id → text_hash → legacy_alias) are individually correct and deterministic. The 26 S34AD unit tests pass.

    Evidence: lib/agentproof/review/answer_recovery_v1.ts L164-L228 · tests/unit/phase_1g_s34ad_answer_recovery.test.ts

  2. F2_hydration_effect_correct. The reviewed-summary hydrate effect surfaces the snapshot's rich_answers verbatim. The per-agent LOAD effect correctly merges in-progress storage UNDER the snapshot (snapshot wins). State management is sound.

    Evidence: components/form/JsonPasteScoreCard.tsx L1579-L1666 (hydrate effect) · components/form/JsonPasteScoreCard.tsx L1690-L1735 (LOAD merge effect)

  3. F3_text_hash_index_dead. The text-hash recovery index was built with a label resolver scoped to ACTIVE question ids only: `(k) => s34adActiveQuestionLabels.get(k)`. So it could only resolve a label for a snapshot key that was ALSO a currently-active question id — which made the text-hash strategy redundant with exact_id and structurally unable to recover answers whose persistence-time id no longer matched any current id.

    Evidence: components/form/JsonPasteScoreCard.tsx L5786-L5789 (pre-fix)

  4. F4_snapshot_carried_no_labels. `CapturedAnswerSnapshot` (snapshot_version 1.0.0) carried `rich_answers` (id → answer) but no `question_labels` (id → label). Without the labels persisted at write-time, no read-time strategy could resolve a historical id to its label, breaking the text-hash fallback entirely.

    Evidence: lib/agentproof/review_persistence.ts L131-L136 (pre-fix CapturedAnswerSnapshot)

  5. F5_diagnostic_panel_only_for_reviewed. The S34AD diagnostic panel renders only when `s34adIsReviewedReadOnly` is true. If `loadLatestReviewSummaryForAgent` returns `reviewed: false` (e.g., env/agent id mismatch between save-time and read-time, or signing in on a different device where localStorage is empty), the panel is hidden and the founder gets no indication of what failed.

    Evidence: components/form/JsonPasteScoreCard.tsx L5819-L5822 (s34adIsReviewedReadOnly gate) · components/form/JsonPasteScoreCard.tsx L6266-L6319 (panel only renders when gate is true)

  6. F6_old_1_0_0_reviews_have_no_labels. The S34AG fix protects NEW reviews (snapshot_version 1.1.0 with question_labels) but does nothing for OLD reviews already on disk at 1.0.0. Those records carry rich_answers + legacy_answers but no labels, so the text-hash strategy cannot fire even after S34AG. Without a separate path that does NOT depend on question_labels, every pre-0.176.0 review keeps showing the "Not captured" badge.

    Evidence: lib/agentproof/review_persistence.ts L131-L156 (snapshot shape — labels optional) · components/form/JsonPasteScoreCard.tsx L5875-L5879 (S34AG-only path)

  7. F7_multiple_legacy_sources_already_persist_answers. Even when captured_answers is empty / mismatched, the buyer's answers usually live in OTHER stores: agentproof.rich_answers.v1::<env>::<agent> (per-agent rich answers — written live during the wizard), agentproof.answers.v1::<env>::<agent> (per-agent legacy tri-state), the report_markdown_body field on the review record (renders `- **<label>:** <value>` for every answer), and every historical entry in agentproof.report_history.v1::<env>::<agent>. The S34AD cascade ignored all of these.

    Evidence: lib/agentproof/per_agent_answers.ts (per-agent stores) · lib/reporting/agentproof_readiness_markdown_report.ts L346-L359 (markdown answer lines) · lib/agentproof/review_persistence.ts L308-L312 + L634+ (report history)
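Findings F3 and F4 can be illustrated with a minimal sketch. This is not the code in answer_recovery_v1.ts: it models only two of the six strategies, uses a normalised label string as a stand-in for the real text hash, and every name in it (`labelKey`, `buildTextIndex`, `recover`, the sample ids and labels) is hypothetical. It shows why an index built from a resolver scoped to active ids stays empty, and why reading labels from the snapshot itself lets the text-hash step match across ids.

```typescript
// Hypothetical, simplified model of the cascade: exact_id, then text_hash.
type Answers = Record<string, string>;
type Recovery =
  | { strategy: "exact_id" | "text_hash"; value: string }
  | { strategy: "unrecoverable" };

// Stand-in for the real text hash: a normalised form of the label text.
function labelKey(label: string): string {
  return label.trim().toLowerCase().replace(/\s+/g, " ");
}

// Build an index from label key to the snapshot id that carried that label.
function buildTextIndex(
  snapshotIds: string[],
  resolveLabel: (id: string) => string | undefined,
): Map<string, string> {
  const index = new Map<string, string>();
  for (const id of snapshotIds) {
    const label = resolveLabel(id);
    if (label !== undefined) index.set(labelKey(label), id);
  }
  return index;
}

function recover(
  activeId: string,
  activeLabel: string,
  answers: Answers,
  index: Map<string, string>,
): Recovery {
  const exact = answers[activeId];
  if (exact !== undefined) return { strategy: "exact_id", value: exact };
  const oldId = index.get(labelKey(activeLabel));
  const viaText = oldId === undefined ? undefined : answers[oldId];
  if (viaText !== undefined) return { strategy: "text_hash", value: viaText };
  return { strategy: "unrecoverable" };
}

// Sample data (assumed): the question id changed between save and read.
const snapshotAnswers: Answers = { q_old_7: "yes" };
const snapshotLabels: Record<string, string> = { q_old_7: "Is there a human approval step?" };
const activeLabels = new Map([["q_new_7", "Is there a human approval step?"]]);
const snapshotIds = Object.keys(snapshotAnswers);

// Pre-fix shape: the resolver only knows active ids, so the index is empty.
const preFixIndex = buildTextIndex(snapshotIds, (id) => activeLabels.get(id));
// Post-fix shape: snapshot labels first, active labels as the fallback.
const postFixIndex = buildTextIndex(snapshotIds, (id) => snapshotLabels[id] ?? activeLabels.get(id));

const preFix = recover("q_new_7", "Is there a human approval step?", snapshotAnswers, preFixIndex);
const postFix = recover("q_new_7", "Is there a human approval step?", snapshotAnswers, postFixIndex);
```

In this sketch `preFix` ends at `unrecoverable` while `postFix` resolves through the text-hash step, which mirrors the behaviour the audit describes for 1.0.0 versus 1.1.0 snapshots.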

Root cause

Two compounding failures.

PRIMARY: text-hash recovery was structurally broken because snapshots persisted no question_labels and the read-time label resolver was scoped to active ids only. S34AG fixed this for NEW (1.1.0) reviews.

LEGACY: the recovery cascade only consulted the captured_answers snapshot. Every other persisted source (per-agent rich answers, per-agent legacy answers, latest report markdown body, report history with its own captured_answers / markdown bodies) was ignored, so OLD (1.0.0) reviews where the snapshot's ids no longer match active ids still cascade to `unrecoverable`. S34AH adds the legacy fallback cascade so OLD reviews recover from whichever source has the answer.

Cross-device sync via Supabase remains outstanding.

Fix applied (6 steps)

  1. Extend CapturedAnswerSnapshot (S34AG)

    lib/agentproof/review_persistence.ts

    Added optional `question_labels?: Record<string, string>` and bumped allowed snapshot_version to also accept "1.1.0". Legacy "1.0.0" records remain readable.

  2. Add buildTextHashIndexFromSnapshotLabels() (S34AG)

    lib/agentproof/review/answer_recovery_v1.ts

    New helper that builds the text-hash index using the snapshot's OWN question_labels map as the primary source, with active-question labels as a fallback for legacy 1.0.0 snapshots.

  3. Stamp labels at both save points (S34AG)

    components/form/JsonPasteScoreCard.tsx

    Both the modern persist path and the legacy-fallback persist path now write snapshot_version: "1.1.0" AND a `question_labels` map built from the current deduped question list. New snapshots are text-hash recoverable.

  4. Use snapshot labels at render-time (S34AG)

    components/form/JsonPasteScoreCard.tsx

    The text-hash index is now built via buildTextHashIndexFromSnapshotLabels(), threading the snapshot's question_labels (if present) and falling back to active labels otherwise. This makes the text-hash strategy actually capable of recovering cross-id matches.

  5. Add legacy_answer_recovery_v1 module (S34AH)

    lib/agentproof/review/legacy_answer_recovery_v1.ts

    New module exporting recoverRichAnswerWithLegacyFallbacks() + recoverLegacyAnswerWithLegacyFallbacks() that extend the standard cascade with three new strategies — per_agent_storage, report_markdown, history_walk. Includes extractAnswersFromReportMarkdown() that parses `- **<label>:** <value>` lines out of the saved report Markdown body, and coerceMarkdownAnswerToLegacy() for the yes/no/not_sure path. None of these strategies require question_labels — they work on legacy 1.0.0 records.

  6. Wire S34AH legacy cascade into the render-time recovery

    components/form/JsonPasteScoreCard.tsx

    Imports the new S34AH helpers. Assembles `s34ahLegacyInputs` from loadPerAgentRichAnswers + loadPerAgentAnswers + microsoftReviewedAgentSummary.report_markdown_body + loadReportHistory. Both the pre-compute loop and the per-question renderer now call recoverRichAnswerWithLegacyFallbacks / recoverLegacyAnswerWithLegacyFallbacks instead of the bare recoverRichAnswer / recoverLegacyAnswer. The "Not captured in this review" badge shows ONLY when EVERY primary AND legacy strategy returns unrecoverable.
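The report_markdown strategy in step 5 relies on answer lines shaped like `- **<label>:** <value>`. The sketch below shows one way such lines could be parsed and coerced to the yes/no/not_sure tri-state, then consulted as one source in an ordered fallback list. It is an illustration under assumptions, not the shipped module: `parseAnswerLines`, `coerceToLegacy`, `recoverFromSources`, the sample report body, and the source order are all hypothetical.

```typescript
type LegacyAnswer = "yes" | "no" | "not_sure";

// Parse lines shaped like "- **<label>:** <value>" into a label -> value map.
function parseAnswerLines(markdown: string): Map<string, string> {
  const answers = new Map<string, string>();
  const linePattern = /^\s*-\s+\*\*(.+?):\*\*\s*(.*)$/;
  for (const line of markdown.split(/\r?\n/)) {
    const match = linePattern.exec(line);
    if (match === null) continue;
    const label = (match[1] ?? "").trim();
    const value = (match[2] ?? "").trim();
    if (label !== "" && value !== "") answers.set(label, value);
  }
  return answers;
}

// Map free-text values such as "Yes" or "Not sure" onto the tri-state.
function coerceToLegacy(value: string): LegacyAnswer | undefined {
  const v = value.trim().toLowerCase().replace(/[\s-]+/g, "_");
  if (v === "yes" || v === "no" || v === "not_sure") return v;
  return undefined; // anything else is left to the rich-answer path
}

// One fallback source: a strategy name plus a lookup by question label.
type Source = { strategy: string; lookup: (label: string) => string | undefined };

// Try each source in order and report which one produced the answer.
function recoverFromSources(
  label: string,
  sources: Source[],
): { strategy: string; value: string } | undefined {
  for (const source of sources) {
    const value = source.lookup(label);
    if (value !== undefined) return { strategy: source.strategy, value };
  }
  return undefined; // caller shows the "Legacy answer unavailable" badge
}

// Sample data (assumed): per-agent storage is empty, the report body is not.
const reportBody = [
  "# Readiness report",
  "- **Human approval step:** Yes",
  "- **Audit log retained:** Not sure",
  "Some narrative line.",
].join("\n");
const parsed = parseAnswerLines(reportBody);

const sources: Source[] = [
  { strategy: "per_agent_storage", lookup: () => undefined },
  { strategy: "report_markdown", lookup: (label) => parsed.get(label) },
];
const hit = recoverFromSources("Audit log retained", sources);
const miss = recoverFromSources("Unknown question", sources);
```

Returning the strategy name alongside the value is a design choice in this sketch: it lets a diagnostic panel report where each restored answer came from.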

Outstanding before we call this complete

  • Live walk-through on the Railway URL with a real signed-in user, opening an OLD (pre-0.176.0) reviewed agent and seeing the prior answers restored — confirms the S34AH legacy cascade works against real data on the live deploy.
  • Cross-device persistence: this fix solves the SAME-BROWSER case (localStorage retained across sign-out / sign-in). Cross-device requires Supabase as the source of truth for captured_answers + per-agent answers + report markdown body. Currently localStorage-only. Tracked as the next pending defect.
  • Truly unrecoverable answers — when an old review has no per-agent storage, no report markdown body, no history, and no captured_answers — must show the explicit "Legacy answer unavailable" badge instead of a blank control. Verified by the S34AH unrecoverable-path test.

Next action: Founder opens an OLD reviewed agent on the live URL. Expected: the answers reappear via the S34AH legacy cascade (most likely from the report_markdown_body or per_agent_storage strategy). Claude captures a screenshot of restored answers + the diagnostic panel showing 0 unrecoverable. Status flips to fix_proven_live. If any answer remains unrecoverable, the founder sees the explicit "Legacy answer unavailable" badge — which is the honest state for that one question only.

Product map — the 11 areas

Every area of the intended product, with its sub-modules, an honest operational status, the route it lives at, and the next action required. No fake operational claims.

Public site

The public-facing surface a buyer or evaluator visits before signing in. Explains what AgentProof is, what readiness means, and how the assessment works. No buyer login required.

Partially operational

8/9 sub-modules operational

  • Home / landing · /

    Operational · Owner: Claude

    Hero, four Learn cards, public-vs-workspace panel, trust strip, footer. Mounted via AppHeader route-aware shell.

    Next action: None — this surface is shipped and stable.

  • Readiness explanation · /agentic-ai-readiness

    Operational · Owner: Founder

    Explains the AgentProof readiness model to a public visitor.

    Next action: Founder to confirm copy is aligned with current methodology.

  • Learn centre · /learn

    Operational · Owner: Claude

    Polished gradient hero, seven journey stages, six training tracks, sticky section ribbon. Wired into 7 sub-routes.

    Next action: None — Learn centre is shipped.

  • Capability zones (Informational / Assisted / Action-taking) · /learn/capability-zones

    Operational · Owner: Claude

    Three-zone framing with risk profile, examples, what-good-includes.

    Next action: None.

  • Good agent design · /learn/good-agent-design

    Operational · Owner: Claude

    Ten modules, each with what-this-is, why-it-matters, signal-of-good-design.

    Next action: None.

  • Controls and oversight · /learn/controls-and-oversight

    Operational · Owner: Claude

    Six control families, maturity ladder, before-go-live checklist.

    Next action: None.

  • AI Radar overview (public) · /learn/ai-landscape-radar

    Partially operational · Owner: Founder

    R13-A replaced the static preview with the RadarOperationalStatusPanel that shows three honest states (engine off / configured no run / active). Until the founder runs the first source check, the live state will be 'configured — no successful run yet'.

    Next action: Founder runs the first radar check from /admin/intelligence-ops to flip the state to 'active'.

  • Demo entry · /demo

    Operational · Owner: Founder

    R13 made demo default-on. R14 added an explicit honest diagnostic to /demo so if the founder DOES see the OFF state, the page names the exact Railway env var that's set to false and tells them what to remove.

    Next action: If /demo shows OFF, remove AGENTPROOF_DEMO_MODE_ENABLED=false from Railway Variables.

  • Trust / help / setup status · /beta/trust

    Operational · Owner: Claude

    Trust page, help page, and the /setup-status diagnostic page are all mounted.

    Next action: None.

Demo journey

A buyer-runnable, no-login, sample-data flow that shows AgentProof end-to-end before they invest in a real assessment. Demo data must be clearly isolated from real customer data.

Founder preview

0/5 sub-modules operational

  • Sample agents catalogue

    Partially operational · Owner: Founder

    Sample agent data exists in lib/agentproof/demo/. Surfaces require AGENTPROOF_DEMO_MODE_ENABLED=true.

    Next action: Founder enables the demo flag, OR Claude makes demo safe-by-default.

  • Guided sample assessment

    Founder preview · Owner: Claude

    Trial assessment page exists at /trial/assessment with a sample three-question slice.

    Next action: Promote to operational once full guided flow is wired to actions.

  • Sample report · /trial/report

    Founder preview · Owner: Founder

    Canonical sample report with ReadinessScoreRing, findings, evidence panel, radar panel.

    Next action: Founder to confirm the sample reflects buyer-grade polish standard.

  • Sample improvement guidance · /trial/improvement

    Founder preview · Owner: Claude

    ImprovementGuidancePanel with three sample cards.

    Next action: Promote to operational when improvement cycles are persisted.

  • Demo data isolation

    Partially operational · Owner: Claude

    Demo data lives under separate types and lib/agentproof/demo/. No demo write hits the live workspace tables. Isolation verified at code level; not yet proven by a live walk-through.

    Next action: Run a live demo walk-through and document the isolation proof.

Login / account

Buyer-facing authentication. Magic link only, no password. Supabase EU auth with a workspace-scoped account record.

Blocked by configuration

0/4 sub-modules operational

  • Login (magic link) · /login

    Blocked by configuration · Owner: Founder

    Magic link flow is implemented. Auth is gated by AGENTPROOF_SUPABASE_AUTH_ENABLED + the Supabase env vars. Currently AMBER on /system-health.

    Next action: Founder sets AGENTPROOF_SUPABASE_AUTH_ENABLED=true on Railway and runs a real signed-in walk-through.

  • Auth callback · /auth/callback

    Partially operational · Owner: Founder

    S34AC-R9 patched to use the proxy-forwarded origin so Railway callbacks work. Awaiting a live signed-in cycle to confirm.

    Next action: Founder runs the signed-in walk-through.

  • Signed-in state + sign out

    Blocked by configuration · Owner: Founder

    Workspace pages render the signed-in workspace header when AGENTPROOF_SUPABASE_AUTH_ENABLED is true. Untested end-to-end.

    Next action: Founder enables auth flag and signs in once.

  • Account → workspace entry · /workspace

    Blocked by configuration · Owner: Founder

    Routes through workspace home after successful auth.

    Next action: Founder enables auth flag.
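The auth callback item above mentions using the proxy-forwarded origin so that callbacks resolve to the public Railway URL rather than the container's internal address. A minimal sketch of that idea follows. The header names `x-forwarded-host` and `x-forwarded-proto` are the conventional ones and are an assumption here, since the page does not say which headers the S34AC-R9 patch reads; `resolvePublicOrigin`, `fakeHeaders`, and the example hosts are hypothetical.

```typescript
// Minimal reader interface; a Fetch API Headers object satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// Prefer the origin the proxy saw; fall back to the request URL's own origin.
function resolvePublicOrigin(headers: HeaderReader, requestUrl: string): string {
  // Proxies may append values as a comma-separated list; take the first entry.
  const host = headers.get("x-forwarded-host")?.split(",")[0]?.trim();
  const proto = headers.get("x-forwarded-proto")?.split(",")[0]?.trim();
  if (host) return `${proto || "https"}://${host}`;
  return new URL(requestUrl).origin;
}

// Tiny test double so the sketch runs without a real request object.
function fakeHeaders(values: Record<string, string>): HeaderReader {
  return { get: (name) => values[name] ?? null };
}

const behindProxy = resolvePublicOrigin(
  fakeHeaders({ "x-forwarded-host": "app.example.com", "x-forwarded-proto": "https" }),
  "http://localhost:8080/auth/callback?code=abc",
);
const direct = resolvePublicOrigin(fakeHeaders({}), "http://localhost:8080/auth/callback?code=abc");
```

Forwarded headers should only be trusted when the app is known to sit behind a proxy that sets them, which is why the sketch keeps the request URL as a fallback rather than failing.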

Workspace

The signed-in surface where a buyer manages their agents, environments, and connectors. Microsoft is the first real connector path; others are provider-agnostic placeholders.

Blocked by configuration

0/12 sub-modules operational

  • Workspace home · /workspace

    Blocked by configuration · Owner: Founder

    EstateHomeDashboard renders inside workspace. Requires auth + Supabase persistence.

    Next action: Founder enables auth.

  • Agents list · /dashboard/estate

    Partially operational · Owner: Founder

    Estate view renders with the unified ReadinessScoreRing. Behaviour against real Supabase data still unverified.

    Next action: Live walk-through with auth enabled.

  • Environments (per connector)

    Planned · Owner: Claude

    Microsoft Power Platform environments discovery exists in lib/connectors/microsoft. UI surface not yet user-facing.

    Next action: Build /workspace/environments surface in next cycle.

  • Add manual non-Microsoft agent · /workspace/manual-agent/new

    Partially operational · Owner: Founder

    R7 wired the manual-agent form to a server action. Persistence path exists. Not yet proven by a live signed-in walk-through.

    Next action: Live walk-through that creates an agent, signs out, signs back in, sees it.

  • Microsoft connector (first real provider path) · /workspace/microsoft-readiness

    Blocked by configuration · Owner: Founder

    Six connector libs (microsoft_auth_config, power_platform_client, dataverse_client, copilot_studio_discovery, to_canonical_footprint, errors). Five API routes. Tokens never leave the server.

    Next action: Founder configures the six MICROSOFT_* env vars on Railway and tests the read-only discovery flow.

  • OpenAI connector

    Planned · Owner: Deferred

    Placeholder in the connector-agnostic registry. Not built.

    Next action: Deferred until Microsoft path is proven end-to-end with a real customer.

  • Anthropic connector

    Planned · Owner: Deferred

    Placeholder. Not built.

    Next action: Deferred.

  • Google connector

    Planned · Owner: Deferred

    Placeholder. Not built.

    Next action: Deferred.

  • AWS Bedrock connector

    Planned · Owner: Deferred

    Placeholder. Not built.

    Next action: Deferred.

  • Salesforce connector

    Planned · Owner: Deferred

    Placeholder. Not built.

    Next action: Deferred.

  • ServiceNow connector

    Planned · Owner: Deferred

    Placeholder. Not built.

    Next action: Deferred.

  • Custom in-house agent

    Planned · Owner: Deferred

    Handled today via the manual non-Microsoft agent path. Dedicated connector surface not built.

    Next action: Defer dedicated connector until manual path is proven.

Assessment / review journey

The core scoring flow. Classify the agent's capability zone, answer the questions, attach evidence, generate a deterministic score, and preserve every previously-given answer.

Partially operational

3/7 sub-modules operational

  • Agent classification (capability zone)

    Operational · Owner: Claude

    Four-question classifier maps the agent to Informational / Assisted Work / Action-taking. Used by the scoring engine.

    Next action: None.

  • Questions + evidence capture

    Operational · Owner: Claude

    Question bank, evidence fields, confidence pickers all wired.

    Next action: None.

  • Review summary

    Partially operational · Owner: Founder

    Review summary surface renders. Display of previously-scored answers is the active defect being fixed by R13 + R13-A recovery layer.

    Next action: Founder confirms previously-scored answers now appear after sign-out → sign-in. (Currently STILL FAILING per founder report.)

  • Review history (per agent)

    Partially operational · Owner: Claude

    ReviewSnapshot model supports multiple committed snapshots with re-score clones. UI history list not yet a dedicated route.

    Next action: Build /workspace/reports/[agentId]/history.

  • Re-score journey

    Partially operational · Owner: Claude

    cloneSnapshotForRescore() exists. UI affordance to start a re-score from a prior report is not yet a button on the report page.

    Next action: Add 'Re-score this agent' CTA to /workspace/reports/[reportId].

  • Previous answer preservation

    Partially operational · Owner: Claude

    R13 S34AD multi-strategy recovery + S34AG snapshot-label fix (NEW 1.1.0 reviews) + S34AH legacy fallback cascade (OLD 1.0.0 reviews). The S34AH layer pulls from per-agent rich answers, per-agent legacy answers, the saved report markdown body, and the full report history. "Legacy answer unavailable" appears ONLY when every primary AND legacy strategy returns unrecoverable. See the Active Defect Audit panel on /founder/product-blueprint for the full audit.

    Next action: Founder opens an OLD reviewed agent on the live URL; answers should appear via the S34AH legacy cascade. Claude captures screenshot proof. Until then this stays partially_operational.

  • Review snapshot status (committed / immutable)

    Operational · Owner: Claude

    The ReviewSnapshot model carries committed_at and snapshot_status, and saveReviewSnapshot refuses writes that would violate immutability.

    Next action: None.
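The snapshot-status and re-score items above describe committed snapshots that cannot be overwritten, with re-scoring done on a clone. The sketch below is one possible shape for that rule, not the actual saveReviewSnapshot or cloneSnapshotForRescore code. The `Snapshot` fields beyond committed_at and snapshot_status, the status values "draft" and "committed", the in-memory `Map` store, and the function names `saveSnapshot` and `cloneForRescore` are all assumptions made for illustration.

```typescript
type SnapshotStatus = "draft" | "committed"; // assumed status values

interface Snapshot {
  id: string;
  snapshot_status: SnapshotStatus;
  committed_at: string | null;
  answers: Record<string, string>;
}

// Refuse any write that would replace an already-committed snapshot.
function saveSnapshot(store: Map<string, Snapshot>, next: Snapshot): void {
  const existing = store.get(next.id);
  if (existing !== undefined && existing.snapshot_status === "committed") {
    throw new Error(`Snapshot ${next.id} is committed and cannot be overwritten`);
  }
  store.set(next.id, next);
}

// A re-score starts from a copy: new id, draft status, copied answers.
function cloneForRescore(source: Snapshot, newId: string): Snapshot {
  return {
    id: newId,
    snapshot_status: "draft",
    committed_at: null,
    answers: { ...source.answers },
  };
}

const store = new Map<string, Snapshot>();
const committed: Snapshot = {
  id: "snap-1",
  snapshot_status: "committed",
  committed_at: "2025-01-01T00:00:00Z",
  answers: { q1: "yes" },
};
saveSnapshot(store, committed); // first write succeeds
const rescore = cloneForRescore(committed, "snap-2");
saveSnapshot(store, rescore); // the clone is a separate draft, so this succeeds
```

Copying the answers object in the clone keeps edits made during a re-score from leaking back into the committed record.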

Reports

The buyer-grade artefact AgentProof produces. Executive summary, score, findings linked to evidence, recommendations, improvement plan, methodology version. Print + share ready.

Partially operational

6/10 sub-modules operational

  • Report dashboard (per workspace) · /workspace/reports

    Blocked by configuration · Owner: Founder

    Lists reports per workspace. Requires auth + persistence.

    Next action: Founder enables auth.

  • Report detail page

    Partially operational · Owner: Founder

    Detail page renders for the sample/demo path. Live-data path unverified.

    Next action: Live walk-through that opens a real report.

  • Executive summary

    Operational · Owner: Claude

    Summary block ships with score ring + band + headline statement.

    Next action: None.

  • Score + band (unified ring)

    Operational · Owner: Claude

    Canonical ReadinessScoreRing used on every surface.

    Next action: None.

  • Findings linked to Learn + control family + improvement action

    Operational · Owner: Claude

    ReportFindingsWithGuidancePanel renders all findings with related_learn + control_family + improvement_action.

    Next action: None.

  • Evidence expectations

    Operational · Owner: Claude

    EvidenceExpectationsPanel shows good-example / weak-example / how-AgentProof-uses-it.

    Next action: None.

  • Recommendations

    Operational · Owner: Claude

    Renders in report alongside improvement guidance panel.

    Next action: None.

  • Improvement plan (in report)

    Partially operational · Owner: Claude

    Sample improvement cards render. Persisted improvement cycles per agent are partially wired (see Improvement cycles area).

    Next action: Wire live improvement cycle persistence.

  • Methodology / version block

    Operational · Owner: Claude

    Each report carries methodology_version + scoring_version + intelligence_pack_version + report_version.

    Next action: None.

  • Export / print readiness

    Partially operational · Owner: Founder

    S33L/S33M shipped @page CSS + cover page + section dividers + editorial blocks. Browser print confirmed locally; live confirmation pending.

    Next action: Founder runs print from /workspace/reports/[id] and confirms layout.

Improvement cycles

Turn report findings into trackable improvement actions with evidence-to-collect, owner, status, and reassessment guidance.

Partially operational

4/5 sub-modules operational

  • Recommended actions per finding

    Operational · Owner: Claude

    ImprovementGuidancePanel surfaces the why-it-matters / what-good-looks-like / evidence-to-collect / reassess-after / related-Learn structure.

    Next action: None.

  • Owner / status / date fields

    Founder preview · Owner: Claude

    Sample card structure exists. Per-customer persistence of owner + status + date is not yet a live surface.

    Next action: Wire improvement_cycle table writes from the workspace UI.

  • Evidence to collect

    Operational · Owner: Claude

    Documented per improvement card.

    Next action: None.

  • Reassessment guidance

    Operational · Owner: Claude

    Each card states reassess_after + ties to radar status if relevant.

    Next action: None.

  • Link back to report findings

    Operational · Owner: Claude

    Each improvement card references the originating finding via related_learn + related_control_family.

    Next action: None.

AI Radar

Controlled source-watchlist monitoring with change detection, human review queue, and versioned intelligence packs. Not broad web scraping. Not real-time live monitoring.

Partially operational

5/7 sub-modules operational

  • Current honest status (3-state banner) · /learn/ai-landscape-radar

    Operational · Owner: Claude

    RadarOperationalStatusPanel renders one of three states: engine off / configured no run / active monitoring.

    Next action: None.

  • Source watchlist (16 sources, 7 categories)

    Operational · Owner: Claude

    RADAR_SOURCE_REGISTRY: EU AI Act, NIST AI RMF, ISO 42001, OpenAI/Anthropic/Google/AWS Bedrock blogs, OWASP LLM, MITRE ATLAS, MS Copilot Studio docs, LangChain/LlamaIndex releases, acceptable-use policies (AUPs).

    Next action: None.

  • Change detection (SHA-256 + allowlist)

    Operational · Owner: Claude

    source_check_engine_v1 hashes content, detects no-change / first-check / change-detected. Allowlist-only.

    Next action: None.

  • Human review queue · /admin/intelligence-ops

    Blocked by configuration · Owner: Founder

    RadarReviewQueue with approve / reject / pick-up. Requires AGENTPROOF_ADMIN_TOOLS_ENABLED=true.

    Next action: Founder enables the admin tools flag.

  • Intelligence pack versioning (semver)

    Operational · Owner: Claude

    intelligence_pack_version_v1: append-only, bump rules, initial 1.0.0.

    Next action: None.

  • Report impact warnings

    Partially operational · Owner: Claude

    computeReportImpact() emits warnings for zone + control-family overlap. UI surface that injects these into reports is not yet wired.

    Next action: Wire report_impact_v1 output into /workspace/reports/[id].

  • Clear preview/real separation

    Operational · Owner: Claude

    Three-state panel always tells the buyer exactly which mode is live. No fake live claim possible.

    Next action: None.
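The change-detection item above (SHA-256 hashing, three outcomes, allowlist-only) can be sketched as follows. This is a hedged illustration rather than source_check_engine_v1 itself: `checkSource`, the outcome spellings, the in-memory hash map, keying the allowlist by exact URL, and the example URL are assumptions. Only Node's built-in `createHash` from node:crypto is used.

```typescript
import { createHash } from "node:crypto";

type CheckOutcome = "first_check" | "no_change" | "change_detected";

// Hex-encoded SHA-256 digest of the fetched content.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Compare the new content hash with the last stored hash for this source.
function checkSource(
  url: string,
  content: string,
  allowlist: ReadonlySet<string>,
  lastHashes: Map<string, string>,
): CheckOutcome {
  // Allowlist-only: refuse anything that is not a registered source.
  if (!allowlist.has(url)) throw new Error(`Source not on allowlist: ${url}`);
  const hash = sha256Hex(content);
  const previous = lastHashes.get(url);
  lastHashes.set(url, hash); // remember the latest hash for the next check
  if (previous === undefined) return "first_check";
  return previous === hash ? "no_change" : "change_detected";
}

// Sample run (assumed data): one registered source checked three times.
const allowlist = new Set(["https://example.org/policy"]);
const lastHashes = new Map<string, string>();
const first = checkSource("https://example.org/policy", "v1 text", allowlist, lastHashes);
const second = checkSource("https://example.org/policy", "v1 text", allowlist, lastHashes);
const third = checkSource("https://example.org/policy", "v2 text", allowlist, lastHashes);
```

Hashing the whole body flags any byte-level difference, so a real engine would likely normalise content before hashing; that step is left out of this sketch.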

Methodology / version governance

Every report records the methodology pack version, scoring model version, intelligence pack version, and report version. Old reports stay immutable. Compatibility warnings surface when a newer pack would change the answer.

Partially operational

4/6 sub-modules operational

  • Methodology pack version

    Operational · Owner: Claude

    Tracked on every committed review_snapshot.

    Next action: None.

  • Scoring model version

    Operational · Owner: Claude

    engine_version pinned across the deterministic scoring path.

    Next action: None.

  • Intelligence pack version

    Operational · Owner: Claude

    INITIAL_PACK_VERSION_LABEL = 1.0.0; bumpPackVersionLabel handles minor and patch bumps.

    Next action: None.

  • Report version

    Operational · Owner: Claude

    Each report carries an explicit report_version field.

    Next action: None.

  • Change history

    Founder preview · Owner: Claude

    Methodology changelog exists in content/. UI surface that renders the history for a buyer is not yet a dedicated page.

    Next action: Build /learn/methodology-history.

  • Compatibility warnings (newer pack → reassess)

    Partially operational · Owner: Claude

    report_impact_v1 computes reassessment_recommended_overall. Surfacing it into the report page is the next step.

    Next action: Wire into report detail page.
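The pack-version and compatibility-warning items above combine two ideas: a semver label that is bumped at minor or patch level, and a check for whether a report was produced against an older pack. The sketch below shows both under stated assumptions. It is not bumpPackVersionLabel or report_impact_v1; the names `bumpVersionLabel`, `isOlderVersion`, and `versionHistory` are hypothetical, and treating "older pack" as the sole trigger for a reassessment hint is a simplification.

```typescript
type BumpKind = "minor" | "patch";

// Split "MAJOR.MINOR.PATCH" into numbers, rejecting anything else.
function parseVersion(label: string): [number, number, number] {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(label);
  if (match === null) throw new Error(`Not a MAJOR.MINOR.PATCH label: ${label}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// A minor bump resets patch to 0; a patch bump only increments patch.
function bumpVersionLabel(label: string, kind: BumpKind): string {
  const [major, minor, patch] = parseVersion(label);
  return kind === "minor" ? `${major}.${minor + 1}.0` : `${major}.${minor}.${patch + 1}`;
}

// True when `a` precedes `b` in semver order (no pre-release handling).
function isOlderVersion(a: string, b: string): boolean {
  const left = parseVersion(a);
  const right = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    const l = left[i] ?? 0;
    const r = right[i] ?? 0;
    if (l !== r) return l < r;
  }
  return false;
}

// Append-only history: new labels are pushed, earlier entries never edited.
const versionHistory: string[] = ["1.0.0"];
versionHistory.push(bumpVersionLabel("1.0.0", "patch"));
versionHistory.push(bumpVersionLabel("1.0.1", "minor"));

const reportPackVersion = "1.0.0";
const currentPackVersion = versionHistory[versionHistory.length - 1] ?? "1.0.0";
const reassessHint = isOlderVersion(reportPackVersion, currentPackVersion);
```

Comparing the numeric parts rather than the raw strings matters because a plain string comparison would order "1.10.0" before "1.9.0".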

Admin / founder operations

Founder-only operational surfaces: system health, setup status, intelligence ops, release readiness, evidence of live checks, and the product completion matrix.

Partially operational

3/6 sub-modules operational

  • System health (active probes) · /system-health

    Operational · Owner: Claude

    R12 shipped active probes against Supabase + the deploy. Verdict groups: working / configured-untested / missing / needs-founder-action / needs-claude-action / sample-only.

    Next action: None.

  • Setup status diagnostic · /setup-status

    Operational · Owner: Claude

    Flag-free diagnostic surface — always loads. Tells founder exactly what is configured.

    Next action: None.

  • Intelligence ops (admin review) · /admin/intelligence-ops

    Blocked by configuration · Owner: Founder

    Mounts RadarReviewQueue + admin tooling. Gated by AGENTPROOF_ADMIN_TOOLS_ENABLED.

    Next action: Founder enables the admin tools flag.

  • Release readiness

    Partially operational · Owner: Claude

    S34J/S34K shipped a release-gate enforcement layer. The production_launch_decision module gates the public_beta_ready signal. The layer is operational but not yet surfaced as a single founder-readable page.

    Next action: Build /admin/release-readiness summary page.

  • Evidence of live checks

    Founder preview · Owner: Claude

    Active probes ship; their evidence is shown on /system-health. A consolidated 'evidence vault' surface exists in lib but is not founder-routable.

    Next action: Expose evidence vault contents at /admin/evidence-vault.

  • Product Completion Matrix · /founder/product-blueprint

    Operational · Owner: Claude

    Visible matrix lives on the Founder Preview / Product Blueprint page. Source of truth for what is real, partial, planned, or deferred.

    Next action: None — shipped in R13-FRAMEWORK.

Commercial / private pilot readiness

What is ready for a real private pilot with a real buyer. What still needs founder action. What needs Claude action. What is deferred. No payments, no public registration in this cycle.

Deferred

0/5 sub-modules operational

  • Ready for pilot (proven)

    Partially operational · Owner: Claude

    Public Learn surface, AI Radar 3-state panel, system-health, and the deterministic scoring engine are pilot-ready.

    Next action: None for these surfaces.

  • Not yet ready for pilot

    Partially operational · Owner: Claude

    Auth, persistence end-to-end proof, manual non-MS persistence proof, demo default-on, review answer preservation visible after sign-out/in. These are the active gating items.

    Next action: See sub-modules in Assessment, Workspace, and Login areas.

  • Needs founder action

    Partially operational · Owner: Founder

    Flip the auth flag. Configure the Microsoft env vars. Run the live signed-in walk-through. Confirm visible answer preservation. Decide whether demo is default-on.

    Next action: Founder works the punch list on /system-health.

  • Needs Claude action

    Partially operational · Owner: Claude

    Prove the answer-preservation fix on the live URL, finalise improvement-cycle persistence UI, wire report-impact warnings into the report page.

    Next action: Claude works the build punch list once founder unblocks auth.

  • Deferred (intentional)

    Deferred · Owner: Deferred

    Payments. Pricing pages. Public registration. Non-Microsoft connectors (OpenAI/Anthropic/Google/AWS/Salesforce/ServiceNow). Custom-agent dedicated connector. Public beta launch.

    Next action: Revisit only after first paid pilot is committed.

Target buyer journey (15 stages)

Home → Learn → Demo or Trial → Login → Workspace → Add agent → Classify → Assess → Generate report → Read findings → Improvement plan → Return later → Re-score → Compare history → Radar / methodology prompts reassessment.

  1. Stage 1

    1. Arrive at Home

    Surface: /

    Operational
    Buyer does
    Lands on the public site, sees what AgentProof is, and decides whether to Learn more, run the Demo, or Start Trial.
    AgentProof does
    Renders the public landing with hero, four Learn cards, trust strip, and provider-agnostic positioning chip.

    Next action: None.

  2. Stage 2

    2. Read the Learn centre

    Surface: /learn

    Operational
    Buyer does
    Reads the capability-zone framing, good-agent-design modules, controls + oversight, AI Radar overview, and the reference library.
    AgentProof does
    Surfaces all 7 Learn pages with consistent gradient hero, sticky section ribbon, prev/next, and cross-links.

    Next action: None.

  3. Stage 3

    3. Choose Demo or Start Trial

    Surface: /demo or /trial

    Partially operational
    Buyer does
    Picks Demo (sample-data preview, no login) or Start Trial (real assessment, login required).
    AgentProof does
    Demo path requires AGENTPROOF_DEMO_MODE_ENABLED. Trial path leads to login.

    Next action: Founder decides whether demo is default-on, or whether trial is the primary path.

  4. Stage 4

    4. Sign in (magic link)

    Surface: /login → /auth/callback

    Blocked by configuration
    Buyer does
    Enters email, opens magic link, lands signed-in.
    AgentProof does
    Sends Supabase magic link, exchanges code for session at /auth/callback using the proxy-forwarded origin (S34AC-R9 fix).

    Next action: Founder enables AGENTPROOF_SUPABASE_AUTH_ENABLED and runs the live signed-in cycle.

  5. Stage 5

    5. Land in the workspace

    Surface: /workspace

    Blocked by configuration
    Buyer does
    Sees their workspace home with the estate dashboard.
    AgentProof does
    Loads workspace-scoped agents, environments, and reports. Renders unified ReadinessScoreRing.

    Next action: Founder enables auth.

  6. Stage 6

    6. Add an agent

    Surface: /workspace/manual-agent/new (or Microsoft connector)

    Partially operational
    Buyer does
    Either adds a manual non-Microsoft agent or connects via the Microsoft Copilot Studio connector.
    AgentProof does
    Manual: posts to the server action and persists to Supabase. Microsoft: discovery flow asks Microsoft for environments + agents.

    Next action: Founder proves manual non-MS persistence end-to-end with a live signed-in walk-through.

  7. Stage 7

    7. Classify the agent

    Surface: /workspace/[agentId]/classify

    Operational
    Buyer does
    Answers the four classification questions to map the agent to a capability zone.
    AgentProof does
    Maps responses to Informational / Assisted Work / Action-taking via the deterministic classifier.

    Next action: None.

  8. Stage 8

    8. Assess (answer the question bank)

    Surface: /workspace/[agentId]/review

    Partially operational
    Buyer does
    Goes through the question bank, attaches evidence, sets confidence per answer.
    AgentProof does
    Stores every answer in the workspace-scoped review snapshot. Saves progress between sessions.

    Next action: Founder confirms previously-scored answers actually reappear on sign-out → sign-in (currently failing per founder report).

  9. Stage 9

    9. Generate report

    Surface: /workspace/reports/[reportId]

    Partially operational
    Buyer does
    Clicks generate, reads the report.
    AgentProof does
    Runs the deterministic scoring engine, mints a methodology+pack-versioned report, locks the snapshot.

    Next action: Confirm live report path end-to-end with auth on.

  10. Stage 10

    10. Read findings + evidence

    Surface: /workspace/reports/[reportId]

    Operational
    Buyer does
    Reads the executive summary, score band, findings with related Learn + control family + evidence + improvement action.
    AgentProof does
    Renders ReportFindingsWithGuidancePanel + EvidenceExpectationsPanel + the methodology version block.

    Next action: None (sample path proven; live path follows when auth is on).

  11. Stage 11

    11. Work the improvement plan

    Surface: /workspace/improvement/[reportId]

    Partially operational
    Buyer does
    Picks improvement actions, records owner / status / date, collects evidence.
    AgentProof does
    Surfaces improvement cards with reassess-after + evidence-to-collect. Persists per-customer state when wired.

    Next action: Wire live improvement-cycle persistence UI.

  12. Stage 12

    12. Return later

    Surface: /login → /workspace

    Partially operational
    Buyer does
    Signs in again, expects everything to be where it was.
    AgentProof does
    Restores workspace-scoped state, including every previously-scored answer.

    Next action: PROVE the answer-preservation path on the live URL with a signed-in cycle.

  13. Stage 13

    13. Re-score

    Surface: /workspace/reports/[reportId] → Re-score CTA

    Partially operational
    Buyer does
    Decides to re-score (new methodology, agent changed, periodic refresh).
    AgentProof does
    cloneSnapshotForRescore() produces a NEW draft review_id while the prior snapshot stays immutable.

    Next action: Add the Re-score CTA + history list UI.

  14. Stage 14

    14. Compare report history

    Surface: /workspace/reports/[agentId]/history

    Planned
    Buyer does
    Compares old vs new scores, sees what changed and why.
    AgentProof does
    Lists committed snapshots per agent. Shows methodology / pack version per snapshot.

    Next action: Build the history list page.

  15. Stage 15

    15. Radar / methodology prompts reassessment

    Surface: /workspace/reports/[reportId] (warning banner)

    Partially operational
    Buyer does
    Sees a warning that a new approved radar item or a new methodology pack version may affect this report.
    AgentProof does
    report_impact_v1 computes overlap; emits a banner on the report page when reassessment is recommended.

    Next action: Wire the report-impact banner into the report detail page.
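
The re-score behaviour in Stage 13 (a new draft review_id, with the prior snapshot left immutable) can be sketched as below. This is a minimal illustration, not the actual cloneSnapshotForRescore() implementation: the ReviewSnapshot shape, its field names other than review_id and rich_answers, and the function name cloneForRescoreSketch are all assumptions.

```typescript
// Hypothetical shapes: the real snapshot type is not shown on this page.
type ReviewStatus = "draft" | "committed";

interface ReviewSnapshot {
  readonly review_id: string;
  readonly agent_id: string;
  readonly status: ReviewStatus;
  readonly methodology_version: string;
  readonly rich_answers: Readonly<Record<string, string>>;
}

// Returns a new draft that starts from the prior answers under a fresh review_id.
// The prior snapshot is only read, never written.
function cloneForRescoreSketch(
  prior: ReviewSnapshot,
  newReviewId: string,
  methodologyVersion: string,
): ReviewSnapshot {
  if (newReviewId === prior.review_id) {
    throw new Error("re-score must mint a new review_id");
  }
  return {
    review_id: newReviewId,
    agent_id: prior.agent_id,
    status: "draft",
    methodology_version: methodologyVersion,
    rich_answers: { ...prior.rich_answers }, // copy so the draft can diverge
  };
}

// Illustrative data only.
const committed: ReviewSnapshot = {
  review_id: "rev-001",
  agent_id: "agent-a",
  status: "committed",
  methodology_version: "1.0.0",
  rich_answers: { q1: "yes" },
};
const draft = cloneForRescoreSketch(committed, "rev-002", "1.1.0");
```

Passing the new id in as an argument keeps the sketch deterministic and testable; where the real id is minted is not stated on this page.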

Product completion matrix (14 rows)

The master scorecard. Each product area, its target end-state, current status, operational proof required, persistence requirement, visual acceptance, next action, and owner. Known active defects (review answer preservation, demo, manual non-MS persistence, AI Radar first-run, reports polish) appear here and are NOT buried.

Public navigation

Operational · Owner: Claude
Target end-state
Sticky top nav + four-column footer reach every public page. No orphan routes.
Operational proof required
PublicSiteHeader + Footer mount via AppHeader on every public route.
Data / persistence requirement
None.
Visual acceptance required
Founder confirms nav links work from every public page.

Next action: None — shipped at S34AA.

Login (magic link)

Blocked by configuration · Owner: Founder
Target end-state
Buyer enters email, opens magic link, lands signed-in in the workspace.
Operational proof required
Live walk-through: enter email → click link → land in /workspace as the signed-in user.
Data / persistence requirement
Supabase auth tables + workspace tables wired.
Visual acceptance required
Workspace header shows signed-in identity, sign-out works.

Next action: Founder sets AGENTPROOF_SUPABASE_AUTH_ENABLED=true on Railway.

Supabase persistence

Partially operational · Owner: Founder
Target end-state
Every workspace write (agents, reviews, reports, improvement cycles, radar) lands in Supabase EU and survives a sign-out / sign-in.
Operational proof required
R14 (S34AI) shipped 14 declarative probes, the data-source chip, and the migration planner. R15 (S34AJ) shipped the live persistence probe (/api/system-health/persistence), the signed-in resolver, and the migration executor. R16 (S34AK) shipped the per-surface compliance registry: every live read/write surface is honestly declared as Supabase-first, supabase-via-server-action, local-only-intentional (demo/trial), local-only-defect (awaiting R17 rewire), or scheduled_for_rewire. /system-health renders the compliance panel with a violation count. R16 wired the manual-agent server action, the radar API, and the persistence probe as the operational compliant surfaces; the JsonPasteScoreCard hydrate, dashboard/estate, workspace home, and per-agent report history are declared as remaining defects, each with a named next action.
Data / persistence requirement
All 7 migrations applied (0001-0007). RLS verified. AGENTPROOF_SUPABASE_AUTH_ENABLED=true on Railway.
Visual acceptance required
Founder sees the persistence proof panel on /system-health flip from 'configured — untested' to 'operational' once the live walk-through runs. Data-source chips on workspace + manual agent + review surfaces show 'Supabase-backed'.

Next action: Founder flips auth flag, runs the live walk-through (create manual agent → refresh → sign out → sign back in → review → report). Claude captures the screenshot proof.
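
The per-surface compliance registry described in this row can be pictured as a list of declarations plus a violation count. The sketch below is illustrative only. The status spellings are normalised to snake_case, the SurfaceDeclaration shape is hypothetical, the statuses given to the three compliant surfaces are guesses, and treating only local_only_defect as a violation is an assumption.

```typescript
// Status names follow the five categories listed in this row; spelling is an assumption.
type ComplianceStatus =
  | "supabase_first"
  | "supabase_via_server_action"
  | "local_only_intentional"
  | "local_only_defect"
  | "scheduled_for_rewire";

interface SurfaceDeclaration {
  surface: string;
  status: ComplianceStatus;
  nextAction?: string;
}

// Assumption: a violation is a surface declared as a local-only defect.
function countViolations(registry: ReadonlyArray<SurfaceDeclaration>): number {
  return registry.filter((s) => s.status === "local_only_defect").length;
}

// Surfaces as this row describes them at R16; statuses for compliant surfaces are illustrative.
const r16Registry: SurfaceDeclaration[] = [
  { surface: "manual-agent server action", status: "supabase_via_server_action" },
  { surface: "radar API", status: "supabase_first" },
  { surface: "persistence probe", status: "supabase_first" },
  { surface: "JsonPasteScoreCard hydrate", status: "local_only_defect", nextAction: "R17 rewire" },
  { surface: "dashboard/estate", status: "local_only_defect", nextAction: "R17 rewire" },
  { surface: "workspace home", status: "local_only_defect", nextAction: "R17 rewire" },
  { surface: "per-agent report history", status: "local_only_defect", nextAction: "R17 rewire" },
];

const violationCount = countViolations(r16Registry);
```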

Manual non-Microsoft agent

Partially operational · Owner: Founder
Target end-state
Buyer adds a manual agent, signs out, signs in, sees it. Persisted in Supabase, scoped to workspace.
Operational proof required
R7 wired form → server action → repository. Live signed-in walk-through still needed.
Data / persistence requirement
manual_agents table with workspace_id scoping.
Visual acceptance required
Agent appears in /dashboard/estate after re-login.

Next action: Founder runs the live walk-through once auth is on; Claude documents the screenshot proof.

Demo

Operational · Owner: Founder
Target end-state
Buyer can run a sample-data flow end-to-end with no login. Demo data clearly isolated.
Operational proof required
R13 made demo default-on. R14 added an explicit honest diagnostic to the /demo page that names the Railway env var and tells the founder exactly what to do if disabled.
Data / persistence requirement
None — demo writes do NOT hit the live workspace tables.
Visual acceptance required
Founder confirms /demo loads sample flow without login. If founder sees the disabled state, the diagnostic on the page explains the exact Railway action.

Next action: If /demo shows the OFF state, remove AGENTPROOF_DEMO_MODE_ENABLED=false from Railway Variables.
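
A default-on flag of the kind this row describes, where removing AGENTPROOF_DEMO_MODE_ENABLED=false restores the demo, could be read roughly as follows. The helper name isDemoEnabled and the rule that only the literal value "false" disables the demo are assumptions; the real parser may accept other values.

```typescript
type Env = Readonly<Record<string, string | undefined>>;

// Default-on: the demo is enabled unless the variable is explicitly "false".
// Assumption: any other value, or no value at all, leaves the demo on.
function isDemoEnabled(env: Env): boolean {
  const raw = env["AGENTPROOF_DEMO_MODE_ENABLED"];
  if (raw === undefined) {
    return true;
  }
  return raw.trim().toLowerCase() !== "false";
}

// Variable removed from Railway: demo on.
const demoWhenUnset = isDemoEnabled({});
// Variable explicitly set to false: demo off.
const demoWhenFalse = isDemoEnabled({ AGENTPROOF_DEMO_MODE_ENABLED: "false" });
```

Taking the environment as a parameter, rather than reading it globally, keeps the check easy to exercise in a unit test.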

Trial

Founder preview · Owner: Founder
Target end-state
Buyer can walk a full trial workspace, sample assessment, sample report, sample improvement.
Operational proof required
Trial workspace pages render. Sample data only — no real persistence.
Data / persistence requirement
None — sample data only.
Visual acceptance required
Founder confirms trial pages reflect the buyer-grade polish standard.

Next action: Founder reviews /trial, /trial/workspace, /trial/report, /trial/improvement.

Review answer preservation

Partially operational · Owner: Claude
Target end-state
Every previously-scored answer is visible after sign-out / sign-in. Never blank, never lost.
Operational proof required
R13-FRAMEWORK / S34AG fixed NEW reviews via snapshot_version 1.1.0 + question_labels. S34AH adds the legacy fallback cascade for OLD (1.0.0) reviews — per_agent_storage, report_markdown, and history_walk strategies pull from per-agent localStorage, the saved report markdown body, and the report history. The "Not captured in this review" badge now appears ONLY when EVERY primary AND legacy strategy genuinely returns unrecoverable. See the Active Defect Audit panel on this page for the full audit record.
Data / persistence requirement
review_snapshots + per-agent rich-answer rows in Supabase. Cross-device sync is a separate pending defect (currently localStorage-only).
Visual acceptance required
Screenshot of restored answers on the live Railway URL after sign-out → sign-in, with the diagnostic panel showing 0 unrecoverable.

Next action: Founder enables auth flag on Railway and signs in. Claude captures the live screenshot proof. Until then this row stays partially_operational — the fix is real but unproven on the live URL.
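
The legacy fallback cascade named in this row (per_agent_storage, then report_markdown, then history_walk) can be sketched as an ordered list of lookups that stops at the first hit. This is not the code in legacy_answer_recovery_v1.ts. The types, the function name recoverLegacyAnswer, the sample question ids, and the assumption that the strategies run in the order listed are all illustrative.

```typescript
type LegacyStrategyName = "per_agent_storage" | "report_markdown" | "history_walk";

type RecoveryResult =
  | { kind: "recovered"; value: string; via: LegacyStrategyName }
  | { kind: "unrecoverable" };

type Lookup = (questionId: string) => string | undefined;
type Strategy = readonly [LegacyStrategyName, Lookup];

// Tries each strategy in order; only when all of them miss is the answer unrecoverable.
function recoverLegacyAnswer(
  questionId: string,
  strategies: ReadonlyArray<Strategy>,
): RecoveryResult {
  for (const [name, lookup] of strategies) {
    const value = lookup(questionId);
    if (value !== undefined && value.trim() !== "") {
      return { kind: "recovered", value, via: name };
    }
  }
  return { kind: "unrecoverable" };
}

// Hypothetical sources standing in for localStorage, report markdown, and report history.
const perAgentStorage: Record<string, string> = {};
const reportMarkdown: Record<string, string> = { q_data_access: "Read-only" };
const historyWalk: Record<string, string> = { q_data_access: "Read-write", q_owner: "Ops team" };

const cascade: ReadonlyArray<Strategy> = [
  ["per_agent_storage", (id) => perAgentStorage[id]],
  ["report_markdown", (id) => reportMarkdown[id]],
  ["history_walk", (id) => historyWalk[id]],
];

const recovered = recoverLegacyAnswer("q_data_access", cascade);
const missing = recoverLegacyAnswer("q_never_asked", cascade);
```

In this shape the "Not captured in this review" badge would correspond to the unrecoverable result, reached only after the primary strategies (not shown here) and every legacy strategy have missed.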

Reports

Partially operational · Owner: Founder
Target end-state
Buyer-grade report: exec summary, score ring, findings linked to evidence + Learn + improvement action, methodology block, print-ready.
Operational proof required
R14 added the ReportVersionMetadataBanner component for explicit methodology / scoring / intelligence-pack / report-version display + reassessment-recommended banner. Sample report path proven. Live report path waits on auth + persistence.
Data / persistence requirement
reports + review_snapshots tables.
Visual acceptance required
Founder approves the polish on both sample and live reports. Old reports show their version metadata banner; reassessment-recommended banner appears without mutating the original report.

Next action: Founder reviews /trial/report polish; confirms standard for live reports.

AI Radar

Partially operational · Owner: Founder
Target end-state
Controlled source-watchlist monitoring with change detection, human review, and versioned packs. No fake live claim.
Operational proof required
R13-A shipped the engine + registry + state machine. First real check run still pending.
Data / persistence requirement
radar tables from migration 0007.
Visual acceptance required
Banner on /learn/ai-landscape-radar flips from 'configured — no run yet' to 'active' once founder runs first check.

Next action: Founder enables admin tools flag and runs first radar check from /admin/intelligence-ops.

Microsoft connector

Blocked by configuration · Owner: Founder
Target end-state
Buyer connects via Microsoft, AgentProof discovers environments + Copilot Studio agents, reads only metadata.
Operational proof required
Six MICROSOFT_* env vars + a real test tenant. Real OAuth round-trip.
Data / persistence requirement
microsoft_connections table with workspace scoping.
Visual acceptance required
Founder runs the Microsoft connect flow against a real tenant.

Next action: Founder configures the six MICROSOFT_* env vars or defers to post-pilot.
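
A "blocked by configuration" verdict for this connector could be derived from a simple presence check on the six variables. The variable names below are the ones listed in the v2 matrix on this page; the helper name missingMicrosoftEnv and the rule that a blank value counts as missing are assumptions.

```typescript
const MICROSOFT_ENV_VARS = [
  "MICROSOFT_TENANT_ID",
  "MICROSOFT_CLIENT_ID",
  "MICROSOFT_CLIENT_SECRET",
  "MICROSOFT_REDIRECT_URI",
  "MICROSOFT_POWER_PLATFORM_SCOPE",
  "MICROSOFT_SESSION_SECRET",
] as const;

type EnvMap = Readonly<Record<string, string | undefined>>;

// Returns the names that are unset or blank; an empty result means the connector is configured.
function missingMicrosoftEnv(env: EnvMap): string[] {
  return MICROSOFT_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// With nothing configured, all six names are reported.
const missingAll = missingMicrosoftEnv({});
```

Reporting names only, never values, keeps secrets such as MICROSOFT_CLIENT_SECRET out of any health panel that renders the result.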

Provider-agnostic connector model

Operational · Owner: Claude
Target end-state
Connector-agnostic architecture lets us add OpenAI / Anthropic / Google / AWS / Salesforce / ServiceNow without rewriting the assessment.
Operational proof required
lib/agentproof/provider/provider_agnostic_positioning_v1.ts declares SUPPORTED_PROVIDERS. Each provider has a registry entry.
Data / persistence requirement
None at the architecture level.
Visual acceptance required
Landing page chip says 'Microsoft is one connector path'.

Next action: None — shipped at S34AC-R1 Defect 5.

System health

Operational · Owner: Claude
Target end-state
Founder can open one page and see honestly what works, what is configured, what is missing.
Operational proof required
R12 active probes + R13 product-completion 12-item dashboard + R14 persistence proof panel with 14 founder-named checks and explicit 'Requires founder live login test' badges.
Data / persistence requirement
Read-only agentproof_healthcheck row (migration 0006).
Visual acceptance required
Founder opens /system-health → sees R12 health groups + R13 completion dashboard + R14 persistence proof panel.

Next action: None.

Report export / print

Partially operational · Owner: Founder
Target end-state
Buyer can print or export a report to a clean, branded PDF with cover, dividers, no browser chrome.
Operational proof required
S33L/S33M shipped the print CSS, cover page, dividers, editorial blocks. Founder still needs to confirm on live data.
Data / persistence requirement
Nothing extra — uses the live report.
Visual acceptance required
Founder prints a real report from /workspace/reports/[id] and confirms layout.

Next action: Founder confirms print preview once auth is on.

Commercial readiness

Partially operational · Owner: Founder
Target end-state
AgentProof is ready for a real private pilot with a real buyer. No payments, no public registration in this cycle.
Operational proof required
Auth + persistence + review preservation + manual agent persistence all proven on the live URL.
Data / persistence requirement
All migrations applied + auth on + first pilot data created.
Visual acceptance required
Founder walks the journey end-to-end as a buyer and signs off.

Next action: Work the punch list on /system-health top to bottom; then run a single full pilot walk-through.

Private pilot readiness matrix (R18)

Where the product stands, honestly

Fourteen areas the founder asked about. Each row carries one of six honest statuses, the proof surface, and the single named next action. Nothing is hidden behind "future work."

6 operational · 5 partial · 1 founder preview · 2 blocked by config · 0 planned · 0 deferred · 14 rows
  • Public site

    Operational

    What buyer sees: Home, Learn, Demo, AI Radar, Founder blueprint, Sign-in.

    Proof surface: / + /learn + /demo + /learn/ai-landscape-radar + /login

    Next action: None — shipped S34AA.

    Wired by S34AA

  • Navigation

    Operational

    What buyer sees: Sticky AppHeader + workspace shell. No duplicate chrome. No orphan pages.

    Proof surface: Header mounted at app/layout.tsx

    Next action: None — shipped S34AC-R6 / R11.

    Wired by S34AC-R6 + R11

  • Login (Supabase magic link)

    Blocked by configuration

    What buyer sees: Magic-link form sends a real Supabase auth email when enabled.

    Proof surface: /login + Supabase auth callback

    Next action: Founder sets AGENTPROOF_SUPABASE_AUTH_ENABLED=true on Railway.

    Wired by S34C + R8 + R9

  • Supabase persistence

    Partially operational

    What buyer sees: Manual agents, review answers, report history all read from Supabase first when signed in.

    Proof surface: /system-health → SourceOfTruthCompliancePanel + LiveSupabaseProbePanel

    Next action: Founder enables auth on Railway, then walks the 13-step journey.

    Wired by R14 + R15 + R16 + R17

  • Manual non-Microsoft agent

    Partially operational

    What buyer sees: Create agent → save → refresh → sign-out / sign-in → still there.

    Proof surface: /workspace/manual-agent/new

    Next action: Founder runs the live walk-through after sign-in.

    Wired by R7 + S34I + R17

  • Demo

    Operational

    What buyer sees: Sample agents + capability zone + sample assessment + sample report. Reachable from Home, Learn, Trial.

    Proof surface: /demo (default-on)

    Next action: If demo shows OFF state, remove AGENTPROOF_DEMO_MODE_ENABLED=false from Railway.

    Wired by R13 + R14 diagnostic + R18 sample flow

  • Trial (signed-in guided)

    Founder preview

    What buyer sees: Public preview clearly labelled; signed-in guided trial creates a sample workspace + agent + assessment + report.

    Proof surface: /trial + /trial/workspace + /trial/assessment + /trial/report

    Next action: Founder walks /trial → /trial/workspace → /trial/report; reports any path that breaks.

    Wired by S34O + S34P + R18

  • Review answer preservation

    Partially operational

    What buyer sees: Every captured answer is visible after sign-out / sign-in. Never blank.

    Proof surface: Review wizard + report → snapshot recovery + legacy fallback cascade

    Next action: Founder enables auth, signs in, opens a reviewed agent.

    Wired by R13 + R13-A + R13-LEGACY (S34AH) + R17

  • Reports (buyer-grade)

    Operational

    What buyer sees: Seven-section professional layout, methodology + version banner, honest disclaimer, print/export action.

    Proof surface: ReportProfessionalLayout mounted on trial report + workspace reports; ReportExportPrintAction visible.

    Next action: Founder reviews the new layout on /trial/report after R18 ships.

    Wired by S33L + S33M + R14 banner + R17-EXPANDED + R18

  • AI Radar MVP

    Partially operational

    What buyer sees: Controlled source watchlist with check engine, items with status, admin review queue. Public page reports honest operational state.

    Proof surface: /learn/ai-landscape-radar + /admin/intelligence-ops + radar API routes

    Next action: Founder enables admin tools flag and runs first radar check.

    Wired by R13-A + R18 system-health surfacing

  • Microsoft connector

    Blocked by configuration

    What buyer sees: Connect Microsoft, discover environments, discover Copilot Studio agents, read metadata only.

    Proof surface: /api/connectors/microsoft/* + /connect/microsoft (real tenant required)

    Next action: Founder configures the six MICROSOFT_* env vars or defers to post-pilot.

    Wired by 1G initial + S34B

  • Provider-agnostic connector model

    Operational

    What buyer sees: Architecture supports adding more connectors without rewriting the assessment.

    Proof surface: lib/agentproof/provider/provider_agnostic_positioning_v1.ts

    Next action: None — shipped S34AC-R1 Defect 5.

    Wired by S34AC-R1

  • System health (acceptance dashboard)

    Operational

    What buyer sees: Active probes + product completion + R14/R15/R16/R17 panels + R17-EXPANDED 13-step journey + R18 Live Acceptance Runner.

    Proof surface: /system-health

    Next action: None.

    Wired by R12 + R14 + R15 + R16 + R17 + R18

  • Commercial / private pilot readiness

    Partially operational

    What buyer sees: Buyer-grade product: real persistence, real reports, honest demo, real AI Radar foundations, founder-runnable acceptance.

    Proof surface: Founder walks the 13-step journey on the live URL; smoke evidence pack reports public-route green.

    Next action: Founder runs the live walk-through; uses the R18 evidence pack to capture proof.

    Wired by R12..R18 cascade
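
The summary strip at the top of this matrix can be derived from the row statuses, not maintained by hand. The sketch below tallies the fourteen rows exactly as listed above; the PilotStatus type, the tallyStatuses helper, and the snake_case status spellings are assumptions.

```typescript
type PilotStatus =
  | "operational"
  | "partial"
  | "founder_preview"
  | "blocked_by_config"
  | "planned"
  | "deferred";

type Row = readonly [string, PilotStatus];

// Row names and statuses copied from the R18 matrix on this page.
const r18Rows: ReadonlyArray<Row> = [
  ["Public site", "operational"],
  ["Navigation", "operational"],
  ["Login (Supabase magic link)", "blocked_by_config"],
  ["Supabase persistence", "partial"],
  ["Manual non-Microsoft agent", "partial"],
  ["Demo", "operational"],
  ["Trial (signed-in guided)", "founder_preview"],
  ["Review answer preservation", "partial"],
  ["Reports (buyer-grade)", "operational"],
  ["AI Radar MVP", "partial"],
  ["Microsoft connector", "blocked_by_config"],
  ["Provider-agnostic connector model", "operational"],
  ["System health (acceptance dashboard)", "operational"],
  ["Commercial / private pilot readiness", "partial"],
];

// Counts rows per status, starting every status at zero so empty ones still show.
function tallyStatuses(rows: ReadonlyArray<Row>): Record<PilotStatus, number> {
  const counts: Record<PilotStatus, number> = {
    operational: 0,
    partial: 0,
    founder_preview: 0,
    blocked_by_config: 0,
    planned: 0,
    deferred: 0,
  };
  for (const [, status] of rows) {
    counts[status] += 1;
  }
  return counts;
}

const r18Counts = tallyStatuses(r18Rows);
```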

Product completion matrix v2 (R20)

End-product framework — what each area looks like complete

Each area carries a target end-state, current honest status, operational proof surface, remaining gap, named next action, and owner.

7 operational · 5 partial · 1 founder preview · 3 blocked by config · 0 planned · 0 deferred · 16 rows
  • Public site

    Operational

    Target end-state: Home, Learn, Demo, AI Radar, Founder blueprint, Sign-in — all reachable from sticky AppHeader and four-column footer. No orphan pages.

    Operational proof: Smoke script asserts every public route renders 2xx + contains expected substrings (no dead-page text, no localhost, no SERVICE_ROLE).

    Remaining gap: None.

    Next action: None — shipped S34AA + S34AC-R6.

    Owner: claude · Wired by S34AA + S34AC-R6 + R11

  • Navigation

    Operational

    Target end-state: AppHeader route-aware shell. PublicSiteHeader for public pages, WorkspaceSiteHeader for workspace, AdminSiteHeader for admin. No duplicate chrome.

    Operational proof: S34AC-R6 unified shell tests pass; R11 removed duplicate ProductShell chrome.

    Remaining gap: None.

    Next action: None.

    Owner: claude · Wired by S34AC-R6 + R11

  • Login (Supabase magic link)

    Blocked by configuration

    Target end-state: Buyer enters email, opens magic link, lands signed-in on /workspace. R8 cookie-options patch ensures the session cookie is set under the production domain.

    Operational proof: Magic-link form renders. SSR Supabase client wired. R8 cookie patch applied.

    Remaining gap: AGENTPROOF_SUPABASE_AUTH_ENABLED=true must be set on Railway. Until then login is preview-only.

    Next action: Founder sets AGENTPROOF_SUPABASE_AUTH_ENABLED=true on Railway.

    Owner: founder · Wired by S34C + R8 + R9

  • Supabase persistence

    Partially operational

    Target end-state: Every workspace write (agents, reviews, reports, improvement, radar) lands in Supabase EU and survives a sign-out / sign-in.

    Operational proof: R14 declarative probes + R15 live anon SELECT probe per table + R16 per-surface compliance registry (0 signed-in localStorage-first violations) + R17 Supabase-first hydration into the 4 R16 defect surfaces. R20 adds explicit write/read proof manifest on /system-health.

    Remaining gap: End-to-end persistence proof on the live URL still requires the founder signed-in walk-through to flip the operational verdict from 'configured — untested' to 'operational'.

    Next action: Founder enables auth flag, signs in, walks the R18 Live Acceptance Runner on /system-health.

    Owner: founder · Wired by R14 + R15 + R16 + R17 + R20

  • Manual non-Microsoft agent

    Partially operational

    Target end-state: Buyer adds a manual agent, signs out, signs in, sees it. Persisted in Supabase, scoped to workspace.

    Operational proof: R7 wired form → server action → repository. R17 added Supabase-first hydration probe.

    Remaining gap: Live signed-in walk-through still needed.

    Next action: Founder runs the walk-through once auth is on; Claude captures the screenshot proof.

    Owner: founder · Wired by R7 + S34I + R17

  • Demo

    Operational

    Target end-state: Buyer runs the sample agent flow end-to-end with no login. Demo data clearly isolated. Reachable from Home, nav, footer, Learn, Trial.

    Operational proof: Demo default-on (R13). R14 added the explicit OFF-state diagnostic with the named env action. /system-health reports demo status.

    Remaining gap: If the founder sees the OFF state, the diagnostic explains the exact Railway action.

    Next action: If /demo shows the OFF state, remove AGENTPROOF_DEMO_MODE_ENABLED=false from Railway Variables.

    Owner: founder · Wired by R13 + R14 + R20 nav sweep

  • Trial (signed-in guided)

    Founder preview

    Target end-state: Buyer walks /trial → /trial/workspace → /trial/agent/new → /trial/assessment → /trial/improvement → /trial/report. Signed-in trial persists a sample workspace + agent.

    Operational proof: 9 trial routes exist with services. Sample data path renders. R18 added /trial/report/professional preview with the buyer-grade ReportProfessionalLayout.

    Remaining gap: Trial persistence is still sample-only — signed-in trials do not yet persist a per-user trial workspace.

    Next action: Founder reviews the trial path on the live URL; Claude promotes sample → persisted in a follow-on release.

    Owner: both · Wired by S34O + S34P + R18 + R20

  • Review answer preservation

    Partially operational

    Target end-state: Every captured answer visible after sign-out / sign-in. Never blank.

    Operational proof: R13 snapshot recovery + R13-A snapshot v1.1.0 + R13-LEGACY fallback cascade. Diagnostic panel shows 0 unrecoverable when running on the live URL.

    Remaining gap: Cross-device sync is a separate pending defect (currently localStorage-only when offline).

    Next action: Founder enables auth, signs in, opens a reviewed agent — Claude captures the screenshot proof.

    Owner: founder · Wired by R13 + R13-A + R13-LEGACY (S34AH) + R17

  • Reports (buyer-grade)

    Operational

    Target end-state: Seven-section professional layout, methodology + version banner, honest disclaimer, print/export action.

    Operational proof: ReportProfessionalLayout component live + ReportExportPrintAction visible + methodology/version block. Mounted on /trial/report and /trial/report/professional. Print CSS shipped S33L/S33M.

    Remaining gap: Workspace reports still use existing widget set with unified score ring (consistent visual). The full ReportProfessionalLayout wrapper is mounted on the sample trial surface — workspace surfaces can be migrated after first live walk-through.

    Next action: Founder reviews /trial/report/professional layout; approves migration of workspace surfaces in a follow-on.

    Owner: founder · Wired by S33L + S33M + R14 banner + R17-EXPANDED + R18

  • Improvement cycle

    Operational

    Target end-state: Per-finding improvement guidance card with why-it-matters, what-good-looks-like, evidence-to-collect, reassess-after, related-Learn.

    Operational proof: ImprovementGuidancePanel renders sample improvement cards. /trial/improvement route exists. Improvement service wired.

    Remaining gap: Persisted improvement-cycle state for real workspace agents needs the live walk-through.

    Next action: Same as the persistence walk-through above.

    Owner: founder · Wired by S34F + S34O + R18

  • AI Radar

    Partially operational

    Target end-state: Controlled source-watchlist monitoring with change detection, human review, versioned packs. No fake live claim.

    Operational proof: R13-A shipped engine + registry + state machine + admin review queue + operational status panel. R18 added /system-health surfacing.

    Remaining gap: First real check run still pending.

    Next action: Founder enables AGENTPROOF_ADMIN_TOOLS_ENABLED=true and runs first radar check from /admin/intelligence-ops.

    Owner: founder · Wired by R13-A + R18

  • Microsoft connector

    Blocked by configuration

    Target end-state: Buyer connects via Microsoft, AgentProof discovers environments + Copilot Studio agents, reads only metadata.

    Operational proof: Six MICROSOFT_* env vars + OAuth callback + Power Platform / Dataverse client wired.

    Remaining gap: Founder must configure the six env vars OR defer to post-pilot.

    Next action: Founder configures MICROSOFT_TENANT_ID, MICROSOFT_CLIENT_ID, MICROSOFT_CLIENT_SECRET, MICROSOFT_REDIRECT_URI, MICROSOFT_POWER_PLATFORM_SCOPE, MICROSOFT_SESSION_SECRET — or defers to post-pilot.

    Owner: founder · Wired by 1G initial + S34B

  • Provider-agnostic connector model

    Operational

    Target end-state: Architecture supports adding OpenAI / Anthropic / Google / AWS / Salesforce / ServiceNow connectors without rewriting the assessment.

    Operational proof: lib/agentproof/provider/provider_agnostic_positioning_v1.ts declares SUPPORTED_PROVIDERS. Landing-page chip says 'Microsoft is one connector path'.

    Remaining gap: None at the architecture level.

    Next action: None — shipped S34AC-R1 Defect 5.

    Owner: claude · Wired by S34AC-R1

  • System health (acceptance dashboard)

    Operational

    Target end-state: Founder opens one page and sees the live operational state of every product area.

    Operational proof: R12 active probes + R13 completion + R14 persistence proof + R15 live probe + R16 compliance + R17-EXPANDED journey + R18 live runner + R18 private-pilot matrix + R19 Basic Auth status + R20 Supabase persistence operational proof + R20 core-journey panel.

    Remaining gap: None.

    Next action: None.

    Owner: claude · Wired by R12 + R14..R20

  • Admin / founder operations

    Blocked by configuration

    Target end-state: Founder can open the intelligence-ops queue, run the radar check, manage invites, and read the live admin readiness without exposing buyer surfaces.

    Operational proof: /admin/intelligence-ops + admin pages exist. AdminSiteHeader gates the chrome. R19 Basic Auth optional on /admin/* when explicitly enabled.

    Remaining gap: AGENTPROOF_ADMIN_TOOLS_ENABLED must be set on Railway for the founder to access the admin tools.

    Next action: Founder enables AGENTPROOF_ADMIN_TOOLS_ENABLED=true on Railway.

    Owner: founder · Wired by S34G + R13-A + R19

  • Commercial / private pilot readiness

    Partially operational

    Target end-state: AgentProof is ready for a controlled private pilot with a real buyer — real persistence, real reports, honest demo, real AI Radar foundations, founder-runnable acceptance, no payments / public registration in this cycle.

    Operational proof: R18 Private Pilot Readiness Matrix + R20 Pilot Readiness Pack at /founder/pilot-readiness with go / stop criteria + recommended pilot script.

    Remaining gap: Auth flag + live walk-through + first AI Radar check + report acceptance run.

    Next action: Work the punch list on /founder/pilot-readiness top to bottom; then run a single full pilot walk-through.

    Owner: founder · Wired by R12..R20
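
The smoke assertion cited as operational proof for the Public site row (every public route renders 2xx, contains expected substrings, and leaks neither localhost nor SERVICE_ROLE) can be expressed as a pure check over a fetched response. This is a sketch, not the actual smoke script: the RouteResponse shape, the checkRoute helper, and the sample body are hypothetical, and the unspecified "dead-page text" check is left out.

```typescript
interface RouteResponse {
  route: string;
  status: number;
  body: string;
}

// Strings the page says must never appear in a public response.
const FORBIDDEN_SUBSTRINGS = ["localhost", "SERVICE_ROLE"];

// Returns a list of human-readable failures; an empty list means the route passed.
function checkRoute(res: RouteResponse, mustContain: ReadonlyArray<string>): string[] {
  const failures: string[] = [];
  if (res.status < 200 || res.status > 299) {
    failures.push(`${res.route}: expected 2xx, got ${res.status}`);
  }
  for (const expected of mustContain) {
    if (res.body.indexOf(expected) === -1) {
      failures.push(`${res.route}: missing expected text "${expected}"`);
    }
  }
  for (const forbidden of FORBIDDEN_SUBSTRINGS) {
    if (res.body.indexOf(forbidden) !== -1) {
      failures.push(`${res.route}: contains forbidden text "${forbidden}"`);
    }
  }
  return failures;
}

// Illustrative response only.
const homeFailures = checkRoute(
  { route: "/", status: 200, body: "<h1>AgentProof</h1>" },
  ["AgentProof"],
);
```

Separating the check from the HTTP fetch means the same function can run against recorded responses in unit tests and against the live URL in the smoke run.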

Private pilot readiness pack

The full readiness scorecard, security/trust posture, and the recommended pilot script live at /admin/pilot-readiness.

Open /admin/pilot-readiness →

Build policy

  1. Choose one journey or module from the matrix above.
  2. Build it until operational.
  3. Prove it with an active probe in /system-health.
  4. Prove it with a live walk-through on the live URL.
  5. Only then mark it complete in this matrix.
  6. Do not abandon half-built modules unless the admin explicitly defers them.
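
The build policy above amounts to a strict ordering of gates before a row may be marked complete. A minimal sketch of that ordering, assuming a hypothetical ModuleProof record and nextPolicyStep helper:

```typescript
interface ModuleProof {
  built: boolean;                 // step 2: built until operational
  probeGreen: boolean;            // step 3: active probe in /system-health
  liveWalkthroughProven: boolean; // step 4: live walk-through on the live URL
}

// Returns the earliest unmet gate, so a module cannot skip ahead to "complete".
function nextPolicyStep(proof: ModuleProof): string {
  if (!proof.built) {
    return "build until operational";
  }
  if (!proof.probeGreen) {
    return "prove with an active probe in /system-health";
  }
  if (!proof.liveWalkthroughProven) {
    return "prove with a live walk-through on the live URL";
  }
  return "mark complete in the matrix";
}

// A module with a green probe but no live proof stays short of complete.
const step = nextPolicyStep({ built: true, probeGreen: true, liveWalkthroughProven: false });
```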