<!-- Pricing copy mirrored from apps/web/app/pricing/page.tsx (PAYG_PRICES_CENTS + PLANS in packages/shared/src/pricing.ts). Update there first; this doc is a mirror. -->
# Testorax

> Autonomous AI browser testing: give it a URL, get back a bug report with screenshots — no test code to write.

## One-flow audit (the simplest agent path)

Agents who want one entry point for "audit this app":

```
testorax capabilities --json                                            # what can/can't Testorax do
testorax audit preview https://your-app.example.com --json > preview.json   # one-flow plan + signed run token
testorax audit run --from-preview preview.json --json --no-open         # orchestrated run from the signed preview token
testorax proof <runId> --json                                           # fetch proof once the run completes
```

`audit run` accepts the HMAC-signed `previewRunToken` from `audit preview`. At run time it re-validates URL safety, auth profile ownership, and origin, then starts `fast_bug_scan` or `authenticated_smoke` through the existing run-start path, so the underlying run-credit billing is unchanged. It returns `status: started | blocked | expired | unsupported | tampered`. The token expires after 30 minutes.
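
A minimal sketch of client-side handling for the `audit run` statuses, assuming the agent records the time it ran `audit preview` (whether `preview.json` itself carries an issue or expiry timestamp is not documented here). `isPreviewFresh`, `nextStep`, and the action strings are hypothetical; only the five status values and the 30-minute TTL come from the text above.

```typescript
// Status values returned by `audit run` (from the docs above).
type AuditRunStatus = "started" | "blocked" | "expired" | "unsupported" | "tampered";

// Documented token lifetime: 30 minutes.
const PREVIEW_TOKEN_TTL_MS = 30 * 60 * 1000;

// Hypothetical helper: is a preview issued at `issuedAtMs` still usable at `nowMs`?
// Assumption: the boundary is exclusive (exactly 30 minutes counts as expired).
function isPreviewFresh(issuedAtMs: number, nowMs: number): boolean {
  return nowMs - issuedAtMs < PREVIEW_TOKEN_TTL_MS;
}

// Hypothetical mapping from each status to the agent's next step.
function nextStep(status: AuditRunStatus): string {
  switch (status) {
    case "started":
      return "poll the run, then fetch proof";
    case "expired":
    case "tampered":
      return "re-run `audit preview` to get a fresh signed token";
    case "blocked":
      return "report the blocking reason to the user";
    case "unsupported":
      return "follow the next command carried in the envelope";
  }
}
```

Checking freshness locally only saves a round trip; the server still re-validates the token and answers `expired` itself.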

Deep audit (Batch 3): pass `--workflow-config <id>`, `--crud-config <path|json>`, or `--campaign-config prv_<22>` + `--qa-prefix qa_ --allow-create --cleanup` to add a `deepAudit` block to the preview describing which deep modes can run. Dispatching `full_crud_e2e` requires both `safeMutationPolicy` AND `cleanupPolicy` at preview time, AND the inline `crudConfig` must be re-supplied on `audit run` (it is never stored in the token). `hard_delete` requires `destructiveAllowed=true`. `campaign_execution` and `workflow_test` (which needs caller-supplied template inputs) still refuse with structured `unsupported` envelopes that carry the correct next command: Testorax never auto-confirms a campaign and never fabricates workflow inputs.
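
The deep-audit gating rules above can be sketched as a client-side pre-check. `DeepAuditInput`, `gateDeepAudit`, and every reason string except `unsupported` are hypothetical; the policy names mirror the text, and the server's own checks remain authoritative.

```typescript
// Deep modes named in the docs above.
type DeepMode = "full_crud_e2e" | "campaign_execution" | "workflow_test";

// Hypothetical input shape for illustration only.
interface DeepAuditInput {
  mode: DeepMode;
  safeMutationPolicy?: object; // required at preview time for full_crud_e2e
  cleanupPolicy?: object;      // required at preview time for full_crud_e2e
  inlineCrudConfig?: object;   // must be re-supplied on `audit run`; never in the token
  hardDelete?: boolean;        // requesting the hard_delete cleanup strategy
  destructiveAllowed?: boolean;
}

type Gate = { ok: true } | { ok: false; reason: string };

function gateDeepAudit(input: DeepAuditInput): Gate {
  // Documented: these modes still refuse with a structured `unsupported` envelope.
  if (input.mode === "campaign_execution" || input.mode === "workflow_test") {
    return { ok: false, reason: "unsupported" };
  }
  if (!input.safeMutationPolicy || !input.cleanupPolicy) {
    return { ok: false, reason: "missing_safe_mutation_or_cleanup_policy" };
  }
  if (!input.inlineCrudConfig) {
    return { ok: false, reason: "crud_config_not_resupplied" };
  }
  if (input.hardDelete && input.destructiveAllowed !== true) {
    return { ok: false, reason: "hard_delete_requires_destructiveAllowed" };
  }
  return { ok: true };
}
```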

Launch Readiness Batch 3: proof-pack and report fetch sources. `POST /api/launch/readiness`, CLI `testorax launch readiness --fetch-proof-details`, and the MCP `launch_readiness` schema gain `fetchProofDetails`, `proofSources`, and `maxSourceBytes`.

- When agents pass `runIds`, Testorax fetches `/compact-proof` by default; with `fetchProofDetails=true` it also fetches `/proof-pack.json` and `/report.json` summaries.
- Raw proof bodies are NEVER embedded, only summary fields (contractVersion, runStatus, proofStrength, trustScore, cleanupTelemetryStatus, falsePositiveClassification, proofScopes[]).
- Per-source statuses are extended with `too_large` and `skipped`. The per-source size cap defaults to 32 KB, with a 64 KB ceiling.
- Merge logic: proof_packet `cleanupTelemetry.status` of `failed`, `blocked`, or `partial` → NO_GO; `unverified` → INSUFFICIENT_PROOF; `completed` + `verified` satisfies cleanup proof.
- A target mismatch from any source surfaces in `deploymentAwareness.warnings`.
- Malformed, missing, or oversized sources never crash readiness.
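
A sketch of the per-source size rule, assuming KB means 1024 bytes and that an out-of-range `maxSourceBytes` is clamped rather than rejected (the docs state only the 32 KB default and the 64 KB ceiling). `clampMaxSourceBytes` and `sourceStatusForSize` are hypothetical; `"ok"` is a placeholder for "within the cap", not a documented status.

```typescript
const DEFAULT_MAX_SOURCE_BYTES = 32 * 1024; // documented default: 32 KB
const CEILING_MAX_SOURCE_BYTES = 64 * 1024; // documented ceiling: 64 KB

// Assumption: missing or invalid values fall back to the default; larger values clamp to the ceiling.
function clampMaxSourceBytes(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested) || requested <= 0) {
    return DEFAULT_MAX_SOURCE_BYTES;
  }
  return Math.min(Math.floor(requested), CEILING_MAX_SOURCE_BYTES);
}

// An oversized body is reported as `too_large` instead of being parsed or embedded.
function sourceStatusForSize(bodyBytes: number, maxSourceBytes?: number): "ok" | "too_large" {
  return bodyBytes > clampMaxSourceBytes(maxSourceBytes) ? "too_large" : "ok";
}
```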

Proof Packet Cleanup Telemetry Batch 1: every proof packet (`GET /api/runs/:id/proof-pack.json`, `PROOF_PACKET_CONTRACT_VERSION=1.10.0`) carries a stable top-level `cleanupTelemetry` field.

- Closed-set `status` (`not_required | not_started | completed | partial | failed | unverified | blocked | unknown`), `verificationStatus` (`verified | unverified | not_required | unknown`), and `cleanupVerified` (`yes | no | unknown`).
- It is a pure projection from operational_e2e cleanupVerified + audit_run cleanupResult + audit_preview cleanupPlan. It never invents proof and never carries entity bodies. The cleanup plan is intent; cleanup telemetry is proof.
- Launch readiness consumes the telemetry: `completed + verified` satisfies cleanup proof; `failed`/`blocked`/`partial` are launch blockers; `unverified`/`not_started` are INSUFFICIENT_PROOF.
- The helper is exported as `@testorax/shared/cleanupTelemetry::buildCleanupTelemetry`.
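
How launch readiness reads `cleanupTelemetry` can be sketched as a single mapping. The documented cases are marked in comments; the treatment of `not_required`, `unknown`, and "completed but not verified" is an assumption, and `cleanupReadiness` is a hypothetical name, not the exported `buildCleanupTelemetry` helper.

```typescript
type CleanupStatus =
  | "not_required" | "not_started" | "completed" | "partial"
  | "failed" | "unverified" | "blocked" | "unknown";
type VerificationStatus = "verified" | "unverified" | "not_required" | "unknown";

interface CleanupTelemetry {
  status: CleanupStatus;
  verificationStatus: VerificationStatus;
}

type CleanupReadiness = "satisfied" | "NO_GO" | "INSUFFICIENT_PROOF" | "not_applicable";

function cleanupReadiness(t: CleanupTelemetry): CleanupReadiness {
  switch (t.status) {
    case "failed":
    case "blocked":
    case "partial":
      return "NO_GO"; // documented launch blockers
    case "completed":
      // Documented: completed + verified satisfies cleanup proof.
      // Assumption: completed without verification is not yet proof.
      return t.verificationStatus === "verified" ? "satisfied" : "INSUFFICIENT_PROOF";
    case "unverified":
    case "not_started":
      return "INSUFFICIENT_PROOF"; // documented
    case "not_required":
      return "not_applicable"; // assumption: nothing to clean up
    case "unknown":
      return "INSUFFICIENT_PROOF"; // assumption: conservative default
  }
}
```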

False-Positive Classifier (Hardening Batch 1): `POST /api/intelligence/classify`, CLI `testorax classify`, and MCP `classify_evidence`. Pure decision logic over normalized evidence, returning one of a closed set of 15 classifications: real_bug | likely_false_positive | inconclusive | expected_behavior | test_design_issue | testorax_capability_gap | needs_chrome_confirmation | needs_auth_profile | stale_deploy_suspected | external_navigation_skipped | protocol_handler_skipped | auth_gate_expected | selector_drift_suspected | runner_environment_issue | weak_evidence_only. It NEVER reads cookies, passwords, headers, or storageState (requests carrying those fields return 400). It is conservative and returns `inconclusive` when evidence is thin:

- Same-origin 5xx with an assertion → `real_bug`.
- Selector missing in the runner but present in Chrome → `selector_drift_suspected`.
- Login wall without an auth profile → `needs_auth_profile` (not a bug).
- Empty console errors stay `inconclusive`.
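
A reduced sketch of the documented rules, useful for reasoning about what the classifier will say before calling it. The `Evidence` field names, the forbidden-field keys, and the rule order are assumptions; the real request schema and precedence may differ, and only four of the 15 classifications are modeled.

```typescript
// Hypothetical normalized-evidence shape.
interface Evidence {
  sameOrigin5xx?: boolean;
  hasAssertion?: boolean;
  selectorMissingInRunner?: boolean;
  selectorPresentInChrome?: boolean;
  loginWallSeen?: boolean;
  authProfileUsed?: boolean;
  consoleErrors?: string[];
  [key: string]: unknown;
}

// Assumed key names for the documented forbidden inputs.
const FORBIDDEN_FIELDS = ["cookies", "passwords", "headers", "storageState"];

type Classification = "real_bug" | "selector_drift_suspected" | "needs_auth_profile" | "inconclusive";

function classify(e: Evidence): Classification | { error: 400; field: string } {
  // Forbidden fields are rejected outright (the API answers 400).
  for (const field of FORBIDDEN_FIELDS) {
    if (field in e) return { error: 400, field };
  }
  if (e.sameOrigin5xx && e.hasAssertion) return "real_bug";
  if (e.selectorMissingInRunner && e.selectorPresentInChrome) return "selector_drift_suspected";
  if (e.loginWallSeen && !e.authProfileUsed) return "needs_auth_profile";
  // Thin evidence, including empty console errors, stays inconclusive.
  return "inconclusive";
}
```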

Launch-Readiness Verdict Engine (Batch 1 + Batch 2): `POST /api/launch/readiness`, CLI `testorax launch readiness`, and MCP `launch_readiness`. Pure decision logic over caller-supplied evidence. Closed-set verdict: GO | NO_GO | GO_WITH_ACCEPTED_RISKS | GO_AFTER_DEPLOY | GO_AFTER_PRODUCT_DECISION | GO_AFTER_EXTERNAL_INTEGRATIONS | INSUFFICIENT_PROOF.

- NEVER claims GO without proof. NEVER claims authenticated coverage from public-only evidence. INSUFFICIENT_PROOF is a real blocker, not a pass.
- Batch 2 adds fetch-by-runId (cap 10): pass `runIds` + `fetchProof=true` and Testorax fetches compact proof in-process.
- Batch 2 also adds a `fetchedEvidence` block (`proofSources`, `fetchedRunIds`, `failedRunIds`, stale detection via `maxEvidenceAgeHours`) and a `deploymentAwareness` block (`status`: `unknown | same_target | target_mismatch | stale_proof | production_unconfirmed | caller_declared`).
- Target-mismatch evidence forces INSUFFICIENT_PROOF on every scope. Sandbox-only proof against a production target forces GO_AFTER_DEPLOY.
- NEVER infers Git branch state; the caller declares deployment blockers via `deploymentBlockers[]`.
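
The two documented deployment overrides and the `runIds` cap can be sketched as a post-processing step on a per-scope verdict. `runIdsError` and `applyDeploymentOverrides` are hypothetical names; the assumption that NO_GO and INSUFFICIENT_PROOF are never relaxed into GO_AFTER_DEPLOY is ours, not stated in the text.

```typescript
type Verdict =
  | "GO" | "NO_GO" | "GO_WITH_ACCEPTED_RISKS" | "GO_AFTER_DEPLOY"
  | "GO_AFTER_PRODUCT_DECISION" | "GO_AFTER_EXTERNAL_INTEGRATIONS" | "INSUFFICIENT_PROOF";

type DeploymentStatus =
  | "unknown" | "same_target" | "target_mismatch" | "stale_proof"
  | "production_unconfirmed" | "caller_declared";

const MAX_FETCH_RUN_IDS = 10; // documented cap for fetch-by-runId

function runIdsError(runIds: string[]): string | null {
  return runIds.length > MAX_FETCH_RUN_IDS ? `at most ${MAX_FETCH_RUN_IDS} runIds per request` : null;
}

function applyDeploymentOverrides(
  scopeVerdict: Verdict,
  deployment: DeploymentStatus,
  sandboxOnlyProofForProductionTarget: boolean,
): Verdict {
  // Documented: target-mismatch evidence forces INSUFFICIENT_PROOF on every scope.
  if (deployment === "target_mismatch") return "INSUFFICIENT_PROOF";
  // Assumption: hard blockers are never relaxed by the deploy override.
  if (scopeVerdict === "NO_GO" || scopeVerdict === "INSUFFICIENT_PROOF") return scopeVerdict;
  // Documented: sandbox-only proof against a production target forces GO_AFTER_DEPLOY.
  if (sandboxOnlyProofForProductionTarget) return "GO_AFTER_DEPLOY";
  return scopeVerdict;
}
```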

Safe Mutation Cleanup Automation Batch 1 — every deepAudit block now carries a `cleanupPlan` and every audit-run response carries a `cleanupResult`. Closed-set strategies: `none|manual|auto|qa_prefix|config_revert|archive|soft_delete|hard_delete`. Closed-set blocked reasons: `cleanup_policy_missing`, `qa_prefix_missing`, `hard_delete_not_allowed`, `manual_cleanup_required`, `production_mutation_blocked`, etc. `cleanupResult.status` is `not_started` at dispatch and only becomes `completed` after the proof packet shows cleanupVerified=yes — Testorax NEVER claims completed without proof. New CLI flags: `--cleanup-strategy`, `--delete-created`, `--revert-config`, `--archive-instead-of-delete`, `--hard-delete`. Manual cleanup → `unverified`, never `completed`.
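
The `cleanupResult.status` rule (never `completed` without proof, manual cleanup never `completed`) can be sketched as a small transition function. Only the three statuses this rule needs are modeled, `nextCleanupStatus` is hypothetical, and the handling of a proof packet that exists but does not confirm cleanup is an assumption.

```typescript
type CleanupVerified = "yes" | "no" | "unknown";
// Only the statuses this rule needs; the real closed set may be larger.
type CleanupResultStatus = "not_started" | "completed" | "unverified";

// `proof` is the proof packet's cleanupVerified value, or null before any proof exists.
function nextCleanupStatus(isManual: boolean, proof: CleanupVerified | null): CleanupResultStatus {
  if (isManual) return "unverified";         // documented: manual cleanup is never `completed`
  if (proof === null) return "not_started";  // documented: status at dispatch
  if (proof === "yes") return "completed";   // documented: only cleanupVerified=yes promotes to completed
  return "unverified";                       // assumption: proof exists but does not confirm cleanup
}
```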

`audit preview` is stateless and read-only. It returns a deterministic plan: `recommendedMode` (public_fast_scan / authenticated_smoke / full_crud_e2e / campaign_execution / blocked), `whatWillBeTested[]`, `whatWillNotBeTested[]`, `blockedReasons[]`, `recommendedCommand`, `chromeParityRecommendation`, `nextSafeActions[]`, and `doNotClaim[]`. Honest: preview never runs a test, never charges credits, and reports `runEndpointStatus='planned'` (orchestrated execution lands in Batch 2). Auth-bound: pass `--auth-session login_<22>` for authenticated or `all` scope; the saved login must be owned by the calling account. The envelope contains no secrets.
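
A sketch of how an agent might decide whether to proceed from a preview plan to `audit run`. The field names come from the list above; treating every list as `string[]` is an assumption, `chromeParityRecommendation` is omitted because its shape isn't documented here, and `shouldRun` is hypothetical.

```typescript
// Hypothetical TypeScript view of the preview plan.
interface AuditPreviewPlan {
  recommendedMode:
    | "public_fast_scan" | "authenticated_smoke" | "full_crud_e2e"
    | "campaign_execution" | "blocked";
  whatWillBeTested: string[];
  whatWillNotBeTested: string[];
  blockedReasons: string[];
  recommendedCommand: string;
  nextSafeActions: string[];
  doNotClaim: string[];
}

function shouldRun(plan: AuditPreviewPlan): { run: boolean; caveats: string[] } {
  const refused =
    plan.recommendedMode === "blocked" ||
    plan.recommendedMode === "campaign_execution" || // audit run refuses this with `unsupported`
    plan.blockedReasons.length > 0;
  if (refused) {
    return { run: false, caveats: [...plan.blockedReasons, ...plan.nextSafeActions] };
  }
  // Carry the plan's limits forward so the final report does not overclaim.
  return { run: true, caveats: [...plan.whatWillNotBeTested, ...plan.doNotClaim] };
}
```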

REST: `POST /api/audit/preview`. MCP: `audit_preview` (hosted + stdio).

## Discover capabilities first

Before claiming any specific Testorax behavior, fetch the capability map:

- CLI: `testorax capabilities --json`
- REST: `GET https://testorax.com/api/capabilities` (public, no auth)
- MCP: `get_capabilities`

The map returns closed-set statuses per feature (`available` / `partial` / `planned` / `blocked` / `operator_required` / `not_supported`) plus two 18-value vocabularies — `proofScopes` (`route_render_only`, `visible_control_found`, `click_succeeded`, `save_persisted`, `backend_readback_verified`, `cross_module_reflection_verified`, `refresh_persistence_verified`, `cleanup_verified`, `partial_only`, `runner_only_evidence`, `chrome_confirmed`, `chrome_rejected_false_positive`, `auth_bound_not_reached`, `unsafe_skipped`, `not_reached`, etc.) and `blockedReasons` (`auth_profile_missing`, `chrome_backend_unavailable`, `cdp_backend_unavailable`, `unsafe_production_mutation`, `selector_missing`, `test_design_issue`, `testorax_capability_gap`, etc.).

Honesty: `chromeParity` is `partial` (backend cascade ships: login_memory → cdp_attach → hosted_chrome → chrome_extension → claude_chrome_mcp). `cdpChromeFallback` is `partial`: CDP-attach to your already-running Chrome via `testorax chrome-parity check <url> --cdp http://127.0.0.1:9222` (CLI + stdio MCP, LOCAL-ONLY). `hostedChromeFallback` is `partial`. `fullCrudCampaign` is `partial` (the caller supplies the CrudConfig). `launchReadinessVerdict` is `partial`. `iosSafari` / `videoReplay` / `cloudBrowserProviderActivation` are `not_supported`. Do not claim any capability the map does not report.
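
One way to turn the capability map into claim discipline, as a sketch: it assumes the map can be read as feature name → status string, and `claimLevel` with its three outcomes is a hypothetical policy, not part of the API.

```typescript
// Closed-set statuses from the capability map.
type FeatureStatus =
  | "available" | "partial" | "planned" | "blocked" | "operator_required" | "not_supported";

// Hypothetical policy: claim only `available`, hedge `partial`, never claim anything else.
function claimLevel(status: FeatureStatus | undefined): "claim" | "hedge" | "do_not_claim" {
  if (status === "available") return "claim";
  if (status === "partial") return "hedge";
  return "do_not_claim"; // planned, blocked, operator_required, not_supported, or absent from the map
}

// Example values taken from the statuses listed above.
const capabilityMap: Record<string, FeatureStatus> = {
  chromeParity: "partial",
  iosSafari: "not_supported",
};
```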

## TL;DR for AI coding agents

Outside-agent first scan (always include `--email`):

macOS / Linux / WSL:
```
npx testorax@latest run https://your-app.example.com --email you@example.com --json --no-open
```

Windows PowerShell:
```
$env:TESTORAX_NO_CONFIG="1"
npx testorax@latest run "https://your-app.example.com" --email "you@example.com" --json --no-open
```

Windows CMD:
```
set TESTORAX_NO_CONFIG=1
npx testorax@latest run https://your-app.example.com --email you@example.com --json --no-open
```

If the email is eligible (one free first Fast Bug Scan per email/domain), the run starts without an API key. Otherwise the CLI returns a structured 402 with a Dodo checkout URL; the URL is always populated, never an `undefined` placeholder. PayPal is not used.

Logged-in dashboard / admin (Authenticated Smoke, 1 run credit):

```
npx testorax@latest auth-smoke https://dashboard.example.com \
  --session login_<22> --route /admin --route /admin/users --json --no-open
```

The `login_<22>` id comes from a Saved Login the customer creates at https://testorax.com/account/login-memory — the agent never asks for or prints passwords, cookies, or session tokens.

**Auth model:** CLI `--email` first scan needs no API key (if eligible). REST `/api/runs/start` needs `X-Api-Key`. MCP needs `TESTORAX_EMAIL` + `TESTORAX_API_KEY`. Authenticated Smoke needs Login Memory `login_<22>`. Admin bypass needs admin-tier API key. **Chrome extension is optional** and never required.

**Use `@latest`** so you do not pin to a stale globally installed version. A previously installed global `testorax` shadows `npx testorax` on PATH, whereas `npx testorax@latest` always fetches the published version.
**Stale global? Doctor catches it.** `npx testorax@latest doctor --json` emits a `cli_shadow` check, a `cli_version` detail line, and structured fields (`globalInstallDetected`, `globalVersion`, `globalPath`, `staleGlobalWarning`, `recommendedFix`). If `staleGlobalWarning:true`, run `npm uninstall -g testorax`, then `npx testorax@latest --version` to verify clean resolution.
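
A sketch of acting on the doctor output, assuming the structured fields appear at the top level of the `doctor --json` object (their exact nesting isn't documented here). `staleGlobalFix` is hypothetical; the two commands it returns are the ones documented above.

```typescript
// Assumption: these fields sit at the top level of `doctor --json` output.
interface DoctorReport {
  globalInstallDetected?: boolean;
  globalVersion?: string;
  globalPath?: string;
  staleGlobalWarning?: boolean;
  recommendedFix?: string;
}

// Returns the documented fix commands when a stale global shadows npx, else an empty list.
function staleGlobalFix(rawDoctorJson: string): string[] {
  const report = JSON.parse(rawDoctorJson) as DoctorReport;
  if (report.staleGlobalWarning !== true) return [];
  return ["npm uninstall -g testorax", "npx testorax@latest --version"];
}
```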

`npx` downloads the CLI on demand. No `npm install`. Works in any sandboxed
shell with network access.

**No terminal?** Read these public-by-runId surfaces:
```
https://testorax.com/sample-report
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/report.json
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/proof-pack.json
https://testorax.com/api/runs/wUi-37fuagm4KcrkrU3h3/compact-proof
```
report.json / proof-pack.json / compact-proof all share the public-by-runId model. The runId IS the secret. No API key needed.
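
The three public-by-runId URLs follow one pattern, sketched below with a hypothetical `publicProofUrls` helper. Because the runId is the secret, anything built this way should be handled like a credential.

```typescript
const BASE_URL = "https://testorax.com";

// Builds the three public-by-runId proof URLs listed above.
function publicProofUrls(runId: string): { report: string; proofPack: string; compactProof: string } {
  const id = encodeURIComponent(runId);
  return {
    report: `${BASE_URL}/api/runs/${id}/report.json`,
    proofPack: `${BASE_URL}/api/runs/${id}/proof-pack.json`,
    compactProof: `${BASE_URL}/api/runs/${id}/compact-proof`,
  };
}
```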

If `npx` is unavailable, plain HTTP works identically:

```
curl -X POST https://testorax.com/api/runs/start \
  -H "X-Api-Key: <KEY>" -H "Content-Type: application/json" \
  -d '{"url":"https://your-app.example.com","mode":"fast_bug_scan"}'
```

CLI / REST / MCP return identical proof artifacts. Pick whichever your shell supports.

### When to use what

| Target | Mode | Path |
|---|---|---|
| Public website or app | `fast_bug_scan` (1 run credit) | `npx testorax@latest run <url> --json` |
| Logged-in dashboard / admin | `authenticated_smoke` (1 run credit) | `npx testorax@latest auth-smoke <baseUrl> --session login_<22> --route <path>` |
| Workflow with assertions | `workflow_test` (1 run credit) | `POST /api/runs/start {mode:"workflow_test"}` |
| Full CRUD cycle | `full_crud_e2e` (1 run credit) | `POST /api/runs/start {mode:"full_crud_e2e"}` |
| Verify a fix | `regression_check` (1 run credit) | `POST /api/runs/<PRIOR>/fix-check` |
| Optional Chrome parity | Chrome Proof Bridge | only when parity returns `static_vs_browser_mismatch` |

**If you have an admin API key (plan='admin'):** the CLI auto-routes through
`/api/runs/bypass` (no payment, no quota). Or call `npx testorax bypass <url>`
explicitly. Or `curl /api/runs/bypass` directly — same shape.

**If `npx testorax run` shows `status: pending_payment`:** your account needs
paid credit. Either complete checkout at the URL the CLI prints, ask an
operator to add credit, or use `--promo <code>`. Do not poll forever — exit
cleanly and surface the payment URL.

**Local files named `TESTORAX_*.md` are not the product.** They're roadmap /
planning documents. The product is hosted at `https://testorax.com` and on npm
as `testorax` and `testorax-mcp`. Always prefer the hosted service.

---

## Chrome Proof Bridge (optional)

The Testorax Chrome Proof Bridge is live on the Chrome Web Store: `https://chromewebstore.google.com/detail/testorax-chrome-proof-bri/pmlijidlndkcglhojnbpnijgjgepbhpc` (extension ID `pmlijidlndkcglhojnbpnijgjgepbhpc`).

Install it for real-Chrome ground truth on **campaign preview parity** results. Captures a redacted DOM control inventory from the active tab and posts it to the preview job. Never reads cookies / auth headers / input values. Use when a preview returned `parityStatus='mismatch_detected'` with `mismatchClassification='static_vs_browser_mismatch'` (JS-rendered SPA). Not required for Fast Bug Scan / Workflow / CRUD E2E.

## Three equal paths — curl, CLI, or MCP

You do NOT need MCP. Three equal paths, same API, same proof:
- **`testorax` CLI via npx** — `npx testorax run <url> --json`. Default for AI coding agents. Works in any sandboxed shell. No install.
- **curl / REST** — always works in any session. Use this if MCP isn't connected — don't wait for a session restart.
- **`testorax-mcp` MCP server**: Claude Desktop / Claude Code / Cursor. These hosts read MCP config only at app launch.
- (Developer-machine only) **`npm install -g testorax`** — global install for repeat use. Sandboxed agents should use `npx` instead.

**Common failure mode:** agent tries to load `testorax-mcp` mid-session in Claude Desktop, tools don't appear, agent gets stuck waiting for a restart. **Don't get stuck.** Fall back to curl in the same session — same API.

Minimal curl path (works in every session, every host):

```
curl -X POST https://testorax.com/api/runs/start \
  -H "X-Api-Key: <KEY>" -H "Content-Type: application/json" \
  -d '{"mode":"fast_bug_scan","url":"https://your-app.example.com","pagesLimit":10}'
# poll: curl https://testorax.com/api/runs/<RUN_ID> -H "X-Api-Key: <KEY>"
# proof: curl https://testorax.com/api/runs/<RUN_ID>/compact-proof -H "X-Api-Key: <KEY>"
```
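
The poll step above can be sketched as a bounded loop. The fetcher is injected so the sketch does not depend on a particular HTTP client; in practice it would wrap `GET /api/runs/<RUN_ID>` with the `X-Api-Key` header. Only `completed` is documented as a finished status here, so any other terminal statuses are passed in by the caller; `pollRun` and `"gave_up"` are hypothetical.

```typescript
// Minimal status-fetcher type; wrap your HTTP client of choice.
type GetStatus = (runId: string) => Promise<{ status: string }>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until a terminal status appears, giving up after `maxAttempts` fetches.
async function pollRun(
  runId: string,
  getStatus: GetStatus,
  opts: { intervalMs: number; maxAttempts: number; terminal?: string[] },
): Promise<string> {
  const terminal = new Set(["completed", ...(opts.terminal ?? [])]);
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const { status } = await getStatus(runId);
    if (terminal.has(status)) return status;
    if (attempt < opts.maxAttempts) await sleep(opts.intervalMs);
  }
  return "gave_up";
}
```

With a 5 s interval, 24 attempts covers roughly two minutes, comfortably past the typical 30-90 s run, and then stops instead of polling forever.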

## What testorax-mcp can do

`testorax-mcp` is NOT read-only. It drives the full test loop end-to-end. Key tools:
- `start_fast_bug_scan({ url, pagesLimit?, scopedPaths?, viewport?, authSessionId? })` — STARTS a new scan (creates a run, may charge wallet)
- `start_fast_bug_hunt`, `start_free_fast_bug_scan`, `prepare_workflow_test`, `start_regression_check`
- `get_live_run_status({ runId })` — poll until isFinished:true
- `get_compact_proof({ runId })` — read this FIRST after a run completes (~3 KB, agent-readable)
- `fetch_proof_packet({ runId })` — full Proof Packet with aiFixPrompt
- `list_auth_sessions`, `revoke_auth_session` — Login Memory metadata only (no session material)
- `chrome_capture_for_job` — local Chrome bridge

Common mistake: agents read "fetch findings and read reports" in old docs and assume the MCP is read-only. It's not. Agents CAN and SHOULD start runs from MCP.
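
Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request (the method name and `params.name` / `params.arguments` shape come from the MCP specification). A minimal sketch of building one for `start_fast_bug_scan`, using argument names from its signature above; `toolCall` is a hypothetical helper, and a spec-compliant client normally completes the MCP `initialize` handshake before sending it.

```typescript
// JSON-RPC 2.0 `tools/call` request as defined by the Model Context Protocol.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Starting a scan (not just reading results) from MCP.
const startScan = toolCall(1, "start_fast_bug_scan", {
  url: "https://your-app.example.com",
  pagesLimit: 10,
});
```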

## Agent Quickstart (cold start)

Five steps. Use this if you've never seen Testorax before.

1. Start a Fast Bug Scan against a **public URL** (Testorax cannot reach localhost — use a deployed/preview/tunnel URL):
   `POST https://testorax.com/api/runs/start` with `{"mode":"fast_bug_scan","url":"https://your-app.example.com"}` and `X-Api-Key`.
   If your sandbox blocks outbound HTTPS (curl exit 35), use the `testorax-mcp` MCP server instead.
2. Poll `GET /api/runs/<runId>` until `status='completed'` (typically 30-90 s).
3. Read **Compact Proof first** (`GET /api/runs/<runId>/compact-proof`). Always contains `verdict`, `failureType`, `runtimeEvidence.network.failedRequests[]`, `runtimeEvidence.console.errors[]`, `nextSafeAction`, `screenshotRecommendation`.
4. Decide:
   - `verdict='pass'` → done. Don't claim untested routes are clean.
   - `verdict='fail'` → patch from `runtimeEvidence` + `failureType`.
   - `verdict='inconclusive'` → improve the test before patching.
   - **Low `trustScore` does NOT mean "no bug."** It means the proof contract scored low (no explicit assertion). The bug is still real if `runtimeEvidence` shows failures. Always read `runtimeEvidence` directly.
5. After patching, run Fix Check (NOT a fresh scan): `POST /api/runs/<priorRunId>/fix-check`. Only `verdict='fixed_verified'` counts as a real fix.
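
Step 4 can be sketched as a triage function over Compact Proof. Field names come from the step descriptions above; their exact nesting and element types are assumptions, and `triage` is hypothetical. Note that `trustScore` is deliberately not consulted.

```typescript
// Assumed Compact Proof shape (field names from the quickstart).
interface CompactProof {
  verdict: "pass" | "fail" | "inconclusive";
  trustScore?: number;
  runtimeEvidence: {
    network: { failedRequests: unknown[] };
    console: { errors: unknown[] };
  };
}

function triage(p: CompactProof): { action: "done" | "patch" | "improve_test"; dismissible: boolean } {
  const action = p.verdict === "fail" ? "patch" : p.verdict === "inconclusive" ? "improve_test" : "done";
  // A low trustScore never makes real runtime failures dismissible.
  const hasRuntimeFailures =
    p.runtimeEvidence.network.failedRequests.length > 0 ||
    p.runtimeEvidence.console.errors.length > 0;
  return { action, dismissible: action === "done" && !hasRuntimeFailures };
}
```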

## Reading proof correctly

- **Low `trustScore` + `aiFixPrompt: null` does NOT mean "nothing to fix."** It means the deterministic prompt generator declined to write a one-shot patch prompt because the run wasn't strong enough proof on its own. The `runtimeEvidence` (failed requests, console errors) is still real and actionable.
- **`redacted: true` is a guarantee, not a limitation.** Testorax never stores response bodies, cookie values, auth headers, or session tokens. You still get url + method + status + duration on HTTP failures — enough to act on.
- Always read `runtimeEvidence.network.failedRequests[]` and `runtimeEvidence.console.errors[]` directly before dismissing a run.

## When NOT to use Login Memory

Login Memory is for tests that need to be signed in. It is the **wrong** fix when:
- The failure is on a public page (landing, pricing, signup, login). If the page should be public and an auth-shaped call fails, fix the caller.
- The failure is a leftover endpoint from a previous auth provider (e.g. a Clerk → Lucia migration). Find the caller and decide whether the endpoint should still exist.
- The endpoint should return `{authenticated:false}` 200 for unauthenticated visitors but is returning 401. That's a server contract bug.

Decision rule: if the route is supposed to be public, Login Memory is NEVER the answer.

## Discovery

Quick map for agents discovering Testorax:

- `https://testorax.com/agents.md` — long-form agent guide (markdown).
- `https://testorax.com/llms.txt` — this file.
- `https://testorax.com/sitemap.xml` — public URL inventory.
- `https://testorax.com/robots.txt` — crawler policy; safe agent JSON endpoints explicitly allowed.
- `https://testorax.com/api/agents/index.json` — machine-readable consolidated agent surface index (~3 KB; public, no auth).
- `https://testorax.com/api/docs/index.json` — 30-entry documentation index (per-slug fetch via `/api/docs/<slug>.json`).
- `https://testorax.com/docs` — human-readable docs hub.

Public agent JSON endpoints (no auth):
`/api/pricing`, `/api/templates`, `/api/scenario-templates`, `/api/browser-capabilities`, `/api/docs/index.json`, `/api/docs/<slug>.json`.

Authenticated agent endpoints (`X-Api-Key`): `/api/runs/start`, `/api/runs/bypass`, `/api/runs/:id/proof-pack.json` (also public-by-runId), `/api/runs/:id/fix-check`, `/api/runs/:id/fix-check/result`.

Agent integrations:
- MCP server: `npm install -g testorax-mcp` (stdio transport; set `TESTORAX_EMAIL` + `TESTORAX_API_KEY`).
- CLI: `npm install -g testorax`.
- REST API: see `/api-docs`.

What Testorax publishes for discovery:
- OpenAPI 3.1.0 at `/openapi.json` (minimal spec for core endpoints).
- MCP discovery at `/.well-known/mcp.json` (302 → `/.well-known/mcp/server-card.json`).
- Stable public demo at `/api/runs/free-public-demo` (JSON pointer to a frozen demo run — read this when your environment cannot run terminal commands).
- Hosted MCP at `POST /mcp` (streamable-http, JSON-RPC 2.0, requires `X-Api-Key`).

What Testorax does NOT publish:
- No ChatGPT plugin manifest at `/.well-known/ai-plugin.json`.
- No agent OAuth — authentication is API key only (`X-Api-Key` header for REST/hosted MCP; `TESTORAX_EMAIL` + `TESTORAX_API_KEY` env vars for stdio MCP).
- No PWA `/manifest.json`.

`/.well-known/security.txt` is shipped (RFC 9116 format).

## About Testorax

Testorax is a cloud testing service. You give it the URL of a deployed web app; it crawls that app, generates test scenarios using AI, runs them in real Chromium browsers on a cloud server, and delivers a structured bug report with per-step screenshots and error messages. There is no SDK to install, no test files to write, and no CI configuration required.

The service is conversational. If it cannot reach a page because it needs login credentials, encounters an ambiguous product flow, or generates zero useful scenarios, it pauses the run and posts a message back to the user. Once the user replies (through the web UI, CLI, or MCP), the run resumes from where it stopped. This makes it safe to run on apps with auth-gated flows without baking credentials into config files.

Testorax is designed to be called from an AI coding agent. The MCP server exposes 27 tools covering the full run lifecycle: create a run, track progress, read the report, inspect individual failing scenarios with step-level diagnostics, retry failures, cancel, and more. The CLI provides the same surface for terminal workflows. The REST API is available for direct integration.

## MCP Server (Claude Code, Cursor, Windsurf, Cline, any MCP-compatible agent)

Install:
```bash
npm install -g testorax-mcp
```

Add to your MCP config (Claude Desktop: `claude_desktop_config.json`; Claude Code: a project `.mcp.json` or `claude mcp add`; Cursor: MCP settings panel):
```json
{
  "mcpServers": {
    "testorax": {
      "command": "testorax-mcp",
      "env": {
        "TESTORAX_EMAIL": "your@email.com",
        "TESTORAX_API_KEY": "ttx_..."
      }
    }
  }
}
```

Get your API key at https://testorax.com/account/api-key

**Key MCP tools:**
- `create_run` / `create_bypass_run` — start a new test run
- `run_status` — check if a run is in progress, paused, or complete
- `get_report` — full bug report for a completed run
- `failure_context` — step-by-step diagnostic for a failing scenario (screenshots, selectors, network failures)
- `list_runs` — recent runs for your account
- `run_suite` — re-run a saved test suite by name
- `list_messages` / `send_message` — read/reply when Testorax pauses and asks a question
- `retry_failed` — re-execute only the scenarios that failed in a previous run
- `regressions` — compare two runs and surface newly-failing scenarios
- `get_dashboard` — KPIs and trend data for your account
- `run_app_group` — trigger runs on a group of URLs in parallel

## CLI

Install:
```bash
npm install -g testorax
```

Configure:
```bash
testorax login
```

Common commands:
```bash
testorax run https://acme-corp.com --tier full_test
testorax tail <run-id>
testorax report <run-id>
testorax suites run <suite-name>
testorax messages list <run-id>
testorax messages send <run-id> "username is admin@acme-corp.com, password is demo"
```

## REST API

Base URL: `https://testorax.com`

Authentication: pass `X-Api-Key: ttx_...` header on all API requests. Get your key from `/account/api-key`.

**Start a run** (`/api/runs/bypass` requires an admin-tier API key; other keys use `POST /api/runs/start`):
```
POST /api/runs/bypass
X-Api-Key: ttx_...
Content-Type: application/json

{
  "url": "https://acme-corp.com",
  "tier": "full_test",
  "email": "you@example.com"
}
```

Tiers: `quick_scan` | `standard` | `full_test` | `deep_dive`

**Poll status:**
```
GET /api/runs/:id
```

**Get report:**
```
GET /api/runs/:id/report
```

**Get step-level failure detail:**
```
GET /api/runs/:id/scenarios/:scenarioId/steps
```

**Re-run a saved suite:**
```
POST /api/account/suites/:slug/run
X-Api-Key: ttx_...
Content-Type: application/json

{ "email": "you@example.com" }
```

Full reference: https://testorax.com/api-docs

## Common prompts

Use these verbatim with any MCP-connected agent:

- "Run a quick scan on https://acme-corp.com and show me the bug report"
- "Test my deployed app at https://acme-corp.com using full_test tier"
- "Re-run my saved suite called checkout-flow"
- "Why did scenario [ID] fail in run [ID]? Give me the full step-by-step diagnostic"
- "Create a Testorax run for https://acme-corp.com with these custom scenarios: [paste scenario JSON]"
- "Review this PR — run Testorax on https://staging.acme-corp.com and report any regressions"
- "List my recent runs and show me which ones had failures"
- "Retry only the failing scenarios from run [ID]"

**Note:** if Testorax needs login credentials or gets stuck on a flow, it will pause the run and send you a message. Check `list_messages` and reply with `send_message`. The run resumes automatically once you reply.

## Agent Fix Loop

Testorax Proof Packets are authoritative for fixing. Coding agents must obey doNotClaim. Only Fix Check verdict=fixed_verified allows a "fixed" claim. Do not weaken assertions, do not delete the failing scenario, and do not modify production code when aiFixPrompt is null. CLI: `testorax proof <runId>` to read the packet, `testorax fix <runId>` to print the AI Fix Prompt.

**Workflow (5 steps):**
1. Start a Vibe Test (Fast Bug Scan, Workflow Test, or Deep CRUD E2E).
2. Fetch the Proof Packet: MCP `fetch_proof_packet(runId)`, CLI `testorax proof <runId>`, or `GET /api/runs/<runId>/proof-pack.json`.
3. Read whatHappened, whatIsProven, whatIsNotProven, doNotClaim. If `aiFixPrompt` is null, do NOT modify production code.
4. Apply minimal patch following the AI Fix Prompt; do NOT weaken assertions, change selectors, or delete the failing scenario.
5. Trigger Fix Check: `POST /api/runs/<runId>/fix-check` (1 run credit). Read `GET /api/runs/<verifyRunId>/fix-check/result`. Only verdict=`fixed_verified` is a real fix.

**Fix Check verdicts:** fixed_verified, still_failing, cannot_verify_no_lock, scope_shrunk, proof_disappeared, unable_to_rerun, inconclusive, pending. assertion_weakened is reserved.
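
The claim rule for Fix Check verdicts reduces to a small mapping, sketched here with a hypothetical `fixCheckOutcome`; the verdict names are the documented ones, including the reserved `assertion_weakened`.

```typescript
type FixCheckVerdict =
  | "fixed_verified" | "still_failing" | "cannot_verify_no_lock" | "scope_shrunk"
  | "proof_disappeared" | "unable_to_rerun" | "inconclusive" | "pending"
  | "assertion_weakened"; // reserved

// Only `fixed_verified` permits a "fixed" claim; `pending` means read the result endpoint again later.
function fixCheckOutcome(v: FixCheckVerdict): "claim_fixed" | "keep_polling" | "do_not_claim_fixed" {
  if (v === "fixed_verified") return "claim_fixed";
  if (v === "pending") return "keep_polling";
  return "do_not_claim_fixed";
}
```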

**Endpoints:**
- GET /api/runs/:id/proof-pack.json — agent-readable Proof Packet (1 MB cap, ~12 KB compact)
- POST /api/runs/:id/fix-check — trigger same-test verification (1 credit; idempotent on in-flight)
- GET /api/runs/:id/fix-check/result — agent-facing verdict envelope

Full walkthrough at https://testorax.com/docs/agent-fix-loop

## Login Memory (authenticated apps)

Authenticated dashboards need a saved login. **Two creation paths** as of 2026-05-12:

1. **Customer-driven (dashboard):** the customer signs in at /account/login-memory and uploads a Playwright storageState.
2. **Agent-driven (CLI / MCP / API):** the agent calls `safe-login create` / MCP `create_auth_profile` / `POST /api/runs/start` with `mode=auth_profile_create`. Testorax drives a real browser login on the cloud-side Hetzner runner against a sandbox/staging URL, captures the resulting session encrypted at rest, and returns the `authSessionId`. The password is consumed once and destroyed — never stored in D1, never echoed in any response, never logged.

The agent then references the saved login by id only.

**Agent-driven create flow:**

```bash
op read "op://Private/sandbox/password" | testorax safe-login create --password-stdin \
  --login-url https://app-sandbox.example.com/login \
  --label "Sandbox dashboard" \
  --username alice@example.com --json
# → { authSessionId: "login_xxxxxxxxxxxxxxxxxxxxxx", profileStatus: "pending_capture", ... }

testorax authenticated-smoke --auth-session login_xxxxxxxxxxxxxxxxxxxxxx \
  --routes /admin --routes /admin/users
testorax safe-login revoke login_xxxxxxxxxxxxxxxxxxxxxx
```

**Agent rules:**

- Do not pass passwords on the command line (`--password` rejected; only `--password-stdin`).
- Do not put passwords in env vars.
- Do not use `echo "secret" | ...` if shell history is enabled.
- Do not target production-looking hosts (only `*-sandbox.*`, `*-staging.*`, `*.local`, `*.dev`, `*.test`, `localhost`, `example.com`, `herokuapp.com`).
- Do not ask the user to paste cookies / headers / passwords / saved-session JSON into chat.
- Do not echo, print, or log any session value.
- Do not use `authSessionId` from someone else's account.
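
The host rule above can be approximated locally before calling `safe-login create`. This is a sketch under stated assumptions: the glob patterns are translated to regular expressions over a lowercased hostname, `example.com` and `herokuapp.com` are taken to include their subdomains, and `isSandboxLikeHost` is a hypothetical helper. The server-side check is authoritative.

```typescript
// Approximation of the documented allowlist; not Testorax's actual matcher.
const SANDBOX_HOST_PATTERNS: RegExp[] = [
  /(?:^|\.)[a-z0-9-]*-sandbox\./, // *-sandbox.*
  /(?:^|\.)[a-z0-9-]*-staging\./, // *-staging.*
  /\.local$/,                     // *.local
  /\.dev$/,                       // *.dev
  /\.test$/,                      // *.test
  /^localhost$/,                  // localhost
  /(?:^|\.)example\.com$/,        // example.com (assumed to include subdomains)
  /(?:^|\.)herokuapp\.com$/,      // herokuapp.com (assumed to include subdomains)
];

function isSandboxLikeHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  return SANDBOX_HOST_PATTERNS.some((re) => re.test(host));
}
```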

**MCP tools:** `list_auth_sessions`, `get_auth_session`, `revoke_auth_session`, `create_auth_profile`. No decrypt tool. No export tool. `create_auth_profile` returns metadata only — never the password, cookies, headers, storageState, ciphertext, or capture token.

**Reserved:** `credentials.passwordRef` (`fixture://…`) is reserved for a future operator-bound resolver. In Batch 1 the runner refuses with `passwordRef_resolver_not_implemented`.

**Scopes:** `auth_sessions:read` for list/get; `auth_sessions:write` for create/revoke/create_auth_profile. Neither in `AGENT_RECOMMENDED_SCOPES`. `proof:read` does NOT grant Login Memory access.

Full walkthrough at https://testorax.com/docs/login-memory

## Authenticated Smoke Test

Test mode for logged-in dashboards / admin panels / behind-auth surfaces. Loads a saved Login Memory profile, visits each route, captures per-route classification + screenshot + console errors + failed network requests. Read-only — never clicks, submits, or sends. 1 run credit per audit. Up to 25 routes.

**CLI:** `npx testorax auth-smoke <baseUrl> --session login_<22> --route /admin --route /admin/users --json --no-open`
**MCP:** `start_authenticated_smoke({authSessionId, routes[], baseUrl?})`
**REST:** `POST /api/runs/start {mode: "authenticated_smoke", authSessionId, routes[]}`

Per-route classification (closed set): `ok / blank_content / page_error / fetch_failed / auth_lost / timeout / navigation_error / console_error_only`.
Run-level verdict (closed set): `all_routes_ok / some_routes_failed / auth_lost_early / profile_invalid / no_routes_provided`.
Refusals: `invalid_authSessionId / no_routes / too_many_routes / invalid_route / auth_session_not_owned_or_not_found / payment_required`.

Agent rules: never ask the user to paste a password — direct them to `https://testorax.com/account/login-memory`. Treat `auth_lost_early` as "session expired, ask user to refresh profile" not "app is broken." Treat `console_error_only` as advisory unless evidence is conclusive.
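
The documented route limit and refusal codes can be mirrored in a local pre-check before spending a credit. A sketch: the refusal codes and the 25-route cap come from the text above, while the `login_` suffix alphabet (22 URL-safe characters), the "route must be a path starting with `/`" rule, and `precheckAuthSmoke` itself are assumptions.

```typescript
// Refusal codes from the docs; only the locally checkable ones are modeled.
type SmokeRefusal = "invalid_authSessionId" | "no_routes" | "too_many_routes" | "invalid_route";

const MAX_ROUTES = 25;
const AUTH_SESSION_ID = /^login_[A-Za-z0-9_-]{22}$/; // assumption: URL-safe 22-char suffix

function precheckAuthSmoke(authSessionId: string, routes: string[]): SmokeRefusal | null {
  if (!AUTH_SESSION_ID.test(authSessionId)) return "invalid_authSessionId";
  if (routes.length === 0) return "no_routes";
  if (routes.length > MAX_ROUTES) return "too_many_routes";
  // Assumption: each route is a same-site path such as "/admin/users".
  if (routes.some((r) => !r.startsWith("/") || r.startsWith("//"))) return "invalid_route";
  return null;
}
```

Ownership (`auth_session_not_owned_or_not_found`) and `payment_required` can only be decided server-side, so a `null` here means "worth sending", not "will succeed".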

Full walkthrough at https://testorax.com/docs/authenticated-smoke

## Watch Test Live (live observation)

Watch a Testorax run as it executes — current status, current step, recent proof events, latest screenshot. **Live observation, not video replay. Video replay is not enabled.**

**Endpoint:** `GET /api/runs/:id/live` — **authenticated only, no public-by-runId**. Accepts X-Admin-Key, dashboard session cookie, or X-Api-Key with `proof:read` scope. Non-admin scoped keys cross-tenant guarded. Missing/invalid creds → 401; stranger key on another customer's runId → 403. Stronger than the public Proof Packet because the envelope can include current URL and step of an in-progress run.

**MCP:** `get_live_run_status({ runId })` (hosted + stdio).

**CLI:** `testorax run-live <runId>` (single-shot) / `--watch --interval 3` / `--json`.

**Agent rules:**

- Do not claim "fixed" from a passing live status. Only Fix Check verdict `fixed_verified` counts.
- Do not claim coverage from event counts. Read `whatIsProven` in the Proof Packet.
- Do not patch production code from in-flight events. Wait for `isFinished: true`.
- `latestScreenshot.url` is auth-proxied. Never a raw R2 public link.

This surface NEVER exposes cookies, headers, tokens, passwords, saved-session contents, encryption material, or provider IDs.

Full walkthrough at https://testorax.com/docs/watch-test-live

## Vibe Browser Engine (capability matrix)

Testorax declares the deterministic list of browser engines + capabilities at `GET /api/browser-capabilities` (public, no auth).

**Default engine:** `chrome_managed` — Hetzner-hosted Chromium running Playwright. Used for every accepted Vibe Test today.

**Supported today:** screenshots, cookies + storage state, Login Memory injection, request timeline, console error capture, visual proof, mobile emulation, live view polling.

**NOT supported:**

- `ios_safari` — false (planned, not implemented)
- `extension_testing` — false
- `video_replay` — false
- `cloud_provider_activated` — false (no Browserbase / Browserless / Kernel)
- `cdp_attach` — false
- `file_upload` / `file_download` — false

**Engine selection on run start:** pass `engine` and/or `capabilitiesRequired` to `POST /api/runs/start`. Selector runs BEFORE billing. Unsupported request → HTTP 400 `browser_capability_unsupported` with `capabilitiesMissing` list. **No run created. No credit deducted. No silent downgrade.**
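The server refuses unsupported capability requests before billing. An agent can run the same comparison locally first and skip a request that is bound to fail. This is a sketch. The flat `Record<string, boolean>` shape of the matrix and the `mobile_emulation` key are assumptions. `video_replay`, `cdp_attach`, `chrome_managed`, `engine`, `capabilitiesRequired` and `url` come from this document.

```typescript
// Assumed shape: capability name -> supported flag, as read from
// GET /api/browser-capabilities. The real response may nest this differently.
type CapabilityMatrix = Record<string, boolean>;

// Lists requested capabilities the matrix does not mark as supported.
// Unknown names count as missing, in the spirit of "no silent downgrade".
function missingCapabilities(matrix: CapabilityMatrix, required: string[]): string[] {
  return required.filter((cap) => matrix[cap] !== true);
}

type RunStartBody = { url: string; engine: string; capabilitiesRequired: string[] };

// Builds a run-start body only when every requirement is met; otherwise returns
// the list the server would report in capabilitiesMissing.
function buildRunStart(
  matrix: CapabilityMatrix,
  url: string,
  capabilitiesRequired: string[],
): { ok: true; body: RunStartBody } | { ok: false; capabilitiesMissing: string[] } {
  const capabilitiesMissing = missingCapabilities(matrix, capabilitiesRequired);
  if (capabilitiesMissing.length > 0) return { ok: false, capabilitiesMissing };
  return { ok: true, body: { url, engine: "chrome_managed", capabilitiesRequired } };
}
```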

**Agent rules:** if a capability is `false` in the matrix, do not claim Testorax does it. **MCP:** `get_browser_capabilities()`. **CLI:** `testorax browser-capabilities`.

Full walkthrough at https://testorax.com/docs/vibe-browser-engine

## Authenticated scenarios (proof-flow hardening)

Run authenticated dashboard tests on sandbox/staging hosts. **Two paths:**

- **Path A** — log in inside the scenario via `navigate` → `fill` → `click` → `wait_for_response` → `clickByTestId`. Use the `authenticated-dashboard-flow` template.
- **Path B** — inject a sandbox session cookie via `setCookie` (requires `auth.allowSessionInjection: true` + sandbox allowlist).

**Refusal before billing:** unsafe `setCookie` / `clearSession` (production-looking domain, missing opt-in, value > 4 KB, etc.) → HTTP 400 `auth_scenario_action_unsafe` with structured code. **No run created. No credit deducted.**

**Stable selectors:** `clickByTestId` and `fillByTestId` are first-class actions. Report carries `selectorStrategy="data-testid"`, `testId`, `resolvedSelector`, found/visible/enabled, near-miss candidates.

**Proof bundle:** `GET /api/runs/:id/export.json` returns a single JSON envelope: metadata + proof.md + steps.json + console.json + network.json + screenshots[] (auth-proxied) + proof-packet.json. Same auth model as Proof Packet. 1 MB cap.

**Cookie value safety:** masked everywhere. Never echoed in any output.

Full walkthrough at https://testorax.com/docs/authenticated-scenarios

## Run History Intelligence (Proof Memory)

Deterministic read-only classifier over Testorax run history. **No LLM calls. No mutation of historical data.** Reads existing run rows + scenario steps + click_audit_findings to surface recurring issues, weak passes, false-pass risks, stale deploys, selector failures, proof-quality patterns.

**Endpoints:**

- `GET /api/runs/:id/intelligence` — single run.
- `GET /api/account/run-intelligence?domain=&days=` — caller history (default 30 days, max 90; cap 200 runs).

**Authenticated only** — `X-Admin-Key`, dashboard session cookie, or `X-Api-Key` with `proof:read` scope. Cross-tenant guarded by email match.

**Classifications:** product_bug, scenario_bug, selector_missing, assertion_missing, weak_pass, false_pass_risk, stale_deploy_suspected, auth_issue, network_issue, console_error_issue, missing_testid, build_marker_missing, proof_weakened, inconclusive_weak_proof, provider_bug, clean_with_sufficient_proof, clean_but_low_proof.

**Risk scores (0..100):** proofQuality, weakPassRisk, falsePassRisk, staleDeployRisk.

**MCP:** `get_run_intelligence({runId})` / `get_history_intelligence({domain?, days?})`. **CLI:** `testorax run-intelligence <runId>` / `testorax history-intelligence`.

**Agent rules:** weak_pass is NOT a proven product bug. stale_deploy_suspected is a diagnostic, not a verdict. Cluster signatures are hints, not proofs. Use Fix Check to validate — run-intelligence is observation only.

Full walkthrough at https://testorax.com/docs/run-history-intelligence

## Compact Proof Summary (Agent Compact Mode)

Small (~1–3 KB) agent-first JSON summary of a run. Read this BEFORE downloading the full report or proof packet — it tells you the verdict, failure type, failed step, trust warning, and what to fetch next.

**Endpoint:** `GET /api/runs/:id/compact-proof` — authenticated only (X-Admin-Key, dashboard session cookie, or X-Api-Key with proof:read). Cross-tenant guarded. Hard size cap 6 KB; truncation adds truncated:true + next:[…] pointers.

**Recommended agent loop:**

1. Fetch compact proof.
2. If verdict=fail and reason is clear, patch the exact issue.
3. If UI / selector / auth / visual uncertainty exists, fetch latest/marked screenshot.
4. If evidence is weak, fetch Proof Packet.
5. After patching, run Fix Check.
6. Do not claim fixed until Fix Check passes.
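Steps 2–4 of the loop above can be written as one decision function. This is a sketch under assumptions. The `CompactProofSlice` field names are hypothetical booleans an agent would derive from the real compact proof (verdict, failure type, trust warning). The function reads the loop conservatively: a failure is patched only when nothing is uncertain or weak.

```typescript
// Hypothetical slice of the compact proof; these field names are assumptions.
interface CompactProofSlice {
  verdict: "pass" | "fail" | "inconclusive";
  reasonIsClear: boolean;
  visualUncertainty: boolean; // any UI / selector / auth / visual doubt
  evidenceWeak: boolean;
}

type NextStep = "patch_exact_issue" | "fetch_screenshot" | "fetch_proof_packet" | "nothing_to_patch";

// Step 2: patch only a clear failure with no doubts. Step 3: resolve visual doubt
// with a screenshot. Step 4: weak or unexplained results need the Proof Packet.
// Any patch is followed by Fix Check (steps 5-6), which this function does not cover.
function nextStep(p: CompactProofSlice): NextStep {
  if (p.verdict === "fail" && p.reasonIsClear && !p.visualUncertainty && !p.evidenceWeak) {
    return "patch_exact_issue";
  }
  if (p.visualUncertainty) return "fetch_screenshot";
  if (p.evidenceWeak || p.verdict !== "pass") return "fetch_proof_packet";
  return "nothing_to_patch";
}
```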

**MCP:** `get_compact_proof({runId})`. **CLI:** `testorax compact-proof <runId>` (alias `testorax proof-summary`).

**Agent rules:** Compact proof is NOT a replacement for a full audit. Do not claim fixed before Fix Check. Do not weaken assertions or delete the failing scenario. Do not patch production code from inconclusive proof.

Full walkthrough at https://testorax.com/docs/compact-proof

## Agent Visual Proof Loop (Screenshot-on-Demand)

Compact proof now carries a `screenshotRecommendation` block: {recommended, reason, latestUrl, markedUrl, why, budget:{maxScreenshotsForAgentLoop=2, fetchLatestFirst=true, fetchMarkedIfSelectorRelated=true}, noScreenshotAvailable?, noScreenshotAdvice?}.

**Loop:**

1. Fetch compact proof.
2. Read screenshotRecommendation.recommended.
3. If reason ∈ {selector_missing, testid_missing, page_mismatch}, fetch the marked screenshot.
4. For auth_state_unclear / stale_deploy_suspected / visual_issue, fetch latest.
5. Stop after at most 2 screenshots per agent loop.
6. If noScreenshotAvailable, do not retry.
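The screenshot loop above is mechanical enough to encode directly. This is a sketch. The reason names, the budget of 2, `fetchLatestFirst` and the endpoint paths come from this section. The function names are hypothetical, and the caller is assumed to add auth headers to the request.

```typescript
// Step 3: reasons that call for the marked (annotated) screenshot.
const MARKED_REASONS = new Set(["selector_missing", "testid_missing", "page_mismatch"]);
// budget.maxScreenshotsForAgentLoop from the recommendation block.
const MAX_SCREENSHOTS_PER_LOOP = 2;

interface ScreenshotRecommendation {
  recommended: boolean;
  reason: string;
  noScreenshotAvailable?: boolean;
}

type ScreenshotChoice = "marked" | "latest" | "stop";

function chooseScreenshot(rec: ScreenshotRecommendation, fetchedSoFar: number): ScreenshotChoice {
  if (rec.noScreenshotAvailable) return "stop"; // step 6: do not retry
  if (!rec.recommended) return "stop";
  if (fetchedSoFar >= MAX_SCREENSHOTS_PER_LOOP) return "stop"; // step 5: budget spent
  if (MARKED_REASONS.has(rec.reason)) return "marked";
  // Step 4 reasons (auth_state_unclear, stale_deploy_suspected, visual_issue) and
  // anything else fall back to latest, matching fetchLatestFirst=true.
  return "latest";
}

// Endpoint paths from the text above.
function screenshotPath(runId: string, choice: "marked" | "latest"): string {
  const latest = `/api/runs/${encodeURIComponent(runId)}/screenshots/latest`;
  return choice === "marked" ? `${latest}/marked` : latest;
}
```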

**Endpoints:** `GET /api/runs/:id/screenshots/latest`, `GET /api/runs/:id/screenshots/latest/marked`, `GET /api/runs/:id/screenshots/:nameWithExt/image`. Cross-tenant guarded. No raw R2 public links.

**MCP:** `get_compact_proof`, `get_screenshot_recommendation`, `get_latest_screenshot`, `get_marked_screenshot`.

**CLI:** `testorax screenshot-latest` / `screenshot-marked` / `compact-proof`.

**Agent rules:** Do not spam screenshot fetches. No video replay. No session recording. No browser takeover.

**Inline rendering (Claude Code Desktop):** Pass `returnImage: true` to `get_latest_screenshot` or `get_marked_screenshot` to receive an MCP image content block (base64 PNG + mimeType image/png) alongside the metadata. Claude Code Desktop renders inline. Default is metadata-only to save tokens.

**CLI local-file fallback:** `testorax screenshot-latest <runId> --open` (or `--output <file.png>`) saves under `.testorax/screenshots/` and opens; `testorax screenshot-marked <runId> --open` does the same for marked. Output is `image saved to: <absolute path>`.

Full walkthrough at https://testorax.com/docs/agent-visual-proof-loop

## Preflight / Build Freshness Guard

Scenario-level preflight catches stale-deploy / wrong-route / unauthenticated / missing-testid problems BEFORE deep scenario steps run.

**Verdicts:** preflight_passed / preflight_failed / stale_deploy_suspected / required_testid_missing / build_marker_missing / asset_hash_mismatch / auth_issue / route_mismatch / critical_api_failed / critical_console_error / blank_page / inconclusive_preflight.

**Refusal:** invalid preflight configs return HTTP 400 `preflight_config_unsafe` BEFORE billing: no run created, 0 credits deducted.

**Agent rules:** A preflight failure is NOT automatically an app bug. stale_deploy_suspected means redeploy cleanly + verify build BEFORE changing app logic. Do not patch business logic until preflight passes.

Full walkthrough at https://testorax.com/docs/preflight-build-freshness

## Scenario Template Library

41 ready-made, agent-first scenario templates. Use these BEFORE writing custom scenarios by hand.

**Endpoints:**

- `GET /api/scenario-templates` (optional `?appFamily=`, `?category=`)
- `GET /api/scenario-templates/:id`
- `POST /api/scenario-templates/:id/generate` body `{"variables": {...}}` — validates required inputs, refuses forbidden secret-shape values, refuses no-final-assertion / evaluate-only, blocks DELETE on non-sandbox hosts. 400 on refusal — no run created.

**Loop:** list → get → generate → submit via `/api/runs/bypass` (executeOnly:true) → compact-proof → Fix Check.

**MCP:** `list_scenario_templates`, `get_scenario_template`, `generate_scenario_from_template`. **CLI:** `testorax scenarios list|show|generate`.

**Agent rules:** do NOT remove assertions; do NOT paste real cookies / passwords / auth tokens; do NOT skip preflight; fetch compact proof first; only `verdict=fixed_verified` counts as a fix.

Full walkthrough at https://testorax.com/docs/scenario-template-library

Browsable UI gallery: https://testorax.com/scenario-templates — filter by app family / category / Fix Check compatibility, preview templates, fill inputs, generate scenarios, and start a Flow Test in-page through the standard `/api/runs/start` path (mode=workflow_test, customer X-Api-Key with `runs:start` scope, full validation + plan eligibility + run-credit gating). Server-side scenario lint runs AFTER workflow_test wrapping so no-final-assertion / destructive-on-production scenarios are refused before run creation. `Idempotency-Key` header is honored — same key+body→replay, same key+different body→409 `idempotency_conflict`. Credentialed flows use `/account/test` or Login Memory; the gallery launcher does not collect passwords / cookies / storageState / API keys for the target app. Refused launches deduct 0 run credits.

## Fix Check Same-Test Lock

Makes `fixed_verified` impossible to fake.

**Endpoint:** `POST /api/runs/:id/fix-check/lock` body `{verificationRunId, reconcilerVerdict?}`. Owner/admin only. Cross-tenant guarded. NO run created, NO credit deducted.

**Verdicts (12, closed):** fixed_verified / still_failing / proof_weakened / scope_changed / assertion_weakened / preflight_weakened / testids_weakened / route_changed / template_changed / proof_disappeared / inconclusive / unsafe_to_claim_fixed.

**Blocked from fixed_verified:** templateId / route / assertionsHash / required assertion / required preflight / required testId changed; backend / screenshot / cleanup proof present on original but absent on verification.

**One-way fail:** the lock can DOWNGRADE fixed_verified but never PROMOTE a non-fixed verdict.

**Agent rules:** do NOT remove assertions or skip preflight or change route or remove proof in the verification run. Any non-`fixed_verified` outcome means `unsafeToClaimFixed=true` — do NOT claim "fixed".
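The one-way rule means the lock can only ever make the outcome more conservative. This sketch illustrates that property. The 12 verdict strings come from the closed set above. `applySameTestLock` is a hypothetical combiner, not the server's code. The server's rule for combining two different non-fixed verdicts is not documented, and the sketch simply keeps the reconciler's.

```typescript
type FixCheckVerdict =
  | "fixed_verified"
  | "still_failing"
  | "proof_weakened"
  | "scope_changed"
  | "assertion_weakened"
  | "preflight_weakened"
  | "testids_weakened"
  | "route_changed"
  | "template_changed"
  | "proof_disappeared"
  | "inconclusive"
  | "unsafe_to_claim_fixed";

// Hypothetical combiner illustrating the one-way rule: the lock can only
// downgrade fixed_verified; a non-fixed reconciler verdict passes through unchanged.
function applySameTestLock(reconcilerVerdict: FixCheckVerdict, lockVerdict: FixCheckVerdict): FixCheckVerdict {
  if (reconcilerVerdict !== "fixed_verified") return reconcilerVerdict; // never promote
  return lockVerdict; // stays fixed_verified only if the lock agrees
}

// Any outcome other than fixed_verified means "do not claim fixed".
function unsafeToClaimFixed(verdict: FixCheckVerdict): boolean {
  return verdict !== "fixed_verified";
}
```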

Full walkthrough at https://testorax.com/docs/fix-check-same-test-lock

## Code-Aware Fix Hints

Strictly advisory. When a run proves a failure, Testorax generates safe, read-only hints that point to likely source files / symbols / routes / API handlers to inspect. NO LLM. NO auto-patch. NO raw code in output.

**Endpoint:** `POST /api/runs/:id/code-hints` body `{files:[{path, source},...]}`. Owner/admin only. Cross-tenant guarded. Body cap 256 KB; per-file cap 200 KB. Excludes node_modules / .next / dist / build / .env* / lockfiles / binaries automatically.
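The server already drops excluded paths, but a client that pre-filters stays under the body cap and avoids uploading secrets at all. This is a sketch. The directory names and the 200 KB / 256 KB caps come from the text above, and KB is read here as 1024 bytes. The lockfile names and the NUL-byte "binary" heuristic are assumptions, not the server's actual rules.

```typescript
interface SourceFile {
  path: string;
  source: string;
}

const EXCLUDED_DIRS = new Set(["node_modules", ".next", "dist", "build"]);
// Assumed lockfile names; the server's exact list is not spelled out here.
const LOCKFILES = new Set(["package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb"]);
const PER_FILE_CAP_BYTES = 200 * 1024;
const BODY_CAP_BYTES = 256 * 1024;

const utf8Bytes = (s: string): number => new TextEncoder().encode(s).length;

function isExcludedPath(filePath: string): boolean {
  const segments = filePath.split("/");
  const baseName = filePath.slice(filePath.lastIndexOf("/") + 1);
  return (
    segments.some((seg) => EXCLUDED_DIRS.has(seg)) ||
    baseName.startsWith(".env") ||
    LOCKFILES.has(baseName)
  );
}

// Builds a {files} body under both caps. A NUL byte is used as a rough
// "binary file" heuristic (an assumption, not the server's rule).
function buildCodeHintsBody(candidates: SourceFile[]): { files: SourceFile[] } {
  const files: SourceFile[] = [];
  for (const file of candidates) {
    if (isExcludedPath(file.path) || file.source.includes("\u0000")) continue;
    if (utf8Bytes(file.source) > PER_FILE_CAP_BYTES) continue;
    // Skip a file if adding it would push the JSON body past the cap.
    if (utf8Bytes(JSON.stringify({ files: [...files, file] })) > BODY_CAP_BYTES) continue;
    files.push(file);
  }
  return { files };
}
```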

**Confidence:** high (exact testId/API-path match + runtime evidence aligns) / medium (approximate) / low (single weak signal — inspect only).

**Agent rules:** Do NOT auto-apply. Do NOT claim "root cause" from static match alone. Inspect candidates first; verify via Compact Proof + Run Intelligence; then patch and run Fix Check; only `verdict=fixed_verified` counts.

**MCP:** `get_code_hints({runId, files})`. **CLI:** `testorax code-hints <runId> [--root <dir>] [--max-files N]`.

Full walkthrough at https://testorax.com/docs/code-aware-fix-hints

## Run Submission Schema Contract (Fast Bug Scan auth reliability)

Closed wire shape for `/api/runs/start` and `/api/runs/bypass`. Auth strategies: `storage_state | cookies | login_form | session_injection | login_memory | steps`. Pick one — mixing fields with the declared strategy returns `400 auth_mixed_strategies`.

Cookies / storageState / login_form / steps are ALL applied before the first protected goto (`authDiagnostics.appliedBeforeFirstGoto=true`).

Every authenticated FBS report carries top-level `authEvidence`: `{used, strategyAttempted, strategySucceeded, appliedBeforeFirstGoto, targetUrl, lastSeenUrl, redirectedToLogin, scopedPathsAttempted[], pagesCrawled, failureReason}`. Booleans + counts + strategy names only — no raw cookies / passwords / headers / storageState / CSRF tokens.

Refusal codes (closed set): `invalid_run_mode | url_required | invalid_url | auth_strategy_unknown | auth_login_form_required | auth_cookies_required | auth_storage_state_required | auth_unsupported_field | auth_mixed_strategies | auth_session_injection_unsafe | invalid_scoped_paths | invalid_scoped_path_entry | executeOnly_requires_scenarios | skipCrawl_requires_scenarios_or_steps | invalid_scenarios_array | invalid_template_inputs | invalid_preflight | invalid_viewport | payload_too_large`. All return 400 BEFORE billing — no run created, 0 credits deducted.

Machine-readable schema: `GET /api/docs/run-submission-schema.json`.

Agent rules: Read `authEvidence.strategySucceeded` first. `pagesCrawled=0` is normal for LCA — use `scopedPathsAttempted`. Do NOT patch app code from `redirectedToLogin=true` alone.
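The `authEvidence` rules above can be applied in a fixed order. This is a sketch using a subset of the documented fields. The sketch assumes `strategySucceeded` is a boolean, although the text only says the block holds booleans, counts and strategy names. The triage labels are hypothetical.

```typescript
// Subset of the authEvidence block documented above; strategySucceeded is assumed boolean.
interface AuthEvidence {
  used: boolean;
  strategySucceeded: boolean;
  redirectedToLogin: boolean;
  pagesCrawled: number;
  scopedPathsAttempted: string[];
}

type AuthTriage = "auth_not_used" | "fix_auth_setup" | "coverage_ok" | "no_coverage";

function triageAuthEvidence(ev: AuthEvidence): AuthTriage {
  if (!ev.used) return "auth_not_used";
  // strategySucceeded is read first. A login redirect points at the auth setup
  // and is never, on its own, a reason to patch app code.
  if (!ev.strategySucceeded || ev.redirectedToLogin) return "fix_auth_setup";
  // pagesCrawled=0 can be normal, so scopedPathsAttempted also counts as coverage.
  if (ev.pagesCrawled > 0 || ev.scopedPathsAttempted.length > 0) return "coverage_ok";
  return "no_coverage";
}
```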

Full walkthrough at https://testorax.com/docs/run-submission-schema

## Report Evidence Contract (Passing Run Evidence + Console/Network Evidence)

PASS is NOT automatically strong proof. report.json carries `passingRunEvidence` with `proofStrength` ∈ `{strong, moderate, weak, inconclusive}`, plus `stepEvidence[]` and `runtimeEvidence{console, network}`.

`runtimeEvidence.network` request items: `{id, method, host, pathname, status, durationMs, initiatedByStepIndex, requestBodyShape, responseBodyShape, authCookiePresent, csrfPresent, authorizationHeaderPresent, redacted: true}`. Body shapes show key names + types + counts only — values NEVER stored. Auth/CSRF/Authorization presence is boolean-only — never raw headers.

`runtimeEvidence.console` items: `{id, level, preview (clipped 240 + redacted), pageUrl, initiatedByStepIndex, redacted: true}`.

Stable Evidence IDs: `step_NNN` / `net_NNN` / `console_NNN`. Agents should reference these when making claims.

**proofStrength rules**: PASS without any assertion → weak. PASS + assertion + no console errors + no failed requests → strong. PASS with auth issue → weak.
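The three documented rules can be written as a small derivation. This is a sketch. The `PassSignals` input shape is hypothetical. The text does not say what the remaining PASS cases map to, so returning `moderate` for them is an assumption, and `inconclusive` is not modeled.

```typescript
type ProofStrength = "strong" | "moderate" | "weak" | "inconclusive";

// Hypothetical inputs; the real report derives these from stepEvidence and runtimeEvidence.
interface PassSignals {
  assertionCount: number;
  consoleErrorCount: number;
  failedRequestCount: number;
  authIssue: boolean;
}

// Applies the three documented rules for a PASS run. "moderate" for the
// remaining cases is an assumed fallback.
function deriveProofStrength(s: PassSignals): ProofStrength {
  if (s.assertionCount === 0) return "weak"; // PASS without any assertion
  if (s.authIssue) return "weak"; // PASS with auth issue
  if (s.consoleErrorCount === 0 && s.failedRequestCount === 0) return "strong";
  return "moderate";
}
```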

**Agent rules**: Read `passingRunEvidence.proofStrength` first. Weak / inconclusive PASS = re-run with a real assertion before patching. No persistence claim without `readback_assert`. No cleanup claim without `verify_cleanup`.

**Compact Proof embeds**: `passingRunEvidence` summary block. **Proof Packet contractVersion 1.7.0** embeds full passingRunEvidence + stepEvidence + runtimeEvidence.

Schema JSON: `GET /api/docs/report-evidence-contract.json`.

Full walkthrough at https://testorax.com/docs/report-evidence-contract

## Fast Bug Scan Reality Grounding (no hallucinated controls)

FBS only tests **real controls observed in the live DOM / accessibility snapshot**. Reports carry a closed-shape `coverageGrounding` block: `{applicable, contractVersion, observedControlsCount, groundedActionsExecuted, ungroundedActionsBlocked, shellControlsCount, moduleBodyControlsCount, controlsClickedByZone, moduleBodyCoverageStatus, coverageStatus ∈ {real_coverage | partial_coverage | shell_only_coverage | no_real_coverage | inconclusive}, observedControls[], blockedUngroundedActions[], whatIsProven, whatIsNotProven, doNotClaim}`.

Every observed control: `{id:"ctrl_NNN", textPreview (redacted), selectorCandidates[] (redacted), zone ∈ {shell, module_body, unknown}, signature, observedBeforeAction:true}`. Every blocked action: `{id:"blocked_NNN", intendedSelector (redacted), reason ∈ {ungrounded_action_blocked | hallucinated_control_candidate | selector_not_observed | …}, executed:false}`.

**Agent rules**: Read `coverageGrounding.coverageStatus` first. shell_only_coverage / no_real_coverage → do NOT claim app behavior was tested. NEVER classify a missing invented control as a product bug.

Compact Proof embeds the compact `coverageGrounding`. Proof Packet contractVersion 1.7.0 → 1.8.0 embeds the full block.

Schema JSON: `GET /api/docs/coverage-grounding.json`.

Full walkthrough at https://testorax.com/docs/coverage-grounding

## Action Fidelity Contract (synthetic-click + native-input detection)

A JS `.click()` is NOT a real user click. An `evaluate`-set value is NOT native typing. Reports carry a closed-shape `actionFidelity` block: `{applicable, contractVersion, summary:{realPointerActions, nativeFillActions, keyboardTypeActions, programmaticClicks, evaluateFallbacks, weakFidelityActions, ambiguousActions, weakFidelityPresent}, actions[], weakFidelityWarnings, doNotClaim}`.

Closed-set fidelity verdicts: `real_pointer | native_fill | keyboard_type | programmatic_click | evaluate_fallback | synthetic_event | ambiguous | not_applicable`. Closed-set weaknessReason: `pointer_proof_missing | forced_click_without_target_proof | overlay_intercepted_target_differs | evaluate_set_value_no_input_event | programmatic_click_via_evaluate | dispatched_event_only | coordinate_click_no_target_match | keyboard_proof_missing | focus_proof_missing | instrumentation_unavailable | not_applicable`.

`passingRunEvidence`: when `weakFidelityPresent=true`, `proofStrength` is capped at moderate or weak, and `doNotClaim` gains "Do not claim real-user interaction fidelity was proven."

Compact Proof embeds compact summary. Proof Packet contractVersion 1.8.0 → 1.9.0 embeds full block.

Schema JSON: `GET /api/docs/action-fidelity.json`.

Full walkthrough at https://testorax.com/docs/action-fidelity

## Optional

- Scenario schema reference: https://testorax.com/docs/scenarios
- Agent Fix Loop: https://testorax.com/docs/agent-fix-loop
- Login Memory: https://testorax.com/docs/login-memory
- Watch Test Live: https://testorax.com/docs/watch-test-live
- Vibe Browser Engine: https://testorax.com/docs/vibe-browser-engine
- Authenticated scenarios: https://testorax.com/docs/authenticated-scenarios
- Run History Intelligence: https://testorax.com/docs/run-history-intelligence
- Run Submission Schema Contract: https://testorax.com/docs/run-submission-schema
- Report Evidence Contract: https://testorax.com/docs/report-evidence-contract
- Coverage Grounding (FBS Reality Grounding): https://testorax.com/docs/coverage-grounding
- Action Fidelity (synthetic-click detection): https://testorax.com/docs/action-fidelity
- Compact Proof Summary: https://testorax.com/docs/compact-proof
- Agent Visual Proof Loop: https://testorax.com/docs/agent-visual-proof-loop
- Preflight / Build Freshness Guard: https://testorax.com/docs/preflight-build-freshness
- Scenario Template Library: https://testorax.com/docs/scenario-template-library
- Assertion DSL + Scenario Authoring Expansion: https://testorax.com/docs/scenario-templates
- Fix Check Same-Test Lock: https://testorax.com/docs/fix-check-same-test-lock
- Code-Aware Fix Hints: https://testorax.com/docs/code-aware-fix-hints
- API reference: https://testorax.com/api-docs
- CLI reference: https://testorax.com/cli
- Integrations (Claude Code, Cursor, MCP): https://testorax.com/integrations
- Account and API key: https://testorax.com/account/api-key
- MCP package on npm: https://www.npmjs.com/package/testorax-mcp
- CLI package on npm: https://www.npmjs.com/package/testorax

## Smart Queue (Stage A)

`POST /api/runs/start` adds a customer-safe `queueEta` block (`status` ∈ {normal,busy,saturated,unknown}, `estimatedStartSeconds` bucketed 5s, `confidence` ∈ {low,medium,high}, plain-language `message`). ETA failures never block run creation.

Admin telemetry: `GET /api/admin/capacity/snapshot` — contract version 1.0.0, admin-only, contract-pinned 6-bucket lane projection (fast/workflow/visual/backend/operational/unknown), saturation status with deterministic decision tree, recommendations, anti-fabrication via `limitations[]`. The full feed at `/api/admin/capacity` is preserved unchanged.

Stage A is observation only — scheduler routing, run pricing, run-credit deduction, and worker behavior are unchanged.

Full walkthrough at https://testorax.com/docs/smart-queue

## Smart Queue Stage B (admin alerts)

Admin telemetry: `GET /api/admin/capacity/alerts` — contract 1.0.0, admin-only, deterministic alerts (8 closed-set types: queue_wait_busy / queue_wait_saturated / worker_stale / failure_surge / timeout_surge / deep_jobs_blocking_fast / underutilization / insufficient_data). Recommendation-only — `safeToAutomate=false` on every alert. No scheduler routing change, no billing change. Anti-fabrication: every alert carries observedValue + threshold + sourceMetric.

## Smart Queue Stage C (smart run scheduling — default OFF)

Pure deterministic `decideDispatch` function in `@testorax/shared/scheduler` with 4 decision kinds (`claim`/`defer`/`skip_concurrency_cap`/`no_eligible_runs`) and 4 cap names (`per_user_active`/`per_app_active`/`per_account_queued`/`per_worker_heavy`). Contract version `1.0.0`. Default fairness caps: perUserActive=4, perAppActive=8, perAccountQueued=16, perWorkerHeavyCap=1. Admin accounts bypass per-user/per-app/per-account caps. Hetzner dispatch wired behind `SMART_DISPATCH_ENABLED='true'` env, default OFF — when unset, existing Stage D + Stage F policies run unchanged. Customer-safe `queueEta.ownQueuePosition` on run-start responses. Admin dry-run telemetry on `/api/admin/capacity/snapshot.proposedFairness` + 2 new `fairness_cap_hit_*` alert types.
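The following is a hypothetical sketch of a fairness check in the spirit of `decideDispatch`. It is not the function in `@testorax/shared/scheduler`. The decision kinds, cap names, default values and admin bypass come from the paragraph above. The `QueuedRun`/`ActiveCounts` shapes and the queue-order scan are assumptions. The sketch does not model `defer` or `per_account_queued`.

```typescript
type CapName = "per_user_active" | "per_app_active" | "per_account_queued" | "per_worker_heavy";

type Decision =
  | { kind: "claim"; runId: string }
  | { kind: "skip_concurrency_cap"; cap: CapName }
  | { kind: "no_eligible_runs" };

interface QueuedRun {
  runId: string;
  userId: string;
  appId: string;
  heavy: boolean;
  isAdmin: boolean;
}

interface ActiveCounts {
  byUser: Map<string, number>;
  byApp: Map<string, number>;
  heavyOnWorker: number;
}

// Documented defaults (perAccountQueued=16 is not used by this sketch).
const DEFAULT_CAPS = { perUserActive: 4, perAppActive: 8, perWorkerHeavyCap: 1 };

// Claims the first queued run that passes every cap. Admins bypass the
// per-user and per-app caps, but not the per-worker heavy cap.
function decideDispatchSketch(queue: QueuedRun[], active: ActiveCounts, caps = DEFAULT_CAPS): Decision {
  const hits: CapName[] = [];
  for (const run of queue) {
    let hit: CapName | null = null;
    if (run.heavy && active.heavyOnWorker >= caps.perWorkerHeavyCap) hit = "per_worker_heavy";
    else if (!run.isAdmin && (active.byUser.get(run.userId) ?? 0) >= caps.perUserActive) hit = "per_user_active";
    else if (!run.isAdmin && (active.byApp.get(run.appId) ?? 0) >= caps.perAppActive) hit = "per_app_active";
    if (hit === null) return { kind: "claim", runId: run.runId };
    hits.push(hit);
  }
  const firstHit = hits[0];
  return firstHit === undefined ? { kind: "no_eligible_runs" } : { kind: "skip_concurrency_cap", cap: firstHit };
}
```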

## Smart Queue Stage D (burst recommendation — advisory only)

Admin endpoint: `GET /api/admin/capacity/burst-recommendation` (contract 1.0.0). Returns 4-value recommendation enum: `add_worker` / `remove_burst_worker` / `keep_current` / `insufficient_data`. `safeToAutomate=false`. Never calls Hetzner API. Never provisions a worker. Failure/timeout surge ≥50% forces `keep_current` ("scaling up would mask root cause"). `notes.unmeasured` declares CPU/RAM/browser saturation as unmeasured (anti-fabrication).

## Smart Queue Stage E (history/drift — recommendation only)

Admin endpoints: `POST /api/admin/capacity/burst-recommendation-snapshot` (writer; idempotent on 10-min window) + `GET /api/admin/capacity/burst-recommendation-history` (reader; contract 1.0.0). Drift status 5-value enum: `stable` / `oscillating` / `add_pressure` / `remove_pressure` / `unknown`. Writer hard-codes `safe_to_automate=0`. Reader never calls Hetzner API. New additive table `burst_recommendation_snapshots` (migration 0074). The existing `*/10 * * * *` cron also writes one snapshot per 10-min window automatically through the same writer helper (`captureBurstRecommendationSnapshot`); manual POST and auto-cron use identical code path. No new cron, no Hetzner API, no auto-provisioning.

## Smart Queue Stage F (operator action ledger)

Admin endpoints: `POST /api/admin/capacity/operator-action` (writer; appends one ledger row) + `GET /api/admin/capacity/operator-actions` (reader; contract 1.0.0). 4-value action enum: `acknowledged` / `declined` / `acted_manually` / `deferred`. Invalid action → 400 `code=invalid_action`. `note` hard-capped at 400 chars at write boundary. Reader returns latest 50 rows (7-day window), counts by action, last action per snapshot recommendation window (cap 25). New additive table `capacity_operator_actions` (migration 0075). LEDGER ONLY. No row in this table triggers any Hetzner API call, auto-provisioning, billing change, or scheduler change. `notes.ledgerOnly=true`, `notes.hetznerApiCalled=false` on every response. `actor_type` derived from auth path (admin_session vs admin_key), never from caller payload.
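The write-boundary checks above fit in a few lines. This is a sketch. The action enum, the 400 chars limit and `invalid_action` come from the paragraph above. Truncating an over-long note is an assumed reading of "hard-capped"; the server might reject such notes instead. The function accepts no `actor_type`, which matches the rule that it is never taken from the caller's payload.

```typescript
const OPERATOR_ACTIONS = ["acknowledged", "declined", "acted_manually", "deferred"] as const;
type OperatorAction = (typeof OPERATOR_ACTIONS)[number];
const NOTE_MAX_CHARS = 400;

type ValidationResult =
  | { ok: true; action: OperatorAction; note: string }
  | { ok: false; httpStatus: 400; code: "invalid_action" };

function isOperatorAction(value: unknown): value is OperatorAction {
  return typeof value === "string" && (OPERATOR_ACTIONS as readonly string[]).includes(value);
}

// Validates a ledger write. actor_type is deliberately not a parameter:
// it comes from the auth path, never from the payload.
function validateOperatorAction(action: unknown, note: unknown): ValidationResult {
  if (!isOperatorAction(action)) return { ok: false, httpStatus: 400, code: "invalid_action" };
  // Assumed behavior: over-long notes are truncated rather than rejected.
  const text = typeof note === "string" ? note.slice(0, NOTE_MAX_CHARS) : "";
  return { ok: true, action, note: text };
}
```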

## Autonomous QA Campaign Orchestrator — Stage A

Read-only campaign preview for a target URL. Returns route inventory,
control inventory (closed-set 18 control types + 8 safeClassification
states), scenario manifest, page-run cost estimate, recommended mode,
and willNotBeTested list. NEVER mutates target. NEVER creates a run.
NEVER deducts credits. Always returns `confirmationRequiredBeforeRun: true`.

Endpoint: `POST /api/campaigns/preview` (public, no auth, 30/min/IP).
Body: `{ targetUrl, manualSeedRoutes?, hasAuthProfile? }`. SSRF-guarded.

MCP tool (stdio + hosted): `campaign_preview`.
CLI: `testorax campaign preview <url> [--seed /x] [--has-auth] [--json]`.

One selected page-run consumes one run credit. The preview surfaces a
customer-facing run-balance estimate denominated in run credits — never
invented dollar amounts.

Closed-set `pricingStatus` (4): `authenticated_enough_runs` (caller has
enough; `requiresPurchaseBeforeRun: false`); `authenticated_insufficient_runs`
(short; surfaces `runsShortfall` + plan/pack options);
`anonymous_pricing_preview` (no `X-Api-Key`); `pricing_unavailable`
(balance lookup errored — never invents a dollar total).
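The four `pricingStatus` values follow from three inputs. This is a sketch. The status names, `requiresPurchaseBeforeRun`, `runsShortfall` and the rule that one page-run costs one credit come from the text above. The input shape and the `runCreditsRequired` field are hypothetical. The output deliberately contains only run credits, with no dollar figures.

```typescript
type PricingStatus =
  | "authenticated_enough_runs"
  | "authenticated_insufficient_runs"
  | "anonymous_pricing_preview"
  | "pricing_unavailable";

interface PricingInput {
  hasApiKey: boolean;
  selectedPageRuns: number;
  runBalance: number | null; // null when the balance lookup errored
}

interface PricingPreview {
  pricingStatus: PricingStatus;
  runCreditsRequired: number; // hypothetical field name
  requiresPurchaseBeforeRun?: boolean;
  runsShortfall?: number;
}

function derivePricingPreview(input: PricingInput): PricingPreview {
  // One selected page-run consumes one run credit.
  const runCreditsRequired = input.selectedPageRuns;
  if (!input.hasApiKey) return { pricingStatus: "anonymous_pricing_preview", runCreditsRequired };
  if (input.runBalance === null) return { pricingStatus: "pricing_unavailable", runCreditsRequired };
  if (input.runBalance >= runCreditsRequired) {
    return { pricingStatus: "authenticated_enough_runs", runCreditsRequired, requiresPurchaseBeforeRun: false };
  }
  return {
    pricingStatus: "authenticated_insufficient_runs",
    runCreditsRequired,
    runsShortfall: runCreditsRequired - input.runBalance,
  };
}
```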

Closed-set `availableActions` (5): `confirm_run`, `upgrade_plan`,
`buy_run_pack`, `pay_per_run`, `sign_in`.

Forwarded `X-Api-Key` reads the caller's wallet read-only — quota is NOT
consumed by a preview. Internal cost terms (tokens, model cost, browser
cost, worker cost, server cost) are NEVER returned.

Stage B (truthful default): bounded HTTP/static per-route discovery via
Cloudflare Worker fetch + regex extraction. NO Playwright, NO real
browser, NO JavaScript execution.
discoveryEngine=cloudflare_worker_http_fetch_static_html. Stage B
discoveryMethod values: http_route_fetch / static_html_extraction /
static_html_fallback / sitemap_only / auth_blocked / error. Caller
passes scope (all / module / route_group / route_only), module (closed-set
21 module ids), routes[]. Response includes moduleOptions[] (per-module
page-run + run-credit estimate) and controlInventory.byRouteSummary[]
(typed counts + closed-set discoveryMethod + discoveryLimitations[]).

Stage C (env-flag gated, STAGE_C_BROWSER_PREVIEW_ENABLED): real-browser
preview via the Testorax Hetzner Playwright runner. Caller passes
discoveryDepth=real_browser_preview to enqueue a preview job
(mode=campaign_preview_browser). Response includes capabilities.previewJobId
(shape prv_<22>); poll GET /api/campaigns/preview/jobs/:id (or MCP tool
campaign_preview_job) for the per-route hydrated control inventory.
discoveryEngine=testorax_runner_playwright_preview.

Stage C invariants HARD (runner-enforced): chargedRunCredit:0,
consumeQuota:false, submitForms:false, clickDestructive:false,
clickPayment:false, sendEmail:false, uploadFiles:false,
expandDropdowns:false, openModals:false, useAuthProfile:false. Caps:
3 routes default, 10 hard cap, 20s per-route, 60s total, 5 screenshots,
256 KB output. When env flag is off, the synchronous response carries
realBrowserFallbackReason=feature_flag_disabled — never silently
presents static as browser.
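A runner-side guard for the hard invariants can compare each flag against a frozen table before Chromium launches. This is a sketch. The flag names, values and route caps come from the paragraph above. `PreviewJobConfig`, `checkStageCJob` and `capRoutes` are hypothetical names, and so is the choice to report violations as strings.

```typescript
// Hard invariants listed above; every flag must hold exactly this value.
const STAGE_C_INVARIANTS = {
  chargedRunCredit: 0,
  consumeQuota: false,
  submitForms: false,
  clickDestructive: false,
  clickPayment: false,
  sendEmail: false,
  uploadFiles: false,
  expandDropdowns: false,
  openModals: false,
  useAuthProfile: false,
} as const;

const DEFAULT_ROUTE_LIMIT = 3;
const ROUTE_HARD_CAP = 10;

interface PreviewJobConfig {
  invariants: Record<string, unknown>;
  routes: string[];
}

// Returns every violation; an empty list means the job may launch Chromium.
function checkStageCJob(job: PreviewJobConfig): string[] {
  const violations: string[] = [];
  for (const [key, expected] of Object.entries(STAGE_C_INVARIANTS)) {
    if (job.invariants[key] !== expected) violations.push(`${key} must be ${String(expected)}`);
  }
  if (job.routes.length > ROUTE_HARD_CAP) violations.push(`at most ${ROUTE_HARD_CAP} routes`);
  return violations;
}

// Trims a route list to the requested limit (default 3), never above the hard cap.
function capRoutes(routes: string[], limit: number = DEFAULT_ROUTE_LIMIT): string[] {
  return routes.slice(0, Math.min(limit, ROUTE_HARD_CAP));
}
```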

Closed-set additions Stage C: discoveryMethod adds real_browser_preview /
real_browser_preview_hydration_failed / real_browser_preview_capped.
discoveryDepth adds real_browser_preview. discoveryLimitations adds
dropdown_options_not_expanded / modal_contents_not_opened.
discoveryEngine (2): cloudflare_worker_http_fetch_static_html /
testorax_runner_playwright_preview. previewJobStatus (6): queued /
running / complete / failed / capped / feature_disabled.
realBrowserFallbackReason (7).

Stage C.1 ships the Hetzner Playwright runner consumer
(apps/worker/src/jobs/campaignPreviewBrowserJob.ts). It validates HARD
invariants before launching Chromium, navigates each selected route,
waits for hydration, and reads visible-DOM controls without clicking
or submitting anything. Result is written to context_json.previewJobResult
and returned through GET /api/campaigns/preview/jobs/:id. A
JS-rendered fixture endpoint at /api/campaigns/preview/fixture/js-rendered
proves the static-vs-browser difference: static sees ~1 control (loading
placeholder), browser preview sees the hydrated inventory after JS runs.

Stage D ships authenticated real-browser preview discovery: pass
authScope="authenticated_preview" + authProfileId="login_<22>" (Login
Memory profile reference) on POST /api/campaigns/preview. NEVER pass
raw cookies / passwords / storageState / tokens — rejected as
forbidden_auth_field. The Worker validates ownership before queuing
(login_memory_sessions.email must match the X-Api-Key caller's email).
The Hetzner runner is the ONLY consumer that decrypts the vault blob.
Per-route authDiscoveryStatus closed-set: not_requested / authenticated
/ redirected_to_login / unauthorized / forbidden / profile_missing /
profile_invalid / profile_expired / profile_forbidden / unknown.
discoveryMethod flips to real_browser_preview_authenticated only for
routes inspected with auth + hydrated. Invariants HARD:
chargedRunCredit:0, no form submission, no destructive/payment/email.
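Agents can build the Stage D request so it only ever carries a profile reference. This is a sketch. `authScope`, `authProfileId`, `targetUrl` and the `login_<22>` shape come from the text. The forbidden-field list here is illustrative only, and the 22-character alphabet is assumed. Sending `discoveryDepth=real_browser_preview` alongside is also an assumption, based on Stage D extending the Stage C real-browser path.

```typescript
// Illustrative only: the server's exact forbidden_auth_field set may differ.
const FORBIDDEN_AUTH_FIELDS = ["cookies", "password", "storageState", "token", "accessToken"];
// Assumed shape check: "login_" followed by 22 URL-safe characters.
const AUTH_PROFILE_ID = /^login_[A-Za-z0-9_-]{22}$/;

interface AuthPreviewRequest {
  targetUrl: string;
  authScope: "authenticated_preview";
  authProfileId: string;
  discoveryDepth: "real_browser_preview"; // assumed to be required for Stage D
}

// Refuses locally if raw credentials were supplied; extra fields are never forwarded.
function buildAuthPreviewRequest(
  targetUrl: string,
  authProfileId: string,
  extra: Record<string, unknown> = {},
): AuthPreviewRequest {
  const forbidden = Object.keys(extra).filter((k) => FORBIDDEN_AUTH_FIELDS.includes(k));
  if (forbidden.length > 0) throw new Error(`forbidden_auth_field: ${forbidden.join(", ")}`);
  if (!AUTH_PROFILE_ID.test(authProfileId)) throw new Error("authProfileId must look like login_<22>");
  return { targetUrl, authScope: "authenticated_preview", authProfileId, discoveryDepth: "real_browser_preview" };
}
```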

Stage E ships live SSE progress telemetry for preview jobs. Three
endpoints: GET /api/campaigns/preview/jobs/:id/progress (JSON snapshot,
120/min/IP), GET /api/campaigns/preview/jobs/:id/events (Server-Sent
Events with closed-set event kinds: progress / route_started /
route_completed / job_capped / job_failed / job_complete / heartbeat;
stream closes on isFinal=true or 60s cap), website live view at
/campaigns/preview/jobs/:id. CLI: `testorax campaign watch <id>`
renders progress bar + ETA + status. MCP: campaign_progress tool
returns the snapshot. Progress contract closed-set status (8): queued
/ running / inspecting / testing / capped / failed / complete /
stopped. ETA is null when not enough data — never fabricated.
chargedRunCredit:0, consumeQuota:false on every snapshot.
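In a browser, `EventSource` handles the event-stream framing. In other runtimes, a client reading `/events` as text needs a small parser. This sketch follows the standard SSE framing: `event:` and `data:` fields, with a blank line ending an event. It keeps only the closed-set event kinds listed above. It assumes each `data:` payload is JSON and that `isFinal` appears in it. Neither detail is specified here.

```typescript
const PREVIEW_EVENT_KINDS = new Set([
  "progress", "route_started", "route_completed", "job_capped", "job_failed", "job_complete", "heartbeat",
]);

interface PreviewEvent {
  kind: string;
  data: unknown;
}

// Parses text/event-stream content into events, keeping only closed-set kinds.
// Events with no data lines are dropped, as in the SSE dispatch rules.
function parseSse(text: string): PreviewEvent[] {
  const events: PreviewEvent[] = [];
  let kind = "message";
  let dataLines: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    if (rawLine === "") {
      if (dataLines.length > 0 && PREVIEW_EVENT_KINDS.has(kind)) {
        events.push({ kind, data: JSON.parse(dataLines.join("\n")) }); // assumes JSON payloads
      }
      kind = "message";
      dataLines = [];
    } else if (rawLine.startsWith("event:")) {
      kind = rawLine.slice(6).trim();
    } else if (rawLine.startsWith("data:")) {
      dataLines.push(rawLine.slice(5).replace(/^ /, ""));
    }
    // Comment lines (starting with ":") and id/retry fields are ignored.
  }
  return events;
}

// Assumed payload convention: the final event carries isFinal: true.
function isFinalEvent(e: PreviewEvent): boolean {
  return typeof e.data === "object" && e.data !== null && (e.data as { isFinal?: unknown }).isFinal === true;
}
```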

Stage F ships Chrome parity + runner-mismatch classification. Preview
jobs now carry a parity surface: GET /api/campaigns/preview/jobs/:id
returns jobParity + routeParities[]. The compact /progress endpoint
returns parityHint when the job completes. MCP campaign_parity tool
returns the parity-only projection. CLI `testorax campaign job <id>`
renders the parity panel. parityStatus closed-set (5+2 retained for
back-compat): not_checked / internal_compared / chrome_checked /
mismatch_detected / insufficient_evidence — `chrome_checked` is NEVER
emitted in Stage F because Chrome Live is not wired in this environment.
mismatchClassification closed-set (12): none_detected /
static_vs_browser_mismatch / runner_vs_chrome_mismatch /
selector_mismatch / route_mismatch / hydration_mismatch /
auth_state_mismatch / viewport_mismatch / click_actionability_mismatch /
network_environment_mismatch / chrome_confirmation_needed /
insufficient_evidence. chromeParityAvailable is hard-false. Parity
computation is a pure read over already-stored runner output and the
static control inventory captured at queue time. NO credit deducted, NO
wallet write, NO quota consumed. A `mismatch_detected` is NOT
automatically an app bug — agents MUST read mismatchClassification +
runnerMismatchPossible before patching production code. When
mismatchClassification === 'static_vs_browser_mismatch', the static
preview undercounted a JS-rendered SPA: rerun with real-browser depth
instead of patching the app.
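The parity rules above can be encoded so that no branch recommends a code change. This is a sketch. The 12 classifications and `runnerMismatchPossible` come from the text. The advice labels and the routing of each classification to an advice label are hypothetical.

```typescript
type MismatchClassification =
  | "none_detected"
  | "static_vs_browser_mismatch"
  | "runner_vs_chrome_mismatch"
  | "selector_mismatch"
  | "route_mismatch"
  | "hydration_mismatch"
  | "auth_state_mismatch"
  | "viewport_mismatch"
  | "click_actionability_mismatch"
  | "network_environment_mismatch"
  | "chrome_confirmation_needed"
  | "insufficient_evidence";

type ParityAdvice =
  | "no_action"
  | "rerun_with_real_browser_depth"
  | "confirm_in_real_chrome"
  | "investigate_before_patching";

// There is deliberately no "patch" outcome: a mismatch alone never justifies
// changing production code.
function adviseOnParity(
  parityStatus: string,
  classification: MismatchClassification,
  runnerMismatchPossible: boolean,
): ParityAdvice {
  if (parityStatus !== "mismatch_detected" || classification === "none_detected") return "no_action";
  if (classification === "static_vs_browser_mismatch") return "rerun_with_real_browser_depth";
  if (
    runnerMismatchPossible ||
    classification === "runner_vs_chrome_mismatch" ||
    classification === "chrome_confirmation_needed"
  ) {
    return "confirm_in_real_chrome";
  }
  return "investigate_before_patching";
}
```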

Stage G.0 ships the Chrome Live MCP Bridge — the first honest path to
parityStatus='chrome_checked'. Local stdio MCP tool
`chrome_capture_for_job` and CLI `testorax campaign chrome-capture` spawn
the operator's installed Google Chrome stable via raw CDP, open the
target URL, capture a redacted DOM control inventory, and POST it to
`POST /api/campaigns/preview/jobs/:id/chrome-evidence`. Worker compares
runner output vs real Chrome and emits chrome_checked when both observed
data, with closed-set classifications: none_detected /
runner_vs_chrome_mismatch / route_mismatch / auth_state_mismatch /
viewport_mismatch / network_environment_mismatch /
insufficient_evidence. chromeParityAvailable flips true job-wide. The
ingestion endpoint rejects forbidden fields by NAME (cookies /
storageState / Authorization / bearer / password / accessToken /
refreshToken / idToken / apiKey / secret / csrfToken / requestBodies /
responseBodies). Hosted MCP does NOT expose chrome_capture_for_job —
it cannot drive a real Chrome on the operator's machine. NO chargeable
run created, NO credit deducted, NO wallet/ledger write, NO quota
consumed; ingestion only updates the existing preview-job context_json
in place.
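A caller assembling its own evidence payload may want to pre-scan it for the forbidden field names before POSTing. This sketch walks a JSON value recursively and reports the path of every key on the list. The case-insensitive match and the `findForbiddenFields` name are assumptions; the server-side name check remains authoritative.

```typescript
// Field names the ingestion endpoint rejects, copied from the list above.
const FORBIDDEN_FIELD_NAMES = [
  "cookies", "storageState", "Authorization", "bearer", "password",
  "accessToken", "refreshToken", "idToken", "apiKey", "secret",
  "csrfToken", "requestBodies", "responseBodies",
];
// Assumption: compare case-insensitively so "authorization" or "Cookies" is caught too.
const FORBIDDEN_LOWER = new Set(FORBIDDEN_FIELD_NAMES.map((n) => n.toLowerCase()));

// Returns a JSON-path-like string for every forbidden key found anywhere in the value.
function findForbiddenFields(value: unknown, path = "$"): string[] {
  const hits: string[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => hits.push(...findForbiddenFields(item, `${path}[${i}]`)));
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      const childPath = `${path}.${key}`;
      if (FORBIDDEN_LOWER.has(key.toLowerCase())) hits.push(childPath);
      hits.push(...findForbiddenFields(child, childPath));
    }
  }
  return hits;
}
```

A non-empty result means the payload should not be sent.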

Stage G.1 ships the Testorax Chrome Proof Bridge browser extension MVP
(`apps/chrome-proof-bridge/`). MV3, minimal permissions (activeTab +
scripting + storage), host permission limited to https://testorax.com/*,
extension name "Testorax Chrome Proof Bridge". The extension uses the
same Chrome evidence contract as Stage G.0 with the new
source='chrome_extension' value (CHROME_EVIDENCE_SOURCES widened from
1 → 2). The popup runs `chrome.scripting.executeScript` with a function
reference (NOT a code string) on user click against the active tab,
walks visible interactive controls only, and POSTs to the same
ingestion endpoint. The extension never reads cookies, localStorage,
sessionStorage, Authorization/Bearer headers, passwords (counted only
— input.value is never read), tokens, request bodies, or response
bodies. The Stage G.0 MCP/CLI bridge is preserved verbatim — both
paths produce chrome_checked, and a job without either still does NOT
emit chrome_checked. Making the Chrome Web Store listing public remains
operator-gated.

Sandbox mutation flow and full campaign execution ship in later stages.

Closed-set proof-scope labels (16): `route_render`, `visible_ui_click`,
`modal_open`, `field_visible`, `form_fill`, `search_filter`,
`save_persistence`, `backend_api`, `cross_module_reflection`, `mutation`,
`cleanup`, `partial`, `blocked`, `false_positive_likely`,
`runner_mismatch`, `chrome_confirmation_needed`.

Runner-mismatch kinds (5): `route_loaded_in_runner_not_chrome`,
`control_visible_in_chrome_not_runner`,
`runner_click_failed_chrome_click_works`,
`api_proof_without_visible_ui_proof`, `selector_mismatch_likely`.

Full walkthrough at https://testorax.com/docs/campaign-preview

## Assertion helpers (DSL)

Closed set of 19 helper kinds: `page_should_load`, `text_should_appear`, `text_should_not_appear`, `button_should_be_clickable`, `form_should_submit`, `url_should_include`, `element_should_exist`, `element_should_not_exist`, `console_should_be_clean`, `network_should_be_clean`, `no_4xx_5xx_requests`, `no_visible_error`, `toast_should_appear`, `table_should_have_rows`, `modal_should_open`, `modal_should_close`, `input_should_accept_text`, `select_should_change`, `checkbox_should_toggle`. Each compiles deterministically to existing engine `TestStep` actions; no new `StepAction` values are introduced.

Validate via `POST /api/scenario-templates/validate-assertion` (public, no auth). Body: `{ helper: { kind, ...fields } }`. Refusal codes: `unknown_helper_kind` / `missing_required_field` / `invalid_field_shape` / `forbidden_value_shape` / `too_many_helpers`. 50-helper hard cap.
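Two of the refusals can be checked locally before calling the endpoint: an unknown kind and the 50-helper cap. This sketch does only that and builds the documented `{ helper: { kind, ...fields } }` body. `validateAssertionBody` and `precheckHelpers` are hypothetical names; the `substring` field comes from the CLI example below. Per-kind required fields are not listed here, so `missing_required_field`, `invalid_field_shape`, and `forbidden_value_shape` stay server-side.

```typescript
// Closed set of 19 helper kinds, copied from the list above.
const HELPER_KINDS: ReadonlySet<string> = new Set([
  "page_should_load", "text_should_appear", "text_should_not_appear",
  "button_should_be_clickable", "form_should_submit", "url_should_include",
  "element_should_exist", "element_should_not_exist", "console_should_be_clean",
  "network_should_be_clean", "no_4xx_5xx_requests", "no_visible_error",
  "toast_should_appear", "table_should_have_rows", "modal_should_open",
  "modal_should_close", "input_should_accept_text", "select_should_change",
  "checkbox_should_toggle",
]);
const MAX_HELPERS = 50;

type Helper = { kind: string } & Record<string, unknown>;
type Precheck =
  | { ok: true }
  | { ok: false; code: "too_many_helpers" | "unknown_helper_kind"; index?: number };

// Builds the documented request body shape: { helper: { kind, ...fields } }.
function validateAssertionBody(kind: string, fields: Record<string, unknown> = {}) {
  return { helper: { kind, ...fields } };
}

// Local precheck for the two refusals decidable without the server.
function precheckHelpers(helpers: ReadonlyArray<Helper>): Precheck {
  if (helpers.length > MAX_HELPERS) return { ok: false, code: "too_many_helpers" };
  for (let i = 0; i < helpers.length; i++) {
    if (!HELPER_KINDS.has(helpers[i].kind)) {
      return { ok: false, code: "unknown_helper_kind", index: i };
    }
  }
  return { ok: true };
}
```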

Companion release: 12 new scenario templates across 6 new categories (`dashboard`, `admin`, `proof`, `fix_check`, `booking`, `console_clean`), bringing the registry to 57 templates. A template generation that yields 0 steps is refused with `invalid_template_generation`.

### MCP + CLI helper

Use BEFORE submitting a custom-scenario run — no API key, no run created, no quota burn.

**MCP tool** (stdio + hosted): `validate_assertion`. Args `{ kind, fields }`. Returns `{ ok, contractVersion, compiledSteps }` on success; `{ ok:false, code, field?, message, exampleValid }` on failure.

**CLI**: `testorax assertion validate <kind> [--field key=value …]` · `--json '{...}'` · `--file helper.json` · `--stdin`. Alias `compile`. Exit codes: 0 OK, 2 usage, 3 invalid helper (corrected example printed). NO API key required.

Copy-paste examples:

```
testorax assertion validate text_should_appear --field text="Welcome back"
testorax assertion validate console_should_be_clean
echo '{"kind":"url_should_include","substring":"/dashboard"}' | testorax assertion validate --stdin
```

Full walkthrough at https://testorax.com/docs/scenario-templates


## Campaign Fix Check loop (Stage J)

Verified-rerun automation for Stage I campaigns. After fixing code, call
POST /api/campaigns/:id/fix-check with one of 5 scopes (issue / page_run
/ patch_batch / failed_routes / full_campaign). The Hetzner runner reruns
the same routes, and the comparator emits one of 13 closed-set verdicts:
verified_fixed, still_failing, partially_fixed, regression_detected,
inconclusive, blocked_auth, blocked_safety, runner_mismatch_possible,
chrome_confirmation_needed, chrome_checked_match, chrome_checked_mismatch,
insufficient_evidence, not_rerun_yet.

Idempotent on (sourceCampaignId, scope, sourceXxxId). Charges 1 run
credit per page-run (admins bypass the charge). Returns HTTP 402
insufficient_run_credits before any credit is deducted.
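A small sketch of the credit rule, useful for estimating cost before calling the endpoint. It assumes the 402 check covers every page-run in the fix-check before anything is deducted, which is how the paragraph above reads. `preflightFixCheck` and its result shape are hypothetical; the server remains the source of truth for balances and admin status.

```typescript
// Hypothetical preflight mirroring the documented billing rule:
// 1 run credit per page-run, admins bypass, 402 before any deduction.
type FixCheckPreflight =
  | { ok: true; creditsToCharge: number }
  | { ok: false; httpStatus: 402; code: "insufficient_run_credits"; required: number; available: number };

function preflightFixCheck(pageRuns: number, availableCredits: number, isAdmin: boolean): FixCheckPreflight {
  if (!Number.isInteger(pageRuns) || pageRuns < 0) {
    throw new RangeError("pageRuns must be a non-negative integer");
  }
  const required = isAdmin ? 0 : pageRuns; // 1 credit per page-run; admins are not charged
  if (required > availableCredits) {
    // Refuse the whole fix-check up front: nothing has been deducted yet.
    return { ok: false, httpStatus: 402, code: "insufficient_run_credits", required, available: availableCredits };
  }
  return { ok: true, creditsToCharge: required };
}
```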

Read endpoints: GET /api/fix-checks/:id, GET /api/fix-checks/:id/progress.

CLI: testorax campaign fix-check <campaignId>
[--issue|--page-run|--patch-batch|--failed-routes|--full-campaign];
testorax campaign-fix-check status|watch <fixCheckId>.

MCP tools: fix_check_start, fix_check_status, fix_check_progress,
fix_check_verdict (hosted + stdio).

Strict rule: do not claim verified_fixed without a deployed rerun. The
runner reruns the SAME routes against the actual app; only the comparator
on real evidence can decide verified_fixed.

Walkthrough: /docs/fix-check


## Marketing + discoverability (2026-05-07)

- /is-testorax-legit — trust page
- /how-testorax-works — 7-step pipeline
- /no-code-access-needed — Testorax tests running URLs, no codebase access required
- /safe-browser-testing — Safe Test Mode contract
- /proof-packet — proof contract
- /ai-fix-prompt — 7 closed-set branches
- /fix-check — 13 closed verdicts
- /testorax-for-claude-code, /testorax-for-cursor, /testorax-for-codex
- /mcp-testing-for-ai-agents — MCP catalog
- /alternatives + /testorax-vs-{testsprite,magicpod,bugbug,reflect}
- /docs/mobile-native-qa — future-track honest disclosure

## Refund tooling (Stage P.2 Half 2)

- POST /api/billing/refunds/request — auto under $30, manual queue ≥ $30
- GET  /api/billing/refunds/:refundId
- GET  /api/account/refunds?email=
- /account/wallet#refunds — customer-facing history
- /refund-policy — public policy + 30-day under-$30 quick-refund language
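The routing rule for `POST /api/billing/refunds/request` comes down to one boundary: exactly $30 goes to the manual queue. A minimal sketch, assuming amounts are handled as integer cents; `routeRefund` is hypothetical and does not model the 30-day window.

```typescript
// $30 threshold expressed in cents (assumption: amounts are integer cents).
const AUTO_REFUND_LIMIT_CENTS = 3_000;

type RefundRoute = "auto" | "manual_queue";

// Hypothetical helper: under $30 is auto-refunded, $30 or more goes to manual review.
function routeRefund(amountCents: number): RefundRoute {
  if (!Number.isInteger(amountCents) || amountCents <= 0) {
    throw new RangeError("amountCents must be a positive integer");
  }
  return amountCents < AUTO_REFUND_LIMIT_CENTS ? "auto" : "manual_queue";
}
```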

## Proof Depth Telemetry
Fetch the proof packet to see proofDepthTelemetry.{visual,network,persistence}.
- Screenshot captured is NOT visual diff
- Network summary is NOT raw HAR
- Write observed is NOT persistence verified
- Readback verified is stronger than UI-only success
- Do not claim backend persistence without readback proof

## Autonomous Config Discovery
API endpoint: POST /api/config/discover
CLI: `testorax config discover <targetUrl>` (flags: --auth-session, --run-id, --route, --qa-prefix, --from-report, --from-proof, --safe, --cleanup, --json)
MCP tool: `discover_config` (hosted + stdio)
Shared builder: `buildConfigDiscovery()` from `@testorax/shared/configDiscovery`

Drafts only — never auto-runs.
- destructiveAllowed=false by default
- hardDeleteAllowed=false by default
- requiresConfirmation=true on every draft
- Confirm routes and selectors before running CRUD
- Provide safeMutationPolicy + cleanupPolicy before running mutation
- Use QA prefix for cleanup
- Do not claim autonomous CRUD coverage from draft config alone
- Run `audit preview` after discovery to confirm

## Deployment Awareness
Launch readiness emits a `deploymentAwareness` block (`status` closed set: same_target / target_mismatch / stale_proof / sandbox_or_staging_only / preview_only / mixed_targets / caller_declared / deploy_metadata_supplied / production_unconfirmed / unknown).

Rules:
- Same-target proof is stronger than unrelated proof.
- Sandbox/staging/preview proof is NOT production proof.
- Stale proof is NOT launch proof.
- Caller-supplied deploy metadata (--deploy-provider, --deploy-environment, --deploy-branch, --deploy-commit, --deploy-id) is NOT independently verified.
- GO_AFTER_DEPLOY means fix/proof may be good but production is not yet proven.
- Only use `--deploy-environment production` when the run target AND the deployment really match.

## Confirmed Config Execution Gate
Endpoint: POST /api/config/confirm
CLI: testorax config confirm <targetUrl> --from-draft draft.json --confirm-draft --confirm-mutation-risk --confirm-cleanup-required --json
MCP tool: confirm_config

Honesty rules:
1. Discovery creates drafts — drafts do NOT execute.
2. Confirmation validates + acknowledges risk — STILL does NOT execute.
3. Audit run executes only with a valid, unexpired confirmedConfigToken.
4. Mutations require safeMutationPolicy + cleanupPolicy.
5. hard_delete is BLOCKED by default. destructiveAllowed=false by default.
6. Cleanup proof must be checked AFTER any mutating execution.
7. Confirmed token expires in 30 minutes.
8. Confirmation does NOT prove CRUD coverage — only proof packets from real runs do.

Batch 2: Audit run now dispatches confirmed tokens:
- workflow type → starts workflow_test (caller must pass draftConfig in run body — token is small metadata only)
- crud type → starts full_crud_e2e (hard_delete BLOCKED, destructiveAllowed FORCED false; mutation requires safeMutationPolicy + cleanupPolicy at run-time)
- campaign type → still unsupported (campaign_execution_not_supported_from_confirmed_config)

Run-time gates: targetUrl + authSessionId must match token; destructiveAllowed=true refused; hard_delete refused; mutation requires safeMutationPolicy + cleanupPolicy; billing preserved through startRunInternal.

After execution: inspect cleanupTelemetry + proofDepthTelemetry.persistence in proof packet. Do not claim CRUD coverage without successful run + cleanup proof.
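The run-time gates above read as an ordered checklist. This sketch encodes them as a pure function. The token and request shapes, the CRUD operation names, the check order, and every reason string except `campaign_execution_not_supported_from_confirmed_config` are assumptions for illustration. Billing and the readback-config gate (see CRUD Persistence Readback) are deliberately left out.

```typescript
// Hypothetical shapes: field names below are illustrative, not the wire format.
type ConfirmedToken = {
  type: "workflow" | "crud" | "campaign";
  targetUrl: string;
  authSessionId: string | null;
  issuedAtMs: number;
};

type CrudOp = "create" | "read" | "update" | "delete" | "hard_delete";

type RunRequest = {
  targetUrl: string;
  authSessionId: string | null;
  destructiveAllowed: boolean;
  operations: ReadonlyArray<CrudOp>;
  safeMutationPolicy?: object;
  cleanupPolicy?: object;
};

const CONFIRMED_TOKEN_TTL_MS = 30 * 60 * 1000; // confirmed token expires in 30 minutes

// Returns null when every gate passes, otherwise a blocked reason.
// Only campaign_execution_not_supported_from_confirmed_config is a documented code;
// the other reason strings are placeholders.
function checkRunGates(token: ConfirmedToken, req: RunRequest, nowMs: number): string | null {
  if (nowMs - token.issuedAtMs >= CONFIRMED_TOKEN_TTL_MS) return "token_expired";
  if (token.type === "campaign") return "campaign_execution_not_supported_from_confirmed_config";
  if (req.targetUrl !== token.targetUrl) return "target_url_mismatch";
  if (req.authSessionId !== token.authSessionId) return "auth_session_mismatch";
  if (req.destructiveAllowed) return "destructive_allowed_refused";
  if (req.operations.includes("hard_delete")) return "hard_delete_refused";
  const mutates = req.operations.some((op) => op !== "read");
  if (mutates && (!req.safeMutationPolicy || !req.cleanupPolicy)) {
    return "mutation_requires_safe_mutation_and_cleanup_policy";
  }
  return null;
}
```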

## CRUD Persistence Readback
Proof packet now carries optional `persistenceReadback` (status closed set, per-resource breakdown).

Honesty rules:
- UI save success is NOT backend persistence proof.
- Write observed is NOT readback verified.
- Readback verified is required for CRUD persistence claims.
- Cleanup verified is separate from readback verified.
- Confirmed CRUD execution requires explicit readback config on every mutating resource (Batch 3 gate).
- Fetch proof packet or launch readiness to see persistenceReadback.
- Do not claim full CRUD coverage from config confirmation alone.

Confirmed CRUD audit-run will refuse with blockedReason `crud_execution_requires_readback_config` if any mutating resource lacks readback config.

## CRUD Readback Config (engine-side capture)
CrudConfig resources can now carry:
- `readbackMethod`: 'dom' | 'api' | 'report' | 'none'
- `readbackAssertions[]`: array of dom_visible / dom_text_includes / dom_attribute_equals / list_contains / list_excludes / api_status / api_field_present
- `readbackSelector`: shorthand for dom_visible
- `readbackUrl`: shorthand for api_status 200-299
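A sketch of how the readback fields fit together and how the `crud_execution_requires_readback_config` refusal described under CRUD Persistence Readback could be evaluated. The method and assertion-kind unions come from the list above. The `mutating` flag, the assertion payload field names (`selector`, `url`, `min`, `max`), and treating `readbackMethod: 'none'` as "no readback config" are assumptions.

```typescript
type ReadbackMethod = "dom" | "api" | "report" | "none";
type ReadbackAssertionKind =
  | "dom_visible" | "dom_text_includes" | "dom_attribute_equals"
  | "list_contains" | "list_excludes" | "api_status" | "api_field_present";

// Assertion payload fields (selector, url, min, max) are hypothetical.
type ReadbackAssertion = { kind: ReadbackAssertionKind } & Record<string, unknown>;

interface CrudResource {
  name: string;
  mutating: boolean; // hypothetical flag: does this resource create/update/delete?
  readbackMethod?: ReadbackMethod;
  readbackAssertions?: ReadbackAssertion[];
  readbackSelector?: string; // shorthand for dom_visible
  readbackUrl?: string;      // shorthand for api_status 200-299
}

// Expands the two shorthands into explicit assertions.
function effectiveAssertions(r: CrudResource): ReadbackAssertion[] {
  const out: ReadbackAssertion[] = [...(r.readbackAssertions ?? [])];
  if (r.readbackSelector) out.push({ kind: "dom_visible", selector: r.readbackSelector });
  if (r.readbackUrl) out.push({ kind: "api_status", url: r.readbackUrl, min: 200, max: 299 });
  return out;
}

// Every mutating resource needs readback config, or the run is refused.
// Assumption: readbackMethod 'none' counts as missing readback config.
function readbackGate(resources: CrudResource[]) {
  const missing = resources
    .filter((r) => r.mutating)
    .filter((r) => r.readbackMethod === "none" || effectiveAssertions(r).length === 0)
    .map((r) => r.name);
  return missing.length === 0
    ? null
    : { blockedReason: "crud_execution_requires_readback_config" as const, resources: missing };
}
```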

Engine support in this release:
- DOM readback assertions: deferred to next batch (Hetzner engine wiring)
- API readback assertions: NOT auto-executed by the engine — caller must inspect proof packet manually
- Cleanup verification: shipped via cleanupTelemetry

Honesty:
- UI save success is NOT backend persistence proof
- Write observed is NOT readback verified
- Readback config is required for mutating CRUD (Batch 1 gate)
- Engine outcome capture inside full_crud_e2e runner ships in a follow-up batch

## Agent Summary (unified compact JSON)
Endpoint: GET /api/runs/:id/agent-summary
CLI: testorax summary <runId> [--json]
MCP tool: get_agent_summary

Compact (<=12 KB) agent-readable envelope. Use this FIRST after a run completes — answers what happened, what is proven, what is NOT proven, doNotClaim guards, evidence links, AI fix prompt, recommended next action, AND Coverage Confidence Batch 1 blocks: coverageConfidence + routeDiscovery + spaAwareness + formIntelligence.

Honesty:
- screenshots[] is empty when no screenshots exist (never invented)
- persistenceReadbackAvailable is false unless persistenceReadback envelope exists
- engineCapturedOutcomesAvailable is false unless engine actually captured outcomes
- secretsRedacted is always true (literal); rawPayloadsExposed is always false (literal)

## Coverage Confidence (Batch 1, read-time inference)

Embedded inside the agent summary envelope. Batch 2 shipped Hetzner engine instrumentation: `inferred` is false when real browser evidence was captured (framenavigated route events, pushState/replaceState/popstate/hashchange transitions, count-only form metadata, selector shape classifier); otherwise the blocks fall back to inferred heuristics. unsafeToClaimAdditions are appended to doNotClaim automatically (deduplicated, capped at 12). Runtime captures NEVER read field values, page HTML, cookies, headers, or request/response bodies.

- coverageConfidence: overall (high|medium|low|unknown), coverageStrength (strong|moderate|weak|minimal|none), coverageType[] (route_render/interaction/form_interaction/mutation/cross_route/spa_navigation/deep_navigation), diagnostics[] (16-value closed set), metrics, riskFlags, reasons. Fast Bug Scan capped at moderate; low route count never reaches high; persistence-grade coverage requires explicit signals.

- routeDiscovery: also at GET /api/runs/:id/routes. totalDiscovered/totalVisited/totalBlocked, routes[] (cap 100) with source/visited/rendered/blocked/blockReason/shellOnly/spaTransition. spaDetected requires explicit signal. Same-host filter — never lists third-party hosts. CLI: `testorax routes <runId> [--json]`. MCP: `get_routes`.

- spaAwareness: framework detection (react|vue|angular|svelte|next|nuxt|unknown), spaTransitionsObserved count, deepSpaTraversalAttempted requires >=3 transitions. htmlSnippet capped at 1 KB — only framework markers, never user content.

- formIntelligence: formsDetected, formsAnalyzed (clamped: never exceeds formsDetected), submissionsAttempted, authFormsDetected, searchFormsDetected. Never claims submission success from detection-only evidence.

## Fix Verification (Regression Truth Classification — Batch 1)

Endpoint: POST /api/fix-verification {originalRunId, verificationRunId}
CLI: `testorax verify-fix <original> <verification> [--json]`
MCP tool: `verify_fix`

12-status closed set. Honesty rules:
- fixed_verified requires proof scope equal-or-stronger than original failure
- flaky requires ≥3 inconsistent reruns (NEVER declared from one rerun)
- selector_changed when route works but selector path differs
- proof_changed_but_not_verified when rerun used weaker proof
- prefer unable_to_verify over false certainty
- doNotClaim auto-appended into agent summary doNotClaim
- inferred:true (comparison is heuristic until full proof-scope-equality comparator ships)
- Read-only; no run created, no credit deducted

## Safe Test Data Mode

Endpoint: GET /api/runs/:id/safe-test-data
CLI: testorax safe-data <runId> [--json]
MCP tool: get_safe_test_data

4 modes: dry_run / fixture_only / sandbox_mutation / read_only

Safety gates (closed set):
- production_mutation_blocked
- payment_action_blocked
- email_send_blocked
- sms_send_blocked
- whatsapp_send_blocked
- cleanup_plan_missing
- marker_missing
- sandbox_not_certified
- destructive_action_blocked
- read_only_mode_blocked_mutation

Honesty rules (always):
- cleanupAttempted is NOT cleanupVerified
- Marker in request is NOT persistence proof (readback required)
- Production mutation is PERMANENTLY blocked (literal safety.productionMutationBlocked=true)
- Retained QA data reduces launch readiness
- Different markers between original/verification reduce fix-verification confidence

Cross-system:
- Agent Summary embeds safeTestData; doNotClaim auto-appended
- Proof Packet contract 1.13.0 carries top-level safeTestData
- Launch Readiness consumes downgrade-only (cleanup_failed → NO_GO; retainedCount on mutation scope → INSUFFICIENT_PROOF)
- Fix Verification detects marker mismatch between original/verification
- Engine-side marker injection ships in a future batch (Batch 1 is contract + builder + projection)
- Read-only; no run created, no credit deducted

## Visual / Mobile Layout Intelligence

Endpoint: GET /api/runs/:id/visual-layout
CLI: testorax visual <runId> [--json]
MCP tool: get_visual_layout

Operational layout intelligence (NOT pixel-perfect visual diff, NOT WCAG compliance).

Closed-set issue types (20): horizontal_overflow / clipped_content / offscreen_control / overlap_detected / zindex_overlay_block / hidden_interactive_control / modal_unusable / drawer_unusable / sticky_header_overlap / viewport_scroll_trap / layout_shift_high / tiny_tap_target / responsive_breakpoint_regression / text_cutoff / button_outside_viewport / interaction_blocked_visually / infinite_scroll_instability / collapsed_navigation / inaccessible_mobile_menu / excessive_zoom_required.

Closed-set severities: critical / high / medium / low / info.

Closed-set viewport presets: mobile_sm / mobile_lg / tablet / desktop / desktop_wide.

Honesty rules (always in doNotClaim):
- Visual issues are operational layout checks, NOT pixel-perfect visual regression
- Do not claim WCAG/accessibility compliance — only tap-target size and overflow are checked
- If a viewport was NOT tested, layout there is UNVERIFIED

Cross-system:
- Agent Summary embeds visualLayout; doNotClaim auto-appended; interactionBlocking > 0 adds an unsafeToClaim line
- Proof Packet contract 1.14.0 carries top-level visualLayout
- Launch Readiness consumes downgrade-only (interactionBlocking>0 on launch → INSUFFICIENT_PROOF; highSeverity≥3 on launch → INSUFFICIENT_PROOF)
- Coverage Confidence adds mobile_viewport_untested / desktop_viewport_untested when only one end of the viewport spectrum (mobile or desktop) was tested
- Batch 1 projects from existing click_audit_findings mobile_* errorTypes; engine-side visual scan deferred
- Read-only; no run created, no credit deducted
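The two launch-readiness thresholds differ in kind: any interaction-blocking issue downgrades, while high-severity issues downgrade only at 3 or more. A minimal sketch; the input shape, the function name, and both reason strings are hypothetical placeholders, and the function only ever adds downgrade reasons.

```typescript
// Hypothetical input: counts taken from the visualLayout block.
interface VisualLayoutCounts {
  interactionBlocking: number;
  highSeverity: number;
}

// Downgrade-only: returns reasons to cap a launch verdict at INSUFFICIENT_PROOF.
// An empty array means visual layout adds no downgrade; it never upgrades a verdict.
function visualLaunchDowngradeReasons(v: VisualLayoutCounts, scope: string): string[] {
  if (scope !== "launch") return [];
  const reasons: string[] = [];
  if (v.interactionBlocking > 0) reasons.push("visual_interaction_blocking"); // placeholder code
  if (v.highSeverity >= 3) reasons.push("visual_high_severity_issues");       // placeholder code
  return reasons;
}
```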

## Patch Batch Verification State Tracking

Endpoints: GET /api/runs/:id/patch-batches + GET /api/patch-batches/:id + POST /api/patch-batches/:id/verification
CLI: testorax patch-batches <runId> | testorax patch-batch <patchBatchId>
MCP: get_patch_batches + get_patch_batch (hosted + stdio)

v1 = run-level computed-only (persisted:false). Closed-set states (10): unverified / verification_pending / fixed_verified / still_reproduces / partially_fixed / unable_to_verify / proof_weakened / flaky / regression_reopened / superseded. Honesty: fixed_verified ONLY when Fix Verification returns fixed_verified; flaky requires >=3 inconsistent reruns; regression_reopened requires prior fixed_verified history. NEVER infers fixed from fewer findings.

## App Intelligence Map / Proof Graph Dashboard

Endpoint: GET /api/runs/:id/app-map
CLI: testorax app-map <runId> [--json]
MCP tool: get_app_map (hosted + stdio)
UI: /runs/:id/app-map

Run-level read-time projection that fuses routeDiscovery + coverageConfidence + spaAwareness + formIntelligence + visualLayout + safeTestData + fixVerification + findings into graph nodes/edges + summary + classification. NO new engine instrumentation. Run-level ONLY (not project-level).

Closed-set vocabularies:
- 2 scopes: run / project (Batch 1 = run only)
- 5 overall-health levels: strong / moderate / weak / critical / unknown
- 8 node types: route / module / flow / control / form / issue / proof / blocked_area
- 7 node statuses: pass / fail / warning / blocked / untested / inconclusive / unknown
- 6 node colors: green / red / yellow / gray / blue / purple
- 6 severities: critical / high / medium / low / info / unknown
- 6 edge types: navigation / spa_transition / form_submit / flow / issue_affects / proof_supports
- 3 edge confidences: high / medium / low

Color classification (CONSERVATIVE):
- green = tested + ≥1 control observed + no medium-or-higher finding
- red = bug / high-risk on visited route. NEVER on auth-blocked
- yellow = warning / weak proof
- gray = untested / unknown — NEVER claimed as pass
- blue = auth-blocked / guardrail-blocked. NEVER red/bug
- purple = fix-verification proof node

Overall health rules (CONSERVATIVE):
- 0 tested routes → overallHealth=unknown (NEVER strong)
- ≥3 bugs OR ≥2 high-risk → critical
- coverageConfidence=high AND 0 bugs AND <3 warnings → strong
- ≥1 bug OR ≥5 warnings → weak
- otherwise → weak

Honesty rules (always in doNotClaim):
- Run-level map only — does not represent the full app architecture
- Untested routes are UNVERIFIED (gray); do not claim coverage of gray nodes
- Auth-blocked routes (blue) are NOT product bugs — they require Login Memory or auth profile
- Visual layout issues are operational, NOT pixel-perfect regression
- Skipped destructive/payment actions remain blocked

Caps: 200 nodes, 500 edges, 10 clusters, 12 doNotClaim, 8 dataSources, 100 input routes.

Cross-system:
- Agent Summary carries compact appMap pointer {available, url, summary}
- inferred=true when any source block was inferred
- Read-only; no run created, no credit deducted

## GitHub PR Workflow

CLI: testorax pr-comment <runId> [--json] [--output comment.md]
Workflow template: .github/workflows/testorax-pr.yml
Docs: /docs/github-action.md

Honesty (CONSERVATIVE status mapping):
- 0 tests / 0 routes → statusForGithub='neutral' (NEVER 'success')
- PASS + testedRoutes >= 1 + coverage high/medium → 'success'
- FAIL with high/critical severity findings → 'failure'
- FAIL with only low/medium severity → 'warning'
- INCONCLUSIVE → 'neutral'; BLOCKED → 'warning'
- This is a comment generator, NOT a GitHub App
- No GitHub tokens stored by Testorax
- Markdown output capped at 10 KB
- CLI composes locally from existing endpoints; no new server route
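Because the CLI composes the comment locally, the status mapping above is a pure function of run data. This sketch encodes it. The function shares its name with the `statusForGithub` field but is hypothetical, as is the input shape. Three choices are assumptions the list does not settle: the zero-tests/zero-routes rule is checked first (so a BLOCKED run with 0 routes maps to neutral), PASS with low or unknown coverage maps to neutral (matching LOW_COVERAGE_PASS under Outcome Semantics), and FAIL with only info findings maps to warning.

```typescript
type RawOutcome = "PASS" | "FAIL" | "INCONCLUSIVE" | "BLOCKED";
type CoverageLevel = "high" | "medium" | "low" | "unknown";
type FindingSeverity = "critical" | "high" | "medium" | "low" | "info";
type GithubStatus = "success" | "failure" | "warning" | "neutral";

interface PrCommentInput {
  outcome: RawOutcome;
  testsRun: number;
  testedRoutes: number;
  coverage: CoverageLevel;
  findingSeverities: FindingSeverity[];
}

function statusForGithub(input: PrCommentInput): GithubStatus {
  // Zero tests or zero routes can never be reported as success.
  if (input.testsRun === 0 || input.testedRoutes === 0) return "neutral";
  if (input.outcome === "PASS") {
    // Assumption: PASS with low/unknown coverage is neutral, like LOW_COVERAGE_PASS.
    return input.coverage === "high" || input.coverage === "medium" ? "success" : "neutral";
  }
  if (input.outcome === "FAIL") {
    const severe = input.findingSeverities.some((s) => s === "high" || s === "critical");
    return severe ? "failure" : "warning";
  }
  return input.outcome === "BLOCKED" ? "warning" : "neutral"; // INCONCLUSIVE maps to neutral
}
```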

Comment contains: outcome / status / proof strength / coverage table,
top bugs (cap 5), warnings (cap 5), skipped, limitations, doNotClaim
(always includes "Do not claim coverage on untested routes"),
nextActions (includes Fix Check when failures), links to Report /
Proof Packet / Agent Summary / App Map.

Comment NEVER contains: API keys, cookies, storage state, passwords,
raw response bodies, DOM snippets.

## Agent-Native Workflow

8-step loop for coding agents (Claude / Codex / Cursor / MCP):
1. Scan (testorax scan <url> or POST /api/runs/start)
2. Read agent summary + App Map (testorax summary + app-map)
3. Patch from proof (not guesses)
4. Verify fix (testorax verify-fix <originalRunId> <newRunId>)
5. Review unsafeToClaim
6. Launch audit (GET /api/launch/readiness?runId=...)
7. PR comment if shipping via PR (testorax pr-comment)
8. Ship ONLY on Fix Verification = fixed_verified

6 honesty rules:
- 0 findings ≠ 100% coverage (read coverageConfidence)
- Untested ≠ tested-pass (gray on App Map is unverified)
- fixed_verified is the only proof a fix landed
- Auth-blocked is a missing Login Memory step, not an app bug
- UI render ≠ persistence (no readback_assert = no save proof)
- Do not weaken failing scenarios — same-test lock detects it

Read:
- /docs/agent-workflow — canonical 8-step loop
- /docs/vibe-coder-guide — 5-min quickstart for non-testers
- /docs/testing-philosophy — proof-based, not claim-based
- /docs/launch-audit-guide — "safe to launch?" verdict
- /docs/fix-verification-guide — 12-status closed set
- /docs/coverage-confidence — 4-tier reference
- /docs/app-map-guide — 6-color legend
- /docs/visual-layout-intelligence — operational layout, not pixel diff
- /docs/patch-batch-verification — 10-state batch lifecycle

## Outcome Semantics Hardening (No Fake Pass v1)

11-class closed set replaces shallow PASS/FAIL:
- VERIFIED_PASS — meaningful coverage + assertions + good proof
- SMOKE_PASS — entry loaded only; NOT release-ready QA
- LOW_COVERAGE_PASS — no bugs found, coverage low; broader test needed
- INCONCLUSIVE — partial/conflicting evidence
- BLOCKED_AUTH_REQUIRED — auth wall blocked most routes
- BLOCKED_SAFETY_POLICY — destructive/payment refused
- NOT_APPLICABLE — run mode doesn't match target (NOT a bug)
- TEST_DESIGN_WEAK — assertions absent or insufficient
- FAILURE_CONFIRMED — bug reproduced with strong proof
- FAILURE_SUSPECTED — bug signal with weak proof
- UNKNOWN — insufficient evidence to classify

Status mapping (PR comment):
- VERIFIED_PASS → success
- FAILURE_CONFIRMED → failure
- BLOCKED_* / TEST_DESIGN_WEAK / FAILURE_SUSPECTED → warning
- SMOKE_PASS / LOW_COVERAGE_PASS / INCONCLUSIVE / NOT_APPLICABLE / UNKNOWN → neutral

Launch Readiness downgrades (scope=launch, NEVER promotes):
- SMOKE_PASS → INSUFFICIENT_PROOF (semantic_outcome_smoke_only)
- LOW_COVERAGE_PASS → INSUFFICIENT_PROOF (semantic_outcome_low_coverage_for_launch)
- TEST_DESIGN_WEAK → INSUFFICIENT_PROOF (semantic_outcome_test_design_weak)
- NOT_APPLICABLE → INSUFFICIENT_PROOF (semantic_outcome_not_applicable_for_target)
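The status mapping and launch downgrades above are both plain lookup tables. Typing the status table as `Record<SemanticOutcome, …>` lets the TypeScript compiler flag a missing class if the closed set grows. The constant and function names are hypothetical; the class names, statuses, and reason codes come from the lists above.

```typescript
type SemanticOutcome =
  | "VERIFIED_PASS" | "SMOKE_PASS" | "LOW_COVERAGE_PASS" | "INCONCLUSIVE"
  | "BLOCKED_AUTH_REQUIRED" | "BLOCKED_SAFETY_POLICY" | "NOT_APPLICABLE"
  | "TEST_DESIGN_WEAK" | "FAILURE_CONFIRMED" | "FAILURE_SUSPECTED" | "UNKNOWN";

type GithubStatus = "success" | "failure" | "warning" | "neutral";

// PR-comment status for every class; Record<> forces the mapping to be exhaustive.
const PR_STATUS: Record<SemanticOutcome, GithubStatus> = {
  VERIFIED_PASS: "success",
  FAILURE_CONFIRMED: "failure",
  BLOCKED_AUTH_REQUIRED: "warning",
  BLOCKED_SAFETY_POLICY: "warning",
  TEST_DESIGN_WEAK: "warning",
  FAILURE_SUSPECTED: "warning",
  SMOKE_PASS: "neutral",
  LOW_COVERAGE_PASS: "neutral",
  INCONCLUSIVE: "neutral",
  NOT_APPLICABLE: "neutral",
  UNKNOWN: "neutral",
};

// Launch-scope downgrade reasons; classes not listed add no semantic-outcome downgrade.
const LAUNCH_DOWNGRADE_REASON: Partial<Record<SemanticOutcome, string>> = {
  SMOKE_PASS: "semantic_outcome_smoke_only",
  LOW_COVERAGE_PASS: "semantic_outcome_low_coverage_for_launch",
  TEST_DESIGN_WEAK: "semantic_outcome_test_design_weak",
  NOT_APPLICABLE: "semantic_outcome_not_applicable_for_target",
};

// Downgrade-only: returns null rather than ever promoting a verdict.
function launchDowngrade(outcome: SemanticOutcome, scope: string) {
  const reason = scope === "launch" ? LAUNCH_DOWNGRADE_REASON[outcome] : undefined;
  return reason ? { verdict: "INSUFFICIENT_PROOF" as const, reason } : null;
}
```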

Honesty rules:
- NEVER VERIFIED_PASS when testedRoutes=0
- NEVER VERIFIED_PASS when proofStrength=none
- NEVER VERIFIED_PASS when coverage low/unknown
- NEVER VERIFIED_PASS without meaningful assertions
- NEVER VERIFIED_PASS when fix verification absent
- Static-target CRUD failure is NOT_APPLICABLE, not FAILURE_*

## Target Capability Intelligence v1

Conservative read-time classifier. Inferred from existing telemetry only
(routeDiscovery, formIntelligence, spaAwareness, network requests,
authBlocked counts). NO new crawling.

Target types (closed set):
- static_site | crud_app | dashboard | ecommerce | docs | api_app | spa | marketing | unknown

Interaction levels: none | low | medium | high | unknown.

Surfaces:
- agent-summary.targetCapability — full envelope
- classifySemanticOutcome consumes formsDetected / mutationSurfaceDetected / apiSurfaceDetected (CRUD-on-static-target → NOT_APPLICABLE)
- testorax pr-comment — adds Target type + Interaction level rows
- App Intelligence Map — emits blue blocked_area node when CRUD attempted against static target

Honesty rules:
- paymentSurfaceDetected=true means PAYMENT ROUTES OBSERVED, NEVER payment flows tested
- Capability classifications are inferred from telemetry, not framework introspection
- Prefer UNKNOWN over false certainty

## Coverage Reality Layer v1

Explicit NOT REACHED + 10-state blocked-by closed set. Read-time projection over existing telemetry. No new engine instrumentation.

BLOCKED_STATES (10):
- blocked_by_auth | blocked_by_safety_policy | blocked_by_gating | blocked_by_missing_setup | blocked_by_product_gap | blocked_by_missing_credentials | blocked_by_login_wall | blocked_by_pin_wall | blocked_by_environment | blocked_by_unknown

NOT_REACHED_KINDS (11):
- authenticated_module | crud_create | crud_read | crud_update | crud_delete | cross_module_check | refresh_persistence_check | payment_flow | admin_panel | mobile_viewport | generic_route

Metrics:
- routesDiscovered / routesRendered / authenticatedRoutesReached
- crudCreateTested / crudReadTested / crudUpdateTested / crudDeleteTested
- crossModuleChecks=0 / refreshPersistenceChecks=0 (engine does not surface these in v1)
- visualProofAvailable / consoleChecked / networkChecked

Surfaces:
- agent-summary.coverageReality — full envelope (auto-appends doNotClaim, cap 14)
- App Intelligence Map — gray (not_reached) / blue (blocked) / yellow (partial proof) nodes
- testorax pr-comment — Coverage reality row + Not reached section
- Launch Readiness scope=launch — coverage_not_reached_for_launch + authenticated_coverage_not_reached blocking reasons
- Outcome Semantics — PASS + authBlockedCount >= 1 → SMOKE_PASS (NEVER VERIFIED_PASS)

Honesty rules:
- NEVER imply authenticated coverage without authenticated modules reached AND exercised
- NEVER imply CRUD coverage without CRUD operations executed
- NEVER imply persistence/cross-module proof without verified evidence
- Prefer explicit NOT REACHED over vague "clean"
- inferred=true in v1