Intent-Driven AI

Intent mode lets one goal-based spec run across whitelabelled or translated applications without copying the test for every label set.

The model is a bounded sidecar, not the source of pass/fail truth. It proposes a plan from ## Goal. WebTest AI then validates that every proposed action is allowed, the deterministic runner executes the validated actions, and ## Expected decides the outcome.

Bounded Goal Planner

The Bounded Goal Planner is the V2 intent execution contract. It accepts a natural-language ## Goal, target metadata, approved ## Data, the current URL, visible text, accessibility candidates, DOM action candidates, and optional reviewed memory. It returns strict JSON that uses only allowed actions: open, click, fill (with an approved value), select, wait, and non-destructive submit.

The runner validates every action before execution. It rejects unsupported actions, unapproved values, destructive intents, malformed JSON, and low-confidence choices. This keeps AI useful for intent resolution without turning the test runner into an open-ended browser agent.
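The rejection rules above can be sketched as a single validation pass. This is a minimal illustration, not WebTest AI's implementation. The plan shape (`confidence`, `actions`, and per-step `type`, `intent`, `dataKey`) and the destructive-word list are hypothetical.

```python
import json

# Allowed action types, taken from the Safety Model list in this page.
ALLOWED_ACTIONS = {"open", "click", "fill", "select", "wait", "submit"}
# Hypothetical destructive-word list; the real runner's rules may differ.
DESTRUCTIVE_WORDS = ("delete", "refund", "archive", "deactivate", "cancel subscription")


class PlanRejected(Exception):
    """Raised when a proposed plan violates the bounded contract."""


def validate_plan(raw: str, approved_data: dict, min_confidence: float) -> list:
    """Parse strict planner JSON and reject anything outside the contract."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise PlanRejected(f"malformed JSON: {exc}") from exc
    if not isinstance(plan, dict) or not isinstance(plan.get("actions"), list):
        raise PlanRejected("plan must be an object with an 'actions' list")
    confidence = plan.get("confidence", 0.0)
    if not isinstance(confidence, (int, float)) or confidence < min_confidence:
        raise PlanRejected(f"confidence {confidence!r} is below {min_confidence}")
    for step in plan["actions"]:
        if not isinstance(step, dict):
            raise PlanRejected("each action must be an object")
        kind = step.get("type")
        if kind not in ALLOWED_ACTIONS:
            raise PlanRejected(f"unsupported action: {kind!r}")
        intent = str(step.get("intent", "")).lower()
        if any(word in intent for word in DESTRUCTIVE_WORDS):
            raise PlanRejected(f"destructive intent: {intent!r}")
        # fill/select may only reference keys declared in ## Data.
        if kind in ("fill", "select") and step.get("dataKey") not in approved_data:
            raise PlanRejected(f"unapproved value key: {step.get('dataKey')!r}")
    return plan["actions"]
```

Because every check raises rather than repairs, a bad plan stops the run with a reason instead of being silently patched.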

Before calling the planner, WebTest AI collects bounded text-first page evidence from the active driver: current URL, page fingerprint, visible text, accessibility candidates, and DOM action candidates. This gives the model locale and whitelabel context without letting it invent selectors or credentials.
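A bounded, text-first evidence payload might look like the sketch below. The size limits, the field names, and the fingerprint recipe (URL without query string plus sorted candidate roles and names) are all assumptions for illustration. The point is that the model only sees roles and names, never raw selectors or secrets.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical bounds; the limits WebTest AI actually applies are not documented here.
MAX_TEXT_CHARS = 4000
MAX_CANDIDATES = 50
MAX_NAME_CHARS = 120


@dataclass
class PageEvidence:
    url: str
    fingerprint: str
    visible_text: str
    candidates: list = field(default_factory=list)


def collect_evidence(url: str, visible_text: str, candidates: list) -> PageEvidence:
    """Collapse whitespace, truncate text, and cap the candidate list."""
    text = " ".join(visible_text.split())[:MAX_TEXT_CHARS]
    bounded = [
        {"role": c.get("role", ""), "name": c.get("name", "")[:MAX_NAME_CHARS]}
        for c in candidates[:MAX_CANDIDATES]
    ]
    # Assumed fingerprint: stable across query strings, sensitive to page controls.
    basis = url.split("?")[0] + "|" + "|".join(
        sorted(f"{c['role']}:{c['name']}" for c in bounded)
    )
    fingerprint = hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]
    return PageEvidence(url, fingerprint, text, bounded)
```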

Config

Define target variants in webtest-ai.config.json:

{
  "driver": {
    "name": "cdp",
    "require": ["actions", "assertions", "screenshots", "network", "console"]
  },
  "intent": {
    "memoryPath": ".webtest-ai/intent-plans.json",
    "memoryConfidenceThreshold": 0.9,
    "memoryStaleAfterDays": 30
  },
  "targets": {
    "brand-a-en": {
      "baseUrl": "https://brand-a.example",
      "locale": "en",
      "brand": "Brand A"
    },
    "brand-b-fr": {
      "baseUrl": "https://brand-b.example",
      "locale": "fr",
      "brand": "Brand B",
      "intentAliases": {
        "checkout": ["paiement", "commander"]
      }
    }
  },
  "models": {
    "activeProfile": "local-intent",
    "profiles": {
      "local-intent": {
        "provider": "ollama",
        "model": "qwen3:14b",
        "endpoint": "http://127.0.0.1:11434",
        "capabilities": {
          "structuredJson": true,
          "reasoning": true,
          "vision": false
        }
      }
    }
  }
}

Driver Parity

With driver.name: "cdp", intent runs execute through the Chrome DevTools Protocol (CDP) path. CDP can consume reusable/API auth state, and it can save state from UI auth flows as cookies plus per-origin localStorage and sessionStorage. On Chromium, WebTest AI workflows are intended to be interchangeable between CDP and Playwright, including traces, axe/vitals quality signals, popups, and cross-origin frame actions. Non-Chromium browsers and Playwright trace/debugger workflows remain Playwright-only.

intentAliases are optional hints for translated or branded wording. They help the resolver connect canonical intents such as checkout to target-local labels such as commander.
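As a simplified illustration of how aliases widen matching, the sketch below does case-insensitive substring matching against candidate names. This is an assumption. Per this page, the real resolver ranks bounded candidates with the model sidecar. The function name and candidate shape are hypothetical. Returning a list rather than one element lets a caller treat zero or several matches as ambiguous.

```python
def resolve_intent(intent: str, candidates: list, aliases: dict) -> list:
    """Return candidates whose name contains the canonical intent or a target-local alias."""
    terms = [intent.lower()] + [alias.lower() for alias in aliases.get(intent, [])]
    return [c for c in candidates if any(t in c["name"].lower() for t in terms)]
```

With the brand-b-fr aliases from the config above, `resolve_intent("checkout", candidates, {"checkout": ["paiement", "commander"]})` would match a button named "Commander".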

When a model-generated goal plan passes, WebTest AI writes an intent-plans.proposed.json artifact. Reviewed entries can be promoted into intent.memoryPath, and later runs check that memory for a matching high-confidence plan before calling the model.
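A memory lookup in this spirit is sketched below. The entry fields (`goal`, `target`, `fingerprint`, `approved`, `confidence`, `actions`) are hypothetical, and matching on goal, target, and page fingerprint is inferred from the approval rules on this page. The threshold corresponds to intent.memoryConfidenceThreshold (0.9 in the sample config).

```python
def find_reusable_plan(memory: list, goal: str, target: str, fingerprint: str, threshold: float):
    """Return the actions of an approved, high-confidence matching plan, or None."""
    for entry in memory:
        same_key = (entry.get("goal"), entry.get("target"), entry.get("fingerprint")) == (
            goal,
            target,
            fingerprint,
        )
        if same_key and entry.get("approved") is True and entry.get("confidence", 0.0) >= threshold:
            return entry["actions"]
    return None  # no reusable plan: the caller falls back to the model planner
```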

Review a generated proposal before reuse:

webtest-ai intent-memory list --proposal artifacts/<runId>/<testId>/intent-plans.proposed.json
webtest-ai intent-memory approve --proposal artifacts/<runId>/<testId>/intent-plans.proposed.json --dry-run
webtest-ai intent-memory approve --proposal artifacts/<runId>/<testId>/intent-plans.proposed.json

The lifecycle is deliberately review-first:

Passing intent runs create proposals, not approved memory.
approve --dry-run previews which proposals would be applied, skipped, or would overwrite an existing plan for the same goal, target, and page fingerprint.
Only approved plans at or above intent.memoryConfidenceThreshold are reused.
stale lists plans older than intent.memoryStaleAfterDays for review, and prune-stale removes them.
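The dry-run preview can be pictured as a pure classification that writes nothing. The meaning of "skip" here, an identical plan already stored under the same key, is an assumption. The entry fields are also hypothetical.

```python
def preview_approval(proposals: list, memory: list) -> dict:
    """Classify proposals as apply, skip, or overwrite without mutating memory."""
    existing = {(e["goal"], e["target"], e["fingerprint"]): e for e in memory}
    report = {"apply": [], "skip": [], "overwrite": []}
    for proposal in proposals:
        key = (proposal["goal"], proposal["target"], proposal["fingerprint"])
        current = existing.get(key)
        if current is None:
            report["apply"].append(key)  # new goal/target/fingerprint combination
        elif current["actions"] == proposal["actions"]:
            report["skip"].append(key)  # assumed: identical plan already approved
        else:
            report["overwrite"].append(key)  # same key, different plan
    return report
```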

webtest-ai intent-memory stale --stale-after-days 30
webtest-ai intent-memory prune-stale --stale-after-days 30 --dry-run

Spec

---
suite: checkout-intent
modelMode: intent
targets: [brand-a-en, brand-b-fr]
tags: [smoke]
---

# Guest Checkout

## Goal
Buy the configured starter product as a guest.

## Data
product: Starter Plan
email: env WEBTEST_AI_BUYER_EMAIL

## Expected
1. Assert outcome "order is confirmed"
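The spec keeps secrets out of the file by referencing them as `env NAME`. The sketch below is a hypothetical parser, not WebTest AI's. It collects the ## sections and resolves env-backed ## Data values, failing loudly when a variable is missing.

```python
import os
import re


def parse_spec(markdown: str, env=None) -> dict:
    """Split a spec into ## sections and resolve env-backed ## Data values."""
    env = os.environ if env is None else env
    sections, current = {}, None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current and line.strip():  # front matter and the # title are skipped
            sections[current].append(line.strip())
    data = {}
    for line in sections.get("Data", []):
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value.startswith("env "):
            name = value[4:].strip()
            if name not in env:
                raise KeyError(f"missing env var {name} for data key {key!r}")
            value = env[name]
        data[key] = value
    return {
        "goal": " ".join(sections.get("Goal", [])),
        "data": data,
        "expected": sections.get("Expected", []),
    }
```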

Run all configured suite targets:

webtest-ai run --suite specs/checkout-intent.md

Run one target:

webtest-ai run --suite specs/checkout-intent.md --target brand-b-fr

Whitelabel Smoke Coverage

The public demo site includes three whitelabel storefronts:

northwind-en: English copy, light Northwind Cloud theme, labels such as Start secure order, Ask an expert, and Inspect plan snapshot.
maison-fr: French copy, dark Maison Bleu theme, labels such as Finaliser la commande, Parler a l'equipe, and Voir l'apercu.
mercado-es: Spanish copy, Mercado Claro theme, labels such as Comprar ahora, Hablar con ventas, and Revisar vista previa.

Open them locally from demo-site/public/intent-shop.html?target=northwind-en, ?target=maison-fr, or ?target=mercado-es. The executable demo spec is specs/intent-whitelabel-demo.md:

---
suite: intent-whitelabel-demo
app: intent-shop
modelMode: intent
targets: [northwind-en, maison-fr, mercado-es]
---

# Starter plan guest order

## Goal
Buy the configured starter plan as a guest.

## Data
email: env WEBTEST_AI_BUYER_EMAIL

## Expected
1. Assert outcome "order is confirmed"

All variants run from the same intent goal and the same approved data. The deterministic test model returns canonical actions such as Fill intent "email address" and Click action "checkout"; target intentAliases resolve the localized controls. This keeps CI stable while proving the core whitelabel/locale promise through real CDP and Playwright browser execution.
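A deterministic stand-in for the model can be as small as a lookup table of canned plans. The table, field names, and function below are hypothetical. They show the design choice described above: the stub emits the same canonical, target-agnostic actions for every storefront, and localization is left to intentAliases.

```python
import json

# Hypothetical canned plans keyed by goal text; a CI stub, not a real model.
CANNED_PLANS = {
    "Buy the configured starter plan as a guest.": [
        {"type": "fill", "intent": "email address", "dataKey": "email"},
        {"type": "click", "intent": "checkout"},
    ]
}


def deterministic_plan_goal(goal: str) -> str:
    """Return strict JSON for a known goal; unknown goals fail instead of guessing."""
    actions = CANNED_PLANS.get(goal)
    if actions is None:
        raise LookupError(f"no canned plan for goal: {goal!r}")
    return json.dumps({"confidence": 1.0, "actions": actions})
```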

Functional Commerce Demo

The demo evidence pack also includes:

specs/intent-commerce-demo.md, a 12-case functional commerce suite covering account signup, login, collection browse, product search, cart, discount, saved item, profile, newsletter, shipping, and signed-in checkout.
specs/intent-support-demo.md, which uses the canonical support goal and Assert outcome "help request is received" without exact localized support labels.
specs/intent-surface-demo.md, which opens and confirms the plan preview through canonical preview intents without exact localized dialog labels.

Run the local evidence pack:

npm run demo:intent
npm run demo:discover
npm run demo:v2
npm run demo:intent -- --headed
node demo-site/intent-demo-runner.js --intent --live-model --config examples/config/ollama-qwen3.config.json --target northwind-en --spec specs/intent-whitelabel-demo.md

demo:intent serves all three local storefronts and runs the 12-case commerce showcase plus the checkout, support, and preview specs through the CDP driver with deterministic model responses. demo:v2 runs both the intent suite and the discovery proposal flow.

The --live-model form keeps the local demo servers and target matrix, but it calls the configured model profile instead of the deterministic demo sidecar. Start with one target and one spec, because each goal and each semantic outcome triggers a model call.

Auto Exploration And UI Inventory

demo:discover explores the same storefronts and emits discovery-flow and UI-inventory proposals from bounded browser evidence. Auto exploration follows safe same-origin links, avoids risky actions matching patterns such as logout, delete, or refund, collects accessibility and action candidates, and can write reviewable Markdown spec proposals. UI Inventory proposals stay review-first, so teams approve durable selectors and intent hints instead of accepting silent rewrites.
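The link-selection rule in the demo:discover paragraph (same-origin only, no risky patterns) can be sketched with the standard library. The regex and the link shape (`href`, `text`) are hypothetical. The exploration rules WebTest AI actually uses may differ.

```python
import re
from urllib.parse import urljoin, urlparse

# Hypothetical risk pattern; extend it to match the app's destructive flows.
RISKY = re.compile(r"log-?out|sign-?out|delete|refund", re.IGNORECASE)


def safe_links(page_url: str, links: list) -> list:
    """Keep absolute same-origin links whose URL and text do not look risky."""
    origin = urlparse(page_url)
    keep = []
    for link in links:
        absolute = urljoin(page_url, link["href"])
        parsed = urlparse(absolute)
        if (parsed.scheme, parsed.netloc) != (origin.scheme, origin.netloc):
            continue  # different origin, or a non-HTTP scheme such as mailto:
        if RISKY.search(absolute) or RISKY.search(link.get("text", "")):
            continue
        keep.append(absolute)
    return keep
```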

The same real-driver suite also exercises the local edge demo, which uses branded modal labels. Its spec keeps canonical steps (excerpt):

3. Click action "open modal"
4. Assert text "Preview open."
5. Click action "close modal"
6. Assert text "Preview closed."

The page labels are intentionally different: Launch preview opens the modal and Hide preview closes it. CDP and Playwright runs pass only when the model sidecar ranks those bounded candidates correctly and the follow-up text assertions prove that the selected controls actually executed.

Run it with:

node tests/real-drivers.test.js
npm run test:v2

Live Model Sidecar

The optional next layer is a live Ollama smoke test behind an environment flag, so local and CI runs do not depend on a model server by default:

WEBTEST_AI_LIVE_MODEL_SMOKE=1 npm run test:models:live

That smoke test uses the fake driver rather than a real browser or app server, but it calls the configured model for both V2 intent paths: intent.plan_goal and intent.assert_outcome. It passes only when the planner returns bounded actions and the semantic outcome matches the bounded fixture evidence above the configured confidence threshold.
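The opt-in gate and the outcome pass rule can be expressed as two small predicates. Both names and the result shape (`matched`, `confidence`, `evidence`) are hypothetical. Only the environment variable name comes from the command above. Requiring non-empty evidence mirrors the rule that outcomes must be backed by evidence rather than a bare verdict.

```python
import os


def live_smoke_enabled(env=None) -> bool:
    """Run the live model smoke only when explicitly opted in."""
    env = os.environ if env is None else env
    return env.get("WEBTEST_AI_LIVE_MODEL_SMOKE") == "1"


def outcome_passes(result: dict, threshold: float) -> bool:
    """Hypothetical pass rule: matched, confident enough, and backed by evidence."""
    return (
        result.get("matched") is True
        and result.get("confidence", 0.0) >= threshold
        and bool(result.get("evidence"))
    )
```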

Optional Vision Evidence

Intent mode is text-first by default. If a model profile declares "vision": true, you can opt into screenshot evidence for semantic outcomes:

{
  "intent": {
    "visionEvidence": true
  }
}

When this is enabled, Assert outcome captures a bounded screenshot artifact and includes the screenshot path in the model evidence payload. The runner still validates the model output against explicit confidence and evidence rules, and drivers must advertise screenshots before a vision-evidence outcome can run.
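The preconditions above can be checked up front, as in the sketch below. The function name and the reason strings are hypothetical. Note that the local-intent profile in the sample config declares `"vision": false`, so it would be blocked even with intent.visionEvidence enabled.

```python
def vision_evidence_blockers(intent_cfg: dict, profile: dict, driver_capabilities: set) -> list:
    """Return reasons a vision-evidence outcome cannot run; an empty list means it can."""
    if not intent_cfg.get("visionEvidence"):
        return ["intent.visionEvidence is not enabled"]
    blockers = []
    if profile.get("capabilities", {}).get("vision") is not True:
        blockers.append("model profile does not declare vision: true")
    if "screenshots" not in driver_capabilities:
        blockers.append("driver does not advertise screenshots")
    return blockers
```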

Safety Model

The sidecar can only propose allowed action types: open, click, fill, select, wait, and submit.
Fill/select actions can only use approved ## Data keys or env-backed values.
Destructive intents such as delete, refund, archive, deactivate, or cancel subscription are blocked by default.
Low-confidence or ambiguous semantic assertions fail with evidence instead of guessing.
Reports include target metadata, generated goal plan, semantic assertion evidence, confidence, and model call metadata.