ASSAY AI

Know whether your browser agent actually did the work.

ASSAY runs evidence-focused evaluations for task completion, false completion, stop-condition obedience, precision navigation, dynamic pages, and prompt-injection resistance.

Evaluation packet: Independent review
Completion evidence: Verified
False completion: Flagged
Prompt injection: Tested

Task completion · False completion · Stop conditions · Dynamic pages · Prompt injection

Evaluation coverage

Built for agents that operate in the real browser.

01

Completion truth

ASSAY checks the observable end state, not just the agent's final message or a hopeful plan.
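As a rough illustration of checking the end state rather than the agent's claim, here is a minimal Python sketch. It is not ASSAY's implementation. `TrialResult`, its fields, and the label strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    """Hypothetical record of one browser trial (all names are illustrative)."""
    agent_claimed_done: bool  # what the agent's final message asserts
    observed_state: dict      # what was actually captured from the page
    required_state: dict      # evidence that must be present for a pass


def classify_completion(trial: TrialResult) -> str:
    """Compare the agent's claim against the observed end state."""
    # The task counts as done only if every required key matches the observation.
    actually_done = all(
        trial.observed_state.get(key) == value
        for key, value in trial.required_state.items()
    )
    if actually_done:
        return "verified" if trial.agent_claimed_done else "unreported_completion"
    return "false_completion" if trial.agent_claimed_done else "incomplete"


# The agent says it finished, but the cart is still empty.
trial = TrialResult(
    agent_claimed_done=True,
    observed_state={"cart_items": 0},
    required_state={"cart_items": 1},
)
print(classify_completion(trial))
```

In this sketch the agent's message never decides the outcome on its own. It only determines which label the observed state receives.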

02

Behavior boundaries

Stop-condition tests catch risky follow-through before the agent completes purchases, submissions, account actions, or other irreversible steps.
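One way to picture a stop-condition test is as a scan over the agent's recorded actions. The sketch below assumes a hypothetical trace format (a list of dicts with a `kind` field) and a hypothetical set of forbidden action names. Neither comes from ASSAY itself.

```python
from typing import Dict, List, Optional, Set

# Hypothetical action kinds the task tells the agent not to perform.
FORBIDDEN_ACTIONS = {"submit_order", "delete_account"}


def first_violation(trace: List[Dict], forbidden: Set[str]) -> Optional[int]:
    """Return the index of the first forbidden action in the trace, or None."""
    for index, action in enumerate(trace):
        if action.get("kind") in forbidden:
            return index
    return None


# Illustrative trace: the agent was asked to stop before placing the order.
trace = [
    {"kind": "navigate", "target": "/cart"},
    {"kind": "click", "target": "#checkout"},
    {"kind": "submit_order", "target": "#place-order"},
]
print(first_violation(trace, FORBIDDEN_ACTIONS))
```

Returning the index rather than a boolean lets a reviewer jump straight to the step where the agent crossed the boundary.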

03

Adversarial pages

Dynamic content and prompt-injection attempts reveal whether page text can pull the agent away from the user's task.

How it works

A compact test loop with human-readable evidence.

1

Design the assay

Define target tasks, risky stop conditions, and the exact evidence required for a pass.
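The three ingredients named above could be captured in a small specification object. The following is a sketch under assumptions: `AssaySpec`, its field names, and the example values are hypothetical and do not describe ASSAY's actual format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AssaySpec:
    """Hypothetical test definition: the task, its stop conditions, its pass evidence."""
    task: str
    stop_conditions: Tuple[str, ...] = ()
    required_evidence: Tuple[str, ...] = ()

    def problems(self) -> List[str]:
        """List reasons this spec cannot be graded yet (empty list means usable)."""
        issues = []
        if not self.task.strip():
            issues.append("task is empty")
        if not self.required_evidence:
            issues.append("no pass evidence defined")
        return issues


# Illustrative spec: the agent must add an item but must not check out.
spec = AssaySpec(
    task="Add one item to the cart, then stop before checkout",
    stop_conditions=("submit_order",),
    required_evidence=("cart shows exactly one item",),
)
print(spec.problems())
```

Requiring pass evidence up front keeps a trial from being graded on the agent's own report alone.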

2

Run browser trials

Exercise the agent against local and live-style pages with controlled grading criteria.

3

Report the verdict

Deliver clear pass, fail, and review calls that product and safety teams can act on.
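A verdict rule of this kind might look like the sketch below. The precedence shown is an assumption made for illustration, not ASSAY's documented policy: a stop-condition violation always fails, ambiguous evidence goes to review, and otherwise the evidence decides. The function and parameter names are hypothetical.

```python
def verdict(evidence_met: bool, stop_violated: bool, evidence_ambiguous: bool = False) -> str:
    """Map trial findings to 'pass', 'fail', or 'review' (assumed precedence)."""
    if stop_violated:
        # Assumption: crossing a stop condition fails the trial outright.
        return "fail"
    if evidence_ambiguous:
        # Assumption: unclear evidence is escalated to a human reviewer.
        return "review"
    return "pass" if evidence_met else "fail"


print(verdict(evidence_met=True, stop_violated=False))
print(verdict(evidence_met=True, stop_violated=True))
print(verdict(evidence_met=False, stop_violated=False, evidence_ambiguous=True))
```

Checking the stop condition first means a completed task cannot mask an unsafe action taken along the way.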

Pilot evaluations

Want ASSAY to test your agent?

Schedule a pilot