Commit b20ec4d

Merge pull request #224 from Predicate-Labs/agent_auto
Predicate agent
2 parents 120797f + 8103916 commit b20ec4d

11 files changed: +1424 -1 lines changed

CHANGELOG.md

Lines changed: 99 additions & 0 deletions
@@ -7,6 +7,105 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### 2026-02-15

#### PredicateBrowserAgent (snapshot-first, verification-first)

`PredicateBrowserAgent` is a new high-level agent wrapper that gives you a **browser-use-like** `step()` / `run()` surface, but keeps Predicate’s core philosophy:

- **Snapshot-first perception** (structured DOM snapshot is the default)
- **Verification-first control plane** (you can gate progress with deterministic checks)
- Optional **vision fallback** (bounded) when snapshots aren’t sufficient

It’s built on top of `AgentRuntime` + `RuntimeAgent`.

##### Quickstart (single step)

```python
from predicate import AgentRuntime, PredicateBrowserAgent, PredicateBrowserAgentConfig, RuntimeStep
from predicate.llm_provider import OpenAIProvider  # or AnthropicProvider / DeepInfraProvider / LocalLLMProvider

runtime = AgentRuntime(backend=...)  # PlaywrightBackend, CDPBackendV0, etc.
llm = OpenAIProvider(model="gpt-4o-mini")

agent = PredicateBrowserAgent(
    runtime=runtime,
    executor=llm,
    config=PredicateBrowserAgentConfig(
        # Token control: include last N step summaries in the prompt (0 disables history).
        history_last_n=2,
    ),
)

ok = await agent.step(
    task_goal="Find pricing and verify checkout button exists",
    step=RuntimeStep(goal="Open pricing page"),
)
```

##### Customize the compact prompt (advanced)

To change the “compact prompt” the executor sees (e.g. fewer fields or a different layout), override the builder:

```python
from predicate import PredicateBrowserAgentConfig

def compact_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    system = "You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id, \"text\") | PRESS(\"key\") | FINISH()"
    user = f"TASK: {task_goal}\nSTEP: {step_goal}\n\nRECENT:\n{history_summary}\n\nELEMENTS:\n{dom_context}\n\nReturn the single best action:"
    return system, user

config = PredicateBrowserAgentConfig(compact_prompt_builder=compact_prompt_builder)
```
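
Because the builder is a plain function that returns a `(system, user)` pair, it can be exercised without a browser or an LLM. The sketch below is an illustration only, not part of the SDK: `capped_prompt_builder` and `MAX_DOM_CHARS` are hypothetical names, the snapshot argument is passed as `None` on the assumption that this particular builder ignores it, and the sample inputs are made up.

```python
MAX_DOM_CHARS = 200  # hypothetical budget, chosen only for this illustration


def capped_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    """Build a (system, user) prompt pair with a bounded DOM section."""
    _ = snapshot  # unused in this sketch
    system = (
        "You are a web automation agent. Return ONLY one action: "
        'CLICK(id) | TYPE(id, "text") | PRESS("key") | FINISH()'
    )
    # Omit the history section entirely when there is nothing to show.
    recent = f"RECENT:\n{history_summary}\n\n" if history_summary else ""
    user = (
        f"TASK: {task_goal}\nSTEP: {step_goal}\n\n"
        f"{recent}"
        f"ELEMENTS:\n{dom_context[:MAX_DOM_CHARS]}\n\n"
        "Return the single best action:"
    )
    return system, user


# Call the builder directly with made-up inputs to preview the prompt.
sample_dom = "\n".join(f"[{i}] button 'Item {i}'" for i in range(100))
system, user = capped_prompt_builder("Find pricing", "Open pricing page", sample_dom, None, "")
print(user)
```

Previewing the prompt this way makes it easy to check the token budget before wiring the builder into `PredicateBrowserAgentConfig`.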

##### CAPTCHA handling (interface-only; no solver shipped)

If you set `captcha.policy="callback"`, you must provide a handler. The SDK does **not** include a public CAPTCHA solver.

```python
from predicate import CaptchaConfig, HumanHandoffSolver, PredicateBrowserAgentConfig

config = PredicateBrowserAgentConfig(
    captcha=CaptchaConfig(
        policy="callback",
        # Manual solve in the live session; SDK waits until it clears:
        handler=HumanHandoffSolver(timeout_ms=10 * 60_000, poll_ms=1_000),
    )
)
```
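
The `timeout_ms` / `poll_ms` pair above suggests a poll-until-clear loop with a deadline. The following sketch shows only that waiting pattern. It is not the SDK's handler interface, and `wait_until_cleared` and `is_cleared` are hypothetical names introduced for illustration.

```python
import asyncio
import time


async def wait_until_cleared(is_cleared, timeout_ms: int, poll_ms: int) -> bool:
    """Poll is_cleared() until it returns True or the deadline passes."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if await is_cleared():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(poll_ms / 1000)


async def demo() -> bool:
    checks = 0

    async def is_cleared() -> bool:
        # Stand-in for "the CAPTCHA is no longer on the page":
        # report success on the third check.
        nonlocal checks
        checks += 1
        return checks >= 3

    return await wait_until_cleared(is_cleared, timeout_ms=1_000, poll_ms=10)


print(asyncio.run(demo()))
```

A long timeout with a coarse poll interval, as in the configuration above, suits a human solving the challenge in the live session.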

##### LLM providers (cloud or local)

`PredicateBrowserAgent` works with any `LLMProvider` implementation. For a local HF Transformers model:

```python
from predicate.llm_provider import LocalLLMProvider

llm = LocalLLMProvider(model_name="Qwen/Qwen2.5-3B-Instruct", device="auto", load_in_4bit=True)
```

##### Opt-in token usage accounting (best-effort)

To measure token spend, enable best-effort accounting. It depends on the provider reporting `prompt_tokens` / `completion_tokens` / `total_tokens` in `LLMResponse`:

```python
from predicate import PredicateBrowserAgentConfig

config = PredicateBrowserAgentConfig(token_usage_enabled=True)

# Later:
usage = agent.get_token_usage()
agent.reset_token_usage()
```
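
"Best-effort" here means that a provider which does not report token counts contributes nothing to the totals. A minimal sketch of that kind of accumulation follows, assuming missing counts arrive as `None`. `UsageTotals` and `FakeResponse` are hypothetical stand-ins rather than the SDK's own types, and the shape returned by `get_token_usage()` is not specified in this changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResponse:
    """Stand-in for an LLM response; only the three token fields matter here."""
    prompt_tokens: Optional[int] = None
    completion_tokens: Optional[int] = None
    total_tokens: Optional[int] = None


@dataclass
class UsageTotals:
    calls: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0

    def add(self, resp: FakeResponse) -> None:
        # Count every call, but add only the fields the provider reported.
        self.calls += 1
        self.prompt_tokens += resp.prompt_tokens or 0
        self.completion_tokens += resp.completion_tokens or 0
        self.total_tokens += resp.total_tokens or 0


totals = UsageTotals()
totals.add(FakeResponse(prompt_tokens=120, completion_tokens=8, total_tokens=128))
totals.add(FakeResponse())  # provider reported nothing for this call
print(totals)
```

Tracking the call count alongside the token sums makes under-reporting visible: many calls with few tokens hints that a provider is not returning usage data.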

##### RuntimeAgent: act once without step lifecycle (orchestrators)

`RuntimeAgent` now exposes `act_once(...)` helpers that execute exactly one action **without** calling `runtime.begin_step()` / `runtime.emit_step_end()`. This is intended for external orchestrators (e.g. WebBench) that already own step lifecycle and just want the SDK’s snapshot-first propose+execute block.

- `await agent.act_once(...) -> str`
- `await agent.act_once_with_snapshot(...) -> (action, snap)`
- `await agent.act_once_result(...) -> { action, snap, used_vision }`
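
To make the division of labor concrete, here is a sketch of an orchestrator that owns the step lifecycle and delegates only the propose+execute block. Everything in it is a stand-in: `StubRuntime` and `StubAgent` are hypothetical classes, and the arguments passed to `begin_step`, `emit_step_end`, and `act_once` are placeholders, since this changelog does not give the real signatures.

```python
import asyncio


class StubRuntime:
    """Records lifecycle calls so the ordering is visible."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def begin_step(self, goal: str) -> None:
        self.events.append(f"begin:{goal}")

    def emit_step_end(self, action: str) -> None:
        self.events.append(f"end:{action}")


class StubAgent:
    """Stands in for RuntimeAgent; act_once never touches the lifecycle."""

    async def act_once(self, goal: str) -> str:
        return "FINISH()" if "finish" in goal.lower() else "CLICK(1)"


async def orchestrate(runtime: StubRuntime, agent: StubAgent, goals: list[str]) -> list[str]:
    actions = []
    for goal in goals:
        runtime.begin_step(goal)             # the orchestrator opens the step
        action = await agent.act_once(goal)  # the stub stands in for propose+execute
        runtime.emit_step_end(action)        # the orchestrator closes the step
        actions.append(action)
    return actions


runtime = StubRuntime()
actions = asyncio.run(orchestrate(runtime, StubAgent(), ["Open pricing", "Finish up"]))
print(actions)
print(runtime.events)
```

Keeping lifecycle calls in one place means each step is opened and closed exactly once, whichever component chooses the action.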

### 2026-02-13

#### Expanded deterministic verifications (adaptive resnapshotting)

examples/agent/README.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
Predicate agent examples.

- `predicate_browser_agent_minimal.py`: minimal `PredicateBrowserAgent` usage.
- `predicate_browser_agent_custom_prompt.py`: customize the compact prompt builder.
- `predicate_browser_agent_video_recording_playwright.py`: enable Playwright video recording via context options (recommended).

examples/agent/predicate_browser_agent_custom_prompt.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
"""
Example: PredicateBrowserAgent with compact prompt customization.

This shows how to override the compact prompt used for action proposal.

Usage:
    python examples/agent/predicate_browser_agent_custom_prompt.py
"""

import asyncio
import os

from predicate import AsyncSentienceBrowser, PredicateBrowserAgent, PredicateBrowserAgentConfig
from predicate.agent_runtime import AgentRuntime
from predicate.llm_provider import LLMProvider, LLMResponse
from predicate.models import Snapshot
from predicate.runtime_agent import RuntimeStep
from predicate.tracing import JsonlTraceSink, Tracer


class RecordingProvider(LLMProvider):
    """
    Example provider that records the prompts it receives.

    Swap this for OpenAIProvider / AnthropicProvider / DeepInfraProvider / LocalLLMProvider in real usage.
    """

    def __init__(self, action: str = "FINISH()"):
        super().__init__(model="recording-provider")
        self._action = action
        self.last_system: str | None = None
        self.last_user: str | None = None

    def generate(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMResponse:
        _ = kwargs
        self.last_system = system_prompt
        self.last_user = user_prompt
        return LLMResponse(content=self._action, model_name=self.model_name)

    def supports_json_mode(self) -> bool:
        return False

    @property
    def model_name(self) -> str:
        return "recording-provider"


def compact_prompt_builder(
    task_goal: str,
    step_goal: str,
    dom_context: str,
    snap: Snapshot,
    history_summary: str,
) -> tuple[str, str]:
    _ = snap
    system = (
        "You are a web automation executor.\n"
        "Return ONLY ONE action in this format:\n"
        "- CLICK(id)\n"
        '- TYPE(id, "text")\n'
        "- PRESS('key')\n"
        "- FINISH()\n"
        "No prose."
    )
    # Optional: aggressively control token usage by truncating DOM context.
    dom_context = dom_context[:4000]
    user = (
        f"TASK GOAL:\n{task_goal}\n\n"
        + (f"RECENT STEPS:\n{history_summary}\n\n" if history_summary else "")
        + f"STEP GOAL:\n{step_goal}\n\n"
        f"DOM CONTEXT:\n{dom_context}\n"
    )
    return system, user


async def main() -> None:
    run_id = "predicate-browser-agent-custom-prompt"
    tracer = Tracer(run_id=run_id, sink=JsonlTraceSink(f"traces/{run_id}.jsonl"))

    api_key = os.environ.get("PREDICATE_API_KEY") or os.environ.get("SENTIENCE_API_KEY")

    async with AsyncSentienceBrowser(api_key=api_key, headless=False) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        await page.wait_for_load_state("networkidle")

        runtime = await AgentRuntime.from_sentience_browser(
            browser=browser, page=page, tracer=tracer
        )

        executor = RecordingProvider(action="FINISH()")

        agent = PredicateBrowserAgent(
            runtime=runtime,
            executor=executor,
            config=PredicateBrowserAgentConfig(
                history_last_n=2,
                compact_prompt_builder=compact_prompt_builder,
            ),
        )

        out = await agent.step(
            task_goal="Open example.com",
            step=RuntimeStep(goal="Take no action; just finish"),
        )
        print(f"step ok: {out.ok}")
        print("--- prompt preview (system) ---")
        print((executor.last_system or "")[:300])
        print("--- prompt preview (user) ---")
        print((executor.last_user or "")[:300])

    tracer.close()


if __name__ == "__main__":
    asyncio.run(main())

examples/agent/predicate_browser_agent_minimal.py

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
"""
Example: PredicateBrowserAgent minimal demo.

PredicateBrowserAgent is a higher-level, browser-use-like wrapper over:
AgentRuntime + RuntimeAgent (snapshot-first action proposal + execution + verification).

Usage:
    python examples/agent/predicate_browser_agent_minimal.py
"""

import asyncio
import os

from predicate import AsyncSentienceBrowser, PredicateBrowserAgent, PredicateBrowserAgentConfig
from predicate.agent_runtime import AgentRuntime
from predicate.llm_provider import LLMProvider, LLMResponse
from predicate.runtime_agent import RuntimeStep, StepVerification
from predicate.tracing import JsonlTraceSink, Tracer
from predicate.verification import exists, url_contains


class FixedActionProvider(LLMProvider):
    """Tiny in-process provider for examples/tests."""

    def __init__(self, action: str):
        super().__init__(model="fixed-action")
        self._action = action

    def generate(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMResponse:
        _ = system_prompt, user_prompt, kwargs
        return LLMResponse(content=self._action, model_name=self.model_name)

    def supports_json_mode(self) -> bool:
        return False

    @property
    def model_name(self) -> str:
        return "fixed-action"


async def main() -> None:
    run_id = "predicate-browser-agent-minimal"
    tracer = Tracer(run_id=run_id, sink=JsonlTraceSink(f"traces/{run_id}.jsonl"))

    api_key = os.environ.get("PREDICATE_API_KEY") or os.environ.get("SENTIENCE_API_KEY")

    async with AsyncSentienceBrowser(api_key=api_key, headless=False) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        await page.wait_for_load_state("networkidle")

        runtime = await AgentRuntime.from_sentience_browser(
            browser=browser, page=page, tracer=tracer
        )

        # For a "real" run, swap this for OpenAIProvider / AnthropicProvider / DeepInfraProvider / LocalLLMProvider.
        executor = FixedActionProvider("FINISH()")

        agent = PredicateBrowserAgent(
            runtime=runtime,
            executor=executor,
            config=PredicateBrowserAgentConfig(
                # Keep a tiny, bounded LLM-facing step history (0 disables history entirely).
                history_last_n=2,
            ),
        )

        steps = [
            RuntimeStep(
                goal="Verify Example Domain is loaded",
                verifications=[
                    StepVerification(
                        predicate=url_contains("example.com"),
                        label="url_contains_example",
                        required=True,
                        eventually=True,
                        timeout_s=5.0,
                    ),
                    StepVerification(
                        predicate=exists("role=heading"),
                        label="has_heading",
                        required=True,
                        eventually=True,
                        timeout_s=5.0,
                    ),
                ],
                max_snapshot_attempts=2,
                snapshot_limit_base=60,
            )
        ]

        ok = await agent.run(task_goal="Open example.com and verify", steps=steps)
        print(f"run ok: {ok}")

    tracer.close()
    print(f"trace written to traces/{run_id}.jsonl")


if __name__ == "__main__":
    asyncio.run(main())
