
Commit f268f24

SentienceDEV committed: Predicate agent
1 parent 120797f, commit f268f24


5 files changed, +614 -0 lines changed

CHANGELOG.md

Lines changed: 77 additions & 0 deletions
@@ -7,6 +7,83 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### 2026-02-15

#### PredicateBrowserAgent (snapshot-first, verification-first)

`PredicateBrowserAgent` is a new high-level agent wrapper that gives you a **browser-use-like** `step()` / `run()` surface, but keeps Predicate’s core philosophy:

- **Snapshot-first perception** (structured DOM snapshot is the default)
- **Verification-first control plane** (you can gate progress with deterministic checks)
- Optional **vision fallback** (bounded) when snapshots aren’t sufficient

It’s built on top of `AgentRuntime` + `RuntimeAgent`.
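
To make "verification-first" concrete, here is a minimal sketch in plain Python of gating progress with deterministic checks. It is not the SDK's API: `Snapshot`, `url_contains`, `has_text`, and `run_gated` are hypothetical names invented for illustration, and the real runtime may structure this quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Snapshot:
    """Hypothetical stand-in for a structured DOM snapshot."""
    url: str
    texts: List[str] = field(default_factory=list)


# A check is a pure function of the snapshot, so the same page state
# always yields the same verdict.
Check = Callable[[Snapshot], bool]


def url_contains(fragment: str) -> Check:
    return lambda snap: fragment in snap.url


def has_text(needle: str) -> Check:
    return lambda snap: any(needle in t for t in snap.texts)


def run_gated(steps: List[Tuple[Callable[[], Snapshot], Check]]) -> int:
    """Run steps in order and stop at the first one whose check fails.

    Returns the number of steps that passed verification.
    """
    passed = 0
    for act, check in steps:
        snapshot = act()         # perform the action, then take a fresh snapshot
        if not check(snapshot):  # deterministic gate: no model judgment involved
            break
        passed += 1
    return passed


# Simulated actions: the second snapshot is missing the "Checkout" text.
steps = [
    (lambda: Snapshot("https://example.com/pricing", ["Pro plan", "Checkout"]),
     url_contains("/pricing")),
    (lambda: Snapshot("https://example.com/pricing", ["Pro plan"]),
     has_text("Checkout")),
]
passed_count = run_gated(steps)
```

Because each check depends only on the snapshot, a failed gate can be reproduced from a saved snapshot without replaying the model.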

##### Quickstart (single step)

```python
from predicate import AgentRuntime, PredicateBrowserAgent, PredicateBrowserAgentConfig, RuntimeStep
from predicate.llm_provider import OpenAIProvider  # or AnthropicProvider / DeepInfraProvider / LocalLLMProvider

runtime = AgentRuntime(backend=...)  # PlaywrightBackend, CDPBackendV0, etc.
llm = OpenAIProvider(model="gpt-4o-mini")

agent = PredicateBrowserAgent(
    runtime=runtime,
    executor=llm,
    config=PredicateBrowserAgentConfig(
        # Token control: include last N step summaries in the prompt (0 disables history).
        history_last_n=2,
    ),
)

ok = await agent.step(
    task_goal="Find pricing and verify checkout button exists",
    step=RuntimeStep(goal="Open pricing page"),
)
```
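
The `history_last_n` comment describes a simple windowing rule. The sketch below shows one way such a rule could work; `summarize_history` is a hypothetical helper, and the bullet formatting is an assumption rather than the SDK's actual prompt layout.

```python
from typing import List


def summarize_history(step_summaries: List[str], history_last_n: int) -> str:
    """Join the last N step summaries for a prompt; N <= 0 disables history."""
    if history_last_n <= 0:
        # Explicit guard: step_summaries[-0:] would return the whole list.
        return ""
    recent = step_summaries[-history_last_n:]
    return "\n".join(f"- {s}" for s in recent)


summaries = ["opened home page", "clicked Pricing", "scrolled to plans"]
recent_two = summarize_history(summaries, 2)
no_history = summarize_history(summaries, 0)
```

The explicit guard for `0` matters because a naive `[-n:]` slice with `n == 0` keeps everything instead of nothing.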

##### Customize the compact prompt (advanced)

If you want to change the “compact prompt” the executor sees (e.g. fewer fields / different layout), you can override it:

```python
from predicate import PredicateBrowserAgentConfig

def compact_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    system = "You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id, \"text\") | PRESS(\"key\") | FINISH()"
    user = f"TASK: {task_goal}\nSTEP: {step_goal}\n\nRECENT:\n{history_summary}\n\nELEMENTS:\n{dom_context}\n\nReturn the single best action:"
    return system, user

config = PredicateBrowserAgentConfig(compact_prompt_builder=compact_prompt_builder)
```
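
To see what the executor would receive, you can call the builder directly without the SDK. The sample inputs below are made up: the element-line format in `dom_context` and passing `None` for `snapshot` are assumptions for illustration, not the runtime's actual values.

```python
def compact_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    # Same builder as in the changelog entry, split across lines for readability.
    system = (
        "You are a web automation agent. Return ONLY one action: "
        'CLICK(id) | TYPE(id, "text") | PRESS("key") | FINISH()'
    )
    user = (
        f"TASK: {task_goal}\nSTEP: {step_goal}\n\n"
        f"RECENT:\n{history_summary}\n\n"
        f"ELEMENTS:\n{dom_context}\n\nReturn the single best action:"
    )
    return system, user


system, user = compact_prompt_builder(
    task_goal="Find pricing and verify checkout button exists",
    step_goal="Open pricing page",
    dom_context="12|link|Pricing\n15|button|Sign in",  # hypothetical element lines
    snapshot=None,  # unused by this particular builder
    history_summary="(none)",
)
print(system)
print(user)
```

Printing the pair this way is a cheap check on token count and layout before wiring the builder into `PredicateBrowserAgentConfig`.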

##### CAPTCHA handling (interface-only; no solver shipped)

If you set `captcha.policy="callback"`, you must provide a handler. The SDK does **not** include a public CAPTCHA solver.

```python
from predicate import CaptchaConfig, HumanHandoffSolver, PredicateBrowserAgentConfig

config = PredicateBrowserAgentConfig(
    captcha=CaptchaConfig(
        policy="callback",
        # Manual solve in the live session; SDK waits until it clears:
        handler=HumanHandoffSolver(timeout_ms=10 * 60_000, poll_ms=1_000),
    )
)
```
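
The `timeout_ms` / `poll_ms` pair suggests a poll-until-cleared loop. The following is a rough sketch of that pattern in plain Python, not the implementation of `HumanHandoffSolver`: `wait_until_cleared` and `fake_is_cleared` are hypothetical, and the real handler is likely asynchronous and tied to the live browser session.

```python
import time
from typing import Callable


def wait_until_cleared(
    is_cleared: Callable[[], bool],
    timeout_ms: int,
    poll_ms: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_cleared() every poll_ms until it is True or timeout_ms elapses."""
    deadline = clock() + timeout_ms / 1000
    while True:
        if is_cleared():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_ms / 1000)


# Simulated challenge that clears on the third poll; tiny intervals keep the demo fast.
polls = {"count": 0}


def fake_is_cleared() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


cleared = wait_until_cleared(fake_is_cleared, timeout_ms=500, poll_ms=1)
timed_out = wait_until_cleared(lambda: False, timeout_ms=5, poll_ms=1)
```

Checking the condition before the deadline means a challenge that is already cleared returns immediately, and a monotonic clock keeps the timeout immune to wall-clock adjustments.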

##### LLM providers (cloud or local)

`PredicateBrowserAgent` works with any `LLMProvider` implementation. For a local HF Transformers model:

```python
from predicate.llm_provider import LocalLLMProvider

llm = LocalLLMProvider(model_name="Qwen/Qwen2.5-3B-Instruct", device="auto", load_in_4bit=True)
```

### 2026-02-13

#### Expanded deterministic verifications (adaptive resnapshotting)

predicate/__init__.py

Lines changed: 9 additions & 0 deletions
@@ -33,6 +33,15 @@
from .agent_config import AgentConfig
from .agent_runtime import AgentRuntime, AssertionHandle

# Snapshot-first browser agent (new high-level surface)
from .agents import (
    CaptchaConfig,
    PermissionRecoveryConfig,
    PredicateBrowserAgent,
    PredicateBrowserAgentConfig,
    VisionFallbackConfig,
)

# Backend-agnostic actions (aliased to avoid conflict with existing actions)
# Browser backends (for browser-use integration)
from .backends import (

predicate/agents/__init__.py

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
"""
Agent-level orchestration helpers (snapshot-first, verification-first).

This package provides a "browser-use-like" agent surface built on top of:
- AgentRuntime (snapshots, verification, tracing)
- RuntimeAgent (execution loop and bounded vision fallback)
"""

from .browser_agent import (
    CaptchaConfig,
    PermissionRecoveryConfig,
    PredicateBrowserAgent,
    PredicateBrowserAgentConfig,
    VisionFallbackConfig,
)

__all__ = [
    "CaptchaConfig",
    "PermissionRecoveryConfig",
    "PredicateBrowserAgent",
    "PredicateBrowserAgentConfig",
    "VisionFallbackConfig",
]
