Commit a7c2d24

author
SentienceDEV
committed
Sentience Context for Browser-use
1 parent ac07099 commit a7c2d24

3 files changed

+683
-0
lines changed

docs/sentience_context_design.md

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
1+
# Build SentienceContext for Browser-Use Agent Framework
2+
3+
## Relevant Files for Reference:
4+
1. state_injector.py: `/Users/guoliangwang/Code/Python/browser-use/browser_use/integrations/sentience/state_injector.py`
5+
2. Browser-use repo: `/Users/guoliangwang/Code/Python/browser-use`
6+
7+
## Task
8+
You are implementing a “Token-Slasher Context Middleware” for browser-use users, shipped inside the Sentience SDK.
9+
10+
### Goal
11+
12+
Create a **directly importable** context class named `SentienceContext` that browser-use users can plug into their agent runtime to generate a compact, ranked DOM context block using Sentience snapshots, reducing tokens and improving reliability.
13+
14+
This should be implemented inside the Sentience SDK repo under:
15+
16+
* `sentience/backends/sentience_context.py` (new file)
17+
* and exported from `sentience/backends/__init__.py`
18+
19+
It should refactor and supersede the logic currently in `state_injector.py` (which lives in a local browser-use repo copy). Use it as the baseline behavior, but remove debugging prints and improve robustness.
20+
21+
Also integrate with the already-existing `BrowserUseAdapter` inside the SDK.
22+
23+
---
24+
25+
## Requirements
26+
27+
### 1) Public API
28+
29+
Implement:
30+
31+
```py
@dataclass
class SentienceContextState:
    url: str
    snapshot: Snapshot
    prompt_block: str
    # optional: selected_element_ids: list[int]

class SentienceContext:
    def __init__(
        self,
        *,
        api_key: str | None = None,
        api_url: str | None = None,
        use_api: bool | None = None,
        limit: int = 60,
        show_overlay: bool = False,
        top_by_importance: int = 60,
        top_from_dominant_group: int = 15,
        top_by_position: int = 10,
        role_link_when_href: bool = True,
        include_rank_in_group: bool = True,
        env_api_key: str = "SENTIENCE_API_KEY",
    ): ...

    async def build(
        self,
        browser_session: "BrowserSession",
        *,
        goal: str | None = None,
        wait_for_extension_ms: int = 5000,
        retries: int = 2,
        retry_delay_s: float = 1.0,
    ) -> SentienceContextState | None:
        """Return context state or None if snapshot isn’t available."""
```
67+
68+
Notes:
69+
70+
* `build()` must not throw for common failures; it should return `None` and log a warning.
71+
* `goal` should be passed into SnapshotOptions (so gateway rerank can use it).
72+
* Support both **extension-only mode** and **gateway mode** (if api_key exists or use_api=True).
73+
* `api_key` defaults from env var `SENTIENCE_API_KEY` if not passed.
74+
* Must avoid making browser-use a hard dependency (import types only under TYPE_CHECKING).
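
The mode selection and optional-dependency notes above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `GatewayConfig` and `resolve_gateway_config` are hypothetical names, the `browser_use` import path is an assumption, and letting an explicit `use_api=False` override an available key is also an assumption the notes do not settle.

```python
import os
from dataclasses import dataclass
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Evaluated only by type checkers, so browser-use is never imported at runtime.
    # The import path is an assumption.
    from browser_use import BrowserSession  # noqa: F401


@dataclass(frozen=True)
class GatewayConfig:
    """Hypothetical helper holding the resolved key and mode."""

    api_key: Optional[str]
    use_api: bool


def resolve_gateway_config(
    api_key: Optional[str] = None,
    use_api: Optional[bool] = None,
    env_api_key: str = "SENTIENCE_API_KEY",
) -> GatewayConfig:
    # An explicit key wins; otherwise fall back to the environment variable.
    key = api_key or os.environ.get(env_api_key) or None
    # Assumption: an explicit use_api wins; otherwise gateway mode follows key availability.
    enabled = use_api if use_api is not None else key is not None
    return GatewayConfig(api_key=key, use_api=enabled)
```

With no key and no `use_api`, this resolves to extension-only mode.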
75+
76+
### 2) Snapshot acquisition (browser-use)
77+
78+
Use Sentience SDK’s existing integration pattern:
79+
80+
* Construct a `BrowserUseAdapter(browser_session)` and call `await adapter.create_backend()` (or equivalent) to obtain a backend for `sentience.backends.snapshot.snapshot()`.
81+
82+
Your `BrowserUseAdapter` exists and wraps CDP access. Don’t change its behavior except where necessary to make the context class clean.
83+
84+
### 3) Formatting: compact prompt block
85+
86+
The prompt block should be a minimal token “inventory,” similar to `state_injector.py`:
87+
88+
* Output header:
89+
90+
* `Elements: ID|role|text|imp|docYq|ord|DG|href` (compatible with existing)
91+
* Then list lines `cur_line = f"{id}|{role}|{name}|{importance}|{doc_yq}|{ord_val}|{dg_flag}|{href}"`
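
As a rough sketch of the line schema above, the header and per-element lines could be assembled like this. `format_line` and `format_block` are hypothetical helper names, and the sketch assumes each field has already been reduced to its printable value.

```python
from typing import Iterable, Tuple

HEADER = "Elements: ID|role|text|imp|docYq|ord|DG|href"

# One row: (id, role, name, importance, doc_yq, ord_val, dg_flag, href)
Row = Tuple[object, object, object, object, object, object, object, object]


def format_line(row: Row) -> str:
    # Same field order as the header; fields are joined with a single pipe.
    return "|".join(str(field) for field in row)


def format_block(rows: Iterable[Row]) -> str:
    # Header first, then one line per selected element.
    return "\n".join([HEADER, *(format_line(r) for r in rows)])
```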
92+
93+
BUT improvements required:
94+
95+
#### 3.1 Remove debug prints
96+
97+
The existing file prints group keys and formatted lines; remove those entirely.
98+
99+
#### 3.2 Role semantics improvement (link vs button)
100+
101+
If `role_link_when_href=True`:
102+
103+
* If element has a non-empty `href`, output `role="link"` even if original role/tag is `button`.
104+
* Else keep existing role.
105+
106+
This improves LLM priors for feed/list pages.
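
A minimal sketch of the rule in 3.2, assuming the role and href are plain strings (the helper name `effective_role` is hypothetical):

```python
from typing import Optional


def effective_role(role: str, href: Optional[str], role_link_when_href: bool = True) -> str:
    # A non-empty href marks the element as a link, whatever its original role or tag.
    if role_link_when_href and href and href.strip():
        return "link"
    return role
```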
107+
108+
#### 3.3 Dominant group membership must NOT use exact match
109+
110+
Use `el.in_dominant_group` if present (preferred). That field is expected from gateway and uses fuzzy matching.
111+
If it’s missing, fallback to exact match ONLY as last resort. (You already do this; keep it.)
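
The preference order in 3.3 might look like the sketch below. It reads attributes defensively with `getattr` because the exact `Element` model is not shown here; `is_in_dominant_group` is a hypothetical helper name.

```python
from typing import Any, Optional


def is_in_dominant_group(el: Any, dominant_group_key: Optional[str]) -> bool:
    # Preferred: the gateway-provided flag, which already reflects fuzzy matching.
    flag = getattr(el, "in_dominant_group", None)
    if flag is not None:
        return bool(flag)
    # Last resort: exact match on group_key, used only when the flag is absent.
    group_key = getattr(el, "group_key", None)
    return bool(dominant_group_key) and group_key == dominant_group_key
```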
112+
113+
#### 3.4 Fix `ord_val` semantics (avoid huge values)
114+
115+
If `include_rank_in_group=True`:
116+
117+
* Prefer a true small ordinal index over `group_index` if `group_index` can be “bucket-like”.
118+
Implement:
119+
* `rank_in_group`: computed locally in this formatter:
120+
121+
* Take interactive elements where `in_dominant_group=True`
122+
* Sort them by `(doc_y, bbox.y, bbox.x, -importance)` using available fields
123+
* Assign `rank_in_group = 0..n-1`
124+
* Then set:
125+
126+
* `ord_val = rank_in_group` for dominant group items
127+
* otherwise `ord_val="-"`
128+
129+
Do NOT modify the Snapshot schema; compute this locally in the context builder.
130+
131+
Keep emitting `doc_yq` as `round(doc_y/200)` like current code, but ensure doc_y uses `el.doc_y` if present.
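
The local ranking in 3.4 can be sketched as below. The `El` and `BBox` dataclasses are simplified stand-ins for the SDK models, and emitting `"-"` for `doc_yq` when no position is available is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class BBox:
    x: float
    y: float


@dataclass
class El:  # simplified stand-in for the SDK Element model
    id: int
    importance: int
    bbox: Optional[BBox] = None
    doc_y: Optional[float] = None
    in_dominant_group: Optional[bool] = None


def y_position(el: El) -> float:
    # Prefer doc_y; fall back to bbox.y; sort unknown positions last.
    if el.doc_y is not None:
        return el.doc_y
    return el.bbox.y if el.bbox is not None else float("inf")


def rank_in_group(elements: Iterable[El]) -> Dict[int, int]:
    members = [e for e in elements if e.in_dominant_group]

    def sort_key(e: El):
        bx = e.bbox.x if e.bbox is not None else float("inf")
        by = e.bbox.y if e.bbox is not None else float("inf")
        return (y_position(e), by, bx, -e.importance)

    # Ranks are small ordinals 0..n-1, keyed by element ID.
    return {e.id: i for i, e in enumerate(sorted(members, key=sort_key))}


def ord_val(el: El, ranks: Dict[int, int]) -> str:
    return str(ranks[el.id]) if el.id in ranks else "-"


def doc_yq(el: El) -> str:
    y = y_position(el)
    return "-" if y == float("inf") else str(round(y / 200))
```

Note that Python's `round` rounds halves to the nearest even integer, so `round(0.5)` is `0`.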
132+
133+
#### 3.5 `href` field: keep short token
134+
135+
Keep the current behavior of compressing href into a short token (domain second-level or last path segment).
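
The baseline compression lives in `state_injector.py`, which is not reproduced here, so the following is only one plausible reading of "domain second-level or last path segment". It assumes the last path segment is preferred and the second-level domain label is the fallback; `short_href` and its `max_len` cap are hypothetical.

```python
from urllib.parse import urlparse


def short_href(href: str, max_len: int = 20) -> str:
    if not href:
        return ""
    parsed = urlparse(href)
    segments = [s for s in parsed.path.split("/") if s]
    if segments:
        # e.g. ".../item?id=1" -> "item"
        token = segments[-1]
    elif parsed.hostname:
        # Naive second-level label: "github.com" -> "github".
        # Multi-part suffixes such as "co.uk" are not handled.
        labels = parsed.hostname.split(".")
        token = labels[-2] if len(labels) >= 2 else labels[0]
    else:
        token = ""
    return token[:max_len]
```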
136+
137+
### 4) Element selection strategy (token slasher)
138+
139+
Replicate the selection recipe from `state_injector.py`:
140+
141+
* Filter to “interactive roles”
142+
* Take:
143+
144+
* top_by_importance
145+
* top_from_dominant_group
146+
* top_by_position (lowest doc_y)
147+
* Deduplicate by element ID.
148+
149+
BUT make it robust:
150+
151+
* don’t rely on `snapshot.dominant_group_key` being present; use `in_dominant_group=True` filtering primarily.
152+
* if `doc_y` missing, fallback to `bbox.y` if bbox exists.
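
A sketch of the three-way selection with deduplication described in section 4. The role set is hypothetical (the real list is in `state_injector.py`), `El` is a simplified stand-in for the SDK model, and ordering the dominant-group slice by position is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional

# Hypothetical role set for illustration only.
INTERACTIVE_ROLES: FrozenSet[str] = frozenset(
    {"button", "link", "textbox", "checkbox", "combobox"}
)


@dataclass
class El:  # simplified stand-in for the SDK Element model
    id: int
    role: str
    importance: int
    doc_y: Optional[float] = None
    bbox_y: Optional[float] = None
    in_dominant_group: Optional[bool] = None


def y_position(el: El) -> float:
    # doc_y first, then bbox.y, then "unknown" sorted last.
    if el.doc_y is not None:
        return el.doc_y
    return el.bbox_y if el.bbox_y is not None else float("inf")


def select_elements(
    elements: Iterable[El],
    top_by_importance: int = 60,
    top_from_dominant_group: int = 15,
    top_by_position: int = 10,
) -> List[El]:
    candidates = [e for e in elements if e.role in INTERACTIVE_ROLES]
    by_importance = sorted(candidates, key=lambda e: -e.importance)[:top_by_importance]
    # Filter on the per-element flag instead of snapshot.dominant_group_key.
    dominant = sorted((e for e in candidates if e.in_dominant_group), key=y_position)
    by_group = dominant[:top_from_dominant_group]
    by_position = sorted(candidates, key=y_position)[:top_by_position]

    # Deduplicate by element ID, keeping first-seen order.
    selected: Dict[int, El] = {}
    for e in [*by_importance, *by_group, *by_position]:
        selected.setdefault(e.id, e)
    return list(selected.values())
```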
153+
154+
### 5) Logging
155+
156+
Use `logging.getLogger(__name__)`.
157+
158+
* On ImportError: warn “Sentience SDK not available…”
159+
* On snapshot failure: warn and include exception string.
160+
* On success: info “SentienceContext snapshot: X elements URL=…”
161+
162+
No prints.
163+
164+
### 6) Packaging / exports
165+
166+
* Export `SentienceContext` and `SentienceContextState` from `sentience/backends/__init__.py`
167+
* Keep browser-use as optional dependency (document usage in README; do not introduce mandatory dependency)
168+
* Ensure type hints don’t import browser-use at runtime.
169+
170+
### 7) Example usage snippet
171+
172+
Provide an example in docstring or comments:
173+
174+
```py
from browser_use import Agent
from sentience.backends import SentienceContext

ctx = SentienceContext(show_overlay=True)
state = await ctx.build(agent.browser_session, goal="Click the first Show HN post")
if state:
    agent.add_context(state.prompt_block)  # or however browser-use injects state
```
183+
184+
Do not modify browser-use repo. This is SDK-only.
185+
186+
---
187+
188+
## Deliverables
189+
190+
1. `sentience/backends/sentience_context.py` new module
191+
2. update `sentience/backends/__init__.py` exports
192+
3. Ensure it compiles and is formatted
193+
4. Keep behavior backwards compatible with existing compact line schema, but improve `role` and `ord_val` as above.
194+
195+
---
196+
197+
If you need to reference the baseline behavior, use the attached `state_injector.py` as the template.
198+
199+
---
200+
201+
If you want, I can also give you a short "README integration snippet" for browser-use users (the 5-line copy/paste install + usage) once Claude produces the code.
202+
203+
---
204+
205+
## Feasibility & Complexity Assessment
206+
207+
### Overall Verdict: ✅ FEASIBLE - Medium Complexity
208+
209+
**Estimated effort:** 2-4 hours for Python SDK
210+
211+
---
212+
213+
### Prerequisites Analysis
214+
215+
| Prerequisite | Status | Notes |
|-------------|--------|-------|
| `BrowserUseAdapter` exists | ✅ Ready | `sentience/backends/browser_use_adapter.py` - wraps CDP for browser-use |
| `snapshot()` function exists | ✅ Ready | `sentience/backends/snapshot.py` - supports both extension and API modes |
| `Element` model has ordinal fields | ✅ Ready | `doc_y`, `group_key`, `group_index`, `href`, `in_dominant_group` all present |
| `Snapshot` model has `dominant_group_key` | ✅ Ready | Added in Phase 2 |
| `SnapshotOptions` supports `goal` | ✅ Ready | Line 139 in models.py |
| browser-use not a hard dependency | ✅ Ready | Already uses `TYPE_CHECKING` pattern |
223+
224+
---
225+
226+
### Complexity Breakdown by Requirement
227+
228+
| Requirement | Complexity | Rationale |
|-------------|------------|-----------|
| 1) Public API (`SentienceContext`, `SentienceContextState`) | 🟢 Low | Simple dataclass + class with `__init__` and `build()` |
| 2) Snapshot acquisition | 🟢 Low | Reuse existing `BrowserUseAdapter` + `snapshot()` |
| 3.1) Remove debug prints | 🟢 Low | Just don't add them |
| 3.2) Role link-when-href | 🟢 Low | Simple conditional: `"link" if href else role` |
| 3.3) Use `in_dominant_group` (fuzzy) | 🟢 Low | Field already exists from gateway |
| 3.4) Fix `ord_val` (local rank computation) | 🟡 Medium | Need to sort dominant group elements locally and assign 0..n-1 |
| 3.5) Short href token | 🟢 Low | URL parsing logic already in state_injector.py |
| 4) Element selection (token slasher) | 🟡 Medium | 3-way selection + deduplication, but logic is clear |
| 5) Logging | 🟢 Low | Standard `logging.getLogger(__name__)` |
| 6) Packaging/exports | 🟢 Low | Add 2 lines to `__init__.py` |
| 7) Example in docstring | 🟢 Low | Copy from design doc |
241+
242+
---
243+
244+
### Risk Areas
245+
246+
1. **`ord_val` local computation (Req 3.4)**: The design requires computing `rank_in_group` locally by sorting `in_dominant_group=True` elements. This is the right approach to fix the large `ord_val` issue, but requires careful implementation:
247+
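- Sort key: `(doc_y, bbox.y, bbox.x, -importance)` as in Req 3.4, with `bbox.y` substituted when `doc_y` is missing (test `doc_y is None` rather than `doc_y or bbox.y`, because `doc_y == 0` is falsy)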
248+
- Must handle missing `doc_y` gracefully
249+
250+
2. **Retry logic**: `build()` takes `retries` and `retry_delay_s` parameters, so it needs either a simple fixed-delay retry loop or exponential backoff.
251+
252+
3. **Error handling**: Must catch exceptions from `snapshot()` and return `None` instead of propagating.
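
Risks 2 and 3 can be handled together with a simple fixed-delay loop that logs and swallows failures. This is a sketch under assumptions: `snapshot_with_retries` is a hypothetical helper, `take_snapshot` stands in for whatever coroutine wraps the SDK `snapshot()` call, and `retries` is read as the number of extra attempts after the first.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


async def snapshot_with_retries(
    take_snapshot: Callable[[], Awaitable[T]],
    retries: int = 2,
    retry_delay_s: float = 1.0,
) -> Optional[T]:
    attempts = retries + 1  # assumption: retries counts additional attempts
    for attempt in range(1, attempts + 1):
        try:
            return await take_snapshot()
        except Exception as exc:  # deliberately broad: the caller must never see a raise
            logger.warning(
                "Sentience snapshot failed (attempt %d/%d): %s", attempt, attempts, exc
            )
            if attempt < attempts:
                await asyncio.sleep(retry_delay_s)
    return None
```

Swapping the fixed delay for `retry_delay_s * 2 ** (attempt - 1)` would give exponential backoff without changing the control flow.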
253+
254+
---
255+
256+
### Implementation Checklist
257+
258+
- [ ] Create `sentience/backends/sentience_context.py`
259+
- [ ] Update `sentience/backends/__init__.py` with exports
260+
- [ ] Test with browser-use locally
261+
262+
---
263+
264+
### Conclusion
265+
266+
This is a **well-scoped, medium-complexity task** with all prerequisites already in place. The main implementation work is:
267+
1. Element selection logic (3-way merge with deduplication)
268+
2. Local `rank_in_group` computation (sort + enumerate)
269+
3. Compact line formatting
270+
271+
No schema changes or gateway modifications required. The gateway fix for `is_content_like_element` (MIN_CONTENT_TEXT_LENGTH=5) has already been implemented, which should reduce the large `ord_val` issue at the source.

sentience/backends/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -97,6 +97,7 @@
9797
)
9898
from .playwright_backend import PlaywrightBackend
9999
from .protocol_v0 import BrowserBackendV0, LayoutMetrics, ViewportInfo
100+
from .sentience_context import SentienceContext, SentienceContextState
100101
from .snapshot import CachedSnapshot, snapshot
101102

102103
__all__ = [
@@ -113,6 +114,9 @@
113114
# browser-use adapter
114115
"BrowserUseAdapter",
115116
"BrowserUseCDPTransport",
117+
# SentienceContext (Token-Slasher Context Middleware)
118+
"SentienceContext",
119+
"SentienceContextState",
116120
# Backend-agnostic functions
117121
"snapshot",
118122
"CachedSnapshot",
