Skip to content

Reference demos comparing semantic geometry grounding vs vision-based browser automation.

Notifications You must be signed in to change notification settings

SentienceAPI/sentience-sdk-playground

Repository files navigation

🤖 Sentience SDK Playground

Reproducible demos showing how structure-first browser agents outperform vision-only agents.

This repository contains 8 real-world browser agent demos that run using:

  • Semantic geometry snapshots (DOM-based, not vision)
  • Jest-style AgentRuntime assertions
  • 6 of these 8 demos use local-first inference (Qwen 2.5 3B)
  • amazon_shopping and google_search use cloud LLM models for comparison
  • Optional vision fallback only after exhaustion

TL;DR

  • ✅ 100% task success across all demos
  • 💸 ~50% lower token usage per step
  • 🧠 Works with small local models (3B–7B)
  • ❌ Vision-only agents fail systematically on the same tasks

🎯 What This Repo Is

This is a playground + benchmark for developers evaluating:

  • browser agents
  • local LLM execution
  • deterministic web automation
  • flaky UI handling
  • assertion-driven verification

Each demo includes:

  • runnable code
  • logs
  • screenshots
  • optional video artifacts
  • token accounting

🧪 Canonical Demos (Start Here)

🥇 Demo 1: News List Skimming (Hacker News)

Task Open the top "Show HN" post deterministically.

Why it matters This tests ordinal reasoning ("first", "top") — a known weakness of vision agents.

Config

  • Model: Qwen 2.5 3B (local)
  • Vision: Disabled
  • Assertions: ordinal=first, url_contains
  • Tokens: ~1.6k per step

Result ✅ PASS — zero retries, deterministic

Demo Screenshot

📂 news_list_skimming/ | 📹 Video


🥈 Demo 2: Login + Profile Check (Local Llama Land)

Task Log in, wait for async hydration, verify profile state.

Why it matters Shows state-aware assertions (enabled, visible, value_equals) on a modern SPA.

Config

  • Model: Qwen 2.5 3B (local)
  • Vision: Disabled
  • Assertions: eventually(), is_enabled, text_contains
  • Handles delayed hydration + dynamic state

Result ✅ PASS — no sleeps, no magic waits

Demo Screenshot

📂 login_profile_check/ | 📹 Video


🥉 Demo 3: Amazon Shopping Flow (Stress Test)

Task Search product → open result → add to cart.

Why it matters High-noise, JS-heavy, real production site.

Config

  • Model: Qwen 2.5 3B (local)
  • Vision: Disabled (fallback optional)
  • Assertions: navigation, button state, success banner
  • Tokens: ~5.5k total

Result ✅ PASS — vision-only agents failed 3/3 runs

Demo Screenshot

📂 amazon_shopping_with_assertions/ | 📹 Video


📊 Key Results (Across All Demos)

Metric Vision-Only Sentience SDK
Task success ❌ 0–30% ✅ 100%
Avg tokens / step ~3,000+ ~1,500
Vision usage Required Optional fallback
Determinism No Yes
Local model viable No Yes (3B–7B)

🧠 Why This Works

Vision agents reason from pixels. Sentience agents reason from structure.

Snapshots provide:

  • semantic roles
  • ordinality
  • grouping
  • state (enabled, checked, expanded)
  • confidence diagnostics

Assertions verify outcomes — not guesses.

Why Compact Prompts + Local LLMs Work Well

The demo suite consistently succeeds with a small local model (Qwen2.5 3B) using compact, structured prompts:

  • Token efficiency: ~14.9K tokens across 5 demos vs 100K+ for vision-heavy approaches
  • Reliability: 5/5 PASS with 0 retries across multi-step flows
  • Speed: Local text models are faster than vision LLMs for structured UI tasks

See docs/DEMO_REPORTS.md for full metrics and results.


🚀 Quick Start

git clone https://github.com/SentienceAPI/sentience-sdk-playground
cd sentience-sdk-playground
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
pip install sentienceapi
playwright install chromium

Run a demo:

cd news_list_skimming
python main.py

📁 Repo Structure

news_list_skimming/              # Ordinality + list reasoning
amazon_shopping_with_assertions/ # Real-world stress test
login_profile_check/             # SPA + form + login flows
dashboard_kpi_extraction/        # KPI extraction + DOM churn
form_validation_submission/      # Multi-step form validation
local-llama-land/               # Demo Next.js site (SPA)
docs/                           # Reports, plans, comparisons

🔗 Learn More


🎓 Takeaway

Structure replaces vision. Assertions replace retries. Small models become viable.

This repo shows that clearly — with real logs, real sites, real results.


📚 Additional Demos

Dashboard KPI Extraction

Task: Extract KPIs from dynamic dashboard with DOM churn resilience.

Demo Screenshot

📂 dashboard_kpi_extraction/ | 📹 Video

Form Validation + Submission

Task: Complete multi-step form with validation at each step.

📂 form_validation_submission/ | 📹 Video (screenshots generated locally after running)

See docs/DEMO_REPORTS.md for detailed execution reports and metrics.

About

Reference demos comparing semantic geometry grounding vs vision-based browser automation.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published