109 changes: 109 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,115 @@ All notable changes to `@predicatelabs/sdk` will be documented in this file.

## Unreleased

### 2026-02-15

#### PredicateBrowserAgent (snapshot-first, verification-first)

`PredicateBrowserAgent` is a new high-level agent wrapper that gives you a **browser-use-like** `step()` / `run()` surface, but keeps Predicate’s core philosophy:

- **Snapshot-first perception** (structured DOM snapshot is the default)
- **Verification-first control plane** (you can gate progress with deterministic checks)
- Optional **vision fallback** (bounded) when snapshots aren’t sufficient

It’s built on top of `AgentRuntime` + `RuntimeAgent`.

##### Quickstart (single step)

```ts
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type RuntimeStep,
  LocalLLMProvider, // or OpenAIProvider / AnthropicProvider / DeepInfraProvider
} from '@predicatelabs/sdk';

// `browserLike`, `page`, and `tracer` are assumed to be set up elsewhere.
const runtime = new AgentRuntime(browserLike, page, tracer);
const llm = new LocalLLMProvider({ model: 'qwen2.5:7b', baseUrl: 'http://localhost:11434/v1' });

const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    // Token control: include last N step summaries in the prompt (0 disables history).
    historyLastN: 2,
  },
});

// `out.ok` reports whether the step succeeded (see examples/agent).
const out = await agent.step({
  taskGoal: 'Find pricing and verify checkout button exists',
  step: { goal: 'Open pricing page' } satisfies RuntimeStep,
});
```
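
The `historyLastN` option in the quickstart bounds how many step summaries reach the prompt. The sketch below illustrates that windowing idea only; the `StepSummary` shape and the `summarizeHistory` helper are hypothetical and are not taken from the SDK.

```ts
// Hypothetical per-step summary; the SDK's real history record may differ.
interface StepSummary {
  goal: string;
  action: string;
  ok: boolean;
}

// Render only the last `lastN` steps. `lastN <= 0` disables history entirely;
// the guard matters because `slice(-0)` would return the whole array.
function summarizeHistory(history: readonly StepSummary[], lastN: number): string {
  if (lastN <= 0) return '';
  return history
    .slice(-lastN)
    .map((s) => `${s.goal} -> ${s.action} (${s.ok ? 'ok' : 'failed'})`)
    .join('\n');
}

const history: StepSummary[] = [
  { goal: 'Open home page', action: 'CLICK(3)', ok: true },
  { goal: 'Open pricing page', action: 'CLICK(12)', ok: true },
  { goal: 'Find checkout button', action: 'FINISH()', ok: false },
];

console.log(summarizeHistory(history, 2));
```

A small window keeps prompt size roughly constant as a run grows, at the cost of the model forgetting earlier steps.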

##### Customize the compact prompt (advanced)

```ts
const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    compactPromptBuilder: (_taskGoal, _stepGoal, domContext, _snap, historySummary) => ({
      systemPrompt:
        'You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id,"text") | PRESS("key") | FINISH()',
      userPrompt: `RECENT:\n${historySummary}\n\nELEMENTS:\n${domContext}\n\nReturn the single best action:`,
    }),
  },
});
```

##### CAPTCHA handling (interface-only; no solver shipped)

If you set `captcha.policy="callback"`, you must provide a handler. The SDK does **not** include a public CAPTCHA solver.

```ts
import { HumanHandoffSolver } from '@predicatelabs/sdk';

const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    captcha: {
      policy: 'callback',
      // Manual solve in the live session; SDK waits until it clears:
      handler: HumanHandoffSolver({ timeoutMs: 10 * 60_000, pollMs: 1_000 }),
    },
  },
});
```
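
The `timeoutMs` / `pollMs` pair above suggests a poll-until-cleared wait. The following is a minimal sketch of that pattern under stated assumptions: `waitUntilCleared` and the `isCleared` callback are hypothetical names, and this is not how `HumanHandoffSolver` is necessarily implemented.

```ts
// Hypothetical options mirroring the `timeoutMs` / `pollMs` names used above.
interface WaitOptions {
  timeoutMs: number;
  pollMs: number;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll `isCleared` until it returns true or the deadline passes.
// Always checks at least once. Resolves to true if the challenge
// cleared in time, false on timeout.
async function waitUntilCleared(
  isCleared: () => Promise<boolean>,
  { timeoutMs, pollMs }: WaitOptions
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await isCleared()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(pollMs);
  }
}

// Usage with a stub that reports "cleared" on the third check.
let checks = 0;
waitUntilCleared(async () => ++checks >= 3, { timeoutMs: 1000, pollMs: 10 }).then((cleared) =>
  console.log(`cleared: ${cleared}`)
);
```

Returning a boolean instead of throwing leaves the timeout policy (abort, retry, escalate) to the caller.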

#### RuntimeAgent: structured prompt override hooks

`RuntimeAgent` now supports optional hooks used by `PredicateBrowserAgent`:

- `structuredPromptBuilder(...)`
- `domContextPostprocessor(...)`
- `historySummaryProvider(...)`
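
The changelog elides the hook parameters, so the sketch below assumes a signature for illustration only: a DOM-context postprocessor that takes the rendered context string and returns the text to send to the model. `DomContextPostprocessor` and `capLines` are hypothetical names.

```ts
// Assumed shape: receives the rendered DOM context and returns the text sent to the model.
type DomContextPostprocessor = (domContext: string) => string;

// Keep at most `maxLines` lines and append a note with the number dropped.
function capLines(maxLines: number): DomContextPostprocessor {
  return (domContext) => {
    const lines = domContext.split('\n');
    if (lines.length <= maxLines) return domContext;
    const dropped = lines.length - maxLines;
    return lines.slice(0, maxLines).concat(`(+${dropped} lines omitted)`).join('\n');
  };
}

const postprocess = capLines(2);
console.log(postprocess('[1] button "Pricing"\n[2] link "Docs"\n[3] link "Blog"'));
```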

#### PredicateBrowserAgent: opt-in token usage accounting (best-effort)

To measure token spend, enable best-effort accounting. Counts are only available when the provider reports token usage:

```ts
const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    tokenUsageEnabled: true,
  },
});

const usage = agent.getTokenUsage();
agent.resetTokenUsage();
```
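
"Best-effort" here means some calls may carry no usage data. A minimal sketch of an accumulator that tolerates that, assuming hypothetical `UsageReport` fields (`promptTokens`, `completionTokens`); the shape returned by `getTokenUsage()` is not documented in this changelog and may differ.

```ts
// Hypothetical per-call usage report; providers that do not report usage leave fields undefined.
interface UsageReport {
  promptTokens?: number;
  completionTokens?: number;
}

interface UsageTotals {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  callsWithoutUsage: number;
}

class TokenUsageTracker {
  private promptTokens = 0;
  private completionTokens = 0;
  private callsWithoutUsage = 0;

  // Add one call's usage; calls with no usage data are counted separately, not guessed.
  record(report: UsageReport): void {
    if (report.promptTokens === undefined && report.completionTokens === undefined) {
      this.callsWithoutUsage += 1;
      return;
    }
    this.promptTokens += report.promptTokens ?? 0;
    this.completionTokens += report.completionTokens ?? 0;
  }

  totals(): UsageTotals {
    return {
      promptTokens: this.promptTokens,
      completionTokens: this.completionTokens,
      totalTokens: this.promptTokens + this.completionTokens,
      callsWithoutUsage: this.callsWithoutUsage,
    };
  }

  reset(): void {
    this.promptTokens = 0;
    this.completionTokens = 0;
    this.callsWithoutUsage = 0;
  }
}

const tracker = new TokenUsageTracker();
tracker.record({ promptTokens: 120, completionTokens: 8 });
tracker.record({}); // provider reported nothing for this call
console.log(tracker.totals());
```

Tracking `callsWithoutUsage` makes it visible when the totals undercount.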

#### RuntimeAgent: actOnce without step lifecycle (orchestrators)

`RuntimeAgent` now exposes `actOnce(...)` helpers that execute exactly one action **without** calling `runtime.beginStep()` / `runtime.emitStepEnd()`. This is intended for external orchestrators (e.g. WebBench) that already own step lifecycle and just want the SDK’s snapshot-first propose+execute block.

- `await agent.actOnce(...) -> string`
- `await agent.actOnceWithSnapshot(...) -> { action, snap }`
- `await agent.actOnceResult(...) -> { action, snap, usedVision }`
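
A rough sketch of how an external orchestrator might own the step lifecycle while delegating each single action. The `OneShotActor` and `StepLifecycle` interfaces, and the argument shape passed to `actOnce`, are invented for illustration, since the changelog elides the real parameters.

```ts
// Hypothetical stand-ins; the real `actOnce(...)` arguments are not shown in the changelog.
interface OneShotActor {
  actOnce(input: { taskGoal: string; stepGoal: string }): Promise<string>;
}

// Lifecycle owned by the orchestrator, not by the agent.
interface StepLifecycle {
  beginStep(goal: string): void;
  endStep(goal: string, action: string): void;
}

// Run one action per step goal, stopping early when the actor returns FINISH().
async function runOrchestrated(
  actor: OneShotActor,
  lifecycle: StepLifecycle,
  taskGoal: string,
  stepGoals: readonly string[]
): Promise<string[]> {
  const actions: string[] = [];
  for (const stepGoal of stepGoals) {
    lifecycle.beginStep(stepGoal);
    const action = await actor.actOnce({ taskGoal, stepGoal });
    lifecycle.endStep(stepGoal, action);
    actions.push(action);
    if (action.startsWith('FINISH')) break;
  }
  return actions;
}

// Usage with a scripted stub in place of a real agent.
const scripted = ['CLICK(12)', 'FINISH()'];
const demoActor: OneShotActor = { actOnce: async () => scripted.shift() ?? 'FINISH()' };
const demoLifecycle: StepLifecycle = {
  beginStep: (goal) => console.log(`begin: ${goal}`),
  endStep: (goal, action) => console.log(`end: ${goal} -> ${action}`),
};

runOrchestrated(demoActor, demoLifecycle, 'Find pricing', [
  'Open pricing page',
  'Confirm checkout button',
  'Unused step',
]).then((actions) => console.log(actions.join(', ')));
```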

### 2026-02-13

#### Expanded deterministic verifications (adaptive resnapshotting)
6 changes: 6 additions & 0 deletions examples/agent/README.md
@@ -0,0 +1,6 @@
Predicate agent examples.

- `predicate-browser-agent-minimal.ts`: minimal `PredicateBrowserAgent` usage.
- `predicate-browser-agent-custom-prompt.ts`: customize the compact prompt builder.
- `predicate-browser-agent-video-recording-playwright.ts`: enable Playwright video recording via context options (recommended).

114 changes: 114 additions & 0 deletions examples/agent/predicate-browser-agent-custom-prompt.ts
@@ -0,0 +1,114 @@
/**
 * Example: PredicateBrowserAgent with compact prompt customization.
 *
 * Usage:
 *   ts-node examples/agent/predicate-browser-agent-custom-prompt.ts
 */

import { Page } from 'playwright';
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type PredicateBrowserAgentConfig,
  RuntimeStep,
  SentienceBrowser,
} from '../../src';
import { createTracer } from '../../src/tracing/tracer-factory';
import { LLMProvider, type LLMResponse } from '../../src/llm-provider';
import type { Snapshot } from '../../src/types';

function createBrowserAdapter(browser: SentienceBrowser) {
  return {
    snapshot: async (_page: Page, options?: Record<string, any>): Promise<Snapshot> => {
      return await browser.snapshot(options);
    },
  };
}

class RecordingProvider extends LLMProvider {
  public lastSystem: string | null = null;
  public lastUser: string | null = null;

  constructor(private action: string = 'FINISH()') {
    super();
  }

  get modelName(): string {
    return 'recording-provider';
  }
  supportsJsonMode(): boolean {
    return false;
  }
  async generate(
    systemPrompt: string,
    userPrompt: string,
    _options: Record<string, any> = {}
  ): Promise<LLMResponse> {
    this.lastSystem = systemPrompt;
    this.lastUser = userPrompt;
    return { content: this.action, modelName: this.modelName };
  }
}

const config: PredicateBrowserAgentConfig = {
  historyLastN: 2,
  compactPromptBuilder: (
    taskGoal: string,
    stepGoal: string,
    domContext: string,
    _snap: Snapshot,
    historySummary: string
  ) => {
    const systemPrompt =
      'You are a web automation executor. Return ONLY ONE action: CLICK(id) | TYPE(id,"text") | PRESS("key") | FINISH(). No prose.';
    const userPrompt =
      `TASK GOAL:\n${taskGoal}\n\n` +
      (historySummary ? `RECENT STEPS:\n${historySummary}\n\n` : '') +
      `STEP GOAL:\n${stepGoal}\n\n` +
      `DOM CONTEXT:\n${domContext.slice(0, 4000)}\n`;
    return { systemPrompt, userPrompt };
  },
};

async function main() {
  const apiKey = (process.env.PREDICATE_API_KEY ||
    process.env.SENTIENCE_API_KEY) as string | undefined;
  if (!apiKey) {
    console.error('Error: PREDICATE_API_KEY or SENTIENCE_API_KEY not set');
    process.exit(1);
  }

  const runId = 'predicate-browser-agent-custom-prompt';
  const tracer = await createTracer({ apiKey, runId, uploadTrace: false });

  const browser = new SentienceBrowser(apiKey, undefined, false);
  await browser.start();
  const page = browser.getPage();

  try {
    await page.goto('https://example.com');
    await page.waitForLoadState('networkidle');

    const runtime = new AgentRuntime(createBrowserAdapter(browser), page, tracer);
    const executor = new RecordingProvider('FINISH()');

    const agent = new PredicateBrowserAgent({ runtime, executor, config });

    const out = await agent.step({
      taskGoal: 'Open example.com',
      step: { goal: 'Take no action; just finish' } satisfies RuntimeStep,
    });

    console.log(`step ok: ${out.ok}`);
    console.log('--- prompt preview (system) ---');
    console.log((executor.lastSystem || '').slice(0, 300));
    console.log('--- prompt preview (user) ---');
    console.log((executor.lastUser || '').slice(0, 300));
  } finally {
    await tracer.close(true);
    await browser.close();
  }
}

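main().catch((err) => {
  // Log the failure and signal it to the shell instead of exiting with code 0.
  console.error(err);
  process.exitCode = 1;
});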

108 changes: 108 additions & 0 deletions examples/agent/predicate-browser-agent-minimal.ts
@@ -0,0 +1,108 @@
/**
 * Example: PredicateBrowserAgent minimal demo.
 *
 * Usage:
 *   ts-node examples/agent/predicate-browser-agent-minimal.ts
 *
 * Requires:
 * - PREDICATE_API_KEY or SENTIENCE_API_KEY (SentienceBrowser API key)
 */

import { Page } from 'playwright';
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type PredicateBrowserAgentConfig,
  RuntimeStep,
  StepVerification,
  SentienceBrowser,
  exists,
  urlContains,
} from '../../src';
import { createTracer } from '../../src/tracing/tracer-factory';
import { LLMProvider, type LLMResponse } from '../../src/llm-provider';
import type { Snapshot } from '../../src/types';

function createBrowserAdapter(browser: SentienceBrowser) {
  return {
    snapshot: async (_page: Page, options?: Record<string, any>): Promise<Snapshot> => {
      return await browser.snapshot(options);
    },
  };
}

class FixedActionProvider extends LLMProvider {
  constructor(private action: string) {
    super();
  }
  get modelName(): string {
    return 'fixed-action';
  }
  supportsJsonMode(): boolean {
    return false;
  }
  async generate(
    _systemPrompt: string,
    _userPrompt: string,
    _options: Record<string, any> = {}
  ): Promise<LLMResponse> {
    return { content: this.action, modelName: this.modelName };
  }
}

async function main() {
  const apiKey = (process.env.PREDICATE_API_KEY ||
    process.env.SENTIENCE_API_KEY) as string | undefined;
  if (!apiKey) {
    console.error('Error: PREDICATE_API_KEY or SENTIENCE_API_KEY not set');
    process.exit(1);
  }

  const runId = 'predicate-browser-agent-minimal';
  const tracer = await createTracer({ apiKey, runId, uploadTrace: false });

  const browser = new SentienceBrowser(apiKey, undefined, false);
  await browser.start();
  const page = browser.getPage();

  try {
    await page.goto('https://example.com');
    await page.waitForLoadState('networkidle');

    const runtime = new AgentRuntime(createBrowserAdapter(browser), page, tracer);

    const executor = new FixedActionProvider('FINISH()');
    const config: PredicateBrowserAgentConfig = { historyLastN: 2 };

    const agent = new PredicateBrowserAgent({ runtime, executor, config });

    const steps: RuntimeStep[] = [
      {
        goal: 'Verify Example Domain is loaded',
        verifications: [
          {
            predicate: urlContains('example.com'),
            label: 'url_contains_example',
            required: true,
          } satisfies StepVerification,
          {
            predicate: exists('role=heading'),
            label: 'has_heading',
            required: true,
          } satisfies StepVerification,
        ],
        maxSnapshotAttempts: 2,
        snapshotLimitBase: 60,
      },
    ];

    const ok = await agent.run({ taskGoal: 'Open example.com and verify', steps });
    console.log(`run ok: ${ok}`);
  } finally {
    await tracer.close(true);
    await browser.close();
  }
}

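main().catch((err) => {
  // Log the failure and signal it to the shell instead of exiting with code 0.
  console.error(err);
  process.exitCode = 1;
});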
