Merged
18 changes: 4 additions & 14 deletions CLAUDE.md
@@ -18,9 +18,9 @@ src/
index.ts # Plugin entry point — registers apify + CLI
apify-client.ts # Shared Apify client factory, config helpers
cli.ts # openclaw apify setup|status|test commands
util.ts # Inlined utilities (not exported by openclaw/plugin-sdk)
util.ts # Inlined utilities: ToolInputError, normalizeSecretInput, wrapExternalContent
tools/
apify-scraper-tool.ts # Universal scraper — discover + start + collect + cached_runs
apify-scraper-tool.ts # Universal scraper — discover + start + collect
test/
helpers.ts # makeMockFetch, standardRunResponses, TEST_CONFIG
apify-scraper.test.ts # Tool tests
@@ -44,23 +44,15 @@ Single tool with 3 actions:
The tool description includes instructions for the agent:
- **Sub-agent delegation:** Tool should be used by a sub-agent that returns only relevant extracted data, not raw dumps.
- **Batching:** Batch multiple URLs into a single run (e.g. `startUrls: [{url: "..."}, ...]`).
- **Caching:** Every response auto-includes a `previousRuns` field with a compact summary of cached scrape results. The agent should evaluate this before starting new runs.
- **Known actors:** Compact comma-separated list of 57 actors across Instagram, Facebook, TikTok, YouTube, Google Maps, and more.
- **Support:** Directs users to integrations@apify.com for issues.

### Cache Architecture

- **In-memory `Map<string, CacheEntry>`** keyed by `apify-scraper:run:<runId>`.
- Default TTL: 15 minutes. Configurable via `cacheTtlMinutes`. Max 100 entries (LRU eviction).
- At collect time, the original run input is fetched from `keyValueStore(kvStoreId).getRecord("INPUT")` in parallel with dataset items and stored in the cache payload.
- **Auto-injected `previousRuns`:** Every tool response includes a compact summary of cached runs (actor, result count, input, run/dataset IDs, expiry). Expired entries are auto-purged. Last 10 entries shown.

## Key Architecture Decisions

- **Single tool, multiple actions:** All scraping goes through `apify` with `discover`/`start`/`collect` actions.
- **Async two-phase pattern:** `start` returns immediately with run references. `collect` polls and fetches results. The agent does other work between calls.
- **`apify-client` SDK:** Uses the official `apify-client` npm package (not raw HTTP). Client created via `createApifyClient(apiKey, baseUrl)`.
- **Inlined utilities (`util.ts`):** `ToolInputError`, cache helpers, and `wrapExternalContent` are NOT exported from `openclaw/plugin-sdk`. We carry local copies.
- **Inlined utilities (`util.ts`):** `ToolInputError`, `normalizeSecretInput`, and `wrapExternalContent` are NOT exported from `openclaw/plugin-sdk`. We carry local copies.
- **No build step:** OpenClaw loads plugins via `jiti` (TypeScript JIT). We ship `.ts` source directly.
- **No skills:** Skills were removed — the tool description and `discover` action provide all needed guidance.
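
A minimal sketch of the async two-phase pattern listed above, written against a hypothetical `ScraperBackend` interface rather than the real `apify-client` SDK. The names `RunRef`, `ScraperBackend`, `startRun`, and `collectRun` are assumptions made for illustration and are not taken from the plugin source.

```typescript
// Hypothetical sketch of the start/collect split; not the plugin's actual code.
type RunStatus = "RUNNING" | "SUCCEEDED" | "FAILED";

interface RunRef {
  runId: string;
  datasetId: string;
}

// Assumed minimal surface of a scraping backend (illustrative only).
interface ScraperBackend {
  start(actorId: string, input: unknown): Promise<RunRef>;
  status(runId: string): Promise<RunStatus>;
  items(datasetId: string, limit: number): Promise<unknown[]>;
}

// Phase 1: start the run and return its references without waiting for it.
async function startRun(
  backend: ScraperBackend,
  actorId: string,
  input: unknown,
): Promise<RunRef> {
  return backend.start(actorId, input);
}

// Phase 2: poll the run status a bounded number of times, then fetch
// results only if the run succeeded. Otherwise return the last status seen.
async function collectRun(
  backend: ScraperBackend,
  ref: RunRef,
  maxResults: number,
  maxPolls = 5,
  pollDelayMs = 1000,
): Promise<{ status: RunStatus; items: unknown[] }> {
  let status = await backend.status(ref.runId);
  for (let i = 1; i < maxPolls && status === "RUNNING"; i++) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollDelayMs));
    status = await backend.status(ref.runId);
  }
  if (status !== "SUCCEEDED") return { status, items: [] };
  const items = await backend.items(ref.datasetId, maxResults);
  return { status, items };
}
```

Because `startRun` returns only references, the caller is free to do other work and call `collectRun` later. A run that is still `RUNNING` after the polling budget comes back with an empty `items` array, so the caller can retry.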

@@ -87,7 +79,7 @@ The wizard merges safely: preserves existing config, adds to `tools.alsoAllow` w
- **Type-check:** `npx tsc --noEmit`
- **Test:** `npx vitest run`
- **Pack (dry run):** `npm pack --dry-run`
- **Current state:** 1 test file, 12 tests passing.
- **Current state:** 1 test file, 10 tests passing.

## Coding Style

Expand Down Expand Up @@ -160,7 +152,6 @@ Tool names that collide with core tool names are silently dropped. Plugin tools
config: {
apiKey: "apify_api_...", // or use APIFY_API_KEY env var
baseUrl: "https://api.apify.com",
cacheTtlMinutes: 15,
maxResults: 20,
enabledTools: [], // empty = all tools enabled
},
@@ -188,7 +179,6 @@ All scraped data is **untrusted external content**. The `wrapExternalContent(con
- **API keys:** Resolved from plugin config `apiKey` or `APIFY_API_KEY` env var. Never logged or included in tool output.
- **Base URL validation:** Only `https://api.apify.com` prefix allowed. Rejects other URLs to prevent SSRF.
- **External content wrapping:** All scraped results wrapped with untrusted content markers.
- **HTTP timeout:** 30s per request via `AbortSignal`.
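
The base URL rule above could be sketched as follows. This is an illustrative helper, not the plugin's confirmed implementation: `validateBaseUrl` and `ALLOWED_ORIGIN` are hypothetical names, and the sketch compares the parsed origin instead of doing a raw string prefix match, on the assumption that a plain `startsWith` check would also accept hosts such as `https://api.apify.com.evil.example`.

```typescript
// Hypothetical constant: the only origin the sketch accepts.
const ALLOWED_ORIGIN = "https://api.apify.com";

// Hypothetical helper: returns the normalized base URL, or throws if the
// URL is malformed or points at any origin other than ALLOWED_ORIGIN.
function validateBaseUrl(raw: string): string {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    throw new Error(`Invalid base URL: ${raw}`);
  }
  // Comparing origins covers scheme, host, and port in one check.
  if (parsed.origin !== ALLOWED_ORIGIN) {
    throw new Error(`Base URL must start with ${ALLOWED_ORIGIN}`);
  }
  // Drop trailing slashes so callers can append paths consistently.
  return parsed.href.replace(/\/+$/, "");
}
```

Parsing first means look-alike hosts and credentials embedded before an `@` are rejected, because the comparison uses the host the URL parser actually resolves.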

---

4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# Apify Plugin for OpenClaw

Universal web scraping and data extraction via [Apify](https://apify.com) — 57+ Actors across Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, e-commerce, and more.
Universal web scraping and data extraction via [Apify](https://apify.com) — 20k+ Actors across Instagram, Facebook, TikTok, YouTube, Google Maps, Google Search, e-commerce, and more.

## Install

@@ -77,7 +77,7 @@ Actor IDs use the `username~actor-name` format (tilde separator, not slash).

### Known Actors

The tool description includes 57+ known Actors across these categories:
The tool description includes 20k+ Actors across these categories:

- **Instagram** — profiles, posts, comments, hashtags, reels, search, followers, tagged posts
- **Facebook** — pages, posts, comments, likes, reviews, groups, events, ads, reels, photos, marketplace
6 changes: 6 additions & 0 deletions package.json
@@ -53,6 +53,12 @@
"vitest": "^3.0.0"
},
"openclaw": {
"id": "apify",
Member

Note: builtWithOpenClawVersion is hardcoded to 2026.2.19. Should this be automated or updated before each release?

Collaborator Author

Let's keep it manual for now. When we build a publish pipeline, we can handle it there.
Added this info to the ticket here: #8

I suggest that once we have the release pipeline, we add a check that fails if the openclaw version in package.json is not the latest version. We should test that our plugin works with the latest version anyway.
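
One possible shape for such a check, as a rough sketch only. `assertBuiltWithLatest` and `PackageJsonLike` are hypothetical names, and how the pipeline learns the latest OpenClaw version is left open here, so it is passed in as a plain string.

```typescript
// Hypothetical subset of package.json that the check reads.
interface PackageJsonLike {
  openclaw?: {
    compat?: { builtWithOpenClawVersion?: string };
  };
}

// Throws when the pinned version is missing or differs from `latest`.
// `latest` is supplied by the caller; this sketch does not look it up.
function assertBuiltWithLatest(pkg: PackageJsonLike, latest: string): void {
  const pinned = pkg.openclaw?.compat?.builtWithOpenClawVersion;
  if (!pinned) {
    throw new Error("openclaw.compat.builtWithOpenClawVersion is missing");
  }
  if (pinned !== latest) {
    throw new Error(
      `builtWithOpenClawVersion is ${pinned}, but the latest version is ${latest}`,
    );
  }
}
```

A release job could call this with the parsed package.json and fail the build on the thrown error.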

"compat": {
"pluginApi": ">=1.0.0",
"builtWithOpenClawVersion": "2026.2.19",
"pluginSdkVersion": "2026.2.19"
},
"extensions": [
"./src/index.ts"
]
29 changes: 9 additions & 20 deletions src/cli.ts
@@ -41,17 +41,17 @@ export function registerCli(api: OpenClawPluginApi): void {
apify
.command("setup")
.description("Interactive setup wizard for the Apify plugin")
.action(() => runSetupCommand(api));
.action(async () => runSetupCommand(api));

apify
.command("status")
.description("Show current Apify plugin configuration status")
.action(() => runStatusCommand(api));
.description("Show Apify plugin configuration and test API connection")
.action(async () => runStatusCommand(api));

apify
.command("test")
.description("Test the Apify API connection")
.action(() => runTestCommand(api));
.description("Test Apify API connection")
.action(async () => runStatusCommand(api));
},
{ commands: ["apify"] },
);
@@ -252,7 +252,7 @@ async function runSetupCommand(api: OpenClawPluginApi): Promise<void> {
// status command
// ---------------------------------------------------------------------------

function runStatusCommand(api: OpenClawPluginApi): void {
async function runStatusCommand(api: OpenClawPluginApi): Promise<void> {
const config = (api.pluginConfig ?? {}) as Record<string, unknown>;
const apiKey = getApiKey(api);
const baseUrl = getBaseUrl(api);
@@ -268,26 +268,15 @@ function runStatusCommand(api: OpenClawPluginApi): void {
: "all (no restriction)";
console.log(` Tools: ${enabledTools}`);
console.log(` Plugin: ${config.enabled === false ? "disabled" : "enabled (when API key is set)"}`);
console.log();
}

// ---------------------------------------------------------------------------
// test command
// ---------------------------------------------------------------------------

async function runTestCommand(api: OpenClawPluginApi): Promise<void> {
const apiKey = getApiKey(api);
const baseUrl = getBaseUrl(api);

console.log("\n=== Testing Apify API Connection ===\n");

// Connection test
if (!apiKey) {
console.log(" ✗ Cannot test: API key not configured.");
console.log(`\n ✗ Cannot test connection: API key not configured.`);
console.log(" Run 'openclaw apify setup' to configure.\n");
return;
}

process.stdout.write(" Connecting… ");
process.stdout.write("\n Testing connection… ");

try {
const client = createApifyClient(apiKey, baseUrl);