Queue System

Optional. Use when the user explicitly requests it. Always specify --project.

Commands

Add:

./scrapai queue add <url> --project NAME [-m "instruction"] [--priority N]

Bulk add:

./scrapai queue bulk <file.csv|file.json> --project NAME [--priority N]

List:

./scrapai queue list --project NAME                          # 5 pending/processing (default)
./scrapai queue list --project NAME --limit 20               # more items
./scrapai queue list --project NAME --all --limit 50         # include completed/failed
./scrapai queue list --project NAME --status pending         # filter by status
./scrapai queue list --project NAME --count                  # just the count
./scrapai queue list --project NAME --status failed --count  # count by status

Claim next:

./scrapai queue next --project NAME

Update status (ID is globally unique, no --project needed):

./scrapai queue complete <id>
./scrapai queue fail <id> [-m "error message"]
./scrapai queue retry <id>
./scrapai queue remove <id>

Cleanup:

./scrapai queue cleanup --completed --force --project NAME
./scrapai queue cleanup --failed --force --project NAME
./scrapai queue cleanup --all --force --project NAME

Bulk File Formats

JSON format (see templates/queue-template.json):

[
  {"url": "https://site1.com", "custom_instruction": "Focus on research articles", "priority": 5},
  {"url": "https://site2.com", "priority": 10},
  {"url": "https://site3.com", "custom_instruction": "Include all news sections", "priority": 1}
]

CSV format (see templates/queue-template.csv):

url,custom_instruction,priority
https://site1.com,Focus on research articles,5
https://site2.com,,10
https://site3.com,Include all news sections,1

Required columns:

url - Website URL (required)
custom_instruction - Special instructions for this site (optional)
priority - Higher number = higher priority (optional, default: 0)

Queue Processing Workflow

./scrapai queue next --project NAME — claim next item
Note: ID, URL, project, custom_instruction
If custom_instruction exists: use it to override CLAUDE.md defaults
Run Phases 1-4. Include "source_url": "<queue_url>" in final_spider.json.
Import spider with --project matching queue project.
On success: ./scrapai queue complete <id>
On failure: ./scrapai queue fail <id> -m "error description"

Custom Instruction Handling

When queue next returns a custom_instruction, use it to override CLAUDE.md defaults during analysis. Example: instruction "Focus only on research articles" → only create rules for research article URL patterns, skip other content sections.

Direct vs Queue

User says "Add this website: X" → process immediately (Phases 1-4), no queue
User says "Add X to the queue" → queue add only, do NOT process
User says "Process next" → queue next then Phases 1-4

Project Isolation

NEVER omit --project when importing spiders from queue. Without it, spider defaults to "default" project, mixing data across projects. Always match the queue item's project.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Queue System

Commands

Bulk File Formats

Queue Processing Workflow

Custom Instruction Handling

Direct vs Queue

Project Isolation

FilesExpand file tree

queue.md

Latest commit

History

queue.md

File metadata and controls

Queue System

Commands

Bulk File Formats

Queue Processing Workflow

Custom Instruction Handling

Direct vs Queue

Project Isolation