praneethramk/BotSight-CLI

BotSight

A codebase that enables websites to be agentic-browser friendly. BotSight crawls a webpage, validates the scrape, extracts structured data, generates metadata/snippets, and ships it as an npm module/snippet that site owners can embed. The goal is to make pages visible and actionable for AI agents and LLM crawlers.

📦 Packages

This monorepo contains the following packages:

botsight-core

Core scraping, validation, and extraction logic with FireCrawl integration for enhanced data extraction.

botsight-cli

CLI tool to run BotSight locally.

botsight-npm

Reusable npm package snippet for websites.

botsight-dashboard

(Planned) Analytics and simulation dashboard.

botsight-server

Backend server with telemetry, configuration, and simulation endpoints.

botsight-snippet

Client-side snippet for easy website integration.

🚀 Getting Started

Prerequisites

  • Node.js >= 16
  • pnpm
  • PostgreSQL database
  • Redis server

Installation

pnpm install

Build

pnpm build

πŸ› οΈ Usage

CLI

# Initialize configuration
botsight init

# Crawl a website
botsight crawl https://example.com

# Validate a website
botsight validate https://example.com

# Generate snippet from JSON data
botsight generate data.json

NPM Package

npm install botsight-snippet

import { injectBotSight } from "botsight-snippet";

injectBotSight({
  name: "Client Site",
  url: "https://client.com",
  offers: [...]
});

Server

# Navigate to server directory
cd server

# Set environment variables
export DATABASE_URL=postgresql://user:password@localhost:5432/botsight_db
export REDIS_URL=redis://localhost:6379

# Run database migrations
npm run migrate

# Seed initial data
npm run seed

# Start the server
npm start

# In separate terminals, start worker and agent sync
npm run worker
npm run sync-agents

Client-side Snippet

Add this to your website's HTML:

<script 
    src="https://your-server.com/botsight.iife.js" 
    data-site-id="your-site-id" 
    async>
</script>

πŸ—οΈ Architecture

β”œβ”€β”€ botsight-core/          # Core logic with FireCrawl integration
β”‚   β”œβ”€β”€ crawler.ts          # Page crawling (static + dynamic + FireCrawl)
β”‚   β”œβ”€β”€ validator.ts        # Scrape validation
β”‚   β”œβ”€β”€ extractor.ts        # Structured data extraction
β”‚   └── snippet-generator.ts # Snippet generation
β”œβ”€β”€ botsight-cli/           # CLI interface
β”‚   └── cli.ts              # Command definitions
β”œβ”€β”€ botsight-npm/           # Embeddable snippet
β”‚   └── index.ts            # Browser-compatible JS
β”œβ”€β”€ botsight-server/        # Backend server
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ routes/         # API endpoints
β”‚   β”‚   β”œβ”€β”€ workers/        # Playwright simulation worker
β”‚   β”‚   β”œβ”€β”€ jobs/           # Agent sync job
β”‚   β”‚   └── db/             # Database connection
β”‚   └── db/ddl/             # Database schema
β”œβ”€β”€ botsight-snippet/       # Client-side snippet
β”‚   └── src/
└── botsight-dashboard/     # (Future) Analytics dashboard

🎯 Features

  • Enhanced Page Crawling: Static + dynamic rendering with Playwright and FireCrawl integration
  • Scrape Validation: Confidence scoring and missing element detection
  • Structured Data Extraction: JSON-LD, OpenGraph, Twitter Cards, etc.
  • Snippet Generation: Agent-readable structured data with BotSight-specific metadata
  • NPM Package: One-line integration for site owners
  • CLI Tool: Easy local testing and onboarding
  • Auto Agent Detection: Automatically detect new AI agents via user-agent database
  • Site-owner Analytics: Track which agents visited and what they extracted
  • Dashboard Simulation: See how GPT/Perplexity and other agents view your page
  • Production Ready: Security, privacy, and performance optimized
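
To make the extraction and validation features above more concrete, here is a minimal sketch of pulling JSON-LD and OpenGraph data out of an HTML string and scoring the result. It is not BotSight's actual extractor or validator: the names `extractStructuredData` and `scoreScrape`, the regex-based parsing, and the scoring formula (share of required OpenGraph fields found) are all assumptions made for illustration.

```typescript
// Hypothetical sketch: names, regexes, and the scoring rule are assumptions,
// not the real botsight-core API.
interface ExtractedData {
  jsonLd: unknown[];
  openGraph: Record<string, string>;
}

function extractStructuredData(html: string): ExtractedData {
  const jsonLd: unknown[] = [];
  const scriptRe =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    try {
      jsonLd.push(JSON.parse(match[1]));
    } catch (_err) {
      // Skip blocks whose body is not valid JSON.
    }
  }

  // Limitation: only matches <meta> tags where `property` precedes `content`.
  const openGraph: Record<string, string> = {};
  const metaRe =
    /<meta[^>]*property=["']og:([^"']+)["'][^>]*content=["']([^"']*)["'][^>]*>/gi;
  while ((match = metaRe.exec(html)) !== null) {
    openGraph[match[1]] = match[2];
  }
  return { jsonLd, openGraph };
}

// Confidence here is the fraction of required OpenGraph fields that were found.
function scoreScrape(
  data: ExtractedData,
  required: string[]
): { confidence: number; missing: string[] } {
  const missing = required.filter(
    (key) => !Object.prototype.hasOwnProperty.call(data.openGraph, key)
  );
  const confidence =
    required.length === 0
      ? 1
      : (required.length - missing.length) / required.length;
  return { confidence, missing };
}
```

A real crawler would more likely use an HTML parser than regular expressions; the regex approach is used here only to keep the sketch dependency-free.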

📈 Backend Features

Telemetry Endpoint

  • POST /v1/telemetry - Accepts telemetry data from client snippets
  • Automatically matches user agents to known AI agents
  • Stores visit data and extracted fields for analytics
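
The user-agent matching step can be pictured roughly as follows. This is a sketch under stated assumptions: the `KNOWN_AGENTS` table and `matchAgent` function are hypothetical, and the real server reads its agent list from a database rather than a hard-coded array. The three tokens are only illustrative examples of publicly documented crawler user-agent strings.

```typescript
// Hypothetical sketch: the table and function name are assumptions,
// not the real botsight-server implementation.
interface KnownAgent {
  name: string;
  pattern: RegExp;
}

const KNOWN_AGENTS: KnownAgent[] = [
  { name: "GPTBot", pattern: /GPTBot/i },
  { name: "PerplexityBot", pattern: /PerplexityBot/i },
  { name: "ClaudeBot", pattern: /ClaudeBot/i },
];

// Returns the name of the first known agent whose pattern matches the
// User-Agent header, or null when the visitor is not a known agent.
function matchAgent(userAgent: string): string | null {
  for (const agent of KNOWN_AGENTS) {
    if (agent.pattern.test(userAgent)) {
      return agent.name;
    }
  }
  return null;
}
```

A `null` result is the natural place to hand an unrecognized user agent over to a review queue.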

Configuration Endpoint

  • GET /v1/config/:siteId - Returns site configuration for snippets
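
As a small illustration of how a client might address this endpoint, the sketch below builds the request URL from a server base URL and a site ID. The helper name `configUrl` is hypothetical, and the response format is not specified here, so the sketch stops at constructing the URL.

```typescript
// Hypothetical helper: only the /v1/config/:siteId path comes from the
// endpoint description; everything else is an assumption.
function configUrl(serverBase: string, siteId: string): string {
  // Drop trailing slashes so the path is not doubled up.
  const base = serverBase.replace(/\/+$/, "");
  // Encode the site ID so characters such as "/" or spaces stay in one path segment.
  return `${base}/v1/config/${encodeURIComponent(siteId)}`;
}
```

The resulting string can be passed to `fetch` or any other HTTP client.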

Simulation Endpoint

  • POST /v1/simulate - Enqueues Playwright jobs to simulate how agents see pages
  • Captures JSON-LD, meta tags, content, and screenshots

Agent Management

  • Automatic sync with remote agents database
  • Unknown agent detection and candidate review system

📊 Analytics & Monitoring

  • Comprehensive database schema for tracking visits, agents, and extracted data
  • Example SQL queries for insights and reporting
  • Dashboard options for real-time monitoring
  • Performance and security optimized

📄 Documentation

📄 License

MIT
