Skip to content

brightdata/sdk-js

Repository files navigation

sdk-banner(1)

Bright Data JavaScript SDK providing easy and scalable methods for scraping, web search, datasets, and more.

Installation @latest

npm install @brightdata/sdk

Quick start

1. Signup and get your API token

2. Initialize the client

import { bdclient } from '@brightdata/sdk';

const client = new bdclient({
    apiKey: '[your_api_token]', // or set BRIGHTDATA_API_TOKEN env variable
});

3. Launch your first request

// Scrape a webpage
const html = await client.scrapeUrl('https://example.com');
console.log(html);

// Search the web
const results = await client.search.google('pizza restaurants');
console.log(results);

Don't forget to close when done:

await client.close();

Features

  • Web Scraping — Scrape any website using anti-bot detection bypass and proxy support
  • Search Engine Results — Google, Bing, and Yandex search with batch support
  • Platform Scrapers — Structured data collection from LinkedIn, Amazon, Instagram, TikTok, YouTube, Reddit, and more
  • Datasets — Access 19 pre-built datasets (Amazon, LinkedIn, Instagram, TikTok, X/Twitter) with query/download support
  • Parallel Processing — Concurrent processing for multiple URLs or queries
  • Robust Error Handling — Typed error classes with retry logic
  • Zone Management — Automatic zone creation and management
  • Multiple Output Formats — HTML, JSON, Markdown, and screenshots
  • Dual Build — Both ESM and CommonJS supported
  • TypeScript — Fully typed API with overloaded signatures
  • Subpath Exports — Tree-shakeable imports via @brightdata/sdk/scrapers, @brightdata/sdk/search, @brightdata/sdk/datasets

Usage

Scrape websites

// Single URL — returns HTML string by default
const html = await client.scrapeUrl('https://example.com');

// Multiple URLs (parallel processing)
const results = await client.scrapeUrl([
    'https://example1.com',
    'https://example2.com',
]);

// Get markdown content
const md = await client.scrapeUrl('https://example.com', {
    dataFormat: 'markdown',
});

// Get structured JSON
const data = await client.scrapeUrl('https://example.com', {
    format: 'json',
});

// Take a screenshot
const screenshot = await client.scrapeUrl('https://example.com', {
    dataFormat: 'screenshot',
});

// Full options
const result = await client.scrapeUrl('https://example.com', {
    format: 'raw',            // 'raw' (default) or 'json'
    dataFormat: 'html',       // 'html' (default), 'markdown', 'screenshot'
    country: 'gb',            // two-letter country code
    method: 'GET',            // HTTP method
});

Search engines

// Google search
const results = await client.search.google('pizza restaurants');

// Bing search
const results = await client.search.bing('pizza restaurants');

// Yandex search
const results = await client.search.yandex('pizza restaurants');

// Batch search (parallel)
const results = await client.search.google(['pizza', 'sushi', 'tacos']);

// With options
const results = await client.search.google('pizza', {
    country: 'gb',
    format: 'json',
});

Platform scrapers

Collect structured data from popular platforms. Each platform supports sync collection (collect*) and async orchestrated scraping (trigger, poll, download).

// LinkedIn profiles
const data = await client.scrape.linkedin.collectProfiles(
    ['https://www.linkedin.com/in/satyanadella/'],
    { format: 'json' },
);

// Amazon products
const data = await client.scrape.amazon.collectProducts(
    ['https://www.amazon.com/dp/B0D77BX8Y4'],
    { format: 'json' },
);

// Instagram profiles
const data = await client.scrape.instagram.collectProfiles(
    ['https://www.instagram.com/natgeo/'],
    { format: 'json' },
);

// TikTok profiles
const data = await client.scrape.tiktok.collectProfiles(
    ['https://www.tiktok.com/@tiktok'],
    { format: 'json' },
);

// YouTube videos
const data = await client.scrape.youtube.collectVideos(
    ['https://www.youtube.com/watch?v=dQw4w9WgXcQ'],
    { format: 'json' },
);

// Reddit posts
const data = await client.scrape.reddit.collectPosts(
    ['https://www.reddit.com/r/technology/top/'],
    { format: 'json' },
);

Orchestrated scraping (async trigger → poll → download):

const result = await client.scrape.linkedin.profiles(
    ['https://www.linkedin.com/in/satyanadella/'],
    { pollInterval: 5000, pollTimeout: 180_000 },
);
console.log(result.data);    // structured data
console.log(result.status);  // 'ready'
console.log(result.rowCount);

Available platforms: linkedin, amazon, instagram, tiktok, youtube, reddit, facebook, chatGPT, digikey, perplexity

Datasets

Access pre-built datasets for querying and downloading structured data snapshots.

const ds = client.datasets;

// List all datasets available on your account
const list = await ds.list();

// Get field metadata for a dataset
const meta = await ds.instagramProfiles.getMetadata();
console.log(meta.fields); // [{ name, type, description }, ...]

// Query a dataset (triggers a snapshot)
const snapshotId = await ds.instagramProfiles.query(
    { url: 'https://www.instagram.com/natgeo/' },
    { records_limit: 10 },
);

// Check snapshot status
const status = await ds.instagramProfiles.getStatus(snapshotId);
console.log(status.status); // 'running' | 'ready' | ...

// Download when ready
const rows = await ds.instagramProfiles.download(snapshotId);

Available datasets:

Platform Datasets
LinkedIn linkedinProfiles, linkedinCompanies
Amazon amazonProducts, amazonReviews, amazonSellers, amazonBestSellers, amazonProductsSearch, amazonProductsGlobal, amazonWalmart
Instagram instagramProfiles, instagramPosts, instagramComments, instagramReels
TikTok tiktokProfiles, tiktokPosts, tiktokComments, tiktokShop
X/Twitter xTwitterPosts, xTwitterProfiles

Saving results

const data = await client.scrapeUrl('https://example.com');
const filePath = await client.saveResults(data, {
    filename: 'results.json',
    format: 'json',
});
console.log(`Saved to: ${filePath}`);

Configuration

API Token

Get your API token from Bright Data Control Panel.

Environment Variables

BRIGHTDATA_API_TOKEN=your_api_token
BRIGHTDATA_WEB_UNLOCKER_ZONE=your_web_unlocker_zone  # Optional
BRIGHTDATA_SERP_ZONE=your_serp_zone                  # Optional
BRIGHTDATA_VERBOSE=1                                  # Optional, enable verbose logging

Client Options

const client = new bdclient({
    apiKey: 'string',           // API token (or use BRIGHTDATA_API_TOKEN env var)
    timeout: 120000,            // Request timeout in ms (1000–300000)
    autoCreateZones: true,      // Auto-create zones if they don't exist
    webUnlockerZone: 'string',  // Custom web unlocker zone name
    serpZone: 'string',         // Custom SERP zone name
    logLevel: 'INFO',           // 'DEBUG' | 'INFO' | 'WARNING' | 'ERROR' | 'CRITICAL'
    structuredLogging: true,    // Use structured JSON logging
    verbose: false,             // Enable verbose logging
    rateLimit: 0,               // Max requests per period (0 = unlimited)
    ratePeriod: 1000,           // Rate limit period in ms
});

Resource Cleanup

The client maintains HTTP connections. Always close when done:

await client.close();

// Or use Symbol.asyncDispose (TypeScript 5.2+)
await using client = new bdclient();

Constants

Constant Default Description
DEFAULT_CONCURRENCY 10 Max parallel tasks
DEFAULT_TIMEOUT 120000 Request timeout (milliseconds)
MAX_RETRIES 3 Retry attempts on failure
RETRY_BACKOFF_FACTOR 1.5 Exponential backoff multiplier

Zone Management

const zones = await client.listZones();
console.log(`Found ${zones.length} zones`);

Subpath Exports

For tree-shaking or importing only what you need:

import { ScrapeRouter, LinkedinAPI } from '@brightdata/sdk/scrapers';
import { SearchRouter } from '@brightdata/sdk/search';
import { DatasetsClient, BaseDataset } from '@brightdata/sdk/datasets';

Error Handling

The SDK exports typed error classes that extend BRDError:

import { bdclient, ValidationError, AuthenticationError, BRDError } from '@brightdata/sdk';

try {
    const result = await client.scrapeUrl('https://example.com');
} catch (error) {
    if (error instanceof ValidationError) {
        console.error('Invalid input:', error.message);
    } else if (error instanceof AuthenticationError) {
        console.error('Auth failed:', error.message);
    } else if (error instanceof BRDError) {
        console.error('SDK error:', error.message);
    }
}

Error types: ValidationError, AuthenticationError, ZoneError, NetworkError, NetworkTimeoutError, TimeoutError, APIError, DataNotReadyError, FSError

Development

git clone https://github.com/brightdata/bright-data-sdk-js.git
cd bright-data-sdk-js
npm install
npm run build:dev

Commits conventions and releases

We use Semantic Release for automated releases. Commit message conventions:

  • fix: — triggers a PATCH release (0.5.0 => 0.5.1)
  • feat: — triggers a MINOR release (0.5.0 => 0.6.0)
  • feat!: or BREAKING CHANGE: in footer — triggers a MAJOR release (0.5.0 => 1.0.0)
  • docs: — documentation only, no release
  • chore: — general maintenance, no release

Support

For any issues, contact Bright Data support, or open an issue in this repository.

License

This project is licensed under the MIT License.

About

Bright Data's JS SDK, use it to call bright data's scrape and search tools. bypass any Bot-detection or Captcha and extract data from the web.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors