TypeScript SDK for MarkUDown — AI data extraction infrastructure that turns any website into structured, ready-to-use data.
Requires Node 18+ (uses native fetch). Zero dependencies.
```bash
npm install markudown
# or
pnpm add markudown
# or
yarn add markudown
```

```ts
import { MarkUDown } from "markudown";

const md = new MarkUDown({ apiKey: "your-api-key" });

// Scrape a page
const result = await md.scrape("https://example.com");

// Search the web
const results = await md.search("best web scraping tools 2025", {
  engine: "google",
  limit: 5,
});

// Extract structured data
const data = await md.extract(
  "https://shop.example.com/products",
  [
    { name: "title", type: "string", active: true },
    { name: "price", type: "float", active: true },
    { name: "rating", type: "float", active: true },
  ],
  { extractQuery: "Extract all product listings" }
);
```

Get your API key at scrapetechnology.com/markudown. New accounts include 500 free credits.
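Rather than hard-coding the key, you can read it from an environment variable. A minimal sketch (the variable name `MARKUDOWN_API_KEY` is this example's choice, not an SDK convention):

```typescript
// Fail fast when a required environment variable is missing.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) throw new Error(`${name} is not set`);
  return value;
}
```

Then construct the client with `new MarkUDown({ apiKey: requireEnv("MARKUDOWN_API_KEY") })`.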
```ts
const md = new MarkUDown({
  apiKey: "your-api-key",
  baseUrl: "https://api.scrapetechnology.com", // default
  timeout: 120_000, // HTTP timeout in ms (default: 120 000)
  pollInterval: 2_000, // ms between status polls (default: 2 000)
  pollTimeout: 300_000, // max ms to wait for a job (default: 300 000)
});
```

All async endpoints accept `wait: false` to return the job ID immediately:

```ts
const job = await md.crawl("https://example.com", { wait: false }) as { id: string };

// Check later
const status = await md.getJobStatus("crawl", job.id);
```

Scrape a single URL. Returns the result immediately.
```ts
const result = await md.scrape("https://example.com", {
  mainContent: true,
  includeLinks: true,
  excludeTags: ["header", "nav", "footer"],
});
```

| Option | Type | Default | Description |
|---|---|---|---|
| timeout | number | 60 | Timeout in seconds |
| excludeTags | string[] | ["header","nav","footer"] | HTML tags to strip |
| mainContent | boolean | true | Extract only main content |
| includeLinks | boolean | false | Include hyperlinks |
| includeHtml | boolean | false | Include raw HTML |
Take a screenshot of a URL.

```ts
const result = await md.screenshot("https://example.com");
// result.image → base64 PNG
```

Discover all URLs on a website.

```ts
const result = await md.map("https://example.com", {
  maxUrls: 500,
  allowedWords: ["product", "shop"],
  blockedWords: ["blog", "careers"],
});
```

Crawl a website recursively and extract content from every page.
```ts
const result = await md.crawl("https://docs.example.com", {
  maxDepth: 3,
  limit: 50,
  includeOnly: ["*guide*", "*tutorial*"],
});
```

| Option | Type | Default |
|---|---|---|
| maxDepth | number | 2 |
| limit | number | 10 |
| timeout | number | 60 |
| includeLinks | boolean | false |
| blockedWords | string[] | [] |
| includeOnly | string[] | [] |
| mainContent | boolean | true |
| callbackUrl | string | — |
| wait | boolean | true |
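When a crawl is started with `wait: false`, you can poll its status yourself. A minimal sketch, assuming only that `getJobStatus` resolves to an object with a `status` field taking the values listed in the job-status section:

```typescript
type JobStatus = { status: "pending" | "processing" | "completed" | "failed" };

// Poll until the job completes or fails, or until the timeout elapses.
async function waitForJob(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 2_000,
  timeoutMs = 300_000,
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const current = await getStatus();
    if (current.status === "completed") return current;
    if (current.status === "failed") throw new Error("job failed");
    if (Date.now() + intervalMs > deadline) throw new Error("polling timed out");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

For example: `await waitForJob(() => md.getJobStatus("crawl", job.id) as Promise<JobStatus>)`. The SDK's built-in `pollInterval`/`pollTimeout` options cover the same ground when `wait` is left at its default.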
Schema-based structured data extraction with AI.
```ts
const result = await md.extract(
  "https://shop.example.com",
  [
    { name: "title", type: "string", active: true },
    { name: "price", type: "float", active: true },
    { name: "in_stock", type: "string", active: true },
  ],
  {
    extractQuery: "Product listings with title, price and availability",
    extractionScope: "whole_site",
  }
);
```

Generate an extraction schema automatically from a natural language description.

```ts
const schema = await md.createSchema(
  "Extract all products from https://shop.example.com with title, price and rating"
);
```

Extract data using a natural language prompt — no schema required.
```ts
const result = await md.promptExtract(
  "https://example.com/team",
  "List all team members with their name, title and LinkedIn URL"
);
```

Scrape multiple URLs in parallel.

```ts
const result = await md.batchScrape([
  "https://example.com/page-1",
  "https://example.com/page-2",
  "https://example.com/page-3",
]);
```

Web search with optional scraping of result pages.
```ts
const result = await md.search("open source web scraping", {
  engine: "all", // "google" | "bing" | "duckduckgo" | "all"
  limit: 10,
  scrapeResults: true,
  lang: "en",
  country: "us",
});
```

Generate an RSS feed from any web page.

```ts
const result = await md.rss("https://blog.example.com", {
  maxItems: 20,
  title: "My Blog Feed",
});
```

Detect and diff content changes on a page.

```ts
const result = await md.changeDetection("https://example.com/pricing", {
  includeDiff: true,
});
// result.data.changed → true/false
// result.data.diff → diff string
```

Scrape multiple pages and synthesize findings with an LLM.
```ts
const result = await md.deepResearch(
  "What are the pricing differences between these competitors?",
  [
    "https://competitor-a.com/pricing",
    "https://competitor-b.com/pricing",
  ],
  { maxTokens: 4096 }
);
```

An autonomous AI navigation agent that clicks, scrolls, and fills forms.

```ts
const result = await md.agent(
  "https://example.com",
  "Find the contact email on this website",
  { maxSteps: 15, includeScreenshots: false }
);
```

Guided extraction: analyzes the site, plans interactions, and returns complete data.

```ts
const result = await md.smartExtract(
  "https://example.com/products",
  "Extract all products with name, price, and description",
  {
    outputFormat: "json",
    hints: ["Click 'Load more' button to get all items"],
  }
);
```

Check where a domain ranks for a keyword in search results.

```ts
const result = await md.rank("web scraping api", "scrapetechnology.com", {
  engine: "google",
  country: "us",
  depth: 100,
});
// result.data.position → e.g. 4
```

Auto-paginate a listing page and extract all items as a structured dataset.

```ts
const result = await md.dataset(
  "https://books.toscrape.com",
  "Extract all books with title, price, rating and availability",
  { maxPages: 10, outputFormat: "json" }
);
```

Monitor a page for changes and receive webhook notifications.

```ts
// Create
const sub = await md.monitorCreate(
  "https://example.com/pricing",
  "https://yourapp.com/webhook",
  { intervalMinutes: 60 }
);
const subId = (sub as { subscription_id: string }).subscription_id;

// List
const monitors = await md.monitorList();

// Delete
await md.monitorDelete(subId);
```

Extract public Instagram data.
| Resource | Target | Description |
|---|---|---|
| "profile" | username | Profile info + recent posts |
| "post" | shortcode | Single post |
| "hashtag" | hashtag | Recent posts for a hashtag |
| "search" | keyword | Search results |

```ts
// Profile
const profile = await md.instagram("profile", "instagram");

// Hashtag
const posts = await md.instagram("hashtag", "webdev", { limit: 30 });

// Search
const results = await md.instagram("search", "coffee shops NYC");
```

Extract public X (Twitter) data.
| Resource | Target | Description |
|---|---|---|
| "profile" | username | Profile info + recent tweets |
| "post" | tweet ID | Single tweet |
| "search" | keyword | Search results |

```ts
const profile = await md.x("profile", "elonmusk");
const results = await md.x("search", "web scraping tools", { limit: 25 });
```

```ts
// Get status
const status = await md.getJobStatus("crawl", "job-id");
// status.status → "pending" | "processing" | "completed" | "failed"

// Cancel
await md.cancelJob("agent", "job-id");
```

```ts
import { MarkUDown, MarkUDownError, RateLimitError, InsufficientCreditsError } from "markudown";

const md = new MarkUDown({ apiKey: "your-key" });

try {
  const result = await md.scrape("https://example.com");
} catch (e) {
  if (e instanceof RateLimitError) {
    console.log("Rate limit hit — wait a minute and retry");
  } else if (e instanceof InsufficientCreditsError) {
    console.log("Out of credits — upgrade at scrapetechnology.com");
  } else if (e instanceof MarkUDownError) {
    console.log(`API error ${e.statusCode}: ${e.message}`);
  }
}
```
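To recover from transient rate limits automatically, you can wrap calls in a small retry helper. A minimal sketch; the backoff numbers are arbitrary choices, and the predicate is where you would plug in `e instanceof RateLimitError`:

```typescript
// Retry `fn` with exponential backoff while `shouldRetry(error)` is true.
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (e: unknown) => boolean,
  attempts = 3,
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (e) {
      if (attempt >= attempts - 1 || !shouldRetry(e)) throw e;
      // Exponential backoff: 1s, 2s, 4s, … with the defaults above.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
}
```

For example: `await withRetry(() => md.scrape("https://example.com"), (e) => e instanceof RateLimitError);`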
```bash
npm install
npm run build      # compiles to dist/
npm run typecheck  # type-check without emitting
```