NodeScraper is a fast and flexible Node.js web scraping toolkit built using Axios and Cheerio. It provides an intuitive interface for extracting structured HTML and metadata from websites — with clean and consistent outputs.
Fast. Clean. JavaScript-style scraping. 🕸️⚡
- ✅ Extract metadata: title, description, keywords, author, and more
- ✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
- ✅ Extract HTML structures: h1–h6, p, ul, ol, img, links
- ✅ Powerful filter() method with class, ID, and tag-based selectors
- ✅ returnHtml toggle to return clean text or raw HTML
- ✅ Simple return values: string, array, or object
- ✅ Powered by Axios and Cheerio
npm install @riodevnet/nodescraper

Requires Node.js 14 or later.
const NodeScraper = require("@riodevnet/nodescraper");
(async () => {
  const scraper = new NodeScraper("https://example.com");
  await scraper.init();

  console.log(scraper.title()); // "Welcome to Example.com"
  console.log(scraper.description()); // "This is the example meta description."
  console.log(scraper.h1()); // ["Welcome", "Latest News"]
  console.log(scraper.open_graph()); // { "og:title": "...", "og:description": "...", ... }

  // Custom filter
  console.log(
    scraper.filter({
      element: "div",
      attributes: { class: "card" },
      multiple: true,
      extract: ["h1", "p", ".title", "#desc"],
    })
  );
})();

scraper.title();
scraper.description();
scraper.keywords();
scraper.keyword_string();
scraper.charset();
scraper.canonical();
scraper.content_type();
scraper.author();
scraper.csrf_token();
scraper.image();

scraper.open_graph();
scraper.open_graph("og:title");
scraper.twitter_card();
scraper.twitter_card("twitter:title");

scraper.h1();
scraper.h2();
scraper.h3();
scraper.h4();
scraper.h5();
scraper.h6();
scraper.p();

scraper.ul();
scraper.ol();

scraper.images();
scraper.image_details();

scraper.links();
scraper.link_details();

Use filter() to target specific DOM elements and extract nested content.
scraper.filter({
  element: "div",
  attributes: { id: "main" },
  multiple: false,
  extract: [".title", "#description", "p"],
});

scraper.filter({
  element: "div",
  attributes: { class: "card" },
  multiple: true,
  extract: ["h1", ".subtitle", "#meta"],
});

The extract array accepts tag names, class selectors (e.g., .title), or ID selectors (e.g., #meta).
Output keys are automatically normalized: .title → class__title, #meta → id__meta
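
The key normalization described above can be sketched as a small pure function. This only illustrates the documented mapping and is not NodeScraper's actual source: the name normalizeKey is hypothetical, and leaving plain tag names unchanged is an assumption.

```javascript
// Hypothetical helper illustrating the documented key mapping.
// Not part of the NodeScraper API.
function normalizeKey(selector) {
  if (selector.startsWith(".")) {
    return "class__" + selector.slice(1); // ".title" -> "class__title"
  }
  if (selector.startsWith("#")) {
    return "id__" + selector.slice(1); // "#meta" -> "id__meta"
  }
  // Assumption: plain tag names such as "h1" are used as keys unchanged.
  return selector;
}

const keys = ["h1", ".subtitle", "#meta"].map(normalizeKey);
console.log(keys); // [ 'h1', 'class__subtitle', 'id__meta' ]
```

Under these assumptions, a result produced with extract: ["h1", ".subtitle", "#meta"] would be read through the keys h1, class__subtitle, and id__meta.
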
Disable raw HTML output:

scraper.filter({
  element: "p",
  attributes: { class: "dark-text" },
  multiple: true,
  returnHtml: false,
});

scraper.title();
// "Welcome to Example.com"
scraper.h1();
// ["Main Heading", "Another Title"]
scraper.open_graph("og:title");
// "Example OG Title"

nodescraper/
├── index.js
├── package.json
├── examples/
└── tests/
Testing support coming soon using:
- jest
- nock for HTTP mocking
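
Until a test suite ships, code that consumes the scraper can be exercised offline against a hand-written stub. The following is a minimal sketch, assuming only the init(), title(), description(), and h1() methods shown in the usage example. The names summarizePage and stubScraper are hypothetical and not part of the package, and no network access is involved.

```javascript
// Hypothetical helper: collects a few fields from any object that
// exposes the init(), title(), description(), and h1() methods.
async function summarizePage(scraper) {
  await scraper.init();
  return {
    title: scraper.title(),
    description: scraper.description(),
    headings: scraper.h1(),
  };
}

// Hand-written stub standing in for a NodeScraper instance.
const stubScraper = {
  async init() {},
  title: () => "Welcome to Example.com",
  description: () => "This is the example meta description.",
  h1: () => ["Welcome", "Latest News"],
};

summarizePage(stubScraper).then((summary) => {
  console.log(summary.title); // "Welcome to Example.com"
  console.log(summary.headings.length); // 2
});
```

With the real package, an instance created via new NodeScraper(url) could presumably be passed in place of the stub, provided it behaves as in the usage example.
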
Contributions are welcome!
Found a bug or want to request a feature? Please open an issue or submit a pull request.
MIT License © 2025 — NodeScraper
Think of it as your JavaScript web detective — fast, efficient, and precise.