ioodev/nodescraper

🕸️ NodeScraper

NodeScraper is a fast and flexible Node.js web scraping toolkit built using Axios and Cheerio. It provides an intuitive interface for extracting structured HTML and metadata from websites — with clean and consistent outputs.

Fast. Clean. JavaScript-style scraping. 🕸️⚡


🚀 Features

  • ✅ Extract metadata: title, description, keywords, author, and more
  • ✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
  • ✅ Extract HTML structures: h1–h6, p, ul, ol, img, links
  • ✅ Powerful filter() method with class, ID, and tag-based selectors
  • ✅ returnHtml toggle to return clean text or raw HTML
  • ✅ Simple return values: string, array, or object
  • ✅ Powered by Axios and Cheerio

📦 Installation

npm install @riodevnet/nodescraper

Requires Node.js 14 or later


🛠️ Basic Usage

const NodeScraper = require("@riodevnet/nodescraper");

(async () => {
  const scraper = new NodeScraper("https://example.com");
  await scraper.init();

  console.log(scraper.title()); // "Welcome to Example.com"
  console.log(scraper.description()); // "This is the example meta description."
  console.log(scraper.h1()); // ["Welcome", "Latest News"]
  console.log(scraper.open_graph()); // { "og:title": "...", "og:description": "...", ... }

  // Custom filter
  console.log(
    scraper.filter({
      element: "div",
      attributes: { class: "card" },
      multiple: true,
      extract: ["h1", "p", ".title", "#desc"],
    })
  );
})();
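
The basic example assumes the request succeeds. A minimal sketch of a defensive wrapper might look like the following. `safeScrape` is a hypothetical helper, not part of NodeScraper, and it assumes that `init()` rejects (or the constructor throws) when the page cannot be fetched; the README does not state how failures are reported. The scraper class is passed in as an argument so the helper can also be exercised with a stub.

```javascript
// Hypothetical helper (not part of NodeScraper): run a scrape and
// return a result object instead of letting an error propagate.
// Assumption: init() rejects, or the constructor throws, on failure.
async function safeScrape(ScraperClass, url) {
  try {
    const scraper = new ScraperClass(url);
    await scraper.init();
    // Only methods documented in this README are called here.
    return { ok: true, title: scraper.title(), h1: scraper.h1() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: message };
  }
}

// Usage (requires the package to be installed):
// const NodeScraper = require("@riodevnet/nodescraper");
// safeScrape(NodeScraper, "https://example.com").then(console.log);
```

Returning a plain `{ ok, ... }` object keeps the calling code free of nested try/catch blocks when scraping many URLs in a loop.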

🧪 Available Methods

🔹 Page Metadata

scraper.title();
scraper.description();
scraper.keywords();
scraper.keyword_string();
scraper.charset();
scraper.canonical();
scraper.content_type();
scraper.author();
scraper.csrf_token();
scraper.image();

🔹 Open Graph & Twitter Card

scraper.open_graph();
scraper.open_graph("og:title");

scraper.twitter_card();
scraper.twitter_card("twitter:title");
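
One common use of the keyed form is building a link preview with fallbacks. The sketch below is illustrative only: `buildPreview` is a hypothetical helper, and it assumes that a missing tag yields a falsy value (`undefined`, `null`, or an empty string), which this README does not specify.

```javascript
// Hypothetical helper: build a link-preview object from a scraper
// instance whose init() has already completed.
// Assumption: a missing tag comes back as a falsy value.
function buildPreview(scraper) {
  return {
    // Prefer Open Graph, then Twitter Card, then the plain <title>.
    title:
      scraper.open_graph("og:title") ||
      scraper.twitter_card("twitter:title") ||
      scraper.title(),
    // Prefer Open Graph, then the meta description.
    description:
      scraper.open_graph("og:description") || scraper.description(),
  };
}
```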

🔹 Headings & Text

scraper.h1();
scraper.h2();
scraper.h3();
scraper.h4();
scraper.h5();
scraper.h6();
scraper.p();

🔹 Lists

scraper.ul();
scraper.ol();

🔹 Images

scraper.images();
scraper.image_details();

🔹 Links

scraper.links();
scraper.link_details();
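
Scraped links are often relative to the page they came from. As a rough sketch, and assuming `links()` returns an array of href strings (the README does not document the exact shape), they could be resolved against the page URL with Node's built-in `URL` class. `absolutize` is a hypothetical helper, not part of NodeScraper.

```javascript
// Hypothetical helper: resolve hrefs against the page URL.
// Assumption: `hrefs` is an array of strings such as links() might return.
function absolutize(hrefs, baseUrl) {
  const resolved = [];
  for (const href of hrefs) {
    try {
      // new URL(relative, base) resolves relative and absolute hrefs alike.
      resolved.push(new URL(href, baseUrl).href);
    } catch (err) {
      // Skip values that cannot be parsed as a URL.
    }
  }
  return resolved;
}

// Usage:
// absolutize(scraper.links(), "https://example.com/dir/page");
```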

🔍 Custom DOM Filtering

Use filter() to target specific DOM elements and extract nested content.

▸ Single element

scraper.filter({
  element: "div",
  attributes: { id: "main" },
  multiple: false,
  extract: [".title", "#description", "p"],
});

▸ Multiple elements

scraper.filter({
  element: "div",
  attributes: { class: "card" },
  multiple: true,
  extract: ["h1", ".subtitle", "#meta"],
});

The extract array accepts tag names, class selectors (e.g., .title), or ID selectors (e.g., #meta).
Output keys are automatically normalized:
.title → class__title, #meta → id__meta
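
The key normalization (a class selector gains a class__ prefix, an ID selector gains an id__ prefix) can be sketched as a small function. This is a re-implementation for illustration, not the library's own code; `normalizeKey` is a hypothetical name, and leaving tag names unchanged is an assumption.

```javascript
// Hypothetical re-implementation of the output-key normalization.
function normalizeKey(selector) {
  if (selector.startsWith(".")) return "class__" + selector.slice(1);
  if (selector.startsWith("#")) return "id__" + selector.slice(1);
  return selector; // Assumption: tag names are used as keys unchanged.
}

console.log(["h1", ".subtitle", "#meta"].map(normalizeKey));
// [ 'h1', 'class__subtitle', 'id__meta' ]
```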

▸ Clean Text Output

Set returnHtml to false to return clean text instead of raw HTML:

scraper.filter({
  element: "p",
  attributes: { class: "dark-text" },
  multiple: true,
  returnHtml: false,
});

📦 Output Example

scraper.title();
// "Welcome to Example.com"

scraper.h1();
// ["Main Heading", "Another Title"]

scraper.open_graph("og:title");
// "Example OG Title"

📁 Project Structure (suggested)

nodescraper/
├── index.js
├── package.json
├── examples/
└── tests/

🧪 Testing

Testing support is coming soon, using:

  • jest
  • nock for HTTP mocking

🤝 Contributing

Contributions are welcome!
Found a bug or want to request a feature? Please open an issue or submit a pull request.


📄 License

MIT License © 2025 — NodeScraper


🔗 Related Projects


💡 Why NodeScraper?

Think of it as your JavaScript web detective — fast, efficient, and precise.
