Simple and straightforward crawler for simple and straightforward things.
Install the dependencies:

```bash
pip install -r requirements.txt
```

Crawl a single URL:

```bash
python crawler.py https://example.com
```
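For the curious, a single-URL crawl boils down to very little code. This is a minimal sketch assuming crawl4ai's standard `AsyncWebCrawler` API, not the actual contents of `crawler.py`:

```python
import asyncio

from crawl4ai import AsyncWebCrawler


async def fetch_one(url: str):
    # Open a browser session, fetch the page, and return its Markdown rendering.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown


if __name__ == "__main__":
    print(asyncio.run(fetch_one("https://example.com")))
```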
Follow all links on the same domain up to the specified depth:

```bash
python crawler.py https://example.com --recursive
```

Or specify a custom depth:
```bash
python crawler.py https://example.com --recursive --depth 3
```
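The `--recursive` and `--depth` flags correspond to crawl4ai's deep-crawling support. A rough sketch of that mapping, assuming a crawl4ai version that ships `BFSDeepCrawlStrategy` (the real `crawler.py` may wire this up differently):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
from crawl4ai.deep_crawling import BFSDeepCrawlStrategy


async def crawl_recursive(url: str, depth: int = 2):
    # Breadth-first crawl over same-domain links, stopping at the given depth.
    config = CrawlerRunConfig(
        deep_crawl_strategy=BFSDeepCrawlStrategy(
            max_depth=depth,
            include_external=False,  # stay on the starting domain
        )
    )
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun(url=url, config=config)
        for result in results:
            print(result.url)


asyncio.run(crawl_recursive("https://example.com", depth=3))
```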
To crawl multiple URLs, create a file `urls.txt`:

```
https://example.com/page1
https://example.com/page2
https://example.com/page3
```
Then run:
```bash
python crawler.py --file urls.txt
```
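File mode is plausibly just a batch crawl. A sketch assuming crawl4ai's `arun_many` API, with the file format described above (one URL per line):

```python
import asyncio
from pathlib import Path

from crawl4ai import AsyncWebCrawler


async def crawl_from_file(path: str = "urls.txt"):
    # One URL per line; blank lines are ignored.
    urls = [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]
    async with AsyncWebCrawler() as crawler:
        results = await crawler.arun_many(urls=urls)
        for result in results:
            print(result.url, "ok" if result.success else "failed")


asyncio.run(crawl_from_file())
```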
Enable HTTP caching to avoid re-downloading pages:

```bash
python crawler.py https://example.com -c
python crawler.py https://example.com --cache
```

Options:

- `url`: Single URL to crawl (optional if using `--file`)
- `-f, --file`: File containing URLs (one per line)
- `-c, --cache`: Enable HTTP cache (default: disabled)
- `-r, --recursive`: Follow links on the same domain recursively
- `-d, --depth`: Max depth for recursive crawling (default: 2)
- `-o, --output`: Output directory or `.json` file (default: `output/`)
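For reference, the options above map onto an argument parser along these lines. This is a hypothetical reconstruction of the CLI, not the actual source of `crawler.py`:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Simple crawl4ai-based crawler")
    parser.add_argument("url", nargs="?", help="Single URL to crawl (optional if using --file)")
    parser.add_argument("-f", "--file", help="File containing URLs, one per line")
    parser.add_argument("-c", "--cache", action="store_true", help="Enable HTTP cache (default: disabled)")
    parser.add_argument("-r", "--recursive", action="store_true", help="Follow links on the same domain recursively")
    parser.add_argument("-d", "--depth", type=int, default=2, help="Max depth for recursive crawling")
    parser.add_argument("-o", "--output", default="output/", help="Output directory or .json file")
    return parser


args = build_parser().parse_args()
```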
Features:

- HTTP Cache: Use `--cache` to enable crawl4ai's HTTP caching system (see the sketch after this list)
- Verbose output: Shows crawl4ai's built-in progress information
- Depth control: Limits crawling depth to avoid runaway crawling
- Stream mode: Processes pages incrementally for better performance
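A sketch of how the cache and stream features above might be wired together. `CacheMode` and `stream=True` are standard crawl4ai options, but the exact wiring here is an assumption:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig


async def crawl_streaming(urls, use_cache=False):
    config = CrawlerRunConfig(
        # --cache maps onto crawl4ai's HTTP cache; BYPASS always re-downloads.
        cache_mode=CacheMode.ENABLED if use_cache else CacheMode.BYPASS,
        stream=True,  # yield each page as it finishes instead of waiting for the whole batch
    )
    async with AsyncWebCrawler() as crawler:
        async for result in await crawler.arun_many(urls=urls, config=config):
            print("done:", result.url)


asyncio.run(crawl_streaming(["https://example.com"], use_cache=True))
```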
The crawler writes Markdown files to the output directory; each page becomes a separate `.md` file.
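One way the per-page files could be produced; the slug-style file naming below is hypothetical and the real naming convention may differ:

```python
import re
from pathlib import Path


def url_to_filename(url: str) -> str:
    # Hypothetical scheme: drop the scheme, replace unsafe characters with dashes.
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", url.split("://", 1)[-1]).strip("-")
    return f"{slug}.md"


def save_markdown(url: str, markdown: str, output_dir: str = "output") -> Path:
    # Write one .md file per crawled page into the output directory.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / url_to_filename(url)
    path.write_text(markdown, encoding="utf-8")
    return path
```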