A configurable Python web scraping tool that extracts structured data from multiple webpages and exports the results to CSV.
Built for automation, data collection, and Upwork-style client projects.
- Scrapes multiple pages using a URL pattern with a `{page}` placeholder
- Fully configurable via JSON (no code changes needed)
- Extracts data using CSS selectors (quotes, authors, tags, or any other fields); see the extraction sketch below
- Saves clean, structured data to CSV
- Logs scraping progress to `logs/scraper.log`
- Easy CLI interface for clients and non-technical users
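To give a concrete picture of the selector-based extraction, here is a minimal sketch. It is not the project's actual `parser.py`; it assumes `requests` and `beautifulsoup4` are installed and reuses the selectors from the example config.

```python
# Minimal extraction sketch (not the repository's parser.py).
# Assumes requests and beautifulsoup4 are installed.
import requests
from bs4 import BeautifulSoup

def extract_fields(html: str, selectors: dict) -> list[dict]:
    """Pair up quote and author matches found via CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    quotes = soup.select(selectors["quote"])
    authors = soup.select(selectors["author"])
    return [
        {"quote": q.get_text(strip=True), "author": a.get_text(strip=True)}
        for q, a in zip(quotes, authors)
    ]

html = requests.get("https://quotes.toscrape.com/page/1/", timeout=10).text
rows = extract_fields(html, {"quote": ".quote .text", "author": ".quote .author"})
print(rows[:2])
```

Multi-valued fields such as tags need extra handling (for example, grouping matches under their parent `.quote` element), which is omitted here for brevity.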
```
webscraper_pro/
├── README.md
├── LICENSE
├── requirements.txt
├── .gitignore
├── data/
│   ├── sample_urls.txt
│   └── output/
├── logs/
└── webscraper/
    ├── __init__.py
    ├── config_example.json
    ├── cli.py
    ├── scraper.py
    ├── parser.py
    └── storage.py
```
Example config file: `webscraper/config_example.json`

```json
{
  "base_url": "https://quotes.toscrape.com/page/{page}/",
  "start_page": 1,
  "end_page": 3,
  "selectors": {
    "quote": ".quote .text",
    "author": ".quote .author",
    "tags": ".quote .tags .tag"
  }
}
```

Fields explained:

- `base_url`: must contain `{page}` so the scraper can iterate over pages
- `start_page` / `end_page`: the page range to scrape
- `selectors`: CSS selectors for each extracted field
You can modify this JSON to scrape any website, not just quotes.
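As a rough illustration of how these fields fit together, the sketch below loads a config like the one above and expands `base_url` into one URL per page. The helper names are hypothetical, not the actual functions in `scraper.py`.

```python
# Sketch: load the JSON config and expand base_url into per-page URLs.
# Hypothetical helpers for illustration; not the repository's scraper.py.
import json

def load_config(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def page_urls(config: dict) -> list[str]:
    # base_url must contain "{page}", e.g. "https://quotes.toscrape.com/page/{page}/"
    return [
        config["base_url"].format(page=page)
        for page in range(config["start_page"], config["end_page"] + 1)
    ]

config = load_config("webscraper/config_example.json")
print(page_urls(config))
# e.g. ['https://quotes.toscrape.com/page/1/', ..., '.../page/3/']
```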
Create and activate a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the scraper:

```bash
python -m webscraper.cli --config webscraper/config_example.json --output data/output/quotes.csv
```

Result:
- Fetches pages 1 to 3
- Extracts quotes, authors, and tags
- Saves them to `data/output/quotes.csv` (a simplified CSV-writing sketch follows)
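To show what the saving step looks like in plain Python, here is a simplified stand-in for the storage and logging pieces. It is a sketch under the assumption that rows are plain dicts keyed by the config's field names; the actual `storage.py` may be organized differently.

```python
# Sketch: write extracted rows to CSV and log progress to logs/scraper.log.
# Simplified stand-in; field names follow the example config.
import csv
import logging
from pathlib import Path

Path("logs").mkdir(exist_ok=True)
logging.basicConfig(
    filename="logs/scraper.log",
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

def save_to_csv(rows: list[dict], path: str) -> None:
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["quote", "author", "tags"])
        writer.writeheader()
        writer.writerows(rows)
    logging.info("Wrote %d rows to %s", len(rows), path)

save_to_csv(
    [{"quote": "An example quote.", "author": "A. Author", "tags": "life;example"}],
    "data/output/quotes.csv",
)
```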
This project is licensed under the MIT License.
You are free to use, modify, distribute, and incorporate the code into your own projects.
See the full license in the included LICENSE file.
- This project is for demonstration and educational purposes.
- Always respect website terms of service and robots.txt when scraping real websites (a minimal robots.txt check is sketched below).
- The scraper is modular and easy to extend for more complex automation.
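As a starting point for the robots.txt advice above, the standard library's `urllib.robotparser` can check whether a URL may be fetched. This is a minimal sketch, not part of the repository.

```python
# Sketch: check robots.txt before scraping, using only the standard library.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("https://quotes.toscrape.com/robots.txt")
rp.read()

url = "https://quotes.toscrape.com/page/1/"
if rp.can_fetch("*", url):
    print("Allowed to fetch:", url)
else:
    print("Disallowed by robots.txt:", url)
```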