Threads Scraper

The original repo shipped without the core scraping code, and I didn't want to wait for it, so I built my own version here.

Currently usable features

  • Scrape Threads posts by username (online or offline).
  • Offline mode with deterministic sample data for testing.
  • Online mode that tries a fast HTTP parse and falls back to Playwright scrolling.
  • Optional login flow using a persistent Playwright profile.
  • Export to JSON and CSV, plus a cleaned CSV in data/processed.
  • Configurable settings via config/settings.yaml.
  • Basic proxy support (if enabled in settings).

Note: Reply scraping is still in progress and not reliable yet.

Future plan

  • Scrape replies in parallel with smarter resource management and better session handling.

How to use

  1. Clone the repo:
    • git clone <REPO_URL>
  2. Open a terminal and go into the project folder:
    • cd Threads-Scraper
  3. Install Python dependencies:
    • python -m pip install -r requirements.txt
  4. If you want live scraping (not offline), install Playwright browsers:
    • python -m playwright install chromium
  5. Open config/settings.yaml and review the settings.
  6. (Optional but recommended) Log in once to lift public visibility limits:
    • Run: python src/main.py --login --profile-dir data/playwright-profile
    • A browser opens. Log into Threads, then return to the terminal and press Enter.
  7. Run a scrape:
    • Example: python src/main.py --usernames example_user --limit 100
    • Example (two users + logged-in profile): python src/main.py --usernames example_user another_user --limit 50 --profile-dir data/playwright-profile
    • Example (offline test data): python src/main.py --offline --usernames example_user --limit 20
  8. Find your results in output/ and data/processed.

Login notes:

  • Scraping without login is supported, but Threads often limits how many posts you can see in public mode.
  • Logging in via a persistent Playwright profile usually increases the number of posts collected.

Settings guide (config/settings.yaml)

General:

  • base_url: Threads base URL.
  • timeout: request/page timeout in seconds.
  • use_offline: true to use local sample data instead of live scraping.
  • use_proxies: true to use proxies from data/raw/proxies.json.
  • limit: max posts per user (best-effort; public mode may return fewer).
  • dump_raw_items: true to save raw payloads to data/raw for debugging.
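As a rough sketch, the general block of config/settings.yaml might look like the following. The keys are the ones listed above; the values are illustrative assumptions, not the project's shipped defaults:

    # General (example values only)
    base_url: "https://www.threads.net"   # assumed Threads base URL
    timeout: 30                           # seconds
    use_offline: false                    # true = use local sample data
    use_proxies: false                    # true = read proxies from data/raw/proxies.json
    limit: 100                            # max posts per user (best-effort)
    dump_raw_items: false                 # true = save raw payloads to data/raw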

Replies:

  • scrape_replies: enable reply scraping.
  • replies_limit: max replies per thread (best-effort).
  • replies_workers: parallel workers for replies (limited when using a persistent profile).
  • skip_zero_replies: skip threads with reply_count = 0.
  • replies_use_persistent_profile: force replies to use a single persistent profile (disables parallelism).
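A hedged example of the replies block, again with assumed values:

    # Replies (example values only)
    scrape_replies: true
    replies_limit: 50                      # best-effort cap per thread
    replies_workers: 4                     # parallelism is limited with a persistent profile
    skip_zero_replies: true
    replies_use_persistent_profile: false  # true forces one profile and disables parallelism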

Online mode tuning:

  • max_scrolls / replies_max_scrolls: how many scroll cycles to attempt.
  • scroll_pause_ms / replies_scroll_pause_ms: delay after each scroll to let requests finish.
  • page_settle_ms / replies_page_settle_ms: initial wait after first load.
  • stagnant_scrolls / replies_stagnant_scrolls: stop after N scrolls with no new items.
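For orientation, the tuning keys might be set like this; the numbers are placeholders to show the shape of the block, not recommended values:

    # Online mode tuning (example values only)
    max_scrolls: 20
    replies_max_scrolls: 10
    scroll_pause_ms: 1500
    replies_scroll_pause_ms: 1500
    page_settle_ms: 3000
    replies_page_settle_ms: 3000
    stagnant_scrolls: 3
    replies_stagnant_scrolls: 3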

Login/session:

  • playwright_headless: false to see the browser (needed for login).
  • playwright_user_data_dir: path to the saved browser profile for persistent login.
  • cookie: optional raw Cookie header (can also be set via THREADS_COOKIE in .env).
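A sketch of the login/session block, assuming the profile directory used in the login step above; the cookie value is deliberately left empty:

    # Login/session (example values only)
    playwright_headless: false                          # show the browser so you can log in
    playwright_user_data_dir: data/playwright-profile   # persistent profile written by --login
    cookie: ""                                          # optional raw Cookie header; or set THREADS_COOKIE in .env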

Defaults:

  • usernames: fallback list used when no --usernames are provided.
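For example, a defaults block using the usernames from the examples above (purely illustrative):

    # Defaults (example values only)
    usernames:
      - example_user
      - another_user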

Contributing

Contributions are welcome.