A simple Python tool to fetch and archive messages from Telegram channels using Telethon.
- Simple - Login with phone number, no session strings needed
- Flexible - Configure via `config.toml` file
- Async - Built with async/await for efficient message fetching
- Rate limiting - Respects Telegram API limits
- Parallel processing - Crawls multiple channels concurrently
- Channel exclusion - Skip unwanted channels (logs, bots, etc.)
- Checkpoints - Automatically saves progress and allows resuming if interrupted
- JSON export - Saves messages with full metadata
Get your API credentials:

- Visit https://my.telegram.org
- Login with your phone number
- Go to "API Development Tools"
- Create a new application (any name/description)
- Copy your `api_id` and `api_hash`
Install the dependencies and set up your environment:

```bash
pip install -r requirements.txt

# Copy sample environment file
cp .sample.env .env

# Edit .env and add your credentials
nano .env
```

Your `.env` should contain:
```
TELEGRAM_APP_ID=12345678
TELEGRAM_APP_HASH=abcdef1234567890abcdef1234567890
TELEGRAM_PHONE=+1234567890
```

Note: `TELEGRAM_PHONE` is optional - you'll be prompted for it if not set.
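To check that your credentials load correctly, here is a minimal sketch, assuming the scripts read `.env` with python-dotenv (an assumption; any dotenv loader works the same way):

```python
# Sketch only: load Telegram credentials from .env.
# Assumes python-dotenv is installed (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

api_id = int(os.environ["TELEGRAM_APP_ID"])
api_hash = os.environ["TELEGRAM_APP_HASH"]
phone = os.getenv("TELEGRAM_PHONE")  # optional; None if unset
print("Credentials loaded for app", api_id)
```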
First, list all your accessible channels:
```bash
python list_channels.py
```

This will show all channels with their IDs.
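Under the hood, a listing like this takes only a few lines of Telethon; the sketch below is illustrative, not the actual `list_channels.py`:

```python
# Illustrative sketch: print the ID and title of every channel
# the logged-in account can access.
import os

from telethon import TelegramClient

api_id = int(os.environ["TELEGRAM_APP_ID"])
api_hash = os.environ["TELEGRAM_APP_HASH"]
client = TelegramClient("telegram_session", api_id, api_hash)

async def main() -> None:
    async for dialog in client.iter_dialogs():
        if dialog.is_channel:
            print(dialog.id, dialog.name)  # dialog.id is the -100... form

with client:
    client.loop.run_until_complete(main())
```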
Then edit `config.toml` to set which channels to crawl:

```toml
[channels]
include = [
    -1001234567890,  # Channel ID (from list_channels.py)
    -1003198190559,  # Another channel ID
]
exclude = [
    -1001111111111,  # Channel IDs to skip
]
```

Now run the crawler:

```bash
python read_messages.py
```

On first run, you'll be prompted to:
- Enter your phone number (if not in .env)
- Enter the verification code Telegram sends you
- Enter your 2FA password (if enabled)
A session file will be created so you don't need to login again on subsequent runs.
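For reference, Telethon's `start()` drives this whole flow; a minimal sketch of the first run (the real scripts may differ):

```python
# Sketch of the first-run login flow. TelegramClient.start() prompts for the
# verification code (and 2FA password) and writes telegram_session.session.
import os

from telethon import TelegramClient

api_id = int(os.environ["TELEGRAM_APP_ID"])
api_hash = os.environ["TELEGRAM_APP_HASH"]
phone = os.getenv("TELEGRAM_PHONE")

client = TelegramClient("telegram_session", api_id, api_hash)
if phone:
    client.start(phone=phone)
else:
    client.start()  # Telethon prompts for the phone number too

print("Logged in; telegram_session.session saved.")
client.disconnect()
```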
Edit `config.toml` to customize the crawler:

```toml
[crawler]
time_window_days = 3              # How many days back to fetch
max_messages_per_channel = 2000   # Message limit per channel
parallel_requests = 3             # Concurrent channels to process
batch_size = 500                  # Number of messages to fetch per batch
rate_limiting_delay = 0.5         # Delay between requests (seconds)
checkpoint_interval = 100         # Save checkpoint every N messages (0 to disable)
fetch_replies = true              # Fetch replies/comments to channel posts
max_reply_depth = 2               # Maximum depth for nested replies (0-5 recommended)

[channels]
include = [-1001234567890, -1003198190559]  # Channel IDs to crawl
exclude = [-1001111111111]                  # Channel IDs to skip

[output]
pretty_print = true               # Format JSON nicely
indent_spaces = 2                 # JSON indentation

# Note: Messages are saved to raw/[channel_id]_messages.json
```
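If you want to inspect the parsed configuration, a minimal sketch using Python 3.11+'s stdlib `tomllib` (older interpreters can use the third-party `tomli` package, which has the same API):

```python
# Read config.toml and pull out a few crawler settings.
import tomllib

with open("config.toml", "rb") as f:  # tomllib requires binary mode
    config = tomllib.load(f)

print(config["crawler"]["time_window_days"])
print(config["channels"]["include"])
```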
The crawler automatically saves checkpoints during message fetching to prevent data loss if interrupted. Checkpoints are saved to the `raw/checkpoints/` directory.

How it works:
- Checkpoint files are created every N messages, configurable via `checkpoint_interval` in `config.toml`
- Default is every 100 messages
- If the script is interrupted, it will detect the checkpoint on the next run and ask if you want to resume (see the sketch below)
- Checkpoints are automatically deleted after successful completion
- Set `checkpoint_interval = 0` to disable checkpoints
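A minimal sketch of what this save/resume cycle could look like; the file layout follows the paths above, but the field names are assumptions, not the script's actual schema:

```python
# Illustrative checkpoint helpers: save progress every N messages,
# load it back on restart if present.
import json
from pathlib import Path

CHECKPOINT_DIR = Path("raw/checkpoints")

def save_checkpoint(channel_id: int, messages: list[dict]) -> None:
    CHECKPOINT_DIR.mkdir(parents=True, exist_ok=True)
    path = CHECKPOINT_DIR / f"{channel_id}_checkpoint.json"
    path.write_text(json.dumps({"messages": messages}), encoding="utf-8")

def load_checkpoint(channel_id: int) -> list[dict] | None:
    path = CHECKPOINT_DIR / f"{channel_id}_checkpoint.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["messages"]
    return None  # no checkpoint; start fresh
```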
Resuming from checkpoint:
```bash
python read_messages.py

# If a checkpoint is found, you'll see:
# Found checkpoint for channel -1001234567890 with 500 messages
# Last saved: 2025-01-15T10:30:45+00:00
# Resume from checkpoint? (y/n):
```

Messages are saved as JSON files in the `raw/` directory, named by channel ID:
- `raw/-1001234567890_messages.json` - Channel messages
- `raw/-1003198190559_messages.json` - Another channel's messages
- `raw/checkpoints/[channel_id]_checkpoint.json` - Checkpoint files (temporary)
Each JSON file contains an array of simplified message objects with only essential fields:
```json
[
  {
    "id": 9099,
    "date": "2025-11-13T01:49:52+00:00",
    "from_id": 526750941,
    "message": "@lazovicff @dharmikumbhani",
    "reply_to_msg_id": 9098,
    "reactions": [
      {
        "user_id": 526750941,
        "emoji": "👍"
      },
      {
        "user_id": 123456789,
        "emoji": "❤️"
      }
    ],
    "replies": 3
  }
]
```

Fields included:
- `id` - Message ID
- `date` - Message timestamp (ISO format)
- `from_id` - User ID of the message sender
- `message` - Message text content
- `reply_to_msg_id` - ID of the message being replied to (if any)
- `reactions` - Array of reactions with user ID and emoji
- `replies_count` - Number of replies to this message
- `replies_data` - Array of reply messages (if `fetch_replies = true`)
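As an example of working with these files, the snippet below tallies reactions per emoji (the path is the sample channel ID used above):

```python
# Count reactions per emoji across one channel's saved messages.
import json
from collections import Counter

with open("raw/-1001234567890_messages.json", encoding="utf-8") as f:
    messages = json.load(f)

counts = Counter(
    reaction["emoji"]
    for msg in messages
    for reaction in msg.get("reactions", [])
)
print(counts.most_common(5))
```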
For channels with replies enabled:
When `fetch_replies = true` in the config, each post will include a `replies_data` array containing all replies/comments with their reactions and user information. This is useful for channels with discussion groups enabled.
Nested replies:
Replies can have their own replies (threaded conversations). The `max_reply_depth` setting controls how many levels deep to fetch:
- `0` = No replies fetched
- `1` = Only direct replies to posts
- `2` = Replies + replies to those replies (recommended)
- `3+` = Deeper nesting (slower, more data)
Each reply in `replies_data` can contain its own `replies_data` array for nested conversations.
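A small sketch of flattening such a thread; `walk_replies` is a hypothetical helper, not part of the toolkit:

```python
# Recursively walk replies_data, yielding (depth, reply) pairs.
def walk_replies(message: dict, depth: int = 1):
    for reply in message.get("replies_data", []):
        yield depth, reply
        yield from walk_replies(reply, depth + 1)

# Usage: indent replies by their nesting depth.
# for depth, reply in walk_replies(post):
#     print("  " * depth, reply.get("message", ""))
```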
Project files:

- `read_messages.py` - Main crawler script (run this)
- `list_channels.py` - List all accessible channels/groups
- `list_admins.py` - List all admins/moderators for channels in config and save to CSV
- `get_user_ids.py` - Get user ID to username mapping for all members
- `generate_trust.py` - Calculate trust scores from messages
- `login.py` - Setup guide and instructions
- `config.toml` - Configuration file
- `.env` - Environment variables (credentials)
- `requirements.txt` - Python dependencies
Common commands:

```bash
# List all your channels
python list_channels.py

# List admins/moderators for channels in config (saves to CSV)
python list_admins.py

# Get user ID to username mapping
python get_user_ids.py

# Run the crawler
python read_messages.py

# Calculate trust scores
python generate_trust.py

# View setup guide
python login.py
```

Troubleshooting:

"Missing Telegram credentials"
→ Make sure `.env` has `TELEGRAM_APP_ID` and `TELEGRAM_APP_HASH`

"Channel is not a valid ID"
→ Only numeric IDs are accepted; run `python list_channels.py` to get IDs

"Could not find the input entity"
→ Make sure the channel ID is correct (from `list_channels.py`)

"A wait of X seconds is required"
→ You're rate limited. Increase `rate_limiting_delay` in `config.toml` (see the sketch after this list)

Script keeps getting interrupted
→ Enable checkpoints in `config.toml` with `checkpoint_interval = 100` to save progress periodically

Want to restart from scratch (ignore checkpoint)
→ When prompted to resume, type 'n' or manually delete checkpoint files in `raw/checkpoints/`

Import errors
→ Install dependencies: `pip install -r requirements.txt`

Authorization failed
→ Make sure you enter the correct phone number and verification code

"Collected info for 0 unique users" for channel posts
→ This is normal for channels (not groups). Set `fetch_replies = true` in `config.toml` to fetch comments/replies where user interactions happen.
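For the rate-limit error above, Telethon raises `FloodWaitError` with the required wait in `.seconds`; one way to back off automatically (the wrapper itself is illustrative, not part of the scripts):

```python
# Retry a request after sleeping out Telegram's enforced wait.
import asyncio

from telethon.errors import FloodWaitError

async def call_with_backoff(make_request):
    """make_request: zero-argument callable returning a fresh coroutine."""
    while True:
        try:
            return await make_request()
        except FloodWaitError as err:
            print(f"Rate limited; sleeping {err.seconds}s")
            await asyncio.sleep(err.seconds)
```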
List and export channel/group administrators and their roles:
```bash
python list_admins.py
```

What it does:

- Shows all owners, admins, and moderators for channels configured in `config.toml`
- Displays their roles, permissions, and user information
- Automatically saves admin lists to `raw/[channel_id]_admins.csv`
Output CSV format:
```csv
user_id,username,first_name,last_name
123456789,john_doe,John,Doe
987654321,jane_admin,Jane,Smith
```

Use cases:
- Identify channel moderators and their permissions
- Export admin lists for record-keeping
- Compare admin structures across multiple channels
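Telethon exposes admin listing directly via `iter_participants` with the `ChannelParticipantsAdmins` filter; a sketch of the core loop (not the actual `list_admins.py`):

```python
# Print each admin's ID and names for one channel.
from telethon import TelegramClient
from telethon.tl.types import ChannelParticipantsAdmins

async def print_admins(client: TelegramClient, channel_id: int) -> None:
    async for user in client.iter_participants(
        channel_id, filter=ChannelParticipantsAdmins()
    ):
        print(user.id, user.username, user.first_name, user.last_name)
```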
The crawler supports generating trust scores based on user interactions:
- Fetch messages: `python read_messages.py`
  - Saves messages to `raw/[channel_id]_messages.json`
  - Saves user info to `raw/[channel_id]_user_ids.csv` (includes user_id, username, first_name, last_name)
- Generate trust scores: `python generate_trust.py` (sketched below)
  - Reads messages and calculates trust based on reactions, replies, and mentions
  - Saves raw trust edges to `trust/[channel_id].csv` with format `i,j,v` (from_user_id, to_user_id, score)
  - Note: Trust files now use user IDs, not usernames
- Process scores: `python process_scores.py`
  - Aggregates incoming trust for each user
  - Converts user IDs to display names by default (username > "first_name last_name" > user_id)
  - Normalizes scores to the 0-1000 range
  - Saves to `output/[channel_id].csv`
  - Use the `--with-user-ids` flag to keep user IDs instead of converting to display names
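The exact scoring in `generate_trust.py` isn't documented here; the sketch below shows one plausible reading of "trust based on reactions and replies", with the edge weights as pure assumptions:

```python
# Hypothetical trust-edge derivation from the saved message format:
# a reaction or reply creates a directed edge toward the content's author.
from collections import defaultdict

def trust_edges(messages: list[dict]) -> dict[tuple[int, int], float]:
    author_of = {m["id"]: m.get("from_id") for m in messages}
    edges: dict[tuple[int, int], float] = defaultdict(float)
    for m in messages:
        sender = m.get("from_id")
        if sender is None:
            continue
        # Reacting users extend trust to the message author.
        for r in m.get("reactions", []):
            edges[(r["user_id"], sender)] += 1.0  # weight: assumption
        # A replier extends trust to the author of the replied-to message.
        parent = author_of.get(m.get("reply_to_msg_id"))
        if parent is not None:
            edges[(sender, parent)] += 2.0  # weight: assumption
    return edges

# Each (i, j) -> v entry corresponds to one i,j,v row in trust/[channel_id].csv.
```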
Example workflow:
```bash
python read_messages.py                   # Fetch messages and user info
python generate_trust.py                  # Calculate trust edges (saves user IDs)
python process_scores.py                  # Convert to display names and normalize
python process_scores.py --with-user-ids  # Keep user IDs in output
```
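A sketch of the processing step under the same assumptions (header-less `i,j,v` edge files; display-name priority as documented above):

```python
# Aggregate incoming trust per user, map IDs to display names, scale to 0-1000.
import csv
from collections import defaultdict

def load_names(path: str) -> dict[str, str]:
    names = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row["username"] or f"{row['first_name']} {row['last_name']}".strip()
            names[row["user_id"]] = name or row["user_id"]
    return names

def process(edges_path: str, names: dict[str, str]) -> dict[str, int]:
    incoming: dict[str, float] = defaultdict(float)
    with open(edges_path, newline="", encoding="utf-8") as f:
        for i, j, v in csv.reader(f):  # assumes no header row
            incoming[j] += float(v)
    top = max(incoming.values(), default=1.0)
    return {names.get(j, j): round(1000 * v / top) for j, v in incoming.items()}
```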
Messages are saved in the `raw/` directory:

- Format: `raw/[channel_id]_messages.json`
- One file per channel
- Contains simplified message data (ID, date, user ID, text, reactions, replies)
- No unnecessary metadata included
User information is saved as:

- Format: `raw/[channel_id]_user_ids.csv`
- Columns: `user_id,username,first_name,last_name`
- Some users may not have usernames (this is normal on Telegram)
Admin lists are saved as:

- Format: `raw/[channel_id]_admins.csv`
- Columns: `user_id,username,first_name,last_name`
- Generated by running `python list_admins.py`
Trust scores workflow:

- `trust/[channel_id].csv` - Raw trust edges with user IDs (`i,j,v` format)
- `output/[channel_id].csv` - Processed scores with display names or user IDs (`i,v` format)
The crawler creates a `telegram_session.session` file to remember your login.

- This file is automatically created on first login
- Don't commit this file to git (it's in `.gitignore`)
- Delete it if you want to log in with a different account
License: ISC