A powerful tool to analyze cryptocurrency development ecosystems by mapping contributor networks across GitHub repositories using the Open Source Observer (OSO) database.
This tool performs a comprehensive 4-step analysis:
- Seed Repository Analysis - Starts with configurable crypto repositories (Bitcoin, Ethereum, Cosmos, etc.)
- Core Contributor Discovery - Finds the most active contributors to these seed projects
- Extended Repository Mapping - Discovers all other repositories these contributors work on
- Extended Contributor Network - Maps the broader ecosystem of developers in related projects
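A minimal sketch of the four steps, assuming a `commits` mapping stands in for the OSO database (the function name and thresholds here are illustrative, not the actual code in `oso_github_repositories.py`):

```python
def analyze_ecosystem(seed_repos, commits, min_commits=5):
    """Illustrative sketch of the 4-step analysis.

    seed_repos: set of "org/repo" strings (step 1, from config).
    commits: {(contributor, "org/repo"): commit_count}, a stand-in for OSO data.
    """
    # Step 2: core contributors are active committers to the seed repos
    core = {u for (u, r), n in commits.items() if r in seed_repos and n >= min_commits}
    # Step 3: extended repos are everything else the core contributors touch
    extended_repos = {r for (u, r) in commits if u in core and r not in seed_repos}
    # Step 4: the extended network is everyone active in those repos
    extended = {u for (u, r) in commits if r in extended_repos} - core
    return core, extended_repos, extended
```

The real tool runs these phases as SQL queries against OSO rather than in memory, but the set relationships are the same.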
- OSO Account & API Key
  - Create an account at opensource.observer
  - Generate an API key in Account Settings > API Keys
  - Set the environment variable:
    export OSO_API_KEY="your_key_here"
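Inside Python, the same variable can be read back with the standard library; a minimal sketch:

```python
import os

def get_oso_api_key(env=os.environ):
    """Return the OSO API key, failing fast with a hint if it is missing."""
    key = env.get("OSO_API_KEY")
    if not key:
        raise RuntimeError(
            "OSO_API_KEY is not set; generate one in Account Settings > API Keys"
        )
    return key
```

python-dotenv's `load_dotenv()` can populate the environment from a `.env` file before this check runs.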
- Install Dependencies
  pip install pyoso pandas python-dotenv tomli
- Configure Analysis
  - Copy and customize config.toml (see Configuration section below)
- Run the Analysis
  # Step 1: Run the main analysis
  python oso_github_repositories.py
  # Step 2: Generate trust network (optional)
  python generate_trust.py

All parameters are configured via TOML files. Here's the complete configuration structure:
[general]
# Output directory for CSV files
output_dir = "./raw"
# Date range for contributions (in days from now)
# Set to 0 to include all historical data
days_back = 0 # 0 = all time, 365 = last year, 730 = last 2 years
# List of seed organizations to analyze
seed_orgs = [
"ethereum",
"bitcoin",
"cosmos",
"paradigmxyz",
"compound-finance",
"smartcontractkit",
"uniswap",
"offchainlabs",
"foundry-rs",
"paritytech"
]
[analysis]
# Enable extended repository and contributor analysis (set to false to only analyze seed repos)
extended_analysis = true
# Minimum commits threshold for all analysis (repositories, core contributors, extended contributors)
min_commits = 5
# Repository filtering
max_repos_per_org = 200
# Minimum core contributors threshold for all analysis (seed repos, extended repos, all phases)
min_core_contributors = 2

[filters]
# Repository filters
exclude_forks = true
exclude_archived = false
# Contributor filters
exclude_bots = false # Set to true to exclude bot accounts
bot_keywords = ["bot", "dependabot", "mergify", "renovate", "github-actions"]
# Date filter - uses days_back from [general] section
[output]
# Output file naming
timestamp_format = "%Y%m%d_%H%M%S"
file_prefix = "crypto"
include_headers = true
include_timestamp_in_filename = true

After running the main analysis, you can generate trust relationships between contributors and repositories using the graph builder:
The generate_trust.py script processes your analysis results to create weighted trust networks:
- Reads Analysis Data - Uses the CSV files from your main analysis (seed repos, contributors, extended repos)
- Queries OSO Database - Gets detailed GitHub activity data for user-repository pairs
- Calculates Trust Scores - Weights different activities:
- Commits: 5 points (user → repo), 3 points (repo → user)
- Pull Requests Opened: 20 points (user → repo), 5 points (repo → user)
- Pull Requests Merged: 10 points (user → repo), 1 point (repo → user)
- Issues Opened: 10 points (user → repo)
- Stars: 5 points (user → repo)
- Forks: 1 point (user → repo)
- Generates Bidirectional Graph - Creates both user-to-repository and repository-to-user trust relationships
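The weighting can be expressed as a small scoring table; the activity names below are illustrative, and the point values mirror the list above:

```python
# (user -> repo, repo -> user) points per activity, from the list above
WEIGHTS = {
    "commits": (5, 3),
    "prs_opened": (20, 5),
    "prs_merged": (10, 1),
    "issues_opened": (10, 0),
    "stars": (5, 0),
    "forks": (1, 0),
}

def trust_scores(activity):
    """activity: {activity_name: count} for one (user, repo) pair.

    Returns the (user -> repo, repo -> user) trust totals.
    """
    u2r = sum(WEIGHTS[k][0] * n for k, n in activity.items() if k in WEIGHTS)
    r2u = sum(WEIGHTS[k][1] * n for k, n in activity.items() if k in WEIGHTS)
    return u2r, r2u
```

For example, two commits plus one opened pull request give 2×5 + 20 = 30 points toward the repo and 2×3 + 5 = 11 points back toward the user.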
The graph builder uses your existing config.toml settings and can be enabled in the output section:
[output]
include_headers = true
include_timestamp_in_filename = false

# 1. Run main analysis first
python oso_github_repositories.py
# 2. Generate trust network from the results
python generate_trust.py
# 3. Results will be in trust/github.csv
head trust/github.csv
# i,j,v
# vitalik,ethereum/go-ethereum,245.0
# ethereum/solidity,chriseth,89.5
# ...

Trust relationships are saved as CSV files in the trust/ directory:
- github.csv - Main trust network with columns: i (from), j (to), v (trust value). The format is suitable for graph analysis tools like NetworkX, igraph, or Gephi.
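For instance, the i,j,v rows can be loaded into a weighted edge dict with the standard csv module (the inline sample reuses the rows shown above):

```python
import csv
import io

# In practice: reader = csv.DictReader(open("trust/github.csv"))
sample = """i,j,v
vitalik,ethereum/go-ethereum,245.0
ethereum/solidity,chriseth,89.5
"""

edges = {}
for row in csv.DictReader(io.StringIO(sample)):
    edges[(row["i"], row["j"])] = float(row["v"])  # directed, weighted edge
```

Each (i, j) key maps directly onto a weighted edge in a NetworkX DiGraph or an igraph/Gephi edge list.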
- Processes relationships in batches of 200 to avoid database timeouts
- Automatically saves progress every 10 batches
- Filters out bot accounts and inactive relationships
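That batching behaviour can be sketched as follows; `query` and `checkpoint` are hypothetical stand-ins for the OSO call and the CSV write in generate_trust.py:

```python
def process_in_batches(pairs, query, checkpoint, batch_size=200, save_every=10):
    """Query (user, repo) pairs in batches, checkpointing partial results."""
    results = []
    batches = [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]
    for n, batch in enumerate(batches, start=1):
        results.extend(query(batch))  # one database query per batch avoids timeouts
        if n % save_every == 0:
            checkpoint(results)       # persist progress every 10 batches
    checkpoint(results)               # final save
    return results
```

A crash therefore loses at most the last ten batches of work rather than the whole run.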
The main analyzer generates 4 CSV files:
organization,repository_name,contributor_count,total_commits,status
ethereum,go-ethereum,31,18238,found
bitcoin,bitcoin,14,38508,found

contributor_handle,total_commits,total_active_days,seed_repositories
chriseth,51499,1276,ethereum/solidity
vitalik,8234,945,"ethereum/go-ethereum, ethereum/solidity"

organization,repository_name,core_contributor_count,total_commits
ethereum,solc-js,7,3390
cosmos,gaia,4,1296

contributor_handle,repos_contributed,total_commits
alexanderbez,13,3064
marbar3778,11,6758

[general]
seed_orgs = [
"Uniswap",
"compound-finance",
"makerdao",
"aave"
]
[analysis]
min_commits = 10
[filters]
exclude_bots = true
# Uses days_back from [general] section for date filtering

[general]
seed_orgs = [
"ethereum-optimism",
"0xPolygonMatic",
"matter-labs",
"starkware-libs"
]
[analysis]
min_commits = 3 # Lower threshold for newer projects

[general]
seed_orgs = [
"bitcoin",
"lightninglabs",
"ElementsProject",
"BlockstreamResearch"
]
[general]
days_back = 1095 # Last 3 years

- Investment Research - Map development activity across crypto ecosystems
- Talent Discovery - Find top contributors in specific blockchain domains
- Ecosystem Analysis - Understand project relationships and cross-pollination
- Due Diligence - Assess developer community strength and engagement
- Recruitment - Identify active developers for hiring
- Academic Research - Study open source collaboration patterns
The tool reveals:
- Developer Migration Patterns - How contributors move between projects
- Ecosystem Boundaries - Which projects share common developers
- Influence Networks - Key individuals working across multiple important projects
- Project Relationships - Unexpected connections between seemingly unrelated repos
- Community Health - Distribution of contributions and contributor diversity
- Start Small - Begin with 3-5 seed repos and adjust limits
- Use Date Filters - Limit analysis to recent periods for faster queries
- Enable Bot Filtering - Exclude automated contributions for cleaner results
- Adjust Repository Limits - Use max_repos_per_org to balance thoroughness vs. speed
- Custom Output Directories - Organize results by analysis type or date
- Data Freshness - OSO data may have some lag from live GitHub
- Private Repos - Only analyzes public GitHub repositories
- Attribution - Relies on consistent GitHub usernames/emails
- Scope - Currently focused on GitHub; doesn't include GitLab, Bitbucket, etc.
"No valid seed repositories found"
- Check repository names in config (must be "org/repo" format)
- Verify repos exist in OSO database
"Query timeout or error"
- Reduce repository limits in config (e.g., max_repos_per_org)
- Add date filters to limit scope
- Try smaller batches of seed repositories
"No contributors found"
- Lower the min_commits threshold
- Check the date range isn't too restrictive
- Verify seed repos have recent activity
Set environment variable for verbose output:
export OSO_DEBUG=1
python oso_github_repositories.py

Apache 2.0
Contributions welcome! Areas for improvement:
- Additional data sources beyond GitHub
- Enhanced network visualization capabilities
- Advanced statistical analysis features
- Graph metrics computation (centrality, clustering, etc.)
- Performance optimizations
- Additional export formats (JSON, Parquet, etc.)
- Open Source Observer - Data source
- PyOSO - Python client library