A batch analytics platform with a 3-layer data engineering pipeline (Raw → Staging → Analytics) that analyzes trending GitHub repositories across 3 programming languages (Python, TypeScript/Next.js, and Go). It leverages Render Workflows' distributed task execution to process data in parallel, storing results in a dimensional model for high-performance analytics.
- Multi-Language Analysis: Tracks Python, TypeScript/Next.js, and Go repositories
- 3-Layer Data Pipeline: Raw ingestion → Staging validation → Analytics dimensional model
- Parallel Processing: 4 concurrent workflow tasks using Render Workflows SDK
- Render Ecosystem Spotlight: Dedicated showcase for Render-deployed projects
- Real-time Dashboard: Next.js 14 dashboard with analytics visualizations
- Hourly Updates: Automated cron job triggers workflow execution
```mermaid
graph TD
    A[Cron Job Hourly] --> B[Workflow Orchestrator]
    B --> C[Python Analyzer]
    B --> D[TypeScript Analyzer]
    B --> E[Go Analyzer]
    B --> F[Render Ecosystem]
    C --> G[Raw Layer JSONB]
    D --> G
    E --> G
    F --> G
    G --> H[Staging Layer Validated]
    H --> I[Analytics Layer Fact/Dim]
    I --> J[Next.js Dashboard]
```
Backend (Workflows)
- Python 3.11+
- Render Workflows SDK with `@task` decorators
- asyncpg for PostgreSQL
- aiohttp for async API calls
- GitHub REST API
Frontend (Dashboard)
- Next.js 14.2 (App Router)
- TypeScript
- Tailwind CSS
- Recharts for visualizations
- PostgreSQL (pg)
Infrastructure
- Render Workflows (task execution)
- Render Cron Job (hourly trigger)
- Render Web Service (Next.js dashboard)
- Render PostgreSQL (data storage)
```
trender/
├── workflows/
│   ├── workflow.py           # Main workflow with @task decorators
│   ├── github_api.py         # Async GitHub API client
│   ├── connections.py        # Shared resource management
│   ├── render_detection.py   # Render usage detection
│   ├── etl/
│   │   ├── extract.py        # Raw layer extraction
│   │   └── data_quality.py   # Quality scoring
│   └── requirements.txt
├── trigger/
│   ├── trigger.py            # Cron trigger script
│   └── requirements.txt
├── dashboard/
│   ├── app/                  # Next.js App Router pages
│   ├── components/           # Reusable UI components
│   ├── lib/
│   │   ├── db.ts             # Database utilities
│   │   └── formatters.ts     # Data formatting helpers
│   └── package.json
├── database/
│   ├── schema/
│   │   ├── 01_raw_layer.sql
│   │   ├── 02_staging_layer.sql
│   │   ├── 03_analytics_layer.sql
│   │   └── 04_views.sql
│   └── init.sql
├── render.yaml
├── .env.example
└── README.md
```
If you've already completed the setup and just want to trigger a workflow run:
```bash
# Navigate to trigger directory
cd trigger

# Set environment variables
export RENDER_API_KEY=your_api_key
export RENDER_WORKFLOW_SLUG=trender-wf

# Install dependencies and run
pip install -r requirements.txt
python trigger.py
```

Or use the Render Dashboard: Workflows → trender-wf → Tasks → main_analysis_task → Run Task
- GitHub authentication (Personal Access Token or OAuth App - covered in step 2)
- Render account
- Node.js 18+ (for dashboard)
- Python 3.11+ (for workflows)
```bash
git clone <your-repo-url>
cd trender
```

Trender needs a GitHub access token to fetch repository data. You can choose between two authentication methods:
Best for: Individual developers, quick setup, local development
This is the simplest method - just create a token from GitHub settings.
```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```

- Open https://github.com/settings/tokens/new in your browser
- Configure the token:
  - Note: `Trender Analytics Access`
  - Expiration: `No expiration` (or your preference)
  - Scopes:
    - ✓ `repo` (Full control of private repositories)
    - ✓ `read:org` (Read org and team membership)
- Click "Generate token"
- Copy the token (starts with `ghp_` or `github_pat_`)
- Paste it into the terminal when prompted
The script will verify your token and display:
```
✅ SUCCESS! Your GitHub access token (PAT):
============================================================
ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
============================================================

Add this to your .env file:
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Add the token to your .env file:

```
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

✅ Done! Skip to Step 3.
Best for: Team setups, production deployments, and flows that require user authorization
- Go to https://github.com/settings/developers
- Click "New OAuth App"
- Fill in the details:
  - Application name: `Trender Analytics`
  - Homepage URL: `http://localhost:3000`
  - Authorization callback URL: `http://localhost:8000/callback`
- Click "Register application"
- Note your Client ID (starts with `Ov23` or `Iv1.`)
- Click "Generate a new client secret" and save it
```
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
```

```bash
cd workflows
pip install -r requirements.txt
python auth_setup.py
```

Choose option [2] for OAuth, then:
- The script starts a local server on port 8000
- Your browser opens to GitHub's authorization page
- Click "Authorize" to approve
- The script exchanges the auth code for a token
- Your `GITHUB_ACCESS_TOKEN` is displayed
Add the token to your .env file:
```
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

- ✅ Tokens don't expire (unless you set an expiration on the PAT)
- ✅ Never commit tokens to version control (`.env` is in `.gitignore`)
- ✅ Token scopes: `repo` and `read:org` only
- ✅ Revoke access anytime at https://github.com/settings/tokens
⚠️ Treat tokens like passwords
PAT Issues:
- Token doesn't start with `ghp_`: Classic tokens start with `ghp_`, fine-grained tokens with `github_pat_`
- API returns 401: Token may be expired or revoked. Generate a new one.
- Rate limit errors: Ensure the token has the proper scopes selected
OAuth Issues:
- Port 8000 in use: Run `lsof -ti:8000 | xargs kill -9`, then try again
- "Redirect URI mismatch": Ensure the callback URL in your OAuth app is exactly `http://localhost:8000/callback`
- Browser doesn't open: Manually visit the URL shown in the terminal
- "Bad verification code": Codes expire quickly. Run `python auth_setup.py` again
Both Methods:
- Token verification fails: Check your internet connection
- Need to regenerate: Revoke the old token at https://github.com/settings/tokens and generate a new one
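Most of the token issues above come down to format and validity. A quick local sanity check can catch malformed or mistyped tokens before any API call; this is a sketch with assumed helper names, not part of `auth_setup.py`:

```python
import re

# Known GitHub token prefixes (see the troubleshooting notes above):
# classic PATs start with "ghp_", OAuth tokens with "gho_",
# fine-grained PATs with "github_pat_".
TOKEN_PATTERN = re.compile(r"^(ghp_|gho_|github_pat_)[A-Za-z0-9_]+$")

def looks_like_github_token(token: str) -> bool:
    """Cheap format check before hitting the API with a malformed token."""
    return bool(TOKEN_PATTERN.match(token.strip()))

def auth_headers(token: str) -> dict:
    """Standard headers for authenticated GitHub REST API calls."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }
```

A token that passes this check can still be expired or revoked; only a live API call (as `auth_setup.py` performs) verifies that.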
```bash
cp .env.example .env
# Edit .env with your credentials
```

Your .env file should now contain (from step 2):

If you used PAT (Option A):

```
GITHUB_ACCESS_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

If you used OAuth (Option B):

```
GITHUB_CLIENT_ID=Ov23xxxxx_or_Iv1.xxxxx
GITHUB_CLIENT_SECRET=your_secret_here
GITHUB_ACCESS_TOKEN=gho_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
```

Other required variables (add as you complete the setup):

- `DATABASE_URL`: PostgreSQL connection string (from step 4)
- `RENDER_API_KEY`: Render API key (from https://dashboard.render.com/u/settings#api-keys)
- `RENDER_WORKFLOW_SLUG`: `trender-wf` (or your workflow slug from step 6)
- Go to Render Dashboard
- Create a new PostgreSQL database:
  - Name: `trender-db`
  - Database Name: `trender`
  - Plan: `basic_256mb` (or higher for production)
- Note the connection string for `DATABASE_URL`
```bash
# Connect to your Render PostgreSQL instance and run the initialization script
DATABASE_URL=YOUR_DATABASE_URL
psql $DATABASE_URL -f database/init.sql
```

If you prefer to run the schema files one at a time:

```bash
# Run each schema file in order
psql $DATABASE_URL -f database/schema/01_raw_layer.sql
psql $DATABASE_URL -f database/schema/02_staging_layer.sql
psql $DATABASE_URL -f database/schema/03_analytics_layer.sql
psql $DATABASE_URL -f database/schema/04_views.sql
```

Raw Layer:

- `raw_github_repos`: Stores complete GitHub API responses (JSONB format)
- `raw_repo_metrics`: Stores repository metrics (commits, issues, contributors)
Staging Layer:
- `stg_repos_validated`: Cleaned and validated repository data with quality scores
- `stg_render_enrichment`: Render-specific metadata (service types, complexity, categories)
Analytics Layer:
- Dimension tables:
  - `dim_repositories`: Repository master data with SCD Type 2 history
  - `dim_languages`: Language metadata
  - `dim_render_services`: Render service type reference data (web, worker, cron, etc.)
- Fact tables:
  - `fact_repo_snapshots`: Daily snapshots of repo metrics and momentum scores
  - `fact_render_usage`: Render service adoption by repository
Views:
- `analytics_trending_repos_current`: Current top trending repos across all languages
- `analytics_render_showcase`: Render ecosystem showcase with enrichment
- `analytics_language_rankings`: Per-language rankings with Render adoption stats
- `analytics_render_services_adoption`: Service type adoption statistics
- `analytics_language_trends`: Language-level aggregated statistics
- `analytics_repo_history`: Historical trends for charting
Total: 9 tables + 6 views
Check that all tables were created successfully:
```bash
psql $DATABASE_URL -c "\dt"
```

You should see 9 tables across the raw, stg, dim, and fact prefixes.
If you're upgrading from an older version that had workflow execution tracking, run the cleanup script:
```bash
psql $DATABASE_URL -f database/cleanup_workflow_tracking.sql
```

This removes the unused `fact_workflow_executions` table and `analytics_workflow_performance` view.
- Connection refused: Ensure your `DATABASE_URL` is correct and the Render PostgreSQL instance is active
- Permission denied: Make sure you're using the connection string with full admin privileges
- Tables already exist: Drop the database and recreate it, or use `DROP TABLE IF EXISTS` statements
The render.yaml file defines:
- Web Service: Next.js dashboard (`trender-dashboard`)
- Workflow: Main analytics pipeline (`trender-wf`)
- Cron Job: Hourly workflow trigger (`trender-analyzer-cron`)
- Database: PostgreSQL instance (`trender-db`)
Deploy to Render:
- Push your code to GitHub
- In Render Dashboard, click "New +" → "Blueprint"
- Connect your GitHub repository
- Render will automatically detect and deploy all services from `render.yaml`
Or use the Render CLI:
```bash
render blueprint launch
```

After deploying via `render.yaml`, add your GitHub access token to the workflow service (`trender-wf`) in the Render Dashboard:

- Go to your `trender-wf` workflow in the Render Dashboard
- Navigate to the Environment tab
- Add:
  - `GITHUB_ACCESS_TOKEN`: The token you generated in step 2 (starts with `ghp_`, `gho_`, or `github_pat_`)
  - `DATABASE_URL`: Automatically connected from the database (no action needed)
Important: After adding the token, trigger a manual deploy:
- Click "Manual Deploy" → "Clear build cache & deploy"
- This ensures the environment variables are available to your workflow tasks
Note: You only need `GITHUB_ACCESS_TOKEN` in Render. If you used OAuth, you don't need to add `GITHUB_CLIENT_ID` or `GITHUB_CLIENT_SECRET` to Render.
There are three ways to trigger a workflow run to populate data:
The trigger/trigger.py script uses the Render SDK to trigger workflows programmatically:
```bash
cd trigger

# Install dependencies
pip install -r requirements.txt

# Set required environment variables
export RENDER_API_KEY=your_render_api_key
export RENDER_WORKFLOW_SLUG=trender-wf  # Your workflow slug from Render dashboard

# Run the trigger script
python trigger.py
```

Expected output:

```
Triggering task: trender-wf/main-analysis-task
✓ Workflow triggered successfully at 2026-01-23 12:00:00
Task Run ID: run_abc123xyz
Initial Status: running
```
- Go to Render Dashboard
- Navigate to Workflows section
- Select your `trender-wf` workflow
- Click on the "main-analysis-task" task
- Click "Run Task" button
- Monitor the task execution in real-time
If you have the Render CLI installed:
```bash
# Install Render CLI (if not already installed)
npm install -g @render-inc/cli

# Login to Render
render login

# Trigger the workflow
render workflows trigger trender-wf main-analysis-task
```

Check the workflow status:

- Via Dashboard: Go to Workflows → trender-wf → View recent runs
- Via Script: The trigger script outputs the Task Run ID
- Via Database: Query the `dim_repositories` table to see loaded data:

```bash
psql $DATABASE_URL -c "SELECT language, COUNT(*) as count FROM dim_repositories WHERE is_current = TRUE GROUP BY language;"
```

Expected workflow completion time: 10-20 seconds for ~150 repositories across 3 languages + Render ecosystem.
- "RENDER_API_KEY not set": Export your API key from Render Settings
- "Task not found": Verify your workflow slug is `trender-wf` and that the workflow is deployed
- "Connection refused": Check that `DATABASE_URL` is correct and the database is running
- "GITHUB_ACCESS_TOKEN not set": Ensure you added the token to the workflow service environment variables (step 7)
Once the workflow completes, access your dashboard at:
https://trender-dashboard.onrender.com
You should see:
- Top trending repositories across Python, TypeScript, and Go
- Render ecosystem projects
- Momentum scores and analytics
- Historical trends
- Stores complete GitHub API responses
- Tables: `raw_github_repos`, `raw_repo_metrics`
- Purpose: Audit trail and reprocessing capability
- Cleaned and validated data
- Tables: `stg_repos_validated`, `stg_render_enrichment`
- Data quality scoring (0.0 - 1.0)
- Business rules applied
- Render enrichment data: service types, complexity scores, categories, blueprint indicators
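The quality scoring step can be illustrated with a minimal completeness check. This is a sketch: the real logic lives in `workflows/etl/data_quality.py`, and the field list here is an illustrative assumption, not the production rule set.

```python
# Hypothetical required fields for a raw GitHub repo record.
REQUIRED_FIELDS = ("full_name", "language", "stargazers_count", "created_at")

def quality_score(repo: dict) -> float:
    """Score a raw repo record between 0.0 and 1.0 by field completeness."""
    present = sum(1 for f in REQUIRED_FIELDS if repo.get(f) not in (None, ""))
    return round(present / len(REQUIRED_FIELDS), 2)
```

Records scoring below the load threshold (>= 0.70 per the success criteria below) would be held back in staging rather than promoted to the analytics layer.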
- Dimensions: `dim_repositories`, `dim_languages`, `dim_render_services`
- Facts: `fact_repo_snapshots`, `fact_render_usage`
- Views: Pre-aggregated analytics for dashboard
- Render analytics: Service adoption metrics, complexity distributions, blueprint quality indicators
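The SCD Type 2 history in `dim_repositories` works by closing out the current row and inserting a new one whenever a tracked attribute changes. The in-memory sketch below illustrates the pattern; column names follow the schema description above, but the tracked attribute (`language`) and exact DDL are assumptions.

```python
from datetime import date

def scd2_upsert(rows: list[dict], incoming: dict, today: date) -> None:
    """SCD Type 2 upsert against `rows`, an in-memory stand-in for dim_repositories."""
    current = next(
        (r for r in rows
         if r["repo_id"] == incoming["repo_id"] and r["is_current"]),
        None,
    )
    if current is not None:
        if current["language"] == incoming["language"]:
            return  # no change: keep the existing current row
        # Attribute changed: close out the old version.
        current["is_current"] = False
        current["valid_to"] = today
    # Insert the new current version.
    rows.append({**incoming, "is_current": True,
                 "valid_from": today, "valid_to": None})
```

Queries like the verification query in step 6 filter on `is_current = TRUE` to see only the latest version of each repository.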
The workflow consists of 4 main tasks decorated with `@task`:

- `main_analysis_task`: Orchestrator that spawns parallel tasks and coordinates the ETL pipeline
- `fetch_language_repos`: Fetches and stores trending repos for Python, TypeScript, or Go
- `analyze_repo_batch`: Analyzes repos in batches of 10, enriching with detailed metrics
- `fetch_render_repos`: Fetches Render ecosystem repositories using multi-strategy search
The orchestrator runs 4 parallel tasks (3 languages + 1 Render ecosystem), then aggregates results through the ETL pipeline (Extract from staging → Calculate scores → Load to analytics).
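The fan-out/fan-in shape described above can be sketched with plain asyncio. This is not the Render Workflows SDK API (the real tasks use its `@task` decorators); the function names mirror the task names, and the bodies are placeholders for the GitHub API calls.

```python
import asyncio

async def fetch_language_repos(language: str) -> list[dict]:
    # Placeholder: the real task queries the GitHub search API.
    await asyncio.sleep(0)  # simulate async I/O
    return [{"language": language, "name": f"example-{language}-repo"}]

async def fetch_render_repos() -> list[dict]:
    # Placeholder: the real task runs the multi-strategy Render search.
    await asyncio.sleep(0)
    return [{"language": "render", "name": "example-render-blueprint"}]

async def main_analysis_task() -> list[dict]:
    # Fan out: 3 language tasks + 1 Render ecosystem task run concurrently.
    results = await asyncio.gather(
        fetch_language_repos("python"),
        fetch_language_repos("typescript"),
        fetch_language_repos("go"),
        fetch_render_repos(),
    )
    # Fan in: flatten the per-task lists before the ETL steps.
    return [repo for batch in results for repo in batch]

repos = asyncio.run(main_analysis_task())
```

In the real pipeline the aggregated results then flow through the ETL steps (extract from staging → calculate scores → load to analytics) rather than being returned directly.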
The fetch_render_repos task uses a multi-strategy approach to discover Render projects:
- Repository search with the `path:` qualifier - Searches for repos containing `render.yaml` with recent activity (last 6 months)
- render-examples organization - Fetches official Render example repositories (high-quality blueprints)
- Topic search - Finds community repos tagged with the `render-blueprints` topic
This approach maximizes coverage and ensures we capture both official and community Render projects. When a Render repo is found, the system:
- Fetches and parses the `render.yaml` file to extract service configurations
- Calculates complexity scores based on the number and type of services
- Categorizes projects (official, community, blueprint)
- Stores enrichment data in the `stg_render_enrichment` table
- Populates `fact_render_usage` for service adoption analytics
- Momentum Score: Composite score combining:
  - 50% Normalized Stars: Stars normalized within the dataset (general repos vs Render repos scored separately)
  - 50% Recency Score: Based on repository creation date
    - 1.0 for repos ≤ 30 days old
    - 0.75 for repos 31-60 days old
    - 0.5 for repos 61-90 days old
    - 0.0 for repos > 90 days old
- Note: Activity metrics (commits, issues, contributors) are collected but not used in scoring
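The scoring rules above translate directly into a small function. This is a sketch of the formula as documented; the production implementation in the workflow may normalize stars differently.

```python
from datetime import datetime, timezone

def recency_score(created_at: datetime, now: datetime) -> float:
    """Tiered recency score from the table above."""
    age_days = (now - created_at).days
    if age_days <= 30:
        return 1.0
    if age_days <= 60:
        return 0.75
    if age_days <= 90:
        return 0.5
    return 0.0

def momentum_score(stars: int, max_stars: int,
                   created_at: datetime, now: datetime) -> float:
    """50% stars normalized within the dataset + 50% recency."""
    normalized_stars = stars / max_stars if max_stars else 0.0
    return 0.5 * normalized_stars + 0.5 * recency_score(created_at, now)
```

`max_stars` stands in for the normalization basis; per the note above, general repos and Render repos would be normalized against separate maxima.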
```bash
cd workflows
pip install -r requirements.txt
python workflow.py
```

```bash
cd dashboard
npm install
npm run dev
# Access at http://localhost:3000
```

If you need to recreate or update the schema:
```bash
psql $DATABASE_URL -f database/schema/01_raw_layer.sql
psql $DATABASE_URL -f database/schema/02_staging_layer.sql
psql $DATABASE_URL -f database/schema/03_analytics_layer.sql
psql $DATABASE_URL -f database/schema/04_views.sql
```

Or use the complete initialization script:

```bash
psql $DATABASE_URL -f database/init.sql
```

If upgrading from an older version, apply the cleanup migration:

```bash
psql $DATABASE_URL -f database/cleanup_workflow_tracking.sql
```

Technical:
- Process 150 repos across 3 languages + Render ecosystem in 10-20 seconds
- 4x parallel task execution (Python, TypeScript, Go, Render)
- 3-layer data pipeline with dimensional modeling (9 tables + 6 views)
- Data quality score >= 0.70 for all loaded repositories
- Multi-strategy Render discovery (path search + org repos + topics)
Marketing:
- Showcase trending Render ecosystem projects (render.yaml repositories)
- Highlight momentum scores combining stars and recency
- Identify case study candidates with high engagement
- Track Render service adoption patterns (web, worker, cron, etc.)
MIT
Contributions welcome! Please open an issue or submit a pull request.