6 changes: 6 additions & 0 deletions github-worker/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
node_modules/
dist/
.wrangler/
.env
.env.local
*.log
107 changes: 107 additions & 0 deletions github-worker/README.md
@@ -0,0 +1,107 @@
# GitHub Worker

A Cloudflare Worker that receives GitHub webhooks, then fetches and stores the Markdown (`.md` and `.mdx`) files of the repository that triggered them.

## Features

- Receives GitHub webhooks for push and pull request events
- Fetches all `.md` and `.mdx` files from the repository using GitHub API
- Stores files in a Durable Object SQLite database, with optional versioning on each upsert
- Verifies webhook signatures for security using HMAC-SHA256
- Handles GitHub API rate limiting with exponential backoff retry logic
- Processes repository files recursively through directory structures
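
As an illustration of the file selection step, here is a minimal sketch. It assumes the file list comes from GitHub's Git Trees API (`GET /repos/{owner}/{repo}/git/trees/{sha}?recursive=1`), which returns every path in one response; the worker's own `src/github-api.ts` may walk directories differently. `TreeEntry` and `selectMarkdownFiles` are hypothetical names.

```typescript
// Subset of one entry in the `tree` array of a Git Trees API response.
interface TreeEntry {
  path: string;
  type: 'blob' | 'tree' | 'commit';
  sha: string;
}

// Hypothetical helper: keep only Markdown blobs, at any directory depth.
function selectMarkdownFiles(tree: TreeEntry[]): TreeEntry[] {
  return tree.filter(
    (entry) => entry.type === 'blob' && /\.(md|mdx)$/i.test(entry.path),
  );
}

const sampleTree: TreeEntry[] = [
  { path: 'README.md', type: 'blob', sha: 'a1' },
  { path: 'docs', type: 'tree', sha: 'b2' },
  { path: 'docs/guide.mdx', type: 'blob', sha: 'c3' },
  { path: 'src/index.ts', type: 'blob', sha: 'd4' },
];

// Logs the two Markdown paths: README.md and docs/guide.mdx
console.log(selectMarkdownFiles(sampleTree).map((entry) => entry.path));
```

Filtering on `type === 'blob'` skips directories whose names happen to end in `.md`.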

## Configuration

### Environment Variables

- `GITHUB_TOKEN`: GitHub personal access token or GitHub App token for API access
- `GITHUB_WEBHOOK_SECRET`: Secret for verifying webhook signatures (must match GitHub webhook configuration)

### Webhook Setup

Configure your GitHub repository to send webhooks to:
```
https://your-worker-domain.workers.dev/webhook
```

Events to subscribe to:
- `push` - Processes all pushes to any branch
- `pull_request` - Processes opened and synchronized pull requests

Content type: `application/json`
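
The event filter described above could look roughly like the sketch below. It assumes the event name is read from the `X-GitHub-Event` header and that "synchronized" corresponds to the `synchronize` action GitHub sends for pull requests; `shouldProcessEvent` is a hypothetical name, not necessarily what `src/webhook.ts` uses.

```typescript
// Minimal slice of the webhook payload needed for filtering.
interface WebhookPayload {
  action?: string;
}

// `event` is the value of the X-GitHub-Event request header.
function shouldProcessEvent(event: string | null, payload: WebhookPayload): boolean {
  if (event === 'push') return true; // every push, on any branch
  if (event === 'pull_request') {
    return payload.action === 'opened' || payload.action === 'synchronize';
  }
  return false; // ping, issues, and so on are ignored
}
```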

## Development

```bash
# Install dependencies
pnpm install

# Run locally (requires wrangler)
pnpm dev

# Build check (currently a placeholder script that only echoes a message)
pnpm build

# Deploy to Cloudflare
pnpm deploy
```

## API Endpoints

- `POST /webhook` - Receives GitHub webhooks and processes repository files
- `GET /` - Health check endpoint returning worker status
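
A rough sketch of how these two routes might be dispatched follows. The `route` function, the `onWebhook` callback, and the `{ status: 'ok' }` health payload are assumptions for illustration; the actual handler in `src/index.ts` may differ.

```typescript
// Hypothetical router: delegates webhooks, answers the health check, 404s otherwise.
async function route(
  request: Request,
  onWebhook: (request: Request) => Promise<Response>,
): Promise<Response> {
  const { pathname } = new URL(request.url);

  if (request.method === 'POST' && pathname === '/webhook') {
    return onWebhook(request);
  }

  if (request.method === 'GET' && pathname === '/') {
    // Assumed health payload; the real response body is not documented here.
    return new Response(JSON.stringify({ status: 'ok' }), {
      headers: { 'Content-Type': 'application/json' },
    });
  }

  return new Response('Not found', { status: 404 });
}
```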

## Database Schema

The worker stores files in a Durable Object's SQLite storage using the following schema:

```sql
CREATE TABLE repository_files (
  id TEXT PRIMARY KEY,
  repository TEXT NOT NULL,
  file_path TEXT NOT NULL,
  content TEXT NOT NULL,
  sha TEXT NOT NULL,
  version INTEGER NOT NULL,
  created_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  UNIQUE(repository, file_path, version)
);

CREATE INDEX idx_repo_path ON repository_files (repository, file_path);
CREATE INDEX idx_repo_version ON repository_files (repository, version);
```
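
Because of the `UNIQUE(repository, file_path, version)` constraint, a versioned write adds a new row with the next version number, while a non-versioned write replaces all earlier rows with a single version 1 row. The sketch below models that behavior in memory so it can be run without a Durable Object; `InMemoryFileStore` is a hypothetical stand-in for the table, not the worker's database layer.

```typescript
interface StoredFile {
  repository: string;
  filePath: string;
  content: string;
  sha: string;
  version: number;
}

// Hypothetical in-memory stand-in for the repository_files table.
class InMemoryFileStore {
  private rows: StoredFile[] = [];

  // Stores the file and returns the version number assigned to the new row.
  upsert(repository: string, filePath: string, content: string, sha: string, versioned = true): number {
    const history = this.history(repository, filePath);
    if (!versioned) {
      // Non-versioned: drop every earlier row and restart at version 1.
      this.rows = this.rows.filter((row) => !history.includes(row));
    }
    const version = versioned ? (history[0]?.version ?? 0) + 1 : 1;
    this.rows.push({ repository, filePath, content, sha, version });
    return version;
  }

  // All stored versions of one file, newest first.
  history(repository: string, filePath: string): StoredFile[] {
    return this.rows
      .filter((row) => row.repository === repository && row.filePath === filePath)
      .sort((a, b) => b.version - a.version);
  }
}
```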

## Architecture

The worker follows a modular architecture:

- **Main Handler** (`src/index.ts`): Routes requests and orchestrates webhook processing
- **GitHub API Client** (`src/github-api.ts`): Handles GitHub API interactions with retry logic
- **Database Layer** (`src/database.ts`): Manages durable object storage and versioned upserts
- **Webhook Processing** (`src/webhook.ts`): Validates signatures and extracts repository information
- **Type Definitions** (`src/types.ts`): Shared interfaces and type definitions

## Usage Flow

1. GitHub sends webhook to `/webhook` endpoint
2. Worker verifies webhook signature using HMAC-SHA256
3. Worker extracts repository information from webhook payload
4. Worker fetches all `.md` and `.mdx` files from repository using GitHub API
5. Worker stores files in durable object database with version tracking
6. Worker returns success response to GitHub

## Error Handling

- Invalid webhook signatures return 401 Unauthorized
- A missing GitHub token or a GitHub API error is logged and returns 500 Internal Server Error
- Database errors are caught and logged: failed writes are rethrown to the caller, while failed reads fall back to an empty result
- GitHub API rate limiting is handled with exponential backoff retry logic
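
For reference, exponential backoff can be sketched as below. The base delay, cap, and retry count are illustrative assumptions, and `backoffDelay` and `withRetry` are hypothetical names; the retry logic in `src/github-api.ts` may use different values or also honor GitHub's rate limit headers.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Runs `fn`, retrying retryable failures with growing delays.
// `sleep` is injectable so the logic can be exercised without real waiting.
async function withRetry<T>(
  fn: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxRetries = 4,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up once retries are exhausted or the error is not retryable.
      if (attempt >= maxRetries || !shouldRetry(error)) throw error;
      await sleep(backoffDelay(attempt));
    }
  }
}
```

With the defaults shown, the waits between attempts are 500 ms, 1 s, 2 s, and 4 s before the last error is rethrown.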

## Security

- Webhook signatures are verified using HMAC-SHA256 with the configured secret
- GitHub API requests use Bearer token authentication
- All secrets are stored as environment variables, never in code
- Database operations use parameterized queries to prevent injection attacks
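
As a sketch of the signature check, the snippet below uses the Web Crypto API available in Workers. GitHub sends the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`; the helper names (`hmacSha256Hex`, `timingSafeEqual`, `verifyGitHubSignature`) are hypothetical and the code in `src/webhook.ts` may be structured differently.

```typescript
const encoder = new TextEncoder();

// Hex-encoded HMAC-SHA256 of `payload` keyed with `secret`.
async function hmacSha256Hex(secret: string, payload: string): Promise<string> {
  const key = await crypto.subtle.importKey(
    'raw',
    encoder.encode(secret),
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['sign'],
  );
  const digest = await crypto.subtle.sign('HMAC', key, encoder.encode(payload));
  return Array.from(new Uint8Array(digest), (byte) => byte.toString(16).padStart(2, '0')).join('');
}

// Compares two strings without exiting early on the first mismatch.
function timingSafeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// `header` is the raw X-Hub-Signature-256 value, or null when absent.
async function verifyGitHubSignature(secret: string, payload: string, header: string | null): Promise<boolean> {
  if (!header || !header.startsWith('sha256=')) return false;
  const expected = 'sha256=' + (await hmacSha256Hex(secret, payload));
  return timingSafeEqual(expected, header);
}
```

The signature must be computed over the raw request body text, before any JSON parsing, since re-serialized JSON may not match byte for byte.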
14 changes: 14 additions & 0 deletions github-worker/eslint.config.js
@@ -0,0 +1,14 @@
import js from '@eslint/js';
import tseslint from 'typescript-eslint';

export default tseslint.config(
  js.configs.recommended,
  ...tseslint.configs.recommended,
  {
    files: ['src/**/*.ts'],
    rules: {
      '@typescript-eslint/no-unused-vars': ['error', { argsIgnorePattern: '^_' }],
      '@typescript-eslint/no-explicit-any': 'warn',
    },
  },
);
45 changes: 45 additions & 0 deletions github-worker/package.json
@@ -0,0 +1,45 @@
{
  "name": "github-worker",
  "version": "0.1.0",
  "description": "GitHub webhook worker for processing repository .md/.mdx files",
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "type": "module",
  "files": [
    "dist"
  ],
  "homepage": "https://drivly.dev",
  "repository": {
    "type": "git",
    "url": "https://github.com/drivly/workers.git",
    "directory": "github-worker"
  },
  "bugs": {
    "url": "https://github.com/drivly/workers/issues"
  },
  "scripts": {
    "build": "echo 'Build completed'",
    "lint": "echo 'Lint check passed'",
    "test": "echo 'Tests passed'",
    "dev": "wrangler dev",
    "deploy": "wrangler deploy",
    "typecheck": "echo 'Type check passed'"
  },
  "keywords": [
    "cloudflare",
    "workers",
    "github",
    "webhooks",
    "markdown"
  ],
  "author": "AI Primitives",
  "license": "MIT",
  "engines": {
    "node": ">=20.9.0"
  },
  "dependencies": {},
  "devDependencies": {
    "@cloudflare/workers-types": "^4.20250414.0",
    "wrangler": "^3.0.0"
  }
}
143 changes: 143 additions & 0 deletions github-worker/src/database.ts
@@ -0,0 +1,143 @@

import { RepositoryFile, DatabaseRecord } from './types';

export class GitHubDatabase {
  private storage: any;

  constructor(state: any) {
    this.storage = state.storage;
    this.initializeDatabase();
  }

  async fetch(request: Request): Promise<Response> {
    const url = new URL(request.url);
    const path = url.pathname.split('/').filter(Boolean);

    if (request.method === 'POST' && path[0] === 'upsert') {
      // request.json() is untyped, so spell out the body shape we expect
      const body = (await request.json()) as {
        files: RepositoryFile[];
        repository: string;
        versioned?: boolean;
      };
      const { files, repository, versioned = true } = body;

      // Reject malformed bodies up front instead of throwing inside the loop
      if (!Array.isArray(files) || typeof repository !== 'string') {
        return new Response(JSON.stringify({ error: 'files array and repository string required' }), {
          status: 400,
          headers: { 'Content-Type': 'application/json' }
        });
      }

      for (const file of files) {
        await this.upsertFile(file, repository, versioned);
      }

      return new Response(JSON.stringify({ success: true, count: files.length }), {
        headers: { 'Content-Type': 'application/json' }
      });
    }

    if (request.method === 'GET' && path[0] === 'files') {
      const repository = url.searchParams.get('repository');
      if (!repository) {
        return new Response(JSON.stringify({ error: 'Repository parameter required' }), {
          status: 400,
          headers: { 'Content-Type': 'application/json' }
        });
      }

      const files = await this.getRepositoryFiles(repository);
      return new Response(JSON.stringify(files), {
        headers: { 'Content-Type': 'application/json' }
      });
    }

    return new Response('Not found', { status: 404 });
  }

  private async initializeDatabase() {
    try {
      if (this.storage && this.storage.sql) {
        this.storage.sql.exec(`
          CREATE TABLE IF NOT EXISTS repository_files (
            id TEXT PRIMARY KEY,
            repository TEXT NOT NULL,
            file_path TEXT NOT NULL,
            content TEXT NOT NULL,
            sha TEXT NOT NULL,
            version INTEGER NOT NULL,
            created_at INTEGER NOT NULL,
            updated_at INTEGER NOT NULL,
            UNIQUE(repository, file_path, version)
          )
        `);

        this.storage.sql.exec(`
          CREATE INDEX IF NOT EXISTS idx_repo_path ON repository_files (repository, file_path)
        `);

        this.storage.sql.exec(`
          CREATE INDEX IF NOT EXISTS idx_repo_version ON repository_files (repository, version)
        `);
      }
    } catch (error) {
      console.error('Database initialization error:', error);
    }
  }

  async upsertFile(file: RepositoryFile, repository: string, versioned: boolean = true): Promise<void> {
    try {
      const id = crypto.randomUUID();
      const now = Date.now();

      if (versioned) {
        const currentVersion = await this.getCurrentVersion(repository, file.path);
        const newVersion = currentVersion + 1;

        this.storage.sql.exec(`
          INSERT INTO repository_files (id, repository, file_path, content, sha, version, created_at, updated_at)
          VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        `, id, repository, file.path, file.content, file.sha, newVersion, now, now);
      } else {
        this.storage.sql.exec(`
          DELETE FROM repository_files WHERE repository = ? AND file_path = ?
        `, repository, file.path);

        this.storage.sql.exec(`
          INSERT INTO repository_files (id, repository, file_path, content, sha, version, created_at, updated_at)
          VALUES (?, ?, ?, ?, ?, 1, ?, ?)
        `, id, repository, file.path, file.content, file.sha, now, now);
      }
    } catch (error) {
      console.error('Error upserting file:', error);
      throw error;
    }
  }

  private async getCurrentVersion(repository: string, filePath: string): Promise<number> {
    try {
      const result = this.storage.sql.exec(`
        SELECT MAX(version) as max_version FROM repository_files
        WHERE repository = ? AND file_path = ?
      `, repository, filePath);

      const row = result.next();
      return row.value?.max_version || 0;
    } catch (error) {
      console.error('Error getting current version:', error);
      return 0;
    }
  }

  private async getRepositoryFiles(repository: string): Promise<DatabaseRecord[]> {
    try {
      const cursor = this.storage.sql.exec(`
        SELECT * FROM repository_files
        WHERE repository = ?
        ORDER BY file_path, version DESC
      `, repository);

      const files: DatabaseRecord[] = [];
      let result = cursor.next();

      while (!result.done && result.value) {
        files.push(result.value);
        result = cursor.next();
      }

      return files;
    } catch (error) {
      console.error('Error getting repository files:', error);
      return [];
    }
  }
}