A free SaaS that visualizes how any public Git repository's codebase grows over time, broken down by language and prod/test — with zero server-side compute. All git operations and line counting happen in the user's browser.
Working name: git strata. Tagline: Stratigraphy for your code.
The scripts/ folder contains a Go-based LoC counter (main.go, git.go, stats.go) that analyzes a local git repo.
Port its core logic to TypeScript, running entirely in the browser via Web Workers. The Go code is the reference for:
- Skip patterns (
stats.go:skipPatterns) - Date-gap filling (
main.go) - Blob deduplication approach (
git.go)
The Go code's hardcoded language categories (Rust, TS, Go, Svelte, Astro, CSS, Docs) are not carried over. Instead, use the dynamic language registry described below.
┌───────────────────────────────────────────────────────────┐
│ Browser │
│ │
│ ┌───────────────┐ Comlink ┌─────────────────┐ │
│ │ Main thread │ ◄───────────────► │ Web Worker │ │
│ │ SvelteKit UI │ (RPC + progress) │ │ │
│ │ Chart.js │ │ isomorphic-git │ │
│ │ Results table│ │ lightning-fs │ │
│ └──────┬────────┘ │ Line counting │ │
│ │ └───────┬─────────┘ │
│ IndexedDB │ │
│ (clone cache + results cache) │ │
└──────────────────────────────────────────────┼────────────┘
│ HTTPS
┌─────────▼────────────┐
│ CORS proxy │
│ (Cloudflare Worker) │
│ ~30 lines, no │
│ compute, just │
│ forwards bytes │
└─────────┬────────────┘
│
┌─────────▼────────────┐
│ Git host │
│ (GitHub, GitLab, │
│ Bitbucket, etc.) │
└──────────────────────┘
Key constraint: The CORS proxy is the only server component. It does zero processing — it adds CORS headers and passes bytes through. Everything else runs client-side.
Pick the latest versions of these from NPM, NOT based on your training data!
| Component | Technology |
|---|---|
| Framework | SvelteKit + Svelte 5 (static adapter) |
| Styling | Tailwind CSS v4 |
| Git in browser | isomorphic-git |
| Browser filesystem | @isomorphic-git/lightning-fs |
| Charting | Chart.js |
| Web Worker RPC | Comlink |
| CORS proxy | Cloudflare Worker + Hono |
| Deployment (frontend) | Cloudflare Pages (or Vercel, or Netlify) |
| Package manager | pnpm |
All deps must have MIT or similarly permissive licenses.
A single-page app. Clean, minimal. Three elements above the fold:
- Headline: "Stratigraphy for your code" with one-sentence description.
- Repo URL input: Text field with placeholder
Example: https://github.com/sveltejs/svelte. Next to it, an "Analyze" button. Submitting also works with Enter. - Quick examples: Three clickable repo links below the input (pick well-known repos of different sizes: one small, one medium, one large). Clicking one fills the input and starts analysis.
The repo URL is reflected in the URL's query parameter (?repo=...) so results are shareable via URL.
Input validation: Accept URLs like:
https://github.com/owner/repohttps://github.com/owner/repo.githttps://gitlab.com/owner/repoowner/repo(shorthand, assumes GitHub)
After the user submits a repo URL:
- The input area stays visible at the top (user can cancel and enter a different repo).
- Below it, a progress section appears:
- Progress bar with bytes received (for example: "Downloading... 12.4 MB / ~38 MB")
- Elapsed time
- "Cancel" button that immediately stops the clone and cleans up
- If the repo was previously analyzed and cached, skip to phase 3 with a "Last analyzed: YYYY-MM-DD" badge and a "Refresh" button to re-clone.
isomorphic-git clone call:
await git.clone({
fs,
http,
dir: `/${repoId}`,
url: repoUrl,
corsProxy: CORS_PROXY_URL,
singleBranch: true,
ref: defaultBranch, // 'main' or 'master', detect via git.getRemoteInfo2 first
onProgress: (event) => {
/* update progress bar */
},
onAuth: () => ({ cancel: true }), // public repos only for now — fail fast on auth
})Default branch detection: Before cloning, call git.getRemoteInfo2({ http, corsProxy, url }) to discover the
default branch (HEAD symref). This avoids hardcoding main.
After the clone is complete (or was cached):
- Progress bar updates to show commit processing:
- "Processing commits... 142 / 387"
- Estimated time remaining (based on running average of time per commit)
- "Cancel" button
- A live preview chart starts rendering as results come in — each processed day adds a data point. This gives the user something to look at while processing continues.
Once processing is complete, display:
A stacked area chart (Chart.js) showing lines of code over time. Languages are shown dynamically based on what the repo actually contains — there is no hardcoded list of chart layers.
Display-time filtering:
- Sort languages by their line count at the latest data point (descending)
- Show each language that accounts for ≥5% of total lines as its own layer
- Group all remaining languages into an "Other" layer
- Languages with 0 lines never appear — not in the chart, not in the legend, not as toggle options
- For languages that have prod/test separation: if test lines are <10% of that language's total, merge them (don't show a separate test layer). Otherwise, show two layers: "Python (prod)" and "Python (test)"
Color assignment: Use a palette of 12–16 visually distinct colors, tested against protanopia, deuteranopia, and tritanopia. Assign colors to languages in order of line count (largest language gets the first color, etc.). "Other" always gets light gray. Each language also gets a lighter tint for its test layer (when shown).
Chart interactivity:
- Hover: tooltip showing date, total lines, and per-language breakdown
- Click on legend to toggle languages
- Zoom/pan with mouse wheel and drag (use the Chart.js zoom plugin)
- Reset zoom button
View toggles (buttons above the chart):
- "All" (default) — shows all languages with prod/test split where applicable
- "Prod vs test" — two layers: total prod, total test (aggregated across all languages)
- "Languages only" — collapses prod/test per language, shows one layer per language
- "Velocity" — shows daily line-count changes as a line chart plus a 7-day rolling average. Y axis shows lines/day (for example, "5k/day"), is not stacked, and can go negative.
Era markers — a toggleable overlay (persisted to localStorage) that draws vertical dashed lines at AI-era milestones (Copilot GA 2022, Agentic era 2025, Opus 4.5 2025). Only markers within the repo's date range are shown. Works in all view modes.
- Total lines — current line count with repo size
- Prod / test split — percentage breakdown with mini pie chart
- Average growth — lines/day overall and last 90 days, with trend arrow
- Age — human-readable duration, start date, and optional "last active Mon YYYY" line shown when the last meaningful code change (>10 non-meta lines) is more than 30 days before the last commit. Also shows a dead-repo indicator when inactive for 6+ months.
- Peak day — single largest daily growth, with date
- Contributors — distinct author count derived from
DayStats.authors, with top-N commit-day concentration percentage
A sortable table with one row per date and dynamic columns based on detected languages (matching the chart layers). Show
a "Copy CSV" button and a "Download CSV" button. CSV columns: date, then one column per displayed language (using the
same ≥5% threshold), then other, then total.
A "Copy link" button that copies the current URL (which has ?repo=...). When someone opens that link, the app loads
the landing page, starts analyzing immediately, and shows results (or cached results).
All language support is defined in a single languages.ts file. Each entry declares how to recognize and categorize
files for that language. The app ships with definitions for the top ~35 languages — no code changes are needed to
support a new language, just add an entry to the registry.
interface LanguageDefinition {
/** Unique identifier, used as map key (e.g. 'python', 'cpp', 'rust') */
id: string
/** Display name (e.g. 'Python', 'C++', 'Rust') */
name: string
/** File extensions including the dot (e.g. ['.py', '.pyw']) */
extensions: string[]
/**
* Optional: filename patterns that identify test files for this language.
* Glob-style patterns matched against the basename.
* Example for Python: ['test_*.py', '*_test.py', 'conftest.py']
* Example for Go: ['*_test.go']
*/
testFilePatterns?: string[]
/**
* Optional: directory names that indicate test code.
* Falls back to the global defaults: ['test', 'tests', '__tests__', 'e2e', 'testutil', 'testdata']
*/
testDirPatterns?: string[]
/**
* Optional: a function that counts inline test lines within a prod file.
* Used for languages like Rust where tests live inside prod files (#[cfg(test)] blocks).
* Returns the number of lines that should be classified as test code.
*/
countInlineTestLines?: (content: string) => number
/** Whether this is a "meta" category (like Docs or Config) rather than a programming language */
isMeta?: boolean
}Included languages (non-exhaustive, expand as needed):
| Language | Extensions | Test file heuristics |
|---|---|---|
| Python | .py, .pyw |
test_*.py, *_test.py, conftest.py |
| JavaScript | .js, .jsx, .mjs, .cjs |
*.test.js, *.spec.js |
| TypeScript | .ts, .tsx, .mts, .cts |
*.test.ts, *.spec.ts, *.test.tsx |
| Rust | .rs |
#[cfg(test)] inline detection + tests/ dir |
| Go | .go |
*_test.go |
| C | .c, .h |
Generic dir heuristic |
| C++ | .cpp, .cc, .cxx, .hpp, .hxx |
Generic dir heuristic |
| C# | .cs |
*Tests.cs, *Test.cs |
| Java | .java |
*Test.java, src/test/ dir |
| Kotlin | .kt, .kts |
*Test.kt |
| Swift | .swift |
*Tests.swift |
| Objective-C | .m, .mm |
Generic dir heuristic |
| Zig | .zig |
Generic dir heuristic |
| Ruby | .rb |
*_test.rb, *_spec.rb |
| PHP | .php |
*Test.php |
| Scala | .scala |
*Test.scala, *Spec.scala |
| Dart | .dart |
*_test.dart |
| Elixir | .ex, .exs |
*_test.exs |
| Haskell | .hs |
*Spec.hs |
| Lua | .lua |
*_test.lua, *_spec.lua |
| Perl | .pl, .pm |
*.t |
| R | .r, .R |
test-*.R, test_*.R |
| Julia | .jl |
Generic dir heuristic |
| Clojure | .clj, .cljs, .cljc |
*_test.clj |
| Erlang | .erl |
*_SUITE.erl |
| OCaml | .ml, .mli |
Generic dir heuristic |
| F# | .fs, .fsx |
Generic dir heuristic |
| Shell | .sh, .bash, .zsh |
Generic dir heuristic |
| PowerShell | .ps1, .psm1 |
*.Tests.ps1 |
| HTML | .html, .htm |
— |
| CSS | .css, .scss, .sass, .less |
— |
| SQL | .sql |
— |
| Svelte | .svelte |
— |
| Vue | .vue |
— |
| Astro | .astro |
— |
| Docs | .md, .mdx, .rst, .txt, .adoc |
— (meta category) |
| Config/Data | .json, .yaml, .yml, .toml, .xml, .ini |
— (meta category) |
Ambiguity handling: .h is assigned to C by default. If the repo contains any .cpp/.cc/.cxx files, reassign
all .h files to C++ instead (check once at the start of processing each commit's tree).
Extension conflicts: Each extension maps to exactly one language. If a future language reuses an extension, the first match in registry order wins (but this is unlikely in practice).
/** Line counts for a single language in a single day */
interface LanguageCount {
/** Total lines (prod + test) */
total: number
/** Lines classified as production code (undefined if no prod/test heuristic exists) */
prod?: number
/** Lines classified as test code (undefined if no prod/test heuristic exists) */
test?: number
}
/** One row per calendar date */
interface DayStats {
date: string // 'YYYY-MM-DD'
total: number // Sum of all languages
languages: Record<string, LanguageCount> // Keyed by language id (e.g. 'python', 'rust')
comments: string[] // Commit messages for this day
authors: string[] // Unique "Name <email>" identifiers for this day
}
/** Full analysis result, stored in cache */
interface AnalysisResult {
repoUrl: string
defaultBranch: string
/** OID of HEAD at analysis time — used for freshness checking */
headCommit: string
analyzedAt: string // ISO 8601
/** Language ids that appear in this result, sorted by final-day line count descending */
detectedLanguages: string[]
days: DayStats[]
/** Git repo size in bytes from the forge API (includes history) */
repoSizeBytes?: number
/** Distinct authors across all commits */
totalContributors?: number
}
/** Progress updates from the worker */
type ProgressEvent =
| { type: 'clone'; phase: string; loaded: number; total: number }
| { type: 'process'; current: number; total: number; date: string }
| { type: 'day-result'; day: DayStats } // for live chart preview
| { type: 'done'; result: AnalysisResult }
| { type: 'error'; message: string }A minimal Cloudflare Worker using Hono. Lives in a cors-proxy/ directory with its own package.json and
wrangler.toml.
Requirements:
- Forward any incoming request to the URL specified in the path (for example:
https://proxy.gitstrata.dev/https://github.com/foo/bar.git/info/refsforwards to the GitHub URL) - Add
Access-Control-Allow-Origin: *(and the other CORS headers) to the response - Pass through all request headers (important:
Content-Type,Git-Protocol) - Pass through the response body as a stream (don't buffer)
- Rate limit: max 100 requests per minute per IP (use Cloudflare's built-in rate limiting)
- Reject requests to non-git URLs for security (only allow paths containing
/info/refsor/git-upload-pack)
For local development: Use the public https://cors.isomorphic-git.org proxy so devs don't need to deploy a Worker
locally.
The proxy should be deployable with a single pnpm run deploy from the cors-proxy/ directory.
This runs entirely in the Web Worker. Port the Go logic from scripts/ to TypeScript.
Use git.log() to get all commits on the default branch. Group by date (YYYY-MM-DD), keep the latest commit hash per
day but collect all commit messages.
const commits = await git.log({ fs, dir, ref: defaultBranch })
// Group by date, keep first (latest) hash per dateFor each day (in chronological order — important for live chart preview):
git.readTree({ fs, dir, oid: commitHash })— get the recursive file tree- Actually: resolve commit → tree, then walk recursively with
git.readTree
- Actually: resolve commit → tree, then walk recursively with
- For each file entry in the tree:
- Skip files matching
skipPatterns(lock files, binary extensions — seestats.gofor the base list) git.readBlob({ fs, dir, oid: blobOid })— get file content- Skip binary content (check for null bytes in the first 8000 bytes)
- Match the file extension against the language registry to determine the language id
- Count total lines (count
\ncharacters; add one if content doesn't end with\n) - Determine prod vs test:
- If the language has
countInlineTestLines, call it to split inline test lines from prod - Else if the file matches the language's
testFilePatterns, classify all lines as test - Else if any path segment matches
testDirPatterns(or the global defaults), classify all lines as test - Otherwise, classify all lines as prod
- If the language has
- Accumulate into
DayStats.languages[languageId] - Files whose extension matches no registry entry go into the
"other"bucket
- Skip files matching
- Send a
day-resultprogress event to the main thread so the chart updates live
For dates between the first and last commit that have no commits, carry forward the previous day's stats (with
comments: ["-"]). This matches the Go implementation.
Many files don't change between commits. The same blob OID appears across many days. Cache blob content by OID in
the worker's memory to avoid redundant reads from IndexedDB. This is the single biggest optimization — it mirrors the Go
tool's use of git cat-file --batch which deduplicates naturally.
Also cache the line count and category per (blobOid, filePath) tuple, not just the content. That way, if the same
blob appears in the same path across multiple commits, you skip both the read and the count.
Two levels of caching:
isomorphic-git + lightning-fs automatically persist the cloned repo data to IndexedDB. On revisit, the clone data is
already there. Use git.fetch() instead of git.clone() to pull only new commits.
Cache key: Repository URL (normalized: strip .git suffix, lowercase).
Eviction: Store a list of cached repos with last-access timestamps. If total IndexedDB usage exceeds 500 MB, evict the least recently used repo. Show a "Manage cache" option in the UI footer where users can clear specific repos.
After processing, store the AnalysisResult in a separate IndexedDB object store. On revisit:
- Show cached results immediately
- Display "Last analyzed: YYYY-MM-DD" badge
- Offer a "Refresh" button that fetches new commits, processes only the new days, and appends to the cached result
This "incremental refresh" is a great UX win: if a user analyzed a repo last week, refreshing only needs to process the last 7 days of commits instead of the entire history.
- Chart: Include a visually hidden summary (
aria-label) on the canvas describing the trend (for example: "Stacked area chart showing 42,000 lines of code across 12 languages over 2 years"). - Data table: Always present below the chart as an accessible alternative. Use proper
<th>andscopeattributes. - Color blindness: Use a palette tested against protanopia, deuteranopia, and tritanopia. Add pattern fills (stripes, dots) as a toggleable option.
- Keyboard navigation: All interactive elements (input, buttons, toggles, table sorting) must be keyboard-reachable.
- Reduced motion: Respect
prefers-reduced-motion— disable chart animations when set. - Screen reader: Progress updates should use
aria-live="polite"regions.
Support both light and dark themes. Use Tailwind's dark: variant keyed off prefers-color-scheme. Also provide a
manual toggle (sun/moon icon) in the header that persists to localStorage.
The chart colors need separate palettes for light and dark backgrounds. Chart.js supports dynamic theming via its scriptable options.
Both the clone phase and the processing phase must be immediately cancelable:
- Pass an
AbortControllersignal togit.clone()/git.fetch() - In the processing loop, check a cancellation flag before each commit
- On cancel: terminate the Web Worker, clean up any partial state, reset the UI
- Use Comlink's
proxy(callback)pattern for progress, and terminate the worker on cancel
| Error | User-facing message | Recovery |
|---|---|---|
| Repo not found (404) | "Couldn't find this repository. Check the URL and make sure it's public." | Show input again |
| Auth required (401/403) | "This looks like a private repository. Git strata only supports public repos for now." | Show input |
| CORS proxy down | "Our download proxy is temporarily unavailable. Please try again in a moment." | Retry button |
| Network lost mid-clone | "Network connection lost. Your partial download is saved — reconnect and try again." | Retry (resumes) |
| Repo too large (>1 GB) | "This repository is very large (>1 GB). This might use significant memory and bandwidth." | Proceed/cancel choice |
| IndexedDB full | "Your browser's storage is full. Clear some cached repos in the footer menu." | Link to cache manager |
Build with SvelteKit's static adapter → output is plain HTML/CSS/JS → deploy to Cloudflare Pages (free tier).
pnpm build # outputs to build/Configure in svelte.config.js:
import adapter from '@sveltejs/adapter-static'
export default { kit: { adapter: adapter() } }Deploy to Cloudflare Workers:
cd cors-proxy && pnpm run deploy| Variable | Where | Purpose |
|---|---|---|
PUBLIC_CORS_PROXY_URL |
Frontend (.env) | URL of the CORS proxy |
PUBLIC_ANALYTICS_ID |
Frontend (.env) | Optional analytics (Plausible, etc.) |
For local dev, PUBLIC_CORS_PROXY_URL defaults to https://cors.isomorphic-git.org.
/
├── src/
│ ├── app.html
│ ├── app.css # Tailwind v4 import + custom properties
│ ├── routes/
│ │ ├── +page.svelte # The one and only page
│ │ └── +layout.svelte # Theme toggle, footer
│ └── lib/
│ ├── components/
│ │ ├── RepoInput.svelte # URL input + validate + submit
│ │ ├── PipelineProgress.svelte # Unified 3-step pipeline progress UI
│ │ ├── ResultsChart.svelte # Chart.js stacked area chart
│ │ ├── ResultsSummary.svelte # Stats cards
│ │ ├── ResultsTable.svelte # Sortable data table + CSV export
│ │ ├── CacheManager.svelte # Footer cache management UI
│ │ └── ThemeToggle.svelte # Dark/light toggle
│ ├── worker/
│ │ ├── analyzer.worker.ts # Web Worker entry point
│ │ └── analyzer.api.ts # Comlink-wrapped interface for main thread
│ ├── git/
│ │ ├── clone.ts # Clone/fetch logic with progress
│ │ ├── history.ts # Commit log, grouping by date
│ │ └── count.ts # Line counting, prod/test classification
│ ├── languages.ts # Language registry: extensions, test patterns, inline detectors
│ ├── cache.ts # IndexedDB results cache read/write
│ ├── url.ts # Repo URL parsing and normalization
│ └── types.ts # DayStats, LanguageCount, AnalysisResult, ProgressEvent, etc.
├── cors-proxy/
│ ├── src/
│ │ └── index.ts # Hono-based Cloudflare Worker
│ ├── wrangler.toml
│ └── package.json
├── scripts/ # Reference Go implementation (read-only)
│ ├── main.go
│ ├── git.go
│ ├── stats.go
│ └── go.mod
├── static/
│ └── favicon.svg
├── tests/
│ ├── languages.test.ts # Unit tests for language registry / extension matching
│ ├── count.test.ts # Unit tests for line counting / prod-test classification
│ ├── history.test.ts # Unit tests for commit grouping / date filling
│ ├── url.test.ts # Unit tests for URL parsing
│ └── e2e/
│ └── analyze.test.ts # Playwright: enter URL → see chart (use small test repo)
├── package.json
├── pnpm-lock.yaml
├── svelte.config.js
├── vite.config.ts
├── tsconfig.json
├── .env.example # PUBLIC_CORS_PROXY_URL=https://cors.isomorphic-git.org
├── spec.md # This file
└── README.md
- Scaffold SvelteKit project with static adapter, Svelte 5, TypeScript strict, Tailwind v4
- Configure Vite, ESLint, Prettier, and Vitest
- Set up the single-page route with a placeholder layout
- Add
app.csswith Tailwind import, CSS custom properties for chart colors, light/dark tokens - Add theme toggle (dark/light) persisting to
localStorage
- Set up Web Worker with Comlink (
analyzer.worker.ts,analyzer.api.ts) - Implement
url.ts: parse and normalize repo URLs (GitHub shorthand,.gitsuffix, etc.) - Implement
clone.ts: clone/fetch with isomorphic-git + lightning-fs, progress callbacks, cancellation - Implement default branch detection via
git.getRemoteInfo2 - Implement
history.ts:git.log()→ group by date, collect messages, generate consecutive dates - Implement
languages.ts: language registry with ~35 language definitions (extensions, test patterns, inline detectors) - Implement
count.ts: line counting, extension-based language matching, prod/test classification using the registry - Implement Rust
#[cfg(test)]inline test detector (as acountInlineTestLinesfunction in the registry) - Implement blob content caching (by OID) in worker memory for deduplication
- Wire up the full pipeline: clone → log → process → return
AnalysisResult - Unit tests for
languages.ts(extension mapping, test pattern matching) - Unit tests for
count.ts(line counting, prod/test classification, inline test detection) - Unit tests for
history.ts(date grouping, gap filling) - Unit tests for
url.ts
- Build
RepoInput.sveltewith validation, Enter-to-submit, example repo links - Build
PipelineProgress.sveltewith 3-step pipeline indicator, progress bar, detail line, cancel button - Build
ResultsChart.svelte: Chart.js stacked area chart with dynamic language layers, legend, tooltips, zoom/pan - Implement display-time filtering: ≥5% languages shown individually, rest grouped as "Other"
- Implement prod/test merge: if test <10% of a language, don't show a separate test layer
- Implement dynamic color assignment based on language rank by line count
- Implement chart view toggles: "All", "Prod vs test", "Languages only"
- Implement live chart preview (render data points as processing streams in)
- Build
ResultsSummary.svelte: total lines, growth, dominant language, date range - Build
ResultsTable.svelte: sortable table with dynamic columns, "Copy CSV" and "Download CSV" buttons - Implement
?repo=...query parameter: populate input on load, auto-start analysis - Implement "Copy link" share button
- Make dark/light chart palettes (separate color sets for both backgrounds)
- Implement
cache.ts: IndexedDB store forAnalysisResult, keyed by normalized repo URL - On load with cached result: show results immediately + "Last analyzed" badge + "Refresh" button
- Implement incremental refresh: fetch new commits, process only new days, merge with cached result
- Implement
CacheManager.svelte: list cached repos, per-repo and "clear all" delete buttons - Implement cache eviction when total size exceeds 500 MB (LRU)
- Scaffold Cloudflare Worker with Hono in
cors-proxy/ - Implement request forwarding with CORS headers and streaming response
- Restrict to git protocol URLs (
/info/refs,/git-upload-pack) - Add per-IP rate limiting (100 req/min)
- Add
pnpm run deployscript for one-command deployment - Test end-to-end: frontend → proxy → GitHub → clone succeeds
- Implement all error messages from the error handling table
- Add
aria-liveregions for progress updates - Add
aria-labelto chart canvas with trend summary - Test with keyboard-only navigation
- Test chart colors with a color blindness simulator
- Add
prefers-reduced-motionsupport (disable chart animations) - Responsive layout: works well from 375px (mobile) to 2560px (ultrawide)
- Playwright end-to-end test: enter a small public repo URL → wait for chart → verify data points
- Manual test with three repos of different sizes (small, medium, large)
- Configure Cloudflare Pages deployment (or Vercel/Netlify)
- Write README with setup instructions, architecture overview, and deployment guide
- Deploy frontend and CORS proxy
- Verify production deployment end-to-end