Analyze git repositories to measure developer productivity — classify commit work types, track weekly velocity, and export CSV/JSON/Markdown reports.
tga walks one or more local git repositories, collects every commit into a SQLite database, classifies each commit into a work category (feature, bugfix, refactor, etc.) using a seven-tier classification cascade, then aggregates the results into per-author, per-week, DORA, velocity, and quality reports. It is a feature-complete Rust port of gitflow-analytics with the same YAML config schema and the same SQLite schema — existing config files work without modification.
cargo install trusty-git-analyticsThis installs the tga binary to ~/.cargo/bin/. Ensure ~/.cargo/bin is in your PATH.
git clone https://github.com/bobmatnyc/trusty-git-analytics
cd trusty-git-analytics
cargo install --path .tga --version
tga --helpStep 1 — Create a config.yaml:
repositories:
- path: ~/code/my-project
name: my-project
output:
directory: ./reports
formats: [csv, json, markdown]Step 2 — Run the full pipeline:
tga analyze --config config.yamlStep 3 — Find reports in ./reports/:
reports/
├── authors.csv # Per-author commit summary
├── weekly_activity.csv # Week-by-week breakdown
├── report.json # Full structured payload
└── report.md # Narrative Markdown report
repositories:
- path: ~/code/my-repo
name: my-repoAll other sections are optional. When output.formats is omitted, all three formats (CSV, JSON, Markdown) are written.
| Key | Type | Default | Description |
|---|---|---|---|
repositories |
list | required | Repos to analyze |
developer_aliases |
map | {} |
Canonical name → list of emails/aliases |
team |
object | — | Alternative to developer_aliases; roster with email |
output.directory |
path | ./reports |
Where reports are written |
output.formats |
list | [csv, json, markdown] |
csv, json, and/or markdown |
output.include_unclassified |
bool | false |
Include commits with no category |
output.include_merges |
bool | false |
Include merge commits |
output.include_files |
bool | false |
Include file-level change detail |
classification.rules_file |
path | — | Path to custom rules YAML/JSON |
classification.use_llm |
bool | false |
Enable LLM fallback tier |
classification.llm_model |
string | gpt-4o-mini |
LLM model identifier |
classification.confidence_threshold |
float | 0.7 |
Minimum acceptance confidence |
classification.llm_fallback_threshold |
float | 0.0 |
Commits with confidence above this value skip the LLM tier |
classification.llm_fallback_concurrency |
uint | 4 |
Max concurrent LLM requests during fallback |
github.token |
string | $GITHUB_TOKEN |
GitHub PAT for PR fetch |
github.org |
string | — | Org slug for org-wide PR queries |
github.repo |
string | — | Single repo slug (owner/name) |
github.fetch_prs |
bool | false |
Fetch pull request metadata |
github.ticket_regex |
string | — | Override regex for detecting GitHub ticket refs in commit messages |
jira.url |
string | — | JIRA base URL |
jira.username |
string | — | JIRA API username (email for Cloud) |
jira.token |
string | — | JIRA API token |
jira.project_key |
string | — | Project key filter (e.g. API) |
jira.ticket_regex |
string | — | Override regex for detecting JIRA ticket refs in commit messages |
linear.ticket_regex |
string | — | Override regex for detecting Linear ticket refs in commit messages |
pm.azure_devops.organization_url |
string | — | ADO org URL (e.g. https://dev.azure.com/myorg) |
pm.azure_devops.pat |
string | — | Azure DevOps Personal Access Token |
pm.azure_devops.project |
string | — | Default ADO project name |
pm.azure_devops.fetch_on_reference |
bool | false |
Fetch work items when AB#N refs appear in commits |
pm.azure_devops.fetch_prs |
bool | false |
Fetch ADO pull requests and reviewer data |
pm.azure_devops.ticket_regex |
string | AB#(\d+) |
Override regex for detecting ADO work item refs in commit messages |
pm.bitbucket.workspace |
string | — | Bitbucket Cloud workspace slug |
pm.bitbucket.repo_slug |
string | — | Repository slug within the workspace |
pm.bitbucket.fetch_prs |
bool | false |
Fetch Bitbucket Cloud pull request metadata |
pm.bitbucket.token |
string | $BITBUCKET_TOKEN |
Bearer token (App password or OAuth) |
pm.bitbucket.username |
string | — | Atlassian account username for Basic auth |
pm.bitbucket.app_password |
string | — | Atlassian App password for Basic auth (alternative to token) |
cache.directory |
path | — | Cache directory (supports ~) |
version |
string | — | Schema version; stored for compatibility |
profile |
string | — | Named profile; stored for compatibility |
Paths support ~ expansion. Config files from the Python gitflow-analytics tool load without changes — unknown keys are silently ignored.
developer_aliases (Python-compatible flat map):
developer_aliases:
"Alice Smith":
- "alice@company.com"
- "asmith@company.com"
- "alice@personal.dev"
"Bob Jones":
- "bob@company.com"
- "129991831+bobgithub@users.noreply.github.com"team.members (structured roster with canonical email):
team:
members:
- name: Alice Smith
email: alice@company.com
aliases:
- asmith@company.com
- alice@personal.devWhen developer_aliases is non-empty it takes precedence over team.members. Use developer_aliases when migrating an existing Python config file; use team.members for new setups where canonical email matters for downstream tooling.
See configs/example-config.yaml for a working example that covers multiple repositories, developer aliases, and CSV+Markdown output.
All subcommands accept these global flags:
| Flag | Default | Description |
|---|---|---|
--config <PATH> |
config.yaml |
Path to config YAML |
--database <PATH> |
tga.db |
Path to SQLite database |
-v / -vv / -vvv |
warnings only | Increase log verbosity |
Run the full pipeline: collect → classify → report.
tga analyze [--config <PATH>] [--database <PATH>] [--output <DIR>]
[--skip-collect] [--skip-classify] [--weeks <N>]| Flag | Description |
|---|---|
--skip-collect |
Skip Stage 1; use commits already in the database |
--skip-classify |
Skip Stage 2; use existing classifications |
--output <DIR> |
Override output.directory from config |
--weeks <N> |
Limit collection to the last N weeks (overrides config start_date) |
# Full pipeline
tga analyze --config config.yaml
# Re-run reports only (commits already collected and classified)
tga analyze --skip-collect --skip-classify --output ./reports-v2Stage 1: extract commits from git repositories into the database.
tga collect [--config <PATH>] [--database <PATH>]
[--repos <NAME,...>] [--since <DATE>] [--until <DATE>] [--weeks <N>]| Flag | Description |
|---|---|
--repos <NAME,...> |
Comma-separated list of repository names to collect; others are skipped |
--since <DATE> |
Collect commits on or after this ISO 8601 date (overrides config and --weeks) |
--until <DATE> |
Collect commits on or before this ISO 8601 date (overrides config) |
--weeks <N> |
Limit collection to the last N weeks; --since takes precedence if both supplied |
tga collect --repos my-project --since 2024-01-01 --until 2024-03-31
tga collect --weeks 4 # collect last 4 weeks across all reposStage 2: run the classification cascade over collected commits.
tga classify [--config <PATH>] [--database <PATH>]
[--rules <PATH>] [--use-llm]| Flag | Description |
|---|---|
--rules <PATH> |
Override classification.rules_file from config |
--use-llm |
Enable LLM fallback regardless of config setting |
tga classify --rules ./custom-rules.yaml --use-llmStage 3: generate reports from classified commits.
tga report [--config <PATH>] [--database <PATH>]
[--output <DIR>] [--formats <FMT,...>]| Flag | Description |
|---|---|
--output <DIR> |
Override output.directory from config |
--formats <FMT,...> |
Comma-separated: csv, json, markdown |
tga report --output ./q1-reports --formats csv,jsongit repos ──┐
│ collect SQLite (tga.db) classify SQLite report
GitHub API ──┼──────────────► [commits] ──────────────► [classif]─────────► CSV (×9)
JIRA API ────┤ (libgit2, [authors] (7-tier ► JSON
Linear API ──┤ reqwest) [pull_requests] cascade, ► Markdown
ADO API ────┘ [work_items] Rayon-parallel)
Stage 1 — collect (tga::collect): opens each repository with libgit2, walks the configured branch, extracts commit metadata and diff stats, resolves author identities, fetches GitHub PR / JIRA issue / Linear / Azure DevOps work item metadata via REST/GraphQL, and writes everything to SQLite.
Stage 2 — classify (tga::classify): reads unclassified commits from the database, runs each message through the seven-tier cascade (see below), and writes a classification verdict back. Rule-based tiers execute in parallel via Rayon.
Stage 3 — report (tga::report): reads the classified database, aggregates per-author, per-week, DORA, velocity, and quality statistics, and writes the configured output formats to the output directory.
Each commit message is tested against tiers in order. The first tier to produce a confident result wins.
Tier 0 — Manual Override (confidence 1.0): looks up the (commit_hash, repo_path) pair in the classification_overrides table. Managed via tga override add|list|remove.
Tier 1.5 — Issue Type (confidence 0.90): when the commit has ticket references resolving to rows in issue_cache, maps the upstream issue type (bug, story, task, spike, etc.) directly to a change_type.
Tier 3 — JIRA Project Mapping (confidence 0.95): when jira_project_mappings is configured, maps the JIRA project key prefix of any [A-Z]+-\d+ reference to a change_type.
Tier 4 — Exact (Aho-Corasick): builds a single finite-state machine from every keyword list across every rule and scans the message in O(n) time. Matches feat:, fix:, chore:, etc. Confidence 0.85–0.95.
Tier 5 — Regex: applies pre-compiled regex patterns from the rule set. Handles anchored conventional-commit patterns (^feat(\([^)]*\))?!?:) and JIRA ticket IDs (\b[A-Z][A-Z0-9]+-\d+\b).
Tier 6 — Fuzzy heuristics: detects merge commits (via is_merge flag or Merge pull request prefix) and reverts (via Revert prefix). No external dependencies.
Tier 7 — LLM fallback (optional, async): calls an OpenAI-compatible API (OpenRouter by default, AWS Bedrock behind the bedrock cargo feature) when tiers 0–6 leave a commit in a fallthrough category. Disabled by default; enable with analysis.llm_classification.enabled: true or --use-llm. Results are only accepted when confidence >= confidence_threshold (default 0.7).
| ID | Category | Keywords / Patterns |
|---|---|---|
cc-feat |
feature |
feat:, feature:, ^feat(...)?!?: |
cc-fix |
bugfix |
fix:, bugfix:, hotfix, ^fix(...)?!?: |
cc-chore |
chore |
chore:, ^chore(...)?!?: |
cc-docs |
documentation |
docs:, doc:, ^docs?(...)?!?: |
cc-refactor |
refactor |
refactor:, ^refactor(...)?!?: |
cc-test |
test |
test:, tests:, ^tests?(...)?!?: |
cc-ci |
ci |
ci:, ^ci(...)?!?: |
cc-perf |
performance |
perf:, ^perf(...)?!?: |
cc-style |
style |
style:, ^style(...)?!?: |
cc-build |
build |
build:, ^build(...)?!?: |
cc-revert |
revert |
revert:, ^revert(...)?!?: |
breaking-change |
breaking |
breaking change, breaking-change |
jira-ticket |
feature (ticketed) |
\b[A-Z][A-Z0-9]+-\d+\b |
kw-bug |
bugfix |
bug, defect |
kw-security |
bugfix (security) |
security, cve-, vulnerability |
Commits that match no rule are assigned category uncategorized with confidence 0.0.
Supply your own rules alongside the defaults:
# my-rules.yaml
version: "1.0"
rules:
- id: my-deploy
category: deployment
keywords:
- "deploy:"
- "release:"
patterns:
- "(?i)^deploy(ment)?:"
priority: 80
confidence: 0.9tga classify --rules ./my-rules.yaml
# or in config.yaml:
# classification:
# rules_file: ./my-rules.yamlTwo files are written when csv is in the format list:
authors.csv — one row per author:
| Column | Description |
|---|---|
name |
Canonical author name |
email |
Canonical author email |
commit_count |
Total commits |
insertions |
Total lines added |
deletions |
Total lines deleted |
files_changed |
Total files changed |
first_commit |
ISO 8601 timestamp of earliest commit |
last_commit |
ISO 8601 timestamp of most recent commit |
weekly_activity.csv — one row per week/author/repository bucket:
| Column | Description |
|---|---|
week |
ISO week label, e.g. 2024-W03 |
author |
Author name |
repository |
Repository name |
commit_count |
Commits in this bucket |
insertions |
Lines added in this bucket |
deletions |
Lines deleted in this bucket |
report.json — full structured payload:
{
"generated_at": "2024-03-15T10:00:00Z",
"period_start": "2024-01-01T00:00:00Z",
"period_end": "2024-03-14T23:59:59Z",
"total_commits": 347,
"total_authors": 8,
"category_breakdown": { "feature": 120, "bugfix": 45, ... },
"authors": [
{
"name": "Alice Smith",
"email": "alice@company.com",
"commit_count": 87,
"insertions": 4200,
"deletions": 1100,
"files_changed": 310,
"categories": { "feature": 50, "bugfix": 20, ... },
"first_commit": "...",
"last_commit": "..."
}
],
"repositories": [
{
"name": "my-project",
"commit_count": 347,
"author_count": 8,
"insertions": 18000,
"deletions": 6000,
"top_categories": [["feature", 120], ["bugfix", 45]]
}
],
"weekly_activity": [
{
"week": "2024-W03",
"author": "Alice Smith",
"repository": "my-project",
"commit_count": 12,
"insertions": 500,
"deletions": 120,
"categories": { "feature": 8, "bugfix": 4 }
}
]
}report.md — a narrative report containing a summary header, per-author commit table, category breakdown, and weekly activity section. Suitable for pasting into Confluence or a PR description.
# Build everything
cargo build
# Build release binary
cargo build --release
# Run all tests
cargo test
# Lint (zero warnings required)
cargo clippy -- -D warnings
# Format check (CI gate)
cargo fmt --check
# Auto-format
cargo fmt
# Generate rustdoc
cargo doc --openconfigs/example-config.yaml is a working example that analyzes repositories using developer_aliases. Copy it, adjust paths and names to match your setup, then run:
tga analyze --config configs/example-config.yaml --database tga.dbThe GitHub Actions workflow (ci.yml) requires:
cargo fmt -- --checkcargo clippy --all-targets -- -D warningscargo testcargo doc --no-depswithRUSTDOCFLAGS="-D warnings"
Single tga crate (consolidated from the original 5-crate workspace):
| Module | Path | Purpose |
|---|---|---|
tga::core |
src/core/ |
Shared types, config, DB schema, migrations, error types |
tga::collect |
src/collect/ |
Stage 1: git extraction (libgit2), GitHub/JIRA/Linear/ADO clients, PmAdapter trait |
tga::classify |
src/classify/ |
Stage 2: seven-tier classification cascade |
tga::report |
src/report/ |
Stage 3: CSV/JSON/Markdown output |
commands (binary-private) |
src/commands/ |
Subcommand handlers wired into src/main.rs |
Non-commercial use only. See LICENSE for terms.
Elastic License 2.0