Bugsnag Auto-Triage Agent

An autonomous fintech-aware pipeline: a Bugsnag incident in, a reviewed pull request out.

TL;DR

An autonomous pipeline that picks up a Bugsnag incident, gathers code context (stacktrace, blame, related PRs, Jira), fixes the bug via Claude Code inside an isolated workspace, opens a Bitbucket pull request and pings Slack — all while a PII scrubber, a protected-path blocklist and a full audit log keep the financial core off-limits.

Built in a weekend by a 6-person team. The demo is reproducible today: run bash scripts/bootstrap.sh and you get a UI with 6 seeded tasks covering the lifecycle states queued, running, done, failed, needs_human and rejected_blocklist, plus a kill switch (POST /api/consumer/pause).

Demo in 60 seconds

git clone git@github.com:Shahinyanm/hackaton.git
cd hackaton
bash scripts/bootstrap.sh

bootstrap.sh is idempotent and does, in order:

  1. Copies .env.example to .env (if missing) and prompts you to fill in credentials.
  2. Clones workspace/finance if not already cloned.
  3. docker compose build && docker compose up -d.
  4. Waits for PostgreSQL to be healthy.
  5. Runs composer install, Doctrine migrations, and messenger:setup-transports.
  6. Runs app:demo-seed — creates 6 demo tasks across all lifecycle states.
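
Step 4 is the only step that blocks on an external service. A minimal sketch of such a wait loop follows, written in Python rather than the script's own bash. The name `wait_until` and all of its parameters are hypothetical; the real script may rely on Docker health checks instead.

```python
import time
from typing import Callable


def wait_until(probe: Callable[[], bool],
               timeout_s: float = 60.0,
               interval_s: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout_s` elapses.

    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In a bootstrap context the probe would be whatever reports database readiness, for example a function that runs a health command and returns True on exit code 0.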

After it finishes:

| URL | What you get |
| --- | --- |
| http://localhost:5174 | UI (Next.js 15): task list, 2 s polling, status chips, audit timeline |
| http://localhost:4001/api/health | API health check |
| http://localhost:4001/api/tasks | Raw JSON task list |

Useful commands

# Pipeline smoke test (skips Claude, Bitbucket, Slack)
docker exec hackathon-php-cli php bin/console app:smoke-task --skip-workspace

# Smoke test against a protected path (must land in rejected_blocklist)
docker exec hackathon-php-cli php bin/console app:smoke-task --skip-workspace --protected

# Re-seed demo data (wipes + recreates 6 tasks)
docker exec hackathon-php-cli php bin/console app:demo-seed --clean

# Pause / resume worker (kill switch for live demo)
curl -X POST http://localhost:4001/api/consumer/pause
curl -X POST http://localhost:4001/api/consumer/resume

# Logs
docker compose logs -f php-cli-consumer
docker compose logs -f php-fpm-input

User Flows

1. Happy path — Bugsnag incident becomes a reviewed PR

sequenceDiagram
    participant BS as Bugsnag
    participant API as Input API (Symfony)
    participant DB as PostgreSQL
    participant W as Consumer Worker
    participant CC as Claude Code
    participant BB as Bitbucket
    participant SL as Slack

    BS->>API: new event (polled every 2 min, or seed endpoint)
    API->>API: PII scrub + context gather
    API->>DB: INSERT Task(queued) + dispatch ProcessTaskMessage
    W->>DB: pull ProcessTaskMessage (FOR UPDATE SKIP LOCKED)
    W->>DB: Task → running, audit_event(started)
    W->>CC: claude -p (headless, agents from agents-config/)
    CC->>CC: read context, edit code, commit
    W->>W: BlocklistChecker.check(git diff)
    W->>BB: git push + open PR
    W->>SL: post message with PR link
    W->>DB: Task → done, audit_event(done)

2. Protected path — agent attempts to touch the financial core

sequenceDiagram
    participant W as Consumer Worker
    participant CC as Claude Code
    participant BL as BlocklistChecker
    participant DB as PostgreSQL
    participant SL as Slack

    W->>CC: claude -p (with bad bug pointing at src/Ledger/)
    CC->>CC: commit changes to src/Ledger/Account.php
    W->>BL: check(git diff --name-only)
    BL-->>W: BLOCKED on rule "src/Ledger/**"
    W->>DB: Task → rejected_blocklist
    W->>DB: INSERT BlockedAttempt + audit_event(rejected)
    W->>SL: ⚠️ Auto-fix blocked: protected path. Manual review required.
    Note over W,BB: No git push, no PR. Branch stays local.

Tech Stack

| Layer | Stack |
| --- | --- |
| Backend API + Worker | PHP 8.4, Symfony 7.3 (Messenger, Doctrine, Scheduler, Console, HTTP Client) |
| Database | PostgreSQL 16-alpine (Doctrine ORM 3.3 + Doctrine Messenger transport) |
| Frontend | Next.js 15.1.6, React 19, TanStack Query 5.66, Tailwind CSS 3.4 |
| AI Orchestrator | Claude Code (claude -p headless) + Ralphex CLI for plan execution |
| Source Integration | Bitbucket CLI (bbkt), Atlassian MCP (OAuth-remote) for Jira / Confluence |
| Notify | Slack Web API (signing-secret verified webhook) |
| Infra | Docker Compose (nginx, php-fpm-input, php-cli-consumer, postgres, claude-consumer, front) |
| Node runtime | Node.js ≥ 22 (pnpm), Next.js dev server on port 5174 |

Architecture

High-level data flow (from architecture.md):

┌─────────────┐  poll      ┌──────────────┐  Messenger    ┌──────────────────────┐
│  Bugsnag    │ ◀────────  │  Input svc   │  dispatch     │  PostgreSQL          │
│  REST API   │ (2 min)    │  (Symfony,   │ ────────────▶ │  messenger_messages  │
│             │            │  Scheduler)  │               │  tasks               │
└─────────────┘            └──────┬───────┘               │  audit_events        │
                                  │ writes Task + context │  blocked_attempts    │
                                  ▼                       └────┬─────────────────┘
                           Front UI (Next.js)                  │ messenger:consume
                           polls /api/tasks 2s                 ▼
                                                       ┌──────────────────┐
                                                       │  Consumer svc    │──► Claude Code (headless)
                                                       │  (Symfony CLI)   │──► Bitbucket PR
                                                       └──────────────────┘──► Slack

Key decisions:

  • One Symfony codebase, two entrypoints. php-fpm-input serves HTTP (Bugsnag poll + UI API). php-cli-consumer runs bin/console messenger:consume async scheduler_default as a long-running worker. Shared entities, repositories and services.
  • Polling, not webhooks. The MVP has no public URL, so a #[AsCronTask('*/2 * * * *')] task hits the Bugsnag Data Access API every 2 minutes. A POST /webhooks/test seed endpoint lets the demo bypass the wait.
  • Shared workspace clone. workspace/finance is cloned once and reused: every task runs git fetch && git checkout master && git pull && git checkout -b hackathon/bugsnag-{taskId}. Because the consumer is sequential, no two tasks ever touch the working tree at the same time.
  • Claude Code as the framework. No custom orchestrator. Agents live in agents-config/.claude/agents/*.md; initial-prompt.md and protected-paths.yml are copied into the workspace before each run.
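
The per-task branch preparation described under "Shared workspace clone" can be sketched as a plain list of git invocations. This is an illustration in Python, not the consumer's PHP code; `branch_name` and `prepare_branch_commands` are hypothetical helper names, and only the git commands and the branch pattern come from the text above.

```python
from typing import List


def branch_name(task_id: str) -> str:
    # Branch pattern from the README: hackathon/bugsnag-{taskId}
    return f"hackathon/bugsnag-{task_id}"


def prepare_branch_commands(task_id: str, base: str = "master") -> List[List[str]]:
    """Return the git commands that refresh the shared clone and cut a task branch."""
    return [
        ["git", "fetch"],
        ["git", "checkout", base],
        ["git", "pull"],
        ["git", "checkout", "-b", branch_name(task_id)],
    ]
```

Each command could then be run with `subprocess.run(cmd, cwd="workspace/finance", check=True)`. Returning argument lists instead of one shell string avoids quoting problems if a task id ever contains unexpected characters.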

Full diagram + sequence in architecture.md.

Database / Data Model

Core tables (all in PostgreSQL 16):

| Entity | Key fields | Purpose |
| --- | --- | --- |
| Task | id (UUID), bugsnag_id (UNIQUE), error_title, status, context (JSONB), branch_name, pr_url, jira_key, cost_usd, duration_ms | Incident lifecycle record |
| TaskStatus (enum) | Queued → Running → ContextReady → ContextDispatched → ContextConsumed → Done, plus Failed, NeedsHuman, RejectedBlocklist | Lifecycle state machine |
| AuditEvent | id, task_id (FK), agent_name, event_type, payload (JSONB), created_at | One row per phase / agent action; UI timeline reads from here |
| BlockedAttempt | id, task_id (FK), blocked_path, diff_excerpt | Recorded when Claude tried to modify a protected path |
| Setting | key (PK), value | KV store, used for the bugsnag_last_polled_at watermark and the consumer_paused flag |
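
The TaskStatus chain can be read as a small state machine. The Python sketch below is one possible reading, not the project's PHP enum. It assumes that Failed, NeedsHuman and RejectedBlocklist are reachable from any non-terminal state, and that the intermediate states use snake_case values like the six shown in the TL;DR. The sequence diagrams show only running and done, so the real transition rules may be looser.

```python
from enum import Enum


class TaskStatus(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    CONTEXT_READY = "context_ready"            # value assumed
    CONTEXT_DISPATCHED = "context_dispatched"  # value assumed
    CONTEXT_CONSUMED = "context_consumed"      # value assumed
    DONE = "done"
    FAILED = "failed"
    NEEDS_HUMAN = "needs_human"
    REJECTED_BLOCKLIST = "rejected_blocklist"


# Linear chain taken from the data-model table.
HAPPY_PATH = [
    TaskStatus.QUEUED, TaskStatus.RUNNING, TaskStatus.CONTEXT_READY,
    TaskStatus.CONTEXT_DISPATCHED, TaskStatus.CONTEXT_CONSUMED, TaskStatus.DONE,
]
TERMINAL = {TaskStatus.DONE, TaskStatus.FAILED,
            TaskStatus.NEEDS_HUMAN, TaskStatus.REJECTED_BLOCKLIST}


def can_transition(current: TaskStatus, target: TaskStatus) -> bool:
    """Allow one step along the chain, or a jump to a failure-type terminal state."""
    if current in TERMINAL:
        return False
    if target in TERMINAL and target is not TaskStatus.DONE:
        return True  # assumption: failure states are reachable from any live state
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1] is target
```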

Messenger transports (app/config/packages/messenger.yaml):

  • async — default work queue (ProcessTaskMessage). max_retries = 0 for MVP (manual failure handling).
  • context_ready — separate queue an external Node-based consumer polls (atomic SELECT FOR UPDATE SKIP LOCKED via /api/external/tasks/next).
  • failed — dead-letter queue.

The schema lives in app/migrations/Version20260517120000.php (initial) and Version20260519140000.php (incremental).

Security & Trust Layer

The financial nature of the target codebase makes safety non-optional. Three pillars enforce it:

1. PII Scrubber — app/src/Service/PiiScrubber.php

A recursive walker over context payloads with regex matchers for EMAIL, IBAN, CARD (13–19 digits), PHONE. Matches are replaced with [REDACTED-{LABEL}] and the list of scrubbed field paths is written into the corresponding AuditEvent payload, so reviewers can see what was sanitized without seeing the original data.
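
A minimal Python sketch of such a walker follows. It is not the code in PiiScrubber.php: the regular expressions are simplified assumptions, the function name `scrub` is hypothetical, and PHONE is applied before CARD here only so that a long international number is not mislabeled as a card.

```python
import re
from typing import Any, List, Tuple

# Simplified, assumed patterns; the real matchers may differ.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("IBAN", re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")),
    ("PHONE", re.compile(r"\+\d{7,15}\b")),
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,18}\b")),  # 13 to 19 digits
]


def scrub(value: Any, path: str = "") -> Tuple[Any, List[str]]:
    """Return a scrubbed copy of `value` plus the paths of fields that changed."""
    if isinstance(value, dict):
        out, paths = {}, []
        for key, item in value.items():
            child = f"{path}.{key}" if path else str(key)
            out[key], found = scrub(item, child)
            paths.extend(found)
        return out, paths
    if isinstance(value, list):
        items, paths = [], []
        for index, item in enumerate(value):
            cleaned, found = scrub(item, f"{path}[{index}]")
            items.append(cleaned)
            paths.extend(found)
        return items, paths
    if isinstance(value, str):
        cleaned = value
        for label, pattern in PATTERNS:
            cleaned = pattern.sub(f"[REDACTED-{label}]", cleaned)
        return cleaned, ([path] if cleaned != value else [])
    return value, []  # numbers, booleans, None pass through untouched
```

The returned path list is what would go into the AuditEvent payload: it names the sanitized fields without carrying any of the original values.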

2. Path Blocklist — app/src/Service/BlocklistChecker.php + agents-config/protected-paths.yml

After Claude Code commits, git diff --name-only is matched against globs (fnmatch with FNM_PATHNAME, plus a custom ** → regex expansion for nested patterns):

blocked:
  - "src/Ledger/**"          # financial core
  - "src/Payment/Core/**"    # payments
  - "src/Compliance/**"      # GDPR / audit
  - "migrations/**"          # DB schema — human review only
  - "config/*.prod.yaml"     # prod configs
  - ".env*"                  # environment variables
  - "**/*.key"
  - "**/*.pem"
  - "**/*.crt"

On any match, the task moves to RejectedBlocklist, no push happens, a BlockedAttempt row is written, and Slack is notified. The branch stays local.
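
The matching step can be sketched in Python as a glob-to-regex translation. This is an illustration of the idea, not BlocklistChecker.php: Python has no FNM_PATHNAME flag, so the sketch translates every glob itself, with `*` staying inside one path segment and `**` crossing directories. The names `glob_to_regex` and `first_violation` are hypothetical, and paths are assumed to be relative to the repository root.

```python
import re
from typing import Iterable, Optional, Pattern, Tuple

BLOCKED = [
    "src/Ledger/**", "src/Payment/Core/**", "src/Compliance/**",
    "migrations/**", "config/*.prod.yaml", ".env*",
    "**/*.key", "**/*.pem", "**/*.crt",
]


def glob_to_regex(glob: str) -> Pattern[str]:
    """Translate a path glob: '**/' = any directories, '*' = within one segment."""
    parts, i = [], 0
    while i < len(glob):
        if glob.startswith("**/", i):
            parts.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif glob.startswith("**", i):
            parts.append(r".*")         # anything, including slashes
            i += 2
        elif glob[i] == "*":
            parts.append(r"[^/]*")      # anything except a slash
            i += 1
        elif glob[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(parts))


def first_violation(changed_paths: Iterable[str],
                    blocked_globs: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (path, rule) for the first changed path hitting a blocked glob."""
    compiled = [(glob, glob_to_regex(glob)) for glob in blocked_globs]
    for path in changed_paths:
        for glob, regex in compiled:
            if regex.fullmatch(path):
                return path, glob
    return None
```

Feeding it the output of git diff --name-only, one path per entry, yields either None (safe to push) or the offending path together with the rule that vetoed it.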

3. Audit Log — AuditEvent entity

Every phase transition, every agent action and every veto produces an audit_events row with a JSONB payload (including the PII-scrub field list). The UI task-detail timeline is a direct read from this table. Indexed by (task_id, created_at) for fast scans.

Plus

  • Slack webhook signature verification in SlackWebhookController (when SLACK_SIGNING_SECRET is set).
  • Wall-clock + budget caps on each Claude Code run: 5 minutes / 50 turns / $5 per task (whichever hits first).
  • No JWT in MVP for /api/external/* endpoints — they trust the internal Docker network. Documented honestly rather than hidden.
  • Kill switch: POST /api/consumer/pause flips settings.consumer_paused = true, which the worker checks before pulling each task.
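
For the Slack signature check in the first bullet, the sketch below follows Slack's documented v0 signing scheme: an HMAC-SHA256 over `v0:{timestamp}:{body}`, keyed with the signing secret. How SlackWebhookController actually implements it is not shown here, so treat the function names and the 5-minute replay window as assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def slack_signature(secret: str, timestamp: str, body: bytes) -> str:
    """Compute the value Slack sends in the X-Slack-Signature header."""
    base = b"v0:" + timestamp.encode() + b":" + body
    return "v0=" + hmac.new(secret.encode(), base, hashlib.sha256).hexdigest()


def is_valid_slack_request(secret: str, timestamp: str, body: bytes,
                           signature: str, now: Optional[float] = None,
                           max_skew_s: int = 300) -> bool:
    """Check freshness of X-Slack-Request-Timestamp, then compare signatures."""
    now = time.time() if now is None else now
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(now - sent_at) > max_skew_s:
        return False  # too old or too far in the future: possible replay
    expected = slack_signature(secret, timestamp, body)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature.encode())
```

The check has to run on the raw request body, before any JSON or form decoding, because re-serialized data would produce a different digest.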

Team

| Role | Person | Owns |
| --- | --- | --- |
| Consumer / Front | Mher Shahinyan | docker-compose, consumer service, UI |
| Input + Infra co-owner | Yahor Dziukarau | Bugsnag poller, context gather, queue producer |
| AI Engineer | Vitautas Brazas | Claude Code wrapper, agent configs (CLAUDE.md, .claude/agents/*) |
| QA / Test | Tetiana Kryvko | Demo dataset, unit + integration tests, humanity-of-output checks |
| PM / Storytelling | Oksana Titarenko | Demo narrative, slide deck, humanity review |
| Lead / Demo | Konstantin Bogomolov | Live presentation, voice, fallback video |

MVP Scope

In scope:

  • One Bugsnag project: finapi-prod (Finance PayIn).
  • One Bitbucket repository: finance (Finance API).
  • One sequential consumer (no parallelism).
  • One task queue + one audit history table.
  • Read-only UI with a Pause button.
  • Trust layer: PII scrubber + path blocklist + audit log.

Out of scope (shown only as roadmap):

  • Slack as an input source (not just Bugsnag).
  • Multiple teams / repositories.
  • Settings UI.
  • Sandboxed test execution by the agent.
  • Deploy to staging / production.
  • Parallel consumers.

Documentation Index

Jury-facing (English, read these first):

| File | About |
| --- | --- |
| architecture.md | System architecture: services, data flow, DB schema, workspace layout |
| demo-script.md | 5-minute demo script with live walkthrough, fallback plans, Q&A answers |
| risks.md | R1–R11 risk register + the must-have trust-layer items |
| feasibility-audit.md | Pre-hackathon component-by-component confidence audit |
| handoff.md | Team handoff: what's done, what's next, quick start |

Internal / team-only (some still in Russian):

| File | About |
| --- | --- |
| plan.md | Overall plan, milestones, time budget |
| branch-walkthrough.md | What's in each git branch (12 commits, demo plan) |
| qa/README.md | QA stories: 6 manual UI checks |
| day-0-prep.md | Sunday pre-hackathon checklist |
| learning-roadmap.md | Self-learning track for the team |
| ralphex-integration.md | Roadmap to switch to ralphex (OSS), 5-phase review |
| HANDOFF-TO-MHER.md | Consumer container handoff Vitautas → Mher |
| contracts/queue-schemas.md | JSON schemas for queue messages |
| contracts/agent-md-template.md | Template for Claude Code agent MDs |
| roles/*.md | Per-person plans |

Principles

  1. Maximize parallelism. Yahor (Input), Vitautas (Claude Code) and Mher (Consumer + Front) work independently. Module contracts are frozen at Day-0.
  2. Trust layer is not optional. PII scrubber + path blocklist + audit log are required before the demo. The fintech story is hollow without them.
  3. Demo dataset ready before the hackathon. Tetiana collects 5–10 real bugs + their fix-PRs over the weekend.
  4. Claude Code as the framework. No custom orchestrator. We use claude -p (headless), agents in .claude/agents/, context via CONTEXT.md.
  5. One shared clone. Per-task branch, no fresh clone per task.
