Add continuous graph updates via Git webhook and poll watcher #615

Open
Copilot wants to merge 10 commits into staging from copilot/add-continuous-graph-updates

Copilot AI commented Mar 13, 2026

Full re-indexing on every change is too slow for large repos. This adds an incremental update engine that computes a diff between two commit SHAs and only touches affected files, plus two trigger modes: a GitHub/GitLab push webhook and a background poll watcher.

Core engine — api/git_utils/incremental_update.py

  • incremental_update(repo_name, from_sha, to_sha, ignore=[]) — resolves both SHAs via pygit2, classifies file changes (added/modified/deleted), checks out to_sha, deletes stale nodes/edges, re-analyses changed files, and persists the new commit bookmark in Redis. Idempotent (from_sha == to_sha is a no-op). Accepts abbreviated or full SHAs.
  • fetch_remote(repo_path): runs git fetch origin via subprocess
  • get_remote_head(repo_path, branch) — returns remote branch HEAD SHA
  • repo_local_path(repo_name) — resolves clone path; respects REPOSITORIES_DIR env override
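
To make the classification step concrete, here is a minimal sketch. The PR resolves SHAs and computes the diff through pygit2; this sketch instead parses the text that `git diff --name-status <from_sha> <to_sha>` prints, so it runs with the standard library alone. The names `FileChanges` and `classify_name_status` are hypothetical and are not the PR's API.

```python
from dataclasses import dataclass, field


@dataclass
class FileChanges:
    """Paths grouped by how they changed between two commits."""
    added: list = field(default_factory=list)
    modified: list = field(default_factory=list)
    deleted: list = field(default_factory=list)


def classify_name_status(output: str) -> FileChanges:
    """Classify the tab-separated lines of `git diff --name-status`."""
    changes = FileChanges()
    for line in output.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        code = status[0]  # rename/copy codes carry a score, e.g. "R100"
        if code == "A":
            changes.added.append(paths[0])
        elif code == "D":
            changes.deleted.append(paths[0])
        elif code == "R":
            # A rename is treated as delete-old plus add-new.
            changes.deleted.append(paths[0])
            changes.added.append(paths[1])
        elif code == "C":
            # A copy leaves the source untouched and adds the destination.
            changes.added.append(paths[1])
        else:
            # "M" (modified), "T" (type change) and anything else.
            changes.modified.append(paths[-1])
    return changes
```

Treating a rename as a delete followed by an add matches the acceptance criterion in the linked issue that a moved definition should lose its old node and gain a new one.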

Webhook endpoint — POST /api/webhook

Accepts GitHub/GitLab push event payloads. When WEBHOOK_SECRET is set, validates X-Hub-Signature-256 with hmac.compare_digest (timing-safe). Ignores pushes to untracked branches (200 response, no retry). Resolves the target repo by URL-matching against indexed repos (normalises .git suffix and case).
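
As an illustration of the signature check described above, here is a minimal sketch using only the standard library. GitHub sends the header value as `sha256=<hex digest>` computed over the raw request body. The function name `signature_is_valid` is hypothetical, and the open-mode branch (no secret configured) reflects the behaviour described in this PR description, which a later review comment asks to harden.

```python
import hashlib
import hmac
from typing import Optional


def signature_is_valid(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not secret:
        # Open mode: validation only happens when WEBHOOK_SECRET is set.
        return True
    if not header:
        return False
    digest = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(expected.encode(), header.encode())
```

The comparison is done on bytes so that a header containing non-ASCII characters is rejected instead of raising a `TypeError`.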

// GitHub push event → triggers incremental update for the matched repo
{ "ref": "refs/heads/main", "before": "<sha>", "after": "<sha>",
  "repository": { "clone_url": "https://github.com/org/repo.git" } }

Background poll watcher

Started via FastAPI lifespan on startup (cancelled cleanly on shutdown). At each POLL_INTERVAL tick, fetches all indexed repos, compares stored commit SHA against origin/<TRACKED_BRANCH>, and calls incremental_update if behind. Handles short vs. full SHA comparison correctly (prefix match only when lengths differ).
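
A rough sketch of the two pieces described above: the SHA comparison rule and a cancellable poll loop. It uses plain asyncio and leaves out the FastAPI lifespan wiring. `same_commit` and `poll_loop` are hypothetical names, and a later commit in this PR replaces prefix matching with `git rev-parse`, so the comparison shows the initial approach only.

```python
import asyncio
from typing import Awaitable, Callable


def same_commit(stored: str, remote: str) -> bool:
    """Compare two SHAs; prefix-match only when their lengths differ."""
    if not stored or not remote:
        return False
    stored, remote = stored.lower(), remote.lower()
    if len(stored) == len(remote):
        return stored == remote
    short, full = sorted((stored, remote), key=len)
    return full.startswith(short)


async def poll_loop(interval: float, tick: Callable[[], Awaitable[None]]) -> None:
    """Call tick() every `interval` seconds until the task is cancelled."""
    if interval <= 0:
        return  # POLL_INTERVAL=0 disables the watcher
    while True:
        try:
            await tick()
        except Exception as exc:  # one failed cycle should not stop the loop
            print(f"poll tick failed: {exc}")
        await asyncio.sleep(interval)
```

Cancelling the task raises `asyncio.CancelledError` inside the loop. That exception is not a subclass of `Exception`, so the handler above does not swallow it and shutdown proceeds cleanly.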

Configuration — new env vars (documented in .env.template)

| Variable | Default | Purpose |
| --- | --- | --- |
| WEBHOOK_SECRET | (empty) | HMAC-SHA256 secret for webhook signature validation |
| TRACKED_BRANCH | main | Branch to watch for updates |
| POLL_INTERVAL | 60 | Seconds between poll checks; 0 disables the watcher |

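
One way these variables might be read at startup is sketched below. The variable names and defaults come from the description above; the `UpdateSettings` class and `load_settings` function are hypothetical and only illustrate the defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class UpdateSettings:
    webhook_secret: str
    tracked_branch: str
    poll_interval: int

    @property
    def watcher_enabled(self) -> bool:
        # A poll interval of 0 disables the background watcher.
        return self.poll_interval > 0


def load_settings(env: Mapping[str, str] = os.environ) -> UpdateSettings:
    """Read the three settings, falling back to the documented defaults."""
    return UpdateSettings(
        webhook_secret=env.get("WEBHOOK_SECRET", ""),
        tracked_branch=env.get("TRACKED_BRANCH", "main"),
        poll_interval=int(env.get("POLL_INTERVAL", "60")),
    )
```

Passing the environment as a mapping keeps the function easy to test without touching `os.environ`.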
Original prompt

This section details the original issue you should resolve

<issue_title>Continuous graph updates via Git webhook / branch watcher</issue_title>
<issue_description>## Summary

Add the ability for code-graph to stay in sync with a repository by automatically updating the graph on each commit to a tracked branch (e.g. main). Instead of re-indexing the entire codebase on every change, the system should compute a diff-based incremental update — only processing files that were added, modified, or deleted in the commit.

Motivation

Currently code-graph requires a full re-index to reflect codebase changes. For large repositories this is slow and wasteful. Continuous incremental updates would make code-graph viable as a live knowledge source for AI-assisted development tools (e.g. Claude Code via MCP), CI pipelines, and developer dashboards — where the graph must reflect the latest state of main at all times.

Proposed Behavior

  1. Trigger: On each push/merge to the tracked branch, the system receives a notification (Git webhook, polling, or filesystem watch).
  2. Diff extraction: Determine which files were added, modified, or deleted in the commit(s) since the last indexed commit SHA.
  3. Incremental graph update:
    • Deleted files — remove all nodes and edges originating from those files.
    • Modified files — remove existing nodes/edges for the file, re-parse, and re-insert.
    • Added files — parse and insert new nodes and edges.
    • Cross-file edges — recompute edges (calls, imports, inheritance) that involve any touched file, and prune stale edges whose targets no longer exist.
  4. Bookmark: Persist the last successfully indexed commit SHA so the system can resume correctly after restarts or failures.
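
The per-file part of step 3 can be sketched on a toy in-memory graph. This is an illustration under simplifying assumptions: the graph is a plain dict from file path to the set of node names defined in that file, the `parse` callback stands in for the real analyzer, and cross-file edge recomputation is not modelled. `apply_file_changes` is a hypothetical name.

```python
from typing import Callable, Dict, Iterable, Set

ToyGraph = Dict[str, Set[str]]  # file path -> names of nodes defined there


def apply_file_changes(
    graph: ToyGraph,
    added: Iterable[str],
    modified: Iterable[str],
    deleted: Iterable[str],
    parse: Callable[[str], Iterable[str]],
) -> None:
    """Apply one changeset to the toy graph in place."""
    added, modified, deleted = list(added), list(modified), list(deleted)
    # Deleted and modified files first lose everything they contributed.
    for path in [*deleted, *modified]:
        graph.pop(path, None)
    # Added and modified files are parsed again and re-inserted.
    for path in [*added, *modified]:
        graph[path] = set(parse(path))
```

Modified files appear in both passes, which is the "remove, re-parse, re-insert" cycle from the list above.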

Design Considerations

  • Atomicity — Graph updates for a single commit should be applied as a transaction so queries never see a half-updated state. Consider wrapping the delete + re-insert cycle in a FalkorDB transaction or using a shadow-graph swap approach for larger changesets.
  • Batch commits — If the watcher falls behind (e.g. service was down), it should be able to squash multiple commits into a single cumulative diff rather than replaying one-by-one.
  • Trigger modes — Support at least two modes:
    • Webhook — HTTP endpoint that receives a GitHub/GitLab push event payload.
    • Poll — Periodically check the remote branch HEAD and update if it has advanced.
    • (Optional) Filesystem watch — for local-only setups using inotify/fswatch on a bare repo.
  • Concurrency — Graph reads (MCP queries, API requests) should not be blocked during an update. Consider read/write isolation or short lock windows.
  • Idempotency — Re-processing the same commit SHA should be a no-op.
  • Logging & observability — Each update cycle should log: trigger commit SHA, files affected, nodes/edges added/removed, duration, and any parse errors (with the update continuing past unparseable files).
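
The idempotency and bookmark considerations can be sketched together. A plain dict stands in for the persistent bookmark store, and `run_update` and its `update` callback are hypothetical names used only for illustration.

```python
import logging
import time
from typing import Callable, MutableMapping, Optional

log = logging.getLogger("incremental_update")


def run_update(
    bookmarks: MutableMapping[str, str],
    repo: str,
    to_sha: str,
    update: Callable[[Optional[str], str], None],
) -> bool:
    """Run `update` unless `to_sha` is already indexed; return True if it ran."""
    from_sha = bookmarks.get(repo)
    if from_sha == to_sha:
        log.info("repo=%s sha=%s already indexed, skipping", repo, to_sha)
        return False
    started = time.monotonic()
    # If update() raises, the bookmark below is never advanced, so the
    # next trigger retries from the same starting point.
    update(from_sha, to_sha)
    bookmarks[repo] = to_sha
    log.info("repo=%s %s -> %s took %.3fs", repo, from_sha, to_sha,
             time.monotonic() - started)
    return True
```

Advancing the bookmark only after the update succeeds is what lets the system resume correctly after a restart or a failed cycle.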

Suggested Implementation Phases

Phase 1 — Core incremental update engine

  • Given a before/after commit SHA pair, compute the file diff, update the graph accordingly, and persist the new bookmark.
  • Unit-testable in isolation (no webhook needed, just call with two SHAs).

Phase 2 — Trigger integration

  • Add the webhook HTTP endpoint (GitHub/GitLab push event format).
  • Add the poll-based watcher as an alternative.
  • Configuration: tracked branch name, poll interval, webhook secret.

Phase 3 — Robustness & observability

  • Batch catch-up for missed commits.
  • Metrics endpoint or structured logs (commits processed, lag, errors).
  • Graceful handling of force-pushes / rebases (detect non-fast-forward and trigger a full re-index as fallback).
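
The force-push fallback in Phase 3 comes down to an ancestry check: an incremental update is only safe when the stored bookmark is an ancestor of the new head. The sketch below runs that check on a toy commit graph given as a parent map. In a real repository the same question can be answered with `git merge-base --is-ancestor`. The function names and the strategy strings are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def is_ancestor(parents: Mapping[str, Sequence[str]], ancestor: str, descendant: str) -> bool:
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, ()))
    return False


def choose_strategy(parents: Mapping[str, Sequence[str]],
                    bookmark: Optional[str], new_head: str) -> str:
    """Pick between a no-op, an incremental update and a full re-index."""
    if bookmark is None:
        return "full_reindex"  # nothing indexed yet
    if bookmark == new_head:
        return "noop"
    if is_ancestor(parents, bookmark, new_head):
        return "incremental"  # fast-forward: diff bookmark..new_head
    return "full_reindex"  # history was rewritten (force-push or rebase)
```

Because the incremental path diffs the bookmark directly against the new head, several missed commits collapse into one cumulative diff, which also covers the batch catch-up item.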

Acceptance Criteria

  • Pushing a commit to main that adds a new function results in the corresponding node and call-edges appearing in the graph within a configurable time window (default < 30s for webhook mode).
  • Renaming/moving a function removes the old node and creates a new one with correct edges.
  • Deleting a file removes all its nodes and any dangling edges.
  • The system recovers cleanly after a restart, picking up from the last indexed SHA.
  • A full re-index can still be triggered manually as a fallback.
    </issue_description>

Comments on the Issue (you are @copilot in this section)


Summary by CodeRabbit

Release Notes

  • New Features

    • Added webhook endpoint to trigger incremental graph updates via push events
    • Added automatic background polling to keep the graph continuously updated based on repository changes
    • Introduced configurable webhook security (HMAC-SHA256 validation) and polling intervals
  • Documentation

    • Updated configuration guide with new environment variables for webhook and polling settings
    • Documented the new webhook endpoint for receiving push events

- Add api/git_utils/incremental_update.py with incremental_update(),
  fetch_remote(), get_remote_head(), and repo_local_path() helpers
- Export new functions from api/git_utils/__init__.py
- Add POST /api/webhook endpoint with HMAC-SHA256 validation,
  branch filtering, and repo URL matching
- Add background poll watcher via FastAPI lifespan (_poll_loop,
  _poll_all_repos, _poll_repo)
- Add WEBHOOK_SECRET, TRACKED_BRANCH, POLL_INTERVAL env vars
- Document new env vars in .env.template
- Add tests/test_webhook.py with unit tests

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add continuous graph updates via Git webhook" to "Add continuous graph updates via Git webhook and poll watcher" on Mar 13, 2026
Copilot AI requested a review from gkorland March 13, 2026 17:30
@gkorland gkorland marked this pull request as ready for review March 14, 2026 08:54
gkorland and others added 3 commits March 14, 2026 10:59
Resolve PR #615 conflicts, preserve the continuous update configuration, and stabilize the webhook incremental-update tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@gkorland gkorland requested a review from Copilot March 14, 2026 17:27
@gkorland

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Mar 14, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai

coderabbitai bot commented Mar 14, 2026

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c09ecfd8-ac74-4a4e-90fa-90f50cce4180

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough


This pull request implements continuous incremental graph updates via webhooks and background polling. It adds configuration for webhook secrets, branch tracking, and polling intervals; introduces the incremental update engine for diff-based graph synchronization; exposes a POST /api/webhook endpoint with HMAC-SHA256 signature validation; and includes a background poll-watcher task with comprehensive test coverage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration: .env.template | Added three environment variables: WEBHOOK_SECRET (for HMAC validation), TRACKED_BRANCH (default: main), and POLL_INTERVAL (polling interval in seconds; 0 disables). |
| Documentation: README.md | Documented new environment variables and described the /api/webhook endpoint for receiving push events and triggering incremental graph updates. |
| Core Incremental Update Engine: api/git_utils/incremental_update.py | Implements incremental graph updates: repo_local_path() resolves repository paths; fetch_remote() pulls latest refs; get_remote_head() retrieves branch HEAD; incremental_update() computes file diffs, removes deleted/modified nodes from the graph, and inserts/updates added/modified files. Returns summary with counts of changes and persists the updated commit bookmark in Redis. |
| API Integration & Webhook Handler: api/index.py | Added POST /api/webhook endpoint with optional HMAC-SHA256 signature validation, branch filtering via TRACKED_BRANCH, and invocation of incremental_update(). Integrated background poll-watcher task managed via FastAPI lifespan context manager to periodically fetch and update indexed repositories. Added repository URL matching utilities for webhook-to-repo resolution. |
| Public API Refactoring: api/git_utils/__init__.py | Replaced wildcard imports with explicit, named re-exports and an __all__ list to clearly define the public API (GitRepoName, GitGraph, build_commit_graph, classify_changes, fetch_remote, get_remote_head, incremental_update, is_ignored, repo_local_path, switch_commit). |
| Comprehensive Test Suite: tests/test_webhook.py | Tests for URL matching logic, webhook endpoint behavior (open mode and HMAC-secured mode), payload validation, signature verification, and incremental update unit tests covering idempotence and error handling. Includes payload generation and signing utilities. |

Sequence Diagram(s)

sequenceDiagram
    actor Client as Push Event/<br/>Polling Loop
    participant API as FastAPI Server<br/>/api/webhook
    participant Repo as Git Repository
    participant Analyzer as Source Analyzer
    participant Graph as FalkorDB<br/>Graph Database
    participant Redis as Redis<br/>(Bookmark)
    
    alt Webhook Path
        Client->>API: POST /api/webhook<br/>(with signature & payload)
        API->>API: Validate HMAC-SHA256<br/>Verify branch match
        API->>Repo: Extract repo from payload
    else Polling Path
        Client->>API: Background poll-watcher<br/>triggers periodically
        API->>Repo: Fetch remote & check HEAD
    end
    
    API->>Repo: Resolve from_sha, to_sha<br/>(current & latest)
    Repo->>API: Return commit objects
    API->>Repo: Compute file diff<br/>(added, modified, deleted)
    Repo->>API: Return file changeset
    
    API->>Repo: Checkout target commit
    Repo->>Repo: Update working tree
    
    Note over Graph,Analyzer: Process Changed Files
    
    par Remove Deleted/Modified
        API->>Graph: DELETE nodes & edges<br/>for removed/modified files
    and Analyze & Insert Changed
        API->>Analyzer: Analyze added/<br/>modified files
        Analyzer->>API: Return AST/symbols
        API->>Graph: INSERT new nodes/<br/>UPDATE existing edges
    end
    
    Graph->>Graph: Return change summary<br/>(added, modified, deleted counts)
    
    API->>Redis: Persist new commit SHA<br/>as bookmark
    Redis->>API: Acknowledge
    
    API->>Client: Return 200 with<br/>update summary

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 A webhook hops, a poller twitches,
Git diffs dance through incremental switches,
No full re-graph, just delta delights,
Graphs stay fresh through the day and nights!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main changes: adding Git webhook and poll watcher capabilities for continuous graph updates. |
| Linked Issues check | ✅ Passed | The PR implements core requirements from #614: incremental update engine with diff-based file changes, webhook endpoint with signature validation, poll watcher with configurable intervals, and persistence of commit bookmarks via Redis. |
| Out of Scope Changes check | ✅ Passed | All changes align with #614 objectives. The explicit __all__ in __init__.py clarifies the public API for the new incremental_update module without introducing unrelated functionality. |



Copilot AI left a comment


Pull request overview

This PR adds a continuous/incremental graph update mechanism to avoid full re-indexes on every repo change, integrating both a push-webhook trigger and a background poll-watcher into the FastAPI backend.

Changes:

  • Added api/git_utils/incremental_update.py to compute file-level diffs between two SHAs and update the FalkorDB graph incrementally while persisting the new commit bookmark in Redis.
  • Added POST /api/webhook plus URL-matching helpers and FastAPI lifespan-managed poll-watcher to trigger incremental updates automatically.
  • Documented new environment variables and the new webhook endpoint in .env.template and README.md, plus added unit tests for the webhook and incremental-update helpers.
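
The URL-matching helpers mentioned above might look roughly like this. It is a sketch built from the PR description (normalise the `.git` suffix and case, ignore pushes to untracked branches), not the code in api/index.py. The function names and the shape of the `indexed_urls` mapping (repository name to remote URL) are assumptions.

```python
from typing import Mapping, Optional


def normalize_repo_url(url: str) -> str:
    """Lower-case a repo URL and drop a trailing slash and .git suffix."""
    url = url.strip().lower().rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


def find_indexed_repo(payload: dict, indexed_urls: Mapping[str, str],
                      tracked_branch: str = "main") -> Optional[str]:
    """Return the indexed repo name a push event targets, or None to ignore it."""
    if payload.get("ref") != f"refs/heads/{tracked_branch}":
        return None  # push to an untracked branch
    target = normalize_repo_url(payload.get("repository", {}).get("clone_url", ""))
    if not target:
        return None
    for name, url in indexed_urls.items():
        if normalize_repo_url(url) == target:
            return name
    return None
```

Returning `None` for an untracked branch lets the endpoint answer 200 without doing any work, so the sender does not retry the delivery.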

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| api/index.py | Adds webhook endpoint, URL matching, poll-watcher loop, and lifespan wiring to trigger incremental updates. |
| api/git_utils/incremental_update.py | Implements diff-based incremental update flow (checkout, delete stale file nodes, re-analyze changed files, update Redis bookmark). |
| api/git_utils/__init__.py | Replaces wildcard export with explicit exports, including the new incremental update helpers. |
| tests/test_webhook.py | Adds unit tests for URL matching, webhook behavior (open/secured), and basic incremental_update edge cases. |
| README.md | Documents new env vars and the /api/webhook endpoint. |
| .env.template | Adds WEBHOOK_SECRET, TRACKED_BRANCH, and POLL_INTERVAL configuration. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/git_utils/incremental_update.py`:
- Around line 172-180: The current update deletes definitions for
deleted+modified files via g.delete_files and re-analyzes only added+modified
via analyzer.analyze_files, which leaves inbound callers/importers stale; after
computing files_to_add and files_to_remove, collect the transitive set of
dependent files that import or call those changed files (using the graph g's
reverse edges / dependency lookup), add those dependent filenames to
files_to_add (excluding already-deleted files), and then call
analyzer.analyze_files on this expanded files_to_add so
SourceAnalyzer.analyze_files reprocesses untouched callers and restores inbound
edges; ensure you use g's dependency/query methods (the graph instance g) to
find dependents before calling analyzer.analyze_files.
- Around line 166-185: Wrap the entire mutation sequence that performs
repo.checkout_tree/to_commit set_head_detached, Graph(repo_name) updates,
analyzer.analyze_files(...) and set_repo_commit(...) in an exclusive repo-scoped
lock (keyed on repo_name) so concurrent runs for the same repo cannot
interleave; acquire the lock before calling repo.checkout_tree and hold it until
after set_repo_commit (or until commit of all graph changes), use a
distributed-lock primitive if you have multiple workers, set a sensible timeout
and ensure the lock is always released in a finally/cleanup block and that
errors are logged/propagated while still releasing the lock.

In `@api/index.py`:
- Around line 497-505: The code calls incremental_update(repo_name, before,
after) using the webhook payload.before; instead, retrieve the stored bookmark
via get_repo_commit(repo_name) and use that as from_sha when calling
incremental_update (i.e., pass get_repo_commit(repo_name) as the first/from
argument), but detect mismatch: if the stored bookmark is missing or does not
equal the graph's current commit that lines up with the push history (or cannot
reach payload.before), fall back to performing a full reindex of the repo (call
the existing full reindex routine) rather than running a partial
incremental_update; update the _update closure (and its call site around
repo_local_path, fetch_remote, and loop.run_in_executor) to implement this
branching logic so the graph never advances past gaps.
- Around line 114-122: The webhook endpoint is anonymously writable because
WEBHOOK_SECRET defaults to empty; update startup and the /api/webhook handler so
webhook auth is mandatory: at startup (when reading WEBHOOK_SECRET) fail fast
with a clear error if it's empty in production mode, or modify the webhook route
handler to apply token_required when WEBHOOK_SECRET is not set (use the existing
token_required decorator) so the mutating endpoint is never unauthenticated;
ensure you update the webhook handler (the /api/webhook function) to prefer HMAC
verification when WEBHOOK_SECRET is set and fallback to token_required
otherwise, and add a clear log message indicating which auth mode is in effect.
- Around line 455-470: The webhook handler currently enforces GitHub-specific
headers and payload fields (it reads X-Hub-Signature-256 into sig_header, builds
expected_sig from WEBHOOK_SECRET, and reads repository.clone_url into repo_url),
which rejects GitLab webhooks; update the logic in the handler to detect GitLab
deliveries by checking for X-Gitlab-Token or X-Gitlab-Signature when
X-Hub-Signature-256 is absent, validate the token/signature using the configured
secret (respecting GitLab’s verification method), and when parsing payload fall
back to repository.git_http_url or project.git_http_url if repository.clone_url
is missing; alternatively, if you prefer to keep GitHub-only behavior, update
documentation to state the webhook supports GitHub only and explicitly fail with
a clear message when GitLab headers are present.
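
The first finding (stale inbound edges) asks for the changed set to be expanded with its transitive dependents before re-analysis. A minimal sketch of that expansion follows. It assumes the reverse-dependency information is available as a mapping from a file to the files that import or call it; the real lookup would go through the graph, and `expand_with_dependents` is a hypothetical name.

```python
from collections import deque
from typing import Iterable, Mapping, Set


def expand_with_dependents(
    changed: Iterable[str],
    deleted: Iterable[str],
    reverse_deps: Mapping[str, Iterable[str]],
) -> Set[str]:
    """Return changed files plus every surviving file that depends on a touched file."""
    changed, deleted = list(changed), set(deleted)
    to_reanalyze = set(changed) - deleted
    queue = deque([*changed, *deleted])
    seen = set(queue)
    while queue:
        path = queue.popleft()
        for dependent in reverse_deps.get(path, ()):
            if dependent in seen:
                continue
            seen.add(dependent)
            if dependent not in deleted:
                to_reanalyze.add(dependent)
            queue.append(dependent)
    return to_reanalyze
```

Deleted files seed the traversal, because their callers need re-analysis too, but they never appear in the result since there is nothing left to parse.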

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cea312f7-19cf-400b-a510-c28520dca5e1

📥 Commits

Reviewing files that changed from the base of the PR and between 887b82f and 40441bf.

📒 Files selected for processing (6)
  • .env.template
  • README.md
  • api/git_utils/__init__.py
  • api/git_utils/incremental_update.py
  • api/index.py
  • tests/test_webhook.py

Reprocess dependent files during incremental updates, add repo-scoped update locking, harden webhook auth and provider handling, and fall back to full reindex when the stored bookmark no longer matches incoming history.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
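
For illustration, repo-scoped update locking within a single process could look like the sketch below. It is an assumption about the shape of the fix rather than the committed code: `repo_update_lock` is a hypothetical name, and with multiple worker processes a distributed lock would be needed instead, as the review comment notes.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_locks = defaultdict(threading.Lock)  # one lock per repository name
_locks_guard = threading.Lock()  # protects the dict itself


@contextmanager
def repo_update_lock(repo_name: str, timeout: float = 30.0):
    """Hold an exclusive lock for `repo_name` while the body runs."""
    with _locks_guard:
        lock = _locks[repo_name]
    if not lock.acquire(timeout=timeout):
        raise TimeoutError(f"an update for {repo_name!r} is already running")
    try:
        yield
    finally:
        # Released even if the update raised, so later runs are not blocked.
        lock.release()
```

The checkout, graph mutation and bookmark write would all sit inside one `with repo_update_lock(repo_name):` block so that two triggers for the same repository cannot interleave.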
gkorland and others added 2 commits March 18, 2026 09:44
- Sanitize error response in webhook endpoint to avoid exposing internal
  exception details (information exposure via str(exc))
- Normalize SHA format in incremental_update no-op response to use
  consistent short SHA (to_sha[:7]) matching the non-noop path
- Add comprehensive poll-watcher unit tests covering: missing clone,
  fetch failure, up-to-date skip, behind trigger, no remote head, and
  missing bookmark scenarios

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nalysis

When _full_reindex_repository is triggered (e.g., after a force-push or
missing bookmark), the working tree may still be at the old commit. The
reindex would analyze stale files and set the bookmark to the old commit,
creating an infinite retry loop. Now accepts target_sha and checks out
the target commit before analysis when a local clone exists.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

@gkorland gkorland left a comment


Good feature addition overall — the incremental update engine, webhook endpoint, and poll-watcher are well-structured. A few issues to address before merging.

- api/graph.py: Add _VALID_ENTITY_LABELS allowlist to get_entity_at_position
  to prevent Cypher injection via f-string label interpolation
- api/graph.py: Extract _make_falkordb_connection() helper; delete_graph_if_exists
  now accepts optional db param so callers can share a single connection
- api/index.py: Move pygit2 imports (CheckoutStrategy, Repository) to module-level;
  add subprocess and re to module-level imports
- api/index.py: Replace fragile prefix-based SHA comparison with git rev-parse
  to resolve stored short SHA to full 40-char form for unambiguous comparison;
  validate SHA hex format before passing to git
- api/index.py: Use raise ... from exc for webhook JSON parse error
- api/analyzers/{python,java,csharp}/analyzer.py: Fix graph param type to Graph
  and return types to list[Entity | ResolvedEntityRef] in resolve_type,
  resolve_method, resolve_symbol; add ResolvedEntityRef and Graph imports
- tests/test_webhook.py: Update poll-watcher tests to mock subprocess.run;
  add command assertions in mock callbacks

Co-authored-by: gkorland <753206+gkorland@users.noreply.github.com>
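
The SHA handling in this commit (validate the hex format, then let `git rev-parse` expand a short SHA) can be sketched as follows. The helper names are hypothetical, and the accepted length range of 4 to 40 hex characters is an assumption. The sketch only builds the argument list that would be passed to `subprocess.run`, so it runs without a repository.

```python
import re
from typing import List

_SHA_RE = re.compile(r"[0-9a-fA-F]{4,40}")  # assumed bounds for short/full SHAs


def is_valid_sha(value: str) -> bool:
    """Accept only plain hex strings, so option-like input never reaches git."""
    return bool(_SHA_RE.fullmatch(value))


def rev_parse_command(repo_path: str, sha: str) -> List[str]:
    """Build the git command that resolves `sha` to a full commit id."""
    if not is_valid_sha(sha):
        raise ValueError(f"not a commit SHA: {sha!r}")
    # "<sha>^{commit}" makes git fail unless the object is a commit.
    return ["git", "-C", repo_path, "rev-parse", "--verify", f"{sha}^{{commit}}"]
```

Comparing two fully resolved 40-character SHAs removes the ambiguity of the earlier prefix comparison.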
Copilot AI requested a review from gkorland March 18, 2026 11:15
@gkorland

Code Review Summary

Good feature overall — incremental update engine, webhook endpoint, and poll-watcher are well-structured with solid test coverage. Six issues flagged as inline comments:

| # | Severity | File | Issue |
| --- | --- | --- | --- |
| 1 | High | api/graph.py:486 | Cypher injection: labels interpolated into query string via f-string |
| 2 | High | api/index.py:363 | Fragile SHA prefix comparison in poll-watcher can silently skip updates |
| 3 | Medium | api/index.py:254 | Lazy pygit2 imports in fallback path hide import errors until runtime |
| 4 | Medium | api/analyzers/python/analyzer.py:95 | Wrong return type annotations (list[Entity] vs list[Entity \| ResolvedEntityRef]) and missing graph: Graph type hint; same in Java/C# analyzers |
| 5 | Low | api/graph.py:36 | delete_graph_if_exists creates a new FalkorDB connection per call |
| 6 | Low | api/index.py:636 | raise HTTPException missing from exc, which loses the exception chain |

See inline comments for suggested fixes.
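
As an illustration of issue 1: Cypher does not allow a node label to be passed as a query parameter, so a label that must be interpolated has to be checked against an allowlist first. The sketch below shows the pattern. The `_VALID_ENTITY_LABELS` name comes from the fix commit, but its contents, the function name, and the property names in the query are assumptions.

```python
# Assumed label set; the real allowlist lives in api/graph.py.
_VALID_ENTITY_LABELS = frozenset({"File", "Class", "Function"})


def entity_at_position_query(label: str) -> str:
    """Build a query for one entity label, rejecting anything not allowlisted."""
    if label not in _VALID_ENTITY_LABELS:
        raise ValueError(f"unsupported entity label: {label!r}")
    # Only the vetted label is interpolated; the values stay parameters.
    return (
        f"MATCH (e:{label}) "
        "WHERE e.path = $path AND e.src_start <= $line AND e.src_end >= $line "
        "RETURN e"
    )
```

Because the label can only be one of a few fixed strings, attacker-controlled text never becomes part of the query string.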

Development

Successfully merging this pull request may close these issues.

Continuous graph updates via Git webhook / branch watcher

3 participants