feat(v0.2): LinearAdapter, plugins, dedup, audit CLI, ConfigApprovalGate#103
feat(v0.2): LinearAdapter, plugins, dedup, audit CLI, ConfigApprovalGate#103YiWang24 wants to merge 5 commits into
Conversation
- show <task_id>: full audit trail + cost breakdown for a single task - export --format csv|json: export cost_meter rows - budget reset --repo|--global: emergency cost_meter cleanup - 9 new unit tests for parser, rendering, export
- openbot_plugins/__init__.py: plugin registry (register/get_all_plugins) - reproduce_python_issue: extract traceback, identify error, suggest repro - reproduce_js_issue: extract JS/TS stack trace, suggest repro - summarize_pr_diff: parse unified diff, group by added/modified/deleted - 11 unit tests for all 3 plugins
…GraphQL reply - LinearAdapter: ChannelAdapter implementation for Linear - verify_signature: HMAC-SHA256 of raw body - parse_event: Issue.create/update → ISSUE_OPENED/EDITED - reply/update_comment: GraphQL mutation to Linear API - Stub methods for PR/label/code operations - FastAPI route: POST /webhook/linear (same pattern as GitHub) - deps.py: verified_linear_event dependency - 8 unit tests for signature verification + event parsing
…label - ConfigApprovalGate middleware: checks if PR modifies .openbot/config.yaml without config-approved label, blocks with informative comment - 5 unit tests: non-PR, no config mention, missing label, approved, other labels
…case - dedup.py: use case with embed → search → rerank → comment pipeline - embedding.py: EmbeddingService (Voyage primary, OpenAI fallback) + search_similar_issues via pgvector cosine similarity - 003 migration: pgvector extension + issue_embeddings table + IVFFlat index - 7 unit tests for render_dedup_comment + check_duplicates flows
Qodo reviews are paused for this user.Troubleshooting steps vary by plan Learn more → On a Teams plan? Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center? |
|
Warning Review limit reached
More reviews will be available in 53 minutes and 7 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (22)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Large feature PR (v0.2) adding ConfigApprovalGate, LinearAdapter, Audit CLI extensions, embedding/dedup pipeline, and plugin system — with a critical signature-verification bypass, several high-risk issues (pgvector type mismatch, unsized app-scoped HTTP client, unscoped budget reset, missing rerank pipeline), and medium/low concerns.
|
|
||
| def _get_http(self) -> httpx.AsyncClient: | ||
| if self._http is None: | ||
| headers = {"Content-Type": "application/json"} |
There was a problem hiding this comment.
critical — hmac.new should be hmac.HMAC (or hmac.new is not a real function — the actual function is hmac.new, but the correct call is hmac.new(key, msg, digestmod). However, the real bug is that hmac.new is lowercase — Python's module-level function is hmac.new(). Actually, hmac.new IS valid. But the signature format mismatch is the real problem: Linear docs send the signature as sha256=<hex>, but the code compares the raw header value directly against the computed hexdigest, meaning any real Linear webhook will be rejected as invalid. This effectively blocks all legitimate webhooks.
expected = hmac.new(self._webhook_secret, body, hashlib.sha256).hexdigest()
if not hmac.compare_digest(signature, expected):
| sa.UniqueConstraint("repo", "issue_number", name="uq_issue_embeddings_repo_issue"), | ||
| ) | ||
|
|
||
| # Create IVFFlat index for fast cosine similarity search |
There was a problem hiding this comment.
high — The embedding column is defined as sa.Text() but the pgvector index uses vector_cosine_ops on it. pgvector requires the column type to be vector, not text. This migration will fail at the CREATE INDEX step or silently create a broken index.
sa.Column("embedding", sa.Text(), nullable=False), # stored as vector via raw SQL
| 3. OPENBOT_POSTGRES_URL configured | ||
|
|
||
| Returns list of dicts: {issue_number, title, similarity} | ||
| """ |
There was a problem hiding this comment.
high — create_async_engine is called inside search_similar_issues() on every invocation but is never disposed. Each call leaks a connection pool / engine. The engine should be reused or explicitly disposed after use.
engine = create_async_engine(postgres_url)
async with engine.connect() as conn:
|
|
||
| def _get_http(self) -> httpx.AsyncClient: | ||
| if self._http is None: | ||
| headers = {"Content-Type": "application/json"} |
There was a problem hiding this comment.
high — The _http httpx.AsyncClient is lazily created and stored on self but never closed. Since LinearAdapter is typically an app-lifetime singleton, there should be an explicit aclose() / shutdown hook, or the client should be used as an async context manager. On shutdown or adapter replacement, this leaks connections.
self._http = httpx.AsyncClient(base_url=_GRAPHQL_URL, headers=headers, timeout=10.0)
| "feature": str(r.feature), | ||
| "model": r.model, | ||
| "cost_usd": str(r.cost_usd), | ||
| "cost_status": str(r.cost_status), |
There was a problem hiding this comment.
high — reset_budget() deletes ALL cost_meter rows when --global is passed, with no safety net beyond a --yes prompt. This is irreversible data destruction of audit/billing records. Consider soft-delete (mark as reset) or requiring a backup/export first. The session_scope commits automatically, so there's no rollback path.
if global_reset:
result = await session.execute(delete(CostMeter))
| return MiddlewareDecision.proceed() | ||
|
|
||
| # Check if the PR touches .openbot/config.yaml via raw payload | ||
| pr_body = event.raw.get("pull_request", {}).get("body", "") |
There was a problem hiding this comment.
high — The gate only checks if .openbot/config.yaml is mentioned in the PR title or body — it doesn't actually inspect the PR's changed files. A PR that modifies the config file without mentioning it in the title/body will bypass the gate entirely.
config_mentioned = _CONFIG_PATH in pr_body or _CONFIG_PATH in pr_title
if not config_mentioned:
return MiddlewareDecision.proceed()
| """ | ||
| if embedding_func is None or search_func is None: | ||
| _logger.info("dedup_skipped_not_configured") | ||
| return DedupResult(candidates=[], comment="") |
There was a problem hiding this comment.
medium — The rerank_func parameter is accepted but never called — the pipeline skips directly from search results to rendering. The docstring says 'LLM rerank → top-3 candidates' but the code just takes similar[:rerank_top] without reranking.
# Step 3: Rerank (optional)
candidates = [
DedupCandidate(...)
for s in similar[:rerank_top]
]
| async def get_issue(self, event: UnifiedEvent, issue_number: int) -> dict[str, Any]: | ||
| """Fetch Linear issue details via GraphQL.""" | ||
| query = """ | ||
| query Issue($id: String!) { |
There was a problem hiding this comment.
medium — get_issue() queries by internal Linear UUID (issue.id from the webhook payload) but the public issue_number parameter is ignored. If called with a different issue number than the event's issue, it will return the wrong issue.
async def get_issue(self, event: UnifiedEvent, issue_number: int) -> dict[str, Any]:
...
issue_id = event.raw.get("data", {}).get("id")
| @@ -1,9 +1,11 @@ | |||
| """FastAPI dependencies for the API entrypoint. | |||
There was a problem hiding this comment.
medium — The module docstring still says 'At the moment this module only exposes the GitHub webhook ingress boundary' in the first paragraph, but the diff was supposed to update it. The old docstring lines (1–5) remain unchanged because the edit only touched lines 3–8.
At the moment this module only exposes the GitHub webhook ingress boundary:
read raw bytes, verify the signature, parse the event, and surface a single
There was a problem hiding this comment.
Large feature PR adding v0.2 features (LinearAdapter, audit CLI, dedup, config approval gate, plugins). Several correctness and robustness issues found: config approval gate is trivially bypassable, migration creates a vector index on a text column, budget reset deletes data without committing, and LinearAdapter doesn't populate required pr_number field.
Repo-wide notes:
- high —
openbot/application/middleware/config_approval.py: The config approval gate checks only PR title/body for the string.openbot/config.yamlinstead of actually fetching the PR's changed files. A PR that renames/deletes config.yaml, or usesgit mv, or simply doesn't mention the path in title/body, bypasses the gate entirely. This is a security-sensitive check that must validate the actual file list from the API. - high —
openbot/infrastructure/persistence/migrations/versions/003_add_issue_embeddings.py: The migration creates an IVFFlat vector index on theembeddingcolumn, but the column is defined assa.Text(). The index usesembedding vector_cosine_opswhich requires the column to be of typevector. Mixing a plain text column with vector operators will fail at runtime. Either change the column to usevectortype (e.g. via raw DDLembedding vector(1024)) or remove the index until the column type is correct. - medium —
openbot/entrypoints/cli/audit.py: Thereset_budgetfunction executesDELETEstatements but the surroundingsession_scopecontext manager may not auto-commit. Ifsession_scopeis a read-only or rollback-on-exit context manager, the deletes will be silently lost. Additionally,result.rowcountis accessed after the session may have been closed, which can raiseMissingGreenletorDetachedInstanceErrordepending on the async session configuration. - medium —
openbot/infrastructure/adapters/linear.py: Linear issue numbers use theidentifierfield (e.g. "ENG-123") and the adapter extracts only the numeric portion. Butissue_numberisintonUnifiedEvent— if the identifier parsing fails (no hyphen),issue_numberremainsNone. More importantly, Linear identifiers are team-scoped ("ENG-123" vs "OPS-123" could be the same number), so using just the integer creates ambiguity when multiple Linear teams send webhooks. - medium —
openbot/infrastructure/llm/embedding.py:search_similar_issuescreates a newAsyncEngineon every call (create_async_engine(postgres_url)) but never disposes it. This leaks connection pools. The engine should be reused (e.g. passed as a parameter or obtained from a shared factory), matching the pattern used in the audit CLI. - medium —
openbot/application/middleware/config_approval.py: The_HIGH_RISK_PREFIXEStuple is defined but never used anywhere in the gate logic. The comment says it checks for 'high-risk fields' but the implementation only checks if the config path is mentioned in PR title/body — it doesn't inspect which fields actually changed. Dead code suggests incomplete implementation.
v0.2 Features (5 commits)
Test plan
Need help on this PR? Tag
@codesmithwith what you need. Autofix is disabled.