feat(telemetry): add CodeQ telemetry collection system #5

Open
Kenny-Heitritter wants to merge 13 commits into dev from feat/codeq-telemetry

@Kenny-Heitritter
Member

Summary

Add a comprehensive telemetry collection system for CodeQ that records agent interactions for model improvement, analytics, and debugging.

Architecture

All telemetry code is isolated in packages/opencode/src/telemetry/ and configured under a qbraid.* config namespace, which keeps upstream merges from open-source opencode easy.

New Files

  • telemetry/types.ts - TypeScript types matching the microservice schema
  • telemetry/sanitizer.ts - Redacts secrets, sensitive files, truncates large content
  • telemetry/signals.ts - Tracks implicit feedback (retries, errors, abandonment)
  • telemetry/consent.ts - Tier-based consent (free=forced opt-in, paid=opt-in)
  • telemetry/uploader.ts - Batch upload with retry and offline handling
  • telemetry/collector.ts - Main orchestration module
  • telemetry/integration.ts - Event Bus subscriptions for automatic data collection
  • telemetry/index.ts - Public API (Telemetry namespace)

Modified Files

  • src/config/config.ts - Added qbraid.telemetry config section (apiUrl, enabled, batchSize, etc.)
  • src/project/bootstrap.ts - Added Telemetry.initIntegration() call

Features

  • Privacy-first: Sanitizes secrets, API keys, credentials before upload
  • Tier-based consent: Free users have telemetry enabled by default; paid users can opt in
  • Implicit feedback signals: Tracks retries, compactions, abandonment, errors
  • Offline resilience: Queues data when offline, retries on reconnection
  • Batch uploads: Configurable batch size and flush interval
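As a rough illustration of the privacy-first sanitization listed above, the sketch below redacts a few common secret shapes and truncates oversized content. The patterns, the MAX_CONTENT_LENGTH limit, and the sanitize name are assumptions made for this example; they are not taken from telemetry/sanitizer.ts.

```typescript
// Hypothetical secret patterns; the real sanitizer may cover different cases.
const SECRET_PATTERNS: RegExp[] = [
  /\bsk-[A-Za-z0-9]{20,}\b/g, // API keys shaped like "sk-..."
  /\bAKIA[0-9A-Z]{16}\b/g, // AWS access key IDs
  /\b(api[_-]?key|token|password)\s*[:=]\s*\S+/gi, // key=value style assignments
]

// Assumed limit for the example; not a value from the PR.
const MAX_CONTENT_LENGTH = 200

function sanitize(content: string): string {
  let result = content
  // Redact first so that truncation never cuts a secret in half and leaks a prefix.
  for (const pattern of SECRET_PATTERNS) {
    result = result.replace(pattern, "[REDACTED]")
  }
  if (result.length > MAX_CONTENT_LENGTH) {
    const dropped = result.length - MAX_CONTENT_LENGTH
    result = result.slice(0, MAX_CONTENT_LENGTH) + `[truncated ${dropped} chars]`
  }
  return result
}
```

Redacting before truncating is a deliberate ordering in this sketch: truncating first could leave a partial secret that no longer matches any pattern.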

Configuration

qbraid: {
  telemetry: {
    apiUrl: "https://qbraid-telemetry-2v5eb53w3q-uc.a.run.app",
    enabled: true,
    batchSize: 10,
    flushIntervalMs: 30000,
    maxRetries: 3,
  }
}
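One way such a config section could be typed and validated is sketched below. The field names come from the example above; treating the example values as fallbacks, and the resolveTelemetryConfig helper itself, are assumptions for illustration rather than the schema added to src/config/config.ts.

```typescript
interface TelemetryConfig {
  apiUrl: string
  enabled: boolean
  batchSize: number
  flushIntervalMs: number
  maxRetries: number
}

// Values copied from the example above; using them as fallbacks is an assumption.
const EXAMPLE_VALUES: TelemetryConfig = {
  apiUrl: "https://qbraid-telemetry-2v5eb53w3q-uc.a.run.app",
  enabled: true,
  batchSize: 10,
  flushIntervalMs: 30000,
  maxRetries: 3,
}

// Hypothetical helper: fill in missing fields and reject nonsensical numbers.
function resolveTelemetryConfig(overrides: Partial<TelemetryConfig> = {}): TelemetryConfig {
  const merged: TelemetryConfig = {
    apiUrl: overrides.apiUrl ?? EXAMPLE_VALUES.apiUrl,
    enabled: overrides.enabled ?? EXAMPLE_VALUES.enabled,
    batchSize: overrides.batchSize ?? EXAMPLE_VALUES.batchSize,
    flushIntervalMs: overrides.flushIntervalMs ?? EXAMPLE_VALUES.flushIntervalMs,
    maxRetries: overrides.maxRetries ?? EXAMPLE_VALUES.maxRetries,
  }
  if (!Number.isInteger(merged.batchSize) || merged.batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer")
  }
  if (!Number.isInteger(merged.flushIntervalMs) || merged.flushIntervalMs < 1) {
    throw new RangeError("flushIntervalMs must be a positive integer")
  }
  if (!Number.isInteger(merged.maxRetries) || merged.maxRetries < 0) {
    throw new RangeError("maxRetries must be a non-negative integer")
  }
  return merged
}
```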

Related PRs

  • qBraid/qbraid-telemetry#2 - Microservice TypeScript fixes
  • qBraid/qbraid-infrastructure#7 - Cloud Run infrastructure

@github-actions

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

The (?<!/) lookbehind prevented ANY opencode preceded by / from being
renamed, including /bin/opencode in the build script outfile path.
This caused the compiled binary to be named 'opencode' instead of
'codeq', leading to 'LOGO is not defined' errors at runtime.
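The effect of the broad lookbehind can be reproduced in a few lines. In the sketch below, the first pattern is the (?<!/) guard described above applied to the literal opencode; the second is a hypothetical narrower guard, shown only for contrast, and is not the pattern used in this PR. The sample paths are made up.

```typescript
// Overly broad: refuses to match any "opencode" that directly follows "/".
const broad = /(?<!\/)opencode/g

// Hypothetical narrower guard (illustration only): skip "opencode" only when
// it directly follows "packages/".
const narrow = /(?<!packages\/)opencode/g

function renameBroad(text: string): string {
  return text.replace(broad, "codeq")
}

function renameNarrow(text: string): string {
  return text.replace(narrow, "codeq")
}

// The build outfile is left untouched by the broad guard...
console.log(renameBroad("./dist/bin/opencode"))
// ...but is renamed by the narrow one, which still protects the source directory.
console.log(renameNarrow("./dist/bin/opencode"))
console.log(renameNarrow("packages/opencode/src"))
```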

Restore the original (?<!\/opencode) lookbehind, which specifically
avoids /opencode/ directory paths while still allowing /bin/opencode
to be properly renamed to /bin/codeq.

The UI namespace compiles to an IIFE in the bundled binary. Module-scope
const declarations (like LOGO) are not accessible inside the IIFE,
causing 'LOGO is not defined' at runtime when the logo() function is
called (e.g. via codeq -s <session_id>).
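A minimal sketch of an arrangement that cannot hit this error: the constant is declared inside the function that reads it, so the lookup does not depend on what the bundler leaves in scope around the IIFE. The hand-written IIFE stands in for the compiled namespace, and the logo text is a placeholder.

```typescript
// Stand-in for the IIFE that a TypeScript namespace compiles to.
const UI = (() => {
  function logo(): string {
    // Declared locally, so it is always in scope when logo() runs.
    // Placeholder art, not the real CodeQ logo.
    const LOGO = ["+-------+", "| codeq |", "+-------+"]
    return LOGO.join("\n")
  }
  return { logo }
})()

console.log(UI.logo())
```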

Fix by defining LOGO as a local constant inside the logo() function
rather than injecting it at module scope outside the namespace.

Add a qBraid-specific telemetry module to collect session data for
analysis and model improvement.

Components:
- config: Add qbraid.telemetry config section under a separate namespace
  to maintain upstream compatibility when merging from opencode
- types: TypeScript types matching the telemetry service schema
- sanitizer: Strip secrets, redact sensitive files, truncate large content
- signals: Track implicit feedback (retries, errors, abandonment)
- uploader: Batch and upload data with retry logic and offline handling
- consent: Tier-based consent (free=forced opt-in, paid=opt-in)
- collector: Main module that coordinates all components
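The uploader behavior named above (batching, retry, offline handling) might look roughly like the sketch below. createUploader, its options, and the exponential backoff are assumptions for illustration; the PR text does not describe the actual retry schedule or API of telemetry/uploader.ts.

```typescript
type SendFn<T> = (batch: T[]) => Promise<void>

interface UploaderOptions {
  batchSize: number
  maxRetries: number // retries after the first attempt
  baseDelayMs: number // assumed backoff base; not from the PR
}

function createUploader<T>(send: SendFn<T>, opts: UploaderOptions) {
  const queue: T[] = []

  async function sendWithRetry(batch: T[]): Promise<boolean> {
    for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
      try {
        await send(batch)
        return true
      } catch {
        if (attempt === opts.maxRetries) break
        // Exponential backoff: baseDelayMs, then 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, opts.baseDelayMs * 2 ** attempt))
      }
    }
    return false
  }

  async function flush(): Promise<void> {
    while (queue.length > 0) {
      const batch = queue.splice(0, opts.batchSize)
      const ok = await sendWithRetry(batch)
      if (!ok) {
        // Keep the data: put the batch back so a later flush can retry it.
        queue.unshift(...batch)
        return
      }
    }
  }

  return {
    enqueue(item: T): void {
      queue.push(item)
    },
    flush,
    pending: (): number => queue.length,
  }
}
```

Re-queuing a failed batch at the front preserves ordering, at the cost of holding data in memory while the service is unreachable.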

Design choices for upstream compatibility:
- All telemetry code is in a separate src/telemetry/ directory
- Config uses qbraid.* namespace that upstream will ignore
- No modifications to existing session/processor logic yet

Implements: [CodeQ] Add telemetry config schema
Implements: [CodeQ] Implement TelemetryCollector module
Implements: [CodeQ] Implement content sanitizer
Implements: [CodeQ] Implement batch uploader with retry

Complete the telemetry integration with OpenCode:

Integration module (integration.ts):
- Subscribe to Session, Message, and File events from Event Bus
- Automatic tracking of sessions, turns, tool calls, and file changes
- Uses Instance.state for automatic cleanup on disposal
- Graceful shutdown flushes pending telemetry data
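The subscribe, clean up, and flush lifecycle listed above can be sketched with a tiny stand-in bus. TinyBus, the event names, and the initIntegration signature here are all hypothetical; the real Event Bus and Instance.state APIs are not shown in this PR text.

```typescript
type Handler = (payload: unknown) => void

// Minimal stand-in for an event bus; not the opencode Bus API.
class TinyBus {
  private handlers = new Map<string, Set<Handler>>()

  subscribe(event: string, handler: Handler): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler>()
    set.add(handler)
    this.handlers.set(event, set)
    // The returned function removes this one subscription.
    return () => {
      set.delete(handler)
    }
  }

  publish(event: string, payload: unknown): void {
    for (const handler of Array.from(this.handlers.get(event) ?? [])) handler(payload)
  }
}

// Hypothetical integration: record event names, return a dispose function.
function initIntegration(bus: TinyBus, flush: (events: string[]) => void): () => void {
  const recorded: string[] = []
  const unsubscribers = ["session.created", "message.updated", "file.edited"].map((name) =>
    bus.subscribe(name, () => {
      recorded.push(name)
    }),
  )
  // Disposal: stop listening first, then flush whatever is still pending.
  return () => {
    for (const off of unsubscribers) off()
    flush(recorded.splice(0))
  }
}
```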

Bootstrap integration:
- Initialize telemetry during InstanceBootstrap
- Telemetry is now automatically enabled on startup (respects consent)
- Errors during initialization are logged but don't block startup
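The "logged but don't block startup" behavior amounts to a guarded call. A minimal sketch follows, with initTelemetrySafely and its parameters as hypothetical names; in the PR the guarded call is Telemetry.initIntegration() inside the bootstrap.

```typescript
// Hypothetical wrapper: run telemetry init, log failures, never rethrow.
async function initTelemetrySafely(
  init: () => Promise<void>,
  log: (message: string) => void,
): Promise<boolean> {
  try {
    await init()
    return true
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err)
    log(`telemetry init failed, continuing without it: ${reason}`)
    return false
  }
}
```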

Exports:
- Telemetry.initIntegration() - Initialize with Event Bus integration
- Telemetry.shutdownIntegration() - Explicit shutdown (automatic via Instance.dispose)
- Telemetry.completeTurn() - Finalize assistant response
- Telemetry.userMessage() - Record user input
- Telemetry.retry() - Record turn retry

Implements: [CodeQ] Wire up telemetry module

- Fix tool call status: now properly detects error state
- Remove unused Bus import from collector.ts
- Update integration to properly record user messages with content
- Record assistant responses with model ID and token counts
- Track tool calls with status, duration, and input/output sizes
- Finalize turns and upload to telemetry service
- Deduplicate user message recording
- Get user info (userId, organizationId) from consent endpoint
- Read qBraid API key from config (provider.qbraid.options.apiKey)
- Update default telemetry endpoint to production Cloud Run URL

The telemetry now captures:
- Full user message content (sanitized)
- Full assistant response content
- Tool call metadata (name, status, duration, sizes)
- File changes with path hashes
- Session metrics (turn count, token counts, tool counts)
- Implicit signals (retries, compactions, abandonment)
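Recording file changes by path hash rather than by path could be done along these lines. The hashing scheme (SHA-256 over a slash-normalized path, shortened to 16 hex characters) and the record field names are assumptions; the PR text does not say which hash or record shape is used.

```typescript
import { createHash } from "node:crypto"

// Input shape assumed for the example: a diff with file, additions, deletions.
interface FileDiff {
  file: string
  additions: number
  deletions: number
}

// Hypothetical telemetry record; field names are assumptions.
interface FileChangeRecord {
  pathHash: string
  additions: number
  deletions: number
}

function hashPath(filePath: string): string {
  // Normalize separators so the same file hashes identically on every platform.
  const normalized = filePath.replace(/\\/g, "/")
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16)
}

function toFileChangeRecord(diff: FileDiff): FileChangeRecord {
  // The raw path never leaves this function; only its hash is kept.
  return { pathHash: hashPath(diff.file), additions: diff.additions, deletions: diff.deletions }
}
```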
… defaults

- Rewrite collector to Instance.state-scoped with dataLevel filtering and
  messageId deduplication to prevent duplicate turn finalizations
- Fix integration user-message race by retrying after 50ms when parts are
  empty; use MessageV2.Event.Updated with time.completed instead of
  step-finish; clean up turnStartTimes entries; add SIGTERM/beforeExit flush
- Fix Session.Event.Diff handler to use Snapshot.FileDiff shape (file,
  additions, deletions) instead of non-existent type/hunks/path fields
- Rewrite consent to default OFF until explicit opt-in; add setLocalConsent
  and loadLocalConsent for KV store integration; add CODEQ_DISABLE_TELEMETRY
  env var kill switch via Flag namespace
- Export setConsent/loadConsent from telemetry barrel
- Add DialogTelemetryConsent component: free-tier shows informational
  'I Understand' (forced opt-in); paid-tier shows 'Enable'/'No Thanks'
  two-button confirm/decline with keyboard navigation
- Persist consent choice to KV store (telemetry_consent_shown,
  telemetry_enabled) and propagate to Telemetry.setConsent()
- Wire dialog into app.tsx via createEffect that fires before the
  connect-provider dialog, checking KV for prior consent
- Change brand.json models from exclusive:true to exclusive:false,
  default:true so qBraid models are prepended but models.dev catalog
  and all upstream providers (Anthropic, OpenAI, Copilot, Codex) remain
- Add 'default' boolean field to branding ModelsSchema
- Update apply.ts models.ts transform: when exclusive=false, prepend
  branded models and merge with models.dev; only clear CUSTOM_LOADERS
  and BUILTIN plugin arrays when exclusive=true
- Add TypeScript HTTP client for qBraid quantum API with auth resolution
  chain (env var > CodeQ auth store > ~/.qbraid/qbraidrc)
- Define 6 in-process tools: quantum_devices, quantum_estimate_cost,
  quantum_submit_job, quantum_get_result, quantum_cancel_job,
  quantum_list_jobs
- Job submission uses ctx.ask() permission system for cost approval,
  replacing pod_mcp's fragile cost_reviewed_and_approved boolean
- Register quantum tools in tool registry
- Tightly integrated with CodeQ auth, permissions, and telemetry;
  not portable to other agents by design
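The auth resolution chain (env var > CodeQ auth store > ~/.qbraid/qbraidrc) is a first-match-wins lookup. The sketch below injects the three sources so it stays self-contained; the QBRAID_API_KEY variable name and the api-key line format of the rc file are assumptions, not details confirmed by this PR.

```typescript
interface AuthSource {
  name: string
  read: () => string | undefined
}

// Walk the sources in priority order and return the first non-empty key.
function resolveApiKey(sources: AuthSource[]): { key: string; source: string } | undefined {
  for (const source of sources) {
    const key = source.read()?.trim()
    if (key) return { key, source: source.name }
  }
  return undefined
}

// Hypothetical wiring of the three sources named in the commit message.
function defaultSources(
  env: Record<string, string | undefined>,
  storedKey: string | undefined,
  rcText: string | undefined,
): AuthSource[] {
  return [
    { name: "env", read: () => env["QBRAID_API_KEY"] }, // assumed variable name
    { name: "auth-store", read: () => storedKey },
    // Assumed rc format: a line such as "api-key = <value>".
    { name: "qbraidrc", read: () => rcText?.match(/^\s*api-key\s*=\s*(\S+)\s*$/m)?.[1] },
  ]
}
```

Returning the source name alongside the key makes it easy to log where credentials came from without logging the key itself.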
… quantum

Telemetry (integration.ts, collector.ts, index.ts):
- Remove process.once SIGTERM/beforeExit handlers that crash outside
  Instance context; rely on Instance.state disposal for flush
- Move recordedUserMessages to TelemetryState (cleared per session)
  to prevent unbounded Set growth across sessions
- Remove dead assistantToUser map and dead PartUpdated text branch
- Use info.parentID for turn start time lookup instead of fragile
  insertion-order heuristic on turnStartTimes map
- Fix exported finalizeTurn() to actually call collector.finalizeTurn()
- Cache consentTier/dataLevel from initialize() to avoid redundant
  getConsentStatus() call in startSession()
- Remove duplicate retry/recordRetry export from Telemetry namespace

Consent dialog (dialog-telemetry-consent.tsx, app.tsx):
- Use DialogTelemetryConsent.show() API instead of raw dialog.replace()
  to properly wire onResult and onClose callbacks
- Default tier to 'paid' (genuine opt-out) until actual tier detection
  is implemented; 'free' forced all users into required telemetry
- Add double-fire guard on handleSelect to prevent duplicate KV writes
- Fix Esc handler: free tier forces accept, paid tier declines
- Fix unsafe 'as boolean' cast on KV value with === true check
- Add consentShown guard to prevent re-show on rebootstrap cycles
- Remove unnecessary spread in <For> component
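Putting the consent rules from this PR together (kill switch, nothing collected before the dialog is shown, default tier 'paid', strict === true check on the KV value) gives roughly the decision function below. isTelemetryEnabled is a hypothetical name, and two details are assumptions: that the kill switch treats "1" or "true" as set, and that acknowledging the dialog is what enables telemetry on the free tier.

```typescript
type Tier = "free" | "paid"

interface ConsentInputs {
  env: Record<string, string | undefined>
  kv: Record<string, unknown>
  tier?: Tier
}

function isTelemetryEnabled({ env, kv, tier = "paid" }: ConsentInputs): boolean {
  // Kill switch wins over everything else. Accepted values are an assumption.
  const flag = (env["CODEQ_DISABLE_TELEMETRY"] ?? "").toLowerCase()
  if (flag === "1" || flag === "true") return false

  // Default OFF: nothing is collected until the consent dialog has been shown.
  if (kv["telemetry_consent_shown"] !== true) return false

  // Assumed free-tier rule: acknowledging the dialog is the opt-in.
  if (tier === "free") return true

  // Strict comparison instead of an `as boolean` cast: the string "true",
  // the number 1, or a missing key all count as "not enabled".
  return kv["telemetry_enabled"] === true
}
```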

Branding (apply.ts, schema.ts, brand.json):
- Rewrite non-exclusive models.ts get() replacement to use actual
  upstream scope variables (filepath, Bun.file, data macro, refresh)
  instead of nonexistent readCache/writeCache/url
- Add match-failure assertions to all regex transforms so upstream
  refactors produce loud build errors instead of silent failures
- Remove unused 'default' field from ModelsSchema and brand.json
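A match-failure assertion for a regex transform can be as small as the helper below. replaceOrThrow and its label parameter are hypothetical names for illustration; apply.ts may structure its checks differently.

```typescript
// Hypothetical helper: apply a regex transform, but fail loudly if the
// pattern no longer matches (for example after an upstream refactor).
function replaceOrThrow(source: string, pattern: RegExp, replacement: string, label: string): string {
  // String.prototype.search ignores the global flag and lastIndex, so this
  // check is safe to use with /g patterns.
  if (source.search(pattern) === -1) {
    throw new Error(`branding transform "${label}" did not match; upstream may have changed`)
  }
  return source.replace(pattern, replacement)
}
```

Without the check, String.prototype.replace returns the input unchanged when nothing matches, which is exactly the silent failure the commit describes.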

Quantum (client.ts, tools.ts):
- Add Zod schemas for QuantumDevice, QuantumJob, JobResult with
  runtime .parse() validation on all API responses
- Add pricingAvailable flag to CostEstimate; warn when pricing is
  unavailable instead of silently returning 0
- Add ctx.ask() permission gate to quantum_cancel_job (destructive op)
- Thread ctx.abort signal through all tool execute -> client calls
- Add 30s default timeout via AbortSignal.timeout on all fetch calls
- Cache resolveAuth() result for 5s to avoid repeated disk reads
- Truncate error response bodies to 500 chars
- Normalize job status with .toUpperCase() for consistent comparison
- Wrap getResult() in try/catch for TOCTOU race after status check
- Fix pricing display from '$' prefix to 'credits' suffix
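A few of the client hardening items above are small enough to sketch directly. The helper names and the CostEstimate fields other than pricingAvailable are assumptions; AbortSignal.timeout and AbortSignal.any are standard platform APIs, and AbortSignal.any requires a reasonably recent runtime (Node.js 20.3 or later).

```typescript
const DEFAULT_TIMEOUT_MS = 30_000
const MAX_ERROR_BODY_CHARS = 500

// Keep error messages bounded so a large HTML error page cannot flood logs.
function truncateErrorBody(body: string): string {
  return body.length > MAX_ERROR_BODY_CHARS ? body.slice(0, MAX_ERROR_BODY_CHARS) : body
}

// Compare job statuses case-insensitively by normalizing once.
function normalizeStatus(status: string): string {
  return status.trim().toUpperCase()
}

// Combine the caller's abort signal (if any) with a timeout signal.
function withTimeout(signal?: AbortSignal, ms: number = DEFAULT_TIMEOUT_MS): AbortSignal {
  const timeout = AbortSignal.timeout(ms)
  return signal ? AbortSignal.any([signal, timeout]) : timeout
}

// Hypothetical shape; only pricingAvailable is named in the commit message.
interface CostEstimate {
  amount: number
  pricingAvailable: boolean
}

function formatCost(estimate: CostEstimate): string {
  // Say so explicitly when pricing is unknown instead of showing 0.
  return estimate.pricingAvailable ? `${estimate.amount} credits` : "pricing unavailable"
}
```

The combined signal would then be passed as the signal option of each fetch call, so that either a user cancellation or the timeout aborts the request.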