Optimize large dataset queries and retention #1

Merged: peter-olom merged 1 commit into main from feat/large-dataset-performance-retention on Apr 27, 2026

@peter-olom (Owner)

Summary

  • Optimize trace and wide-event list/search/query paths for multi-million-row datasets with capped limits, list projections, keyset pagination, and drop-scoped composite indexes.
  • Update the React UI to load only the latest 100 rows by default, expose "load older" pagination, and lazy-load full event attributes for detail views.
  • Add repeatable large dataset seed/perf scripts and pagination/retention smoke tests.
  • Replace inline retention cleanup with a low-impact, non-overlapping background scheduler with admin diagnostics.
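
The capped limits, list projection, and keyset pagination described above can be sketched as a small query builder. This is a hypothetical illustration, not the code merged in this PR: `buildRecentWideEventsQuery` and `nextBeforeId` are made-up names, the local `clampLimit` is a stand-in whose default/minimum behavior is assumed (only its 2000 maximum is mentioned in the review below), and it assumes `wide_events.id` increases with insertion order so that "older" means "smaller id".

```typescript
// Hypothetical sketch of list projection + keyset pagination for wide_events,
// assuming wide_events.id increases monotonically with insertion order.
const DEFAULT_LIMIT = 100; // UI default page size from the PR summary
const MAX_LIMIT = 2000; // cap reported for clampLimit in the review

// Columns returned for list views; the heavy `attributes` column is left out.
const LIST_COLUMNS = [
  "id", "drop_id", "trace_id", "service_name", "operation_type", "field_name",
  "outcome", "duration_ms", "user_id", "error_count", "rpc_call_count", "created_at",
];

// Stand-in for clampLimit (assumed behavior): default when missing, else clamp to [1, MAX_LIMIT].
function clampLimit(limit?: number): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(limit)));
}

interface PagedQuery {
  sql: string;
  params: number[];
}

// Newest-first page of one drop's events. With beforeId set, only rows strictly
// older than the client's last-seen id are returned, so no OFFSET scan is needed.
function buildRecentWideEventsQuery(dropId: number, limit?: number, beforeId?: number | null): PagedQuery {
  const params: number[] = [dropId];
  let where = "drop_id = ?";
  if (beforeId != null) {
    where += " AND id < ?";
    params.push(beforeId);
  }
  params.push(clampLimit(limit));
  return {
    sql: `SELECT ${LIST_COLUMNS.join(", ")} FROM wide_events WHERE ${where} ORDER BY id DESC LIMIT ?`,
    params,
  };
}

// Cursor for a "load older" button: the last id of a full page, or null when exhausted.
function nextBeforeId(rows: { id: number }[], pageSize: number): number | null {
  return rows.length > 0 && rows.length === pageSize ? rows[rows.length - 1].id : null;
}
```

The design point is that a composite index along the lines of `(drop_id, id)` (exact index name and shape assumed) lets SQLite seek straight to `drop_id = ? AND id < ?` and read backwards, so fetching an old page costs about the same as fetching the newest one, whereas `OFFSET n` still has to step over `n` rows.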

Retention behavior

  • No retention cleanup in dashboard/list/query/read request paths.
  • No heavy cleanup in ingestion paths.
  • Scheduler is bounded, indexed, batched, time-budgeted, alternates traces/wide_events, backs off on SQLite busy/locked errors, and avoids automatic VACUUM.
  • Added /api/admin/retention scheduler diagnostics and README env docs.
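
The scheduler properties listed above (bounded batches, a per-pass time budget, alternating traces/wide_events, backing off on busy/locked errors, no overlap, no automatic VACUUM) can be illustrated with a small TypeScript sketch. It is not the PR's scheduler: `DeleteBatch`, `runRetentionPass`, `nextDelay`, and `scheduleRetention` are hypothetical names, the actual indexed `DELETE` is hidden behind an injected callback, and it assumes the SQLite driver reports contention as an error whose string `code` starts with `SQLITE_BUSY` or `SQLITE_LOCKED`.

```typescript
// Hypothetical sketch of a low-impact retention loop; not the PR's implementation.
type Table = "traces" | "wide_events";

interface RetentionOptions {
  batchSize: number; // max rows removed per DELETE batch
  maxMs: number; // time budget for one pass
  baseDelayMs: number; // normal pause between passes
  maxDelayMs: number; // ceiling for busy/locked backoff
}

// Injected: deletes up to `limit` expired rows from `table` (e.g. an indexed
// DELETE ... WHERE id IN (SELECT id ... LIMIT ?)) and returns how many were removed.
type DeleteBatch = (table: Table, limit: number) => number;

interface PassResult {
  deleted: Record<Table, number>;
  batches: number;
  busy: boolean;
}

const other = (t: Table): Table => (t === "traces" ? "wide_events" : "traces");

// Assumes the driver exposes the SQLite result code as a string `code` property.
function isBusyError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null | undefined)?.code;
  return typeof code === "string" && (code.startsWith("SQLITE_BUSY") || code.startsWith("SQLITE_LOCKED"));
}

function runRetentionPass(deleteBatch: DeleteBatch, opts: RetentionOptions, now: () => number = Date.now): PassResult {
  const result: PassResult = { deleted: { traces: 0, wide_events: 0 }, batches: 0, busy: false };
  const done: Record<Table, boolean> = { traces: false, wide_events: false };
  const start = now();
  let next: Table = "traces";
  // Stop when both tables are drained or the time budget is spent; no VACUUM here.
  while (!(done.traces && done.wide_events) && now() - start < opts.maxMs) {
    const table = done[next] ? other(next) : next; // alternate, skipping a drained table
    try {
      const n = deleteBatch(table, opts.batchSize);
      result.batches++;
      result.deleted[table] += n;
      if (n < opts.batchSize) done[table] = true; // short batch: nothing left to expire
    } catch (err) {
      if (!isBusyError(err)) throw err;
      result.busy = true; // another writer holds the lock: yield instead of retrying hot
      break;
    }
    next = other(table);
  }
  return result;
}

// Exponential backoff while SQLite is contended; reset to the base delay otherwise.
function nextDelay(prevMs: number, busy: boolean, opts: RetentionOptions): number {
  return busy ? Math.min(prevMs * 2, opts.maxDelayMs) : opts.baseDelayMs;
}

// Chained setTimeout (not setInterval): the next pass is armed only after the
// current one returns, so passes cannot overlap. Returns a stop function.
function scheduleRetention(deleteBatch: DeleteBatch, opts: RetentionOptions): () => void {
  let delay = opts.baseDelayMs;
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const loop = (): void => {
    let busy = false;
    try {
      busy = runRetentionPass(deleteBatch, opts).busy;
    } catch (err) {
      busy = true; // unexpected failure: back off as if contended
      console.error("retention pass failed", err);
    }
    delay = nextDelay(delay, busy, opts);
    if (!stopped) timer = setTimeout(loop, delay);
  };
  timer = setTimeout(loop, delay);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Keeping the pass synchronous and bounded by `batchSize` and `maxMs` means each lock hold is short, while backing off on contention lets ingestion writers win whenever the two collide.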

Verification

  • npm run test:pagination passed.
  • npm run test:retention-paths passed.
  • npm run test:retention passed on 200k traces / 50k events: indexed predicates used, deleted 5k traces + 5k events in 10 batches (~25ms), fresh rows preserved, WAL checkpoint busy=0.
  • Large retention run passed: npm run test:retention -- --traces=2100000 --events=200000 --batch=1000 --max-ms=100 deleted 16k traces + 15k events in 31 batches (~102.7ms), indexed predicates used, busy=false.
  • npm run build passed.
  • Large API perf verification on 2.1M traces / 200k events: list/search/query endpoints completed in low single-digit ms except stats (~154ms).
  • Browser proof captured the events tab on the generated large dataset showing large stats with only the capped latest rows loaded and pagination available.
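
The batch counts in the retention runs above are consistent with simple arithmetic, assuming each batch deletes from a single table and every batch is full. Taking the rounded figures at face value, `--batch=1000` with 16k traces and 15k events needs 16 + 15 = 31 batches, matching the reported 31, and a 16/15 split is also what strict alternation starting with traces would produce over 31 batches. The 200k/50k run's 10 batches for 5k + 5k rows likewise fits a batch size of 1000, although that run's batch size is not stated. The averages work out to roughly 3.3 ms per batch (102.7 ms / 31) on the large dataset versus 2.5 ms (25 ms / 10) on the smaller one. The helpers below (hypothetical names) only encode that arithmetic.

```typescript
// Hypothetical helper: DELETE batches needed when each batch removes at most
// `batchSize` rows from a single table (ignores any final empty batch a
// scheduler might issue to detect that a table is drained).
function batchesFor(rowsPerTable: number[], batchSize: number): number {
  if (!(batchSize > 0)) throw new RangeError("batchSize must be positive");
  return rowsPerTable.reduce((sum, rows) => sum + Math.ceil(rows / batchSize), 0);
}

// Average wall-clock cost per batch in milliseconds.
function msPerBatch(totalMs: number, batches: number): number {
  return batches === 0 ? 0 : totalMs / batches;
}
```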

peter-olom merged commit 03f4cf8 into main on Apr 27, 2026
1 check passed

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e3ed79c00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/server/db/sqlite.ts
```diff
 }
 return db.prepare(`
-SELECT * FROM wide_events
+SELECT id, drop_id, trace_id, service_name, operation_type, field_name, outcome, duration_ms, user_id, error_count, rpc_call_count, created_at
```

P2: Preserve attributes in dashboard samples

getRecentWideEvents now omits the attributes column, but /api/dashboards/generate still passes these rows into profileWideEvents, which parses attributes to discover candidate fields. As a result, generated dashboards lose all attribute-derived dimensions/metrics (and AI generation context) even when that data exists in wide_events, which materially reduces dashboard quality on real datasets.

Useful? React with 👍 / 👎.
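
One way to address this, sketched under assumptions: keep the narrow projection for list endpoints and give the dashboard profiler its own sample projection that adds `attributes` back. `wideEventColumns`, `buildProfileSampleQuery`, and `attributeKeys` are hypothetical names, attributes are assumed to be stored as a JSON object in a TEXT column, and the column list is taken from the projection shown in the diff above. The `attributeKeys` check shows why list-projected rows yield no attribute-derived fields.

```typescript
// Hypothetical sketch; column names come from the list projection in the diff above.
const LIST_COLUMNS = [
  "id", "drop_id", "trace_id", "service_name", "operation_type", "field_name",
  "outcome", "duration_ms", "user_id", "error_count", "rpc_call_count", "created_at",
];

type Projection = "list" | "profile";

// List views stay narrow; profiling samples add the attributes payload back.
function wideEventColumns(projection: Projection): string[] {
  return projection === "profile" ? [...LIST_COLUMNS, "attributes"] : [...LIST_COLUMNS];
}

// `limit` is assumed to be validated by the dashboard route (which accepts up to 20,000).
function buildProfileSampleQuery(dropId: number, limit: number): { sql: string; params: number[] } {
  return {
    sql: `SELECT ${wideEventColumns("profile").join(", ")} FROM wide_events WHERE drop_id = ? ORDER BY id DESC LIMIT ?`,
    params: [dropId, limit],
  };
}

// Candidate fields a profiler could discover from sampled rows (assumes JSON-object text).
function attributeKeys(rows: { attributes?: string | null }[]): string[] {
  const keys = new Set<string>();
  for (const row of rows) {
    if (!row.attributes) continue; // list-projected rows carry no attributes at all
    try {
      const parsed: unknown = JSON.parse(row.attributes);
      if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
        for (const key of Object.keys(parsed)) keys.add(key);
      }
    } catch {
      // skip malformed payloads rather than failing the whole profile
    }
  }
  return Array.from(keys).sort();
}
```

Splitting the projection by purpose keeps the list endpoints' latency gains while giving the profiler the payload it actually parses.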

Comment thread src/server/db/sqlite.ts

```diff
-export function getRecentWideEvents(dropId: number, limit = 100, offset = 0) {
+export function getRecentWideEvents(dropId: number, limit = 100, offset = 0, beforeId?: number | null) {
+const lim = clampLimit(limit);
```

P2: Honor requested dashboard sample limit

getRecentWideEvents now clamps every call to clampLimit (max 2000), but the dashboard generation route still accepts limits up to 20,000 for profiling. This means larger requested samples are silently truncated, so generated dashboards can be based on a much smaller and less representative slice than the caller requested.

Useful? React with 👍 / 👎.
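
A sketch of one possible fix, assuming the goal is to keep every SQL statement within the 2000-row cap: page backwards through the drop with the same `beforeId` keyset cursor until the requested sample size (up to the route's 20,000) is reached. `collectSample` and the injected `fetchPage` callback are hypothetical; a simpler alternative would be a separate, higher cap used only by the profiling path.

```typescript
// Hypothetical sketch: honor a large sample request with several capped keyset pages.
interface Keyed {
  id: number;
}

// Injected page reader, e.g. something along the lines of
// getRecentWideEvents(dropId, limit, 0, beforeId): rows newest-first,
// with ids strictly below beforeId when beforeId is non-null.
type FetchPage<T extends Keyed> = (beforeId: number | null, limit: number) => T[];

function collectSample<T extends Keyed>(fetchPage: FetchPage<T>, requested: number, pageCap = 2000): T[] {
  const out: T[] = [];
  let beforeId: number | null = null;
  while (out.length < requested) {
    const want = Math.min(pageCap, requested - out.length);
    const page = fetchPage(beforeId, want);
    out.push(...page);
    if (page.length < want) break; // the drop has no older rows left
    beforeId = page[page.length - 1].id; // oldest id seen so far becomes the next cursor
  }
  return out;
}
```

A 20,000-row request becomes at most ten indexed 2000-row reads, so the sample the caller asked for is returned in full without lifting the per-query cap that protects the list endpoints.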

2 participants