Optimize large dataset queries and retention by peter-olom · Pull Request #1 · peter-olom/raphael

peter-olom · 2026-04-27T11:05:56Z

Summary

Optimize trace and wide-event list/search/query paths for multi-million row datasets with capped limits, list projections, keyset pagination, and drop-scoped composite indexes.
Update the React UI to load only the latest 100 rows by default, expose "load older" pagination, and lazy-load full event attributes for detail views.
Add repeatable large dataset seed/perf scripts and pagination/retention smoke tests.
Replace inline retention cleanup with a low-impact, non-overlapping background scheduler with admin diagnostics.
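
The capped limits and keyset pagination described above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not the PR's code: `Row`, `MAX_LIMIT`, `listPage`, and the body of `clampLimit` are hypothetical, and the real implementation presumably issues an indexed SQL query rather than filtering an array.

```typescript
// Hypothetical row shape; the real tables carry more columns.
interface Row {
  id: number;
  name: string;
}

const DEFAULT_LIMIT = 100; // "latest 100 rows by default", per the summary
const MAX_LIMIT = 2000;    // assumed cap, for illustration only

// Clamp a caller-supplied limit into [1, MAX_LIMIT].
function clampLimit(limit: number | undefined): number {
  if (limit === undefined || !Number.isFinite(limit)) return DEFAULT_LIMIT;
  return Math.min(Math.max(Math.trunc(limit), 1), MAX_LIMIT);
}

// Keyset pagination: return rows with id < beforeId, newest first.
// Similar in spirit to: WHERE id < ? ORDER BY id DESC LIMIT ?
function listPage(
  rows: Row[],
  limit?: number,
  beforeId?: number | null,
): { items: Row[]; nextBeforeId: number | null } {
  const lim = clampLimit(limit);
  const newestFirst = [...rows].sort((a, b) => b.id - a.id);
  const eligible =
    beforeId == null ? newestFirst : newestFirst.filter((r) => r.id < beforeId);
  const items = eligible.slice(0, lim);
  // Cursor for "load older": the smallest id on this page, or null when no older rows remain.
  const nextBeforeId = eligible.length > lim ? items[items.length - 1]!.id : null;
  return { items, nextBeforeId };
}
```

Unlike OFFSET, the `beforeId` cursor lets each "load older" request start from a known id, so the cost of a page does not grow with how far back the user has scrolled.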

Retention behavior

No retention cleanup in dashboard/list/query/read request paths.
No heavy cleanup in ingestion paths.
The scheduler is bounded, indexed, batched, and time-budgeted; it alternates between traces and wide_events, backs off on SQLite busy/locked errors, and avoids automatic VACUUM.
Added /api/admin/retention scheduler diagnostics and README env docs.
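
A single scheduler tick with the properties listed above might look something like the sketch below. Everything here is an assumption made for illustration: `runRetentionTick`, `DeleteBatch`, `TickOptions`, and the error `code` strings are hypothetical stand-ins, and the actual scheduler in the PR may be structured quite differently. The timer and the non-overlap guard are omitted.

```typescript
type Table = "traces" | "wide_events";

// Hypothetical dependency: deletes up to `batchSize` expired rows and returns
// how many were deleted. Assumed to throw an error carrying a `code` of
// "SQLITE_BUSY" or "SQLITE_LOCKED" when the database is contended.
type DeleteBatch = (table: Table, batchSize: number) => number;

interface TickOptions {
  batchSize: number;  // rows per delete statement
  maxBatches: number; // hard bound on work per tick
  maxMs: number;      // time budget per tick
  now?: () => number; // injectable clock, useful in tests
}

interface TickResult {
  deleted: Record<Table, number>;
  batches: number;
  busy: boolean; // true when the tick stopped early so the caller can back off
}

function isBusyError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return code === "SQLITE_BUSY" || code === "SQLITE_LOCKED";
}

function runRetentionTick(deleteBatch: DeleteBatch, opts: TickOptions): TickResult {
  const now = opts.now ?? Date.now;
  const started = now();
  const deleted: Record<Table, number> = { traces: 0, wide_events: 0 };
  const exhausted: Record<Table, boolean> = { traces: false, wide_events: false };
  let batches = 0;
  let table: Table = "traces";

  while (batches < opts.maxBatches && now() - started < opts.maxMs) {
    if (exhausted.traces && exhausted.wide_events) break;
    if (!exhausted[table]) {
      try {
        const n = deleteBatch(table, opts.batchSize);
        deleted[table] += n;
        batches += 1;
        // A short batch means nothing older remains in this table.
        if (n < opts.batchSize) exhausted[table] = true;
      } catch (err) {
        if (isBusyError(err)) return { deleted, batches, busy: true };
        throw err;
      }
    }
    // Alternate tables so neither one starves the other.
    table = table === "traces" ? "wide_events" : "traces";
  }
  return { deleted, batches, busy: false };
}
```

A caller could treat `busy: true` as a signal to lengthen the delay before the next tick, which is one way to realize the back-off behavior.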

Verification

npm run test:pagination passed.
npm run test:retention-paths passed.
npm run test:retention passed on 200k traces / 50k events: indexed predicates used, deleted 5k traces + 5k events in 10 batches (~25ms), fresh rows preserved, WAL checkpoint busy=0.
Large retention run passed: npm run test:retention -- --traces=2100000 --events=200000 --batch=1000 --max-ms=100 deleted 16k traces + 15k events in 31 batches (~102.7ms), indexed predicates used, busy=false.
npm run build passed.
Large API perf verification on 2.1M traces / 200k events: list/search/query endpoints completed in low single-digit ms except stats (~154ms).
Browser proof captured the events tab on the generated large dataset showing large stats with only the capped latest rows loaded and pagination available.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e3ed79c00

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T11:10:42Z

+  }
  return db.prepare(`
-    SELECT * FROM wide_events
+    SELECT id, drop_id, trace_id, service_name, operation_type, field_name, outcome, duration_ms, user_id, error_count, rpc_call_count, created_at


Preserve attributes in dashboard samples

getRecentWideEvents now omits the attributes column, but /api/dashboards/generate still passes these rows into profileWideEvents, which parses attributes to discover candidate fields. As a result, generated dashboards lose all attribute-derived dimensions/metrics (and AI generation context) even when that data exists in wide_events, which materially reduces dashboard quality on real datasets.
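
One possible way to address this, sketched under assumptions, is to keep the narrow projection as the default and let callers that need `attributes` opt back in. The `includeAttributes` option and `buildRecentWideEventsSql` helper are hypothetical, and the `WHERE`/`ORDER BY`/`LIMIT` clauses are guesses, since the diff shows only the column list.

```typescript
// Columns taken from the list projection shown in the diff above.
const LIST_COLUMNS = [
  "id", "drop_id", "trace_id", "service_name", "operation_type", "field_name",
  "outcome", "duration_ms", "user_id", "error_count", "rpc_call_count", "created_at",
] as const;

// Hypothetical option: list views keep the cheap projection, while callers
// that profile attributes (such as dashboard generation) request the column.
function buildRecentWideEventsSql(opts: { includeAttributes?: boolean } = {}): string {
  const columns: string[] = [...LIST_COLUMNS];
  if (opts.includeAttributes) columns.push("attributes");
  // The filter and ordering below are assumed; the original query is not fully shown.
  return `SELECT ${columns.join(", ")} FROM wide_events WHERE drop_id = ? ORDER BY id DESC LIMIT ?`;
}
```

With this shape, the dashboard route would call the query with `{ includeAttributes: true }` while list endpoints keep the default.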


chatgpt-codex-connector · 2026-04-27T11:10:42Z


-export function getRecentWideEvents(dropId: number, limit = 100, offset = 0) {
+export function getRecentWideEvents(dropId: number, limit = 100, offset = 0, beforeId?: number | null) {
+  const lim = clampLimit(limit);


Honor requested dashboard sample limit

getRecentWideEvents now passes every limit through clampLimit (max 2000), but the dashboard generation route still accepts limits up to 20,000 for profiling. Larger requested samples are therefore silently truncated, so generated dashboards can be based on a much smaller and less representative slice than the caller requested.
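
A minimal sketch of one option is to make the cap a parameter, so list endpoints keep the tight default while the dashboard route passes its own larger ceiling. `clampLimitTo` and both constants are hypothetical names; only the 2000 and 20,000 figures come from the comment above.

```typescript
const LIST_MAX_LIMIT = 2000;       // list cap cited in the review comment
const DASHBOARD_MAX_LIMIT = 20000; // sample ceiling the dashboard route accepts, per the comment

// Clamp `limit` into [1, max]; `max` defaults to the list cap.
function clampLimitTo(limit: number, max: number = LIST_MAX_LIMIT): number {
  if (!Number.isFinite(limit)) return Math.min(100, max);
  return Math.min(Math.max(Math.trunc(limit), 1), max);
}
```

An alternative would be to keep a single cap and have the route reject or report oversized requests, so that truncation is at least visible to the caller.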

Useful? React with 👍 / 👎.

feat: optimize large dataset queries and retention

5e3ed79

peter-olom merged commit 03f4cf8 into main Apr 27, 2026
1 check passed

peter-olom merged 1 commit into main from feat/large-dataset-performance-retention