Skip to content

feat: annotation queue testset export#4261

Open
ashrafchowdury wants to merge 31 commits intomainfrom
feat/frontend-annotation-queue-testset
Open

feat: annotation queue testset export#4261
ashrafchowdury wants to merge 31 commits intomainfrom
feat/frontend-annotation-queue-testset

Conversation

@ashrafchowdury
Copy link
Copy Markdown
Contributor

@ashrafchowdury ashrafchowdury commented May 5, 2026

What's new

  • Added a feature to create testset from the annotation queue, from both traces and test case types
  • Extend the toast message function to receive links and link-text for or demand entity navigation
  • Improved the annotation focus view ui to display the keyboard shortcuts

Fixes

  • Fix chat view rendering issue on trace drawer (that changes reflected on annotation page)
  • Fix the table column rendering issue to avoid redundant information display for messages specifically
  • Fixed the table to render the evaluator column properly

QA scope (Annotation-queue)

  • Create/update testset from both the focus and all tab
  • From the focus tab you should be able to add the currently viewed scenario to testsets
  • From the all view, you should be able to select scenarios from the table add those to testset
  • From the all view, you should be able to add the entire queue to the test set if you don't select any scenarios from the table
  • It should remember the last used testset on the commit modal

claude and others added 9 commits May 4, 2026 14:56
Adds PRD, RFC, and README to docs/designs/testset-annotation-queue/.

The design covers:
- Per-scenario "Add to Testset" button in the annotate tab
- "Done with queue" screen supporting both trace and testcase queues
- All-annotations tab with row selection and upgraded commit modal
- New AddToTestsetModal component with EntityPicker integration
- Controller actions for addScenariosToTestset (trace + testcase paths)
- Default target testset heuristics and last-used persistence

https://claude.ai/code/session_01B2uQKidAr1KJ4CR9sroY2H
Key corrections:
- Replace new AddToTestsetModal with EntityCommitModal + renderModeContent
- Replace useState for selected testset with pendingTestsetSelectionAtom
  in annotationSessionController — survives re-renders and can be read
  imperatively by addScenariosToTestset without closure capture
- selectedScenarioIdsAtom for row selection (also atom, not useState)
- openAddToTestsetModal seeds pendingTestsetSelectionAtom from default
- addScenariosToTestset reads target testset from atom, not payload
- Add state atom summary table documenting all atom lifecycles

https://claude.ai/code/session_01B2uQKidAr1KJ4CR9sroY2H
Inputs (agData.inputs) spread into N columns — one per input key.
Outputs (agData.outputs) always map to a single "output" column
regardless of value shape, matching extractOutputs() behaviour in
trace/utils/selectors.ts which treats outputs as a leaf.
Annotation values add one column per evaluator slug.
…n reset

1. Trace annotation resolution: use scenarioAnnotationsAtomFamily(scenarioId)
   not a raw query by traceId. The atom handles step-based resolution to avoid
   cross-queue bleed (as documented in the controller).

2. Scope label when scope="all": addToTestsetScenarioIds() is empty for the
   "all" case — read actual count from scenarioIds() instead of falling back
   to the "all" string.

3. Row selection reset: addScenariosToTestset clears selectedScenarioIdsAtom
   after a successful export so stale selections don't persist.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Error Error May 8, 2026 11:41am

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 5, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds an "Add to Testset" feature: controller atoms/actions and types for modal/export jobs, background export implementation with row builders/remapping for traces/testcases, UI wiring and modal integration, annotation column child support, and related testset API adjustments.

Changes

Add to Testset Feature

Layer / File(s) Summary
Type Shape
web/packages/agenta-annotation/src/state/types.ts, web/packages/agenta-annotation-ui/src/components/AnnotationSession/type.ts
Adds AnnotationOutputColumnDef and optional outputColumns on AnnotationDataColumnDef; introduces AddToTestsetTargetSelection type for testset picker.
Controller State & API Types
web/packages/agenta-annotation/src/state/controllers/annotationSessionController.ts, web/packages/agenta-annotation/src/state/index.ts, web/packages/agenta-annotation/src/index.ts
New exported types AddToTestsetExportJob/AddToTestsetScope; adds atoms/selectors/actions/get/set for modal lifecycle, pending selection, selected scenario IDs, export job state, canAddToTestset, and session-close cleanup.
Export Row Builders & Normalization
web/packages/agenta-annotation/src/state/testsetSync.ts
Introduces helpers to read/flatten annotation outputs, build evaluator slugs, apply output entries, normalize trace content, and exported builders buildTraceTestsetRows and buildTestcaseExportRows; refactors buildTestsetSyncPreview to use shared helpers.
Export Execution & Remapping
web/packages/agenta-annotation/src/state/controllers/annotationSessionController.ts
Implements addScenariosToTestsetAtom background flow: resolves export scope, waits for annotation/trace data, builds rows, queries existing testset revision columns, remaps rows to existing columns or creates/patches revisions, updates job state, invalidates queries, and reports status.
Annotation Column Refactor
web/packages/agenta-annotation-ui/src/components/AnnotationSession/ScenarioListView.tsx
Annotation parent columns may now expose concrete outputColumns (fallback to outputKeys); export column state carries per-child annotationDef, outputKey, and title; export label/value resolution uses title.
UI Wiring & Modal Integration
web/packages/agenta-annotation-ui/src/components/AnnotationSession/index.tsx, .../SessionNavigation.tsx, .../FocusView.tsx, .../ScenarioListView.tsx
Integrates EntityCommitModal for Add-to-Testset, wires openAddToTestsetModal, pending selection handlers, submit flow, one-time success toast, and controller actions; adds Add-to-Testset buttons (FocusView, SessionNavigation, ScenarioListView); FocusView props signature reduced (removed onSyncToTestset/isSyncing).
New UI Components & Constants
web/packages/agenta-annotation-ui/src/components/AnnotationSession/constants.ts, .../EmptyQueueState.tsx, .../SessionHeaderRight.tsx, .../SessionTitle.tsx
Adds session tabs/constants, commit modes, ADD_TO_TESTSET_TARGET_ADAPTER, and new components EmptyQueueState, SessionHeaderRight, SessionTitle.
EntityPicker / Commit UX Changes
web/packages/agenta-entity-ui/src/selection/.../types.ts, .../CascadingVariant.tsx, .../useCascadingMode.ts, .../modals/commit/*
Adds optional onDeselect to cascading picker props and hooks; useCascadingMode invokes onDeselect when selection becomes null after user interaction; reorders EntityCommitContent elements; commit modal success path now closes modal without immediate reset.
Testset API Changes
web/packages/agenta-entities/src/testset/api/api.ts, .../mutations.ts, .../api/index.ts, .../index.ts
fetchRevisionWithTestcases accepts optional testcaseLimit and may use /retrieve; adds fetchLatestRevisionWithTestcases; commitRevision accepts testsetVariantId and forwards it in payload.
Public Export Propagation
web/packages/agenta-annotation/src/state/controllers/index.ts, web/packages/agenta-annotation/src/state/index.ts, web/packages/agenta-annotation/src/index.ts
Re-exports AddToTestsetExportJob and AddToTestsetScope across controller/state package entry points.

sequenceDiagram
participant User
participant UI as AnnotationSession UI
participant State as annotationSessionController
participant API as Testset API
participant DB as Database

User->>UI: Click "Add to Testset" (single/selected/all)
UI->>State: openAddToTestsetModal({scope, scenarioIds})
State->>UI: Modal opens
User->>UI: Choose existing testset or "New" and submit
UI->>State: addScenariosToTestset({scope, mode, selection})
State->>State: Resolve scenarios, wait for annotations/traces
State->>State: buildTraceTestsetRows / buildTestcaseExportRows
State->>API: fetch revision(s) / latest revision with testcases
API->>DB: retrieve testset revision & columns
State->>State: remap rows to existing columns or prepare new revision
State->>API: create or patch revision (commit)
API->>DB: persist revision changes
API-->>State: success
State->>UI: set job status "success", close modal, invalidate queries
UI-->>User: show success toast

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly Related PRs

  • Agenta-AI/agenta#4237: Touches Add-to-Testset export/mapping workflow and overlaps on testset export logic.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.00% which is insufficient. The required threshold is 60.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No pull request description was provided by the author, making it impossible to assess whether the description relates to the changeset. Add a pull request description explaining the feature, scope of changes, and any relevant implementation details or testing notes.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: annotation queue testset export' directly describes the main feature added: enabling testset export functionality within annotation queues.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/frontend-annotation-queue-testset

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
docs/designs/testset-annotation-queue/PRD.md (1)

42-48: 💤 Low value

Add blank lines around tables for markdown compliance.

The tables in this document are missing surrounding blank lines, which causes markdownlint warnings. This affects tables at lines 42, 89, 107, and 118.

📝 Example fix for the competitive reference table
 ## Competitive Reference
 
 ### LangSmith
+
 | Capability | Supported |
 |------------|-----------|
 | Create or extend a testset from both test cases and traces | Yes |
 | Add a single scenario to a testset | Yes |
 | Add to a testset from the focused/annotate tab | Yes |
 | Select table rows and add them to a testset | Yes |
+
 LangSmith supports all four capabilities; Agenta currently supports none of them in full.
docs/designs/testset-annotation-queue/RFC.md (1)

18-57: 💤 Low value

Add language specifiers to fenced code blocks.

Several code blocks in this RFC lack language specifiers, which affects syntax highlighting and readability. The architecture diagram at line 18 and examples at lines 244, 361, 366, 385, 391, 485, 500, 517 should specify a language (use text or plaintext for ASCII diagrams).

📝 Example fix for ASCII diagram
-```
+```text
 User action (button click)
         │
         ▼

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 22636d88-f5f8-408b-a088-51bb814f81e5

📥 Commits

Reviewing files that changed from the base of the PR and between bd447c0 and ee683a1.

📒 Files selected for processing (11)
  • docs/designs/testset-annotation-queue/PRD.md
  • docs/designs/testset-annotation-queue/README.md
  • docs/designs/testset-annotation-queue/RFC.md
  • web/packages/agenta-annotation-ui/src/components/AnnotationSession/FocusView.tsx
  • web/packages/agenta-annotation-ui/src/components/AnnotationSession/ScenarioListView.tsx
  • web/packages/agenta-annotation-ui/src/components/AnnotationSession/SessionNavigation.tsx
  • web/packages/agenta-annotation-ui/src/components/AnnotationSession/index.tsx
  • web/packages/agenta-entity-ui/src/modals/commit/components/EntityCommitContent.tsx
  • web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/types.ts
  • web/packages/agenta-entity-ui/src/selection/components/UnifiedEntityPicker/variants/CascadingVariant.tsx
  • web/packages/agenta-entity-ui/src/selection/hooks/modes/useCascadingMode.ts

@ashrafchowdury ashrafchowdury marked this pull request as ready for review May 5, 2026 12:35
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. feature Frontend labels May 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
web/packages/agenta-annotation-ui/src/components/AnnotationSession/ScenarioListView.tsx (1)

552-569: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep the testcase fallback key separate from the metric path.

AnnotationOutputKeyCell passes outputKey into useAnnotationCellFallback(), and after Line 912 that value is often the full mapped path (attributes.ag.data.outputs.score, etc.). The fallback hook indexes the testcase container object by that same string, so grouped annotation cells now miss synced testcase values and render until an annotation exists.

Possible fix
 const AnnotationOutputKeyCell = memo(function AnnotationOutputKeyCell({
     scenarioId,
     def,
     outputKey,
+    fallbackOutputKey,
     fallbackDataKey,
 }: {
     scenarioId: string
     def: AnnotationColumnDef
     outputKey: string
+    fallbackOutputKey?: string
     fallbackDataKey?: string | null
 }) {
     const runId = useAtomValue(annotationSessionController.selectors.activeRunId()) ?? undefined
     const PopoverWrapper = useMetricPopoverWrapper()
     const {fallbackValue, isPending} = useAnnotationCellFallback(
         scenarioId,
         fallbackDataKey,
-        outputKey,
+        fallbackOutputKey ?? outputKey,
     )
                         render: (_value: unknown, record: ScenarioTableRow) => (
                             <AnnotationOutputKeyCell
                                 scenarioId={record.scenarioId}
                                 def={outputColumn.annotationDef}
                                 outputKey={outputColumn.annotationDef.path ?? outputColumn.title}
+                                fallbackOutputKey={outputColumn.title}
                                 fallbackDataKey={def.fallbackDataKey}
                             />
                         ),

Also applies to: 899-913


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e691f741-e404-4527-b6c9-1dbc7e659952

📥 Commits

Reviewing files that changed from the base of the PR and between ee683a1 and 7bea0bc.

📒 Files selected for processing (12)
  • web/packages/agenta-annotation-ui/src/components/AnnotationSession/ScenarioListView.tsx
  • web/packages/agenta-annotation/src/index.ts
  • web/packages/agenta-annotation/src/state/controllers/annotationSessionController.ts
  • web/packages/agenta-annotation/src/state/controllers/index.ts
  • web/packages/agenta-annotation/src/state/index.ts
  • web/packages/agenta-annotation/src/state/testsetSync.ts
  • web/packages/agenta-annotation/src/state/types.ts
  • web/packages/agenta-entities/src/testset/api/api.ts
  • web/packages/agenta-entities/src/testset/api/index.ts
  • web/packages/agenta-entities/src/testset/index.ts
  • web/packages/agenta-entity-ui/src/modals/commit/components/EntityCommitModal.tsx
  • web/packages/agenta-entity-ui/src/modals/commit/state.ts
💤 Files with no reviewable changes (2)
  • web/packages/agenta-entity-ui/src/modals/commit/state.ts
  • web/packages/agenta-entity-ui/src/modals/commit/components/EntityCommitModal.tsx
✅ Files skipped from review due to trivial changes (1)
  • web/packages/agenta-annotation/src/state/index.ts

Comment thread web/packages/agenta-annotation/src/state/testsetSync.ts
Comment thread web/packages/agenta-entities/src/testset/api/api.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 5, 2026

Railway Preview Environment

Preview URL https://gateway-production-eac2.up.railway.app/w
Project agenta-oss-pr-4261
Image tag pr-4261-931b7f4
Status Deployed
Railway logs Open logs
Workflow logs View workflow run
Updated at 2026-05-06T19:39:14.604Z

- refactor code
- rendering json data in beautified mode by default
@ashrafchowdury ashrafchowdury marked this pull request as draft May 7, 2026 11:11
@ashrafchowdury ashrafchowdury marked this pull request as ready for review May 7, 2026 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Frontend size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement annotation queue test set flow

3 participants