
Update evals docs: clarify global evaluators opt-in behavior (PR #14682)#618

Open
claude[bot] wants to merge 1 commit into main from docs/TSP-1230

Conversation


claude[bot] commented May 12, 2026

Summary

  • Clarified that global Evaluators are not selected by default in evaluation modals (Run Test Set, Run Scenario, Evaluate Selected Tasks) — users must explicitly opt in via Additional global checks
  • Updated both the Global Evaluators note in the "Understanding Evaluators" section and the numbered steps in "Running evaluations" to reflect this opt-in behavior
  • Added FAQ entry documenting the 10-evaluator limit per scenario (increased from 5 to 10)

Closes Linear issue TSP-1230
Relates to GitHub PR #14682

Test plan

  • Verify "Running evaluations" step 2 accurately describes the opt-in flow for Additional global checks
  • Verify the Note under "Understanding Evaluators > Global Evaluators" clearly communicates opt-in behavior
  • Verify the new FAQ accordion renders correctly and content is accurate
  • Check all headings are sentence case and no banned words used

🤖 Generated with Claude Code

Global evaluators are not selected by default in evaluation modals.
Users must explicitly opt in via 'Additional global checks'. Also adds
FAQ entry documenting the 10-evaluator limit per scenario.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
claude[bot] added the docs-drafter (Documentation drafted by Claude) label May 12, 2026

linear[bot] commented May 12, 2026

TSP-1230

@github-actions (Contributor)

🎯 Vibe check

Reviewed: 1 file (1 with issues, 0 clean)

Scores

| Dimension | Score | What's holding it back |
|---|---|---|
| 🟡 Consistency | 7/10 | Bold label inside `<Info>` callout (line 8); "tool" lowercase in multiple places where Relevance AI's Tool product feature is meant (lines 96–101, 157–160); bullet list inside an `<Accordion>` (lines 362–367). |
| 🟢 Technical clarity | 9/10 | Edge cases are well covered (truncation behavior, global Evaluator opt-in, credit breakdown). UI element names are specific. Minor: "tool" vs "Tool" inconsistency could cause product-navigation confusion. |
| 🟢 Non-technical clarity | 9/10 | Good overview before instructions; example scenarios in accordions are excellent anchors. FAQ is thorough. |
| 🟡 Structure | 7/10 | Best practices section uses `<CardGroup>` for non-navigable tips; bullet list inside FAQ accordion; no closing CTA for what is largely a concept/overview page. |

Score key: 🟢 9–10, 🟡 6–8, 🔴 1–5.

Overall vibe: Solid, thorough feature documentation — the example scenarios, edge-case callouts (truncation, Evaluator opt-in), and FAQs are genuinely useful and show real craft. A handful of mechanical CLAUDE.md violations (bold label in callout, bullet list inside accordion, inconsistent product-term casing) need tidying, but the content and organization are strong.

🔧 Issues (5)
  • build/agents/build-your-agent/evals.mdx:8 — **Rollout Status**: is a bold label inside a callout. CLAUDE.md explicitly prohibits bold labels inside callouts. Drop the label; the content stands on its own: Evals is currently being rolled out progressively, starting with Enterprise customers. If you're an Enterprise customer and don't see this feature yet, reach out to your account manager to discuss access.

  • build/agents/build-your-agent/evals.mdx:96,100,101 — Inside the "Tool Usage" accordion, "tool" is lowercase three times when referring to Relevance AI's Tool product feature. "Checks whether a specific tool was used" → "…a specific Tool was used". Same for "Select the tool to check for", "Whether the tool was used", and "if the tool was used".

  • build/agents/build-your-agent/evals.mdx:157–160 — Same capitalization issue in the Tool simulations description: "emulate tool usage without actually calling the tools" → "…without actually calling the Tools"; "Select a tool to simulate" → "Select a Tool to simulate". (The generic phrase "tool usage" in the same sentence is acceptable lowercase since it's a description, not the feature name.)

  • build/agents/build-your-agent/evals.mdx:362–367 — Bullet list inside an <Accordion> (FAQ: "How are credits calculated for evaluations?"). CLAUDE.md requires flowing sentences in accordion content. Convert to prose: "Credits for each scenario are calculated from three components: the Agent task run (the conversation itself), the simulator LLM that plays the user persona, and each Evaluator LLM that scores the conversation — both scenario-level and global."
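A minimal sketch of the converted accordion, assuming the Mintlify-style `<Accordion>` component already used on the page and the FAQ title quoted above:

```mdx
<Accordion title="How are credits calculated for evaluations?">
  Credits for each scenario are calculated from three components: the Agent
  task run (the conversation itself), the simulator LLM that plays the user
  persona, and each Evaluator LLM that scores the conversation — both
  scenario-level and global.
</Accordion>
```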

  • build/agents/build-your-agent/evals.mdx:383–385 — The closing sentence of the "What happens when a conversation is truncated?" accordion reads "…disabling truncation and selecting a model with a larger context window is preferable." The subject-verb agreement is slightly off (two actions, singular is). Change to: "…disabling truncation and selecting a model with a larger context window are preferable."

🧩 Component suggestions (1)
  • build/agents/build-your-agent/evals.mdx:331–347 — The "Best practices" section uses <CardGroup cols={2}> for five non-navigable advisory tips. CLAUDE.md says <CardGroup> is not appropriate for "best practices that read naturally as flowing bullets." These tips read like a bullet list with a short title per item. A <CardGroup> is more defensible when items are at least navigable or represent equal parallel choices; here they're just advice. Consider converting to a plain numbered or bulleted list, or an <AccordionGroup> if you want to keep them skimmable.
🏗️ Page structure (1)
  • build/agents/build-your-agent/evals.mdx — The page is primarily a concept + overview page for a new feature but has no closing CTA. CLAUDE.md says concept and overview pages should end with a CTA so readers know where to go after learning what the feature is. A ## What's next? pointing to /build/agents/build-your-agent/triggers (to learn about automating Agents more broadly) and /build/agents/build-your-agent/build-overview (for a full picture of the build tab) would round the page off naturally. A link to contacting the account manager for Evals access could also be useful given the rollout is still in progress.
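One possible shape for the suggested closing section, assuming Mintlify-style `<Card>`/`<CardGroup>` components; the hrefs come from the suggestion above, while the titles and card descriptions are illustrative:

```mdx
## What's next?

<CardGroup cols={2}>
  <Card title="Triggers" href="/build/agents/build-your-agent/triggers">
    Learn how to automate your Agents more broadly.
  </Card>
  <Card title="Build overview" href="/build/agents/build-your-agent/build-overview">
    Get the full picture of the build tab.
  </Card>
</CardGroup>
```

A `<CardGroup>` is defensible here, unlike in the best-practices section, because these items are navigable links.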
✅ Clean files (0)

(No files were fully clean.)

🔋 Credit usage
| Item | Count |
|---|---|
| Files reviewed | 1 |
| Context pages read | 2 |
| Total lines processed | ~531 |

Files read: build/agents/build-your-agent/evals.mdx (395 lines), build/agents/build-your-agent/build-overview.mdx (25 lines), build/agents/build-your-agent/alerts.mdx (111 lines)
