feat(llm-obs): add spans analytics command with multi-dimension group-by by mbldatadog · Pull Request #290 · datadog-labs/pup

mbldatadog · 2026-04-02T15:47:35Z

Note

RFC — please do not merge. This is a draft for team/customer feedback before we consider it ready for production.

Summary

Adds pup llm-obs spans analytics to support aggregating LLM Obs span data grouped by multiple dimensions simultaneously — e.g. span_name + @meta.error.type + @meta.error.message, or @meta.model_name + @meta.error.type + service_tier.

Changes

src/main.rs — new Analytics variant in LlmObsSpansActions with flags: --group-by, --compute, --limit, --query, --from, --to, --ml-app
src/commands/llm_obs.rs — new spans_analytics() function; calls POST /api/unstable/llm-obs-query-rewriter/timeseries; renders results as a table
src/test_commands.rs — 6 integration tests (200, 401, 403, 500, invalid --from, no auth) + 10 unit tests for the pure helper logic
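Here is a minimal sketch of how the `--group-by` value might be split into facets. The helper name `parse_group_by` is hypothetical and is not taken from `src/commands/llm_obs.rs`. It assumes that an omitted or blank flag means "no facets", which gives a single global count. That is one of the options raised in the open questions below.

```rust
/// Hypothetical helper (not the PR's actual code): split a `--group-by`
/// value such as "span_name,@meta.error.type" into facet names.
/// Whitespace around each facet is trimmed and empty entries are dropped,
/// so `None`, "" and " , " all yield an empty list (no grouping).
fn parse_group_by(raw: Option<&str>) -> Vec<String> {
    match raw {
        None => Vec::new(),
        Some(s) => s
            .split(',')
            .map(str::trim)
            .filter(|facet| !facet.is_empty())
            .map(String::from)
            .collect(),
    }
}

fn main() {
    // First usage example from this PR, with stray whitespace added.
    let facets = parse_group_by(Some("span_name, @meta.error.type,@meta.error.message"));
    println!("{:?}", facets);
}
```

With this behavior, `--group-by "span_name, @meta.error.type"` and `--group-by "span_name,@meta.error.type"` produce the same facets.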

Usage

# Split by error type + message + span name
pup llm-obs spans analytics --from 1d \
  --group-by "span_name,@meta.error.type,@meta.error.message"

+-----------------+------------------+------------------------+-------+
| span_name       | @meta.error.type | @meta.error.message    | count |
+=====================================================================+
| embeddings-call | timeout          | upstream connect error | 127   |
|-----------------+------------------+------------------------+-------|
| llm-call        | rate_limit_error | Rate limit exceeded    | 43    |
|-----------------+------------------+------------------------+-------|
| llm-call        |                  |                        | 9821  |
+-----------------+------------------+------------------------+-------+

# Filter to llm spans, split by model + error + custom tag
pup llm-obs spans analytics --from 1d --query "span.kind:llm" \
  --group-by "@meta.model_name,@meta.error.type,service_tier"

+-------------------+------------------+--------------+-------+
| @meta.model_name  | @meta.error.type | service_tier | count |
+=============================================================+
| gpt-4o            | rate_limit_error | standard     | 312   |
|-------------------+------------------+--------------+-------|
| gpt-4o            |                  | standard     | 14082 |
|-------------------+------------------+--------------+-------|
| claude-3-5-sonnet | timeout          | premium      | 17    |
|-------------------+------------------+--------------+-------|
| claude-3-5-sonnet |                  | premium      | 5930  |
+-------------------+------------------+--------------+-------+

# Scoped to a specific app
pup llm-obs spans analytics --from 6h --ml-app "my-rag-app" \
  --group-by "span_name,@meta.error.type" --limit 20

+-----------+------------------+-------+
| span_name | @meta.error.type | count |
+======================================+
| retrieval |                  | 4201  |
|-----------+------------------+-------|
| llm-call  |                  | 3887  |
|-----------+------------------+-------|
| llm-call  | timeout          | 94    |
|-----------+------------------+-------|
| retrieval | not_found        | 23    |
+-----------+------------------+-------+

Open questions for feedback

⚠️ Endpoint: llm-obs-query-rewriter/timeseries returns a 403 for Bearer token auth (requires session/cookie) and wraps the response in {"eventQueryResponse": "<escaped proto JSON>"} — both bad for a CLI. What's the right endpoint to call? Is there an MCP-layer equivalent, or should we call the EvP analytics endpoint directly?
Should --group-by be required, or silently no-op when omitted (returns a single global count)?
Any preferred field name conventions for dimensions (e.g. @meta.error.type vs error_type)?

🤖 Generated with Claude Code

Adds `pup llm-obs spans analytics` to support aggregating LLM Obs span data grouped by one or more dimensions simultaneously, e.g. span_name + error type + error message, or model name + service tier + error type. Calls the llm-obs-query-rewriter timeseries endpoint (same backend as the LLM Obs Analytics tab in the UI).

- New CLI: `pup llm-obs spans analytics --group-by --compute --limit`
- Supports --query, --from, --to, --ml-app filters
- 16 new tests (unit + integration, including 401/403/500 failure cases)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

mbldatadog · 2026-04-02T15:58:27Z

Warning

Auth issue / endpoint TBD — when testing, POST /api/unstable/llm-obs-query-rewriter/timeseries returns a 403 with Bearer token auth (the endpoint requires session/cookie auth). Additionally, successful responses from this endpoint wrap the payload in {"eventQueryResponse": "<escaped proto JSON>"}, which is ugly for a CLI consumer.

The expected output shape in the PR description is representative but constructed — not from a live call. The first open question covers this; flagging here so it's easy to find.

platinummonkey · 2026-04-02T15:58:49Z

Endpoint: llm-obs-query-rewriter/timeseries returns a 403 for Bearer token auth (requires session/cookie)

The fix would be to work with @srosenthal-dd / AAA to ensure the OAuth scope exists and is public


vpatel22 · 2026-04-02T19:23:18Z

Open questions for feedback

Should --group-by be required, or silently no-op when omitted (returns a single global count)?

I'd say we should keep the behavior we have in the LLM Obs UI, where group-by is not required. Without a group-by, the UI just returns a global count (of either traces or spans, depending on what you have selected).

Any preferred field name conventions for dimensions (e.g. @meta.error.type vs error_type)?

For field name conventions, I think we should just use whatever we'd use when querying (which I believe is @meta.error.type rather than error_type).

Additional "Features"

- A window rollup for additional "grouping". It's probably worth bounding this by the --from value (e.g. for 1d, the smallest window is 30m/60m; for >7d, the rollup is 6h/12h). Not sure what the optimal grouping is, but whatever makes sense from a computational standpoint.
- Grouping by / querying by custom tags that have been added
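The window-rollup bound could be expressed roughly like this. This is a sketch only. It parses a relative `--from` value such as `6h` or `1d` into seconds, then derives the smallest rollup window allowed for that range. Both helper names are hypothetical. The 30m floor for ranges up to 1d and the 6h floor beyond 7d follow the comment. The 1h floor for ranges between 1 and 7 days is an assumption made to fill the gap.

```rust
const HOUR: u64 = 3_600;
const DAY: u64 = 86_400;

/// Hypothetical helper: parse a relative `--from` value such as "6h" or "1d"
/// into seconds. Returns None for malformed input or a zero-length range.
fn parse_relative_secs(s: &str) -> Option<u64> {
    let s = s.trim();
    let unit = s.chars().last()?;
    // Everything before the unit character must be a plain integer.
    let amount: u64 = s[..s.len() - unit.len_utf8()].parse().ok()?;
    let unit_secs = match unit {
        's' => 1,
        'm' => 60,
        'h' => HOUR,
        'd' => DAY,
        'w' => 7 * DAY,
        _ => return None,
    };
    amount.checked_mul(unit_secs).filter(|&secs| secs > 0)
}

/// Hypothetical helper: smallest rollup window (in seconds) for a range.
fn min_rollup_secs(range_secs: u64) -> u64 {
    if range_secs <= DAY {
        30 * 60 // up to 1 day: 30-minute windows (per the comment)
    } else if range_secs <= 7 * DAY {
        HOUR // ASSUMPTION: 1-hour windows between 1 and 7 days
    } else {
        6 * HOUR // beyond 7 days: 6-hour windows (per the comment)
    }
}

fn main() {
    for from in ["6h", "1d", "3d", "30d"] {
        let range = parse_relative_secs(from).expect("valid --from");
        println!("--from {}: min rollup {}s", from, min_rollup_secs(range));
    }
}
```

Keeping the floor as a pure function of the range makes it straightforward to unit test next to the existing invalid `--from` case.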

Validation
Want to make sure something like pup llm-obs trace analytics --from 1d --query "@child.@meta.error.type:ToolCallValidationException" --group-by "@name,@meta.error.message" would work!

srosenthal-dd · 2026-04-07T23:49:11Z

Hi from AAA! It looks like the LLM Observability scopes are set up correctly, but a few of the backend APIs (https://github.com/DataDog/dd-source/blob/main/domains/ml-observability/apps/apis/llm-obs-query-rewriter/main.go) are using the BuiltInFeatures permission, which isn't exposed yet for most customers as a configurable scope. Do you know why? Could the APIs be updated to use LlmObservabilityRead or LlmObservabilityWrite?

Happy to help if that's at all unclear! My main mission right now is to make API auth "just work" everywhere.

mbldatadog · 2026-04-08T19:38:33Z

Ah, thanks @srosenthal-dd. Yep, I wanted to make sure we had the OAuth scopes properly plumbed through everywhere before shipping this. Will follow up with you, thanks much...

Replace hand-rolled comfy_table rendering with formatter::output so that spans analytics respects the -o flag (json, yaml, table, csv, tsv) like every other command. Buckets are flattened into a list of row objects before being passed to the formatter.

Add unit tests covering single/multi-facet, no-facet aggregate, empty buckets, missing buckets key, and null fallbacks.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
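The bucket-flattening step can be sketched roughly as follows, using only the standard library. Several parts are assumptions. The `Bucket` shape (a map from facet name to an optional value, plus a count) is assumed about the decoded response and is not the endpoint's documented schema. `flatten_buckets` and `to_tsv` are hypothetical names, not `formatter::output`. Null or missing facet values fall back to an empty string. With no facets, each bucket becomes a row holding only `count`.

```rust
use std::collections::BTreeMap;

/// ASSUMED shape of one decoded bucket: facet values keyed by facet name
/// (None models a JSON null) plus the aggregated count.
struct Bucket {
    by: BTreeMap<String, Option<String>>,
    count: u64,
}

/// Hypothetical helper: one row object (ordered column/value pairs) per
/// bucket, columns in the requested facet order, followed by "count".
fn flatten_buckets(facets: &[String], buckets: &[Bucket]) -> Vec<Vec<(String, String)>> {
    buckets
        .iter()
        .map(|b| {
            let mut row: Vec<(String, String)> = facets
                .iter()
                .map(|f| {
                    // Null or absent facet values become an empty cell.
                    let v = b.by.get(f).cloned().flatten().unwrap_or_default();
                    (f.clone(), v)
                })
                .collect();
            row.push(("count".to_string(), b.count.to_string()));
            row
        })
        .collect()
}

/// Hypothetical TSV renderer: header from the first row's column names,
/// then one line per row. No rows produce an empty string.
fn to_tsv(rows: &[Vec<(String, String)>]) -> String {
    let mut out = String::new();
    if let Some(first) = rows.first() {
        let header: Vec<&str> = first.iter().map(|(k, _)| k.as_str()).collect();
        out.push_str(&header.join("\t"));
        out.push('\n');
        for row in rows {
            let cells: Vec<&str> = row.iter().map(|(_, v)| v.as_str()).collect();
            out.push_str(&cells.join("\t"));
            out.push('\n');
        }
    }
    out
}

fn main() {
    let facets = vec!["span_name".to_string(), "@meta.error.type".to_string()];
    let mut by = BTreeMap::new();
    by.insert("span_name".to_string(), Some("llm-call".to_string()));
    by.insert("@meta.error.type".to_string(), None);
    let buckets = vec![Bucket { by, count: 3887 }];
    print!("{}", to_tsv(&flatten_buckets(&facets, &buckets)));
}
```

Flattening once into row objects means every output format (table, csv, tsv, json, yaml) can consume the same structure. The formatting choice is then only a rendering concern.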

mbldatadog · 2026-04-09T20:31:32Z

okay @srosenthal-dd - I've got a PR up to try and plumb the correct scopes through, I need to take a closer look at it tomorrow, will follow up with you then!

mbldatadog · 2026-04-17T17:04:09Z

Hey @vpatel22, it turns out this will require deeper auth surgery on our backend than my initial attempt. I'm working through it with @srosenthal-dd on my end and will let you know where things net out on Tuesday. It also turns out that Greg on our end is already building a similar feature into the DD UI, so I'll poke him next week and see what the plans are for that...

mbldatadog and others added 2 commits April 2, 2026 11:45

style: run cargo fmt

8fe7ded

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

feat(llm-obs): render spans analytics as a table instead of raw JSON

f3513f9

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

platinummonkey reviewed Apr 2, 2026


Comment thread src/commands/llm_obs.rs Outdated

platinummonkey added enhancement New feature or request product:ai-observability labels Apr 4, 2026

mbldatadog wants to merge 4 commits into datadog-labs:main from mbldatadog:feat/llm-obs-analytics

4 participants
