
feat(llm-obs): add spans analytics command with multi-dimension group-by #290

Open
mbldatadog wants to merge 4 commits into datadog-labs:main from mbldatadog:feat/llm-obs-analytics



@mbldatadog
Contributor

@mbldatadog mbldatadog commented Apr 2, 2026

Note

RFC — please do not merge. This is a draft for team/customer feedback before we consider it ready for production.

Summary

Adds pup llm-obs spans analytics to support aggregating LLM Obs span data grouped by multiple dimensions simultaneously — e.g. span_name + @meta.error.type + @meta.error.message, or @meta.model_name + @meta.error.type + service_tier.

Changes

  • src/main.rs — new Analytics variant in LlmObsSpansActions with flags: --group-by, --compute, --limit, --query, --from, --to, --ml-app
  • src/commands/llm_obs.rs — new spans_analytics() function; calls POST /api/unstable/llm-obs-query-rewriter/timeseries; renders results as a table
  • src/test_commands.rs — 6 integration tests (200, 401, 403, 500, invalid --from, no auth) + 10 unit tests for the pure helper logic
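For reviewers, here is a minimal sketch of how the --group-by value could be split into facet names before the request is built. The helper name `parse_group_by` is hypothetical and is not the function in src/commands/llm_obs.rs. It assumes a comma-separated value, trims whitespace, and drops empty segments, so an omitted or blank flag yields no facets (which would correspond to a single global count if --group-by stays optional).

```rust
/// Hypothetical helper: split a `--group-by` value such as
/// "span_name,@meta.error.type,@meta.error.message" into facet names.
/// Whitespace around each name is trimmed and empty segments are dropped,
/// so `None` or a blank value yields an empty list (no grouping).
fn parse_group_by(raw: Option<&str>) -> Vec<String> {
    match raw {
        None => Vec::new(),
        Some(s) => s
            .split(',')
            .map(str::trim)
            .filter(|facet| !facet.is_empty())
            .map(String::from)
            .collect(),
    }
}
```

For example, `parse_group_by(Some("span_name, @meta.error.type,,@meta.error.message"))` yields the three facet names in their original order, which keeps the table columns in the order the user typed them.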

Usage

# Split by error type + message + span name
pup llm-obs spans analytics --from 1d \
  --group-by "span_name,@meta.error.type,@meta.error.message"
+-----------------+------------------+------------------------+-------+
| span_name       | @meta.error.type | @meta.error.message    | count |
+=====================================================================+
| embeddings-call | timeout          | upstream connect error | 127   |
|-----------------+------------------+------------------------+-------|
| llm-call        | rate_limit_error | Rate limit exceeded    | 43    |
|-----------------+------------------+------------------------+-------|
| llm-call        |                  |                        | 9821  |
+-----------------+------------------+------------------------+-------+
# Filter to llm spans, split by model + error + custom tag
pup llm-obs spans analytics --from 1d --query "span.kind:llm" \
  --group-by "@meta.model_name,@meta.error.type,service_tier"
+-------------------+------------------+--------------+-------+
| @meta.model_name  | @meta.error.type | service_tier | count |
+=============================================================+
| gpt-4o            | rate_limit_error | standard     | 312   |
|-------------------+------------------+--------------+-------|
| gpt-4o            |                  | standard     | 14082 |
|-------------------+------------------+--------------+-------|
| claude-3-5-sonnet | timeout          | premium      | 17    |
|-------------------+------------------+--------------+-------|
| claude-3-5-sonnet |                  | premium      | 5930  |
+-------------------+------------------+--------------+-------+
# Scoped to a specific app
pup llm-obs spans analytics --from 6h --ml-app "my-rag-app" \
  --group-by "span_name,@meta.error.type" --limit 20
+-----------+------------------+-------+
| span_name | @meta.error.type | count |
+======================================+
| retrieval |                  | 4201  |
|-----------+------------------+-------|
| llm-call  |                  | 3887  |
|-----------+------------------+-------|
| llm-call  | timeout          | 94    |
|-----------+------------------+-------|
| retrieval | not_found        | 23    |
+-----------+------------------+-------+

Open questions for feedback

  • ⚠️ Endpoint: llm-obs-query-rewriter/timeseries returns a 403 for Bearer token auth (requires session/cookie) and wraps the response in {"eventQueryResponse": "<escaped proto JSON>"} — both bad for a CLI. What's the right endpoint to call? Is there an MCP-layer equivalent, or should we call the EvP analytics endpoint directly?
  • Should --group-by be required, or silently no-op when omitted (returns a single global count)?
  • Any preferred field name conventions for dimensions (e.g. @meta.error.type vs error_type)?

🤖 Generated with Claude Code

mbldatadog and others added 2 commits April 2, 2026 11:45
Adds `pup llm-obs spans analytics` to support aggregating LLM Obs span
data grouped by one or more dimensions simultaneously — e.g. span_name +
error type + error message, or model name + service tier + error type.

Calls the llm-obs-query-rewriter timeseries endpoint (same backend as
the LLM Obs Analytics tab in the UI).

- New CLI: `pup llm-obs spans analytics --group-by --compute --limit`
- Supports --query, --from, --to, --ml-app filters
- 16 new tests (unit + integration, including 401/403/500 failure cases)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
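The commit adds --from/--to filters and an "invalid --from" failure case. A minimal sketch of validating the relative form used in the examples (1d, 6h) is below. The helper `parse_relative` and the accepted unit suffixes (s, m, h, d, w) are assumptions for illustration only; this is not pup's actual time grammar, and it ignores absolute timestamps entirely.

```rust
use std::time::Duration;

/// Hypothetical helper: parse a relative `--from` value such as "30m",
/// "6h", "1d" or "2w" into a Duration. Anything else is rejected with a
/// message, which is what an "invalid --from" test would exercise.
fn parse_relative(input: &str) -> Result<Duration, String> {
    let s = input.trim();
    // The numeric prefix ends at the first non-digit character.
    let split = s
        .find(|c: char| !c.is_ascii_digit())
        .ok_or_else(|| format!("missing unit in {input:?}"))?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num
        .parse()
        .map_err(|_| format!("missing number in {input:?}"))?;
    let secs_per_unit = match unit {
        "s" => 1,
        "m" => 60,
        "h" => 3_600,
        "d" => 86_400,
        "w" => 604_800,
        _ => return Err(format!("unknown unit {unit:?} in {input:?}")),
    };
    // Guard against overflow for absurdly large values.
    n.checked_mul(secs_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| format!("{input:?} is out of range"))
}
```

Under these assumptions, "1d" parses to 86,400 seconds, while inputs like "1x", "d", or "10" are rejected before any request is sent.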
@mbldatadog
Contributor Author

Warning

Auth issue / endpoint TBD — when testing, POST /api/unstable/llm-obs-query-rewriter/timeseries returns a 403 with Bearer token auth (the endpoint requires session/cookie auth). Additionally, successful responses from this endpoint wrap the payload in {"eventQueryResponse": "<escaped proto JSON>"}, which is ugly for a CLI consumer.

The expected output shape in the PR description is representative but constructed — not from a live call. The first open question covers this; flagging here so it's easy to find.

@platinummonkey
Collaborator

Endpoint: llm-obs-query-rewriter/timeseries returns a 403 for Bearer token auth (requires session/cookie)

The fix would be to work with @srosenthal-dd / AAA to ensure the OAuth scope exists and is public.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@vpatel22

vpatel22 commented Apr 2, 2026

Open questions for feedback

  • Should --group-by be required, or silently no-op when omitted (returns a single global count)?

I would say we should keep the behavior we have in the LLM Obs UI, where group-by is not required. Without it, a single global count (of either traces or spans, depending on what you have selected) is returned.

  • Any preferred field name conventions for dimensions (e.g. @meta.error.type vs error_type)?

For field name conventions, I think we should just use whatever we use for querying (which I believe is @meta.error.type rather than error_type).

Additional "Features"

  1. Including a time-window rollup as an additional "grouping". It's probably worth limiting the rollup based on the --from value (e.g. for 1d, the smallest window is 30m/60m; for >7d, the rollup is 6h/12h). I'm not sure what the optimal grouping is, but whatever makes sense from a computational standpoint.
  2. Grouping and querying by custom tags that have been added
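One way to implement the rollup limit suggested in item 1 is to pick the smallest window from a fixed ladder that keeps the number of time buckets at or below a cap. This is a sketch, not backend behavior: the ladder, the hypothetical helper `rollup_window`, and a cap of 48 buckets are assumptions, chosen so that a 1d span maps to 30m and a 7d span to 6h, in line with the ranges suggested above.

```rust
use std::time::Duration;

/// Assumed ladder of candidate rollup windows in seconds, smallest first:
/// 1m, 5m, 15m, 30m, 1h, 3h, 6h, 12h, 1d.
const LADDER_SECS: [u64; 9] = [60, 300, 900, 1_800, 3_600, 10_800, 21_600, 43_200, 86_400];

/// Hypothetical helper: return the smallest window whose bucket count for
/// `span` is at most `max_buckets`, falling back to the largest window.
fn rollup_window(span: Duration, max_buckets: u64) -> Duration {
    let span_secs = span.as_secs();
    let secs = LADDER_SECS
        .iter()
        .copied()
        // Ceiling division without overflow: full buckets plus a partial one.
        .find(|&w| span_secs / w + u64::from(span_secs % w != 0) <= max_buckets)
        .unwrap_or(LADDER_SECS[LADDER_SECS.len() - 1]);
    Duration::from_secs(secs)
}
```

With `max_buckets = 48`, a 1d span yields a 30m window, a 7d span 6h, and a 14d span 12h; tuning the cap trades chart resolution against query cost.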

Validation
I want to make sure something like pup llm-obs trace analytics --from 1d --query "@child.@meta.error.type:ToolCallValidationException" --group-by "@name,@meta.error.message" would work!

@srosenthal-dd
Contributor

Hi from AAA! It looks like the LLM Observability scopes are set up correctly, but a few of the backend APIs (https://github.com/DataDog/dd-source/blob/main/domains/ml-observability/apps/apis/llm-obs-query-rewriter/main.go) are using the BuiltInFeatures permission, which isn't exposed yet for most customers as a configurable scope. Do you know why? Could the APIs be updated to use LlmObservabilityRead or LlmObservabilityWrite?

Happy to help if that's at all unclear! My main mission right now is to make API auth "just work" everywhere.

@mbldatadog
Contributor Author

Ah, thanks @srosenthal-dd! Yep, I wanted to make sure we had the OAuth scopes properly plumbed through everywhere before shipping this. Will follow up with you, thanks much!

Replace hand-rolled comfy_table rendering with formatter::output so that
spans analytics respects the -o flag (json, yaml, table, csv, tsv) like
every other command. Buckets are flattened into a list of row objects
before being passed to the formatter.

Add unit tests covering single/multi-facet, no-facet aggregate, empty
buckets, missing buckets key, and null fallbacks.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
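The flattening step described in this commit can be sketched as follows. It assumes the buckets have already been deserialized into a hypothetical `Bucket` type (the real code works on the JSON response and hands rows to formatter::output). Facet columns follow the --group-by order, null or missing facet values become empty cells (like the blank @meta.error.type cells in the PR description), and a missing buckets key yields no rows. Falling back to 0 for a null count is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical pre-parsed bucket from the analytics response.
struct Bucket {
    /// Facet name -> value; `None` models a JSON null.
    facets: HashMap<String, Option<String>>,
    /// Aggregate value for the bucket; `None` models a JSON null.
    count: Option<u64>,
}

/// One output row: (column, value) pairs in `--group-by` order, then "count".
type Row = Vec<(String, String)>;

fn flatten_buckets(group_by: &[String], buckets: Option<&[Bucket]>) -> Vec<Row> {
    // A missing "buckets" key is treated like an empty list: no rows.
    let buckets = match buckets {
        Some(b) => b,
        None => return Vec::new(),
    };
    buckets
        .iter()
        .map(|b| {
            let mut row: Row = group_by
                .iter()
                .map(|facet| {
                    // Null or absent facet values fall back to an empty cell.
                    let value = b.facets.get(facet).cloned().flatten().unwrap_or_default();
                    (facet.clone(), value)
                })
                .collect();
            // When `group_by` is empty, each row holds only the "count" column.
            row.push(("count".to_string(), b.count.unwrap_or(0).to_string()));
            row
        })
        .collect()
}
```

Each row is an ordered list of (column, value) pairs, so a generic formatter can render the same data as a table, CSV/TSV, or JSON/YAML objects without command-specific rendering code.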
@mbldatadog
Contributor Author

Okay @srosenthal-dd, I've got a PR up to try to plumb the correct scopes through. I need to take a closer look at it tomorrow and will follow up with you then!

@mbldatadog
Contributor Author

Hey @vpatel22, it turns out this will require deeper auth surgery on our backend than my initial attempt. I'm working through it with @srosenthal-dd and will let you know where things net out on Tuesday. It also turns out that Greg on our end is already building a similar feature into the DD UI, so I'll poke him next week and see what the plans are for that.
