feat(pprof): emit per-sample process_language label by r1viollet · Pull Request #552 · DataDog/ddprof

r1viollet · 2026-05-29T15:03:17Z

What

Emit a per-sample pprof label process_language carrying a heuristic native-language family for the profiled process' main executable (go / rust / cpp). When detection is inconclusive, no label is set and the existing language="native" tag continues to be the sole signal.

Why

Today every native sample is reported as language="native" — Rust, C++, plain C and (mostly) Go are indistinguishable on the backend side. With this label, the backend can refine the per-process language for the dominant case where one language clearly drives a binary, without splitting profiles.

Mirrors the approach we will likely use in the OTel eBPF profiler / Full-Host Profiler (also no DWARF involvement, also per-sample label rather than a global tag), so the two producers will be consistent.

How

New NativeLanguage enum + cheap detection (include/native_language.hpp / src/native_language.cc):
- Go → .go.buildinfo / .gopclntab ELF section
- Rust → .note.rustc section, or rustc-mangled symbols in .symtab/.dynsym (_R… v0, or legacy …17h<16 hex>E tail)
- Cpp → any _Z… Itanium-mangled symbol
- else → kUnknown (label omitted)
Symbol-table scan bounded to 4096 entries.
Detection runs once per PID, lazily on first sample, against the Elf* already cached by libdwfl (dwfl_module_getelf) — no extra file open.
Result cached on Process, cleared via the existing ProcessHdr::clear(pid) on process exit.
A /proc/<pid>/exe fallback path is kept for callers without an Elf* in hand (tests / early bring-up).
Emitted as a pprof per-sample label process_language alongside process_id, process_name, etc. (k_max_pprof_labels raised 8 → 9).
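
The symbol-name part of the detection above can be sketched as follows. This is a minimal illustration, not the code in src/native_language.cc: the function names, the `std::vector<std::string_view>` input (standing in for names read from .symtab/.dynsym) and the choice to let a Rust hit win over any earlier `_Z` hit within the bounded scan are assumptions. Legacy Rust symbols also start with `_ZN`, so the Rust tail check has to run before the Itanium prefix check.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical mirror of the PR's enum; the real definition may differ.
enum class NativeLanguage { kUnknown, kGo, kRust, kCpp };

// Rust v0 mangling: symbols start with "_R".
bool is_rust_v0(std::string_view sym) { return sym.substr(0, 2) == "_R"; }

// Legacy Rust mangling: the name ends with a hash component
// "17h" + 16 hex digits, followed by the closing 'E'.
bool is_rust_legacy(std::string_view sym) {
  constexpr std::size_t k_tail_len = 3 + 16 + 1; // "17h" + hex digits + "E"
  if (sym.size() < k_tail_len || sym.back() != 'E') {
    return false;
  }
  const std::string_view tail = sym.substr(sym.size() - k_tail_len);
  if (tail.substr(0, 3) != "17h") {
    return false;
  }
  for (char c : tail.substr(3, 16)) {
    if (!std::isxdigit(static_cast<unsigned char>(c))) {
      return false;
    }
  }
  return true;
}

// Itanium C++ ABI mangling: symbols start with "_Z".
bool is_itanium(std::string_view sym) { return sym.substr(0, 2) == "_Z"; }

// Classify a list of symbol names, looking at no more than max_entries.
// Assumption: a Rust symbol anywhere in the scanned window wins over C++.
NativeLanguage classify_symbols(const std::vector<std::string_view> &names,
                                std::size_t max_entries = 4096) {
  bool saw_cpp = false;
  const std::size_t n = std::min(names.size(), max_entries);
  for (std::size_t i = 0; i < n; ++i) {
    if (is_rust_v0(names[i]) || is_rust_legacy(names[i])) {
      return NativeLanguage::kRust;
    }
    if (is_itanium(names[i])) {
      saw_cpp = true;
    }
  }
  return saw_cpp ? NativeLanguage::kCpp : NativeLanguage::kUnknown;
}
```

With this sketch, a symbol that only appears after the first 4096 entries is never inspected, which is the trade-off the bounded scan accepts.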

Verification

| Check | Result |
| --- | --- |
| `tools/style-check.sh` | clean |
| Full `ninja` (853 targets) | passes |
| `ctest -j4` | 52/55 pass; the 3 failures (`ddprof_stats-ut`, `ipc-ut`, `simple_malloc-with-event-reordering`) reproduce on `main` without this patch; pre-existing, unrelated |

Smoke-tested end-to-end with ddprof -p <pid>:

Busy Go binary → [NATIVE-LANG] -> go (single detection log across all samples; libdwfl-cached Elf* reused)
C++ binary → cpp
/bin/cat, sleep (C) → kUnknown → no label, falls back to native tag

Notes for review

Heuristic by design. Mixed-language processes (e.g. C++ with linked Rust crates) will resolve to whichever signal hits first (Rust wins over C++). Acceptable for the dominant-language reporting goal.
No DWARF reads. Symbol tables are scanned via libelf only; cost is one-shot per FileID and bounded.
No new dependency. Reuses already-linked elfutils.
Per-sample label vs global tag. Picked per-sample so multi-PID profiling stays correct; the global language="native" tag is unchanged.
Symbol-table availability. For fully stripped binaries with no .dynsym C++ symbols, Cpp detection may miss — expected, label stays unset → native.
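
The per-PID caching and label emission described in these notes could look roughly like the sketch below. It is an illustration under assumptions, not the ddprof implementation: `LanguageCache`, `label_value`, `add_language_label` and the `Labels` vector are hypothetical names, and the detector is passed in as a callback so the example does not depend on libdwfl. The point it shows is that detection runs once per PID, that an unknown result adds no label, and that clearing a PID forces a fresh detection.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical mirror of the PR's enum; the real definition may differ.
enum class NativeLanguage { kUnknown, kGo, kRust, kCpp };

// Label value for a detected language; empty when detection was inconclusive.
std::optional<std::string_view> label_value(NativeLanguage lang) {
  switch (lang) {
  case NativeLanguage::kGo:
    return "go";
  case NativeLanguage::kRust:
    return "rust";
  case NativeLanguage::kCpp:
    return "cpp";
  case NativeLanguage::kUnknown:
    break;
  }
  return std::nullopt;
}

// Lazily detects the language once per PID and remembers the result.
class LanguageCache {
public:
  using Detector = std::function<NativeLanguage(int pid)>;
  explicit LanguageCache(Detector detect) : _detect(std::move(detect)) {}

  NativeLanguage get(int pid) {
    auto it = _by_pid.find(pid);
    if (it == _by_pid.end()) {
      // First sample for this PID: run the detector and cache its answer.
      it = _by_pid.emplace(pid, _detect(pid)).first;
    }
    return it->second;
  }

  // Called on process exit so a recycled PID is detected again.
  void clear(int pid) { _by_pid.erase(pid); }

private:
  Detector _detect;
  std::unordered_map<int, NativeLanguage> _by_pid;
};

using Labels = std::vector<std::pair<std::string, std::string>>;

// Appends process_language only when a language was detected.
void add_language_label(Labels &labels, NativeLanguage lang) {
  if (auto value = label_value(lang)) {
    labels.emplace_back("process_language", std::string(*value));
  }
}
```

Because the label is attached to each sample rather than to the whole profile, samples from two PIDs in one profile can carry different values, or none at all.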

Heuristically detect the native language family (go/rust/cpp) of each profiled process' main executable and attach it as a pprof per-sample label `process_language`. Unknown/mixed cases fall back to the existing "native" tag (no label emitted).

Detection runs once per PID, lazily on first sample, using the Elf* already opened by libdwfl (via dwfl_module_getelf) -- no extra file open. The check is cheap and never reads DWARF:

* Go -> .go.buildinfo / .gopclntab ELF section
* Rust -> .note.rustc section, or rustc-mangled symbols (`_R...` v0, or legacy `...17h<16 hex>E` tail)
* Cpp -> any `_Z...` Itanium-mangled symbol
* else -> kUnknown (caller leaves label unset)

Symbol-table scan is bounded to 4096 entries. Result is cached on the Process object and cleared with the rest of its state on PID exit. A fallback path opening /proc/<pid>/exe is provided for callers without an Elf* in hand (unit tests, early bring-up).

r1viollet · 2026-05-29T15:03:46Z

@codex review

chatgpt-codex-connector · 2026-05-29T15:07:14Z

Codex Review: Didn't find any major issues. 🎉


r1viollet wants to merge 1 commit into main from r1viollet/native-language-label
