feat(pprof): emit per-sample process_language label#552

Draft
r1viollet wants to merge 1 commit into main from r1viollet/native-language-label

@r1viollet
Collaborator

What

Emit a per-sample pprof label process_language carrying a heuristic native-language family for the profiled process' main executable (go / rust / cpp). When detection is inconclusive, no label is set and the existing language="native" tag continues to be the sole signal.

Why

Today every native sample is reported as language="native" — Rust, C++, plain C and (mostly) Go are indistinguishable on the backend side. With this label, the backend can refine the per-process language for the dominant case where one language clearly drives a binary, without splitting profiles.

Mirrors the approach we will likely use in the OTel eBPF profiler / Full-Host Profiler (also no DWARF involvement, also per-sample label rather than a global tag), so the two producers will be consistent.

How

  • New NativeLanguage enum + cheap detection (include/native_language.hpp / src/native_language.cc):
    • Go → .go.buildinfo / .gopclntab ELF section
    • Rust → .note.rustc section, or rustc-mangled symbols in .symtab/.dynsym (_R… v0, or legacy …17h<16 hex>E tail)
    • Cpp → any _Z… Itanium-mangled symbol
    • else → kUnknown (label omitted)
  • Symbol-table scan bounded to 4096 entries.
  • Detection runs once per PID, lazily on first sample, against the Elf* already cached by libdwfl (dwfl_module_getelf) — no extra file open.
  • Result cached on Process, cleared via the existing ProcessHdr::clear(pid) on process exit.
  • A /proc/<pid>/exe fallback path is kept for callers without an Elf* in hand (tests / early bring-up).
  • Emitted as a pprof per-sample label process_language alongside process_id, process_name, etc. (k_max_pprof_labels raised 8 → 9).
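The symbol-name part of the heuristics above can be sketched as follows. This is a minimal illustration, not the code in src/native_language.cc: `classify_symbol`, `classify_symbols` and `has_legacy_rust_hash` are hypothetical names, the input is assumed to be symbol names already pulled out of .symtab/.dynsym, and the ELF section checks (Go, .note.rustc) are left out. Legacy Rust symbols also start with `_Z`, which is why the Rust test has to run before the C++ one.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class NativeLanguage { kUnknown, kGo, kRust, kCpp };

// Legacy rustc mangling ends in "17h" + 16 hex digits + "E".
inline bool has_legacy_rust_hash(std::string_view name) {
  constexpr std::size_t k_tail_len = 3 + 16 + 1;
  if (name.size() < k_tail_len || name.back() != 'E') {
    return false;
  }
  const std::string_view tail = name.substr(name.size() - k_tail_len);
  if (tail.substr(0, 3) != "17h") {
    return false;
  }
  for (char c : tail.substr(3, 16)) {
    if (!std::isxdigit(static_cast<unsigned char>(c))) {
      return false;
    }
  }
  return true;
}

// Classify a single symbol name. Rust is tested first because legacy
// Rust symbols are also valid Itanium ("_Z...") names.
inline NativeLanguage classify_symbol(std::string_view name) {
  if (name.substr(0, 2) == "_R" || has_legacy_rust_hash(name)) {
    return NativeLanguage::kRust;
  }
  if (name.substr(0, 2) == "_Z") {
    return NativeLanguage::kCpp;
  }
  return NativeLanguage::kUnknown;
}

// Scan at most max_entries names (assumed policy: any Rust symbol in the
// window wins; otherwise Cpp if at least one "_Z" symbol was seen).
inline NativeLanguage classify_symbols(
    const std::vector<std::string_view> &names,
    std::size_t max_entries = 4096) {
  bool saw_cpp = false;
  const std::size_t limit = std::min(names.size(), max_entries);
  for (std::size_t i = 0; i < limit; ++i) {
    switch (classify_symbol(names[i])) {
    case NativeLanguage::kRust:
      return NativeLanguage::kRust;
    case NativeLanguage::kCpp:
      saw_cpp = true;
      break;
    default:
      break;
    }
  }
  return saw_cpp ? NativeLanguage::kCpp : NativeLanguage::kUnknown;
}
```

With this sketch, a plain C binary whose symbols are all unmangled (`main`, `printf`, ...) yields kUnknown, matching the "label omitted" case.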

Verification

| Check | Result |
|---|---|
| tools/style-check.sh | clean |
| Full ninja (853 targets) | passes |
| ctest -j4 | 52/55 pass; the 3 failures (ddprof_stats-ut, ipc-ut, simple_malloc-with-event-reordering) reproduce on main without this patch: pre-existing, unrelated |

Smoke-tested end-to-end with ddprof -p <pid>:

  • Busy Go binary → [NATIVE-LANG] -> go (single detection log across all samples; libdwfl-cached Elf* reused)
  • C++ binary → cpp
  • /bin/cat, sleep (C) → kUnknown → no label, falls back to native tag

Notes for review

  • Heuristic by design. Mixed-language processes (e.g. C++ with linked Rust crates) resolve to the first heuristic that matches; Rust is checked before C++, so Rust wins. Acceptable for the dominant-language reporting goal.
  • No DWARF reads. Symbol tables are scanned via libelf only; cost is one-shot per FileID and bounded.
  • No new dependency. Reuses already-linked elfutils.
  • Per-sample label vs global tag. Picked per-sample so multi-PID profiling stays correct; the global language="native" tag is unchanged.
  • Symbol-table availability. For fully stripped binaries with no .dynsym C++ symbols, Cpp detection may miss — expected, label stays unset → native.
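To illustrate the one-shot detection and the "no label when unknown" behaviour described in these notes, here is a minimal sketch. It is an assumption-laden illustration rather than the patch itself: `LanguageCache`, `label_value`, `add_language_label` and the `Labels` alias are hypothetical names, the detector is injected as a callback instead of going through libdwfl, and labels are modelled as plain string pairs rather than pprof structures.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

enum class NativeLanguage { kUnknown, kGo, kRust, kCpp };

// Label text for a detected language; empty optional means "emit nothing".
inline std::optional<std::string_view> label_value(NativeLanguage lang) {
  switch (lang) {
  case NativeLanguage::kGo:
    return std::string_view{"go"};
  case NativeLanguage::kRust:
    return std::string_view{"rust"};
  case NativeLanguage::kCpp:
    return std::string_view{"cpp"};
  case NativeLanguage::kUnknown:
    break;
  }
  return std::nullopt;
}

// Hypothetical per-PID cache: the detector runs lazily on the first
// lookup for a PID and the result is reused until clear(pid) is called.
class LanguageCache {
public:
  using Detector = std::function<NativeLanguage(int pid)>;

  explicit LanguageCache(Detector detect) : _detect(std::move(detect)) {}

  NativeLanguage get(int pid) {
    auto it = _cache.find(pid);
    if (it == _cache.end()) {
      it = _cache.emplace(pid, _detect(pid)).first;
    }
    return it->second;
  }

  // Called on process exit so a recycled PID is detected again.
  void clear(int pid) { _cache.erase(pid); }

private:
  Detector _detect;
  std::unordered_map<int, NativeLanguage> _cache;
};

using Labels = std::vector<std::pair<std::string, std::string>>;

// Append process_language only when detection was conclusive.
inline void add_language_label(Labels &labels, NativeLanguage lang) {
  if (const auto value = label_value(lang)) {
    labels.emplace_back("process_language", std::string(*value));
  }
}
```

Keeping the cache keyed by PID and attaching the label per sample means two processes of different languages in the same profile each get their own value, while samples from undetected processes carry no `process_language` label at all.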

Heuristically detect the native language family (go/rust/cpp) of each
profiled process' main executable and attach it as a pprof per-sample
label `process_language`. Unknown/mixed cases fall back to the existing
"native" tag (no label emitted).

Detection runs once per PID, lazily on first sample, using the Elf*
already opened by libdwfl (via dwfl_module_getelf) -- no extra file
open. The check is cheap and never reads DWARF:
  * Go    -> .go.buildinfo / .gopclntab ELF section
  * Rust  -> .note.rustc section, or rustc-mangled symbols
            (`_R...` v0, or legacy `...17h<16 hex>E` tail)
  * Cpp   -> any `_Z...` Itanium-mangled symbol
  * else  -> kUnknown (caller leaves label unset)

Symbol-table scan is bounded to 4096 entries. Result is cached on the
Process object and cleared with the rest of its state on PID exit.

A fallback path opening /proc/<pid>/exe is provided for callers without
an Elf* in hand (unit tests, early bring-up).
@r1viollet
Collaborator Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

