[DiagnosticsClient] Add cross-container support#5735
[DiagnosticsClient] Add cross-container support#5735mdh1418 wants to merge 3 commits intodotnet:mainfrom
Conversation
When a diagnostic tool runs in a container with 'pid: host', the target
process's IPC socket is named using its container-internal PID (NSpid)
and is located under /proc/{hostPid}/root/{tmpdir}/.
Add TryGetNamespacePid() to detect cross-namespace processes via
/proc/{pid}/status NSpid and GetProcessTmpDir() to read TMPDIR from
/proc/{pid}/environ. Both self-guard for Linux, returning safe defaults
on other platforms.
Merge TryGetDefaultAddress and the new TryGetAddressFromPath into a
single TryResolveAddress that handles both Windows named pipes and
Unix domain sockets. This eliminates the duplicated non-Windows socket
search logic that existed in the original TryGetDefaultAddress.
GetDefaultAddress now has a platform-agnostic flow: try cross-namespace
first (no-op on non-Linux), then fall back to local resolution.
Improve the connection error message with Linux-specific guidance for
container scenarios (pid:host, CAP_SYS_PTRACE).
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Extend GetPublishedProcesses() to discover .NET processes in different PID namespaces by scanning /proc for cross-namespace diagnostic sockets. Introduce ProcPath and GetDiagnosticSocketSearchPattern() in PidIpcEndpoint to avoid hardcoding /proc and the socket naming convention in the enumeration logic. Refactor both GetLocalPublishedProcesses() and GetCrossNamespacePublishedProcesses() to be self-contained: each handles its own file listing, exception handling, and returns a list of PIDs. GetPublishedProcesses() is now a clean symmetric union of both methods. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Extract TryParseNamespacePid() and ParseTmpDir() from their I/O wrapper methods to make the parsing logic independently testable with synthetic data. Add PidIpcEndpointTests with two categories of tests: Parsing tests (platform-independent, synthetic data): - NSpid parsing: cross-namespace, same namespace, nested containers, missing NSpid line, empty input, malformed values - TMPDIR parsing: set, not set, empty environ - GetDiagnosticSocketSearchPattern: format verification Behavioral tests (platform-specific via ConditionalFact): - TryGetNamespacePid on current process returns false (same namespace) - TryGetNamespacePid on non-Linux returns false (platform guard) - GetProcessTmpDir reads TMPDIR from a spawned child process - GetDefaultAddress throws ServerNotAvailableException for non-existent PID Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds Linux cross-container (cross-PID-namespace) diagnostic IPC resolution to Microsoft.Diagnostics.NETCore.Client so the diagnostic tools can attach to processes whose diagnostic socket lives under /proc/{hostPid}/root/{tmpdir}/ and is named with the container-internal PID (NSpid).
Changes:
- Add Linux
/proc-based helpers to resolve diagnostic socket paths across PID namespaces (NSpid + TMPDIR). - Extend
GetPublishedProcesses()to enumerate both local and cross-namespace published processes. - Add unit tests covering parsing and selected behavioral scenarios; add XUnit extensions dependency.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsIpc/IpcTransport.cs | Adds cross-namespace PID/TMPDIR parsing and cross-root socket resolution in GetDefaultAddress. |
| src/Microsoft.Diagnostics.NETCore.Client/DiagnosticsClient/DiagnosticsClient.cs | Updates process discovery to union local /tmp sockets and cross-namespace /proc/*/root/... sockets. |
| src/tests/Microsoft.Diagnostics.NETCore.Client/PidIpcEndpointTests.cs | Adds parsing/behavioral tests for namespace PID + TMPDIR logic and error paths. |
| src/tests/Microsoft.Diagnostics.NETCore.Client/Microsoft.Diagnostics.NETCore.Client.UnitTests.csproj | Adds Microsoft.DotNet.XUnitExtensions package for [ConditionalFact]. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| foreach (string line in statusLines) | ||
| { | ||
| if (line.StartsWith("NSpid:\t", StringComparison.Ordinal)) | ||
| { | ||
| string[] parts = line.Substring(7).Split(new[] { '\t' }, StringSplitOptions.RemoveEmptyEntries); | ||
| if (parts.Length > 1 && int.TryParse(parts[parts.Length - 1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsedPid)) | ||
| { | ||
| nsPid = parsedPid; | ||
| return true; | ||
| } | ||
| return false; |
There was a problem hiding this comment.
TryParseNamespacePid is overly strict about the /proc/{pid}/status format: it only matches lines starting with "NSpid:\t" and splits only on tabs, but cat|grep output (and your own remark example) can contain spaces/variable whitespace. Consider matching "NSpid:" and splitting on generic whitespace so cross-namespace detection doesn't silently fail on systems that don't use tab separators.
| if (parts.Length > 1 && int.TryParse(parts[parts.Length - 1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsedPid)) | ||
| { | ||
| nsPid = parsedPid; | ||
| return true; | ||
| } |
There was a problem hiding this comment.
TryParseNamespacePid currently returns true whenever the NSpid field has more than one value, even if the innermost PID ends up equal to hostPid (or if the first value doesn't match hostPid). To avoid false positives and unnecessary cross-namespace probing, consider returning true only when the parsed innermost PID is different from hostPid (and optionally validate the first value matches hostPid).
| if (parts.Length > 1 && int.TryParse(parts[parts.Length - 1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsedPid)) | |
| { | |
| nsPid = parsedPid; | |
| return true; | |
| } | |
| if (parts.Length > 1) | |
| { | |
| // Optionally validate that the first NSpid value matches the host PID. | |
| if (!int.TryParse(parts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out int firstPid) || firstPid != hostPid) | |
| { | |
| return false; | |
| } | |
| // Only consider this a different namespace if the innermost PID is different from hostPid. | |
| if (int.TryParse(parts[parts.Length - 1], NumberStyles.Integer, CultureInfo.InvariantCulture, out int parsedPid) && parsedPid != hostPid) | |
| { | |
| nsPid = parsedPid; | |
| return true; | |
| } | |
| return false; | |
| } |
There was a problem hiding this comment.
I disagree with copilot here. This method's job is to report the pid in the innermost namespace, whatever it happens to be. Its the caller's job to decide what to do with it. I am not aware that Linux does anything which would prevent a process from having the same ID in more than one namespace. Assuming that same ID implies same namespace seems incorrect.
| if (TryResolveAddress($"{GetProcessRootPath(pid)}{GetProcessTmpDir(pid)}", nsPid, out string crossNsAddress)) | ||
| { |
There was a problem hiding this comment.
The cross-namespace base path is built via string concatenation: $"{GetProcessRootPath(pid)}{GetProcessTmpDir(pid)}". This assumes TMPDIR is always an absolute path starting with /. If the target process has a relative TMPDIR, this will produce an invalid path (e.g., /proc/{pid}/roottmp). Consider normalizing by ensuring TMPDIR is rooted and using Path.Combine (after trimming any leading directory separator) for correctness.
| if (TryResolveAddress($"{GetProcessRootPath(pid)}{GetProcessTmpDir(pid)}", nsPid, out string crossNsAddress)) | |
| { | |
| string rootPath = GetProcessRootPath(pid); | |
| string tmpDir = GetProcessTmpDir(pid); | |
| if (Path.IsPathRooted(tmpDir)) | |
| { | |
| tmpDir = tmpDir.TrimStart(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar); | |
| } | |
| string crossNsBasePath = Path.Combine(rootPath, tmpDir); | |
| if (TryResolveAddress(crossNsBasePath, nsPid, out string crossNsAddress)) | |
| { |
| string[] procEntries; | ||
| try | ||
| { | ||
| string[] files = Directory.GetFiles(PidIpcEndpoint.IpcRootPath); | ||
| return GetAllPublishedProcesses(files).Distinct(); | ||
| procEntries = Directory.GetDirectories(PidIpcEndpoint.ProcPath); | ||
| } |
There was a problem hiding this comment.
GetCrossNamespacePublishedProcesses allocates the full /proc directory list via Directory.GetDirectories(...). On busy hosts this can be a large allocation and makes GetPublishedProcesses() more expensive. Consider switching to Directory.EnumerateDirectories(...) (streaming) and avoiding work for non-numeric entries to reduce memory/latency for dotnet-*-ps scenarios.
| try | ||
| { | ||
| string result = PidIpcEndpoint.GetProcessTmpDir(child.Id); | ||
| Assert.Equal(customTmpDir, result); |
There was a problem hiding this comment.
GetProcessTmpDir_ChildProcess_ReadsTmpdir assumes /proc/{pid}/environ is readable for the spawned process. On Linux systems with tightened /proc permissions (e.g., hidepid, Yama ptrace restrictions), GetProcessTmpDir will hit its catch-all and fall back to Path.GetTempPath(), causing this test to fail. Consider skipping when /proc/{childPid}/environ can't be read, or loosening the assertion to accept the fallback when access is denied.
| Assert.Equal(customTmpDir, result); | |
| // Prefer the TMPDIR value from the child process, but on systems with | |
| // tightened /proc permissions GetProcessTmpDir may fall back to the | |
| // default temporary directory. | |
| if (result == customTmpDir) | |
| { | |
| return; | |
| } | |
| string fallbackTmpDir = Path.GetTempPath(); | |
| Assert.Equal(fallbackTmpDir, result); |
| discoveredPids.Add(hostPid); | ||
| } | ||
| } | ||
| catch { } |
There was a problem hiding this comment.
When using catch all we should make the try block as narrow as possible to reduce the risk that we catch errors we didn't intend to catch. I assume the catch all is only needed for Directory.GetFiles()? My suggestion above would remove the call entirely, but if it does stick around I'd either narrow the try to only wrap that one line or narrow the catch block to specific Exception types we expect Directory.GetFiles would throw.
Proccess.GetProcessById() also throws but we know the benign Exception type we are looking for is ArgumentException. Given its called in multiple places it would be nice to factor it out into a little helper method CheckProcessExists(int pid) that can do the try/catch and convert it to a boolean result.
| try | ||
| { | ||
| string crossNsPath = $"{PidIpcEndpoint.GetProcessRootPath(hostPid)}{PidIpcEndpoint.GetProcessTmpDir(hostPid)}"; | ||
| string[] files = Directory.GetFiles(crossNsPath, PidIpcEndpoint.GetDiagnosticSocketSearchPattern(nsPid)); |
There was a problem hiding this comment.
This looks like it is doing similar logic to TryResolveAddress except TryResolveAddress handles the possibility that a past process might have had the same NSPID leading to multiple files that match the pattern. I think you'd be better off calling TryResolveAddress to handle it robustly.
|
|
||
| private static bool TryGetDefaultAddress(int pid, out string defaultAddress) | ||
| /// <summary> | ||
| /// Searches a base path for a diagnostic endpoint matching the given PID. |
There was a problem hiding this comment.
| /// Searches a base path for a diagnostic endpoint matching the given PID. | |
| /// Searches files in a directory for a diagnostic endpoint matching the given PID. |
Maybe its just me, but "searches a base path" seemed ambiguous. This seems clearer.
| /// Searches a base path for a diagnostic endpoint matching the given PID. | ||
| /// On Windows, resolves named pipes. On other platforms, searches for Unix domain sockets. | ||
| /// </summary> | ||
| private static bool TryResolveAddress(string basePath, int pid, out string address) |
There was a problem hiding this comment.
| private static bool TryResolveAddress(string basePath, int pid, out string address) | |
| private static bool TryResolveAddress(string searchDirectory, int pid, out string address) |
| { | ||
| return TryParseNamespacePid(File.ReadLines(string.Format(_procStatusPathFormat, hostPid)), hostPid, out nsPid); | ||
| } | ||
| catch { } |
There was a problem hiding this comment.
This try/catch shouldn't be here. If the exception was anticipated TryParse should already have caught it and converted it to a false return value. Unexpected exceptions shouldn't be caught in most cases because it obscures buggy code and makes diagnosis harder.
| { | ||
| return ParseTmpDir(File.ReadAllBytes(string.Format(_procEnvironPathFormat, hostPid))); | ||
| } | ||
| catch { } |
There was a problem hiding this comment.
Narrow this to only put File.ReadAllBytes() in the try/catch.
| } | ||
|
|
||
| return defaultAddress; | ||
| if (TryResolveAddress(IpcRootPath, pid, out string localAddress)) |
There was a problem hiding this comment.
Processes in the same namespace can also have modified TMP_DIR. Now that we have TMP_DIR checking we should use it everywhere.
| discoveredPids.Add(pid); | ||
| } | ||
|
|
||
| foreach (int pid in GetCrossNamespacePublishedProcesses()) |
There was a problem hiding this comment.
GetLocalPublishedProcesses only finds processes that have the same tmp dir as our current process. Ideally this extra search in /proc/ finds not only the cross namespace processes but also the same container processes with modified TMP_DIR.
| msg += " The diagnostic socket path may exceed the 108-character limit." | ||
| + " Try setting TMPDIR to a shorter path."; | ||
| } | ||
| msg += $" Ensure {IpcRootPath} is writable by the current user." |
There was a problem hiding this comment.
This part of the message assumes IpcRootDir == target process TMP_DIR which may be false. Its the target process tmp dir which needs to be writable.
| + " Try setting TMPDIR to a shorter path."; | ||
| } | ||
| msg += $" Ensure {IpcRootPath} is writable by the current user." | ||
| + " If the target process sets TMPDIR, set it to the same directory."; |
There was a problem hiding this comment.
We only need to include this in the message if either:
- we aren't on Linux
- attempting to access /proc/pid/environ failed.
If we can access environ then we know exactly whether the target has a TMP_DIR rather than needing to have the user speculate about it.
Summary
When diagnostic tools (dotnet-trace, dotnet-counters, dotnet-stack, dotnet-dump) run in a sidecar container with
pid: host, they cannot find the target process's diagnostic IPC socket. The socket is named using the container-internal PID (NSpid) and located under/proc/{hostPid}/root/{tmpdir}/, but the tools only look in local/tmp/.This PR adds cross-namespace IPC resolution to
Microsoft.Diagnostics.NETCore.Clientso all diagnostic tools work transparently across PID namespaces.Fixes #5694
Changes
Commit 1: Core IPC transport support
TryGetNamespacePid()— reads/proc/{pid}/statusNSpid to detect cross-namespace processesGetProcessTmpDir()— reads/proc/{pid}/environfor TMPDIRTryGetDefaultAddressinto parameterizedTryResolveAddress()— handles both Windows named pipes and Unix sockets from any base pathGetDefaultAddress()flow: try cross-namespace → try local → error with platform-specific guidanceCommit 2: Cross-namespace process enumeration
GetCrossNamespacePublishedProcesses()— scans/procfor processes in different namespaces with diagnostic socketsGetLocalPublishedProcesses()to be self-contained (handles its own file listing and exceptions)GetPublishedProcesses()is now a clean symmetric union of both methodsProcPathandGetDiagnosticSocketSearchPattern()constants to avoid hardcodingCommit 3: Unit tests
TryParseNamespacePid()andParseTmpDir()for testable parsing logic[ConditionalFact](same-namespace detection, child process TMPDIR reading, platform guards, error paths)How it works
No explicit platform branching in
GetDefaultAddress— platform logic is insideTryResolveAddressand the self-guarding helper methods.Cross-Namespace IPC Test Results
Setup
Docker Compose with two containers:
target-app: .NET 10.0-preview app running with PID 1 inside its container (host PID 2209)tracer: .NET 10.0 SDK container withprivileged: trueandpid: hostThe target app's IPC socket is named with its container-internal PID (
1) and is only accessible via/proc/2209/root/tmp/from the tracer's namespace. No socket exists in the tracer's local/tmp/.Baseline: Stock Tools v9.0.661903 (Before Fix)
All tools fail —
psfinds no processes, all connection attempts throwServerNotAvailableException.psNo supported .NET processes were foundpsNo supported .NET processes were foundcollect -p 2209ServerNotAvailableException: Unable to connect to Process 2209collect-linux --probe -p 2209ServerNotAvailableException: Unable to connect to Process 2209collect-linux -p 2209ServerNotAvailableException: Unable to connect to Process 2209monitor -p 2209ServerNotAvailableException: Unable to connect to Process 2209report -p 2209ServerNotAvailableException: Unable to connect to Process 2209collect -p 2209Unable to connect to Process 2209Full baseline output
Fix: Built Tools from
cross-namespace-ipcBranchAll tools successfully discover and connect to the cross-namespace process. The
collect-linuxtests connect successfully but report a runtime version mismatch (target runs 10.0.0-preview.7, collect-linux requires 10.0.0 RTM) — this is expected behavior, not a connection failure.ps2209 dotnet dotnet TargetApp.dllps2209 dotnet dotnet TargetApp.dllcollect -p 2209 --duration 5sTrace completed.(404KB nettrace)collect-linux --probe -p 2209collect-linux -p 2209 --duration 5smonitor -p 2209 --duration 5Status: Running(collected ~30 counter updates)report -p 2209TargetApp!Program.<Main>$collect -p 2209Complete(dump written)Key difference for collect-linux: Baseline throws
ServerNotAvailableException(can't find socket). Fix successfully connects and gets past to the version check, which correctly identifies the preview runtime as unsupported.Full fix output