feat: Linux introspection plugins (procfs, systemd, container)#263
Draft
…nodes section
The param names in the launch/YAML/CLI examples were already correct (discovery.mode, discovery.manifest_path), so no changes were needed there. Replaced the "Handling Unmanifested Nodes" section, which documented the nonexistent config.unmanifested_nodes parameter (with ignore/warn/error/include_as_orphan policies), with "Controlling Gap-Fill in Hybrid Mode", which documents the actual discovery.merge_pipeline.gap_fill.* parameters. Added a note block to the Runtime Linking section explaining the layered merge pipeline architecture.
…info format
- Add missing endpoint categories to handle_root: logs, bulk-data,
cyclic-subscriptions, updates (conditional), DELETE /faults (global)
- Remove ghost snapshot endpoints (listed but never registered)
- Add missing capabilities: logs, bulk_data, cyclic_subscriptions, updates
- Fix hardcoded version "0.1.0" -> "0.3.0" in handle_root and version-info
- Change version-info response key from "sovd_info" to "items" (SOVD standard)
- Add bulk-data, logs, cyclic-subscriptions URIs to entity capability responses
- Update rest.rst: fix Server Capabilities example format, remove phantom
/manifest/status, document DELETE /{entity}/faults, update SOVD compliance
section, add areas/functions resource collection notes
- Update tests, integration tests, and Postman collection for sovd_info->items
Code fixes:
- Remove areas/functions bulk-data from handle_root (validation rejects them)
- Rename test HandleVersionInfoContainsSovdInfoArray -> HandleVersionInfoContainsItemsArray
- Fix test_root_endpoint_includes_snapshots: verify legacy snapshot endpoints are NOT listed
Docs fixes:
- rest.rst: fix self -> href in area/component list examples
- rest.rst: remove /bulk-data from areas and functions resource collections
- plugin-system.rst: remove LogProvider include/export from UpdateProvider example
- plugin-system.rst: clarify IntrospectionProvider metadata is plugin-internal
- discovery-options.rst: fix Field Groups table (status, metadata fields)
- discovery-options.rst: fix health endpoint JSON to match MergeReport::to_json()
- manifest-discovery.rst: fix gap-fill disabled description
- rest.rst: add /version-info example response, remove stale `area` field from components list example
- rest.rst: document sovd_info -> items rename in CHANGELOG as breaking change
- discovery-options.rst: add local TOC, note case-sensitivity for policy values, fix strategy name "HybridDiscoveryStrategy" -> "hybrid"
- manifest-discovery.rst, migration-to-manifest.rst: fix jq commands (.[] -> .items[])
- CHANGELOG.rst: add Breaking Changes section and new 0.3.0 features
- test_plugin_vendor_extensions.test.py: add @verifies REQ_INTEROP_003
refresh_cache() was calling discover_topic_components() for both RUNTIME_ONLY and MANIFEST_ONLY modes. In manifest_only mode this added synthetic components from the runtime ROS 2 graph, violating the intent of "only manifest entities." Invert the condition so only RUNTIME_ONLY merges topic components. MANIFEST_ONLY and HYBRID both use discover_components() directly.
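The inverted condition can be sketched as follows. This is a simplified illustration, not the gateway's code: the `DiscoveryMode` enum layout, the function name `refresh_components`, and the use of plain string IDs for components are all assumptions made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical enum mirroring the three discovery modes named in the commit.
enum class DiscoveryMode { RUNTIME_ONLY, MANIFEST_ONLY, HYBRID };

// Sketch: only RUNTIME_ONLY appends topic-based (synthetic) components.
// MANIFEST_ONLY and HYBRID return the discovered components unchanged.
std::vector<std::string> refresh_components(
    DiscoveryMode mode,
    const std::vector<std::string>& discovered,
    const std::vector<std::string>& topic_based) {
  std::vector<std::string> result = discovered;
  if (mode == DiscoveryMode::RUNTIME_ONLY) {
    result.insert(result.end(), topic_based.begin(), topic_based.end());
  }
  return result;
}
```

With this shape, a manifest_only gateway never sees components synthesized from the runtime ROS 2 graph, which is the behavior the regression test is meant to guard.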
…ion test
Rename 3 unit tests referencing old "SovdEntry" naming to "ItemsEntry" to match the sovd_info->items rename in handle_version_info. Add regression test verifying that topic-based components do not leak into the entity cache in manifest_only discovery mode (validates the fix in gateway_node.cpp discover_components).
…tions
Enable resource collections (data, operations, configurations, faults, logs, bulk-data) on areas and functions. SOVD defines these only for apps/components - this is a pragmatic ros2_medkit extension. Add log routes for areas (namespace prefix match) and functions (aggregate from hosted apps). Update capability responses to include logs and bulk-data URIs. Fix entity_capabilities.cpp to match actual route registrations.
Document ros2_medkit's pragmatic approach to SOVD - we extend the spec where ROS 2 use cases benefit (resource collections on areas/functions, x-medkit vendor extensions). Add resource collection support matrix. Fix incorrect claims about areas supporting same collections as components. Add changelog entries for area/function log endpoints.
…t, docs
Code fixes:
- Fix faults sampler to scope by entity type (AREA: namespace, FUNCTION: host FQNs, COMPONENT: app FQNs) matching REST handler behavior
- Fix logs sampler to use prefix/exact matching per entity type, matching log_handlers.cpp scoping logic
- Hoist duplicated severity/context parameter validation in log_handlers
- Add area/function bulk-data endpoints to handle_root endpoint list
Docs fixes:
- Fix SOVD Compliance RST heading level (~~~ -> --- for h2)
- Update Logs Endpoints section to mention areas and functions
- Restore See Also cross-references (authentication, server config)
- Fix em dashes to hyphens in log configuration section
In manifest_only discovery mode, Apps never get bound_fqn set because runtime_linker only runs in hybrid mode. This caused handlers, samplers, and configuration aggregation to silently return empty results for all App-based lookups (logs, faults, configurations). Add App::effective_fqn() that prefers bound_fqn when available, falling back to deriving the FQN from ros_binding (namespace_pattern + node_name). Replace all direct bound_fqn accesses in handler_context, log_handlers, fault_handlers, gateway_node samplers, thread_safe_entity_cache config aggregation, and plugin_context with effective_fqn() calls. Update test_bulk_data_api for areas now returning 200 (entity capabilities extended) and test_scenario_discovery_manifest timeout for log aggregation.
…ering
- Fix effective_fqn() to prepend "/" when namespace_pattern is empty, ensuring valid ROS 2 FQNs for fault filtering and bulk-data scoping
- Reject glob patterns (containing "*") in effective_fqn() to prevent garbage FQNs from namespace patterns like "**" or "prefix*"
- Add BULK_DATA and CYCLIC_SUBSCRIPTIONS to CapabilityBuilder enum and include them in caps vectors for all entity types in discovery handlers
- Extract filter_faults_by_fqns() helper to eliminate duplication between FUNCTION and COMPONENT fault filtering blocks in gateway_node.cpp
- Add EXPECT_FALSE(is_aggregated(BULK_DATA)) assertions for AREA/FUNCTION
- Add effective_fqn() unit tests covering empty namespace, wildcards, globs
- Add comments explaining unconditional bulk-data/cyclic endpoints in handle_root (depend on fault_manager, not optional plugins)
- Update bulk-data handler with get_source_filters() for function aggregation
- Fix integration test docstrings for bulk-data entity type coverage
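A minimal sketch of the effective_fqn() fallback described in the two commits above, assuming simplified `App` and `RosBinding` structs. The field types, the empty-string return for "no FQN", and the trailing-slash handling are assumptions for illustration; only the preference for bound_fqn, the "/" prefix for an empty namespace, and the glob rejection come from the commit messages.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified stand-ins for the gateway's manifest types.
struct RosBinding {
  std::string namespace_pattern;  // "", "/perception", or a glob such as "**"
  std::string node_name;
};

struct App {
  std::optional<std::string> bound_fqn;   // set by the runtime linker (hybrid mode)
  std::optional<RosBinding> ros_binding;  // taken from the manifest

  // Returns an empty string when no FQN can be derived (assumed convention).
  std::string effective_fqn() const {
    // 1. Prefer the FQN bound at runtime, when there is one.
    if (bound_fqn && !bound_fqn->empty()) return *bound_fqn;
    if (!ros_binding || ros_binding->node_name.empty()) return {};
    const std::string& ns = ros_binding->namespace_pattern;
    // 2. A glob pattern cannot be turned into a concrete FQN.
    if (ns.find('*') != std::string::npos) return {};
    // 3. Empty (or root) namespace: prepend "/" so the FQN stays valid.
    if (ns.empty() || ns == "/") return "/" + ros_binding->node_name;
    // 4. Otherwise join namespace and node name with a single "/".
    return (ns.back() == '/' ? ns : ns + "/") + ros_binding->node_name;
  }
};
```

Callers that previously read bound_fqn directly would call effective_fqn() instead and treat an empty result as "no match", so manifest_only deployments stop returning silently empty logs, faults, and configurations.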
…ages
Add install(DIRECTORY include/) and ament_export_include_directories() so external packages can find_package(ros2_medkit_gateway) and get plugin interface headers (GatewayPlugin, IntrospectionProvider, etc.) and vendored tl::expected.
…egation
Add a method to enumerate child Apps for a Component via the entity cache. Needed by introspection plugins for Component-level vendor endpoints.
Add new ROS 2 package for Linux introspection plugins with:
- Static utility library (medkit_linux_utils)
- Three MODULE plugin targets (procfs, systemd, container)
- Stub source files and tests
- CMake config with fPIC, ccache, linting
- read_process_info: parse /proc/{pid}/stat, status, cmdline, exe
- find_pid_for_node: scan /proc for ROS 2 nodes by __node:= and __ns:= args
- Tests use both real /proc/self and synthetic /proc in tmpdir
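The cmdline scan can be illustrated with a small parser. `/proc/<pid>/cmdline` stores arguments separated by NUL bytes, so the sketch below splits on `'\0'` and looks for the `__node:=` and `__ns:=` remap arguments. The names `RosNodeArgs`, `split_cmdline`, and `parse_ros_args_sketch` are hypothetical, and defaulting the namespace to "/" when `__ns:=` is absent is an assumption, not confirmed behavior of the plugin.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

struct RosNodeArgs {
  std::string node_name;
  std::string ns;
};

// Split the raw contents of /proc/<pid>/cmdline into arguments.
std::vector<std::string> split_cmdline(std::string_view raw) {
  std::vector<std::string> args;
  std::size_t start = 0;
  while (start < raw.size()) {
    std::size_t end = raw.find('\0', start);
    if (end == std::string_view::npos) end = raw.size();
    if (end > start) args.emplace_back(raw.substr(start, end - start));
    start = end + 1;
  }
  return args;
}

// Extract the node name and namespace; nullopt if no __node:= argument exists.
std::optional<RosNodeArgs> parse_ros_args_sketch(std::string_view raw) {
  constexpr std::string_view kNode = "__node:=";
  constexpr std::string_view kNs = "__ns:=";
  RosNodeArgs out;
  out.ns = "/";  // assumed default when no namespace remap is given
  for (const std::string& arg : split_cmdline(raw)) {
    if (arg.compare(0, kNode.size(), kNode) == 0) {
      out.node_name = arg.substr(kNode.size());
    } else if (arg.compare(0, kNs.size(), kNs) == 0) {
      out.ns = arg.substr(kNs.size());
    }
  }
  if (out.node_name.empty()) return std::nullopt;
  return out;
}
```

Because the parser takes the file contents as a string, it can be exercised against a synthetic /proc tree in a temporary directory as easily as against the real /proc/self.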
- extract_container_id: Docker, podman, containerd cgroup path patterns
- detect_runtime: identify container runtime from cgroup path
- is_containerized: check if PID runs in a container
- read_cgroup_info: cgroup v2 resource limits (memory.max, cpu.max)
- All tests use synthetic cgroup filesystem in tmpdir
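As a rough illustration of cgroup path analysis, the sketch below matches three commonly seen path shapes: `/docker/<id>` or `docker-<id>.scope` for Docker, `libpod-<id>.scope` for podman, and `cri-containerd-<id>.scope` for containerd. The regular expressions, the `Runtime` enum, and `match_cgroup_path` are assumptions for this example; the patterns actually recognized by cgroup_reader may differ.

```cpp
#include <cassert>
#include <regex>
#include <string>

enum class Runtime { None, Docker, Podman, Containerd };

struct ContainerMatch {
  Runtime runtime = Runtime::None;
  std::string id;  // 64-character hex container ID, empty when not containerized
};

// Classify one line of /proc/<pid>/cgroup (or its path component).
ContainerMatch match_cgroup_path(const std::string& path) {
  static const std::regex containerd(R"(cri-containerd-([0-9a-f]{64}))");
  static const std::regex podman(R"(libpod-([0-9a-f]{64}))");
  static const std::regex docker(R"((?:/docker/|docker-)([0-9a-f]{64}))");
  std::smatch m;
  // Most specific prefixes first, so "cri-containerd-" is never misread.
  if (std::regex_search(path, m, containerd)) return {Runtime::Containerd, m[1].str()};
  if (std::regex_search(path, m, podman)) return {Runtime::Podman, m[1].str()};
  if (std::regex_search(path, m, docker)) return {Runtime::Docker, m[1].str()};
  return {};
}
```

An is_containerized-style check then reduces to testing whether the returned runtime differs from `Runtime::None`.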
- Scans /proc for ROS 2 nodes by parsing cmdline args
- Thread-safe with shared_mutex (concurrent reads, exclusive refresh)
- Auto-refresh on TTL expiry during lookup
- Tests with synthetic /proc in tmpdir
- Move parse_ros_args out of anonymous namespace for PidCache access
- Change TTL type to steady_clock::duration for sub-second precision
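The locking scheme described above (shared lock for reads, exclusive lock for refresh, refresh triggered by TTL expiry during lookup) might look roughly like this. `TtlPidCache` is a hypothetical class, and the injected `Scanner` callback stands in for the real /proc scan so the example stays self-contained; neither is taken from the plugin's source.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

class TtlPidCache {
 public:
  using Clock = std::chrono::steady_clock;
  using PidMap = std::unordered_map<std::string, int>;
  using Scanner = std::function<PidMap()>;  // stand-in for the /proc scan

  TtlPidCache(Scanner scan, Clock::duration ttl)
      : scan_(std::move(scan)), ttl_(ttl) {}

  std::optional<int> lookup(const std::string& fqn) {
    {
      // Fast path: many readers may hold the shared lock at once.
      std::shared_lock<std::shared_mutex> read(mutex_);
      if (fresh_locked()) return find_locked(fqn);
    }
    // Slow path: exclusive lock, then re-check in case another
    // thread refreshed while this one was waiting.
    std::unique_lock<std::shared_mutex> write(mutex_);
    if (!fresh_locked()) {
      pids_ = scan_();
      last_refresh_ = Clock::now();
      valid_ = true;
    }
    return find_locked(fqn);
  }

 private:
  bool fresh_locked() const {
    return valid_ && (Clock::now() - last_refresh_) < ttl_;
  }
  std::optional<int> find_locked(const std::string& fqn) const {
    const auto it = pids_.find(fqn);
    if (it == pids_.end()) return std::nullopt;
    return it->second;
  }

  Scanner scan_;
  Clock::duration ttl_;
  std::shared_mutex mutex_;
  PidMap pids_;
  Clock::time_point last_refresh_{};
  bool valid_ = false;
};
```

Taking the TTL as `steady_clock::duration` allows values such as `std::chrono::milliseconds(250)`. Since `std::shared_mutex` is neither copyable nor movable, a class like this cannot be move-assigned, which is consistent with the plugins holding their cache through a `unique_ptr`.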
- IntrospectionProvider: detects containerized Apps via cgroup path analysis
- Supports Docker, podman, containerd runtime detection
- Reads cgroup v2 resource limits (memory.max, cpu.max)
- Vendor routes: GET /apps/{id}/x-medkit-container, GET /components/{id}/x-medkit-container
- 404 when node not containerized, Component aggregation by container ID
- PidCache stored as unique_ptr to avoid shared_mutex move-assignment issue
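For the cgroup v2 limits, `memory.max` holds either a byte count or the literal `max` (no limit), and `cpu.max` holds `<quota> <period>` in microseconds, where the quota may also be `max`. A minimal parsing sketch follows; the function names and the choice to report the CPU limit as a core fraction are assumptions for illustration, not the plugin's actual output format.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <optional>
#include <sstream>
#include <string>

// Parse the contents of memory.max; nullopt means "no limit" or unparsable.
std::optional<std::uint64_t> parse_memory_max(const std::string& content) {
  std::istringstream in(content);
  std::string token;
  if (!(in >> token) || token == "max") return std::nullopt;
  try {
    return static_cast<std::uint64_t>(std::stoull(token));
  } catch (const std::exception&) {
    return std::nullopt;
  }
}

// Parse the contents of cpu.max ("<quota> <period>") into a core fraction,
// e.g. "50000 100000" means half a core. nullopt means "no limit".
std::optional<double> parse_cpu_max_cores(const std::string& content) {
  std::istringstream in(content);
  std::string quota;
  std::uint64_t period = 0;
  if (!(in >> quota >> period) || quota == "max" || period == 0) {
    return std::nullopt;
  }
  try {
    return static_cast<double>(std::stoull(quota)) / static_cast<double>(period);
  } catch (const std::exception&) {
    return std::nullopt;
  }
}
```

Both helpers take the file contents rather than a path, which keeps them easy to test against a synthetic cgroup filesystem.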
- IntrospectionProvider: maps Apps to PIDs, returns ProcessInfo metadata
- Vendor routes: GET /apps/{id}/x-medkit-procfs, GET /components/{id}/x-medkit-procfs
- Component aggregation: unique processes with node_ids lists
- PidCache for efficient /proc scanning with configurable TTL
- 503 on PID lookup failure (transient - node may have crashed)
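Component aggregation ("unique processes with node_ids lists") amounts to grouping child Apps by PID, since several ROS 2 nodes can share one process. The sketch below shows that grouping; `ProcessEntry` and `aggregate_by_pid` are hypothetical names, and the real response carries full ProcessInfo metadata rather than just the PID.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct ProcessEntry {
  int pid;
  std::vector<std::string> node_ids;  // Apps hosted by this process
};

// Group (app_id, pid) pairs so each process appears once, in PID order.
std::vector<ProcessEntry> aggregate_by_pid(
    const std::vector<std::pair<std::string, int>>& app_pids) {
  std::map<int, std::vector<std::string>> grouped;
  for (const auto& [app_id, pid] : app_pids) {
    grouped[pid].push_back(app_id);
  }
  std::vector<ProcessEntry> out;
  out.reserve(grouped.size());
  for (auto& [pid, ids] : grouped) {
    out.push_back(ProcessEntry{pid, std::move(ids)});
  }
  return out;
}
```

The container and systemd plugins describe the same idea with a different key (container ID and unit name respectively).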
- IntrospectionProvider: maps Apps to systemd units via sd_pid_get_unit
- Queries unit properties via sd-bus (ActiveState, SubState, NRestarts, WatchdogUSec)
- Vendor routes: GET /apps/{id}/x-medkit-systemd, GET /components/{id}/x-medkit-systemd
- 404 when node not in a systemd unit, Component aggregation by unit name
- Configuration, API reference with curl examples, requirements
- Troubleshooting for PID lookup, permissions, systemd access
- Added to tutorials toctree after plugin-system
- procfs: PID mapping, resource usage, Component aggregation, 404 on nonexistent, capabilities
- combined: procfs + container route isolation (200 + 404 coexistence on host)
- systemd: unit info, restart count, watchdog, non-unit 404
- container: container ID, runtime detection, resource limits from cgroup
- Dockerfile.systemd with systemd as PID 1, unit files for demo nodes
- Dockerfile.container for container detection with resource limits
- Runner script for CI integration
Required for building the systemd_introspection plugin which uses sd-bus API (sd_pid_get_unit, sd_bus_open_system, etc.).
Pull Request
Summary
Add three Linux introspection plugins that enrich gateway discovery with OS-level metadata. Each plugin implements `IntrospectionProvider` and registers vendor REST endpoints on Apps and Components:
- procfs (`libprocfs_introspection.so`) - reads `/proc` for process info (PID, RSS, CPU ticks, threads, exe path, cmdline)
- systemd (`libsystemd_introspection.so`) - maps ROS 2 nodes to systemd units via `sd_pid_get_unit()`, queries properties via sd-bus
- container (`libcontainer_introspection.so`) - detects Docker/podman/containerd via cgroup path analysis, reads cgroup v2 resource limits
Also includes:
- `libmedkit_linux_utils.a` with `proc_reader`, `cgroup_reader`, and `PidCache` (TTL-based, thread-safe)
- `PluginContext::get_child_apps()` for Component-level aggregation
- `libsystemd-dev` added to devcontainer Dockerfile
Issue
Type
Testing
Unit tests (31 new, all in `ros2_medkit_linux_introspection`):
- `test_proc_reader` - real `/proc/self` + synthetic `/proc` in tmpdir (4 tests)
- `test_cgroup_reader` - container ID extraction, runtime detection, resource limits (10 tests)
- `test_pid_cache` - TTL refresh, auto-refresh, missing nodes, empty/nonexistent proc dirs (6 tests)
- `test_procfs_plugin` - JSON serialization (1 test)
- `test_systemd_plugin` - JSON serialization, graceful skip (2 tests)
- `test_container_plugin` - JSON serialization, not-containerized skip (2 tests)
Integration tests (launch_testing):
- `test_procfs_introspection` - live PID mapping, resource usage, Component aggregation, capabilities, 404
- `test_combined_introspection` - procfs + container route isolation (200 + 404 coexistence on host)
Docker integration tests (standalone pytest):
- `test_systemd_introspection` - unit info, restart count, watchdog, aggregation
- `test_container_introspection` - container ID, runtime, memory/CPU limits, aggregation
Full suite: 1302 unit tests pass, 2066 lint tests pass, 0 failures.
Checklist