Epic: Multi-Process Collector Coordination via Event Bus Architecture
Parent Epic: #93 - Develop Core DaemonEye Monitoring System
Related Issues: #87 (Event Bus Architecture), #121 (Topic Infrastructure - Completed)
Specification: .kiro/specs/daemoneye-core-monitoring/tasks.md, Section 2.6
Overview
This epic tracks the implementation of multi-process collector coordination capabilities that enable DaemonEye to scale within a single system by coordinating multiple collector processes. By implementing topic-based event distribution through the custom daemoneye-eventbus component, multiple collectors can coordinate their efforts, share workload, and provide redundancy without tight coupling.
Architecture: Cross-Process IPC with Topic-Based Messaging
Critical Architectural Clarification:
The daemoneye-eventbus is a cross-process IPC (Inter-Process Communication) system, NOT an in-process message broker. Key architectural points:
Process Architecture:
daemoneye-agent: Separate process that hosts the eventbus broker AND acts as a client
Collector Processes: Each collector (procmond, netmond, fsmond, etc.) runs as a separate OS process
Communication: All processes communicate via cross-process IPC using the interprocess crate
IPC Transport Layer:
Windows: Named pipes via CreateNamedPipe/CreateFile
Unix (Linux/macOS/FreeBSD): Unix domain sockets
Protocol: Protobuf-serialized messages over the IPC transport
Async: All IPC is async via Tokio runtime
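Because protobuf messages are not self-delimiting, the transport layer needs a framing convention on the byte stream so the receiver knows where one message ends and the next begins. A minimal sketch, assuming a 4-byte big-endian length prefix (the actual wire framing is not specified in this issue):

```rust
/// Frame an already-serialized protobuf payload for the IPC stream by
/// prepending a 4-byte big-endian length prefix (illustrative convention).
fn frame_message(payload: &[u8]) -> Vec<u8> {
    let mut framed = Vec::with_capacity(4 + payload.len());
    framed.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    framed.extend_from_slice(payload);
    framed
}

/// Split one framed message off the front of a receive buffer.
/// Returns the payload and the number of bytes consumed, or None
/// until a complete frame has arrived.
fn deframe_message(buf: &[u8]) -> Option<(Vec<u8>, usize)> {
    if buf.len() < 4 {
        return None;
    }
    let len = u32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]) as usize;
    if buf.len() < 4 + len {
        return None; // partial frame, wait for more bytes
    }
    Some((buf[4..4 + len].to_vec(), 4 + len))
}
```

In the async broker these helpers would sit between `prost` encoding and the `interprocess` stream reads/writes.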
Topic-Based Messaging Layer:
Layered Architecture: The pub/sub topic system is built ON TOP OF the cross-process IPC infrastructure
Custom Implementation: Built entirely on existing dependencies (no external brokers)
Broker Role: daemoneye-agent hosts the broker and routes messages between collector processes
Scaling Challenges
However, the current architecture faces limitations when scaling to multiple collector processes on a single system:
Point-to-Point IPC: Current IPC is 1:1 between daemoneye-agent process and a single collector process
No Coordination: Multiple collector processes cannot coordinate tasks or share workload
No Automatic Failover: If a collector process fails, there's no automatic task redistribution
Static Routing: Tasks are sent to specific collector processes rather than routed by capability
No Load Balancing: Cannot distribute high-volume collection tasks across multiple process instances
Strategic Importance
Multi-collector coordination within a single system is essential for:
System Resource Utilization: Fully utilize multi-core CPUs by running parallel collector processes
Workload Isolation: Different collector types handling their domains (process, network, filesystem)
Fault Tolerance: Automatic failover when collector processes crash
Performance: Distribute high-load scenarios across multiple processes
Proposed Solution: Custom Cross-Process Event Bus
Implementation Approach
Build a custom event bus using only existing workspace dependencies:
Core Dependencies (Already in Use):
interprocess (v2.2.3) - Cross-platform IPC transport (named pipes/Unix sockets)
tokio (v1.47.1) - Async runtime for IPC operations
prost (v0.14.1) - Efficient protobuf serialization for cross-process messages
dashmap (v6+) - Lock-free concurrent HashMap for topic routing
Why Custom Implementation:
✅ No external libraries or unsafe code
✅ Full control over cross-process IPC patterns
✅ Optimized for local single-system performance
✅ Minimal dependencies and attack surface
✅ Tailored backpressure and queue strategies
Architecture Components
```rust
// Broker hosted in the daemoneye-agent process.
// (ProcessId, IpcConnection, and EventEnvelope are defined elsewhere.)
use std::sync::Arc;
use std::sync::atomic::AtomicUsize;
use dashmap::DashMap;
use tokio::sync::mpsc;

pub struct CrossProcessEventBus {
    // Topic routing: topic pattern -> subscriber processes
    topics: Arc<DashMap<String, Vec<ProcessSubscription>>>,
    // Active IPC connections to collector processes
    ipc_connections: Arc<DashMap<ProcessId, IpcConnection>>,
    // Load balancing state for queue groups
    queue_state: Arc<DashMap<String, QueueGroup>>,
    // Backpressure management
    flow_control: Arc<FlowController>,
}

pub struct ProcessSubscription {
    process_id: ProcessId,
    subscription_type: SubscriptionType,
    ipc_sender: mpsc::Sender<EventEnvelope>,
}

pub enum SubscriptionType {
    /// One-to-one: specific process subscription
    Direct,
    /// One-to-many: broadcast to all matching subscribers
    Broadcast,
    /// Queue: load-balanced across multiple processes
    Queue { group_name: String },
}

pub struct QueueGroup {
    members: Vec<ProcessId>,
    next_index: AtomicUsize, // Round-robin state
    strategy: LoadBalancingStrategy,
}

pub enum LoadBalancingStrategy {
    RoundRobin,
    LeastLoaded,
    Random,
}
```
Communication Patterns
1. One-to-One (Direct Task Assignment)
```rust
// Agent → specific procmond instance
eventbus.publish_direct(
    process_id,
    "control.collector.task.process",
    task,
).await?;
```
2. One-to-Many (Broadcast)
```rust
// Agent → all procmond instances
eventbus.publish_broadcast(
    "control.collector.config.reload",
    config,
).await?;
```
3. Pub/Sub (Topic-Based Routing)
```rust
// Collector subscribes to topic pattern
eventbus.subscribe("events.process.+", SubscriptionType::Broadcast).await?;

// Agent publishes to topic
eventbus.publish("events.process.spawned", event).await?;
```
4. Queue (Load-Balanced Distribution)
```rust
// Multiple procmond processes join queue group
eventbus.subscribe_queue("control.collector.task.process", "procmond-workers").await?;

// Agent publishes task - only ONE process receives it
eventbus.publish_queue("control.collector.task.process", task).await?;
```
Backpressure Handling
```rust
pub struct FlowController {
    /// Per-process send buffer limits
    max_buffer_size: usize,
    /// Strategy when buffer full
    overflow_strategy: OverflowStrategy,
}

pub enum OverflowStrategy {
    /// Block producer until space available
    Block,
    /// Drop oldest messages
    DropOldest,
    /// Drop newest messages
    DropNewest,
    /// Return error to producer
    Error,
}
```
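The overflow strategies above can be illustrated with a plain `VecDeque` send buffer; `Block` is stubbed out here because the real async broker would await capacity rather than mutate the queue:

```rust
use std::collections::VecDeque;

enum OverflowStrategy { Block, DropOldest, DropNewest, Error }

/// Apply the overflow strategy when a per-process send buffer is full.
/// Returns Err(msg) for the Error strategy so the producer can react.
/// Message payloads are reduced to u64 for this sketch.
fn enqueue(
    buf: &mut VecDeque<u64>,
    max: usize,
    strategy: &OverflowStrategy,
    msg: u64,
) -> Result<(), u64> {
    if buf.len() < max {
        buf.push_back(msg);
        return Ok(());
    }
    match strategy {
        OverflowStrategy::DropOldest => {
            buf.pop_front(); // make room by discarding the oldest message
            buf.push_back(msg);
            Ok(())
        }
        OverflowStrategy::DropNewest => Ok(()), // silently drop the new message
        OverflowStrategy::Error => Err(msg),
        OverflowStrategy::Block => Err(msg), // placeholder: real impl awaits capacity
    }
}
```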
Key Components
1. Topic-Based Message Routing (Cross-Process)
Task Distribution Topics (agent → collectors via IPC):
control.collector.task.process - Process monitoring tasks sent to procmond processes
control.collector.task.network - Network monitoring tasks sent to netmond processes
control.collector.task.filesystem - Filesystem monitoring tasks sent to fsmond processes
control.collector.task.performance - Performance monitoring tasks sent to perfmond processes
Result Aggregation Topics (collectors → agent via IPC):
events.process.* - Process events from procmond processes
events.network.* - Network events from netmond processes
events.filesystem.* - Filesystem events from fsmond processes
events.performance.* - Performance events from perfmond processes
Health & Control Topics (bidirectional cross-process):
control.health.heartbeat - Collector process heartbeat signals
control.health.status - Process health status updates
control.collector.lifecycle - Process start/stop/config commands
2. Capability-Based Routing (Cross-Process)
Collector processes advertise their capabilities through the event bus:
Capability Advertisement: Each collector process publishes its SourceCaps on startup via IPC
Dynamic Routing: Agent routes tasks to appropriate collector processes based on capabilities
Automatic Discovery: New collector processes automatically join the coordination mesh via IPC
Graceful Departure: Collector processes unsubscribe topics on shutdown
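Once each collector process has advertised its capabilities, capability-based routing reduces to a membership query over the advertisement table. A sketch with `SourceCaps` simplified to a list of strings and process IDs to `u32`:

```rust
use std::collections::HashMap;

/// Return the processes whose advertised capabilities include the one a
/// task needs. Types are illustrative stand-ins for ProcessId/SourceCaps.
fn eligible_processes(caps: &HashMap<u32, Vec<String>>, needed: &str) -> Vec<u32> {
    let mut out: Vec<u32> = caps
        .iter()
        .filter(|(_, c)| c.iter().any(|cap| cap == needed))
        .map(|(pid, _)| *pid)
        .collect();
    out.sort_unstable(); // deterministic order for routing decisions
    out
}
```

The broker would then hand the eligible set to the load-balancing layer to pick one process (queue semantics) or fan out to all of them (broadcast semantics).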
3. Load Balancing Strategies (Cross-Process)
Strategies:
Round-Robin: Distribute tasks evenly across available collector processes (lowest latency)
Least-Loaded: Route to collector process with lowest queue depth (best for fairness)
Random: Random selection (simple, minimal state)
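The round-robin strategy can be sketched with the `AtomicUsize` counter from the `QueueGroup` definition above; `ProcessId` is reduced to a `u32` for illustration:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Minimal queue group: members plus a round-robin cursor.
struct QueueGroup {
    members: Vec<u32>, // ProcessId stand-in for this sketch
    next_index: AtomicUsize,
}

impl QueueGroup {
    /// Pick the next member in round-robin order. Relaxed ordering is
    /// enough: exact fairness under contention is not required, only
    /// roughly even distribution.
    fn next_member(&self) -> Option<u32> {
        if self.members.is_empty() {
            return None;
        }
        let i = self.next_index.fetch_add(1, Ordering::Relaxed);
        Some(self.members[i % self.members.len()])
    }
}
```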
Failover:
Heartbeat Monitoring: Detect failed collector processes via missed heartbeats
Task Redistribution: Automatically reassign tasks from failed processes
Process Restart: Automatically restart crashed collector processes
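Heartbeat-based failure detection can be sketched as a scan over last-seen timestamps; the types and silence threshold are illustrative, not the values DaemonEye uses:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Return the collector processes whose last heartbeat is older than the
/// allowed silence window; the broker would then redistribute their tasks
/// and trigger a restart.
fn failed_processes(
    last_heartbeat: &HashMap<u32, Instant>, // ProcessId stand-in -> last seen
    now: Instant,
    max_silence: Duration,
) -> Vec<u32> {
    last_heartbeat
        .iter()
        .filter(|(_, seen)| now.duration_since(**seen) > max_silence)
        .map(|(pid, _)| *pid)
        .collect()
}
```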
4. Result Aggregation & Correlation (Cross-Process)
Correlation Metadata:
```rust
use std::collections::HashMap;
use uuid::Uuid;

struct CorrelationMetadata {
    correlation_id: Uuid,                      // Unique workflow identifier
    parent_correlation_id: Option<Uuid>,       // For hierarchical workflows
    root_correlation_id: Uuid,                 // Original workflow root
    sequence_number: u64,                      // Ordering within workflow
    workflow_stage: String,                    // Current stage (e.g., "collection", "analysis")
    source_process: String,                    // Originating collector process
    correlation_tags: HashMap<String, String>, // Flexible tagging
}
```
Aggregation Strategies:
Stream-Based: Real-time aggregation as results arrive from different processes via IPC
Batch-Based: Collect results until timeout or threshold met
Correlation-Based: Group results by correlation IDs for multi-stage workflows across processes
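Correlation-based grouping can be sketched by counting results per correlation ID; the completeness criterion used here (a fixed expected count per workflow) is an assumption for illustration:

```rust
use std::collections::HashMap;

/// Group incoming results by correlation ID and report which workflows
/// have received all expected results. Metadata is reduced to
/// (correlation_id, sequence_number) pairs for this sketch.
fn complete_workflows(
    results: &[(u64, u64)], // (correlation_id, sequence_number)
    expected_per_workflow: u64,
) -> Vec<u64> {
    let mut counts: HashMap<u64, u64> = HashMap::new();
    for (cid, _seq) in results {
        *counts.entry(*cid).or_insert(0) += 1;
    }
    let mut done: Vec<u64> = counts
        .into_iter()
        .filter(|(_, n)| *n >= expected_per_workflow)
        .map(|(cid, _)| cid)
        .collect();
    done.sort_unstable(); // deterministic output order
    done
}
```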
Cross-Platform Support Matrix
| Component | Windows | Linux | macOS | FreeBSD | Notes |
|---|---|---|---|---|---|
| interprocess | ✅ Primary | ✅ Primary | ✅ Primary | ✅ Secondary | Named pipes (Win) / Unix sockets - Cross-process IPC |
| tokio | ✅ Primary | ✅ Primary | ✅ Primary | ✅ Secondary | Full async runtime support |
| tokio::sync | ✅ Primary | ✅ Primary | ✅ Primary | ✅ Secondary | In-process channels (within broker process only) |
| prost | ✅ Primary | ✅ Primary | ✅ Primary | ✅ Secondary | Pure Rust, no platform deps |
| dashmap | ✅ Primary | ✅ Primary | ✅ Primary | ✅ Secondary | Lock-free concurrent data structures |
Cross-Process Transport Strategy:
Primary: Unix domain sockets (Linux/macOS/FreeBSD) or Named pipes (Windows) via interprocess for local cross-process IPC
Fallback: TCP loopback (127.0.0.1) for platforms without Unix socket support (rare)
Not Used: Network sockets to external systems (out of scope)
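The per-platform endpoint choice above can be made at compile time; the pipe and socket names below are illustrative, not DaemonEye's actual paths:

```rust
/// Pick a local IPC endpoint name per platform: a named-pipe path on
/// Windows, a Unix domain socket path elsewhere. Names are hypothetical.
fn local_endpoint(instance: &str) -> String {
    if cfg!(windows) {
        // Windows named-pipe namespace
        format!(r"\\.\pipe\daemoneye-{instance}")
    } else {
        // Unix domain socket path (Linux/macOS/FreeBSD)
        format!("/var/run/daemoneye/{instance}.sock")
    }
}
```

The `interprocess` crate would then open the chosen endpoint, keeping the rest of the broker platform-agnostic.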
Dependency Updates Required
No new external dependencies - use only existing workspace dependencies:
[workspace.dependencies]
# Already in use - sufficient for custom event businterprocess = "2.2.3"# Cross-process IPC transport (CRITICAL)prost = "0.14.1"# Cross-process message serializationtokio = { version = "1.47.1", features = ["full"] } # Async runtime# Add for custom event bus implementationdashmap = "6.1"# Lock-free concurrent HashMap for topic routingparking_lot = "0.12"# Faster sync primitives (optional optimization)
Design Goals
Scope: Single-system, multi-process coordination (NOT distributed systems)
Primary Goals:
Performance Optimization
Cross-Platform Compatibility
Security
Code Safety & Dependencies: built only on `interprocess`, `tokio`, `prost`, `dashmap`
Communication Patterns
Background & Current Limitations
Current Architecture
DaemonEye's collector-core framework provides a solid foundation for this work.
Implementation Plan
Phase 1: Foundation ✅ (Completed - #113, #121) - topic matching with `+` and `#` patterns
Phase 2: Coordination Workflows (#115) - `tokio::sync::mpsc` (bounded channels), `dashmap` (topic routing)
Phase 3: Advanced Routing (#116, #117) - `dashmap` for routing tables, queue group logic for cross-process coordination
Phase 4: Testing & Validation (#118) - `criterion`, `proptest`, `insta`
Phase 5: Complete Topic Hierarchy (#119) - `events.*` topic hierarchy for cross-process event routing; `control.*` topic structure for cross-process control plane
Phase 6: Correlation & Forensics (#120)
Benefits & Value Proposition
Performance
Scalability (Single System)
Reliability
Security
Maintainability
Requirements Mapping
This epic implements the corresponding requirements from the specification (.kiro/specs/daemoneye-core-monitoring/tasks.md, Section 2.6).
Subtasks
Success Criteria
Technical Considerations
Cross-Platform Support: `interprocess`
Security
Performance
Monitoring & Observability: `tracing` infrastructure with process-aware spans
Status: In Progress
Priority: High - Essential for multi-process coordination
Estimated Effort: 3-4 weeks across all subtasks
Next Steps: `interprocess` + `tokio`