Closed
Labels: enhancement (New feature or request)
Description
Summary
The node spawns multiple background workers (Primary, Worker, network handlers) without a centralized supervision strategy, which makes coordinated shutdown and failure handling difficult.
Problem Locations
- crates/data-chain/src/primary/runner.rs:181 - Primary spawned
- crates/data-chain/src/worker/core.rs:225 - Worker spawned
- crates/node/src/main.rs:706 - Node handle spawned
- crates/node/src/network.rs:106 - Network listener spawned
Current Pattern
```rust
// crates/data-chain/src/primary/runner.rs:180-183
let config_clone = config.clone();
let handle = tokio::spawn(async move {
    let mut primary = Primary::new_with_storage(...).await;
    // ...
});
```
Each component spawns its tasks independently, with no coordination between them.
Issues
- No Graceful Shutdown: components cannot coordinate their shutdown order
- Dependency Blindness: the network layer may shut down before consensus has flushed its state
- Partial Failure Handling: a failure in one component does not trigger the others to stop gracefully
- Resource Cleanup: there is no guarantee that storage is flushed before the process exits
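The partial-failure problem, and the behaviour a supervisor should provide, can be sketched without an async runtime. This std-only example is an analogy, not the node's code: each hypothetical component runs on an OS thread and polls a shared `AtomicBool` (standing in for a `CancellationToken`); any component that returns `Err` sets the flag, so its siblings wind down instead of running on. `run_supervised` and `Component` are illustrative names, not existing APIs in the codebase.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical component body: receives the shared shutdown flag and returns
/// when it finishes its work or observes that shutdown was requested.
type Component = Box<dyn FnOnce(&AtomicBool) -> Result<(), String> + Send>;

/// Runs every component on its own OS thread. If a component returns `Err`,
/// the shared flag is set so the remaining components stop as well.
/// Returns each component's name and result, in spawn order.
fn run_supervised(
    components: Vec<(&'static str, Component)>,
) -> Vec<(&'static str, Result<(), String>)> {
    let shutdown = Arc::new(AtomicBool::new(false));

    let handles: Vec<_> = components
        .into_iter()
        .map(|(name, body)| {
            let shutdown = Arc::clone(&shutdown);
            let handle = thread::spawn(move || {
                let result = body(&*shutdown);
                if result.is_err() {
                    // Propagate the failure: ask every sibling to stop.
                    shutdown.store(true, Ordering::SeqCst);
                }
                result
            });
            (name, handle)
        })
        .collect();

    handles
        .into_iter()
        .map(|(name, handle)| {
            // A panicked thread is reported as an error here; in this sketch a
            // panic does not set the flag, so it does not stop the siblings.
            let result = handle
                .join()
                .unwrap_or_else(|_| Err(format!("{name} panicked")));
            (name, result)
        })
        .collect()
}
```

Called with a `storage` component that immediately returns `Err("disk full")` and a `network` component that loops until the flag is set, both threads terminate: the storage failure sets the flag, and the network component then exits with `Ok(())`.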
Recommended Fix
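Introduce centralized task supervision in the spirit of Erlang/OTP supervisors: a single `TaskTracker` records every spawned component, and a shared `CancellationToken` signals shutdown to all of them: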
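```rust
use std::fmt::Debug;
use std::future::Future;

use tokio_util::sync::CancellationToken;
use tokio_util::task::TaskTracker;
use tracing::{error, info};

pub struct NodeSupervisor {
    tracker: TaskTracker,
    token: CancellationToken,
}

impl NodeSupervisor {
    pub fn new() -> Self {
        Self {
            tracker: TaskTracker::new(),
            token: CancellationToken::new(),
        }
    }

    /// Spawns a supervised component. If it returns an error, the shared
    /// token is cancelled so every other component stops as well.
    pub fn spawn<F, E>(&self, name: &str, future: F)
    where
        F: Future<Output = Result<(), E>> + Send + 'static,
        E: Debug + Send + 'static,
    {
        // Own the name: the spawned task must be 'static and cannot borrow it.
        let name = name.to_owned();
        let token = self.token.clone();
        self.tracker.spawn(async move {
            tokio::select! {
                result = future => {
                    if let Err(e) = result {
                        error!("{} failed: {:?}", name, e);
                        // Propagate the failure to sibling components.
                        token.cancel();
                    }
                }
                // Cancellation drops `future` at its current .await point;
                // components that must flush state should also watch the token.
                _ = token.cancelled() => {
                    info!("{} shutting down", name);
                }
            }
        });
    }

    pub async fn shutdown(&self) {
        info!("Initiating graceful shutdown");
        self.token.cancel();
        self.tracker.close();
        self.tracker.wait().await;
        info!("All tasks terminated");
    }
}
```
Shutdown Order Recommendation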
- Stop accepting new network connections
- Drain in-flight consensus rounds
- Flush pending storage writes
- Close database connections
- Exit
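One way to keep this sequence from drifting as components are added is to derive it from explicit "must stop before" constraints rather than hard-coding it. A minimal sketch, assuming hypothetical stage names that mirror the list above: `shutdown_order` applies Kahn's topological sort and returns `None` if the constraints are cyclic or mention a stage that is not listed.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Returns an order in which every stage appears after all stages that must
/// stop before it. `stop_before` holds pairs `(a, b)` meaning "a must stop
/// before b". Stages must be distinct. Returns `None` on a cycle or an
/// unknown stage name.
fn shutdown_order<'a>(stages: &[&'a str], stop_before: &[(&'a str, &'a str)]) -> Option<Vec<&'a str>> {
    // In-degree = number of stages that must stop before this one.
    let mut indegree: BTreeMap<&'a str, usize> = stages.iter().map(|&s| (s, 0)).collect();
    let mut successors: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();

    for &(a, b) in stop_before {
        // Reject constraints that mention a stage we do not know about.
        if !indegree.contains_key(a) {
            return None;
        }
        match indegree.get_mut(b) {
            Some(d) => *d += 1,
            None => return None,
        }
        successors.entry(a).or_default().push(b);
    }

    // Seed with stages that have no prerequisites, in the order given.
    let mut ready: VecDeque<&'a str> = stages.iter().copied().filter(|s| indegree[s] == 0).collect();
    let mut order = Vec::with_capacity(stages.len());

    while let Some(stage) = ready.pop_front() {
        order.push(stage);
        if let Some(next) = successors.get(stage) {
            for &succ in next {
                let d = indegree.get_mut(succ).expect("validated above");
                *d -= 1;
                if *d == 0 {
                    ready.push_back(succ);
                }
            }
        }
    }

    // Any stage never released means the constraints contain a cycle.
    (order.len() == stages.len()).then_some(order)
}
```

With a hypothetical `worker` stage that must also stop before `storage`, the sort yields `network_listener, worker, consensus, storage, database`. A supervisor could then cancel and await each stage in turn, for example with one `CancellationToken` and `TaskTracker` per stage instead of a single shared pair.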