@moarshy
Contributor

moarshy commented Feb 13, 2025

This PR addresses issues with graceful shutdown of the node-app container in our Docker environment (systemd deployments remain unaffected). The changes ensure that the HTTP server properly receives SIGTERM so that node unregistration is executed before termination.

docker-compose.yml

  • Added init: true and set stop_grace_period: 30s to allow the container sufficient time for graceful shutdown.

Dockerfile-node-dev

  • Updated the CMD to execute an entrypoint script, ensuring the main process receives signals directly.

entrypoint.sh

  • Modified to launch the WS server and Celery worker in detached mode using nohup, and run the HTTP server as the foreground process (via exec) so it properly handles SIGTERM.
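The launch order described for entrypoint.sh can be sketched in Python for illustration. This is not the PR's shell script: the command lists and the `launch` helper are hypothetical, since the real commands are not shown in this conversation. The sketch only shows the pattern of starting helpers detached and then replacing the current process with the HTTP server, so that signals sent to the process reach the server directly.

```python
import os
import subprocess

# Hypothetical commands; the real ones live in entrypoint.sh and are not shown here.
BACKGROUND_CMDS = [
    ["python", "-m", "ws_server"],
    ["celery", "-A", "worker_app", "worker"],
]
FOREGROUND_CMD = ["python", "-m", "http_server"]


def launch(background, foreground, spawn=subprocess.Popen, replace=os.execvp):
    """Start background services detached, then become the foreground server."""
    for cmd in background:
        # start_new_session=True puts the child in its own session, which is
        # comparable in spirit to running `nohup ... &` from a shell script.
        spawn(
            cmd,
            start_new_session=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    # os.execvp replaces the current process image and keeps the same PID, so
    # a SIGTERM delivered to this process is handled by the HTTP server itself.
    replace(foreground[0], foreground)
```

Calling `launch(BACKGROUND_CMDS, FOREGROUND_CMD)` does not return on success, because the calling process is replaced. The `spawn` and `replace` parameters exist only so the ordering can be exercised without starting real processes.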

@moarshy
Contributor Author

moarshy commented Feb 13, 2025

@K-Mistele just some background on this. We have a mechanism that unregisters the node from the hub when the node shuts down. This wasn't working properly under Docker, and this PR tries to address it. What do you think of this approach, and do you have any concerns? Thanks a lot.

@K-Mistele
Contributor

> @K-Mistele just some background on this. We have a mechanism that unregisters the node from the hub when the node shuts down. This wasn't working properly under Docker, and this PR tries to address it. What do you think of this approach, and do you have any concerns? Thanks a lot.

Where does this mechanism live? I can dig into it. I'm guessing the issue is that the node used to be shut down with a launch script, but that's not the case with Docker Compose. Probably fixable by listening for the stop signal (SIGTERM by default under Docker, or SIGINT) and de-registering before the process terminates.
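A minimal sketch of the signal-listening idea, assuming a Python server with a main loop. The `GracefulShutdown` class and the `unregister` callback are hypothetical stand-ins and are not taken from the node's code; the real `unregister_node` call would be passed in by the caller.

```python
import signal
import threading


class GracefulShutdown:
    """Turn SIGTERM/SIGINT into a flag, then run cleanup exactly once."""

    def __init__(self, unregister):
        self._unregister = unregister  # hypothetical cleanup callback
        self._stop = threading.Event()
        self._done = False

    def install(self):
        # Must be called from the main thread; Python only allows signal
        # handlers to be registered there.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._on_signal)

    def _on_signal(self, signum, frame):
        # Keep the handler tiny: only record that shutdown was requested.
        self._stop.set()

    @property
    def stop_requested(self):
        return self._stop.is_set()

    def finish(self):
        # Safe to call more than once; cleanup runs only the first time.
        if not self._done:
            self._done = True
            self._unregister()
```

A server loop would call `install()` once, keep serving while `stop_requested` is false, and call `finish()` on the way out. The cleanup has to complete within the container's stop grace period, after which Docker sends SIGKILL.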

In the long term, though, you can never fully control this, since someone could force-kill a Naptha node with SIGKILL (which a process cannot catch), or a compute node could go offline or become unreachable. It may be desirable for the hub to track clients and automatically de-register dead ones after some period of unreachability; that shouldn't be too hard, since clients have to have a public IP or domain name.
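The hub-side idea above could look roughly like the following sketch. The `NodeRegistry` class, its method names, and the timeout value are all hypothetical; nothing here reflects the hub's actual data model. It assumes `heartbeat()` is called whenever the hub successfully reaches a node (or a node checks in), and `prune()` is run periodically.

```python
import time


class NodeRegistry:
    """Track when each node was last reachable and drop stale entries."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._last_seen = {}

    def heartbeat(self, node_id):
        # Record that the node was reachable just now.
        self._last_seen[node_id] = self._clock()

    def prune(self):
        # De-register every node not seen within the timeout window.
        now = self._clock()
        dead = [n for n, seen in self._last_seen.items() if now - seen > self._timeout]
        for n in dead:
            del self._last_seen[n]
        return dead

    def active_nodes(self):
        return sorted(self._last_seen)
```

With this approach a node that is killed without a chance to unregister still disappears from the hub once the timeout elapses, so graceful shutdown becomes an optimization rather than a requirement.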

@moarshy
Contributor Author

moarshy commented Feb 14, 2025

> Where does this mechanism live? I can dig into it. I'm guessing the issue is that the node used to be shut down with a launch script, but that's not the case with Docker Compose. Probably fixable by listening for the stop signal (SIGTERM by default under Docker, or SIGINT) and de-registering before the process terminates.

@K-Mistele Here is the shutdown code where unregister_node is triggered.
