Added supervisord to survive process restart #219
Open: mishushakov wants to merge 17 commits into main from jupyter-supervisord
Commits (all by mishushakov):
82a9992 use systemd
c36ff29 switched to supervisord
1914c4a kill uvicorn on jupyter death
3910e65 return chmod
12dc7f4 update
27ae39d copy start jupyter
f86f1ca bugbot comment
730f76a kill as group
b2144d3 duplicate conf
535057c kill by pid to avoid killing similar processes
ced28e2 Merge branch 'main' into jupyter-supervisord
dd60d00 added tests
b9eedf0 update tests
2e145cc kill as root
a31d702 added try/catch blocks and ignore non-zero code on kill
443c546 added try/catch blocks to avoid throwing on negative err code
35f17c9 lint
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| import { expect } from 'vitest' | ||
| import { sandboxTest, wait } from './setup' | ||
|
|
||
| async function waitForHealth(sandbox: any, maxRetries = 10, intervalMs = 100) { | ||
| for (let i = 0; i < maxRetries; i++) { | ||
| try { | ||
| const result = await sandbox.commands.run( | ||
| 'curl -s -o /dev/null -w "%{http_code}" http://0.0.0.0:49999/health' | ||
| ) | ||
| if (result.stdout.trim() === '200') { | ||
| return true | ||
| } | ||
| } catch { | ||
| // Connection refused or other error, retry | ||
| } | ||
| await wait(intervalMs) | ||
| } | ||
| return false | ||
| } | ||
|
|
||
| sandboxTest('restart after jupyter kill', async ({ sandbox }) => { | ||
| // Verify health is up initially | ||
| const initialHealth = await waitForHealth(sandbox) | ||
| expect(initialHealth).toBe(true) | ||
|
|
||
| // Kill the jupyter process as root | ||
| // The command handle may get killed too (since killing jupyter cascades to code-interpreter), | ||
| // so we catch the error. | ||
| try { | ||
| await sandbox.commands.run("kill -9 $(pgrep -f 'jupyter server')", { | ||
| user: 'root', | ||
| }) | ||
| } catch { | ||
| // Expected — the kill cascade may terminate the command handle | ||
| } | ||
|
|
||
| // Wait for supervisord to restart both services (jupyter startup + code-interpreter startup) | ||
| const recovered = await waitForHealth(sandbox, 60, 500) | ||
| expect(recovered).toBe(true) | ||
|
|
||
| // Verify code execution works after recovery | ||
| const result = await sandbox.runCode('x = 1; x') | ||
| expect(result.text).toEqual('1') | ||
| }) | ||
|
|
||
| sandboxTest('restart after code-interpreter kill', async ({ sandbox }) => { | ||
| // Verify health is up initially | ||
| const initialHealth = await waitForHealth(sandbox) | ||
| expect(initialHealth).toBe(true) | ||
|
|
||
| // Kill the code-interpreter process as root | ||
| try { | ||
| await sandbox.commands.run('kill -9 $(cat /var/run/code-interpreter.pid)', { | ||
| user: 'root', | ||
| }) | ||
| } catch { | ||
| // Expected — killing code-interpreter may terminate the command handle | ||
| } | ||
|
|
||
| // Wait for supervisord to restart it and health to come back | ||
| const recovered = await waitForHealth(sandbox, 60, 500) | ||
| expect(recovered).toBe(true) | ||
|
|
||
| // Verify code execution works after recovery | ||
| const result = await sandbox.runCode('x = 1; x') | ||
| expect(result.text).toEqual('1') | ||
| }) |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| import asyncio | ||
|
|
||
| from e2b_code_interpreter.code_interpreter_async import AsyncSandbox | ||
|
|
||
|
|
||
| async def wait_for_health(sandbox: AsyncSandbox, max_retries=10, interval_ms=100): | ||
| for _ in range(max_retries): | ||
| try: | ||
| result = await sandbox.commands.run( | ||
| 'curl -s -o /dev/null -w "%{http_code}" http://0.0.0.0:49999/health' | ||
| ) | ||
| if result.stdout.strip() == "200": | ||
| return True | ||
| except Exception: | ||
| pass | ||
| await asyncio.sleep(interval_ms / 1000) | ||
| return False | ||
|
|
||
|
|
||
| async def test_restart_after_jupyter_kill(async_sandbox: AsyncSandbox): | ||
| # Verify health is up initially | ||
| assert await wait_for_health(async_sandbox) | ||
|
|
||
| # Kill the jupyter process as root | ||
| # The command handle may get killed too (killing jupyter cascades to code-interpreter), | ||
| # so we catch the error. | ||
| try: | ||
| await async_sandbox.commands.run( | ||
| "kill -9 $(pgrep -f 'jupyter server')", user="root" | ||
| ) | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Wait for supervisord to restart both services | ||
| assert await wait_for_health(async_sandbox, 60, 500) | ||
|
|
||
| # Verify code execution works after recovery | ||
| result = await async_sandbox.run_code("x = 1; x") | ||
| assert result.text == "1" | ||
|
|
||
|
|
||
| async def test_restart_after_code_interpreter_kill(async_sandbox: AsyncSandbox): | ||
| # Verify health is up initially | ||
| assert await wait_for_health(async_sandbox) | ||
|
|
||
| # Kill the code-interpreter process as root | ||
| try: | ||
| await async_sandbox.commands.run( | ||
| "kill -9 $(cat /var/run/code-interpreter.pid)", user="root" | ||
| ) | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Wait for supervisord to restart it and health to come back | ||
| assert await wait_for_health(async_sandbox, 60, 500) | ||
|
|
||
| # Verify code execution works after recovery | ||
| result = await async_sandbox.run_code("x = 1; x") | ||
| assert result.text == "1" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| import time | ||
|
|
||
| from e2b_code_interpreter.code_interpreter_sync import Sandbox | ||
|
|
||
|
|
||
| def wait_for_health(sandbox: Sandbox, max_retries=10, interval_ms=100): | ||
| for _ in range(max_retries): | ||
| try: | ||
| result = sandbox.commands.run( | ||
| 'curl -s -o /dev/null -w "%{http_code}" http://0.0.0.0:49999/health' | ||
| ) | ||
| if result.stdout.strip() == "200": | ||
| return True | ||
| except Exception: | ||
| pass | ||
| time.sleep(interval_ms / 1000) | ||
| return False | ||
|
|
||
|
|
||
| def test_restart_after_jupyter_kill(sandbox: Sandbox): | ||
| # Verify health is up initially | ||
| assert wait_for_health(sandbox) | ||
|
|
||
| # Kill the jupyter process as root | ||
| # The command handle may get killed too (killing jupyter cascades to code-interpreter), | ||
| # so we catch the error. | ||
| try: | ||
| sandbox.commands.run("kill -9 $(pgrep -f 'jupyter server')", user="root") | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Wait for supervisord to restart both services | ||
| assert wait_for_health(sandbox, 60, 500) | ||
|
|
||
| # Verify code execution works after recovery | ||
| result = sandbox.run_code("x = 1; x") | ||
| assert result.text == "1" | ||
|
|
||
|
|
||
| def test_restart_after_code_interpreter_kill(sandbox: Sandbox): | ||
| # Verify health is up initially | ||
| assert wait_for_health(sandbox) | ||
|
|
||
| # Kill the code-interpreter process as root | ||
| try: | ||
| sandbox.commands.run( | ||
| "kill -9 $(cat /var/run/code-interpreter.pid)", user="root" | ||
| ) | ||
| except Exception: | ||
| pass | ||
|
|
||
| # Wait for supervisord to restart it and health to come back | ||
| assert wait_for_health(sandbox, 60, 500) | ||
|
|
||
| # Verify code execution works after recovery | ||
| result = sandbox.run_code("x = 1; x") | ||
| assert result.text == "1" |
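
The health-polling helper in the test files above retries a probe a fixed number of times and treats any exception as "not ready yet". With the `60, 500` arguments used after the kill, that is up to about 30 seconds of sleeping plus the time each command takes. The same pattern can be sketched without a sandbox by injecting the probe and the sleep function. `wait_until`, `probe` and `sleep` are hypothetical names used only for this illustration and are not part of the PR.

```python
import time
from typing import Callable


def wait_until(
    probe: Callable[[], bool],
    max_retries: int = 10,
    interval_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe up to max_retries times; return True on the first success.

    Exceptions raised by the probe are swallowed and counted as a failed
    attempt, mirroring how the tests treat "connection refused" during a restart.
    """
    for _ in range(max_retries):
        try:
            if probe():
                return True
        except Exception:
            pass  # service not reachable yet, retry after the interval
        sleep(interval_ms / 1000)
    return False
```

Injecting `sleep` keeps a unit test of the retry logic fast, since a list's `append` can stand in for the real delay.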
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| #!/bin/bash | ||
|
|
||
| echo "Waiting for Jupyter server to be ready..." | ||
| until curl -s -o /dev/null -w '%{http_code}' http://localhost:8888/api/status | grep -q '200'; do | ||
| sleep 0.5 | ||
| done | ||
| echo "Jupyter server is ready, starting Code Interpreter..." | ||
|
|
||
| echo $$ > /var/run/code-interpreter.pid | ||
| exec /root/.server/.venv/bin/uvicorn main:app --host 0.0.0.0 --port 49999 --workers 1 --no-access-log --no-use-colors --timeout-keep-alive 640 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| #!/bin/bash | ||
|
|
||
| /usr/local/bin/jupyter server --IdentityProvider.token="" | ||
|
|
||
| # Jupyter exited — kill code-interpreter so supervisord restarts both | ||
| echo "Jupyter exited, killing code-interpreter..." | ||
| kill "$(cat /var/run/code-interpreter.pid)" 2>/dev/null | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,22 +1,4 @@ | ||
| #!/bin/bash | ||
|
|
||
| function start_jupyter_server() { | ||
| counter=0 | ||
| response=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:8888/api/status") | ||
| while [[ ${response} -ne 200 ]]; do | ||
| let counter++ | ||
| if ((counter % 20 == 0)); then | ||
| echo "Waiting for Jupyter Server to start..." | ||
| sleep 0.1 | ||
| fi | ||
|
|
||
| response=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:8888/api/status") | ||
| done | ||
|
|
||
| cd /root/.server/ | ||
| .venv/bin/uvicorn main:app --host 0.0.0.0 --port 49999 --workers 1 --no-access-log --no-use-colors --timeout-keep-alive 640 | ||
| } | ||
|
|
||
| echo "Starting Code Interpreter server..." | ||
| start_jupyter_server & | ||
| MATPLOTLIBRC=/root/.config/matplotlib/.matplotlibrc jupyter server --IdentityProvider.token="" >/dev/null 2>&1 | ||
| supervisord -c /etc/supervisord.conf |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| [supervisord] | ||
| nodaemon=true | ||
| logfile=/var/log/supervisord.log | ||
| pidfile=/var/run/supervisord.pid | ||
|
|
||
| [program:jupyter] | ||
| command=/root/.jupyter/start-jupyter.sh | ||
| environment=MATPLOTLIBRC="/root/.config/matplotlib/.matplotlibrc" | ||
| stdout_logfile=/dev/null | ||
| stderr_logfile=/dev/fd/1 | ||
| stderr_logfile_maxbytes=0 | ||
| autorestart=true | ||
| stopasgroup=true | ||
| killasgroup=true | ||
| priority=10 | ||
|
|
||
| [program:code-interpreter] | ||
| command=/root/.jupyter/start-code-interpreter.sh | ||
| directory=/root/.server | ||
| stdout_logfile=/dev/fd/1 | ||
| stdout_logfile_maxbytes=0 | ||
| stderr_logfile=/dev/fd/1 | ||
| stderr_logfile_maxbytes=0 | ||
| autorestart=true | ||
| stopasgroup=true | ||
| killasgroup=true | ||
| priority=20 | ||
| startsecs=0 | ||
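
In supervisord, programs with lower `priority` values are started first, so the config above starts `jupyter` (10) before `code-interpreter` (20). Priority only orders the starts and does not wait for readiness, which is presumably why the code-interpreter start script polls the Jupyter status endpoint itself. The sketch below parses a trimmed copy of the config with Python's standard `configparser` and prints the resulting start order. `CONF` and `start_order` are hypothetical names for this illustration, and the fallback of 999 is assumed to match supervisord's documented default priority.

```python
import configparser
from typing import List

# Trimmed copy of the config from the diff; only the keys needed here.
CONF = """
[supervisord]
nodaemon=true

[program:jupyter]
command=/root/.jupyter/start-jupyter.sh
autorestart=true
priority=10

[program:code-interpreter]
command=/root/.jupyter/start-code-interpreter.sh
autorestart=true
priority=20
startsecs=0
"""


def start_order(conf_text: str) -> List[str]:
    """Return program names sorted by ascending priority (lowest starts first)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    programs = [
        (parser.getint(section, "priority", fallback=999), section.split(":", 1)[1])
        for section in parser.sections()
        if section.startswith("program:")
    ]
    return [name for _, name in sorted(programs)]


if __name__ == "__main__":
    print(start_order(CONF))
```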
Unguarded PID file read may produce spurious errors
Low Severity
The `kill "$(cat /var/run/code-interpreter.pid)" 2>/dev/null` command doesn't check whether the PID file exists before reading it. The `2>/dev/null` only suppresses `kill`'s stderr, not `cat`'s. If Jupyter exits before code-interpreter has written its PID file (e.g., a very early crash on first startup), `cat` writes an error to stderr (visible in the supervisord logs) and `kill` receives an empty-string argument. Adding a file-existence guard (e.g., wrapping the command in `[ -f ... ] &&`) would make this robust.
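
The guard described in this comment can be illustrated in Python, the language of the PR's test files. This is only a sketch of the logic (read the PID file if it exists and holds a valid PID, otherwise do nothing), not the fix for the shell script itself. `read_pid` and `kill_if_running` are hypothetical helpers, and the code assumes a Unix-like system.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        pid = int(pid_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        return None  # no file yet, or empty/non-numeric content
    return pid if pid > 0 else None  # never signal a process group by accident


def kill_if_running(pid_file: Path, sig: int = signal.SIGTERM) -> bool:
    """Send sig to the PID in pid_file; return False if there was nothing to signal."""
    pid = read_pid(pid_file)
    if pid is None:
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False  # stale PID file: the process is already gone
    return True
```

With this shape, a missing or empty PID file yields no error output and no signal, which is the behavior the comment asks for.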