-
Notifications
You must be signed in to change notification settings - Fork 10
Self-hosted server crashes silently during second agent deploy via CLI #297
Description
Bug Report
Description
The self-hosted Blink server (blink-server Docker container) crashes silently and repeatedly when deploying a second agent via the CLI. The first agent was deployed successfully through the UI. Subsequent deploys via blink deploy (from a GitLab CI/CD pipeline) cause the server process to die with no error output, no stack trace, and no OOM indication.
Environment
- Blink Server Version:
0.1.4+3aa5336 - Deployment: Self-hosted, Docker (
ghcr.io/coder/blink-server:latest) - Database: Amazon RDS (Postgres, external)
- Host: EC2 instance, 3.7GB RAM
- Node.js heap limit: ~1965 MB
- Docker memory limit: None (
0) - Agent bundle size: ~14MB (
agent.jsis 13,978,840 bytes)
Steps to Reproduce
- Deploy Blink server via Docker with Postgres on RDS
- Successfully deploy a first agent via the web UI (this works fine)
- Create a second agent and deploy via CLI (
blink deployfrom CI/CD pipeline) - Server crashes silently during the deploy
Expected Behavior
The server should deploy the second agent container and report success, or fail with an error message.
Actual Behavior
The server logs show:
Deploying agent <agent-id> (<deployment-id>)
Then immediately crashes with no further output. No error, no stack trace, no "Writing files to" log line (which should be the next log statement in deployAgentWithDocker).
Docker restarts the container (via restart policy), the server picks up the pending deployment from the DB, and crashes again — creating a crash loop. Each CLI retry also creates a new agent record, resulting in orphaned agents.
Diagnostics Performed
docker inspect blink-server --format '{{.State.OOMKilled}}'→falsedocker inspect blink-server --format '{{.HostConfig.Memory}}'→0(no limit)docker exec blink-server node -e "console.log(v8.getHeapStatistics().heap_size_limit / 1024 / 1024 + ' MB')"→1965.25 MB- Host has ~2.9GB available RAM, existing containers using ~315MB combined
- Only two containers running:
blink-server(~180MB) and one existingblink-agent(~135MB)
Analysis
The crash occurs between the "Deploying agent" console.log in packages/server/src/agent-deployment.ts and the "Writing files to" log. This narrows it to either:
querier.updateAgentDeployment()(status → "deploying")downloadFile()→db.selectFileByID()(reading the ~14MB file from RDS as a bytea column)
The silent crash with zero error output suggests either an unhandled promise rejection that kills the process before the error handler fires, or a crash in the Postgres client when handling a large binary column.
Workaround
None identified yet. The first agent deployed via UI works fine, suggesting the issue may be specific to the CLI deploy path or related to deploying a second agent alongside an existing one.
Additional Context
The CLI blink deploy successfully:
- Builds the agent (esbuild, 2107ms)
- Uploads all 11 files (13.3MB total) to the server
The server receives the deploy request and begins processing, but crashes during the file download/write phase.