Summary
The e2e tests "can lease and connect to exporters" (test 47) and "can lease and connect to exporters by name" (test 48) fail intermittently with "Error: Connection to exporter lost". The client acquires the lease successfully but then times out waiting for a ready connection on the Unix socket. It never reaches beforeLease hook completion or LEASE_READY status monitoring on the client side.
Failing CI Run
e2e-tests (ubuntu-24.04), step 6 "Run e2e tests"
Reproduction Timeline (test 47)
08:23:01 INFO [jumpstarter.client.lease] Acquiring lease 019d6c30-0a9b-7b81-bc30-cbd918008be8
08:23:01 INFO Lease acquired successfully! (0:00:00)
08:23:01 INFO Waiting for ready connection at /run/user/1001/jumpstarter-nkjy63wk/socket
← 20 seconds of silence — no beforeLease hook log, no status_monitor update on the client side
08:23:21 INFO Releasing Lease 019d6c30-0a9b-7b81-bc30-cbd918008be8
Error: Connection to exporter lost
Test 48 (jmp shell --client test-client-oidc --name test-exporter-oidc j power on) shows the same pattern. It also fails after ~20s with the same error.
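The ~20s window can be read directly from the timestamps. A helper like the hypothetical `silence_gap_seconds` below could pull that gap out of CI logs automatically. That would show whether every failure dies after the same interval: a consistent gap would fit a fixed timeout better than a timing-dependent race. The sketch assumes each relevant line starts with an HH:MM:SS timestamp, as in the excerpt above. The real CI log prefix may differ.

```python
from datetime import datetime
from typing import List, Optional


def silence_gap_seconds(lines: List[str]) -> Optional[float]:
    """Seconds from the first 'Waiting for ready connection' line to the next 'Releasing Lease' line.

    Returns None if either marker is missing. Assumes an HH:MM:SS prefix on the
    matching lines and does not handle runs that cross midnight.
    """
    waiting_at = None
    for line in lines:
        stamp = line.split(" ", 1)[0]
        if waiting_at is None and "Waiting for ready connection" in line:
            waiting_at = datetime.strptime(stamp, "%H:%M:%S")
        elif waiting_at is not None and "Releasing Lease" in line:
            released_at = datetime.strptime(stamp, "%H:%M:%S")
            return (released_at - waiting_at).total_seconds()
    return None


# Client lines copied from the test 47 timeline above.
timeline = [
    "08:23:01 INFO Lease acquired successfully! (0:00:00)",
    "08:23:01 INFO Waiting for ready connection at /run/user/1001/jumpstarter-nkjy63wk/socket",
    "08:23:21 INFO Releasing Lease 019d6c30-0a9b-7b81-bc30-cbd918008be8",
    "Error: Connection to exporter lost",
]
print(silence_gap_seconds(timeline))  # 20.0
```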
Observations
- The exporter side appears healthy. The exporter logs from this test show it had successfully handled earlier leases: sessions were created, "power on" commands were executed, and sessions were closed cleanly.
- No beforeLease hook activity on the client. In passing runs, the client logs show "Waiting for beforeLease hook to complete..." followed by "Status changed: None -> LEASE_READY". In the failing run, neither message appears. The client goes straight from "Waiting for ready connection" to "Releasing Lease" after ~20s.
- No exporter-side log entry for the failing lease. The exporter logs dumped on failure contain no "Starting new lease: 019d6c30-0a9b..." entry. This suggests the exporter either never received the assignment for this lease or never processed it.
- Flaky, not deterministic. A re-run (run 24125318534) passed all 52 tests on the same commit, which points to a race condition or a transient infrastructure issue.
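The client and exporter observations above amount to a checklist of log markers. A helper like the hypothetical `missing_markers` below could be run over the logs dumped on failure. It would separate this failure signature from other e2e failures. The marker strings are copied from this issue. The sketch also assumes two things: the exporter log format, and that the exporter writes the full lease ID after "Starting new lease: ".

```python
from typing import List

# Client-side lines that passing runs show, quoted from this issue.
PASSING_CLIENT_MARKERS = (
    "Waiting for beforeLease hook to complete",
    "Status changed: None -> LEASE_READY",
)


def missing_markers(client_log: str, exporter_log: str, lease_id: str) -> List[str]:
    """Return the expected log markers that are absent from a run's logs.

    An empty list looks like a passing run. Assumes (unverified) that the
    exporter logs the full lease ID after "Starting new lease: ".
    """
    missing = [marker for marker in PASSING_CLIENT_MARKERS if marker not in client_log]
    exporter_marker = f"Starting new lease: {lease_id}"
    if exporter_marker not in exporter_log:
        missing.append(exporter_marker)
    return missing


LEASE_ID = "019d6c30-0a9b-7b81-bc30-cbd918008be8"

# Client lines from the failing timeline. The exporter dump is stubbed as empty
# because it had no entry for this lease.
failing_client_log = (
    "Waiting for ready connection at /run/user/1001/jumpstarter-nkjy63wk/socket\n"
    f"Releasing Lease {LEASE_ID}\n"
)
print(missing_markers(failing_client_log, exporter_log="", lease_id=LEASE_ID))
```

For the failing run, all three markers come back as missing. A run with only the exporter entry missing would point more toward the routing hypotheses below than toward a client-side problem.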
Possible Root Causes
- Race condition in lease routing: The controller assigned the lease, but the exporter hadn't fully re-registered after the previous lease teardown, causing the router to fail to connect the client to the exporter.
- Socket readiness timeout: The client may have a hardcoded ~20s timeout waiting for the Unix socket to become ready, and the exporter-side session setup took too long or never started.
- Router/controller propagation delay: The lease was marked Ready in k8s, but the router had not yet updated its routing table for the new lease.
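The socket readiness hypothesis can be made concrete with a minimal sketch of a client. It polls a Unix socket until the socket accepts a connection or a fixed deadline expires. This is not Jumpstarter's client code. `wait_for_unix_socket`, `demo`, the 0.1 s poll interval, and the timeouts are illustrative assumptions. Whether the real client has a fixed ~20s deadline is exactly what needs confirming.

```python
import asyncio
import os
import tempfile


async def wait_for_unix_socket(path: str, timeout: float, poll_interval: float = 0.1):
    """Poll a Unix socket until it accepts a connection or `timeout` seconds pass.

    Returns the (reader, writer) pair on success. Raises TimeoutError once the
    deadline expires while the socket is still missing or refusing connections.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            return await asyncio.open_unix_connection(path)
        except (FileNotFoundError, ConnectionRefusedError):
            # Socket file not created yet, or nothing listening on it yet.
            if loop.time() >= deadline:
                raise TimeoutError(f"{path} not ready after {timeout}s")
            await asyncio.sleep(poll_interval)


async def demo():
    """Return (connected_to_late_server, timed_out_without_server)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "socket")

        async def handle(reader, writer):
            writer.close()  # the stand-in "exporter" just hangs up

        async def start_late(delay):
            await asyncio.sleep(delay)
            return await asyncio.start_unix_server(handle, path=path)

        # Case 1: the socket appears after 0.3 s, well inside a 2 s deadline.
        server_task = asyncio.create_task(start_late(0.3))
        _, writer = await wait_for_unix_socket(path, timeout=2.0)
        connected = True
        writer.close()
        await writer.wait_closed()
        server = await server_task
        server.close()
        await server.wait_closed()

        # Case 2: nothing ever listens at this path, so the deadline expires.
        try:
            await wait_for_unix_socket(os.path.join(tmp, "never"), timeout=0.3)
            timed_out = False
        except TimeoutError:
            timed_out = True
    return connected, timed_out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a deadline like this, an exporter session that starts late or never starts becomes a hard failure rather than a longer wait. That would be consistent with the silent 20-second window in the timeline, although the real client reports "Connection to exporter lost" rather than a timeout.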
Environment
- Runner: ubuntu-24.04 (x86_64)
- Test file: e2e/tests.bats, lines 471-484