add cancel request protocol for ZMQ env client-server #962
mikasenghaas wants to merge 1 commit into main
When a rollout or group request is cancelled (e.g. scheduler timeout), the client now sends a CancelRequest message to the server so it can stop processing the request instead of wasting inference compute. The server tracks request_id→task mappings and cancels the asyncio task on receiving a cancel message.

Cancel messages are fire-and-forget on the client side: failures are logged but do not affect the cancellation flow. The server handles cancels inline in the serve loop (no task spawned) for minimal latency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
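The server-side bookkeeping described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `TaskRegistry`, `spawn`, and `cancel` are hypothetical names, and `asyncio.sleep(60)` stands in for a long-running rollout.

```python
import asyncio


class TaskRegistry:
    """Hypothetical helper: maps request IDs to in-flight asyncio tasks."""

    def __init__(self) -> None:
        self.tasks: dict[str, asyncio.Task] = {}

    def spawn(self, request_id: str, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self.tasks[request_id] = task
        # Drop the mapping once the task finishes, however it finishes.
        task.add_done_callback(lambda _t: self.tasks.pop(request_id, None))
        return task

    def cancel(self, request_ids: list[str]) -> int:
        """Cancel tracked tasks inline (no new task spawned); return how many."""
        cancelled = 0
        for request_id in request_ids:
            task = self.tasks.pop(request_id, None)
            # Unknown IDs and already-finished tasks are ignored.
            if task is not None and not task.done():
                task.cancel()
                cancelled += 1
        return cancelled


async def demo() -> tuple[int, bool, int]:
    registry = TaskRegistry()
    task = registry.spawn("req-1", asyncio.sleep(60))  # stands in for a rollout
    await asyncio.sleep(0)  # let the task start
    n = registry.cancel(["req-1", "unknown-id"])
    await asyncio.gather(task, return_exceptions=True)
    return n, task.cancelled(), len(registry.tasks)
```

Because `cancel` is a plain synchronous method, the serve loop can call it directly when a cancel frame arrives, which is the "no task spawned" property the commit message mentions.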
Cursor Bugbot has reviewed your changes and found 1 potential issue.
    self.logger.warning(
        f"Failed to deserialize message {request_id[:7]}"
    )
    continue
Deserialization failure silently drops request without responding
Low Severity
Moving msgpack.unpackb from process_request (where failures were caught by the except Exception handler that sends an error BaseResponse back to the client) into the serve loop (where failures just continue without sending any response) changes error behavior. Previously, a deserialization failure produced an immediate error response to the client; now the client's future is never resolved and it silently hangs until timeout. While rare in practice, this makes debugging serialization mismatches significantly harder.
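One possible way to address this would be to build an error response in the serve loop when decoding fails, rather than only logging and continuing. The sketch below is an assumption-laden illustration: `handle_frame`, the `"type"` key, and the `success`/`error` fields are hypothetical, and `json` stands in for `msgpack` so the example has no third-party dependency.

```python
import json


def handle_frame(request_id: bytes, payload: bytes) -> tuple[str, dict]:
    """Hypothetical serve-loop step: decode one frame and decide what to do.

    Returns ("reply", error_response) when decoding fails, so the caller can
    send it back and resolve the client's future instead of only continuing.
    """
    try:
        message = json.loads(payload)  # stand-in for msgpack.unpackb
        if not isinstance(message, dict):
            raise ValueError("expected a mapping")
    except ValueError as exc:  # JSONDecodeError and UnicodeDecodeError subclass ValueError
        error = {
            "success": False,  # assumed shape of an error response
            "error": f"Failed to deserialize message {request_id[:7]!r}: {exc}",
        }
        return "reply", error
    if message.get("type") == "cancel":  # assumed discriminator field
        return "cancel", message
    return "process", message
```

With this shape, the serve loop would send the `"reply"` payload back to the originating client, restoring the earlier behaviour where a malformed request fails fast instead of hanging until timeout.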


Summary
- Add a CancelRequest message type to the ZMQ client-server protocol so the client can notify the server to stop processing cancelled rollout/group requests
- The client sends a cancel on CancelledError (e.g. scheduler timeout) and from cancel_all_pending(). This is fire-and-forget: failures are logged but don't affect the cancellation flow
- The server tracks request_id → asyncio.Task mappings and cancels the corresponding task when a cancel message arrives, handled inline in the serve loop for minimal latency

Before: Cancellation was one-directional and local-only. The server continued burning inference compute on cancelled requests until the response was either silently ignored or hit a ZMQError.
After: The server receives a cancel message and cancels the in-flight asyncio task, stopping inference work promptly.
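The client half of this flow might look roughly like the sketch below. It is not the `ZMQEnvClient` implementation: `SketchClient`, `transport_send`, and the message fields (`type`, `request_ids`, `id`) are hypothetical, and a plain async callable stands in for the ZMQ socket.

```python
import asyncio


class SketchClient:
    """Hypothetical client; `transport_send` stands in for the ZMQ socket send."""

    def __init__(self, transport_send) -> None:
        self.transport_send = transport_send
        self.pending: dict[str, asyncio.Future] = {}
        self.errors: list[str] = []  # stands in for logger.warning
        self._background: set[asyncio.Task] = set()

    async def send_cancel(self, request_ids: list[str]) -> None:
        if not request_ids:
            return  # nothing to cancel
        try:
            await self.transport_send({"type": "cancel", "request_ids": request_ids})
        except Exception as exc:
            # Failures are recorded but never propagate to the caller.
            self.errors.append(str(exc))

    async def send_request(self, request_id: str, payload: dict):
        future = asyncio.get_running_loop().create_future()
        self.pending[request_id] = future
        try:
            await self.transport_send({"id": request_id, **payload})
            return await future
        except asyncio.CancelledError:
            self.pending.pop(request_id, None)
            # Fire-and-forget: schedule the cancel without awaiting it.
            task = asyncio.create_task(self.send_cancel([request_id]))
            self._background.add(task)  # keep a reference until it finishes
            task.add_done_callback(self._background.discard)
            raise


async def demo() -> tuple[list[dict], dict]:
    sent: list[dict] = []

    async def fake_send(message: dict) -> None:
        sent.append(message)

    client = SketchClient(fake_send)
    try:
        # The timeout plays the role of the scheduler cancelling the request.
        await asyncio.wait_for(client.send_request("req-1", {"op": "rollout"}), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    await asyncio.sleep(0.01)  # give the background cancel a chance to run
    return sent, client.pending
```

Re-raising `CancelledError` after scheduling the cancel keeps the local cancellation semantics unchanged; the notification to the server is purely additive.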
Test plan
- CancelRequest serialization/deserialization roundtrip (Pydantic + msgpack)
- send_cancel() sends properly formatted message, no-ops on empty list, swallows errors
- send_request() catches CancelledError, cleans up pending entry, sends cancel to server
- cancel_all_pending() sends cancel for all pending request IDs
- _handle_cancel() cancels tracked tasks, ignores unknown/done IDs, handles invalid requests
- alphabet_sort env failure)

🤖 Generated with Claude Code
Note
Medium Risk
Adds a new cross-process cancellation path and changes the ZMQ server receive loop to parse and branch on message type, which can affect in-flight request handling and concurrency edge cases.
Overview
Adds a new CancelRequest message type and exports it via verifiers.workers to support server-side cancellation of in-flight work.

Updates ZMQEnvClient to fire-and-forget send_cancel() calls when send_request() is externally cancelled and when cancel_all_pending() clears pending futures.

Updates ZMQEnvServer to track request_id → asyncio.Task, detect cancel messages inline in the serve loop, and cancel/remove the corresponding task; includes new tests covering serialization, client send behavior, and server cancellation, plus a crash-recovery test tweak to ignore cancel frames.

Written by Cursor Bugbot for commit fd55552. This will update automatically on new commits.