Add: hashid-based callable registration#891
Conversation
- Document local handle namespaces and resolver ownership for callable registration. - Specify descriptor hashes and map Python serializer ids to the existing SPYC header.
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR adds a comprehensive specification document for callable identity registration in local hierarchical worker systems. It defines content-derived stable identities ( ChangesCallable Identity Registration Specification
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces a new design document, docs/callable-identity-registration.md, which defines the callable identity registration contract for local hierarchical workers using a stable hashid instead of public integer execution slots. The review feedback highlights a few areas for improvement in the documentation: correcting a terminological inconsistency regarding whether local mailbox task frames carry the hashid or the raw 32-byte digest, explicitly documenting parent-side tracking of CallableHandle instances to prevent double-unregistration issues, and standardizing the naming of descriptor_schema_version throughout the document.
| - `hashid` is valid across the parent and local child workers. | ||
| - `local_slot` is valid only inside the target namespace that allocated it. | ||
| - Public APIs and task records carry `CallableHandle`, not local slots. | ||
| - Local mailbox task frames carry `hashid`, not slots. |
There was a problem hiding this comment.
There is a terminological inconsistency regarding what the local mailbox task frames carry. Line 109 states they carry hashid (which is defined as a 71-character string sha256:\u003c64 lowercase hex characters\u003e), whereas line 491 states they carry the raw 32-byte digest. To maintain precise terminology, this should be updated to refer to the raw 32-byte digest.
| - Local mailbox task frames carry `hashid`, not slots. | |
| - Local mailbox task frames carry the raw 32-byte digest, not slots. |
| The handle is an opaque parent-side registration object. Its `hashid` is the | ||
| stable callable identity used in task frames, but repeated registrations of | ||
| the same callable may return distinct handle objects with the same `hashid`. | ||
| Unregistering one handle must not invalidate another live handle that shares | ||
| the same `hashid`. |
There was a problem hiding this comment.
To ensure that unregistering one handle does not prematurely invalidate another live handle sharing the same hashid (e.g., if a user double-unregisters a handle), the parent-side Worker must track active CallableHandle instances. It is recommended to explicitly document this parent-side tracking requirement to prevent double-unregistration from incorrectly decrementing the target-local refcount.
References
- Ensure documentation and diagrams accurately reflect implementation details regarding resource lifecycles, especially when persistence is used to maintain internal state like caches.
| hashid -> { | ||
| local_slot, | ||
| callable_kind, | ||
| descriptor_version, |
There was a problem hiding this comment.
Inconsistent field naming: descriptor_version is used here and in line 458, but the canonical descriptor schema defined in lines 136 and 161 uses descriptor_schema_version. It is recommended to use descriptor_schema_version consistently throughout the document to avoid confusion.
| descriptor_version, | |
| descriptor_schema_version, |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/callable-identity-registration.md (2)
159-175: ⚡ Quick winSpecify the type for
signature_schema_hashin the schema.The
CHIP_CALLABLEschema listssignature_schema_hashwithout a type annotation, while line 208 clarifies it's "encoded as the raw 32-byte SHA-256 digest." For consistency with the encoding ofcallable_blob_sha256and to improve clarity, the schema should explicitly specify the type asuint8[32]or similar notation.📝 Suggested fix
CHIP_CALLABLE: descriptor_schema_version callable_kind target_arch platform runtime callable_blob_sha256 - signature_schema_hash + signature_schema_hash: uint8[32]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/callable-identity-registration.md` around lines 159 - 175, Add an explicit type for signature_schema_hash in the CHIP_CALLABLE schema so it matches the documented encoding; update the CHIP_CALLABLE section to declare signature_schema_hash as a 32-byte unsigned array (e.g., uint8[32] or equivalent) just like callable_blob_sha256, ensuring the schema and the descriptive text remain consistent (refer to CHIP_CALLABLE and the signature_schema_hash field).
570-574: ⚡ Quick winLink the "Descriptor mismatch" test to the required error code.
Line 573 specifies a "Descriptor mismatch" test expecting rejection when "same hashid with different descriptor" is registered. For completeness, this test description should reference the
HASHID_DESCRIPTOR_MISMATCHerror code defined at line 439, ensuring that the validation confirms both the rejection behavior and the correct error code.📝 Suggested fix
| Test | Expected result | | ---- | --------------- | | Stable descriptor hash | Same descriptor bytes produce same hashid. | -| Descriptor mismatch | Same hashid with different descriptor is rejected. | +| Descriptor mismatch | Same hashid with different descriptor is rejected with `HASHID_DESCRIPTOR_MISMATCH`. | | No public slot | Public APIs never return child-local slots. |🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/callable-identity-registration.md` around lines 570 - 574, Update the "Descriptor mismatch" test row in the table so it references the specific error code constant HASHID_DESCRIPTOR_MISMATCH; change the Expected result text for that row to indicate the registration is rejected with error HASHID_DESCRIPTOR_MISMATCH (so the test asserts both rejection and the exact error code), locating the row by its label "Descriptor mismatch" in the Tests table.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/callable-identity-registration.md`:
- Around line 481-494: Document that changing MAILBOX_OFF_CALLABLE to reserved
and moving the callable identity to a 32-byte callable_hash_digest at
MAILBOX_OFF_ARGS is a breaking wire-format change: update the spec to state this
is incompatible with the prior layout (where _OFF_CALLABLE carried the callable
id), require coordinated deployments of parent and child processes, and
explicitly state whether a migration path exists or this is a clean-break; also
update MAILBOX_ARGS_CAPACITY to subtract 32 bytes to account for the digest
prefix and amend the comment that mailbox layout "must match worker_manager.h
MAILBOX_OFF_*" so it specifies the new offsets and alignment expectations for
both sides.
- Around line 397-406: The spec's cleanup uncertainty handling is
underspecified: add explicit parent-side storage and protocols so the parent
records uncertain target/hashid pairs in a durable in-memory map named e.g.
uncertainCleanupMap keyed by (target, hashid) with values {timestamp, attempts,
pendingConfirmations}; define that targets also keep a local pendingCleanup set
to allow idempotent re-confirmations; specify that cleanup confirmation is
verified via an explicit ack message ("cleanupConfirmed") from targets or by
timeout+retry logic using fields in uncertainCleanupMap, and that entries are
cleared on ack or after worker restart; finally clarify that an uncertain pair
must be excluded from dispatch and from accepting new registrations for that
hashid on the parent (but targets may retry cleanup idempotently), and document
these behaviors in the "parent removes the unpublished handle entry" and "marked
uncertain" sections so implementers know which component maintains the state and
how confirmation is processed.
- Around line 92-93: Clarify that the receiving child mailbox derives the target
namespace from the child worker type: map chip children to LOCAL_CHIP, Python
sub-workers to LOCAL_PYTHON, and any foreign/interop workers to their declared
namespace (e.g., the caller-provided namespace or a header on the frame); update
the sentence after "local task frames carry only the raw `sha256` digest" to
explicitly state this mapping and that the mailbox uses the child's
type/registration metadata to resolve the namespace before looking up the
digest.
---
Nitpick comments:
In `@docs/callable-identity-registration.md`:
- Around line 159-175: Add an explicit type for signature_schema_hash in the
CHIP_CALLABLE schema so it matches the documented encoding; update the
CHIP_CALLABLE section to declare signature_schema_hash as a 32-byte unsigned
array (e.g., uint8[32] or equivalent) just like callable_blob_sha256, ensuring
the schema and the descriptive text remain consistent (refer to CHIP_CALLABLE
and the signature_schema_hash field).
- Around line 570-574: Update the "Descriptor mismatch" test row in the table so
it references the specific error code constant HASHID_DESCRIPTOR_MISMATCH;
change the Expected result text for that row to indicate the registration is
rejected with error HASHID_DESCRIPTOR_MISMATCH (so the test asserts both
rejection and the exact error code), locating the row by its label "Descriptor
mismatch" in the Tests table.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: be2f85ca-46c9-4d68-8b2a-4d7c54ca8a6a
📒 Files selected for processing (1)
docs/callable-identity-registration.md
| own namespace, so local task frames carry only the raw `sha256` digest. | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Clarify how the receiving child determines the target namespace.
Lines 92-93 state that "local task frames carry only the raw sha256 digest," and line 493-494 confirms "the receiving child mailbox determines the target namespace." However, the mechanism for this determination is not explicitly specified. If the namespace is implicit from the child worker type (e.g., chip children always use LOCAL_CHIP, Python sub-workers always use LOCAL_PYTHON), this should be stated clearly to avoid ambiguity in implementation.
📝 Suggested clarification
Add after line 93:
the same `hashid`.
+The target namespace is determined by the child worker's resolver type:
+chip child loops resolve against `LOCAL_CHIP`, Python sub/next-level
+child loops resolve against `LOCAL_PYTHON`. The mailbox protocol does
+not carry namespace because each child has exactly one active resolver
+domain.
+
Each target namespace owns a private mapping:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/callable-identity-registration.md` around lines 92 - 93, Clarify that
the receiving child mailbox derives the target namespace from the child worker
type: map chip children to LOCAL_CHIP, Python sub-workers to LOCAL_PYTHON, and
any foreign/interop workers to their declared namespace (e.g., the
caller-provided namespace or a header on the frame); update the sentence after
"local task frames carry only the raw `sha256` digest" to explicitly state this
mapping and that the mailbox uses the child's type/registration metadata to
resolve the namespace before looking up the digest.
|
|
||
| - The parent does not return the handle. | ||
| - The parent removes the unpublished handle entry. | ||
| - The parent sends cleanup to targets that may have installed the hashid. | ||
| - If cleanup cannot be confirmed, that target/hashid pair is marked uncertain | ||
| and must not be used again until cleanup is confirmed or the worker restarts. | ||
|
|
||
| This is failure cleanup with conservative uncertainty handling. It stays on | ||
| the same synchronous install path and does not introduce a separate recovery | ||
| API. |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Clarify the cleanup uncertainty tracking mechanism.
Lines 401-402 state that if cleanup cannot be confirmed, "that target/hashid pair is marked uncertain and must not be used again until cleanup is confirmed or the worker restarts." However, the specification does not detail:
- Where this uncertain state is stored (parent-side? target-side? both?)
- What data structure maintains the uncertain entries
- How cleanup confirmation is verified later
- Whether this affects only dispatch or also future registration attempts for the same hashid
Adding these details will help implementers build the correct state-tracking infrastructure and avoid gaps in the cleanup protocol.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/callable-identity-registration.md` around lines 397 - 406, The spec's
cleanup uncertainty handling is underspecified: add explicit parent-side storage
and protocols so the parent records uncertain target/hashid pairs in a durable
in-memory map named e.g. uncertainCleanupMap keyed by (target, hashid) with
values {timestamp, attempts, pendingConfirmations}; define that targets also
keep a local pendingCleanup set to allow idempotent re-confirmations; specify
that cleanup confirmation is verified via an explicit ack message
("cleanupConfirmed") from targets or by timeout+retry logic using fields in
uncertainCleanupMap, and that entries are cleared on ack or after worker
restart; finally clarify that an uncertain pair must be excluded from dispatch
and from accepting new registrations for that hashid on the parent (but targets
may retry cleanup idempotently), and document these behaviors in the "parent
removes the unpublished handle entry" and "marked uncertain" sections so
implementers know which component maintains the state and how confirmation is
processed.
|
|
||
| ```text | ||
| MAILBOX_OFF_CALLABLE: | ||
| reserved uint64 = 0 | ||
|
|
||
| MAILBOX_OFF_ARGS: | ||
| callable_hash_digest: uint8[32] | ||
| task_args_blob: bytes | ||
| ``` | ||
|
|
||
| The local mailbox task frame is fixed to `sha256` and carries only the raw | ||
| 32-byte digest. It does not carry `target_namespace`; the receiving child | ||
| mailbox determines the target namespace and resolves the digest in its own | ||
| `identity_table`. |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Document the breaking mailbox layout change and migration strategy.
Lines 483-489 introduce a breaking change to the unified mailbox layout: MAILBOX_OFF_CALLABLE is now reserved (set to 0), and the callable identity moves to a 32-byte callable_hash_digest prefix at MAILBOX_OFF_ARGS, shifting the existing task_args_blob by 32 bytes.
This is incompatible with the current layout shown in the code (context snippet 1, line 110: _OFF_CALLABLE = 8 carries the callable id). The specification should explicitly:
- Note this as a breaking wire format change
- Confirm that this requires coordinated updates across parent and child processes
- Clarify whether there's a migration path or if this is a clean-break deployment
- Verify that
MAILBOX_ARGS_CAPACITYwill be reduced by 32 bytes to account for the digest prefix
This is especially important since the comment in context snippet 1 notes that the mailbox layout "must match worker_manager.h MAILBOX_OFF_*" and requires careful alignment.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/callable-identity-registration.md` around lines 481 - 494, Document that
changing MAILBOX_OFF_CALLABLE to reserved and moving the callable identity to a
32-byte callable_hash_digest at MAILBOX_OFF_ARGS is a breaking wire-format
change: update the spec to state this is incompatible with the prior layout
(where _OFF_CALLABLE carried the callable id), require coordinated deployments
of parent and child processes, and explicitly state whether a migration path
exists or this is a clean-break; also update MAILBOX_ARGS_CAPACITY to subtract
32 bytes to account for the digest prefix and amend the comment that mailbox
layout "must match worker_manager.h MAILBOX_OFF_*" so it specifies the new
offsets and alignment expectations for both sides.
- Remove int equality and hashing from CallableHandle - Keep raw slot unregister on the internal control path only - Update white-box tests to inspect identity registry slots explicitly
- Keep Worker.prepare_callable and Worker.unregister_callable as the public API - Remove public Worker.register and Worker.unregister usage from code, docs, examples, and tests - Update prepared-callable tests to use opaque CallableHandle dispatch
- Update dynamic_register ST to exercise hashid duplicate-handle semantics and distinct post-start identities. - Keep slot overflow and unregister reuse coverage aligned with handle-based preparation. - Fix SceneTestCase run indentation restored by the API-name cleanup.
…identity-registration-contract
…identity-registration-contract # Conflicts: # python/simpler/worker.py # tests/st/a2a3/host_build_graph/prepared_callable/test_prepared_callable.py # tests/ut/py/test_worker/test_host_worker.py
- Move SharedMemory buffer assertions before cleanup blocks so pyright narrows the memoryview type. - Make chip register white-box tests pass exact payload sizes and avoid platform-dependent trailing shm slices.
Summary
This PR implements hashid-based callable identity for local hierarchical
workers.
It replaces public hierarchical callable ids with opaque
CallableHandleobjects returned by
Worker.register. Handles carry a stable SHA-256 callabledigest, callable kind, and local target namespace. Parent-side task slots and
local mailbox task frames now carry the 32-byte callable digest instead of a
child-local integer slot.
The implementation covers:
CHIP_CALLABLEandPYTHON_SERIALIZED;CallableHandleownership/liveness validation;Orchestrator.submit_*and C++CallableIdentity;children;
flow, orchestrator, and worker-manager internals.
Notes
The low-level
ChipWorker.prepare_callable/run/unregister_callableintegerslot API is intentionally kept for direct L2 prepared-callable tests and
debugging. Those slots remain local to a
ChipWorkerand are not validhierarchical submit arguments.
The chip child TASK_READY lazy-prepare path is inherited from the previous
cid implementation. This PR changes the task frame from cid to digest, but it
does not remove that safety net: if the digest is already locally registered
and prewarm was missed, the child may prepare its private slot before running.
It does not fetch missing callable bytes from the parent.
Testing
git diff --checksource .venv/bin/activate && pytest tests/ut/py/test_callable_identity.py -q11 passed in 0.05sBad owner or permissions on /etc/crypto-policies/back-ends/openssh.config