Skip to content

Add: hashid-based callable registration#891

Open
puddingfjz wants to merge 15 commits into
hw-native-sys:mainfrom
puddingfjz:feat/define-callable-identity-registration-contract
Open

Add: hashid-based callable registration#891
puddingfjz wants to merge 15 commits into
hw-native-sys:mainfrom
puddingfjz:feat/define-callable-identity-registration-contract

Conversation

@puddingfjz
Copy link
Copy Markdown
Contributor

@puddingfjz puddingfjz commented May 29, 2026

Summary

This PR implements hashid-based callable identity for local hierarchical
workers.

It replaces public hierarchical callable ids with opaque CallableHandle
objects returned by Worker.register. Handles carry a stable SHA-256 callable
digest, callable kind, and local target namespace. Parent-side task slots and
local mailbox task frames now carry the 32-byte callable digest instead of a
child-local integer slot.

The implementation covers:

  • canonical descriptor/hash helpers for CHIP_CALLABLE and
    PYTHON_SERIALIZED;
  • CallableHandle ownership/liveness validation;
  • digest-based Python Orchestrator.submit_* and C++ CallableIdentity;
  • target-local digest-to-slot tables for chip, SUB, and next-level Worker
    children;
  • digest-based register/unregister control messages;
  • duplicate registration refcounts and partial-register cleanup handling;
  • documentation updates for callable identity, Python serialization, task
    flow, orchestrator, and worker-manager internals.

Notes

The low-level ChipWorker.prepare_callable/run/unregister_callable integer
slot API is intentionally kept for direct L2 prepared-callable tests and
debugging. Those slots remain local to a ChipWorker and are not valid
hierarchical submit arguments.

The chip child TASK_READY lazy-prepare path is inherited from the previous
cid implementation. This PR changes the task frame from cid to digest, but it
does not remove that safety net: if the digest is already locally registered
and prewarm was missed, the child may prepare its private slot before running.
It does not fetch missing callable bytes from the parent.

Testing

  • git diff --check
  • source .venv/bin/activate && pytest tests/ut/py/test_callable_identity.py -q
    • 11 passed in 0.05s
    • emitted a non-fatal pto-isa update warning:
      Bad owner or permissions on /etc/crypto-policies/back-ends/openssh.config

- Document local handle namespaces and resolver ownership for callable registration.

- Specify descriptor hashes and map Python serializer ids to the existing SPYC header.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c5e19d2-75c8-47ac-81f7-a7b92a83b2a9

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a comprehensive specification document for callable identity registration in local hierarchical worker systems. It defines content-derived stable identities (hashid), canonical descriptor hashing rules, target namespaces, runtime registration and dispatch contracts, state management, and an implementation roadmap with validation requirements.

Changes

Callable Identity Registration Specification

Layer / File(s) Summary
Foundation and target namespace concepts
docs/callable-identity-registration.md
Introduces stable hashid derived from content, separated from target-local execution slots via target_namespace scoping (LOCAL_CHIP, LOCAL_PYTHON). Defines identity/slot separation and refcount invariants.
Descriptor encoding and canonical hashing
docs/callable-identity-registration.md
Specifies canonical descriptor encoding for computing stable hashid values using schema versioning, deterministic primitive encoding, CHIP_CALLABLE and PYTHON_SERIALIZED descriptor field definitions, signature-schema hashing, and explicit identity-excluding fields.
Runtime contracts for registration and dispatch
docs/callable-identity-registration.md
Defines Worker.register API returning opaque CallableHandle, whole-scope installation, target-side registry with state transitions, failure cleanup semantics, dispatch scheduling plus local resolution, error codes, mailbox task frames carrying hashid digest only, and unregister/refcount/tombstone sequencing.
Implementation plan and validation requirements
docs/callable-identity-registration.md
Provides ordered implementation milestones and validation checklist covering descriptor hashing, public API, target registry, submission/frame updates, local payload changes, and tests for hash stability, refcount behavior, whole-scope guarantees, and cleanup uncertainty.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A hash-derived identity is born,
Content-stable from registration's dawn,
Namespaces slot them, refcounts align,
Specs and milestones—the blueprint's fine! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: introduction of hashid-based callable registration, which is the core subject of the new specification document.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description clearly relates to the changeset, detailing the implementation of hashid-based callable identity for local hierarchical workers, which matches the specification document added.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new design document, docs/callable-identity-registration.md, which defines the callable identity registration contract for local hierarchical workers using a stable hashid instead of public integer execution slots. The review feedback highlights a few areas for improvement in the documentation: correcting a terminological inconsistency regarding whether local mailbox task frames carry the hashid or the raw 32-byte digest, explicitly documenting parent-side tracking of CallableHandle instances to prevent double-unregistration issues, and standardizing the naming of descriptor_schema_version throughout the document.

- `hashid` is valid across the parent and local child workers.
- `local_slot` is valid only inside the target namespace that allocated it.
- Public APIs and task records carry `CallableHandle`, not local slots.
- Local mailbox task frames carry `hashid`, not slots.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a terminological inconsistency regarding what the local mailbox task frames carry. Line 109 states they carry hashid (which is defined as a 71-character string sha256:\u003c64 lowercase hex characters\u003e), whereas line 491 states they carry the raw 32-byte digest. To maintain precise terminology, this should be updated to refer to the raw 32-byte digest.

Suggested change
- Local mailbox task frames carry `hashid`, not slots.
- Local mailbox task frames carry the raw 32-byte digest, not slots.

Comment on lines +294 to +298
The handle is an opaque parent-side registration object. Its `hashid` is the
stable callable identity used in task frames, but repeated registrations of
the same callable may return distinct handle objects with the same `hashid`.
Unregistering one handle must not invalidate another live handle that shares
the same `hashid`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To ensure that unregistering one handle does not prematurely invalidate another live handle sharing the same hashid (e.g., if a user double-unregisters a handle), the parent-side Worker must track active CallableHandle instances. It is recommended to explicitly document this parent-side tracking requirement to prevent double-unregistration from incorrectly decrementing the target-local refcount.

References
  1. Ensure documentation and diagrams accurately reflect implementation details regarding resource lifecycles, especially when persistence is used to maintain internal state like caches.

Comment thread docs/callable-identity-registration.md Outdated
hashid -> {
local_slot,
callable_kind,
descriptor_version,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Inconsistent field naming: descriptor_version is used here and in line 458, but the canonical descriptor schema defined in lines 136 and 161 uses descriptor_schema_version. It is recommended to use descriptor_schema_version consistently throughout the document to avoid confusion.

Suggested change
descriptor_version,
descriptor_schema_version,

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
docs/callable-identity-registration.md (2)

159-175: ⚡ Quick win

Specify the type for signature_schema_hash in the schema.

The CHIP_CALLABLE schema lists signature_schema_hash without a type annotation, while line 208 clarifies it's "encoded as the raw 32-byte SHA-256 digest." For consistency with the encoding of callable_blob_sha256 and to improve clarity, the schema should explicitly specify the type as uint8[32] or similar notation.

📝 Suggested fix
 CHIP_CALLABLE:
   descriptor_schema_version
   callable_kind
   target_arch
   platform
   runtime
   callable_blob_sha256
-  signature_schema_hash
+  signature_schema_hash: uint8[32]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/callable-identity-registration.md` around lines 159 - 175, Add an
explicit type for signature_schema_hash in the CHIP_CALLABLE schema so it
matches the documented encoding; update the CHIP_CALLABLE section to declare
signature_schema_hash as a 32-byte unsigned array (e.g., uint8[32] or
equivalent) just like callable_blob_sha256, ensuring the schema and the
descriptive text remain consistent (refer to CHIP_CALLABLE and the
signature_schema_hash field).

570-574: ⚡ Quick win

Link the "Descriptor mismatch" test to the required error code.

Line 573 specifies a "Descriptor mismatch" test expecting rejection when "same hashid with different descriptor" is registered. For completeness, this test description should reference the HASHID_DESCRIPTOR_MISMATCH error code defined at line 439, ensuring that the validation confirms both the rejection behavior and the correct error code.

📝 Suggested fix
 | Test | Expected result |
 | ---- | --------------- |
 | Stable descriptor hash | Same descriptor bytes produce same hashid. |
-| Descriptor mismatch | Same hashid with different descriptor is rejected. |
+| Descriptor mismatch | Same hashid with different descriptor is rejected with `HASHID_DESCRIPTOR_MISMATCH`. |
 | No public slot | Public APIs never return child-local slots. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/callable-identity-registration.md` around lines 570 - 574, Update the
"Descriptor mismatch" test row in the table so it references the specific error
code constant HASHID_DESCRIPTOR_MISMATCH; change the Expected result text for
that row to indicate the registration is rejected with error
HASHID_DESCRIPTOR_MISMATCH (so the test asserts both rejection and the exact
error code), locating the row by its label "Descriptor mismatch" in the Tests
table.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/callable-identity-registration.md`:
- Around line 481-494: Document that changing MAILBOX_OFF_CALLABLE to reserved
and moving the callable identity to a 32-byte callable_hash_digest at
MAILBOX_OFF_ARGS is a breaking wire-format change: update the spec to state this
is incompatible with the prior layout (where _OFF_CALLABLE carried the callable
id), require coordinated deployments of parent and child processes, and
explicitly state whether a migration path exists or this is a clean-break; also
update MAILBOX_ARGS_CAPACITY to subtract 32 bytes to account for the digest
prefix and amend the comment that mailbox layout "must match worker_manager.h
MAILBOX_OFF_*" so it specifies the new offsets and alignment expectations for
both sides.
- Around line 397-406: The spec's cleanup uncertainty handling is
underspecified: add explicit parent-side storage and protocols so the parent
records uncertain target/hashid pairs in a durable in-memory map named e.g.
uncertainCleanupMap keyed by (target, hashid) with values {timestamp, attempts,
pendingConfirmations}; define that targets also keep a local pendingCleanup set
to allow idempotent re-confirmations; specify that cleanup confirmation is
verified via an explicit ack message ("cleanupConfirmed") from targets or by
timeout+retry logic using fields in uncertainCleanupMap, and that entries are
cleared on ack or after worker restart; finally clarify that an uncertain pair
must be excluded from dispatch and from accepting new registrations for that
hashid on the parent (but targets may retry cleanup idempotently), and document
these behaviors in the "parent removes the unpublished handle entry" and "marked
uncertain" sections so implementers know which component maintains the state and
how confirmation is processed.
- Around line 92-93: Clarify that the receiving child mailbox derives the target
namespace from the child worker type: map chip children to LOCAL_CHIP, Python
sub-workers to LOCAL_PYTHON, and any foreign/interop workers to their declared
namespace (e.g., the caller-provided namespace or a header on the frame); update
the sentence after "local task frames carry only the raw `sha256` digest" to
explicitly state this mapping and that the mailbox uses the child's
type/registration metadata to resolve the namespace before looking up the
digest.

---

Nitpick comments:
In `@docs/callable-identity-registration.md`:
- Around line 159-175: Add an explicit type for signature_schema_hash in the
CHIP_CALLABLE schema so it matches the documented encoding; update the
CHIP_CALLABLE section to declare signature_schema_hash as a 32-byte unsigned
array (e.g., uint8[32] or equivalent) just like callable_blob_sha256, ensuring
the schema and the descriptive text remain consistent (refer to CHIP_CALLABLE
and the signature_schema_hash field).
- Around line 570-574: Update the "Descriptor mismatch" test row in the table so
it references the specific error code constant HASHID_DESCRIPTOR_MISMATCH;
change the Expected result text for that row to indicate the registration is
rejected with error HASHID_DESCRIPTOR_MISMATCH (so the test asserts both
rejection and the exact error code), locating the row by its label "Descriptor
mismatch" in the Tests table.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: be2f85ca-46c9-4d68-8b2a-4d7c54ca8a6a

📥 Commits

Reviewing files that changed from the base of the PR and between 1de6a90 and 5376b5f.

📒 Files selected for processing (1)
  • docs/callable-identity-registration.md

Comment on lines +92 to +93
own namespace, so local task frames carry only the raw `sha256` digest.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Clarify how the receiving child determines the target namespace.

Lines 92-93 state that "local task frames carry only the raw sha256 digest," and line 493-494 confirms "the receiving child mailbox determines the target namespace." However, the mechanism for this determination is not explicitly specified. If the namespace is implicit from the child worker type (e.g., chip children always use LOCAL_CHIP, Python sub-workers always use LOCAL_PYTHON), this should be stated clearly to avoid ambiguity in implementation.

📝 Suggested clarification

Add after line 93:

 the same `hashid`.

+The target namespace is determined by the child worker's resolver type:
+chip child loops resolve against `LOCAL_CHIP`, Python sub/next-level
+child loops resolve against `LOCAL_PYTHON`. The mailbox protocol does
+not carry namespace because each child has exactly one active resolver
+domain.
+
 Each target namespace owns a private mapping:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/callable-identity-registration.md` around lines 92 - 93, Clarify that
the receiving child mailbox derives the target namespace from the child worker
type: map chip children to LOCAL_CHIP, Python sub-workers to LOCAL_PYTHON, and
any foreign/interop workers to their declared namespace (e.g., the
caller-provided namespace or a header on the frame); update the sentence after
"local task frames carry only the raw `sha256` digest" to explicitly state this
mapping and that the mailbox uses the child's type/registration metadata to
resolve the namespace before looking up the digest.

Comment on lines +397 to +406

- The parent does not return the handle.
- The parent removes the unpublished handle entry.
- The parent sends cleanup to targets that may have installed the hashid.
- If cleanup cannot be confirmed, that target/hashid pair is marked uncertain
and must not be used again until cleanup is confirmed or the worker restarts.

This is failure cleanup with conservative uncertainty handling. It stays on
the same synchronous install path and does not introduce a separate recovery
API.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Clarify the cleanup uncertainty tracking mechanism.

Lines 401-402 state that if cleanup cannot be confirmed, "that target/hashid pair is marked uncertain and must not be used again until cleanup is confirmed or the worker restarts." However, the specification does not detail:

  1. Where this uncertain state is stored (parent-side? target-side? both?)
  2. What data structure maintains the uncertain entries
  3. How cleanup confirmation is verified later
  4. Whether this affects only dispatch or also future registration attempts for the same hashid

Adding these details will help implementers build the correct state-tracking infrastructure and avoid gaps in the cleanup protocol.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/callable-identity-registration.md` around lines 397 - 406, The spec's
cleanup uncertainty handling is underspecified: add explicit parent-side storage
and protocols so the parent records uncertain target/hashid pairs in a durable
in-memory map named e.g. uncertainCleanupMap keyed by (target, hashid) with
values {timestamp, attempts, pendingConfirmations}; define that targets also
keep a local pendingCleanup set to allow idempotent re-confirmations; specify
that cleanup confirmation is verified via an explicit ack message
("cleanupConfirmed") from targets or by timeout+retry logic using fields in
uncertainCleanupMap, and that entries are cleared on ack or after worker
restart; finally clarify that an uncertain pair must be excluded from dispatch
and from accepting new registrations for that hashid on the parent (but targets
may retry cleanup idempotently), and document these behaviors in the "parent
removes the unpublished handle entry" and "marked uncertain" sections so
implementers know which component maintains the state and how confirmation is
processed.

Comment on lines +481 to +494

```text
MAILBOX_OFF_CALLABLE:
reserved uint64 = 0

MAILBOX_OFF_ARGS:
callable_hash_digest: uint8[32]
task_args_blob: bytes
```

The local mailbox task frame is fixed to `sha256` and carries only the raw
32-byte digest. It does not carry `target_namespace`; the receiving child
mailbox determines the target namespace and resolves the digest in its own
`identity_table`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Document the breaking mailbox layout change and migration strategy.

Lines 483-489 introduce a breaking change to the unified mailbox layout: MAILBOX_OFF_CALLABLE is now reserved (set to 0), and the callable identity moves to a 32-byte callable_hash_digest prefix at MAILBOX_OFF_ARGS, shifting the existing task_args_blob by 32 bytes.

This is incompatible with the current layout shown in the code (context snippet 1, line 110: _OFF_CALLABLE = 8 carries the callable id). The specification should explicitly:

  1. Note this as a breaking wire format change
  2. Confirm that this requires coordinated updates across parent and child processes
  3. Clarify whether there's a migration path or if this is a clean-break deployment
  4. Verify that MAILBOX_ARGS_CAPACITY will be reduced by 32 bytes to account for the digest prefix

This is especially important since the comment in context snippet 1 notes that the mailbox layout "must match worker_manager.h MAILBOX_OFF_*" and requires careful alignment.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/callable-identity-registration.md` around lines 481 - 494, Document that
changing MAILBOX_OFF_CALLABLE to reserved and moving the callable identity to a
32-byte callable_hash_digest at MAILBOX_OFF_ARGS is a breaking wire-format
change: update the spec to state this is incompatible with the prior layout
(where _OFF_CALLABLE carried the callable id), require coordinated deployments
of parent and child processes, and explicitly state whether a migration path
exists or this is a clean-break; also update MAILBOX_ARGS_CAPACITY to subtract
32 bytes to account for the digest prefix and amend the comment that mailbox
layout "must match worker_manager.h MAILBOX_OFF_*" so it specifies the new
offsets and alignment expectations for both sides.

puddingfjz added 14 commits May 29, 2026 16:37
- Remove int equality and hashing from CallableHandle
- Keep raw slot unregister on the internal control path only
- Update white-box tests to inspect identity registry slots explicitly
- Keep Worker.prepare_callable and Worker.unregister_callable as the public API

- Remove public Worker.register and Worker.unregister usage from code, docs, examples, and tests

- Update prepared-callable tests to use opaque CallableHandle dispatch
- Update dynamic_register ST to exercise hashid duplicate-handle semantics and distinct post-start identities.

- Keep slot overflow and unregister reuse coverage aligned with handle-based preparation.

- Fix SceneTestCase run indentation restored by the API-name cleanup.
…identity-registration-contract

# Conflicts:
#	python/simpler/worker.py
#	tests/st/a2a3/host_build_graph/prepared_callable/test_prepared_callable.py
#	tests/ut/py/test_worker/test_host_worker.py
- Move SharedMemory buffer assertions before cleanup blocks so pyright narrows the memoryview type.
- Make chip register white-box tests pass exact payload sizes and avoid platform-dependent trailing shm slices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant