Skip to content

fix(verifier): reject unwritten Out/InOut params, not just tensor-creating reassigns (#1525)#1607

Closed
lyfne123 wants to merge 3 commits into
mainfrom
issue-1525-unbound-out-diagnostic
Closed

fix(verifier): reject unwritten Out/InOut params, not just tensor-creating reassigns (#1525)#1607
lyfne123 wants to merge 3 commits into
mainfrom
issue-1525-unbound-out-diagnostic

Conversation

@lyfne123
Copy link
Copy Markdown
Collaborator

Summary

Fixes the real root cause of #1525: a bare reassignment of a pl.Out/pl.InOut parameter to a detached value (e.g. y = pl.matmul(...); return y) rebinds the local name and leaves the external output buffer unwritten, so the kernel silently produces its init value (all-zero). This is a front-end output-binding footgun — not a device/pto-isa bug (the earlier device-side diagnosis was walked back; pto-isa#149 was closed as not-reproduced).

What changed

Generalize the OutParamNotShadowed verifier (src/ir/verifier/verify_out_param_pass.cpp) from a tensor-creating-op blacklist (tensor.create/tensor.full only) to the precise invariant:

A call-valued reassignment of an Out/InOut param is invalid unless the call threads the param through its arguments.

  • ✅ Threading reassigns stay valid: out = pl.assemble(out, tile, off), out = pl.store(value, off, out), indexed write out[...] = value.
  • ❌ Detached reassigns now fail compilation with a clear diagnostic: out = pl.matmul(...), out = tensor.create(...).

Make the check actually fire: add OutParamNotShadowed to GetVerifiedProperties() (src/ir/transforms/ir_property.cpp). It was already in the structural set but absent from the verified set, so the inline pipeline-input check skipped it and the footgun compiled silently. It now runs once at pipeline input (pre-SSA) — where the bare reassignment's AssignStmt.var_ still targets the param Var (post-SSA it detaches into a dead value).

Why not a broader "no = to parameters" rule

Refuted empirically: an AST scan of examples/+tests/ found 889 bare-name = assignments to parameters, 741 of which are the blessed accumulator idiom param = pl.store(t, off, param) / pl.assemble. The distinguishing factor is whether the param is threaded through the rhs, not whether = is used.

Validation

Closes #1525.

…ating reassigns (#1525)

A bare reassignment of a `pl.Out`/`pl.InOut` parameter to a detached value
(e.g. `y = pl.matmul(...)`) rebinds the name and leaves the external buffer
unwritten, so the kernel silently produces its init value (all-zero). This was
the real root cause of #1525 — not a device/pto-isa bug.

Generalize the `OutParamNotShadowed` verifier from a tensor-creating-op
blacklist (`tensor.create`/`tensor.full`) to the precise invariant: a call-valued
reassignment of an Out/InOut param is invalid unless the call threads the param
through its arguments. Threading reassigns (`out = pl.assemble(out, ...)`,
`out = pl.store(..., out)`) stay valid; detached ones (`out = pl.matmul(...)`)
now fail compilation with a clear diagnostic pointing at the idiomatic fix.

Make the check actually fire by adding `OutParamNotShadowed` to
`GetVerifiedProperties()`. It was already structural but absent from the
verified set, so the inline pipeline-input check skipped it and the footgun
compiled silently. It now runs once at pipeline input (pre-SSA, where the bare
reassignment's AssignStmt still targets the param Var).

- Regression tests for the matmul and InOut cases.
- Update the error_handling example and the #1525 device test to the new
  behavior / the supported idiom.
- Update verifier docs (en + zh).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Review Change Stack

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13b13f39-cb13-4a38-806d-bf5e239fc5f7

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The PR refactors the OutParamNotShadowed verifier to detect detached value reassignments through call-threading analysis rather than checking specific tensor-creating operations. Documentation, property registration, examples, and comprehensive test coverage are updated accordingly.

Changes

OutParamNotShadowed Verifier Refactor

Layer / File(s) Summary
Documentation of the OutParamNotShadowed rule
docs/en/dev/passes/99-verifier.md, docs/zh-cn/dev/passes/99-verifier.md
Clarified that out/inout parameters must not be reassigned to detached values; threading-style reassignments are allowed, referencing issue #1525.
Core verification logic refactor
src/ir/verifier/verify_out_param_pass.cpp
Replaced tensor-creating-op checking with call-threading detection: new CallThreadsParam helper uses VarDefUseCollector to determine if a call expression arguments reference the parameter, updated core condition and error diagnostics.
Property declaration and verifier registry
include/pypto/ir/transforms/ir_property.h, src/ir/transforms/ir_property.cpp
Updated enum documentation and registered OutParamNotShadowed in GetVerifiedProperties() with comments on verification timing and failure behavior.
Error handling example update
examples/utils/error_handling.py
Revised module docstring and changed exception handling from PartialCodegenError to generic Exception, inspecting message for OutParamNotShadowed or issue #1525 markers.
Unit and regression test coverage
tests/ut/ir/transforms/test_verifier.py, tests/st/runtime/ops/test_issue1525_chained_matmul.py
Added unit tests for Out and InOut parameter detached reassignments, and regression test demonstrating correct pl.assemble usage in fused chained-matmul program.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A verifier grows more wise today,
Threading calls in just the right way!
Out params safe from detached delight,
pl.assemble writes the buffer right! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: fixing the OutParamNotShadowed verifier to reject unwritten Out/InOut parameters (detached reassignments) rather than just tensor-creating operations.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about the root cause of issue #1525, the specific changes to the verifier logic, and validation across tests and documentation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves the OutParamNotShadowed verifier to detect when Out or InOut parameters are reassigned to a detached value (i.e., a call that does not thread the parameter, such as out = pl.matmul(...)), which previously caused the external buffer to remain unwritten and silently produce initial values (issue #1525). The verifier now checks if the parameter is referenced anywhere in the argument subtree of the reassignment call. Documentation, examples, and tests have been updated accordingly. Feedback on the changes suggests generalizing the threading check from CallPtr to ExprPtr to prevent potential loopholes where assigning to an intermediate variable first could bypass the verifier.

Comment on lines 49 to 57
/// True when `call` references `param` anywhere in its argument subtree — i.e.
/// the reassignment threads the Out/InOut parameter through the call, keeping
/// the external buffer connected (e.g. `out = tensor.assemble(out, tile, off)`,
/// `out = tile.store(value, idx, out)`).
bool CallThreadsParam(const CallPtr& call, const Var* param) {
var_collectors::VarDefUseCollector uses;
uses.VisitExpr(call);
return uses.var_uses.count(param) > 0;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

To prevent a loophole where assigning to an intermediate variable first bypasses the verifier (e.g., temp = pl.matmul(a, b); out = temp), we should generalize the threading check from Call to any Expr. This helper function can be generalized to accept ExprPtr instead of CallPtr.

Suggested change
/// True when `call` references `param` anywhere in its argument subtree i.e.
/// the reassignment threads the Out/InOut parameter through the call, keeping
/// the external buffer connected (e.g. `out = tensor.assemble(out, tile, off)`,
/// `out = tile.store(value, idx, out)`).
bool CallThreadsParam(const CallPtr& call, const Var* param) {
var_collectors::VarDefUseCollector uses;
uses.VisitExpr(call);
return uses.var_uses.count(param) > 0;
}
/// True when expr references param anywhere in its subtree - i.e.
/// the reassignment threads the Out/InOut parameter through the expression, keeping
/// the external buffer connected (e.g. out = tensor.assemble(out, tile, off),
/// out = tile.store(value, idx, out)).
bool ExprThreadsParam(const ExprPtr& expr, const Var* param) {
var_collectors::VarDefUseCollector uses;
uses.VisitExpr(expr);
return uses.var_uses.count(param) > 0;
}

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@examples/utils/error_handling.py`:
- Around line 43-48: The current except block treats any Exception as a
successful compile-time rejection; change it so only exceptions whose message
contains "OutParamNotShadowed" or "issue `#1525`" are treated as OK, and for any
other exception re-raise the exception (raise) or exit non-zero (sys.exit(1)) so
unrelated import/runtime errors fail the demo; update the except block that
binds Exception as e (and uses msg = str(e)) to perform this conditional and
either print the OK message for the expected verifier diagnostic or
propagate/exit on all other errors.

In `@src/ir/verifier/verify_out_param_pass.cpp`:
- Around line 49-56: CallThreadsParam currently treats any read of the Out/InOut
parameter as a write-through because it uses
var_collectors::VarDefUseCollector::VisitExpr and checks uses.var_uses; change
it to detect real write-through semantics: walk the Call AST (and nested Calls)
and only return true if the param is used in an argument position that the call
propagates to its result (e.g., the explicit output buffer arg such as the
“out”/destination arg or known in-place positions) or if the callee is a known
write-through primitive; update CallThreadsParam to inspect Call nodes (their op
and arg indices) instead of any var use, and apply the same logic to the
analogous predicate at lines 95-97 so only true write-through patterns (e.g.,
tensor.assemble/out-store/inplace primitives) count as threading the param.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 15b7cf4b-d96a-4044-a68f-7badccc35a49

📥 Commits

Reviewing files that changed from the base of the PR and between 203047d and a967c33.

📒 Files selected for processing (8)
  • docs/en/dev/passes/99-verifier.md
  • docs/zh-cn/dev/passes/99-verifier.md
  • examples/utils/error_handling.py
  • include/pypto/ir/transforms/ir_property.h
  • src/ir/transforms/ir_property.cpp
  • src/ir/verifier/verify_out_param_pass.cpp
  • tests/st/runtime/ops/test_issue1525_chained_matmul.py
  • tests/ut/ir/transforms/test_verifier.py

Comment thread examples/utils/error_handling.py Outdated
Comment on lines +43 to +48
except Exception as e: # noqa: BLE001 -- demo: any compile-time rejection is the success path
msg = str(e)
if "OutParamNotShadowed" in msg or "issue #1525" in msg:
print("OK -- rejected at compile time by the OutParamNotShadowed verifier")
else:
print(f"OK -- rejected at compile time: {type(e).__name__}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unexpected exceptions should still fail this demo.

This now treats any exception as success. If the script hits an unrelated import/runtime/config error, it will still print OK and exit 0, so the example no longer proves that OutParamNotShadowed was the rejection path. Please re-raise or exit non-zero when the exception does not match the expected verifier diagnostic.

Suggested fix
     except Exception as e:  # noqa: BLE001 -- demo: any compile-time rejection is the success path
         msg = str(e)
         if "OutParamNotShadowed" in msg or "issue `#1525`" in msg:
             print("OK -- rejected at compile time by the OutParamNotShadowed verifier")
         else:
-            print(f"OK -- rejected at compile time: {type(e).__name__}")
+            print(f"ERROR: unexpected exception {type(e).__name__}: {e}")
+            sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/utils/error_handling.py` around lines 43 - 48, The current except
block treats any Exception as a successful compile-time rejection; change it so
only exceptions whose message contains "OutParamNotShadowed" or "issue `#1525`"
are treated as OK, and for any other exception re-raise the exception (raise) or
exit non-zero (sys.exit(1)) so unrelated import/runtime errors fail the demo;
update the except block that binds Exception as e (and uses msg = str(e)) to
perform this conditional and either print the OK message for the expected
verifier diagnostic or propagate/exit on all other errors.

Comment on lines +49 to +56
/// True when `call` references `param` anywhere in its argument subtree — i.e.
/// the reassignment threads the Out/InOut parameter through the call, keeping
/// the external buffer connected (e.g. `out = tensor.assemble(out, tile, off)`,
/// `out = tile.store(value, idx, out)`).
bool CallThreadsParam(const CallPtr& call, const Var* param) {
var_collectors::VarDefUseCollector uses;
uses.VisitExpr(call);
return uses.var_uses.count(param) > 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't treat every read of the parameter as a write-through.

CallThreadsParam() only checks whether the Out/InOut param appears anywhere under the RHS call, so out = tensor.add(out, x) / acc = tensor.mul(acc, w) will now pass even though those ops still produce detached results and leave the external buffer unwritten. That reopens the silent-init-value bug for any pure op that merely reads the accumulator. This predicate needs to recognize actual write-through calls, not generic parameter usage.

Also applies to: 95-97

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/ir/verifier/verify_out_param_pass.cpp` around lines 49 - 56,
CallThreadsParam currently treats any read of the Out/InOut parameter as a
write-through because it uses var_collectors::VarDefUseCollector::VisitExpr and
checks uses.var_uses; change it to detect real write-through semantics: walk the
Call AST (and nested Calls) and only return true if the param is used in an
argument position that the call propagates to its result (e.g., the explicit
output buffer arg such as the “out”/destination arg or known in-place positions)
or if the callee is a known write-through primitive; update CallThreadsParam to
inspect Call nodes (their op and arg indices) instead of any var use, and apply
the same logic to the analogous predicate at lines 95-97 so only true
write-through patterns (e.g., tensor.assemble/out-store/inplace primitives)
count as threading the param.

lyfne123 added 2 commits June 1, 2026 08:52
…harden example (#1525)

Address review feedback on the OutParamNotShadowed generalization.

The previous check treated the Out/InOut param appearing *anywhere* in the
reassigning call's args as "threading". That misses a real footgun: a call that
only reads the param still returns a fresh, detached value, e.g.
`out = pl.add(out, x)` — the external buffer is never written (silent all-zero),
same class as #1525.

Refine CallThreadsParam to mirror the orchestration codegen's TraceReturnedToParam,
so the verifier flags exactly the reassignments codegen would fail to bind:
- builtin output-side ops alias only their target arg (tensor.assemble[0],
  tile.store[2], tensor.set_validshape[0]);
- any other builtin op (tensor.add, tensor.matmul, ...) produces a fresh value
  and never threads the param, even when it appears as an input operand;
- user-function calls may write the param through an Out/InOut argument, so the
  param appearing in the args still counts as threading (keeps
  `out = self.kernel(a, out)` valid).

Verified against the full unit suite: zero false positives (the legit
`param = pl.store(...)` / `pl.assemble(...)` accumulator idioms and user-call
threading all still pass), and `out = pl.add(out, x)` is now caught.

Also harden the error_handling example to re-raise unexpected exceptions instead
of treating any failure as the expected rejection, and update verifier docs.
@lyfne123
Copy link
Copy Markdown
Collaborator Author

lyfne123 commented Jun 1, 2026

Thanks @coderabbitai and @gemini-code-assist — addressed in 18264aa.

coderabbit (write-through vs any read): valid and now fixed. I confirmed on a board-equivalent codegen dump that out = pl.add(out, x) (out: pl.Out) produces add_input(ext_out) + a detached add_output(ret0__out) — i.e. the external buffer is never written, the same silent-zero footgun as #1525. CallThreadsParam now mirrors the orchestration codegen's TraceReturnedToParam:

  • builtin output-side ops alias only their target arg (tensor.assemble→args[0], tile.store→args[2], tensor.set_validshape→args[0]);
  • any other builtin op (tensor.add/tensor.matmul/…) produces a fresh value → flagged even when the param is an input operand;
  • user-function calls may write the param through an Out/InOut arg, so the param appearing in the args still counts as threading.

I did not apply a pure "write-target position" rule globally, because that would false-positive on the legitimate user-function threading idiom out = self.kernel(a, out) (84+ occurrences across examples/tests). Mirroring TraceReturnedToParam keeps the verifier consistent with what codegen actually binds. Validated against the full unit suite: zero false positives (the 182 param = pl.store(...) / 151 pl.assemble(...) accumulator reassigns and the user-call threading all still pass), and out = pl.add(out, x) is now caught. Added a regression test (test_out_param_inplace_read_op_detected).

gemini (CallPtr → ExprPtr / intermediate-variable loophole): acknowledged. A non-Call rebind such as tmp = pl.matmul(...); out = tmp is not yet flagged. Closing it correctly requires aliasing/lineage tracing through out = tmp (codegen does this via VarLineageCollector), and a naive "flag any out = <var>" would false-positive on legit chains like tmp = pl.assemble(out, …); out = tmp. I've left this as a follow-up rather than risk false positives here; the high-frequency footgun (out = <op>(...)) is covered.

coderabbit (example except too broad): hardened — error_handling.py now re-raises when the caught exception isn't the expected verifier diagnostic, so unrelated failures don't masquerade as success.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Thanks for the detailed update, @lyfne123. Your approach to the OutParamNotShadowed verifier is well-reasoned. Prioritizing the high-frequency footgun while avoiding false positives by mirroring TraceReturnedToParam is a solid engineering trade-off. Deferring the intermediate-variable case (tmp = pl.matmul(...); out = tmp) is also the right call here, as it requires more robust lineage tracing to avoid regressing on legitimate chaining patterns. The updated regression tests and documentation provide excellent clarity for future maintainers.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Tip

For best results, initiate chat on the files or code changes.

@lyfne123 Thanks for the thorough write-up! The implementation rationale is clear:

  • Write-through fix: The CallThreadsParamTraceReturnedToParam mirroring is the right design — keeping the verifier semantically consistent with what codegen actually binds avoids a whole class of false positives. The per-category rules (output-side builtins by target-arg position, other builtins always flagged, user-calls allowed if param appears in args) are sensible and well-validated by the zero-false-positive suite run.

  • Intermediate-variable loophole: The deferral reasoning is sound — a naive out = <var> flag would hit the tmp = pl.assemble(out, …); out = tmp idiom and other legit lineage chains. The high-frequency footgun is covered. Would you like me to open a follow-up issue to track the tmp = pl.matmul(...); out = tmp case so it doesn't get lost?

  • error_handling.py hardening: Re-raising on unexpected exceptions is the correct fix — masking unrelated failures as success defeats the purpose of the example.

🐇 ✨


✏️ Learnings added
Learnt from: lyfne123
URL: https://github.com/hw-native-sys/pypto/pull/1607

Timestamp: 2026-06-01T01:14:13.387Z
Learning: In the pypto codebase (src/ir/verifier/verify_out_param_pass.cpp), `CallThreadsParam` is designed to mirror the orchestration codegen's `TraceReturnedToParam`. The rules are: (1) builtin output-side ops alias only their designated target arg (tensor.assemble→args[0], tile.store→args[2], tensor.set_validshape→args[0]) and are allowed; (2) any other builtin op (tensor.add, tensor.matmul, etc.) produces a fresh/detached value and is flagged even when the param appears as an input operand; (3) user-function calls may write through an Out/InOut arg, so the param appearing in the args still counts as threading. A pure "write-target position" rule is intentionally not applied globally because of the legitimate `out = self.kernel(a, out)` user-function threading idiom (84+ occurrences across examples/tests).

Learnt from: lyfne123
URL: https://github.com/hw-native-sys/pypto/pull/1607

Timestamp: 2026-06-01T01:14:13.387Z
Learning: In pypto PR `#1607` (OutParamNotShadowed verifier), the intermediate-variable loophole — where `tmp = pl.matmul(...); out = tmp` escapes the verifier — is a known limitation intentionally deferred. A naive "flag any `out = <var>`" would false-positive on `tmp = pl.assemble(out, …); out = tmp` and other valid lineage chains. Closing it correctly requires aliasing/lineage tracing analogous to `VarLineageCollector` used in codegen.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@lyfne123 lyfne123 closed this Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

1 participant