
[Feature] Simplify selective tensor dump API: drop enable_dump_tensor_selective() and add per-task dump-all #903

@ChaoZheng109

Follow-up to #838 / #844.

Summary

#844 added selective tensor dump (enable_dump_tensor_selective() + Arg::dump(...)), resolving #838. Two usability gaps remain:

  1. No "dump all tensors of this task" shortcut. In selective mode, dumping a whole task means enumerating every tensor argument by hand — args.dump(x, y, z). Arg::dump(...) even static_asserts on sizeof...(Args) >= 1, so there is no terse "all args of this task" form.
  2. enable_dump_tensor_selective() is a redundant mode toggle. Selective mode can be inferred from whether any Arg::dump(...) marker was placed: if at least one task marks tensors → selective; if none → full dump. The explicit enable call is an extra step users must remember, and forgetting it silently falls back to full dump even when Arg::dump(...) markers are present (current documented behavior).

Motivation / Use Case

The current flow forces two decisions on the user where one suffices:

enable_dump_tensor_selective();   // (1) remember to flip the mode
...
args.dump(x, y, z);               // (2) then enumerate every tensor by hand

  • Forgetting (1) makes every args.dump(...) a silent no-op: the run dumps everything, which is exactly what selective mode was meant to avoid.
  • For "dump this entire task, nothing else", the user must list all tensor args, which is verbose and drifts out of sync as the task signature changes.

Removing the toggle and adding a dump-all shortcut reduces the API to a single intuitive call site and removes a silent-fallback footgun.

Proposed API / Behavior

Infer selective mode from markers — remove enable_dump_tensor_selective():

  • If any Arg::dump(...) marker is present in the orchestration, AICPU collection runs in selective mode (only marked tasks / args dumped).
  • If no Arg::dump(...) marker is present anywhere, behavior is the legacy full dump (every task, every tensor) — unchanged default.
  • --dump-tensor remains the top-level host enable switch; nothing here changes that.
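
The inference rule above can be sketched in a few lines. This is only an illustration under stated assumptions, not the runtime's actual data structures: `TaskDumpMarks`, `infer_selective_mode`, and `should_dump` are hypothetical names, and the sketch assumes each task's markers are available as a bitmask over its tensor args by the time the dump decision is made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-task record: bit i of marked_mask is set when tensor
// arg i of that task was marked via Arg::dump(...).
struct TaskDumpMarks {
    std::uint32_t marked_mask;
    std::size_t tensor_count;
};

// Selective mode is inferred rather than enabled explicitly:
// it is on as soon as any task carries at least one marker.
inline bool infer_selective_mode(const std::vector<TaskDumpMarks>& tasks) {
    for (const TaskDumpMarks& t : tasks) {
        if (t.marked_mask != 0) {
            return true;
        }
    }
    return false;
}

// Decide whether tensor arg `arg_idx` of task `task_idx` should be dumped.
inline bool should_dump(const std::vector<TaskDumpMarks>& tasks,
                        std::size_t task_idx, std::size_t arg_idx) {
    assert(task_idx < tasks.size());
    const TaskDumpMarks& t = tasks[task_idx];
    assert(arg_idx < t.tensor_count && arg_idx < 32);
    if (!infer_selective_mode(tasks)) {
        return true;  // no markers anywhere: legacy full dump
    }
    return ((t.marked_mask >> arg_idx) & 1u) != 0;  // selective: marked args only
}
```

With no markers on any task, `should_dump` returns true for every tensor; once a single task is marked, unmarked tasks and unmarked args are skipped.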

Add a per-task dump-all shortcut:

Arg args;
args.add_input(x);
args.add_input(y);
args.add_output(z);
args.dump();        // dump-all: mark every tensor arg on this Arg
rt_submit_aiv_task(FUNC_ADD, args);

i.e. relax dump() so a no-argument call (or an explicit dump_all()) marks all tensor args currently on the Arg, instead of static_assert-ing on ≥1 argument.
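
One way the relaxed dump() could look is sketched below. It is a stand-alone model, not the real Arg from pto_types.h: `Tensor`, `ArgSketch`, and `dump_mask()` are hypothetical, tensors are identified by address purely for illustration, and the 32-arg bitmask limit is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Tensor { int id; };  // hypothetical stand-in for the real tensor type

class ArgSketch {
public:
    void add_input(const Tensor& t) { tensors_.push_back(&t); }
    void add_output(const Tensor& t) { tensors_.push_back(&t); }

    // Proposed dump-all form: mark every tensor arg currently on this Arg.
    void dump() {
        assert(tensors_.size() <= 32);
        for (std::size_t i = 0; i < tensors_.size(); ++i) {
            mask_ |= (1u << i);
        }
    }

    // Existing selective form: mark only the listed tensors (one or more).
    template <typename... Ts>
    void dump(const Tensor& first, const Ts&... rest) {
        mark(first);
        int expand[] = {0, (mark(rest), 0)...};  // C++11-friendly pack expansion
        (void)expand;
    }

    std::uint32_t dump_mask() const { return mask_; }

private:
    // Set the bit for the position at which `t` was added to this Arg.
    void mark(const Tensor& t) {
        for (std::size_t i = 0; i < tensors_.size(); ++i) {
            if (tensors_[i] == &t) {
                mask_ |= (1u << i);
                return;
            }
        }
        assert(false && "tensor was not added to this Arg");
    }

    std::vector<const Tensor*> tensors_;
    std::uint32_t mask_ = 0;
};
```

Because the no-argument form marks only the tensors already on the Arg, in this sketch it has to be called after the add_input / add_output calls; calling it on an empty Arg marks nothing.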

Alternatives Considered

  • Keep enable_dump_tensor_selective() — current state; redundant call and silent-fallback footgun remain.
  • A separate "dump whole task by id" host-side selector: heavier, and duplicates the per-Arg marker mechanism that #844 already established on the device side.

Additional Context

  • Baseline: #844 (feat: support selective tensor dump by tensor argument), which fixes #838 ([Feature] Support partial task selection for tensor dump).
  • Current API surface:
    • enable_dump_tensor_selective(): src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/orchestration/pto_orchestration_api.h:141
    • Arg::dump(...) (static_assert sizeof...(Args) >= 1): src/{a2a3,a5}/runtime/tensormap_and_ringbuffer/runtime/pto_types.h:206
    • Documented behavior: docs/dfx/tensor-dump.md §3.2 (the doc notes that without enable_dump_tensor_selective(), dump(...) markers are ignored; this is the silent fallback this issue removes).
