
Refactor: introduce tiered profiling levels for a2a3 tensormap_and_ringbuffer swimlane export#500

Open
indigo1973 wants to merge 1 commit into hw-native-sys:main from indigo1973:prof_0409

Conversation

@indigo1973
Contributor

Replace the boolean `enable_profiling` flag with a four-level `perf_level` (0=off, 1=AICore-only, 2=task+fanout, 3=full with AICPU phase records). The tensormap_and_ringbuffer runtime honors all four levels, while the legacy host_build_graph / aicpu_build_graph paths continue to treat any non-zero value as a simple on/off switch and keep their existing bool member (synchronized via a shared SFINAE helper in runtime_profiling_mode.h).
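The tier semantics and the legacy non-zero-means-on coercion can be sketched as follows. This is a minimal illustration in Python; the names (`records_enabled`, `legacy_enable_profiling`) are hypothetical, and the real synchronization lives in the C++ SFINAE helper in runtime_profiling_mode.h:

```python
# Hypothetical sketch of the perf_level tiers described above.
# Names are illustrative, not the actual runtime API.

PERF_OFF = 0          # no profiling
PERF_AICORE = 1       # AICore records only
PERF_TASK_FANOUT = 2  # + task and fanout/dispatch timestamps
PERF_FULL = 3         # + AICPU phase records


def records_enabled(perf_level: int) -> set:
    """Which record families a given level collects."""
    enabled = set()
    if perf_level >= PERF_AICORE:
        enabled.add("aicore")
    if perf_level >= PERF_TASK_FANOUT:
        enabled.update({"task", "fanout", "dispatch_timestamp"})
    if perf_level >= PERF_FULL:
        enabled.add("aicpu_phase")
    return enabled


def legacy_enable_profiling(perf_level: int) -> bool:
    """Legacy host_build_graph / aicpu_build_graph paths: any
    non-zero level degrades to a plain on/off bool."""
    return perf_level > 0
```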

CLI and JSON are lifted to match:

- `--enable-profiling` in run_example.py now takes an optional int (default 3 when the flag is given, 0 otherwise).
- The swimlane JSON schema gains a new version=0 (level 1: AICore-only) that omits the dispatch/finish/fanout fields, and swimlane_converter.py accepts it.
- Phase buffer allocation, scheduler-phase recording, and orchestrator summary writes in aicpu_executor.cpp are gated on perf_level>=3, so lower levels no longer pay the phase-profiling overhead; fanout/dispatch_timestamp collection is gated on perf_level>=2.

Additionally:

- CallConfig and WorkerPayload switch from bool to int; the Python bindings accept both bool and int for backward compatibility (_normalize_perf_level in code_runner.py, a getter/setter shim in task_interface.cpp).
- PerformanceCollector skips phase-buffer shared-memory allocation and phase-thread management when perf_level < 3 (the calc_perf_data_size path).
- device_runner.cpp (onboard + sim): all enable_profiling guards are replaced with perf_level > 0, and set_perf_level() is called before initialize().
- Unit tests are updated for int-based profiling values.


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a more granular performance-profiling system by replacing the boolean `enable_profiling` flag with an integer `perf_level` across the codebase. The change allows multiple profiling modes (0=off, 1=AICore-only, 2=task+fanout, 3=full) and updates the runtime, device runner, and performance collector to handle these levels. It also versions the swimlane JSON export by profiling level and makes swimlane_converter.py more robust by handling optional fanout data. I have no feedback to provide, as there are no review comments to evaluate.

