Refactor: introduce tiered profiling levels for a2a3 tensormap_and_ringbuffer swimlane export #500
Open
indigo1973 wants to merge 1 commit into hw-native-sys:main from
Conversation
…ngbuffer swimlane export

Replace the boolean `enable_profiling` flag with a 4-level `perf_level` (0=off, 1=AICore-only, 2=task+fanout, 3=full with AICPU phase records). The tensormap_and_ringbuffer runtime honors all four levels, while legacy host_build_graph / aicpu_build_graph paths continue to treat any non-zero value as a simple on/off and stay on their existing bool member (synchronized via a shared SFINAE helper in runtime_profiling_mode.h).

CLI and JSON are updated to match:

- `--enable-profiling` in run_example.py now takes an optional int (default 3 when the flag is given, 0 otherwise).
- The swimlane JSON schema gains a new version=0 (level 1: AICore-only) that omits dispatch/finish/fanout fields, and swimlane_converter.py accepts it.
- Phase buffer allocation, scheduler-phase recording, and orchestrator summary writes in aicpu_executor.cpp are gated on perf_level>=3, so lower levels no longer pay the phase-profiling overhead; fanout/dispatch_timestamp collection is gated on perf_level>=2.

Additionally:

- CallConfig and WorkerPayload switch from bool to int; Python bindings accept both bool and int for backward compatibility (_normalize_perf_level in code_runner.py, getter/setter shim in task_interface.cpp).
- PerformanceCollector skips phase-buffer shared-memory allocation and phase-thread management when perf_level < 3 (calc_perf_data_size path).
- device_runner.cpp (onboard + sim): all enable_profiling guards replaced with perf_level > 0; set_perf_level() called before initialize().
- Unit tests updated for int-based profiling values.
Code Review
This pull request introduces a more granular performance profiling system by replacing the boolean `enable_profiling` flag with an integer `perf_level` across the codebase. This change allows for multiple profiling modes (0=off, 1=AICore-only, 2=task+fanout, 3=full) and includes updates to the runtime, device runner, and performance collector to handle these levels. Additionally, it updates the swimlane JSON export logic to support versioning based on the profiling level and improves robustness in the swimlane converter by handling optional fanout data. I have no feedback to provide as there are no review comments to evaluate.
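The optional-fanout handling in the converter might be sketched as follows. This is a hypothetical illustration: the field names (`start`, `end`, `dispatch`, `fanout`) and the version-0 branch are assumptions, not the actual swimlane_converter.py code:

```python
def convert_record(rec, schema_version):
    """Convert one raw swimlane record into an export row.

    Hypothetical sketch: a version=0 (AICore-only) export carries no
    dispatch/finish/fanout fields, so those keys are only emitted for
    richer schema versions, and then defensively via .get() so records
    missing optional fanout data still convert.
    """
    row = {
        "name": rec["name"],
        "aicore_start": rec["start"],
        "aicore_end": rec["end"],
    }
    if schema_version >= 1:
        row["dispatch"] = rec.get("dispatch")
        row["fanout"] = rec.get("fanout", [])
    return row
```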