-
Notifications
You must be signed in to change notification settings - Fork 421
vegeta intrinsics and sparse sgemm implemented #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a comprehensive regression test for Vegeta tile load-store operations, testing T, U, V, and M register types. The test validates that data can be correctly loaded from memory into tile registers and stored back, verifying the functionality of the new VEG extension's load/store unit (LSU) operations. The PR also includes necessary simulator updates to handle the new LSU operations and adds performance counter support for tracking VEGETA scoreboard statistics.
- Updates tile intrinsic function signatures to use
size_tfor pointer parameters - Fixes M-register load implementation to pack two uint4 values per byte (reducing memory footprint from 512B to 256B)
- Adds scoreboard performance tracking for VEGETA functional unit
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/regression/veg_ls/main.cpp | Host-side test driver that allocates buffers, uploads test data, executes kernel, and verifies results for T, U, V, and M tile registers |
| tests/regression/veg_ls/kernel.cpp | Kernel implementing tile load/store operations for all register types with memory barriers between test phases |
| tests/regression/veg_ls/common.h | Defines tile sizes (T=2KB, U=4KB, V=8KB, M=256B) and kernel argument structure for passing memory addresses |
| tests/regression/veg_ls/Makefile | Build configuration integrating the new test into the regression suite |
| tests/regression/Makefile | Adds veg_ls test to all, run-simx, run-rtlsim, and clean targets |
| sim/simx/sparse_unit.cpp | Updates tile operation handling to support VegetaLsuType operations with proper delay cycles, fixes M-register packing format |
| sim/simx/emulator.cpp | Adds CSR read support for VEGETA scoreboard performance counter |
| sim/simx/core.h | Adds scrb_vegeta field to PerfStats structure for tracking VEGETA unit activity |
| sim/simx/core.cpp | Increments scrb_vegeta counter when VEGETA functional unit causes scoreboard stalls |
| kernel/include/vx_intrinsics.h | Changes parameter types from int to size_t for pointer/address parameters, updates store instruction assembly format |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // TILE LOAD T: Load 1KB from ptr[TILE] to tile register index 'dst_treg' | ||
| // Each load uses I-type encoding: rd=dst tile index, rs1=src_gpr, imm=ptr immediate |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says "Load 1KB" but T-regs are 2KB (2048 bytes) as defined in common.h. Update the comment to reflect the correct size: "Load 2KB from ptr[TILE] to tile register index 'dst_treg'".
| :: "i"(RISCV_CUSTOM1), "i"(dst_ureg), "r"(src_gpr), "i"(ptr_imm) : "memory"); | ||
| } | ||
|
|
||
| // TILE LOAD V: Load 1KB from ptr[TILE] to vreg index 'dst_vreg' |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says "Load 1KB" but V-regs are 8KB (8192 bytes) as defined in common.h. Update the comment to reflect the correct size: "Load 8KB from ptr[TILE] to vreg index 'dst_vreg'".
| // TILE LOAD V: Load 1KB from ptr[TILE] to vreg index 'dst_vreg' | |
| // TILE LOAD V: Load 8KB from ptr[TILE] to vreg index 'dst_vreg' |
| :: "i"(RISCV_CUSTOM1), "i"(dst_mreg), "r"(src_gpr), "i"(ptr_imm) : "memory"); | ||
| } | ||
|
|
||
| // TILE STORE T: Store 1KB from treg index 'src_treg' to ptr[TILE] |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says "Store 1KB" but T-regs are 2KB (2048 bytes) as defined in common.h. Update the comment to reflect the correct size: "Store 2KB from treg index 'src_treg' to ptr[TILE]".
| // TILE STORE T: Store 1KB from treg index 'src_treg' to ptr[TILE] | |
| // TILE STORE T: Store 2KB from treg index 'src_treg' to ptr[TILE] |
|
|
||
|
|
||
|
|
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Remove unnecessary blank lines (lines 225-227) before the cleanup section.
| CSR_READ_64(VX_CSR_MPM_SCRB_TCU, core_perf.scrb_vpu); | ||
| #endif | ||
| #ifdef EXT_VEGETA_ENABLE | ||
| CSR_READ_64(VX_CSR_MPM_SCRB_TCU, core_perf.scrb_vegeta); |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using wrong CSR identifier. This line reads from VX_CSR_MPM_SCRB_TCU but stores to core_perf.scrb_vegeta. All three conditional blocks (lines 500, 503, 506) are reading from the same CSR, which will cause incorrect performance counter values. Change to use a dedicated VEGETA CSR identifier, such as VX_CSR_MPM_SCRB_VEGETA (if it exists), or verify the correct CSR address for VEGETA scoreboard statistics.
| CSR_READ_64(VX_CSR_MPM_SCRB_TCU, core_perf.scrb_vegeta); | |
| CSR_READ_64(VX_CSR_MPM_SCRB_VEGETA, core_perf.scrb_vegeta); |
| :: "i"(RISCV_CUSTOM1), "i"(dst_treg), "r"(src_gpr), "i"(ptr_imm) : "memory"); | ||
| } | ||
|
|
||
| // TILE LOAD U: Load 1KB from ptr[TILE] to ureg index 'dst_ureg' |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says "Load 1KB" but U-regs are 4KB (4096 bytes) as defined in common.h. Update the comment to reflect the correct size: "Load 4KB from ptr[TILE] to ureg index 'dst_ureg'".
| // TILE LOAD U: Load 1KB from ptr[TILE] to ureg index 'dst_ureg' | |
| // TILE LOAD U: Load 4KB from ptr[TILE] to ureg index 'dst_ureg' |
| :: "i"(RISCV_CUSTOM1), "i"(dst_vreg), "r"(src_gpr), "i"(ptr_imm) : "memory"); | ||
| } | ||
|
|
||
| // TILE LOAD M: Load 1KB from ptr[TILE] to mreg index 'dst_mreg' |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment says "Load 1KB" but M-regs are 256 bytes as defined in common.h. Update the comment to reflect the correct size: "Load 256B from ptr[TILE] to mreg index 'dst_mreg'".
| // TILE LOAD M: Load 1KB from ptr[TILE] to mreg index 'dst_mreg' | |
| // TILE LOAD M: Load 256B from ptr[TILE] to mreg index 'dst_mreg' |
|
|
||
|
|
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Remove unnecessary blank lines (lines 95-96) before the device memory allocation section.
|
|
||
|
|
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Remove unnecessary blank line between VX_SRCS definition and the common.mk include.
| // Verify M tiles by comparing debug output | ||
| std::cout << "M tiles loaded successfully (verified by error-free execution)" << std::endl; |
Copilot
AI
Dec 2, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
M tiles verification is incomplete. The comment claims verification via "error-free execution", but the kernel only loads M tiles without storing them back. This means h_dst_m contains uninitialized data, and the test doesn't actually verify M tile load/store functionality. Either add proper M tile verification or document why M tiles cannot be verified.
No description provided.