Skip to content

Conversation

@diaconuccalin
Copy link
Contributor

@diaconuccalin diaconuccalin commented Oct 16, 2025

Key Changes:

  1. TinyViT Demo Support: Successfully enables minimal TinyViT model on tiled Siracusa platform (running on as low as 4kB L1!)
  • New ReduceMean and Slice tiling constraints
  • Added broadcasting handling and improved constraint logic for MatMul
  • Changed to tanh approximation in the PULP GELU kernel
  1. Input Tiling for regular and DW Conv2D: Feature enabling smaller L1 memory requirements
  • Implemented input tiling on X and Y axes (not just channel-wise)
  • Fixed padding computation using absolute offsets (now considering interior tiles for FP solution as well)
  • Cleaned up constraints and reduced code duplication
  • Added documentation explaining motivation for each constraint
  1. Bug Fixes: Addresses critical bugs that would cause runtime failures
  • Pointer Arithmetic Bugs - Byte offsets were being used with typed pointer arithmetic, causing 4x overflow for float32 types.
  • Buffer Name Parsing Bug - A check matched intermediate buffers containing "input" or "output" in their names, causing IndexError.
  1. Test Coverage: Adds 12+ new test configurations to CI
  • Added minimal TinyViT to all test matrices (single/double buffer, different memory levels)
  • Ohterwise, focus on updated Conv operators (regular & DW, with & without bias)
  • Reduced L1 limit of existent tests, enabled by new input conv tiling
  • Added skip connection test for tiled Siracusa

Added

  • Support for input and bias tiling for PULP FP regular and DW conv 2D.
  • PULP ReduceMean and Slice tile constraints.
  • Broadcast support for MatMul tiling constraints
  • CI tests for tiled Siracusa FP regular and DW conv 2D, with and without bias, for skip connections, and for the demo version of TinyViT.
  • Documentation for PULP FP regular and DW conv 2D and MatMul tile constraints.

Changed

  • Decreased L1 maximal memory limit for CI pipeline tests where compatible thanks to the implementation of Conv2D input tiling support.

Fixed

  • Fixed PULP FP32 regular and DW Conv2D, and MatMul tile constraints.
  • Fixed type casting for tiling code generation.
  • Fixed bug in buffer name identification in code generation for tests with L3 default memory level.
  • Fixed PULP GELU kernel to use tanh approximation.

PR Merge Checklist

  1. If the docker was modified, change back its link after review.
  2. The PR is rebased on the latest devel commit and pointing to devel.
  3. The CHANGELOG.md file has been updated.
  4. All checks are passing.
  5. Your PR reviewed and approved.

@diaconuccalin diaconuccalin self-assigned this Oct 16, 2025
@diaconuccalin diaconuccalin added Bug Something isn't working Feature Addition of new features labels Oct 16, 2025
@Xeratec Xeratec modified the milestones: Release 0.2.1, Release xxx Oct 23, 2025
@diaconuccalin diaconuccalin force-pushed the DemoTinyViT_Tiled_Siracusa branch 2 times, most recently from 95d30e5 to 19a04dd Compare November 4, 2025 14:53
…ffer size computation. Added fixes to remove warnings
…ion. Adjusted memory allocation for the im2col buffer in 2d float conv for PULPOpen (lower requirement, dynamic allocation). Included all Float 2D Conv tests in the CI pipeline for siracusa
…version). Added MatMulAddMergePass to the PULP optimizer. Added float-reshape-skip-connection test to Siracusa CI pipeline. Fixes in the FloatAddTemplate of PULPOpen to eliminate warnings
… FP DW Conv2D tiling constraints. Fixed bugs in input tiling for PULP FP Conv2D. Added new tiled siracusa tests and reduced size of existent ones that now support input tiling for conv operations
… for new constraint set. Bugifx for input/output L3 buffer name check, where similar naming of internal nodes would confuse the dumper
@diaconuccalin diaconuccalin marked this pull request as ready for review November 18, 2025 23:17
@diaconuccalin diaconuccalin changed the title [DRAFT] Demo TinyViT compatibility with tiled Siracusa Demo TinyViT compatibility with tiled Siracusa Nov 18, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 18, 2025

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added tiling support for ReduceMean and Slice operations.
    • Expanded bias-aware convolution tiling capabilities.
  • Bug Fixes

    • Fixed pointer arithmetic for offset calculations.
    • Improved error context in DMA transfer code generation.
  • Changed

    • Enhanced shape handling for scalar tensor dimensions.
    • Optimized GELU computation with per-core parallelism.
    • Refined binding configurations for enhanced operation support.
  • Tests

    • Expanded test coverage with bias and depthwise convolution variants.
    • Added new model tests.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

This PR expands tiling support in the Deeploy framework for PULPOpen targets by introducing new tile constraint classes (ReduceMean, Slice), enhancing existing constraints with bias and grouping awareness, refactoring pointer arithmetic for byte-wise addressing, and updating bindings and platform configuration to use tiling-ready variants.

Changes

Cohort / File(s) Summary
Test Configuration
.github/workflows/ci-platform-siracusa-tiled.yml
Expanded and rebalanced test configurations across Siracusa tiled workflows; replaced single convolution entry with bias and depthwise variants; introduced TinyViT tests; updated multi-budget L1 configurations for mixed-size workloads.
Documentation
CHANGELOG.md
Added entries under Unreleased v0.2.1 documenting new features, tests, and bug fixes across Added, Changed, and Fixed sections.
Pointer Arithmetic Refactoring
Deeploy/DeeployTypes.py, Deeploy/TilingExtension/AsyncDma.py, Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py
Changed pointer arithmetic from element-wise addition to byte-wise addressing by casting pointers to char* before applying offsets, then re-casting to target type; updated code generation templates to use explicit typed pointer casts.
PULPOpen Platform & Bindings
Deeploy/Targets/PULPOpen/Platform.py, Deeploy/Targets/PULPOpen/Bindings.py, Deeploy/Targets/PULPOpen/Tiler.py
Narrowed public bindings imports; updated node mappers to use tiling-ready binding variants for ReduceMean, DWConv2D, and Slice; introduced new tiling-ready bindings; reworked PULPSliceBindings to use cross-product of float and integer types.
Convolution Tile Constraints
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py
Added bias-aware tiling support; extended geometrical and policy constraints to handle bias buffers; refactored computeInputCube with optional parameters for input dimensions and output absolute offsets; enhanced tiling serialization to track and apply bias in memory layout.
Depthwise Convolution Constraints
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py
Renamed original DWConv2DTileConstraint to RQDWConv2DTileConstraint; introduced new DWConv2DTileConstraint extending Conv2DTileConstraint with channel-equivalence constraints for input and output channel enforcement.
MatMul Tile Constraints
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
Refactored to extract per-tensor dimension variables for A, B, and output matrices; added batch-consistency constraints; enhanced serialization with broadcasting-aware batch handling and per-tensor input cube computation; added strict pairing in schedule construction.
New Tile Constraints
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py, Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
Introduced ReduceMeanTileConstraint and SliceTileConstraint classes with methods for geometrical constraints, symbolic representation, input cube computation from output cubes, and tiling serialization; support reduction and slicing semantics in memory layout planning.
Code Generation & Shape Handling
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py, Deeploy/TilingExtension/TilerExtension.py, Deeploy/TilingExtension/TilerModel.py
Added error-context wrapping for DMA transfer generation; updated shape handling to support both scalar (int) and sequence shapes in memory constraint resolution and tensor dimension registration.
Buffer Utilities
DeeployTest/testUtils/codeGenerate.py
Refined buffer name detection using regex pattern matching for exact input_N and output_N patterns instead of substring checks.
Target Libraries
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c, TargetLibraries/PULPOpen/src/GELU.c
Replaced log2 function call with LOG2 macro; refactored GELU computation with per-core parallelism and vectorized cdf calculation using single expression instead of intermediate variables.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Pointer arithmetic refactoring: Byte-wise casting changes across multiple files require careful verification of correctness and consistency.
  • New tile constraint implementations: ReduceMeanTileConstraint and SliceTileConstraint are entirely new classes with multiple public methods that need validation against existing tiling infrastructure.
  • Method signature changes: ConvTileConstraint.computeInputCube now accepts optional parameters and returns a tuple; impact on all callers must be verified.
  • MatMul batch dimension handling: Significant logic changes to support per-tensor dimensions and broadcasting; requires verification of GEMM correctness.
  • Binding configuration changes: PULPSliceBindings type-checking logic uses cross-product iteration; consistency with constraint logic must be confirmed.
  • Shape handling polymorphism: Multiple files now handle both int and sequence shapes; edge cases and consistency across call sites need attention.

Possibly related PRs

Suggested reviewers

  • Victor-Jung

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 9.52% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective of this PR: enabling TinyViT demo compatibility with the tiled Siracusa platform.
Description check ✅ Passed The description is directly related to the changeset, providing detailed context about TinyViT support, input tiling, bug fixes, and test coverage.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

76-95: Fix three critical issues with MatMul transpose handling and broadcasting

  1. Line 82-88 (addPolicyConstraint): Broadcasting issue with different tensor ranks
    BFirstDimVar calculation uses tensorsShapeLen = len(bufferA.shape), but this length is applied to bufferB. If bufferA and bufferB have different ranks due to broadcasting, the index will be incorrect. Use separate tensorsShapeLenB = len(bufferB.shape) for the B tensor, following the pattern in addGeometricalConstraint.

  2. Lines 154-189 (serializeTilingSolution): Missing transpose flag handling
    NSize, AMatrixOffsets/Shape, and BMatrixOffsets/Shape are hardcoded without considering transA and transB flags. Read these flags from operatorRepresentation and conditionally swap offsets and dimensions as shown in FloatGEMMTileConstraint.serializeTilingSolution (lines 118-119 read the flags, then lines 126-135 and 137-146 conditionally swap based on transpose state).

  3. Line 209 (serializeTilingSolution): Use strict=True with zip
    Change zip(inputACubes, inputBCubes) to zip(inputACubes, inputBCubes, strict=True) to fail fast if cube lists diverge in length (Python 3.10+ is required, which matches the project's requires-python = ">=3.10").

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (1)

467-598: Now let me search for the Python version requirements in the repository:

Based on the verification, I can see that:

  1. The Deeploy package supports Python 3.10, 3.11, and 3.12, which means the strict parameter for zip() is available since Python 3.10.

  2. The "NULL" bias sentinel is used consistently across the codebase:

    • ConvTileConstraint.py: lines 479, 573, 582
    • Parsers.py: lines 111, 169

All uses are consistent string literal comparisons. The review comment's suggestion about centralizing the constant is valid for maintainability.

Bias-aware tiling serialization looks correct; consider strengthening zip validation and centralizing the NULL sentinel

The bias-aware tiling serialization is coherent: inputCSize = weightCin * group aligns with ch_im_in constraints, ch_im_in/ch_im_out replacements match the operator representation, and bias base addresses and cubes are correctly threaded into inputLoadSchedule when varBias != "NULL". Two improvements to consider:

  • Both zip calls (lines 583 and 586) will silently drop extra elements if list lengths diverge. Since the minimum Python version is 3.10+, use zip(..., strict=True) to surface tiling bugs early and align with Ruff's B905 hint.
  • The "NULL" bias sentinel appears in both ConvTileConstraint.py (lines 479, 573, 582) and Parsers.py (lines 111, 169). Centralizing it as a module-level constant would prevent drift if the representation changes.
🧹 Nitpick comments (7)
CHANGELOG.md (1)

118-121: Maintain consistent formatting in the Fixed section.

Line 121 breaks the pattern of other items in the Fixed section—it lacks the "Fixed" verb prefix that lines 118–120 use. For consistency, consider restructuring to: "Fixed PULP GELU kernel to use tanh approximation."

The static analysis tool also flagged the three successive "Fixed" beginnings (lines 118–120); while repetition is acceptable in changelogs, tightening the wording could improve readability if desired.

TargetLibraries/PULPOpen/src/GELU.c (1)

24-30: Hoist constants and avoid powf in the inner loop for better performance

The tanh‑based GELU approximation is fine numerically, but as written it:

  • Calls sqrtf(2.0f / (float)M_PI) once per element.
  • Calls powf(x, 3.0f) once per element.

Both are relatively expensive on PULP cores and not needed per iteration. You can precompute the sqrt(2/π) constant once and compute as simple multiplies, which should significantly reduce cycles without changing behavior.

-  // Compute GELU on the assigned chunk
-  for (uint32_t i = chunk_start; i < chunk_stop; i++) {
-    float32_t x = data_in[i];
-    float32_t cdf = 0.5f * (1.0f + tanhf((sqrtf(2.0f / (float)M_PI) *
-                                          (x + 0.044715f * powf(x, 3.0f)))));
-
-    data_out[i] = x * cdf;
-  }
+  // Compute GELU on the assigned chunk
+  const float32_t k = sqrtf(2.0f / (float)M_PI);
+  for (int32_t i = chunk_start; i < chunk_stop; i++) {
+    float32_t x = data_in[i];
+    float32_t x3 = x * x * x;
+    float32_t cdf = 0.5f * (1.0f + tanhf(k * (x + 0.044715f * x3)));
+
+    data_out[i] = x * cdf;
+  }

This keeps the same tanh‑GELU form while making the kernel more friendly to tight tiling and per‑core execution.

Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py (1)

58-63: Helpful context on DMA assertion failures; linter warning is only stylistic

Wrapping _generateDmaTransferCalls in a try/except AssertionError and re-raising with the tensor name makes failures much easier to diagnose without changing control flow. The Ruff TRY003 warning here is stylistic; if desired, you could trim the message or introduce a small custom exception type to satisfy the linter, but it’s not functionally required.

Deeploy/TilingExtension/TilingCodegen.py (1)

167-183: Unhashable-safe uniqueness check in minimizeVariableReplacement

The new manual scan avoids set(value) and therefore works when replacement entries are unhashable (e.g., lists of shapes), while preserving the “fold if all equal, keep per-tile otherwise” behavior. This is a sensible robustness improvement given the new tiling constraints now emit list-valued replacements.

Deeploy/TilingExtension/TilerExtension.py (1)

422-451: Resolved tensor memory shapes now handle scalar buffers correctly

By normalizing scalar shapes (shape as int) to length-1 sequences when reconstructing newMemoryConstraint.shape, this method now lines up with the updated TilerModel behavior and keeps downstream users (e.g., rectangle computation, stride calculation) working uniformly on tuples. Looks good to me.

Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)

45-53: Pointer update templates look correct; confirm typeName invariant for all external buffers

The move to (${typeName}*)((char*)(${reference}) + ...) should fix the “byte offset on typed pointer” issue and makes the intent explicit. Using externalBuffer._type.referencedType.typeName for ${typeName} also matches the PointerClass pattern and yields base element type casts like float*.

The only assumption is that every externalBuffer._type here is a PointerClass with a valid .referencedType.typeName. If any external buffer can be non‑pointer (e.g., structs or scalars passed by value), this would break.

Please double‑check that all call sites passing externalBuffer into _generateExternalReferenceUpdate satisfy that invariant, or add a defensive assert isinstance(externalBuffer._type, PointerClass) / equivalent guard.

Also applies to: 160-164

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (1)

410-459: computeInputCube per-tile padding is well-structured; consider dilation and type-hint tweaks

The per-tile padding computation based on absolute output offsets fixes the classic interior-tile padding bug and the clamping with inputDims is a good safeguard against over-reads. Two minor points:

  • The receptive field size uses kernelShape only; if Conv2D ever uses dilations != (1, 1), the effective kernel span (dilation * (kernel - 1) + 1) will not be reflected here. If dilations are to be supported, you may want to pass them in and fold them into the size computation.
  • inputDims/outputDims are annotated as 3D tuples but are passed full NHWC shapes and indexed as such; updating these annotations to 4D or to a generic Sequence[int] would better match actual usage and help static type checking.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4795932 and e03cd85.

📒 Files selected for processing (19)
  • .github/workflows/ci-platform-siracusa-tiled.yml (5 hunks)
  • CHANGELOG.md (4 hunks)
  • Deeploy/DeeployTypes.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Platform.py (4 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (7 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (2 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (4 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/Tiler.py (3 hunks)
  • Deeploy/TilingExtension/AsyncDma.py (1 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py (1 hunks)
  • Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (2 hunks)
  • Deeploy/TilingExtension/TilerExtension.py (2 hunks)
  • Deeploy/TilingExtension/TilerModel.py (2 hunks)
  • Deeploy/TilingExtension/TilingCodegen.py (1 hunks)
  • DeeployTest/testUtils/codeGenerate.py (1 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
  • TargetLibraries/PULPOpen/src/GELU.c (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.

Applied to files:

  • CHANGELOG.md
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.

Applied to files:

  • Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py
  • Deeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.

Applied to files:

  • Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py
  • Deeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.

Applied to files:

  • Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py
  • Deeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.

Applied to files:

  • Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py
  • Deeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.

Applied to files:

  • Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
  • Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py
🧬 Code graph analysis (11)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (1)
  • uint16_t (42-45)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py (1)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
  • _generateDmaTransferCalls (115-149)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
Deeploy/AbstractDataTypes.py (1)
  • typeName (312-313)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (1)
  • uint16_t (42-45)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (4)
Deeploy/DeeployTypes.py (2)
  • lookup (720-752)
  • NetworkContext (508-1020)
Deeploy/TilingExtension/TilerModel.py (3)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
  • TilerModel (34-406)
Deeploy/TilingExtension/TileConstraint.py (1)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/TilingExtension/AsyncDma.py (1)
Deeploy/DeeployTypes.py (1)
  • NodeTemplate (87-229)
Deeploy/Targets/PULPOpen/Tiler.py (4)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (2)
  • DWConv2DTileConstraint (238-255)
  • RQDWConv2DTileConstraint (20-235)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
  • ReduceMeanTileConstraint (20-161)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)
  • SliceTileConstraint (20-171)
Deeploy/TilingExtension/TilerExtension.py (1)
  • TilingReadyNodeBindings (1027-1035)
Deeploy/Targets/PULPOpen/Platform.py (2)
Deeploy/Targets/PULPOpen/Parsers.py (1)
  • PULPFPDWConv2DParser (124-182)
Deeploy/Targets/Generic/Parsers.py (2)
  • ReduceMeanParser (526-570)
  • SliceParser (97-155)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (3)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (3)
  • Conv2DTileConstraint (233-598)
  • addPolicyConstraint (78-116)
  • addPolicyConstraint (337-386)
Deeploy/TilingExtension/TilerModel.py (2)
  • TilerModel (34-406)
  • getTensorDimVar (131-135)
Deeploy/DeeployTypes.py (1)
  • NetworkContext (508-1020)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (5)
Deeploy/DeeployTypes.py (3)
  • lookup (720-752)
  • NetworkContext (508-1020)
  • copy (1016-1020)
Deeploy/TilingExtension/TilerModel.py (4)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
  • addConstraint (103-120)
  • TilerModel (34-406)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (2)
  • uint16_t (42-45)
  • uint8_t (36-39)
Deeploy/TilingExtension/TilerExtension.py (1)
Deeploy/DeeployTypes.py (1)
  • lookup (720-752)
🪛 LanguageTool
CHANGELOG.md

[style] ~120-~120: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...e casting for tiling code generation. - Fixed bug in buffer name identification in co...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

65-65: Unused static method argument: tilerModel

(ARG004)


66-66: Unused static method argument: ctxt

(ARG004)


102-102: Unused class method argument: ctxt

(ARG003)

Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py

63-63: Avoid specifying long messages outside the exception class

(TRY003)

Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py

62-62: Unused static method argument: tilerModel

(ARG004)


63-63: Unused static method argument: ctxt

(ARG004)


100-100: Unused class method argument: ctxt

(ARG003)

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py

209-209: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py

583-583: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


586-586: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

🔇 Additional comments (21)
.github/workflows/ci-platform-siracusa-tiled.yml (4)

51-57: Conv2D and activation test expansion aligns with PR objectives.

The expansion from a single Conv2D test to bias-aware and depthwise variants, along with the new activation tests, reflects the PR's focus on improved tiling constraints and extended test coverage. L1 budget scaling (1600→7200 for DW variants) appears reasonable.

Also applies to: 62-63


102-102: Verify L1 budget for testFloatRelu in doublebuffer config.

L1=[20] bytes appears unrealistically low for a floating-point activation test. Typical I/O buffers alone would exceed this. Please confirm this is intentional (e.g., an empty/synthetic test) or if it should be a larger value like 200 or 2000.


138-142: TinyViT addition and multi-configuration memory testing align with PR objectives.

The introduction of testFloatDemoTinyViT with a consistent 4000-byte L1 budget across memory tiers, combined with multi-element L1 arrays for other tests, supports the PR's goal of enabling TinyViT on tiled Siracusa and extending test coverage across various memory configurations.


170-175: TinyViT and CCT configurations consistently applied across memory levels.

The uniform application of testFloatDemoTinyViT (L1=4000) and scaled CCT budgets across L3 sections and buffer strategies reinforces comprehensive test coverage at different memory availability levels. Consistency reduces configuration drift risk.

Also applies to: 210-215

TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

21-21: LGTM! Good type-safety practice.

The explicit cast from double (returned by log2()) to int8_t makes the type conversion intentional and improves code clarity. This aligns with the PR's type casting fixes.

DeeployTest/testUtils/codeGenerate.py (1)

227-235: Robust fix for buffer name parsing.

The pattern-based approach correctly restricts matches to exactly input_N or output_N (where N is a digit sequence), preventing the false positives from intermediate buffers like "conv_output_layer" or "input_weights_buffer" that would have been caught by substring checks. Edge cases (empty string, trailing underscores, multiple underscores, non-digit suffixes) are all handled safely by the combination of split, length check, exact prefix match, and isdigit() validation.

Deeploy/DeeployTypes.py (1)

479-485: Based on my verification of the codebase, I can now provide the final assessment:

Byte-wise pointer arithmetic fix is correct and safe to merge.

The change properly implements the recommended pattern of casting to char* before adding an offset to ensure byte-wise arithmetic, following SEI CERT C coding standards for correct pointer arithmetic.

Analysis of offset usage in the codebase shows:

  • Most _ReferenceBuffer instantiations (AsyncDma.py lines 247, 256) do not pass an offset parameter, defaulting to 0
  • The hoistReference method call in DeeployTypes.py line 907 does not pass an offset
  • Only TilingHoistingMixIn.py line 73 passes offset through from its caller—this is the primary code path to verify

No evidence was found of offset parameters being multiplied by element size elsewhere in the codebase, indicating callers are already using byte offsets correctly.

The fix successfully prevents the pointer arithmetic overflow bug for typed pointers (e.g., float32*) where element-scaled arithmetic would cause unintended memory access.

Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1)

20-255: RQ and FP DWConv2D tiling constraints look consistent with their respective layouts

  • RQDWConv2DTileConstraint correctly wires NCHW input, CHW weights, and NHWC output/add/mul, and the output-size formula matches standard conv padding/stride behavior given dilation is effectively 1.
  • DWConv2DTileConstraint reuses Conv2DTileConstraint’s NHWC geometry and only tightens channel equalities (ch_im_in/ch_im_out on dim 3), which matches the depthwise use-case and the new bindings in PULPOpen/Tiler.py.

No issues from my side here.

Deeploy/TilingExtension/AsyncDma.py (1)

201-263: Byte-wise pointer arithmetic fix aligns with legalized DMA offsets

Casting basePtr to char * before adding ${offset} ensures the offset is applied in bytes rather than elements, which matches the tiling pipeline’s _legalizeTransfers behavior (offsets/strides already converted to bytes for DMAs). This looks like the correct and minimal fix for the previous overflow/misalignment issue.

Deeploy/TilingExtension/TilerModel.py (2)

143-160: Scalar tensor.shape handling in dimension modeling is sound

Normalizing tensor.shape to a single-element iterable when it’s an int makes addTensorDimToModel robust for scalar buffers without affecting multi-dim tensors. Bounds and variable naming remain unchanged.


171-185: Consistent scalar-shape handling in element-count variable

Using the same int-vs-sequence normalization in addTensorNumOfEltToModel keeps the product logic consistent with the dimension variables and avoids iterating over non-iterable scalar shapes; the np.prod(tensor.shape) upper bound still behaves correctly for both cases.

Deeploy/Targets/PULPOpen/Tiler.py (3)

18-28: Imports align with new DWConv/Slice/ReduceMean tiling paths

The added PULP bindings and tile-constraint imports (PULPFloatDWConv2DBindings, PULPSliceBindings, PULPReduceMeanBindings, DWConv2DTileConstraint, ReduceMeanTileConstraint, SliceTileConstraint) match the new TileConstraint implementations and look consistent with the rest of the tiling stack.


43-54: DWConv tiling-ready bindings correctly separate RQ and FP variants

  • PULPRQSDWConv2DTilingReadyBindings now uses RQDWConv2DTileConstraint, matching the requantized depthwise kernel.
  • New PULPDWConv2DTilingReadyBindings pairs FP DWConv bindings with the generic DWConv2DTileConstraint built on top of Conv2DTileConstraint.

This cleanly splits RQ vs FP DWConv tiling behavior without changing existing Conv2D bindings.


135-145: Slice and ReduceMean are now properly exposed as tiling-ready operations

PULPSliceTilingReadyBindings and PULPReduceMeanTilingReadyBindings correctly plug SliceTileConstraint and ReduceMeanTileConstraint into the PULP Slice/ReduceMean bindings, enabling the new tiling logic and serialization paths for these ops. No issues spotted.

Deeploy/Targets/PULPOpen/Platform.py (1)

32-33: Tiling‑ready bindings wiring for PULP ops looks consistent

The re‑wiring of mappers to *TilingReadyBindings for ReduceMean, FP Conv2D/DWConv2D, and Slice (with SliceLayer([SliceMapper, DMASliceMapper])) is coherent and keeps the DMA‑specific Slice path separate via PULPDMASliceBindings. The mapping table remains backward‑compatible for op names (ReduceMean, IntegerMean, Slice, etc.) while enabling tiling.

No structural issues stand out; this should be a drop‑in improvement assuming the new bindings expose the same template fields used by the parsers.

Please run a minimal ONNX with:

  • batched ReduceMean (with/without keepdims)
  • regular and DW Conv2D (with/without bias)
  • standard Slice and DMA Slice

to confirm all mappers hit the intended tiling‑ready bindings in the PULP pipeline.

Also applies to: 38-49, 68-69, 78-79, 93-96, 135-136

Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

20-73: Geometrical constraints for batched MatMul look correct and trans‑aware

This refactor to derive AMatrix*, BMatrix*, and outputMatrix* from tensor shapes while honoring transA/transB is solid. The batch constraints conditioned on bufferA.shape[:-2] == bufferB.shape[:-2] nicely avoid over‑constraining broadcasted MatMul while still tying the output’s batch dims to A/B when shapes match.

No obvious correctness issues here.

Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)

22-59: The primary concern about parseDict['axes']/['steps'] being tensor names is incorrect—normalization occurs before invocation.

The review identifies that Generic SliceParser stores axes, steps, and starts as tensor names but overlooks the normalization step. In DMASliceTemplate.alignToContext() (lines 44-47) and similarly in Generic/Templates/SliceTemplate.py (lines 33-34), these fields are converted to integer arrays via operatorRepresentation['axes'] = axesBuffer.values before the tiling constraint is invoked. When SliceTileConstraint methods receive operatorRepresentation (aliased as parseDict), the values are already integer arrays. The code path is:

  1. Generic parser: stores tensor names
  2. Template layer: resolves to actual values
  3. SliceTileConstraint: receives normalized integer arrays

The remaining points (starts initialization, 4D rank assumption, unused interface parameters) are valid observations but represent minor documentation or hardening suggestions rather than functional defects.

Likely an incorrect or invalid review comment.

Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (4)

182-195: Consider using absolute output offsets for RQConv2D input-cube padding

computeInputCube now supports outputAbsoluteOffsets, but this call still relies on cube.offset only. If RQConv2D ever tiles the output with padding, interior tiles may see incorrect per-tile padding (same issue this change fixes for Conv2D). Consider passing the corresponding absoluteOutputCubes[idx].absoluteOffset here as well, or confirm that RQConv2D is only used in the single-tile/full-output case.


235-334: Geometrical constraints for grouped, padded Conv2D (with optional bias) look consistent

The refactored constraints using effectiveInputHeight/Width, dilations, and group (including inputChannelVar == weightInChannelVar * group and bias-dimension equality) align with standard Conv2D shape relations and should make spatial tiling more robust around padding and grouping. The worst‑case overlap modeling is clearly documented and acceptable until a more precise per-tile memory model is introduced.


339-385: Policy constraints now align with effective input geometry and grouping

Using effectiveInputHeight/Width for minimum kernel coverage and stride compatibility, while enforcing full input channels (inputChannelVar == ch_im_in) and weightInChannelVar * group == ch_im_in, ensures tiles are both geometrically valid and group-consistent. Keeping weights untiled via equality to dim_kernel_{x,y} preserves the original kernel semantics.


392-407: Symbolic node representation matches new replacement keys

Populating symbolicParseDict with dim_im_in_x, dim_im_in_y, dim_im_out_x, and dim_im_out_y keeps it in sync with the keys used in serializeTilingSolution’s replacements dict, which should avoid missing-variable issues downstream in tiling codegen.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)

100-229: Transpose and broadcast tiling logic is correct; Python ≥3.10 confirmed; optional rank asserts recommended

The refactored serializeTilingSolution correctly:

  • Reads transA and transB from operatorRepresentation (lines 112–113) and uses them to compute NSize and matrix offsets/shapes (lines 129–132, 140–147), matching the GEMM tiling pattern from GEMMTileConstraint.
  • Handles batch broadcasting by right-aligning batch dimensions of A and B against data_out and falling back to (offset=0, size=1) when an input batch dimension doesn't match the output.
  • Enforces "only the last batch dimension may be tiled" via the offsets[:-3] == 0 assert (line 127).
  • Uses zip(..., strict=True) safely: pyproject.toml declares requires-python = ">=3.10", so this is compatible.

Optional recommendation:
Add defensive assertions before the batch-alignment loops to document and catch invalid operator shapes early:

assert len(buffOut.shape) >= tensorsShapeLenA, \
    f"MatMul output rank {len(buffOut.shape)} < A rank {tensorsShapeLenA}"
assert len(buffOut.shape) >= tensorsShapeLenB, \
    f"MatMul output rank {len(buffOut.shape)} < B rank {tensorsShapeLenB}"

This makes the implicit rank contract explicit without changing behavior for valid shapes.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e03cd85 and f65282b.

📒 Files selected for processing (2)
  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (4 hunks)
  • TargetLibraries/PULPOpen/src/GELU.c (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • TargetLibraries/PULPOpen/src/GELU.c
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.

Applied to files:

  • Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (5)
Deeploy/DeeployTypes.py (2)
  • lookup (720-752)
  • NetworkContext (508-1020)
Deeploy/TilingExtension/TilerModel.py (3)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
  • TilerModel (34-406)
Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)
  • addPolicyConstraint (63-87)
  • addPolicyConstraint (234-258)
Deeploy/TilingExtension/TileConstraint.py (2)
  • addPolicyConstraint (32-36)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilingCodegen.py (1)
  • HyperRectangle (24-35)
🔇 Additional comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (2)

22-72: Geometric + batch constraints for MatMul look consistent with GEMM semantics

The refactored addGeometricalConstraint correctly:

  • Derives matrix dims from the last two axes and honors transA/transB.
  • Enforces M/O/inner-dim consistency between A, B, and data_out.
  • Adds batch-dim equality constraints only when A and B share identical leading shapes, which is a safe choice in the presence of broadcasting.

This aligns well with the GEMM tiling approach and should behave correctly for batched MatMul.


78-91: Policy constraints for N-dimension remain consistent with GEMM behavior

ASecondDimVar/BFirstDimVar are derived from the last two dims with transA/transB applied, then both constrained to parseDict['N']. This matches the GEMM policy of not tiling along the inner (N) dimension and looks algebraically equivalent to the existing GEMM implementation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)

117-131: Document or assert the rank ≤ 4 assumption in replacementTypes.

The replacementTypes structure uses fixed-length lists of 4 pointer types for data_in_shape and data_out_shape, implicitly assuming tensor rank ≤ 4. If ReduceMean is applied to higher-rank tensors, this could lead to incorrect behavior.

Consider adding an assertion in the method or a docstring comment documenting this constraint:

# Assert rank assumption
assert len(operatorRepresentation['data_in_shape']) <= 4, \
    "ReduceMean tiling currently supports tensors with rank ≤ 4"
assert len(operatorRepresentation['data_out_shape']) <= 4, \
    "ReduceMean tiling currently supports tensors with rank ≤ 4"

Or document it in the class/method docstring:

@classmethod
def serializeTilingSolution(
        cls, tilingSolution: NodeMemoryConstraint, absoluteOutputCubes: List[AbsoluteHyperRectangle],
        targetMemLevel: str, ctxt: NetworkContext,
        operatorRepresentation: OperatorRepresentation) -> Tuple[VariableReplacementScheme, TilingSchedule]:
    """
    Serialize the tiling solution for ReduceMean.
    
    Note: Currently supports tensors with rank ≤ 4 due to replacementTypes structure.
    """
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f65282b and 54acf5b.

📒 Files selected for processing (1)
  • Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (1)
  • uint16_t (42-45)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

61-61: Unused static method argument: tilerModel

(ARG004)


62-62: Unused static method argument: ctxt

(ARG004)


98-98: Unused class method argument: ctxt

(ARG003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
  • GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
  • GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
  • GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
🔇 Additional comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)

22-58: Excellent fix for the keepdims bug!

The geometrical constraint logic now correctly handles both keepdims == 0 and keepdims == 1 cases by iterating over input axes while maintaining an output index pointer. This resolves the critical bug flagged in the previous review.

The implementation correctly:

  • Iterates over input axes (line 42)
  • Skips incrementing output_idx when a reduced axis has keepDims == False (lines 43-50)
  • Maps non-reduced input dimensions to their corresponding output dimensions (lines 52-56)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)

95-157: Serialize logic looks sound; rank and size assumptions could be documented/guarded

The serialization flow—extracting base addresses, deriving in_cube from each out_cube, and filling data_in_shape, data_out_shape, and size—is coherent and consistent with the tiling abstractions. A couple of assumptions are baked in, though:

  • replacementTypes["data_in_shape"] / ["data_out_shape"] hard‑code four uint16_t pointers; this effectively assumes ReduceMean is only used with rank ≤ 4 tensors. If higher‑rank use is possible, consider asserting the rank or generalizing the scheme to avoid silent mismatch between dims and replacement types.
  • size is stored as uint16_t; if an output tile size can exceed 65535 elements, it will overflow the element count. If that’s not guaranteed impossible by upstream constraints, you might want to assert an upper bound or switch to a wider type.

ctxt is currently unused here as well; if it’s only present for interface consistency, renaming to _ctxt or adding a brief comment would quiet ARG003 without functional changes.

🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (2)

22-58: Geometrical constraint mapping for keepdims=0/1 now looks correct; consider a small sanity check

The revised loop over input axes with an output_idx pointer fixes the earlier keepdims bug and keeps addGeometricalConstraint consistent with computeInputCubeFromOutputCube. To catch mis‑specified axes/shape combinations early, you may want to assert at the end that output_idx matches the expected output rank (e.g. len(parseDict["data_out_shape"]) or the context tensor rank).


60-65: Unused tilerModel/ctxt parameters in constructSymbolicNodeRep

This method currently just clones parseDict, so tilerModel and ctxt stay unused and trigger Ruff’s ARG004 warning. If they’re required by the interface but intentionally unused, consider renaming them to _tilerModel/_ctxt or adding a short comment/pragma to silence the warning without changing behavior.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 54acf5b and 97f7c83.

📒 Files selected for processing (1)
  • Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (1)
  • uint16_t (42-45)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

61-61: Unused static method argument: tilerModel

(ARG004)


62-62: Unused static method argument: ctxt

(ARG004)


98-98: Unused class method argument: ctxt

(ARG003)

🔇 Additional comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)

67-93: Type annotation and cube reconstruction logic are consistent with call site

Changing outputCube to HyperRectangle matches the actual argument (cube.rectangle) in serializeTilingSolution, and the input‑cube reconstruction correctly (a) copies full extents along reduced axes and (b) uses the output cube’s dims/offsets only for non‑reduced axes with a shared out_idx. This aligns with the geometrical constraints and avoids the previous type mismatch.

@Xeratec Xeratec moved this to In review in Deeploy Nov 19, 2025
@Xeratec Xeratec modified the milestones: Release xxx, Release 0.2.1 Nov 19, 2025
Copy link
Member

@Xeratec Xeratec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I also appreciate the comments you added in the tile constraints.
I only have a few minor comments.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (2)

61-67: constructSymbolicNodeRep ignores tilerModel/ctxt and just copies parseDict

Right now this method is effectively a no-op wrapper around parseDict.copy() and does not use tilerModel or ctxt, which matches Ruff’s unused-argument warnings. If the interface requires these parameters, consider prefixing them with _ or adding a brief comment that this node has no symbolic fields to construct yet, to make the intent explicit and quiet linters.


97-180: Minor clarity issues in serializeTilingSolution (unused ctxt, dead new_ends path)

  • Parameter ctxt is unused, matching the Ruff warning; likely required by the base interface, but adding _ = ctxt or a short comment would clarify intent and silence linting.
  • new_ends is computed (Lines 155-158) but only used in commented-out replacement code; this is slightly misleading. Either remove its computation while tiling is frozen or add a TODO referencing issue #134 so it’s clear this is intentional.

These are non-functional and can be adjusted when convenient.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 97f7c83 and fea1d26.

📒 Files selected for processing (4)
  • Deeploy/Targets/PULPOpen/Bindings.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1 hunks)
  • Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1 hunks)
  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.

Applied to files:

  • Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
  • Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.

Applied to files:

  • TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
🧬 Code graph analysis (3)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (4)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
Deeploy/CommonExtensions/DataTypes.py (1)
  • uint16_t (42-45)
Deeploy/TilingExtension/MemoryConstraints.py (1)
  • NodeMemoryConstraint (95-167)
Deeploy/TilingExtension/TileConstraint.py (2)
  • TileConstraint (19-221)
  • extractBaseAddr (56-74)
Deeploy/TilingExtension/TilerModel.py (3)
  • TilerModel (34-406)
  • addTensorDimToModel (143-159)
  • getTensorDimVar (131-135)
Deeploy/TilingExtension/TilingCodegen.py (4)
  • AbsoluteHyperRectangle (39-49)
  • HyperRectangle (24-35)
  • TilingSchedule (53-122)
  • VariableReplacementScheme (126-158)
Deeploy/Targets/PULPOpen/Bindings.py (2)
Deeploy/Targets/Generic/TypeCheckers.py (1)
  • SliceChecker (37-51)
Deeploy/AbstractDataTypes.py (1)
  • PointerClass (536-559)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py

61-61: Unused static method argument: tilerModel

(ARG004)


62-62: Unused static method argument: ctxt

(ARG004)


98-98: Unused class method argument: ctxt

(ARG003)

Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py

62-62: Unused static method argument: tilerModel

(ARG004)


63-63: Unused static method argument: ctxt

(ARG004)


100-100: Unused class method argument: ctxt

(ARG003)

🔇 Additional comments (5)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)

21-21: LGTM! Good optimization using PULP-specific macro.

Replacing the runtime log2() function with the LOG2() macro eliminates runtime overhead and aligns with PULP platform best practices. This is consistent with patterns used in other PULP kernels.

Based on learnings and past review comments.

Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)

22-93: Previous bugs successfully resolved.

Both critical issues from earlier reviews have been correctly addressed:

  1. The addGeometricalConstraint method (lines 22-58) now properly iterates over input axes while maintaining an output_idx pointer, correctly handling both keepdims=True and keepdims=False cases.
  2. The computeInputCubeFromOutputCube type annotation (line 68) now correctly specifies HyperRectangle instead of AbsoluteHyperRectangle, matching both the caller (line 142) and the implementation's use of .dims and .offset attributes.

The logic is now consistent and correct.

Deeploy/Targets/PULPOpen/Bindings.py (1)

157-167: Refactor suggestion valid; kernel support requires manual verification

Verification confirms your analysis: SliceChecker does infer signedness and levels from the first input only (line 37-51 of Generic TypeCheckers), the template itself is type-agnostic with no hard-coded integer types, and itertools is already imported and used elsewhere (PULPMulBindings).

The new PULPSliceBindings expands index type support from the fixed uint8_t used in Generic and DMA variants to all IntegerDataTypes, which is consistent with your pattern observation. However, the actual kernel implementations cannot be located in this search, so I cannot definitively confirm they support all integer index widths—your verification recommendation stands.

The itertools.product refactor suggestion is valid and improves consistency with existing code; it's already the pattern in PULPMulBindings.

Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (2)

69-95: Type annotation of computeInputCubeFromOutputCube is inconsistent with actual usage

The method is annotated with outputCube: AbsoluteHyperRectangle, but it is always called with HyperRectangle instances (outputCubes = [cube.rectangle for cube in absoluteOutputCubes]), and inside it accesses .dims and .offset, which only exist on HyperRectangle. This is a type-level inconsistency that may confuse readers and tooling even though runtime behavior is correct.

Consider updating the signature to:

-    def computeInputCubeFromOutputCube(outputCube: AbsoluteHyperRectangle, parseDict: Dict) -> HyperRectangle:
+    def computeInputCubeFromOutputCube(outputCube: HyperRectangle, parseDict: Dict) -> HyperRectangle:

to align types with actual usage.
[ suggest_recommended_refactor ]


22-59: Constraint formula indeed assumes positive steps; ONNX Slice spec supports negative steps

The ONNX Slice operator specification supports negative steps, where negative values mean slicing backward. However, your constraint formula inputDimensionVar == ((outputDimensionVar - 1) * axStep + 1) at line 54 is only valid for positive axStep. Additionally, the loop template in SliceTemplate.py uses for(uint32_t i_${axis} = ${start}; i_${axis} < ${end}; i_${axis} += ${step}), which assumes unsigned arithmetic and will fail or produce incorrect behavior with negative steps.

There is no validation in SliceParser to reject or handle negative steps. If your models use negative steps for backward slicing, this implementation will silently produce incorrect constraints or incorrect loop behavior. Consider adding explicit validation or extending the constraint logic to handle both positive and negative steps.

Copy link
Member

@Xeratec Xeratec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for you valuable work <3

@Xeratec Xeratec merged commit e07cd13 into pulp-platform:devel Nov 21, 2025
143 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in Deeploy Nov 21, 2025
@diaconuccalin diaconuccalin deleted the DemoTinyViT_Tiled_Siracusa branch November 21, 2025 13:59
@coderabbitai coderabbitai bot mentioned this pull request Nov 26, 2025
5 tasks
@coderabbitai coderabbitai bot mentioned this pull request Dec 10, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Something isn't working Feature Addition of new features

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants