-
Notifications
You must be signed in to change notification settings - Fork 32
Demo TinyViT compatibility with tiled Siracusa #124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Demo TinyViT compatibility with tiled Siracusa #124
Conversation
95d30e5 to
19a04dd
Compare
…ed closure naming issue
…ffer size computation. Added fixes to remove warnings
…ion. Adjusted memory allocation for the im2col buffer in 2d float conv for PULPOpen (lower requirement, dynamic allocation). Included all Float 2D Conv tests in the CI pipeline for siracusa
…version). Added MatMulAddMergePass to the PULP optimizer. Added float-reshape-skip-connection test to Siracusa CI pipeline. Fixes in the FloatAddTemplate of PULPOpen to eliminate warnings
… FP DW Conv2D tiling constraints. Fixed bugs in input tiling for PULP FP Conv2D. Added new tiled siracusa tests and reduced size of existent ones that now support input tiling for conv operations
…nstraints as well
… for new constraint set. Bugifx for input/output L3 buffer name check, where similar naming of internal nodes would confuse the dumper
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughThis PR expands tiling support in the Deeploy framework for PULPOpen targets by introducing new tile constraint classes (ReduceMean, Slice), enhancing existing constraints with bias and grouping awareness, refactoring pointer arithmetic for byte-wise addressing, and updating bindings and platform configuration to use tiling-ready variants. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
76-95: Fix three critical issues with MatMul transpose handling and broadcasting
Line 82-88 (addPolicyConstraint): Broadcasting issue with different tensor ranks
BFirstDimVarcalculation usestensorsShapeLen = len(bufferA.shape), but this length is applied tobufferB. If bufferA and bufferB have different ranks due to broadcasting, the index will be incorrect. Use separatetensorsShapeLenB = len(bufferB.shape)for the B tensor, following the pattern inaddGeometricalConstraint.Lines 154-189 (serializeTilingSolution): Missing transpose flag handling
NSize,AMatrixOffsets/Shape, andBMatrixOffsets/Shapeare hardcoded without consideringtransAandtransBflags. Read these flags fromoperatorRepresentationand conditionally swap offsets and dimensions as shown inFloatGEMMTileConstraint.serializeTilingSolution(lines 118-119 read the flags, then lines 126-135 and 137-146 conditionally swap based on transpose state).Line 209 (serializeTilingSolution): Use
strict=Truewith zip
Changezip(inputACubes, inputBCubes)tozip(inputACubes, inputBCubes, strict=True)to fail fast if cube lists diverge in length (Python 3.10+ is required, which matches the project'srequires-python = ">=3.10").Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (1)
467-598: Now let me search for the Python version requirements in the repository:Based on the verification, I can see that:
The Deeploy package supports Python 3.10, 3.11, and 3.12, which means the
strictparameter forzip()is available since Python 3.10.The
"NULL"bias sentinel is used consistently across the codebase:
- ConvTileConstraint.py: lines 479, 573, 582
- Parsers.py: lines 111, 169
All uses are consistent string literal comparisons. The review comment's suggestion about centralizing the constant is valid for maintainability.
Bias-aware tiling serialization looks correct; consider strengthening zip validation and centralizing the NULL sentinel
The bias-aware tiling serialization is coherent:
inputCSize = weightCin * groupaligns withch_im_inconstraints,ch_im_in/ch_im_outreplacements match the operator representation, and bias base addresses and cubes are correctly threaded intoinputLoadSchedulewhenvarBias != "NULL". Two improvements to consider:
- Both
zipcalls (lines 583 and 586) will silently drop extra elements if list lengths diverge. Since the minimum Python version is 3.10+, usezip(..., strict=True)to surface tiling bugs early and align with Ruff's B905 hint.- The
"NULL"bias sentinel appears in both ConvTileConstraint.py (lines 479, 573, 582) and Parsers.py (lines 111, 169). Centralizing it as a module-level constant would prevent drift if the representation changes.
🧹 Nitpick comments (7)
CHANGELOG.md (1)
118-121: Maintain consistent formatting in the Fixed section.Line 121 breaks the pattern of other items in the Fixed section—it lacks the "Fixed" verb prefix that lines 118–120 use. For consistency, consider restructuring to: "Fixed PULP GELU kernel to use tanh approximation."
The static analysis tool also flagged the three successive "Fixed" beginnings (lines 118–120); while repetition is acceptable in changelogs, tightening the wording could improve readability if desired.
TargetLibraries/PULPOpen/src/GELU.c (1)
24-30: Hoist constants and avoidpowfin the inner loop for better performanceThe tanh‑based GELU approximation is fine numerically, but as written it:
- Calls
sqrtf(2.0f / (float)M_PI)once per element.- Calls
powf(x, 3.0f)once per element.Both are relatively expensive on PULP cores and not needed per iteration. You can precompute the
sqrt(2/π)constant once and computex³as simple multiplies, which should significantly reduce cycles without changing behavior.- // Compute GELU on the assigned chunk - for (uint32_t i = chunk_start; i < chunk_stop; i++) { - float32_t x = data_in[i]; - float32_t cdf = 0.5f * (1.0f + tanhf((sqrtf(2.0f / (float)M_PI) * - (x + 0.044715f * powf(x, 3.0f))))); - - data_out[i] = x * cdf; - } + // Compute GELU on the assigned chunk + const float32_t k = sqrtf(2.0f / (float)M_PI); + for (int32_t i = chunk_start; i < chunk_stop; i++) { + float32_t x = data_in[i]; + float32_t x3 = x * x * x; + float32_t cdf = 0.5f * (1.0f + tanhf(k * (x + 0.044715f * x3))); + + data_out[i] = x * cdf; + }This keeps the same tanh‑GELU form while making the kernel more friendly to tight tiling and per‑core execution.
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py (1)
58-63: Helpful context on DMA assertion failures; linter warning is only stylisticWrapping
_generateDmaTransferCallsin atry/except AssertionErrorand re-raising with the tensor name makes failures much easier to diagnose without changing control flow. The Ruff TRY003 warning here is stylistic; if desired, you could trim the message or introduce a small custom exception type to satisfy the linter, but it’s not functionally required.Deeploy/TilingExtension/TilingCodegen.py (1)
167-183: Unhashable-safe uniqueness check inminimizeVariableReplacementThe new manual scan avoids
set(value)and therefore works when replacement entries are unhashable (e.g., lists of shapes), while preserving the “fold if all equal, keep per-tile otherwise” behavior. This is a sensible robustness improvement given the new tiling constraints now emit list-valued replacements.Deeploy/TilingExtension/TilerExtension.py (1)
422-451: Resolved tensor memory shapes now handle scalar buffers correctlyBy normalizing scalar shapes (
shapeasint) to length-1 sequences when reconstructingnewMemoryConstraint.shape, this method now lines up with the updatedTilerModelbehavior and keeps downstream users (e.g., rectangle computation, stride calculation) working uniformly on tuples. Looks good to me.Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
45-53: Pointer update templates look correct; confirmtypeNameinvariant for all external buffersThe move to
(${typeName}*)((char*)(${reference}) + ...)should fix the “byte offset on typed pointer” issue and makes the intent explicit. UsingexternalBuffer._type.referencedType.typeNamefor${typeName}also matches the PointerClass pattern and yields base element type casts likefloat*.The only assumption is that every
externalBuffer._typehere is a PointerClass with a valid.referencedType.typeName. If any external buffer can be non‑pointer (e.g., structs or scalars passed by value), this would break.Please double‑check that all call sites passing
externalBufferinto_generateExternalReferenceUpdatesatisfy that invariant, or add a defensiveassert isinstance(externalBuffer._type, PointerClass)/ equivalent guard.Also applies to: 160-164
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (1)
410-459:computeInputCubeper-tile padding is well-structured; consider dilation and type-hint tweaksThe per-tile padding computation based on absolute output offsets fixes the classic interior-tile padding bug and the clamping with
inputDimsis a good safeguard against over-reads. Two minor points:
- The receptive field size uses
kernelShapeonly; if Conv2D ever usesdilations != (1, 1), the effective kernel span (dilation * (kernel - 1) + 1) will not be reflected here. If dilations are to be supported, you may want to pass them in and fold them into the size computation.inputDims/outputDimsare annotated as 3D tuples but are passed full NHWC shapes and indexed as such; updating these annotations to 4D or to a genericSequence[int]would better match actual usage and help static type checking.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
.github/workflows/ci-platform-siracusa-tiled.yml(5 hunks)CHANGELOG.md(4 hunks)Deeploy/DeeployTypes.py(1 hunks)Deeploy/Targets/PULPOpen/Platform.py(4 hunks)Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py(7 hunks)Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py(2 hunks)Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py(4 hunks)Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py(1 hunks)Deeploy/Targets/PULPOpen/Tiler.py(3 hunks)Deeploy/TilingExtension/AsyncDma.py(1 hunks)Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py(1 hunks)Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py(2 hunks)Deeploy/TilingExtension/TilerExtension.py(2 hunks)Deeploy/TilingExtension/TilerModel.py(2 hunks)Deeploy/TilingExtension/TilingCodegen.py(1 hunks)DeeployTest/testUtils/codeGenerate.py(1 hunks)TargetLibraries/PULPOpen/src/DWConvolution_fp32.c(1 hunks)TargetLibraries/PULPOpen/src/GELU.c(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
📚 Learning: 2025-09-24T12:49:17.889Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: Deeploy/Targets/PULPOpen/Templates/FloatConvTemplate.py:100-0
Timestamp: 2025-09-24T12:49:17.889Z
Learning: In Deeploy's PULP FloatConvTemplate.py, the parameter order for PULP_Conv2d_Im2Col_fp*_HWC calls uses X,Y ordering (dim_im_in_x, dim_im_in_y, dim_kernel_x, dim_kernel_y, stride_x, stride_y) which is correct for the implementation, despite appearing different from some other function signatures.
Applied to files:
CHANGELOG.md
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion manually.
Applied to files:
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.pyDeeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py handles conversion from elements to bytes for DMA operations when isFinalMemoryLevel is true, eliminating the need for individual DMA implementations like MchanDma to perform this conversion.
Applied to files:
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.pyDeeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
Applied to files:
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.pyDeeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:58:06.454Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The _legalizeTransfers function in TilingCodeGeneration.py converts the innermost dimension of transfer rectangles from elements to bytes using a sizeInBytes helper function, ensuring DMA implementations receive properly byte-converted transfer parameters.
Applied to files:
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.pyDeeploy/TilingExtension/AsyncDma.py
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.
Applied to files:
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.pyDeeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.pyDeeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py
🧬 Code graph analysis (11)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (1)
uint16_t(42-45)Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py (1)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
_generateDmaTransferCalls(115-149)
Deeploy/TilingExtension/CodeTransformationPasses/TilingCodeGeneration.py (1)
Deeploy/AbstractDataTypes.py (1)
typeName(312-313)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (1)
uint16_t(42-45)Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (4)
Deeploy/DeeployTypes.py (2)
lookup(720-752)NetworkContext(508-1020)Deeploy/TilingExtension/TilerModel.py (3)
addTensorDimToModel(143-159)getTensorDimVar(131-135)TilerModel(34-406)Deeploy/TilingExtension/TileConstraint.py (1)
extractBaseAddr(56-74)Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle(24-35)
Deeploy/TilingExtension/AsyncDma.py (1)
Deeploy/DeeployTypes.py (1)
NodeTemplate(87-229)
Deeploy/Targets/PULPOpen/Tiler.py (4)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (2)
DWConv2DTileConstraint(238-255)RQDWConv2DTileConstraint(20-235)Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
ReduceMeanTileConstraint(20-161)Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)
SliceTileConstraint(20-171)Deeploy/TilingExtension/TilerExtension.py (1)
TilingReadyNodeBindings(1027-1035)
Deeploy/Targets/PULPOpen/Platform.py (2)
Deeploy/Targets/PULPOpen/Parsers.py (1)
PULPFPDWConv2DParser(124-182)Deeploy/Targets/Generic/Parsers.py (2)
ReduceMeanParser(526-570)SliceParser(97-155)
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (3)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (3)
Conv2DTileConstraint(233-598)addPolicyConstraint(78-116)addPolicyConstraint(337-386)Deeploy/TilingExtension/TilerModel.py (2)
TilerModel(34-406)getTensorDimVar(131-135)Deeploy/DeeployTypes.py (1)
NetworkContext(508-1020)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (5)
Deeploy/DeeployTypes.py (3)
lookup(720-752)NetworkContext(508-1020)copy(1016-1020)Deeploy/TilingExtension/TilerModel.py (4)
addTensorDimToModel(143-159)getTensorDimVar(131-135)addConstraint(103-120)TilerModel(34-406)Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle(24-35)Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (2)
uint16_t(42-45)uint8_t(36-39)
Deeploy/TilingExtension/TilerExtension.py (1)
Deeploy/DeeployTypes.py (1)
lookup(720-752)
🪛 LanguageTool
CHANGELOG.md
[style] ~120-~120: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...e casting for tiling code generation. - Fixed bug in buffer name identification in co...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
65-65: Unused static method argument: tilerModel
(ARG004)
66-66: Unused static method argument: ctxt
(ARG004)
102-102: Unused class method argument: ctxt
(ARG003)
Deeploy/TilingExtension/CodeTransformationPasses/SingleBufferingTilingCodeGeneration.py
63-63: Avoid specifying long messages outside the exception class
(TRY003)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
62-62: Unused static method argument: tilerModel
(ARG004)
63-63: Unused static method argument: ctxt
(ARG004)
100-100: Unused class method argument: ctxt
(ARG003)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
209-209: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py
583-583: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
586-586: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
🔇 Additional comments (21)
.github/workflows/ci-platform-siracusa-tiled.yml (4)
51-57: Conv2D and activation test expansion aligns with PR objectives.The expansion from a single Conv2D test to bias-aware and depthwise variants, along with the new activation tests, reflects the PR's focus on improved tiling constraints and extended test coverage. L1 budget scaling (1600→7200 for DW variants) appears reasonable.
Also applies to: 62-63
102-102: Verify L1 budget for testFloatRelu in doublebuffer config.L1=[20] bytes appears unrealistically low for a floating-point activation test. Typical I/O buffers alone would exceed this. Please confirm this is intentional (e.g., an empty/synthetic test) or if it should be a larger value like 200 or 2000.
138-142: TinyViT addition and multi-configuration memory testing align with PR objectives.The introduction of testFloatDemoTinyViT with a consistent 4000-byte L1 budget across memory tiers, combined with multi-element L1 arrays for other tests, supports the PR's goal of enabling TinyViT on tiled Siracusa and extending test coverage across various memory configurations.
170-175: TinyViT and CCT configurations consistently applied across memory levels.The uniform application of testFloatDemoTinyViT (L1=4000) and scaled CCT budgets across L3 sections and buffer strategies reinforces comprehensive test coverage at different memory availability levels. Consistency reduces configuration drift risk.
Also applies to: 210-215
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
21-21: LGTM! Good type-safety practice.The explicit cast from
double(returned bylog2()) toint8_tmakes the type conversion intentional and improves code clarity. This aligns with the PR's type casting fixes.DeeployTest/testUtils/codeGenerate.py (1)
227-235: Robust fix for buffer name parsing.The pattern-based approach correctly restricts matches to exactly
input_Noroutput_N(where N is a digit sequence), preventing the false positives from intermediate buffers like"conv_output_layer"or"input_weights_buffer"that would have been caught by substring checks. Edge cases (empty string, trailing underscores, multiple underscores, non-digit suffixes) are all handled safely by the combination ofsplit, length check, exact prefix match, andisdigit()validation.Deeploy/DeeployTypes.py (1)
479-485: Based on my verification of the codebase, I can now provide the final assessment:Byte-wise pointer arithmetic fix is correct and safe to merge.
The change properly implements the recommended pattern of casting to char* before adding an offset to ensure byte-wise arithmetic, following SEI CERT C coding standards for correct pointer arithmetic.
Analysis of offset usage in the codebase shows:
- Most
_ReferenceBufferinstantiations (AsyncDma.py lines 247, 256) do not pass an offset parameter, defaulting to 0- The
hoistReferencemethod call in DeeployTypes.py line 907 does not pass an offset- Only
TilingHoistingMixIn.pyline 73 passes offset through from its caller—this is the primary code path to verifyNo evidence was found of offset parameters being multiplied by element size elsewhere in the codebase, indicating callers are already using byte offsets correctly.
The fix successfully prevents the pointer arithmetic overflow bug for typed pointers (e.g., float32*) where element-scaled arithmetic would cause unintended memory access.
Deeploy/Targets/PULPOpen/TileConstraints/DWConvTileConstraint.py (1)
20-255: RQ and FP DWConv2D tiling constraints look consistent with their respective layouts
RQDWConv2DTileConstraintcorrectly wires NCHW input, CHW weights, and NHWC output/add/mul, and the output-size formula matches standard conv padding/stride behavior given dilation is effectively 1.DWConv2DTileConstraintreusesConv2DTileConstraint’s NHWC geometry and only tightens channel equalities (ch_im_in/ch_im_outon dim 3), which matches the depthwise use-case and the new bindings inPULPOpen/Tiler.py.No issues from my side here.
Deeploy/TilingExtension/AsyncDma.py (1)
201-263: Byte-wise pointer arithmetic fix aligns with legalized DMA offsetsCasting
basePtrtochar *before adding${offset}ensures the offset is applied in bytes rather than elements, which matches the tiling pipeline’s_legalizeTransfersbehavior (offsets/strides already converted to bytes for DMAs). This looks like the correct and minimal fix for the previous overflow/misalignment issue.Deeploy/TilingExtension/TilerModel.py (2)
143-160: Scalartensor.shapehandling in dimension modeling is soundNormalizing
tensor.shapeto a single-element iterable when it’s anintmakesaddTensorDimToModelrobust for scalar buffers without affecting multi-dim tensors. Bounds and variable naming remain unchanged.
171-185: Consistent scalar-shape handling in element-count variableUsing the same
int-vs-sequence normalization inaddTensorNumOfEltToModelkeeps the product logic consistent with the dimension variables and avoids iterating over non-iterable scalar shapes; thenp.prod(tensor.shape)upper bound still behaves correctly for both cases.Deeploy/Targets/PULPOpen/Tiler.py (3)
18-28: Imports align with new DWConv/Slice/ReduceMean tiling pathsThe added PULP bindings and tile-constraint imports (
PULPFloatDWConv2DBindings,PULPSliceBindings,PULPReduceMeanBindings,DWConv2DTileConstraint,ReduceMeanTileConstraint,SliceTileConstraint) match the new TileConstraint implementations and look consistent with the rest of the tiling stack.
43-54: DWConv tiling-ready bindings correctly separate RQ and FP variants
PULPRQSDWConv2DTilingReadyBindingsnow usesRQDWConv2DTileConstraint, matching the requantized depthwise kernel.- New
PULPDWConv2DTilingReadyBindingspairs FP DWConv bindings with the genericDWConv2DTileConstraintbuilt on top ofConv2DTileConstraint.This cleanly splits RQ vs FP DWConv tiling behavior without changing existing Conv2D bindings.
135-145: Slice and ReduceMean are now properly exposed as tiling-ready operations
PULPSliceTilingReadyBindingsandPULPReduceMeanTilingReadyBindingscorrectly plugSliceTileConstraintandReduceMeanTileConstraintinto the PULP Slice/ReduceMean bindings, enabling the new tiling logic and serialization paths for these ops. No issues spotted.Deeploy/Targets/PULPOpen/Platform.py (1)
32-33: Tiling‑ready bindings wiring for PULP ops looks consistentThe re‑wiring of mappers to
*TilingReadyBindingsfor ReduceMean, FP Conv2D/DWConv2D, and Slice (withSliceLayer([SliceMapper, DMASliceMapper])) is coherent and keeps the DMA‑specific Slice path separate viaPULPDMASliceBindings. The mapping table remains backward‑compatible for op names (ReduceMean,IntegerMean,Slice, etc.) while enabling tiling.No structural issues stand out; this should be a drop‑in improvement assuming the new bindings expose the same template fields used by the parsers.
Please run a minimal ONNX with:
- batched ReduceMean (with/without
keepdims)- regular and DW Conv2D (with/without bias)
- standard Slice and DMA Slice
to confirm all mappers hit the intended tiling‑ready bindings in the PULP pipeline.
Also applies to: 38-49, 68-69, 78-79, 93-96, 135-136
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
20-73: Geometrical constraints for batched MatMul look correct and trans‑awareThis refactor to derive
AMatrix*,BMatrix*, andoutputMatrix*from tensor shapes while honoringtransA/transBis solid. The batch constraints conditioned onbufferA.shape[:-2] == bufferB.shape[:-2]nicely avoid over‑constraining broadcasted MatMul while still tying the output’s batch dims to A/B when shapes match.No obvious correctness issues here.
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (1)
22-59: The primary concern aboutparseDict['axes']/['steps']being tensor names is incorrect—normalization occurs before invocation.The review identifies that Generic
SliceParserstoresaxes,steps, andstartsas tensor names but overlooks the normalization step. InDMASliceTemplate.alignToContext()(lines 44-47) and similarly inGeneric/Templates/SliceTemplate.py(lines 33-34), these fields are converted to integer arrays viaoperatorRepresentation['axes'] = axesBuffer.valuesbefore the tiling constraint is invoked. WhenSliceTileConstraintmethods receiveoperatorRepresentation(aliased asparseDict), the values are already integer arrays. The code path is:
- Generic parser: stores tensor names
- Template layer: resolves to actual values
- SliceTileConstraint: receives normalized integer arrays
The remaining points (starts initialization, 4D rank assumption, unused interface parameters) are valid observations but represent minor documentation or hardening suggestions rather than functional defects.
Likely an incorrect or invalid review comment.
Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (4)
182-195: Consider using absolute output offsets for RQConv2D input-cube padding
computeInputCubenow supportsoutputAbsoluteOffsets, but this call still relies oncube.offsetonly. If RQConv2D ever tiles the output with padding, interior tiles may see incorrect per-tile padding (same issue this change fixes for Conv2D). Consider passing the correspondingabsoluteOutputCubes[idx].absoluteOffsethere as well, or confirm that RQConv2D is only used in the single-tile/full-output case.
235-334: Geometrical constraints for grouped, padded Conv2D (with optional bias) look consistentThe refactored constraints using
effectiveInputHeight/Width,dilations, andgroup(includinginputChannelVar == weightInChannelVar * groupand bias-dimension equality) align with standard Conv2D shape relations and should make spatial tiling more robust around padding and grouping. The worst‑case overlap modeling is clearly documented and acceptable until a more precise per-tile memory model is introduced.
339-385: Policy constraints now align with effective input geometry and groupingUsing
effectiveInputHeight/Widthfor minimum kernel coverage and stride compatibility, while enforcing full input channels (inputChannelVar == ch_im_in) andweightInChannelVar * group == ch_im_in, ensures tiles are both geometrically valid and group-consistent. Keeping weights untiled via equality todim_kernel_{x,y}preserves the original kernel semantics.
392-407: Symbolic node representation matches new replacement keysPopulating
symbolicParseDictwithdim_im_in_x,dim_im_in_y,dim_im_out_x, anddim_im_out_ykeeps it in sync with the keys used inserializeTilingSolution’sreplacementsdict, which should avoid missing-variable issues downstream in tiling codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (1)
100-229: Transpose and broadcast tiling logic is correct; Python ≥3.10 confirmed; optional rank asserts recommendedThe refactored
serializeTilingSolutioncorrectly:
- Reads
transAandtransBfromoperatorRepresentation(lines 112–113) and uses them to computeNSizeand matrix offsets/shapes (lines 129–132, 140–147), matching the GEMM tiling pattern fromGEMMTileConstraint.- Handles batch broadcasting by right-aligning batch dimensions of A and B against
data_outand falling back to(offset=0, size=1)when an input batch dimension doesn't match the output.- Enforces "only the last batch dimension may be tiled" via the
offsets[:-3] == 0assert (line 127).- Uses
zip(..., strict=True)safely:pyproject.tomldeclaresrequires-python = ">=3.10", so this is compatible.Optional recommendation:
Add defensive assertions before the batch-alignment loops to document and catch invalid operator shapes early:assert len(buffOut.shape) >= tensorsShapeLenA, \ f"MatMul output rank {len(buffOut.shape)} < A rank {tensorsShapeLenA}" assert len(buffOut.shape) >= tensorsShapeLenB, \ f"MatMul output rank {len(buffOut.shape)} < B rank {tensorsShapeLenB}"This makes the implicit rank contract explicit without changing behavior for valid shapes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py(4 hunks)TargetLibraries/PULPOpen/src/GELU.c(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- TargetLibraries/PULPOpen/src/GELU.c
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.
Applied to files:
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (5)
Deeploy/DeeployTypes.py (2)
lookup(720-752)NetworkContext(508-1020)Deeploy/TilingExtension/TilerModel.py (3)
addTensorDimToModel(143-159)getTensorDimVar(131-135)TilerModel(34-406)Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py (2)
addPolicyConstraint(63-87)addPolicyConstraint(234-258)Deeploy/TilingExtension/TileConstraint.py (2)
addPolicyConstraint(32-36)extractBaseAddr(56-74)Deeploy/TilingExtension/TilingCodegen.py (1)
HyperRectangle(24-35)
🔇 Additional comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/MatMulTileConstraint.py (2)
22-72: Geometric + batch constraints for MatMul look consistent with GEMM semanticsThe refactored
addGeometricalConstraintcorrectly:
- Derives matrix dims from the last two axes and honors
transA/transB.- Enforces
M/O/inner-dim consistency between A, B, anddata_out.- Adds batch-dim equality constraints only when A and B share identical leading shapes, which is a safe choice in the presence of broadcasting.
This aligns well with the GEMM tiling approach and should behave correctly for batched MatMul.
78-91: Policy constraints for N-dimension remain consistent with GEMM behavior
ASecondDimVar/BFirstDimVarare derived from the last two dims withtransA/transBapplied, then both constrained toparseDict['N']. This matches the GEMM policy of not tiling along the inner (N) dimension and looks algebraically equivalent to the existing GEMM implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
117-131: Document or assert the rank ≤ 4 assumption in replacementTypes.The
replacementTypesstructure uses fixed-length lists of 4 pointer types fordata_in_shapeanddata_out_shape, implicitly assuming tensor rank ≤ 4. If ReduceMean is applied to higher-rank tensors, this could lead to incorrect behavior.Consider adding an assertion in the method or a docstring comment documenting this constraint:
# Assert rank assumption assert len(operatorRepresentation['data_in_shape']) <= 4, \ "ReduceMean tiling currently supports tensors with rank ≤ 4" assert len(operatorRepresentation['data_out_shape']) <= 4, \ "ReduceMean tiling currently supports tensors with rank ≤ 4"Or document it in the class/method docstring:
@classmethod def serializeTilingSolution( cls, tilingSolution: NodeMemoryConstraint, absoluteOutputCubes: List[AbsoluteHyperRectangle], targetMemLevel: str, ctxt: NetworkContext, operatorRepresentation: OperatorRepresentation) -> Tuple[VariableReplacementScheme, TilingSchedule]: """ Serialize the tiling solution for ReduceMean. Note: Currently supports tensors with rank ≤ 4 due to replacementTypes structure. """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (1)
uint16_t(42-45)Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
61-61: Unused static method argument: tilerModel
(ARG004)
62-62: Unused static method argument: ctxt
(ARG004)
98-98: Unused class method argument: ctxt
(ARG003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
- GitHub Check: siracusa-neureka-models-tiled-singlebuffer-L3 (miniMobileNet, 2000, 8, L3) / test-runner-siracusa-neureka-tiled (2000)
- GitHub Check: siracusa-neureka-kernels-tiled-singlebuffer-L2 / test-runner-siracusa-neureka-tiled
- GitHub Check: siracusa-models-tiled-singlebuffer-L2 (CCT/CCT_1_16_16_8, 2000, 64000, 8) / test-runner-siracusa-tiled (64000)
🔇 Additional comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
22-58: Excellent fix for the keepdims bug!The geometrical constraint logic now correctly handles both
keepdims == 0andkeepdims == 1cases by iterating over input axes while maintaining an output index pointer. This resolves the critical bug flagged in the previous review.The implementation correctly:
- Iterates over input axes (line 42)
- Skips incrementing
output_idxwhen a reduced axis haskeepDims == False(lines 43-50)- Maps non-reduced input dimensions to their corresponding output dimensions (lines 52-56)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
95-157: Serialize logic looks sound; rank and size assumptions could be documented/guardedThe serialization flow—extracting base addresses, deriving
in_cubefrom eachout_cube, and fillingdata_in_shape,data_out_shape, andsize—is coherent and consistent with the tiling abstractions. A couple of assumptions are baked in, though:
replacementTypes["data_in_shape"]/["data_out_shape"]hard‑code fouruint16_tpointers; this effectively assumes ReduceMean is only used with rank ≤ 4 tensors. If higher‑rank use is possible, consider asserting the rank or generalizing the scheme to avoid silent mismatch between dims and replacement types.sizeis stored asuint16_t; if an output tile size can exceed 65535 elements, it will overflow the element count. If that’s not guaranteed impossible by upstream constraints, you might want to assert an upper bound or switch to a wider type.
ctxtis currently unused here as well; if it’s only present for interface consistency, renaming to_ctxtor adding a brief comment would quiet ARG003 without functional changes.
🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (2)
22-58: Geometrical constraint mapping for keepdims=0/1 now looks correct; consider a small sanity checkThe revised loop over input axes with an
output_idxpointer fixes the earlier keepdims bug and keepsaddGeometricalConstraintconsistent withcomputeInputCubeFromOutputCube. To catch mis‑specifiedaxes/shape combinations early, you may want to assert at the end thatoutput_idxmatches the expected output rank (e.g.len(parseDict["data_out_shape"])or the context tensor rank).
60-65: UnusedtilerModel/ctxtparameters inconstructSymbolicNodeRepThis method currently just clones
parseDict, sotilerModelandctxtstay unused and trigger Ruff’s ARG004 warning. If they’re required by the interface but intentionally unused, consider renaming them to_tilerModel/_ctxtor adding a short comment/pragma to silence the warning without changing behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
🧬 Code graph analysis (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (1)
uint16_t(42-45)Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
61-61: Unused static method argument: tilerModel
(ARG004)
62-62: Unused static method argument: ctxt
(ARG004)
98-98: Unused class method argument: ctxt
(ARG003)
🔇 Additional comments (1)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
67-93: Type annotation and cube reconstruction logic are consistent with call siteChanging
outputCubetoHyperRectanglematches the actual argument (cube.rectangle) inserializeTilingSolution, and the input‑cube reconstruction correctly (a) copies full extents along reduced axes and (b) uses the output cube’s dims/offsets only for non‑reduced axes with a sharedout_idx. This aligns with the geometrical constraints and avoids the previous type mismatch.
Xeratec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I also appreciate the comments you added in the tile constraints.
I only have a few minor comments.
…nstraints, according to Issue pulp-platform#134
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (2)
61-67:constructSymbolicNodeRepignorestilerModel/ctxtand just copiesparseDictRight now this method is effectively a no-op wrapper around
parseDict.copy()and does not usetilerModelorctxt, which matches Ruff’s unused-argument warnings. If the interface requires these parameters, consider prefixing them with_or adding a brief comment that this node has no symbolic fields to construct yet, to make the intent explicit and quiet linters.
97-180: Minor clarity issues inserializeTilingSolution(unusedctxt, deadnew_endspath)
- Parameter
ctxtis unused, matching the Ruff warning; likely required by the base interface, but adding_ = ctxtor a short comment would clarify intent and silence linting.new_endsis computed (Lines 155-158) but only used in commented-out replacement code; this is slightly misleading. Either remove its computation while tiling is frozen or add a TODO referencing issue #134 so it’s clear this is intentional.These are non-functional and can be adjusted when convenient.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
Deeploy/Targets/PULPOpen/Bindings.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py(1 hunks)Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py(1 hunks)TargetLibraries/PULPOpen/src/DWConvolution_fp32.c(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/DMA/MchanDma.py:61-64
Timestamp: 2025-09-09T15:58:06.454Z
Learning: The tiling pipeline in Deeploy handles unit conversion and normalization through functions like _legalizeTransfers, ensuring that DMA implementations receive properly formatted transfer parameters without needing to perform manual element-to-byte conversions.
📚 Learning: 2025-09-09T15:43:20.195Z
Learnt from: Xeratec
Repo: pulp-platform/Deeploy PR: 105
File: Deeploy/Targets/PULPOpen/TileConstraints/GEMMTileConstraint.py:120-124
Timestamp: 2025-09-09T15:43:20.195Z
Learning: In GEMMTileConstraint.serializeTilingSolution, transpose flags (transA, transB) must be read from operatorRepresentation and used to adjust NSize calculation and matrix offset/shape calculations, following the pattern in FloatGEMMTileConstraint.
Applied to files:
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.pyDeeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
📚 Learning: 2025-09-24T12:53:00.168Z
Learnt from: diaconuccalin
Repo: pulp-platform/Deeploy PR: 117
File: TargetLibraries/PULPOpen/src/DWConvolution_fp32.c:0-0
Timestamp: 2025-09-24T12:53:00.168Z
Learning: In PULP platform code, log2 is defined in pulp_nn_utils and can be used safely without requiring math.h or causing floating-point build/runtime issues, unlike the standard library log2 function.
Applied to files:
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c
🧬 Code graph analysis (3)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (4)
Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (6)
Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)Deeploy/CommonExtensions/DataTypes.py (1)
uint16_t(42-45)Deeploy/TilingExtension/MemoryConstraints.py (1)
NodeMemoryConstraint(95-167)Deeploy/TilingExtension/TileConstraint.py (2)
TileConstraint(19-221)extractBaseAddr(56-74)Deeploy/TilingExtension/TilerModel.py (3)
TilerModel(34-406)addTensorDimToModel(143-159)getTensorDimVar(131-135)Deeploy/TilingExtension/TilingCodegen.py (4)
AbsoluteHyperRectangle(39-49)HyperRectangle(24-35)TilingSchedule(53-122)VariableReplacementScheme(126-158)
Deeploy/Targets/PULPOpen/Bindings.py (2)
Deeploy/Targets/Generic/TypeCheckers.py (1)
SliceChecker(37-51)Deeploy/AbstractDataTypes.py (1)
PointerClass(536-559)
🪛 Ruff (0.14.5)
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py
61-61: Unused static method argument: tilerModel
(ARG004)
62-62: Unused static method argument: ctxt
(ARG004)
98-98: Unused class method argument: ctxt
(ARG003)
Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py
62-62: Unused static method argument: tilerModel
(ARG004)
63-63: Unused static method argument: ctxt
(ARG004)
100-100: Unused class method argument: ctxt
(ARG003)
🔇 Additional comments (5)
TargetLibraries/PULPOpen/src/DWConvolution_fp32.c (1)
21-21: LGTM! Good optimization using PULP-specific macro.Replacing the runtime
log2()function with theLOG2()macro eliminates runtime overhead and aligns with PULP platform best practices. This is consistent with patterns used in other PULP kernels.Based on learnings and past review comments.
Deeploy/Targets/PULPOpen/TileConstraints/ReduceMeanConstraint.py (1)
22-93: Previous bugs successfully resolved.Both critical issues from earlier reviews have been correctly addressed:
- The
addGeometricalConstraintmethod (lines 22-58) now properly iterates over input axes while maintaining anoutput_idxpointer, correctly handling bothkeepdims=Trueandkeepdims=Falsecases.- The
computeInputCubeFromOutputCubetype annotation (line 68) now correctly specifiesHyperRectangleinstead ofAbsoluteHyperRectangle, matching both the caller (line 142) and the implementation's use of.dimsand.offsetattributes.The logic is now consistent and correct.
Deeploy/Targets/PULPOpen/Bindings.py (1)
157-167: Refactor suggestion valid; kernel support requires manual verificationVerification confirms your analysis:
SliceCheckerdoes infer signedness and levels from the first input only (line 37-51 of Generic TypeCheckers), the template itself is type-agnostic with no hard-coded integer types, anditertoolsis already imported and used elsewhere (PULPMulBindings).The new
PULPSliceBindingsexpands index type support from the fixeduint8_tused in Generic and DMA variants to allIntegerDataTypes, which is consistent with your pattern observation. However, the actual kernel implementations cannot be located in this search, so I cannot definitively confirm they support all integer index widths—your verification recommendation stands.The
itertools.productrefactor suggestion is valid and improves consistency with existing code; it's already the pattern inPULPMulBindings.Deeploy/Targets/PULPOpen/TileConstraints/SliceConstraint.py (2)
69-95: Type annotation ofcomputeInputCubeFromOutputCubeis inconsistent with actual usageThe method is annotated with
outputCube: AbsoluteHyperRectangle, but it is always called withHyperRectangleinstances (outputCubes = [cube.rectangle for cube in absoluteOutputCubes]), and inside it accesses.dimsand.offset, which only exist onHyperRectangle. This is a type-level inconsistency that may confuse readers and tooling even though runtime behavior is correct.Consider updating the signature to:
- def computeInputCubeFromOutputCube(outputCube: AbsoluteHyperRectangle, parseDict: Dict) -> HyperRectangle: + def computeInputCubeFromOutputCube(outputCube: HyperRectangle, parseDict: Dict) -> HyperRectangle:to align types with actual usage.
[ suggest_recommended_refactor ]
22-59: Constraint formula indeed assumes positive steps; ONNX Slice spec supports negative stepsThe ONNX Slice operator specification supports negative steps, where negative values mean slicing backward. However, your constraint formula
inputDimensionVar == ((outputDimensionVar - 1) * axStep + 1)at line 54 is only valid for positiveaxStep. Additionally, the loop template inSliceTemplate.pyusesfor(uint32_t i_${axis} = ${start}; i_${axis} < ${end}; i_${axis} += ${step}), which assumes unsigned arithmetic and will fail or produce incorrect behavior with negative steps.There is no validation in
SliceParserto reject or handle negative steps. If your models use negative steps for backward slicing, this implementation will silently produce incorrect constraints or incorrect loop behavior. Consider adding explicit validation or extending the constraint logic to handle both positive and negative steps.
Xeratec
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for you valuable work <3
Key Changes:
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.