Refactor tensor class in C++ unit tests#2962
Conversation
Refactor test tensor wrapper by removing recipe-specific logic whenever possible. Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- Fix syntax error in switch case (:: -> :) - Fix double-underscore typo in variable name - Fix wrong buffer passed to set_amax_columnwise - Fix unique_ptr assignment from raw pointer (use reset()) - Remove dead duplicate NVTE_MXFP8_1D_SCALING branch in get_scales() - Rename cpu_data -> cpu_buffer to match Buffer class API - Remove const from Tensor::to_cpu/from_cpu and their callers, since both methods write to the CPU buffer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
CPU and GPU types are inconsistent, so the type checks cause too many problems. Signed-off-by: Tim Moon <tmoon@nvidia.com>
Greptile SummaryThis PR refactors the tensor wrapper in the C++ unit test infrastructure by introducing a dedicated
Confidence Score: 5/5Safe to merge; changes are confined to the C++ test infrastructure with no production code paths affected. All substantive logic is in test helpers. The refactoring cleanly separates memory management from scaling-mode logic, the Buffer RAII class handles all allocation/transfer paths correctly, and the pre-capture of ref_scale before the kernel launch is a genuine correctness improvement. The remaining observations are diagnostic-quality items in the test layer. No files require special attention; the two minor items are both in test-only code. Important Files Changed
Class Diagram%%{init: {'theme': 'neutral'}}%%
classDiagram
class Tensor {
-Buffer data_rowwise_
-Buffer data_columnwise_
-Buffer scale_inv_rowwise_
-Buffer scale_inv_columnwise_
-Buffer amax_rowwise_
-Buffer amax_columnwise_
-Buffer scale_
-TensorWrapper tensor_
-bool rowwise_
-bool columnwise_
+rowwise_cpu_dptr() T*
+columnwise_cpu_dptr() T*
+scale() float
+amax() float
+rowwise_scale_inv() float
+set_scale(float)
+set_scale_inv(float)
+to_cpu()
+from_cpu()
}
class Buffer {
-unique_ptr cpu_buffer_
-unique_ptr gpu_buffer_
-size_t size_
-DType dtype_
-size_t bytes_
+cpu_buffer() T*
+gpu_buffer() T*
+to_cpu()
+from_cpu()
+size() size_t
+dtype() DType
}
class GPUDeleter {
+operator()(void*) void
}
Tensor *-- Buffer : owns 7 buffers
Buffer *-- GPUDeleter : uses
Tensor *-- TensorWrapper : wraps
Reviews (3): Last reviewed commit: "Copy-paste error" | Re-trigger Greptile |
Also adopt review suggestions from @greptile-apps. Signed-off-by: Tim Moon <tmoon@nvidia.com>
| // Fill scales | ||
| if (t->scaling_mode() == NVTE_DELAYED_TENSOR_SCALING) { | ||
| if (isFp8Type(t->dtype())) { | ||
| // FP8 tensor scale is set to 1 | ||
| t->set_scale_inv(1.0); | ||
| } | ||
| } else { | ||
| // Block scales are filled randomly | ||
| t->fill_uniform_rowwise_scale_inv(); | ||
| t->fill_uniform_columnwise_scale_inv(); | ||
| } |
There was a problem hiding this comment.
This is weird, but it approximates the previous behavior.
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
|
/te-ci core L1 |
Oleg-Goncharov
left a comment
There was a problem hiding this comment.
LGTM, this looks much cleaner now, but the cast+transpose current scaling tests are failing with a segmentation fault.
Description
The tensor wrapper in the C++ unit tests has become unwieldy, with complicated interactions between recipes and memory management. This has recently resulted in bugs where we accidently didn't allocate a required buffer (#2943). This PR disentangles the memory management from the recipe logic by adding a simple RAII class to manage GPU and CPU buffers. I've also added more explicit checks, e.g. when we assume a tensor is a single FP32.
Type of change
Changes
Checklist: