Add rendering functionals by loliverhennigh · Pull Request #1618 · NVIDIA/physicsnemo

loliverhennigh · 2026-05-05T17:21:03Z

PhysicsNeMo Pull Request

Description

Rendering Kernels

copy-pr-bot · 2026-05-05T17:21:07Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-05-05T17:24:23Z

Greptile Summary

This PR adds a new physicsnemo.nn.functional.rendering package with eight GPU-accelerated rendering primitives (isosurface ray-marching, volume rendering, mesh ray-casting, LIC, point cloud, wireframe, and scalar/vector-field RGBA transfer functions) backed by custom Warp kernels and optional PyTorch fallbacks. The FunctionSpec dispatch pattern and test coverage are solid, but several correctness and portability issues in the core Warp implementation file need attention before merging.

TOCTOU race in wireframe rasterization — _write_depth_tested_pixel uses wp.atomic_min for depth but writes color non-atomically; two threads at similar depths can both pass the depth test and write their colors in an unordered way, producing incorrect results.
wp.init() at module import time — calling Warp initialization unconditionally at the top of _warp_impl.py causes every import of the rendering package to initialize the GPU, breaking CPU-only and Warp-free environments even when only the PyTorch backends are needed.
GPU validation gaps — camera and bounds sanity checks in _camera_basis and _bounds_tensor are skipped on CUDA devices, so degenerate inputs (eye == center, up parallel to view, invalid bounds) silently produce corrupted renders on GPU.

Important Files Changed

Filename	Overview
physicsnemo/nn/functional/rendering/_warp_impl.py	Core Warp kernel file; contains a TOCTOU race in _write_depth_tested_pixel, module-level wp.init() at import time, GPU-only validation gaps in _camera_basis/_bounds_tensor, a hardcoded 8192-step cap for line rasterization, and no near-plane clipping for wireframe edges.
physicsnemo/nn/functional/rendering/_torch_impl.py	Pure-PyTorch fallback implementations for scalar_field_to_rgba and vector_field_to_rgba; logic is clean and correct.
physicsnemo/nn/functional/rendering/isosurface_render.py	FunctionSpec wrapper for isosurface ray-marching; delegates to warp backend, validation and make_inputs_forward look correct.
physicsnemo/nn/functional/rendering/mesh_raycast.py	FunctionSpec wrapper for BVH mesh ray-casting; input validation and dispatch are correct.
physicsnemo/nn/functional/rendering/wireframe_render.py	Thin FunctionSpec wrapper for wireframe rasterization; the underlying kernel issues are in _warp_impl.py.
physicsnemo/nn/functional/rendering/point_cloud_render.py	FunctionSpec wrapper for two-pass point cloud rasterization; uses race-safe atomic key scheme unlike wireframe.
physicsnemo/nn/functional/rendering/volume_render.py	VolumeRender FunctionSpec wrapper; front-to-back compositing logic is correct.
physicsnemo/nn/functional/rendering/scalar_field_to_rgba.py	ScalarFieldToRGBA with warp and torch backends; compare_forward tolerance looks appropriate.
physicsnemo/nn/functional/rendering/vector_field_to_rgba.py	VectorFieldToRGBA with warp and torch backends; logic mirrors scalar_field_to_rgba correctly.
physicsnemo/nn/functional/rendering/line_integral_convolution.py	LineIntegralConvolution FunctionSpec wrapper; delegates to warp kernel, looks correct.
physicsnemo/nn/functional/rendering/init.py	Exports all rendering classes and functional aliases; all is complete and consistent.

_{Reviews (1): Last reviewed commit: "Add rendering functionals" | Re-trigger Greptile}

greptile-apps · 2026-05-05T17:24:27Z

+def _write_depth_tested_pixel(
+    x: int,
+    y: int,
+    z: wp.float32,
+    color: wp.vec4,
+    width: int,
+    height: int,
+    rgba: wp.array(dtype=wp.vec4),
+    depth: wp.array(dtype=wp.float32),
+):
+    if x >= 0 and x < width and y >= 0 and y < height:
+        index = y * width + x
+        old_depth = wp.atomic_min(depth, index, z)
+        if z <= old_depth:
+            rgba[index] = color


TOCTOU race between depth test and color write

wp.atomic_min establishes the winning depth value atomically, but the subsequent color write is unprotected. Two threads rendering overlapping wireframe segments at nearly identical depths can both satisfy z <= old_depth (both see the unmodified value before the other's atomic_min completes), and then both write to rgba[index] in an unordered manner. The color stored in the buffer will correspond to whichever thread wrote last, which need not match the thread that actually holds the minimum depth after both atomic_min calls settle. For the point cloud path the two-phase key/resolve scheme avoids this problem; the same pattern should be applied to wireframe pixels, or the atomic operation should cover the combined depth-color update.

greptile-apps · 2026-05-05T17:24:28Z

+wp.init()
+wp.config.quiet = True


wp.init() executed at module import time

Calling wp.init() at the top level of _warp_impl.py means it runs the moment any rendering submodule is imported — including users who only want the pure-PyTorch ScalarFieldToRGBA or VectorFieldToRGBA torch backends. In environments without CUDA (CI runners, CPU-only inference containers) this will either error out or trigger slow device enumeration. The initialization should be deferred to first use, e.g., inside the @torch.library.custom_op implementations or behind a lazy try/except guard.

greptile-apps · 2026-05-05T17:24:29Z

+    if forward_raw.device.type == "cpu" and bool(
+        (forward_raw.norm() <= 1.0e-12).item()
+    ):
+        raise ValueError("eye and center must not be equal")
+    forward = _normalize_torch(forward_raw)
+    up_hint = _normalize_torch(up)
+    right_raw = torch.linalg.cross(up_hint, forward, dim=0)
+    if right_raw.device.type == "cpu" and bool((right_raw.norm() <= 1.0e-12).item()):
+        raise ValueError("up must not be parallel to the camera direction")


Input validation skipped on CUDA devices

Both _camera_basis and _bounds_tensor guard their invariant checks with if forward_raw.device.type == "cpu", so on CUDA the checks are never performed. Passing eye == center (degenerate forward vector), up parallel to the view direction (degenerate right vector), or bounds_max <= bounds_min in any dimension will silently produce NaN or zero vectors that propagate through the kernel, yielding a black/corrupt render with no diagnostic. The same CPU-only pattern is repeated in _bounds_tensor (line 88). The validation should be unconditional, or the tensors should be moved to CPU just for the check.

greptile-apps · 2026-05-05T17:24:30Z

+    for step in range(8192):
+        if step > steps:
+            break
+        alpha = wp.float32(0.0)
+        if steps > 0:
+            alpha = wp.float32(step) / wp.float32(steps)
+        z = z0 * (1.0 - alpha) + z1 * alpha
+        for oy in range(-radius, radius + 1):
+            for ox in range(-radius, radius + 1):
+                _write_depth_tested_pixel(
+                    x + ox, y + oy, z, color, width, height, rgba, depth
+                )
+
+        if x == x1 and y == y1:
+            break
+        e2 = 2 * err
+        if e2 > -dy:
+            err -= dy
+            x += sx
+        if e2 < dx:
+            err += dx
+            y += sy


Hardcoded range(8192) cap silently truncates long line segments

_draw_line_depth_tested loops for step in range(8192) and breaks when step > steps. For an edge whose projected screen-space extent spans more than 8192 pixels (e.g., a near-clipped edge at very high resolution or a very small FOV), the line will be silently cut off partway. The limit is invisible to callers and produces no warning. Consider deriving the cap from max(image_width, image_height) passed into the kernel, or at minimum documenting the constraint.

greptile-apps · 2026-05-05T17:24:31Z

+    s0 = _project_point(p0, camera, width, height, tan_half_fov, aspect)
+    s1 = _project_point(p1, camera, width, height, tan_half_fov, aspect)
+
+    if s0[2] <= near or s0[2] >= far or s1[2] <= near or s1[2] >= far:


Wireframe clips entire edge when any endpoint is outside the clip range

_wireframe_render_kernel returns early if either endpoint has s[2] <= near or s[2] >= far. A segment where one vertex is behind the camera and the other is in front will be silently dropped entirely rather than clipped. For scenes with edges that straddle the near plane this produces visually missing lines. The standard fix is to perform a parametric clip of the segment against the near/far planes before rasterizing.

loliverhennigh · 2026-05-06T18:28:03Z

/blossom-ci

…unctionals

loliverhennigh · 2026-05-06T18:51:02Z

/blossom-ci

loliverhennigh · 2026-05-06T20:40:54Z

/blossom-ci

…unctionals

loliverhennigh · 2026-05-07T06:06:02Z

/blossom-ci

loliverhennigh · 2026-05-07T08:10:03Z

/blossom-ci

loliverhennigh · 2026-05-07T22:34:47Z

/blossom-ci

loliverhennigh · 2026-05-07T22:45:12Z

/blossom-ci

megnvidia

lgtm

…unctionals

loliverhennigh · 2026-05-08T18:20:17Z

/blossom-ci

…unctionals

loliverhennigh · 2026-05-11T19:02:09Z

/blossom-ci

Add rendering functionals

ebabe20

loliverhennigh requested review from ktangsali and megnvidia as code owners May 5, 2026 17:21

greptile-apps Bot reviewed May 5, 2026

View reviewed changes

Move rendering examples under functional examples

df05f9a

loliverhennigh added 2 commits May 6, 2026 11:50

Document rendering functional helpers

5094323

Merge remote-tracking branch 'upstream/main' into feature/rendering-f…

e5fdd97

…unctionals

Merge remote-tracking branch 'upstream/main' into feature/rendering-f…

9733a46

…unctionals

Align isosurface error test with CUDA validation

16ee6af

Align rendering functionals with functional layout

d420dea

Move rendering backends into functional packages

fe6ff58

megnvidia approved these changes May 8, 2026

View reviewed changes

Merge remote-tracking branch 'upstream/main' into feature/rendering-f…

2f9dbec

…unctionals

Merge remote-tracking branch 'upstream/main' into feature/rendering-f…

7b94ba2

…unctionals

Conversation

loliverhennigh commented May 5, 2026

PhysicsNeMo Pull Request

Description

Uh oh!

copy-pr-bot Bot commented May 5, 2026

Uh oh!

greptile-apps Bot commented May 5, 2026

Greptile Summary

Important Files Changed

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot May 5, 2026

Choose a reason for hiding this comment

Uh oh!

loliverhennigh commented May 6, 2026

Uh oh!

loliverhennigh commented May 6, 2026

Uh oh!

loliverhennigh commented May 6, 2026

Uh oh!

loliverhennigh commented May 7, 2026

Uh oh!

loliverhennigh commented May 7, 2026

Uh oh!

loliverhennigh commented May 7, 2026

Uh oh!

loliverhennigh commented May 7, 2026

Uh oh!

megnvidia left a comment

Choose a reason for hiding this comment

Uh oh!

loliverhennigh commented May 8, 2026

Uh oh!

loliverhennigh commented May 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants