Skip to content

Add rendering functionals#1618

Open
loliverhennigh wants to merge 10 commits into
NVIDIA:mainfrom
loliverhennigh:feature/rendering-functionals
Open

Add rendering functionals#1618
loliverhennigh wants to merge 10 commits into
NVIDIA:mainfrom
loliverhennigh:feature/rendering-functionals

Conversation

@loliverhennigh
Copy link
Copy Markdown
Collaborator

PhysicsNeMo Pull Request

Description

Rendering Kernels

@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR adds a new physicsnemo.nn.functional.rendering package with eight GPU-accelerated rendering primitives (isosurface ray-marching, volume rendering, mesh ray-casting, LIC, point cloud, wireframe, and scalar/vector-field RGBA transfer functions) backed by custom Warp kernels and optional PyTorch fallbacks. The FunctionSpec dispatch pattern and test coverage are solid, but several correctness and portability issues in the core Warp implementation file need attention before merging.

  • TOCTOU race in wireframe rasterization_write_depth_tested_pixel uses wp.atomic_min for depth but writes color non-atomically; two threads at similar depths can both pass the depth test and write their colors in an unordered way, producing incorrect results.
  • wp.init() at module import time — calling Warp initialization unconditionally at the top of _warp_impl.py causes every import of the rendering package to initialize the GPU, breaking CPU-only and Warp-free environments even when only the PyTorch backends are needed.
  • GPU validation gaps — camera and bounds sanity checks in _camera_basis and _bounds_tensor are skipped on CUDA devices, so degenerate inputs (eye == center, up parallel to view, invalid bounds) silently produce corrupted renders on GPU.

Important Files Changed

Filename Overview
physicsnemo/nn/functional/rendering/_warp_impl.py Core Warp kernel file; contains a TOCTOU race in _write_depth_tested_pixel, module-level wp.init() at import time, GPU-only validation gaps in _camera_basis/_bounds_tensor, a hardcoded 8192-step cap for line rasterization, and no near-plane clipping for wireframe edges.
physicsnemo/nn/functional/rendering/_torch_impl.py Pure-PyTorch fallback implementations for scalar_field_to_rgba and vector_field_to_rgba; logic is clean and correct.
physicsnemo/nn/functional/rendering/isosurface_render.py FunctionSpec wrapper for isosurface ray-marching; delegates to warp backend, validation and make_inputs_forward look correct.
physicsnemo/nn/functional/rendering/mesh_raycast.py FunctionSpec wrapper for BVH mesh ray-casting; input validation and dispatch are correct.
physicsnemo/nn/functional/rendering/wireframe_render.py Thin FunctionSpec wrapper for wireframe rasterization; the underlying kernel issues are in _warp_impl.py.
physicsnemo/nn/functional/rendering/point_cloud_render.py FunctionSpec wrapper for two-pass point cloud rasterization; uses race-safe atomic key scheme unlike wireframe.
physicsnemo/nn/functional/rendering/volume_render.py VolumeRender FunctionSpec wrapper; front-to-back compositing logic is correct.
physicsnemo/nn/functional/rendering/scalar_field_to_rgba.py ScalarFieldToRGBA with warp and torch backends; compare_forward tolerance looks appropriate.
physicsnemo/nn/functional/rendering/vector_field_to_rgba.py VectorFieldToRGBA with warp and torch backends; logic mirrors scalar_field_to_rgba correctly.
physicsnemo/nn/functional/rendering/line_integral_convolution.py LineIntegralConvolution FunctionSpec wrapper; delegates to warp kernel, looks correct.
physicsnemo/nn/functional/rendering/init.py Exports all rendering classes and functional aliases; all is complete and consistent.

Reviews (1): Last reviewed commit: "Add rendering functionals" | Re-trigger Greptile

Comment on lines +730 to +744
def _write_depth_tested_pixel(
x: int,
y: int,
z: wp.float32,
color: wp.vec4,
width: int,
height: int,
rgba: wp.array(dtype=wp.vec4),
depth: wp.array(dtype=wp.float32),
):
if x >= 0 and x < width and y >= 0 and y < height:
index = y * width + x
old_depth = wp.atomic_min(depth, index, z)
if z <= old_depth:
rgba[index] = color
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 TOCTOU race between depth test and color write

wp.atomic_min establishes the winning depth value atomically, but the subsequent color write is unprotected. Two threads rendering overlapping wireframe segments at nearly identical depths can both satisfy z <= old_depth (both see the unmodified value before the other's atomic_min completes), and then both write to rgba[index] in an unordered manner. The color stored in the buffer will correspond to whichever thread wrote last, which need not match the thread that actually holds the minimum depth after both atomic_min calls settle. For the point cloud path the two-phase key/resolve scheme avoids this problem; the same pattern should be applied to wireframe pixels, or the atomic operation should cover the combined depth-color update.

Comment on lines +27 to +28
wp.init()
wp.config.quiet = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 wp.init() executed at module import time

Calling wp.init() at the top level of _warp_impl.py means it runs the moment any rendering submodule is imported — including users who only want the pure-PyTorch ScalarFieldToRGBA or VectorFieldToRGBA torch backends. In environments without CUDA (CI runners, CPU-only inference containers) this will either error out or trigger slow device enumeration. The initialization should be deferred to first use, e.g., inside the @torch.library.custom_op implementations or behind a lazy try/except guard.

Comment on lines +66 to +74
if forward_raw.device.type == "cpu" and bool(
(forward_raw.norm() <= 1.0e-12).item()
):
raise ValueError("eye and center must not be equal")
forward = _normalize_torch(forward_raw)
up_hint = _normalize_torch(up)
right_raw = torch.linalg.cross(up_hint, forward, dim=0)
if right_raw.device.type == "cpu" and bool((right_raw.norm() <= 1.0e-12).item()):
raise ValueError("up must not be parallel to the camera direction")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Input validation skipped on CUDA devices

Both _camera_basis and _bounds_tensor guard their invariant checks with if forward_raw.device.type == "cpu", so on CUDA the checks are never performed. Passing eye == center (degenerate forward vector), up parallel to the view direction (degenerate right vector), or bounds_max <= bounds_min in any dimension will silently produce NaN or zero vectors that propagate through the kernel, yielding a black/corrupt render with no diagnostic. The same CPU-only pattern is repeated in _bounds_tensor (line 88). The validation should be unconditional, or the tensors should be moved to CPU just for the check.

Comment on lines +848 to +869
for step in range(8192):
if step > steps:
break
alpha = wp.float32(0.0)
if steps > 0:
alpha = wp.float32(step) / wp.float32(steps)
z = z0 * (1.0 - alpha) + z1 * alpha
for oy in range(-radius, radius + 1):
for ox in range(-radius, radius + 1):
_write_depth_tested_pixel(
x + ox, y + oy, z, color, width, height, rgba, depth
)

if x == x1 and y == y1:
break
e2 = 2 * err
if e2 > -dy:
err -= dy
x += sx
if e2 < dx:
err += dx
y += sy
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Hardcoded range(8192) cap silently truncates long line segments

_draw_line_depth_tested loops for step in range(8192) and breaks when step > steps. For an edge whose projected screen-space extent spans more than 8192 pixels (e.g., a near-clipped edge at very high resolution or a very small FOV), the line will be silently cut off partway. The limit is invisible to callers and produces no warning. Consider deriving the cap from max(image_width, image_height) passed into the kernel, or at minimum documenting the constraint.

s0 = _project_point(p0, camera, width, height, tan_half_fov, aspect)
s1 = _project_point(p1, camera, width, height, tan_half_fov, aspect)

if s0[2] <= near or s0[2] >= far or s1[2] <= near or s1[2] >= far:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Wireframe clips entire edge when any endpoint is outside the clip range

_wireframe_render_kernel returns early if either endpoint has s[2] <= near or s[2] >= far. A segment where one vertex is behind the camera and the other is in front will be silently dropped entirely rather than clipped. For scenes with edges that straddle the near plane this produces visually missing lines. The standard fix is to perform a parametric clip of the segment against the near/far planes before rasterizing.

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

1 similar comment
@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Copy link
Copy Markdown
Collaborator

@megnvidia megnvidia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

@loliverhennigh
Copy link
Copy Markdown
Collaborator Author

/blossom-ci

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants