Fix EGL context creation on headless NVIDIA (EGL_BAD_ACCESS) #13332
sam-kpm wants to merge 3 commits into Comfy-Org:master
On headless Linux with NVIDIA GPUs and no display server, eglInitialize() with EGL_DEFAULT_DISPLAY fails with EGL_BAD_ACCESS. The fix falls back to EGL_EXT_platform_device: enumerate EGL devices and obtain a display via eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, ...). PyOpenGL's egl_get_devices() wrapper doesn't reliably resolve the eglQueryDevicesEXT function pointer in this scenario, so both functions are called directly from libEGL.so.1 via ctypes. Also handles the case where eglInitialize raises EGLError rather than returning False, which varies by PyOpenGL version and EGL vendor.
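A minimal sketch of the ctypes route described above, assuming a Linux system where `ctypes.util.find_library("EGL")` resolves the EGL library (the second commit switches to this lookup from a hardcoded `libEGL.so.1`). Extension entry points are looked up with the standard `eglGetProcAddress`; `load_device_extensions` and the prototype names are hypothetical and not taken from the PR's `nodes_glsl.py`.

```python
import ctypes
import ctypes.util

# ctypes prototypes mirroring the EGL headers (EGL_EXT_device_enumeration and
# EGL_EXT_platform_base): EGLBoolean/EGLenum are unsigned 32-bit, EGLint is
# signed 32-bit, and EGLDeviceEXT/EGLDisplay are opaque pointers.
EGLQueryDevicesEXT = ctypes.CFUNCTYPE(
    ctypes.c_uint32,                  # EGLBoolean return value
    ctypes.c_int32,                   # EGLint max_devices
    ctypes.POINTER(ctypes.c_void_p),  # EGLDeviceEXT *devices
    ctypes.POINTER(ctypes.c_int32),   # EGLint *num_devices
)
EGLGetPlatformDisplayEXT = ctypes.CFUNCTYPE(
    ctypes.c_void_p,                  # EGLDisplay return value
    ctypes.c_uint32,                  # EGLenum platform
    ctypes.c_void_p,                  # void *native_display (an EGLDeviceEXT here)
    ctypes.POINTER(ctypes.c_int32),   # const EGLint *attrib_list (may be NULL)
)


def load_device_extensions():
    """Return (query_devices, get_platform_display) callables, or None."""
    path = ctypes.util.find_library("EGL")  # e.g. "libEGL.so.1" on Linux
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        get_proc = lib.eglGetProcAddress
    except (OSError, AttributeError):
        return None
    # eglGetProcAddress is the standard way to look up extension entry points.
    get_proc.restype = ctypes.c_void_p
    get_proc.argtypes = [ctypes.c_char_p]
    query_ptr = get_proc(b"eglQueryDevicesEXT")
    display_ptr = get_proc(b"eglGetPlatformDisplayEXT")
    if not query_ptr or not display_ptr:
        return None  # extension not exposed by this EGL implementation
    # Wrap the raw addresses in the prototypes declared above.
    return EGLQueryDevicesEXT(query_ptr), EGLGetPlatformDisplayEXT(display_ptr)
```

On a machine without an EGL library the helper simply returns `None`, so a caller can keep reporting the original `EGL_DEFAULT_DISPLAY` error instead of crashing in the fallback.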
🧹 Nitpick comments (1)
comfy_extras/nodes_glsl.py (1)
256-259: Consider using `c_uint32` instead of `c_bool` for the `EGLBoolean` return type of `eglQueryDevicesEXT`. `EGLBoolean` is defined as `unsigned int` (32-bit) in the EGL specification, whereas `ctypes.c_bool` maps to C99 `_Bool` (typically 1 byte). While this works in practice due to calling conventions, using `c_uint32` is more semantically correct and matches the actual EGL header definition.

🔧 Suggested fix

```diff
 _query_devices = ctypes.CFUNCTYPE(
-    ctypes.c_bool,
+    ctypes.c_uint32,
     ctypes.c_int32,
     ctypes.POINTER(ctypes.c_void_p),
     ctypes.POINTER(ctypes.c_int32),
 )(_query_devices_ptr)
```
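A small, GPU-free illustration of the width mismatch behind this suggestion. The prototype matches the header declaration; `_fake_query` is a hypothetical Python stand-in for the driver's `eglQueryDevicesEXT`, wrapped with the same prototype so the call path can be exercised without libEGL.

```python
import ctypes

EGL_TRUE = 1

# EGLBoolean is `unsigned int` (4 bytes); C99 _Bool, which ctypes.c_bool maps
# to, is typically 1 byte, so only c_uint32 matches the declared return width.
print("c_bool:", ctypes.sizeof(ctypes.c_bool), "c_uint32:", ctypes.sizeof(ctypes.c_uint32))

QueryDevicesProto = ctypes.CFUNCTYPE(
    ctypes.c_uint32,                  # EGLBoolean
    ctypes.c_int32,                   # EGLint max_devices
    ctypes.POINTER(ctypes.c_void_p),  # EGLDeviceEXT *devices
    ctypes.POINTER(ctypes.c_int32),   # EGLint *num_devices
)


def _fake_query(max_devices, devices, num_devices):
    # Hypothetical stand-in for the driver function: reports two devices.
    num_devices[0] = 2
    return EGL_TRUE


fake_query = QueryDevicesProto(_fake_query)
count = ctypes.c_int32(0)
ok = fake_query(0, None, ctypes.byref(count))  # NULL devices: count only
print("returned", ok, "count", count.value)
```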
- Extract device enumeration into _egl_device_display() helper
- Use ctypes.util.find_library("EGL") instead of hardcoded libEGL.so.1
- Let an eglGetDisplay(EGL_DEFAULT_DISPLAY) failure also fall through to device enumeration (previously it raised immediately, skipping the fallback)
- Call eglQueryDevicesEXT in two passes (count first, then fetch) to avoid an arbitrary device-count cap
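A sketch of the two-pass pattern, following the `EGL_EXT_device_enumeration` contract that a NULL `devices` pointer makes the driver report only the device count. `query_all_devices` and the fake driver are hypothetical, written so the example runs without a GPU; in real use `query_devices` would be the ctypes-wrapped `eglQueryDevicesEXT`.

```python
import ctypes

QueryDevicesProto = ctypes.CFUNCTYPE(
    ctypes.c_uint32, ctypes.c_int32,
    ctypes.POINTER(ctypes.c_void_p), ctypes.POINTER(ctypes.c_int32),
)


def query_all_devices(query_devices):
    """Two-pass eglQueryDevicesEXT: ask for the count, then fetch that many."""
    count = ctypes.c_int32(0)
    # Pass 1: devices == NULL, so the driver only writes the device count.
    if not query_devices(0, None, ctypes.byref(count)) or count.value <= 0:
        return []
    # Pass 2: allocate exactly `count` slots (no fixed cap) and fetch handles.
    devices = (ctypes.c_void_p * count.value)()
    if not query_devices(count.value, devices, ctypes.byref(count)):
        return []
    return [devices[i] for i in range(count.value)]


# Hypothetical fake driver exposing three opaque device handles.
_HANDLES = [0x1000, 0x2000, 0x3000]


def _fake_query(max_devices, devices, num_devices):
    if not devices:  # NULL pointer: report the count only
        num_devices[0] = len(_HANDLES)
        return 1
    n = min(max_devices, len(_HANDLES))
    for i in range(n):
        devices[i] = _HANDLES[i]
    num_devices[0] = n
    return 1


fake = QueryDevicesProto(_fake_query)
print([hex(h) for h in query_all_devices(fake)])
```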
Actionable comments posted: 1
Inline comments:
In `@comfy_extras/nodes_glsl.py`:
- Around line 245-254: The current code uses raw_devices[0] and only attempts
the first EGL device; change this to iterate over raw_devices and try each
device in turn by calling _get_platform_display(EGL_PLATFORM_DEVICE_EXT, device,
None), casting result to ctypes.c_void_p as display, then calling
eglInitialize(display, major, minor) for each until one returns true; on first
successful eglInitialize stop and use that display, and if none succeed raise a
RuntimeError indicating initialization failed for all enumerated devices
(include device info if available) — update references in this block for
EGL_PLATFORM_DEVICE_EXT, _get_platform_display, raw_devices, display, and
eglInitialize.
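A sketch of the requested behavior, under stated assumptions: `get_platform_display` and `initialize` are injected callables, and `initialize(display)` is a hypothetical simplification that returns an `(major, minor)` tuple on success and a falsy value on failure (PyOpenGL's `eglInitialize` instead fills `major`/`minor` out-parameters). `EGL_PLATFORM_DEVICE_EXT` is the value from the `EGL_EXT_platform_device` specification.

```python
EGL_PLATFORM_DEVICE_EXT = 0x313F  # from the EGL_EXT_platform_device spec


def first_usable_display(devices, get_platform_display, initialize):
    """Try each device in order; return (device_index, display, version)."""
    failures = []
    for index, device in enumerate(devices):
        display = get_platform_display(EGL_PLATFORM_DEVICE_EXT, device, None)
        if not display:
            failures.append(f"device {index}: no display")
            continue
        try:
            version = initialize(display)
        except Exception as exc:  # some PyOpenGL/EGL combinations raise
            failures.append(f"device {index}: {type(exc).__name__}: {exc}")
            continue
        if version:
            return index, display, version
        failures.append(f"device {index}: eglInitialize returned false")
    # Every enumerated device failed: report all of them, as the review asks.
    detail = "; ".join(failures) or "no devices enumerated"
    raise RuntimeError(f"EGL initialization failed for all enumerated devices: {detail}")
```

Collecting a per-device reason keeps the final `RuntimeError` useful on multi-GPU hosts, where the first device may legitimately be unusable.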
- Use c_uint32 for the EGLBoolean return type (unsigned int per the EGL spec, not _Bool)
- Try all enumerated EGL devices in order rather than only the first; skip devices where eglGetPlatformDisplayEXT or eglInitialize fails
🧹 Nitpick comments (1)
comfy_extras/nodes_glsl.py (1)
200-264: Well-implemented EGL device enumeration fallback. The implementation correctly addresses the headless NVIDIA issue by:
- Loading EGL extensions directly via ctypes when PyOpenGL's wrapper is unreliable
- Using `c_uint32` for the `EGLBoolean` return type per the EGL spec
- Iterating all enumerated devices rather than just the first (addresses the prior review feedback)

One minor suggestion: the exception handler at lines 260-261 silently discards the exception. Logging it at debug level would help diagnose edge cases where `eglInitialize` raises instead of returning `False`.

Optional: Log caught exception for debugging

```diff
 try:
     if eglInitialize(display, major, minor):
         logger.debug(f"_egl_device_display: device {i} succeeded, EGL version {major.value}.{minor.value}")
         return display, major, minor
-except Exception:
-    pass
+except Exception as e:
+    logger.debug(f"_egl_device_display: device {i} eglInitialize raised {type(e).__name__}: {e}")
 logger.debug(f"_egl_device_display: device {i} eglInitialize failed, skipping")
```
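A self-contained sketch of the same idea, assuming a hypothetical `initialize(display)` that returns an `(major, minor)` tuple or a falsy value. It uses lazy `%`-style logging arguments and `exc_info=True` so the traceback is kept at debug level; that is a design choice, not something the review prescribes.

```python
import logging

logger = logging.getLogger("egl_sketch")  # hypothetical logger name


def try_initialize(index, display, initialize):
    """Return the EGL version tuple, or None; log why a device was skipped."""
    try:
        version = initialize(display)
        if version:
            logger.debug("device %d succeeded, EGL version %d.%d", index, *version)
            return version
    except Exception as exc:
        # Record the exception (with traceback) instead of silently passing.
        logger.debug("device %d eglInitialize raised %s: %s",
                     index, type(exc).__name__, exc, exc_info=True)
    logger.debug("device %d eglInitialize failed, skipping", index)
    return None
```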
Problem
On headless Linux with an NVIDIA GPU and no display server (no `$DISPLAY`/`$WAYLAND_DISPLAY`), the GLSL shader node fails with `EGL_BAD_ACCESS`.

This is a common setup: cloud VMs, remote GPU servers, and Docker containers with NVIDIA GPUs typically have no display server.
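The PR detects the headless case by failure (the default display does not initialize) rather than by inspecting the environment, which is the more robust choice. For diagnostics only, a hypothetical helper like the one below reports whether either variable is set; a set variable does not guarantee that a display server is actually reachable.

```python
import os


def is_headless(environ=None):
    """Heuristic: True when neither an X11 nor a Wayland display is advertised."""
    env = os.environ if environ is None else environ
    # Empty strings count as unset, matching how most clients treat them.
    return not env.get("DISPLAY") and not env.get("WAYLAND_DISPLAY")
```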
Root cause: `eglInitialize(EGL_DEFAULT_DISPLAY)` requires a running X or Wayland compositor. On a bare headless system, NVIDIA's EGL returns `EGL_BAD_ACCESS`. The correct approach for headless GPU rendering is the `EGL_EXT_platform_device` extension: enumerate EGL devices and obtain a display from a specific device handle.

There are two additional complications:

- `eglInitialize` raises `EGLError` rather than returning `False` in some PyOpenGL version / EGL vendor combinations; the original code only checked the return value.
- PyOpenGL's `egl_get_devices()` wrapper does not reliably resolve the `eglQueryDevicesEXT` function pointer in headless NVIDIA scenarios, so the fallback must call `libEGL.so.1` directly via ctypes.

Fix
When `eglInitialize(EGL_DEFAULT_DISPLAY)` fails (either by returning `False` or raising `EGLError`), fall back to device enumeration:

- Load `eglQueryDevicesEXT` and `eglGetPlatformDisplayEXT` directly from `libEGL.so.1` via `ctypes` (bypassing PyOpenGL's broken wrapper)
- Enumerate EGL devices and obtain a display with `eglGetPlatformDisplayEXT(EGL_PLATFORM_DEVICE_EXT, device, NULL)`

Testing
Verified on Ubuntu 24.04, NVIDIA driver 580.65.06, no display server, using the built-in GLSL shader node.
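The verification above was done on a real headless machine. As a complement, here is a hedged sketch of how the fallback order could be exercised without a GPU: `open_display` and its four injected callables are hypothetical stand-ins (not the PR's `_egl_device_display`), and `FakeEGLError` plays the role of PyOpenGL's `EGLError`.

```python
EGL_PLATFORM_DEVICE_EXT = 0x313F  # EGL_EXT_platform_device


class FakeEGLError(Exception):
    """Plays the role of PyOpenGL's EGLError in this sketch."""


def _initialized(initialize, display):
    # Treat "returned False" and "raised" the same way: the display is unusable.
    try:
        return initialize(display) or None
    except Exception:
        return None


def open_display(get_default_display, initialize, enumerate_devices, get_platform_display):
    """Try EGL_DEFAULT_DISPLAY first, then each enumerated device in order."""
    display = get_default_display()
    if display:  # a failed eglGetDisplay also falls through to enumeration
        version = _initialized(initialize, display)
        if version:
            return "default", display, version
    for device in enumerate_devices():
        display = get_platform_display(EGL_PLATFORM_DEVICE_EXT, device, None)
        if display:
            version = _initialized(initialize, display)
            if version:
                return "device", display, version
    raise RuntimeError("no usable EGL display (default or device platform)")


# Simulated headless box: initializing the default display raises
# (EGL_BAD_ACCESS), the first device yields no display, the second works.
def fake_initialize(display):
    if display == "default-display":
        raise FakeEGLError("EGL_BAD_ACCESS")
    return (1, 5)


result = open_display(
    get_default_display=lambda: "default-display",
    initialize=fake_initialize,
    enumerate_devices=lambda: ["gpu0", "gpu1"],
    get_platform_display=lambda platform, device, attribs: None if device == "gpu0" else "gpu1-display",
)
print(result)
```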