[Feature Request] MIG-related functions are incompatible with Blackwell Architecture (RTX Pro 6000 Blackwell Server Edition) #17

@choehojun

Description

I encountered an AssertionError when attempting to check the MIG mode status on an RTX Pro 6000 Blackwell Server Edition GPU using the following command:
sudo python3 nvidia_gpu_tools.py --gpu=0 --query-mig-mode

Upon investigating the source code, I found that MIG mode support is strictly hardcoded to the Ampere A100 architecture at line 3409:

# nvidia_gpu_tools.py L3409
self.is_mig_mode_supported = self.is_ampere_100

Therefore, I modified the line to self.is_mig_mode_supported = self.is_ampere_plus to allow execution on Blackwell architecture. Then, I ran the MIG toggle test:
sudo python3 nvidia_gpu_tools.py --gpu=0 --test-mig-toggle

However, another error occurred:

NVIDIA GPU Tools version v2025.11.21o
Command line arguments: ['nvidia_gpu_tools.py', '--gpu=0', '--test-mig-toggle']
GPUs:
  0 GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
...
2026-03-13,04:49:44.257 INFO      Selected GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000
2026-03-13,04:49:44.261 INFO      GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 set MIG to be enabled after next reset
  File "/home/choehojun/gpu-admin-tools/cli/per_gpu.py", line 315, in main_per_gpu
    gpu.test_mig_toggle()
  File "/home/choehojun/gpu-admin-tools/nvidia_gpu_tools.py", line 4050, in test_mig_toggle
    raise GpuError("{0} MIG mode failed to switch from {1} to {2}".format(self, org_state, new_state))
2026-03-13,04:49:48.215 ERROR    GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 MIG mode failed to switch from False to False
2026-03-13,04:49:48.215 ERROR    GPU 0000:40:00.0 RTX-PRO-6000 0x2bb5 BAR0 0x207000000000 testing MIG toggle failed
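The "failed to switch from False to False" message suggests that the state read back after the reset was still False, i.e. the request written before the reset had no effect. The following is a minimal sketch of that failure mode, not the tool's actual code: FakeGpu, its honors_scratch flag, reset(), and check_mig_toggle are hypothetical names introduced for illustration, and the comparison logic is only inferred from the error message above.

```python
class GpuError(Exception):
    pass


class FakeGpu:
    """Hypothetical stand-in for a GPU; not part of nvidia_gpu_tools.py."""

    def __init__(self, honors_scratch):
        # Assumption: on an unsupported architecture the scratch write is ignored.
        self.honors_scratch = honors_scratch
        self.mig_enabled = False
        self._pending = None

    def query_mig_mode(self):
        return self.mig_enabled

    def set_mig_mode_after_reset(self, enabled):
        # Record the requested state; it only takes effect on reset.
        self._pending = enabled

    def reset(self):
        if self.honors_scratch and self._pending is not None:
            self.mig_enabled = self._pending
        self._pending = None


def check_mig_toggle(gpu):
    """Request the opposite MIG state, reset, and verify that it changed."""
    org_state = gpu.query_mig_mode()
    gpu.set_mig_mode_after_reset(not org_state)
    gpu.reset()
    new_state = gpu.query_mig_mode()
    if new_state == org_state:
        raise GpuError(
            "MIG mode failed to switch from {0} to {1}".format(org_state, new_state)
        )
    return new_state
```

With FakeGpu(honors_scratch=False), check_mig_toggle raises the same "from False to False" message seen in the log, which is what one would expect if the write to the scratch register is simply not acted on by this GPU.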

To analyze the cause of the error, I reviewed the set_mig_mode_after_reset function (lines 4022-4028) and noticed that it uses a fixed bitfield address 0x118f78:

def set_mig_mode_after_reset(self, enabled):
    assert self.is_mig_mode_supported

    scratch = self.bitfield(0x118f78)
    scratch[14:16] = 3 if enabled else 2

    info("%s set MIG to be %s after next reset", self, "enabled" if enabled else "disabled")
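To make the register write concrete, here is a small standalone sketch of what scratch[14:16] = 3 if enabled else 2 would do to a 32-bit value. It assumes the slice is half-open (bits 14 and 15, a 2-bit field), which matches the values 2 and 3 fitting in two bits but is not confirmed by the tool's source; set_bits and get_bits are hypothetical helpers, not functions from nvidia_gpu_tools.py.

```python
def set_bits(value, lo, hi, field):
    """Return value with bits lo..hi-1 replaced by field (half-open range, assumed)."""
    width = hi - lo
    mask = (1 << width) - 1
    if field & ~mask:
        raise ValueError("field does not fit in {0} bits".format(width))
    return (value & ~(mask << lo)) | (field << lo)


def get_bits(value, lo, hi):
    """Extract bits lo..hi-1 of value."""
    return (value >> lo) & ((1 << (hi - lo)) - 1)


scratch = 0x00000000
enable_request = set_bits(scratch, 14, 16, 3)   # request "MIG enabled"
disable_request = set_bits(scratch, 14, 16, 2)  # request "MIG disabled"

print(hex(enable_request), hex(disable_request))
```

Under that assumption both requests set bit 15 and differ only in bit 14, so bit 15 may act as a "request valid" flag and bit 14 as the desired state. Whether the same offset and layout apply on Hopper or Blackwell is exactly what the questions below ask.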

Questions

  1. Is the bitfield address 0x118f78 specific to the Ampere A100 architecture? Is that why is_mig_mode_supported is restricted to is_ampere_100?
  2. Are there plans to update these register addresses or provide official support for Hopper and Blackwell architectures regarding MIG functionality?

Additional Inquiry: Multi-tenant GPU CC with MIG

In a scenario where GPU-CC is enabled, is it possible to use MIG mode to support multi-tenant GPU CC?

Specifically, I would like to know if the hardware/tool combination allows partitioning a single GPU into multiple MIG instances while maintaining CC security guarantees for multiple users.

Environmental Configurations

Hardware: RTX Pro 6000 Blackwell Server Edition
Tool Version: v2025.11.21o
