It turns out WDDM driver mode is making our RAM - GPU transfer extremely slower compared to TCC or MCDM mode. Anyone has figured out the bypass NVIDIA software level restrictions?

We are working on generative AI models training. Like training FLUX, or Qwen Image or Wan 2.2.

We have noticed that we are getting massive speed loss when we do big data transfer between RAM and GPU on Windows compared to Linux.

The hit is such a big scale that Linux runs 2x faster than Windows even more.

Tests are made on same : GPU RTX 5090

You can read more info here : https://github.com/kohya-ss/musubi-tuner/pull/700

It turns out if we enable TCC mode on Windows, it gets equal speed as Linux.

However NVIDIA blocked this at driver level.

I found a Chinese article with just changing few letters, via Patching nvlddmkm.sys, the TCC mode fully becomes working on consumer GPUs. However this option is extremely hard and complex for average users.

Now my question is, why we can't get Linux speed on Windows?

Everything I found says it is due to driver mode WDDM

Moreover it seems like Microsoft added this feature : MCDM

https://learn.microsoft.com/en-us/windows-hardware/drivers/display/mcdm-architecture

And as far as I understood, MCDM mode should be also same speed.

How can we solve this slowness on Windows compared to Linux?

Our issue is happening due to this. Recent AI models are massive and not fitting into GPU. So we are doing Block Swapping. Which means only the model blocks that will be trained being on GPU. So we swap model between RAM and GPU constantly.

As you can imagine this is a massive data transfer. This is being ultra fast on Linux on same hardware. However on Windows, it is like at least 3x slower and we couldn't solve this issue yet.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

It turns out WDDM driver mode is making our RAM - GPU transfer extremely slower compared to TCC or MCDM mode. Anyone has figured out the bypass NVIDIA software level restrictions? #103

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

It turns out WDDM driver mode is making our RAM - GPU transfer extremely slower compared to TCC or MCDM mode. Anyone has figured out the bypass NVIDIA software level restrictions? #103

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions