Hello AMD FidelityFX Team,
First, thank you for your incredible work on FSR 2 and for making it open source. It's a fantastic piece of technology.
As a follow-up to the discussion in issue #135, I've conducted a deeper investigation and have a specific technical proposal.
I am writing to explore the feasibility of creating a fallback path for FSR 2 to run on DirectX 11 hardware that is limited to Feature Level 11_0. My test hardware is an NVIDIA GT 730 (Fermi), which does not support the additional typed UAV load formats that the FSR 2 shaders rely on.
As expected, when running the FSR2DX11Sample, the application fails during context creation with the well-understood error: D3D11 ERROR: ID3D11Device::CreateComputeShader: Shader uses new Typed UAV Load formats.... The failure point is, predictably, the compute_luminance_pyramid pass.
My investigation into a potential workaround led me down a fascinating path:
- My initial thought was a multi-pass "ping-pong" technique. However, I quickly concluded this would be inefficient due to the heavy driver overhead from resource barriers between passes and the increased memory consumption, likely negating any performance gains from FSR 2.
- This led me to a much more promising solution: replacing the problematic Typed UAVs with RWByteAddressBuffer. This approach avoids the pitfalls of ping-ponging by:
  - Maintaining the single-pass architecture of the shader.
  - Avoiding expensive state changes and resource barriers.
  - Requiring only modifications to the shader logic to handle manual byte-addressing and type-casting (e.g., using .Load(byte_address) and asfloat/asuint).
This seems to be the most architecturally sound way to support FL 11_0 hardware without a complete algorithmic rewrite.
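To make the manual byte-addressing and type-casting concrete, here is a minimal CPU-side C++ sketch of the arithmetic the shader would have to do. It is not FSR 2 code. FakeByteAddressBuffer and texelByteAddress are hypothetical names, and the layout (tightly packed, row-major, one 32-bit float per texel) is an assumption for illustration; the real resources may need a different packing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical CPU stand-in for an HLSL RWByteAddressBuffer:
// raw 32-bit words addressed by byte offset.
struct FakeByteAddressBuffer {
    std::vector<std::uint32_t> words;

    explicit FakeByteAddressBuffer(std::size_t byteSize)
        : words(byteSize / 4, 0u) {}

    // Mirrors .Load(byte_address); the address must be a multiple of 4.
    std::uint32_t Load(std::uint32_t byteAddress) const {
        assert(byteAddress % 4 == 0);
        return words.at(byteAddress / 4);
    }

    // Mirrors .Store(byte_address, value).
    void Store(std::uint32_t byteAddress, std::uint32_t value) {
        assert(byteAddress % 4 == 0);
        words.at(byteAddress / 4) = value;
    }
};

// Bit-for-bit reinterpretation, like the HLSL asuint/asfloat intrinsics.
inline std::uint32_t asuint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float asfloat(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Byte address of texel (x, y), assuming a tightly packed row-major
// layout with one 32-bit value per texel (an assumption, see above).
inline std::uint32_t texelByteAddress(std::uint32_t x, std::uint32_t y,
                                      std::uint32_t width) {
    return (y * width + x) * 4u;
}
```

In the shader, a typed read such as tex[uint2(x, y)] would then become something like asfloat(buf.Load(texelByteAddress(x, y, width))), with the matching Store on the write side. Multi-channel or packed formats would need extra unpacking logic on top of this.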
My question is this:
Would the team be open to considering or providing guidance on implementing such a fallback path using RWByteAddressBuffer for the key resources in the luminance pyramid pass (e.g., rw_auto_exposure, rw_img_mip_5)?
I believe that adding this fallback would extend FSR 2's incredible technology to a wider range of older but still-capable hardware, which would be a significant win for the community. I understand that this is a non-trivial request and might introduce maintenance overhead that falls outside the project's current scope.
Even a brief statement on the feasibility or potential pitfalls of this RWByteAddressBuffer approach from your perspective would be immensely valuable.
Thank you for your time and for all your contributions to the graphics community.
Best regards.