Pull requests: NVIDIA/TransformerEngine

Pull requests list

[PyTorch] Batch CP attention tests in single torchrun to amortize NCC…
#2965 opened May 6, 2026 by sudhakarsingh27 Collaborator
13 tasks
Refactor tensor class in C++ unit tests refactor
#2962 opened May 6, 2026 by timmoon10 Collaborator
8 of 13 tasks
Draft:Extended Tensor Parallelism
#2960 opened May 5, 2026 by jiemingz Draft
13 tasks
[PyTorch/Common] Remove legacy FP8DS implementation 2.16.0
#2959 opened May 5, 2026 by cyanguwa Collaborator
8 of 13 tasks
[Common] Use specialized unfused MXFP8 cast kernels by default
#2958 opened May 5, 2026 by Oleg-Goncharov Collaborator
5 of 13 tasks
CPU overhead optimizations for te autocast
#2957 opened May 4, 2026 by vthumbe1503 Collaborator
13 tasks
[Common, PyTorch] Improve mHC to match DeepSeek's implementation
#2953 opened May 1, 2026 by kainzhong Collaborator Draft
9 of 13 tasks
[Common, PyTorch] Add Triton MLA attention kernels for SM80 community-contribution
#2950 opened Apr 30, 2026 by bzantium
[All] Remove legacy max512 backend 2.16.0
#2949 opened Apr 30, 2026 by cyanguwa Collaborator
8 of 13 tasks
Add NVFP4 1x64 Local Encode Recipe
#2941 opened Apr 29, 2026 by cael-ling Contributor
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member
1 of 13 tasks
Implement row-scaled NVFP4 fprop recipe community-contribution
#2931 opened Apr 27, 2026 by zianglih Contributor
8 of 13 tasks
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928 opened Apr 25, 2026 by eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925 opened Apr 25, 2026 by ksivaman Member
7 of 13 tasks
[PyTorch] Add distributed Muon optimizer 2.16.0
#2920 opened Apr 23, 2026 by vcherepanov-nv Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919 opened Apr 23, 2026 by CarlosGomes98 Contributor
1 of 13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Contributor Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing community-contribution
#2911 opened Apr 21, 2026 by NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute community-contribution
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks