Pull requests: NVIDIA/TransformerEngine
fix: TransformerEngineBaseModule quantizers init values type
#2927
opened Apr 25, 2026 by
muutot
4 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925
opened Apr 25, 2026 by
ksivaman
Member
7 of 13 tasks
Correctly pad scaling factor inverses to satisfy cuteDSL requirements
2.15.0
MoE
#2924
opened Apr 24, 2026 by
ksivaman
Member
6 of 13 tasks
Make TE Sequential Grouped linear Op CUDA graphable
#2923
opened Apr 24, 2026 by
vthumbe1503
Collaborator
Draft
13 tasks
[Draft] [PyTorch] Add distributed Muon optimizer
#2920
opened Apr 23, 2026 by
vcherepanov-nv
Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919
opened Apr 23, 2026 by
CarlosGomes98
Contributor
Draft
13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916
opened Apr 22, 2026 by
sudhakarsingh27
Collaborator
Draft
1 of 3 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing
community-contribution
PRs from external contributors outside the core maintainers, representing community-driven work.
#2911
opened Apr 21, 2026 by
NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute
community-contribution
#2907
opened Apr 21, 2026 by
jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906
opened Apr 21, 2026 by
yaox12
Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900
opened Apr 18, 2026 by
timmoon10
Collaborator
9 of 13 tasks
add support for enabling cuda graph under thd format in megatron.
#2898
opened Apr 17, 2026 by
HaochenYuan
13 tasks
Improve the dimension checks for the FP8 recipes
#2894
opened Apr 16, 2026 by
ptrendx
Member
13 tasks
[Debug] Add AutoswitchGEmm for Debug Precision Tool
#2883
opened Apr 15, 2026 by
shangxiaokang
Draft
3 of 13 tasks
[PyTorch] Split TE ops op_forward into op_forward and setup_context
#2877
opened Apr 14, 2026 by
pggPL
Collaborator
5 of 7 tasks
[DONOT MERGE] Wgrad cute dsl v2
#2872
opened Apr 13, 2026 by
vthumbe1503
Collaborator
Draft
13 tasks
[JAX] Add debug validation mode for runtime group size alignment
#2867
opened Apr 11, 2026 by
jberchtold-nvidia
Collaborator
Draft
13 tasks
Optimizations for MXFP8/NVFP4 dequantize kernels
#2865
opened Apr 10, 2026 by
YigongQin
8 of 13 tasks
Adds GEMM Profiling Guide to TE
#2863
opened Apr 9, 2026 by
jomitchellnv
Contributor
7 tasks