Add AI written qwen3_moe example#2887

Open
skyw wants to merge 9 commits into NVIDIA:main from skyw:vibe_qwen3
Conversation

@skyw skyw commented Apr 15, 2026

Description

An almost pure TE-module implementation of the Qwen3 MoE model.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Add a Qwen3 MoE model implemented with TE modules only
  • Add a simple test that matches the HF counterpart

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

skyw added 4 commits April 15, 2026 11:16
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
@ksivaman ksivaman self-requested a review April 15, 2026 18:39
Contributor

greptile-apps Bot commented Apr 15, 2026

Greptile Summary

This PR adds a new examples/pytorch/qwen3_moe/ directory with a pure-TE implementation of Qwen3 MoE (Qwen3MoeForCausalLM) and a numerical comparison test against the HuggingFace reference model. The model mapping is well-structured and the test correctly seeds weights and checks both forward logits and backward gradients. Remaining findings are all P2: a truncated module docstring, tokens_per_expert being int32 (TE GroupedLinear typically expects int64), a silent skip of None-gradient parameters in the backward test, and a minor debuggability gap in the expert-weight name-mapping fallthrough.

Confidence Score: 5/5

Safe to merge; all remaining findings are P2 style/quality improvements with no blocking correctness issues

All four findings are P2: a truncated docstring, a potential int32 dtype concern (uncertain without running against the actual TE kernel), a test coverage gap, and a minor fallthrough comment. None are definitive runtime breakages in the model itself. The core model mapping logic is sound and the test structure is reasonable.

examples/pytorch/qwen3_moe/model.py — verify tokens_per_expert dtype accepted by te_ops.GroupedLinear; examples/pytorch/qwen3_moe/test_vs_hf.py — address truncated docstring and None-grad skip
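The `tokens_per_expert` finding above concerns the dtype of the per-expert token counts handed to the grouped GEMM. A minimal sketch of computing those counts in plain PyTorch (the function name `tokens_per_expert_counts` is illustrative, not the PR's API); `torch.bincount` already returns int64, which is the dtype grouped-GEMM split sizes typically expect:

```python
import torch

def tokens_per_expert_counts(expert_ids: torch.Tensor, num_experts: int) -> torch.Tensor:
    """Count how many (token, expert) assignments each expert receives.

    `expert_ids` holds the top-k expert index per token, shape (tokens, k).
    torch.bincount returns int64, matching the split-size dtype a grouped
    GEMM typically expects; the cast below just makes the intent explicit.
    """
    counts = torch.bincount(expert_ids.flatten(), minlength=num_experts)
    return counts.to(torch.int64)

# Example: 4 tokens, top-2 routing over 3 experts.
expert_ids = torch.tensor([[0, 1], [0, 2], [1, 2], [0, 1]])
counts = tokens_per_expert_counts(expert_ids, num_experts=3)
# counts is tensor([3, 3, 2]), dtype torch.int64, and sums to tokens * k = 8
```

If the model instead builds the counts with an int32 histogram, a single `.to(torch.int64)` at this point would address the review finding.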

Important Files Changed

| Filename | Overview |
| --- | --- |
| `examples/pytorch/qwen3_moe/model.py` | Full TE-based Qwen3 MoE implementation; potential int32 dtype issue for `tokens_per_expert` passed to `GroupedLinear` |
| `examples/pytorch/qwen3_moe/test_vs_hf.py` | HF comparison test; truncated docstring, silent None-grad skip in backward loop, and a no-op `data.copy_()` before backward (addressed in prior review thread) |
| `examples/pytorch/qwen3_moe/config.py` | Frozen dataclass mirroring HuggingFace `Qwen3MoeConfig` defaults; no issues found |
| `examples/pytorch/qwen3_moe/README.md` | Concise README with module mapping table, file descriptions, and correct `cd` + `python test_vs_hf.py` invocation instructions |
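The comparison test described above seeds both models with the same weights, then checks forward logits and backward gradients. A generic sketch of that pattern on toy modules (not the PR's actual test code), which also shows why asserting on a `None` gradient is preferable to silently skipping it:

```python
import torch

torch.manual_seed(0)

# Two toy "implementations" that should agree numerically once weights match.
ref = torch.nn.Linear(8, 8)
impl = torch.nn.Linear(8, 8)
impl.load_state_dict(ref.state_dict())  # copy weights, as the PR's test does per-parameter

x = torch.randn(2, 8)
x_ref = x.clone().requires_grad_(True)
x_impl = x.clone().requires_grad_(True)

out_ref = ref(x_ref)
out_impl = impl(x_impl)
torch.testing.assert_close(out_impl, out_ref)  # forward: logits match

out_ref.sum().backward()
out_impl.sum().backward()
torch.testing.assert_close(x_impl.grad, x_ref.grad)  # backward: input grads match

for p_ref, p_impl in zip(ref.parameters(), impl.parameters()):
    # Fail loudly on a missing gradient instead of skipping it silently,
    # per the None-grad finding in the review.
    assert p_impl.grad is not None, "parameter received no gradient"
    torch.testing.assert_close(p_impl.grad, p_ref.grad)
```

In the real test the tolerance would be loosened (e.g. via `rtol`/`atol` in `assert_close`) to account for fused-kernel numerics differing from the HF eager path.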

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["input_ids (batch, seq_len)"] --> B["embed_tokens\n(nn.Embedding)"]
    B --> C["RotaryPositionEmbedding\nfreqs"]
    C --> D

    subgraph LAYER["Qwen3MoeDecoderLayer (×N)"]
        D["hidden_states"] --> E["te.MultiheadAttention\n(fused LN + QKV + QK-norm + RoPE + attn + O)"]
        E --> F["+ residual"]
        F --> G["te.RMSNorm\npost_attention_layernorm"]
        G --> H

        subgraph MOE["Qwen3MoeBlock"]
            H["hidden_flat (tokens, hidden)"] --> I["Qwen3MoeRouter\n(softmax + top-k)"]
            I --> J["moe_permute_with_probs"]
            J --> K["te_ops.GroupedLinear\n(gate+up, int32 tokens_per_expert⚠)"]
            K --> L["te_ops.SwiGLU"]
            L --> M["te_ops.GroupedLinear\n(down)"]
            M --> N["moe_unpermute\n(prob-weighted combine)"]
        end

        N --> O["+ residual"]
    end

    O --> P["te.RMSNorm\nfinal norm"]
    P --> Q["te.Linear\nlm_head"]
    Q --> R["logits (batch, seq_len, vocab_size)"]
```
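The MoE block in the flowchart routes each token to its top-k experts, runs the expert MLPs, and combines the outputs weighted by the router probabilities. A dense-notation PyTorch sketch of that math (the TE version instead permutes tokens into contiguous per-expert groups and runs grouped GEMMs, but the result is equivalent; `moe_forward` and its arguments are illustrative names, not the PR's API):

```python
import torch
import torch.nn.functional as F

def moe_forward(hidden, gate_w, expert_fns, top_k=2):
    """Softmax router, top-k selection, per-expert MLP, prob-weighted combine.

    hidden:     (tokens, hidden_dim) flattened token activations
    gate_w:     (num_experts, hidden_dim) router weight
    expert_fns: one callable per expert (stands in for the gate+up/SwiGLU/down MLP)
    """
    logits = hidden @ gate_w.t()                    # (tokens, num_experts)
    probs = F.softmax(logits, dim=-1)
    top_probs, top_ids = probs.topk(top_k, dim=-1)  # (tokens, top_k)

    out = torch.zeros_like(hidden)
    for e, expert in enumerate(expert_fns):
        # Tokens that routed to expert e, and which top-k slot they used.
        token_idx, slot = (top_ids == e).nonzero(as_tuple=True)
        if token_idx.numel() == 0:
            continue
        out[token_idx] += top_probs[token_idx, slot, None] * expert(hidden[token_idx])
    return out

# Tiny usage example: 4 tokens, 3 identity "experts".
torch.manual_seed(0)
hidden = torch.randn(4, 8)
gate_w = torch.randn(3, 8)
experts = [lambda x: x for _ in range(3)]
combined = moe_forward(hidden, gate_w, experts)
```

The per-expert Python loop here is what `moe_permute_with_probs` + `GroupedLinear` + `moe_unpermute` replace with a single permute, batched GEMMs over contiguous expert groups, and an unpermute-with-combine.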

Reviews (5): Last reviewed commit: "Merge branch 'main' into vibe_qwen3"

Comment thread examples/pytorch/qwen3_moe/test_vs_hf.py Outdated
Comment thread examples/pytorch/qwen3_moe/test_vs_hf.py
skyw and others added 4 commits April 15, 2026 12:30
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
