[feat] Add TL2 (LUT) support for Falcon3 family models & fix setup_env.py flag by 7shi · Pull Request #550 · microsoft/BitNet

7shi · 2026-04-22T17:59:10Z

Description

Although README.md states that TL2 (LUT) inference is supported for the Falcon3 family on x86 CPUs, setup_env.py and utils/codegen_tl2.py were missing the matrix shapes and block configurations these models require. As a result, building or running Falcon3 with -q tl2 either raised a NotImplementedError during GGUF conversion or produced garbage output, because the generated C++ kernels were built for mismatched tensor dimensions.

This PR properly implements the TL2 code generation parameters for the Falcon3 family and fixes a hardcoded compiler flag that was preventing the TL2 kernel from compiling.
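The compiler-flag part of the fix amounts to deriving the CMake option from the requested quantization type instead of using a fixed value. The following is a minimal sketch of that idea, not the code in this PR: the helper name `tl2_cmake_flags` and its signature are hypothetical, and only the flag name `-DBITNET_X86_TL2` and the `tl2` quantization type come from the description.

```python
def tl2_cmake_flags(arch, quant_type):
    """Return extra CMake flags for an architecture and quantization type.

    Hypothetical helper for illustration only; the real setup_env.py is
    organized differently.
    """
    if arch == "x86_64":
        # Enable the TL2 kernel only when the user explicitly asked for tl2.
        state = "ON" if quant_type == "tl2" else "OFF"
        return [f"-DBITNET_X86_TL2={state}"]
    # Other architectures are out of scope for this sketch.
    return []


if __name__ == "__main__":
    print(tl2_cmake_flags("x86_64", "tl2"))
    print(tl2_cmake_flags("x86_64", "i2_s"))
```

With a hardcoded OFF value, the first call would return the same flag as the second, which matches the bug described here.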

Note regarding TL1 (ARM):
The same underlying issue likely affects the TL1 (ARM) path in setup_env.py and utils/codegen_tl1.py. However, since I do not have access to an ARM testing environment to verify the mathematical correctness and stability of the generated kernels, the fix for TL1 has been intentionally omitted from this PR.

Changes Made

Added Falcon3 Matrix Shapes: Updated ModelShapeDict in utils/codegen_tl2.py to include the exact [M, K] dimensions for all Falcon3 models (10B, 7B, 3B, 1B) for both the Instruct and Base variants.
Updated Code Generation Arguments: Modified gen_code() in setup_env.py to detect model names starting with Falcon3 and pass block sizes that satisfy the LUT kernel constraints (--BM 256,128,256,128 --BK 96,96,96,96 --bm 32,32,32,32).
Fixed Compiler Flag Bug: In setup_env.py, COMPILER_EXTRA_ARGS hardcoded -DBITNET_X86_TL2=OFF regardless of the user's input. Modified the execution block to dynamically set -DBITNET_X86_TL2=ON when the -q tl2 argument is explicitly provided.
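To make the relationship between matrix shapes and block sizes concrete, here is a rough sketch that validates a block configuration against a list of [M, K] shapes and builds the argument list for utils/codegen_tl2.py. The helper `build_codegen_args` is hypothetical, and the three checks (M divisible by BM, the K remainder after BK tiling being a multiple of 32, and bm equal to 32) are assumptions about the kind of constraint the LUT kernel imposes, not a statement of what the script enforces. The shapes in the usage example are toy values, not the actual Falcon3 dimensions.

```python
def build_codegen_args(model, shapes, BM, BK, bm):
    """Validate block sizes against [M, K] shapes and build CLI arguments.

    Hypothetical helper; the constraints below are assumed for illustration.
    """
    if not (len(shapes) == len(BM) == len(BK) == len(bm)):
        raise ValueError("shapes, BM, BK and bm must have the same length")

    for (m_dim, k_dim), big_m, big_k, small_m in zip(shapes, BM, BK, bm):
        if m_dim % big_m != 0:
            raise ValueError(f"M={m_dim} is not divisible by BM={big_m}")
        if (k_dim % big_k) % 32 != 0:
            raise ValueError(f"K={k_dim} leaves an unaligned tail for BK={big_k}")
        if small_m != 32:
            raise ValueError(f"bm={small_m} is not supported in this sketch")

    def join(values):
        # The script takes each option as one comma-separated value.
        return ",".join(str(v) for v in values)

    return [
        "utils/codegen_tl2.py",
        "--model", model,
        "--BM", join(BM),
        "--BK", join(BK),
        "--bm", join(bm),
    ]


if __name__ == "__main__":
    toy_shapes = [[256, 96], [128, 192], [256, 96], [128, 128]]  # not real Falcon3 shapes
    print(build_codegen_args(
        "Falcon3-7B-Instruct-1.58bit",
        toy_shapes,
        BM=[256, 128, 256, 128],
        BK=[96, 96, 96, 96],
        bm=[32, 32, 32, 32],
    ))
```

Checking the configuration before invoking code generation turns a shape mismatch into an immediate error rather than a kernel that compiles but produces wrong output.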

Testing

Successfully converted and ran inference on tiiuae/Falcon3-7B-Instruct-1.58bit and other Falcon3 models with the -q tl2 flag.
Verified that the generated output is now meaningful and logically correct.

7shi · 2026-04-22T17:59:30Z

@microsoft-github-policy-service agree

7shi added 2 commits April 22, 2026 21:38

[fix] enable -DBITNET_X86_TL2=ON when -q tl2 is specified

73bc198

[feat] add TL2 support for Falcon3 family models

f0c2918

7shi wants to merge 2 commits into microsoft:main from 7shi:fix-tl2-falcon3
