
[feat] Add TL2 (LUT) support for Falcon3 family models & fix setup_env.py flag #550

Open
7shi wants to merge 2 commits into microsoft:main from 7shi:fix-tl2-falcon3

7shi commented Apr 22, 2026

Description

Although README.md states that TL2 (LUT) inference is supported for the Falcon3 family on x86 CPUs, setup_env.py and utils/codegen_tl2.py were missing the matrix shapes and block configurations these models require. As a result, building or running Falcon3 with -q tl2 either raised a NotImplementedError during GGUF conversion or produced garbage output, because the generated C++ kernels were built for mismatched tensor dimensions.

This PR properly implements the TL2 code generation parameters for the Falcon3 family and fixes a hardcoded compiler flag that was preventing the TL2 kernel from compiling.

Note regarding TL1 (ARM):
The same underlying issue likely affects the TL1 (ARM) path in setup_env.py and utils/codegen_tl1.py. However, since I do not have access to an ARM testing environment to verify the mathematical correctness and stability of the generated kernels, the fix for TL1 has been intentionally omitted from this PR.

Changes Made

  1. Added Falcon3 Matrix Shapes: Updated ModelShapeDict in utils/codegen_tl2.py to include the exact [M, K] dimensions for all Falcon3 models (10B, 7B, 3B, 1B) for both the Instruct and Base variants.
  2. Updated Code Generation Arguments: Modified gen_code() in setup_env.py to intercept models starting with Falcon3 and pass the mathematically optimal block sizes (--BM 256,128,256,128 --BK 96,96,96,96 --bm 32,32,32,32) to satisfy the LUT kernel constraints.
  3. Fixed Compiler Flag Bug: In setup_env.py, COMPILER_EXTRA_ARGS hardcoded -DBITNET_X86_TL2=OFF regardless of the user's input. Modified the execution block to dynamically set -DBITNET_X86_TL2=ON when the -q tl2 argument is explicitly provided.
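
Changes 2 and 3 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `falcon3_codegen_args`, `tl2_cmake_flag`, and `check_block_sizes` are hypothetical, the example shapes are placeholders rather than the values added to ModelShapeDict, and the divisibility check is an assumed tiling constraint that may differ from what utils/codegen_tl2.py actually enforces.

```python
from typing import List, Sequence, Tuple


def falcon3_codegen_args(model_name: str) -> List[str]:
    """Return the TL2 codegen arguments quoted in this PR for Falcon3 models.

    Hypothetical helper: the real dispatch lives in gen_code() in setup_env.py.
    """
    if not model_name.startswith("Falcon3"):
        raise NotImplementedError(f"no TL2 block configuration for {model_name}")
    return ["--BM", "256,128,256,128", "--BK", "96,96,96,96", "--bm", "32,32,32,32"]


def tl2_cmake_flag(quant_type: str) -> str:
    """Derive the CMake flag from the -q value instead of hardcoding OFF."""
    state = "ON" if quant_type == "tl2" else "OFF"
    return f"-DBITNET_X86_TL2={state}"


def check_block_sizes(shapes: Sequence[Tuple[int, int]], BM: Sequence[int]) -> None:
    """Assumed constraint: one BM per [M, K] shape, and BM must divide M."""
    if len(shapes) != len(BM):
        raise ValueError("need exactly one BM value per kernel shape")
    for (m, _k), block_m in zip(shapes, BM):
        if m % block_m != 0:
            raise ValueError(f"M={m} is not divisible by BM={block_m}")


if __name__ == "__main__":
    # Placeholder [M, K] shapes for illustration only.
    example_shapes = [(1024, 3072), (3072, 3072), (23040, 3072), (3072, 23040)]
    args = falcon3_codegen_args("Falcon3-7B-Instruct-1.58bit")
    bm_values = [int(v) for v in args[1].split(",")]
    check_block_sizes(example_shapes, bm_values)
    print(" ".join(args), tl2_cmake_flag("tl2"))
```

Deriving the flag from the quantization type keeps non-TL2 builds at OFF, so only an explicit -q tl2 enables the TL2 kernel.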

Testing

  • Successfully converted and ran inference on tiiuae/Falcon3-7B-Instruct-1.58bit and other Falcon3 models with the -q tl2 flag.
  • Verified that the generated output is now meaningful and logically correct.

7shi commented Apr 22, 2026

@microsoft-github-policy-service agree
