issue/1153: add fused FFN operator and hardware-task mutual awareness analyzer (#1154)
Merged
Conversation
voltjia approved these changes on May 9, 2026.
Ziminli approved these changes on May 9, 2026.
Collaborator comment: "Thank you, Prof. Zhou." (感谢周老师)
## Overview

This PR, linked to #1153, adds two mutually independent modules. Both are gated by a build option or an environment variable and are disabled by default, so they do not touch the source, signatures, or default build path of existing operators:

- **Fused FFN operator** (`infiniopFusedFFN`): fuses the main FFN path of the LLM decode stage into a single operator.
- **Hardware-Task Mutual Awareness Analyzer** (`--mutual-awareness=y`): adds runtime context collection and goal-driven kernel selection along the operator dispatch chain.

## Module Details
### 1. Hardware-Task Mutual Awareness Analyzer
Enabled via `--mutual-awareness=y`; when disabled, it produces no compiled output and has zero ABI impact.

- `analyzer/`: five core submodules, `OpTraceRing` / `PhaseDetector` / `ResourceSensor` / `IntentGenerator` / `MutualAwarenessAnalyzer`
- `OpDispatcher` gains `OptimizationGoal` overloads (`registerDevice(device, fn, goal)` / `lookup(device, goal)`), fully backward compatible; only `Attention::execute` is wired into the goal-aware path, and all other operators are unchanged
- `infinirt` gains a unified resource-snapshot API, `infinirtGetMemInfo` / `infinirtGetDeviceResourceSnapshot`, layered by platform: NVIDIA / Iluvatar go through NVML / IXML (via dlopen), MetaX uses `mcMemGetInfo`, CPU reads `/proc/meminfo`, and other backends return NOOP / fallback
- `infinicore.analyzer` is exposed automatically when the feature is enabled
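The goal-aware dispatch described above can be sketched in Python. The names `OpDispatcher`, `registerDevice`, `lookup`, and `OptimizationGoal` follow the PR; the specific goal values and the fall-back-to-default behavior are assumptions for illustration, not the actual C++ implementation:

```python
# Hypothetical sketch of goal-aware dispatch: the goal values and the
# fallback logic are assumptions, not the library's real implementation.
from enum import Enum, auto

class OptimizationGoal(Enum):
    DEFAULT = auto()       # assumed goal values, for illustration only
    LOW_LATENCY = auto()
    LOW_MEMORY = auto()

class OpDispatcher:
    def __init__(self):
        # keyed by (device, goal), so plain device-only registration stays valid
        self._table = {}

    def register_device(self, device, fn, goal=OptimizationGoal.DEFAULT):
        self._table[(device, goal)] = fn

    def lookup(self, device, goal=OptimizationGoal.DEFAULT):
        # try the goal-aware entry first, then fall back to the device's
        # default kernel, so operators that never pass a goal are unaffected
        fn = self._table.get((device, goal))
        if fn is None:
            fn = self._table.get((device, OptimizationGoal.DEFAULT))
        return fn

d = OpDispatcher()
d.register_device("cuda", lambda: "generic-attention")
d.register_device("cuda", lambda: "low-latency-attention",
                  OptimizationGoal.LOW_LATENCY)
assert d.lookup("cuda", OptimizationGoal.LOW_LATENCY)() == "low-latency-attention"
assert d.lookup("cuda", OptimizationGoal.LOW_MEMORY)() == "generic-attention"
```

The fallback is what makes the overloads backward compatible: a lookup with an unregistered goal still resolves to the same kernel the old device-only API would have returned.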
### 2. Fused FFN Operator

Public C API in `include/infiniop/ops/fused_ffn.h`, computing `out = Down( SwiGLU( GateUp( RMSNorm(in) ) ) ) + residual`.

- Supports two layouts for `gate_up_weight` (`[2*di, d]` / `[d, 2*di]`) and two layouts for `down_weight`
- The default path composes `gemm` / `rms_norm` / `swiglu` / `add` sub-operator descriptors; an optional deep-fused path (`INFINIOP_FUSED_FFN_DEEP=1` lets the scheduler decide, `=2` forces it) merges GateUp + SwiGLU into a single kernel, removing an HBM round trip, which helps small `ntok`
- When `out == residual`, the residual is folded into the Down GEMM via `beta=1`
- MetaX calls GEMM through `mcblas` and reuses the RMSNorm / SwiGLU / ResidualAdd kernels from the NVIDIA `kernel.cuh`
- `test/infiniop/fused_ffn.py`: 13 shapes × 2 dtypes, covering three typical architectures (LLaMA-7B / 13B / Qwen) × {with / without residual}
- Appends the `fused_ffn` ctypes registration at the end of `test/infiniop/libinfiniop/op_register.py`
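As a semantic reference for the fused formula above, here is a minimal pure-Python sketch. It assumes the `[2*di, d]` layout for `gate_up_weight` and a `[d, di]` layout for `down_weight`; all helper names are hypothetical and this is not the library's C API:

```python
# Reference semantics of out = Down(SwiGLU(GateUp(RMSNorm(x)))) + residual.
# Assumes gate_up_w is [2*di, d] and down_w is [d, di]; single-token version.
import math

def rms_norm(x, w, eps=1e-6):
    # x, w: length-d vectors
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * wi for v, wi in zip(x, w)]

def matvec(W, x):
    # W: rows x cols nested list, x: length-cols vector
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def silu(v):
    return v / (1.0 + math.exp(-v))

def fused_ffn_ref(x, norm_w, gate_up_w, down_w, residual=None):
    h = rms_norm(x, norm_w)                 # RMSNorm
    gu = matvec(gate_up_w, h)               # GateUp GEMM -> length 2*di
    di = len(gu) // 2
    act = [silu(g) * u for g, u in zip(gu[:di], gu[di:])]   # SwiGLU
    out = matvec(down_w, act)               # Down GEMM -> length d
    if residual is not None:
        # models the beta=1 residual fold when out aliases residual
        out = [o + r for o, r in zip(out, residual)]
    return out

# tiny demo: d=2, di=2, residual aliasing the input (the out == residual case)
x = [1.0, -1.0]
y = fused_ffn_ref(x, [1.0, 1.0],
                  [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]],
                  [[0.5, 0.5], [0.5, -0.5]],
                  residual=x)
assert len(y) == len(x)
```

The deep-fused path described above computes the GateUp GEMM and the SwiGLU lines in one kernel instead of materializing `gu` to HBM in between; the math is identical.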
## Testing and Verification

### Build

Verified under three configurations:

- `--metax-gpu=y --use-mc=y --mutual-awareness=y`
- `--iluvatar-gpu=y --cpu=y --mutual-awareness=y`
- `--cpu=y --mutual-awareness=y`

### Correctness
**Fused FFN**

- MetaX (`--metax`): 26/26 PASS
- CPU (`--cpu`): 24/24 PASS

**Mutual Awareness**

- With `--mutual-awareness=n`, the analyzer symbols are not linked

### Regression
Sanity regressions of the five related operators `rms_norm` / `swiglu` / `add` / `gemm` / `attention` were run on the Iluvatar / CPU and MetaX (metax) platforms; all PASS.

## Impact Scope
- Under the default build (`--mutual-awareness=n`): analyzer files are not compiled, dispatch has no goal-aware branch, and the new `infinirt` APIs are NOOP on disabled platforms; zero intrusion
- `xmake.lua` only adds a `mutual-awareness` option and the `ENABLE_MUTUAL_AWARENESS` macro; `fused_ffn` source files are picked up by the existing glob, with no new build rules

## Checklist
- Branch name starts with `issue/1153`