
meta-nvidia: bump NVIDIA stack to 580.95.05 (Blackwell CC support)#56

Merged
kvinwang merged 2 commits into main from feat/nvidia-580.95.05
Apr 21, 2026



@kvinwang
Collaborator

Summary

Adds NVIDIA driver 580.95.05 recipes for nvidia, nvidia-fabricmanager, and libnvidia-nscq. Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5) confidential compute under Intel TDX: the existing 570.172.08 driver predates the Pro 6000 SE and bails out with "GPU confidential compute capability is not enabled" when loaded inside a TDX guest.

Yocto picks the highest available PV by default, so adding these recipes is enough to make builds use 580.95.05; the 570 recipes are kept for now.
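As a rough illustration of the "highest PV wins" default, the sketch below orders the two driver versions with `sort -V`. It assumes GNU coreutils and is only an approximation: BitBake uses its own version comparison, and `highest_version` is a hypothetical helper, not part of any Yocto tooling.

```shell
# Illustration only: approximates "highest version wins" with GNU sort -V.
# BitBake's real selection logic is not implemented here.
highest_version() {
    # Print one version per line, sort by version order, keep the last.
    printf '%s\n' "$@" | sort -V | tail -n 1
}

highest_version 570.172.08 580.95.05
```

With both recipe versions present, the 580.95.05 entry sorts last, which matches the behavior described above.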

What's in each recipe

  • nvidia_580.95.05.bb — same structure as 570 recipe, only SRC_URI[md5sum] / SRC_URI[sha256sum] updated.
  • nvidia-fabricmanager_580.95.05.bb — same as 570. The 580 archive ships extra gb300_* topology files; current do_install lists files explicitly so the new ones aren't picked up. Harmless for non-GB300 hosts; could be turned into a glob in a follow-up.
  • libnvidia-nscq_580.95.05.bb — 580 archive no longer ships bin/nscq-cli, so do_install and FILES:${PN} were trimmed accordingly.

Verification

Built dstack-nvidia-dev-rootfs with the new stack and deployed to a TDX guest with one RTX PRO 6000 Blackwell Server Edition passthrough on Ubuntu intel-tdx kernel 6.14.0-1008-intel + tdx-patched QEMU. Inside the guest:

NVRM version: NVIDIA UNIX Open Kernel Module 580.95.05
GPU 0: NVIDIA RTX PRO 6000 Blackwell Server Edition

==============NVSMI CONF-COMPUTE LOG==============
    CC State                   : ON
    Multi-GPU Mode             : None
    CPU CC Capabilities        : INTEL TDX
    GPU CC Capabilities        : CC Capable
    CC GPUs Ready State        : Ready
    Protected memory size      : 99461312 KiB
    Unprotected memory size    : 0 KiB
    Environment                : PRODUCTION

Fabric Manager version is : 580.95.05
/usr/lib/libnvidia-nscq.so.580.95.05

Real workload: PyTorch 2.11+cu128 sees sm_120, runs cuBLAS GEMM (~78 TFLOP/s FP32, ~115 TFLOP/s BF16 on a single Pro 6000) entirely inside the TDX guest with CC ON.

Test plan

  • bitbake nvidia succeeds
  • bitbake nvidia-fabricmanager libnvidia-nscq succeeds
  • bitbake dstack-nvidia-dev-rootfs succeeds; rootfs contains nvidia.ko@580.95.05, nv-fabricmanager reporting 580.95.05, libnvidia-nscq.so.580.95.05
  • TDX guest boots, NVIDIA module loads on RTX PRO 6000 Blackwell SE
  • nvidia-smi conf-compute -q shows CC State: ON
  • PyTorch CUDA workload runs end-to-end
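
The `CC State: ON` check in the test plan could be scripted along these lines. This is a minimal sketch that assumes the output keeps the `CC State : ON` layout shown in the Verification log; `cc_state_is_on` is a hypothetical helper name.

```shell
# cc_state_is_on: reads `nvidia-smi conf-compute -q` style text on stdin and
# succeeds only if a "CC State : ON" line is present.
cc_state_is_on() {
    awk -F: '
        # Match the "CC State" line only, not "CC GPUs Ready State".
        /^[[:space:]]*CC State[[:space:]]*:/ {
            gsub(/[[:space:]]/, "", $2)
            if ($2 == "ON") found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Usage inside a guest with the driver loaded (not run here):
#   nvidia-smi conf-compute -q | cc_state_is_on && echo "CC is on"
```

Reading from stdin keeps the check usable against a saved log as well as against live `nvidia-smi` output.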

Required for RTX PRO 6000 Blackwell Server Edition (10de:2bb5)
Confidential Compute support: the 570.172.08 NVIDIA Open Kernel Module
predates Pro 6000 SE and refuses to attach with
"GPU confidential compute capability is not enabled" inside a TDX guest.

580.95.05 recognizes GB202 and brings the full TDX guest CC stack online
(verified end-to-end: nvidia-smi conf-compute reports CC State=ON,
GPU CC Capabilities=CC Capable, Protected memory=99461312 KiB; PyTorch
CUDA workloads run inside the TD).

The nscq archive at 580 no longer ships bin/nscq-cli, so the new recipe
drops it from do_install and FILES.
Copilot AI review requested due to automatic review settings April 21, 2026 02:19

Copilot AI left a comment


Pull request overview

Adds Yocto recipes for the NVIDIA 580.95.05 driver stack to enable confidential compute support on newer Blackwell-era GPUs (e.g., RTX PRO 6000 Blackwell Server Edition) while keeping existing 570.x recipes in place.

Changes:

  • Add nvidia_580.95.05.bb with updated 580.95.05 installer checksums/URL.
  • Add nvidia-fabricmanager_580.95.05.bb to package Fabric Manager 580.95.05.
  • Add libnvidia-nscq_580.95.05.bb, updated for the 580 archive layout (no nscq-cli).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| meta-nvidia/recipes-graphics/nvidia/nvidia_580.95.05.bb | New 580.95.05 driver recipe (runfile extraction + includes). |
| meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.95.05.bb | New 580.95.05 Fabric Manager recipe and install manifest. |
| meta-nvidia/recipes-graphics/nvidia/libnvidia-nscq_580.95.05.bb | New 580.95.05 NSCQ library recipe; install/package list adjusted to archive contents. |


Comment thread meta-nvidia/recipes-graphics/nvidia/nvidia-fabricmanager_580.95.05.bb Outdated
- nvidia-fabricmanager: install all files from share/nvidia/nvswitch/ via
  glob loop instead of an explicit list. The 580.95.05 archive ships new
  GB300 topologies (gb300_nvl72r{1,2}_c2g4_topology) that the old
  hard-coded list missed; future archives stay covered automatically.
  (Addresses Copilot review feedback on PR #56.)

- dstack.conf: introduce NVIDIA_VERSION = "580.95.05" and pin
  PREFERRED_VERSION_{nvidia,nvidia-fabricmanager,libnvidia-nscq} from it
  so the kernel module ABI and userspace libs always move together.
@kvinwang
Collaborator Author

Addressed @copilot-pull-request-reviewer feedback in 85e7e17 + added version pinning:

  1. nvidia-fabricmanager_580.95.05.bb — replaced the explicit topology file list with a for f in ${S}/share/nvidia/nvswitch/*; do install …; done loop. Verified build picks up the new gb300_nvl72r{1,2}_c2g4_topology files that the old list was missing (29 topology files installed total).

  2. Version pin (meta-dstack/conf/distro/dstack.conf):

    NVIDIA_VERSION = "580.95.05"
    PREFERRED_VERSION_nvidia                = "${NVIDIA_VERSION}"
    PREFERRED_VERSION_nvidia-fabricmanager  = "${NVIDIA_VERSION}"
    PREFERRED_VERSION_libnvidia-nscq        = "${NVIDIA_VERSION}"
    

    So the three packages always bump together — kernel module ABI and userspace libs are paired. bitbake-getvar --value confirms all four resolve to 580.95.05. Future bumps only need to change NVIDIA_VERSION.
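
The glob loop from item 1 can be sketched as a standalone shell function. This is not the recipe's actual `do_install` body: the `install_topologies` name, the destination argument, and the 0644 mode are assumptions made for illustration.

```shell
# install_topologies SRC DEST: copy every regular file in SRC into DEST.
# The 0644 mode and the DEST layout are assumptions for this sketch.
install_topologies() {
    src=$1
    dest=$2
    install -d "$dest"
    for f in "$src"/*; do
        # Skip the unexpanded pattern (empty directory) and subdirectories.
        [ -f "$f" ] || continue
        install -m 0644 "$f" "$dest/"
    done
}
```

Because the loop iterates over whatever the directory contains, topology files added by a later archive are installed without touching the recipe, which is the point of replacing the explicit list.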


Copilot AI commented Apr 21, 2026

Addressed @copilot-pull-request-reviewer feedback in 85e7e17 + added version pinning:

  1. nvidia-fabricmanager_580.95.05.bb — replaced the explicit topology file list with a `for f in ${S}/sha...

Both changes are already in place at commit 85e7e17:

  • nvidia-fabricmanager_580.95.05.bb uses the glob loop (for f in ${S}/share/nvidia/nvswitch/*) instead of the explicit file list.
  • meta-dstack/conf/distro/dstack.conf has NVIDIA_VERSION = "580.95.05" with the three PREFERRED_VERSION_* variables pinning all packages together.

No further changes needed.
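
A static check that the three pins all follow `NVIDIA_VERSION` might look like the sketch below. It only inspects the text of a conf file and does not evaluate BitBake variables, so `bitbake-getvar --value` remains the authoritative check; `pins_follow_nvidia_version` is a hypothetical helper name.

```shell
# pins_follow_nvidia_version FILE: succeed only if each of the three recipes
# has a line of the form  PREFERRED_VERSION_<recipe> = "${NVIDIA_VERSION}".
pins_follow_nvidia_version() {
    conf=$1
    for recipe in nvidia nvidia-fabricmanager libnvidia-nscq; do
        # Strip spaces and tabs so column alignment does not matter, then
        # require an exact whole-line match on the literal assignment.
        tr -d ' \t' < "$conf" |
            grep -Fxq "PREFERRED_VERSION_${recipe}=\"\${NVIDIA_VERSION}\"" ||
            return 1
    done
}

# Usage (path taken from the comment above):
#   pins_follow_nvidia_version meta-dstack/conf/distro/dstack.conf
```

The whole-line match keeps the `nvidia` pattern from being satisfied by the `nvidia-fabricmanager` line.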

@kvinwang kvinwang merged commit be6feef into main Apr 21, 2026
4 checks passed