
cuda.bindings latency benchmarks - part 3 #1948

Open
danielfrg wants to merge 10 commits into main from cuda-bindings-bench-3

Conversation

@danielfrg
Contributor

@danielfrg danielfrg commented Apr 17, 2026

Description

Follow-up to #1580.

This one was pretty much all AI-generated, but it LGTM.

The other big change is moving the benchmarks to the top level of the repo, as we agreed.

The new results still look on par:

----------------------------------------------------------------------------------
Benchmark                                   C++ (mean)   Python (mean)    Overhead
----------------------------------------------------------------------------------
memory.mem_alloc_async_free_async               384 ns          775 ns     +390 ns
memory.mem_alloc_free                          1.61 us         2.06 us     +451 ns
memory.memcpy_dtod                             2.10 us         2.34 us     +249 ns
memory.memcpy_dtoh                             4.99 us         5.49 us     +498 ns
memory.memcpy_htod                             3.95 us         4.00 us      +58 ns
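
For readers checking the Overhead column: it appears to be the Python mean minus the C++ mean. Below is a minimal sketch of that arithmetic, assuming the printed units are `ns` and `us`. The helper names `to_ns` and `overhead_ns` are hypothetical and are not taken from the PR's `compare.py`.

```python
# Hypothetical helpers for illustration; not the PR's actual compare.py logic.
UNIT_NS = {"ns": 1.0, "us": 1_000.0, "ms": 1_000_000.0}


def to_ns(text: str) -> float:
    """Convert a string such as '1.61 us' to nanoseconds."""
    value, unit = text.split()
    return float(value) * UNIT_NS[unit]


def overhead_ns(cpp_mean: str, python_mean: str) -> float:
    """Per-call overhead of the Python binding over the C++ baseline."""
    return to_ns(python_mean) - to_ns(cpp_mean)


# Two rows copied from the table above: (C++ mean, Python mean).
rows = {
    "memory.mem_alloc_async_free_async": ("384 ns", "775 ns"),
    "memory.mem_alloc_free": ("1.61 us", "2.06 us"),
}

for name, (cpp, py) in rows.items():
    print(f"{name}: +{overhead_ns(cpp, py):.0f} ns")
```

Recomputing from the rounded means printed in the table can differ by a few nanoseconds from the reported overhead, which is presumably derived from unrounded measurements.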

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@danielfrg danielfrg requested review from mdboom and rwgk April 17, 2026 18:58
@danielfrg danielfrg self-assigned this Apr 17, 2026
@danielfrg danielfrg added cuda.bindings Everything related to the cuda.bindings module performance labels Apr 17, 2026
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Needs-Restricted-Paths-Review PR touches cuda_bindings or cuda_python; only NVIDIA employees may modify these paths; see LICENSEs label Apr 17, 2026
@@ -168,39 +168,6 @@ int main(int argc, char** argv) {
});
}

// --- launch_small_kernel ---
Contributor Author


These were duplicated before. Cleaning up.

Contributor

@mdboom mdboom left a comment


LGTM

@danielfrg
Contributor Author

/ok to test 5ecba20

@rwgk
Collaborator

rwgk commented Apr 17, 2026

@danielfrg the pre-commit check is still failing. To conserve our CI resources, it would be best to cancel the workflow ASAP, because you'll need another run anyway after the pre-commit fixes.

You can use pre-commit run --all-files to clean up before pushing commits. I have it installed with sudo apt install pre-commit, but you can also install with pip install pre-commit.

To make the "CI: Enforce assignee/label/milestone on PRs" workflow happy: do you have the ability to assign a milestone?

@rwgk
Collaborator

rwgk commented Apr 17, 2026

I believe something isn't right with my recent toolshed/check_spdx.py changes (from PR #1913). It doesn't complain about the missing benchmarks entry in EXPECTED_LICENSE_IDENTIFIERS (I was sure I tested that, but apparently I missed something). I'm still looking.

@rwgk
Collaborator

rwgk commented Apr 17, 2026

Cursor GPT-5.4 Extra High Fast findings (both confirmed manually):

  • benchmarks/cuda_bindings/tests/test_runner.py still points at the old cuda_bindings/benchmarks/... paths. Running TestVenv/bin/python -m pytest benchmarks/cuda_bindings/tests/test_runner.py -q fails all 3 tests immediately with FileNotFoundError against those deleted locations.

  • .github/workflows/test-wheel-linux.yml still does pushd cuda_bindings/benchmarks for the benchmark smoke test. This PR moves that tree to benchmarks/cuda_bindings, so that workflow step is stale and will fail when it runs.
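
One way to make the test less sensitive to future moves is to derive paths from the test file's own location rather than hard-coding a repo-relative prefix. This is a sketch only: the helper names `find_repo_root` and `benchmarks_dir` are hypothetical, and the layout `<repo>/benchmarks/cuda_bindings/tests/test_runner.py` is assumed from the paths quoted above.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = ".git") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker!r} found at or above {start}")


def benchmarks_dir(test_file: Path) -> Path:
    """Return the benchmark suite directory for a test file.

    Assumed layout: <suite>/tests/test_runner.py, so the suite directory
    is the parent of the directory that holds the test file.
    """
    return test_file.resolve().parent.parent


# Inside a test module this would typically be called as:
#   SUITE_DIR = benchmarks_dir(Path(__file__))
```

With this approach the test keeps working if the tree is relocated again, as long as the relative position of `tests/` inside the suite stays the same.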

@rwgk
Collaborator

rwgk commented Apr 17, 2026

> I believe something isn't right with my recent toolshed/check_spdx.py changes (from PR #1913). It doesn't complain about the missing benchmarks entry in EXPECTED_LICENSE_IDENTIFIERS (I was sure I tested that, but apparently I missed something). I'm still looking.

This explains it:

#1913 (comment)

Sorry for the accident. I'm looking into applying the missing changes here, because this is the perfect test case.

GitHub merged PR 1913 before the later local commits were pushed, so replay the recovered SPDX policy follow-ups and related license fixes here.
Context: #1913 (comment)

Made-with: Cursor
@rwgk
Collaborator

rwgk commented Apr 17, 2026

I just pushed the missing commits from PR #1913 here as one commit (i.e. missing changes squashed): f2c0838

Now the check behaves as I expected (below). I'll try to fix that next.

smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/forked/cuda-python $ pre-commit run --all-files check-spdx
Check SPDX Headers.......................................................Failed
- hook id: check-spdx
- exit code: 1

MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_stream.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_ctx_device.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_pointer_attributes.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_event.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/test_launch_latency.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_support.hpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_ctx_device.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/runner/runtime.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/compare.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pixi.toml'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_pointer_attributes.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/test_numba.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/conftest.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_stream.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/runner/cpp.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_event.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/test_pointer_attributes.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_memory.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_memory.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/CMakeLists.txt'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/runner/main.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/runner/__init__.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/run_cpp.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/kernels.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/bench_launch.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/run_pyperf.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/pytest-legacy/test_cupy.py'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/benchmarks/cpp/bench_launch.cpp'
MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for top-level directory 'benchmarks' required by 'benchmarks/cuda_bindings/tests/test_runner.py'
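
To illustrate the kind of check that produces the messages above, here is a minimal sketch. It is not the actual `toolshed/check_spdx.py` implementation: the function name `missing_top_level_entries` is hypothetical, and treating the set of known top-level directories as a plain collection of names is an assumption about how `TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS` is keyed.

```python
from pathlib import PurePosixPath


def missing_top_level_entries(paths, known_dirs):
    """Report files whose top-level directory has no registered license entry.

    `known_dirs` stands in for the keys of TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS
    (assumed structure; the real script may differ).
    """
    messages = []
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) < 2:
            continue  # a file at the repo root has no top-level directory
        top = parts[0]
        if top not in known_dirs:
            messages.append(
                "MISSING TOP_LEVEL_DIRS_LICENSE_IDENTIFIERS entry for "
                f"top-level directory {top!r} required by {path!r}"
            )
    return messages


# Illustrative input: one relocated benchmark file and one existing package file.
known = {"cuda_bindings", "cuda_core"}
files = ["benchmarks/cuda_bindings/compare.py", "cuda_bindings/setup.py"]
for line in missing_top_level_entries(files, known):
    print(line)
```

Under these assumptions, adding a `benchmarks` entry to the known directories is what makes the messages go away, which matches the fix described in this thread.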

rwgk added 2 commits April 17, 2026 16:06
The naming-rule suppressions used to live under cuda_bindings/benchmarks, so move the needed legacy-path suppressions to the relocated benchmarks/cuda_bindings pytest-legacy path and drop the stale old-path entry.

Made-with: Cursor
@rwgk
Collaborator

rwgk commented Apr 17, 2026

@danielfrg I pushed two more commits (on top of the commit that fixed the PR #1913 accident):

The first one was closely related to the PR #1913 work. The second one I jumped on to make sure pre-commit passes cleanly now.

I want to stop here and hand this back to you. There are still the issues reported under #1948 (comment). Could you take it from here?

