Skip to content

[ai] Deploying a CPU-only vLLM inference workload on ACP#588

Open
jing2uo wants to merge 3 commits into
mainfrom
kb/reprocess-2026-04-27/running-vllm-on-cpu-only-nodes-for-exper
Open

[ai] Deploying a CPU-only vLLM inference workload on ACP#588
jing2uo wants to merge 3 commits into
mainfrom
kb/reprocess-2026-04-27/running-vllm-on-cpu-only-nodes-for-exper

Conversation

@jing2uo
Copy link
Copy Markdown
Collaborator

@jing2uo jing2uo commented May 2, 2026

新增一篇 ACP KB 文章。

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

Walkthrough

A new documentation guide is added that provides an end-to-end recipe for deploying vLLM in CPU-only mode on Kubernetes, including container image sourcing, persistent storage setup, secret configuration, deployment manifest, service and ingress exposure, validation via curl, benchmarking instructions, and diagnostic troubleshooting steps.

Changes

vLLM CPU-Only Kubernetes Deployment Guide

Layer / File(s) Summary
Overview & Setup
docs/en/solutions/Running_vLLM_on_CPU_only_nodes_for_experimental_LLM_serving.md (lines 1–32)
Introduces the solution scope (CPU mode for experimental evaluation), lists required Kubernetes components, and guides obtaining or building a CPU-targeted vLLM container image.
Storage & Secrets
docs/en/solutions/...md (lines 33–61)
Documents PersistentVolumeClaim creation for the Hugging Face model cache and Secret creation for the Hugging Face Hub authentication token.
Workload Deployment
docs/en/solutions/...md (lines 62–109)
Specifies the Deployment manifest with single replica, cache volume mount, secret environment variable sourcing, container port binding on 8001, resource requests/limits, and the serve command targeting Llama-3.2-1B-Instruct.
Exposure & Security
docs/en/solutions/...md (lines 111–167)
Describes ClusterIP Service and Ingress for in-cluster and external routing, security posture notes for restricted namespaces, and a curl-based sanity check against the /v1/chat/completions endpoint.
Benchmarking & Diagnostics
docs/en/solutions/...md (lines 169–227)
Documents throughput and latency benchmarking using guidellm, expected performance characteristics on CPU, and a diagnostic troubleshooting section covering pod failure modes (OOMKilled, auth issues, connectivity), log inspection, and cache validation.

Poem

🐰 A vLLM on CPU hops along,
Through Kubernetes fields, steady and strong,
With PVC cache and secrets in place,
The service routes with elegant grace,
Now docs light the way for experimenters to race! ✨

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Title check ⚠️ Warning The PR title mentions 'CPU-only vLLM inference workload on ACP' but the actual change is a documentation page about 'Running vLLM on CPU-only nodes'. The title is inconsistent with the file name and content focus, which centers on a Kubernetes recipe rather than a generic inference workload deployment. Align the title with the actual content: use 'Running vLLM on CPU-only nodes for experimental LLM serving' or similar to match the documentation focus and file naming.
✅ Passed checks (4 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch kb/reprocess-2026-04-27/running-vllm-on-cpu-only-nodes-for-exper

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jing2uo jing2uo changed the title [ai] Running vLLM on CPU-only nodes for experimental LLM serving [ai] Deploying a CPU-only vLLM inference workload on ACP May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant