[ai] Deploying a CPU-only vLLM inference workload on ACP by jing2uo · Pull Request #588 · alauda/knowledge

jing2uo · 2026-05-02T01:51:58Z

新增一篇 ACP KB 文章。

coderabbitai · 2026-05-02T01:52:09Z

Walkthrough

A new documentation guide is added that provides an end-to-end recipe for deploying vLLM in CPU-only mode on Kubernetes, including container image sourcing, persistent storage setup, secret configuration, deployment manifest, service and ingress exposure, validation via curl, benchmarking instructions, and diagnostic troubleshooting steps.

Changes

vLLM CPU-Only Kubernetes Deployment Guide

Layer / File(s)	Summary
Overview & Setup `docs/en/solutions/Running_vLLM_on_CPU_only_nodes_for_experimental_LLM_serving.md` (lines 1–32)	Introduces the solution scope (CPU mode for experimental evaluation), lists required Kubernetes components, and guides obtaining or building a CPU-targeted vLLM container image.
Storage & Secrets `docs/en/solutions/...md` (lines 33–61)	Documents PersistentVolumeClaim creation for the Hugging Face model cache and Secret creation for the Hugging Face Hub authentication token.
Workload Deployment `docs/en/solutions/...md` (lines 62–109)	Specifies the Deployment manifest with single replica, cache volume mount, secret environment variable sourcing, container port binding on 8001, resource requests/limits, and the serve command targeting Llama-3.2-1B-Instruct.
Exposure & Security `docs/en/solutions/...md` (lines 111–167)	Describes ClusterIP Service and Ingress for in-cluster and external routing, security posture notes for restricted namespaces, and a curl-based sanity check against the `/v1/chat/completions` endpoint.
Benchmarking & Diagnostics `docs/en/solutions/...md` (lines 169–227)	Documents throughput and latency benchmarking using `guidellm`, expected performance characteristics on CPU, and a diagnostic troubleshooting section covering pod failure modes (OOMKilled, auth issues, connectivity), log inspection, and cache validation.

Poem

🐰 A vLLM on CPU hops along,
Through Kubernetes fields, steady and strong,
With PVC cache and secrets in place,
The service routes with elegant grace,
Now docs light the way for experimenters to race! ✨

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The PR title mentions 'CPU-only vLLM inference workload on ACP' but the actual change is a documentation page about 'Running vLLM on CPU-only nodes'. The title is inconsistent with the file name and content focus, which centers on a Kubernetes recipe rather than a generic inference workload deployment.	Align the title with the actual content: use 'Running vLLM on CPU-only nodes for experimental LLM serving' or similar to match the documentation focus and file naming.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch kb/reprocess-2026-04-27/running-vllm-on-cpu-only-nodes-for-exper

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

[ai] Running vLLM on CPU-only nodes for experimental LLM serving

9d31a10

jing2uo temporarily deployed to translate May 2, 2026 01:52 — with GitHub Actions Inactive

[ai] Running vLLM on CPU-only nodes for experimental LLM serving

182f545

jing2uo temporarily deployed to translate May 2, 2026 12:36 — with GitHub Actions Inactive

[ai] Deploying a CPU-only vLLM inference workload on ACP

5556054

jing2uo changed the title ~~[ai] Running vLLM on CPU-only nodes for experimental LLM serving~~ [ai] Deploying a CPU-only vLLM inference workload on ACP May 17, 2026

jing2uo temporarily deployed to translate May 17, 2026 03:28 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ai] Deploying a CPU-only vLLM inference workload on ACP#588

[ai] Deploying a CPU-only vLLM inference workload on ACP#588
jing2uo wants to merge 3 commits into
mainfrom
kb/reprocess-2026-04-27/running-vllm-on-cpu-only-nodes-for-exper

jing2uo commented May 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented May 2, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jing2uo commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented May 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Poem

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jing2uo commented May 2, 2026 •

edited

Loading

coderabbitai Bot commented May 2, 2026 •

edited

Loading