Fix/gcs check hashes by postevanus-scale · Pull Request #821 · scaleapi/llm-engine

postevanus-scale · 2026-05-06T08:39:58Z

Pull Request Summary

What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.

Test Plan and Usage Guide

How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.

Greptile Summary

This PR fixes GCS checkpoint/weight downloads that were failing because the inference image lacks the google-crc32c library required by gcloud storage's default check_hashes=always mode. The fix appends gcloud config set storage/check_hashes if_fast_else_skip to the SDK install command in both load_model_weights_sub_commands_gcs and load_model_files_sub_commands_trt_llm, with corresponding test updates in all three affected test cases.

Confidence Score: 5/5

Safe to merge — minimal, focused fix with full test coverage and clear inline documentation.

No P0/P1 issues found. Both affected code paths are updated consistently, tests cover all three GCS cases, and the code comments clearly explain the rationale. The chosen if_fast_else_skip value is appropriate for environments without google-crc32c.

No files require special attention.

Important Files Changed

Filename	Overview
model-engine/model_engine_server/domain/use_cases/llm_model_endpoint_use_cases.py	Adds `gcloud config set storage/check_hashes if_fast_else_skip` to both GCS download paths; fix is well-targeted and the string concatenation is correct.
model-engine/tests/unit/domain/test_llm_use_cases.py	All three GCS test cases updated to include the new `check_hashes` config command in the expected subcommand strings.

Sequence Diagram

sequenceDiagram
    participant Pod
    participant GCS

    Pod->>Pod: curl | tar → install gcloud SDK
    Pod->>Pod: gcloud config set disable_usage_reporting true
    Pod->>Pod: gcloud config set storage/check_hashes if_fast_else_skip
    Pod->>GCS: gcloud storage rsync / cp (checkpoint files)
    GCS-->>Pod: checkpoint files (hash check skipped if no CRC32c lib)

_{Reviews (1): Last reviewed commit: "docs(gcs): tighten check_hashes rational..." | Re-trigger Greptile}

…s without google-crc32c `gcloud storage rsync` and `gcloud storage cp` default to `check_hashes=always`, which requires the `google-crc32c` library to compute CRC32C hashes of downloaded files. The inference container that runs the `load_model_weights_sub_commands_gcs` and `load_model_files_sub_commands_trt_llm` (gs://) chains does not have that lib installed, so every download task fails with: ERROR: Task '...safetensors' failed: This copy was skipped since fast hash calculation tools are not installed. The download still proceeds (TCP-level integrity is intact), but gcloud refuses to mark each file as completed and the rsync exits non-zero — the inference pod then falls through into vLLM/TRT startup with an empty `model_files` directory. Setting `storage/check_hashes=if_fast_else_skip` lets gcloud skip the optional integrity check when the fast hash lib isn't available, while preserving it where it is. End-to-end verified on a GKE-deployed gpt-oss-20b vLLM endpoint: the rsync now completes (~1.7 GiB/s in-region), vLLM loads the safetensors shards, and the pod transitions to Ready. Reproduced and fixed without modifying the inference image. Existing `test_load_model_weights_sub_commands` (GCS branch) and `test_load_model_files_sub_commands_trt_llm_gcs` updated to expect the new config line.

postevanus-scale · 2026-05-06T09:06:31Z

Closing this duplicate PR with #820

postevanus-scale added 3 commits May 5, 2026 16:24

Merge branch 'main' into fix/gcs-check-hashes

43535d8

docs(gcs): tighten check_hashes rationale comments

77bdf32

postevanus-scale requested review from arniechops and lilyz-ai May 6, 2026 08:40

postevanus-scale enabled auto-merge (squash) May 6, 2026 09:05

postevanus-scale closed this May 6, 2026

auto-merge was automatically disabled May 6, 2026 09:06
Pull request was closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/gcs check hashes#821

Fix/gcs check hashes#821
postevanus-scale wants to merge 3 commits intomainfrom
fix/gcs-check-hashes

postevanus-scale commented May 6, 2026 •

edited by greptile-apps Bot

Loading

Uh oh!

postevanus-scale commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

postevanus-scale commented May 6, 2026 • edited by greptile-apps Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Summary

Test Plan and Usage Guide

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

postevanus-scale commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

postevanus-scale commented May 6, 2026 •

edited by greptile-apps Bot

Loading