Fix/gcs check hashes#821
Closed
postevanus-scale wants to merge 3 commits intomainfrom
Closed
Conversation
…s without google-crc32c
`gcloud storage rsync` and `gcloud storage cp` default to `check_hashes=always`,
which requires the `google-crc32c` library to compute CRC32C hashes of downloaded
files. The inference container that runs the `load_model_weights_sub_commands_gcs`
and `load_model_files_sub_commands_trt_llm` (gs://) chains does not have that lib
installed, so every download task fails with:
ERROR: Task '...safetensors' failed: This copy was skipped since fast hash
calculation tools are not installed.
The download still proceeds (TCP-level integrity is intact), but gcloud refuses
to mark each file as completed and the rsync exits non-zero — the inference pod
then falls through into vLLM/TRT startup with an empty `model_files` directory.
Setting `storage/check_hashes=if_fast_else_skip` lets gcloud skip the optional
integrity check when the fast hash lib isn't available, while preserving it
where it is. End-to-end verified on a GKE-deployed gpt-oss-20b vLLM endpoint:
the rsync now completes (~1.7 GiB/s in-region), vLLM loads the safetensors
shards, and the pod transitions to Ready. Reproduced and fixed without
modifying the inference image.
Existing `test_load_model_weights_sub_commands` (GCS branch) and
`test_load_model_files_sub_commands_trt_llm_gcs` updated to expect the new
config line.
Collaborator
Author
|
Closing this duplicate PR with #820 |
auto-merge was automatically disabled
May 6, 2026 09:06
Pull request was closed
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Pull Request Summary
What is this PR changing? Why is this change being made? Any caveats you'd like to highlight? Link any relevant documents, links, or screenshots here if applicable.
Test Plan and Usage Guide
How did you validate that your PR works correctly? How do you run or demo the code? Provide enough detail so a reviewer can reasonably reproduce the testing procedure. Paste example command line invocations if applicable.
Greptile Summary
This PR fixes GCS checkpoint/weight downloads that were failing because the inference image lacks the
google-crc32clibrary required bygcloud storage's defaultcheck_hashes=alwaysmode. The fix appendsgcloud config set storage/check_hashes if_fast_else_skipto the SDK install command in bothload_model_weights_sub_commands_gcsandload_model_files_sub_commands_trt_llm, with corresponding test updates in all three affected test cases.Confidence Score: 5/5
Safe to merge — minimal, focused fix with full test coverage and clear inline documentation.
No P0/P1 issues found. Both affected code paths are updated consistently, tests cover all three GCS cases, and the code comments clearly explain the rationale. The chosen
if_fast_else_skipvalue is appropriate for environments withoutgoogle-crc32c.No files require special attention.
Important Files Changed
gcloud config set storage/check_hashes if_fast_else_skipto both GCS download paths; fix is well-targeted and the string concatenation is correct.check_hashesconfig command in the expected subcommand strings.Sequence Diagram
sequenceDiagram participant Pod participant GCS Pod->>Pod: curl | tar → install gcloud SDK Pod->>Pod: gcloud config set disable_usage_reporting true Pod->>Pod: gcloud config set storage/check_hashes if_fast_else_skip Pod->>GCS: gcloud storage rsync / cp (checkpoint files) GCS-->>Pod: checkpoint files (hash check skipped if no CRC32c lib)Reviews (1): Last reviewed commit: "docs(gcs): tighten check_hashes rational..." | Re-trigger Greptile