fix(model-engine): remediate Trivy vulnerability findings by scale-ballen · Pull Request #818 · scaleapi/llm-engine

scale-ballen · 2026-05-01T20:36:40Z

Summary

Raise vulnerable model-engine Python dependencies to Trivy-fixed versions and regenerate requirements.txt
Build kubectl from Kubernetes v1.35.4 so the embedded github.com/moby/spdystream dependency is fixed
Remove pip from the runtime venv after installation so runtime scans no longer report pip CVEs

Verification

docker build -f model-engine/Dockerfile -t model-engine:trivy-remediation-local .
Runtime smoke checks passed: upgraded dependency imports, fixed package versions, pip absent, kubectl version --client=true --output=yaml reports v1.35.4
FastAPI /healthcheck smoke test passed in the rebuilt image with local fake AWS config
trivy image --scanners vuln --list-all-pkgs --format json --output trivy-model-engine-remediation-2026-05-01/model-engine-trivy-remediation-local-vuln-all-pkgs.json --timeout 30m model-engine:trivy-remediation-local
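
The runtime smoke checks listed above (pip absent, fixed package versions) could be scripted roughly as follows. This is a minimal sketch and not the script used for this PR. The function names are hypothetical, and the package names in the usage section are examples taken from the dependency list.

```python
import importlib.util
from importlib import metadata


def pip_is_absent() -> bool:
    """Return True when the `pip` module cannot be found by this interpreter."""
    return importlib.util.find_spec("pip") is None


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example package names only; adjust to the dependencies being verified.
    print("pip absent:", pip_is_absent())
    for name, version in installed_versions(
        ["transformers", "huggingface-hub", "cryptography"]
    ).items():
        print(f"{name}: {version}")
```

Run with the venv's own interpreter inside the rebuilt image, so the check reflects the runtime environment and not the host.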

Trivy Result

wolfi OS packages: 25 packages, 0 vulnerabilities
Python packages: 220 packages, 0 vulnerabilities
usr/local/bin/aws-iam-authenticator: 90 packages, 0 vulnerabilities
usr/local/bin/kubectl: 82 packages, 0 vulnerabilities
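
Per-target counts like these can be recomputed from the JSON report written by the `trivy image` command. The sketch below assumes the report has a top-level `Results` list whose entries carry `Target`, `Packages` (present with `--list-all-pkgs`), and an optional `Vulnerabilities` list. The function name is hypothetical.

```python
import json


def summarize_trivy_report(report: dict) -> dict:
    """Return {target: (package_count, vulnerability_count)} for a Trivy JSON report."""
    summary = {}
    for result in report.get("Results") or []:
        target = result.get("Target", "<unknown>")
        # Either key may be missing or null when there is nothing to report.
        packages = result.get("Packages") or []
        vulns = result.get("Vulnerabilities") or []
        summary[target] = (len(packages), len(vulns))
    return summary


def load_and_summarize(path: str) -> dict:
    """Read a Trivy JSON report from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_trivy_report(json.load(handle))
```

Calling `load_and_summarize` on the `--output` path from the Verification section should give one entry per scanned target.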

Greptile Summary

This PR remediates Trivy vulnerability findings in the model-engine image by bumping a range of Python dependencies to patched versions, upgrading kubectl from v1.35.3 to v1.35.4 (fixing the moby/spdystream transitive CVE), and removing pip from the runtime venv post-install so Trivy no longer reports pip CVEs at scan time. Two code changes accompany the dependency bumps: SPIECE_UNDERLINE is inlined as "\u2581" because it was dropped from the transformers public API in 5.x, and the HF-repo fallback logic in live_tokenizer_repository.py no longer raises RepositoryNotFoundError only to catch it immediately.
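
The fallback described above can be illustrated with a small sketch. It is not the code in live_tokenizer_repository.py: `resolve_tokenizer_source` and its callable parameters are hypothetical, and the exception class is a local stand-in for the huggingface_hub exception of the same name so the example runs without network access.

```python
from typing import Callable, Optional


class RepositoryNotFoundError(Exception):
    """Local stand-in for huggingface_hub's exception of the same name."""


def resolve_tokenizer_source(
    hf_repo: Optional[str],
    check_repo: Callable[[str], object],
    load_from_s3: Callable[[], str],
) -> str:
    """Prefer the HF repo when it is set and reachable, otherwise fall back to S3."""
    if hf_repo:
        try:
            # In the real code this would be a Hub lookup such as list_repo_refs.
            check_repo(hf_repo)
            return hf_repo
        except RepositoryNotFoundError:
            pass  # Repo is configured but missing on the Hub: use S3.
    # Reached when hf_repo is unset or the Hub lookup failed.
    return load_from_s3()
```

The exception is raised only by the lookup itself, so the S3 path is shared by the "no repo configured" and "repo not found" cases without raising anything artificially.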

Confidence Score: 5/5

Safe to merge; all findings are P2 style/behavioral notes, no logic defects introduced by the PR.

The changes are security-focused version bumps with verified Trivy results and smoke-test confirmation. The only notable risk is the transformers 4.x → 5.x major version jump's potential for subtle tokenization behavioral changes, but this is a P2 observation — no current defect is demonstrated. All other changes are straightforward version increments or minor code cleanups.

model-engine/requirements.in and model-engine/requirements.txt warrant attention due to the transformers 4.x → 5.x and huggingface-hub 0.x → 1.x major version jumps.

Important Files Changed

Filename	Overview
model-engine/Dockerfile	Bumps pip to 26.1, adds pip uninstall after package installation to remove CVE surface from runtime, and bumps kubectl build tag from v1.35.3 to v1.35.4
model-engine/model_engine_server/inference/tensorrt-llm/triton_model_repo/postprocessing/1/model.py	Removes SPIECE_UNDERLINE import from transformers (removed from public API in 5.x) and inlines the correct Unicode constant U+2581
model-engine/model_engine_server/infra/repositories/live_tokenizer_repository.py	Refactors HF repo fallback logic: eliminates the anti-pattern of raising RepositoryNotFoundError immediately to catch it, now cleanly branches on hf_repo presence with equivalent semantics
model-engine/requirements.in	Multiple security-driven version bumps including a major transformers 4.x → 5.x upgrade; also pins flask, mako, pygments, filelock, h2, marshmallow, zipp as direct deps for CVE remediation
model-engine/requirements.txt	Regenerated lock file reflecting all bumped versions; notable: transformers 4.55.4 → 5.7.0 and huggingface-hub 0.36.2 → 1.13.0 (both major version changes)
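
For context on the inlined constant in model.py: U+2581 is the character SentencePiece uses to mark a word boundary. The sketch below defines the constant locally, as the PR does, and shows one way such a marker might be consumed. `pieces_to_text` is a hypothetical helper and does not appear in the PR.

```python
# Local definition, so nothing depends on a transformers export.
SPIECE_UNDERLINE = "\u2581"  # "▁", the SentencePiece word-boundary marker


def pieces_to_text(pieces):
    """Join SentencePiece-style pieces and turn boundary markers into spaces."""
    return "".join(pieces).replace(SPIECE_UNDERLINE, " ").lstrip()
```

For example, `pieces_to_text(["▁Hello", "▁wor", "ld"])` returns `"Hello world"`.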

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Docker Build - builder stage] --> B[pip install deps from requirements.txt]
    B --> C[pip install -r requirements_override.txt]
    C --> D[pip install -e . model-engine]
    D --> E[pip uninstall -y pip\nremoves pip CVE surface]
    E --> F[Build kubectl v1.35.4\nfixes moby/spdystream vuln]
    F --> G[Runtime image - model-engine stage]
    G --> H[Copy venv without pip]
    G --> I[Copy kubectl binary]

    J[live_tokenizer_repository.py] --> K{hf_repo set?}
    K -- Yes --> L[list_repo_refs on HF Hub]
    L -- Found --> M[Use HF repo directly]
    L -- RepositoryNotFoundError --> N[_load_tokenizer_from_s3]
    K -- No --> N

    O[model.py - Triton postprocessing] --> P[Use local SPIECE_UNDERLINE = U+2581\ninstead of transformers import]

Prompt To Fix All With AI

Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
model-engine/requirements.in:71
**Major version bump: transformers 4.x → 5.x**

`transformers` 5.x consolidated slow (Python/SentencePiece) and fast (Rust) tokenizer backends into a single implementation per model, with the Rust backend now the default. This changes the default code path for `AutoTokenizer.from_pretrained` in `live_tokenizer_repository.py`. One reported consequence is that `LlamaTokenizer` in v5 overrides `tokenizer.json`'s `ByteLevel` pre-tokenizer with `Metaspace`, silently producing different tokenization for models like the DeepSeek V3/R1 family ([transformers#45488](https://github.com/huggingface/transformers/issues/45488)). Smoke tests covering a single healthcheck endpoint may not catch per-token output differences on production model traffic.
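
One way to catch this kind of silent drift is to compare token IDs against a golden set recorded before the upgrade. The sketch below is an assumption about how such a check could look and is not part of this PR. `find_tokenization_drift` is a hypothetical helper, and `encode` stands for any callable that maps text to token IDs (for instance a tokenizer's `encode` method).

```python
from typing import Callable, Dict, List, Sequence, Tuple


def find_tokenization_drift(
    encode: Callable[[str], Sequence[int]],
    golden: Dict[str, Sequence[int]],
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Return {text: (expected_ids, actual_ids)} for every prompt whose IDs changed."""
    drift = {}
    for text, expected_ids in golden.items():
        actual_ids = list(encode(text))
        if actual_ids != list(expected_ids):
            drift[text] = (list(expected_ids), actual_ids)
    return drift


if __name__ == "__main__":
    # Toy encoder used only to make the sketch runnable without model files.
    toy_encode = lambda text: [ord(ch) for ch in text]
    golden_ids = {"ab": [97, 98], "cd": [99, 101]}  # second entry is deliberately stale
    print(find_tokenization_drift(toy_encode, golden_ids))
```

In practice the golden IDs would be captured under transformers 4.x for a set of representative prompts per model family, then checked against the 5.x tokenizer before rollout.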

Reviews (4): Last reviewed commit: "fix(model-engine): preserve tokenizer s3..."

socket-security · 2026-05-01T20:46:36Z

Review the following changes in direct dependencies.

Package	Supply Chain Security	Vulnerability
huggingface-hub@0.36.2 ⏵ 1.13.0	⁺¹⁰
pygments@2.15.1 ⏵ 2.20.0	⁺¹	⁺¹
mako@1.2.4 ⏵ 1.3.12	⁺¹	⁺²
azure-identity@1.15.0 ⏵ 1.25.3	⁺¹	⁺²
flask@3.0.3 ⏵ 3.1.3		⁺¹
gcloud-aio-auth@5.4.2 ⏵ 5.4.4
quart@0.19.9 ⏵ 0.20.0	⁺¹	⁺²
msal-extensions@1.1.0 ⏵ 1.3.1
typer@0.23.1
filelock@3.13.1 ⏵ 3.29.0		⁺³
python-multipart@0.0.22 ⏵ 0.0.27	⁺¹	⁺²
tokenizers@0.21.4 ⏵ 0.22.2	⁺¹
h2@4.1.0 ⏵ 4.3.0	⁺¹	⁺²
regex@2023.10.3 ⏵ 2026.4.4	⁺¹
zipp@3.16.0 ⏵ 3.23.1		⁺²
cryptography@46.0.5 ⏵ 47.0.0	⁺¹	⁺³
hf-xet@1.4.2 ⏵ 1.4.3
marshmallow@3.19.0 ⏵ 3.26.2	⁺¹	⁺²
hpack@4.0.0 ⏵ 4.1.0	⁺¹
blinker@1.6.2 ⏵ 1.9.0
hyperframe@6.0.1 ⏵ 6.1.0
itsdangerous@2.1.2 ⏵ 2.2.0

scale-ballen added 2 commits May 1, 2026 16:36

fix(model-engine): remediate trivy vulnerability findings

61fbc4d

fix(model-engine): address review feedback

54d564d

scale-ballen added 2 commits May 1, 2026 16:53

fix(model-engine): avoid internal transformers spiece export

e72dcc0

fix(model-engine): preserve tokenizer s3 fallback

73c2e68

scale-ballen requested a review from lilyz-ai May 1, 2026 21:46

lilyz-ai approved these changes May 1, 2026

scale-ballen wants to merge 4 commits into main from sec/model-engine-trivy-vuln-fixes
