fix(model-engine): remediate Trivy vulnerability findings #818

Open
scale-ballen wants to merge 4 commits into main from sec/model-engine-trivy-vuln-fixes

scale-ballen commented May 1, 2026

Summary

  • Raise vulnerable model-engine Python dependencies to Trivy-fixed versions and regenerate requirements.txt
  • Build kubectl from Kubernetes v1.35.4 so the embedded github.com/moby/spdystream dependency is fixed
  • Remove pip from the runtime venv after installation so runtime scans no longer report pip CVEs

Verification

  • docker build -f model-engine/Dockerfile -t model-engine:trivy-remediation-local .
  • Runtime smoke checks passed: upgraded dependency imports, fixed package versions, pip absent, kubectl version --client=true --output=yaml reports v1.35.4
  • FastAPI /healthcheck smoke test passed in the rebuilt image with local fake AWS config
  • trivy image --scanners vuln --list-all-pkgs --format json --output trivy-model-engine-remediation-2026-05-01/model-engine-trivy-remediation-local-vuln-all-pkgs.json --timeout 30m model-engine:trivy-remediation-local
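The runtime smoke checks listed above could be scripted roughly as follows. This is a minimal sketch, not the script used for this PR: the helper names (`module_available`, `installed_version`, `smoke_report`) are hypothetical, and it only covers the "pip absent" and "package versions" checks, using the standard library.

```python
import importlib.util
from importlib import metadata


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


def installed_version(dist: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def smoke_report(dists):
    """Collect pip presence and installed versions for the given distributions."""
    return {
        "pip_present": module_available("pip"),
        "versions": {d: installed_version(d) for d in dists},
    }
```

Run inside the rebuilt image's venv, something like `smoke_report(["transformers", "huggingface-hub"])` would be expected to show `pip_present` as `False` and the bumped versions, assuming pip was uninstalled as described.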

Trivy Result

  • wolfi OS packages: 25 packages, 0 vulnerabilities
  • Python packages: 220 packages, 0 vulnerabilities
  • usr/local/bin/aws-iam-authenticator: 90 packages, 0 vulnerabilities
  • usr/local/bin/kubectl: 82 packages, 0 vulnerabilities
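A per-target summary like the one above can be derived from the JSON report written by the `trivy image` command. The sketch below assumes the report has a top-level `Results` list whose entries carry `Target`, `Packages` (present because of `--list-all-pkgs`), and an optional `Vulnerabilities` list; verify those field names against the Trivy version in use. The function names are hypothetical.

```python
import json
from pathlib import Path


def summarize_trivy(report: dict) -> list:
    """Return one (target, package_count, vulnerability_count) tuple per result."""
    rows = []
    for result in report.get("Results") or []:
        # Either key may be missing or null when a target has no entries.
        packages = result.get("Packages") or []
        vulns = result.get("Vulnerabilities") or []
        rows.append((result.get("Target", "?"), len(packages), len(vulns)))
    return rows


def load_report(path: str) -> dict:
    """Load a Trivy JSON report from disk."""
    return json.loads(Path(path).read_text())
```

A gate for CI could then fail the build when any tuple has a nonzero third element.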

Greptile Summary

This PR remediates Trivy vulnerability findings in the model-engine image in three ways: it bumps a range of Python dependencies to patched versions, upgrades kubectl from v1.35.3 to v1.35.4 (fixing the transitive moby/spdystream CVE), and removes pip from the runtime venv after installation so that Trivy no longer reports pip CVEs at scan time. Two code changes accompany the dependency churn. First, SPIECE_UNDERLINE is inlined as "\u2581" because it was dropped from the transformers public API in 5.x. Second, the HF-repo fallback logic in live_tokenizer_repository.py is cleaned up to avoid the anti-pattern of raising RepositoryNotFoundError only to catch it immediately.

Confidence Score: 5/5

Safe to merge; all findings are P2 style/behavioral notes, no logic defects introduced by the PR.

The changes are security-focused version bumps with verified Trivy results and smoke-test confirmation. The only notable risk is that the transformers 4.x → 5.x major version jump could subtly change tokenization behavior, but this is a P2 observation: no current defect is demonstrated. All other changes are straightforward version increments or minor code cleanups.

model-engine/requirements.in and model-engine/requirements.txt warrant attention due to the transformers 4.x → 5.x and huggingface-hub 0.x → 1.x major version jumps.

Important Files Changed

| Filename | Overview |
| --- | --- |
| model-engine/Dockerfile | Bumps pip to 26.1, adds pip uninstall after package installation to remove CVE surface from runtime, and bumps kubectl build tag from v1.35.3 to v1.35.4 |
| model-engine/model_engine_server/inference/tensorrt-llm/triton_model_repo/postprocessing/1/model.py | Removes SPIECE_UNDERLINE import from transformers (removed from public API in 5.x) and inlines the correct Unicode constant U+2581 |
| model-engine/model_engine_server/infra/repositories/live_tokenizer_repository.py | Refactors HF repo fallback logic: eliminates the anti-pattern of raising RepositoryNotFoundError only to catch it immediately, and now branches cleanly on hf_repo presence with equivalent semantics |
| model-engine/requirements.in | Multiple security-driven version bumps including a major transformers 4.x → 5.x upgrade; also pins flask, mako, pygments, filelock, h2, marshmallow, zipp as direct deps for CVE remediation |
| model-engine/requirements.txt | Regenerated lock file reflecting all bumped versions; notable: transformers 4.55.4 → 5.7.0 and huggingface-hub 0.36.2 → 1.13.0 (both major version changes) |
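For illustration, inlining the constant removes the dependency on the transformers import. The sketch below is not the code from model.py: `detokenize_piece` is a hypothetical helper showing one typical use of the marker, and only the constant's value (U+2581) comes from this PR.

```python
# U+2581 (LOWER ONE EIGHTH BLOCK) is the character SentencePiece uses to
# mark a word boundary; defining it locally avoids importing it from transformers.
SPIECE_UNDERLINE = "\u2581"


def detokenize_piece(piece: str) -> str:
    """Hypothetical helper: turn the word-boundary marker back into a space."""
    return piece.replace(SPIECE_UNDERLINE, " ")
```

Because the value is a single fixed code point, a local definition behaves identically across transformers versions.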

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Docker Build - builder stage] --> B[pip install deps from requirements.txt]
    B --> C[pip install -r requirements_override.txt]
    C --> D[pip install -e . model-engine]
    D --> E[pip uninstall -y pip\nremoves pip CVE surface]
    E --> F[Build kubectl v1.35.4\nfixes moby/spdystream vuln]
    F --> G[Runtime image - model-engine stage]
    G --> H[Copy venv without pip]
    G --> I[Copy kubectl binary]

    J[live_tokenizer_repository.py] --> K{hf_repo set?}
    K -- Yes --> L[list_repo_refs on HF Hub]
    L -- Found --> M[Use HF repo directly]
    L -- RepositoryNotFoundError --> N[_load_tokenizer_from_s3]
    K -- No --> N

    O[model.py - Triton postprocessing] --> P[Use local SPIECE_UNDERLINE = U+2581\ninstead of transformers import]
```
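The tokenizer branch of the flowchart can be sketched in Python as below. This is an assumption-laden illustration, not the code in live_tokenizer_repository.py: `RepoNotFound` stands in for huggingface_hub's RepositoryNotFoundError, and `resolve_tokenizer_source`, `repo_exists_check`, and `load_from_s3` are hypothetical names injected as callables so that the sketch runs without network access.

```python
from typing import Callable, Optional


class RepoNotFound(Exception):
    """Stand-in for huggingface_hub's RepositoryNotFoundError (assumption)."""


def resolve_tokenizer_source(
    hf_repo: Optional[str],
    repo_exists_check: Callable[[str], object],
    load_from_s3: Callable[[], str],
) -> str:
    """Prefer the HF repo when it is set and reachable; otherwise use S3."""
    if hf_repo:
        try:
            # In the real code this role is played by list_repo_refs,
            # which raises when the repository does not exist.
            repo_exists_check(hf_repo)
            return hf_repo
        except RepoNotFound:
            pass  # fall through to the S3 path
    return load_from_s3()
```

The point of the refactor is visible here: when `hf_repo` is unset, control goes straight to the S3 loader instead of raising an exception purely to reach the `except` branch.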

Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
model-engine/requirements.in:71
**Major version bump: transformers 4.x → 5.x**

`transformers` 5.x consolidated slow (Python/SentencePiece) and fast (Rust) tokenizer backends into a single implementation per model, with the Rust backend now the default. This changes the default code path for `AutoTokenizer.from_pretrained` in `live_tokenizer_repository.py`. One reported consequence is that `LlamaTokenizer` in v5 overrides `tokenizer.json`'s `ByteLevel` pre-tokenizer with `Metaspace`, silently producing different tokenization for models like the DeepSeek V3/R1 family ([transformers#45488](https://github.com/huggingface/transformers/issues/45488)). Smoke tests covering a single healthcheck endpoint may not catch per-token output differences on production model traffic.
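One way to look for the per-token differences described above is to encode a fixed prompt set with both tokenizer versions and compare the IDs. The sketch below is a hypothetical harness, not part of this PR: `find_tokenization_diffs` and the `Encoder` alias are illustrative names, and the encoders are plain callables so the comparison logic can run without transformers installed.

```python
from typing import Callable, Iterable, List, Tuple

# An encoder maps a prompt string to a list of token IDs.
Encoder = Callable[[str], List[int]]


def find_tokenization_diffs(
    old_encode: Encoder,
    new_encode: Encoder,
    prompts: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (prompt, old_ids, new_ids) for every prompt whose IDs differ."""
    diffs = []
    for prompt in prompts:
        old_ids = old_encode(prompt)
        new_ids = new_encode(prompt)
        if old_ids != new_ids:
            diffs.append((prompt, old_ids, new_ids))
    return diffs
```

Since two major versions of transformers cannot be imported in one process, the old IDs would likely need to be captured to a file under 4.x and replayed as `old_encode`; with a loaded tokenizer `tok`, the new side could be `lambda s: tok.encode(s, add_special_tokens=False)`.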

Reviews (4): Last reviewed commit: "fix(model-engine): preserve tokenizer s3..."

scale-ballen requested a review from lilyz-ai May 1, 2026 21:46