Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes #744
Merged
30 commits
535dfd7
add support for on-prem
tarunravi 02fa305
clean up on-prem artificats
tarunravi eff3dbb
add back comments from initial code
tarunravi 086a2e6
fix lint
tarunravi efeba0d
use ecr image repo:tag directly
tarunravi 5d25267
fix: isort import ordering
tarunravi 4a7ebc5
fix: remove unused infra_config import
tarunravi 871a73d
fix: mypy type annotation errors
tarunravi 0954737
fix: remove type annotation causing mypy no-redef error
tarunravi 74b29b0
fix: mypy type errors in s3_utils.py and io.py - use botocore.config.…
tarunravi bf5a1f4
fix: mypy typeddict-item errors - use broad type ignore
tarunravi 84b153d
fix: update test mocks to use get_s3_resource from s3_utils
tarunravi c37a109
test: add unit tests for s3_utils, onprem_docker_repository, and onpr…
tarunravi bae3472
style: format test files with black
tarunravi 7e6dae7
refactor: use filesystem_gateway abstraction for S3 operations
tarunravi fd0de42
fix: deduplicate S3 client config by using centralized s3_utils
tarunravi f66ab7f
fix: add pagination to list_objects to handle >1000 objects
tarunravi 4f757fa
fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR
tarunravi 2bef11c
refactor: add explicit on-prem branches in dependencies.py for clarity
tarunravi f9d13fe
feat: implement Redis LLEN for queue depth in OnPremQueueEndpointReso…
tarunravi 02dfdd0
fix: replace mutable default argument with None in _get_client
tarunravi 8c2fc5b
refactor: extract inline import to module-level helper function
tarunravi 7bfe43f
fix: reduce excessive debug logging in s3_utils
tarunravi 384b2ed
chore: remove unused TYPE_CHECKING import
tarunravi db22a1f
fix: make Dockerfile multi-arch compatible for ARM/AMD64
tarunravi e818ae4
style: fix black formatting in test_onprem_queue_endpoint_resource_de…
tarunravi 16fbe03
fix: restore AWS_PROFILE env var fallback in s3_utils
tarunravi ea587f6
fix: correct isort ordering in s3_filesystem_gateway.py
tarunravi f592c18
fix: use Literal type for s3 addressing_style to satisfy mypy
tarunravi 3a30bb2
Onprem Compatibility Change
charlesahn-scale
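
One of the commits above adds pagination to `list_objects` so that listings of more than 1000 objects are handled. The diff for that change is not shown here, so the following is only a minimal sketch of the general technique, assuming a boto3-style S3 client whose `list_objects_v2` call returns at most 1000 keys per response along with `IsTruncated` and `NextContinuationToken`. The function name `iter_object_keys` is hypothetical and is not taken from the PR.

```python
from typing import Any, Dict, Iterator, Optional


def iter_object_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following continuation tokens.

    `client` is assumed to expose a boto3-style `list_objects_v2` method.
    This helper is illustrative only and is not the PR's implementation.
    """
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            # Resume where the previous page stopped.
            kwargs["ContinuationToken"] = token
        page = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no matching objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        token = page["NextContinuationToken"]
```

A caller would wrap this as `list(iter_object_keys(s3_client, "model-engine", "some/prefix/"))`. Using a generator keeps memory flat for large buckets, at the cost of one request per page.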
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # On-premise deployment configuration | ||
| # This configuration file provides defaults for on-prem deployments | ||
| # Many values can be overridden via environment variables | ||
| | ||
| cloud_provider: "onprem" | ||
| env: "production" # Can be: production, staging, development, local | ||
| k8s_cluster_name: "onprem-cluster" | ||
| dns_host_domain: "ml.company.local" | ||
| default_region: "us-east-1" # Placeholder for compatibility with cloud-agnostic code | ||
| | ||
| # ==================== | ||
| # Object Storage (MinIO/S3-compatible) | ||
| # ==================== | ||
| s3_bucket: "model-engine" | ||
| # S3 endpoint URL - can be overridden by S3_ENDPOINT_URL env var | ||
| # Examples: "https://minio.company.local", "http://minio-service:9000" | ||
| s3_endpoint_url: "" # Set via S3_ENDPOINT_URL env var if not specified here | ||
| # MinIO requires path-style addressing (bucket in URL path, not subdomain) | ||
| s3_addressing_style: "path" | ||
| | ||
| # ==================== | ||
| # Redis Configuration | ||
| # ==================== | ||
| # Redis is used for: | ||
| # - Celery task queue broker | ||
| # - Model endpoint caching | ||
| # - Inference autoscaling metrics | ||
| redis_host: "" # Set via REDIS_HOST env var (e.g., "redis.company.local" or "redis-service") | ||
| redis_port: 6379 | ||
| # Whether to use Redis as Celery broker (true for on-prem) | ||
| celery_broker_type_redis: true | ||
| | ||
| # ==================== | ||
| # Celery Configuration | ||
| # ==================== | ||
| # Backend protocol: "redis" for on-prem (not "s3" or "abs") | ||
| celery_backend_protocol: "redis" | ||
| | ||
| # ==================== | ||
| # Database Configuration | ||
| # ==================== | ||
| # Database connection settings (credentials from environment variables) | ||
| # DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD | ||
| db_host: "postgres" # Default hostname, can be overridden by DB_HOST env var | ||
| db_port: 5432 | ||
| db_name: "llm_engine" | ||
| db_engine_pool_size: 20 | ||
| db_engine_max_overflow: 10 | ||
| db_engine_echo: false | ||
| db_engine_echo_pool: false | ||
| db_engine_disconnect_strategy: "pessimistic" | ||
| | ||
| # ==================== | ||
| # Docker Registry Configuration | ||
| # ==================== | ||
| # Docker registry prefix for container images | ||
| # Examples: "registry.company.local", "harbor.company.local/ml-platform" | ||
| # Leave empty if using full image paths directly | ||
| docker_repo_prefix: "registry.company.local" | ||
| | ||
| # ==================== | ||
| # Monitoring & Observability | ||
| # ==================== | ||
| # Prometheus server address for metrics (optional) | ||
| # prometheus_server_address: "http://prometheus:9090" | ||
| | ||
| # ==================== | ||
| # Not applicable for on-prem (kept for compatibility) | ||
| # ==================== | ||
| ml_account_id: "onprem" | ||
| profile_ml_worker: "default" | ||
| profile_ml_inference_worker: "default" | ||
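
The comments in this config say that `s3_endpoint_url` can be overridden by the `S3_ENDPOINT_URL` env var and that MinIO requires path-style addressing. A minimal sketch of how those settings might be resolved is below. It assumes the environment variable takes precedence over the file value, and the helper name `resolve_s3_settings` is hypothetical; the PR's actual `s3_utils` code is not shown in this diff.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Addressing styles accepted by botocore's S3 config.
_VALID_STYLES = ("path", "virtual", "auto")


def resolve_s3_settings(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Merge file config with env overrides (illustrative, not the PR's code)."""
    env = os.environ if env is None else env
    # Assumption: a non-empty S3_ENDPOINT_URL wins over the config file.
    # An empty string in either place is treated as "not set".
    endpoint = env.get("S3_ENDPOINT_URL") or config.get("s3_endpoint_url") or None
    style = config.get("s3_addressing_style") or "auto"
    if style not in _VALID_STYLES:
        raise ValueError(f"unsupported s3_addressing_style: {style!r}")
    return {
        "bucket": config["s3_bucket"],
        "endpoint_url": endpoint,
        "addressing_style": style,
    }
```

With boto3 installed, the result could be passed along as `boto3.client("s3", endpoint_url=settings["endpoint_url"], config=botocore.config.Config(s3={"addressing_style": settings["addressing_style"]}))`. Leaving `endpoint_url` as `None` lets the client fall back to its default endpoint resolution.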
nit