Model Engine OnPrem Support and vLLM 0.11.1 + Model Engine Integration Fixes #744
Merged
30 commits
535dfd7
add support for on-prem
TarunRavikumar 02fa305
clean up on-prem artifacts
TarunRavikumar eff3dbb
add back comments from initial code
TarunRavikumar 086a2e6
fix lint
TarunRavikumar efeba0d
use ecr image repo:tag directly
TarunRavikumar 5d25267
fix: isort import ordering
TarunRavikumar 4a7ebc5
fix: remove unused infra_config import
TarunRavikumar 871a73d
fix: mypy type annotation errors
TarunRavikumar 0954737
fix: remove type annotation causing mypy no-redef error
TarunRavikumar 74b29b0
fix: mypy type errors in s3_utils.py and io.py - use botocore.config.…
TarunRavikumar bf5a1f4
fix: mypy typeddict-item errors - use broad type ignore
TarunRavikumar 84b153d
fix: update test mocks to use get_s3_resource from s3_utils
TarunRavikumar c37a109
test: add unit tests for s3_utils, onprem_docker_repository, and onpr…
TarunRavikumar bae3472
style: format test files with black
TarunRavikumar 7e6dae7
refactor: use filesystem_gateway abstraction for S3 operations
TarunRavikumar fd0de42
fix: deduplicate S3 client config by using centralized s3_utils
TarunRavikumar f66ab7f
fix: add pagination to list_objects to handle >1000 objects
TarunRavikumar 4f757fa
fix: make OnPremDockerRepository.get_image_url consistent with ECR/ACR
TarunRavikumar 2bef11c
refactor: add explicit on-prem branches in dependencies.py for clarity
TarunRavikumar f9d13fe
feat: implement Redis LLEN for queue depth in OnPremQueueEndpointReso…
TarunRavikumar 02dfdd0
fix: replace mutable default argument with None in _get_client
TarunRavikumar 8c2fc5b
refactor: extract inline import to module-level helper function
TarunRavikumar 7bfe43f
fix: reduce excessive debug logging in s3_utils
TarunRavikumar 384b2ed
chore: remove unused TYPE_CHECKING import
TarunRavikumar db22a1f
fix: make Dockerfile multi-arch compatible for ARM/AMD64
TarunRavikumar e818ae4
style: fix black formatting in test_onprem_queue_endpoint_resource_de…
TarunRavikumar 16fbe03
fix: restore AWS_PROFILE env var fallback in s3_utils
TarunRavikumar ea587f6
fix: correct isort ordering in s3_filesystem_gateway.py
TarunRavikumar f592c18
fix: use Literal type for s3 addressing_style to satisfy mypy
TarunRavikumar 3a30bb2
Onprem Compatibility Change
charlesahn-scale
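The commit "fix: add pagination to list_objects to handle >1000 objects" refers to the fact that S3-compatible listing calls return at most 1000 keys per response. The PR's actual implementation is not shown on this page; the following is only a minimal sketch of the general technique, assuming a boto3-style client that exposes `list_objects_v2` with `ContinuationToken`, `IsTruncated`, and `NextContinuationToken`. The names `iter_object_keys` and `FakeS3Client` are hypothetical and exist only so the sketch runs without a real bucket.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_object_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following continuation tokens page by page."""
    token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
        if token is not None:
            kwargs["ContinuationToken"] = token
        page = client.list_objects_v2(**kwargs)
        # "Contents" may be absent when a page has no matching keys.
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        token = page["NextContinuationToken"]


class FakeS3Client:
    """Hypothetical in-memory stand-in that pages results the way S3 does."""

    def __init__(self, keys: List[str], page_size: int = 1000) -> None:
        self._keys = sorted(keys)
        self._page_size = page_size

    def list_objects_v2(
        self,
        Bucket: str,
        Prefix: str = "",
        ContinuationToken: Optional[str] = None,
    ) -> Dict[str, Any]:
        matching = [k for k in self._keys if k.startswith(Prefix)]
        # The fake token is simply the offset of the next page.
        start = int(ContinuationToken) if ContinuationToken else 0
        end = start + self._page_size
        page: Dict[str, Any] = {
            "Contents": [{"Key": k} for k in matching[start:end]],
            "IsTruncated": end < len(matching),
        }
        if page["IsTruncated"]:
            page["NextContinuationToken"] = str(end)
        return page
```

With a real boto3 client, `client.get_paginator("list_objects_v2")` would likely be the more idiomatic route; the explicit loop is shown here only to make the token handling visible.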
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # On-premise deployment configuration | ||
| # This configuration file provides defaults for on-prem deployments | ||
| # Many values can be overridden via environment variables | ||
| | ||
| cloud_provider: "onprem" | ||
| env: "production" # Can be: production, staging, development, local | ||
| k8s_cluster_name: "onprem-cluster" | ||
| dns_host_domain: "ml.company.local" | ||
| default_region: "us-east-1" # Placeholder for compatibility with cloud-agnostic code | ||
| | ||
| # ==================== | ||
| # Object Storage (MinIO/S3-compatible) | ||
| # ==================== | ||
| s3_bucket: "model-engine" | ||
| # S3 endpoint URL - can be overridden by S3_ENDPOINT_URL env var | ||
| # Examples: "https://minio.company.local", "http://minio-service:9000" | ||
| s3_endpoint_url: "" # Set via S3_ENDPOINT_URL env var if not specified here | ||
| # MinIO requires path-style addressing (bucket in URL path, not subdomain) | ||
| s3_addressing_style: "path" | ||
| | ||
| # ==================== | ||
| # Redis Configuration | ||
| # ==================== | ||
| # Redis is used for: | ||
| # - Celery task queue broker | ||
| # - Model endpoint caching | ||
| # - Inference autoscaling metrics | ||
| redis_host: "" # Set via REDIS_HOST env var (e.g., "redis.company.local" or "redis-service") | ||
| redis_port: 6379 | ||
| # Whether to use Redis as Celery broker (true for on-prem) | ||
| celery_broker_type_redis: true | ||
| | ||
| # ==================== | ||
| # Celery Configuration | ||
| # ==================== | ||
| # Backend protocol: "redis" for on-prem (not "s3" or "abs") | ||
| celery_backend_protocol: "redis" | ||
| | ||
| # ==================== | ||
| # Database Configuration | ||
| # ==================== | ||
| # Database connection settings (credentials from environment variables) | ||
| # DB_HOST, DB_PORT, DB_NAME, DB_USER, DB_PASSWORD | ||
| db_host: "postgres" # Default hostname, can be overridden by DB_HOST env var | ||
| db_port: 5432 | ||
| db_name: "llm_engine" | ||
| db_engine_pool_size: 20 | ||
| db_engine_max_overflow: 10 | ||
| db_engine_echo: false | ||
| db_engine_echo_pool: false | ||
| db_engine_disconnect_strategy: "pessimistic" | ||
| | ||
| # ==================== | ||
| # Docker Registry Configuration | ||
| # ==================== | ||
| # Docker registry prefix for container images | ||
| # Examples: "registry.company.local", "harbor.company.local/ml-platform" | ||
| # Leave empty if using full image paths directly | ||
| docker_repo_prefix: "registry.company.local" | ||
| | ||
| # ==================== | ||
| # Monitoring & Observability | ||
| # ==================== | ||
| # Prometheus server address for metrics (optional) | ||
| # prometheus_server_address: "http://prometheus:9090" | ||
| | ||
| # ==================== | ||
| # Not applicable for on-prem (kept for compatibility) | ||
| # ==================== | ||
| ml_account_id: "onprem" | ||
| profile_ml_worker: "default" | ||
| profile_ml_inference_worker: "default" | ||
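The comments in this config say that `s3_endpoint_url` can be overridden by the `S3_ENDPOINT_URL` env var and that MinIO needs path-style addressing. How the PR's `s3_utils` resolves these values is not visible on this page, so the following is only a rough sketch of one way to do it. The helper names `resolve_setting` and `build_s3_settings` are hypothetical, and treating an empty string as "unset" is an assumption based on the `""` defaults above.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_setting(
    config: Mapping[str, Any],
    key: str,
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the env var if set and non-empty, else the config value, else None."""
    env = os.environ if environ is None else environ
    from_env = env.get(env_var)
    if from_env:
        return from_env
    # Assumption: an empty string in the YAML means "not specified here".
    return config.get(key) or None


def build_s3_settings(
    config: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Collect the values an S3-compatible client would need for on-prem storage."""
    return {
        "endpoint_url": resolve_setting(config, "s3_endpoint_url", "S3_ENDPOINT_URL", environ),
        "region_name": config.get("default_region"),
        # With boto3 installed, this dict would be passed as
        # botocore.config.Config(s3={"addressing_style": "path"}).
        "s3_config": {"addressing_style": config.get("s3_addressing_style", "auto")},
    }
```

Passing `environ` explicitly keeps the helper easy to unit test; with boto3 available, the result could be used roughly as `boto3.client("s3", endpoint_url=..., region_name=..., config=Config(s3=settings["s3_config"]))`.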
nit