[PSP-449] Add on-prem support to model-engine #725
Open
TarunRavikumar wants to merge 30 commits into main from tr/onprem
MicahFulton reviewed Oct 29, 2025
c12c02c to bdc6852
bdc6852 to c2eefb5
3aea8cd to f23d823
…em_queue_endpoint_resource_delegate
MicahFulton reviewed Dec 15, 2025
model-engine/model_engine_server/infra/gateways/s3_file_storage_gateway.py
Address review feedback to use the filesystem_gateway abstraction layer instead of directly calling get_s3_client. This ensures on-prem S3 configuration logic is properly encapsulated in the gateway.
Changes:
- Add head_object, delete_object, list_objects methods to S3FilesystemGateway
- Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
- Remove direct import of get_s3_client from s3_file_storage_gateway
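A minimal sketch of the delegation this commit describes, not the PR's actual code. The method names head_object, delete_object, and list_objects come from the commit message; their signatures, the in-memory stand-in gateway, and the FileStorageSketch class are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class InMemoryFilesystemGateway:
    """Hypothetical stand-in for S3FilesystemGateway; keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._objects[f"{bucket}/{key}"] = body

    def head_object(self, bucket: str, key: str) -> Optional[Dict[str, int]]:
        # Returns None for a missing key (an assumption; real S3 raises an error).
        body = self._objects.get(f"{bucket}/{key}")
        return None if body is None else {"ContentLength": len(body)}

    def delete_object(self, bucket: str, key: str) -> None:
        self._objects.pop(f"{bucket}/{key}", None)

    def list_objects(self, bucket: str, prefix: str = "") -> List[str]:
        start = f"{bucket}/{prefix}"
        return sorted(k[len(bucket) + 1:] for k in self._objects if k.startswith(start))


class FileStorageSketch:
    """Hypothetical file-storage gateway that never creates an S3 client itself."""

    def __init__(self, filesystem_gateway: InMemoryFilesystemGateway, bucket: str) -> None:
        self.filesystem_gateway = filesystem_gateway
        self.bucket = bucket

    def file_exists(self, key: str) -> bool:
        return self.filesystem_gateway.head_object(self.bucket, key) is not None

    def delete_file(self, key: str) -> bool:
        if not self.file_exists(key):
            return False
        self.filesystem_gateway.delete_object(self.bucket, key)
        return True

    def list_files(self, prefix: str = "") -> List[str]:
        return self.filesystem_gateway.list_objects(self.bucket, prefix)
```

Because every storage call goes through the injected gateway, any endpoint or credential handling for on-prem only has to live in one place.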
Refactor open_wrapper to use get_s3_client from s3_utils instead of duplicating the on-prem S3 configuration logic. This ensures a single source of truth for S3 client creation across the codebase.
S3 list_objects_v2 returns max 1000 objects per request. Use paginator to iterate through all pages and return complete results. Without this fix, directories with >1000 files would silently return truncated results.
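The pagination fix can be sketched roughly as follows. get_paginator("list_objects_v2") and paginate(Bucket=..., Prefix=...) are real boto3 client calls; the function name list_all_keys and the stub client (included only so the example runs without S3) are assumptions, not code from this PR.

```python
from typing import Any, Dict, Iterator, List


def list_all_keys(s3_client: Any, bucket: str, prefix: str = "") -> List[str]:
    """Return every key under `prefix`, following pagination to the last page."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


class _StubPaginator:
    """Hypothetical paginator that serves keys in fixed-size pages."""

    def __init__(self, keys: List[str], page_size: int) -> None:
        self._keys = keys
        self._page_size = page_size

    def paginate(self, Bucket: str, Prefix: str = "") -> Iterator[Dict[str, Any]]:
        matching = [k for k in self._keys if k.startswith(Prefix)]
        if not matching:
            yield {"KeyCount": 0}  # an empty page carries no "Contents" entry
            return
        for i in range(0, len(matching), self._page_size):
            yield {"Contents": [{"Key": k} for k in matching[i:i + self._page_size]]}


class StubS3Client:
    """Hypothetical stand-in for a boto3 S3 client, for local experimentation."""

    def __init__(self, keys: List[str], page_size: int = 1000) -> None:
        self._keys = keys
        self._page_size = page_size

    def get_paginator(self, operation_name: str) -> _StubPaginator:
        if operation_name != "list_objects_v2":
            raise ValueError(f"unsupported operation: {operation_name}")
        return _StubPaginator(self._keys, self._page_size)
```

With 2,500 keys in the stub, a single non-paginated call would stop at the first 1,000, while the loop above collects all of them.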
Include docker_repo_prefix in image URL to match behavior of ECR and ACR implementations. Also change image_exists logging from warning to debug to reduce log noise on every deployment. Updated tests to mock infra_config and verify prefix handling.
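As an illustration only, prefix handling of this kind might look like the sketch below. The exact URL format used by the repository classes is not shown in this PR page, so the prefix/repository:tag layout, the function name, and the treatment of an empty prefix are all assumptions.

```python
def get_image_url(image_tag: str, repository_name: str, docker_repo_prefix: str = "") -> str:
    """Build an image reference, prepending the registry prefix when one is set."""
    # Assumption: the prefix is a registry host (optionally with a path) and
    # any trailing slash should be dropped before joining.
    prefix = docker_repo_prefix.strip().rstrip("/")
    image = f"{repository_name}:{image_tag}"
    return f"{prefix}/{image}" if prefix else image
```

For example, get_image_url("v1", "model-engine", "registry.internal:5000") yields "registry.internal:5000/model-engine:v1".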
Add explicit elif branches for on-prem cloud provider to make it clear that S3-based gateways are intentionally used for on-prem (with MinIO configuration applied via s3_utils). This improves code readability and makes the on-prem support more discoverable.
…urceDelegate
Replace hardcoded queue depth with actual Redis LLEN call to enable proper autoscaling based on queue metrics. Falls back to 0 gracefully if Redis client is unavailable.
- Add optional redis_client parameter to constructor
- Implement lazy Redis client initialization
- Add tests for both with and without Redis scenarios
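A rough sketch of the queue-depth lookup this commit describes, under stated assumptions: the class name QueueDepthReader, the client_factory parameter, and the stub client are hypothetical, and only the llen call mirrors the real redis-py method for the Redis LLEN command.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)


class QueueDepthReader:
    """Hypothetical helper: reads a queue's length from Redis, or 0 on failure."""

    def __init__(
        self,
        redis_client: Optional[Any] = None,
        client_factory: Optional[Callable[[], Any]] = None,
    ) -> None:
        self._redis_client = redis_client
        self._client_factory = client_factory

    def _get_client(self) -> Optional[Any]:
        # Lazy initialization: build the client on first use, not at construction.
        if self._redis_client is None and self._client_factory is not None:
            try:
                self._redis_client = self._client_factory()
            except Exception:
                logger.warning("Could not create Redis client", exc_info=True)
        return self._redis_client

    def get_queue_depth(self, queue_name: str) -> int:
        client = self._get_client()
        if client is None:
            return 0  # graceful fallback when Redis is unavailable
        try:
            return int(client.llen(queue_name))
        except Exception:
            logger.warning("LLEN failed for %s", queue_name, exc_info=True)
            return 0


class StubRedis:
    """Hypothetical stand-in exposing only llen, for running without Redis."""

    def __init__(self, queues: Dict[str, List[str]]) -> None:
        self._queues = queues

    def llen(self, name: str) -> int:
        return len(self._queues.get(name, []))
```

Returning 0 on failure keeps the caller working but also means an autoscaler sees an empty queue during a Redis outage, which is a trade-off to weigh in a real deployment.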
Using {} as a default argument is a Python anti-pattern that can cause
subtle bugs since the same dict instance is shared across calls.
Use Optional[Dict] = None pattern instead.
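The difference is easy to reproduce in isolation. The two functions below are illustrative only and are not taken from this PR; they differ solely in how the default is declared.

```python
from typing import Dict, Optional


def add_label_shared(key: str, value: str, labels: Dict[str, str] = {}) -> Dict[str, str]:
    """Anti-pattern: the default dict is created once and reused by every call."""
    labels[key] = value
    return labels


def add_label(key: str, value: str, labels: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Safe pattern: a fresh dict is created whenever the caller passes none."""
    if labels is None:
        labels = {}
    labels[key] = value
    return labels
```

Calling add_label_shared("a", "1") and then add_label_shared("b", "2") returns a dict holding both keys, whereas the same two calls to add_label each return a dict with a single key.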
Move the infra_config import from inside the validator to a module-level helper function _is_onprem_deployment(). This improves testability, avoids repeated import overhead on each validation call, and follows Python best practices for imports.
Replace per-call debug logs with a one-time info log when S3 is configured for on-prem. This prevents log spam from debug messages firing on every S3 client creation.
- Extract common on-prem config to _get_onprem_client_kwargs helper
- Add _s3_config_logged flag to log endpoint only once
- Add return type annotations to get_s3_client and get_s3_resource
- Update tests to reset logging flag between tests
Clean up unused import left over from refactoring the inline import.
Use architecture detection to download the correct binaries for aws-iam-authenticator and kubectl. This enables building the image for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
The original code checked os.getenv('AWS_PROFILE') as a fallback when
no aws_profile kwarg was provided. This was accidentally removed during
refactoring, breaking S3 operations in CI where AWS_PROFILE may be set
via environment variable.
Restores the original behavior for AWS deployments while maintaining
the new on-prem path.
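The restored fallback can be sketched as below. The function name resolve_aws_profile is hypothetical, and how the PR's code consumes the result is not shown on this page; only the precedence (explicit kwarg first, then the AWS_PROFILE environment variable) is taken from the commit message.

```python
import os
from typing import Any, Optional


def resolve_aws_profile(**kwargs: Any) -> Optional[str]:
    """Prefer an explicit aws_profile kwarg, else fall back to AWS_PROFILE."""
    profile = kwargs.get("aws_profile")
    if profile is None:
        profile = os.getenv("AWS_PROFILE")
    return profile
```

In real code the result would typically be passed on as boto3.Session(profile_name=...), where a value of None leaves boto3's default credential chain in effect.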
Add On-Premise Deployment Support
This PR adds support for on-premise deployments using Redis and MinIO storage.
Key Changes
- onprem.yaml config file with settings for MinIO, Redis, and private registries
- OnPremDockerRepository
Why?
Next Steps
Linear
https://linear.app/scale-epd/issue/PSP-449/llm-engine-on-prem-support