@TarunRavikumar (Collaborator) commented Oct 28, 2025

Add On-Premise Deployment Support

This PR adds support for on-premise deployments, using Redis for task queues and MinIO for S3-compatible storage.

Key Changes

  • New on-prem configuration: Added onprem.yaml config file with settings for MinIO, Redis, and private registries
  • Redis-based infrastructure: Implemented Redis task queues and on-prem queue endpoint delegate
  • S3-compatible storage: Added support for MinIO and custom S3 endpoints with configurable addressing styles
  • Container registry flexibility: Support for private registries with OnPremDockerRepository
  • Database configuration: Environment variable-based PostgreSQL connection for on-prem deployments
  • Improved logging: Enhanced error handling and debug logs in S3 file storage gateway

Why?

  • We have contractual requirements to move our stack on-prem.
  • MCPx needs model engine running locally for on-prem and CI/CD testing.

Next Steps

  • llm-engine likely needs more work before it fully supports on-prem. So far I've only verified that model engine can create new launch endpoints (with healthy pods) from our input Docker image, which is what mcpx-go needed.

Linear

https://linear.app/scale-epd/issue/PSP-449/llm-engine-on-prem-support

socket-security bot commented Oct 28, 2025

Review the following changes in direct dependencies.

Updated: psycopg2-binary 2.9.3 ⏵ 2.9.10
Supply Chain Security: 100 (+1), Vulnerability: 100, Quality: 100, Maintenance: 100, License: 70

@TarunRavikumar changed the title from "Add on-prem support to model-engine" to "[PSP-449] Add on-prem support to model-engine" on Oct 28, 2025

@TarunRavikumar marked this pull request as ready for review on December 12, 2025, 17:45

Address review feedback to use the filesystem_gateway abstraction layer
instead of directly calling get_s3_client. This ensures on-prem S3
configuration logic is properly encapsulated in the gateway.

Changes:
- Add head_object, delete_object, list_objects methods to S3FilesystemGateway
- Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
- Remove direct import of get_s3_client from s3_file_storage_gateway

Refactor open_wrapper to use get_s3_client from s3_utils instead of
duplicating the on-prem S3 configuration logic. This ensures a single
source of truth for S3 client creation across the codebase.

S3 list_objects_v2 returns max 1000 objects per request. Use paginator
to iterate through all pages and return complete results. Without this
fix, directories with >1000 files would silently return truncated results.
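
A minimal sketch of the pagination fix described above, not the code from this PR. It assumes a boto3-style S3 client is passed in; the helper names `iter_object_keys` and `list_object_keys` are hypothetical.

```python
from typing import Any, Iterator, List


def iter_object_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, following pagination to the last page."""
    # A single list_objects_v2 call returns at most 1000 keys; the paginator
    # keeps requesting pages until the listing is exhausted.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def list_object_keys(s3_client: Any, bucket: str, prefix: str = "") -> List[str]:
    """Return the complete key listing as a list (hypothetical helper)."""
    return list(iter_object_keys(s3_client, bucket, prefix))
```

Taking the client as a parameter keeps the helper independent of how the client is built, so the same code would work against AWS S3 or a MinIO endpoint.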
Include docker_repo_prefix in image URL to match behavior of ECR and ACR
implementations. Also change image_exists logging from warning to debug
to reduce log noise on every deployment.

Updated tests to mock infra_config and verify prefix handling.

Add explicit elif branches for on-prem cloud provider to make it clear
that S3-based gateways are intentionally used for on-prem (with MinIO
configuration applied via s3_utils). This improves code readability
and makes the on-prem support more discoverable.

…urceDelegate

Replace hardcoded queue depth with actual Redis LLEN call to enable
proper autoscaling based on queue metrics. Falls back to 0 gracefully
if Redis client is unavailable.

- Add optional redis_client parameter to constructor
- Implement lazy Redis client initialization
- Add tests for both with and without Redis scenarios
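
A rough sketch of the lazy-client pattern described above, assuming the redis-py package for the real client. The class name `QueueDepthReader` and the `redis_url` parameter are hypothetical and do not come from this PR.

```python
from typing import Any, Optional


class QueueDepthReader:
    """Reads queue depth from Redis, reporting 0 when no client is available."""

    def __init__(self, redis_client: Optional[Any] = None, redis_url: Optional[str] = None):
        # An injected client (e.g. a test fake) takes precedence over the URL.
        self._redis_client = redis_client
        self._redis_url = redis_url

    def _get_client(self) -> Optional[Any]:
        # Lazy initialization: build the client only on first use.
        if self._redis_client is None and self._redis_url:
            try:
                import redis  # assumption: redis-py is installed in the deployment
            except ImportError:
                return None
            self._redis_client = redis.Redis.from_url(self._redis_url)
        return self._redis_client

    def queue_depth(self, queue_name: str) -> int:
        client = self._get_client()
        if client is None:
            # Graceful fallback when Redis is not configured or not importable.
            return 0
        # LLEN returns the number of items in the list backing the queue.
        return int(client.llen(queue_name))
```

Accepting an optional client in the constructor is what makes both the "with Redis" and "without Redis" scenarios easy to test.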
Using {} as a default argument is a Python anti-pattern that can cause
subtle bugs since the same dict instance is shared across calls.
Use Optional[Dict] = None pattern instead.
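
To illustrate the anti-pattern described above, here is a small self-contained sketch. The function names are made up for the example and are not taken from the codebase.

```python
from typing import Dict, Optional


def add_label_buggy(key: str, value: str, labels: Dict[str, str] = {}) -> Dict[str, str]:
    # The default dict is created once, at function definition time,
    # so every call that omits `labels` mutates the same object.
    labels[key] = value
    return labels


def add_label(key: str, value: str, labels: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    # A fresh dict per call; copying also leaves the caller's dict untouched.
    result = {} if labels is None else dict(labels)
    result[key] = value
    return result
```

With the buggy version, a second call that omits `labels` still sees the key written by the first call; the `Optional[Dict] = None` version starts from an empty dict each time.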
Move the infra_config import from inside the validator to a module-level
helper function _is_onprem_deployment(). This improves testability,
avoids repeated import overhead on each validation call, and follows
Python best practices for imports.

Replace per-call debug logs with a one-time info log when S3 is
configured for on-prem. This prevents log spam from debug messages
firing on every S3 client creation.

- Extract common on-prem config to _get_onprem_client_kwargs helper
- Add _s3_config_logged flag to log endpoint only once
- Add return type annotations to get_s3_client and get_s3_resource
- Update tests to reset logging flag between tests
Clean up unused import left over from refactoring the inline import.

Use architecture detection to download the correct binaries for
aws-iam-authenticator and kubectl. This enables building the image
for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.
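
The change itself presumably lives in the image build script. As a Python illustration of the same idea only, the sketch below maps a machine name to the architecture label used in download URLs. The function names are hypothetical, and the URL layout is the one used by the public Kubernetes release downloads.

```python
import platform
from typing import Optional

# Common machine names mapped to the labels used in binary download paths.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Translate a machine name such as 'aarch64' into 'arm64' or 'amd64'."""
    try:
        return _ARCH_ALIASES[machine.lower()]
    except KeyError:
        raise ValueError(f"unsupported architecture: {machine}") from None


def kubectl_url(version: str, machine: Optional[str] = None) -> str:
    """Build a kubectl download URL for Linux; defaults to the host architecture."""
    arch = normalize_arch(machine or platform.machine())
    return f"https://dl.k8s.io/release/{version}/bin/linux/{arch}/kubectl"
```

Failing loudly on an unknown architecture is a deliberate choice here: silently downloading an AMD64 binary on another platform would only surface later as an exec format error.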
The original code checked os.getenv('AWS_PROFILE') as a fallback when
no aws_profile kwarg was provided. This was accidentally removed during
refactoring, breaking S3 operations in CI where AWS_PROFILE may be set
via environment variable.

Restores the original behavior for AWS deployments while maintaining
the new on-prem path.
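
A small sketch of the fallback order this commit restores, assuming the profile arrives as an `aws_profile` keyword argument. The function name `resolve_aws_profile` is hypothetical.

```python
import os
from typing import Any, Optional


def resolve_aws_profile(**kwargs: Any) -> Optional[str]:
    """Prefer an explicit aws_profile kwarg, then the AWS_PROFILE env variable."""
    # Using `or` means an explicit aws_profile=None also falls through to the
    # environment, which is what CI relies on when AWS_PROFILE is exported.
    return kwargs.get("aws_profile") or os.getenv("AWS_PROFILE")
```

In the AWS path, the resolved value would typically be handed to something like `boto3.Session(profile_name=...)`; the on-prem path would skip profile handling and use the endpoint configuration instead.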