[PSP-449] Add on-prem support to model-engine #725

TarunRavikumar · 2025-10-28T18:06:46Z

Add On-Premise Deployment Support

This PR adds support for on-premise deployments, using Redis for task queues and MinIO for storage.

Key Changes

New on-prem configuration: Added onprem.yaml config file with settings for MinIO, Redis, and private registries
Redis-based infrastructure: Implemented Redis task queues and on-prem queue endpoint delegate
S3-compatible storage: Added support for MinIO and custom S3 endpoints with configurable addressing styles
Container registry flexibility: Support for private registries with OnPremDockerRepository
Database configuration: Environment variable-based PostgreSQL connection for on-prem deployments
Improved logging: Enhanced error handling and debug logs in S3 file storage gateway
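As a rough illustration of the "custom S3 endpoints with configurable addressing styles" item, the sketch below builds the keyword arguments an S3 client might receive. The helper name `build_s3_client_kwargs`, its parameters, and the default region are assumptions made for illustration and are not taken from this PR.

```python
from typing import Any, Dict, Literal, Optional

AddressingStyle = Literal["path", "virtual", "auto"]


def build_s3_client_kwargs(
    endpoint_url: Optional[str] = None,
    addressing_style: AddressingStyle = "path",
    region_name: str = "us-east-1",  # assumed default, not from the PR
) -> Dict[str, Any]:
    """Hypothetical helper: collect S3 client settings for cloud or on-prem."""
    if addressing_style not in ("path", "virtual", "auto"):
        raise ValueError(f"unsupported addressing style: {addressing_style}")
    kwargs: Dict[str, Any] = {"region_name": region_name}
    if endpoint_url:
        # A custom endpoint (for example a MinIO server) implies on-prem mode.
        kwargs["endpoint_url"] = endpoint_url
        # MinIO setups commonly use path-style addressing (bucket in the path).
        kwargs["s3"] = {"addressing_style": addressing_style}
    return kwargs
```

With boto3, the `s3` entry would typically be wrapped as `botocore.config.Config(s3=...)` and passed through the `config=` argument of `boto3.client("s3", ...)`, alongside `endpoint_url`.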

Why?

We have contractual requirements to move our stack on-prem.
MCPx needs model engine running locally for on-prem and ci/cd testing.

Next Steps

More work is probably needed in llm-engine to fully support on-prem deployments. So far I've only verified that model engine can create new Launch endpoints (with healthy pods) from our input Docker image, which is what mcpx-go needed.

Linear

https://linear.app/scale-epd/issue/PSP-449/llm-engine-on-prem-support

socket-security · 2025-10-28T18:10:06Z

Review the following changes in direct dependencies.

Package: psycopg2-binary@2.9.3 ⏵ 2.9.10

Address review feedback to use the filesystem_gateway abstraction layer instead of directly calling get_s3_client. This ensures on-prem S3 configuration logic is properly encapsulated in the gateway.

Changes:
- Add head_object, delete_object, list_objects methods to S3FilesystemGateway
- Update S3FileStorageGateway to use self.filesystem_gateway for all S3 ops
- Remove direct import of get_s3_client from s3_file_storage_gateway

Include docker_repo_prefix in image URL to match behavior of ECR and ACR implementations. Also change image_exists logging from warning to debug to reduce log noise on every deployment. Updated tests to mock infra_config and verify prefix handling.
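A minimal sketch of how a repository prefix could be folded into an image URL. The function name and signature here are hypothetical; only the idea of prepending `docker_repo_prefix` comes from the commit message above.

```python
def get_image_url(image_tag: str, repository_name: str, docker_repo_prefix: str = "") -> str:
    """Hypothetical helper: build '<prefix>/<repository>:<tag>'."""
    if docker_repo_prefix:
        # Strip a trailing slash so the joined URL never contains '//'.
        return f"{docker_repo_prefix.rstrip('/')}/{repository_name}:{image_tag}"
    # Without a prefix, fall back to the bare repository name.
    return f"{repository_name}:{image_tag}"
```

The empty-prefix fallback is an assumption made for the sketch; the actual repository class may require a prefix.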

Add explicit elif branches for on-prem cloud provider to make it clear that S3-based gateways are intentionally used for on-prem (with MinIO configuration applied via s3_utils). This improves code readability and makes the on-prem support more discoverable.

…urceDelegate

Replace hardcoded queue depth with actual Redis LLEN call to enable proper autoscaling based on queue metrics. Falls back to 0 gracefully if Redis client is unavailable.
- Add optional redis_client parameter to constructor
- Implement lazy Redis client initialization
- Add tests for both with and without Redis scenarios
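The queue-depth lookup described above might look roughly like the following. The function name is hypothetical, and treating a failed Redis call the same as a missing client is an assumption; the commit message only mentions the fallback when the client is unavailable. `llen` is the redis-py method for the Redis `LLEN` command.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def get_queue_depth(redis_client: Optional[Any], queue_name: str) -> int:
    """Hypothetical helper: return the Redis list length, or 0 if unavailable."""
    if redis_client is None:
        # No client configured: report an empty queue instead of failing.
        return 0
    try:
        return int(redis_client.llen(queue_name))
    except Exception:  # assumption: connection errors also fall back to 0
        logger.warning("Could not read depth of queue %s", queue_name)
        return 0
```

Accepting the client as a parameter keeps the function easy to test with a stand-in object, which matches the "with and without Redis" test scenarios mentioned in the commit.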

Move the infra_config import from inside the validator to a module-level helper function _is_onprem_deployment(). This improves testability, avoids repeated import overhead on each validation call, and follows Python best practices for imports.

Replace per-call debug logs with a one-time info log when S3 is configured for on-prem. This prevents log spam from debug messages firing on every S3 client creation.
- Extract common on-prem config to _get_onprem_client_kwargs helper
- Add _s3_config_logged flag to log endpoint only once
- Add return type annotations to get_s3_client and get_s3_resource
- Update tests to reset logging flag between tests

The original code checked os.getenv('AWS_PROFILE') as a fallback when no aws_profile kwarg was provided. This was accidentally removed during refactoring, breaking S3 operations in CI where AWS_PROFILE may be set via environment variable. Restores the original behavior for AWS deployments while maintaining the new on-prem path.
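The restored fallback can be sketched as follows. The helper name is hypothetical; the `aws_profile` kwarg and the `AWS_PROFILE` environment variable are the ones named in the commit message.

```python
import os
from typing import Any, Dict, Optional


def resolve_aws_profile(kwargs: Dict[str, Any]) -> Optional[str]:
    """Hypothetical helper: prefer the explicit kwarg, else the environment."""
    if "aws_profile" in kwargs:
        return kwargs["aws_profile"]
    # Fall back to AWS_PROFILE, which CI may set via the environment.
    return os.getenv("AWS_PROFILE")
```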

TarunRavikumar changed the title from "Add on-prem support to model-engine" to "[PSP-449] Add on-prem support to model-engine" on Oct 28, 2025

MicahFulton reviewed Oct 29, 2025

pr.md (outdated)

TarunRavikumar force-pushed the tr/onprem branch 2 times, most recently from c12c02c to bdc6852, October 30, 2025 19:52

TarunRavikumar force-pushed the tr/onprem branch from bdc6852 to c2eefb5, November 12, 2025 14:55

TarunRavikumar added 5 commits December 11, 2025 14:13

add support for on-prem

a545b1a

clean up on-prem artificats

ca0a703

add back comments from initial code

d7e5fab

fix lint

9b45d81

use ecr image repo:tag directly

f23d823

TarunRavikumar force-pushed the tr/onprem branch from 3aea8cd to f23d823, December 11, 2025 19:50

TarunRavikumar added 10 commits December 11, 2025 14:56

fix: isort import ordering

20a8dc2

fix: remove unused infra_config import

cf4a411

fix: mypy type annotation errors

d19e6f2

Merge branch 'main' into tr/onprem

f72fea2

fix: remove type annotation causing mypy no-redef error

3bea65a

fix: mypy type errors in s3_utils.py and io.py - use botocore.config.Config

0ad17fb

fix: mypy typeddict-item errors - use broad type ignore

48eaac4

fix: update test mocks to use get_s3_resource from s3_utils

5257762

test: add unit tests for s3_utils, onprem_docker_repository, and onprem_queue_endpoint_resource_delegate

412fe41

style: format test files with black

5b3f796

TarunRavikumar marked this pull request as ready for review December 12, 2025 17:45

TarunRavikumar requested a review from MicahFulton December 15, 2025 17:03

MicahFulton reviewed Dec 15, 2025

model-engine/model_engine_server/infra/gateways/s3_file_storage_gateway.py (outdated)

TarunRavikumar added 5 commits December 15, 2025 13:20

fix: deduplicate S3 client config by using centralized s3_utils

aaca0b8

Refactor open_wrapper to use get_s3_client from s3_utils instead of duplicating the on-prem S3 configuration logic. This ensures a single source of truth for S3 client creation across the codebase.

fix: add pagination to list_objects to handle >1000 objects

2687232

S3 list_objects_v2 returns max 1000 objects per request. Use paginator to iterate through all pages and return complete results. Without this fix, directories with >1000 files would silently return truncated results.
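The pagination fix can be illustrated with boto3's paginator interface. The function name `list_all_keys` is hypothetical; `get_paginator("list_objects_v2")` and `paginate(Bucket=..., Prefix=...)` are the standard boto3 client calls. The client is passed in as a parameter so the sketch does not depend on how the PR constructs it.

```python
from typing import Any, List


def list_all_keys(s3_client: Any, bucket: str, prefix: str) -> List[str]:
    """Collect every key under a prefix, following all result pages."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys: List[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects matched.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

A single `list_objects_v2` call stops at 1000 keys; iterating the paginator removes that ceiling without manual continuation-token handling.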

TarunRavikumar added 10 commits December 15, 2025 13:30

fix: replace mutable default argument with None in _get_client

4c85f12

Using {} as a default argument is a Python anti-pattern that can cause subtle bugs since the same dict instance is shared across calls. Use Optional[Dict] = None pattern instead.
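To show why the mutable default matters, here is a small standalone sketch. The function names are illustrative only and are unrelated to the actual `_get_client` signature.

```python
from typing import Dict, Optional


def tag_request_buggy(key: str, tags: Dict[str, bool] = {}) -> Dict[str, bool]:
    # The default dict is created once, at definition time, and then shared.
    tags[key] = True
    return tags


def tag_request(key: str, tags: Optional[Dict[str, bool]] = None) -> Dict[str, bool]:
    # A fresh dict is created on every call that omits the argument.
    if tags is None:
        tags = {}
    tags[key] = True
    return tags
```

Calling `tag_request_buggy("a")` and then `tag_request_buggy("b")` returns the same dict object both times, so the second result also contains `"a"`. The `None` variant returns an independent dict on each call.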

chore: remove unused TYPE_CHECKING import

f27d817

Clean up unused import left over from refactoring the inline import.

fix: make Dockerfile multi-arch compatible for ARM/AMD64

2e20e55

Use architecture detection to download the correct binaries for aws-iam-authenticator and kubectl. This enables building the image for both ARM64 (Mac M1/M2) and AMD64 (CI/production) platforms.

style: fix black formatting in test_onprem_queue_endpoint_resource_delegate

6c62b72

fix: correct isort ordering in s3_filesystem_gateway.py

982ecb1

fix: use Literal type for s3 addressing_style to satisfy mypy

df98ddc


3 participants
