Revamp model internals, unify generation flow, and extend platform support #89

CodeWithKyrian · 2025-07-21T10:13:47Z

This PR delivers a major refactor focused on simplifying the generation pipeline, improving cross-platform native support, extending model capabilities, and enhancing overall maintainability.

Highlights:

Stopping Criteria: Added MaxLength, MaxTime, EosToken and Interruptable criteria for better generation control.
PretrainedConfig: Introduced base config to reduce boilerplate across model files.
Platform Package: Converted to platform-package with dynamic loading support for Linux, macOS, and Windows (x86_64 & ARM64).
Optimized FFI: Cleaned up FFI bindings for better reliability and developer ergonomics.
AutoModel Refactor: Unified resolution logic to prioritize generic models and avoid accidental task-specific selection.
New Model Support: Added Gemma, Qwen3, Phi, and other language models.
Introduced token-per-second (TPS) benchmarking
Improved logits processors and model session handling
Enhanced Audio/Image Utilities: Upgraded constructors and added better error handling.
Refactored BPEModel to support new merge format
PSR-3 Logging: Added logging support throughout the codebase; the logger now defaults to NullLogger
Added full inference and utility test coverage
Bumped shared library versions (onnxruntime and rindowmatlib)
Updated documentation and examples

Resolves #88
Resolves #29

No known breaking changes.

- New Stopping Criteria: Added MaxLength, MaxTime, and Interruptable stopping criteria for more flexible generation control.
- Streamers Refactor: Simplified streamer implementation to improve clarity.
- Performance Benchmarking: Introduced a token-per-second (TPS) metric to benchmark model performance across updates.
- Bug Fixes:
  - Fixed an error in the Tensor::slice() method
  - Corrected the RepetitionPenaltyLogitsProcessor to properly utilize tokens.
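For reviewers unfamiliar with the idea, here is a minimal sketch of how composable stopping criteria can work. It is not the code from this PR: the `StoppingCriterion` interface, the `*Criterion` class names, and the `anyCriterionMet()` helper are all hypothetical names chosen for illustration, and the real classes may have different signatures.

```php
<?php

declare(strict_types=1);

// Hypothetical interface for illustration only; not the library's actual API.
interface StoppingCriterion
{
    /** @param int[] $tokenIds Token IDs produced so far (prompt included). */
    public function shouldStop(array $tokenIds): bool;
}

// Stops once the sequence reaches a fixed number of tokens.
final class MaxLengthCriterion implements StoppingCriterion
{
    public function __construct(private int $maxLength) {}

    public function shouldStop(array $tokenIds): bool
    {
        return count($tokenIds) >= $this->maxLength;
    }
}

// Stops once a wall-clock budget (in seconds) has elapsed since construction.
final class MaxTimeCriterion implements StoppingCriterion
{
    private float $startedAt;

    public function __construct(private float $maxSeconds)
    {
        $this->startedAt = microtime(true);
    }

    public function shouldStop(array $tokenIds): bool
    {
        return (microtime(true) - $this->startedAt) >= $this->maxSeconds;
    }
}

// Stops when the most recent token is one of the end-of-sequence IDs.
final class EosTokenCriterion implements StoppingCriterion
{
    /** @param int[] $eosTokenIds */
    public function __construct(private array $eosTokenIds) {}

    public function shouldStop(array $tokenIds): bool
    {
        if ($tokenIds === []) {
            return false;
        }

        return in_array($tokenIds[array_key_last($tokenIds)], $this->eosTokenIds, true);
    }
}

// Stops after the caller flips a flag, e.g. from a streamer callback.
final class InterruptableCriterion implements StoppingCriterion
{
    private bool $interrupted = false;

    public function interrupt(): void
    {
        $this->interrupted = true;
    }

    public function shouldStop(array $tokenIds): bool
    {
        return $this->interrupted;
    }
}

/**
 * Returns true as soon as any criterion in the list asks to stop.
 *
 * @param StoppingCriterion[] $criteria
 * @param int[] $tokenIds
 */
function anyCriterionMet(array $criteria, array $tokenIds): bool
{
    foreach ($criteria as $criterion) {
        if ($criterion->shouldStop($tokenIds)) {
            return true;
        }
    }

    return false;
}
```

In a sketch like this, the generation loop would call `anyCriterionMet()` after each decoded token, so adding a new stopping rule means adding a class rather than touching the loop.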

- Converted library to a platform-package with cross-platform native library support
- Added platform-specific shared libraries for Linux, macOS, and Windows (x86_64 and ARM64)
- Refactored FFI library loading mechanism to support dynamic library resolution
- Updated composer configuration to support platform-specific package installation
- Introduced new native library classes for OpenBLAS, RindowMatlib, and other dependencies
- Simplified native library interaction and improved platform compatibility
- Updated example configurations and test cases to reflect new architecture
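To make "dynamic library resolution" concrete, the sketch below shows one way a shared-library path could be derived from per-platform file-name templates. Everything in it is an assumption for illustration: the `LIBRARY_TEMPLATES` constant, the `resolveLibraryPath()` function, the template strings, and the `<os>-<arch>` directory layout are hypothetical and are not taken from this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical file-name templates keyed by PHP_OS_FAMILY.
// The real naming scheme and directory layout may differ.
const LIBRARY_TEMPLATES = [
    'Linux'   => 'lib{name}.so.{version}',
    'Darwin'  => 'lib{name}.{version}.dylib',
    'Windows' => '{name}.dll',
];

/**
 * Builds the expected path of a shared library for a given platform.
 * $osFamily and $arch default to the running machine but can be
 * overridden, which keeps the function easy to test.
 */
function resolveLibraryPath(
    string $baseDir,
    string $name,
    string $version,
    string $osFamily = PHP_OS_FAMILY,
    ?string $arch = null
): string {
    if (!isset(LIBRARY_TEMPLATES[$osFamily])) {
        throw new RuntimeException("Unsupported platform: {$osFamily}");
    }

    // Normalise the architecture names reported by different systems.
    $arch = strtolower($arch ?? php_uname('m'));
    $arch = match ($arch) {
        'amd64', 'x86_64' => 'x86_64',
        'aarch64', 'arm64' => 'arm64',
        default => $arch,
    };

    // Fill the template placeholders with the library name and version.
    $fileName = strtr(LIBRARY_TEMPLATES[$osFamily], [
        '{name}' => $name,
        '{version}' => $version,
    ]);

    return implode(DIRECTORY_SEPARATOR, [
        $baseDir,
        strtolower($osFamily) . '-' . $arch,
        $fileName,
    ]);
}
```

A path produced this way could then be handed to `FFI::cdef()` as its library argument; whether the library actually loads it that way is not stated in this PR description.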

…logic, and FFI usage
- Reorganized methods within `PretrainedModel` for better readability and logical flow
- Fixed usage logic for various LogitsProcessors including `NoBadWordsLogitsProcessor`, `MinNewTokensLengthLogitsProcessor`, `ForcedBOSTokenLogitsProcessor`, and `SuppressTokensAtBeginLogitsProcessor`.
- Removed usage of `ForceTokensLogitsProcessor`.
- Refactored `Samplerate` and `Sndfile` FFI wrappers from static methods to an instance-based approach.

- Use `$sessions` array in PretrainedModel instead of separate session properties
- Remove redundant constructors in model subclasses
- Minor logging and doc improvements in Audio and Image utils
- Update .gitignore for log files

- Updated model class mapping constants in AutoModel and its subclasses to use a unified naming convention.
- Removed redundant MODEL_CLASS_MAPPINGS constants in favor of direct usage of MODELS.
- Improved code readability and maintainability by consolidating model definitions.

…mapping
- Added support for the new merge format in BPEModel, allowing for direct usage of merges as arrays.
- Improved the creation of the ranks map by using JSON encoded pairs as keys for better compatibility and performance.
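The ranks-map change can be illustrated with a short sketch. It assumes that the older merge format stores each merge as a single space-separated string (`"h e"`) and the newer one as a two-element array (`["h", "e"]`). The function names `buildBpeRanks()` and `rankOf()` are hypothetical and do not come from this PR. One plausible benefit of JSON-encoded keys, shown here, is that a pair stays unambiguous even when a token itself contains a space.

```php
<?php

declare(strict_types=1);

/**
 * Builds a lookup from a token pair to its merge rank (lower merges first).
 * Accepts both "a b" strings and ["a", "b"] arrays.
 *
 * @param array<int, string|array{0: string, 1: string}> $merges
 * @return array<string, int>
 */
function buildBpeRanks(array $merges): array
{
    $ranks = [];

    foreach (array_values($merges) as $rank => $merge) {
        // String entries are split on the first space; array entries are used as-is.
        $pair = is_string($merge) ? explode(' ', $merge, 2) : array_values($merge);

        if (count($pair) !== 2) {
            throw new InvalidArgumentException("Malformed merge entry at index {$rank}");
        }

        // The JSON encoding of the pair is the key, so ["a b", "c"] and
        // ["a", "b c"] map to different entries.
        $key = json_encode($pair, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
        $ranks[$key] = $rank;
    }

    return $ranks;
}

/** Returns the merge rank of a pair, or null if the pair is never merged. */
function rankOf(array $ranks, string $left, string $right): ?int
{
    $key = json_encode([$left, $right], JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);

    return $ranks[$key] ?? null;
}
```

With plain space-joined keys, the pairs `["a b", "c"]` and `["a", "b c"]` would both collapse to `"a b c"`; the encoded form keeps them apart.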

… models in AutoModel

This ensures `AutoModel::fromPretrained` returns the most generic model class when no specific task model is found, improving expected behavior and reducing accidental task-specific model selection.
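The resolution order described here can be sketched as a first-match lookup over an ordered list of mappings, with the generic mapping placed first. This is only an illustration of the idea: the constants, the class-name strings, and `resolveModelClass()` are hypothetical placeholders, not the mappings or methods defined in this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical mappings from a config's model type to a class name.
// The class-name strings are placeholders for illustration.
const GENERIC_MODELS = [
    'bert'  => 'BertModel',
    'gemma' => 'GemmaModel',
];

const CAUSAL_LM_MODELS = [
    'gemma' => 'GemmaForCausalLM',
    'qwen3' => 'Qwen3ForCausalLM',
];

/**
 * Returns the first class registered for $modelType, scanning the
 * mappings in the order given. Listing the generic mapping first means
 * a task-specific class is chosen only when no generic one exists.
 *
 * @param array<int, array<string, string>> $mappings
 */
function resolveModelClass(string $modelType, array $mappings): ?string
{
    foreach ($mappings as $mapping) {
        if (isset($mapping[$modelType])) {
            return $mapping[$modelType];
        }
    }

    return null;
}
```

Under these assumptions, `resolveModelClass('gemma', [GENERIC_MODELS, CAUSAL_LM_MODELS])` yields the generic class even though a causal-LM class is also registered for the same model type.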

CodeWithKyrian added 30 commits November 20, 2024 09:46

feat: new PretrainedConfig reducing code repetition across model files

03df97a

feat: Update examples

cbfd6e1

refactor: Simplify library components and improve error handling

a7dbad6

- Updated Samplerate FFI method to handle errors more robustly
- Modified Audio and Image utility classes to use new constructor patterns

chore: Update README and getting-started documentation for clarity and consistency

8ce4c9c

- Enhanced explanations for model conversion and usage to better guide users.

feat: Bump shared libraries versions, onnxruntime -> 1.21.0, rindowmatlib -> 1.1.0

e7a91cf

- Improved the Native libraries to work with versioning in binary names
- Implemented dynamic library path generation based on platform-specific templates.

fix: dependency version compatibility for PHP 8.1

d39a23f

chore: Update tests for improved clarity and functionality

4a01a49

fix(tests): Correct path joining in HubTest to use DIRECTORY_SEPARATOR for consistency

fb57fca

feat: Enhance image processing methods and improve Vips integration

d63ad09

feat: streamline generation config merging

5e479da

feat: Remove redundant install command

38b8c5f

feat: code style improvements across board

e89b340

test: add comprehensive tests for Inference session

9e2ee6b

tests: add comprehensive tests for Image utility

875ba04

refactor: Move AutoConfig and PretrainedConfig to appropriate directories

57d019e

refactor: Improve tokenizer type detection logic when not specified.

024ed11

feat: Add support for eos and last_token pooling in FeatureExtractionPipeline

bf080f7

feat: add PSR-3 logging support

0ec177a

refactor: rename PretrainedMixin to AutoModelBase

688ef6c

feat: extend PretrainedConfig with additional model support

a1da846

[skip ci]

feat: add support for new models and update text generation pipeline

894bc90

- Introduced new model classes: Gemma, Gemma2, Gemma3, Qwen3, and Phi with their respective causal language models.
- Enhanced model handling in auto models for better lookup

refactor: improve auto-model resolution for task

5c90090

chore: update rindow matlib binary version from 1.1.0 to 1.1.1

648e1b4

CodeWithKyrian added 3 commits July 21, 2025 10:22

fix: explicitly check for null when adding BPE node to queue

da188aa

chore: update jinja-php version from 1.0 to 2.0

c295dca

Merge branch 'main' into model-revamp

3b64b88

CodeWithKyrian merged commit 5c1de50 into main Jul 21, 2025
24 checks passed

CodeWithKyrian deleted the model-revamp branch July 21, 2025 11:29
