@CodeWithKyrian CodeWithKyrian commented Jul 21, 2025

This PR delivers a major refactor focused on simplifying the generation pipeline, improving cross-platform native support, extending model capabilities, and enhancing overall maintainability.

Highlights:

  • Stopping Criteria: Added MaxLength, MaxTime, EosToken and Interruptable criteria for better generation control.
  • PretrainedConfig: Introduced base config to reduce boilerplate across model files.
  • Platform Package: Converted to platform-package with dynamic loading support for Linux, macOS, and Windows (x86_64 & ARM64).
  • Optimized FFI: Cleaned up FFI bindings for better reliability and developer ergonomics.
  • AutoModel Refactor: Unified resolution logic to prioritize generic models and avoid accidental task-specific selection.
  • New Model Support: Added Gemma, Qwen3, Phi, and other language models.
  • Introduced token-per-second (TPS) benchmarking
  • Improved logits processors and model session handling
  • Enhanced Audio/Image Utilities: Upgraded constructors and added better error handling.
  • Refactored BPEModel to support new merge format
  • PSR-3 Logging: Logging support throughout the codebase, now defaults to NullLogger
  • Added full inference and utility test coverage
  • Bumped shared library versions: onnxruntime and rindowmatlib
  • Updated documentation and examples
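
As a rough illustration of how the stopping criteria listed above can compose, here is a minimal sketch. The interface, the `*Sketch` class names, and the `shouldStopGeneration()` helper are hypothetical and are not the library's actual API; the sketch only assumes each criterion inspects the tokens generated so far and that generation halts as soon as any criterion fires.

```php
<?php

declare(strict_types=1);

// Hypothetical names for illustration only; not the library's real classes.
interface StoppingCriterionSketch
{
    /** @param int[] $tokenIds Tokens generated so far for one sequence. */
    public function shouldStop(array $tokenIds): bool;
}

final class MaxLengthSketch implements StoppingCriterionSketch
{
    public function __construct(private int $maxLength) {}

    public function shouldStop(array $tokenIds): bool
    {
        // Stop once the sequence has reached the configured length.
        return count($tokenIds) >= $this->maxLength;
    }
}

final class MaxTimeSketch implements StoppingCriterionSketch
{
    private float $startedAt;

    public function __construct(private float $maxSeconds)
    {
        $this->startedAt = microtime(true);
    }

    public function shouldStop(array $tokenIds): bool
    {
        // Stop once the wall-clock budget since construction is used up.
        return (microtime(true) - $this->startedAt) >= $this->maxSeconds;
    }
}

final class EosTokenSketch implements StoppingCriterionSketch
{
    public function __construct(private int $eosTokenId) {}

    public function shouldStop(array $tokenIds): bool
    {
        // Stop when the most recent token is the end-of-sequence token.
        return $tokenIds !== []
            && $tokenIds[array_key_last($tokenIds)] === $this->eosTokenId;
    }
}

final class InterruptableSketch implements StoppingCriterionSketch
{
    private bool $interrupted = false;

    public function interrupt(): void
    {
        $this->interrupted = true;
    }

    public function shouldStop(array $tokenIds): bool
    {
        // Stop only after the caller has explicitly requested it.
        return $this->interrupted;
    }
}

/** @param StoppingCriterionSketch[] $criteria */
function shouldStopGeneration(array $criteria, array $tokenIds): bool
{
    foreach ($criteria as $criterion) {
        if ($criterion->shouldStop($tokenIds)) {
            return true;
        }
    }

    return false;
}
```

In a generation loop, a check like `shouldStopGeneration($criteria, $tokenIds)` would run after each new token is appended.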

Resolves #88
Resolves #29

No known breaking changes.

- New Stopping Criteria: Added MaxLength, MaxTime, and Interruptable stopping criteria for more flexible generation control.
- Streamers Refactor: Simplified streamer implementation to improve clarity.
- Performance Benchmarking: Introduced a token-per-second (TPS) metric to benchmark model performance across updates.
- Bug Fixes:
  - Fixed an error in the Tensor::slice() method
  - Corrected the RepetitionPenaltyLogitsProcessor to properly utilize tokens.
- Converted library to a platform-package with cross-platform native library support
- Added platform-specific shared libraries for Linux, macOS, and Windows (x86_64 and ARM64)
- Refactored FFI library loading mechanism to support dynamic library resolution
- Updated composer configuration to support platform-specific package installation
- Introduced new native library classes for OpenBLAS, RindowMatlib, and other dependencies
- Simplified native library interaction and improved platform compatibility
- Updated example configurations and test cases to reflect new architecture
- Updated Samplerate FFI method to handle errors more robustly
- Modified Audio and Image utility classes to use new constructor patterns
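
For context on the `RepetitionPenaltyLogitsProcessor` fix mentioned above, the sketch below shows the commonly used formulation of a repetition penalty: each token already present in the sequence has its score divided by the penalty when positive and multiplied by it when negative. The function name `applyRepetitionPenalty()` is hypothetical, and this is not necessarily how the library implements it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper illustrating the common repetition-penalty rule.
 *
 * @param float[] $logits   Scores indexed by token id.
 * @param int[]   $tokenIds Token ids already present in the sequence.
 * @return float[] Logits with previously seen tokens penalized.
 */
function applyRepetitionPenalty(array $logits, array $tokenIds, float $penalty): array
{
    // Penalize each seen token once, no matter how often it occurred.
    foreach (array_unique($tokenIds) as $id) {
        if (!array_key_exists($id, $logits)) {
            continue;
        }

        // Both branches lower the score whenever $penalty > 1.
        $logits[$id] = $logits[$id] < 0
            ? $logits[$id] * $penalty
            : $logits[$id] / $penalty;
    }

    return $logits;
}
```

The key point is that the processor needs the ids of the tokens generated so far; without them, no score can be penalized.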
…logic, and FFI usage

- Reorganized methods within `PretrainedModel` for better readability and logical flow
- Fixed usage logic for various LogitsProcessors including `NoBadWordsLogitsProcessor`, `MinNewTokensLengthLogitsProcessor`, `ForcedBOSTokenLogitsProcessor`, and `SuppressTokensAtBeginLogitsProcessor`.
- Removed usage of `ForceTokensLogitsProcessor`.
- Refactored `Samplerate` and `Sndfile` FFI wrappers from static methods to an instance-based approach.
…d consistency

- Enhanced explanations for model conversion and usage to better guide users.
…tlib -> 1.1.0

- Updated the native library classes to handle version numbers in binary names
- Implemented dynamic library path generation based on platform-specific templates.
- Use `$sessions` array in PretrainedModel instead of separate session properties
- Remove redundant constructors in model subclasses
- Minor logging and doc improvements in Audio and Image utils
- Update .gitignore for log files
- Updated model class mapping constants in AutoModel and its subclasses to use a unified naming convention.
- Removed redundant MODEL_CLASS_MAPPINGS constants in favor of direct usage of MODELS.
- Improved code readability and maintainability by consolidating model definitions.
- Introduced new model classes: Gemma, Gemma2, Gemma3, Qwen3, and Phi with their respective causal language models.
- Enhanced model handling in auto models for better lookup
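
The template-based library path generation mentioned above could look roughly like the following sketch. The template strings, the placeholder syntax, and the `resolveLibraryFile()` function are assumptions chosen to match common naming conventions on each platform; they are not taken from the library's source.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical resolver: builds a shared-library file name from a
 * per-platform template. Templates and placeholders are illustrative.
 */
function resolveLibraryFile(string $name, string $version, string $osFamily): string
{
    // Keys mirror values PHP reports in the PHP_OS_FAMILY constant.
    $templates = [
        'Linux'   => 'lib{name}.so.{version}',
        'Darwin'  => 'lib{name}.{version}.dylib',
        'Windows' => '{name}.dll',
    ];

    if (!isset($templates[$osFamily])) {
        throw new RuntimeException("Unsupported platform: {$osFamily}");
    }

    // Substitute the placeholders with the concrete name and version.
    return strtr($templates[$osFamily], [
        '{name}'    => $name,
        '{version}' => $version,
    ]);
}
```

A caller would typically pass `PHP_OS_FAMILY` as the third argument and join the result with the directory that holds the platform-specific binaries.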
…mapping

- Added support for the new merge format in BPEModel, allowing for direct usage of merges as arrays.
- Improved the creation of the ranks map by using JSON encoded pairs as keys for better compatibility and performance.
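
To make the two merge formats concrete, here is a minimal sketch of building a ranks map keyed by JSON-encoded pairs. The function name `buildBpeRanks()` is hypothetical, and the sketch assumes the legacy format is a space-separated string such as `"h e"` while the new format is a two-element array such as `["h", "e"]`. One plausible benefit of JSON keys is that a pair stays unambiguous even if a token itself contains a space.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: maps each merge pair to its rank (list position).
 *
 * @param array<int, string|array{0: string, 1: string}> $merges
 * @return array<string, int> JSON-encoded pair => rank
 */
function buildBpeRanks(array $merges): array
{
    $ranks = [];

    foreach (array_values($merges) as $rank => $merge) {
        // Normalize both formats to a two-element list of tokens.
        $pair = is_string($merge)
            ? explode(' ', $merge, 2)
            : array_values($merge);

        // The JSON string is the lookup key; earlier merges rank lower.
        $ranks[json_encode($pair)] = $rank;
    }

    return $ranks;
}
```

Looking up a candidate pair during tokenization then becomes `$ranks[json_encode([$left, $right])] ?? null`.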
… models in AutoModel

This ensures `AutoModel::fromPretrained` returns the most generic model class when no task-specific model is found, which makes resolution more predictable and reduces accidental task-specific model selection.
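
One way to picture this resolution order is the sketch below. It assumes the mappings are searched from most generic to most specific and that a base class is used when nothing matches. The `resolveModelClass()` function, the mapping layout, and the class names in the usage note are hypothetical and do not reflect the actual `AutoModel` internals.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical resolver: returns the first class registered for the
 * model type, searching mapping groups in priority order.
 *
 * @param array<int, array<string, string>> $mappingsByPriority
 *        Groups of model type => class name, most generic group first.
 */
function resolveModelClass(string $modelType, array $mappingsByPriority, string $fallbackClass): string
{
    foreach ($mappingsByPriority as $mapping) {
        if (isset($mapping[$modelType])) {
            return $mapping[$modelType];
        }
    }

    // Nothing registered for this type: use the generic base class.
    return $fallbackClass;
}
```

With a generic group `['example' => 'ExampleModel']` placed ahead of a task-specific group `['example' => 'ExampleForCausalLM']`, the call `resolveModelClass('example', $groups, 'BaseModel')` returns `'ExampleModel'`.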
@CodeWithKyrian CodeWithKyrian merged commit 5c1de50 into main Jul 21, 2025
24 checks passed
@CodeWithKyrian CodeWithKyrian deleted the model-revamp branch July 21, 2025 11:29

Linked issues:

- Matlib version error
- Add support for PHI3 model