Revamp model internals, unify generation flow, and extend platform support #89
Merged
- New Stopping Criteria: added `MaxLength`, `MaxTime`, and `Interruptable` stopping criteria for more flexible generation control.
- Streamers Refactor: simplified the streamer implementation for clarity.
- Performance Benchmarking: introduced a tokens-per-second (TPS) metric to benchmark model performance across updates.
- Bug Fixes:
  - Fixed an error in the `Tensor::slice()` method.
  - Corrected `RepetitionPenaltyLogitsProcessor` so that it properly uses the tokens.
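Each stopping criterion looks at the token sequence and answers "stop now?". Generation ends as soon as any criterion answers yes. The Python sketch below illustrates that composition. The library itself is PHP, so these classes and the `generate` loop are hypothetical analogues, not its API. The EOS check is assumed to behave like the `EosToken` criterion listed in the highlights.

```python
import time
from typing import Callable, Iterable, List


class StoppingCriterion:
    """Hypothetical base: return True when generation should stop."""

    def __call__(self, tokens: List[int]) -> bool:
        raise NotImplementedError


class MaxLength(StoppingCriterion):
    def __init__(self, max_length: int) -> None:
        self.max_length = max_length

    def __call__(self, tokens: List[int]) -> bool:
        # Stop once the sequence (prompt + generated tokens) reaches max_length.
        return len(tokens) >= self.max_length


class MaxTime(StoppingCriterion):
    def __init__(self, max_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so the criterion can be tested deterministically.
        self.max_seconds = max_seconds
        self.clock = clock
        self.start = clock()  # timing starts when the criterion is created

    def __call__(self, tokens: List[int]) -> bool:
        return self.clock() - self.start > self.max_seconds


class EosToken(StoppingCriterion):
    def __init__(self, eos_token_id: int) -> None:
        self.eos_token_id = eos_token_id

    def __call__(self, tokens: List[int]) -> bool:
        # Stop when the most recent token is the end-of-sequence token.
        return bool(tokens) and tokens[-1] == self.eos_token_id


class Interruptable(StoppingCriterion):
    def __init__(self) -> None:
        self.interrupted = False

    def interrupt(self) -> None:
        # For example, called from a "stop" button handler or another thread.
        self.interrupted = True

    def __call__(self, tokens: List[int]) -> bool:
        return self.interrupted


def generate(next_token: Callable[[List[int]], int], prompt: Iterable[int],
             criteria: List[StoppingCriterion]) -> List[int]:
    """Toy loop: append tokens until any criterion fires."""
    tokens = list(prompt)
    while not any(c(tokens) for c in criteria):
        tokens.append(next_token(tokens))
    return tokens
```

For example, `generate(lambda t: 7, [1, 2], [MaxLength(5), EosToken(0)])` returns `[1, 2, 7, 7, 7]`. The length limit fires before any EOS token appears.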
- Converted the library to a platform package with cross-platform native library support.
- Added platform-specific shared libraries for Linux, macOS, and Windows (x86_64 and ARM64).
- Refactored the FFI library loading mechanism to support dynamic library resolution.
- Updated the Composer configuration to support platform-specific package installation.
- Introduced new native library classes for OpenBLAS, RindowMatlib, and other dependencies.
- Simplified native library interaction and improved platform compatibility.
- Updated example configurations and test cases to reflect the new architecture.
- Updated the `Samplerate` FFI method to handle errors more robustly.
- Modified the `Audio` and `Image` utility classes to use new constructor patterns.
…logic, and FFI usage
- Reorganized methods within `PretrainedModel` for better readability and logical flow.
- Fixed the usage logic for several logits processors, including `NoBadWordsLogitsProcessor`, `MinNewTokensLengthLogitsProcessor`, `ForcedBOSTokenLogitsProcessor`, and `SuppressTokensAtBeginLogitsProcessor`.
- Removed usage of `ForceTokensLogitsProcessor`.
- Refactored the `Samplerate` and `Sndfile` FFI wrappers from static methods to an instance-based approach.
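These processor names come from Hugging Face Transformers. The Python sketch below shows how two of them work there, operating on a plain list of logits. It is assumed, not confirmed, that the fixed behavior in this library matches.

- The first processor suppresses the given tokens only at the step where generation begins.
- The second bans the last token of any "bad word" sequence whose prefix has just been generated.

The function names and signatures are hypothetical.

```python
import math
from typing import List, Sequence


def suppress_tokens_at_begin(input_ids: Sequence[int], logits: Sequence[float],
                             suppress_ids: Sequence[int],
                             begin_index: int) -> List[float]:
    """Set suppressed tokens to -inf only on the first generation step."""
    out = list(logits)
    if len(input_ids) == begin_index:
        for token_id in suppress_ids:
            out[token_id] = -math.inf
    return out


def no_bad_words(input_ids: Sequence[int], logits: Sequence[float],
                 bad_words_ids: Sequence[Sequence[int]]) -> List[float]:
    """Ban the final token of each bad-word sequence whose prefix matches
    the end of input_ids; single-token bad words are always banned."""
    out = list(logits)
    for seq in bad_words_ids:
        prefix, last = list(seq[:-1]), seq[-1]
        # The explicit empty-prefix check matters: input_ids[-0:] is the whole list.
        if not prefix or list(input_ids[-len(prefix):]) == prefix:
            out[last] = -math.inf
    return out
```

A token with logit `-inf` gets zero probability after softmax, so neither greedy decoding nor sampling can pick it.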
…d consistency
- Enhanced explanations of model conversion and usage to better guide users.
…tlib -> 1.1.0
- Improved the native libraries to support versioned binary names.
- Implemented dynamic library path generation based on platform-specific templates.
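Below is a minimal sketch of template-based library path generation. It assumes a `{name}`/`{version}` placeholder scheme and an `<os>-<arch>` directory layout. The templates, the layout, and the `library_path` helper are all hypothetical, not the package's actual file names. The sketch shows how a versioned binary name is resolved per operating system and CPU architecture. It also normalizes the different names that `platform.machine()` reports for the same CPU.

```python
import platform
from typing import Optional

# Hypothetical per-OS filename templates with {name} and {version} placeholders.
TEMPLATES = {
    "Linux": "lib{name}.so.{version}",
    "Darwin": "lib{name}.{version}.dylib",
    "Windows": "{name}-{version}.dll",
}

# platform.machine() reports the same CPU under different names per OS.
ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",   # Windows reports "AMD64"
    "arm64": "arm64",    # macOS on Apple silicon
    "aarch64": "arm64",  # Linux on ARM64
}


def library_path(name: str, version: str, system: Optional[str] = None,
                 machine: Optional[str] = None, base_dir: str = "lib") -> str:
    """Build a path such as lib/linux-x86_64/libexample.so.1.0.0."""
    system = system or platform.system()
    machine = (machine or platform.machine()).lower()
    if system not in TEMPLATES:
        raise RuntimeError(f"Unsupported OS: {system}")
    arch = ARCH_ALIASES.get(machine)
    if arch is None:
        raise RuntimeError(f"Unsupported architecture: {machine}")
    filename = TEMPLATES[system].format(name=name, version=version)
    return f"{base_dir}/{system.lower()}-{arch}/{filename}"
```

Keeping the version inside the template, rather than hard-coding file names, means a dependency bump only changes the version string.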
…R for consistency
- Use a `$sessions` array in `PretrainedModel` instead of separate session properties.
- Remove redundant constructors in model subclasses.
- Minor logging and documentation improvements in the `Audio` and `Image` utils.
- Update `.gitignore` for log files.
- Updated the model class mapping constants in `AutoModel` and its subclasses to use a unified naming convention.
- Removed redundant `MODEL_CLASS_MAPPINGS` constants in favor of using `MODELS` directly.
- Improved readability and maintainability by consolidating model definitions.
- Introduced new model classes Gemma, Gemma2, Gemma3, Qwen3, and Phi, each with its causal language model.
- Enhanced model handling in the auto models for better lookup.
…mapping
- Added support for the new merge format in `BPEModel`, allowing merges to be used directly as arrays.
- Improved creation of the ranks map by using JSON-encoded pairs as keys for better compatibility and performance.
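Below is a Python sketch of a BPE ranks map keyed by JSON-encoded pairs. It assumes merges may arrive either in the legacy `"left right"` string form or as two-element arrays. A JSON key stays unambiguous even when a token itself contains a space, which a space-joined key cannot guarantee. This is offered as one plausible reason for the change, not the author's stated one. `build_ranks`, `pair_rank`, and `bpe` are hypothetical names.

```python
import json
from typing import Dict, List, Optional, Sequence, Union

Merge = Union[str, Sequence[str]]


def build_ranks(merges: Sequence[Merge]) -> Dict[str, int]:
    """Map each merge pair (as a JSON string) to its priority rank."""
    ranks: Dict[str, int] = {}
    for rank, merge in enumerate(merges):
        # Legacy form "a b" versus new array form ["a", "b"].
        pair = merge.split(" ", 1) if isinstance(merge, str) else list(merge)
        if len(pair) != 2:
            raise ValueError(f"Malformed merge: {merge!r}")
        ranks[json.dumps(pair)] = rank
    return ranks


def pair_rank(ranks: Dict[str, int], left: str, right: str) -> Optional[int]:
    return ranks.get(json.dumps([left, right]))


def bpe(symbols: Sequence[str], ranks: Dict[str, int]) -> List[str]:
    """Repeatedly merge the adjacent pair with the lowest rank."""
    symbols = list(symbols)
    while len(symbols) > 1:
        best = None  # (rank, index) of the best adjacent pair
        for i in range(len(symbols) - 1):
            r = pair_rank(ranks, symbols[i], symbols[i + 1])
            if r is not None and (best is None or r < best[0]):
                best = (r, i)
        if best is None:
            break  # no mergeable pair left
        _, i = best
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols
```

With merges `["a b", ["ab", "c"]]`, the symbols `["a", "b", "c"]` merge first into `["ab", "c"]` and then into `["abc"]`.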
… models in AutoModel

This ensures that `AutoModel::fromPretrained` returns the most generic model class when no specific task model is found. This gives the expected behavior and reduces accidental selection of task-specific models.
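One way to get the behavior described above is to consult the generic model mapping before any task-specific mapping. The Python sketch below is hypothetical. The class names, the mapping contents, and the `resolve` method are illustrative and do not reproduce the library's `AutoModel` API.

```python
from typing import Dict, Tuple


class BaseAutoModel:
    # model_type -> class name; consulted first.
    MODELS: Dict[str, str] = {}
    # Task-specific mappings, consulted only when MODELS has no entry.
    FALLBACKS: Tuple[Dict[str, str], ...] = ()

    @classmethod
    def resolve(cls, model_type: str) -> str:
        if model_type in cls.MODELS:
            return cls.MODELS[model_type]
        for mapping in cls.FALLBACKS:
            if model_type in mapping:
                return mapping[model_type]
        raise ValueError(f"Unsupported model type: {model_type}")


# Hypothetical mapping contents, for illustration only.
BASE_MODELS = {"gemma": "GemmaModel"}
CAUSAL_LM_MODELS = {"gemma": "GemmaForCausalLM", "qwen3": "Qwen3ForCausalLM"}


class AutoModel(BaseAutoModel):
    MODELS = BASE_MODELS
    FALLBACKS = (CAUSAL_LM_MODELS,)


class AutoModelForCausalLM(BaseAutoModel):
    MODELS = CAUSAL_LM_MODELS
```

With this ordering, a model type that has both a base class and a task-specific class resolves to the base class under `AutoModel`. It resolves to the task class only when a task-specific auto class is used. A type with no base entry still resolves through the fallback.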
This PR delivers a major refactor focused on simplifying the generation pipeline, improving cross-platform native support, extending model capabilities, and enhancing overall maintainability.
Highlights:
- `MaxLength`, `MaxTime`, `EosToken`, and `Interruptable` criteria for better generation control.
- `BPEModel` updated to support the new merge format.

Resolves #88
Resolves #29
No known breaking changes.