feat: Add from_borrowed() constructor #33
- `from_pretrained` now delegates to `from_raw_parts`
- Fixes BPE tokenizer support (`unk_token_id` is now optional)
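A minimal sketch of the delegation pattern described in this PR, assuming a toy in-memory model. `ToyModel` is a hypothetical simplified struct (not the crate's real type), `from_vocab_text` is a hypothetical stand-in for the file-parsing half of `from_pretrained`, and the `[UNK]` token string is an assumption. Parsing happens in the loader, validation lives only in `from_raw_parts`, and `unk_token_id` is an `Option` because some tokenizers have no unknown token.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model: a vocabulary, a flat row-major
/// embedding matrix, and an optional unknown-token id.
pub struct ToyModel {
    vocab: HashMap<String, u32>,
    embeddings: Vec<f32>,
    dim: usize,
    unk_token_id: Option<u32>,
}

impl ToyModel {
    /// Builds a model from components the caller has already parsed.
    /// All validation lives here, so every constructor shares it.
    pub fn from_raw_parts(
        vocab: HashMap<String, u32>,
        embeddings: Vec<f32>,
        dim: usize,
        unk_token_id: Option<u32>,
    ) -> Result<Self, String> {
        if dim == 0 || embeddings.len() % dim != 0 {
            return Err("embedding buffer is not a whole number of rows".to_string());
        }
        let rows = embeddings.len() / dim;
        // Every vocabulary id (and the UNK id, if any) must index a real row.
        if vocab.values().chain(unk_token_id.iter()).any(|&id| id as usize >= rows) {
            return Err("token id out of range".to_string());
        }
        Ok(Self { vocab, embeddings, dim, unk_token_id })
    }

    /// Hypothetical stand-in for the parsing half of `from_pretrained`:
    /// one token per line (line index = token id), then delegate.
    pub fn from_vocab_text(vocab_text: &str, embeddings: Vec<f32>, dim: usize) -> Result<Self, String> {
        let vocab: HashMap<String, u32> = vocab_text
            .lines()
            .enumerate()
            .map(|(i, tok)| (tok.to_string(), i as u32))
            .collect();
        // Not every tokenizer defines an unknown token, so absence is not an error.
        let unk_token_id = vocab.get("[UNK]").copied();
        Self::from_raw_parts(vocab, embeddings, dim, unk_token_id)
    }

    /// Embedding row for `token`, if it is in the vocabulary.
    pub fn embedding(&self, token: &str) -> Option<&[f32]> {
        let id = *self.vocab.get(token)? as usize;
        Some(&self.embeddings[id * self.dim..(id + 1) * self.dim])
    }

    pub fn unk_token_id(&self) -> Option<u32> {
        self.unk_token_id
    }
}
```

Keeping validation in the raw-parts constructor means a model built from pre-parsed components and one loaded from disk go through identical checks.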
Pringled left a comment
Thanks for making this PR @zharinov! This is nice functionality to have, I think, and good catch about the `unk_token`. I have two small (but nice-to-have) improvements; if you could implement those, this is good to go. Thanks for updating the tests as well 👍
@zharinov One additional comment: could you also run clippy to fix the formatting issues?
Title changed: from_raw_parts() constructor → from_borrowed() constructor
Hey, I wanted to support zero-copy initialization with … The second attempt transforms … Also, I applied the suggestion for …
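Zero-copy initialization generally means the model keeps a reference to a buffer the caller already owns (for example, a memory-mapped file) instead of copying it. A minimal sketch of that idea, assuming embeddings are stored as a flat row-major `f32` buffer: `Embeddings`, `from_owned`, `row`, and `is_borrowed` are hypothetical names, and only `from_borrowed` mirrors the PR title. `std::borrow::Cow` lets a single type hold either borrowed or owned data.

```rust
use std::borrow::Cow;

/// Hypothetical embedding table that either borrows the caller's buffer
/// (zero-copy) or owns its data.
pub struct Embeddings<'a> {
    data: Cow<'a, [f32]>,
    dim: usize,
}

impl<'a> Embeddings<'a> {
    /// Zero-copy: keeps a reference to the caller's slice; nothing is cloned.
    pub fn from_borrowed(data: &'a [f32], dim: usize) -> Option<Self> {
        Self::new(Cow::Borrowed(data), dim)
    }

    /// Shared validation: the buffer must be a whole number of `dim`-sized rows.
    fn new(data: Cow<'a, [f32]>, dim: usize) -> Option<Self> {
        (dim > 0 && data.len() % dim == 0).then(|| Self { data, dim })
    }

    /// Returns row `id`, or `None` if it is out of range.
    pub fn row(&self, id: usize) -> Option<&[f32]> {
        self.data.get(id * self.dim..(id + 1) * self.dim)
    }

    pub fn is_borrowed(&self) -> bool {
        matches!(self.data, Cow::Borrowed(_))
    }
}

impl Embeddings<'static> {
    /// Takes ownership of an existing Vec (a move, not a copy).
    pub fn from_owned(data: Vec<f32>, dim: usize) -> Option<Self> {
        Self::new(Cow::Owned(data), dim)
    }
}
```

The lifetime parameter is the cost of this design: a borrowed model cannot outlive the buffer it points into, which is why an owned variant remains useful.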
Update: once I had benchmarks set up locally, I did some additional research, if you're interested. Here is the report I got:
Benchmark Results:
Branch: https://github.com/zharinov/model2vec-rs/tree/opt/all-optimizations
@zharinov Thanks for resolving the comments and for adding these features; everything looks good to me. I'll include this in the next release!
Adds `from_raw_parts()` for constructing models from pre-parsed components; `from_pretrained()` now delegates to it. Also fixes a bug where loading would fail if the tokenizer doesn't define an `unk_token` (not all tokenizers have one).
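A sketch of how a loader can treat `unk_token` as optional, assuming a hypothetical `SpecialTokens` config struct and a plain `HashMap` vocabulary (not the crate's actual types). Two behaviors here are choices of this sketch and are not confirmed by the PR: a declared-but-missing `unk_token` is still reported as an error, and out-of-vocabulary tokens are dropped when there is no UNK id.

```rust
use std::collections::HashMap;

/// Hypothetical special-token section of a tokenizer config.
/// `unk_token` is `None` for tokenizers that never define one.
pub struct SpecialTokens {
    pub unk_token: Option<String>,
}

/// Resolves the UNK id without failing when no UNK token is defined.
pub fn resolve_unk_id(
    special: &SpecialTokens,
    vocab: &HashMap<String, u32>,
) -> Result<Option<u32>, String> {
    match &special.unk_token {
        // No UNK token declared: not an error, just nothing to fall back to.
        None => Ok(None),
        // Declared but absent from the vocabulary: treated as a broken config (sketch's choice).
        Some(tok) => vocab
            .get(tok)
            .copied()
            .map(Some)
            .ok_or_else(|| format!("unk_token {tok:?} not in vocabulary")),
    }
}

/// Maps tokens to ids; out-of-vocabulary tokens fall back to the UNK id
/// when there is one and are dropped otherwise (sketch's choice).
pub fn encode(tokens: &[&str], vocab: &HashMap<String, u32>, unk_id: Option<u32>) -> Vec<u32> {
    tokens
        .iter()
        .filter_map(|t| vocab.get(*t).copied().or(unk_id))
        .collect()
}
```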