Feature/qwen3 #311
base: develop
Conversation
zpitroda commented on Apr 29, 2025
- Updated models from Qwen 2.5 to their Qwen 3 equivalents (a rough sketch of the mapping follows this list)
- Updated the transformers and torch Python packages
- Updated llama.cpp for Qwen3 support
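Roughly, that swap amounts to a name mapping like the sketch below. The Hugging Face repo names are illustrative assumptions, not necessarily the exact set Second Me uses:

```python
# Rough sketch of the model swap this PR describes; repo names are
# illustrative and the exact sizes shipped by Second Me may differ.
QWEN25_TO_QWEN3 = {
    "Qwen/Qwen2.5-0.5B-Instruct": "Qwen/Qwen3-0.6B",
    "Qwen/Qwen2.5-1.5B-Instruct": "Qwen/Qwen3-1.7B",
    "Qwen/Qwen2.5-3B-Instruct": "Qwen/Qwen3-4B",
    "Qwen/Qwen2.5-7B-Instruct": "Qwen/Qwen3-8B",
}

def upgrade_model_name(name: str) -> str:
    # Fall back to the original name for models without a Qwen3 equivalent.
    return QWEN25_TO_QWEN3.get(name, name)
```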
README.md (Outdated)
| For model deployment, we utilized [llama.cpp](https://github.com/ggml-org/llama.cpp), which provides efficient inference capabilities.
- Our base models primarily come from the [Qwen2.5](https://huggingface.co/Qwen) series.
+ Our base models primarily come from the [Qwen3](https://huggingface.co/Qwen) series.
I am not sure what Second Me's model update policy is. As a community user I definitely want to use Qwen3 given its SOTA capabilities (though I haven't tested it yet, so there may be some ins and outs).
Huge thanks for your work!
On top of that, it would be nice to add Qwen 3 support alongside the existing Qwen 2.5 models, i.e. add a new supported model rather than replace the existing Qwen 2.5 ones.
I was wondering that as well. I'm testing right now to ensure Qwen 3 doesn't break anything, but I don't know whether the Second Me team currently wants to update. I can also keep the 2.5 models and add the option for 3 as well, if that's preferable?
Updated convert_hf_to_gguf script and gguf-py package to support qwen3 models
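For context, converting a Qwen3 checkpoint with the updated script presumably looks something like this; the paths and quantization type are placeholder assumptions, not the exact commands in this PR:

```python
import subprocess

# Hypothetical invocation of llama.cpp's convert_hf_to_gguf.py for a Qwen3
# checkpoint; paths and the quantization type are placeholders.
subprocess.run(
    [
        "python", "convert_hf_to_gguf.py",
        "models/Qwen3-8B",                   # local Hugging Face snapshot
        "--outfile", "models/qwen3-8b.gguf",
        "--outtype", "q8_0",                 # or f16 / bf16, depending on deployment
    ],
    check=True,
)
```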
Disabled thinking mode and updated backend dockerfile to work with new llama.cpp
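For reference, on the transformers side Qwen3's chat template exposes an enable_thinking switch, so disabling thinking mode looks roughly like this (the model name is illustrative; the PR itself handles this on the llama.cpp side):

```python
from transformers import AutoTokenizer

# Illustrative only: Qwen3's Hugging Face chat template accepts an
# enable_thinking flag that suppresses the <think>...</think> block.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "Hello!"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # disable thinking mode
)
```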
Everything is working except during inference it outputs the extra thinking block.
Added Qwen 2.5 models back along with 3
Hi, I've tested it a bit; it fails when downloading models...
Sorry about that! The base_dir variable was accidentally indented into the "if" block above it, but it should be working now.
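A minimal sketch of the shape of that bug, with hypothetical names and paths rather than the actual Second Me code:

```python
import os

model_dir = "models/qwen3-8b"  # hypothetical path

# Buggy shape: base_dir was indented into the if block, so it was undefined
# whenever the directory already existed, breaking the download step.
if not os.path.exists(model_dir):
    os.makedirs(model_dir)
    base_dir = os.path.dirname(model_dir)

# Fixed shape: the assignment is dedented so it always runs.
if not os.path.exists(model_dir):
    os.makedirs(model_dir)
base_dir = os.path.dirname(model_dir)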
Hi, it works now. So once the conflict is resolved, it will be added to the develop branch :)
@kevin-mindverse sounds good! I think I have a temporary solution to the extra block being output until the llama.cpp PR is merged; I'll try to have that and the conflict fixes pushed tomorrow.
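One plausible shape for such a workaround, sketched here as an assumption rather than the actual patch, is to strip the (empty) thinking block from the generated text:

```python
import re

# Hypothetical sketch: remove the <think>...</think> block that Qwen3
# can still emit even with thinking disabled.
def strip_think_block(text: str) -> str:
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL)

print(strip_think_block("<think>\n\n</think>\n\nHello!"))  # -> "Hello!"
```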
OK, I actually quickly pushed what I think should do it. It shouldn't break anything, but I haven't double-checked that it's fixed.
@yingapple I'm away from my computer this week and unable to test, but it should hopefully now only add no_think flags when using a Qwen3 model.
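A minimal sketch of that guard; the exact check in the PR may differ:

```python
def apply_no_think(prompt: str, model_name: str) -> str:
    # Gate the soft switch on the model name: only Qwen3 understands
    # /no_think, so don't append it for Qwen 2.5 models. (Sketch only.)
    if "qwen3" in model_name.lower():
        return prompt + " /no_think"
    return prompt

print(apply_no_think("Hello!", "Qwen3-8B"))    # "Hello! /no_think"
print(apply_no_think("Hello!", "Qwen2.5-7B"))  # unchanged
```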