Skip to content

issue/348 - add Baichuan causal LM model support#351

Open
JoeZhang-0x000 wants to merge 1 commit intoInfiniTensor:mainfrom
JoeZhang-0x000:issue/348
Open

issue/348 - add Baichuan causal LM model support#351
JoeZhang-0x000 wants to merge 1 commit intoInfiniTensor:mainfrom
JoeZhang-0x000:issue/348

Conversation

@JoeZhang-0x000
Copy link
Copy Markdown

Summary

  • Add Baichuan model config adapter (csrc/models/baichuan/)
  • Register "baichuan" in config_factory.cpp classic_models list and auto_config.py
  • Add Baichuan weight remapping (W_packq/k/v_proj) in modeling_utils.py
  • Update test_infer.py for Baichuan tokenization and chat prompt handling

Closes #348
Parent issue: #332

@JoeZhang-0x000 JoeZhang-0x000 requested a review from a team May 7, 2026 06:30
- Add Baichuan model config adapter (csrc/models/baichuan/)
- Register baichuan in config_factory.cpp and auto_config.py
- Add Baichuan weight remapping (W_pack -> q/k/v_proj) in modeling_utils.py
- Update test_infer.py for Baichuan tokenization and chat prompt handling
@wooway777
Copy link
Copy Markdown
Collaborator

wooway777 commented May 7, 2026

image 116044cdd2a3fc19b4f0b0cd2cd8ab15

这个是我这边tokenizer版本或者精度问题么?我看好像确实没输出eos
第一张图里不会正常终止,第二张图无标点

Comment thread examples/test_infer.py
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这样修改是不是意味着只能做单轮推理,bench、精度测试、服务都无法使用?
如何做通用我们也需要花时间看一眼

it->second(model_config);
} else {
std::vector<std::string> classic_models = {"llama", "qwen2", "minicpm", "fm9g", "fm9g7b"};
std::vector<std::string> classic_models = {"llama", "qwen2", "minicpm", "fm9g", "fm9g7b", "baichuan"};
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新增模型不需要修改这里,请删除csrc/config/config_factory.cpp文件的修改

@@ -47,4 +47,7 @@ def from_pretrained(model_path):
cfg.model_type = "minicpmv"
return cfg

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新增模型不需要修改这里,请删除python/infinilm/auto_config.py文件的修改


namespace infinilm::models::baichuan {

std::shared_ptr<infinilm::config::ModelConfig> create_baichuan_model_config(
Copy link
Copy Markdown
Collaborator

@pengcheng888 pengcheng888 May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

需要明确给出BaichuanForCausalLM的定义: 添加 using BaichuanForCausalLM = infinilm::models::llama::LlamaForCausalLM,


INFINILM_REGISTER_CAUSAL_LM_MODEL(
baichuan,
infinilm::models::llama::LlamaForCausalLM,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

移除csrc/models/baichuan/baichuan_for_causal_lm.cpp文件中#include "../llama/llama_for_causal_lm.hpp"。

将infinilm::models::llama::LlamaForCausalLM修改为infinilm::models::baichuan ::BaichuanForCausalLM


namespace {

#ifndef USE_CLASSIC_LLAMA
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

新增的模型不需要放到USE_CLASSIC_LLAMA宏中。请删除csrc/models/baichuan/baichuan_for_causal_lm.cpp文件中的 USE_CLASSIC_LLAMA

return new_sd


def maybe_remap_weights(state_dict, model):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数改名叫adjust_state_dict或者就叫remap_weights吧,maybe略显随意

}


def _split_first_dim(tensor, sizes, name):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

将这个_split_first_dim 函数放到 _remap_baichuan_weights函数里面吧,作为_remap_baichuan_weights函数专用的。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support Baichuan model

3 participants