Is there an existing issue?
- I have searched, and there is no existing issue.
Describe the bug
🐛 Bug Description
When running inference on MiniCPM4 using transformers==4.49.0, the model throws an ImportError followed by a ValueError. These errors prevent the model from successfully executing the forward pass in environments that have not upgraded to the latest transformers versions.
🔍 Root Cause Analysis
We identified two distinct but related issues during the model initialization and the first forward pass:
1. DynamicLayer Import Error
DynamicLayer was introduced in transformers version 4.54.1. In older versions (e.g., 4.49.0), importing it directly from transformers.cache_utils causes a fatal crash:
```
ImportError: cannot import name 'DynamicLayer' from 'transformers.cache_utils'
```
2. past_key_values Initialization Logic Flaw
During the first forward pass, past_key_values is naturally None. However, the current logic in MiniCPMModel.forward (around line 1940) misinterprets None as a legacy tuple cache because isinstance(None, Cache) evaluates to False:
```
ValueError: You must use the new past_key_values format, such as the Cache class, instead of the old tuple format
```
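To illustrate the flaw, here is a minimal self-contained sketch (using a stand-in `Cache` class rather than the real `transformers.cache_utils.Cache`, and function names invented for illustration). Because `isinstance(None, Cache)` evaluates to `False`, a check that only tests the instance type treats the first-pass `None` as a legacy tuple cache:

```python
class Cache:  # stand-in for transformers.cache_utils.Cache
    pass


def current_check(past_key_values):
    # Current logic: anything that is not a Cache instance is treated
    # as a legacy tuple cache -- including None on the first pass.
    if not isinstance(past_key_values, Cache):
        raise ValueError(
            "You must use the new past_key_values format, such as the "
            "Cache class, instead of the old tuple format"
        )
    return past_key_values


def refined_check(past_key_values):
    # Refined logic: None means "first forward pass", so initialize a
    # fresh cache instead of raising.
    if past_key_values is None:
        return Cache()
    if not isinstance(past_key_values, Cache):
        raise ValueError("You must use the new past_key_values format")
    return past_key_values
```

On the first forward pass, `current_check(None)` raises the `ValueError` above, while `refined_check(None)` returns a freshly initialized cache and leaves an existing `Cache` instance untouched.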
🛠️ Proposed Solution
To maximize backward compatibility without forcing users to upgrade their transformers package (which might break other dependencies), we propose the following minimal-impact fixes:
- Self-Contained Cache Classes: Embed the `CacheLayerMixin` and `DynamicLayer` definitions directly into `modeling_minicpm.py` as a fallback. (On newer transformers versions, the fallback is simply not used.)
- Refined Cache Check: Update the validation logic in the `forward` method to explicitly allow `past_key_values is None` during the first pass and correctly initialize `InfLLMv2Cache` or `DynamicCache`.
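The import fallback could be sketched roughly as below. This is an illustrative shape only: the real `DynamicLayer` in transformers >= 4.54.1 exposes a larger API, and only a minimal grow-by-concatenation core is shown here.

```python
try:
    # Available in transformers >= 4.54.1
    from transformers.cache_utils import DynamicLayer
except ImportError:
    # Minimal fallback embedded in modeling_minicpm.py for older
    # versions (e.g. 4.49.0). Only the key/value concatenation core
    # is sketched; the real class has more methods.
    class DynamicLayer:
        def __init__(self):
            self.keys = None
            self.values = None

        def update(self, key_states, value_states):
            import torch  # lazy import keeps the fallback self-contained
            if self.keys is None:
                self.keys, self.values = key_states, value_states
            else:
                self.keys = torch.cat([self.keys, key_states], dim=-2)
                self.values = torch.cat([self.values, value_states], dim=-2)
            return self.keys, self.values
```

With this pattern, environments on newer transformers pick up the library class unchanged, and older environments silently use the embedded fallback.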
To Reproduce
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM4-8B",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM4-8B", trust_remote_code=True)

prompt = "GitHub community standards dictate clear code reproduction."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    use_cache=True,
)
```
Expected behavior
Normal inference: the forward pass completes and returns logits without raising.
Screenshots
No response
Environment
- **Model:** MiniCPM4-8B / MiniCPM4-0.5B
- **Transformers Version:** <= 4.54.0 (e.g., 4.49.0)
- **PyTorch Version:** (Add your version here, e.g., 2.2.0)

Additional context
No response