Hi, thanks for your great work! I encountered an error when training the LLaVA-OneVision-1.5-4B-stage0 model with the transformers Trainer. The cause seems to be a mismatch between vocab_size in config.json and out_features of lm_head.
```
[rank0]:   File "/mnt/workspace/.cache/huggingface/modules/transformers_modules/llavaOV_4b_stage0/modeling_llavaonevision1_5.py", line 1896, in forward
[rank0]:     loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/mnt/workspace/envs/llavaOV/lib/python3.11/site-packages/transformers/loss/loss_utils.py", line 63, in ForCausalLMLoss
[rank0]:     logits = logits.view(-1, vocab_size)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: RuntimeError: shape '[-1, 152064]' is invalid for input of size 5621632
```
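For what it's worth, the numbers in the error are telling: 5,621,632 is not a multiple of 152,064, but it factors exactly as 151,936 × 37, so the logits coming out of lm_head appear to have a vocabulary dimension of 151,936 rather than the 152,064 declared in config.json. A quick arithmetic check (values taken straight from the error message):

```python
total = 5_621_632              # flattened logits size from the RuntimeError
print(total % 152_064)         # 147328 -> config.json's vocab_size does not divide it
print(divmod(total, 151_936))  # (37, 0) -> consistent with lm_head out_features = 151936
```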
The vocab_size of this model also differs from that of the base model and the instruct model. Do I need to change anything here?
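A minimal sketch to reproduce the check (the checkpoint path is a placeholder, and overriding config.vocab_size is an assumed workaround on my side, not a confirmed fix):

```python
# Minimal sketch, assuming the checkpoint loads via AutoModelForCausalLM with
# trust_remote_code=True; "llavaOV_4b_stage0" is a placeholder local path.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "llavaOV_4b_stage0", trust_remote_code=True
)

head_out = model.lm_head.weight.shape[0]  # actual out_features of lm_head
print("config vocab_size:", model.config.vocab_size)  # 152064 per the error
print("lm_head out_features:", head_out)

# Assumed workaround: align config.vocab_size with the real head width, so that
# loss_function(..., vocab_size=self.config.vocab_size) reshapes logits correctly.
if model.config.vocab_size != head_out:
    model.config.vocab_size = head_out
```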