
@paper-turtle

`max_position_embeddings` is missing from the Gemma 3 4B config, so it falls back to the default of 8192 and the model errors out if more than 2*8192 tokens are passed.

This PR sets `max_position_embeddings` to 131072 (128*1024) to match the 128K context length stated in the paper.
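
A minimal sketch of the intended change, assuming a dataclass-style `GemmaConfig` with a `max_position_embeddings` field (names here are illustrative, not verbatim from the repo):

```python
import dataclasses

@dataclasses.dataclass
class GemmaConfig:
    # Default the 4B config currently falls back to because the field is unset.
    max_position_embeddings: int = 8192

def get_config_for_4b() -> GemmaConfig:
    # Explicitly set the 128K context length stated in the Gemma 3 paper.
    return GemmaConfig(max_position_embeddings=128 * 1024)  # 131072
```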

Note: in the Kaggle checkpoints for gemma-3-4b-it and gemma-3-4b-pt, the `local_freqs_cis` and `global_freqs_cis` tensors currently have shape (16384, 128). Those checkpoints will need to be regenerated; otherwise `load_state_dict` will fail with a shape mismatch on those buffers.
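
For context, here is a sketch of where those shapes plausibly come from, assuming the rotary tables are precomputed over 2 * `max_position_embeddings` positions with a head dimension of 256 (so the complex table has 128 columns). The function name, the factor of 2, and the head size are assumptions for illustration, not taken from the repo:

```python
import torch

def precompute_freqs_cis(head_dim: int, end: int, theta: float = 10_000.0) -> torch.Tensor:
    # Standard RoPE table: complex rotations for `end` positions, head_dim // 2 frequencies.
    freqs = 1.0 / (theta ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
    t = torch.arange(end, dtype=torch.float32)
    angles = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(angles), angles)

old = precompute_freqs_cis(head_dim=256, end=2 * 8192)    # (16384, 128): shape in the current checkpoints
new = precompute_freqs_cis(head_dim=256, end=2 * 131072)  # (262144, 128): expected shape after this PR

# Loading an old checkpoint into the new model would raise a size-mismatch
# error for local_freqs_cis / global_freqs_cis unless the checkpoints are
# updated or those buffers are re-derived instead of loaded.
```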
