Set Gemma 3 4B max_position_embeddings to 128K #95
`max_position_embeddings` is missing from the Gemma 3 4B config. It defaults to `8192`, and the model will error if you pass more than `2 * 8192` tokens. This PR sets `max_position_embeddings` to `131072` (`128 * 1024`) to match the 128K context length stated in the paper.
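For reference, a minimal sketch of the intended setting; the `Gemma3Config` dataclass below is a placeholder for illustration, not the repo's actual config class:

```python
from dataclasses import dataclass


# Placeholder config for illustration; only max_position_embeddings reflects
# the change in this PR, everything else about the class is assumed.
@dataclass
class Gemma3Config:
    # Previously missing for the 4B variant, so it fell back to the 8192 default.
    max_position_embeddings: int = 128 * 1024  # 131072, the 128K context length
```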
Note: in the Kaggle checkpoints for `gemma-3-4b-it` and `gemma-3-4b-pt`, the `local_freqs_cis` and `global_freqs_cis` tensors currently have shape `(16384, 128)`. The checkpoints will need to be updated, otherwise `load_state_dict` will fail due to the shape mismatch.
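To illustrate where the `(16384, 128)` shape comes from and why loading then fails, here is a hedged sketch of a standard RoPE table precomputation. It assumes a head dim of 256 and a table sized to `2 * max_position_embeddings`; the function below is not the repo's exact code:

```python
import torch


def precompute_freqs_cis(dim: int, end: int, theta: float = 10_000.0) -> torch.Tensor:
    """Standard RoPE frequency table (sketch only, not the repo's implementation)."""
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))  # (dim // 2,)
    positions = torch.arange(end).float()                             # (end,)
    angles = torch.outer(positions, freqs)                            # (end, dim // 2)
    return torch.polar(torch.ones_like(angles), angles)               # complex64

# With the old 8192 default and a table covering 2 * max_position_embeddings,
# the tensors have 16384 rows, matching the (16384, 128) shapes in the current
# Kaggle checkpoints (head dim 256 -> 128 complex frequencies per position).
old_table = precompute_freqs_cis(dim=256, end=2 * 8192)
print(old_table.shape)  # torch.Size([16384, 128])

# A model built with max_position_embeddings = 131072 registers much larger
# tensors, so load_state_dict on the old checkpoints raises a size-mismatch
# error until the checkpoints are regenerated.
```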