Set Gemma 3 4B max_position_embeddings to 128K #95
`max_position_embeddings` is missing from the Gemma 3 4B config. It defaults to `8192`, and the model will error if you pass more than `2 * 8192` tokens. This PR sets `max_position_embeddings` to `131072` (`128 * 1024`) to match the 128K context length stated in the paper.
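For reference, a minimal sketch of the intended setting; the `Gemma3Config` dataclass below is a placeholder for illustration, not the repo's actual config class:

```python
from dataclasses import dataclass


# Placeholder config for illustration; only max_position_embeddings reflects
# the change in this PR, everything else about the class is assumed.
@dataclass
class Gemma3Config:
    # Previously missing for the 4B variant, so it fell back to the 8192 default.
    max_position_embeddings: int = 128 * 1024  # 131072, the 128K context length
```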
Note: in the Kaggle checkpoints for `gemma-3-4b-it` and `gemma-3-4b-pt`, the `local_freqs_cis` and `global_freqs_cis` tensors currently have shape `(16384, 128)`. The checkpoints will need to be updated, otherwise `load_state_dict` will fail due to the shape mismatch.
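To illustrate where the `(16384, 128)` shape comes from and why loading then fails, here is a hedged sketch of a standard RoPE table precomputation. It assumes a head dim of 256 and a table sized to `2 * max_position_embeddings`; the function below is not the repo's exact code:

```python
import torch


def precompute_freqs_cis(dim: int, end: int, theta: float = 10_000.0) -> torch.Tensor:
    """Standard RoPE frequency table (sketch only, not the repo's implementation)."""
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))  # (dim // 2,)
    positions = torch.arange(end).float()                             # (end,)
    angles = torch.outer(positions, freqs)                            # (end, dim // 2)
    return torch.polar(torch.ones_like(angles), angles)               # complex64

# With the old 8192 default and a table covering 2 * max_position_embeddings,
# the tensors have 16384 rows, matching the (16384, 128) shapes in the current
# Kaggle checkpoints (head dim 256 -> 128 complex frequencies per position).
old_table = precompute_freqs_cis(dim=256, end=2 * 8192)
print(old_table.shape)  # torch.Size([16384, 128])

# A model built with max_position_embeddings = 131072 registers much larger
# tensors, so load_state_dict on the old checkpoints raises a size-mismatch
# error until the checkpoints are regenerated.
```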