Description
Proposal
Add Gemma 3n support for both the E2B and E4B checkpoints, initially focusing only on text-only mode. The loader should ignore the vision tower and audio encoder so that only the core causal decoder is used.
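As a rough illustration of what "ignoring" the multimodal components could look like, here is a minimal sketch that filters a checkpoint's state dict down to the text decoder. The prefix names (`vision_tower`, `audio_tower`, etc.) are assumptions about the checkpoint layout and would need to be checked against the actual Gemma 3n safetensors index:

```python
# Hypothetical sketch: keep only the text-decoder weights from a Gemma 3n
# checkpoint. The prefixes below are assumptions, not the confirmed layout.
from safetensors.torch import load_file

SKIP_PREFIXES = (
    "model.vision_tower.",   # assumed prefix for the vision encoder
    "model.audio_tower.",    # assumed prefix for the audio encoder
    "model.embed_vision.",   # assumed prefix for vision embedding projections
    "model.embed_audio.",    # assumed prefix for audio embedding projections
)

def text_only_state_dict(shard_paths):
    """Merge safetensors shards, dropping vision/audio parameters."""
    state = {}
    for path in shard_paths:
        for name, tensor in load_file(path).items():
            if name.startswith(SKIP_PREFIXES):
                continue  # skip multimodal weights entirely
            state[name] = tensor
    return state
```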
Motivation
Gemma 3n offers state-of-the-art performance for the compute it requires, making it ideal for running mechanistic interpretability experiments on consumer hardware. Its unusual architecture, featuring sparse updates, low-rank residuals, and nested layers, introduces novel mechanisms that are worth exploring. Gemma 3n could let the mechanistic interpretability community study a pair of powerful models and enable reproducible interpretability research without expensive compute.
Pitch
Add support for the Gemma 3n models (E2B and E4B), starting with text-only inputs, by bypassing the vision and audio components. This would let users without high-end compute run mechanistic interpretability experiments on a high-efficiency, high-performance model with novel architectural features such as sparse alternating updates (AltUp), low-rank residual augmentation (LAuReL), and nested sub-models (MatFormer; E4B contains E2B).
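To give a flavour of why these mechanisms are interesting to hook into, here is a purely conceptual sketch of a LAuReL-style learned low-rank residual branch. This is not Gemma 3n's actual implementation; the rank, scaling, and placement are illustrative assumptions:

```python
# Conceptual sketch only: a residual connection augmented with a learned
# low-rank path, in the spirit of LAuReL. Shapes and scaling are illustrative
# assumptions, not Gemma 3n's actual parameterisation.
import torch
import torch.nn as nn

class LowRankResidual(nn.Module):
    def __init__(self, d_model: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)   # project down to low rank
        self.up = nn.Linear(rank, d_model, bias=False)     # project back up
        self.alpha = nn.Parameter(torch.ones(1))           # learned mixing weight

    def forward(self, x: torch.Tensor, block_out: torch.Tensor) -> torch.Tensor:
        # Block output plus a scaled skip and a cheap low-rank transform of the input.
        return block_out + self.alpha * x + self.up(self.down(x))
```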
Additional context
HF checkpoints: google/gemma-3n-E2B-it, google/gemma-3n-E4B-it
Key obstacles to be aware of (no solutions proposed here):
Nested-layer design (Matryoshka/MatFormer): E4B contains the E2B sub-model.
Extra per-block modules: Each transformer block adds AltUp sparsity gates, LAuReL low-rank residuals, and Per-Layer Embeddings (PLE).
Memory optimizations: PLE uses caching to offload much of its embedding parameters to the CPU, so many weights may not reside on the GPU at runtime.
Mixed local/global attention: blocks interleave sliding-window (local) and full (global) attention.
Large vocabulary, including reserved multimodal token IDs.
Multimodality: loading in text-only mode means safely bypassing the vision and audio parameters while keeping their references (could this be done? see the sketch after this list).
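One possible answer to that last question, offered only as a speculative sketch: instantiate the vision/audio submodules on PyTorch's `meta` device, so the attribute references still exist but no real memory is allocated and no weights are loaded. The class and attribute names (`VisionTower`, `AudioEncoder`, `vision_tower`, `audio_encoder`) are placeholders, not the real Gemma 3n module names:

```python
# Speculative sketch: keep references to the multimodal submodules while never
# materialising their weights, by constructing them on the "meta" device.
# VisionTower/AudioEncoder are placeholder stand-ins, not real Gemma 3n modules.
import torch
import torch.nn as nn

class VisionTower(nn.Module):        # placeholder stand-in
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(1024, 2048)

class AudioEncoder(nn.Module):       # placeholder stand-in
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(512, 2048)

class Gemma3nTextOnlyWrapper(nn.Module):
    def __init__(self, text_decoder: nn.Module):
        super().__init__()
        self.language_model = text_decoder
        # References exist, but the parameters live on the meta device, take no
        # memory, and would fail loudly if anything tried to run them.
        with torch.device("meta"):
            self.vision_tower = VisionTower()
            self.audio_encoder = AudioEncoder()
```

A loader built this way would load real weights only into `language_model`, while downstream code that merely checks for the existence of the multimodal attributes would still work.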
Checklist
- I have checked that there is no similar issue in the repo (required)
NOTE: This is my first issue ever on an open source project, so any observations or recommendations are welcome!