
[Proposal] Add basic support for Gemma 3n (E2B & E4B) models #953

@lmmontoya-ai

Description


Proposal

Add Gemma 3n support for both the E2B and E4B checkpoints, initially focusing only on text-only mode. The loader should ignore the vision tower and audio encoder so that only the core causal decoder is used.
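A minimal sketch of the text-only loading step, assuming the checkpoint is available as a flat dict mapping parameter names to tensors. The prefixes in `NON_TEXT_PREFIXES` are hypothetical placeholders for the vision and audio parameters, not confirmed key names from the HF checkpoint, and `split_text_only` is an illustrative helper, not an existing TransformerLens function. Keeping the skipped entries in a separate dict, rather than deleting them, is one way to bypass the multimodal weights while still holding references to them.

```python
# HYPOTHETICAL key prefixes for the non-text parts of a Gemma 3n checkpoint.
# These are assumptions for illustration; inspect the real checkpoint's keys.
NON_TEXT_PREFIXES = (
    "model.vision_tower.",
    "model.audio_tower.",
    "model.embed_vision.",
    "model.embed_audio.",
)


def split_text_only(state_dict, skip_prefixes=NON_TEXT_PREFIXES):
    """Split a checkpoint into text-decoder weights and bypassed multimodal weights.

    The multimodal entries are returned instead of dropped, so references to
    them survive even though a text-only model would never load them.
    """
    text_weights = {}
    skipped = {}
    for name, tensor in state_dict.items():
        # str.startswith accepts a tuple and matches any of the prefixes.
        if name.startswith(skip_prefixes):
            skipped[name] = tensor
        else:
            text_weights[name] = tensor
    return text_weights, skipped


# Toy usage with placeholder strings standing in for tensors.
toy = {
    "model.language_model.layers.0.mlp.gate_proj.weight": "W0",
    "model.vision_tower.encoder.weight": "V",
    "model.audio_tower.conv.weight": "A",
}
text, skipped = split_text_only(toy)
print(sorted(text))     # ['model.language_model.layers.0.mlp.gate_proj.weight']
print(sorted(skipped))  # ['model.audio_tower.conv.weight', 'model.vision_tower.encoder.weight']
```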

Motivation

Gemma 3n offers state-of-the-art performance for its compute requirements, making it ideal for running mechanistic interpretability experiments on consumer hardware. Its unique architecture, featuring sparse updates, low-rank residuals, and nested layers, introduces novel mechanisms worth exploring. Gemma 3n could allow the MI community to study a pair of powerful models and enable reproducible interpretability research without expensive compute.

Pitch

Add support for the Gemma 3n models (E2B and E4B), starting with text-only inputs and bypassing the vision and audio components. This would let users without high-end compute run mechanistic interpretability experiments on an efficient, high-performance model with novel architectural features such as sparse alternating updates (AltUp), low-rank residual augmentation (LAuReL), and nested sub-models (MatFormer, E4B).
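For readers unfamiliar with LAuReL, here is a toy sketch of the general idea behind a low-rank augmented residual: the block output is `f(x) + x` plus a rank-r linear term `B(A x)`. It follows only the generic low-rank form; the function names, the shapes, and the absence of any normalization or scaling are assumptions, and Gemma 3n's actual block may differ.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def laurel_lr_residual(x, f, A, B):
    """Residual update with a low-rank augmentation term (generic sketch).

    x: hidden vector of size D
    f: the block's usual transformation (e.g. attention), mapping D -> D
    A: r x D down-projection; B: D x r up-projection, with r much smaller than D
    Returns f(x) + x + B @ (A @ x), computed elementwise.
    """
    fx = f(x)
    low_rank = matvec(B, matvec(A, x))  # project D -> r -> D
    return [a + b + c for a, b, c in zip(fx, x, low_rank)]


def zero_block(v):
    """Stand-in for f that outputs zeros, to isolate the residual terms."""
    return [0.0 for _ in v]


# Toy usage: D = 4, r = 1.
x = [1.0, 2.0, 3.0, 4.0]
A = [[0.5, 0.0, 0.0, 0.5]]          # 1 x 4
B = [[1.0], [0.0], [0.0], [-1.0]]   # 4 x 1
out = laurel_lr_residual(x, zero_block, A, B)
print(out)  # [3.5, 2.0, 3.0, 1.5]
```

With `A` set to all zeros the low-rank term vanishes and the update reduces to a plain residual `f(x) + x`, which is one way to sanity-check hooks placed on the extra branch.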

Additional context

HF checkpoints: google/gemma-3n-E2B-it, google/gemma-3n-E4B-it

Key obstacles to be aware of (no solutions proposed here):

Nested‐layer design (Matryoshka): E4B contains the E2B sub-model.

Extra per-block modules: Each transformer block adds AltUp sparsity gates, LAuReL low-rank residuals, and Per-Layer Embeddings (PLE).

Memory optimizations: PLE uses caching to offload many of its embedding parameters to the CPU, so many weights may not reside on the GPU at runtime.

Mixed local / global attention.

Huge vocabulary with reserved multimodal token IDs.

Multimodality: Loading in text-only mode means safely bypassing the vision and audio parameters while keeping references to them (is this feasible?).
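To make the nested-layer obstacle concrete, the sketch below shows MatFormer-style nesting for a single feed-forward layer: the smaller sub-model uses only the first `m` hidden units, so its weights are a prefix slice of the larger model's weights. This is a toy in plain Python; the ReLU activation, the sizes, and `nested_ffn` itself are illustrative assumptions and do not reproduce how Gemma 3n actually configures E2B inside E4B.

```python
def relu(v):
    """Elementwise ReLU (an illustrative choice of activation)."""
    return [max(0.0, t) for t in v]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def nested_ffn(x, W_in, W_out, m=None):
    """Feed-forward layer that can run as a nested sub-model.

    W_in:  H x D (one row per hidden unit)
    W_out: D x H (one column per hidden unit)
    m:     number of hidden units to use; None means the full layer.
    The sub-model uses the first m rows of W_in and the first m columns of
    W_out, so its parameters are a prefix slice of the full layer's.
    """
    H = len(W_in)
    m = H if m is None else m
    hidden = relu(matvec(W_in[:m], x))
    W_out_sub = [row[:m] for row in W_out]
    return matvec(W_out_sub, hidden)


# Toy usage: D = 2, H = 4.
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
W_out = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
x = [1.0, 2.0]
print(nested_ffn(x, W_in, W_out))       # full layer:  [4.0, 2.0]
print(nested_ffn(x, W_in, W_out, m=2))  # sub-model:   [1.0, 2.0]
```

Under this assumption, the sub-model's hidden activations are a prefix of the full model's hidden units rather than a separate set of weights, which is something a hooking or weight-loading scheme would have to account for.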

Checklist

  • I have checked that there is no similar issue in the repo (required)

NOTE: This is my first issue on an open-source project; any observations or recommendations are welcome!

Metadata


Assignees: none

Labels: complexity-high (Very complicated changes for people to address who are quite familiar with the code)

Type: none

Projects: none

Relationships: none yet

Development: no branches or pull requests
