Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
# Evo2 SAE Recipe

Train a sparse autoencoder on Evo2 (DNA language model) residual-stream activations.

Pipeline:

```
HF Savanna ckpt --convert--> MBridge ckpt
|
predict_evo2 --embedding-layer N (FASTA in, .pt out)
|
pt_to_parquet shim (.pt -> ActivationStore parquet shards)
|
train.py (TopK SAE)
```

The eval / dashboard stage from the esm2 recipe is intentionally not ported in v1.

## Quick start (1B model, single GPU)

```bash
bash scripts/1b.sh
```

This will:

1. Convert `arcinstitute/savanna_evo2_1b_base` to MBridge format
2. Run `predict_evo2` on the OpenGenome2 organelle FASTA, extracting layer-12 embeddings
3. Convert the `.pt` outputs to parquet shards
4. Train a TopK SAE (expansion=8, k=32)
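
For readers unfamiliar with step 4, the following is a minimal pure-Python sketch of a TopK SAE forward pass. It is not the recipe's `train.py`: the function name, the toy `d_model` of 16, the decoder-bias subtraction, and the zero clipping are all assumptions made for illustration. Only `expansion=8` and `k=32` come from the quick start above.

```python
import random


def topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k):
    """Encode x, keep the k largest pre-activations, and decode (illustrative only)."""
    # Pre-activations: one value per latent feature, computed on the centred input.
    pre = [
        sum(w * (xi - bd) for w, xi, bd in zip(row, x, b_dec)) + b
        for row, b in zip(w_enc, b_enc)
    ]
    # TopK: keep the k largest pre-activations (clipped at zero), zero the rest.
    keep = sorted(range(len(pre)), key=lambda i: pre[i], reverse=True)[:k]
    latents = [0.0] * len(pre)
    for i in keep:
        latents[i] = max(pre[i], 0.0)
    # Reconstruction: decoder bias plus a sparse combination of decoder rows.
    recon = list(b_dec)
    for i in keep:
        for j, w in enumerate(w_dec[i]):
            recon[j] += latents[i] * w
    return latents, recon


rng = random.Random(0)
d_model, expansion, k = 16, 8, 32  # toy width; expansion and k as in the quick start
n_latents = d_model * expansion
w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_latents)]
w_dec = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(n_latents)]
b_enc = [0.0] * n_latents
b_dec = [0.0] * d_model
x = [rng.gauss(0, 1) for _ in range(d_model)]
latents, recon = topk_sae_forward(x, w_enc, b_enc, w_dec, b_dec, k)
```

With these settings the latent vector has 128 entries, of which at most 32 are nonzero for any input. That sparsity constraint is what the `k` hyperparameter controls.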
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
node_modules/
package-lock.json
dist/
.vite/
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
# Evo 2 SAE Feature Explorer — Mockup

**Mockup, not a real artifact.** This is a fork of `recipes/codonfm/codon_dashboard` adapted for DNA / Evo 2, populated with **synthetic data**. No real SAE outputs flow through it yet. The point of this v1 is to lock in the data contract that the future real eval pipeline will target.

A `MOCKUP — synthetic data, not from a real SAE run` banner is rendered at the top of the app so nobody mistakes it for actual results.

## Quick start (local)

```bash
# In this directory:
npm install
npm run dev
# open http://localhost:5173
```

The dashboard reads three parquet fixtures from `public/`:

- `features_atlas.parquet` — UMAP coordinates + per-feature aggregates
- `feature_metadata.parquet` — feature label/stats table
- `feature_examples.parquet` — long table of (feature_id, example_rank, sequence_id, start, end, sequence, activations, ...) rows
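
As a rough illustration of the data contract for `feature_examples.parquet`, one row could be modeled as below. This is a sketch under stated assumptions: the column types, the half-open `[start, end)` convention, and the one-activation-per-base rule are guesses rather than part of the documented schema, and the elided columns (`...`) are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FeatureExample:
    # Column names follow the README; the types are assumptions.
    feature_id: int
    example_rank: int
    sequence_id: str
    start: int
    end: int
    sequence: str
    activations: List[float]


def validate_example(row: FeatureExample) -> List[str]:
    """Return a list of problems; an empty list means the row looks consistent."""
    problems = []
    if row.start < 0 or row.end <= row.start:
        problems.append("start/end must satisfy 0 <= start < end")
    if row.end - row.start != len(row.sequence):
        problems.append("end - start does not match sequence length")
    if len(row.activations) != len(row.sequence):
        problems.append("expected one activation per base")
    if set(row.sequence) - set("ACGTN"):
        problems.append("sequence contains non-DNA characters")
    return problems


good = FeatureExample(0, 0, "seq_0", 100, 108, "ACGTACGT", [0.0] * 8)
bad = FeatureExample(0, 1, "seq_1", 100, 108, "ACGTAC", [0.0] * 8)
good_problems = validate_example(good)
bad_problems = validate_example(bad)
```

A check like this could run over the synthetic fixtures now and over real eval output later, which is one way to keep both sides on the same contract.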

The fixtures are committed to the repo. To regenerate them:

```bash
python ../scripts/make_mockup_features.py
```

This writes all three files into `public/`. The seed is fixed (`--seed 42`), so regenerating the fixtures gives reproducible output.

## What's mocked vs. real

| Thing | Source |
| ------------------------------------ | --------------------------------------------------------------- |
| Number of features | 20, hardcoded |
| Feature labels | Hardcoded biological-sounding strings |
| UMAP coordinates | 4 cluster centers + Gaussian noise (fake but visibly clustered) |
| Top activator windows | Random `ACGT` with a label-matching central motif spliced in |
| Per-token activations | Gaussian bump centered randomly in [80, 120], sigma ~= 8 bp |
| Vocab logits (promoted / suppressed) | Empty arrays — not in scope for v1 |
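
The "Top activator windows" and "Per-token activations" rows can be sketched as follows. This is not the code in `make_mockup_features.py`. The function name, the 200 bp window length, the integer-valued bump center, and the example motif are hypothetical. Only the `[80, 120]` range, the sigma of about 8 bp, and the central motif splice come from the table.

```python
import math
import random

WINDOW_LEN = 200  # hypothetical window length; the README does not state it


def make_mock_example(motif, rng, window_len=WINDOW_LEN, sigma=8.0):
    """Build one synthetic top-activator window and its per-token activations."""
    # Random ACGT background.
    bases = [rng.choice("ACGT") for _ in range(window_len)]
    # Splice the motif into the centre of the window (length is preserved).
    start = (window_len - len(motif)) // 2
    bases[start:start + len(motif)] = motif
    # Gaussian bump with a randomly chosen integer centre in [80, 120].
    center = rng.randint(80, 120)
    activations = [
        math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
        for i in range(window_len)
    ]
    return "".join(bases), activations, center


rng = random.Random(42)
sequence, activations, center = make_mock_example("TATAAA", rng)
```

Because the bump center is drawn independently of the motif position, the activation peak only loosely overlaps the motif, which matches the "centered randomly" wording in the table.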

## v2 roadmap placeholders

A few greyed-out stats on each feature card (`Annotation`, `Sensitivity`, `Recon Δ`) and two empty sections on the feature detail page (`Annotations`, `Conservation`) hint at what's coming in v2. They render as em-dashes / dashed empty boxes with hover tooltips explaining what they'll show.

## Out of scope (v1)

- Real SAE inference or activation pass
- Annotation overlays (RefSeq / Rfam / JASPAR)
- Conservation tracks (phyloP)
- Strand handling, codon framing, chromosome ideograms
- External link-outs (UCSC, Ensembl)
- `sae.launch_dashboard()` Python wiring — run `npm run dev` directly
- Lepton-based serving

These are deferred to v2, once the real eval pipeline produces matching parquet shapes.
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Evo 2 SAE Feature Explorer (Mockup)</title>
<style>
@font-face {
font-family: 'NVIDIA Sans';
font-style: normal;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_Lt.woff2);
font-weight: 300;
}
@font-face {
font-family: 'NVIDIA Sans';
font-style: italic;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_LtIt.woff2);
font-weight: 300;
}
@font-face {
font-family: 'NVIDIA Sans';
font-style: normal;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_Rg.woff2);
font-weight: normal;
}
@font-face {
font-family: 'NVIDIA Sans';
font-style: italic;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_It.woff2);
font-weight: normal;
}
@font-face {
font-family: 'NVIDIA Sans';
font-style: normal;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_Bd.woff2);
font-weight: bold;
}
@font-face {
font-family: 'NVIDIA Sans';
font-style: italic;
src: url(https://brand-assets.cne.ngc.nvidia.com/assets/fonts/nvidia-sans/1.0.0/NVIDIASans_BdIt.woff2);
font-weight: bold;
}
:root {
--bg: #f5f5f5;
--bg-card: #fff;
--bg-card-expanded: #fafafa;
--bg-example: #fff;
--bg-input: #fff;
--border: #e0e0e0;
--border-light: #eee;
--border-input: #ddd;
--text: #333;
--text-secondary: #666;
--text-tertiary: #888;
--text-muted: #999;
--text-heading: #000;
--accent: #76b900;
--highlight-border: #222;
--highlight-shadow: rgba(0,0,0,0.15);
--link: #2563eb;
--loading-bar-bg: #e0e0e0;
--density-bar-bg: #e0e0e0;
--scrollbar-thumb: #ccc;
}
:root.dark {
--bg: #000;
--bg-card: #000;
--bg-card-expanded: #000;
--bg-example: #0a0a0a;
--bg-input: #0a0a0a;
--border: #444;
--border-light: #3a3a3a;
--border-input: #4a4a4a;
--text: #E0E0E0;
--text-secondary: #bbb;
--text-tertiary: #999;
--text-muted: #777;
--text-heading: #fff;
--accent: #76b900;
--highlight-border: #76b900;
--highlight-shadow: rgba(118,185,0,0.3);
--link: #76b900;
--loading-bar-bg: #444;
--density-bar-bg: #444;
--scrollbar-thumb: #555;
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'NVIDIA Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: var(--bg);
color: var(--text);
}
</style>
</head>
<body>
<div id="root"></div>
<script type="module" src="/src/index.jsx"></script>
</body>
</html>
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
{
"name": "evo2-dashboard-mockup",
"version": "0.1.0",
"private": true,
"type": "module",
"scripts": {
"dev": "vite",
"build": "vite build",
"preview": "vite preview"
},
"dependencies": {
"@uwdata/mosaic-core": "^0.21.1",
"@uwdata/mosaic-sql": "^0.21.1",
"@uwdata/vgplot": "^0.21.1",
"embedding-atlas": "^0.16.1",
"lucide-react": "^0.577.0",
"react": "^18.2.0",
"react-dom": "^18.2.0",
"umap-js": "^1.4.0"
},
"devDependencies": {
"@vitejs/plugin-react": "^4.2.0",
"vite": "^5.0.0"
}
}
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
{
"seeds": {
"ecoli_16s": {
"name": "E. coli 16S rRNA region (A1408 example)",
"sequence": "ATCCGTCAACCTTCAAGCATCCAAACGGCGATGATCAAGGCATAAGCCTACAGGGTACCATAGCGAAGCTGCGTCTGAAGCGAACTGGCAAACGGCTAACAGCG",
"length": 100,
"alphabet": "DNA",
"default_target_position": 8,
"context_note": "Position 8 in this synthetic sequence is the analog of position 1408 in the full 16S rRNA. Sequence shown as DNA bases to match Evo2 tokenization."
},
"promoter": {
"name": "Promoter region",
"sequence": "ATCGATGCGTAGCATGCATGGCATATATAAGCATCGATCGATCGATGCATGCTAGCATGCTAGCATGCATGCAT",
"length": 73,
"alphabet": "DNA",
"default_target_position": 24,
"context_note": "Non-rRNA context. AMR features should not shift predictions here."
},
"brca1_exon": {
"name": "BRCA1 exon fragment",
"sequence": "ATGGATTTATCTGCTCTTCGCGTTGAAGAAGTACAAAATGTCATTAATGCTATGCAGAAAATCTTAGAGTGTCCCATCTGTCTGGAGTTGAT",
"length": 90,
"alphabet": "DNA",
"default_target_position": 45,
"context_note": "Human exonic context. AMR features (bacterial rRNA) should have no effect here."
},
"random": {
"name": "Random control sequence",
"sequence": "GACTGCATCGATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC",
"length": 73,
"alphabet": "DNA",
"default_target_position": 30,
"context_note": "No biological structure; serves as a negative control for AMR steering."
}
},
"features_available": [
{"id": 12, "label": "kanamycin_resistance", "is_amr": true},
{"id": 13, "label": "streptomycin_resistance", "is_amr": true}
],
"comparisons": {
"ecoli_16s__kanamycin_resistance__pos8": {
"seed": "ecoli_16s",
"feature_id": 12,
"target_position": 8,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.91, "C": 0.04, "G": 0.03, "T": 0.02}},
"0": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}},
"2": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.42, "C": 0.08, "G": 0.43, "T": 0.07}},
"5": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.18, "C": 0.07, "G": 0.71, "T": 0.04}}
},
"narrative_type": "headline_amr"
},
"ecoli_16s__streptomycin_resistance__pos8": {
"seed": "ecoli_16s",
"feature_id": 13,
"target_position": 8,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.88, "C": 0.05, "G": 0.04, "T": 0.03}},
"0": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}},
"2": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.50, "C": 0.09, "G": 0.34, "T": 0.07}},
"5": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}, "steered": {"A": 0.26, "C": 0.08, "G": 0.62, "T": 0.04}}
},
"narrative_type": "headline_amr"
},
"promoter__kanamycin_resistance__pos24": {
"seed": "promoter",
"feature_id": 12,
"target_position": 24,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.63, "C": 0.10, "G": 0.08, "T": 0.19}},
"0": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}},
"2": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.59, "C": 0.11, "G": 0.10, "T": 0.20}},
"5": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.55, "C": 0.12, "G": 0.13, "T": 0.20}}
},
"narrative_type": "null_result"
},
"promoter__streptomycin_resistance__pos24": {
"seed": "promoter",
"feature_id": 13,
"target_position": 24,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.63, "C": 0.10, "G": 0.08, "T": 0.19}},
"0": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}},
"2": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.60, "C": 0.10, "G": 0.10, "T": 0.20}},
"5": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}, "steered": {"A": 0.57, "C": 0.11, "G": 0.12, "T": 0.20}}
},
"narrative_type": "null_result"
},
"brca1_exon__kanamycin_resistance__pos45": {
"seed": "brca1_exon",
"feature_id": 12,
"target_position": 45,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}},
"0": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}},
"2": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.30, "C": 0.21, "G": 0.29, "T": 0.20}},
"5": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.28, "C": 0.21, "G": 0.31, "T": 0.20}}
},
"narrative_type": "null_result"
},
"brca1_exon__streptomycin_resistance__pos45": {
"seed": "brca1_exon",
"feature_id": 13,
"target_position": 45,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}},
"0": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}},
"2": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.30, "C": 0.22, "G": 0.28, "T": 0.20}},
"5": {"baseline": {"A": 0.31, "C": 0.21, "G": 0.28, "T": 0.20}, "steered": {"A": 0.29, "C": 0.22, "G": 0.29, "T": 0.20}}
},
"narrative_type": "null_result"
},
"random__kanamycin_resistance__pos30": {
"seed": "random",
"feature_id": 12,
"target_position": 30,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.27, "C": 0.23, "G": 0.27, "T": 0.23}},
"0": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}},
"2": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.26, "C": 0.24, "G": 0.28, "T": 0.22}},
"5": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.25, "C": 0.25, "G": 0.29, "T": 0.21}}
},
"narrative_type": "null_result"
},
"random__streptomycin_resistance__pos30": {
"seed": "random",
"feature_id": 13,
"target_position": 30,
"neighbor_count": 1,
"results_by_clamp": {
"-2": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}},
"0": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}},
"2": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.27, "C": 0.23, "G": 0.28, "T": 0.22}},
"5": {"baseline": {"A": 0.28, "C": 0.22, "G": 0.27, "T": 0.23}, "steered": {"A": 0.26, "C": 0.24, "G": 0.29, "T": 0.21}}
},
"narrative_type": "null_result"
}
}
}
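
One way to sanity-check a steering fixture like the one above is to confirm that every distribution sums to 1, that a clamp of 0 leaves the prediction unchanged, and to measure how far each clamp moves the prediction. The sketch below uses total variation distance for the last part. The helper names and the choice of metric are assumptions rather than part of the dashboard. The two entries are copied from the fixture, restricted to clamps 0 and 5.

```python
import json


def total_variation(p, q):
    """Half the L1 distance between two distributions over the same keys."""
    return 0.5 * sum(abs(p[base] - q[base]) for base in p)


def check_comparison(comparison, tol=1e-6):
    """Return the per-clamp shift, asserting basic invariants along the way."""
    shifts = {}
    for clamp, result in comparison["results_by_clamp"].items():
        for dist in (result["baseline"], result["steered"]):
            assert abs(sum(dist.values()) - 1.0) < tol, "probabilities must sum to 1"
        shifts[clamp] = total_variation(result["baseline"], result["steered"])
    # In this fixture a clamp of 0 leaves the prediction unchanged.
    assert shifts["0"] < tol
    return shifts


fixture = json.loads("""
{
  "ecoli_16s__kanamycin_resistance__pos8": {
    "results_by_clamp": {
      "0": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06},
            "steered":  {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06}},
      "5": {"baseline": {"A": 0.74, "C": 0.12, "G": 0.08, "T": 0.06},
            "steered":  {"A": 0.18, "C": 0.07, "G": 0.71, "T": 0.04}}
    }
  },
  "promoter__kanamycin_resistance__pos24": {
    "results_by_clamp": {
      "0": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20},
            "steered":  {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20}},
      "5": {"baseline": {"A": 0.62, "C": 0.10, "G": 0.08, "T": 0.20},
            "steered":  {"A": 0.55, "C": 0.12, "G": 0.13, "T": 0.20}}
    }
  }
}
""")

shifts = {name: check_comparison(entry) for name, entry in fixture.items()}
```

At clamp 5 the `headline_amr` entry moves by about 0.63 in total variation, while the `null_result` entry moves by about 0.07, consistent with the context notes saying AMR features should not shift predictions outside the rRNA seed.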