Vector Bloom is a dark-themed, interactive word embedding explorer built with Next.js 16 and React 19.
It visualizes a large precomputed vocabulary in a 2D semantic space, supports classic word arithmetic such as `king - man + woman`, and adds a context inspector for ambiguous words whose meaning changes with the surrounding text.
The project is organized around three main ideas:
- **Static vocabulary exploration:** browse a large shared embedding space, inspect words, and run analogy arithmetic.
- **Context-aware overlays:** project sentence-specific meaning for words like `bank`, `apple`, `python`, and `bat`.
- **Offline-first data preparation:** generate heavy vocabulary assets ahead of time so the runtime app stays lightweight.
The current shipped dataset contains:
- 50,000 vocabulary entries
- metadata-first rendering for fast initial load
- lazily loaded vector chunks for arithmetic and nearest-neighbor operations
- precomputed sense clusters for selected ambiguous words
- Next.js 16.2.3
- React 19.2.4
- TypeScript
- Tailwind CSS 4
- `@fontsource/inter`
- custom Canvas + SVG visualization
- dev-time vocabulary generation using `word-list`
word2vec/
├─ public/
│ └─ data/
│ ├─ model-manifests.json
│ └─ english-core-50k/
│ ├─ metadata.json
│ ├─ senses.json
│ └─ chunks/
│ ├─ chunk-0.json
│ ├─ chunk-1.json
│ └─ ...
├─ scripts/
│ └─ generate-vocab-assets.mjs
├─ src/
│ ├─ app/
│ │ ├─ globals.css
│ │ ├─ layout.tsx
│ │ └─ page.tsx
│ ├─ components/
│ │ ├─ embedding-canvas.tsx
│ │ ├─ word2vec-app.tsx
│ │ └─ word2vec-client-shell.tsx
│ └─ lib/
│ ├─ contextual-meaning.ts
│ ├─ model-store.ts
│ ├─ types.ts
│ └─ vector-math.ts
├─ tests/
│ └─ run-tests.ts
├─ package.json
└─ README.md
Files:
- `src/app/layout.tsx`
- `src/app/page.tsx`
- `src/components/word2vec-client-shell.tsx`
Responsibilities:
- load global styles and typography
- mount the client-side experience safely
- render the main application entrypoint
`word2vec-client-shell.tsx` acts as the hydration-safe client wrapper that gates rendering until the full interactive UI can be shown.
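One common way to build such a wrapper is to defer rendering until after the client has mounted. A minimal sketch, assuming a mounted-state gate (the actual component may instead use `next/dynamic` with `ssr: false`):

```tsx
"use client";

import { useEffect, useState, type ReactNode } from "react";

// Render a placeholder until the component has mounted on the client, so
// server-rendered and client-rendered markup never diverge. This is a
// sketch, not the actual word2vec-client-shell.tsx implementation.
export function ClientShell({ children }: { children: ReactNode }) {
  const [mounted, setMounted] = useState(false);

  useEffect(() => {
    setMounted(true);
  }, []);

  if (!mounted) {
    return <div aria-hidden className="h-screen w-screen bg-black" />;
  }

  return <>{children}</>;
}
```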
File:
`src/components/word2vec-app.tsx`
Responsibilities:
- load manifests, metadata, senses, and vectors
- manage UI mode switching between static and context views
- orchestrate arithmetic requests
- orchestrate context interpretation
- manage selected and hovered words
- coordinate sidebar, canvas, and result panels
This is the main controller for the product.
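As a rough illustration of what that controller juggles, its state might be shaped like this sketch (field names are assumptions, not the actual implementation):

```ts
// Illustrative state shape for the main controller component.
type ViewMode = "static" | "context";

interface AppState {
  mode: ViewMode;                // static exploration vs. context inspector
  selectedWordId: number | null; // word picked on the map
  hoveredWordId: number | null;  // word under the pointer
  expression: string;            // e.g. "king - man + woman"
  sentence: string;              // context-mode input sentence
}
```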
File:
`src/components/embedding-canvas.tsx`
Responsibilities:
- render the 2D semantic space on HTML canvas
- overlay labels, analogy paths, and context markers with SVG
- support pan, zoom, hover, and click selection
- keep the large vocabulary map responsive on screen
The canvas renders the global point cloud, while SVG is used for readable overlays and motion paths.
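A minimal sketch of the canvas half of that split, assuming a simple scale/translate transform (the real component also handles devicePixelRatio, hover hit-testing, and the SVG overlay layer):

```ts
// Draw the dense point cloud on a 2D canvas; labels and paths live in SVG
// on top. Transform shape and function name are illustrative assumptions.
interface Transform {
  scale: number;
  tx: number;
  ty: number;
}

function drawPoints(
  ctx: CanvasRenderingContext2D,
  points: { x: number; y: number }[],
  t: Transform,
) {
  ctx.clearRect(0, 0, ctx.canvas.width, ctx.canvas.height);
  ctx.fillStyle = "#7dd3fc";
  for (const p of points) {
    // Apply pan/zoom per point; cheap enough for ~50k points per frame.
    const sx = p.x * t.scale + t.tx;
    const sy = p.y * t.scale + t.ty;
    ctx.fillRect(sx, sy, 2, 2); // fillRect is faster than arc() at this scale
  }
}
```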
Files:
- `src/lib/vector-math.ts`
- `src/lib/contextual-meaning.ts`
- `src/lib/model-store.ts`
- `src/lib/types.ts`
Responsibilities:
- vector arithmetic and cosine similarity (sketched after this list)
- analogy parsing and result building
- contextual sense scoring and overlay generation
- manifest, metadata, sense, and chunk loading
- shared type contracts across the app
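For reference, the arithmetic and similarity pieces reduce to a few small functions. This is a sketch with assumed names, not the exact `vector-math.ts` API:

```ts
// Elementwise vector arithmetic over plain number arrays.
export function add(a: number[], b: number[]): number[] {
  return a.map((v, i) => v + b[i]);
}

export function subtract(a: number[], b: number[]): number[] {
  return a.map((v, i) => v - b[i]);
}

// Cosine similarity: dot product normalized by both vector magnitudes.
export function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// king - man + woman => a query vector whose nearest neighbor should be "queen":
// const query = add(subtract(king, man), woman);
```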
The runtime architecture is built around a few core contracts.
**Model manifest:** describes an available embedding model and where its assets live.
Important fields: `id`, `name`, `vectorDim`, `vocabSize`, `chunkCount`, `chunkSize`, `metadataPath`, `vectorsBasePath`, `sensesPath`, `projectionBasis`
**Word metadata:** lightweight data for drawing the global map without loading full vectors.
Important fields: `wordId`, `word`, `x`, `y`, `frequency`, `clusterId`, `labelPriority`
**Vector chunk:** chunked high-dimensional vectors loaded only when needed.
Important fields: `chunkId`, `wordIds`, `vectors`
**Sense cluster:** precomputed contextual meaning anchors for ambiguous words.
Important fields: `word`, `senseId`, `label`, `x`, `y`, `vector`, `keywords`, `exampleContexts`, `nearestStaticWords`
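Taken together, the four contracts above might be expressed roughly like this in `types.ts`; the interface names are assumptions reconstructed from the field lists, not the actual exported names:

```ts
// Hedged reconstruction of the core runtime contracts.
interface ModelManifest {
  id: string;
  name: string;
  vectorDim: number;
  vocabSize: number;
  chunkCount: number;
  chunkSize: number;
  metadataPath: string;
  vectorsBasePath: string;
  sensesPath: string;
  projectionBasis: number[][]; // maps high-dim vectors to map x/y
}

interface WordMetadata {
  wordId: number;
  word: string;
  x: number;
  y: number;
  frequency: number;
  clusterId: number;
  labelPriority: number;
}

interface VectorChunk {
  chunkId: number;
  wordIds: number[];
  vectors: number[][];
}

interface SenseCluster {
  word: string;
  senseId: string;
  label: string;
  x: number;
  y: number;
  vector: number[];
  keywords: string[];
  exampleContexts: string[];
  nearestStaticWords: string[];
}
```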
- The app loads `public/data/model-manifests.json`.
- The selected model loads `metadata.json` and `senses.json`.
- The canvas renders the full 2D point set using metadata only.
- When the user runs arithmetic or inspects similarity, vector chunks are loaded on demand.
- Results are projected back onto the same shared map.
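A sketch of the on-demand chunk step, assuming sequential `wordId` assignment so a word's chunk index is `floor(wordId / chunkSize)` (names are illustrative, not the actual `model-store.ts` API):

```ts
// Fetch and memoize vector chunks only when arithmetic or similarity needs them.
type Chunk = { chunkId: number; wordIds: number[]; vectors: number[][] };

const chunkCache = new Map<number, Chunk>();

async function loadChunkForWord(
  vectorsBasePath: string,
  chunkSize: number,
  wordId: number,
): Promise<Chunk> {
  const chunkId = Math.floor(wordId / chunkSize);
  const cached = chunkCache.get(chunkId);
  if (cached) return cached;

  const res = await fetch(`${vectorsBasePath}/chunk-${chunkId}.json`);
  if (!res.ok) throw new Error(`failed to load chunk ${chunkId}`);
  const chunk = (await res.json()) as Chunk;
  chunkCache.set(chunkId, chunk);
  return chunk;
}
```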
- The user enters a sentence.
- A token is selected.
- The app looks for precomputed sense clusters for that word.
- If matching sense clues exist, it projects the best-fitting sense.
- If not, it falls back to a blended context-adjusted meaning based on nearby token vectors.
- The contextual result is overlaid on the static embedding map.
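A sketch of the sense-selection step, scoring each precomputed sense by keyword overlap with the sentence; the scoring scheme and names are assumptions:

```ts
// Pick the best precomputed sense for a word given the sentence tokens.
// A null result tells the caller to fall back to the blended
// context-adjusted meaning built from nearby token vectors.
type Sense = { senseId: string; label: string; keywords: string[] };

function pickSense(senses: Sense[], sentenceTokens: string[]): Sense | null {
  const tokens = new Set(sentenceTokens.map((t) => t.toLowerCase()));
  let best: Sense | null = null;
  let bestScore = 0;
  for (const sense of senses) {
    const score = sense.keywords.filter((k) => tokens.has(k)).length;
    if (score > bestScore) {
      best = sense;
      bestScore = score;
    }
  }
  return best; // null => no keyword evidence in this sentence
}
```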
File:
`scripts/generate-vocab-assets.mjs`
Responsibilities:
- build the large vocabulary asset set
- assign vectors and projection coordinates
- split vectors into chunks
- emit metadata, senses, and model manifests
Generated files:
- `public/data/model-manifests.json`
- `public/data/english-core-50k/metadata.json`
- `public/data/english-core-50k/senses.json`
- `public/data/english-core-50k/chunks/chunk-*.json`
This is intentionally an offline step so the web app does not need to perform heavy NLP preprocessing in the browser.
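The chunking half of that step could look roughly like this; the real script is an `.mjs` module, so this TypeScript sketch with a sequential-wordId assumption is illustrative only:

```ts
// Split the full vector table into fixed-size JSON chunks so the runtime
// can fetch only what it needs.
import { mkdirSync, writeFileSync } from "node:fs";

function writeChunks(vectors: number[][], chunkSize: number, outDir: string) {
  mkdirSync(outDir, { recursive: true });
  for (let start = 0, id = 0; start < vectors.length; start += chunkSize, id++) {
    const slice = vectors.slice(start, start + chunkSize);
    const chunk = {
      chunkId: id,
      wordIds: slice.map((_, i) => start + i), // sequential wordId assumption
      vectors: slice,
    };
    writeFileSync(`${outDir}/chunk-${id}.json`, JSON.stringify(chunk));
  }
}
```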
Focused on:
- exploring the full vocabulary space
- selecting words directly from the map
- inspecting nearest neighbors
- running arithmetic analogies
Example expressions:
- `king - man + woman`
- `paris - france + england`
- `rome - italy + japan`
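Parsing these expressions reduces to splitting the input into signed terms; a sketch with simplified tokenization (the actual parser may handle more syntax):

```ts
// Turn "king - man + woman" into signed terms that can be resolved to vectors.
type Term = { word: string; sign: 1 | -1 };

function parseExpression(expr: string): Term[] {
  const tokens = expr.trim().split(/\s+/);
  const terms: Term[] = [];
  let sign: 1 | -1 = 1;
  for (const token of tokens) {
    if (token === "+") sign = 1;
    else if (token === "-") sign = -1;
    else {
      terms.push({ word: token.toLowerCase(), sign });
      sign = 1; // default back to addition for the next word
    }
  }
  return terms;
}

// parseExpression("king - man + woman")
// => [{ word: "king", sign: 1 }, { word: "man", sign: -1 }, { word: "woman", sign: 1 }]
```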
Focused on:
- sentence-specific interpretation
- meaning shifts for ambiguous tokens
- sense overlays projected onto the global map
Example targets:
- `bank`
- `apple`
- `python`
- `bat`
- `pnpm dev`: starts the local development server.
- `pnpm lint`: runs ESLint on the project.
- `pnpm test`: runs the lightweight TypeScript and runtime checks in `tests/run-tests.ts`.
- `pnpm build` + `pnpm start`: builds and serves the production app.
- `pnpm generate:data`: rebuilds the vocabulary dataset under `public/data/`.
Current checks cover:
- analogy expression parsing
- manifest generation and expected scale
- classic arithmetic behavior from chunked vectors
- cosine similarity math
- contextual sense separation for ambiguous words
The test entrypoint is:
`tests/run-tests.ts`
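The checks are plain runtime assertions; an illustrative example in that style (not the actual test code):

```ts
// Sanity-check the cosine similarity math with known vector pairs.
import assert from "node:assert";

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

assert.ok(Math.abs(cosine([1, 0], [1, 0]) - 1) < 1e-9); // identical => 1
assert.ok(Math.abs(cosine([1, 0], [0, 1])) < 1e-9);     // orthogonal => 0
console.log("cosine similarity ok");
```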
The UI is intentionally built as a full-screen dark workspace:
- large central visualization area
- compact side panels
- minimal static chrome
- internal panel scrolling instead of excessive page scrolling
The goal is to keep attention on the embedding map rather than on dashboard-style decoration.
The current shipped dataset is precomputed and scalable, but it is still synthetically generated rather than exported from a true pretrained GloVe/word2vec checkpoint.
That means:
- the app architecture is production-oriented
- the UI and loading model are ready
- the next major step is replacing the generated vectors with a real pretrained embedding source