Transform unstructured markdown/text notes into an interconnected knowledge graph using semantic similarity analysis.
- Automatic Link Discovery - Infer semantic relationships between notes without manual linking
- Multiple Embedding Methods - TF-IDF baseline, Sentence-BERT, or OpenAI embeddings
- Graph Visualization - Interactive D3.js web interface or Pyvis HTML export
- Community Detection - Discover topic clusters using Louvain algorithm
- Topic Aggregation - Group related notes and generate topic summaries
- Incremental Updates - SQLite storage for efficient vault synchronization
- Link Reasoning - Understand why notes are connected with shared terms
Similar "document linking" solutions aimed at human interpretability fall short because they rely purely on keyword matching, like this one.
Some ideas for visualizing the semantic links are given here, although some of them are over the top and would only add UI complexity, which should be avoided.
- Infer semantic relationships between textual notes
- Compare keyword-based and embedding-based similarity methods
- Construct a sparse, interpretable note graph
- Analyze graph structure and semantic coherence
- Provide a CLI tool that allows for easy usage
# Core installation (TF-IDF only, lightweight)
pip install semlink
# With neural embeddings (includes PyTorch)
pip install semlink[sbert]
# With OpenAI embeddings
pip install semlink[openai]
# With web server for React frontend
pip install semlink[server]
# With Pyvis visualization
pip install semlink[viz]
# Everything
pip install semlink[all]

# Run full pipeline on a vault
semlink run ./my-notes --output ./output
# Or use incremental sync with SQLite
semlink sync ./my-notes --db .semlink.db
# Start the web interface
semlink serve --db .semlink.db
# Aggregate notes into topics
semlink aggregate output/graph.json --notes output/notes.json -o topics/

| Command | Description |
|---|---|
| `ingest` | Load and preprocess notes from a vault directory |
| `embed` | Generate embeddings (TF-IDF, SBERT, or OpenAI) |
| `link` | Infer links between notes based on similarity |
| `analyze` | Compute graph metrics and detect communities |
| `visualize` | Generate HTML, PNG, D3 JSON, or Obsidian export |
| `compare` | Compare different embedding methods |
| `run` | Full pipeline: ingest → embed → link → analyze → visualize |
| `sync` | Incremental vault sync with SQLite storage |
| `export` | Export graph from SQLite database |
| `status` | Show database statistics |
| `aggregate` | Group notes into topics by community |
| `serve` | Start web server for React frontend |
| `info` | Display available models and strategies |
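The `sync` command is described as incremental: after the first run, only changed notes are processed. One common way to get that behaviour is to store a content hash per note in SQLite and compare it on each run. The sketch below illustrates that idea only; the `notes` table, its columns, and the `sync_vault` function are hypothetical and are not taken from SemLink's `storage.py`.

```python
import hashlib
import sqlite3
from pathlib import Path


def sync_vault(vault: Path, conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Report which notes were added, modified, or deleted since the last call.

    Assumption: one row per note in a hypothetical `notes` table,
    keyed by the path relative to the vault root.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    stored = dict(conn.execute("SELECT path, sha256 FROM notes"))

    # Hash every markdown file currently on disk.
    current: dict[str, str] = {}
    for file in sorted(vault.rglob("*.md")):
        digest = hashlib.sha256(file.read_bytes()).hexdigest()
        current[file.relative_to(vault).as_posix()] = digest

    added = [p for p in current if p not in stored]
    modified = [p for p in current if p in stored and stored[p] != current[p]]
    deleted = [p for p in stored if p not in current]

    # A real pipeline would re-embed only `added + modified` at this point;
    # this sketch just records the new state.
    conn.executemany(
        "INSERT OR REPLACE INTO notes (path, sha256) VALUES (?, ?)",
        [(p, current[p]) for p in added + modified],
    )
    conn.executemany("DELETE FROM notes WHERE path = ?", [(p,) for p in deleted])
    conn.commit()
    return {"added": added, "modified": modified, "deleted": deleted}
```

Calling `sync_vault(Path("./vault"), sqlite3.connect(".semlink.db"))` twice in a row without editing any note would return three empty lists on the second call, which is the property that makes repeated syncs cheap.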
# Process a vault with TF-IDF (default, lightweight)
semlink run ./vault --output ./output
# Use Sentence-BERT for better semantic matching
semlink run ./vault --method sbert --output ./output
# Filter weak links (< 25% similarity)
semlink run ./vault --min-weight 0.25 --output ./output

# Initial sync (creates .semlink.db)
semlink sync ./vault
# Re-run after editing notes (only processes changes)
semlink sync ./vault
# Export graph for visualization
semlink export --db .semlink.db -o graph.json
# Check database status
semlink status

# Build frontend (first time only)
cd frontend && npm install && npm run build && cd ..
# Start server
semlink serve --db .semlink.db
# Opens at http://localhost:8000

# Generate topic clusters from graph
semlink aggregate graph.json --notes notes.json --format markdown -o topics/
# Export as Obsidian vault structure
semlink aggregate graph.json --notes notes.json --format obsidian -o vault/
# More granular topics (higher resolution)
semlink aggregate graph.json --notes notes.json --resolution 1.5 -k 7

| Method | Description | Requirements |
|---|---|---|
| `tfidf` | TF-IDF keyword matching (default) | scikit-learn |
| `sbert` | Sentence-BERT semantic similarity | sentence-transformers |
| `openai` | OpenAI text embeddings API | openai, tiktoken |
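To make the `tfidf` baseline concrete, here is a minimal standard-library sketch of TF-IDF vectors, cosine similarity, and the shared terms that explain a link. It is an illustration under stated assumptions, not SemLink's implementation (which, per the table, relies on scikit-learn); the function names, the tokenizer, and the smoothed IDF formula are choices made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse, L2-normalised TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF: log((1 + n) / (1 + df)) + 1, so common terms keep a small weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def shared_terms(a: dict[str, float], b: dict[str, float], top: int = 3) -> list[str]:
    """Terms contributing most to the similarity, as a simple link explanation."""
    contrib = {t: a[t] * b[t] for t in a.keys() & b.keys()}
    return sorted(contrib, key=lambda t: (-contrib[t], t))[:top]


docs = [
    "graph theory and network analysis",
    "network analysis of social graph data",
    "sourdough bread baking recipe",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), shared_terms(vecs[0], vecs[1]))
```

Because the score comes only from overlapping tokens, two notes that discuss the same idea in different words score zero here, which is the gap the `sbert` and `openai` methods are meant to close.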
| Strategy | Description |
|---|---|
| `threshold` | Connect notes with similarity ≥ threshold |
| `knn` | Connect to k nearest neighbors |
| `mutual_knn` | Connect only if mutually nearest |
| `hybrid` | KNN + threshold (recommended) |
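The four strategies can be sketched over a plain similarity matrix as follows. This is an illustrative sketch, not the code in `linker.py`: the function names are hypothetical, and `hybrid` is interpreted here as "kNN candidates that also clear the threshold", which is an assumption about what "KNN + threshold" means.

```python
Edge = tuple[int, int]
Matrix = list[list[float]]


def threshold_links(sim: Matrix, threshold: float) -> set[Edge]:
    """Connect every pair whose similarity is at least `threshold`."""
    n = len(sim)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if sim[i][j] >= threshold}


def nearest(sim: Matrix, i: int, k: int) -> list[int]:
    """Indices of the k notes most similar to note i (ties broken by index)."""
    others = [j for j in range(len(sim)) if j != i]
    return sorted(others, key=lambda j: (-sim[i][j], j))[:k]


def knn_links(sim: Matrix, k: int) -> set[Edge]:
    """Connect each note to its k nearest neighbours (undirected)."""
    return {(min(i, j), max(i, j)) for i in range(len(sim)) for j in nearest(sim, i, k)}


def mutual_knn_links(sim: Matrix, k: int) -> set[Edge]:
    """Keep an edge only if each note is among the other's k nearest neighbours."""
    neighbours = [set(nearest(sim, i, k)) for i in range(len(sim))]
    return {
        (i, j)
        for i in range(len(sim))
        for j in neighbours[i]
        if i < j and i in neighbours[j]
    }


def hybrid_links(sim: Matrix, k: int, threshold: float) -> set[Edge]:
    """Assumed reading of 'KNN + threshold': kNN edges that also pass the threshold."""
    return knn_links(sim, k) & threshold_links(sim, threshold)


sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.15],
    [0.1, 0.1, 0.15, 1.0],
]
print(sorted(knn_links(sim, k=1)))
print(sorted(hybrid_links(sim, k=1, threshold=0.25)))
```

In this toy matrix, plain kNN links note 3 to note 2 even though their similarity is only 0.15, while the hybrid variant drops that edge. That trade-off (bounded degree from kNN, a quality floor from the threshold) is one plausible reason a hybrid strategy keeps the graph sparse and interpretable.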
semlink/
├── frontend/ # React + D3.js web interface
│ ├── src/
│ │ ├── components/ # ForceGraph, SidePanel, ControlPanel
│ │ ├── hooks/ # useGraphData
│ │ └── types/ # TypeScript definitions
│ └── package.json
├── src/semlink/
│ ├── core/
│ │ ├── ingest.py # Note discovery and preprocessing
│ │ ├── chunk.py # Chunking strategies
│ │ ├── tfidf.py # TF-IDF embeddings
│ │ ├── embeddings.py # SBERT, OpenAI embeddings
│ │ ├── linker.py # Link inference strategies
│ │ ├── graph.py # NetworkX graph building
│ │ ├── analysis.py # Metrics, community detection
│ │ ├── visualize.py # Pyvis, D3, Obsidian export
│ │ ├── evaluate.py # Method comparison
│ │ ├── storage.py # SQLite persistence
│ │ └── aggregate.py # Topic aggregation
│ ├── server.py # FastAPI backend
│ └── cli.py # Typer CLI
└── pyproject.toml
# Clone and install
git clone https://github.com/KreativeThinker/SemLink.git
cd SemLink
uv sync
# Install pre-commit hooks
pre-commit install
# Run CLI
uv run semlink --help

The digital note-taking software market is experiencing significant growth, with projections reaching $1.35-1.5 billion by 2027-2028. Knowledge workers are increasingly adopting tools like Obsidian, Roam Research, and Logseq for personal knowledge management. However, manual linking does not scale beyond a few hundred notes, leaving valuable relationships unexpressed.
This project addresses the problem of automatically identifying and representing semantic relationships between notes without user intervention, using NLP techniques to infer contextual relationships and represent them as a knowledge graph.
- Arnold, M., Goldschmitt, M., & Rigotti, T. (2023). Dealing with information overload: A comprehensive review. Frontiers in Psychology, 14.
- Karr-Wisniewski, P., & Lu, Y. (2010). When more is too much: Operationalizing technology overload. Computers in Human Behavior, 26(5), 1061-1072.
- Li, H., et al. (2023). Knowledge graphs in practice: characterizing users, challenges, and visualization opportunities. IEEE TVCG.
Please refer to the guidelines in CONTRIBUTING.md.
MIT License - see LICENSE for details.