# WaveformRAG

WaveformRAG is a modular Semantic Routing Engine designed for RAG-based code generation. It uses neural embeddings and advanced retrieval techniques to map natural-language user intent to specific API tools, followed by an LLM-powered generation stage with "MAKER-style" voting.
**Version:** 2.1.0
**Author:** Jordan Elevons
## Overview

WaveformRAG acts as a bridge between high-level user goals (e.g., "smoothly move an object") and low-level API implementations (e.g., `Vector3.Lerp`). Unlike traditional RAG, it focuses on semantic routing—identifying the exact "tool" or code pattern needed before involving an LLM for final code assembly.
## Features

- 🔬 Neural Semantic Routing: Uses subspace projection and orthogonalized concept vectors (Intent, Motion, Dimension, Domain) to filter and rank candidates.
- 🗳️ MAKER-Style Voting: A multi-round voting process where the LLM selects the best tool from a list of candidates to eliminate position bias and improve accuracy.
- 🌍 Multi-Language Support: Easily switch between frameworks (Unity C#, BabylonJS, etc.) using modular JSON profiles.
- 🧠 Advanced Retrieval: Features query decomposition, semantic expansion, and inverse filtering to handle complex or noisy user prompts.
- 🏗️ Dataset Pipeline: Includes a full suite of tools for scraping API docs, mining user phrases, synthesizing tool definitions, and calibrating search thresholds.
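The routing idea behind the first bullet can be sketched with toy vectors: orthogonalize the concept axes (plain Gram-Schmidt here, standing in for whatever subspace math the package actually uses), project the query and each tool's embedding onto those axes, and rank tools by cosine similarity in the projected space. The vectors, axes, and tool names below are illustrative, not the real embeddings:

```python
import numpy as np

def orthogonalize(vectors):
    """Gram-Schmidt: make each concept axis independent of the previous ones."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(v, b) * b for b in basis)
        basis.append(w / np.linalg.norm(w))
    return basis

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def project(v, axes):
    """Coordinates of v in the concept subspace spanned by the axes."""
    return np.array([np.dot(v, a) for a in axes])

# Toy 4-D "embeddings" standing in for sentence-transformer vectors.
concept_axes = orthogonalize([
    np.array([1.0, 0.2, 0.0, 0.0]),  # e.g. Intent
    np.array([0.1, 1.0, 0.0, 0.3]),  # e.g. Motion
])

tools = {
    "Vector3.Lerp": np.array([0.9, 0.8, 0.1, 0.0]),
    "Input.GetKey": np.array([0.8, 0.0, 0.9, 0.1]),
}

query = np.array([0.7, 0.9, 0.0, 0.1])  # a motion-heavy request
q_proj = project(query, concept_axes)

ranked = sorted(tools,
                key=lambda t: cosine(project(tools[t], concept_axes), q_proj),
                reverse=True)
print(ranked[0])  # Vector3.Lerp
```

Projecting onto a small set of orthogonal concept axes discards embedding dimensions irrelevant to routing, which is why a motion-heavy query lands on the interpolation tool here.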
## Project Structure

The project has been consolidated into a modular package for ease of use and maintainability:

- `waveform_rag/`: Core package
  - `config.py`: Centralized configuration for models, paths, and query processing.
  - `engine.py`: Core logic for vector mathematics, retrieval, and re-ranking.
  - `llm.py`: Interface for LLM interaction and MAKER voting.
  - `dataset.py`: Comprehensive tools for building and calibrating the knowledge base.
  - `main.py`: Command-line interface and interactive shell.
- `run_waveform.py`: Convenient entry-point script (run from the project root).
- `requirements.txt`: Python dependencies for easy installation.
- `Benchmarking/`: Evaluation suite to compare WaveformRAG against raw LLM baselines.
- `waveform_rag/Dataset/`: Storage for tool libraries (`unity_textbook.json`), neural caches, and scraped source files.
- `waveform_rag/language_profiles/`: JSON profiles for different programming environments.
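Switching frameworks via the JSON profiles in `waveform_rag/language_profiles/` might look like the round-trip below. The schema shown is a guess for illustration only; the actual profile format may differ:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical minimal profile -- the real schema in
# waveform_rag/language_profiles/ may include more fields.
profile = {
    "name": "unity_csharp",
    "file_extension": ".cs",
    "comment_prefix": "//",
    "tool_library": "Dataset/unity_textbook.json",
}

# Round-trip through a JSON file, as the engine would when loading a profile.
path = Path(tempfile.mkdtemp()) / "unity_csharp.json"
path.write_text(json.dumps(profile, indent=2))
loaded = json.loads(path.read_text())
print(loaded["name"])  # unity_csharp
```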
## Quick Start

1. Clone the repository:

   ```bash
   git clone <your-repo-url>
   cd <repo-name>
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Configure your LLM server (see Configuration below).

4. Run WaveformRAG:

   ```bash
   python run_waveform.py -i
   ```
## Requirements

- Python 3.9+
- Ollama or another OpenAI-compatible local LLM server.
- Anthropic API Key (for tool synthesis and benchmarking).
## Installation

Install all required dependencies using the provided `requirements.txt`:

```bash
pip install -r requirements.txt
```

This will install:

- `sentence-transformers`: neural embeddings
- `numpy`: vector operations
- `requests`: HTTP requests to LLM servers
- `anthropic`: Anthropic API (tool synthesis & benchmarking)
- `beautifulsoup4`: HTML parsing (dataset generation)
## Configuration

Edit `waveform_rag/config.py` to match your local setup:

```python
LLM_HOST = "http://127.0.0.1:1234"   # Your LLM server URL
LLM_MODEL = "granite4:tiny-h"        # Your generation model
MODEL_NAME = "all-MiniLM-L6-v2"      # Your embedding model
```

All commands should be run from the project root directory.
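Before running queries, you can sanity-check that the server behind `LLM_HOST` speaks the OpenAI-compatible chat API. The sketch below only builds the request; the `/v1/chat/completions` path is the de-facto OpenAI-compatible convention (supported by Ollama, for example), so adjust it if your server differs:

```python
# Values mirroring waveform_rag/config.py.
LLM_HOST = "http://127.0.0.1:1234"
LLM_MODEL = "granite4:tiny-h"

# OpenAI-compatible chat-completions endpoint and payload.
url = f"{LLM_HOST}/v1/chat/completions"
payload = {
    "model": LLM_MODEL,
    "messages": [{"role": "user", "content": "Reply with OK"}],
    "temperature": 0.0,
}

# To actually send it (requires a running server):
# import requests
# r = requests.post(url, json=payload, timeout=30)
# print(r.json()["choices"][0]["message"]["content"])
print(url)
```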
## Usage

Interactive mode:

```bash
python run_waveform.py -i
```

Single query:

```bash
python run_waveform.py "make the player jump when I press space"
```

Module mode (alternative):

```bash
python -m waveform_rag.main -i
```

## Dataset Pipeline

WaveformRAG provides an automated pipeline to build your own "textbook" of tools:
- Scraping: `APIScraper` extracts class/method info from local HTML documentation.
- Mining: `mine_corpus` extracts realistic user phrases from chat logs.
- Synthesis: `ToolSynthesizer` uses Claude to generate full JSON tool definitions (patterns, code, metadata).
- Calibration: `calibrate_library` computes neural centroids and optimal search thresholds.
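The calibration step can be illustrated with toy numbers: average a tool's phrase embeddings into a unit-length centroid, then derive a similarity threshold from how tightly the phrases cluster around it. The two-standard-deviation rule and the vectors below are illustrative assumptions, not the actual `calibrate_library` logic:

```python
import numpy as np

# Toy phrase embeddings for one tool (real ones come from the sentence-transformer).
phrase_vecs = np.array([
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.05, 0.05],
])

# Centroid = mean embedding, renormalized onto the unit sphere.
centroid = phrase_vecs.mean(axis=0)
centroid /= np.linalg.norm(centroid)

# Cosine similarity of each phrase to its own centroid.
sims = phrase_vecs @ centroid / np.linalg.norm(phrase_vecs, axis=1)

# One plausible threshold rule: mean similarity minus two standard deviations,
# so in-distribution phrases pass while unrelated queries fall below the bar.
threshold = float(sims.mean() - 2 * sims.std())
print(round(threshold, 3))
```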
## Benchmarking

The `Benchmarking/` folder contains scripts that grade WaveformRAG's output using Claude-4 Sonnet, comparing it against a raw LLM baseline across multiple criteria:
- Correctness
- API Usage
- Completeness
- Code Quality
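The per-criterion scores can then be aggregated into a single head-to-head number. In the real benchmark the scores come from Claude's grading; here they are hard-coded purely to show the aggregation:

```python
# Criteria matching the list above; scores are illustrative placeholders,
# not real benchmark results.
CRITERIA = ["correctness", "api_usage", "completeness", "code_quality"]

waveform_scores = {"correctness": 9, "api_usage": 8, "completeness": 8, "code_quality": 7}
baseline_scores = {"correctness": 6, "api_usage": 5, "completeness": 7, "code_quality": 7}

def average(scores):
    """Unweighted mean across all grading criteria."""
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)

delta = average(waveform_scores) - average(baseline_scores)
print(f"WaveformRAG vs baseline: {delta:+.2f}")
```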
Run the benchmark:

```bash
python Benchmarking/benchmark_waveform.py --limit 10
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.