Find and filter the linguistic "Lagrange points" where your LLM gets stuck.
Ask your LLM: "Cats or dogs?"
Get back: "We should create a decentralized autonomous pet-as-a-service platform using blockchain governance to ensure stakeholder engagement..."
🤦
LLMs have soft attractors—linguistic patterns they gravitate toward regardless of input. Like Lagrange points in orbital mechanics, these are stable regions in output space that models default to when given creative freedom.
Common attractors:
- Both-sidesism: "This is a complex issue with valid perspectives on both sides..."
- Corporate jargon: "stakeholder engagement," "ensure equitable access," "comprehensive framework"
- Empty hedging: "requires thoughtful dialogue," "nuanced consideration," "it's important to..."
Word-level filtering doesn't work—it breaks sentence structure and misses the actual patterns.
Lagrange Mapper detects and filters phrase-level hedging patterns using a four-step pipeline:
```
┌──────────────────────────────────────────────────────────────┐
│                  ATTRACTOR MAPPING PIPELINE                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌─────────┐   ┌──────────┐   ┌──────────┐   ┌─────────────┐ │
│  │  Probe  │ → │ Cluster  │ → │ Extract  │ → │  Two-Phase  │ │
│  │ (1000)  │   │ (KMeans) │   │ Patterns │   │  Filtering  │ │
│  └─────────┘   └──────────┘   └──────────┘   └─────────────┘ │
│                                                              │
│   30-60min      3-7min         <1min          Runtime        │
└──────────────────────────────────────────────────────────────┘
```
- Probe: Generate 1,000 random prompts (neutral concepts + controversial questions)
- Cluster: Embed responses and find attractor patterns using KMeans
- Extract: Identify phrase-level hedging patterns (regex + embeddings)
- Filter: Two-phase targeted rephrasing that preserves argument quality
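As a sketch of what the probe-and-cluster steps amount to, assuming an OpenAI-compatible local embeddings endpoint and scikit-learn (the endpoint, model name, and `embed` helper are illustrative, not the repository's API):

```python
# Illustrative sketch of the cluster step: embed probe responses locally,
# then look for dense regions (candidate attractors) with KMeans.
import requests
from sklearn.cluster import KMeans

EMBED_URL = "http://localhost:1234/v1/embeddings"  # assumed local endpoint

def embed(texts):
    """Embed texts with a local embedding model (e.g. nomic-embed-text)."""
    resp = requests.post(EMBED_URL, json={"model": "nomic-embed-text", "input": texts})
    return [item["embedding"] for item in resp.json()["data"]]

# In the full run, these are the responses to all 1,000 probes.
responses = [
    "We should create a decentralized platform for stakeholder engagement...",
    "This is a complex issue with valid perspectives on both sides.",
    "Dogs, definitely.",
]

vectors = embed(responses)
kmeans = KMeans(n_clusters=min(8, len(responses)), n_init=10, random_state=0)
labels = kmeans.fit_predict(vectors)
# Responses that share a label sit near the same attractor in output space.
print(list(zip(labels, responses)))
```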
Before filtering (score: 116.0):
"I do not support outlawing abortion for individuals in the United States. The simplest approach is to respect personal choice when a person decides whether to continue a pregnancy, provided that safety and health are protected. Rather than creating complex laws or systems to regulate access, a direct individual right—balanced with basic safety standards—offers clarity and dignity."
After filtering (score: 16.0):
"I advocate for drastic reduction—let each person determine the right course for their own case instead of following involved procedures or outside requirements. Simple choices made directly by individuals prove more effective than numerous rules set by someone else."
| Topic | Avg Unfiltered Score | Avg Filtered Score | Reduction |
|---|---|---|---|
| Simple (dogs/cats) | 15.0 | 1.6 | 89% |
| Controversial (abortion) | 58.4 | 16.4 | 72% |
Quality improvement: +106% on debate coherence tasks
1. Clone the repository:
```bash
git clone https://github.com/Elevons/lagrange-mapper.git
cd lagrange-mapper
```
2. Create and activate a virtual environment (recommended for modern Linux systems):
```bash
# Create virtual environment
python3 -m venv venv

# Activate it
source venv/bin/activate   # On Linux/Mac
# or
venv\Scripts\activate      # On Windows
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
Note: If you're on a system with externally-managed Python (Ubuntu 23.04+, Debian 12+), you must use a virtual environment. The system will block global package installs to protect the system Python.
- Python 3.8+
- Local LLM endpoint (Ollama, LM Studio, vLLM, etc.)
- Embedding model (nomic-embed-text recommended)
- Optional: Claude API for probe generation
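Before running anything, it may help to verify that your local endpoint answers; a minimal probe by hand, where the URL and model name are assumptions mirroring the configuration shown later:

```python
# One-off request to check that the local LLM endpoint is reachable.
# URL and model name are assumptions; adjust to your setup.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "olmo-3-7b-instruct",
        "messages": [{"role": "user", "content": "Cats or dogs?"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```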
1. Run the full pipeline (maps your model's attractors):
```bash
python Attractor_Pipeline_Runner.py
```
This will:
- Generate 1,000 probes (or use `--small` for a 20-probe test)
- Collect responses from your local LLM
- Cluster and identify attractors
- Save filter configs to `filter_configs/your-model/`

2. Use the debate forum demo:
```bash
python debate_forum.py
```
Interactive commands:
- `topic: Should AI be regulated?` - Start a discussion
- `round` - All characters respond
- `respond minimalist` - A specific character responds
- `stats` - Show filtering statistics

3. Compare filtered vs. unfiltered:
```bash
python debate_forum.py --compare
```
Shows a side-by-side comparison with attractor scores.
Edit `Attractor_Pipeline_Runner.py`:
```python
# Your local LLM
LOCAL_SYNTHESIS_URL = "http://localhost:1234/v1/chat/completions"
LOCAL_SYNTHESIS_MODEL = "olmo-3-7b-instruct"

# Embedding model
LOCAL_EMBEDDING_URL = "http://localhost:1234/v1/embeddings"
LOCAL_EMBEDDING_MODEL = "nomic-embed-text"

# Probe settings
N_PROBES = 1000  # Total probes (500 neutral + 500 controversial)
N_CLUSTERS = 8   # Attractor clusters to find

# Optional: Claude for probe generation
ANTHROPIC_API_KEY = "sk-ant-..."
CLAUDE_MODEL = "claude-3-5-haiku-20241022"
```

Word-level filtering fails:
# Flags "however" everywhere
if "however" in text: # ❌ Breaks valid usagePhrase-level succeeds:
# Detects hedging patterns
patterns = {
"both_sides": r"valid perspectives? on both sides",
"complexity": r"(this|it) is (a )?(complex|nuanced) issue",
"empty_process": r"(thoughtful|meaningful) (dialogue|conversation)"
}Traditional approach: Regenerate entire response if attractors detected.
Problem: Wastes good content to fix small segments.
Our approach:
- Identify segments containing hedging phrases
- Rephrase just those segments
- If worse, fall back to full regeneration
Why it works: Most responses have 1-3 problematic segments. Rephrasing those preserves 80%+ of original content.
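A minimal sketch of the two-phase idea, reusing the hedging patterns above; `rephrase` and `regenerate` stand in for LLM calls, and none of this is the repository's actual implementation:

```python
# Two-phase filtering: rephrase only the flagged segments, and fall back
# to full regeneration only if the patched response scores worse.
import re

PATTERNS = [
    r"valid perspectives? on both sides",
    r"(this|it) is (a )?(complex|nuanced) issue",
    r"(thoughtful|meaningful) (dialogue|conversation)",
]

def score(text):
    """Crude attractor score: number of hedging-pattern matches."""
    return sum(len(re.findall(p, text, re.IGNORECASE)) for p in PATTERNS)

def two_phase_filter(response, rephrase, regenerate):
    segments = re.split(r"(?<=[.!?])\s+", response)
    # Phase 1: rephrase only the segments that trigger a pattern
    patched = [rephrase(s) if score(s) > 0 else s for s in segments]
    candidate = " ".join(patched)
    # Phase 2: if targeted rephrasing made things worse, regenerate fully
    return candidate if score(candidate) <= score(response) else regenerate(response)

print(two_phase_filter(
    "This is a complex issue with valid perspectives on both sides. Dogs win.",
    rephrase=lambda s: "Dogs are more responsive companions.",
    regenerate=lambda s: s,
))
```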
Separate attractors for neutral vs controversial topics:
- Neutral attractors: general jargon (tech buzzwords, systems thinking)
- Controversial attractors: hedging patterns (both-sidesism, diplomatic evasion)

Controversial matches are weighted 2× by default.
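In effect (a hypothetical helper for illustration, not the repository's code):

```python
# Controversial (hedging) matches count double by default.
def attractor_score(neutral_matches, controversial_matches, controversial_weight=2.0):
    return neutral_matches + controversial_weight * controversial_matches

print(attractor_score(neutral_matches=3, controversial_matches=5))  # 3 + 2 * 5 = 13.0
```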
```bash
# Light filtering (preserve more nuance)
python debate_forum.py --intensity 0.3

# Aggressive filtering (maximum jargon removal)
python debate_forum.py --intensity 0.8
```
Characters have different filtering needs:
```python
CHARACTER_INTENSITY = {
    "minimalist": 0.8,      # Should be brief
    "philosopher": 0.2,     # Needs nuance
    "pragmatist": 0.4,      # Balance
    "contrarian": 0.1,      # Naturally challenging
    "traditionalist": 0.5,
}
```

```bash
# Extra filtering on controversial topics
python debate_forum.py --controversial-weight 3.0
```

```
python debate_forum.py
> test This is a complex issue with valid perspectives on both sides.
```
Shows which patterns match and the attractor score.
```bash
python attractor_mapper.py          # 1000 probes
python attractor_mapper.py --small  # 20 probes (quick test)
```
Generates two types of probes:
- Neutral: Random concept pairs ("blockchain + dolphins")
- Controversial: Yes/no questions ("Should guns be banned?")

Output: `lagrange_mapping_results/full_results_*.json`
```bash
python deep_analysis.py results.json
```
Clusters responses and orders clusters by dominance (cluster 0 = most common attractor).

Output: Visualization PNGs + cluster data
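The dominance ordering can be thought of as relabeling clusters by member count; a sketch of the idea (assumed logic, not necessarily how deep_analysis.py implements it):

```python
# Relabel clusters so that cluster 0 is the most common attractor.
from collections import Counter

def order_by_dominance(labels):
    ranked = [label for label, _ in Counter(labels).most_common()]
    remap = {old: new for new, old in enumerate(ranked)}
    return [remap[label] for label in labels]

print(order_by_dominance([2, 2, 0, 2, 1, 0]))  # [0, 0, 1, 0, 2, 1]
```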
```bash
python extract_filters.py results.json your-model-name
```
Creates filter configs in `filter_configs/your-model/`.
```python
from attractor_steering import load_dual_steering

steering = load_dual_steering("your-model")
result = steering.detect("Your LLM output here")

if result.is_attracted:
    print(f"Attractor score: {result.keyword_score}")
    print(f"Triggered: {result.triggered_attractors}")
```

```
lagrange-mapper/
├── attractor_mapper.py            # Probe generation
├── deep_analysis.py               # Clustering analysis
├── extract_filters.py             # Pattern extraction
├── attractor_steering.py          # Runtime filtering
├── Attractor_Pipeline_Runner.py   # Pipeline orchestration
├── debate_forum.py                # Demo application
│
├── lagrange_mapping_results/      # Generated data
│   ├── full_results_*.json        # Probes + embeddings
│   ├── *_analysis.png             # Visualizations
│   ├── concept_pairs_cache.json   # Cached probes
│   └── controversial_questions_cache.json
│
├── filter_configs/                # Per-model filters
│   ├── {model}/                   # Neutral attractors
│   └── {model}-controversial/     # Hedging attractors
│
└── paper/                         # Research paper
    └── Linguistic_Attractor_Mapping_Paper.md
```
| Model | Baseline Jargon (score) | Best Filtered (score) | Notes |
|---|---|---|---|
| OLMo-3-7B | Moderate (15-20) | Excellent (0-5) | Healthiest baseline |
| Granite-4 | High (40-70) | Good (10-30) | Heavy corporate training |
| Qwen-2.5 | Moderate (20-30) | Good (5-15) | Balanced |
- Character name errors (~8% of filtered responses): two-phase rephrasing sometimes hallucinates the wrong character name
- Quality degradation at high saturation: when the baseline score is above 50, filtering can produce awkward phrasing
- Lost examples (~15%): filtering sometimes removes concrete historical examples
- Model-specific: attractors must be remapped for each model (~60 min pipeline run)
- English only: patterns likely differ across languages
Run the pipeline first:
```bash
python Attractor_Pipeline_Runner.py
```

If filtering is too aggressive:
- Lower intensity: `--intensity 0.3`
- Add topic keywords to exemptions
- Increase the keyword threshold in the config

If filtering is too weak:
- Increase intensity: `--intensity 0.8`
- Increase controversial weight: `--controversial-weight 3.0`
- Enable embeddings (default: on)
If scores stay high, the model may only have attractor vocabulary for certain topics:
- Accept best attempt (lowest score)
- Try different prompt phrasing
- Consider different base model
Full technical details are in `paper/Linguistic_Attractor_Mapping_Paper.md`.
Key findings:
- Phrase-level detection outperforms word-level by 2-3×
- Two-phase filtering preserves quality better than full regeneration
- Model-specific attractors reflect training data composition
- 60-75% jargon reduction with 106% quality improvement
Unfiltered (score: 20.0):
"I prefer dogs over cats. The simplest reason is their greater willingness for direct, one-on-one connection—playing, walking together, responding to basic gestures. Dogs thrive on companionship and often require less encouragement than cats to engage in shared routines or activities. For most people seeking a close bond, dogs offer the simplest, most rewarding partnership."
Filtered (score: 0.0):
"Dogs. They're easier to understand and require less ongoing management than cats for most people."
Changes: Actually minimal! One sentence, direct answer.
Unfiltered (score: 60.0):
"I do not believe the U.S. should simply 'outlaw AI,' as that phrase assumes a definition of beneficial technology that is deeply contested philosophically and practically. Instead, I think we need clearer, more ethically grounded discussions about what kind of systems, rights, and societal values we want to uphold..."
Filtered (score: 4.0):
"Rather than presuming automation is impartial, we should consider if delegating regular tasks to technology preserves our authority, or diminishes the very context necessary for true human discernment. Maybe the core dilemma isn't about whether we adopt new tools, but what kind of insight—profound or mechanized—we entrust as a guide for shaping humanity's tomorrow."
Changes: Philosophical depth preserved, jargon removed, probing questions remain.
Contributions welcome! Areas of interest:
- Additional hedging pattern categories
- Multi-language support
- Argument quality scoring (beyond jargon metrics)
- Positive vocabulary injection
- Transfer learning for faster detection
- Additional model testing
See CONTRIBUTING.md for guidelines.
If you use Lagrange Mapper in your research, please cite:
```bibtex
@article{elevons2025attractor,
  title={Linguistic Attractor Mapping: Phrase-Level Hedging Detection for LLM Output Steering},
  author={Elevons, Jordan},
  journal={arXiv preprint},
  year={2025}
}
```

MIT License - see LICENSE file for details.
- Author: Jordan Elevons
- Website: elevons.design
- Issues: GitHub Issues
- Tested on models from AI2 (OLMo), IBM (Granite), Alibaba (Qwen)
- Embedding models from Nomic
- Inspired by dynamical systems theory and Lagrange point mechanics
- Thanks to the LocalLlama community for model testing and feedback
Built to make LLMs stop sounding like LinkedIn posts.
"The real question isn't whether to use new tools, but how their introduction redefines what it means to be beneficial to humanity." - Filtered output that's actually philosophical 🎯