177 changes: 65 additions & 112 deletions docs/context/README.md

Large diffs are not rendered by default.

721 changes: 99 additions & 622 deletions docs/context/ROADMAP.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/context/SCHEMA_VALIDATION.md
@@ -133,7 +133,7 @@ The schema completeness test runs as part of the test suite. To ensure schema st

- **Deprecated fields**: Fields marked `@deprecated` in TypeScript (e.g., `entryPathAbs`, `entryPathRel`, `os`) may not be in the schema if they're no longer generated. This is intentional for backward compatibility - old files with these fields may still exist.

- **Conditional validation**: The schema doesn't currently validate language-specific field combinations (e.g., `style` shouldn't exist on `node:api` contracts). This is planned for v0.8.x (see ROADMAP.md).
- **Conditional validation**: The schema doesn't currently validate language-specific field combinations (e.g., `style` shouldn't exist on `node:api` contracts). This is planned for a future release (see [ROADMAP.md](../ROADMAP.md)).

- **Style mode variants**: The schema supports both `lean` and `full` style modes. Lean mode uses count fields (e.g., `selectorCount`, `componentCount`), while full mode uses arrays (e.g., `selectors`, `components`). Both formats are valid and should be tested separately.

139 changes: 39 additions & 100 deletions docs/context/cli/compare-modes.md
@@ -1,29 +1,29 @@
# Mode Comparison Guide

Compare token costs across all context compilation modes to choose the right one for your workflow.
Compare token costs across context modes.

```bash
stamp context --compare-modes
```

## Overview

Four modes balance information vs token cost:
Four modes balance detail vs tokens:

- **none** - Contracts only (props, state, hooks, dependencies), no source code
- **header** - Contracts plus JSDoc headers and function signatures
- **header+style** - Header mode plus style metadata (Tailwind, SCSS, Material UI, animations, layout)
- **full** - Everything including complete source code

The comparison shows token costs for all modes so you can see the tradeoffs.
- **none** - Contracts only (props, state, hooks, dependencies), no source
- **header** - Contracts plus JSDoc headers and signatures
- **header+style** - Header plus style metadata (Tailwind, SCSS, MUI, layout, motion)
- **full** - Contracts plus complete embedded source

## Output Format

The output shows three things:
**1. Token estimation** — Which tokenizers ran, or character fallback.

**2. Comparison vs raw source** — See [baselines](#what-the-two-baselines-mean). Negative % means summed header bundles exceed one copy of all source files, but a **single** root’s bundle is often still far smaller than raw because the row totals **every** root, not the one bundle you attach in chat.

**1. Token estimation method** - Shows which tokenizers are being used, or if it's falling back to approximations
**3. Mode breakdown** — Each mode vs **full** (same bundle set): how much smaller than maximum.

**2. Comparison vs raw source** - Savings compared to including all source files directly:
Example tables:

```
Comparison:
@@ -34,10 +34,6 @@ The output shows three things:
Header+style | 170,466 | 184,864 | 38%
```

Header mode saves ~70% by extracting contracts and signatures without implementation code. Header+style saves ~38% but adds visual context. Full mode actually costs more than raw source (~30% overhead) due to contract structure.

**3. Mode breakdown** - All modes compared to the maximum (full):

```
Mode breakdown:
Mode | Tokens GPT-4o | Tokens Claude | Savings vs Full Context
@@ -50,130 +46,73 @@ Header mode saves ~70% by extracting contracts and signatures without implementa

## Token Estimation

By default, the tool uses character-based approximations (~4 chars/token for GPT-4o, ~4.5 for Claude). These are usually within 10-15% of actual counts, which is fine for most cases.

For accurate counts, LogicStamp includes `@dqbd/tiktoken` (GPT-4) and `@anthropic-ai/tokenizer` (Claude) as optional dependencies. npm installs them automatically when you install `logicstamp-context`. If that works, you get exact token counts. If it fails (normal for optional deps), it falls back to approximation.

You only need to install tokenizers manually if:
- You need exact counts (not approximations)
- AND automatic installation failed
Defaults use ~4 chars/token (GPT-4o) and ~4.5 (Claude). Optional deps `@dqbd/tiktoken` and `@anthropic-ai/tokenizer` give exact counts when install succeeds.

```bash
npm install @dqbd/tiktoken @anthropic-ai/tokenizer
```
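
The character-based fallback described above can be sketched as follows. This is a minimal illustration, not LogicStamp's implementation: the ratios are the documented defaults, while `CHARS_PER_TOKEN`, `Model`, and `estimateTokens` are hypothetical names.

```typescript
// Hypothetical helper: approximate token counts from character length.
// Ratios are the documented defaults (~4 chars/token for GPT-4o, ~4.5 for Claude).
const CHARS_PER_TOKEN = { gpt4o: 4, claude: 4.5 } as const;

type Model = keyof typeof CHARS_PER_TOKEN;

function estimateTokens(text: string, model: Model): number {
  // Round up so a non-empty string never estimates to zero tokens.
  return Math.ceil(text.length / CHARS_PER_TOKEN[model]);
}

const sample = "x".repeat(9000);
console.log(estimateTokens(sample, "gpt4o"));  // 2250
console.log(estimateTokens(sample, "claude")); // 2000
```

An estimate like this depends only on text length, which is why it can drift from real tokenizer output on symbol-heavy code.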

Accurate counts matter for production deployments, tight budgets, or comparing tools. For development, approximations are usually fine.
## What the two baselines mean

## Mode Selection Guide
**Raw source** — Every project `.ts` / `.tsx` (tests excluded), each file **once**, joined. No JSON bundles.

**Header / header+style (first table)** — Tokens for **all root bundles**, formatted and concatenated. Anything imported by many roots is repeated; bundle JSON adds overhead. So the table can show header **above** raw on large multi-root graphs, even when **one** feature bundle is still cheap. In chat you usually send one bundle—that is **not** the same as this summed row.

**none** - Maximum compression (~18% of raw source)
- Contracts only, no code or style
- Good for: CI/CD validation, dependency analysis, architecture reviews
- Skip if: You need implementation details or visual context
**Tailwind + header+style** — Style extraction expands utilities into structured text, so that row often grows vs raw more than on SCSS-heavy repos (in addition to duplication above).
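
To see how the summed row can exceed raw while a single bundle stays small, here is a toy model. All paths, token counts, and the per-bundle overhead are invented for illustration; it does not reproduce LogicStamp's bundle format or counting logic.

```typescript
// Toy model: one shared file imported by four roots.
interface FileCost {
  path: string;
  raw: number;    // tokens for the full source file
  header: number; // tokens for its header-mode representation
}

const shared: FileCost = { path: "shared/ui.tsx", raw: 10000, header: 3000 };
const pages: FileCost[] = ["a", "b", "c", "d"].map((name) => ({
  path: "pages/" + name + ".tsx",
  raw: 500,
  header: 150,
}));

// Each root bundle holds the root plus everything it imports.
const rootBundles: FileCost[][] = pages.map((page) => [page, shared]);

const BUNDLE_OVERHEAD = 50; // assumed JSON wrapper cost per bundle

// Raw baseline: every file counted exactly once.
const rawTotal = [shared, ...pages].reduce((sum, f) => sum + f.raw, 0);

// Header cost of one bundle, and of all root bundles concatenated.
const bundleTokens = (bundle: FileCost[]): number =>
  BUNDLE_OVERHEAD + bundle.reduce((sum, f) => sum + f.header, 0);
const summedHeader = rootBundles.reduce((sum, b) => sum + bundleTokens(b), 0);

console.log(rawTotal);                     // 12000
console.log(bundleTokens(rootBundles[0])); // 3200
console.log(summedHeader);                 // 12800
```

The shared file is counted once in the raw baseline but four times in the summed row, so the sum passes raw even though each bundle is roughly a quarter of it.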

**header** - Balanced compression (~30% of raw source) *recommended default*
- Contracts + JSDoc headers + function signatures
- Good for: Most AI chat workflows, code review, understanding interfaces
- This is what most people need 90% of the time
## Mode Selection Guide

**header+style** - Visual context (~62% of raw source)
- Everything from header + style metadata (Tailwind, SCSS, animations, layout patterns)
- Good for: UI/UX discussions, design system work, frontend generation
- Adds ~13% token overhead vs header mode
The **~N% of raw** figures below are rough, and they apply only when the summed header output is smaller than one pass over all files ([above](#what-the-two-baselines-mean)).

**full** - Complete context (~130% of raw source)
- Everything including full source code
- Good for: Deep reviews, complex refactoring, bug investigation
- Note: Costs more than raw source due to contract structure overhead
- **none** - ~18% of raw in typical cases. CI, graphs, no UI detail.
- **header** - ~30% when totals beat raw. Default for most chat/review.
- **header+style** - ~62% when totals beat raw; Tailwind pushes it up. UI and design-system work.
- **full** - ~130% vs one file pass; contracts + full source. Deep refactors, bugs.
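
The "percent of raw" figures here and the "Savings" column in the tables are two views of the same ratio. A minimal sketch of the conversion, assuming savings are computed as one minus the ratio to the baseline; `savingsPercent` is a hypothetical name, not part of the CLI:

```typescript
// Hypothetical helper: savings relative to a baseline, as a whole percent.
// Negative results mean the mode costs more tokens than the baseline.
function savingsPercent(modeTokens: number, baselineTokens: number): number {
  return Math.round((1 - modeTokens / baselineTokens) * 100);
}

// Using the rough ratios from this guide, with the baseline normalised to 100:
console.log(savingsPercent(30, 100));  // 70  (header at ~30% of raw)
console.log(savingsPercent(62, 100));  // 38  (header+style at ~62% of raw)
console.log(savingsPercent(130, 100)); // -30 (full at ~130% of raw)
```

The same function works against either baseline, raw source or the full-mode total, as long as both counts come from the same tokenizer.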

## Example Workflows

**Budget planning:**
```bash
stamp context --compare-modes
stamp context --include-code header --max-nodes 50
```

**Style cost analysis:**
```bash
stamp context --compare-modes
stamp context style # only if budget allows
```

**Production optimization:**
```bash
stamp context --compare-modes | tee token-analysis.txt
stamp context --include-code none --profile ci-strict
stamp context style # if budget allows
```

**Multi-repo comparison:**
```bash
for repo in api web mobile; do
  echo "=== $repo ==="
  # Run in a subshell so the working directory is restored even if stamp fails
  (cd "$repo" && stamp context --compare-modes --quiet)
done
stamp context --compare-modes --stats # writes context_compare_modes.json
```

**MCP integration:**
```bash
stamp context --compare-modes --stats
# Creates context_compare_modes.json with structured data for MCP servers
```

## Understanding the Numbers

**Savings vs Raw Source:** Shows how much you save compared to just concatenating all source files. Higher is better. Header mode typically saves ~70%, header+style saves ~38%. Full mode actually costs more (~30% overhead) due to contract structure.

**Savings vs Full Context:** Shows efficiency compared to the maximum mode. Header saves ~77%, header+style saves ~52%.

**GPT-4o vs Claude:** Token counts differ slightly (usually 5-10%) because each model tokenizes differently. Both estimates are shown so you can plan for either.

**Accuracy:** Approximations are usually within 10-15% and fine for planning. Tokenizers give exact counts but require installation.

## Common Questions

**Why are my numbers different from raw file sizes?**
Token counts ≠ character counts. Tokenizers split text into semantic units—common words are 1 token, rare words are multiple, code symbols vary, whitespace compresses.
**Why don’t tokens match file size?** Tokens ≠ bytes; tokenizers split code and prose unevenly.

**Should I always use accurate tokenizers?**
Use approximations for development/prototyping. Use tokenizers for production, tight budgets, or comparing tools.
**When are tokenizers worth it?** Tight budgets, production gates, comparing tools. Approximations are fine for day-to-day.

**How much overhead do contracts add?**
In `full` mode, contracts add ~30% overhead vs raw source due to JSON structure and metadata. The overhead is worth it for structured dependency graphs and better AI comprehension, but `header` mode avoids most of it while still giving you what you need.
**`full` overhead?** JSON + metadata on top of embedded source; `header` avoids most of that.

**Why do the savings percentages seem generous?**
"Savings vs raw source" compares against simple file concatenation. Header mode saves 70% because it extracts contracts and signatures without implementation code. Full mode actually costs more than raw source (~30% overhead) due to contract structure. The real win: header mode gives you 90% of what you need at 30% of the cost.
### Why can Header show more tokens than Raw source?

**Can I compare specific folders?**
Yes:
```bash
stamp context ./src/components --compare-modes
```
[What the two baselines mean](#what-the-two-baselines-mean) — raw is one copy of each file; the header row is **every root bundle** summed.

**Does --compare-modes write files?**
No, it's analysis-only by default. It generates contracts in memory, computes estimates, and displays tables. Use `stamp context` (without the flag) to actually generate context files.
**Folder scope?** `stamp context ./src/components --compare-modes`

With `--stats`, it writes `context_compare_modes.json` for MCP integration:
```bash
stamp context --compare-modes --stats
```
**Writes files?** No, unless `--stats` (JSON). Normal `stamp context` writes bundles.

## Performance

Takes 2-3x longer than normal generation because it regenerates contracts with and without style for accurate comparison. Uses in-memory processing (no disk writes). Typical execution: 5-15 seconds for medium projects (50-150 files).
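Takes roughly 2–3× as long as a normal run. Processing is in-memory, and nothing is written to disk unless `--stats` is passed, which writes `context_compare_modes.json`.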

## Related Commands

- [`stamp context`](context.md) - Generate context files
- [`stamp context style`](style.md) - Generate context with style metadata
- [`stamp context compare`](compare.md) - Compare context changes over time
- [`stamp context validate`](validate.md) - Validate schema compliance
- [`stamp context`](context.md)
- [`stamp context style`](style.md)
- [`stamp context compare`](compare.md)
- [`stamp context validate`](validate.md)

## Tips

- Run `--compare-modes` before committing to a mode
- Use approximations in dev, tokenizers in production
- Default to `header` mode—it covers most use cases
- Add `header+style` only when you need visual context
- Reserve `full` mode for deep implementation work
- Check costs regularly as your codebase grows
- Default **header**; add **header+style** only for UI-heavy prompts.
- Re-run as the repo grows.
10 changes: 5 additions & 5 deletions docs/context/cli/context.md
@@ -13,7 +13,7 @@ stamp context [path] [options]

**Setup:** `stamp context` respects preferences saved in `.logicstamp/config.json` and never prompts. On first run (no config), it defaults to skipping both `.gitignore` and `LLM_CONTEXT.md` setup for CI-friendly behavior. Use [`stamp init`](init.md) to configure these options (non-interactive by default; use `--no-secure` for interactive mode).

**File Exclusion:** `stamp context` respects `.stampignore` and excludes those files from context compilation. You'll see how many files were excluded (unless using `--quiet`). Use `stamp ignore <file>` to add files to `.stampignore`. `.stampignore` is completely optional and independent of security scanning. See [stampignore.md](../stampignore.md) for details.
**File Exclusion:** `stamp context` respects `.stampignore` and excludes those files from context compilation. You'll see how many files were excluded (unless using `--quiet`). Use `stamp ignore <file>` to add files to `.stampignore`. `.stampignore` is completely optional and independent of security scanning. See [stampignore.md](../reference/stampignore.md) for details.

**Secret Sanitization:** If a security report (`stamp_security_report.json`) exists, `stamp context` automatically replaces detected secrets with `"PRIVATE_DATA"` in the generated JSON files. **Your source code files are never modified** - only the generated context files contain sanitized values. See [security-scan.md](security-scan.md) for details.

@@ -187,9 +187,9 @@ Example `.stampignore`:
}
```

`.stampignore` is completely optional and can be created manually. It's independent of security scanning. See [stampignore.md](../stampignore.md) for complete documentation.
`.stampignore` is completely optional and can be created manually. It's independent of security scanning. See [stampignore.md](../reference/stampignore.md) for complete documentation.

For complete documentation on `.stampignore` file format, see [stampignore.md](../stampignore.md).
For complete documentation on `.stampignore` file format, see [stampignore.md](../reference/stampignore.md).

## Secret Sanitization

@@ -205,8 +205,8 @@ When generating context files, LogicStamp automatically sanitizes secrets if a s

Source code:
```typescript
const apiKey = 'sk_live_1234567890abcdef';
const password = 'mySecretPassword123';
const apiKey = 'PLACEHOLDER_KEY_1234567890abcdef';
const password = 'FAKE_PASSWORD_FOR_DOCS_12345678';
```

Generated `context.json`:
4 changes: 2 additions & 2 deletions docs/context/cli/ignore.md
@@ -181,7 +181,7 @@ This ensures consistent formatting regardless of how you specify paths.
- `src/**/*.test.ts` - Matches all `.test.ts` files under `src/`
- `config/*.json` - Matches all `.json` files directly in `config/`

See [stampignore.md](../stampignore.md) for more details on glob pattern syntax.
See [stampignore.md](../reference/stampignore.md) for more details on glob pattern syntax.

## Best Practices

@@ -193,7 +193,7 @@ See [stampignore.md](../stampignore.md) for more details on glob pattern syntax.

## See Also

- [stampignore.md](../stampignore.md) - Complete `.stampignore` file format and usage guide
- [stampignore.md](../reference/stampignore.md) - Complete `.stampignore` file format and usage guide
- [context.md](context.md) - How `.stampignore` affects context compilation
- [security-scan.md](security-scan.md) - Security scanning to detect secrets in your codebase
