K1-R1/smoosh: One command. Every doc. Smooshed. Pure bash CLI for RAG ingestion.

Turn any git repo into AI-ready context — for NotebookLM, Claude Projects, ChatGPT, or your own RAG pipeline. Pure bash, zero dependencies.

Quick Start · Why smoosh? · Features · Installation · Uninstall · Usage · AI Tools · Agent / CI · Config Reference · FAQ

Quick Start

# Install
brew install K1-R1/tap/smoosh

# In any git repo:
smoosh           # docs only (default)
smoosh --code    # docs + code files
smoosh --all     # everything tracked by git

Output lands in _smooshes/ — chunked, verified .md files ready to drop into your AI tool of choice.

Why smoosh?

AI tools are powerful when they have the right context. The hard part is getting an entire codebase into them — in the right format, within token limits, without accidentally including secrets. smoosh handles all of that in one command.

Understand your codebase in plain language. Upload smoosh output to NotebookLM and ask questions about architecture, module boundaries, or what that obscure utility actually does. Technical knowledge becomes accessible to everyone on the team — not just the people who wrote the code. Product, design, and leadership get answers without reading source files.

Give AI real context. Drop the output into Claude Projects or ChatGPT and get an assistant that actually knows your codebase. No hallucinated function signatures, no "I don't have access to that file." It can answer questions about any file, understand cross-module relationships, and suggest changes that fit your existing patterns.

Onboard in hours. New team members get a searchable snapshot of the entire codebase before they even clone the repo. Pair it with NotebookLM and they can ask the codebase questions on day one.

Ground your agents in fact. smoosh output is optimised for retrieval-augmented generation (RAG) — chunked within token limits, with file path metadata preserved. Instead of hallucinating, your agents retrieve real context from your actual code.

Private by default. Everything runs locally. Your code never leaves your machine unless you choose to upload it. No API keys, no SaaS accounts, no telemetry.

Features

File type presets — --docs (default: md, rst, txt, adoc, asciidoc, org, tex), --code (adds all code extensions), --all (everything tracked by git)
Smart chunking — stays within word limits; names chunks project_part1.md, project_part2.md
100% verification — every chunk is integrity-checked against the expected file list; exits 4 on mismatch
Interactive mode — guided first-run experience: scans your repo, shows a breakdown, lets you pick a mode
Remote repositories — smoosh https://github.com/user/repo — clones and processes in one step
Secrets detection — warns about AWS keys, GitHub PATs, and PEM private key blocks, and is upfront that this is not a complete secrets scan
Output formats — Markdown (default), plain text, XML with CDATA sections
Table of contents — --toc generates a per-chunk file index with word counts
Line numbers — --line-numbers for code review workflows
Dry run — --dry-run shows what would be included with word counts, no files written
Agent-native — designed to be called by AI agents and CI pipelines, not just humans. --json for structured output, --no-interactive for headless runs, exit codes 0–7 for programmatic decision-making

Power user workflow

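Preview with --dry-run, narrow the file set with --only, --include, and --exclude, and pipe results onward with --quiet or --json, all from flags. The Usage section below covers each one.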

Installation

Homebrew (macOS / Linux)

brew install K1-R1/tap/smoosh

curl (macOS / Linux / Git Bash)

curl -fsSL https://raw.githubusercontent.com/K1-R1/smoosh/main/install.sh | bash

Installs to /usr/local/bin. Override with:

# Set the variable on bash (which runs the installer), not on curl
curl -fsSL https://raw.githubusercontent.com/K1-R1/smoosh/main/install.sh \
  | SMOOSH_INSTALL_DIR="$HOME/.local/bin" bash

The installer supports these environment variables:

Variable	Default	Description
`SMOOSH_INSTALL_DIR`	`/usr/local/bin`	Installation directory
`SMOOSH_VERSION`	latest	Pin a specific version (e.g. `1.0.1`)
`SMOOSH_NO_CONFIRM`	`0`	Set to `1` to skip confirmation prompt
`SMOOSH_NO_VERIFY`	`0`	Set to `1` to skip checksum verification (unsafe)

Manual

curl -fsSL https://github.com/K1-R1/smoosh/releases/latest/download/smoosh -o smoosh
curl -fsSL https://github.com/K1-R1/smoosh/releases/latest/download/smoosh.sha256 -o smoosh.sha256
sha256sum -c smoosh.sha256
chmod +x smoosh
sudo mv smoosh /usr/local/bin/
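On systems without GNU coreutils (some macOS setups), sha256sum may be missing; shasum -a 256 -c is the usual equivalent there. A minimal sketch of a checksum step that uses whichever tool exists. It assumes smoosh.sha256 uses the standard "<hash>  <filename>" line format, which this README does not spell out, and verify_sha256 is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Verify a checksum file with whichever SHA-256 tool is available.
# Assumes the standard "<hash>  <filename>" format inside the checksum file.
verify_sha256() {
  local sumfile=$1
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum -c "$sumfile"          # GNU coreutils
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 -c "$sumfile"      # Perl shasum, common on macOS
  else
    echo "error: no SHA-256 tool found" >&2
    return 1
  fi
}

# Usage, after downloading both files:
#   verify_sha256 smoosh.sha256 && chmod +x smoosh
```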

Uninstall

# Homebrew
brew uninstall smoosh

# curl / manual
rm "$(which smoosh)"   # prefix with sudo if it lives in a root-owned directory such as /usr/local/bin

If you installed via both methods, run which -a smoosh after removing one; a second copy may remain in a different location.

Usage

Basics

smoosh                              # interactive mode when run with no args
smoosh .                            # current directory (docs mode)
smoosh /path/to/repo                # specific local repo
smoosh https://github.com/user/repo # remote repo — clone + process in one step

File types

smoosh --docs    # markdown, rst, txt, adoc, asciidoc, org, tex (default)
smoosh --code    # docs + py, js, ts, rs, go, java, rb, and many more
smoosh --all     # everything tracked by git (binary files excluded via MIME check)
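Beyond the extension lists above, the README doesn't describe how the presets select files. A minimal sketch of the idea, assuming extension matching for --docs and --code and a text-versus-binary test for --all. The function names are hypothetical, the --code list is deliberately partial, and grep -I is a stand-in heuristic, not smoosh's actual MIME check:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of preset-based file selection (not smoosh's code).
# Extension lists are taken from this README; the --code list is partial.

is_doc() {
  case "${1##*.}" in
    md|rst|txt|adoc|asciidoc|org|tex) return 0 ;;
    *) return 1 ;;
  esac
}

is_code() {
  case "${1##*.}" in
    py|js|ts|rs|go|java|rb) return 0 ;;   # partial: the real list is longer
    *) return 1 ;;
  esac
}

# Stand-in for a MIME check: grep -I treats files it classifies as binary
# (e.g. containing NUL bytes) as non-matching, so this succeeds only for
# non-empty text files.
is_text() {
  grep -Iq . "$1"
}

# include_file MODE PATH -> exit status 0 if PATH belongs in MODE
include_file() {
  local mode=$1 path=$2
  case "$mode" in
    docs) is_doc "$path" ;;
    code) is_doc "$path" || is_code "$path" ;;
    all)  is_text "$path" ;;
    *)    return 2 ;;
  esac
}

# Example: list docs-mode candidates in the current repo.
#   git ls-files | while IFS= read -r f; do include_file docs "$f" && echo "$f"; done
```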

Filtering

smoosh --only "*.py"                   # Python files only (overrides mode)
smoosh --include "*.vue,*.graphql"     # add extensions to current mode
smoosh --exclude "vendor/*,test/*"     # exclude matching paths
smoosh --include-hidden                # include .github/, .env.example, dotfiles
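The filter flags take comma-separated glob lists. A minimal sketch of testing one path against such a list with bash pattern matching; matches_any is a hypothetical name, and smoosh's exact rules (for example, whether * crosses directory boundaries) may differ:

```bash
#!/usr/bin/env bash
# Return 0 if PATH matches any glob in a comma-separated LIST.
# In [[ == ]] an unquoted * also matches "/", so "vendor/*" matches
# nested paths such as vendor/lib/x.go.
matches_any() {
  local path=$1 list=$2 glob
  local IFS=,            # split the list on commas only
  local -a globs
  read -r -a globs <<< "$list"
  for glob in "${globs[@]}"; do
    [[ $path == $glob ]] && return 0   # unquoted RHS = pattern match
  done
  return 1
}

# Usage inside a file loop:
#   matches_any "$f" "vendor/*,test/*" && continue
```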

Output options

smoosh --format md             # Markdown with ### File: headers (default)
smoosh --format text           # plain text with === separators
smoosh --format xml            # XML with CDATA sections (for structured pipelines)
smoosh --toc                   # table of contents in each chunk
smoosh --line-numbers          # prefix each line with its number
smoosh --max-words 200000      # custom chunk size (default: 450,000)
smoosh --output-dir ./context  # write to a custom directory
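--format xml wraps file contents in CDATA sections. A CDATA section ends at the first ]]>, so content containing that sequence has to be split across two sections. A minimal sketch of the standard technique, assuming smoosh does something equivalent (its implementation isn't shown here); cdata_wrap is a hypothetical name:

```bash
#!/usr/bin/env bash
# Wrap arbitrary text in a CDATA section safely.
# Each "]]>" in the content becomes "]]]]><![CDATA[>": the first section
# ends after "]]", and a new section starts with ">".
cdata_wrap() {
  local content=$1
  local terminator=']]>'
  local split=']]]]><![CDATA[>'
  # Quoting pattern and replacement makes both literal strings.
  content=${content//"$terminator"/"$split"}
  printf '<![CDATA[%s]]>' "$content"
}

# cdata_wrap 'a]]>b'  ->  <![CDATA[a]]]]><![CDATA[>b]]>
```

An XML parser joins the two sections back into the original text a]]>b.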

Preview and automation

smoosh --dry-run               # show file list + word counts, no output written
smoosh --quiet                 # print output paths only, one per line (for piping)
smoosh --json                  # structured JSON to stdout
smoosh --no-interactive        # skip interactive mode, use flag defaults
smoosh --no-check-secrets      # skip the secrets scan

Combining flags

# Full code review context with TOC and line numbers
smoosh --code --toc --line-numbers

# Python-only export for a RAG pipeline
smoosh --only "*.py" --format xml --output-dir ./pipeline-input

# Preview what a remote repo contains before processing
smoosh --dry-run https://github.com/user/repo

# Quiet mode for scripting
files=$(smoosh --quiet --code .)
echo "Generated: ${files}"
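Capturing --quiet output in one variable is fine for echoing, but per-file processing is safer line by line. A minimal sketch, assuming --quiet prints one path per line as described above; report_words is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Read output paths (one per line) on stdin; print each with its word count.
# Intended usage:  smoosh --quiet --code . | report_words
report_words() {
  local path words
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    # $(( )) strips the leading spaces some wc implementations print.
    words=$(( $(wc -w < "$path") ))
    printf '%s\t%s\n' "$path" "$words"
  done
}
```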

Using smoosh with AI tools

NotebookLM

Step 1 — Install smoosh

brew install K1-R1/tap/smoosh

Step 2 — Run smoosh in your repo

cd your-project
smoosh          # docs only — usually the right start

Output lands in your-project/_smooshes/. To include source code as well as docs, run:

smoosh --code

Step 3 — Upload to NotebookLM

Go to notebooklm.google.com and create a notebook.
Click Add source → Upload file.
Upload each .md file from _smooshes/.
For large repos with multiple chunks, upload all of them.

Step 4 — Chat with your codebase

Ask about architecture, find functions, generate onboarding guides, or get plain-English explanations of complex modules. Answers are grounded in the files you uploaded, with citations back to them.

NotebookLM limits (as of early 2026):

Plan	Sources per notebook	Words per source
Free	50	500,000
Plus	300	500,000
Ultra	600	500,000

smoosh warns you when your repo produces more chunks than your plan allows.
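The chunk count is roughly the total word count divided by --max-words, rounded up, compared against the plan's source limit. A minimal sketch using the limits from the table above; it assumes one chunk per source and treats the result as a lower bound, since chunks that break at file boundaries can fill less than the maximum. notebooklm_check is a hypothetical name, and this is not how smoosh computes its warning:

```bash
#!/usr/bin/env bash
# Lower-bound chunk estimate vs. a NotebookLM plan's source limit.
# Limits are copied from the table above (early 2026) and may change.
# Usage: notebooklm_check TOTAL_WORDS PLAN [MAX_WORDS]
notebooklm_check() {
  local total_words=$1 plan=$2 max_words=${3:-450000} limit chunks
  case "$plan" in
    free)  limit=50 ;;
    plus)  limit=300 ;;
    ultra) limit=600 ;;
    *) echo "unknown plan: $plan" >&2; return 2 ;;
  esac
  # Integer ceiling division: (a + b - 1) / b
  chunks=$(( (total_words + max_words - 1) / max_words ))
  echo "$plan: $chunks chunk(s), limit $limit"
  [ "$chunks" -le "$limit" ]   # exit status 0 only if it fits
}
```

For example, notebooklm_check 30000000 free prints free: 67 chunk(s), limit 50 and returns a non-zero status.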

Claude Projects

Run smoosh --code in your repo.
Create a new Claude Project and open the project knowledge panel.
Upload the files from _smooshes/.

Claude now has full context over your codebase — ask about any file, request changes that fit your existing patterns, or get architecture explanations grounded in your actual code.

ChatGPT

Run smoosh --code in your repo.
Open a ChatGPT conversation and attach the files from _smooshes/.
For ongoing use, add them as knowledge files in ChatGPT.

Works with any ChatGPT plan that supports file uploads.

Agents and CI pipelines

smoosh is designed to be called by AI agents and CI pipelines, not just humans.

Pre-flight check — estimate size before generating output:

smoosh --json --dry-run --all .

{
  "dry_run": true,
  "repo": "my-project",
  "files": [
    {"path": "README.md", "words": 194, "chunk": 1},
    {"path": "src/main.py", "words": 312, "chunk": 1}
  ],
  "total_words": 506,
  "estimated_tokens": 658,
  "estimated_chunks": 1
}
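The numbers in the sample are consistent with an estimate of about 1.3 tokens per word: 194 + 312 = 506 words, and 506 × 1.3 = 657.8, shown as 658. That ratio is inferred from this single example rather than documented, so treat it as an assumption. A sketch of the same estimate in integer shell arithmetic, handy for checking a --dry-run total against a context budget; estimate_tokens is a hypothetical name:

```bash
#!/usr/bin/env bash
# Estimate tokens from words at an ASSUMED 1.3 tokens/word, rounded to nearest.
# Integer form: words * 13 / 10, with +5 so that halves round up.
estimate_tokens() {
  local words=$1
  echo $(( (words * 13 + 5) / 10 ))
}

# estimate_tokens 506  ->  658 (matches the sample's estimated_tokens)
```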

Generate output:

smoosh --no-interactive --json --all .

Key flags for automation:

Flag	Purpose
`--no-interactive`	Skip TTY detection and prompts
`--json`	Structured JSON to stdout (status messages go to stderr)
`--quiet`	Output file paths only, one per line
`--dry-run`	Preview without writing files
`--no-color`	Disable colour escape codes

Exit codes 0–7 are differentiated for programmatic decision-making — see Configuration Reference below.

Configuration Reference

Flag	Default	Description
`--docs`	yes	Include markdown, RST, TXT, AsciiDoc, Org, TeX
`--code`	—	Include docs + all code file types
`--all`	—	Include everything tracked by git
`--only GLOB`	—	Restrict to matching extensions (overrides mode)
`--include GLOB`	—	Add extensions to the current mode
`--exclude GLOB`	—	Exclude matching paths (comma-separated)
`--include-hidden`	—	Include dotfiles and dot-directories
`--max-words N`	450000	Words per output chunk
`--format FORMAT`	`md`	Output format: `md`, `text`, or `xml`
`--toc`	—	Add a table of contents to each chunk
`--line-numbers`	—	Prefix each line with its line number
`--output-dir PATH`	`_smooshes`	Directory for output files
`--dry-run`	—	Preview only — no output files written
`--quiet`	—	Print output paths only (stdout)
`--json`	—	Structured JSON to stdout
`--no-interactive`	—	Skip interactive mode even in a TTY
`--no-color`	—	Disable colour output
`--no-check-secrets`	—	Skip the basic secrets scan
`--version`	—	Print version and exit 0
`--help`	—	Print full usage and exit 0

Colour control, in order of precedence (highest first): --no-color flag, NO_COLOR env var, FORCE_COLOR, CLICOLOR, then TTY auto-detection. See no-color.org.
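The precedence chain maps naturally onto a sequence of early returns. A minimal sketch; use_color is a hypothetical name, and how each variable's value is interpreted (non-empty NO_COLOR per no-color.org, non-empty FORCE_COLOR, CLICOLOR=0 as off) is an assumption, not smoosh's documented behaviour:

```bash
#!/usr/bin/env bash
# Decide whether to emit colour. Returns 0 for colour, 1 for none.
# Argument: 1 if --no-color was passed, else 0.
# Value handling for each variable is an ASSUMPTION (see lead-in).
use_color() {
  local no_color_flag=${1:-0}
  [ "$no_color_flag" = 1 ] && return 1     # 1. --no-color flag
  [ -n "${NO_COLOR:-}" ] && return 1       # 2. NO_COLOR (non-empty)
  [ -n "${FORCE_COLOR:-}" ] && return 0    # 3. FORCE_COLOR (non-empty)
  [ "${CLICOLOR:-}" = 0 ] && return 1      # 4. CLICOLOR=0 turns colour off
  [ -t 1 ]                                 # 5. is stdout a terminal?
}
```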

Exit codes:

Code	Meaning
0	Success
1	Invalid flags or arguments
2	Path not found or not a git repository
3	No matching files for current mode/filters
4	Verification failed — expected/actual file list mismatch
5	Remote clone failed (network, auth)
7	Write permission denied
130	Interrupted (Ctrl-C)
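Because each failure has its own code, a CI step can treat, say, an empty match (3) differently from a verification failure (4). A minimal sketch; explain_exit and its messages are hypothetical, and code 6, which this table does not list, falls through to the default branch:

```bash
#!/usr/bin/env bash
# Map the exit codes documented above to a short message for a CI log.
explain_exit() {
  case "$1" in
    0)   echo "ok" ;;
    1)   echo "fix the command line (invalid flags or arguments)" ;;
    2)   echo "check the path: not found or not a git repository" ;;
    3)   echo "no matching files: widen the mode or filters" ;;
    4)   echo "verification failed: treat output as untrustworthy" ;;
    5)   echo "remote clone failed: check network or credentials" ;;
    7)   echo "cannot write output: check directory permissions" ;;
    130) echo "interrupted" ;;
    *)   echo "unexpected exit code $1" ;;
  esac
}

# Usage in a pipeline step (the || form also works under set -e):
#   status=0
#   smoosh --no-interactive --json --all . > smoosh.json || status=$?
#   echo "smoosh: $(explain_exit "$status")"
#   [ "$status" -eq 0 ] || exit "$status"
```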

FAQ

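Does smoosh respect .gitignore? Yes. It uses git ls-files, which lists only tracked files, so untracked and ignored files are excluded by default. A file committed before it was added to .gitignore is still tracked and will be included.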

What about large repos? smoosh chunks output at --max-words (default 450,000 words). Large repos produce multiple files named project_part1.md, project_part2.md, and so on.
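The --dry-run JSON assigns each file a chunk number, which suggests whole files are packed into chunks in order. A minimal sketch of greedy packing under that assumption: files are never split, and a file larger than the limit gets a chunk of its own. assign_chunks is a hypothetical name, and smoosh's real rules (for example, whether file headers count toward the limit) may differ:

```bash
#!/usr/bin/env bash
# Hypothetical greedy packing: read "WORDS PATH" lines on stdin and print
# "CHUNK PATH". A new chunk starts when the next file would push the current
# chunk past MAX words; a file larger than MAX ends up alone in its own chunk.
assign_chunks() {
  local max=$1 chunk=1 used=0 words path
  while read -r words path; do
    if [ "$used" -gt 0 ] && [ $(( used + words )) -gt "$max" ]; then
      chunk=$(( chunk + 1 ))   # chunk N would become <project>_partN.md
      used=0
    fi
    used=$(( used + words ))
    printf '%s %s\n' "$chunk" "$path"
  done
}
```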

Is the secrets detection reliable? No — it catches common patterns (AWS access keys, GitHub PATs, PEM private key blocks) but is not a substitute for dedicated tools like gitleaks or truffleHog. smoosh says this clearly when it warns.
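To make the scope concrete, a minimal sketch of pattern-based detection for the three categories named above. The regexes are widely used public heuristics, not smoosh's actual patterns: AKIA plus 16 uppercase letters or digits for AWS access key IDs, ghp_ plus 36 alphanumerics for classic GitHub personal access tokens, and PEM BEGIN lines for private keys. Anything else, such as fine-grained github_pat_ tokens, passwords, or API keys for other services, slips through; scan_secrets is a hypothetical name:

```bash
#!/usr/bin/env bash
# Print matching lines (with line numbers) for a few well-known secret shapes.
# Exit status follows grep: 0 if anything matched, 1 if nothing did.
scan_secrets() {
  grep -nE \
    -e 'AKIA[0-9A-Z]{16}' \
    -e 'ghp_[A-Za-z0-9]{36}' \
    -e '-----BEGIN ([A-Z]+ )*PRIVATE KEY-----' \
    -- "$@"
}
```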

Can I use smoosh with other AI tools? Yes — Gemini, Copilot, local models, custom pipelines. The output is plain Markdown, compatible with anything that accepts text files. Use --format text or --format xml if your tool prefers a different format.

Does it work on Windows? smoosh is tested on macOS and Linux. On Windows, use Git Bash or WSL.

The _smooshes/ directory appeared in my git status — is that normal? smoosh adds _smooshes/ to your .gitignore automatically on first run. If it still appears, check that your .gitignore syntax is correct.
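Adding an entry to .gitignore safely means adding it only once and not gluing it onto a final line that lacks a trailing newline. A minimal sketch of that idempotent append; ensure_ignored is a hypothetical name, and this illustrates the behaviour described above rather than reproducing smoosh's code:

```bash
#!/usr/bin/env bash
# Append ENTRY to FILE (default .gitignore) unless an identical line exists.
ensure_ignored() {
  local entry=$1 file=${2:-.gitignore}
  # -x: whole-line match, -F: fixed string (no regex), -q: quiet
  if [ -f "$file" ] && grep -qxF -- "$entry" "$file"; then
    return 0
  fi
  # If the file is non-empty and its last byte isn't a newline, add one
  # so the entry lands on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$entry" >> "$file"
}
```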

Why is my word count different from what I expected? smoosh counts words using wc -w, which splits on whitespace. Code with little whitespace, such as minified JS or compact JSON, counts as far fewer words than its length suggests, so totals for code-heavy repos can look surprisingly low.
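A quick way to see the effect: wc -w counts runs of non-whitespace characters, so compact JSON collapses to a single word while a short sentence counts as expected. count_words is a hypothetical wrapper:

```bash
#!/usr/bin/env bash
# Count words as wc -w does: runs of non-whitespace characters.
# $(( )) strips the padding some wc implementations add.
count_words() {
  echo $(( $(printf '%s' "$1" | wc -w) ))
}

count_words 'Turn any git repo into AI-ready context'             # prints 7
count_words '{"name":"smoosh","version":"1.0.1","license":"MIT"}'  # prints 1
```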

Is it overengineered for a shell script? Absolutely. 228 tests, 100% file inclusion verification, CDATA escaping for XML output, and a box-drawing letter logo. But your codebase deserves to be smooshed properly.

Contributing

See CONTRIBUTING.md for development setup, code style, and the PR process.

License

MIT — see LICENSE.
