AI-powered code scanner for immediate, background review of uncommitted changes as you work. Code Scanner continuously monitors your working directory and provides instant feedback on code issues before you commit—helping you catch bugs, style problems, and architectural issues early in your local development workflow. It uses local LLMs served via LM Studio or Ollama to identify issues based on configurable checks. Your code never leaves your machine.
⭐ Star this project on GitHub to support its development! Code-Scanner on GitHub
Code Scanner is like having a senior code reviewer watching over your shoulder 24/7—without sending your code to the cloud.
- Privacy First: All analysis happens on your machine. Your proprietary code stays yours.
- Zero Cost: No API subscriptions, no token limits. Use your own hardware.
- Language Agnostic: Works with any programming language—Python, JavaScript, C++, Java, Rust, and more.
- Continuous Monitoring: Runs in the background, scanning every change you make in real-time.
- Smart Context: AI tools let the LLM explore your codebase to find architectural issues that simple linters miss.
Get Code Scanner running in 5 minutes:
Ubuntu/Debian:

```shell
sudo apt install git universal-ctags ripgrep
```

macOS:

```shell
brew install git universal-ctags ripgrep
```

Windows:

For detailed Windows setup, including multiple installation options, see the Windows Setup Guide.

Quick option using Chocolatey (if installed):

```shell
choco install git universal-ctags ripgrep
```

Or using Scoop (if installed):

```shell
scoop install git universal-ctags ripgrep
```

Install Code Scanner:

```shell
pip install code-scanner
```

Or using uv (recommended):

```shell
uv pip install code-scanner
```

Verify the installation:

```shell
code-scanner --version
```

LM Studio (GUI-based, easier):

- Download from lmstudio.ai
- Load the model "qwen2.5-coder-7b-instruct"
- Start the server (default: localhost:1234)
Copy a sample config to your project (choose one based on your programming language):

- Python: sample_configs/python-config.toml
- JavaScript: sample_configs/javascript-config.toml
- C++: sample_configs/cpp-config.toml or sample_configs/cpp-qt-config.toml
- Java: sample_configs/java-config.toml
- Android: sample_configs/android-config.toml
- iOS/macOS: sample_configs/ios-macos-config.toml

Then start the scanner:

```shell
code-scanner /path/to/your/project -c code_scanner_config.toml
```

The scanner will:
- Monitor your Git repository for changes
- Scan modified files every 30 seconds
- Report issues to code_scanner_results.md
Open code_scanner_results.md in your project directory to see:
- Files with issues
- Line numbers
- Issue descriptions
- Suggested fixes
- Resolution status (OPEN/RESOLVED)
Pro tip: Keep the results file open in your IDE. It updates in real-time as issues are found!
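The exact layout of the results file is not documented here, so the following is only a hypothetical sketch of what an entry might look like (file names, wording, and formatting are illustrative, not the tool's actual output):

```markdown
## process_data.py

- ❌ OPEN - Line 12: Missing error handling around file read.
  Suggested: wrap the call in try/except and log the failure.
- ✅ RESOLVED - Line 30: Unused import.
```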
- 🏠 100% Local (Privacy first): Uses LM Studio or Ollama with local APIs. All processing happens on your machine, no cloud required.
- 🖥️ Hardware Efficient: Designed for small local models. Runs comfortably on consumer GPUs like NVIDIA RTX 3060.
- 💰 Cost Effective: Zero token costs. Use your local resources instead of expensive API subscriptions.
- 🔍 Language-agnostic: Works with any programming language.
- 🧰 AI Tools for Context Expansion: LLM can interactively request additional codebase information (find usages, read files, list directories) for sophisticated architectural checks.
- 🛡️ Hallucination Prevention: Validates file paths from LLM responses with helpful suggestions for similar files when paths don't exist.
- ⚡ Continuous Monitoring: Runs in background mode, monitoring Git changes every 30 seconds and scanning indefinitely until stopped (Ctrl+C).
- 🔄 Smart Change Detection: Efficient git status caching with configurable TTL prevents redundant git operations. When changes are detected mid-scan, the scanner continues from the current check with refreshed file contents, preserving progress.
- 📊 Issue Tracking: Tracks issue lifecycle (new, existing, resolved) with scoped resolution—issues are only resolved for files that were actually scanned.
- 📝 Real-time Updates: Output file updates immediately when issues are found (not just at end of scan).
- 🔧 Configurable Checks: Define checks in plain English via TOML configuration with file pattern support.
- 📖 Daemon-Ready: Fully non-interactive mode with no prompts; configurable via file only. Supports autostart on all platforms.
- ✅ Well-Tested: 89% code coverage with 1064 unit tests ensuring reliability and maintainability.
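The "hallucination prevention" behavior described above—suggesting similar real files when an LLM-reported path doesn't exist—can be approximated with Python's standard difflib. This is an illustrative sketch of the idea, not the scanner's actual implementation:

```python
import difflib

def suggest_similar_paths(reported_path, known_paths, cutoff=0.6):
    """Return up to three real project paths resembling a path the LLM reported.

    Sketch only: if a model invents "src/scaner.py", close real matches such
    as "src/scanner.py" can be offered instead of a confusing dead end.
    """
    return difflib.get_close_matches(reported_path, known_paths, n=3, cutoff=cutoff)

project_files = ["src/scanner.py", "src/cli.py", "src/issue_tracker.py"]
print(suggest_similar_paths("src/scaner.py", project_files))
```

The best match is returned first, so a report can say "did you mean src/scanner.py?" instead of accepting a path that does not exist.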
Code Scanner helps developers and teams in various scenarios:
Find issues before pushing to remote:
- Memory leaks and null pointer dereferences
- Race conditions and thread safety issues
- Missing error handling
- Logic errors and edge cases
Example:
```python
# Before Code Scanner
def process_data(data):
    result = []
    for item in data:
        result.append(item * 2)
    return result

# Code Scanner reports:
# ❌ process_data.py:3 - Missing type hints
#    Suggested: def process_data(data: list[int]) -> list[int]:
#
# ❌ process_data.py:4 - Inefficient list append in loop
#    Suggested: Use list comprehension: return [item * 2 for item in data]
```

Ensure team-wide consistency:
- C++: RAII usage, proper const correctness, smart pointers
- Python: Type hints, docstrings, PEP 8 compliance
- JavaScript: ESLint rules, async/await patterns
- Java: Proper exception handling, resource management
Detect high-level design issues:
- MVC pattern violations
- Circular dependencies
- Layering violations (UI accessing database directly)
- Duplicate code across modules
Identify potential vulnerabilities:
- SQL injection risks
- XSS vulnerabilities in web code
- Hardcoded credentials
- Insecure random number generation
Find improvement opportunities:
- Dead code and unused functions
- Performance bottlenecks
- Code complexity issues
- Missing documentation
| Feature | Code Scanner | Traditional Linters | Cloud AI Tools |
|---|---|---|---|
| Privacy | ✅ 100% local | ✅ 100% local | ❌ Code sent to cloud |
| Cost | ✅ Free (your hardware) | ✅ Free | ❌ API subscriptions |
| Context Awareness | ✅ AI tools explore codebase | ❌ File-by-file only | ✅ Full codebase |
| Custom Checks | ✅ Plain English prompts | ❌ Complex rules | ✅ Plain English |
| Real-time | ✅ Continuous monitoring | ❌ Manual runs | ❌ Manual runs |
| Language Support | ✅ Any language | ❌ Language-specific | ✅ Any language |
| Architectural Analysis | ✅ Cross-file analysis | ❌ Single file | ✅ Cross-file |
- Python 3.10 or higher
- Git (for tracking file changes)
- Universal Ctags (for symbol indexing)
- ripgrep (for fast code search)
Using pip:

```shell
pip install code-scanner
```

uv is faster and more reliable for Python package management:

```shell
pip install uv
uv pip install code-scanner
```

From source:

```shell
git clone https://github.com/ubego/Code-Scanner.git
cd Code-Scanner
uv pip install -e .
```

Or using uv sync (recommended for development):

```shell
git clone https://github.com/ubego/Code-Scanner.git
cd Code-Scanner
uv sync
uv run code-scanner --help
```

Verify the installation:

```shell
code-scanner --version
```

Expected output:

```
code-scanner X.Y.Z
```
For detailed setup instructions including autostart configuration:
- Linux Setup - systemd service, desktop integration
- macOS Setup - LaunchAgent, Homebrew setup
- Windows Setup - Task Scheduler, Chocolatey/Scoop installation
Code Scanner uses TOML configuration files to define:
- LLM backend settings
- File patterns to scan
- Custom checks in plain English
```toml
[llm]
backend = "lm-studio"
host = "localhost"
port = 1234
timeout = 120           # Request timeout in seconds
context_limit = 32768   # Model's context window (tokens)

[[checks]]
pattern = "*.py"
checks = [
    "Check for bugs and issues",
    "Check for security vulnerabilities",
    "Check for type hints and docstrings"
]
```

A C++ example:

```toml
[llm]
backend = "lm-studio"
host = "localhost"
port = 1234
# model = "specific-model-name"   # Optional - uses first loaded model
context_limit = 32768             # Model's context window (tokens)

[[checks]]
pattern = "*.cpp, *.h"
checks = [
    "Check for memory leaks",
    "Check that RAII is used properly",
    "Check for null pointer dereferences"
]
```

The [llm] section supports these parameters:

| Parameter | Required | Description | Example |
|---|---|---|---|
| backend | ✅ Yes | LLM backend: "ollama" or "lm-studio" | "ollama" |
| host | ✅ Yes | Backend server host | "localhost" |
| port | ✅ Yes | Backend server port | 11434 |
| model | Ollama only | Model name (Ollama) | "qwen3:4b" |
| timeout | No | Request timeout in seconds | 120 |
| context_limit | ✅ Yes | Model's context window size in tokens | 16384 |
Recommended context_limit values:
- 4096 - Small models (e.g., CodeLlama 7B)
- 8192 - Medium models
- 16384 - Recommended minimum for most use cases
- 32768 - Large models (e.g., DeepSeek-Coder V2)
- 131072 - Very large models
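Context limits are counted in tokens, not characters. A common rough heuristic—an assumption, since the actual ratio varies by model and tokenizer—is about 4 characters per token, which you can use to sanity-check whether a file fits a given context_limit:

```python
def rough_token_estimate(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; real tokenizers vary by model and language."""
    return int(len(text) / chars_per_token)

source = "def add(a, b):\n    return a + b\n" * 100
tokens = rough_token_estimate(source)
print(tokens, tokens < 16384)  # a file this size fits a 16384-token window
```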
Each [[checks]] section defines a group of checks for files matching a pattern.
| Parameter | Required | Description | Example |
|---|---|---|---|
| pattern | ✅ Yes | Glob pattern for files | "*.py" or "*.cpp, *.h" |
| checks | ✅ Yes | List of check prompts | ["Check for bugs"] |

Pattern Examples:

- "*" - All files
- "*.py" - Python files only
- "*.cpp, *.h" - C++ files (headers and sources)
- "*.js, *.ts, *.jsx, *.tsx" - JavaScript/TypeScript files
- "src/**/*.py" - Python files in the src directory
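A comma-separated pattern field like those above can be sketched with Python's fnmatch. This is only an illustration of the idea—the scanner's real matcher may differ, and this sketch does not handle recursive ** globs:

```python
import fnmatch

def matches(pattern_field: str, filename: str) -> bool:
    """Check a filename against a comma-separated glob field like "*.cpp, *.h".

    Illustrative sketch only; path-aware patterns such as "src/**/*.py"
    need a real glob implementation.
    """
    patterns = [p.strip() for p in pattern_field.split(",")]
    return any(fnmatch.fnmatch(filename, p) for p in patterns)

print(matches("*.cpp, *.h", "widget.h"))   # True
print(matches("*.py", "app.js"))           # False
```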
Exclude files from scanning using empty checks lists:
```toml
[[checks]]
pattern = "*.md, *.txt, *.json"
checks = []   # Empty list = ignore these files

[[checks]]
pattern = "/*tests*/, /*build*/, /*vendor*/"
checks = []   # Ignore entire directories
```

Ignore pattern syntax:

- File patterns: "*.md, *.txt" - matches by extension
- Directory patterns: "/*tests*/" - matches files in any directory named "tests"
- Wildcards: "/*cmake-build-*/" - matches cmake-build-debug, cmake-build-release, etc.
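One way to read a directory pattern such as "/*tests*/" is "any path component matching *tests*". The following sketch assumes that interpretation; the scanner's actual matching rules may differ:

```python
import fnmatch

def in_ignored_dir(path: str, dir_pattern: str) -> bool:
    """Rough sketch: match a directory pattern like "/*tests*/" against
    any directory component of a file path (assumed interpretation)."""
    component = dir_pattern.strip("/")        # "/*tests*/" -> "*tests*"
    dir_parts = path.split("/")[:-1]          # directory components only
    return any(fnmatch.fnmatch(part, component) for part in dir_parts)

print(in_ignored_dir("src/tests/test_cli.py", "/*tests*/"))   # True
print(in_ignored_dir("src/main.py", "/*tests*/"))             # False
```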
Ready-to-use configs for common languages:
- sample_configs/python-config.toml
- sample_configs/javascript-config.toml
- sample_configs/cpp-config.toml
- sample_configs/cpp-qt-config.toml
- sample_configs/java-config.toml
- sample_configs/android-config.toml
- sample_configs/ios-macos-config.toml
Code Scanner validates configs strictly:
- Only [llm] and [[checks]] sections are allowed
- Unknown parameters cause immediate errors
- Missing required parameters cause immediate errors
Error messages show:
- Unsupported parameters
- Supported alternatives
- Line numbers for easy fixing
Scan a single project:
```shell
code-scanner /path/to/project
```

Uses the default config, code_scanner_config.toml, in the project directory.

Specify a config explicitly:

```shell
code-scanner /path/to/project -c /path/to/config.toml
```

Monitor multiple projects simultaneously:

```shell
code-scanner /path/to/project1 -c /path/to/config1 /path/to/project2 -c /path/to/config2
```

How it works:
- Each project has its own config and output file
- Scanner automatically switches to the project with the most recent changes
- State is preserved for all projects in memory
- Seamless switching without restarting
Scan changes relative to a specific commit:
```shell
code-scanner /path/to/project --commit abc123 -c /path/to/config.toml
```

Useful for:
- Scanning cumulative changes against a parent branch
- Reviewing pull requests before merging
- Analyzing feature branches
Each project generates:
- code_scanner_results.md - Issues and findings (in project directory)
- code_scanner_results.md.bak - Backup of previous results (in project directory)
System-wide files:
- ~/.code-scanner/code_scanner.log - Detailed logs (platform-specific path)
- ~/.code-scanner/code_scanner.lock - Lock file to prevent multiple instances
Run in the background (Linux/macOS):

```shell
nohup code-scanner /path/to/project > /dev/null 2>&1 &
```

Windows (PowerShell):

```shell
Start-Process -WindowStyle Hidden code-scanner -ArgumentList "/path/to/project"
```

Run Code Scanner automatically when your system starts:
Linux:

```shell
./scripts/autostart-linux.sh install "/path/to/project -c /path/to/config.toml"
```

macOS:

```shell
./scripts/autostart-macos.sh install "/path/to/project -c /path/to/config.toml"
```

Windows:

```shell
scripts\autostart-windows.bat install "/path/to/project -c /path/to/config.toml"
```

See the platform-specific setup docs for details.
Press Ctrl+C to stop. The scanner:
- Completes the current check
- Cleans up lock files
- Exits gracefully
Code Scanner provides powerful AI tools that let the LLM explore your codebase during analysis. This enables sophisticated checks that go beyond simple pattern matching.
| Tool | Description | Use Case |
|---|---|---|
| search_text | Fast text search using ripgrep | Find function usages, locate patterns |
| read_file | Read file contents | Get context from related files |
| list_directory | List directory contents | Understand project structure |
| get_file_diff | Get git diff for a file | See what changed |
| get_file_summary | Get file statistics | Understand a file before reading it |
| symbol_exists | Check if a symbol exists | Verify a function/class exists |
| find_definition | Find symbol definition | Locate where a symbol is defined |
| find_symbols | Find symbols by pattern | Search for related functions |
| get_enclosing_scope | Get parent scope | Understand code context |
| find_usages | Find all usages of a symbol | Track symbol usage across the codebase |
```toml
[[checks]]
pattern = "*"
checks = [
    "Check for MVC pattern violations: UI code should not directly access database. Use search_text to find database queries in UI files, and read_file to verify context."
]
```

How the LLM uses tools:

- Scans UI files for database-related keywords (search_text)
- Reads suspicious files to verify context (read_file)
- Reports violations with specific file locations and line numbers
```toml
[[checks]]
pattern = "*.py"
checks = [
    "Find duplicate or similar function implementations. Use search_text to find function definitions with similar names, then read_file to compare implementations."
]
```

```toml
[[checks]]
pattern = "*"
checks = [
    "Check for inconsistent naming patterns across the codebase. Use list_directory to explore structure, then read_file to verify naming conventions."
]
```

- Ctags: Used for symbol indexing (functions, classes, variables)
- Ripgrep: Used for fast text search
- Git: Used for diff operations
- All tools are local and respect .gitignore
- Symbol index is generated asynchronously at startup
- Index is regenerated when switching projects
- Tool calls are tracked to prevent context overflow
- Inactive projects don't maintain indexes (saves memory)
Monitor multiple projects with a single scanner instance:
```shell
code-scanner /path/to/project1 -c /path/to/config1 /path/to/project2 -c /path/to/config2
```

Key Features:

- Automatic Switching: Scanner switches to the project with the most recent changes
- Non-blocking: Current check completes before switching
- State Preservation: Each project maintains its own issue tracker state
- Smart LLM Management: Client reused if configs are identical
- Separate Outputs: Each project has its own code_scanner_results.md
Project Switching Behavior:
- Scanner detects which project has the most recent changes (based on file modification times)
- Waits for current check to complete
- Switches to active project
- If LLM configs differ, disconnects and reconnects
- Regenerates ctags index for new active project
- Continues scanning
Scan changes relative to a specific commit:
```shell
code-scanner /path/to/project --commit abc123 -c /path/to/config.toml
```

Use Cases:
- Review cumulative changes in a feature branch
- Analyze pull request before merging
- Compare against stable branch
Behavior:
- Scans all uncommitted changes
- Compares against specified commit
- Includes untracked files
- Continues monitoring for new changes after initial scan
Code Scanner tracks issue state within a session:
States:
- NEW - Issue detected for the first time
- EXISTING - Issue detected in a previous scan, still present
- RESOLVED - Issue was detected before, no longer present
Smart Matching:
- Issues are matched by file and issue nature (not just line number)
- Fuzzy string comparison with configurable threshold (default: 0.8)
- Line numbers update if code shifts
- Prevents duplicate issues for the same problem
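The fuzzy comparison described above can be illustrated with Python's difflib, using the stated 0.8 default threshold. This is a sketch of the idea only—the scanner's actual similarity metric is not specified beyond "fuzzy string comparison":

```python
from difflib import SequenceMatcher

def same_issue(old_desc: str, new_desc: str, threshold: float = 0.8) -> bool:
    """Treat two issue descriptions as the same issue if they are
    sufficiently similar (illustrative metric, not the scanner's own)."""
    return SequenceMatcher(None, old_desc, new_desc).ratio() >= threshold

a = "Missing null check in login()"
b = "Missing null check in login handler"
print(same_issue(a, b))  # slightly reworded descriptions still match
```

This is why an issue reported with slightly different wording on a later scan does not become a duplicate entry.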
Scoped Resolution:
- Issues only resolved for files that were actually scanned
- Prevents false resolution from LLM non-determinism
- Resolved issues remain in log for historical tracking
When code exceeds model's context window:
- Group by directory hierarchy - Batch files from same directory
- Deterministic batching - Sort alphabetically, deepest-first
- File-by-file fallback - If directory group still too large
- Skip oversized files - Log warning and continue
- Merge results - Combine issues from all batches
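The grouping steps above can be sketched in a few lines. This assumes one plausible reading of "sort alphabetically, deepest-first"—deeper directories batched before shallower ones, alphabetical within a batch—which may not match the scanner's exact ordering:

```python
from collections import defaultdict

def batch_by_directory(paths):
    """Group file paths into per-directory batches, deepest directory first
    (assumed interpretation of the deterministic batching described above)."""
    groups = defaultdict(list)
    for p in sorted(paths):                       # alphabetical within a group
        groups["/".join(p.split("/")[:-1])].append(p)
    # Deepest directories first; alphabetical among equal depths.
    return [groups[d] for d in sorted(groups, key=lambda d: (-d.count("/"), d))]

files = ["src/ui/view.py", "src/ui/form.py", "src/db.py", "README.md"]
print(batch_by_directory(files))
```

Because both sorts are deterministic, the same set of changed files always produces the same batches, which keeps scan results reproducible.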
Token Tracking:
- Uses 55% of context limit for file content
- Tracks accumulated tokens during tool calls
- Stops tool calling at 85% context usage
- Prevents overflow during multi-turn conversations
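Applying those percentages to a concrete context limit makes the budgets tangible:

```python
def token_budgets(context_limit: int):
    """Apply the percentages described above to a model's context limit."""
    file_budget = int(context_limit * 0.55)   # reserved for file content
    tool_cutoff = int(context_limit * 0.85)   # stop tool calls past this point
    return file_budget, tool_cutoff

print(token_budgets(32768))  # (18022, 27852)
```

So with a 32768-token model, roughly 18k tokens go to file content, and tool calling stops once accumulated usage passes roughly 27.8k tokens.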
Change Detection:
- Monitors staged, unstaged, and untracked files
- Respects .gitignore patterns
- Efficient caching with configurable TTL (default: 5 seconds) keeps CPU usage low while idle
- Detects file modifications via content hashes
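Hash-based change detection boils down to comparing a digest of the file's contents before and after. A minimal sketch (the scanner's actual hash algorithm is not specified here; SHA-256 is an assumption):

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Digest of file contents; equal digests mean the file is unchanged."""
    return hashlib.sha256(data).hexdigest()

before = content_hash(b"def f():\n    return 1\n")
after = content_hash(b"def f():\n    return 2\n")
print(before != after)  # content changed, so the file would be rescanned
```

This avoids rescanning files whose modification time changed (e.g. after a checkout) but whose contents did not.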
Conflict Handling:
- Waits for merge/rebase conflicts to resolve
- Polls for completion status
- Resumes scanning automatically
Binary Files:
- Silently skipped during scanning
- Tracked to prevent infinite rescan loops
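A common heuristic for deciding that a file is binary—used by git itself, and an assumption here rather than the scanner's documented method—is to look for a NUL byte in the first chunk of the file:

```python
def looks_binary(chunk: bytes) -> bool:
    """Heuristic: treat a file as binary if its leading bytes contain NUL."""
    return b"\x00" in chunk

print(looks_binary(b"print('hello')\n"))        # False: plain text
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00"))   # True: PNG-style header
```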
Code Scanner works with local LLM servers that provide OpenAI-compatible APIs.
Best for: GUI users, trying different models, visual feedback
Installation:
- Download from lmstudio.ai
- Install and launch
- Load the model "qwen2.5-coder-7b-instruct"
- Start the server (default: localhost:1234)
Recommended Models:
- DeepSeek-Coder V2 (excellent for code analysis)
- Qwen2.5-Coder (fast, good balance)
- CodeLlama 7B/13B (lighter option)
Configuration:
```toml
[llm]
backend = "lm-studio"
host = "localhost"
port = 1234
context_limit = 32768
```

Best for: CLI users, automation, simpler setup, headless servers
Installation:
```shell
curl -fsSL https://ollama.ai/install.sh | sh
```

Pull a model:

```shell
ollama pull qwen3:4b
ollama run qwen3:4b
```

Recommended Models:
- qwen3:4b - Fast, good for code (4B parameters)
- qwen3:7b - Better accuracy (7B parameters)
- deepseek-coder:6.7b - Excellent for code analysis
- codellama:7b - Lightweight option
Configuration:
```toml
[llm]
backend = "ollama"
host = "localhost"
port = 11434
model = "qwen3:4b"
context_limit = 32768
```

Minimum:
- CPU: Any modern multi-core processor
- RAM: 8GB (16GB recommended)
- GPU: Not required (CPU inference works, but slower)
Recommended:
- GPU: NVIDIA RTX 3060 (12GB VRAM) or better
- RAM: 16GB+
- Storage: SSD for faster file access
Performance Tips:
- Use smaller models (4B-7B) for faster scanning
- Increase context_limit for larger codebases
- Use GPU acceleration if available
- Reduce check complexity for faster results
Problem: Scanner can't connect to LM Studio or Ollama
Solutions:
- Ensure LM Studio or Ollama is running:
  - LM Studio: Check that the server is started (green indicator)
  - Ollama: Run ollama list to verify it's running
- Check the host/port in your config:

  ```shell
  # Test the connection
  curl http://localhost:1234/v1/models   # LM Studio
  curl http://localhost:11434/api/tags   # Ollama
  ```

- Verify that a model is loaded:
  - LM Studio: The model must be loaded in the server
  - Ollama: Run ollama list to see available models
Problem: Target directory isn't a Git repository
Solution: Initialize git in your project:
```shell
cd /path/to/project
git init
git add .
git commit -m "Initial commit"
```

Problem: Scanner is waiting for changes
Solutions:
- Make a change to a tracked file
- Add a new file: touch newfile.py
- Stage files: git add .
- Check if the files are in .gitignore
Problem: Code is too large for model's context window
Solutions:
- Increase context_limit in the config (if the model supports it)
- Use ignore patterns to exclude large files/directories
- Use a model with larger context window
- Split checks into smaller, more specific queries
Problem: Scanner takes too long to complete
Solutions:
- Use ignore patterns to exclude test/build directories:

  ```toml
  [[checks]]
  pattern = "/*tests*/, /*build*/, /*node_modules*/"
  checks = []
  ```

- Reduce context_limit if the model is slow
- Use a smaller/faster model (e.g., 4B instead of 7B)
- Reduce the number of checks
- Use file patterns to scan only relevant files
Problem: LLM returns invalid JSON
Solutions:
- Scanner automatically retries (up to 3 times)
- Try a different model
- Reduce context_limit to prevent overflow
- Simplify check prompts
Problem: "Another instance is already running"
Solutions:
- Check if another scanner is running: ps aux | grep code-scanner
- If not, remove the stale lock file: rm ~/.code-scanner/code_scanner.lock
- Verify that the PID in the lock file is not running
- Documentation: Check platform-specific setup docs
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
Enable verbose logging for troubleshooting:
```shell
code-scanner /path/to/project --verbose
```

Log file location (platform-specific):

- Linux: ~/.code-scanner/code_scanner.log
- macOS: ~/Library/Application Support/code-scanner/code_scanner.log
- Windows: %APPDATA%\code-scanner\code_scanner.log
```
src/code_scanner/
├── models.py            # Data models (LLMConfig, Issue, etc.)
├── config.py            # Configuration loading and validation
├── base_client.py       # Abstract base class for LLM clients
├── lmstudio_client.py   # LM Studio client implementation
├── ollama_client.py     # Ollama client implementation
├── ai_tools.py          # AI tool executor for context expansion
├── text_utils.py        # Text processing utilities
├── git_watcher.py       # Git repository monitoring
├── issue_tracker.py     # Issue lifecycle management
├── output.py            # Markdown report generation
├── scanner.py           # AI scanning logic
├── cli.py               # CLI and application coordinator
├── utils.py             # Utility functions
└── __main__.py          # Entry point
```
```shell
# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# Specific test file
uv run pytest tests/test_scanner.py -v

# Coverage report
uv run pytest --cov=code_scanner --cov-report=term-missing

# HTML coverage report
uv run pytest --cov=code_scanner --cov-report=html
# Open htmlcov/index.html in browser
```

92% code coverage with 905 unit tests
We welcome contributions! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
GNU Affero General Public License v3.0
See LICENSE for details.
For detailed platform-specific setup instructions:
- Linux Setup - systemd service, desktop integration
- macOS Setup - LaunchAgent, Homebrew setup
- Windows Setup - Task Scheduler, Chocolatey/Scoop installation
Made with ❤️ by the Ubego team
