Code Scanner

AI-powered code scanner for immediate, background review of uncommitted changes as you work. Code Scanner continuously monitors your working directory and provides instant feedback on code issues before you commit—helping you catch bugs, style problems, and architectural issues early in your local development workflow. Uses local LLMs (LM Studio or Ollama) to identify issues based on configurable checks. Your code never leaves your machine.

⭐ Star this project on GitHub to support its development! Code-Scanner on GitHub

Why Code Scanner?

Code Scanner is like having a senior code reviewer watching over your shoulder 24/7—without sending your code to the cloud.

Privacy First: All analysis happens on your machine. Your proprietary code stays yours.
Zero Cost: No API subscriptions, no token limits. Use your own hardware.
Language Agnostic: Works with any programming language—Python, JavaScript, C++, Java, Rust, and more.
Continuous Monitoring: Runs in the background, scanning every change you make in real-time.
Smart Context: AI tools let the LLM explore your codebase to find architectural issues that simple linters miss.

Quick Start

Get Code Scanner running in 5 minutes:

Step 1: Install Prerequisites

Ubuntu/Debian:

sudo apt install git universal-ctags ripgrep

macOS:

brew install git universal-ctags ripgrep

Windows:

For detailed Windows setup including multiple installation options, see Windows Setup Guide.

Quick option using Chocolatey (if installed):

choco install git universal-ctags ripgrep

Or using Scoop (if installed):

scoop install git universal-ctags ripgrep

Step 2: Install Code Scanner

pip install code-scanner

Or using uv (recommended):

uv pip install code-scanner

Verify installation:

code-scanner --version

Step 3: Start a Local LLM

LM Studio (GUI-based, easier)

Download from lmstudio.ai
Load the model "qwen2.5-coder-7b-instruct"
Start the server (default: localhost:1234)

Prefer the command line or a headless machine? Ollama works as well; see Supported LLM Backends.

Step 4: Configuration

Copy the sample config for your language into your project root and name it code_scanner_config.toml (the file name the scanner looks for by default):

Python: sample_configs/python-config.toml
JavaScript: sample_configs/javascript-config.toml
C++: sample_configs/cpp-config.toml or sample_configs/cpp-qt-config.toml
Java: sample_configs/java-config.toml
Android: sample_configs/android-config.toml
iOS/macOS: sample_configs/ios-macos-config.toml

Step 5: Run the scanner

code-scanner /path/to/your/project -c code_scanner_config.toml

The scanner will:

Monitor your Git repository for changes
Scan modified files every 30 seconds
Report issues to code_scanner_results.md

Step 6: View Results

Open code_scanner_results.md in your project directory to see:

Files with issues
Line numbers
Issue descriptions
Suggested fixes
Resolution status (OPEN/RESOLVED)

Pro tip: Keep the results file open in your IDE. It updates in real-time as issues are found!

Features

Core Capabilities

🏠 100% Local (Privacy first): Uses LM Studio or Ollama with local APIs. All processing happens on your machine, no cloud required.
🖥️ Hardware Efficient: Designed for small local models. Runs comfortably on consumer GPUs like NVIDIA RTX 3060.
💰 Cost Effective: Zero token costs. Use your local resources instead of expensive API subscriptions.
🔍 Language-agnostic: Works with any programming language.

AI-Powered Analysis

🧰 AI Tools for Context Expansion: The LLM can interactively request additional codebase information (find usages, read files, list directories) for sophisticated architectural checks.
🛡️ Hallucination Prevention: Validates file paths from LLM responses with helpful suggestions for similar files when paths don't exist.
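Path validation with suggestions can be pictured with a minimal sketch. It assumes the scanner already has the repository's file list (for example from `git ls-files`); `validate_reported_path` is a hypothetical name, and `difflib.get_close_matches` is a standard-library stand-in for whatever similarity search Code Scanner actually uses.

```python
from difflib import get_close_matches
from pathlib import PurePosixPath


def validate_reported_path(reported: str, repo_files: list[str],
                           max_suggestions: int = 3) -> tuple[bool, list[str]]:
    """Return (exists, suggestions) for a path an LLM attached to an issue.

    Hypothetical helper: `repo_files` is assumed to hold repository-relative
    POSIX paths, e.g. the output of `git ls-files`.
    """
    # Normalise forms like "./src/app.py" to "src/app.py" before comparing.
    normalised = str(PurePosixPath(reported.strip()))
    if normalised in repo_files:
        return True, []

    # First try fuzzy matches on the full path.
    suggestions = get_close_matches(normalised, repo_files, n=max_suggestions, cutoff=0.6)
    if not suggestions:
        # Fall back to files whose base name resembles the reported one.
        by_name = {PurePosixPath(f).name: f for f in repo_files}
        close = get_close_matches(PurePosixPath(normalised).name, list(by_name),
                                  n=max_suggestions, cutoff=0.6)
        suggestions = [by_name[name] for name in close]
    return False, suggestions
```

A misspelled path such as `src/code_scanner/scaner.py` is rejected, and the closest real file is offered as the first suggestion.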

Continuous Monitoring

⚡ Continuous Monitoring: Runs in background mode, monitoring Git changes every 30 seconds and scanning indefinitely until stopped (Ctrl+C).
🔄 Smart Change Detection: Efficient git status caching with configurable TTL prevents redundant git operations. When changes are detected mid-scan, the scanner continues from the current check with refreshed file contents, so progress is preserved.
📊 Issue Tracking: Tracks issue lifecycle (new, existing, resolved) with scoped resolution—issues are only resolved for files that were actually scanned.
📝 Real-time Updates: Output file updates immediately when issues are found (not just at end of scan).

Configuration & Deployment

🔧 Configurable Checks: Define checks in plain English via TOML configuration with file pattern support.
📖 Daemon-Ready: Fully uninteractive mode—no prompts, configurable via file only. Supports autostart on all platforms.
✅ Well-Tested: 89% code coverage with 1064 unit tests ensuring reliability and maintainability.

Use Cases

Code Scanner helps developers and teams in various scenarios:

Before Commit: Catch Bugs Early

Find issues before pushing to remote:

Memory leaks and null pointer dereferences
Race conditions and thread safety issues
Missing error handling
Logic errors and edge cases

Example:

# Before Code Scanner
def process_data(data):
    result = []
    for item in data:
        result.append(item * 2)
    return result

# Code Scanner reports:
# ❌ process_data.py:1 - Missing type hints
#    Suggested: def process_data(data: list[int]) -> list[int]:
#
# ❌ process_data.py:4 - Inefficient list append in loop
#    Suggested: Use list comprehension: return [item * 2 for item in data]

Code Standards Enforcement

Ensure team-wide consistency:

C++: RAII usage, proper const correctness, smart pointers
Python: Type hints, docstrings, PEP 8 compliance
JavaScript: ESLint rules, async/await patterns
Java: Proper exception handling, resource management

Architectural Reviews

Detect high-level design issues:

MVC pattern violations
Circular dependencies
Layering violations (UI accessing database directly)
Duplicate code across modules

Security Scanning

Identify potential vulnerabilities:

SQL injection risks
XSS vulnerabilities in web code
Hardcoded credentials
Insecure random number generation

Code Quality

Find improvement opportunities:

Dead code and unused functions
Performance bottlenecks
Code complexity issues
Missing documentation

Code Scanner vs Other Tools

Feature	Code Scanner	Traditional Linters	Cloud AI Tools
Privacy	✅ 100% local	✅ 100% local	❌ Code sent to cloud
Cost	✅ Free (your hardware)	✅ Free	❌ API subscriptions
Context Awareness	✅ AI tools explore codebase	❌ File-by-file only	✅ Full codebase
Custom Checks	✅ Plain English prompts	❌ Complex rules	✅ Plain English
Real-time	✅ Continuous monitoring	❌ Manual runs	❌ Manual runs
Language Support	✅ Any language	❌ Language-specific	✅ Any language
Architectural Analysis	✅ Cross-file analysis	❌ Single file	✅ Cross-file

Installation

System Requirements

Python 3.10 or higher
Git (for tracking file changes)
Universal Ctags (for symbol indexing)
ripgrep (for fast code search)

Install Code Scanner

Using pip

pip install code-scanner

Using uv (recommended)

uv is faster and more reliable for Python package management:

pip install uv
uv pip install code-scanner

From Source

git clone https://github.com/ubego/Code-Scanner.git
cd Code-Scanner
uv pip install -e .

Or using uv sync (recommended for development):

git clone https://github.com/ubego/Code-Scanner.git
cd Code-Scanner
uv sync
uv run code-scanner --help

Verify Installation

code-scanner --version

Expected output:

code-scanner X.Y.Z

Platform-Specific Setup

For detailed setup instructions including autostart configuration:

Linux Setup - systemd service, desktop integration
macOS Setup - LaunchAgent, Homebrew setup
Windows Setup - Task Scheduler, Chocolatey/Scoop installation

Configuration

Code Scanner uses TOML configuration files to define:

LLM backend settings
File patterns to scan
Custom checks in plain English

Basic Configuration

For Ollama

[llm]
backend = "ollama"
host = "localhost"
port = 11434
model = "qwen3:4b"           # Required for Ollama
timeout = 120                # Request timeout in seconds
context_limit = 32768        # Model's context window (tokens)

[[checks]]
pattern = "*.py"
checks = [
    "Check for bugs and issues",
    "Check for security vulnerabilities",
    "Check for type hints and docstrings"
]

For LM Studio

[llm]
backend = "lm-studio"
host = "localhost"
port = 1234
# model = "specific-model-name"  # Optional - uses first loaded model
context_limit = 32768        # Model's context window (tokens)

[[checks]]
pattern = "*.cpp, *.h"
checks = [
    "Check for memory leaks",
    "Check that RAII is used properly",
    "Check for null pointer dereferences"
]

Configuration Parameters

[llm] Section

Parameter	Required	Description	Example
`backend`	✅ Yes	LLM backend: `"ollama"` or `"lm-studio"`	`"ollama"`
`host`	✅ Yes	Backend server host	`"localhost"`
`port`	✅ Yes	Backend server port	`11434`
`model`	Ollama only (optional for LM Studio)	Model name; LM Studio uses the first loaded model if omitted	`"qwen3:4b"`
`timeout`	No	Request timeout in seconds	`120`
`context_limit`	✅ Yes	Model's context window size in tokens	`16384`

Recommended context_limit values:

4096 - Small models (e.g., CodeLlama 7B)
8192 - Medium models
16384 - Recommended minimum for most use cases
32768 - Large models (e.g., DeepSeek-Coder V2)
131072 - Very large models

[[checks]] Sections

Each [[checks]] section defines a group of checks for files matching a pattern.

Parameter	Required	Description	Example
`pattern`	✅ Yes	Glob pattern for files	`"*.py"` or `"*.cpp, *.h"`
`checks`	✅ Yes	List of check prompts	`["Check for bugs"]`

Pattern Examples:

"*" - All files
"*.py" - Python files only
"*.cpp, *.h" - C++ files (headers and sources)
"*.js, *.ts, *.jsx, *.tsx" - JavaScript/TypeScript files
"src/**/*.py" - Python files in src directory

Ignore Patterns

Exclude files from scanning using empty checks lists:

[[checks]]
pattern = "*.md, *.txt, *.json"
checks = []  # Empty list = ignore these files

[[checks]]
pattern = "/*tests*/, /*build*/, /*vendor*/"
checks = []  # Ignore entire directories

Ignore pattern syntax:

File patterns: "*.md, *.txt" - matches by extension
Directory patterns: "/*tests*/" - matches files in any directory named "tests"
Wildcards: "/*cmake-build-*/" - matches cmake-build-debug, cmake-build-release, etc.

Sample Configurations

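Ready-to-use configs for common languages live in the sample_configs/ directory: Python, JavaScript, C++ (plus a Qt variant), Java, Android, and iOS/macOS.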

Configuration Validation

Code Scanner validates configs strictly:

Only [llm] and [[checks]] sections are allowed
Unknown parameters cause immediate errors
Missing required parameters cause immediate errors

Error messages show:

Unsupported parameters
Supported alternatives
Line numbers for easy fixing
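The validation rules can be sketched against the dictionary that `tomllib.load` (Python 3.11+) returns. This is an illustrative reimplementation, not the project's `config.py`; `validate_config` and `ConfigError` are hypothetical names, and line numbers are omitted because `tomllib` returns plain dicts without position information.

```python
ALLOWED_SECTIONS = {"llm", "checks"}
LLM_PARAMS = {"backend", "host", "port", "model", "timeout", "context_limit"}
LLM_REQUIRED = {"backend", "host", "port", "context_limit"}
BACKENDS = {"ollama", "lm-studio"}
CHECK_PARAMS = {"pattern", "checks"}


class ConfigError(ValueError):
    """Raised with every problem found, one per line."""


def validate_config(cfg: dict) -> None:
    errors: list[str] = []

    for section in sorted(set(cfg) - ALLOWED_SECTIONS):
        errors.append(f"Unsupported section '{section}'. Supported: [llm], [[checks]]")

    llm = cfg.get("llm")
    if not isinstance(llm, dict):
        errors.append("Missing required section [llm]")
    else:
        for key in sorted(set(llm) - LLM_PARAMS):
            errors.append(f"Unsupported parameter llm.{key}. "
                          f"Supported: {', '.join(sorted(LLM_PARAMS))}")
        for key in sorted(LLM_REQUIRED - set(llm)):
            errors.append(f"Missing required parameter llm.{key}")
        backend = llm.get("backend")
        if backend is not None and backend not in BACKENDS:
            errors.append(f"llm.backend must be one of {sorted(BACKENDS)}, got {backend!r}")
        if backend == "ollama" and "model" not in llm:
            errors.append('llm.model is required when backend = "ollama"')

    groups = cfg.get("checks", [])
    if not isinstance(groups, list):
        errors.append("[[checks]] must be an array of tables")
        groups = []
    for i, group in enumerate(groups, start=1):
        if not isinstance(group, dict):
            errors.append(f"[[checks]] #{i} must be a table")
            continue
        for key in sorted(set(group) - CHECK_PARAMS):
            errors.append(f"[[checks]] #{i}: unsupported parameter '{key}'. Supported: pattern, checks")
        for key in sorted(CHECK_PARAMS - set(group)):
            errors.append(f"[[checks]] #{i}: missing required parameter '{key}'")

    if errors:
        raise ConfigError("\n".join(errors))
```

Typical use: `with open("code_scanner_config.toml", "rb") as fh: validate_config(tomllib.load(fh))`. Collecting all errors before raising lets one run report every problem at once.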

Usage

Basic Usage

Scan a single project:

code-scanner /path/to/project

Uses default config: code_scanner_config.toml in the project directory.

Specify Config File

code-scanner /path/to/project -c /path/to/config.toml

Multiple Projects

Monitor multiple projects simultaneously:

code-scanner /path/to/project1 -c /path/to/config1 /path/to/project2 -c /path/to/config2

How it works:

Each project has its own config and output file
Scanner automatically switches to the project with most recent changes
State is preserved for all projects in memory
Seamless switching without restarting

Scan Specific Commit

Scan changes relative to a specific commit:

code-scanner /path/to/project --commit abc123 -c /path/to/config.toml

Useful for:

Scanning cumulative changes against a parent branch
Reviewing pull requests before merging
Analyzing feature branches

Output Files

Each project generates:

code_scanner_results.md - Issues and findings (in project directory)
code_scanner_results.md.bak - Backup of previous results (in project directory)

System-wide files:

~/.code-scanner/code_scanner.log - Detailed logs (platform-specific path)
~/.code-scanner/code_scanner.lock - Lock file to prevent multiple instances

Running in Background

Linux/macOS

nohup code-scanner /path/to/project > /dev/null 2>&1 &

Windows (PowerShell)

Start-Process -WindowStyle Hidden code-scanner -ArgumentList "/path/to/project"

Autostart on Boot

Run Code Scanner automatically when your system starts:

Linux:

./scripts/autostart-linux.sh install "/path/to/project -c /path/to/config.toml"

macOS:

./scripts/autostart-macos.sh install "/path/to/project -c /path/to/config.toml"

Windows:

scripts\autostart-windows.bat install "/path/to/project -c /path/to/config.toml"

See platform-specific setup docs for details.

Stopping the Scanner

Press Ctrl+C to stop. The scanner:

Completes the current check
Cleans up lock files
Exits gracefully
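One way to get this "finish the current check, then clean up" behaviour is to turn SIGINT into a flag that is only consulted between checks. A minimal sketch, assuming checks run sequentially in the main thread; `StopRequest` and `run_until_stopped` are hypothetical names, not Code Scanner's API.

```python
import signal
import threading
from pathlib import Path
from typing import Callable, Iterable


class StopRequest:
    """Turns Ctrl+C (SIGINT) into a flag checked between checks."""

    def __init__(self) -> None:
        self.event = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread; replaces the default
        # KeyboardInterrupt behaviour for SIGINT.
        signal.signal(signal.SIGINT, self._on_sigint)

    def _on_sigint(self, signum, frame) -> None:
        self.event.set()


def run_until_stopped(checks: Iterable[str], run_check: Callable[[str], None],
                      stop: StopRequest, lock_path: Path) -> list[str]:
    """Run checks in order; after a stop request, start no new check."""
    completed: list[str] = []
    try:
        for check in checks:
            if stop.event.is_set():
                break
            run_check(check)  # a check already in progress always finishes
            completed.append(check)
    finally:
        lock_path.unlink(missing_ok=True)  # remove the lock even on errors
    return completed
```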

AI Tools

Code Scanner provides powerful AI tools that let the LLM explore your codebase during analysis. This enables sophisticated checks that go beyond simple pattern matching.

Available Tools

Tool	Description	Use Case
`search_text`	Fast text search using ripgrep	Find function usages, locate patterns
`read_file`	Read file contents	Get context from related files
`list_directory`	List directory contents	Understand project structure
`get_file_diff`	Get git diff for a file	See what changed
`get_file_summary`	Get file statistics	Understand file before reading
`symbol_exists`	Check if symbol exists	Verify function/class exists
`find_definition`	Find symbol definition	Locate where symbol is defined
`find_symbols`	Find symbols by pattern	Search for related functions
`get_enclosing_scope`	Get parent scope	Understand code context
`find_usages`	Find all usages of a symbol	Track symbol usage across codebase

Example: Architectural Check

[[checks]]
pattern = "*"
checks = [
    "Check for MVC pattern violations: UI code should not directly access database. Use search_text to find database queries in UI files, and read_file to verify context."
]

How the LLM uses tools:

Scans UI files for database-related keywords (search_text)
Reads suspicious files to verify context (read_file)
Reports violations with specific file locations and line numbers

Example: Duplicate Code Detection

[[checks]]
pattern = "*.py"
checks = [
    "Find duplicate or similar function implementations. Use search_text to find function definitions with similar names, then read_file to compare implementations."
]

Example: Naming Consistency

[[checks]]
pattern = "*"
checks = [
    "Check for inconsistent naming patterns across the codebase. Use list_directory to explore structure, then read_file to verify naming conventions."
]

Tool Integration Details

Ctags: Used for symbol indexing (functions, classes, variables)
Ripgrep: Used for fast text search
Git: Used for diff operations
All tools are local and respect .gitignore
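A tool call from the LLM ultimately becomes a name plus JSON arguments that the scanner dispatches locally. The sketch below covers three of the tools from the table; the `ToolExecutor` class, its method signatures, and the 200-line read limit are assumptions for illustration, while the `rg` flags are standard ripgrep options (ripgrep honours `.gitignore` by default).

```python
import subprocess
from pathlib import Path


class ToolExecutor:
    """Hypothetical dispatcher for read_file, list_directory and search_text."""

    def __init__(self, root: Path, max_lines: int = 200) -> None:
        self.root = root.resolve()
        self.max_lines = max_lines

    def _resolve(self, rel: str) -> Path:
        path = (self.root / rel).resolve()
        # Refuse paths outside the repository, e.g. "../secrets.txt".
        if path != self.root and self.root not in path.parents:
            raise ValueError(f"path outside repository: {rel}")
        return path

    def read_file(self, path: str, start_line: int = 1) -> str:
        lines = self._resolve(path).read_text(errors="replace").splitlines()
        chunk = lines[start_line - 1 : start_line - 1 + self.max_lines]
        return "\n".join(f"{n}: {text}" for n, text in enumerate(chunk, start=start_line))

    def list_directory(self, path: str = ".") -> list[str]:
        return sorted(p.name + ("/" if p.is_dir() else "") for p in self._resolve(path).iterdir())

    def search_text(self, pattern: str) -> str:
        # Exit code 1 from rg only means "no matches", so it is not treated as an error.
        result = subprocess.run(
            ["rg", "--line-number", "--no-heading", "--color", "never", "-e", pattern, "."],
            cwd=self.root, capture_output=True, text=True,
        )
        return result.stdout

    def execute(self, name: str, arguments: dict) -> object:
        tools = {"read_file": self.read_file,
                 "list_directory": self.list_directory,
                 "search_text": self.search_text}
        if name not in tools:
            # Returning an error lets the model correct itself instead of crashing the scan.
            return {"error": f"unknown tool {name!r}", "available": sorted(tools)}
        return tools[name](**arguments)
```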

Performance Considerations

Symbol index is generated asynchronously at startup
Index is regenerated when switching projects
Tool calls are tracked to prevent context overflow
Inactive projects don't maintain indexes (saves memory)

Advanced Features

Multi-Project Support

Monitor multiple projects with a single scanner instance:

code-scanner /path/to/project1 -c /path/to/config1 /path/to/project2 -c /path/to/config2

Key Features:

Automatic Switching: Scanner switches to project with most recent changes
Non-blocking: Current check completes before switching
State Preservation: Each project maintains its own issue tracker state
Smart LLM Management: Client reused if configs are identical
Separate Outputs: Each project has its own code_scanner_results.md

Project Switching Behavior:

Scanner detects which project has most recent changes (based on file modification times)
Waits for current check to complete
Switches to active project
If LLM configs differ, disconnects and reconnects
Regenerates ctags index for new active project
Continues scanning

Commit-Based Scanning

Scan changes relative to a specific commit:

code-scanner /path/to/project --commit abc123 -c /path/to/config.toml

Use Cases:

Review cumulative changes in a feature branch
Analyze pull request before merging
Compare against stable branch

Behavior:

Diffs the working tree, including uncommitted changes, against the specified commit
Includes untracked files
Continues monitoring for new changes after initial scan

Issue Lifecycle Management

Code Scanner tracks issue state within a session:

States:

NEW - Issue detected for the first time
EXISTING - Issue detected in previous scan, still present
RESOLVED - Issue was detected before, no longer present

Smart Matching:

Issues are matched by file and issue nature (not just line number)
Fuzzy string comparison with configurable threshold (default: 0.8)
Line numbers update if code shifts
Prevents duplicate issues for the same problem

Scoped Resolution:

Issues only resolved for files that were actually scanned
Prevents false resolution from LLM non-determinism
Resolved issues remain in log for historical tracking
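The matching and scoped-resolution rules can be sketched with `difflib.SequenceMatcher`, whose `ratio()` returns a similarity between 0 and 1. This is an illustration under assumptions: the `Issue` fields and `update_issues` are hypothetical, and Code Scanner's real comparison may normalise descriptions differently.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Issue:
    file: str
    line: int
    description: str
    status: str = "NEW"  # NEW, EXISTING or RESOLVED


def same_problem(a: Issue, b: Issue, threshold: float = 0.8) -> bool:
    """Match on file plus fuzzy description similarity, not on line number."""
    if a.file != b.file:
        return False
    ratio = SequenceMatcher(None, a.description.lower(), b.description.lower()).ratio()
    return ratio >= threshold


def update_issues(known: list[Issue], found: list[Issue], scanned_files: set[str],
                  threshold: float = 0.8) -> list[Issue]:
    unmatched = list(found)
    for issue in known:
        if issue.status == "RESOLVED":
            continue
        match = next((f for f in unmatched if same_problem(issue, f, threshold)), None)
        if match is not None:
            unmatched.remove(match)
            issue.status = "EXISTING"
            issue.line = match.line  # follow the code if it shifted
        elif issue.file in scanned_files:
            issue.status = "RESOLVED"  # scoped: only files scanned this round
        # Issues in files that were not scanned keep their current status.
    for new_issue in unmatched:
        new_issue.status = "NEW"
        known.append(new_issue)
    return known
```

Here "Missing error handling in load()" and "Missing error handling in load" score about 0.97, so they are treated as one issue even though the line moved.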

Context Overflow Strategy

When code exceeds model's context window:

Group by directory hierarchy - Batch files from same directory
Deterministic batching - Sort alphabetically, deepest-first
File-by-file fallback - If directory group still too large
Skip oversized files - Log warning and continue
Merge results - Combine issues from all batches
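The batching steps can be sketched as follows, assuming token counts per file are already known (how Code Scanner estimates them is not described here). `plan_batches` is a hypothetical name; each returned batch would be sent in its own request and the resulting issue lists concatenated.

```python
import logging
from collections import defaultdict

log = logging.getLogger("batching-sketch")


def directory_of(rel_path: str) -> str:
    return rel_path.rpartition("/")[0]  # "" for files at the repository root


def depth(directory: str) -> int:
    return directory.count("/") + 1 if directory else 0


def plan_batches(file_tokens: dict[str, int], budget: int) -> list[list[str]]:
    """Split files into batches that each fit within `budget` tokens."""
    by_dir: dict[str, list[str]] = defaultdict(list)
    for path in file_tokens:
        by_dir[directory_of(path)].append(path)

    batches: list[list[str]] = []
    # Deterministic order: deepest directories first, ties broken alphabetically.
    for directory in sorted(by_dir, key=lambda name: (-depth(name), name)):
        files = sorted(by_dir[directory])
        if sum(file_tokens[f] for f in files) <= budget:
            batches.append(files)  # the whole directory fits in one request
            continue
        for f in files:  # fallback: one file per request
            if file_tokens[f] > budget:
                log.warning("Skipping %s: %d tokens exceeds budget of %d",
                            f, file_tokens[f], budget)
                continue
            batches.append([f])
    return batches
```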

Token Tracking:

Uses 55% of context limit for file content
Tracks accumulated tokens during tool calls
Stops tool calling at 85% context usage
Prevents overflow during multi-turn conversations
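As a worked example of the percentages: with `context_limit = 32768`, 55% leaves about 18,022 tokens for file content, and tool calling stops once roughly 27,852 tokens are in use. A small sketch of that bookkeeping, with `TokenBudget` as a hypothetical name:

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    context_limit: int
    content_ratio: float = 0.55    # share of the window reserved for file content
    tool_stop_ratio: float = 0.85  # stop issuing tool calls beyond this point
    used: int = 0

    @property
    def content_budget(self) -> int:
        return int(self.context_limit * self.content_ratio)

    @property
    def tool_cutoff(self) -> int:
        return int(self.context_limit * self.tool_stop_ratio)

    def record(self, tokens: int) -> None:
        """Add tokens from a prompt, tool call or tool result to the running total."""
        self.used += tokens

    def may_call_tools(self) -> bool:
        return self.used < self.tool_cutoff
```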

Git Integration

Change Detection:

Monitors staged, unstaged, and untracked files
Respects .gitignore patterns
Efficient caching with configurable TTL (default: 5 seconds) for low CPU usage during idle
Detects file modifications via content hashes
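The TTL cache and hash-based change detection can be sketched like this. The class and function names are hypothetical; `git status --porcelain --untracked-files=all` is a real git invocation that lists staged, unstaged and untracked paths while honouring `.gitignore`, but this simplified parser does not unquote unusual file names or handle renames.

```python
import hashlib
import subprocess
import time
from pathlib import Path
from typing import Callable


class TTLCache:
    """Re-run an expensive query (such as git status) at most once per `ttl` seconds."""

    def __init__(self, fetch: Callable[[], list[str]], ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value: list[str] | None = None
        self._stamp = 0.0

    def get(self) -> list[str]:
        now = self._clock()
        if self._value is None or now - self._stamp >= self._ttl:
            self._value = self._fetch()
            self._stamp = now
        return self._value


def git_changed_paths(repo: Path) -> list[str]:
    """Staged, unstaged and untracked paths, using porcelain v1 ("XY path") lines."""
    out = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line[3:] for line in out.splitlines() if line]


def snapshot(repo: Path, paths: list[str]) -> dict[str, str]:
    """SHA-256 of each existing file, keyed by repository-relative path."""
    return {p: hashlib.sha256((repo / p).read_bytes()).hexdigest()
            for p in paths if (repo / p).is_file()}


def modified(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Paths that are new or whose content hash changed between two snapshots."""
    return sorted(p for p, digest in after.items() if before.get(p) != digest)
```

Typical wiring: `status = TTLCache(lambda: git_changed_paths(repo))`, then compare `snapshot(repo, status.get())` against the previous snapshot on each poll. Hashing content rather than trusting mtimes means a file saved without edits does not trigger a rescan.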

Conflict Handling:

Waits for merge/rebase conflicts to resolve
Polls for completion status
Resumes scanning automatically

Binary Files:

Silently skipped during scanning
Tracked to prevent infinite rescan loops
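A common heuristic, similar to the NUL-byte check git applies to the start of a file, is to call a file binary if its first few kilobytes contain a zero byte. The sketch below pairs that with a small skip list keyed by modification time, so an unchanged binary is not re-examined on every poll. Names and the 8000-byte window are assumptions.

```python
from pathlib import Path

SNIFF_BYTES = 8000  # how much of the file start to inspect (assumed value)


def looks_binary(path: Path) -> bool:
    """Treat a file as binary if its first few kilobytes contain a NUL byte."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(SNIFF_BYTES)


class BinarySkipList:
    """Remember binary files (by mtime) so they are not rescanned on every poll."""

    def __init__(self) -> None:
        self._seen: dict[str, float] = {}

    def should_skip(self, path: Path) -> bool:
        key, mtime = str(path), path.stat().st_mtime
        if self._seen.get(key) == mtime:
            return True  # already known to be binary and unchanged
        if looks_binary(path):
            self._seen[key] = mtime
            return True
        self._seen.pop(key, None)  # file is (now) text
        return False
```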

Supported LLM Backends

Code Scanner works with local LLM servers that provide OpenAI-compatible APIs.

LM Studio

Best for: GUI users, trying different models, visual feedback

Installation:

Download from lmstudio.ai
Install and launch
Load the model "qwen2.5-coder-7b-instruct"
Start server (default: localhost:1234)

Recommended Models:

DeepSeek-Coder V2 (excellent for code analysis)
Qwen2.5-Coder (fast, good balance)
CodeLlama 7B/13B (lighter option)

Configuration:

[llm]
backend = "lm-studio"
host = "localhost"
port = 1234
context_limit = 32768

Ollama

Best for: CLI users, automation, simpler setup, headless servers

Installation:

curl -fsSL https://ollama.ai/install.sh | sh

Pull a model:

ollama pull qwen3:4b
ollama run qwen3:4b

Recommended Models:

qwen3:4b - Fast, good for code (4B parameters)
qwen3:8b - Better accuracy (8B parameters)
deepseek-coder:6.7b - Excellent for code analysis
codellama:7b - Lightweight option

Configuration:

[llm]
backend = "ollama"
host = "localhost"
port = 11434
model = "qwen3:4b"
context_limit = 32768

Hardware Requirements

Minimum:

CPU: Any modern multi-core processor
RAM: 8GB (16GB recommended)
GPU: Not required (CPU inference works, but slower)

Recommended:

GPU: NVIDIA RTX 3060 (12GB VRAM) or better
RAM: 16GB+
Storage: SSD for faster file access

Performance Tips:

Use smaller models (4B-7B) for faster scanning
Increase context_limit for larger codebases
Use GPU acceleration if available
Reduce check complexity for faster results

Troubleshooting

Common Issues

"Connection refused" or "Cannot connect to LLM backend"

Problem: Scanner can't connect to LM Studio or Ollama

Solutions:

Ensure LM Studio or Ollama is running:
- LM Studio: Check that server is started (green indicator)
- Ollama: Run ollama list to verify it's running

Check host/port in config:

# Test connection
curl http://localhost:1234/v1/models  # LM Studio
curl http://localhost:11434/api/tags  # Ollama

Verify model is loaded:
- LM Studio: Model must be loaded in the server
- Ollama: Run ollama list to see available models

"Not a git repository" error

Problem: Target directory isn't a Git repository

Solution: Initialize git in your project:

cd /path/to/project
git init
git add .
git commit -m "Initial commit"

"No changes detected" message

Problem: Scanner is waiting for changes

Solutions:

Make a change to a tracked file
Add a new file: touch newfile.py
Stage files: git add .
Check if files are in .gitignore

"Context limit exceeded" warning

Problem: Code is too large for model's context window

Solutions:

Increase context_limit in config (if model supports it)
Use ignore patterns to exclude large files/directories
Use a model with larger context window
Split checks into smaller, more specific queries

Scanning is too slow

Problem: Scanner takes too long to complete

Solutions:

Use ignore patterns to exclude test/build directories:

[[checks]]
pattern = "/*tests*/, /*build*/, /*node_modules*/"
checks = []

Reduce context_limit if model is slow
Use a smaller/faster model (e.g., 4B instead of 7B)
Reduce number of checks
Use file patterns to scan only relevant files

"Malformed JSON response" errors

Problem: LLM returns invalid JSON

Solutions:

Scanner automatically retries (up to 3 times)
Try a different model
Reduce context_limit to prevent overflow
Simplify check prompts
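The retry behaviour can be sketched as "ask, try to parse, repeat up to three times". Models often wrap JSON in a Markdown code fence, so the sketch strips one before parsing. `request_json`, `strip_fence` and `MalformedResponseError` are hypothetical names; `ask_llm` stands in for whatever sends the prompt to LM Studio or Ollama.

```python
import json
from typing import Any, Callable

FENCE = "`" * 3  # a Markdown code fence


class MalformedResponseError(RuntimeError):
    pass


def strip_fence(text: str) -> str:
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (which may carry a "json" tag) and a closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    return text.strip()


def request_json(ask_llm: Callable[[], str], max_attempts: int = 3) -> Any:
    """Call `ask_llm` until its reply parses as JSON, up to `max_attempts` times."""
    last_error: json.JSONDecodeError | None = None
    for _ in range(max_attempts):
        try:
            return json.loads(strip_fence(ask_llm()))
        except json.JSONDecodeError as exc:
            last_error = exc
    raise MalformedResponseError(f"no valid JSON after {max_attempts} attempts") from last_error
```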

Lock file errors

Problem: "Another instance is already running"

Solutions:

Check if another scanner is running: ps aux | grep code-scanner
If none is running, confirm that the PID stored in the lock file is not running, then remove the stale lock file:
```
rm ~/.code-scanner/code_scanner.lock
```
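The PID check can be automated on Linux and macOS. This sketch assumes the lock file contains just the owning process's PID as text (the format is an assumption); sending signal 0 with `os.kill` tests whether a process exists without affecting it. On Windows, signal 0 means something different, so the sketch refuses to run there.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """POSIX-only liveness check: signal 0 probes a process without touching it."""
    if os.name == "nt":
        raise NotImplementedError("os.kill(pid, 0) is not a liveness probe on Windows")
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_stale(lock_path: Path) -> bool:
    """True if the lock file exists but its PID is not running or cannot be read."""
    try:
        pid = int(lock_path.read_text().strip())
    except FileNotFoundError:
        return False  # no lock file, nothing to clean up
    except ValueError:
        return True  # assumption: unreadable contents mean a broken lock
    if pid <= 0:
        return True  # 0 and negative values address process groups, not a process
    return not pid_is_running(pid)
```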

Getting Help

Documentation: Check platform-specific setup docs
GitHub Issues: Report bugs or request features
Discussions: Ask questions and share ideas

Debug Mode

Enable verbose logging for troubleshooting:

code-scanner /path/to/project --verbose

Log file location (platform-specific):

Linux: ~/.code-scanner/code_scanner.log
macOS: ~/Library/Application Support/code-scanner/code_scanner.log
Windows: %APPDATA%\code-scanner\code_scanner.log

Development

Project Structure

src/code_scanner/
├── models.py              # Data models (LLMConfig, Issue, etc.)
├── config.py              # Configuration loading and validation
├── base_client.py         # Abstract base class for LLM clients
├── lmstudio_client.py     # LM Studio client implementation
├── ollama_client.py       # Ollama client implementation
├── ai_tools.py            # AI tool executor for context expansion
├── text_utils.py          # Text processing utilities
├── git_watcher.py         # Git repository monitoring
├── issue_tracker.py       # Issue lifecycle management
├── output.py              # Markdown report generation
├── scanner.py             # AI scanning logic
├── cli.py                 # CLI and application coordinator
├── utils.py               # Utility functions
└── __main__.py            # Entry point

Running Tests

# Run all tests
uv run pytest

# Verbose output
uv run pytest -v

# Specific test file
uv run pytest tests/test_scanner.py -v

# Coverage report
uv run pytest --cov=code_scanner --cov-report=term-missing

# HTML coverage report
uv run pytest --cov=code_scanner --cov-report=html
# Open htmlcov/index.html in browser

Current Coverage

89% code coverage with 1064 unit tests

Contributing

We welcome contributions! Please:

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

GNU Affero General Public License v3.0

See LICENSE for details.

Documentation

For detailed platform-specific setup instructions:

Linux Setup - systemd service, desktop integration
macOS Setup - LaunchAgent, Homebrew setup
Windows Setup - Task Scheduler, Chocolatey/Scoop installation

Made with ❤️ by the Ubego team

⭐ Star us on GitHub
