This guide covers optimizing git-copilot for large repositories and slow networks.
- Number of files: More files = more LLM context = longer analysis
- LLM provider speed: GPT-4o is faster than GPT-4, Claude Sonnet is moderate, Ollama local models vary
- Parallelism: `maxConcurrent` controls how many agents run simultaneously
- Network latency: API round-trips add overhead
- Token limits: Each LLM has context window limits; large diffs may be truncated
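To get a feel for the token-limit factor, the sketch below estimates whether a diff is likely to fit in a model's context window. It is a rough illustration only: the 4-characters-per-token ratio is a common rule of thumb for English text, not how git-copilot counts tokens, and the function names, the `reserved_for_output` parameter, and the 128,000-token window in the usage line are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumption: ~4 characters per token)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(diff_text: str, context_window: int,
                 reserved_for_output: int = 2048) -> bool:
    """Return True if the diff likely fits, leaving room for the model's reply."""
    budget = context_window - reserved_for_output
    return estimate_tokens(diff_text) <= budget


if __name__ == "__main__":
    sample_diff = "+ added line\n- removed line\n" * 1000
    # 128_000 is an illustrative window size, not a git-copilot setting.
    print(estimate_tokens(sample_diff), fits_context(sample_diff, 128_000))
```

If the estimate is close to the limit, narrowing the review scope is usually safer than relying on truncation.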
Default settings work well:

```yaml
maxConcurrent: 4
```

- Reduce `maxConcurrent` to 2 or 3 if you hit rate limits
- Use faster models (e.g., `gpt-4o` instead of `gpt-4-turbo`)
- Consider enabling `output.format: terminal` to avoid extra rendering overhead
```yaml
maxConcurrent: 3
providers:
  openai:
    model: gpt-4o
```

- Set `maxConcurrent: 1` or 2 to avoid overwhelming the LLM
- Use `git-copilot review --range` to analyze only recent changes
- Split review into multiple runs (e.g., by directory)
- Increase timeouts if needed
```yaml
maxConcurrent: 2
providers:
  openai:
    timeout: 120000 # 2 minutes
    maxRetries: 5
```

Ollama runs locally and avoids network latency, but model speed depends on your hardware.
```yaml
activeProvider: ollama
providers:
  ollama:
    baseUrl: http://localhost:11434
    model: llama3.3:70b-instruct-q4_K_M # Quantized for speed
```

Ensure you have enough RAM/GPU memory. Pre-load the model before running.
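One way to pre-load the model is to send Ollama a generate request with no prompt. The sketch below assumes the behavior described in Ollama's API documentation, where a request to `/api/generate` containing only a model name loads that model into memory. The helper names `build_preload_request` and `preload_model` and the 600-second timeout are hypothetical; this is not something git-copilot does for you.

```python
import json
import urllib.request


def build_preload_request(base_url: str, model: str) -> urllib.request.Request:
    """Build a POST to /api/generate with only a model name (no prompt)."""
    payload = json.dumps({"model": model}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload_model(base_url: str, model: str, timeout_s: float = 600.0) -> bool:
    """Send the request; loading a 70B model can take minutes, hence the long timeout."""
    request = build_preload_request(base_url, model)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status == 200


if __name__ == "__main__":
    # Requires a running Ollama server at the configured baseUrl.
    print(preload_model("http://localhost:11434", "llama3.3:70b-instruct-q4_K_M"))
```

Running this once before `git-copilot review` means the first agent call does not pay the model load time.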
- The custom memory SQLite DB is small and fast. If it grows too large, the `retentionDays` setting automatically prunes old entries.
- Use `beads.custom.maxFindingsPerTask` to limit how many findings each agent can store.
Currently, each review run is independent. Future versions will add result caching based on file content hashes.
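As an illustration of what hash-based result caching could look like, here is a minimal sketch that keys findings by the SHA-256 of a file's content. It is not git-copilot's implementation (the feature does not exist yet); `FindingsCache`, `content_key`, and the `analyze` callback are hypothetical names, and the cache is in-memory only.

```python
import hashlib
from typing import Callable, Dict, List


def content_key(content: bytes) -> str:
    """Cache key derived from file content, so renames and touches do not invalidate it."""
    return hashlib.sha256(content).hexdigest()


class FindingsCache:
    """In-memory cache mapping content hashes to review findings."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}
        self.hits = 0
        self.misses = 0

    def review(self, content: bytes,
               analyze: Callable[[bytes], List[str]]) -> List[str]:
        """Return cached findings for unchanged content; otherwise analyze and store."""
        key = content_key(content)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        findings = analyze(content)
        self._store[key] = findings
        return findings
```

With a cache like this, only files whose content changed between runs would trigger a new LLM call.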
Enable debug logging to see timing:

```bash
DEBUG=* git-copilot review
```

A config option for setting the logger level is planned for a future version.
| Repo Size | Agents | LLM | Concurrent | Typical Time |
|---|---|---|---|---|
| Small (50 files) | 6 | GPT-4o | 4 | 30-60s |
| Medium (500 files) | 6 | GPT-4o | 3 | 1-2 min |
| Large (2000 files) | 6 | GPT-4o | 2 | 3-5 min |
| Large (2000 files) | 6 | Ollama (local) | 2 | 5-10 min |
These are approximate and depend on file content complexity.
- Reduce `maxConcurrent` to 1
- Switch to a faster model (e.g., `gpt-4o`)
- Use `--range` to limit the review scope
- Check network latency to your LLM provider
- Ensure your API key has sufficient rate limits
- Streaming: Stream LLM responses to reduce perceived latency
- Parallel file chunking: Split large files across multiple LLM calls
- Result caching: Reuse findings for unchanged files
- Incremental review: Only analyze files changed since last review
- Adaptive concurrency: Automatically adjust based on rate-limit responses
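To make the adaptive concurrency idea concrete, the sketch below uses an additive-increase, multiplicative-decrease rule: halve the limit when the provider signals a rate limit, and raise it by one after a success. This is one possible approach under stated assumptions, not the planned design; the class name, method names, and the bounds of 1 and 8 are hypothetical.

```python
class AdaptiveConcurrency:
    """Track a concurrency limit that reacts to rate-limit responses (AIMD)."""

    def __init__(self, initial: int = 4, minimum: int = 1, maximum: int = 8) -> None:
        if not minimum <= initial <= maximum:
            raise ValueError("initial must lie between minimum and maximum")
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self) -> int:
        """Additive increase: allow one more concurrent agent, up to the maximum."""
        self.limit = min(self.maximum, self.limit + 1)
        return self.limit

    def on_rate_limited(self) -> int:
        """Multiplicative decrease: halve the limit, never dropping below the minimum."""
        self.limit = max(self.minimum, self.limit // 2)
        return self.limit
```

A scheduler would call `on_rate_limited()` when a provider returns a rate-limit error (for example HTTP 429) and `on_success()` otherwise, reading `limit` before launching each batch of agents.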