This document describes GitFlow Analytics' comprehensive caching system and the new cache optimization features.
GitFlow Analytics uses SQLite-based caching to dramatically improve performance on subsequent runs. The cache stores analyzed commit data, pull request information, and issue tracking data.
The --warm-cache option pre-loads all commits from your repositories into the cache for maximum performance on subsequent runs.
# Warm cache for 4 weeks of history
gitflow-analytics -c config.yaml --warm-cache --weeks 4
# Warm cache and run analysis
gitflow-analytics -c config.yaml --warm-cache --weeks 8

Benefits:
- Subsequent runs are dramatically faster (cache hit rates >95%)
- Ideal for CI/CD environments with repeated analysis
- Batch processing optimizes database performance
When to use:
- First run on a new system
- After clearing cache
- Before running multiple analyses on the same data
The --validate-cache option validates cache integrity and identifies potential issues.
# Validate cache only
gitflow-analytics -c config.yaml --validate-cache
# Validate cache and warm if needed
gitflow-analytics -c config.yaml --validate-cache --warm-cache

Validation checks:
- Missing required fields
- Duplicate entries
- Data integrity issues
- Very old entries (older than 2×TTL)
- Negative change counts
Example output:
✅ Cache validation passed
Cache contains 1,247 commits
Warning: Found 3 very old cache entries (older than 336h)
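The checks listed above can be sketched roughly as follows. This is an illustrative sketch only, not the tool's actual implementation: the record fields (`commit_hash`, `cached_at`, `insertions`, `deletions`) and the `validate_records` helper are hypothetical names, and timestamps are assumed to be timezone-aware. With the default TTL of 168 hours, the "very old" threshold works out to the 336h shown in the example output.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("commit_hash", "cached_at")  # hypothetical field names


def validate_records(records, ttl_hours=168, now=None):
    """Return a list of human-readable issues found in cached records."""
    now = now or datetime.now(timezone.utc)
    issues = []
    seen = set()
    for rec in records:
        # Missing required fields: skip the remaining checks for this record.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) is None]
        if missing:
            issues.append(f"missing fields {missing}")
            continue
        key = rec["commit_hash"]
        # Duplicate entries.
        if key in seen:
            issues.append(f"duplicate entry {key}")
        seen.add(key)
        # Negative change counts (absent counts are treated as zero).
        if (rec.get("insertions") or 0) < 0 or (rec.get("deletions") or 0) < 0:
            issues.append(f"negative change count in {key}")
        # "Very old" means older than 2x TTL; a TTL of 0 disables expiration.
        if ttl_hours > 0 and now - rec["cached_at"] > timedelta(hours=2 * ttl_hours):
            issues.append(f"very old entry {key}")
    return issues
```

An empty return value corresponds to a passing validation; each string in the list describes one problem.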
Detailed cache performance metrics are displayed at the end of every run.
Rich Display (default):
📊 Cache Performance Summary
Total requests: 856
Cache hits: 823 (96.1%)
Cache misses: 33
Time saved: 1.4 minutes
💾 Cache Storage
Cached commits: 1,247
Database size: 12.3 MB
Simple Display:
📊 Cache Performance:
- Total requests: 856
- Cache hits: 823 (96.1%)
- Cache misses: 33
- Time saved: 1.4 minutes
- Cached commits: 1,247
- Database size: 12.3 MB
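The hit-rate percentage in the summaries above is hits divided by total requests. A minimal sketch of that arithmetic, using the figures from the example output (the `cache_hit_rate` helper is a hypothetical name, not part of the tool's API):

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Return the cache hit rate as a percentage of total requests."""
    total = hits + misses
    # Avoid division by zero when nothing was requested.
    return 100.0 * hits / total if total else 0.0


# Figures from the example summary: 823 hits + 33 misses = 856 requests.
print(f"Cache hits: 823 ({cache_hit_rate(823, 33):.1f}%)")  # prints "Cache hits: 823 (96.1%)"
```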
Setting GITFLOW_DEBUG=1 enables enhanced debugging output for cache operations and performance analysis.
# Enable debug mode for detailed cache logging
GITFLOW_DEBUG=1 gitflow-analytics -c config.yaml --weeks 2

Debug output includes:
- Individual cache hits/misses for each commit
- Bulk cache lookup statistics
- Progress bar tracking details
- Cache validation verbose output
Example debug output:
DEBUG: Cache HIT for a1b2c3d4 in /repos/myproject
DEBUG: Cache MISS for e5f6g7h8 in /repos/myproject
DEBUG: Bulk cache lookup - 95 hits, 5 misses for 100 commits
DEBUG: Batch: 100 commits, Progress: 100/856, Processed: 100
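When a run produces a lot of debug output, a small script can tally the per-commit hit and miss lines. The sketch below assumes the log lines look exactly like the example output above; the real format may differ, and `summarize_debug_log` is a hypothetical helper rather than part of GitFlow Analytics.

```python
import re
from collections import Counter

# Matches lines such as "DEBUG: Cache HIT for a1b2c3d4 in /repos/myproject".
HIT_MISS = re.compile(r"^DEBUG: Cache (HIT|MISS) for (\S+) in (.+)$")


def summarize_debug_log(lines):
    """Count per-commit hits and misses and collect the missed commits."""
    counts = Counter()
    missed = []
    for line in lines:
        match = HIT_MISS.match(line.strip())
        if not match:
            continue  # bulk-lookup and batch lines are ignored
        outcome, commit, repo = match.groups()
        counts[outcome] += 1
        if outcome == "MISS":
            missed.append((repo, commit))
    return counts, missed


sample = [
    "DEBUG: Cache HIT for a1b2c3d4 in /repos/myproject",
    "DEBUG: Cache MISS for e5f6g7h8 in /repos/myproject",
    "DEBUG: Bulk cache lookup - 95 hits, 5 misses for 100 commits",
    "DEBUG: Batch: 100 commits, Progress: 100/856, Processed: 100",
]
counts, missed = summarize_debug_log(sample)
print(counts["HIT"], counts["MISS"], missed)
```

The list of missed commits is the useful part when diagnosing a high cache miss rate.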
This release resolves the issue where progress bars showed incorrect totals (e.g., "190/95").
Improvements:
- Accurate progress tracking with safety checks
- Better batch processing visualization
- Real-time cache hit rate display
- Debug information in postfix
Progress bar format:
Analyzing myproject: 100%|████████| 856/856 [00:15<00:00, cache_hit_rate=96.1%, processed=856/856]
The cache uses three main SQLite tables:
- cached_commits: Commit analysis results
- pull_request_cache: PR metadata and metrics
- issue_cache: Issue tracking data from various platforms
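To see how many rows each table holds, the standard `sqlite3` module can be used to inspect the database. The sketch below relies only on the table names listed above; the placeholder schema in the demo is an assumption made so the example runs on its own, and `table_row_counts` is a hypothetical helper.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Names come from sqlite_master itself, so quoting them here is safe.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demo against an in-memory database with a placeholder schema (assumed).
conn = sqlite3.connect(":memory:")
for name in ("cached_commits", "pull_request_cache", "issue_cache"):
    conn.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO cached_commits DEFAULT VALUES")
row_counts = table_row_counts(conn)
print(row_counts)
conn.close()
```

To inspect a real cache, pass a connection opened on the commit cache database file instead of the in-memory one.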

- Default cache directory: .gitflow-cache/ in the config file's directory
- Commits: .gitflow-cache/gitflow_cache.db
- Identities: .gitflow-cache/identities.db
- ML Predictions: .gitflow-cache/ml_predictions.db (if ML enabled)

- Default TTL: 168 hours (7 days)
- Configuration: cache.ttl_hours in config
- Disable expiration: Set to 0
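The TTL rules above amount to a simple age comparison. A minimal sketch, assuming entries carry a timezone-aware timestamp and that an entry expires once its age exceeds the TTL (the exact boundary behavior and the `is_expired` name are assumptions, not the tool's documented API):

```python
from datetime import datetime, timedelta, timezone


def is_expired(cached_at, ttl_hours=168, now=None):
    """Return True if an entry cached at `cached_at` has outlived the TTL."""
    if ttl_hours == 0:
        return False  # a TTL of 0 disables expiration entirely
    now = now or datetime.now(timezone.utc)
    return now - cached_at > timedelta(hours=ttl_hours)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
eight_days_ago = now - timedelta(days=8)
print(is_expired(eight_days_ago, ttl_hours=168, now=now))  # True: older than 7 days
print(is_expired(eight_days_ago, ttl_hours=336, now=now))  # False: within 2 weeks
print(is_expired(eight_days_ago, ttl_hours=0, now=now))    # False: never expires
```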
cache:
  ttl_hours: 336  # 2 weeks
  directory: .gitflow-cache

For optimal performance, follow this workflow:
# Initial setup (first time)
gitflow-analytics -c config.yaml --warm-cache --weeks 12
# Daily/regular analysis (fast)
gitflow-analytics -c config.yaml --weeks 2
# Weekly deep analysis
gitflow-analytics -c config.yaml --weeks 8

The analyzer processes commits in batches for optimal memory usage and database performance:
- Default batch size: 1000 commits
- Memory vs. Performance: Larger batches use more memory but reduce database round-trips
- Configurable through analyzer initialization (advanced usage)
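The memory versus round-trip trade-off can be illustrated with a generic batching helper. This is a sketch of the general technique, not the analyzer's own code; `in_batches` is a hypothetical name, and only the default of 1000 comes from the text above.

```python
from itertools import islice


def in_batches(iterable, batch_size=1000):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 2,500 commits become three batches, so three database round-trips.
commits = range(2500)
sizes = [len(batch) for batch in in_batches(commits)]
print(sizes)  # [1000, 1000, 500]
```

Only one batch is held in memory at a time, so a larger `batch_size` raises peak memory while lowering the number of database operations.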
Regular maintenance for optimal performance:
# Validate cache health monthly
gitflow-analytics -c config.yaml --validate-cache
# Clear old cache if needed
gitflow-analytics -c config.yaml --clear-cache
# Re-warm after major repository changes
gitflow-analytics -c config.yaml --warm-cache --clear-cache --weeks 8

Problem: Cache validation fails with integrity errors
# Solution: Clear and rebuild cache
gitflow-analytics -c config.yaml --clear-cache --warm-cache

Problem: Poor cache hit rates (<50%)
# Solution: Warm cache for longer history
gitflow-analytics -c config.yaml --warm-cache --weeks 12

Problem: Large database file sizes
# Check cache statistics
gitflow-analytics -c config.yaml --validate-cache
# Consider reducing TTL if needed
# Edit config.yaml: cache.ttl_hours: 72 # 3 days

Enable debug mode to identify specific issues:
GITFLOW_DEBUG=1 gitflow-analytics -c config.yaml --validate-cache

Common debug scenarios:
- High cache miss rate: Shows which commits are not cached
- Progress bar issues: Displays batch processing details
- Performance problems: Shows detailed timing information
For programmatic access to cache features:
from gitflow_analytics.core.cache import GitAnalysisCache
from pathlib import Path
# Initialize cache
cache = GitAnalysisCache(Path(".gitflow-cache"))
# Get statistics
stats = cache.get_cache_stats()
print(f"Hit rate: {stats['hit_rate_percent']:.1f}%")
print(f"Database size: {stats['database_size_mb']:.1f} MB")
# Validate cache
validation = cache.validate_cache()
if not validation["is_valid"]:
    print("Cache validation failed:", validation["issues"])
# Warm cache
repo_paths = ["/path/to/repo1", "/path/to/repo2"]
warming_result = cache.warm_cache(repo_paths, weeks=4)
print(f"Cached {warming_result['commits_cached']} new commits")

cache:
  ttl_hours: 336  # 2 weeks retention
  directory: .gitflow-cache
analysis:
  # Optimize for speed
  exclude_paths:
    - "*.log"
    - "node_modules/*"
    - ".git/*"

cache:
  ttl_hours: 0  # Never expire (for build reproducibility)
  directory: /cache/gitflow  # Persistent cache location

cache:
  ttl_hours: 24  # Short retention for active development
  directory: .dev-cache

Typical performance improvements with caching:
| Repository Size | First Run | Cached Run | Improvement |
|---|---|---|---|
| Small (100 commits) | 5s | 1s | 5x faster |
| Medium (1K commits) | 30s | 3s | 10x faster |
| Large (10K commits) | 300s | 15s | 20x faster |
Cache warming adds initial overhead but provides consistent fast performance:
| Operation | Time | Cache Hit Rate |
|---|---|---|
| Cold cache | 100s | 0% |
| Warm cache | 120s | 0% (warming) |
| Subsequent runs | 5s | 95%+ |
The break-even point is typically after 2-3 runs on the same dataset.