A lightweight Python library for automatic caching of LLM API responses. Reduce costs and improve response times by caching repeated API calls to OpenAI, Anthropic, Google Gemini, and other LLM providers.
ai-cache transparently intercepts LLM API calls and caches responses locally using SQLite. When the same request is made again, the cached response is returned instantly without making an actual API call. This saves time, reduces costs, and enables offline development.
- Automatic caching - No code changes required, works with existing applications
- Multi-provider support - Compatible with OpenAI, Anthropic, and Google Gemini APIs
- Local storage - All data stored in SQLite database on your machine
- Zero dependencies - Built using Python standard library only
- Configurable expiration - Optional TTL (time-to-live) for cache entries
- Cache management - Clear, invalidate, and monitor cache statistics
- Privacy-focused - No data leaves your machine
```bash
pip install ai-cache
```

```python
import ai_cache
# Enable caching globally
ai_cache.enable()
# Use any supported LLM API as normal
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is Python?"}]
)

# Subsequent identical calls return instantly from cache
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is Python?"}]
)
```

```python
import ai_cache
# Enable with default settings (cache stored in ~/.ai-cache/)
ai_cache.enable()
# Enable with custom cache directory
ai_cache.enable(cache_dir="./my_cache")
# Enable with TTL (cache expires after 1 hour)
ai_cache.enable(ttl=3600)
# Combine options
ai_cache.enable(cache_dir="./cache", ttl=7200)
```

```python
# Check if caching is enabled
is_active = ai_cache.is_enabled()
# Get cache statistics
stats = ai_cache.get_stats()
print(f"Cache hits: {stats['hits']}")
print(f"Cache misses: {stats['misses']}")
print(f"Hit rate: {stats['hit_rate']}")
print(f"Total entries: {stats['total_entries']}")
# Clear all cached entries
ai_cache.clear()
# Invalidate cache by provider
ai_cache.invalidate(provider="openai")
# Invalidate cache by model
ai_cache.invalidate(model="gpt-4")
# Invalidate specific provider and model combination
ai_cache.invalidate(provider="openai", model="gpt-4")
# Disable caching
ai_cache.disable()
```

Compatible with both legacy and modern OpenAI API versions.
```python
import ai_cache
import openai
ai_cache.enable()
# Legacy API (openai < 1.0.0)
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello"}]
)
# Modern API (openai >= 1.0.0)
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
import ai_cache
from anthropic import Anthropic
ai_cache.enable()
client = Anthropic(api_key="your-api-key")
message = client.messages.create(
    model="claude-3-sonnet-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
```

```python
import ai_cache
import google.generativeai as genai
ai_cache.enable()
genai.configure(api_key="your-api-key")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("What is machine learning?")
```

Each API request is fingerprinted using a SHA-256 hash of:
- Provider name (e.g., "openai", "anthropic")
- Model identifier (e.g., "gpt-4", "claude-3")
- Request parameters (messages, temperature, max_tokens, etc.)
Two requests are considered identical only if all components match exactly.
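A minimal sketch of how such a fingerprint could be computed is shown below; the exact serialization ai-cache uses internally may differ, and `make_fingerprint` is an illustrative name, not part of the library's API.

```python
import hashlib
import json

def make_fingerprint(provider: str, model: str, params: dict) -> str:
    """Illustrative fingerprint: hash provider, model, and request parameters."""
    # Sort keys so logically identical requests serialize to identical JSON.
    payload = json.dumps(
        {"provider": provider, "model": model, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = make_fingerprint(
    "openai",
    "gpt-4",
    {"messages": [{"role": "user", "content": "What is Python?"}], "temperature": 0},
)
print(key)  # 64-character hex digest used as the cache key
```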
- Database: SQLite database stored locally
- Default location: `~/.ai-cache/cache.db`
- Schema: indexed table with fingerprint, provider, model, response, and timestamps
- Thread safety: SQLite handles concurrent access automatically
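The library does not document its exact schema, but a table along these lines would hold the fields listed above (the column and index names here are assumptions for illustration, not ai-cache's real schema):

```python
import sqlite3

# Hypothetical schema matching the fields described above; the library's
# real column and index names may differ.
conn = sqlite3.connect("cache.db")
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS cache (
        fingerprint TEXT PRIMARY KEY,   -- SHA-256 request fingerprint
        provider    TEXT NOT NULL,      -- e.g. 'openai'
        model       TEXT NOT NULL,      -- e.g. 'gpt-4'
        response    TEXT NOT NULL,      -- serialized API response
        created_at  REAL NOT NULL       -- Unix timestamp, used for TTL checks
    );
    CREATE INDEX IF NOT EXISTS idx_cache_provider_model
        ON cache (provider, model);
    """
)
conn.close()
```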
When TTL is configured:
- Entries expire after specified number of seconds
- Expired entries are deleted automatically on access
- No background cleanup processes
Without TTL:
- Entries never expire automatically
- Manual invalidation or clearing required
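Conceptually, the lazy expiry described above amounts to a lookup that compares an entry's age against the TTL and deletes it on the spot. The sketch below illustrates this against the hypothetical table from the previous example; it is not the library's actual code.

```python
import sqlite3
import time

def lookup(conn: sqlite3.Connection, fingerprint: str, ttl):
    """Illustrative lazy expiry against the hypothetical table sketched above."""
    row = conn.execute(
        "SELECT response, created_at FROM cache WHERE fingerprint = ?",
        (fingerprint,),
    ).fetchone()
    if row is None:
        return None  # cache miss: nothing stored for this request
    response, created_at = row
    if ttl is not None and time.time() - created_at > ttl:
        # Entry is older than the TTL: delete it on access and report a miss.
        conn.execute("DELETE FROM cache WHERE fingerprint = ?", (fingerprint,))
        conn.commit()
        return None
    return response  # cache hit: return the stored response
```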
The library uses monkey patching to intercept API client methods:
- Original methods are preserved and restored on disable
- Interception happens transparently without modifying your code
- If a provider library is not installed, its interceptor is silently skipped
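In spirit, this is the standard save-wrap-restore pattern sketched below; the `_originals` registry and the `patch`/`unpatch` helpers are invented for illustration and are not part of ai-cache.

```python
import functools

_originals = {}  # hypothetical registry of patched methods, keyed by (owner, name)

def patch(owner, method_name, cache_lookup, cache_store):
    """Replace owner.method_name with a caching wrapper, keeping the original."""
    original = getattr(owner, method_name)
    _originals[(owner, method_name)] = original

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        key = repr((method_name, args, sorted(kwargs.items())))  # simplified key
        cached = cache_lookup(key)
        if cached is not None:
            return cached              # cache hit: skip the real API call
        result = original(*args, **kwargs)
        cache_store(key, result)       # cache miss: call through, then store
        return result

    setattr(owner, method_name, wrapper)

def unpatch():
    """Restore every original method, as disable() is described as doing."""
    for (owner, method_name), original in _originals.items():
        setattr(owner, method_name, original)
    _originals.clear()
```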
- Development and Testing: Speed up development by avoiding repeated API calls during testing
- Prompt Engineering: Iterate on prompts without incurring costs for unchanged requests
- Batch Processing: Run evaluations or benchmarks with automatic caching (see the sketch after this list)
- Offline Development: Work with cached responses when internet is unavailable
- Cost Optimization: Reduce API costs in production for frequently repeated queries
- Demo Applications: Build demos that work reliably without exhausting API quotas
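For example, a small evaluation loop over a list of prompts might look like the following; the model, prompts, and cache directory are placeholders, and OpenAI credentials are assumed to be configured.

```python
import ai_cache
from openai import OpenAI

ai_cache.enable(cache_dir="./eval_cache")  # cache directory chosen for this run
client = OpenAI()

prompts = ["Summarize quicksort.", "Summarize mergesort.", "Summarize heapsort."]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content[:80])

# Re-running the script repeats identical requests, so every call after the
# first run is answered from the local cache instead of the API.
stats = ai_cache.get_stats()
print(stats["hits"], stats["misses"])
```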
- Cache hits: Sub-millisecond response times (SQLite lookup)
- Cache misses: Original API latency + minimal overhead (~1ms for fingerprinting and storage)
- Storage: Minimal disk usage, approximately 1-10KB per cached response
- Memory: No in-memory cache, all data persisted to disk
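If you want to check these numbers on your own setup, timing the same request twice gives a rough picture; actual figures depend on your network, model, and disk (OpenAI credentials are assumed to be configured):

```python
import time

import ai_cache
from openai import OpenAI

ai_cache.enable()
client = OpenAI()

def timed_call():
    start = time.perf_counter()
    client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What is Python?"}],
    )
    return time.perf_counter() - start

first = timed_call()   # cache miss: full API round trip plus small overhead
second = timed_call()  # cache hit: local SQLite lookup
print(f"miss: {first:.3f}s, hit: {second:.6f}s")
```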
- Streaming responses: Currently not supported
- Non-deterministic requests: identical requests with temperature > 0 share one cache key, so the cached response is returned even though the live API would have produced varied outputs
- Parameter sensitivity: any change in parameters, however small, creates a new cache entry (see the example after this list)
- Binary responses: Image generation and similar APIs may not cache correctly
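For example, assuming OpenAI credentials are configured, the first two calls below share a cache key while the third does not:

```python
import ai_cache
from openai import OpenAI

ai_cache.enable()
client = OpenAI()
messages = [{"role": "user", "content": "Name a prime number."}]

# Identical model, messages, and temperature: both calls share one cache key,
# so the second call returns the first call's stored response.
client.chat.completions.create(model="gpt-4", messages=messages, temperature=0.9)
client.chat.completions.create(model="gpt-4", messages=messages, temperature=0.9)

# Changing any parameter (here, temperature) yields a different fingerprint,
# so this call is a cache miss and goes to the API again.
client.chat.completions.create(model="gpt-4", messages=messages, temperature=0.2)

print(ai_cache.get_stats())  # expect 1 hit and 2 misses from the calls above
```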
Enable LLM response caching.
Parameters:
- `cache_dir` (str, optional): Directory for the cache database. Default: `~/.ai-cache/`
- `ttl` (int, optional): Time-to-live in seconds for cache entries. Default: `None` (no expiration)
Returns: None
Disable LLM response caching and restore original API methods.
Returns: None
Clear all cached responses from the database.
Raises: RuntimeError if cache is not enabled.
Returns: None
Get cache statistics.
Returns: Dictionary containing:
- `hits` (int): Number of cache hits
- `misses` (int): Number of cache misses
- `hit_rate` (str): Hit rate as a percentage
- `total_entries` (int): Total cached entries in the database
Raises: RuntimeError if cache is not enabled.
Check if caching is currently enabled.
Returns: bool - True if enabled, False otherwise
Invalidate cache entries by provider and/or model.
Parameters:
- `provider` (str, optional): Provider name (e.g., 'openai', 'anthropic')
- `model` (str, optional): Model name (e.g., 'gpt-4', 'claude-3')
Raises: RuntimeError if cache is not enabled.
Returns: None
Issue: Cache not working
- Ensure `ai_cache.enable()` is called before making API calls
- Verify the provider library is installed (e.g., `pip install openai`)
Issue: Different responses on cache hit
- The cache returns the exact stored response; check that the request parameters match the original request exactly
- Temperature and random seed affect cache keys
Issue: Disk space concerns
- Monitor cache size: `ls -lh ~/.ai-cache/cache.db`
- Clear periodically: `ai_cache.clear()`
- Configure a TTL to auto-expire old entries
Issue: Permission errors
- Ensure write permissions on cache directory
- Use a custom directory: `ai_cache.enable(cache_dir="./cache")`
Contributions are welcome. Please submit issues and pull requests on GitHub.
MIT License - see LICENSE file for details.