Part 4: Production Patterns and Best Practices
My $2000 Lesson
Caching Strategies
Basic Response Caching
import hashlib
import json
from functools import lru_cache
from typing import Optional
def generate_cache_key(messages: list, model: str, temperature: float) -> str:
"""Generate cache key from request parameters."""
# Create deterministic string
cache_data = {
"messages": messages,
"model": model,
"temperature": temperature
}
# Hash for compact key
data_str = json.dumps(cache_data, sort_keys=True)
return hashlib.sha256(data_str.encode()).hexdigest()
# Simple in-memory cache
response_cache: dict = {}
async def cached_claude_call(messages: list, model: str, temperature: float):
"""Call Claude with caching."""
# Generate cache key
cache_key = generate_cache_key(messages, model, temperature)
# Check cache
if cache_key in response_cache:
logger.info(f"Cache hit: {cache_key[:8]}")
return response_cache[cache_key]
# Call API
logger.info(f"Cache miss: {cache_key[:8]}")
response = await call_claude_async(messages, model, temperature)
# Store in cache
response_cache[cache_key] = response
return responseRedis Caching (Production)
Semantic Caching
Circuit Breakers
Basic Circuit Breaker
Production Circuit Breaker with Fallback
Prompt Versioning
Prompt Version Manager
Cost Optimization
Token Budget Manager
Model Selection Strategy
Monitoring and Observability
Metrics Collection
Best Practices Summary
What's Next?
Last updated