I built llmswap to solve a problem I kept hitting in hackathons - burning through API credits while testing the same prompts repeatedly during development.
It's a simple Python package that provides a unified interface for OpenAI, Anthropic, Google Gemini, and local models (Ollama), with built-in response caching that can
cut API costs by 50-90%.
Key features:
- Intelligent caching with TTL and memory limits
- Context-aware caching for multi-user apps
- Auto-fallback between providers when one fails
- Zero configuration - works with environment variables
from llmswap import LLMClient
client = LLMClient(cache_enabled=True)
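# First call goes out to the provider API and is billed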
response = client.query("Explain quantum computing")
# Second identical query returns from cache instantly (free)
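The auto-fallback is conceptually simple: try each configured provider in order and return the first response that succeeds. A rough sketch of the idea (illustrative only, not llmswap's actual code):

def query_with_fallback(prompt, provider_calls):
    # Conceptual sketch of auto-fallback, not llmswap's internals.
    # provider_calls is a list of (name, callable) pairs, e.g. thin
    # wrappers around the OpenAI / Anthropic / Gemini / Ollama SDKs.
    errors = []
    for name, call in provider_calls:
        try:
            return call(prompt)
        except Exception as exc:  # rate limit, auth failure, outage, ...
            errors.append((name, exc))
    raise RuntimeError(f"All providers failed: {errors}")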
Caching is disabled by default as a security precaution. When enabled, the cache is thread-safe and isolates entries by context for multi-user applications.
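Context isolation boils down to keying the cache on both the prompt and a per-user or per-session context, so two users asking the same question never share a cached entry. Another illustrative sketch of the idea, not the library's implementation:

import hashlib
import json

_cache = {}

def cached_query(prompt, context, call_llm):
    # The cache key covers the prompt AND the caller's context
    # (user id, session, tenant, ...), so entries stay isolated.
    key = hashlib.sha256(
        json.dumps({"prompt": prompt, "context": context}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # only billed on a cache miss
    return _cache[key]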
Built this from components of a hackathon project. Already at 2.2k downloads on PyPI. Hope it helps others save on API costs during development.
GitHub: https://github.com/sreenathmmenon/llmswap
PyPI: https://pypi.org/project/llmswap/