
Swappable LLM Architectures: Building Flexible AI Systems
In today's fast-evolving AI landscape, new models emerge daily, pricing structures change, and performance improvements happen constantly.
I'm always wondering how to structure code and build apps or scripts in a way that's easy to manage, test, and modify. With AI changing literally every day, it's crucial to have the flexibility to test new models, explore cheaper options, and adapt quickly to the evolving ecosystem. This post explores an architecture that lets you easily switch between different LLM providers like OpenAI (ChatGPT), Anthropic (Claude), and Google (Gemini) without rewriting your entire codebase.
The Problem
I, like many developers, usually start like this:
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}]
)
This works, but there are some drawbacks:
- You're tied to one AI company
- Switching to a different AI means changing code everywhere
- It's hard to test the script without calling the real AI
- If the AI service goes down, so does your app
- ...
A Simple Solution
The key is to create a set of rules that all AI models will follow.
Step 1: Create a Basic Template
First, define a common interface that all LLM providers must implement:
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def generate_text(self, prompt, **kwargs):
        pass

    @abstractmethod
    def embed_text(self, text, **kwargs):
        pass
Step 2: Create Specific Versions for Each AI
Next, we create specific versions for each AI model:
class OpenAIProvider(LLMProvider):
    def __init__(self, api_key, model="gpt-4"):
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def generate_text(self, prompt, **kwargs):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return response.choices[0].message.content

    def embed_text(self, text, **kwargs):
        # Code for OpenAI embeddings
        pass
class ClaudeProvider(LLMProvider):
    def __init__(self, api_key, model="claude-3-opus-20240229"):
        from anthropic import Anthropic
        self.client = Anthropic(api_key=api_key)
        self.model = model

    def generate_text(self, prompt, **kwargs):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.pop("max_tokens", 1024),  # required by the Anthropic Messages API
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return response.content[0].text

    def embed_text(self, text, **kwargs):
        # Code for Claude embeddings
        pass
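The configuration and examples later in this post also reference a Gemini provider, which isn't defined above. A possible sketch using Google's google-generativeai SDK (exact call names may vary between SDK versions) could look like this:

class GeminiProvider(LLMProvider):
    def __init__(self, api_key, model="gemini-pro"):
        import google.generativeai as genai
        genai.configure(api_key=api_key)
        self.client = genai.GenerativeModel(model)
        self.model = model

    def generate_text(self, prompt, **kwargs):
        response = self.client.generate_content(prompt, **kwargs)
        return response.text

    def embed_text(self, text, **kwargs):
        # Code for Gemini embeddings
        pass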
Step 3: Create a Service That Uses These Providers
Then we create a service class that can use any of these AI providers:
class LLMService:
    def __init__(self, provider=None):
        self.provider = provider

    def set_provider(self, provider):
        self.provider = provider

    def generate_text(self, prompt, **kwargs):
        if not self.provider:
            raise ValueError("LLM provider not set")
        return self.provider.generate_text(prompt, **kwargs)

    def embed_text(self, text, **kwargs):
        if not self.provider:
            raise ValueError("LLM provider not set")
        return self.provider.embed_text(text, **kwargs)
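With these pieces in place, swapping models becomes a one-line change. A quick sketch (the API keys here are placeholders):

service = LLMService(OpenAIProvider(api_key="YOUR_OPENAI_API_KEY"))
print(service.generate_text("Tell me a joke"))

# Switch to Claude without touching the rest of the code
service.set_provider(ClaudeProvider(api_key="YOUR_ANTHROPIC_API_KEY"))
print(service.generate_text("Tell me a joke"))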
Even More Flexibility? - Settings Files
It's always good to keep AI settings in a separate configuration file:
# config.py
config = {
    "default_provider": "openai",
    "providers": {
        "openai": {
            "api_key": "YOUR_OPENAI_API_KEY",
            "model": "gpt-4"
        },
        "claude": {
            "api_key": "YOUR_ANTHROPIC_API_KEY",
            "model": "claude-3-opus-20240229"
        },
        "gemini": {
            "api_key": "YOUR_GEMINI_API_KEY",
            "model": "gemini-pro"
        }
    }
}
This way, you can change AIs by just changing a setting.
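The example script below imports an LLMProviderFactory from llm_service, which isn't shown above. A minimal sketch of such a factory, assuming the provider classes live in the same module, could look like this:

class LLMProviderFactory:
    """Builds the right provider class from a provider name in the config."""

    @staticmethod
    def create_provider(provider_name, api_key, model=None):
        if provider_name == "openai":
            return OpenAIProvider(api_key, model or "gpt-4")
        if provider_name == "claude":
            return ClaudeProvider(api_key, model or "claude-3-opus-20240229")
        if provider_name == "gemini":
            return GeminiProvider(api_key, model or "gemini-pro")  # sketch from earlier
        raise ValueError(f"Unknown provider: {provider_name}")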
This architecture also simplifies testing. You can easily create mock provider classes to simulate API responses, enabling thorough unit testing without relying on live API calls.
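For instance, a minimal mock provider for unit tests (the class name here is just illustrative) could look like this:

class MockProvider(LLMProvider):
    """Fake provider for unit tests - no network calls, fully predictable."""

    def generate_text(self, prompt, **kwargs):
        return f"mock response to: {prompt}"

    def embed_text(self, text, **kwargs):
        return [0.0] * 8  # fixed-size dummy embedding

# In a test:
service = LLMService(MockProvider())
assert service.generate_text("hi") == "mock response to: hi"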
Here is an example of how to use this setup by switching providers or even using several at the same time (thanks, Claude =) ):
# example_multi_llm.py - Example of using multiple LLMs in one application
from llm_service import LLMService, LLMProviderFactory
from config import config
import os

def create_provider(provider_name):
    """Helper function to create a provider from config"""
    provider_config = config["providers"][provider_name]
    api_key = os.environ.get(f"{provider_name.upper()}_API_KEY", provider_config["api_key"])
    return LLMProviderFactory.create_provider(
        provider_name,
        api_key,
        provider_config.get("model")
    )

# Example 1: Switching between providers based on the task
def task_based_provider_selection(task_type, prompt):
    """Select different LLMs based on the task type"""
    # Initialize services for different providers
    services = {}

    # Map tasks to the best provider for each task
    task_provider_map = {
        "creative_writing": "claude",   # Claude is good at creative writing
        "code_generation": "openai",    # OpenAI might be better for code
        "quick_response": "gemini",     # Gemini might be faster for simple queries
        "default": config["default_provider"]
    }

    # Get the best provider for this task
    provider_name = task_provider_map.get(task_type, task_provider_map["default"])

    # Create the provider if we haven't already
    if provider_name not in services:
        provider = create_provider(provider_name)
        services[provider_name] = LLMService(provider)

    # Generate the response using the selected provider
    return services[provider_name].generate_text(prompt)

# Example 2: Using multiple LLMs for consensus or comparison
def multi_llm_consensus(prompt, providers=["openai", "claude", "gemini"]):
    """Get responses from multiple LLMs for comparison or consensus"""
    results = {}

    for provider_name in providers:
        try:
            # Create provider and service
            provider = create_provider(provider_name)
            service = LLMService(provider)

            # Get response
            response = service.generate_text(prompt)
            results[provider_name] = response

            print(f"\n--- {provider_name.upper()} RESPONSE ---")
            print(response)
        except Exception as e:
            print(f"Error with {provider_name}: {e}")
            results[provider_name] = f"Error: {str(e)}"

    return results

# Example 3: Fallback chain - try providers in sequence until one works
def fallback_chain(prompt, provider_order=["openai", "claude", "gemini"]):
    """Try multiple providers in sequence until one succeeds"""
    last_error = None

    for provider_name in provider_order:
        try:
            # Create provider and service
            provider = create_provider(provider_name)
            service = LLMService(provider)

            # Get response
            response = service.generate_text(prompt)
            print(f"Successfully used {provider_name}")
            return response
        except Exception as e:
            print(f"Provider {provider_name} failed: {e}")
            last_error = e

    # If we get here, all providers failed
    raise Exception(f"All providers failed. Last error: {last_error}")

# Example 4: A/B testing between models
def ab_test_providers(prompt, num_iterations=5):
    """Run A/B test between providers to compare performance"""
    import time
    import random

    results = {
        "openai": {"times": [], "responses": []},
        "claude": {"times": [], "responses": []}
    }

    for i in range(num_iterations):
        # Randomly choose provider order to avoid bias
        providers = list(results.keys())
        random.shuffle(providers)

        for provider_name in providers:
            try:
                # Create provider and service
                provider = create_provider(provider_name)
                service = LLMService(provider)

                # Time the response
                start_time = time.time()
                response = service.generate_text(prompt)
                end_time = time.time()

                # Record results
                results[provider_name]["times"].append(end_time - start_time)
                results[provider_name]["responses"].append(response)
            except Exception as e:
                print(f"Error with {provider_name}: {e}")

    # Calculate average response times
    for provider_name in results:
        if results[provider_name]["times"]:
            avg_time = sum(results[provider_name]["times"]) / len(results[provider_name]["times"])
            print(f"{provider_name} average response time: {avg_time:.2f} seconds")

    return results

# Example usage
if __name__ == "__main__":
    # Example 1: Task-based selection
    creative_response = task_based_provider_selection(
        "creative_writing",
        "Write a short poem about coding"
    )
    print(f"Creative writing response: {creative_response}\n")

    # Example 2: Multi-LLM consensus
    consensus_results = multi_llm_consensus(
        "What is the best programming language for beginners?"
    )

    # Example 3: Fallback chain
    fallback_response = fallback_chain(
        "Explain quantum computing in simple terms"
    )
    print(f"Fallback chain response: {fallback_response}\n")

    # Example 4: A/B testing
    ab_test_results = ab_test_providers(
        "Summarize the key benefits of clean code"
    )
Why This Approach Is Better
This approach gives you many benefits:
- Easy switching: Change AI models without changing your code
- Use multiple AIs: Use different AIs for different tasks based on what they're good at
- Backup plans: If one AI is down, switch to another
- Testing: Compare answers from different AIs
- Save money: Choose AIs based on price and performance
- Future-proof: Easily add new AIs as they come out
Wrap-Up
As AI keeps changing quickly, building apps that can easily switch between different AI providers isn't just nice to have - it's necessary. With this approach, it's straightforward to switch AI models and test new ones without significant code changes.
Which approach are you using for integrating LLMs into your applications? What has worked well for you?
