
Swappable LLM Architectures: Building Flexible AI Systems


In today's fast-evolving AI landscape, new models emerge daily, pricing structures change, and performance improvements happen constantly.

I'm always wondering how to structure code and build apps or scripts in a way that's easy to manage, test, and modify. With AI changing literally every day, it's crucial to have the flexibility to test new models, explore cheaper options, and adapt quickly to the evolving ecosystem. This post explores an architecture that lets you easily switch between different LLM providers like OpenAI (ChatGPT), Anthropic (Claude), and Google (Gemini) without rewriting your entire codebase.

The Problem

I, like many developers, usually start like this:

from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Tell me a joke"}]
)
print(response.choices[0].message.content)

This works, but it comes with some drawbacks:

- You're tied to a single AI company
- Switching to a different AI means changing code everywhere
- It's hard to test the script without calling the real AI
- If the AI service goes down, your app goes down with it
- ...

A Simple Solution

The key is to define a common set of rules (an interface) that every AI provider must follow.

Step 1: Create a Basic Template

First, define a common interface that all LLM providers must implement:

from abc import ABC, abstractmethod

class LLMProvider(ABC):
    """Common interface that every LLM provider must implement."""

    @abstractmethod
    def generate_text(self, prompt, **kwargs):
        """Generate a text completion for the given prompt."""
        pass

    @abstractmethod
    def embed_text(self, text, **kwargs):
        """Return an embedding vector for the given text."""
        pass

Step 2: Create Specific Versions for Each AI

Next, we create specific versions for each AI model:

class OpenAIProvider(LLMProvider):
    def __init__(self, api_key, model="gpt-4"):
        from openai import OpenAI
        self.client = OpenAI(api_key=api_key)
        self.model = model

    def generate_text(self, prompt, **kwargs):
        response = self.client.chat.completions.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return response.choices[0].message.content

    def embed_text(self, text, **kwargs):
        # OpenAI embeddings endpoint (e.g. text-embedding-3-small)
        response = self.client.embeddings.create(
            model="text-embedding-3-small", input=text, **kwargs
        )
        return response.data[0].embedding


class ClaudeProvider(LLMProvider):
    def __init__(self, api_key, model="claude-3-opus-20240229"):
        from anthropic import Anthropic
        self.client = Anthropic(api_key=api_key)
        self.model = model

    def generate_text(self, prompt, **kwargs):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=kwargs.pop("max_tokens", 1024),  # required by the Anthropic API
            messages=[{"role": "user", "content": prompt}],
            **kwargs
        )
        return response.content[0].text

    def embed_text(self, text, **kwargs):
        # Anthropic doesn't offer a native embeddings endpoint;
        # delegate to another provider (e.g. OpenAI or Voyage AI) instead
        raise NotImplementedError("Anthropic has no embeddings API")
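
The config and examples further down also reference Gemini. It isn't defined in the snippets above, but a provider for it can follow the same pattern; here's a minimal sketch assuming the google-generativeai package:

class GeminiProvider(LLMProvider):
    def __init__(self, api_key, model="gemini-pro"):
        import google.generativeai as genai
        genai.configure(api_key=api_key)
        self.genai = genai
        self.model = genai.GenerativeModel(model)

    def generate_text(self, prompt, **kwargs):
        response = self.model.generate_content(prompt, **kwargs)
        return response.text

    def embed_text(self, text, **kwargs):
        # Gemini exposes embeddings through genai.embed_content
        result = self.genai.embed_content(
            model="models/text-embedding-004", content=text
        )
        return result["embedding"]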

Step 3: Create a Service That Uses These Providers

Then we create a service class that can use any of these AI providers:

class LLMService:
    def __init__(self, provider=None):
        self.provider = provider

    def set_provider(self, provider):
        self.provider = provider

    def generate_text(self, prompt, **kwargs):
        if not self.provider:
            raise ValueError("LLM provider not set")
        return self.provider.generate_text(prompt, **kwargs)

    def embed_text(self, text, **kwargs):
        if not self.provider:
            raise ValueError("LLM provider not set")
        return self.provider.embed_text(text, **kwargs)
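
For illustration, swapping providers at runtime then looks like this (the API keys are placeholders):

# Start with OpenAI
service = LLMService(OpenAIProvider(api_key="YOUR_OPENAI_API_KEY"))
print(service.generate_text("Tell me a joke"))

# Swap to Claude without touching the rest of the code
service.set_provider(ClaudeProvider(api_key="YOUR_ANTHROPIC_API_KEY"))
print(service.generate_text("Tell me a joke"))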

Even More Flexibility? - Settings Files

It's always good to keep AI settings in a separate configuration file:

# config.py
config = {
    "default_provider": "openai",
    "providers": {
        "openai": {
            "api_key": "YOUR_OPENAI_API_KEY",
            "model": "gpt-4"
        },
        "claude": {
            "api_key": "YOUR_ANTHROPIC_API_KEY",
            "model": "claude-3-opus-20240229"
        },
        "gemini": {
            "api_key": "YOUR_GEMINI_API_KEY",
            "model": "gemini-pro"
        }
    }
}

This way, you can change AIs by just changing a setting.
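
The example further below imports an LLMProviderFactory that isn't shown above. A minimal sketch of what it might look like, living in llm_service.py and mapping the provider names from the config to the classes we defined:

class LLMProviderFactory:
    """Maps a provider name from the config to the matching provider class."""

    @staticmethod
    def create_provider(provider_name, api_key, model=None):
        providers = {
            "openai": OpenAIProvider,
            "claude": ClaudeProvider,
            "gemini": GeminiProvider,
        }
        if provider_name not in providers:
            raise ValueError(f"Unknown provider: {provider_name}")
        provider_cls = providers[provider_name]
        # Pass the model only if the config specifies one; otherwise keep the class default
        if model:
            return provider_cls(api_key, model=model)
        return provider_cls(api_key)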

This architecture also simplifies testing. You can easily create mock provider classes to simulate API responses, enabling thorough unit testing without relying on live API calls.
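
For example, a mock provider for unit tests might look like this (a minimal sketch):

class MockProvider(LLMProvider):
    """Fake provider for tests: returns canned responses, makes no API calls."""

    def generate_text(self, prompt, **kwargs):
        return f"mock response for: {prompt}"

    def embed_text(self, text, **kwargs):
        return [0.0, 0.1, 0.2]

# In a test, inject the mock instead of a real provider
def test_generate_text():
    service = LLMService(MockProvider())
    assert service.generate_text("hello") == "mock response for: hello"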

Here is an example of how to use this setup, switching between providers or using several at the same time (thanks, Claude =) ):

# example_multi_llm.py - Example of using multiple LLMs in one application

from llm_service import LLMService, LLMProviderFactory
from config import config
import os

def create_provider(provider_name):
    """Helper function to create a provider from config"""
    provider_config = config["providers"][provider_name]
    api_key = os.environ.get(f"{provider_name.upper()}_API_KEY", provider_config["api_key"])

    return LLMProviderFactory.create_provider(
        provider_name,
        api_key,
        provider_config.get("model")
    )

# Example 1: Switching between providers based on the task
def task_based_provider_selection(task_type, prompt):
    """Select different LLMs based on the task type"""

    # Initialize services for different providers
    services = {}

    # Map tasks to the best provider for each task
    task_provider_map = {
        "creative_writing": "claude",  # Claude is good at creative writing
        "code_generation": "openai",   # OpenAI might be better for code
        "quick_response": "gemini",    # Gemini might be faster for simple queries
        "default": config["default_provider"]
    }

    # Get the best provider for this task
    provider_name = task_provider_map.get(task_type, task_provider_map["default"])

    # Create the provider if we haven't already
    if provider_name not in services:
        provider = create_provider(provider_name)
        services[provider_name] = LLMService(provider)

    # Generate the response using the selected provider
    return services[provider_name].generate_text(prompt)

# Example 2: Using multiple LLMs for consensus or comparison
def multi_llm_consensus(prompt, providers=["openai", "claude", "gemini"]):
    """Get responses from multiple LLMs for comparison or consensus"""

    results = {}

    for provider_name in providers:
        try:
            # Create provider and service
            provider = create_provider(provider_name)
            service = LLMService(provider)

            # Get response
            response = service.generate_text(prompt)
            results[provider_name] = response

            print(f"\n--- {provider_name.upper()} RESPONSE ---")
            print(response)

        except Exception as e:
            print(f"Error with {provider_name}: {e}")
            results[provider_name] = f"Error: {str(e)}"

    return results

# Example 3: Fallback chain - try providers in sequence until one works
def fallback_chain(prompt, provider_order=["openai", "claude", "gemini"]):
    """Try multiple providers in sequence until one succeeds"""

    last_error = None

    for provider_name in provider_order:
        try:
            # Create provider and service
            provider = create_provider(provider_name)
            service = LLMService(provider)

            # Get response
            response = service.generate_text(prompt)
            print(f"Successfully used {provider_name}")
            return response

        except Exception as e:
            print(f"Provider {provider_name} failed: {e}")
            last_error = e

    # If we get here, all providers failed
    raise Exception(f"All providers failed. Last error: {last_error}")

# Example 4: A/B testing between models
def ab_test_providers(prompt, num_iterations=5):
    """Run A/B test between providers to compare performance"""

    import time
    import random

    results = {
        "openai": {"times": [], "responses": []},
        "claude": {"times": [], "responses": []}
    }

    for i in range(num_iterations):
        # Randomly choose provider order to avoid bias
        providers = list(results.keys())
        random.shuffle(providers)

        for provider_name in providers:
            try:
                # Create provider and service
                provider = create_provider(provider_name)
                service = LLMService(provider)

                # Time the response
                start_time = time.time()
                response = service.generate_text(prompt)
                end_time = time.time()

                # Record results
                results[provider_name]["times"].append(end_time - start_time)
                results[provider_name]["responses"].append(response)

            except Exception as e:
                print(f"Error with {provider_name}: {e}")

    # Calculate average response times
    for provider_name in results:
        if results[provider_name]["times"]:
            avg_time = sum(results[provider_name]["times"]) / len(results[provider_name]["times"])
            print(f"{provider_name} average response time: {avg_time:.2f} seconds")

    return results

# Example usage
if __name__ == "__main__":
    # Example 1: Task-based selection
    creative_response = task_based_provider_selection(
        "creative_writing", 
        "Write a short poem about coding"
    )
    print(f"Creative writing response: {creative_response}\n")

    # Example 2: Multi-LLM consensus
    consensus_results = multi_llm_consensus(
        "What is the best programming language for beginners?"
    )

    # Example 3: Fallback chain
    fallback_response = fallback_chain(
        "Explain quantum computing in simple terms"
    )
    print(f"Fallback chain response: {fallback_response}\n")

    # Example 4: A/B testing
    ab_test_results = ab_test_providers(
        "Summarize the key benefits of clean code"
    )

Why This Approach Is Better

This approach gives you many benefits:

- Easy switching: Change AI models without changing your code
- Use multiple AIs: Use different AIs for different tasks based on what they're good at
- Backup plans: If one AI is down, switch to another
- Testing: Compare answers from different AIs
- Save money: Choose AIs based on price and performance
- Future-proof: Easily add new AIs as they come out

Wrap-Up

As AI keeps changing quickly, building apps that can easily switch between different AI providers isn't just nice to have - it's necessary. With this approach, it's straightforward to switch AI models and test new ones without significant code changes.

Which approach are you using for integrating LLMs into your applications? What has worked well for you?

Dmitry Golovach

Principal Network Engineer and AI enthusiast. Always learning, always building.