LLM Routing
Pattern: Directing requests to different language models based on task requirements, cost, or availability.
LLM routing is the practice of sending different requests to different models based on criteria like task complexity, cost, latency, or model capabilities. Instead of using one model for everything, a router selects the best model for each request.
Common routing strategies include capability-based routing (e.g., GPT-4 for reasoning, Claude for analysis), cost-based routing (smaller models for simple tasks), fallback chains (try the preferred model, fall back to alternatives on failure), and load balancing across providers.
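A capability- or cost-based router can be as simple as a rule that inspects the request before dispatch. The sketch below uses hypothetical model names and a crude complexity heuristic (keyword markers and prompt length); real routers often use a classifier model or per-task configuration instead.

```python
def route(prompt: str) -> str:
    """Pick a model name based on a crude task-complexity heuristic.

    "large-model" and "small-model" are placeholder names, not real model IDs.
    """
    reasoning_markers = ("prove", "step by step", "analyze", "plan")
    text = prompt.lower()
    # Capability/cost trade-off: send harder or longer prompts to the big model,
    # everything else to the cheaper one.
    if any(marker in text for marker in reasoning_markers) or len(prompt) > 500:
        return "large-model"
    return "small-model"
```

For example, `route("What is 2+2?")` returns `"small-model"`, while `route("Analyze this contract step by step")` returns `"large-model"`.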
OpenRouter provides API-level routing across dozens of models. SUBCORP uses a models array for native fallback routing — if the primary model fails or returns an empty response, the system automatically tries the next model in the list. This keeps requests flowing during provider outages without manual intervention.
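The fallback-chain behavior described above can be sketched generically: walk an ordered list of models, treat exceptions and empty responses as failures, and return the first usable result. The function names and the `call` callback here are illustrative, not SUBCORP's or OpenRouter's actual API.

```python
from typing import Callable, Tuple

def call_with_fallback(models: list[str], call: Callable[[str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model, response) for the first success.

    A response is considered a failure if the call raises OR returns an
    empty string, mirroring the "fails or returns empty" rule above.
    """
    last_error: Exception | None = None
    for model in models:
        try:
            response = call(model)
            if response:  # empty responses count as failures
                return model, response
        except Exception as exc:
            last_error = exc  # remember the error, keep trying the next model
    raise RuntimeError("all models in the fallback chain failed") from last_error
```

With a chain like `["primary", "backup-1", "backup-2"]`, a timeout on the primary and an empty reply from the first backup would transparently route the request to the second backup.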