Architecture

Provider Routing - How Iris Picks the Right AI Model

January 25, 2026

Iris does not talk to one AI provider. It routes requests across multiple providers, picks the right model for the job, and fails over automatically when things go wrong.

The problem

Different models are good at different things. A thinking model is great for deep analysis but slow and expensive for simple tool calls. A fast model handles "remind me to..." efficiently but might struggle with complex research. And sometimes a provider just goes down.

You need a routing layer that can pick the right model, switch mid-conversation if needed, and recover from failures without the user noticing.

Strategy pattern

Iris uses a strategy pattern for routing. Each AI provider gets its own routing strategy that answers three questions: Does this strategy handle this provider? What actual provider and model should be used? And is this provider allowed in the current environment?
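The three questions map naturally onto a three-method interface. Here is a minimal sketch of what that could look like, using the Ollama strategy as the concrete example; the names (`ProviderStrategy`, `OllamaStrategy`, `llama3`) are illustrative assumptions, not Iris's actual classes.

```php
<?php
// Hypothetical three-question strategy interface. Names are illustrative.
interface ProviderStrategy
{
    // Does this strategy handle this provider?
    public function handles(string $provider): bool;

    // Which actual provider and model should be used?
    public function resolve(string $provider, ?string $model): array;

    // Is this provider allowed in the current environment?
    public function allowedIn(string $environment): bool;
}

// Local provider: allowed in development, blocked in production.
class OllamaStrategy implements ProviderStrategy
{
    public function handles(string $provider): bool
    {
        return $provider === 'ollama';
    }

    public function resolve(string $provider, ?string $model): array
    {
        return ['ollama', $model ?? 'llama3']; // default model is a placeholder
    }

    public function allowedIn(string $environment): bool
    {
        return $environment !== 'production';
    }
}
```

Because the environment check lives on the strategy itself, the "blocked in production" rule for Ollama is one line of code rather than a special case scattered through the pipeline.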

Three strategies exist today:

  • Chutes AI - the primary cloud provider. Resolves to whichever model is configured as the default (currently Kimi-K2.5-TEE). Always allowed, always available.
  • Ollama - the local provider. In development, it routes to a locally running model on localhost for offline work and experimentation. In production, it is blocked and falls back to the cloud provider automatically. Same tool support, same agent configuration - just a different model underneath.
  • Default - a catch-all for anything else. Passes the request through as-is.

The routing service evaluates strategies in order until one claims the request. First match wins. This is configured in config/ai.php, where each provider has its own block with API keys, base URLs, and default models for text, image, video, and vision.
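The first-match-wins evaluation can be sketched as a simple ordered scan. The `RoutingService` and `DefaultStrategy` names here are stand-ins for illustration; only the ordering behaviour comes from the post.

```php
<?php
// Sketch of first-match-wins routing over an ordered strategy list.
// Class names are illustrative, not Iris's real ones.
interface ProviderStrategy
{
    public function handles(string $provider): bool;
    public function resolve(string $provider, ?string $model): array;
    public function allowedIn(string $environment): bool;
}

// Catch-all: passes the request through as-is.
class DefaultStrategy implements ProviderStrategy
{
    public function handles(string $provider): bool { return true; }
    public function resolve(string $provider, ?string $model): array
    {
        return [$provider, $model];
    }
    public function allowedIn(string $environment): bool { return true; }
}

class RoutingService
{
    /** @param ProviderStrategy[] $strategies evaluated in order */
    public function __construct(
        private array $strategies,
        private string $environment,
    ) {}

    public function route(string $provider, ?string $model): array
    {
        foreach ($this->strategies as $strategy) {
            if (!$strategy->handles($provider)) {
                continue;
            }
            if (!$strategy->allowedIn($this->environment)) {
                // e.g. Ollama in production: fall through to a later
                // strategy, which in Iris resolves to the cloud provider.
                continue;
            }
            return $strategy->resolve($provider, $model); // first match wins
        }
        return [$provider, $model]; // nothing claimed it; pass through
    }
}
```

Keeping the catch-all strategy last means any unrecognised provider still gets a well-defined answer.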

Tool routing - different models for different jobs

Here is something subtle: not every message should go to the same model.

When the orchestrator detects that a message likely needs tool calls - keywords like "remind", "schedule", "generate", "create", "remember" - it can route to a specialised tool-calling model instead of the user's preferred conversational model. This ensures tool-heavy requests go to a model that is reliable at structured output, while general chat uses whatever model you prefer.

The tool-routing target is configured separately from the primary model. You can have one model for conversation and a different one for tool execution, and the system picks the right one automatically.
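As a rough sketch, the keyword heuristic plus the separate tool-routing target could look like this. The keyword list comes from the post; the function names and config shape are assumptions.

```php
<?php
// Hypothetical keyword heuristic for detecting tool-heavy messages.
function likelyNeedsTools(string $message): bool
{
    $keywords = ['remind', 'schedule', 'generate', 'create', 'remember'];
    $haystack = strtolower($message);
    foreach ($keywords as $keyword) {
        if (str_contains($haystack, $keyword)) {
            return true;
        }
    }
    return false;
}

// Tool-heavy requests go to the dedicated tool-calling model;
// everything else uses the user's preferred conversational model.
function pickModel(string $message, string $preferred, string $toolModel): string
{
    return likelyNeedsTools($message) ? $toolModel : $preferred;
}
```

The point of the split is that the conversational model can change per user while the tool-calling model stays pinned to whatever is reliable at structured output.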

Automatic failover

When a stream fails, the system classifies the error:

  • Model failed - the model rejected the request (bad parameters, unsupported format)
  • Timeout - the model took too long (common with thinking models on complex tasks)
  • Rate limited - the provider hit a quota or rate limit
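The three failure categories above can be modelled as an enum. This is a sketch under assumptions: the category names come from the post, but the matching rules (HTTP 429 for rate limits, a timeout flag, everything else treated as a model rejection) are illustrative.

```php
<?php
// Hypothetical classification of stream failures.
enum StreamFailure: string
{
    case ModelFailed = 'model_failed'; // bad parameters, unsupported format
    case Timeout     = 'timeout';      // model took too long
    case RateLimited = 'rate_limited'; // quota or rate limit hit
}

function classifyFailure(int $status, bool $timedOut): StreamFailure
{
    if ($timedOut) {
        return StreamFailure::Timeout;
    }
    if ($status === 429) {
        return StreamFailure::RateLimited;
    }
    return StreamFailure::ModelFailed; // e.g. 400/422 rejections
}
```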

If failover is enabled and the user did not manually select a specific model, the system resolves a fallback:

  1. Try the configured failover provider and model
  2. If that is the same as what just failed, try a different model from the same provider
  3. Last resort: fall back to Chutes with the default model
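The three-step cascade can be sketched as a single resolution function. Only the ordering comes from the post; the function signature and the `'chutes'`/`'default'` literals are placeholders for illustration.

```php
<?php
// Hypothetical fallback resolution mirroring the three steps above.
function resolveFallback(
    string $failedProvider,
    string $failedModel,
    string $failoverProvider,  // configured failover target
    string $failoverModel,
    string $alternateModel,    // another model on the failed provider
): array {
    // 1. Try the configured failover provider and model.
    if ($failoverProvider !== $failedProvider || $failoverModel !== $failedModel) {
        return [$failoverProvider, $failoverModel];
    }
    // 2. That exact pair just failed: try a different model, same provider.
    if ($alternateModel !== $failedModel) {
        return [$failedProvider, $alternateModel];
    }
    // 3. Last resort: Chutes with the default model.
    return ['chutes', 'default'];
}
```

The key invariant is that the resolver never hands back the exact provider/model pair that just failed.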

The retry happens transparently. The user sees a brief pause, then the response continues from the failover model. A marker in the stream tells the frontend a failover occurred, but the experience is seamless.

Why this matters

In multi-agent orchestration, a single user request might trigger 3-5 model calls across different specialist agents. If one call fails and there is no failover, the entire chain breaks. The routing layer makes the system resilient.

It also means experimenting is free. Try a new model on Chutes, set it as default, and if it is unreliable, the failover catches it. Run Ollama locally for development without changing any agent configuration. Switch providers by changing one environment variable.

Adding new providers

Adding a provider means implementing the three-method strategy interface, registering it in the routing service, and adding a config block. No changes to agents, tools, or the chat pipeline. This is how Iris stays provider-agnostic while still being opinionated about defaults.
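Concretely, a new provider could be as small as this. Everything here is hypothetical (the `ExampleProviderStrategy` class and `example` provider are invented to show the extension point, not a real integration).

```php
<?php
// Hypothetical new provider: implement the three methods, then register.
interface ProviderStrategy
{
    public function handles(string $provider): bool;
    public function resolve(string $provider, ?string $model): array;
    public function allowedIn(string $environment): bool;
}

class ExampleProviderStrategy implements ProviderStrategy
{
    public function handles(string $provider): bool
    {
        return $provider === 'example';
    }

    public function resolve(string $provider, ?string $model): array
    {
        return ['example', $model ?? 'example-default'];
    }

    public function allowedIn(string $environment): bool
    {
        return true; // allowed everywhere
    }
}

// Registration: prepend to the ordered list the routing service scans.
$strategies = [new ExampleProviderStrategy(), /* ...existing strategies... */];
```

With the strategy registered and a matching config block added, agents and tools never learn that a new provider exists.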