Infrastructure

Why Chutes AI Is My Primary Provider (And Ollama for Local)

January 23, 2026

Iris is provider-agnostic by design - the routing layer can switch between providers mid-conversation. But every system needs a default, and mine is Chutes AI.

Full disclosure: I'm not affiliated with Chutes in any way. No sponsorship, no partnership, no referral deal. I'm just a user who genuinely likes the service.

Why Chutes works for me:

Model variety - One API gives me Qwen, Kimi-K2, DeepSeek, and a rotating catalog of open models. When a new thinking model drops, it's usually available on Chutes before most other providers. For a project like Iris where I'm experimenting with different models for different agents, this is essential.

Pricing - Competitive rates on open models. When you're running multi-agent orchestration where a single user request might trigger 3-5 model calls across different specialists, cost per token matters. Chutes keeps this sustainable for a personal project. Check their pricing page - it speaks for itself.

Reliability - The API has been solid. Streaming works cleanly, timeouts are predictable, and the error responses are structured enough to handle gracefully.
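
To make "handle gracefully" concrete, here's roughly what the call path looks like. A minimal sketch assuming an OpenAI-compatible chat completions endpoint; the base URL, model name, and error-body shape are my placeholders, not documented Chutes specifics.

    import os
    import requests

    # Sketch of a cloud call with predictable timeouts and structured errors.
    # Assumes OpenAI-compatible routes; base URL and model are placeholders.
    BASE_URL = os.environ.get("CHUTES_BASE_URL", "https://llm.chutes.ai/v1")

    def chutes_chat(messages: list[dict], model: str = "deepseek-ai/DeepSeek-V3") -> str:
        resp = requests.post(
            f"{BASE_URL}/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['CHUTES_API_KEY']}"},
            json={"model": model, "messages": messages},
            timeout=(5, 120),  # (connect, read): a hung call fails predictably
        )
        if resp.status_code != 200:
            try:
                detail = resp.json().get("error", resp.text)  # structured error body
            except ValueError:
                detail = resp.text
            raise RuntimeError(f"Chutes call failed ({resp.status_code}): {detail}")
        return resp.json()["choices"][0]["message"]["content"]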

The provider strategy pattern in Iris means Chutes isn't a hard dependency. The ChutesRoutingStrategy implements a provider interface. If Chutes goes down or I find something better, swapping is a config change, not a rewrite. But so far, I haven't needed to.
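
The pattern is the classic strategy interface. The post's two class names are real, but Iris's actual interface isn't shown here, so the RoutingStrategy protocol, the complete() method, and the make_strategy factory below are my guesses at its shape.

    from dataclasses import dataclass
    from typing import Protocol

    Message = dict[str, str]  # {"role": "...", "content": "..."}

    class RoutingStrategy(Protocol):
        # The one interface the agents see; nothing provider-specific leaks through.
        def complete(self, model: str, messages: list[Message]) -> str: ...

    @dataclass
    class ChutesRoutingStrategy:
        api_key: str
        def complete(self, model: str, messages: list[Message]) -> str:
            ...  # POST to the Chutes endpoint, as in the earlier sketch

    @dataclass
    class OllamaRoutingStrategy:
        base_url: str = "http://localhost:11434"
        def complete(self, model: str, messages: list[Message]) -> str:
            ...  # POST to the local Ollama API, as in the sketch further down

    def make_strategy(config: dict) -> RoutingStrategy:
        # Swapping providers is a config change: "provider" picks the strategy.
        if config["provider"] == "chutes":
            return ChutesRoutingStrategy(api_key=config["api_key"])
        if config["provider"] == "ollama":
            return OllamaRoutingStrategy(base_url=config.get("base_url", "http://localhost:11434"))
        raise ValueError(f"unknown provider: {config['provider']}")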

Ollama for local development

For local work and experimentation, Iris also supports Ollama. Same provider abstraction - OllamaRoutingStrategy routes to your local Ollama instance. This is useful for:

  • Offline development without burning API credits
  • Testing prompt changes quickly against smaller models
  • Privacy-sensitive workflows where data shouldn't leave the machine
  • Experimenting with new models the moment they're released on Hugging Face

The setup is simple: install Ollama, pull a model, point Iris at localhost:11434. The same agent configuration, tools, and orchestration work locally as they do with cloud providers.
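
The whole local round trip is a few lines against Ollama's native chat endpoint (Ollama also exposes an OpenAI-compatible API under /v1 if you'd rather reuse the cloud client). The model name is just whatever you pulled.

    import requests

    # Smoke test against a local Ollama instance (after: ollama pull llama3.2).
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "llama3.2",
            "messages": [{"role": "user", "content": "Say hello in five words."}],
            "stream": False,  # one JSON object back instead of streamed chunks
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["message"]["content"])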

Having both cloud (Chutes) and local (Ollama) running through the same abstraction means I can develop locally, test with cheap local models, then switch to production-grade models for real use. The agents don't know or care where their model lives.
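
In practice that switch can be a single environment variable. A hypothetical wiring, reusing the make_strategy factory sketched earlier:

    import os

    # Illustrative only: IRIS_ENV is a made-up variable, make_strategy is the
    # hypothetical factory from the strategy-pattern sketch above.
    config = (
        {"provider": "ollama", "base_url": "http://localhost:11434"}
        if os.environ.get("IRIS_ENV", "dev") == "dev"
        else {"provider": "chutes", "api_key": os.environ["CHUTES_API_KEY"]}
    )
    strategy = make_strategy(config)  # the agents never see which one they got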