Iris is provider-agnostic by design - the routing layer can switch between providers mid-conversation. But every system needs a default, and mine is Chutes AI.
Full disclosure: I'm not affiliated with Chutes in any way. No sponsorship, no partnership, no referral deal. I'm just a user who genuinely likes the service.
Why Chutes works for me
Model variety - One API, access to Qwen, Kimi-K2, DeepSeek, and a rotating catalog of open models. When a new thinking model drops, it's usually available on Chutes before most other providers. For a project like Iris where I'm experimenting with different models for different agents, this is essential.
Pricing - Competitive rates on open models. When you're running multi-agent orchestration where a single user request might trigger 3-5 model calls across different specialists, cost per token matters. Chutes keeps this sustainable for a personal project. Check their pricing page - it speaks for itself.
Reliability - The API has been solid. Streaming works cleanly, timeouts are predictable, and the error responses are structured enough to handle gracefully.
The provider strategy pattern in Iris means Chutes isn't a hard dependency. The ChutesRoutingStrategy implements a provider interface. If Chutes goes down or I find something better, swapping is a config change, not a rewrite. But so far, I haven't needed to.
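A minimal sketch of what that strategy pattern could look like. The class names ChutesRoutingStrategy and OllamaRoutingStrategy come from Iris; the interface, method signature, and registry are my assumptions, not the actual implementation:

```python
from abc import ABC, abstractmethod


class RoutingStrategy(ABC):
    """Provider interface: each strategy knows how to reach one backend."""

    @abstractmethod
    def complete(self, model: str, prompt: str) -> str: ...


class ChutesRoutingStrategy(RoutingStrategy):
    def complete(self, model: str, prompt: str) -> str:
        # A real implementation would call the Chutes API here.
        return f"[chutes:{model}] {prompt}"


class OllamaRoutingStrategy(RoutingStrategy):
    def complete(self, model: str, prompt: str) -> str:
        # A real implementation would POST to the local Ollama server.
        return f"[ollama:{model}] {prompt}"


# Swapping providers is a config change: look the strategy up by name.
STRATEGIES = {"chutes": ChutesRoutingStrategy, "ollama": OllamaRoutingStrategy}


def make_strategy(provider: str) -> RoutingStrategy:
    return STRATEGIES[provider]()
```

Because callers only see RoutingStrategy, nothing upstream changes when the config flips from "chutes" to "ollama".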
Ollama for local development
For local work and experimentation, Iris also supports Ollama. The same provider abstraction applies: OllamaRoutingStrategy routes requests to your local Ollama instance. This is useful for:
- Offline development without burning API credits
- Testing prompt changes quickly against smaller models
- Privacy-sensitive workflows where data shouldn't leave the machine
- Experimenting with new models the moment they're released on Hugging Face
The setup is simple: install Ollama, pull a model, point Iris at localhost:11434. The same agent configuration, tools, and orchestration work locally as they do with cloud providers.
Having both cloud (Chutes) and local (Ollama) running through the same abstraction means I can develop locally, test with cheap local models, then switch to production-grade models for real use. The agents don't know or care where their model lives.