Iris already had multi-agent orchestration. The problem was that it behaved more like a very capable manager with no operations playbook. The main agent could delegate. It could even fan out to multiple specialists. But whether a request stayed sequential, went parallel, or got refined was mostly left to prompt interpretation in the moment.
That is powerful. It is also sloppy.
As the system got more capable, the bottleneck stopped being "can it delegate?" and became "does it choose the right workflow early enough, fast enough, and consistently enough?"
This is about solving that problem.
The old shape
Before this change, the orchestration model looked roughly like this:
```mermaid
flowchart TD
    U["User request"] --> M["Main agent"]
    M --> D{"Delegate?"}
    D -->|"No"| A["Answer directly"]
    D -->|"Yes, maybe"| S["send_to_subagent"]
    D -->|"Yes, maybe"| P["parallel_subagents"]
    S --> M
    P --> M
    M --> A
```
The key word there is "maybe".
The parent had the tools to do the right thing, but there was no explicit workflow selection step ahead of time. So a request like "compare this from three angles" might stay mostly serial. A review-style prompt might get treated like ordinary sectioning. A refinement request might ship after one draft with no quality gate at all.
This is the AI version of a team where everybody is talented but nobody agrees on when to use a checklist, when to split the work, and when to ask for a second reviewer.
Workflow routing first
The biggest architectural change is that the system now classifies the request before the main run gets going.
Instead of one implicit orchestration mode, Iris now has explicit modes:
- single
- parallel_section
- parallel_vote
- evaluate_loop
This is just workflow routing. The same idea you use in software when a request first hits a router and gets sent down the right code path, instead of every handler trying to inspect everything for itself.
For example:
- "Compare this from security, performance, and maintenance angles" pushes toward
parallel_section - "Review this and give me a final verdict" pushes toward
parallel_vote - "Rewrite this and make it sharper" pushes toward
evaluate_loop
The main agent still does the work. It just starts with a much clearer execution shape.
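As a sketch of what that routing step can look like in PHP (the names are invented for this post, and the keyword heuristic is a stand-in for whatever classification actually runs first, rules or a dedicated model call):

```php
<?php

// Illustrative sketch only, not Iris's actual code. The enum mirrors the
// four modes above; the keyword heuristic stands in for the real
// classification step that runs before the main agent starts.

enum WorkflowMode: string
{
    case Single = 'single';
    case ParallelSection = 'parallel_section';
    case ParallelVote = 'parallel_vote';
    case EvaluateLoop = 'evaluate_loop';
}

function routeRequest(string $prompt): WorkflowMode
{
    $p = strtolower($prompt);

    // "Compare from several angles" style prompts want coverage.
    if (preg_match('/\b(compare|angles|perspectives)\b/', $p)) {
        return WorkflowMode::ParallelSection;
    }

    // Review-and-verdict prompts want independent agreement.
    if (preg_match('/\b(review|verdict|judge)\b/', $p)) {
        return WorkflowMode::ParallelVote;
    }

    // Refinement prompts want a quality gate.
    if (preg_match('/\b(rewrite|refine|sharper|polish)\b/', $p)) {
        return WorkflowMode::EvaluateLoop;
    }

    return WorkflowMode::Single;
}
```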
Parallel sectioning vs parallel voting
These two look similar on paper because both use multiple specialists. They are not the same.
Parallel sectioning means different specialists do different jobs. Think of a house survey where one person checks the roof, another checks the plumbing, and another checks the electrics. You want coverage.
Parallel voting means multiple specialists inspect the same problem independently, then a final answer resolves agreement and disagreement. Think of sending the same contract to two lawyers and one finance person and asking for a final recommendation after they all challenge it from their own angle. You want confidence.
The old pipeline had some sectioning. It did not really have voting as a first-class pattern. Now it does.
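To make the contrast concrete, here is an illustrative sketch of the two fan-out shapes; the agent names and brief wording are invented for the example:

```php
<?php

// Illustrative only: the structural difference is in what each specialist
// receives, not in how many of them there are.

$input = '...the artifact under review...';

// Sectioning: a different brief per angle. You want coverage.
$sectionTasks = [
    ['agent' => 'security',    'brief' => "Assess the security of:\n$input"],
    ['agent' => 'performance', 'brief' => "Assess the performance of:\n$input"],
    ['agent' => 'maintenance', 'brief' => "Assess the maintainability of:\n$input"],
];

// Voting: the same brief to every specialist, resolved afterwards.
// You want confidence.
$voteTasks = array_map(
    fn (string $agent) => ['agent' => $agent, 'brief' => "Review this and give a final verdict:\n$input"],
    ['critic_a', 'critic_b', 'critic_c'],
);
```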
Evaluator loops
Some prompts do not need more specialists. They need a stricter second pass.
That is what the evaluator loop is for.
The model drafts an answer, then a separate structured pass checks whether the answer actually satisfies the request. If it does not, the answer is rewritten against concrete issues instead of just "thinking again". That detail matters. Without explicit issues, rewrite loops drift. With explicit issues, they tighten.
This is similar to a linter plus an editor. First identify what is wrong. Then fix that specific thing.
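A minimal sketch of that loop, assuming draft(), evaluate(), and revise() stand in for model calls, and that evaluate() returns a list of concrete issues rather than a pass/fail flag:

```php
<?php

// Stubs standing in for real model calls (hypothetical, for illustration).
function draft(string $request): string { return "first draft for: $request"; }
function evaluate(string $request, string $answer): array { return []; }
function revise(string $request, string $answer, array $issues): string { return $answer; }

function evaluateLoop(string $request, int $maxPasses = 3): string
{
    $answer = draft($request);

    for ($pass = 0; $pass < $maxPasses; $pass++) {
        // A structured check: a list of concrete problems, not a vague score.
        $issues = evaluate($request, $answer);

        if ($issues === []) {
            break; // the draft satisfies the request
        }

        // Rewrite against named issues, not just "think again".
        $answer = revise($request, $answer, $issues);
    }

    return $answer;
}
```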
It is slower than one-shot output, obviously. But it only runs on routes where the extra pass is worth paying for, and it can now be disabled from config if I want speed over polish.
Why the queue became a bottleneck
The most painful bug in the old parallel flow was not model quality. It was waiting.
When the parent used parallel_subagents, the system would often dispatch queue jobs and then sit there polling for results. From the user's perspective, specialist work had already started in the queue processor, but the parent still had nothing useful to say. That meant delayed UI feedback, unnecessary timeouts, and a general "is this thing alive?" feeling.
The fix was not to delete queues. Queues are still the right primitive for larger batches.
The fix was to stop using the queue for small interactive fan-outs.
Now small parallel sub-agent runs execute inline with the framework's Concurrency support, inside the request, and the parent gets real tool results immediately. Larger batches still fall back to queued jobs.
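A sketch of that hybrid rule, assuming a Laravel-style Concurrency facade (which runs an array of closures and returns their results); the threshold and the two helpers are illustrative placeholders:

```php
<?php

use Illuminate\Support\Facades\Concurrency;

const INLINE_FANOUT_LIMIT = 3; // illustrative cut-off, not Iris's real value

function runSpecialist(array $task): string
{
    return "result for {$task['agent']}"; // stand-in for a real sub-agent run
}

function queueBatch(array $tasks): void
{
    // Dispatch queued jobs for large batches (the unchanged legacy path).
}

function dispatchFanout(array $tasks): ?array
{
    if (count($tasks) <= INLINE_FANOUT_LIMIT) {
        // Small interactive fan-out: run inside the request, so the parent
        // gets real tool results back immediately with no queue polling.
        return Concurrency::run(
            array_map(fn (array $task) => fn () => runSpecialist($task), $tasks)
        );
    }

    queueBatch($tasks); // larger batches still go through the queue

    return null; // results arrive asynchronously
}
```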
So the orchestration shape is now:
```mermaid
flowchart TD
    U["User request"] --> R["Workflow router"]
    R --> S["single"]
    R --> PS["parallel_section"]
    R --> PV["parallel_vote"]
    R --> EL["evaluate_loop"]
    PS --> F1["Small fan-out: inline concurrency"]
    PS --> F2["Large fan-out: queued batch"]
    PV --> F1
    PV --> F2
    F1 --> Y["Synthesis"]
    F2 --> Y
    EL --> E["Evaluate and revise"]
    S --> A["Final answer"]
    Y --> A
    E --> A
```
That hybrid is important. I did not want to turn the chat path into a mini job scheduler for every two-agent comparison.
Virtual workers
There was another weakness in the system: multi-agent orchestration depended too much on user-created agents in the database.
If a user had no agents configured, or only one, the system effectively lost most of its multi-agent ability. Technically it could still use tools and reason. But it could not really fan out in a meaningful specialist sense.
That is now fixed with built-in virtual workers:
- Researcher
- Critic
- Planner
- Synthesiser
- Risk Reviewer
- Automation Planner
- Memory Curator
- Briefing Agent
These are not top-level user agents. They are built-in specialist workers the orchestrator can delegate to when the user's own roster is thin.
This is the AI equivalent of having a small in-house bench you can always call on, even if the client has not staffed the full project team yet.
The prompts for those workers live in Markdown files now, not hardcoded in PHP. That makes them easier to iterate on without digging through service code.
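Loading those prompts can then be a simple directory scan; a minimal sketch, with the path layout and file names assumed:

```php
<?php

// Minimal sketch: the directory layout is an assumption. Each Markdown
// file becomes one virtual worker's system prompt.

function loadVirtualWorkers(string $dir): array
{
    $workers = [];

    foreach (glob($dir . '/*.md') ?: [] as $path) {
        // File name becomes the worker key: researcher.md -> "researcher".
        $workers[basename($path, '.md')] = file_get_contents($path);
    }

    return $workers;
}

$workers = loadVirtualWorkers(__DIR__ . '/prompts/workers');
// e.g. $workers['researcher'], $workers['critic'], ...
```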
The UI side
The old UX around delegation had two major problems:
- specialist work could start before the user saw any indication of what was happening
- raw sub-task output could flood the screen before the parent had synthesised anything useful
Both were bad.
The run timeline now emits an explicit "launching specialist agents" event up front. The chat view also collapses live sub-task transcripts behind a compact working card by default, with a manual toggle if you want to watch the raw specialist stream in real time.
There is also explicit progress now - how many specialist responses have completed, not just that "something is running".
That sounds cosmetic. It is not. In any asynchronous system, perceived reliability is shaped by whether the user can see the machine making progress.
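As an illustration of the kind of signals involved (the field names are invented, not Iris's actual event schema):

```php
<?php

// Illustrative payload shapes for the two timeline signals.

$launchEvent = [
    'type'  => 'specialists.launching', // emitted up front, before any output
    'count' => 3,
];

$progressEvent = [
    'type'      => 'specialists.progress',
    'completed' => 2, // how many specialist responses are done
    'total'     => 3,
];
```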
Config gates
One thing I did not want was to hardwire all of this permanently. I needed a safe rollout to minimise the blast radius.
The inline parallel execution path can now be disabled from config. The evaluator loop can also be disabled from config, and when it is off the router stops sending requests into evaluate_loop.
That matters because there is no universally correct answer here.
Sometimes I want maximum quality. Sometimes I want minimum latency. Sometimes I want to A/B test the difference. These are operational levers, not permanent ideology.
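As a sketch, assuming Laravel-style config and helpers (the key names are invented), the gates might look like this:

```php
<?php

// config/orchestration.php - illustrative key names, not Iris's real config.
return [
    'inline_parallel_enabled' => env('IRIS_INLINE_PARALLEL', true),
    'evaluator_loop_enabled'  => env('IRIS_EVALUATOR_LOOP', true),
];
```

And the router respects the gate, so switching the evaluator loop off simply degrades those requests to a single pass:

```php
// With the evaluator loop disabled, would-be evaluate_loop requests
// fall back to the single workflow instead.
if ($mode === WorkflowMode::EvaluateLoop && ! config('orchestration.evaluator_loop_enabled')) {
    $mode = WorkflowMode::Single;
}
```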
What actually changed in practice
The system is better in four concrete ways:
- It picks a workflow shape earlier, with less prompt improvisation.
- Small multi-agent runs are faster because they do not wait on queued round-trips.
- Quality-sensitive prompts can go through voting or evaluation instead of one-shot drafting.
- Multi-agent orchestration still works even when the user has no custom agents configured.
The tradeoff is predictable:
- more structure
- more control
- better latency for small fan-out
- higher token cost on voting and evaluation paths
That is a trade I will take every time.
The old system was clever. The new system is more deliberate.
And in orchestration, deliberate beats clever.