After shipping direct streaming, the parent agent developed an annoying habit. The specialist would stream a detailed research analysis directly to the user. Then the parent would write three paragraphs summarising exactly what the user just read. "The Researcher found that..." followed by the same points, rephrased slightly worse.
Why it happened
The parent agent receives the specialist's output as a tool result. From the model's perspective, it just received new information and its job is to present it to the user. It doesn't know the user already read it in real time. So it does what any diligent assistant would do: summarise and present the findings.
It's a classic case of information asymmetry - the parent's picture of the world doesn't include the side-channel delivery that already happened. Without being told "the user already has this," it assumes it's the only one talking.
Think of a project manager who sits in on a presentation, then books a follow-up meeting to tell you what was said in the presentation you both attended.
The prompt engineering fix
This wasn't a code problem. The streaming infrastructure worked perfectly. The issue was what we told the parent agent alongside the specialist's output.
The solution is a technique called inline instruction injection - instead of relying on a general rule buried somewhere in the system prompt (which might be 4,000 words earlier and easy for the model to overlook), you embed the instruction right next to the data it applies to. The tool result now says, right there alongside the specialist's output: "This was already streamed directly to the user. Do NOT repeat, rephrase, or re-summarise it. Simply confirm the task was completed."
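In practice this can be as simple as wrapping the specialist's output before handing it back. A minimal sketch, with hypothetical names (`wrap_streamed_result` and `STREAMED_NOTICE` are illustrative, not our production code):

```python
# Minimal sketch of inline instruction injection. The wrapper and the exact
# wording are illustrative; the point is that the instruction travels in the
# same tool-result payload as the data it governs.

STREAMED_NOTICE = (
    "[NOTE: The content below was already streamed directly to the user. "
    "Do NOT repeat, rephrase, or re-summarise it. Simply confirm the task "
    "was completed.]"
)

def wrap_streamed_result(specialist_output: str) -> str:
    """Embed the behavioural instruction right next to the data it applies to."""
    return f"{STREAMED_NOTICE}\n\n{specialist_output}"

# What the parent agent receives as the tool result:
tool_result = {
    "role": "tool",
    "content": wrap_streamed_result("The Researcher's detailed analysis..."),
}
```

Because the notice sits immediately above the content it constrains, the model can't read one without the other - no matter how long the conversation has grown.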
The instruction is direct, even blunt. That's deliberate. The open-source models we run through Chutes respond well to clear, unambiguous commands. They respond poorly to gentle suggestions like "you may want to avoid repeating" - the model reads the hedging language ("may want to") as optional guidance and often ignores it. This is a known pattern: the more specific and firm the constraint, the more reliably the model follows it. Prompt engineers call this constraint specificity, and it's one of those things that sounds obvious until you've watched a model cheerfully ignore a polite instruction for the tenth time.
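Side by side, the difference reads like this (both strings are made-up examples, not lifted verbatim from our prompts):

```python
# Illustrative contrast for constraint specificity.

# Hedged phrasing the model tends to read as optional guidance:
weak_constraint = "You may want to avoid repeating the specialist's findings."

# Specific, firm phrasing that open-source models follow far more reliably:
firm_constraint = (
    "Do NOT repeat, rephrase, or re-summarise the specialist's output. "
    "Reply with a one-sentence confirmation that the task is complete."
)
```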
The difference
Before: the specialist streams a 200-word analysis; the parent writes 150 words restating the same points. The user reads everything twice and the system burns extra tokens for zero informational gain.
After: the specialist streams the analysis and the parent says "Both tasks are handled." Done. No clutter, no redundancy, and the response feels faster because there's no unnecessary tail.
Lesson learned
When designing multi-agent systems, the handoff protocol matters as much as the execution. Getting the specialist to produce good output was the easy part. Getting the parent to shut up about it was the hard part.
The solution was treating the tool result as a control message, not just a data message. It doesn't just carry content - it carries instructions about how to behave with that content. In systems engineering this separation between "what the data is" and "what to do with it" is called the control plane versus the data plane, and it turns out the same concept applies perfectly to LLM orchestration. The tool result is both the envelope and the delivery instructions.
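One way to make that split explicit is to carry the control signal as structured metadata and serialise it into an instruction at the last moment. A minimal sketch, assuming a hypothetical `ToolResult` shape (the field and method names are illustrative, not our actual code):

```python
# Sketch of the control-plane / data-plane split in a tool result.
# Field and method names are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class ToolResult:
    content: str            # data plane: what the specialist produced
    already_streamed: bool  # control plane: has the user seen it already?

    def render_for_parent(self) -> str:
        """Serialise both planes into the single message the parent model sees."""
        if self.already_streamed:
            return (
                "[Already streamed directly to the user. Do NOT repeat, "
                "rephrase, or re-summarise it. Confirm completion only.]\n\n"
                + self.content
            )
        return self.content

result = ToolResult(content="Detailed research analysis...", already_streamed=True)
parent_message = result.render_for_parent()
```

Keeping the control bit as a typed field rather than free text means the orchestrator, not the specialist, decides how the instruction is phrased - which is exactly where that decision belongs.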