Nov 25, 2025
Yulei Sheng

We recently adopted Gemini 3 Pro in our system. While it's great in general and aggressively uses parallel tool calls (which increases speed and reduces token consumption), it introduces a challenge. Although the majority of our tools can be called in parallel, there are a few that absolutely cannot.
There are two typical solutions:
1. Prompt Engineering: It works, but not 100% of the time due to the probabilistic nature of LLMs.
2. Disable Parallel Tool Calls Altogether: We want to avoid this because we prefer the speed and cost benefits of parallelization.
So, we built a "Pending Queue" pattern on top of the Vercel AI SDK.
System Requirements
We have several requirements for the system:
1. Fully leverage the power of LLM parallel tool calls.
2. Detect bad parallel tool calls before execution to prevent side effects.
3. Help the model self-recover.
The "Pending Queue" Pattern
Instead of executing tools immediately, we decouple the execution from the tool call. Here is the architecture:
1. Flag: Mark tools that cannot be called in parallel as 'nonParallelizable' (see the sketch after this list).
2. Intercept: Before sending tools to 'streamText', we wrap 'nonParallelizable' tools. When called, the wrapper:
a. Pushes the real execution closure into a Pending Queue.
b. Returns a placeholder result immediately.
3. Validate: Once the AI SDK has executed all tool calls for the step, we inspect the batch.
4. Run or Reject:
- If the batch is invalid (a 'nonParallelizable' tool was called alongside other tools): Reject the queued executions of the 'nonParallelizable' tools. Replace their placeholder results with a clear error message asking the agent to call the tool individually. NOTE: All parallelizable tools in the batch have already executed successfully, so no further handling is needed for them.
- If the batch is valid (the 'nonParallelizable' tool was called alone): Fetch the original execution closure from the pending queue. Execute it. Replace the placeholder result with the actual result.
5. Send Back to LLM: Send the final tool call results (including any error messages or delayed execution results) back to the LLM.
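To make step 1 concrete before diving into the implementation below, here is a minimal sketch of how a flagged tool registry might look, assuming AI SDK v5's `tool()` helper (v4 used `parameters` instead of `inputSchema`). The registry shape and both example tools are illustrative, not part of the SDK.

```typescript
import { tool } from 'ai';
import { z } from 'zod';

// Step 1 (Flag). The flag lives next to the SDK tool object rather than inside
// it, so we don't have to extend the SDK's own types. `tool` is typed loosely
// to keep the sketch version-agnostic.
export interface RegisteredTool {
  nonParallelizable?: boolean;
  tool: any; // an AI SDK tool() definition
}

export const toolRegistry: Record<string, RegisteredTool> = {
  // Safe to run alongside other calls.
  searchDocs: {
    tool: tool({
      description: 'Search the documentation.',
      inputSchema: z.object({ query: z.string() }),
      execute: async ({ query }) => ({ hits: [`result for "${query}"`] }),
    }),
  },
  // Must never share a batch with other tool calls.
  applyMigration: {
    nonParallelizable: true,
    tool: tool({
      description: 'Apply a database migration. Call this tool by itself.',
      inputSchema: z.object({ migrationId: z.string() }),
      execute: async ({ migrationId }) => ({ applied: migrationId }),
    }),
  },
};
```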
Implementation
Here is a simplified generic implementation using TypeScript.
1. The Tool Wrapper
First, we wrap our tools. If a tool is marked 'nonParallelizable', we don't run it; we queue it.
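Below is a sketch of what that wrapper could look like, building on the illustrative `toolRegistry` above. The `PendingQueue`, `wrapTools`, and the placeholder shape are our own conventions; the only SDK detail relied on here is that a tool's `execute` receives the `toolCallId` in its second argument.

```typescript
import type { RegisteredTool } from './tool-registry'; // the registry sketched above

// The deferred work for one tool call, keyed by toolCallId.
type PendingExecution = () => Promise<unknown>;

export class PendingQueue {
  private pending = new Map<string, PendingExecution>();

  enqueue(toolCallId: string, run: PendingExecution): void {
    this.pending.set(toolCallId, run);
  }

  /** Remove and return the deferred execution for a tool call, if any. */
  take(toolCallId: string): PendingExecution | undefined {
    const run = this.pending.get(toolCallId);
    this.pending.delete(toolCallId);
    return run;
  }

  clear(): void {
    this.pending.clear();
  }
}

// Step 2 (Intercept): parallel-safe tools pass through untouched; flagged tools
// get their execute replaced with "queue the real work, return a placeholder".
export function wrapTools(
  registry: Record<string, RegisteredTool>,
  queue: PendingQueue,
) {
  const tools: Record<string, any> = {}; // typed loosely; pass this to streamText
  const nonParallelizable = new Set<string>();

  for (const [name, entry] of Object.entries(registry)) {
    if (!entry.nonParallelizable) {
      tools[name] = entry.tool;
      continue;
    }

    nonParallelizable.add(name);
    tools[name] = {
      ...entry.tool,
      execute: async (input: unknown, options: { toolCallId: string }) => {
        // Defer the real side effect; nothing dangerous runs yet.
        queue.enqueue(options.toolCallId, () => entry.tool.execute(input, options));
        // Placeholder result that the validation step will replace later.
        return { status: 'pending' };
      },
    };
  }

  return { tools, nonParallelizable };
}
```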
2. The Execution Loop
In your main agent loop (where you handle the model's response), you validate the entire batch before finalizing results.
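Here is a sketch of that validation under the same assumptions. It works against a generic view of the step's tool calls (only `toolCallId` and `toolName`, which the AI SDK exposes under those names), so it stays agnostic to SDK version; `finalizeStep` and `StepToolCall` are our own names.

```typescript
import { PendingQueue } from './pending-queue'; // the wrapper sketched above

// A generic view of one tool call from the finished step.
interface StepToolCall {
  toolCallId: string;
  toolName: string;
}

// Steps 3-5 (Validate, Run or Reject): called once the SDK has "executed" every
// tool call in the step (flagged tools only produced placeholders so far).
// Returns the corrected result for each flagged call, keyed by toolCallId.
export async function finalizeStep(
  toolCalls: StepToolCall[],
  nonParallelizable: Set<string>,
  queue: PendingQueue,
): Promise<Map<string, unknown>> {
  const corrections = new Map<string, unknown>();
  const flaggedCalls = toolCalls.filter((c) => nonParallelizable.has(c.toolName));

  if (flaggedCalls.length === 0) return corrections; // nothing to validate

  const calledAlone = toolCalls.length === 1 && flaggedCalls.length === 1;

  for (const call of flaggedCalls) {
    const run = queue.take(call.toolCallId);

    if (!calledAlone || !run) {
      // Reject: the deferred closure is simply dropped, so the side effect never runs.
      corrections.set(call.toolCallId, {
        status: 'rejected',
        error: `Tool "${call.toolName}" must be called alone, not in parallel with other tools. Call it again by itself in your next step.`,
      });
      continue;
    }

    // Valid batch: run the deferred execution now and use its real result.
    corrections.set(call.toolCallId, await run());
  }

  queue.clear(); // drop leftovers so stale closures can never run later
  return corrections;
}
```

In a manual agent loop (for example, single-step `generateText` or `streamText` calls where you manage the message history yourself), you would overwrite the placeholder tool results with the entries from `corrections` before sending the conversation back to the model; the exact message shape to patch depends on your AI SDK version.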
Benefits
1. No "Ghost" Side Effects
Because we return a placeholder (`status: 'pending'`) initially, the dangerous code never runs if validation fails. You don't have to roll back database transactions or undo API calls.
2. Self-Correcting Agents
By returning a specific error message ("Must be called alone"), you turn a system failure into a prompt. The model sees the error in the message history and self-corrects in the next step.
3. Fully Leverage Parallel Tool Calls
We don't have to disable parallel tool calls globally just for the 1% of tools that can't be run in parallel.
4. Compatibility
This pattern works cleanly on top of the Vercel AI SDK and with any model provider.
