Part 4: Orchestrating Agents with Claude

Part of the Multi-Agent Orchestration 101 Series

Why I Reach for Claude in Orchestration

After building the OpenAI supervisor in Part 3, I started noticing where GPT-4o struggled: long planning chains with many interdependencies, complex instruction sets that required careful reading, and tasks where I needed the model to reason through a problem step by step before acting. That's when I started using Claude as the supervisor while keeping the cheaper OpenAI models as workers.

This part covers:

  • Anthropic's tool use API (structurally similar to OpenAI, different details)

  • Claude's extended thinking for planning agents

  • Mixing Claude and OpenAI agents in one system

  • When I choose one over the other


Prerequisites

pip install "anthropic>=0.25.0" "openai>=1.14.0" python-dotenv
export ANTHROPIC_API_KEY="sk-ant-..."

Anthropic Tool Use: The API Differences

Anthropic's tool use follows the same mental model as OpenAI function calling — you send a list of tool schemas, the model returns a tool_use block, you run the tool, send a tool_result block, and the model continues.

The structural differences from OpenAI:

| OpenAI | Anthropic |
| --- | --- |
| tools: [{type: "function", function: {...}}] | tools: [{name: ..., description: ..., input_schema: {...}}] |
| finish_reason == "tool_calls" | stop_reason == "tool_use" |
| tool_calls[i].function.arguments (JSON string) | content[i].input (already a dict) |
| Role "tool" in next message | Role "user" with type: "tool_result" |

This last point trips people up. In Anthropic's API, tool results are sent as user messages with a tool_result content type. OpenAI uses a dedicated "tool" role.
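To make the mapping concrete, here is a small sketch with two helpers (the function names are mine, not part of either SDK): one converts an OpenAI-style tool schema to Anthropic's shape, the other builds the tool_result user message Anthropic expects.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI function-calling schema to Anthropic's tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }


def tool_result_message(tool_use_id: str, output: str) -> dict:
    """Anthropic tool results go back as a *user* message, not a 'tool' role."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
        ],
    }
```

With helpers like these, the same tool registry can feed both providers.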


Claude Agent Implementation

The message history reconstruction (the while i loop) is the awkward part. Because Anthropic pairs tool_use assistant turns with tool_result user turns, I have to match them up when rebuilding history from my flat memory list. This is the price of abstracting away the API differences.


Extended Thinking for Planning

Claude's extended thinking lets the model reason internally before responding. The reasoning is visible to you (useful for debugging) but not shown to the user. I enable it selectively for supervisor agents that need to plan across many steps.

When I run this, I can inspect the thinking blocks in the raw API response to see how the model reasoned about the delegation order. It doesn't always take the most efficient path, but it rarely makes logic errors on complex multi-step goals.
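A minimal sketch of how I enable it: the thinking parameter is Anthropic's, while the model alias and token budget are my choices. The helper separates thinking blocks from the visible reply, assuming dict-shaped content blocks (e.g. from response.model_dump()["content"]).

```python
# Request kwargs for an extended-thinking call (pass to client.messages.create).
# budget_tokens caps how much the model may spend on internal reasoning.
PLANNER_KWARGS = {
    "model": "claude-3-7-sonnet-latest",  # my choice of alias; pin as needed
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
}


def split_thinking(content: list[dict]) -> tuple[list[str], str]:
    """Separate thinking blocks (debug only) from the user-visible text."""
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    text = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, text
```

I log the thinking list at debug level and only surface the text to the user.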


Mixing Claude and OpenAI Agents

The fact that both patterns use the same AgentV2 base means mixing them is trivial. The make_worker_tool function from Part 3 works unchanged with a ClaudeAgent.

This is where building from first principles pays off. Because I defined the agent interface myself, I'm not dependent on any framework's compatibility matrix.
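The wrapper itself is small. Here is a self-contained sketch of the idea — make_worker_tool and the .run(task) interface come from Part 3, reconstructed here from memory, so the real version may differ in details:

```python
from typing import Callable


def make_worker_tool(name: str, description: str, agent) -> tuple[dict, Callable]:
    """Wrap anything exposing .run(task) as an Anthropic tool (schema + handler)."""
    schema = {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {"task": {"type": "string"}},
            "required": ["task"],
        },
    }

    def handler(tool_input: dict) -> str:
        # The supervisor (Claude) calls this; the worker behind it can be
        # any provider -- GPT-4o-mini, another Claude, or a local model.
        return agent.run(tool_input["task"])

    return schema, handler
```

Because the supervisor only ever sees the tool schema, it neither knows nor cares which provider serves the worker.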


When I Choose Claude vs OpenAI

I don't treat this as a religious debate. I switch based on the task:

Choose Claude when:

  • The supervisor needs to reason through ambiguous or multi-step plans

  • The system prompt is long and detailed (Claude follows complex instructions better)

  • You want extended thinking for debugging agent decisions

  • The context window needs to fit large documents (200K vs 128K for GPT-4o)

Choose OpenAI when:

  • The task is well-defined and the worker just needs to execute (GPT-4o-mini is fast and cheap)

  • You need structured output with response_format: json_schema (OpenAI's strict mode is more reliable for complex schemas)

  • Latency is critical — GPT-4o-mini is noticeably faster than Haiku for short tasks in my benchmarks

  • Your team already has OpenAI credits and quota capacity

In practice, my personal projects use Claude 3.7 Sonnet (claude-3-7-sonnet) as the supervisor and GPT-4o-mini as the workers. It's a good cost/capability trade-off.


A Note on API Version Pinning

Both Anthropic and OpenAI change their APIs frequently. I learned this the hard way when an Anthropic update changed the thinking block format and broke my parsing. Pin your library versions in requirements.txt and test before upgrading:
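For example, a pinned requirements.txt might look like this (the versions shown are the ones from the prerequisites above; use whatever you last tested against):

```
# requirements.txt -- exact pins; bump deliberately, then re-run the agent tests
anthropic==0.25.0
openai==1.14.0
python-dotenv==1.0.1
```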


Key Takeaways

  • Anthropic tool results are sent as user messages, not a tool role — watch for this

  • Extended thinking is useful for supervisor-level planning; enable it selectively

  • The AgentV2 abstraction makes mixing Claude and OpenAI workers trivial

  • Choose the model based on the task, not loyalty to a provider

  • Pin library versions — both APIs evolve fast


Up Next

Part 5: Production Patterns — structured logging across agents, cost tracking, rate limiting, and the biggest mistakes I made in production.
