Part 4: Orchestrating Agents with Claude
Part of the Multi Agent Orchestration 101 Series
Why I Reach for Claude in Orchestration
After building the OpenAI supervisor in Part 3, I started noticing where GPT-4o struggled: long planning chains with many interdependencies, complex instruction sets that required careful reading, and tasks where I needed the model to reason through a problem step-by-step before acting. That's when I started using Claude as the supervisor and kept the cheaper OpenAI models as workers.
This part covers:
Anthropic's tool use API (structurally similar to OpenAI, different details)
Claude's extended thinking for planning agents
Mixing Claude and OpenAI agents in one system
When I choose one over the other
Prerequisites
```
pip install "anthropic>=0.25.0" "openai>=1.14.0" python-dotenv
export ANTHROPIC_API_KEY="sk-ant-..."
```
Anthropic Tool Use: The API Differences
Anthropic's tool use follows the same mental model as OpenAI function calling — you send a list of tool schemas, the model returns a tool_use block, you run the tool, send a tool_result block, and the model continues.
The structural differences from OpenAI:

| | OpenAI | Anthropic |
|---|---|---|
| Tool schema | `tools: [{type: "function", function: {...}}]` | `tools: [{name: ..., description: ..., input_schema: {...}}]` |
| Stop signal | `finish_reason == "tool_calls"` | `stop_reason == "tool_use"` |
| Tool arguments | `tool_calls[i].function.arguments` (JSON string) | `content[i].input` (already a dict) |
| Tool result | Role `"tool"` in next message | Role `"user"` with `type: "tool_result"` |
This last point trips people up. In Anthropic's API, tool results are sent as user messages with a tool_result content type. OpenAI uses a dedicated "tool" role.
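Both differences can be captured in two small helpers. The message and schema shapes below follow Anthropic's public Messages API; the helper names themselves are my own:

```python
def to_anthropic_tool(openai_tool: dict) -> dict:
    """Convert an OpenAI function-tool schema to Anthropic's flat shape."""
    fn = openai_tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }


def tool_result_message(tool_use_id: str, content: str) -> dict:
    """Anthropic tool results go back as a *user* message, not a 'tool' role."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the tool_use block's id
            "content": content,
        }],
    }
```

With these, a tool registry written for OpenAI can be reused against Claude without touching the underlying JSON Schemas.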
Claude Agent Implementation
The message history reconstruction (the while i loop) is the awkward part. Because Anthropic pairs tool_use assistant turns with tool_result user turns, I have to match them up when rebuilding history from my flat memory list. This is the price of abstracting away the API differences.
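A minimal sketch of that reconstruction. The flat memory format here is hypothetical, standing in for the AgentV2 memory from Part 3; the output shape follows Anthropic's Messages API:

```python
def rebuild_anthropic_history(memory: list[dict]) -> list[dict]:
    """Rebuild Anthropic-style messages from a flat memory list.

    Assumed (hypothetical) memory entries:
      {"role": "user" | "assistant", "content": str}
      {"role": "tool_call", "id": ..., "name": ..., "input": {...}}
      {"role": "tool_result", "id": ..., "content": str}
    """
    messages = []
    i = 0
    while i < len(memory):
        entry = memory[i]
        if entry["role"] == "tool_call":
            # Anthropic pairs the assistant tool_use turn with the user
            # tool_result turn, so emit both together here.
            messages.append({
                "role": "assistant",
                "content": [{"type": "tool_use", "id": entry["id"],
                             "name": entry["name"], "input": entry["input"]}],
            })
            result = memory[i + 1]  # the matching tool_result entry
            messages.append({
                "role": "user",
                "content": [{"type": "tool_result",
                             "tool_use_id": result["id"],
                             "content": result["content"]}],
            })
            i += 2  # consumed the pair
        else:
            messages.append({"role": entry["role"], "content": entry["content"]})
            i += 1
    return messages
```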
Extended Thinking for Planning
Claude's extended thinking lets the model reason internally before responding. The reasoning is visible to you (useful for debugging) but not shown to the user. I enable it selectively for supervisor agents that need to plan across many steps.
When I run this, I can inspect the thinking blocks in the raw API response to see how the model reasoned about the delegation order. It doesn't always take the most efficient path, but it rarely makes logic errors on complex multi-step goals.
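As a sketch, the request shape for enabling extended thinking looks like this. The model name and token budget are illustrative; the one real constraint is that max_tokens must exceed the thinking budget:

```python
def thinking_request(goal: str, budget: int = 4096) -> dict:
    """Build kwargs for an Anthropic Messages call with extended thinking on."""
    return {
        "model": "claude-3-7-sonnet-latest",
        # max_tokens must be strictly greater than the thinking budget
        "max_tokens": budget + 1024,
        "thinking": {"type": "enabled", "budget_tokens": budget},
        "messages": [{"role": "user", "content": goal}],
    }

# Usage sketch:
#   response = client.messages.create(**thinking_request("Plan the delegation order"))
#   then inspect blocks where block.type == "thinking" for the internal reasoning,
#   and blocks where block.type == "text" for the visible answer.
```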
Mixing Claude and OpenAI Agents
The fact that both patterns use the same AgentV2 base means mixing them is trivial. The make_worker_tool function from Part 3 works unchanged with a ClaudeAgent.
This is where building from first principles pays off. Because I defined the agent interface myself, I'm not dependent on any framework's compatibility matrix.
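A sketch of the wiring, assuming the interfaces from earlier parts. make_worker_tool, OpenAIAgent, and ClaudeAgent are stand-ins for the Part 3/Part 4 implementations, which are elided here; only the Anthropic tool-schema shape is from the real API:

```python
def make_worker_tool(name: str, description: str, worker_run) -> tuple[dict, callable]:
    """Wrap any worker's run() callable as an Anthropic tool.

    Hypothetical shape, standing in for the Part 3 helper: it returns the
    tool schema plus the callable to invoke when the supervisor picks it.
    """
    schema = {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {"task": {"type": "string",
                                    "description": "The task to delegate"}},
            "required": ["task"],
        },
    }
    return schema, worker_run


# Usage sketch: a GPT-4o-mini worker behind a Claude supervisor.
#   worker = OpenAIAgent(model="gpt-4o-mini", ...)            # from Part 3
#   schema, run = make_worker_tool("researcher", "Looks things up", worker.run)
#   supervisor = ClaudeAgent(tools=[schema], handlers={"researcher": run}, ...)
```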
When I Choose Claude vs OpenAI
I don't treat this as a religious debate. I switch based on the task:
Choose Claude when:
The supervisor needs to reason through ambiguous or multi-step plans
The system prompt is long and detailed (Claude follows complex instructions better)
You want extended thinking for debugging agent decisions
The context window needs to fit large documents (200K vs 128K for GPT-4o)
Choose OpenAI when:
The task is well-defined and the worker just needs to execute (GPT-4o-mini is fast and cheap)
You need structured output with response_format: json_schema (OpenAI's strict mode is more reliable for complex schemas)
Latency is critical: GPT-4o-mini is noticeably faster than Haiku for short tasks in my benchmarks
Your team already has OpenAI credits and quota capacity
In practice, my personal projects use Claude Sonnet (currently claude-3-7-sonnet) as the supervisor and GPT-4o-mini as workers. It's a good cost/capability trade-off.
A Note on API Version Pinning
Both Anthropic and OpenAI change their APIs frequently. I learned this the hard way when an Anthropic update changed the thinking block format and broke my parsing. Pin your library versions in requirements.txt and test before upgrading:
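For example (these pins just mirror the floor versions from the prerequisites; pin whatever you have actually tested):

```
anthropic==0.25.0
openai==1.14.0
python-dotenv==1.0.1
```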
Key Takeaways
Anthropic tool results are sent as user messages, not a tool role; watch for this
Extended thinking is useful for supervisor-level planning; enable it selectively
The AgentV2 abstraction makes mixing Claude and OpenAI workers trivial
Choose the model based on the task, not loyalty to a provider
Pin library versions — both APIs evolve fast
Up Next
Part 5: Production Patterns — structured logging across agents, cost tracking, rate limiting, and the biggest mistakes I made in production.