Part 4: Orchestrating Agents with Claude

Part of the Multi-Agent Orchestration 101 Series

Why I Reach for Claude in Orchestration

After building the OpenAI supervisor in Part 3, I started noticing where GPT-4o struggled: long planning chains with many interdependencies, complex instruction sets that required careful reading, and tasks where I needed the model to reason through a problem step by step before acting. That's when I started using Claude as the supervisor while keeping the cheaper OpenAI models as workers.

This part covers:

  • Anthropic's tool use API (structurally similar to OpenAI, different details)

  • Claude's extended thinking for planning agents

  • Mixing Claude and OpenAI agents in one system

  • When I choose one over the other


Prerequisites

pip install "anthropic>=0.25.0" "openai>=1.14.0" python-dotenv
export ANTHROPIC_API_KEY="sk-ant-..."

Anthropic Tool Use: The API Differences

Anthropic's tool use follows the same mental model as OpenAI function calling — you send a list of tool schemas, the model returns a tool_use block, you run the tool, send a tool_result block, and the model continues.

The structural differences from OpenAI:

| OpenAI | Anthropic |
| --- | --- |
| tools: [{type: "function", function: {...}}] | tools: [{name: ..., description: ..., input_schema: {...}}] |
| finish_reason == "tool_calls" | stop_reason == "tool_use" |
| tool_calls[i].function.arguments (JSON string) | content[i].input (already a dict) |
| Role "tool" in next message | Role "user" with type: "tool_result" |

This last point trips people up. In Anthropic's API, tool results are sent as user messages with a tool_result content type. OpenAI uses a dedicated "tool" role.
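To make the mapping concrete, here is a small sketch with two helpers (the function names are mine, not part of either SDK): one converts an OpenAI-style tool schema to Anthropic's shape, the other builds the tool_result user message Anthropic expects.

```python
def openai_tool_to_anthropic(tool: dict) -> dict:
    """Map an OpenAI function-calling schema to Anthropic's tool shape."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],  # same JSON Schema, different key
    }


def tool_result_message(tool_use_id: str, output: str) -> dict:
    """Anthropic tool results go back as a *user* message, not a 'tool' role."""
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": tool_use_id, "content": output}
        ],
    }
```

With helpers like these, the same tool registry can feed both providers.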


Claude Agent Implementation

The message history reconstruction (the while i loop) is the awkward part. Because Anthropic pairs tool_use assistant turns with tool_result user turns, I have to match them up when rebuilding history from my flat memory list. This is the price of abstracting away the API differences.


Extended Thinking for Planning

Claude's extended thinking lets the model reason internally before responding. The reasoning is visible to you (useful for debugging) but not shown to the user. I enable it selectively for supervisor agents that need to plan across many steps.

When I run this, I can inspect the thinking blocks in the raw API response to see how the model reasoned about the delegation order. It doesn't always take the most efficient path, but it rarely makes logic errors on complex multi-step goals.
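A minimal sketch of how I enable it: the thinking parameter is Anthropic's, while the model alias and token budget are my choices. The helper separates thinking blocks from the visible reply, assuming dict-shaped content blocks (e.g. from response.model_dump()["content"]).

```python
# Request kwargs for an extended-thinking call (pass to client.messages.create).
# budget_tokens caps how much the model may spend on internal reasoning.
PLANNER_KWARGS = {
    "model": "claude-3-7-sonnet-latest",  # my choice of alias; pin as needed
    "max_tokens": 4096,
    "thinking": {"type": "enabled", "budget_tokens": 2048},
}


def split_thinking(content: list[dict]) -> tuple[list[str], str]:
    """Separate thinking blocks (debug only) from the user-visible text."""
    thinking = [b["thinking"] for b in content if b["type"] == "thinking"]
    text = "".join(b["text"] for b in content if b["type"] == "text")
    return thinking, text
```

I log the thinking list at debug level and only surface the text to the user.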


Mixing Claude and OpenAI Agents

The fact that both patterns use the same AgentV2 base means mixing them is trivial. The make_worker_tool function from Part 3 works unchanged with a ClaudeAgent.

This is where building from first principles pays off. Because I defined the agent interface myself, I'm not dependent on any framework's compatibility matrix.
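The wrapper itself is small. Here is a self-contained sketch of the idea — make_worker_tool and the .run(task) interface come from Part 3, reconstructed here from memory, so the real version may differ in details:

```python
from typing import Callable


def make_worker_tool(name: str, description: str, agent) -> tuple[dict, Callable]:
    """Wrap anything exposing .run(task) as an Anthropic tool (schema + handler)."""
    schema = {
        "name": name,
        "description": description,
        "input_schema": {
            "type": "object",
            "properties": {"task": {"type": "string"}},
            "required": ["task"],
        },
    }

    def handler(tool_input: dict) -> str:
        # The supervisor (Claude) calls this; the worker behind it can be
        # any provider -- GPT-4o-mini, another Claude, or a local model.
        return agent.run(tool_input["task"])

    return schema, handler
```

Because the supervisor only ever sees the tool schema, it neither knows nor cares which provider serves the worker.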


When I Choose Claude vs OpenAI

I don't treat this as a religious debate. I switch based on the task:

Choose Claude when:

  • The supervisor needs to reason through ambiguous or multi-step plans

  • The system prompt is long and detailed (Claude follows complex instructions better)

  • You want extended thinking for debugging agent decisions

  • The context window needs to fit large documents (200K vs 128K for GPT-4o)

Choose OpenAI when:

  • The task is well-defined and the worker just needs to execute (GPT-4o-mini is fast and cheap)

  • You need structured output with response_format: json_schema (OpenAI's strict mode is more reliable for complex schemas)

  • Latency is critical — GPT-4o-mini is noticeably faster than Haiku for short tasks in my benchmarks

  • Your team already has OpenAI credits and quota capacity

In practice, my personal projects use Claude 3.7 Sonnet (claude-3-7-sonnet) as the supervisor and GPT-4o-mini as the workers. It's a good cost/capability trade-off.


A Note on API Version Pinning

Both Anthropic and OpenAI change their APIs frequently. I learned this the hard way when an Anthropic update changed the thinking block format and broke my parsing. Pin your library versions in requirements.txt and test before upgrading:
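For example, a pinned requirements.txt might look like this (the versions shown are the ones from the prerequisites above; use whatever you last tested against):

```
# requirements.txt -- exact pins; bump deliberately, then re-run the agent tests
anthropic==0.25.0
openai==1.14.0
python-dotenv==1.0.1
```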


Key Takeaways

  • Anthropic tool results are sent as user messages, not a tool role — watch for this

  • Extended thinking is useful for supervisor-level planning; enable it selectively

  • The AgentV2 abstraction makes mixing Claude and OpenAI workers trivial

  • Choose the model based on the task, not loyalty to a provider

  • Pin library versions — both APIs evolve fast


Up Next

Part 5: Production Patterns — structured logging across agents, cost tracking, rate limiting, and the biggest mistakes I made in production.
