# Part 4: Orchestrating Agents with Claude

*Part of the* [*Multi Agent Orchestration 101 Series*](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/multi-agent-orchestration-101)

## Why I Reach for Claude in Orchestration

After building the OpenAI supervisor in Part 3, I started noticing where GPT-4o struggled: long planning chains with many interdependencies, complex instruction sets that required careful reading, and tasks where I needed the model to reason through a problem step-by-step before acting. That's when I started using Claude as the supervisor and kept the cheaper OpenAI models as workers.

This part covers:

* Anthropic's tool use API (structurally similar to OpenAI, different details)
* Claude's extended thinking for planning agents
* Mixing Claude and OpenAI agents in one system
* When I choose one over the other

***

## Prerequisites

```bash
pip install "anthropic>=0.25.0" "openai>=1.14.0" python-dotenv
```

(Quote the version specifiers — an unquoted `>` is a shell redirect.)

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

***

## Anthropic Tool Use: The API Differences

Anthropic's tool use follows the same mental model as OpenAI function calling — you send a list of tool schemas, the model returns a `tool_use` block, you run the tool, send a `tool_result` block, and the model continues.

The structural differences from OpenAI:

| OpenAI                                           | Anthropic                                                     |
| ------------------------------------------------ | ------------------------------------------------------------- |
| `tools: [{type: "function", function: {...}}]`   | `tools: [{name: ..., description: ..., input_schema: {...}}]` |
| `finish_reason == "tool_calls"`                  | `stop_reason == "tool_use"`                                   |
| `tool_calls[i].function.arguments` (JSON string) | `content[i].input` (already a dict)                           |
| Role `"tool"` in next message                    | Role `"user"` with `type: "tool_result"`                      |

This last point trips people up. In Anthropic's API, tool results are sent as **user messages** with a `tool_result` content type. OpenAI uses a dedicated `"tool"` role.
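Side by side, the two shapes look like this. The IDs and payloads are illustrative placeholders, not values from the series' code:

```python
# OpenAI: the tool result goes back as a dedicated "tool" role message
openai_tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc123",          # matches tool_calls[i].id
    "content": '{"stdout": "42"}',
}

# Anthropic: the tool result is a *user* message carrying a tool_result block
anthropic_tool_result = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_abc123",  # matches the tool_use block's id
            "content": '{"stdout": "42"}',
        }
    ],
}
```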

***

## Claude Agent Implementation

```python
# claude_agent.py
from __future__ import annotations
import json
import os
from anthropic import AsyncAnthropic
from agent_v2 import AgentV2, Message
from dispatcher import ToolDispatcher

client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])


class ClaudeAgent(AgentV2):
    def __init__(
        self,
        name: str,
        system_prompt: str,
        dispatcher: ToolDispatcher,
        model: str = "claude-3-5-haiku-20241022",
        persist: bool = False,
        use_extended_thinking: bool = False,
        thinking_budget: int = 5000,
    ) -> None:
        super().__init__(
            name=name,
            system_prompt=system_prompt,
            dispatcher=dispatcher,
            persist=persist,
        )
        self.model = model
        self.use_extended_thinking = use_extended_thinking
        self.thinking_budget = thinking_budget

    async def _think(self) -> str:
        """Call Anthropic with current memory and available tools."""
        # Build Anthropic message history from agent memory
        messages: list[dict] = []
        i = 0
        while i < len(self.memory):
            msg = self.memory[i]

            if msg.role == "user":
                messages.append({"role": "user", "content": msg.content})
                i += 1

            elif msg.role == "assistant" and "tool_use_id" in msg.metadata:
                # This was a tool_use assistant turn. With extended thinking
                # enabled, the API requires the thinking block that preceded
                # the tool_use to be replayed verbatim in the history.
                content_blocks: list[dict] = []
                if "thinking_block" in msg.metadata:
                    content_blocks.append(msg.metadata["thinking_block"])
                content_blocks.append({
                    "type": "tool_use",
                    "id": msg.metadata["tool_use_id"],
                    "name": msg.metadata["tool_name"],
                    "input": msg.metadata["tool_input"],
                })
                messages.append({"role": "assistant", "content": content_blocks})
                # Next message should be the tool result
                if i + 1 < len(self.memory) and self.memory[i + 1].role == "tool":
                    tool_result_msg = self.memory[i + 1]
                    messages.append({
                        "role": "user",
                        "content": [
                            {
                                "type": "tool_result",
                                "tool_use_id": msg.metadata["tool_use_id"],
                                "content": tool_result_msg.content,
                            }
                        ],
                    })
                    i += 2
                else:
                    i += 1

            else:
                messages.append({"role": msg.role, "content": msg.content})
                i += 1

        # Build request kwargs
        request_kwargs: dict = {
            "model": self.model,
            "max_tokens": 4096,
            "system": self.system_prompt,
            "messages": messages,
            "tools": [
                {
                    "name": s["name"],
                    "description": s["description"],
                    "input_schema": s["parameters"],
                }
                for s in self.dispatcher.schemas()
            ],
        }

        if self.use_extended_thinking:
            request_kwargs["thinking"] = {
                "type": "enabled",
                "budget_tokens": self.thinking_budget,
            }
            # Extended thinking requires a higher max_tokens
            request_kwargs["max_tokens"] = self.thinking_budget + 4096

        response = await client.messages.create(**request_kwargs)

        # Preserve any thinking block so it can be replayed on the next turn
        thinking_block = next(
            (
                b.model_dump()
                for b in response.content
                if b.type in ("thinking", "redacted_thinking")
            ),
            None,
        )

        # Find tool_use block if present
        for block in response.content:
            if block.type == "tool_use":
                metadata = {
                    "tool_use_id": block.id,
                    "tool_name": block.name,
                    "tool_input": block.input,
                }
                if thinking_block is not None:
                    metadata["thinking_block"] = thinking_block
                # Store assistant turn in memory so history stays valid
                self.memory.append(
                    Message(role="assistant", content="", metadata=metadata)
                )
                return f"TOOL: {block.name} {json.dumps(block.input)}"

        # Plain text response
        text = next(
            (b.text for b in response.content if b.type == "text"), ""
        )
        return f"DONE: {text}"
```

The message history reconstruction (the `while i` loop) is the awkward part. Because Anthropic pairs `tool_use` assistant turns with `tool_result` user turns, I have to match them up when rebuilding history from my flat `memory` list. This is the price of abstracting away the API differences.

***

## Extended Thinking for Planning

Claude's extended thinking lets the model reason internally before responding. The reasoning is visible to you (useful for debugging) but not shown to the user. I enable it selectively for supervisor agents that need to plan across many steps.

```python
# claude_supervisor_thinking.py
import asyncio
import os
from claude_agent import ClaudeAgent
from openai_agent import OpenAIAgent
from dispatcher import ToolDispatcher
from supervisor import make_worker_tool
from tools import shell_tool, kv_set_tool, kv_get_tool


async def main() -> None:
    # Workers can be OpenAI — cheaper for execution
    shell_worker = OpenAIAgent(
        name="shell_worker",
        system_prompt="Execute shell commands. Return output concisely.",
        dispatcher=ToolDispatcher([shell_tool]),
        model="gpt-4o-mini",
    )

    memory_worker = OpenAIAgent(
        name="memory_worker",
        system_prompt="Store and retrieve data using kv_set and kv_get.",
        dispatcher=ToolDispatcher([kv_set_tool, kv_get_tool]),
        model="gpt-4o-mini",
    )

    # Supervisor is Claude with extended thinking
    supervisor = ClaudeAgent(
        name="claude_supervisor",
        system_prompt=(
            "You are a careful orchestration agent. "
            "Think step-by-step before delegating. "
            "Use 'shell_worker' for commands, 'memory_worker' for data storage. "
            "Always verify results before returning a final answer."
        ),
        dispatcher=ToolDispatcher(
            [make_worker_tool(shell_worker), make_worker_tool(memory_worker)]
        ),
        model="claude-3-7-sonnet-20250219",
        use_extended_thinking=True,
        thinking_budget=8000,
    )

    result = await supervisor.run(
        "Check how much disk space is free on the root partition, "
        "then store that value under the key 'disk_free'. "
        "Return a one-sentence summary."
    )
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

When I run this, I can inspect the `thinking` blocks in the raw API response to see how the model reasoned about the delegation order. It doesn't always take the most efficient path, but it rarely makes logic errors on complex multi-step goals.
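Pulling the reasoning out of a raw response is a one-liner. This helper is my own sketch, not part of the series' agent classes — it assumes blocks shaped like the SDK's `response.content` (the `SimpleNamespace` objects below stand in for real SDK blocks):

```python
from types import SimpleNamespace


def extract_thinking(content_blocks) -> list[str]:
    """Return the text of every 'thinking' block, in order."""
    return [
        block.thinking
        for block in content_blocks
        if getattr(block, "type", None) == "thinking"
    ]


# Stand-in blocks mimicking a response with extended thinking enabled
blocks = [
    SimpleNamespace(type="thinking", thinking="First check disk space, then store it."),
    SimpleNamespace(type="text", text="Done: 12G free, stored under 'disk_free'."),
]
print(extract_thinking(blocks))
```

Logging this output next to the final answer makes it much easier to see why the supervisor picked a particular delegation order.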

***

## Mixing Claude and OpenAI Agents

Because both agent classes share the same `AgentV2` base, mixing them is trivial. The `make_worker_tool` function from Part 3 works unchanged with a `ClaudeAgent`.

```python
# Both agents implement AgentV2.run(task: str) -> str
# The supervisor doesn't care which backend the worker uses

mixed_dispatcher = ToolDispatcher([
    make_worker_tool(openai_worker),   # OpenAI GPT-4o-mini
    make_worker_tool(claude_worker),   # Claude Haiku
    kv_set_tool,
    kv_get_tool,
])
```

This is where building from first principles pays off. Because I defined the agent interface myself, I'm not dependent on any framework's compatibility matrix.
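The compatibility claim can be made concrete with a structural `Protocol`. `Worker`, `FakeClaudeWorker`, and `FakeOpenAIWorker` are illustrative names I've invented here, not classes from the series' code:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Worker(Protocol):
    """The minimal surface make_worker_tool relies on."""

    name: str

    async def run(self, task: str) -> str: ...


class FakeClaudeWorker:
    name = "claude_worker"

    async def run(self, task: str) -> str:
        return f"claude handled: {task}"


class FakeOpenAIWorker:
    name = "openai_worker"

    async def run(self, task: str) -> str:
        return f"gpt handled: {task}"


# Either backend satisfies the protocol, so either slots into the supervisor
assert isinstance(FakeClaudeWorker(), Worker)
assert isinstance(FakeOpenAIWorker(), Worker)
```

Neither fake class inherits from `Worker`; the `runtime_checkable` check only verifies that `name` and `run` exist, which is exactly the duck typing the supervisor depends on.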

***

## When I Choose Claude vs OpenAI

I don't treat this as a religious debate. I switch based on the task:

**Choose Claude when:**

* The supervisor needs to reason through ambiguous or multi-step plans
* The system prompt is long and detailed (Claude follows complex instructions better)
* You want extended thinking for debugging agent decisions
* The context window needs to fit large documents (200K vs 128K for GPT-4o)

**Choose OpenAI when:**

* The task is well-defined and the worker just needs to execute (GPT-4o-mini is fast and cheap)
* You need structured output with `response_format: json_schema` (OpenAI's strict mode is more reliable for complex schemas)
* Latency is critical — GPT-4o-mini is noticeably faster than Haiku for short tasks in my benchmarks
* Your team already has OpenAI credits and quota capacity

In practice, my personal projects use Claude 3.7 Sonnet as the supervisor and GPT-4o-mini as workers. It's a good cost/capability trade-off.

***

## A Note on API Version Pinning

Both Anthropic and OpenAI change their APIs frequently. I learned this the hard way when an Anthropic update changed the `thinking` block format and broke my parsing. Pin your library versions in `requirements.txt` and test before upgrading:

```
anthropic==0.25.0
openai==1.14.0
```

***

## Key Takeaways

* Anthropic tool results are sent as `user` messages, not a `tool` role — watch for this
* Extended thinking is useful for supervisor-level planning; enable it selectively
* The `AgentV2` abstraction makes mixing Claude and OpenAI workers trivial
* Choose the model based on the task, not loyalty to a provider
* Pin library versions — both APIs evolve fast

***

## Up Next

[Part 5: Production Patterns](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/multi-agent-orchestration-101/part-5-production-patterns) — structured logging across agents, cost tracking, rate limiting, and the biggest mistakes I made in production.
