# Part 4: Building an Agent with Claude

*Part of the* [*AI Agent Development 101 Series*](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-agent-development-101)

## Why I Started Using Claude for Agents

My first production agent used GPT-4o exclusively. It was reliable and fast. Then I got a task that required reading a large configuration file (around 80K tokens) before deciding what to do. GPT-4o at the time had a 128K context limit — technically enough, but the model's reasoning quality degraded noticeably when the context was that full. I was also paying for 80K input tokens on every single reasoning step.

I switched the agent to Claude 3.5 Sonnet for that project. Two things improved immediately: reasoning quality on large-context tasks stayed consistent, and prompt caching cut my input token costs by around 70% because the large config file was always at the start of the prompt.

This part covers rebuilding the ReAct agent from Part 3 using Anthropic's API, and goes deeper on the features that make Claude specifically useful for agent work.

***

## Prerequisites

```bash
pip install "anthropic>=0.25.0" aiosqlite chromadb sentence-transformers python-dotenv
```

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

***

## System Prompt Engineering for Claude

Claude and GPT-4o respond differently to the same prompt. In my experience:

* Claude follows **numbered rule lists** very reliably — more so than bullet points
* Claude benefits from an explicit **output format section** with a filled-in example
* Claude respects **negative instructions** ("never do X") consistently, whereas GPT-4o sometimes needs them rephrased positively

Here is the system prompt I use for my Claude ReAct agent:

```python
CLAUDE_REACT_SYSTEM_PROMPT = """\
You are a ReAct agent. Your job is to solve a given goal by alternating between reasoning (Thought) and acting (Action).

## Output format

You must reply in this exact format every turn. No deviations.

Thought: <Write out your reasoning. Begin by summarising what you know from observations so far. End with what you will do next and why.>
Action: <A single tool call, or FINISH>

## Action format

For a tool call: write the tool name followed by a JSON object of arguments.
Example: search {"query": "disk usage linux command"}

For finishing: write FINISH followed by your answer.
Example: FINISH The disk usage on / is 42%.

## Rules

1. Always write a Thought before every Action. No exceptions.
2. In the Thought, start with "So far I know:" and summarise relevant observations.
3. If a tool returned an error, explain in the Thought what went wrong before retrying.
4. Do not call FINISH until the goal is fully complete and verified.
5. Never use a tool name that is not listed below.
6. Never call the same tool with the same arguments twice.

## Available tools

{tool_descriptions}
"""
```

The `"So far I know:"` opener in Rule 2 is something I added after noticing Claude would occasionally drift from the original goal on long tasks. Forcing it to articulate accumulated knowledge at the start of each thought anchors the reasoning.
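Since the anchor is part of the contract, I find it worth checking in code before accepting a step. A minimal sketch of such a check (the `thought_is_anchored` helper and the sample replies are illustrative, not part of the agent class below):

```python
import re

# Matches a Thought that opens with the required "So far I know:" anchor
ANCHORED_THOUGHT = re.compile(
    r"Thought:\s*So far I know:.*?(?=\nAction:|\Z)", re.DOTALL
)


def thought_is_anchored(raw: str) -> bool:
    """True if the model's reply contains a properly anchored Thought."""
    return ANCHORED_THOUGHT.search(raw) is not None


good = 'Thought: So far I know: ./src has 3 modules.\nAction: search {"query": "tests"}'
bad = "Thought: Let me just try something.\nAction: FINISH done"
```

When the check fails, one option is to re-prompt once with a format reminder instead of failing the step outright.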

***

## The Claude ReAct Agent

The structure mirrors the OpenAI agent from Part 3. The differences are in how tool calls are sent and how responses are parsed.

```python
# claude_react_agent.py
from __future__ import annotations
import os
import re
from anthropic import AsyncAnthropic
from react_agent import ReActAgent, StepType
from memory_system import MemorySystem
from tools import ToolDispatcher

client = AsyncAnthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

CLAUDE_REACT_SYSTEM_PROMPT = """\
You are a ReAct agent. Solve the goal by alternating Thought and Action.

Output format — use this every turn:
Thought: <summarise what you know so far, then explain your next step>
Action: <tool_name {"arg": "value"}> or <FINISH <answer>>

Rules:
1. Always write Thought before Action.
2. Start Thought with "So far I know:" and summarise observations.
3. Diagnose errors before retrying.
4. Only call FINISH when the goal is fully complete.
5. Never invent tool names.
6. Never repeat the same tool call with the same arguments.

Available tools:
{tool_descriptions}
"""


class ClaudeReActAgent(ReActAgent):
    def __init__(
        self,
        goal: str,
        dispatcher: ToolDispatcher,
        session_id: str,
        model: str = "claude-3-5-sonnet-20241022",
        use_extended_thinking: bool = False,
        thinking_budget: int = 4000,
        max_steps: int = 20,
    ) -> None:
        super().__init__(
            name="claude_react_agent",
            tools={},
            max_steps=max_steps,
        )
        self.goal = goal
        self.dispatcher = dispatcher
        self.model = model
        self.use_extended_thinking = use_extended_thinking
        self.thinking_budget = thinking_budget
        self.memory = MemorySystem(session_id=session_id)

        tool_desc = "\n".join(
            f"- {s['name']}: {s['description']}"
            for s in dispatcher.schemas()
        )
        self._system_prompt = CLAUDE_REACT_SYSTEM_PROMPT.format(
            tool_descriptions=tool_desc
        )

    async def start(self) -> None:
        await self.memory.restore()
        await self.memory.add("user", f"Goal: {self.goal}")

    def _build_messages(self) -> list[dict]:
        """Build the Anthropic message list from memory, excluding system."""
        return [
            msg for msg in self.memory.to_messages()
            if msg["role"] != "system"
        ]

    async def _think(self, goal: str) -> str:
        # Inject episodic memories
        recent_obs = [
            s.content for s in self.trace[-3:]
            if s.type == StepType.OBSERVATION
        ]
        query = recent_obs[-1] if recent_obs else goal
        recalls = self.memory.recall_relevant(query, top_k=2)

        messages = self._build_messages()
        if recalls:
            recall_note = "Relevant recalled context:\n" + "\n".join(f"- {r}" for r in recalls)
            # Fold the recall into the latest user message: a trailing
            # assistant message would act as a prefill, and Claude would
            # continue it instead of starting a fresh Thought
            if messages and messages[-1]["role"] == "user":
                last = messages[-1]
                messages[-1] = {
                    "role": "user",
                    "content": f"{last['content']}\n\n{recall_note}",
                }
            else:
                messages.append({"role": "user", "content": recall_note})

        request_kwargs: dict = {
            "model": self.model,
            "max_tokens": 2048,
            "system": self._system_prompt,
            "messages": messages,
        }

        if self.use_extended_thinking:
            # Extended thinking needs a model that supports it (3.7+);
            # max_tokens must exceed the thinking budget, so reserve room
            # for the visible reply on top of it
            request_kwargs["thinking"] = {
                "type": "enabled",
                "budget_tokens": self.thinking_budget,
            }
            request_kwargs["max_tokens"] = self.thinking_budget + 2048

        response = await client.messages.create(**request_kwargs)

        # Prefer the text block; skip thinking blocks
        raw = ""
        for block in response.content:
            if block.type == "text":
                raw = block.text
                break

        await self.memory.add("assistant", raw, episodic=False)

        thought_match = re.search(r"Thought:\s*(.+?)(?=\nAction:|$)", raw, re.DOTALL)
        return thought_match.group(1).strip() if thought_match else raw

    async def _decide(self, thought: str) -> str:
        last_assistant = [
            e for e in self.memory.short_term.last(5)
            if e.role == "assistant"
        ]
        if not last_assistant:
            return "FINISH: Could not determine action."

        raw = last_assistant[-1].content
        action_match = re.search(r"Action:\s*(.+?)$", raw, re.MULTILINE | re.DOTALL)
        if not action_match:
            return "FINISH: No action found."
        return action_match.group(1).strip()

    async def _act(self, tool_name: str, args: str) -> str:
        # Claude often outputs JSON args inline: tool_name {"key": "val"}
        # Split on the first space before checking for FINISH, so that
        # "FINISH <answer>" is recognised as well as a bare "FINISH"
        if " " in tool_name:
            tool_name, args = tool_name.split(" ", 1)

        if tool_name == "FINISH":
            return ""

        result = await self.dispatcher.call(tool_name, args)
        await self.memory.add(
            "user",
            f"Observation: {result}",
            episodic=True,
        )
        return result
```

***

## Extended Thinking as an Explicit Reasoning Step

Claude's extended thinking is different from the Thought in the ReAct loop. The Thought is output text the model writes and that you see. Extended thinking is an internal reasoning process the model runs before generating output — it solves harder subproblems, checks its own logic, and only then writes the Thought and Action.

I use extended thinking for:

* Tasks where the model needs to plan more than 3 steps ahead
* Debugging tasks where it needs to reason about ambiguous error messages
* Any task where I've seen the model make a systematic reasoning error without it

Enable it selectively — it costs extra tokens and adds latency. For simple tool dispatch tasks it's overkill.

```python
# Example: enable extended thinking for a complex analysis task
agent = ClaudeReActAgent(
    goal="Analyse the Python files in ./src, identify which ones have no test coverage, and list them ordered by file size.",
    dispatcher=dispatcher,
    session_id="analysis_001",
    model="claude-3-7-sonnet-20250219",
    use_extended_thinking=True,
    thinking_budget=6000,
)
```

When I inspected the thinking blocks on complex tasks, I saw Claude use them to build a mental map of dependencies before deciding which tool to call first. Without extended thinking, it would sometimes call tools in an order that required backtracking.
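Inspecting is easy to script: a Messages API response is a list of typed content blocks, where thinking blocks carry a `thinking` field and text blocks carry `text`. A small helper to separate the two (the function name is mine; `SimpleNamespace` stands in for real content blocks so the demo runs offline):

```python
from types import SimpleNamespace  # offline stand-in for API content blocks


def split_response_blocks(content) -> tuple[str, str]:
    """Split a Messages API content list into (thinking_text, answer_text)."""
    thinking = "\n".join(b.thinking for b in content if b.type == "thinking")
    answer = "\n".join(b.text for b in content if b.type == "text")
    return thinking, answer


# With a live call this would be: split_response_blocks(response.content)
demo = [
    SimpleNamespace(type="thinking", thinking="Map the dependency graph first."),
    SimpleNamespace(type="text", text="Thought: So far I know: ..."),
]
thinking, answer = split_response_blocks(demo)
```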

***

## Prompt Caching to Cut Token Costs

If your agent always starts with the same large document (a codebase, a config file, a dataset), you're paying for those tokens on every step. Anthropic's prompt caching re-uses the KV-cached prefix across calls that share the same content.

Mark the cacheable prefix with `cache_control`:

```python
async def _think_with_caching(self, large_document: str, goal: str) -> str:
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": large_document,
                    "cache_control": {"type": "ephemeral"},  # cache this block
                },
                {
                    "type": "text",
                    "text": f"Goal: {goal}\n\nWhat should I do next?",
                },
            ],
        }
    ] + self._build_messages()[1:]  # add conversation history after the cached block

    response = await client.messages.create(
        model=self.model,
        max_tokens=2048,
        system=[
            {
                "type": "text",
                "text": self._system_prompt,
                "cache_control": {"type": "ephemeral"},  # cache the system prompt too
            }
        ],
        messages=messages,
    )
    # ...
```

**Real cost impact from my own usage:** I had a code review agent that loaded a 60K-token codebase on every step. After adding prompt caching, the effective input cost dropped by ~72%. Cache reads on `claude-3-5-sonnet` cost $0.30/MTok vs $3.00/MTok for uncached input.

The cache has a 5-minute TTL by default, refreshed each time it is read. For an agent whose steps arrive faster than that, every step after the first hits the cache.
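The arithmetic behind a figure like that is easy to reproduce. A simplified sketch (it only counts the repeated prefix, ignores the uncached suffix tokens, and assumes Anthropic's published 5-minute-cache pricing for `claude-3-5-sonnet`: $3.00/MTok base input, $3.75/MTok cache write, $0.30/MTok cache read):

```python
MTOK = 1_000_000
BASE, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30  # $/MTok

def prefix_cost(prefix_tokens: int, steps: int, cached: bool) -> float:
    """Cost of re-sending a fixed prompt prefix across an agent run."""
    if not cached:
        return steps * prefix_tokens / MTOK * BASE
    # First step writes the cache at a 25% premium; later steps read it
    return prefix_tokens / MTOK * (CACHE_WRITE + (steps - 1) * CACHE_READ)

uncached = prefix_cost(60_000, 10, cached=False)   # $1.80
cached = prefix_cost(60_000, 10, cached=True)      # $0.387
savings = 1 - cached / uncached                    # ~78% on the prefix alone
```

The savings on the prefix alone come out a little higher than my observed ~72% because real runs also pay for non-cached conversation history and output tokens.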

***

## Complete Runnable Example

```python
# run_claude_agent.py
import asyncio
from claude_react_agent import ClaudeReActAgent
from dispatcher import ToolDispatcher
from tools import shell_tool, kv_set_tool, kv_get_tool
from long_term_memory import init_db


async def main() -> None:
    await init_db()

    dispatcher = ToolDispatcher([shell_tool, kv_set_tool, kv_get_tool])

    agent = ClaudeReActAgent(
        goal=(
            "Check how many Python files are in the current directory recursively. "
            "Store the count under the key 'python_file_count'. "
            "Return the count in your final answer."
        ),
        dispatcher=dispatcher,
        session_id="claude_session_001",
        model="claude-3-5-sonnet-20241022",
    )

    await agent.start()
    answer = await agent.run(agent.goal)

    print(f"\n=== Answer ===\n{answer}\n")
    print("=== Trace ===")
    agent.print_trace()


if __name__ == "__main__":
    asyncio.run(main())
```

The trace from Claude is noticeably more verbose in the Thought steps than GPT-4o. Claude tends to write longer, more detailed reasoning. That's useful for debugging — I can follow the logic precisely — but it does mean slightly higher output token usage.

***

## Head-to-Head: OpenAI vs Claude for Single Agents

After running both on the same task set in my personal projects, here is my honest comparison:

| Factor                                      | OpenAI `gpt-4o`               | Claude `3-5-sonnet`            |
| ------------------------------------------- | ----------------------------- | ------------------------------ |
| Tool selection accuracy (5-tool agent)      | High                          | High                           |
| Reasoning quality on long tasks (>20 steps) | Degrades slightly             | Stays consistent               |
| Structured output / strict schema           | Excellent (`response_format`) | Good (tool use)                |
| Large context handling                      | Good (128K)                   | Excellent (200K)               |
| Prompt caching                              | Automatic (50% discount)      | Explicit `cache_control` (~72% observed savings) |
| Thought verbosity                           | Concise                       | Verbose (better for debugging) |
| Speed on short tasks                        | Faster                        | Slightly slower                |
| Cost per step (without caching)             | Lower                         | Slightly higher                |

For agents on large documents: Claude wins on both quality and cost (with caching). For short, well-defined tasks: GPT-4o is faster and slightly cheaper. For production agents where I need to debug failures: Claude's verbose thoughts are worth the extra tokens.
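One way to encode that decision is a small routing helper. The thresholds and the function below are my own illustration, not a library API:

```python
def pick_model(context_tokens: int, needs_verbose_trace: bool = False) -> str:
    """Route a task to a model based on the trade-offs in the table above."""
    # Large documents or debugging-heavy work: Claude wins on caching
    # and on reasoning consistency over long runs
    if context_tokens > 80_000 or needs_verbose_trace:
        return "claude-3-5-sonnet-20241022"
    # Short, well-defined tasks: gpt-4o is faster and slightly cheaper
    return "gpt-4o"
```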

***

## Key Takeaways

* System prompt structure matters more with Claude — numbered rules and explicit format sections outperform bullet points
* Extended thinking is useful for complex multi-step planning; enable it selectively
* Prompt caching on the system prompt and large documents cuts costs significantly on repeat calls
* Claude's verbose Thought output is a feature, not a bug — it makes agent decisions traceable
* The `MemorySystem` from Part 2 works unchanged with Claude

***

## Up Next

[Part 5: Evaluating and Testing Your Agent](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-agent-development-101/part-5-evaluating-agents) — deterministic tests for tool dispatch, trajectory evaluation to check if the agent took a sensible path, and regression testing when you upgrade the underlying model.
