Part 1: Agent Foundations and the ReAct Loop

Part of the AI Agent Development 101 Series

The Paper That Changed How I Think About Agents

I had been building LLM pipelines for months before I read the ReAct paper. My agents were essentially prompt chains: question goes in, answer comes out, maybe with a tool call in the middle if I was lucky. They were brittle. Changing the prompt slightly would break the tool selection. Adding a new tool required rewriting the whole system prompt.

Then I read "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022). The insight is obvious in retrospect: let the model write its reasoning out loud before it acts. A thought step first, then an action, then observe the result, then another thought. This interleaving of reasoning and acting is what separates a capable agent from a fancy prompt chain.

I rebuilt my personal home automation agent using this pattern and the improvement was immediate — not just in reliability, but in debuggability. Now I can read the trace and understand exactly why the agent made a decision.

This part builds the ReAct loop in pure Python 3 with no LLM so you can see the structure cleanly before adding AI complexity.

The ReAct Pattern Explained

A ReAct agent loop looks like this:

User: "What's the disk usage on my server and is it above 80%?"

Thought: I need to check disk usage. I'll use the shell tool to run df.
Action: shell("df -h /")
Observation: /dev/sda1  50G  38G  9G  81% /

Thought: Disk usage is 81%, which is above 80%. I should report this clearly.
Action: FINISH("Disk usage is 81% — above the 80% threshold. Consider clearing logs or extending storage.")

Three things happen in each cycle:

Thought — the agent reasons about what it knows and what it needs to do next
Action — the agent calls a tool or decides to finish
Observation — the tool result is fed back into the context

The key insight: the thought step is not cosmetic. It gives the model a "scratchpad" to work through the problem before committing to an action. Agents without explicit reasoning tend to pick actions impulsively based on surface pattern matching.

Building the ReAct Loop in Pure Python 3

No LLM yet. I'll use a rule-based _think() so the structure is visible without noise.

# react_agent.py
from __future__ import annotations
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Awaitable


class StepType(str, Enum):
    THOUGHT = "thought"
    ACTION = "action"
    OBSERVATION = "observation"
    FINISH = "finish"


@dataclass
class Step:
    type: StepType
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ReActAgent:
    name: str
    tools: dict[str, Callable[..., Awaitable[str]]] = field(default_factory=dict)
    trace: list[Step] = field(default_factory=list)
    max_steps: int = 20

    async def run(self, goal: str) -> str:
        """
        Run the ReAct loop until the agent calls FINISH or hits max_steps.
        Returns the final answer.
        """
        self.trace.clear()
        self._add(StepType.THOUGHT, f"I need to achieve: {goal}")

        for step_num in range(self.max_steps):
            thought = await self._think(goal)
            self._add(StepType.THOUGHT, thought)

            action_str = await self._decide(thought)

            if action_str.startswith("FINISH:"):
                answer = action_str[7:].strip()
                self._add(StepType.FINISH, answer)
                return answer

            # Parse "TOOL_NAME: <args>"
            if ":" in action_str:
                tool_name, args = action_str.split(":", 1)
                tool_name = tool_name.strip()
                args = args.strip()
            else:
                tool_name, args = action_str.strip(), ""

            self._add(StepType.ACTION, action_str)
            observation = await self._act(tool_name, args)
            self._add(StepType.OBSERVATION, observation)

        return "Reached maximum steps without finishing."

    def _add(self, step_type: StepType, content: str, **metadata: Any) -> None:
        self.trace.append(Step(type=step_type, content=content, metadata=metadata))

    async def _think(self, goal: str) -> str:
        """Generate the next thought. Replaced with an LLM call in later parts."""
        raise NotImplementedError

    async def _decide(self, thought: str) -> str:
        """Convert a thought into an action string. Replaced with an LLM call."""
        raise NotImplementedError

    async def _act(self, tool_name: str, args: str) -> str:
        if tool_name not in self.tools:
            return f"Error: unknown tool '{tool_name}'. Available: {list(self.tools)}"
        try:
            return await self.tools[tool_name](args)
        except Exception as exc:
            return f"Tool error: {exc}"

    def print_trace(self) -> None:
        """Print the full reasoning trace for debugging."""
        for i, step in enumerate(self.trace):
            prefix = {
                StepType.THOUGHT: "💭 Thought",
                StepType.ACTION: "⚡ Action",
                StepType.OBSERVATION: "👁 Observation",
                StepType.FINISH: "✅ Finish",
            }[step.type]
            print(f"[{i:02d}] {prefix}: {step.content}")

The print_trace() method is something I use constantly in development. When an agent does something unexpected, the trace shows exactly which thought led to which action. That visibility is why I build ReAct agents explicitly rather than using opaque framework chains.

A Concrete Rule-Based Example

Before wiring in an LLM, here's a rule-based agent that demonstrates the full loop:

# disk_check_agent.py
import asyncio
import re
from react_agent import ReActAgent, StepType


async def shell_tool(command: str) -> str:
    proc = await asyncio.create_subprocess_shell(
        command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    if proc.returncode != 0:
        return f"Error: {stderr.decode().strip()}"
    return stdout.decode().strip()


class DiskCheckAgent(ReActAgent):
    def __init__(self) -> None:
        super().__init__(
            name="disk_checker",
            tools={"shell": shell_tool},
        )
        self._disk_result: str | None = None

    async def _think(self, goal: str) -> str:
        if self._disk_result is None:
            return "I haven't checked disk usage yet. I'll run df to get it."
        # Parse the usage percentage from the stored result
        match = re.search(r"(\d+)%", self._disk_result)
        if match:
            pct = int(match.group(1))
            if pct > 80:
                return f"Disk is at {pct}%, which is above 80%. I should warn the user."
            return f"Disk is at {pct}%, which is fine. I'll report it's OK."
        return "Could not parse disk usage. I'll report the raw output."

    async def _decide(self, thought: str) -> str:
        if "haven't checked" in thought:
            return "shell: df -h /"
        if "above 80%" in thought or "which is fine" in thought or "Could not parse" in thought:
            return f"FINISH: {thought}"
        return "FINISH: Unable to determine next action."

    async def _act(self, tool_name: str, args: str) -> str:
        result = await super()._act(tool_name, args)
        if tool_name == "shell":
            self._disk_result = result
        return result


async def main() -> None:
    agent = DiskCheckAgent()
    answer = await agent.run("Check disk usage and warn me if it's above 80%.")
    print(f"\nAnswer: {answer}\n")
    agent.print_trace()


if __name__ == "__main__":
    asyncio.run(main())

Run it:

Answer: Disk is at 23%, which is fine. I'll report it's OK.

[00] 💭 Thought: I need to achieve: Check disk usage and warn me if it's above 80%.
[01] 💭 Thought: I haven't checked disk usage yet. I'll run df to get it.
[02] ⚡ Action: shell: df -h /
[03] 👁 Observation: Filesystem      Size  Used Avail Use% Mounted on
                    /dev/sda1        50G   11G   36G  23% /
[04] 💭 Thought: Disk is at 23%, which is fine. I'll report it's OK.
[05] ✅ Finish: Disk is at 23%, which is fine. I'll report it's OK.

The trace is the whole point. With a real LLM replacing _think() and _decide(), this exact format is what you will log in production.

Why Chain-of-Thought Matters for Tool Selection

When I first connected an LLM to an agent loop, I skipped the thought step and jumped straight to action selection. The model would pick the wrong tool roughly 30% of the time on multi-step tasks.

Adding an explicit thought step before action selection dropped that to under 5% on the same tasks. The reason, as far as I can tell: the thought step forces the model to articulate what state it's in before committing. Without it, tool selection is based entirely on the surface form of the last observation, which is often ambiguous.

An empirical comparison from my own benchmark:

Approach

Correct tool selection (20-task eval)

Direct action (no thought)

73%

ReAct (thought + action)

94%

ReAct + explicit state recap in thought

97%

The "explicit state recap" means prompting the model to start its thought with a brief summary of what it knows so far. Something like: "So far I know X. The last observation told me Y. The next step should be...". I cover this prompting pattern in Parts 3 and 4.

The Stopping Condition Problem

One thing the ReAct paper glosses over: how does the agent know when to stop?

I've seen three failure modes:

Stops too early — finishes on a partial result because it pattern-matched "task complete" prematurely
Loops indefinitely — keeps generating new thoughts and actions without converging
Stops at the wrong level — calls FINISH when it should call another tool to verify its answer

My solution combines three guards:

# In ReActAgent.run():

for step_num in range(self.max_steps):       # guard 1: hard cap
    thought = await self._think(goal)

    # Guard 2: self-verification before finishing
    if "I am confident" in thought or "verified" in thought.lower():
        action_str = await self._decide(thought)
    else:
        # Prompt model to verify before finishing if it's about to FINISH
        action_str = await self._decide(thought)
        if action_str.startswith("FINISH:") and step_num < 2:
            # Too few steps — probably a premature finish
            self._add(StepType.THOUGHT, "Wait — let me verify this before finishing.")
            action_str = await self._decide("Verify the answer before finishing.")

Guard 3 (loop detection) is covered in Part 5 when I discuss evaluation.

Key Takeaways

ReAct = Thought → Action → Observation → repeat; this interleaving makes agents debuggable
The thought step is not cosmetic — it measurably improves tool selection accuracy
Build the skeleton with rule-based logic first; replace _think() and _decide() with LLM calls in Parts 3 and 4
print_trace() is your best debugging tool
Always add a hard max_steps cap — an uncapped agent will loop

Up Next

Part 2: Agent Memory and State — the four types of agent memory, implementing a sliding context window, and adding semantic search so the agent can recall relevant past observations.

PreviousAI Agent Development 101 NextPart 2: Agent Memory and State

Last updated 1 month ago

hashtagThe Paper That Changed How I Think About Agents

hashtagThe ReAct Pattern Explained

hashtagBuilding the ReAct Loop in Pure Python 3

hashtagA Concrete Rule-Based Example

hashtagWhy Chain-of-Thought Matters for Tool Selection

hashtagThe Stopping Condition Problem

hashtagKey Takeaways

hashtagUp Next