I had been building LLM pipelines for months before I read the ReAct paper. My agents were essentially prompt chains: question goes in, answer comes out, maybe with a tool call in the middle if I was lucky. They were brittle. Changing the prompt slightly would break the tool selection. Adding a new tool required rewriting the whole system prompt.
Then I read "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022). The insight is obvious in retrospect: let the model write its reasoning out loud before it acts. A thought step first, then an action, then observe the result, then another thought. This interleaving of reasoning and acting is what separates a capable agent from a fancy prompt chain.
I rebuilt my personal home automation agent using this pattern and the improvement was immediate β not just in reliability, but in debuggability. Now I can read the trace and understand exactly why the agent made a decision.
This part builds the ReAct loop in pure Python 3 with no LLM so you can see the structure cleanly before adding AI complexity.
The ReAct Pattern Explained
A ReAct agent loop looks like this:
User: "What's the disk usage on my server and is it above 80%?"
Thought: I need to check disk usage. I'll use the shell tool to run df.
Action: shell("df -h /")
Observation: /dev/sda1 50G 38G 9G 81% /
Thought: Disk usage is 81%, which is above 80%. I should report this clearly.
Action: FINISH("Disk usage is 81% β above the 80% threshold. Consider clearing logs or extending storage.")
Three things happen in each cycle:
Thought β the agent reasons about what it knows and what it needs to do next
Action β the agent calls a tool or decides to finish
Observation β the tool result is fed back into the context
The key insight: the thought step is not cosmetic. It gives the model a "scratchpad" to work through the problem before committing to an action. Agents without explicit reasoning tend to pick actions impulsively based on surface pattern matching.
Building the ReAct Loop in Pure Python 3
No LLM yet. I'll use a rule-based _think() so the structure is visible without noise.
The print_trace() method is something I use constantly in development. When an agent does something unexpected, the trace shows exactly which thought led to which action. That visibility is why I build ReAct agents explicitly rather than using opaque framework chains.
A Concrete Rule-Based Example
Before wiring in an LLM, here's a rule-based agent that demonstrates the full loop:
Run it:
The trace is the whole point. With a real LLM replacing _think() and _decide(), this exact format is what you will log in production.
Why Chain-of-Thought Matters for Tool Selection
When I first connected an LLM to an agent loop, I skipped the thought step and jumped straight to action selection. The model would pick the wrong tool roughly 30% of the time on multi-step tasks.
Adding an explicit thought step before action selection dropped that to under 5% on the same tasks. The reason, as far as I can tell: the thought step forces the model to articulate what state it's in before committing. Without it, tool selection is based entirely on the surface form of the last observation, which is often ambiguous.
An empirical comparison from my own benchmark:
Approach
Correct tool selection (20-task eval)
Direct action (no thought)
73%
ReAct (thought + action)
94%
ReAct + explicit state recap in thought
97%
The "explicit state recap" means prompting the model to start its thought with a brief summary of what it knows so far. Something like: "So far I know X. The last observation told me Y. The next step should be...". I cover this prompting pattern in Parts 3 and 4.
The Stopping Condition Problem
One thing the ReAct paper glosses over: how does the agent know when to stop?
I've seen three failure modes:
Stops too early β finishes on a partial result because it pattern-matched "task complete" prematurely
Loops indefinitely β keeps generating new thoughts and actions without converging
Stops at the wrong level β calls FINISH when it should call another tool to verify its answer
My solution combines three guards:
Guard 3 (loop detection) is covered in Part 5 when I discuss evaluation.
Key Takeaways
ReAct = Thought β Action β Observation β repeat; this interleaving makes agents debuggable
The thought step is not cosmetic β it measurably improves tool selection accuracy
Build the skeleton with rule-based logic first; replace _think() and _decide() with LLM calls in Parts 3 and 4
print_trace() is your best debugging tool
Always add a hard max_steps cap β an uncapped agent will loop
Up Next
Part 2: Agent Memory and State β the four types of agent memory, implementing a sliding context window, and adding semantic search so the agent can recall relevant past observations.
# react_agent.py
from __future__ import annotations
import asyncio
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Callable, Awaitable
class StepType(str, Enum):
THOUGHT = "thought"
ACTION = "action"
OBSERVATION = "observation"
FINISH = "finish"
@dataclass
class Step:
type: StepType
content: str
metadata: dict[str, Any] = field(default_factory=dict)
@dataclass
class ReActAgent:
name: str
tools: dict[str, Callable[..., Awaitable[str]]] = field(default_factory=dict)
trace: list[Step] = field(default_factory=list)
max_steps: int = 20
async def run(self, goal: str) -> str:
"""
Run the ReAct loop until the agent calls FINISH or hits max_steps.
Returns the final answer.
"""
self.trace.clear()
self._add(StepType.THOUGHT, f"I need to achieve: {goal}")
for step_num in range(self.max_steps):
thought = await self._think(goal)
self._add(StepType.THOUGHT, thought)
action_str = await self._decide(thought)
if action_str.startswith("FINISH:"):
answer = action_str[7:].strip()
self._add(StepType.FINISH, answer)
return answer
# Parse "TOOL_NAME: <args>"
if ":" in action_str:
tool_name, args = action_str.split(":", 1)
tool_name = tool_name.strip()
args = args.strip()
else:
tool_name, args = action_str.strip(), ""
self._add(StepType.ACTION, action_str)
observation = await self._act(tool_name, args)
self._add(StepType.OBSERVATION, observation)
return "Reached maximum steps without finishing."
def _add(self, step_type: StepType, content: str, **metadata: Any) -> None:
self.trace.append(Step(type=step_type, content=content, metadata=metadata))
async def _think(self, goal: str) -> str:
"""Generate the next thought. Replaced with an LLM call in later parts."""
raise NotImplementedError
async def _decide(self, thought: str) -> str:
"""Convert a thought into an action string. Replaced with an LLM call."""
raise NotImplementedError
async def _act(self, tool_name: str, args: str) -> str:
if tool_name not in self.tools:
return f"Error: unknown tool '{tool_name}'. Available: {list(self.tools)}"
try:
return await self.tools[tool_name](args)
except Exception as exc:
return f"Tool error: {exc}"
def print_trace(self) -> None:
"""Print the full reasoning trace for debugging."""
for i, step in enumerate(self.trace):
prefix = {
StepType.THOUGHT: "π Thought",
StepType.ACTION: "β‘ Action",
StepType.OBSERVATION: "π Observation",
StepType.FINISH: "β Finish",
}[step.type]
print(f"[{i:02d}] {prefix}: {step.content}")
# disk_check_agent.py
import asyncio
import re
from react_agent import ReActAgent, StepType
async def shell_tool(command: str) -> str:
proc = await asyncio.create_subprocess_shell(
command,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await proc.communicate()
if proc.returncode != 0:
return f"Error: {stderr.decode().strip()}"
return stdout.decode().strip()
class DiskCheckAgent(ReActAgent):
def __init__(self) -> None:
super().__init__(
name="disk_checker",
tools={"shell": shell_tool},
)
self._disk_result: str | None = None
async def _think(self, goal: str) -> str:
if self._disk_result is None:
return "I haven't checked disk usage yet. I'll run df to get it."
# Parse the usage percentage from the stored result
match = re.search(r"(\d+)%", self._disk_result)
if match:
pct = int(match.group(1))
if pct > 80:
return f"Disk is at {pct}%, which is above 80%. I should warn the user."
return f"Disk is at {pct}%, which is fine. I'll report it's OK."
return "Could not parse disk usage. I'll report the raw output."
async def _decide(self, thought: str) -> str:
if "haven't checked" in thought:
return "shell: df -h /"
if "above 80%" in thought or "which is fine" in thought or "Could not parse" in thought:
return f"FINISH: {thought}"
return "FINISH: Unable to determine next action."
async def _act(self, tool_name: str, args: str) -> str:
result = await super()._act(tool_name, args)
if tool_name == "shell":
self._disk_result = result
return result
async def main() -> None:
agent = DiskCheckAgent()
answer = await agent.run("Check disk usage and warn me if it's above 80%.")
print(f"\nAnswer: {answer}\n")
agent.print_trace()
if __name__ == "__main__":
asyncio.run(main())
Answer: Disk is at 23%, which is fine. I'll report it's OK.
[00] π Thought: I need to achieve: Check disk usage and warn me if it's above 80%.
[01] π Thought: I haven't checked disk usage yet. I'll run df to get it.
[02] β‘ Action: shell: df -h /
[03] π Observation: Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 11G 36G 23% /
[04] π Thought: Disk is at 23%, which is fine. I'll report it's OK.
[05] β Finish: Disk is at 23%, which is fine. I'll report it's OK.
# In ReActAgent.run():
for step_num in range(self.max_steps): # guard 1: hard cap
thought = await self._think(goal)
# Guard 2: self-verification before finishing
if "I am confident" in thought or "verified" in thought.lower():
action_str = await self._decide(thought)
else:
# Prompt model to verify before finishing if it's about to FINISH
action_str = await self._decide(thought)
if action_str.startswith("FINISH:") and step_num < 2:
# Too few steps β probably a premature finish
self._add(StepType.THOUGHT, "Wait β let me verify this before finishing.")
action_str = await self._decide("Verify the answer before finishing.")