Part 1: Agent Foundations and the ReAct Loop

Part of the AI Agent Development 101 Series

The Paper That Changed How I Think About Agents

I had been building LLM pipelines for months before I read the ReAct paper. My agents were essentially prompt chains: question goes in, answer comes out, maybe with a tool call in the middle if I was lucky. They were brittle. Changing the prompt slightly would break the tool selection. Adding a new tool required rewriting the whole system prompt.

Then I read "ReAct: Synergizing Reasoning and Acting in Language Models" (Yao et al., 2022). The insight is obvious in retrospect: let the model write its reasoning out loud before it acts. A thought step first, then an action, then observe the result, then another thought. This interleaving of reasoning and acting is what separates a capable agent from a fancy prompt chain.

I rebuilt my personal home automation agent using this pattern and the improvement was immediate β€” not just in reliability, but in debuggability. Now I can read the trace and understand exactly why the agent made a decision.

This part builds the ReAct loop in pure Python 3 with no LLM so you can see the structure cleanly before adding AI complexity.


The ReAct Pattern Explained

A ReAct agent loop looks like this:

User: "What's the disk usage on my server and is it above 80%?"

Thought: I need to check disk usage. I'll use the shell tool to run df.
Action: shell("df -h /")
Observation: /dev/sda1  50G  38G  9G  81% /

Thought: Disk usage is 81%, which is above 80%. I should report this clearly.
Action: FINISH("Disk usage is 81% β€” above the 80% threshold. Consider clearing logs or extending storage.")

Three things happen in each cycle:

  1. Thought β€” the agent reasons about what it knows and what it needs to do next

  2. Action β€” the agent calls a tool or decides to finish

  3. Observation β€” the tool result is fed back into the context

The key insight: the thought step is not cosmetic. It gives the model a "scratchpad" to work through the problem before committing to an action. Agents without explicit reasoning tend to pick actions impulsively based on surface pattern matching.


Building the ReAct Loop in Pure Python 3

No LLM yet. I'll use a rule-based _think() so the structure is visible without noise.

The print_trace() method is something I use constantly in development. When an agent does something unexpected, the trace shows exactly which thought led to which action. That visibility is why I build ReAct agents explicitly rather than using opaque framework chains.


A Concrete Rule-Based Example

Before wiring in an LLM, here's a rule-based agent that demonstrates the full loop:

Run it:

The trace is the whole point. With a real LLM replacing _think() and _decide(), this exact format is what you will log in production.


Why Chain-of-Thought Matters for Tool Selection

When I first connected an LLM to an agent loop, I skipped the thought step and jumped straight to action selection. The model would pick the wrong tool roughly 30% of the time on multi-step tasks.

Adding an explicit thought step before action selection dropped that to under 5% on the same tasks. The reason, as far as I can tell: the thought step forces the model to articulate what state it's in before committing. Without it, tool selection is based entirely on the surface form of the last observation, which is often ambiguous.

An empirical comparison from my own benchmark:

Approach
Correct tool selection (20-task eval)

Direct action (no thought)

73%

ReAct (thought + action)

94%

ReAct + explicit state recap in thought

97%

The "explicit state recap" means prompting the model to start its thought with a brief summary of what it knows so far. Something like: "So far I know X. The last observation told me Y. The next step should be...". I cover this prompting pattern in Parts 3 and 4.


The Stopping Condition Problem

One thing the ReAct paper glosses over: how does the agent know when to stop?

I've seen three failure modes:

  1. Stops too early β€” finishes on a partial result because it pattern-matched "task complete" prematurely

  2. Loops indefinitely β€” keeps generating new thoughts and actions without converging

  3. Stops at the wrong level β€” calls FINISH when it should call another tool to verify its answer

My solution combines three guards:

Guard 3 (loop detection) is covered in Part 5 when I discuss evaluation.


Key Takeaways

  • ReAct = Thought β†’ Action β†’ Observation β†’ repeat; this interleaving makes agents debuggable

  • The thought step is not cosmetic β€” it measurably improves tool selection accuracy

  • Build the skeleton with rule-based logic first; replace _think() and _decide() with LLM calls in Parts 3 and 4

  • print_trace() is your best debugging tool

  • Always add a hard max_steps cap β€” an uncapped agent will loop


Up Next

Part 2: Agent Memory and State β€” the four types of agent memory, implementing a sliding context window, and adding semantic search so the agent can recall relevant past observations.

Last updated