Part 2: Agent Memory and State

Part of the AI Agent Development 101 Series

The Memory Problem I Kept Ignoring

My first real agent had a simple memory: a Python list of messages. It worked fine for tasks that took 5–6 steps. Then I gave it a task that took 40 steps — summarising a large codebase file by file — and everything fell apart.

The problem was twofold. First, the context window filled up around step 30 and the model started forgetting what it had already processed. Second, when I asked the agent a follow-up question in a new session, it had no recollection of what it had done before.

Memory is not an afterthought in agent design. It determines what the agent can achieve. This part covers the four memory types I actually use and how to implement them.


The Four Agent Memory Types

I think about agent memory in four layers, borrowed loosely from cognitive science:

| Type | Scope | Implementation | Survives restart? |
| --- | --- | --- | --- |
| Sensory | Current input only | Function argument | No |
| Short-term | Current session | List in RAM | No |
| Long-term | Across sessions | SQLite / file | Yes |
| Episodic | Semantic recall from past | Vector store | Yes |

Most tutorial agents only implement short-term memory. That is fine for demos. Production agents need at least long-term, and often episodic.


Short-Term Memory: The Sliding Context Window

Short-term memory is the baseline implementation: a list of messages held in RAM for the current session. The challenge is keeping that list within the LLM's context window without losing important early context.
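Here's a minimal sketch of that. The class name and the character-based budget are illustrative (a production version would count tokens with the model's tokenizer); the point is the `_trim` method and its pinning behaviour:

```python
class ShortTermMemory:
    """Sliding-window message buffer that pins the system prompt and goal."""

    def __init__(self, max_chars: int = 12_000, pinned: int = 2):
        self.max_chars = max_chars  # crude budget; use a real tokenizer in production
        self.pinned = pinned        # positions 0..pinned-1 are never trimmed
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self) -> None:
        # Drop the oldest non-pinned message until we fit the budget.
        # Index 0 (system prompt) and index 1 (original goal) stay pinned.
        def size() -> int:
            return sum(len(m["content"]) for m in self.messages)

        while size() > self.max_chars and len(self.messages) > self.pinned:
            del self.messages[self.pinned]
```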

The _trim method is the important part. I always pin the system prompt and the original user goal at positions 0 and 1. Everything else slides. Without pinning, a long agent run will trim out the original goal — which causes the model to start working on something subtly different to what was asked.


Long-Term Memory: Persisting Agent State

Long-term memory lets an agent resume a task after a restart, or remember facts about the user/project across different sessions. I use SQLite for this because it requires zero infrastructure.

I use save_fact for things like "preferred language = Python" or "project root = /Users/htunn/code/myapp". These are user-level preferences the agent should remember across all sessions, not just the current one.


Episodic Memory: Semantic Recall with a Local Vector Store

The most powerful memory type. Instead of recalling messages by recency, the agent recalls by relevance. "Did I see anything about disk space before?" retrieves the most relevant past observations even if they happened 50 steps ago.

I use chromadb for local vector storage — it's embeddable, runs in-process, and doesn't need a server.

Here's how I use episodic memory in a real agent run: after every tool observation, I store it. Before deciding on the next action, I recall the top 3 most relevant past observations and inject them into the thought prompt.

This is the pattern I use in my home automation agent. When the agent is diagnosing a problem, it can recall that "three days ago, disk space dropped after a large log rotation" even though that observation is far outside the sliding context window.


Combining All Three Layers

Here's a complete MemorySystem class that wires all three together:
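A condensed sketch of such a class. To keep the snippet dependency-free, the short-term list lives directly on the class, long-term storage is stdlib sqlite3, and the episodic store is injected as any object with `store()`/`recall()` (pass `None` to disable it):

```python
import sqlite3

class MemorySystem:
    """Wires short-term (RAM list), long-term (SQLite), and episodic layers together."""

    def __init__(self, db_path: str = "agent_memory.db", episodic=None, window: int = 50):
        self.short_term: list[dict] = []
        self.episodic = episodic
        self.window = window
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, role TEXT, content TEXT)"
        )

    def remember(self, role: str, content: str) -> None:
        """Record a message in every layer at once."""
        self.short_term.append({"role": role, "content": content})
        self.db.execute("INSERT INTO messages (role, content) VALUES (?, ?)",
                        (role, content))
        self.db.commit()
        if self.episodic is not None and role == "tool":
            self.episodic.store(content)  # tool observations feed semantic recall

    def restore(self) -> None:
        """At startup: reload the last `window` messages so the agent resumes mid-task."""
        rows = self.db.execute(
            "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?",
            (self.window,),
        ).fetchall()
        self.short_term = [{"role": r, "content": c} for r, c in reversed(rows)]
```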

The restore() call at agent startup is what enables session continuity. When I restart my agent after a break, it loads the last 50 messages from SQLite and picks up where it left off — without needing to re-explain the goal.


What I Don't Use (and Why)

Redis for memory: Redis is great for distributed systems, but for a single-agent personal project it's unnecessary infrastructure. SQLite is a file — I can back it up, open it in DB Browser for SQLite, and inspect it when something goes wrong. Redis adds a daemon to manage.

LangChain ConversationBufferMemory: It works, but it's opaque. When the buffer fills, messages are silently dropped according to internal logic I don't control. I'd rather have my own trimming code that I understand completely.

External vector databases (Pinecone, Weaviate) for local projects: Chroma's embedded mode is sufficient for anything running on a single machine. I only consider external vector DBs when the agent needs to query a shared knowledge base from multiple processes.


Key Takeaways

  • Four memory types: sensory, short-term, long-term, episodic — implement at least short-term and long-term

  • Always pin the system prompt and user goal in the sliding window

  • asyncio + aiosqlite for persistent, zero-infrastructure long-term memory

  • Episodic memory via local Chroma gives relevance-based recall without any API calls

  • A MemorySystem class that combines all three is easier to maintain than separate ad-hoc implementations


Up Next

Part 3: Building an Agent with OpenAI — replacing the stub _think() and _decide() with real OpenAI function calling, connecting the MemorySystem, and handling streaming tool calls.
