Part 2: Agent Memory and State
Part of the AI Agent Development 101 Series
The Memory Problem I Kept Ignoring
My first real agent had a simple memory: a Python list of messages. It worked fine for tasks that took 5–6 steps. Then I gave it a task that took 40 steps — summarising a large codebase file by file — and everything fell apart.
The problem was twofold. First, the context window filled up around step 30 and the model started forgetting what it had already processed. Second, when I asked the agent a follow-up question in a new session, it had no recollection of what it had done before.
Memory is not an afterthought in agent design. It determines what the agent can achieve. This part covers the four memory types I actually use and how to implement them.
The Four Agent Memory Types
I think about agent memory in four layers, borrowed loosely from cognitive science:
| Type | Scope | Storage | Persistent? |
|---|---|---|---|
| Sensory | Current input only | Function argument | No |
| Short-term | Current session | List in RAM | No |
| Long-term | Across sessions | SQLite / file | Yes |
| Episodic | Semantic recall from past | Vector store | Yes |
Most tutorial agents only implement short-term memory. That is fine for demos. Production agents need at least long-term, and often episodic.
Short-Term Memory: The Sliding Context Window
Short-term memory is the baseline implementation every agent starts with. The challenge is keeping it within the LLM's context window without losing important early context.
The _trim method is the important part. I always pin the system prompt and the original user goal at positions 0 and 1. Everything else slides. Without pinning, a long agent run will trim out the original goal — which causes the model to start working on something subtly different to what was asked.
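The post's original code isn't reproduced here, so the following is a minimal sketch of the pinning pattern it describes; the class name `ShortTermMemory` and the `max_messages` parameter are my own, not the author's.

```python
class ShortTermMemory:
    """Sliding window over the message list, with the first two slots pinned.

    Position 0 holds the system prompt, position 1 the original user goal;
    trimming only ever drops messages from position 2 onward, oldest first.
    """

    def __init__(self, max_messages: int = 30):
        self.max_messages = max_messages
        self.messages: list[dict] = []

    def add(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self) -> None:
        # Keep the two pinned slots plus the most recent messages.
        if len(self.messages) > self.max_messages:
            pinned, rest = self.messages[:2], self.messages[2:]
            keep = self.max_messages - len(pinned)
            self.messages = pinned + rest[-keep:]
```

Without the `pinned` slice, a 40-step run would eventually trim out the original goal, which is exactly the drift described above.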
Long-Term Memory: Persisting Agent State
Long-term memory lets an agent resume a task after a restart, or remember facts about the user/project across different sessions. I use SQLite for this because it requires zero infrastructure.
I use save_fact for things like "preferred language = Python" or "project root = /Users/htunn/code/myapp". These are user-level preferences the agent should remember across all sessions, not just the current one.
Episodic Memory: Semantic Recall with a Local Vector Store
The most powerful memory type. Instead of recalling messages by recency, the agent recalls by relevance. "Did I see anything about disk space before?" retrieves the most relevant past observations even if they happened 50 steps ago.
I use chromadb for local vector storage — it's embeddable, runs in-process, and doesn't need a server.
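To keep a sketch runnable without the chromadb dependency, here is a stand-in that does the same relevance-based recall using bag-of-words cosine similarity instead of learned embeddings. The class and method names are mine; with chromadb, `store` and `recall` map onto `collection.add(...)` and `collection.query(query_texts=[...], n_results=3)`.

```python
import math
from collections import Counter


class EpisodicMemory:
    """Relevance-based recall. A stand-in for a vector store like chromadb:
    real embeddings are replaced with word-count vectors so the sketch runs
    with the stdlib only (tokenization is a naive whitespace split)."""

    def __init__(self):
        self.observations: list[str] = []

    @staticmethod
    def _similarity(a: str, b: str) -> float:
        # Cosine similarity between word-count vectors.
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
            sum(c * c for c in vb.values())
        )
        return dot / norm if norm else 0.0

    def store(self, observation: str) -> None:
        self.observations.append(observation)

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        ranked = sorted(
            self.observations,
            key=lambda obs: self._similarity(query, obs),
            reverse=True,
        )
        return ranked[:top_k]
```

The interface is the important part: store by content, recall by relevance, with recency playing no role at all.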
Here's how I use episodic memory in a real agent run: after every tool observation, I store it. Before deciding on the next action, I recall the top 3 most relevant past observations and inject them into the thought prompt.
This is the pattern I use in my home automation agent. When the agent is diagnosing a problem, it can recall that "three days ago, disk space dropped after a large log rotation" even though that observation is far outside the sliding context window.
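The recall-then-inject step described above is ultimately string assembly; a sketch, where the helper name and prompt layout are my own:

```python
def build_thought_prompt(goal: str, recalled: list[str], transcript: list[str]) -> str:
    """Assemble the thought prompt: goal first, then episodic recalls,
    then the recent sliding-window transcript."""
    recall_block = "\n".join(f"- {obs}" for obs in recalled) or "- (nothing relevant)"
    return (
        f"Goal: {goal}\n\n"
        f"Relevant past observations:\n{recall_block}\n\n"
        "Recent steps:\n" + "\n".join(transcript) + "\n\n"
        "Decide the next action."
    )
```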
Combining All Three Layers
Here's a complete MemorySystem class that wires all three together:
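The author's class isn't reproduced here, so this is a compact sketch of the same shape under two substitutions: stdlib `sqlite3` instead of aiosqlite, and word-overlap scoring instead of chromadb embeddings. All names (`add_message`, `store_observation`, `window`) are my assumptions.

```python
import sqlite3


class MemorySystem:
    """One object wiring short-term, long-term, and episodic memory together."""

    def __init__(self, db_path: str = "agent_memory.db", window: int = 30):
        self.window = window
        self.messages: list[dict] = []        # short-term (RAM)
        self.observations: list[str] = []     # episodic (naive stand-in)
        self.conn = sqlite3.connect(db_path)  # long-term (disk)
        self.conn.executescript(
            """CREATE TABLE IF NOT EXISTS messages
                 (id INTEGER PRIMARY KEY, role TEXT, content TEXT);
               CREATE TABLE IF NOT EXISTS facts
                 (key TEXT PRIMARY KEY, value TEXT);"""
        )

    def add_message(self, role: str, content: str) -> None:
        # Every message goes to both RAM and disk.
        self.messages.append({"role": role, "content": content})
        self.conn.execute(
            "INSERT INTO messages (role, content) VALUES (?, ?)", (role, content)
        )
        self.conn.commit()
        # Trim the sliding window, keeping slots 0-1 (system prompt + goal) pinned.
        if len(self.messages) > self.window:
            self.messages = self.messages[:2] + self.messages[-(self.window - 2):]

    def save_fact(self, key: str, value: str) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO facts (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def store_observation(self, text: str) -> None:
        self.observations.append(text)

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        # Naive relevance: count shared words (a vector store embeds instead).
        q = set(query.lower().split())
        ranked = sorted(
            self.observations,
            key=lambda o: len(q & set(o.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]

    def restore(self, last_n: int = 50) -> None:
        # Reload the tail of the persisted transcript into short-term memory.
        rows = self.conn.execute(
            "SELECT role, content FROM messages ORDER BY id DESC LIMIT ?", (last_n,)
        ).fetchall()
        self.messages = [{"role": r, "content": c} for r, c in reversed(rows)]
```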
The restore() call at agent startup is what enables session continuity. When I restart my agent after a break, it loads the last 50 messages from SQLite and picks up where it left off — without needing to re-explain the goal.
What I Don't Use (and Why)
Redis for memory: Redis is great for distributed systems, but for a single-agent personal project it's unnecessary infrastructure. SQLite is a file — I can back it up, open it in DB Browser for SQLite, and inspect it when something goes wrong. Redis adds a daemon to manage.
LangChain ConversationBufferMemory: It works, but it's opaque. When the buffer fills, messages are silently dropped according to internal logic I don't control. I'd rather have my own trimming code that I understand completely.
External vector databases (Pinecone, Weaviate) for local projects: Chromadb embedded-mode is sufficient for anything running on a single machine. I only consider external vector DBs when the agent needs to query a shared knowledge base from multiple processes.
Key Takeaways
Four memory types: sensory, short-term, long-term, episodic — implement at least short-term and long-term
Always pin the system prompt and user goal in the sliding window
asyncio + aiosqlite for persistent, zero-infrastructure long-term memory
Episodic memory via local Chroma gives relevance-based recall without any API calls
A MemorySystem class that combines all three is easier to maintain than separate ad-hoc implementations
Up Next
Part 3: Building an Agent with OpenAI — replacing the stub _think() and _decide() with real OpenAI function calling, connecting the MemorySystem, and handling streaming tool calls.