# Part 2: Agent Memory and State

*Part of the* [*AI Agent Development 101 Series*](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-agent-development-101)

## The Memory Problem I Kept Ignoring

My first real agent had a simple memory: a Python list of messages. It worked fine for tasks that took 5–6 steps. Then I gave it a task that took 40 steps — summarising a large codebase file by file — and everything fell apart.

The problem was twofold. First, the context window filled up around step 30 and the model started forgetting what it had already processed. Second, when I asked the agent a follow-up question in a new session, it had no recollection of what it had done before.

Memory is not an afterthought in agent design. It determines what the agent can achieve. This part covers the four memory types I actually use and how to implement them.

***

## The Four Agent Memory Types

I think about agent memory in four layers, borrowed loosely from cognitive science:

| Type           | Scope                     | Implementation    | Survives restart? |
| -------------- | ------------------------- | ----------------- | ----------------- |
| **Sensory**    | Current input only        | Function argument | No                |
| **Short-term** | Current session           | List in RAM       | No                |
| **Long-term**  | Across sessions           | SQLite / file     | Yes               |
| **Episodic**   | Semantic recall from past | Vector store      | Yes               |

Most tutorial agents only implement short-term memory. That is fine for demos. Production agents need at least long-term, and often episodic.

***

## Short-Term Memory: The Sliding Context Window

The baseline memory implementation. The challenge is keeping it within the LLM's context window without losing important early context.

```python
# short_term_memory.py
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any


@dataclass
class MemoryEntry:
    role: str       # "user" | "assistant" | "tool" | "system"
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


class ShortTermMemory:
    """
    A bounded message list that preserves:
    - The system prompt (position 0, always kept)
    - The original user goal (position 1, always kept)
    - The last N messages (sliding window)
    """

    def __init__(self, max_entries: int = 30) -> None:
        self._entries: list[MemoryEntry] = []
        self.max_entries = max_entries

    def add(self, role: str, content: str, **metadata: Any) -> None:
        self._entries.append(MemoryEntry(role=role, content=content, metadata=metadata))
        self._trim()

    def _trim(self) -> None:
        # Always keep the first two entries (system + user goal)
        if len(self._entries) > self.max_entries:
            anchors = self._entries[:2]
            tail = self._entries[-(self.max_entries - 2):]
            self._entries = anchors + tail

    def to_messages(self) -> list[dict[str, str]]:
        """Return the format expected by OpenAI and Anthropic message APIs."""
        return [
            {"role": e.role, "content": e.content}
            for e in self._entries
        ]

    def __len__(self) -> int:
        return len(self._entries)

    def last(self, n: int = 1) -> list[MemoryEntry]:
        return self._entries[-n:]
```

The `_trim` method is the important part. I always pin the system prompt and the original user goal at positions 0 and 1. Everything else slides. Without pinning, a long agent run will trim out the original goal — which causes the model to start working on something subtly different to what was asked.
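
The pinning behaviour is easy to sanity-check in isolation. Here's a standalone sketch of the same slicing on plain strings (no `MemoryEntry` objects), showing that after 40 steps the anchors survive while the early steps slide out:

```python
# Standalone sketch of the _trim slicing: anchors pinned, middle slides.
entries = ["system prompt", "user goal"] + [f"step {i}" for i in range(40)]
max_entries = 30

if len(entries) > max_entries:
    entries = entries[:2] + entries[-(max_entries - 2):]

assert entries[:2] == ["system prompt", "user goal"]  # anchors pinned
assert entries[2] == "step 12"   # steps 0-11 trimmed away
assert entries[-1] == "step 39"  # most recent step kept
assert len(entries) == 30
```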

***

## Long-Term Memory: Persisting Agent State

Long-term memory lets an agent resume a task after a restart, or remember facts about the user/project across different sessions. I use SQLite for this because it requires zero infrastructure.

```python
# long_term_memory.py
from __future__ import annotations
import json
import time
import aiosqlite

DB_PATH = "agent_state.db"


async def init_db(db_path: str = DB_PATH) -> None:
    async with aiosqlite.connect(db_path) as db:
        await db.execute("""
            CREATE TABLE IF NOT EXISTS messages (
                id          INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id  TEXT    NOT NULL,
                role        TEXT    NOT NULL,
                content     TEXT    NOT NULL,
                metadata    TEXT    DEFAULT '{}',
                created_at  REAL    NOT NULL
            )
        """)
        await db.execute("""
            CREATE TABLE IF NOT EXISTS facts (
                id          INTEGER PRIMARY KEY AUTOINCREMENT,
                session_id  TEXT    NOT NULL,
                key         TEXT    NOT NULL,
                value       TEXT    NOT NULL,
                updated_at  REAL    NOT NULL,
                UNIQUE(session_id, key)
            )
        """)
        await db.commit()


async def save_message(
    session_id: str,
    role: str,
    content: str,
    metadata: dict | None = None,
    db_path: str = DB_PATH,
) -> None:
    async with aiosqlite.connect(db_path) as db:
        await db.execute(
            "INSERT INTO messages (session_id, role, content, metadata, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (session_id, role, content, json.dumps(metadata or {}), time.time()),
        )
        await db.commit()


async def load_messages(
    session_id: str,
    last_n: int = 50,
    db_path: str = DB_PATH,
) -> list[dict]:
    async with aiosqlite.connect(db_path) as db:
        async with db.execute(
            "SELECT role, content FROM messages WHERE session_id = ? "
            "ORDER BY created_at DESC LIMIT ?",
            (session_id, last_n),
        ) as cursor:
            rows = await cursor.fetchall()
    return [{"role": r[0], "content": r[1]} for r in reversed(rows)]


async def save_fact(
    session_id: str,
    key: str,
    value: str,
    db_path: str = DB_PATH,
) -> None:
    """Upsert a named fact for a session."""
    async with aiosqlite.connect(db_path) as db:
        await db.execute(
            "INSERT INTO facts (session_id, key, value, updated_at) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(session_id, key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at",
            (session_id, key, value, time.time()),
        )
        await db.commit()


async def load_facts(session_id: str, db_path: str = DB_PATH) -> dict[str, str]:
    async with aiosqlite.connect(db_path) as db:
        async with db.execute(
            "SELECT key, value FROM facts WHERE session_id = ?",
            (session_id,),
        ) as cursor:
            rows = await cursor.fetchall()
    return {r[0]: r[1] for r in rows}
```

I use `save_fact` for things like "preferred language = Python" or "project root = /Users/htunn/code/myapp". These are user-level preferences the agent should remember across all sessions, not just the current one.
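
The upsert semantics are worth seeing in action. This is a sketch using the stdlib `sqlite3` module and an in-memory database, with the same schema and `ON CONFLICT` SQL as above (the session id and fact values here are invented for illustration):

```python
# Demonstrating the save_fact upsert: a second write to the same
# (session_id, key) pair updates the row instead of inserting a new one.
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE facts (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        session_id  TEXT    NOT NULL,
        key         TEXT    NOT NULL,
        value       TEXT    NOT NULL,
        updated_at  REAL    NOT NULL,
        UNIQUE(session_id, key)
    )
""")
upsert = (
    "INSERT INTO facts (session_id, key, value, updated_at) "
    "VALUES (?, ?, ?, ?) "
    "ON CONFLICT(session_id, key) DO UPDATE SET "
    "value=excluded.value, updated_at=excluded.updated_at"
)
db.execute(upsert, ("s1", "preferred_language", "Python", time.time()))
db.execute(upsert, ("s1", "preferred_language", "Rust", time.time()))  # overwrites

rows = db.execute(
    "SELECT key, value FROM facts WHERE session_id = ?", ("s1",)
).fetchall()
assert rows == [("preferred_language", "Rust")]  # one row, latest value
```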

***

## Episodic Memory: Semantic Recall with a Local Vector Store

The most powerful memory type. Instead of recalling messages by recency, the agent recalls by relevance. "Did I see anything about disk space before?" retrieves the most relevant past observations even if they happened 50 steps ago.
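
To make "recall by relevance" concrete before bringing in real embeddings, here is a toy version using bag-of-words vectors and cosine similarity. The memories and query are invented examples; Chroma below does the same thing, just with proper sentence embeddings instead of word counts:

```python
# Toy relevance-based recall: the disk-space observation wins even though
# it is the *oldest* memory, because it best matches the query.
from collections import Counter
import math


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


memories = [
    "step 3: log rotation freed 2 GB of disk space",
    "step 17: deployed the new config to staging",
    "step 41: users reported slow page loads",
]
query = "did I see anything about disk space before"

vecs = [Counter(m.lower().split()) for m in memories]
qvec = Counter(query.lower().split())
best = max(range(len(memories)), key=lambda i: cosine(vecs[i], qvec))
assert "disk space" in memories[best]  # recalled by relevance, not recency
```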

I use `chromadb` for local vector storage — it's embeddable, runs in-process, and doesn't need a server.

```bash
pip install chromadb sentence-transformers
```

```python
# episodic_memory.py
from __future__ import annotations
import chromadb
from chromadb.utils import embedding_functions

# Use a local sentence transformer — no API call needed
_embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)


class EpisodicMemory:
    """
    Stores observations (tool results, thoughts) as embeddings.
    Retrieve the most semantically relevant ones at any point.
    """

    def __init__(self, session_id: str, persist_dir: str = "./.chroma") -> None:
        self._client = chromadb.PersistentClient(path=persist_dir)
        self._collection = self._client.get_or_create_collection(
            name=f"agent_{session_id}",
            embedding_function=_embed_fn,
        )
        # Resume the id counter from the persisted count so a restart
        # doesn't reuse ids and silently overwrite existing memories
        self._counter = self._collection.count()

    def store(self, text: str, metadata: dict | None = None) -> None:
        self._counter += 1
        self._collection.add(
            ids=[f"mem_{self._counter}"],
            documents=[text],
            # Chroma rejects empty metadata dicts, so only pass metadata when set
            metadatas=[metadata] if metadata else None,
        )

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        """Return the top_k most semantically similar stored memories."""
        if self._collection.count() == 0:
            return []
        results = self._collection.query(
            query_texts=[query],
            n_results=min(top_k, self._collection.count()),
        )
        return results["documents"][0]
```

Here's how I use episodic memory in a real agent run: after every tool observation, I store it. Before deciding on the next action, I recall the top 3 most relevant past observations and inject them into the thought prompt.

```python
# In the ReAct loop, before calling _think():
relevant_memories = episodic.recall(query=last_observation, top_k=3)
if relevant_memories:
    memory_context = "\n".join(f"- {m}" for m in relevant_memories)
    thought_prefix = f"Relevant past observations:\n{memory_context}\n\nCurrent situation:"
else:
    thought_prefix = "Current situation:"
```

This is the pattern I use in my home automation agent. When the agent is diagnosing a problem, it can recall that "three days ago, disk space dropped after a large log rotation" even though that observation is far outside the sliding context window.

***

## Combining All Three Layers

Sensory memory is just the current function argument, so there are three layers to actually wire together. Here's a complete `MemorySystem` class that combines short-term, long-term, and episodic memory:

```python
# memory_system.py
from __future__ import annotations
from short_term_memory import ShortTermMemory
from long_term_memory import init_db, save_message, load_messages, save_fact, load_facts
from episodic_memory import EpisodicMemory


class MemorySystem:
    def __init__(self, session_id: str, max_short_term: int = 30) -> None:
        self.session_id = session_id
        self.short_term = ShortTermMemory(max_entries=max_short_term)
        self.episodic = EpisodicMemory(session_id=session_id)
        self._facts: dict[str, str] = {}

    async def restore(self) -> None:
        """Load long-term state from SQLite on agent startup."""
        await init_db()
        messages = await load_messages(self.session_id)
        for msg in messages:
            self.short_term.add(msg["role"], msg["content"])
        self._facts = await load_facts(self.session_id)

    async def add(
        self,
        role: str,
        content: str,
        persist: bool = True,
        episodic: bool = False,
    ) -> None:
        self.short_term.add(role, content)
        if persist:
            await save_message(self.session_id, role, content)
        if episodic:
            self.episodic.store(content, metadata={"role": role})

    async def remember_fact(self, key: str, value: str) -> None:
        self._facts[key] = value
        await save_fact(self.session_id, key, value)

    def recall_relevant(self, query: str, top_k: int = 3) -> list[str]:
        return self.episodic.recall(query, top_k=top_k)

    def get_fact(self, key: str, default: str = "") -> str:
        return self._facts.get(key, default)

    def to_messages(self) -> list[dict[str, str]]:
        return self.short_term.to_messages()
```

The `restore()` call at agent startup is what enables session continuity. When I restart my agent after a break, it loads the last 50 messages from SQLite and picks up where it left off — without needing to re-explain the goal.

***

## What I Don't Use (and Why)

**Redis for memory:** Redis is great for distributed systems, but for a single-agent personal project it's unnecessary infrastructure. SQLite is a file — I can back it up, open it in DB Browser for SQLite, and inspect it when something goes wrong. Redis adds a daemon to manage.

**LangChain `ConversationBufferMemory`:** It works, but it's opaque. When the buffer fills, messages are silently dropped according to internal logic I don't control. I'd rather have my own trimming code that I understand completely.

**External vector databases (Pinecone, Weaviate) for local projects:** Chroma's embedded mode is sufficient for anything running on a single machine. I only consider an external vector DB when the agent needs to query a shared knowledge base from multiple processes.

***

## Key Takeaways

* Four memory types: sensory, short-term, long-term, episodic — implement at least short-term and long-term
* Always pin the system prompt and user goal in the sliding window
* `asyncio` + `aiosqlite` for persistent, zero-infrastructure long-term memory
* Episodic memory via local Chroma gives relevance-based recall without any API calls
* A `MemorySystem` class that combines all three is easier to maintain than separate ad-hoc implementations

***

## Up Next

[Part 3: Building an Agent with OpenAI](https://blog.htunnthuthu.com/ai-and-machine-learning/artificial-intelligence/ai-agent-development-101/part-3-openai-agent) — replacing the stub `_think()` and `_decide()` with real OpenAI function calling, connecting the `MemorySystem`, and handling streaming tool calls.
