Agent Memory Architectures: Vector DBs vs Knowledge Graphs vs Long-Term Store
Reviewed: June 4, 2026
Memory is what separates a chatbot from an agent. Without memory, every interaction starts from scratch: no continuity, no personalization, no accumulated knowledge. As agents tackle longer and more complex tasks, memory architecture becomes one of the most consequential design decisions. This post breaks down the three dominant approaches (vector databases, knowledge graphs, and hybrid systems), covers consolidation and forgetting, and shows you when to use each.
The Three Memory Tiers
Agent memory operates across three time horizons:
- Working memory (in-context): What the agent can see right now — the current conversation, retrieved documents, and tool outputs. Limited by the LLM context window.
- Short-term store (session-scoped): Information accumulated during a session — decisions made, plans formulated, intermediate results. Lost when the session ends unless explicitly persisted.
- Long-term store (persistent): Knowledge that survives across sessions — user preferences, domain facts, past interactions, learned skills. This is where architecture choices matter most.
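To make the three tiers concrete, here is a minimal sketch of how they might relate in code. The class name `TieredMemory`, the word-count budget (a crude stand-in for a token limit), and the `keep` filter are all assumptions for illustration, not part of any particular framework.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy model of the three tiers; names and the word budget are illustrative."""
    max_working_words: int = 50                      # stand-in for a token budget
    working: deque = field(default_factory=deque)    # what the agent "sees" now
    session: list = field(default_factory=list)      # session-scoped notes
    long_term: list = field(default_factory=list)    # survives across sessions

    def _working_size(self):
        return sum(len(text.split()) for text in self.working)

    def observe(self, text):
        """Add to working memory; evict the oldest items into the session store
        once the budget is exceeded."""
        self.working.append(text)
        while self._working_size() > self.max_working_words and len(self.working) > 1:
            self.session.append(self.working.popleft())

    def end_session(self, keep=lambda text: True):
        """Persist the session items selected by `keep`, then clear session state."""
        self.session.extend(self.working)
        self.long_term.extend(text for text in self.session if keep(text))
        self.working.clear()
        self.session.clear()
```

The `keep` callback is where an explicit persistence policy would live; anything it rejects is lost when the session ends, which mirrors the short-term tier described above.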
Architecture 1: Vector Database Retrieval
This is the most common long-term memory architecture. Conversations, documents, and facts are embedded as vectors and stored in a vector database. At query time, the agent embeds the query and retrieves the stored memories whose vectors are most similar to it.
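# Illustrative sketch: OpenAIEmbeddings, ChromaDB, and llm stand in for your
# embedding client, vector store, and LLM wrapper. They are not exact library APIs.
class VectorMemoryStore:
    def __init__(self):
        self.embedder = OpenAIEmbeddings()
        self.db = ChromaDB(persistent_path="./agent_memory")

    def remember(self, text, metadata=None):
        vector = self.embedder.embed(text)
        self.db.add(vector, text, metadata or {})

    def recall(self, query, top_k=5):
        query_vec = self.embedder.embed(query)
        return self.db.search(query_vec, k=top_k)

    def reflect(self, last_n=20):
        """Consolidate recent memories to reduce redundancy."""
        recent = self.db.get_last(n=last_n)
        summary = llm.summarize(recent)
        # The summary is stored alongside the originals; delete or down-weight
        # the originals separately if you want storage to actually shrink.
        self.remember(summary, {"type": "consolidated"})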
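For readers who want something that runs without external services, the following is a small self-contained sketch of the same remember/recall flow. It assumes a toy lexical embedding (normalized word counts) in place of a real embedding model, so it matches shared words rather than meaning; `toy_embed`, `cosine`, and `InMemoryVectorStore` are hypothetical names.

```python
import math
from collections import Counter


def toy_embed(text):
    """Normalized word-count vector as a dict. A lexical stand-in for a real
    embedding model, used only so the example runs offline."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())


class InMemoryVectorStore:
    def __init__(self, embed=toy_embed):
        self.embed = embed
        self.rows = []  # list of (vector, text, metadata)

    def remember(self, text, metadata=None):
        self.rows.append((self.embed(text), text, metadata or {}))

    def recall(self, query, top_k=5):
        """Return the top_k (score, text) pairs, highest similarity first."""
        q = self.embed(query)
        scored = [(cosine(q, vec), text) for vec, text, _ in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = InMemoryVectorStore()
store.remember("alice likes green tea")
store.remember("the deploy failed on friday")
hits = store.recall("green tea", top_k=1)
```

Swapping `toy_embed` for a call to a real embedding model is the only change needed to get semantic rather than lexical matching; the brute-force scan in `recall` is what a vector database replaces with an approximate nearest-neighbor index.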
Strengths:
- Simple to implement, excellent semantic matching
- Scales to millions of memories
- Mature ecosystem (Pinecone, Weaviate, Chroma, Qdrant)
Weaknesses:
- No understanding of relationships between memories
- Retrieval quality degrades with ambiguous queries
- No temporal reasoning: "what changed last week?" is hard
- Redundant storage of related facts
Architecture 2: Knowledge Graphs
Store memories as entities and relationships. The agent can traverse the graph to find indirect connections, reason about relationships, and maintain ontological structure.
# Illustrative sketch: TripleStore and RuleEngine are stand-ins for your graph
# backend and rule engine.
class KnowledgeGraphMemory:
    def __init__(self):
        self.graph = TripleStore()
        self.reasoner = RuleEngine()

    def remember(self, facts):
        """facts = [{'subject': 'Alice', 'predicate': 'works_at', 'object': 'Google'}, ...]"""
        for fact in facts:
            self.graph.add_triple(fact['subject'], fact['predicate'], fact['object'])

    def recall(self, entity, depth=2):
        """Find everything connected to an entity within N hops."""
        return self.graph.traverse(entity, max_depth=depth)

    def infer(self, query):
        """Apply graph reasoning rules."""
        return self.reasoner.apply_rules(query, self.graph)
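As a rough idea of what sits behind `add_triple` and `traverse`, here is a minimal in-memory sketch. It assumes outgoing edges only and a single value per predicate when following chains; `InMemoryTripleStore` and `transitive` are hypothetical names, and a production system would delegate this to a graph database.

```python
from collections import defaultdict, deque


class InMemoryTripleStore:
    def __init__(self):
        self.out_edges = defaultdict(set)  # subject -> {(predicate, object)}

    def add_triple(self, subject, predicate, obj):
        self.out_edges[subject].add((predicate, obj))

    def traverse(self, entity, max_depth=2):
        """Breadth-first walk over outgoing edges; returns every triple whose
        subject is fewer than max_depth hops from the start entity."""
        seen, found = {entity}, []
        queue = deque([(entity, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for predicate, obj in sorted(self.out_edges.get(node, ())):
                found.append((node, predicate, obj))
                if obj not in seen:
                    seen.add(obj)
                    queue.append((obj, depth + 1))
        return found

    def transitive(self, entity, predicate):
        """Follow one predicate repeatedly, e.g. a full reporting chain.
        Stops at a dead end or when a cycle would be entered."""
        chain, current, visited = [], entity, {entity}
        while True:
            targets = [o for p, o in sorted(self.out_edges.get(current, ())) if p == predicate]
            if not targets or targets[0] in visited:
                return chain
            current = targets[0]
            visited.add(current)
            chain.append(current)


g = InMemoryTripleStore()
g.add_triple("Alice", "works_at", "Google")
g.add_triple("Alice", "reports_to", "Bob")
g.add_triple("Bob", "reports_to", "Carol")
g.add_triple("Carol", "reports_to", "Dana")
chain = g.transitive("Alice", "reports_to")  # ['Bob', 'Carol', 'Dana']
```

The returned triples double as an explanation of how each fact was reached, which is the explainability advantage listed below.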
Strengths:
- Rich relationship modeling: "who does Alice report to?"
- Inferencing over transitive relationships
- Explainable reasoning paths
- Efficient storage of known facts (no duplication)
Weaknesses:
- Expensive to extract structured triples from unstructured text
- Difficult to maintain graph consistency at scale
- SPARQL/Cypher queries are less flexible than semantic search for fuzzy, free-text questions
- Slower retrieval for large graphs without good indexing
Architecture 3: Hybrid Memory Systems
Production agents increasingly use a hybrid approach — vector search for fuzzy retrieval, knowledge graphs for structured reasoning, and a lightweight key-value store for fast lookups.
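# Illustrative sketch: KeyValueStore, FusionRanker, and extract_entities are
# placeholders for components you supply.
class HybridAgentMemory:
    def __init__(self):
        self.vector_store = VectorMemoryStore()    # Semantic recall
        self.kg = KnowledgeGraphMemory()           # Relationship reasoning
        self.kv = KeyValueStore()                  # Fast lookups (user prefs, state)
        self.fusion_ranker = FusionRanker()        # Merges and ranks results
        self.extract_entities = extract_entities   # e.g. an NER model or LLM call

    def remember(self, text, structured_facts=None):
        self.vector_store.remember(text)
        if structured_facts:
            self.kg.remember(structured_facts)

    def recall(self, query):
        # Query every store; the calls are independent and can run concurrently
        semantic_results = self.vector_store.recall(query)
        # The graph is keyed by entity, not free text, so extract entities first
        graph_results = [self.kg.recall(e) for e in self.extract_entities(query)]
        kv_results = self.kv.get(query)
        # Merge and rank
        return self.fusion_ranker.merge(semantic_results, graph_results, kv_results)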
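The merge step is left abstract above. One common way to combine ranked lists whose scores are not directly comparable is reciprocal rank fusion (RRF), sketched here as one possible choice rather than the method this post prescribes. It assumes each store returns memory IDs in ranked order; `k=60` is a commonly used default for the smoothing constant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of memory IDs. Each item scores sum(1 / (k + rank))
    over the lists it appears in, so items ranked well by several stores rise."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, item in enumerate(results, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order
    return sorted(scores, key=lambda item: (-scores[item], item))


semantic = ["m1", "m2", "m3"]   # from the vector store
graph = ["m2", "m4"]            # from the knowledge graph
kv = ["m2"]                     # from the key-value store
merged = reciprocal_rank_fusion([semantic, graph, kv])
# merged == ['m2', 'm1', 'm4', 'm3']
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities, graph hop counts, and exact-key hits live on different scales.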
Architecture 4: Memory Consolidation & Forgetting
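The most overlooked aspect: agents need to consolidate and forget, just like humans. Without this, the store grows without bound, and stale or redundant entries crowd out relevant ones at retrieval time.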
Consolidation patterns:
- Summarization: Compress multiple related memories into a single summary
- Abstraction: Extract general principles from specific instances
- Clustering: Group related memories and store the centroid
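The clustering pattern can be sketched with a greedy single-pass grouping over embedding vectors. This is a simplification chosen for brevity (a real system might use k-means or HDBSCAN); the function names and the `threshold=0.9` cutoff are assumptions for illustration.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(members):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(members) for column in zip(*members)]


def consolidate(vectors, threshold=0.9):
    """Greedy clustering: each vector joins the first cluster whose centroid is
    at least `threshold` similar, otherwise it starts a new cluster.
    Returns one centroid per cluster."""
    clusters = []  # each cluster is a list of member vectors
    for vec in vectors:
        for members in clusters:
            if cosine(vec, centroid(members)) >= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return [centroid(members) for members in clusters]


centroids = consolidate([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
```

Here the first two vectors collapse into one centroid and the third stays separate, so three stored memories become two. The result depends on input order, which is the main trade-off of the greedy approach.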
Forgetting patterns:
- Time-based decay: Reduce retrieval score for old memories
- Usage-based: Promote frequently accessed memories, demote unused ones
- Relevance pruning: Remove memories that are never retrieved
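The three forgetting patterns can be combined into a retrieval-time score plus a periodic pruning pass. The sketch below assumes exponential decay with a 30-day half-life, a logarithmic usage boost, and a 90-day pruning window; those constants, along with `MemoryRecord`, `retention_score`, and `prune`, are illustrative choices to tune per application.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class MemoryRecord:
    text: str
    created_at: float      # seconds since epoch
    access_count: int = 0  # incremented whenever the record is retrieved


def retention_score(record, similarity, now, half_life_days=30.0):
    """Similarity scaled by exponential age decay (time-based) and a mild
    logarithmic boost for frequently accessed records (usage-based)."""
    age_days = max(0.0, (now - record.created_at) / SECONDS_PER_DAY)
    decay = 0.5 ** (age_days / half_life_days)
    usage_boost = 1.0 + math.log1p(record.access_count)
    return similarity * decay * usage_boost


def prune(records, now, min_age_days=90.0):
    """Relevance pruning: drop records that are older than min_age_days and
    have never been retrieved."""
    cutoff = now - min_age_days * SECONDS_PER_DAY
    return [r for r in records if r.access_count > 0 or r.created_at >= cutoff]
```

Applying decay at scoring time rather than deleting outright keeps old memories recoverable when a query matches them strongly, while `prune` handles the entries nothing ever asks for.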
Production Considerations
| Concern | Vector DB | Knowledge Graph | Hybrid |
|---|---|---|---|
| Setup complexity | Low | High | Very High |
| Retrieval speed | Fast (ms) | Variable | Moderate |
| Scale (millions of facts) | Excellent | Moderate | Good |
| Relational reasoning | None | Excellent | Good |
| Fuzzy/semantic search | Excellent | Poor | Excellent |
| Explainability | Low | High | Moderate |
Recommendations
- Simple agent, basic memory: Vector DB (Chroma or Qdrant)
- Knowledge-intensive, relational domain: Knowledge Graph (Neo4j or Amazon Neptune)
- Production agent with diverse memory needs: Hybrid (Vector DB + KG + KV)
- Budget-constrained: Start with vector DB, add KG only when relational queries become critical
What’s Next
The frontier in 2027 is adaptive memory systems — agents that decide for themselves what to remember, what to consolidate, and what to forget. Early research (MemGPT, Generative Agents, Reflexion) points toward agents with increasingly human-like memory management. The teams that get memory right will build agents that genuinely improve over time — not just within a session, but across weeks and months of interaction.
Part of the Evergreen AI Guides collection.
