Why LLMs Have No Memory — A Research Report Covering 67 Primary Sources

This is not AI科普. This is a cross-validated research sprint backed by 67 primary sources — vendor docs, arXiv papers, and researcher interviews — on a question every Agent builder hits: why don’t LLMs remember anything?

→ Full report: 14-product comparison table, 9 engineering takeaways, 3-year paradigm roadmap

The One-Liner
#

Four independent constraints — O(n²) attention + KV cache VRAM + catastrophic forgetting + GDPR right-to-be-forgotten — stacked together leave “stateless” as the only viable engineering solution. Every “Memory” feature you’ve seen (ChatGPT, Claude, Cursor) is structured text injected into the system prompt. Zero weight modification. The next 1–3 years belong to stateless LLM kernels + stateful Agent memory layers.

Why 67 Sources
#

Because every Agent builder runs into the same walls:

Why does the AI forget user preferences after 10 turns?
Why can’t Prompt Caching replace Memory?
Why does every product claim “memory” but none touches model weights?
Mem0 vs Zep vs Letta vs LangGraph Store — which one?

The answers exist in Anthropic/OpenAI/Google docs, Karpathy interviews, and arXiv papers — scattered across 67 places. This report connects them.

The Four-Layer Memory Stack
#

Bottom-up:

L1 · Bare LLM (frozen weights): Forever stateless. Every inference is a fresh process.
L2 · In-Architecture Memory: Titans / Infini-attention / Mamba-2. Highest research value, not yet validated at scale (needs ≥70B / ≥10T tokens).
L3 · Ultra-Long Context: Gemini 2M, Magic 100M. Best in-session carrier, but O(n²) ceiling remains.
L4 · Agent Memory Layer: External DB + Agent runtime. Most commercially mature. Mem0, Zep, Letta, LangGraph Store.

→ Full four-layer analysis + 14-product comparison

Top 3 Takeaways for Engineering Teams
#

Never conflate Cache and Memory — Cache skips prefill (saves money); Memory decides prompt content (adds capability). Orthogonal concerns.
Writing memory = writing system prompt — Markdown files (CLAUDE.md, Cursor Rules) are always more controllable, diffable, and version-controlled than “letting the AI remember.”
AI writes, human approves = the steadiest auto-Memory pattern — Cursor 1.2’s mandatory user approval and Devin’s suggestion-only flow are the post-prompt-injection consensus.

→ Read the full report: Karpathy’s canonical interview, memory economics, 9 engineering takeaways, 3-year paradigm roadmap

The One-Liner#

Why 67 Sources#

The Four-Layer Memory Stack#

Top 3 Takeaways for Engineering Teams#

Related

The One-Liner
#

Why 67 Sources
#

The Four-Layer Memory Stack
#

Top 3 Takeaways for Engineering Teams
#