Skip to main content
  1. Home/
  2. Posts/

Why LLMs Have No Memory — A Research Report Covering 67 Primary Sources

Liu ZhuoQi
Author
Liu ZhuoQi
Personal blog of AI Agent developer Liu ZhuoQi. Sharing practical notes on AI Agent development, tool engineering, and creative programming.

This is not AI科普. This is a cross-validated research sprint backed by 67 primary sources — vendor docs, arXiv papers, and researcher interviews — on a question every Agent builder hits: why don’t LLMs remember anything?

Full report: 14-product comparison table, 9 engineering takeaways, 3-year paradigm roadmap


The One-Liner
#

Four independent constraints — O(n²) attention + KV cache VRAM + catastrophic forgetting + GDPR right-to-be-forgotten — stacked together leave “stateless” as the only viable engineering solution. Every “Memory” feature you’ve seen (ChatGPT, Claude, Cursor) is structured text injected into the system prompt. Zero weight modification. The next 1–3 years belong to stateless LLM kernels + stateful Agent memory layers.

Why 67 Sources
#

Because every Agent builder runs into the same walls:

  • Why does the AI forget user preferences after 10 turns?
  • Why can’t Prompt Caching replace Memory?
  • Why does every product claim “memory” but none touches model weights?
  • Mem0 vs Zep vs Letta vs LangGraph Store — which one?

The answers exist in Anthropic/OpenAI/Google docs, Karpathy interviews, and arXiv papers — scattered across 67 places. This report connects them.

The Four-Layer Memory Stack
#

Bottom-up:

  • L1 · Bare LLM (frozen weights): Forever stateless. Every inference is a fresh process.
  • L2 · In-Architecture Memory: Titans / Infini-attention / Mamba-2. Highest research value, not yet validated at scale (needs ≥70B / ≥10T tokens).
  • L3 · Ultra-Long Context: Gemini 2M, Magic 100M. Best in-session carrier, but O(n²) ceiling remains.
  • L4 · Agent Memory Layer: External DB + Agent runtime. Most commercially mature. Mem0, Zep, Letta, LangGraph Store.

Full four-layer analysis + 14-product comparison

Top 3 Takeaways for Engineering Teams
#

  1. Never conflate Cache and Memory — Cache skips prefill (saves money); Memory decides prompt content (adds capability). Orthogonal concerns.
  2. Writing memory = writing system prompt — Markdown files (CLAUDE.md, Cursor Rules) are always more controllable, diffable, and version-controlled than “letting the AI remember.”
  3. AI writes, human approves = the steadiest auto-Memory pattern — Cursor 1.2’s mandatory user approval and Devin’s suggestion-only flow are the post-prompt-injection consensus.

Read the full report: Karpathy’s canonical interview, memory economics, 9 engineering takeaways, 3-year paradigm roadmap

Related

Why LLMs Have No Memory — A Cross-Validated Research Report with 67 Primary Sources

·1623 words· 8 min
1. Why LLMs Are Stateless # Four independent constraints — individually manageable, together they leave “stateless” as the only viable engineering solution. This conclusion is cross-validated across 67 primary sources. Architecture: O(n²) Attention # Self-attention scales at O(n²). A single 4096-token sequence needs 2 GB VRAM for KV cache; 32 concurrent sessions hit 64 GB — more than the model weights themselves. Llama 3.1 at 100M context requires 638 H100 GPUs ($5,400/hour) for KV cache alone.

大模型为什么没有记忆——67 条一手资料的交叉验证

这不是一篇"AI 科普"——这是一次用 Exa / Tavily / Context7 / WebSearch 四源交叉验证,覆盖 67 条一手资料 的硬核调研。如果你在给 Agent 系统设计记忆层,或者想搞清楚 ChatGPT Memory / Claude Memory / Cursor Rules 到底是怎么回事,这篇是你要看的东西。 → 完整报告(含 14 产品对比表、9 条工程结论、3 年范式演进地图) 一句话结论 # 所谓「大模型没有记忆」不是疏忽,而是 O(n²) 注意力 + KV Cache 显存 + 灾难性遗忘 + GDPR 合规 四重约束的均衡解。ChatGPT / Claude / Cursor 的 “Memory” 本质都是把结构化文本 塞回 system prompt,模型权重永远不动。未来 1–3 年的主流是 「无状态 LLM 内核 + 有状态 Agent 记忆层」 混合架构。

大模型为什么没有记忆——67 条一手资料的交叉验证调研

·1172 字· 6 分钟
一句话结论 # 所谓「大模型没有记忆」不是疏忽,而是 Transformer O(n²) 注意力 + KV cache 显存 + 权重纠缠(灾难性遗忘)+ GDPR 合规 四重约束的均衡解。ChatGPT / Claude / Cursor 的 “Memory” 本质都是把结构化文本塞回 system prompt,模型权重永远不动。Prompt Caching 只是性能优化,不是记忆。未来 1–3 年的主流是 「无状态 LLM 内核 + 有状态 Agent 记忆层」 混合架构。 计算复杂度 100M ctx 成本 Cache 价格 主流 TTL O(n²) 638×H100 0.1× 5min–24h 1. 为什么 LLM 被设计成无状态 # 四个独立约束叠加,每一个单独都不致命,叠在一起就只剩"无状态"这一种工程解——这个结论来自对 67 条一手资料的交叉验证。