OpenClaw’s vector retrieval silently failed — but BM25 text search kept the memory system running for two weeks unnoticed. Should you even bother fixing it? Here’s how I used NVIDIA’s free embedding API to complete the picture at zero cost.
OpenClaw’s daily-ai-news cron job kept timing out. The root cause: a missing absolute path in the SKILL.md caused the Agent to spend 15 exec calls searching for a tool every run. Messages 165→54, exec calls 44→7 — one file path beat any algorithm optimization.
Background: The Cost Problem in Agent Tool Calling

In traditional agent tool-calling, every tool invocation requires a full cycle of “model inference → tool execution → result return → model re-inference.” This seemingly natural loop breaks down at scale in three ways:
Context Pollution: Every tool result is injected verbatim into the context window. Fetch expense reports for 20 employees, and 2,000+ line items enter context, even though you only need to know “which 3 people exceeded their budget.”

Inference Overhead: Each tool call demands a full model inference pass. Five tools = five inference passes, each costing hundreds of milliseconds to seconds.

Noise Degrades Accuracy: When the context window is packed with intermediate results, the model must find signal in noise. Context Rot research shows LLM performance on complex tasks drops 50-70% as context grows.

As Florian Bruniaux puts it in the Claude Code Architecture Guide: “The Outer Loop — everything outside the model: context management, tool invocation, verification, memory consolidation — increasingly determines system quality more than model inference itself.”
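To make the context-pollution point concrete, here is a minimal Python sketch of the expense-report scenario. Everything in it is an assumption made for illustration: `fetch_expenses` is a hypothetical stand-in for a real tool call, the data is synthetic, and `BUDGET`, `EMPLOYEES`, and `OVERSPENDERS` are invented names. It contrasts injecting every line item into context with aggregating outside the model and returning only the answer.

```python
from dataclasses import dataclass

# All names and numbers below are illustrative assumptions, not a real API.
BUDGET = 1200.0
EMPLOYEES = [f"emp{i:02d}" for i in range(20)]
OVERSPENDERS = {"emp03", "emp11", "emp17"}  # synthetic: these three exceed budget


@dataclass
class Expense:
    employee: str
    amount: float


def fetch_expenses(employee: str) -> list[Expense]:
    """Hypothetical tool call: returns 100 synthetic line items per employee."""
    unit = 15.0 if employee in OVERSPENDERS else 10.0
    return [Expense(employee, unit) for _ in range(100)]


def naive_context(employees: list[str]) -> list[str]:
    """Traditional loop: every line item lands verbatim in the context window."""
    lines = []
    for name in employees:
        for item in fetch_expenses(name):
            lines.append(f"{item.employee},{item.amount:.2f}")
    return lines


def summarized_context(employees: list[str], budget: float) -> list[str]:
    """Aggregate in code; only the over-budget employees reach the model."""
    over = []
    for name in employees:
        total = sum(item.amount for item in fetch_expenses(name))
        if total > budget:
            over.append(f"{name} exceeded budget: {total:.2f} > {budget:.2f}")
    return over


if __name__ == "__main__":
    print(len(naive_context(EMPLOYEES)), "lines in the naive context")
    for line in summarized_context(EMPLOYEES, BUDGET):
        print(line)
```

With this synthetic data the naive path yields 2,000 lines for the model to read, while the summarized path yields three. Both paths make the same 20 tool calls. The difference is only in what ends up in the context window.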
A full-chain production battle log: from startup failures and silently dropped Feishu messages to production stability, covering the compaction safeguard, five-layer debugging, model-harness fit, and a memory system comparison.