Posts

2026

OpenClaw Memory in Practice: From 'Vector Search Is Down But Everything Still Works' to Zero-Cost NVIDIA Embeddings

2026-06-20·1697 words· 8 min

AI Agent in Practice OpenClaw AI Agent Memory System Embedding NVIDIA Vector Search BM25

OpenClaw’s vector retrieval silently failed — but BM25 text search kept the memory system running for two weeks unnoticed. Should you even bother fixing it? Here’s how I used NVIDIA’s free embedding API to complete the picture at zero cost.

OpenClaw in Practice: One File Path Eliminated 84% of Tool Calls — A Cron Job Debugging Story

2026-06-20·1537 words· 8 min

AI Agent in Practice OpenClaw AI Agent Cron Job SKILL.md Prompt Engineering Performance

OpenClaw’s daily-ai-news cron job kept timing out. The root cause: a missing absolute path in the SKILL.md caused the Agent to spend 15 exec calls searching for a tool every run. Messages 165→54, exec calls 44→7 — one file path beat any algorithm optimization.

Claude's Tool Calling Paradigm Shift: A Deep Dive into Programmatic Tool Calling and Dynamic Filtering

2026-06-13·2548 words· 12 min

AI Agent in Practice Claude AI Agent Agent Architecture Tool Calling Context Engineering Programmatic Tool Calling Dynamic Filtering Code Execution

Background: The Cost Problem in Agent Tool Calling
In traditional agent tool-calling, every tool invocation requires a full cycle of “model inference → tool execution → result return → model re-inference.” This seemingly natural loop breaks down at scale in three ways:
Context Pollution: Every tool result is injected verbatim into the context window. Fetch expense reports for 20 employees, and 2,000+ line items enter context, even though you only need to know “which 3 people exceeded their budget.”
Inference Overhead: Each tool call demands a full model inference pass. Five tools = five inference passes, each costing hundreds of milliseconds to seconds.
Noise Degrades Accuracy: When the context window is packed with intermediate results, the model must find signal in noise. Context Rot research shows LLM performance on complex tasks drops 50-70% as context grows.
As Florian Bruniaux puts it in the Claude Code Architecture Guide: “The Outer Loop — everything outside the model: context management, tool invocation, verification, memory consolidation — increasingly determines system quality more than model inference itself.”
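The context-pollution point in the excerpt above can be made concrete with a minimal sketch. It is not Claude's API or the article's implementation: `fetch_expenses`, `over_budget`, the employee names, and the synthetic amounts are all hypothetical stand-ins. It only illustrates the general idea of aggregating tool results in code so that the model sees the three names it needs rather than every line item.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    employee: str
    amount: float


def fetch_expenses(employee: str) -> list[LineItem]:
    """Hypothetical stand-in for a tool call; returns 100 synthetic line items."""
    idx = int(employee.split("-")[1])
    # Deterministic synthetic data (an assumption): employees 3, 7 and 11 overspend.
    per_item = 150.0 if idx in (3, 7, 11) else 80.0
    return [LineItem(employee, per_item) for _ in range(100)]


def over_budget(employees: list[str], budget: float) -> list[str]:
    """Aggregate tool results in code and return only the names the model needs."""
    flagged = []
    for name in employees:
        total = sum(item.amount for item in fetch_expenses(name))
        if total > budget:
            flagged.append(name)
    return flagged


employees = [f"emp-{i}" for i in range(20)]

# What naive tool calling would place in the context window: every line item.
raw_items = sum(len(fetch_expenses(e)) for e in employees)

# What the filtered approach places in the context window: a short list of names.
summary = over_budget(employees, budget=10_000.0)

print(raw_items, summary)
```

With this synthetic data, 2,000 line items are reduced to a three-element list before anything reaches the model, which is the trade the excerpt describes.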

OpenClaw in Production: When the Most Advanced Memory System Meets the Quietest Failure

2026-05-27·4035 words· 19 min

AI Agent in Practice OpenClaw AI Agent Feishu Memory System Compaction Debugging

A full-chain production battle log: from startup failures and Feishu message silent drops to production stability — compaction safeguard, five-layer debugging, model-harness fit, and memory system comparison.

Why We Moved from Celery to Temporal for Production Agent Pipelines

2026-05-16·1647 words· 8 min

Agent Architecture Agent Engineering Temporal Celery Workflow Engine Production Backend Python

In April 2026, we migrated seo-project’s task queue from Celery to Temporal. We dropped exactly one dependency (celery), wrote 11 new files (src/infrastructure/temporal/), and renamed our containers from api/worker/beat to api/temporal_worker_blue/green with blue-green deployment. The most common question afterward: why not just keep using Celery? If it’s already running, what’s the point? This article is the answer. It doesn’t come from documentation comparisons. It comes from production bugs we hit running Agent pipelines at scale.

Where Do ChatGPT Business Promo Codes Actually Come From? An OSINT Trace

2026-05-14·1481 words· 7 min

Investigation ChatGPT OpenAI Stripe Promo Codes OSINT Reverse Engineering

In May 2026, the Chinese AI community went wild over a wave of ChatGPT Business discounts: £11/month for 2 seats in the UK, $20 in the US, AU$25 in Australia, locked in for 48 months. Mysterious codes like codestonegb, thealloynetwork, and firstfocus spread across forums and blogs at breakneck speed. One question nobody was asking: where did these codes actually come from? I spent several days cross-referencing sources across five platforms and three languages. The answer is messier—and more interesting—than “they leaked on linux.do.”

RAG vs LLM Wiki vs Plain Text — A Decision Framework for Agent Long-Term Memory

2026-05-11·1234 words· 6 min

Agent Architecture AI Agent Memory RAG Context Engineering

Every Agent builder hits this question eventually: where do I store user data so the agent remembers it next session? Three approaches dominate the landscape: RAG (vector retrieval), LLM Wiki (structured knowledge injection), and plain-text context memory (the CLAUDE.md / Cursor Rules pattern). Each has vocal advocates. But picking wrong is expensive — do RAG too light and it’s a noise generator; do plain text too heavy and it’s a token incinerator.

Why LLMs Have No Memory — A Research Report Covering 67 Primary Sources

2026-05-04·353 words· 2 min

Research AI Agent LLM Memory Research Context Engineering

This is not a pop-science explainer on AI. This is a cross-validated research sprint backed by 67 primary sources (vendor docs, arXiv papers, and researcher interviews) on a question every Agent builder hits: why don’t LLMs remember anything?
→ Full report: 14-product comparison table, 9 engineering takeaways, 3-year paradigm roadmap
The One-Liner
Four independent constraints, namely O(n²) attention, KV cache VRAM, catastrophic forgetting, and the GDPR right to be forgotten, stack together and leave “stateless” as the only viable engineering solution. Every “Memory” feature you’ve seen (ChatGPT, Claude, Cursor) is structured text injected into the system prompt. Zero weight modification. The next 1–3 years belong to stateless LLM kernels + stateful Agent memory layers.

Embedding CSS Animation Demos in Hugo Articles

2026-05-04·96 words· 1 min

Dev Log Hugo CSS Animation Shortcode

Hugo shortcodes make it easy to embed live code demos. Here are three ways:
1. Inline CSS Demo (No External Service)
A spinning loader animation, right in the article:
Pure CSS Spinner
A gradient text animation:

Building a Personal Site with Hugo and Dual-Stack CDN

2026-05-04·351 words· 2 min

Dev Log Hugo Alibaba Cloud Cloudflare CDN ICP Filing

Why Hugo
When picking a framework for a personal blog, my top criterion was low maintenance cost: I didn’t want to abandon writing three months later because of npm dependency hell. Hugo is a single binary, requires no Node.js, builds thousands of posts in 1-2 seconds, and the Blowfish theme comes with dark mode, full-text search, multilingual support, RSS, Open Graph, and reading time estimates out of the box. Day-to-day writing only requires touching Markdown files.
