Liu ZhuoQi

AI Application Engineer · Agent Systems

Recent

Five Codex Harness Designs Worth Copying After Reading the Source

2026-08-02 · Updated: 2026-08-03 · 2408 words · 12 min

Agent Engineering Codex OpenAI AI Agent Agent Harness Tool Calling Agent Runtime Skills Goal App Server

Think of Codex as a small construction crew. The model is the site lead, deciding what should happen next. The agent harness is everything around that lead: the dispatch desk, access control, job records, and the progress board. The source is worth reading not because the lead can issue commands, but because the surrounding system keeps the work safe, recoverable, and understandable to the customer. Many agent tutorials reduce the loop to this:

How Agents Remember You: Human Memory Science and a Code Audit of Six Open-Source Systems

2026-07-30 · 5564 words · 27 min

Deep Dives AI Agent LLM Memory Memory Systems Cognitive Science Open-Source Architecture

Almost every agent project now claims to provide “long-term memory.” For one project, that means embedding chat history. For another, it means maintaining a user profile. A third lets the model edit Markdown files. A fourth builds a bitemporal knowledge graph. All four use the word memory, but they are not the same kind of system and should not be ranked on a single undifferentiated leaderboard. To decide whether a system genuinely remembers, I prefer to ask three questions:

How to Choose an LLM Inference Engine — A 2026 Map from Local Single-GPU to PD Disaggregation

2026-07-19 · Updated: 2026-07-30 · 3716 words · 18 min

Deep Dives LLM Inference Engine VLLM SGLang Model Serving Inference Optimization Selection Guide Research

Aliyun’s CAP has a piece on picking an inference engine that narrows the field to four: Ollama, vLLM, SGLang, and Hugging Face Pipeline. In 2024, that framing was fine. By 2026, it’s missing half the map. NVIDIA’s TensorRT-LLM has completed its “PyTorch-ification,” SGLang became famous as the first open-source project to reproduce DeepSeek’s large-scale deployment, and Hugging Face slapped a “maintenance mode” banner on TGI and told users to switch to vLLM. The real throughline of the entire 2025 inference landscape can be summed up in one word: disaggregate.

OpenClaw in Practice: One File Path Eliminated 84% of Tool Calls — A Cron Job Debugging Story

2026-06-20 · Updated: 2026-07-30 · 1606 words · 8 min

Agent Engineering OpenClaw AI Agent Cron Job SKILL.md Prompt Engineering Performance

OpenClaw’s daily-ai-news cron job kept timing out. The root cause: a missing absolute path in SKILL.md made the agent spend 15 exec calls searching for a tool on every run. Messages dropped from 165 to 54 and exec calls from 44 to 7. One file path beat any algorithmic optimization.

OpenClaw Memory in Practice: From 'Vector Search Is Down But Everything Still Works' to Zero-Cost NVIDIA Embeddings

2026-06-20 · Updated: 2026-07-30 · 1713 words · 9 min

Agent Engineering OpenClaw AI Agent Memory System Embedding NVIDIA Vector Search BM25

OpenClaw’s vector retrieval failed silently, but BM25 text search kept the memory system running, so the failure went unnoticed for two weeks. Should you even bother fixing it? Here’s how I used NVIDIA’s free embedding API to complete the picture at zero cost.