1. 📘 Topic and Domain: The paper presents MetaClaw, a continual meta-learning framework for deployed LLM agents that enables them to evolve and adapt in real-world usage through skill synthesis and policy optimization.
2. 💡 Previous Research and New Ideas: The paper builds on memory-based methods (Reflexion), skill-based approaches (Voyager, ExpeL), and RL-based LLM training (RLHF, GRPO), proposing a novel dual-mechanism approach that combines gradient-free skill evolution with opportunistic gradient-based policy optimization while maintaining strict support-query data separation.
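The "strict support-query data separation" mentioned above is the standard meta-learning discipline of synthesizing adaptations from one subset of episodes and validating them only on a disjoint held-out subset. A minimal sketch of such a split (the helper name and parameters are illustrative; the paper's exact procedure is not specified here):

```python
import random


def support_query_split(episodes, query_frac=0.5, seed=0):
    """Partition logged episodes into a support set (used to synthesize
    skills) and a disjoint query set (used only to validate them).
    Hypothetical helper illustrating the support-query discipline."""
    rng = random.Random(seed)
    shuffled = episodes[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - query_frac))
    return shuffled[:cut], shuffled[cut:]
```

Because skills are scored only on the query set, an adaptation that merely memorizes its support episodes gains no credit, which is what makes the separation "strict".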
3. ❓ Problem: The paper addresses the fundamental tension that deployed LLM agents remain static after training while user needs and task distributions evolve continuously, so performance degrades over time, yet conventional retraining requires interrupting the service.
4. 🛠️ Methods: MetaClaw employs two complementary mechanisms: skill-driven fast adaptation that analyzes failures to synthesize reusable behavioral instructions with zero downtime, and opportunistic policy optimization that performs RL-based LoRA fine-tuning during user-inactive windows detected by monitoring sleep schedules, system inactivity, and calendar events.
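The opportunistic scheduling described above can be sketched as a gate that only permits a background LoRA update when all three inactivity signals agree. This is a minimal illustration under assumed signal names (`ActivitySignals`, `is_safe_training_window`, and the thresholds are hypothetical, not the paper's implementation):

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import List, Tuple


@dataclass
class ActivitySignals:
    """Signals the scheduler monitors (all field names are illustrative)."""
    last_input_at: datetime                        # most recent user interaction
    sleep_window: Tuple[time, time]                # e.g. 23:00-07:00, from usage history
    busy_events: List[Tuple[datetime, datetime]]   # calendar events as (start, end)


def in_sleep_window(now: datetime, window: Tuple[time, time]) -> bool:
    """Check whether the current clock time falls inside the sleep window,
    handling windows that wrap past midnight."""
    start, end = window
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps past midnight


def is_safe_training_window(sig: ActivitySignals,
                            now: datetime,
                            idle_threshold: timedelta = timedelta(minutes=30),
                            min_headroom: timedelta = timedelta(hours=1)) -> bool:
    """Return True only when all three signals agree, so an opportunistic
    RL/LoRA update never contends with live user traffic."""
    idle = now - sig.last_input_at >= idle_threshold
    asleep = in_sleep_window(now, sig.sleep_window)
    # Require headroom before any calendar event so training can be stopped
    # and the base policy restored before the user returns.
    clear = all(not (start - min_headroom <= now < end)
                for start, end in sig.busy_events)
    return idle and asleep and clear
```

In this sketch the skill-driven path needs no such gate (instruction synthesis is gradient-free and cheap), while the gradient-based path runs only when `is_safe_training_window` returns `True`.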
5. 📊 Results and Evaluation: On MetaClaw-Bench (934 questions), skill adaptation alone improved accuracy by up to 32% relative; the full pipeline raised Kimi-K2.5 from 21.4% to 40.6% accuracy, an 8.25× gain in task completion; and on AutoResearchClaw, skill injection alone improved composite robustness by 18.3%.