1. 📘 Topic and Domain: The paper focuses on reinforcement learning for language models, specifically in the domain of agentic reasoning and sparse-reward environments.
2. 💡 Previous Research and New Ideas: The paper builds on standard reinforcement learning with verifiable rewards (RLVR) and proposes Experiential Reinforcement Learning (ERL), which adds an explicit experience-reflection-consolidation loop where models generate self-reflections to guide improved second attempts.
3. ❓ Problem: The paper addresses the challenge of learning from sparse and delayed environmental feedback in reinforcement learning, where models struggle to implicitly infer how failures should translate into behavioral improvements.
4. 🛠️ Methods: ERL employs a four-stage process: initial attempt, self-reflection generation based on feedback, refined second attempt guided by reflection, and internalization through selective distillation to consolidate improvements into the base policy.
5. 📊 Results and Evaluation: ERL outperforms RLVR in every tested environment, with gains of up to +81% on Sokoban, +27% on FrozenLake, and +11% on HotpotQA, demonstrating both faster learning and higher final performance.
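The four-stage loop in point 4 can be sketched as a toy program. Everything below is an illustrative assumption, not the paper's implementation: the `policy` dict, the skill-bump `distill`, and the reward logic are stand-ins for a language-model policy, selective distillation, and environment feedback.

```python
import random

def rollout(policy, env=None, reflection=None):
    """Stage 1 (and 3): attempt the task; a reflection in context
    shifts behavior (modeled here as a success-probability bonus)."""
    bonus = 0.3 if reflection else 0.0
    success = random.random() < policy["skill"] + bonus
    trajectory = {"actions": ["a1", "a2"], "reflection": reflection}
    return trajectory, (1.0 if success else 0.0)

def reflect(trajectory, reward):
    """Stage 2: turn sparse, delayed feedback into an explicit lesson."""
    if reward == 0.0:
        return "Last attempt failed; avoid repeating the same actions."
    return None  # no reflection needed after a success

def distill(policy, trajectory):
    """Stage 4: selective distillation -- consolidate only improved
    behavior into the base policy (here, a simple skill increment)."""
    policy["skill"] = min(1.0, policy["skill"] + 0.05)

def erl_step(policy, env=None):
    traj1, r1 = rollout(policy, env)                    # initial attempt
    note = reflect(traj1, r1)                           # self-reflection
    traj2, r2 = rollout(policy, env, reflection=note)   # refined attempt
    if r2 > r1:                                         # keep only improvements
        distill(policy, traj2)
    return r1, r2

random.seed(0)
policy = {"skill": 0.4}
for _ in range(20):
    erl_step(policy)
print(policy["skill"])
```

The key design choice this sketch highlights is the selectivity of stage 4: the base policy is updated only when the reflection-guided second attempt actually beats the first, so failed experiments never get consolidated.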