1. 📘 Topic and Domain: The paper examines whether reinforcement learning (RL) actually creates new reasoning capabilities in large language models (LLMs) beyond what exists in base models, focusing on mathematical, programming, and visual reasoning tasks.
2. 💡 Previous Research and New Ideas: The paper builds on previous research in Reinforcement Learning with Verifiable Rewards (RLVR) but challenges the common belief that RLVR enables LLMs to develop novel reasoning abilities beyond their base models.
3. ❓ Problem: The paper aims to determine whether RLVR training genuinely introduces new reasoning capabilities to LLMs or merely optimizes existing capabilities from the base model.
4. 🛠️ Methods: The authors measured the pass@k metric at large values of k across multiple model families and benchmarks to probe the reasoning-capability boundary of both base and RL-trained models, supplementing this with a perplexity analysis of the RL-trained models' reasoning traces.
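The pass@k measurement above can be sketched with the standard unbiased estimator from the code-generation evaluation literature: draw n samples per problem, count the c correct ones, and estimate the probability that at least one of k samples succeeds. The function name and sample counts below are illustrative, not taken from the paper.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    Probability that at least one of k completions, drawn without
    replacement from n generated completions of which c are correct,
    solves the problem: 1 - C(n-c, k) / C(n, k).
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # completions must contain at least one correct answer.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Illustrative use: with 2 samples, 1 correct, pass@1 is 0.5.
print(pass_at_k(2, 1, 1))
```

Evaluating this estimator at both small k (e.g. k=1) and large k (e.g. k=256) is what lets the paper compare the sampling efficiency of RL-trained models against the capability ceiling of base models.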
5. 📊 Results and Evaluation: RL-trained models outperform base models at small k values, but base models overtake them in pass@k at large k values. This indicates that RLVR improves sampling efficiency — concentrating probability mass on correct solutions the base model could already produce — rather than introducing reasoning abilities absent from the base model.