1. 📘 Topic and Domain: The paper introduces ARM (Adaptive Reasoning Model), which improves the efficiency of large language models' reasoning, situated in the domain of natural language processing and artificial intelligence.
2. 💡 Previous Research and New Ideas: Building on prior work on large reasoning models and Group Relative Policy Optimization (GRPO), the paper proposes an approach that lets a model adaptively select a reasoning format suited to each task's difficulty, rather than applying one uniform reasoning style to every input.
3. ❓ Problem: The paper aims to solve the "overthinking" problem in large reasoning models, where models apply unnecessarily complex reasoning to all tasks regardless of difficulty, leading to excessive token usage and computational inefficiency.
4. 🛠️ Methods: The paper uses a two-stage training approach: first, supervised fine-tuning teaches the model four reasoning formats (Direct Answer, Short CoT, Code, and Long CoT); then Ada-GRPO, an adaptation of GRPO with a format-diversity reward mechanism, trains the model to choose among them.
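The core of Ada-GRPO's format-diversity reward can be sketched as follows. This is a minimal illustration, not the paper's implementation: the rarity scaling factor (group size divided by format count) and all function names here are assumptions made for the example, and the paper's exact reward shaping may differ.

```python
from collections import Counter

# The four reasoning formats described in the paper.
FORMATS = ["direct", "short_cot", "code", "long_cot"]

def ada_grpo_rewards(formats, correctness):
    """Scale each rollout's correctness reward by the rarity of its
    reasoning format within the sampled group, so that rarely chosen
    formats still receive gradient signal and the policy does not
    collapse onto a single format (e.g., always Long CoT).

    formats: format label per sampled rollout in the group.
    correctness: 1.0 if the rollout's answer is correct, else 0.0.
    """
    n = len(formats)
    counts = Counter(formats)
    # A format sampled k times out of n gets factor n / k (assumed
    # rarity bonus), so less-used formats are rewarded more strongly.
    return [r * n / counts[f] for f, r in zip(formats, correctness)]

def group_advantages(rewards):
    """Standard GRPO-style advantage: normalize rewards within the
    sampled group (zero mean, unit variance)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero for uniform groups
    return [(r - mean) / std for r in rewards]

# Example group of four rollouts: two Long CoT, one Short CoT, one Direct.
rewards = ada_grpo_rewards(
    ["long_cot", "long_cot", "short_cot", "direct"],
    [1.0, 1.0, 1.0, 0.0],
)
# The correct Short-CoT rollout gets the largest shaped reward (4.0),
# since its format is rare in the group.
advantages = group_advantages(rewards)
```

The shaped rewards would then feed into the usual GRPO policy-gradient update; only the reward computation changes, which is consistent with the ~2× training speedup being attributed to the sampling/reward side rather than a new optimizer.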
5. 📊 Results and Evaluation: ARM matched the accuracy of models using only Long CoT while reducing token usage by ~30% on average (up to ~70% in some cases), and trained ~2× faster than standard GRPO while maintaining performance across various reasoning tasks.