2025-05-20 Papers

Paper 1

AdaptThink: Reasoning Models Can Learn When to Think

Published: 2025-05-19

Link: http://arxiv.org/pdf/2505.13417

1. 📘 Topic and Domain: The paper focuses on improving the efficiency of large reasoning language models by developing an adaptive thinking mode selection system.
2. 💡 Previous Research and New Ideas: Building on previous research on reasoning models that use long chain-of-thought reasoning, this paper introduces a "NoThinking" mode and proposes a novel approach called AdaptThink that allows models to adaptively choose between thinking and no-thinking modes.
3. ❓ Problem: The paper addresses the inefficiency of current reasoning models that use lengthy thinking processes for all problems, even simple ones that don't require extensive reasoning.
4. 🛠️ Methods: The paper implements AdaptThink, a reinforcement learning algorithm with two key components: a constrained optimization objective to encourage NoThinking while maintaining performance, and an importance sampling strategy to balance thinking modes during training.
5. 📊 Results and Evaluation: AdaptThink reduced average response length by 53% while improving accuracy by 2.4% on three math datasets with the DeepSeek-R1-Distill-Qwen-1.5B model, demonstrating gains in both efficiency and performance.
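The two components above can be sketched in a few lines of Python. The function names, the size of the NoThinking bonus, and the fixed sampling ratio are illustrative assumptions of ours, not the paper's exact formulation:

```python
import random

def adaptthink_advantage(is_correct: bool, is_nothinking: bool,
                         ref_accuracy: float, delta: float = 0.05) -> float:
    """Toy reward shaping in the spirit of AdaptThink's constrained objective:
    score correctness relative to a reference model's accuracy (the constraint
    that performance must not degrade), plus a small bonus for NoThinking."""
    reward = float(is_correct) - ref_accuracy  # keep accuracy at the reference level
    if is_nothinking:
        reward += delta  # encourage skipping the thinking phase
    return reward

def sample_mode(p_nothinking: float = 0.5) -> str:
    """Importance-sampling-style mode mix: during training, draw NoThinking and
    Thinking responses at a fixed ratio so both modes keep receiving gradient."""
    return "nothinking" if random.random() < p_nothinking else "thinking"
```

In this sketch a correct NoThinking answer scores higher than a correct Thinking answer of equal accuracy, which is the mechanism that pushes the policy toward shorter responses without letting accuracy fall below the reference model.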

[Figure: AdaptThink adaptive thinking mode selection — an input problem is routed by difficulty: simple problems take NoThinking mode (direct solution), complex problems take Thinking mode (long chain of thought), both yielding a final solution.]
Q1. What is the main innovation of AdaptThink compared to traditional reasoning models?
- It completely eliminates the thinking process
- It adaptively chooses between thinking and no-thinking modes based on problem difficulty
- It always uses shorter thinking processes

Q2. What was the most significant experimental result of implementing AdaptThink?
- It improved accuracy by 53% while maintaining the same response length
- It reduced response length by 2.4% while maintaining accuracy
- It reduced response length by 53% while improving accuracy by 2.4%

Q3. What are the two key components of the AdaptThink algorithm?
- A reward system and a punishment system
- A training module and a testing module
- A constrained optimization objective and an importance sampling strategy

Paper 2

Thinkless: LLM Learns When to Think

Published: 2025-05-19

Link: http://arxiv.org/pdf/2505.13379

1. 📘 Topic and Domain: The paper focuses on developing adaptive reasoning capabilities in Large Language Models (LLMs) to efficiently switch between short-form and long-form reasoning responses.
2. 💡 Previous Research and New Ideas: Based on previous research in chain-of-thought reasoning and hybrid reasoning approaches, the paper introduces a novel Decoupled Group Relative Policy Optimization (DeGRPO) algorithm that learns when to use elaborate reasoning versus concise responses.
3. ❓ Problem: The paper addresses the inefficiency of LLMs using elaborate reasoning for all queries when many problems can be solved with straightforward solutions.
4. 🛠️ Methods: The method employs a two-stage approach: first using distillation for warm-up training, then applying reinforcement learning with DeGRPO to optimize the model's decision-making between short and long-form responses using control tokens.
5. 📊 Results and Evaluation: The approach reduced long-form reasoning usage by 50-90% across various mathematical benchmarks (Minerva Algebra, MATH-500, GSM8K) while maintaining performance, with the model appropriately selecting more complex reasoning for challenging tasks like AIME.
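The decoupling idea behind DeGRPO — normalizing the single control token separately from the many response tokens so the mode-selection gradient is not drowned out — can be sketched as follows. The function name, signature, and the value of `alpha` are our assumptions for illustration, not the paper's API:

```python
def degrpo_loss(control_logprob: float, response_logprobs: list[float],
                advantage: float, alpha: float = 0.001) -> float:
    """Sketch of a decoupled policy-gradient loss: the control token
    (<short>/<think>) gets its own term instead of being averaged into
    the response, and alpha balances the two scales."""
    control_term = -advantage * control_logprob  # one token, its own normalization
    response_term = -advantage * sum(response_logprobs) / max(len(response_logprobs), 1)
    return alpha * control_term + response_term
```

With vanilla GRPO the control token would be one term among hundreds in a shared average, so its update signal would be negligible; keeping it in a separate, separately weighted term is what prevents the mode-collapse issue the paper motivates DeGRPO with.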

[Figure: Thinkless pipeline — Stage 1: distillation warm-up from a reasoning model and an instruction model on a paired dataset; Stage 2: Decoupled GRPO with separate control-token and response losses, producing an adaptive reasoning model that switches between <short> and <think> modes.]
Q1. What is the main reason the paper introduces Decoupled GRPO instead of using vanilla GRPO?
- To reduce computational costs during training
- To prevent mode collapse due to imbalanced token updates
- To improve the accuracy of mathematical reasoning

Q2. During the reinforcement learning phase, what interesting pattern was observed in the learning curve?
- A linear increase in short-form responses
- A constant ratio between long and short responses
- A U-shaped curve where long-form responses first increased then decreased

Q3. Which of the following best describes the system's behavior on the AIME dataset compared to simpler problems?
- It used exclusively short-form responses
- It showed similar reasoning patterns across all problems
- It maintained a higher proportion of long-form reasoning due to problem complexity

Paper 3

MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision

Published: 2025-05-19

Link: http://arxiv.org/pdf/2505.13427

1. 📘 Topic and Domain: The paper focuses on enhancing multimodal mathematical reasoning in Large Language Models through process reward modeling.
2. 💡 Previous Research and New Ideas: Building on previous work in reward modeling and text-only mathematical reasoning, the paper proposes a novel framework for generating step-level supervision for multimodal mathematical reasoning without human annotation.
3. ❓ Problem: The paper addresses the challenge of complex multi-step reasoning in multimodal math problems, where models often produce logically inconsistent or partially correct solutions due to lack of fine-grained supervision.
4. 🛠️ Methods: The authors develop MM-PRM using a three-stage approach: training a policy model (MM-Policy), generating process supervision data through Monte Carlo Tree Search, and training a process reward model using soft labels on step-level annotations.
5. 📊 Results and Evaluation: The framework achieved significant improvements across multiple benchmarks, including increasing accuracy on MM-K12 test set from 33.92% to 42.80%, MathVista from 62.93% to 67.60%, and OlympiadBench from 15.41% to 24.00%.
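The soft-label generation in stage two can be illustrated with a simplified Monte Carlo rollout: for each partial solution, sample several completions and use the empirical success rate as that step's soft label rather than a hard 0/1. All names here (`soft_step_labels`, `rollout_fn`) are hypothetical stand-ins, not the authors' code:

```python
from typing import Callable

def soft_step_labels(step_prefixes: list[str],
                     rollout_fn: Callable[[str], bool],
                     n_rollouts: int = 8) -> list[float]:
    """Monte-Carlo-style soft labels for process supervision: for each
    partial solution (prefix of reasoning steps), sample n completions
    and record the fraction that reach a correct final answer."""
    labels = []
    for prefix in step_prefixes:
        successes = sum(rollout_fn(prefix) for _ in range(n_rollouts))
        labels.append(successes / n_rollouts)  # in [0, 1], not a hard 0/1
    return labels
```

A fractional label like 0.6 retains information a binary label destroys: it distinguishes a step that usually recovers from one that always fails, which is the nuance about step quality and difficulty that the paper's soft-label choice is meant to preserve.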

[Figure: MM-PRM framework — Stage 1: policy model construction (InternVL2.5-8B) from mathematical datasets, vision-language data, and structured solutions; Stage 2: MCTS-based process supervision data generation on the MM-K12 dataset with step-level labels; Stage 3: process reward model training, yielding the MM-PRM model with step-wise evaluation and Best-of-N selection.]
Q1. What is the key innovation in MM-PRM's supervision approach compared to traditional methods?
- It relies entirely on human experts to annotate each reasoning step
- It generates over 700k step-level annotations automatically from just 10k seed problems
- It only evaluates the final answer without considering intermediate steps

Q2. Why does MM-PRM use soft labels instead of hard binary labels for training?
- To make the implementation simpler
- To reduce computational costs during training
- To preserve nuanced information about step quality, problem difficulty and uncertainty

Q3. What was the most impressive improvement achieved by MM-PRM on benchmark tests?
- Improving MathVista accuracy from 62.93% to 67.60%
- Improving OlympiadBench accuracy from 15.41% to 24.00%
- Improving MathVision accuracy from 21.74% to 27.11%