1. 📘 Topic and Domain: The paper explores Reinforcement Learning with Verifiable Rewards (RLVR) in Large Language Models (LLMs), focusing on improving reasoning capabilities.
2. 💡 Previous Research and New Ideas: Building on prior findings that RLVR-tuned models underperform their base models on the Pass@K metric, the paper proposes a new perspective: RLVR incentivizes correct reasoning, not merely correct final answers.
3. ❓ Problem: The paper aims to resolve an apparent contradiction: why do RLVR-tuned models show worse Pass@K performance than base models if RLVR genuinely improves reasoning capabilities?
4. 🛠️ Methods: The authors introduce a new metric, CoT-Pass@K, that requires both the reasoning path and the final answer to be correct, develop a theoretical framework explaining RLVR's optimization process, and conduct empirical validation using LLM verifiers to judge reasoning chains.
5. 📊 Results and Evaluation: Results show that RLVR consistently improves CoT-Pass@K across all values of K, indicating a genuine enhancement of reasoning capability. Analysis of training dynamics further reveals that this improvement emerges early in training and generalizes well.
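The difference between the two metrics can be made concrete with a small sketch. Below is the standard unbiased Pass@K estimator (1 − C(n−c, k)/C(n, k) over n generations with c correct), reused for CoT-Pass@K by tightening what counts as "correct": a sample passes only if both its chain of thought and its final answer are verified. The `(cot_correct, answer_correct)` tuple representation and the function names are illustrative assumptions, not the paper's actual code; the paper judges chains of thought with an LLM verifier.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: the probability that at least one of k
    samples drawn without replacement from n generations is correct,
    given that c of the n generations are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def cot_pass_at_k(samples, k: int) -> float:
    """CoT-Pass@K: the same estimator, but a generation counts as correct
    only when BOTH its reasoning chain and its final answer are verified.

    `samples` is a list of (cot_correct, answer_correct) boolean pairs --
    a hypothetical encoding of verifier judgments for illustration."""
    n = len(samples)
    c = sum(1 for cot_ok, ans_ok in samples if cot_ok and ans_ok)
    return pass_at_k(n, c, k)
```

For instance, with four generations where two have a correct answer via correct reasoning, one reaches the right answer through flawed reasoning, and one fails outright, `cot_pass_at_k([(True, True), (False, True), (True, True), (False, False)], 2)` credits only the two fully correct samples, while plain Pass@K would also count the lucky guess.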