2025-05-26 Papers


Paper 1

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Published: 2025-05-23

Link: http://arxiv.org/pdf/2505.18129

1. 📘 Topic and Domain: The paper presents V-Triune, a unified reinforcement learning system for vision-language models that combines both visual reasoning and perception tasks.
2. 💡 Previous Research and New Ideas: Prior research addressed reasoning tasks (math, science) and perception tasks (detection, grounding) separately; this paper proposes a unified approach that trains both through a triple-component system and a dynamic IoU reward mechanism.
3. ❓ Problem: The paper addresses the challenge of training vision-language models to perform both reasoning and perception tasks effectively within a single unified framework, as previous approaches treated these tasks in isolation.
4. 🛠️ Methods: The paper implements a three-tier system: Sample-Level Data Formatting (for unified task inputs), Verifier-Level Reward Computation (for custom rewards), and Source-Level Metric Monitoring (for diagnostics), along with a Dynamic IoU reward for perception tasks.
5. 📊 Results and Evaluation: The resulting Orsta models achieved significant improvements on the MEGA-Bench Core benchmark, with gains ranging from +2.1% to +14.1% across model variants (7B and 32B), while also showing strong performance on downstream tasks such as MMMU, MathVista, and COCO.
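The Dynamic IoU reward in the methods above can be sketched as a binary reward whose IoU threshold tightens as training progresses (relaxed early, strict late). The linear schedule and the threshold values below are illustrative assumptions, not the paper's exact settings:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dynamic_iou_reward(pred_box, gt_box, step, total_steps,
                       start_thresh=0.5, end_thresh=0.95):
    """Binary perception reward whose IoU threshold is interpolated
    from a relaxed to a strict criterion over training (illustrative)."""
    progress = step / total_steps
    thresh = start_thresh + (end_thresh - start_thresh) * progress
    return 1.0 if iou(pred_box, gt_box) >= thresh else 0.0
```

Early in training a rough box earns reward; the same box later fails the stricter criterion, which matches the "progressive perception feedback" idea.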


[Figure] V-Triune system overview: visual reasoning and perception task inputs flow through three components, Sample-Level Data Formatting (unifies diverse task inputs), Verifier-Level Reward Computation (custom rewards via verifiers), and Source-Level Metric Monitoring (data-source diagnostics); a Dynamic IoU reward mechanism provides adaptive perception feedback, yielding the Orsta model.
Key features:
• Unified training for visual reasoning and perception
• Modular reward computation system
• Progressive perception feedback
• Comprehensive metric monitoring
Q1
1. What is the main innovation of V-Triune compared to previous vision-language reinforcement learning approaches?
It uses a larger model architecture with more parameters
It unifies both reasoning and perception tasks in a single training framework
It focuses exclusively on improving visual perception tasks
Q2
2. Why did the researchers decide to freeze the ViT (Vision Transformer) parameters during training?
To save computational resources and training time
Because ViT was already perfectly trained for all tasks
Because joint training led to gradient explosion and performance collapse
Q3
3. What unique feature does the Dynamic IoU reward mechanism introduce?
It progressively adjusts the threshold from relaxed to stricter criteria during training
It randomly varies the reward threshold to prevent overfitting
It maintains a fixed high threshold throughout training

Paper 2

QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Published: 2025-05-23

Link: http://arxiv.org/pdf/2505.17667

1. 📘 Topic and Domain: The paper focuses on developing long-context large reasoning models through reinforcement learning, specifically in the domain of natural language processing and artificial intelligence.
2. 💡 Previous Research and New Ideas: The paper builds on recent large reasoning models (LRMs) that demonstrate strong reasoning capabilities through RL in short-context tasks, and proposes a novel framework called QwenLong-L1 to extend these capabilities to long-context scenarios.
3. ❓ Problem: The paper addresses the challenge of extending large reasoning models to effectively process and reason on long-context inputs (e.g., 120K tokens) via reinforcement learning, tackling issues of suboptimal training efficiency and unstable optimization.
4. 🛠️ Methods: The paper implements a progressive context scaling framework combining warm-up supervised fine-tuning, curriculum-guided phased reinforcement learning, and a difficulty-aware retrospective sampling strategy.
5. 📊 Results and Evaluation: QwenLong-L1-32B outperformed flagship LRMs like OpenAI-o3-mini and Qwen3-235B-A22B across seven long-context document question-answering benchmarks, achieving performance comparable to Claude-3.7-Sonnet-Thinking.
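QwenLong-L1's RL stage uses Group Relative Policy Optimization (GRPO), which scores each sampled response against the other responses in its group instead of a learned value function. A minimal sketch of the group-relative advantage, using the commonly published normalization form (assumed here, not taken from the paper):

```python
import statistics

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages: each sampled response's reward is
    normalized by the mean and std of its own group's rewards."""
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

Responses above the group mean get positive advantage, those below get negative, so no separate critic network is needed.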


[Figure] QwenLong-L1 training workflow: base model → warm-up SFT (initial policy training) → curriculum RL (progressive context scaling) → difficulty-aware retrospective sampling → QwenLong-L1 model.
Training components:
• Group Relative Policy Optimization (GRPO)
• Hybrid reward mechanisms (rule-based + LLM-as-judge)
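The hybrid reward mechanism among the training components can be sketched as combining a rule-based exact-match check with an LLM-judge verdict; the combination rule below (fall back to the judge only when the rule fails) is an assumption for illustration, and `llm_judge` is a hypothetical callable:

```python
def rule_reward(pred: str, gold: str) -> float:
    """Rule-based check: exact match after simple normalization."""
    return 1.0 if pred.strip().lower() == gold.strip().lower() else 0.0

def hybrid_reward(pred: str, gold: str, llm_judge) -> float:
    """Hybrid reward: cheap rule-based check first; if it fails,
    defer to an LLM judge (hypothetical callable returning 0.0/1.0)
    so correct-but-rephrased answers can still score."""
    if rule_reward(pred, gold) == 1.0:
        return 1.0
    return float(llm_judge(pred, gold))
```

Calling the judge only on rule failures keeps the expensive LLM evaluation off the easy cases.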
Q1
1. What is the main challenge that QwenLong-L1 aims to address in long-context reasoning?
Slow training speed and high computational costs
Suboptimal training efficiency and unstable optimization process
Limited memory capacity and model size constraints
Q2
2. Which component is NOT part of QwenLong-L1's progressive context scaling framework?
Warm-up supervised fine-tuning
Difficulty-aware retrospective sampling
Automated parameter pruning
Q3
3. When testing QwenLong-L1-14B with increased sampling scales, what interesting finding was observed?
It performed worse than smaller models
It surpassed DeepSeek-R1 even with a small sampling number
It required massive computational resources

Paper 3

QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization

Published: 2025-05-23

Link: http://arxiv.org/pdf/2505.18092

1. 📘 Topic and Domain: The paper presents QwenLong-CPRS, a context compression framework for large language models (LLMs) in the domain of natural language processing.
2. 💡 Previous Research and New Ideas: The work builds upon previous research in RAG frameworks and sparse attention mechanisms, proposing a novel dynamic context optimization paradigm that uses natural language instructions to guide multi-granularity context compression.
3. ❓ Problem: The paper addresses two key challenges: the prohibitive computational overhead during long sequence processing and the "lost in the middle" performance degradation where LLMs struggle to effectively handle lengthy inputs.
4. 🛠️ Methods: The authors implement four key innovations: natural language-guided dynamic optimization, bidirectional reasoning layers for boundary awareness, token critic mechanisms with language modeling heads, and window-parallel inference architecture.
5. 📊 Results and Evaluation: Across five benchmarks (4K-2M word contexts), QwenLong-CPRS achieved 21.59× context compression with 19.15-point average performance gains, surpassing leading proprietary LLMs by 4.85 and 10.88 points on Ruler-128K and InfiniteBench benchmarks.
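The token-critic idea in the methods can be sketched as scoring pieces of the context for query relevance and keeping only the highest-scoring ones in original order. The scoring function below is a naive word-overlap stand-in for the paper's learned language-modeling-head critic, and the sentence granularity is an assumption:

```python
def compress_context(query: str, sentences: list[str], keep_ratio: float = 0.25):
    """Keep the sentences most relevant to the query, preserving their
    original order. Relevance is naive word overlap -- a stand-in for
    a learned token critic."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(s.lower().split())), i, s)
              for i, s in enumerate(sentences)]
    k = max(1, int(len(sentences) * keep_ratio))
    top = sorted(scored, key=lambda t: -t[0])[:k]       # highest scores
    return [s for _, i, s in sorted(top, key=lambda t: t[1])]  # restore order
```

With `keep_ratio=0.25` this yields roughly 4× compression; the paper's reported 21.59× comes from its learned critic operating at token granularity, not from this heuristic.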


[Figure] QwenLong-CPRS framework: input (system prompt + user query + long context) passes through causal language modeling layers and bi-directional reasoning layers, with a language modeling head acting as token critic; window-parallel inference then produces the dynamically optimized context.
Q1
1. What is the main innovation of QwenLong-CPRS compared to previous approaches like RAG and sparse attention?
It uses pre-trained language models for compression
It enables natural language-guided dynamic context optimization
It increases the context window size to 2M tokens
Q2
2. What level of context compression did QwenLong-CPRS achieve while maintaining performance?
5.5× compression with 10-point performance gain
15.3× compression with 15-point performance gain
21.59× compression with 19.15-point performance gain
Q3
3. Which of these is NOT one of the four key technical innovations mentioned in the paper?
Bidirectional reasoning layers for boundary awareness
Multi-modal context processing for image and text
Window-parallel inference architecture