2025-09-10 Papers


Paper 1

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Published: 2025-09-09

Link: http://arxiv.org/pdf/2509.07980

1. 📘 Topic and Domain: The paper focuses on developing parallel thinking capabilities in large language models through reinforcement learning for mathematical reasoning tasks.
2. 💡 Previous Research and New Ideas: Previous research relied on supervised fine-tuning with synthetic data, while this paper introduces the first reinforcement learning framework for parallel thinking that can explore multiple reasoning paths simultaneously.
3. ❓ Problem: The paper addresses the challenge of effectively training language models to use parallel thinking for complex reasoning tasks, as existing methods struggle with exploration and generalization.
4. 🛠️ Methods: The authors implement a progressive curriculum combining supervised fine-tuning on simple tasks followed by reinforcement learning on harder problems, using specialized reward schemes and both causal and structured model variants.
5. 📊 Results and Evaluation: The approach achieved an 8.4% average accuracy improvement over sequential-thinking models on math benchmarks (MATH, AMC23, AIME24/25), with a notable 42.9% improvement on AIME25 when parallel thinking was used as a mid-training exploration scaffold.
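The "specialized reward schemes" mentioned above include an alternating accuracy/parallel design (the paper's S2 variant, described as 80% accuracy + 20% parallel over 10-step windows). The sketch below is one plausible reading of that schedule, not the authors' code; the function and argument names are illustrative.

```python
# Hedged sketch of an alternating accuracy/parallel reward schedule in the
# spirit of Parallel-R1's S2 variant: within each 10-step window, roughly
# 80% of update steps use an accuracy-only reward, and the remaining 20%
# also grant a bonus for using the parallel-thinking format.
# Names and the exact scheduling rule are assumptions, not from the paper.

def reward(step: int, correct: bool, used_parallel: bool,
           window: int = 10, acc_fraction: float = 0.8) -> float:
    """Return the scalar reward for one rollout at a given update step."""
    r_acc = 1.0 if correct else 0.0
    # Steps in the first 80% of each window use accuracy only; the rest
    # additionally reward the parallel format.
    if (step % window) < int(window * acc_fraction):
        return r_acc
    r_parallel = 1.0 if used_parallel else 0.0
    return r_acc + r_parallel
```

Under this reading, the parallel-format bonus appears often enough to keep the behavior alive without letting the model game the format at the expense of accuracy.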


Framework overview (from the paper's pipeline figure):

- Stage 1 (cold-start SFT): supervised fine-tuning on easy math (Parallel-GSM8K) to learn the parallel-thinking format and its control tags <Parallel>, <Path>, <Summary>.
- Stage 2 (optional): small-scale RL on GSM8K with reward R_parallel × R_acc to stabilize the format, using the GRPO algorithm.
- Stage 3 (main RL): large-scale RL on the DAPO dataset with an accuracy-only reward to generalize to hard tasks; about 300 update steps.
- Data pipeline: zero-shot prompting elicits parallel thinking on easy tasks (83.6% success on GSM8K vs 0% on DAPO); a format-check step filters responses into the high-quality Parallel-GSM8K dataset.
- Model variants: Parallel-R1-Seen (causal architecture) and Parallel-R1-Unseen (structured architecture).
- Reward design: accuracy-only (S1) vs alternating accuracy/parallel rewards (S2), mixing 80% accuracy with a 20% parallel bonus over 10-step windows.
- Parallel thinking behavior: an exploration phase generates N independent reasoning trajectories; a summary phase then aggregates their outcomes and insights.
- Learning dynamics: early in training, parallel blocks serve high-variance computational exploration to discover solutions; later they shift to risk-averse multi-perspective verification of answers, and their relative position in the trace moves later over training.
- Mid-training scaffold: a two-stage schedule (forced exploration, then exploitation) peaks at 25.6% on AIME25, a 42.9% improvement over the baseline; temporary exploration unlocks a higher performance ceiling.
- Evaluation: an 8.4% average accuracy improvement over sequential thinking on MATH, AMC23, and AIME24/25; Parallel-R1-Seen averages 48.9, with consistent gains across benchmarks.
- Key contributions: the first RL framework for parallel thinking trained from scratch on general math tasks, a progressive cold-start → easy → hard curriculum, an analysis of the exploration → verification learning dynamics, the mid-training exploration scaffold, and causal-vs-structured architectural insights.
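The data pipeline's format check keeps only responses with well-formed <Parallel>/<Path>/<Summary> blocks. The tag grammar comes from the paper; the regex-based checker below is our own minimal sketch of such a filter.

```python
import re

# Minimal sketch of a format check for parallel-thinking traces: accept a
# response only if it contains a <Parallel> block with one or more <Path>
# branches followed by a <Summary>. The tag names follow the paper; this
# particular checker is an illustrative assumption, not the authors' code.

PARALLEL_BLOCK = re.compile(
    r"<Parallel>\s*(?:<Path>.*?</Path>\s*)+<Summary>.*?</Summary>\s*</Parallel>",
    re.DOTALL,
)

def has_valid_parallel_block(response: str) -> bool:
    """Return True if the response contains a well-formed parallel block."""
    return PARALLEL_BLOCK.search(response) is not None
```

A filter like this is what turns cheap zero-shot prompting on easy problems into a clean cold-start dataset.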
Q1
1. How does the model's parallel thinking strategy evolve throughout the training process according to the paper?
From verification to exploration
From exploration to verification
Remains constant throughout training
Q2
2. What unique approach did the authors take to generate training data compared to previous methods?
Used human annotators to create parallel thinking examples
Generated synthetic data using complex multi-stage pipelines
Used simple prompting on easier math problems as cold-start data
Q3
3. What was the most significant improvement achieved when using parallel thinking as a mid-training exploration scaffold?
42.9% improvement on AIME25
8.4% improvement on MATH benchmark
25.6% improvement on AMC23

Paper 2

Visual Representation Alignment for Multimodal Large Language Models

Published: 2025-09-09

Link: http://arxiv.org/pdf/2509.07979

1. 📘 Topic and Domain: Visual representation alignment in multimodal large language models (MLLMs) to improve their visual understanding capabilities.
2. 💡 Previous Research and New Ideas: Building on previous MLLMs like LLaVA that rely on text-only supervision, the paper proposes aligning the MLLM's internal visual representations with those of pre-trained vision foundation models.
3. ❓ Problem: MLLMs trained with text-only supervision often discard important visual details, leading to poor performance in vision-centric tasks like object counting and spatial reasoning.
4. 🛠️ Methods: Introduces VIRAL (VIsual Representation ALignment), which aligns the internal visual representations of MLLMs with those of pre-trained vision foundation models using cosine similarity-based loss.
5. 📊 Results and Evaluation: Achieved consistent improvements across multiple benchmarks, with significant gains in vision-centric tasks, and demonstrated better training efficiency and robustness in spatial reasoning tasks.
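The core of the method above is a cosine-similarity alignment loss between the MLLM's visual features and frozen vision-foundation-model (VFM) features. The numpy sketch below illustrates that objective under assumed shapes; it is a simplified stand-in, not the paper's implementation (which applies a learned projection before comparing).

```python
import numpy as np

# Hedged sketch of a VIRAL-style alignment loss: maximize the cosine
# similarity between (already projected) MLLM visual features h_i and VFM
# features y_i, i.e. L_VRA = -(1/N) * sum_i cos(h_i, y_i).
# Shapes and names are illustrative assumptions.

def alignment_loss(h: np.ndarray, y: np.ndarray, eps: float = 1e-8) -> float:
    """h, y: (N, D) arrays of MLLM and VFM features for N visual tokens."""
    h_norm = h / (np.linalg.norm(h, axis=1, keepdims=True) + eps)
    y_norm = y / (np.linalg.norm(y, axis=1, keepdims=True) + eps)
    cos = np.sum(h_norm * y_norm, axis=1)  # per-token cosine similarity
    return float(-np.mean(cos))            # lower = better aligned
```

Per the paper, this term is added to the usual language-modeling loss as L_total = L_LM + λ·L_VRA with λ = 0.5, so alignment acts as a light regularizer rather than replacing the text objective.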


Method overview (from the paper's framework figure):

- Problem and hypothesis: text-only supervision causes loss of visual information; the authors hypothesize that visual representations become misaligned during training and validate this with CKNNA similarity measurements.
- Baseline analysis: in a standard MLLM (vision encoder → projector → LLM), a residual connection helps post-projection but not pre-projection, motivating a better design.
- VIRAL loss: a visual representation alignment loss L_VRA = -(1/N) Σ sim(P_π(e^img_ℓ), y_i), matching the LLM's layer-ℓ visual features (after a learned projection P_π) to vision foundation model (VFM) features y_i.
- VFM teachers: DINOv2 works best; SAM, Depth Anything v2, RADIO, and CLIP are also evaluated.
- Layer analysis: aligning at layer 16 of 32 works best; middle layers are crucial for visual understanding, and single-layer alignment beats multi-layer.
- Training setup: total loss L_total = L_LM + λ·L_VRA with λ = 0.5 and a cosine-similarity objective.
- Benchmarks: vision-centric (CV-Bench2D, MMVP), hallucination (POPE), and general (MME, MMStar).
- Key results: consistent improvements across tasks, better spatial reasoning, and faster convergence.
- Analysis: attention maps become more focused, the model is more sensitive to token permutation, and visual representations are more structured.
- Key contributions: identifying visual representation misalignment in MLLMs, proposing VIRAL as a simple yet effective regularization strategy, demonstrating consistent improvements across benchmarks, and comprehensive ablations validating the design choices.
Q1
1. What is the main limitation of current multimodal large language models that VIRAL aims to address?
Slow processing speed of visual inputs
Loss of fine-grained visual details during text-only supervision
High computational requirements for training
Q2
2. How does VIRAL improve visual representation in MLLMs?
By increasing the size of the vision encoder
By adding more visual training data
By aligning internal representations with pre-trained vision foundation models
Q3
3. At which layer of the MLLM did the VIRAL alignment show the best performance?
Early layers (1-8)
Middle layers (around layer 16)
Final layers (24-32)

Paper 3

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

Published: 2025-09-09

Link: http://arxiv.org/pdf/2509.07969

1. 📘 Topic and Domain: The paper focuses on developing a visual language model called Mini-o3 for multi-turn visual search tasks through reinforcement learning and tool-based interactions.
2. 💡 Previous Research and New Ideas: Building on previous tool-based visual language models such as DeepEyes and Chain-of-Focus, the paper proposes new techniques for scaling up reasoning patterns and interaction turns beyond existing limitations.
3. ❓ Problem: The paper addresses the limitation of existing open-source visual language models that exhibit monotonous reasoning patterns and allow only limited interaction turns, making them inadequate for difficult visual search tasks.
4. 🛠️ Methods: The authors use a three-component approach: constructing a Visual Probe Dataset, developing an iterative data collection pipeline for cold-start trajectories, and implementing an over-turn masking strategy in reinforcement learning.
5. 📊 Results and Evaluation: Mini-o3 achieved state-of-the-art performance on multiple visual search benchmarks, demonstrating the ability to scale to tens of interaction turns and showing improved accuracy as the number of turns increased, despite being trained with only a 6-turn limit.
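The over-turn masking strategy mentioned in the methods is what lets a model trained with a 6-turn limit scale to many more turns at test time: trajectories that exhaust the turn budget are masked out of the policy-gradient loss rather than penalized, so the model never learns that long searches are bad. The sketch below illustrates that idea with a simple mean baseline; names and the baseline choice are our assumptions, not the paper's exact GRPO formulation.

```python
# Hedged sketch of over-turn masking: trajectories that hit the training
# turn limit without producing an answer get weight 0 in the loss
# (dropped entirely) instead of a zero or negative reward.
# Names and the simple mean baseline are illustrative assumptions.

def masked_advantages(rewards, exceeded_turn_limit):
    """rewards: per-trajectory scalars; exceeded_turn_limit: matching bools.
    Returns (advantage, weight) pairs; weight 0.0 masks a trajectory out of
    the policy-gradient loss entirely."""
    total = sum(r for r, ex in zip(rewards, exceeded_turn_limit) if not ex)
    n = sum(1 for ex in exceeded_turn_limit if not ex)
    baseline = total / n if n else 0.0  # baseline over unmasked rollouts only
    return [
        (0.0, 0.0) if ex else (r - baseline, 1.0)
        for r, ex in zip(rewards, exceeded_turn_limit)
    ]
```

Contrast this with naively assigning over-long trajectories a reward of 0: that would actively push the policy toward stopping early, capping the depth of search the model is willing to attempt.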


Workflow overview (from the paper's pipeline figure):

- Phase 1, data construction: the Visual Probe Dataset features small targets, distractor objects, and high-resolution images (4,000 training + 500 test examples).
- Phase 2, cold-start data: an iterative collection pipeline uses few-shot prompting to gather ~6,000 multi-turn trajectories with diverse reasoning patterns.
- Base model: Qwen2.5-VL-7B-Instruct, a vision-language model with image-tool integration.
- Supervised fine-tuning: cold-start initialization on multi-turn tool use (3 epochs, learning rate 1e-5).
- Reinforcement learning: GRPO with over-turn masking and verifiable rewards, trained with a 6-turn limit while supporting test-time turn scaling.
- Inference loop: the model thinks (internal trial-and-error reasoning), acts (grounding with bbox_2d parameters or answering), and observes (cropped image patches as environmental feedback), iterating until it answers or hits the turn limit (up to 32 turns at test time).
- Over-turn masking: prevents the model from learning to stop early, enabling test-time scaling.
- Performance: 48.0% on VisualProbe-Hard and state-of-the-art results on visual search, with deep reasoning trajectories and accuracy that improves with test-time turns.
- Key capabilities: multi-turn visual search with depth-first search patterns, trial-and-error exploration, and goal-maintenance strategies.
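The think-act-observe loop described above can be sketched as a short driver function. The `model` and `crop_image` callables are placeholders for the VLM and the image-cropping tool (their interfaces are our assumptions); the 32-turn cap matches the paper's test-time setting.

```python
# Hedged sketch of a Mini-o3-style multi-turn visual search loop.
# `model(context)` is assumed to return a dict describing either a
# grounding action with a bbox_2d or a final answer; `crop_image` is a
# placeholder for the image tool. Interfaces are illustrative, not the
# paper's actual API.

def visual_search(model, crop_image, image, question, max_turns=32):
    context = [("image", image), ("question", question)]
    for _ in range(max_turns):
        step = model(context)                 # think: decide the next action
        if step["action"] == "answer":        # terminate with a final answer
            return step["content"]
        if step["action"] == "ground":        # act: zoom into a bbox_2d region
            patch = crop_image(image, step["bbox_2d"])
            context.append(("observation", patch))  # observe the cropped patch
    return None                               # turn budget exhausted
```

Because termination is driven by the model's own answer action rather than a hard-coded schedule, raising `max_turns` at test time is all it takes to let trained search behavior run deeper.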
Q1
1. What is the main innovation of Mini-o3's training approach that enables it to scale to many interaction turns at test time despite limited training turns?
Using a larger dataset for training
Over-turn masking strategy in reinforcement learning
Increasing the model size to 7B parameters
Q2
2. With how many interaction turns was Mini-o3 trained, and to how many could it scale during testing?
Trained with 32 turns, scaled to 6 turns
Trained with 12 turns, scaled to 24 turns
Trained with 6 turns, scaled to 32 turns
Q3
3. What was a key characteristic of the Visual Probe Dataset that made it especially challenging?
It only contained black and white images
It had very low resolution images
It featured small targets with many distractor objects