1. 📘 Topic and Domain: The paper presents MiMo-VL, a compact vision-language model focused on visual understanding, multimodal reasoning, and GUI interaction.
2. 💡 Previous Research and New Ideas: Building on prior vision-language models and RLHF research, it introduces Mixed On-policy Reinforcement Learning (MORL), which blends verifiable and human-preference reward signals, and incorporates high-quality reasoning data into the pre-training stages (a reward-routing sketch follows this list).
3. ❓ Problem: The paper aims to build a compact yet powerful vision-language model that handles complex visual understanding, multimodal reasoning, and GUI interaction tasks without sacrificing performance on general capabilities.
4. 🛠️ Methods: Uses a four-stage pre-training process (2.4 trillion tokens) followed by MORL post-training with diverse reward signals, built on a native-resolution Vision Transformer architecture that preserves fine-grained visual detail (a patchification sketch follows this list).
5. 📊 Results and Evaluation: MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35/40 tasks, scores 59.4 on OlympiadBench, achieves 56.1 on OSWorld-G, and shows strong performance across 50+ evaluation benchmarks, setting new standards for open-source vision-language models.
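
To make MORL's mix of reward signals concrete, here is a minimal Python sketch of per-task reward routing: verifiable tasks (e.g., math, grounding) get rule-based rewards while open-ended responses are scored by a preference model. The function names, task labels, and dummy scorers are illustrative assumptions, not the paper's actual implementation.

```python
import random

def verifiable_reward(response: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the final answer matches, else 0.0.
    (Toy matcher; a real verifier would parse and normalize answers.)"""
    return 1.0 if response.strip() == ground_truth.strip() else 0.0

def preference_reward(response: str) -> float:
    """Stand-in for a learned human-preference reward model."""
    return random.uniform(0.0, 1.0)  # dummy score for illustration

def mixed_reward(response: str, sample: dict) -> float:
    """Route each sample to the reward suited to its task type, so
    verifiable and open-ended objectives are optimized together in
    a single on-policy RL loop."""
    if sample["task"] in {"math", "grounding", "counting"}:
        return verifiable_reward(response, sample["answer"])
    return preference_reward(response)

# Toy on-policy step: score rollouts from the current policy with
# their task-appropriate rewards.
batch = [
    {"prompt": "2+3=?", "task": "math", "answer": "5"},
    {"prompt": "Describe the image.", "task": "open_ended", "answer": None},
]
for sample in batch:
    rollout = "5" if sample["task"] == "math" else "A scenic photo."
    print(sample["task"], mixed_reward(rollout, sample))
```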
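A native-resolution Vision Transformer avoids resizing every image to a fixed square; instead it cuts each image at its original size into a variable-length sequence of fixed-size patches. The sketch below illustrates that idea only; the patch size of 14 and the padding scheme are assumptions for illustration, not details from the paper.

```python
import torch

PATCH = 14  # common ViT patch size; an assumption, not from the paper

def patchify_native(image: torch.Tensor, patch: int = PATCH) -> torch.Tensor:
    """Split a (C, H, W) image into a variable-length sequence of
    flattened patches, padding H and W up to a multiple of `patch`
    so the image is never resized to a fixed square."""
    c, h, w = image.shape
    pad_h = (-h) % patch
    pad_w = (-w) % patch
    image = torch.nn.functional.pad(image, (0, pad_w, 0, pad_h))
    # (C, H', W') -> (num_patches, C * patch * patch)
    patches = image.unfold(1, patch, patch).unfold(2, patch, patch)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)

# Images of different sizes yield different sequence lengths,
# preserving detail in large images without distortion.
for h, w in [(224, 224), (448, 336)]:
    seq = patchify_native(torch.randn(3, h, w))
    print((h, w), "->", tuple(seq.shape))
```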