2025-09-26 Papers


Paper 1

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Published: 2025-09-25

Link: http://arxiv.org/pdf/2509.21268

1. 📘 Topic and Domain: The paper focuses on enhancing multimodal reasoning in large language models through improved reinforcement learning techniques and high-quality training data.
2. 💡 Previous Research and New Ideas: Based on Group Relative Policy Optimization (GRPO) for reinforcement learning, the paper proposes a novel Variance-Aware Sampling (VAS) strategy and introduces large-scale curated datasets for multimodal reasoning.
3. ❓ Problem: The paper addresses two main limitations in multimodal reasoning models: the lack of high-quality long chain-of-thought data and the instability of reinforcement learning algorithms in post-training due to gradient vanishing.
4. 🛠️ Methods: The authors developed VAS, which uses a Variance Promotion Score (combining outcome variance and trajectory diversity) to prioritize informative prompts during policy optimization, and curated ~1.6M long chain-of-thought cold-start samples plus ~15K RL QA pairs.
5. 📊 Results and Evaluation: The models achieved state-of-the-art performance across multimodal mathematical and logical reasoning benchmarks, with the 7B model reaching an average score of 58.4 and showing marked improvements in convergence speed, training stability, and downstream performance.
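The Variance Promotion Score described in point 4 can be sketched as follows. The α = 0.8 / β = 0.2 weights come from the paper; the function names and the scalar `diversity` input (standing in for the Self-BLEU-based trajectory diversity) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of Variance-Aware Sampling's Variance Promotion Score (VPS).
# Assumes a per-prompt rollout pass rate P(x) in [0, 1] and a diversity
# score in [0, 1]; alpha/beta follow the paper's reported values.

def outcome_variance_score(pass_rate: float) -> float:
    """OVS = P(x) * (1 - P(x)): maximal when outcomes are balanced (P = 0.5)."""
    return pass_rate * (1.0 - pass_rate)

def variance_promotion_score(pass_rate: float, diversity: float,
                             alpha: float = 0.8, beta: float = 0.2) -> float:
    """VPS = alpha * OVS + beta * TDS, combining outcome variance and
    trajectory diversity to decide which prompts to sample for GRPO."""
    return alpha * outcome_variance_score(pass_rate) + beta * diversity

# A prompt the policy solves ~half the time scores highest on OVS,
# so it is up-weighted during sampling; near-always-solved and
# near-never-solved prompts (which give vanishing gradients) score low.
scores = [variance_promotion_score(p, d)
          for p, d in [(0.5, 0.6), (0.95, 0.6), (0.05, 0.6)]]
```

In the paper's dynamic sampler, these scores drive a weighted draw that is mixed with uniform random sampling (λ = 0.5) and refreshed every T steps.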


Framework overview (reconstructed from the paper's pipeline figure): MMR1 cold-starts Qwen2.5-VL with SFT on ~1.6M long chain-of-thought samples (5 epochs, AdamW optimizer), then runs GRPO-based RL on ~15K math and logic QA pairs. The Variance-Aware Sampling (VAS) framework scores each prompt with a Variance Promotion Score, VPS = α·OVS + β·TDS (α = 0.8, β = 0.2): the Outcome Variance Score OVS = P(x)(1 − P(x)) is maximal at P = 0.5 (balanced outcomes), and the Trajectory Diversity Score TDS measures diversity across reasoning paths via Self-BLEU. A dynamic sampler mixes VPS-weighted and random sampling (mix ratio λ = 0.5), updating scores every T steps, and feeds GRPO training with group normalization for stable gradients. A variance-progress theorem, E[J(θ⁺) − J(θ)] ≥ (η·c_min/4)·Var[R], shows via a two-level decomposition that higher reward variance yields greater expected policy improvement, mitigating gradient vanishing. Results: MMR1-7B reaches a 58.4 average score, state of the art on benchmarks including MathVerse, MathVision, LogicVista, and ChartQA, with stable training dynamics. Open resources: 3B and 7B models, datasets, and training code for reproducible community baselines. Core innovation: VAS mitigates gradient vanishing in GRPO.
Q1
1. What is the main innovation introduced by the paper to address gradient vanishing in reinforcement learning?
A new type of language model architecture
Variance-Aware Sampling (VAS) strategy
Larger training datasets
Q2
2. The Variance Promotion Score (VPS) in the paper combines which two components?
Input variance and output diversity
Model size and training efficiency
Outcome variance and trajectory diversity
Q3
3. What was the size of the curated cold-start dataset for training the model?
~15,000 QA pairs
~1.6 million samples
~500,000 examples

Paper 2

Tree Search for LLM Agent Reinforcement Learning

Published: 2025-09-25

Link: http://arxiv.org/pdf/2509.21240

1. 📘 Topic and Domain: The paper focuses on tree-based reinforcement learning methods for training Large Language Model (LLM) agents, specifically in the domain of multi-turn agent interactions and decision-making.
2. 💡 Previous Research and New Ideas: Based on existing chain-based RL approaches for LLMs, the paper proposes a novel tree-based sampling strategy where each node represents a complete agent interaction step, introducing more efficient rollout sampling and finer-grained supervision signals.
3. ❓ Problem: The paper addresses two key challenges in LLM agent RL: heavy budget consumption in rollouts due to multi-turn interactions, and sparse supervision signals in long-horizon trajectories.
4. 🛠️ Methods: The authors develop Tree-GRPO (Tree-based Group Relative Policy Optimization), which uses tree search for rollout sampling and estimates grouped relative advantages at both intra-tree and inter-tree levels to provide step-level process supervision signals.
5. 📊 Results and Evaluation: Experiments across 11 datasets and 3 types of QA tasks showed Tree-GRPO consistently outperformed chain-based methods, achieving superior performance while using only a quarter of the rollout budget, with particularly strong improvements for smaller models.
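The two-level advantage in point 4 can be sketched as below: group-relative normalization within each tree (trajectories sharing common prefixes) plus across all trees for the same prompt, summed per trajectory, A^tree(H_i) = A^intra(H_i) + A^inter(H_i). The exact normalization and the function names here are assumptions for illustration, not the paper's implementation.

```python
# Sketch of Tree-GRPO's two-level grouped relative advantage.
# "Group-relative" here means normalizing an outcome reward against the
# mean and std of its comparison group, as in GRPO.

from statistics import mean, pstdev

def group_relative(rewards: list[float]) -> list[float]:
    """Normalize rewards within a group (GRPO-style z-score)."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # avoid division by zero for uniform groups
    return [(r - mu) / sigma for r in rewards]

def tree_advantages(trees: list[list[float]]) -> list[list[float]]:
    """trees[t][i] = outcome reward of trajectory i rolled out in tree t.
    Returns A^tree = A^intra + A^inter per trajectory."""
    # Intra-tree: compare trajectories within the same tree (shared prefixes).
    intra = [group_relative(tree) for tree in trees]
    # Inter-tree: compare all trajectories across trees for the same prompt.
    flat = [r for tree in trees for r in tree]
    inter_flat = group_relative(flat)
    # Combine the two levels.
    out, k = [], 0
    for t, tree in enumerate(trees):
        out.append([intra[t][i] + inter_flat[k + i] for i in range(len(tree))])
        k += len(tree)
    return out
```

Because siblings in one tree diverge at a specific step, the intra-tree term acts as a step-level preference signal, which the paper shows is gradient-equivalent to step-level DPO.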


Methodology overview (reconstructed from the paper's figure): Tree-GRPO targets two problems of chain-based agent RL, sparse supervision and heavy rollout budgets. Rollouts are organized as a tree search in which each node is a complete agent interaction step, a (τ, a, o) thought-action-observation tuple, and branches share common prefixes. Sampling proceeds by (1) initializing M trees, (2) sampling N nodes, and (3) expanding each L times. Group-relative advantages are estimated at two levels and combined: A^tree(H_i) = A^intra(H_i) + A^inter(H_i). This yields step-level process preference signals whose granularity varies with subtree depth. Theoretically, the intra-tree GRPO objective is equivalent to step-level DPO (same gradient structure, different weights). The objective J_Tree-GRPO(θ) uses a PPO-style clipped importance ratio, KL regularization against a reference model, and an outcome-based reward within a ReAct-style multi-turn agent loop (tool environment, search APIs). Reported benefits: 1.5× more rollouts for the same budget, finer process supervision, and consistent improvements over chain-based methods across 11 datasets, 3 task types, and multiple model scales.
Q1
1. What is the main advantage of using tree-based sampling over chain-based sampling in LLM agent reinforcement learning?
It allows for faster model convergence during training
It enables sharing of common prefixes, reducing rollout budget needs
It simplifies the implementation of the reinforcement learning algorithm
Q2
2. According to the paper's experiments, how much rollout budget did Tree-GRPO need compared to chain-based methods to achieve better performance?
Half of the budget
One quarter of the budget
One third of the budget
Q3
3. What unique characteristic of Tree-GRPO's node structure sets it apart from previous tree-search methods in LLM reinforcement learning?
Each node represents a complete agent interaction step (thought-action-observation)
Each node represents individual tokens or sentences
Each node represents the final reward outcome

Paper 3

Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

Published: 2025-09-25

Link: http://arxiv.org/pdf/2509.21245

1. 📘 Topic and Domain: A unified framework for controllable 3D asset generation from images using multiple conditioning signals, in the domain of computer vision and 3D graphics.
2. 💡 Previous Research and New Ideas: Building on Hunyuan3D 2.1 and recent advances in 3D-native generative models, the paper proposes a unified framework that integrates multiple control signals (point clouds, voxels, bounding boxes, and skeletons) into a single model.
3. ❓ Problem: Existing 3D generation methods lack fine-grained control and cross-modal capabilities, limiting their practical applications in production workflows.
4. 🛠️ Methods: Implements a unified control encoder that processes multiple types of conditioning signals, combining them with image features in a shared architecture using Diffusion Transformers (DiT) and VAE-based decoding.
5. 📊 Results and Evaluation: Demonstrates improved generation accuracy and control across different conditions: accurate pose alignment for characters, proper scale adjustment with bounding boxes, enhanced geometric detail with point clouds, and better shape fidelity with voxel conditions.
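The unified control encoder from point 4 can be sketched as below: each control signal is embedded, tagged with a task embedding identifying its modality, and concatenated with the image feature c to form a joint condition c' = [c, β_i] for the DiT. The shapes and helper names here are illustrative assumptions (the real model uses position embeddings and learned linear projections, not this toy padding).

```python
# Toy sketch of a unified control encoder producing c' = [c, beta_i].
# Assumptions: one-hot "task embedding" per modality, crude pad/truncate
# in place of a learned linear projection.

TASK_IDS = {"point_cloud": 0, "voxel": 1, "bounding_box": 2, "skeleton": 3}

def encode_control(signal: list[float], modality: str, dim: int = 4) -> list[float]:
    """Stand-in for the control branch: project the raw signal to `dim`
    values and append a one-hot task embedding for its modality."""
    feat = (signal + [0.0] * dim)[:dim]               # crude "projection"
    task = [1.0 if i == TASK_IDS[modality] else 0.0   # task embedding
            for i in range(len(TASK_IDS))]
    return feat + task

def joint_condition(image_feat: list[float], control_feat: list[float]) -> list[float]:
    """c' = [c, beta_i]: concatenate image and control features."""
    return image_feat + control_feat
```

The task embedding is what lets a single shared encoder serve all four control types: the downstream DiT can condition on which modality was supplied.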


Architecture overview (reconstructed from the paper's figure): the input image is encoded with DINO-v2, while a unified control encoder processes one of four control signals (point cloud P_c ∈ ℝ^(N_c×3), a 16×16×16 voxel grid, bounding box P_box ∈ ℝ^(8×3), or skeleton P_pose ∈ ℝ^(M×6)) through position embedding, task embedding, and linear projection. The control feature β_i is concatenated with the image condition c into the joint feature c' = [c, β_i] and fed to the Hunyuan3D DiT (16 self-attention and 21 cross-attention transformer blocks). A VAE decoder produces the signed distance field F_sdf = D(Z), and marching cubes extracts the iso-surface as the output 3D mesh. Training minimizes the flow-matching objective E_{t,x₀,x₁,c'} ||v_θ(x, t, c') − (x₁ − x₀)||²₂. A progressive sampling strategy assigns one control modality per example, with higher probability for harder signals (skeleton) and lower for easier ones (point cloud), yielding robust multi-modal fusion. Key features: unified cross-modal architecture, fine-grained controllability, geometry-aware transformations, and robustness in production workflows.
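The progressive sampling strategy above (one control modality per training example, weighted toward harder signals) might look like the following sketch. The specific weights are hypothetical assumptions; the paper states only the relative ordering, with skeleton sampled more often and point cloud less.

```python
# Sketch of progressive control-modality sampling: draw exactly one
# modality per training example, biased toward harder/scarcer signals.
# The weights below are illustrative assumptions, not published values.

import random

MODALITY_WEIGHTS = {        # hypothetical sampling weights
    "skeleton": 0.4,        # hardest / least abundant -> sampled most
    "bounding_box": 0.25,
    "voxel": 0.2,
    "point_cloud": 0.15,    # easiest -> sampled least
}

def sample_control_modality(rng: random.Random) -> str:
    """Draw one control signal for a training example."""
    names = list(MODALITY_WEIGHTS)
    weights = [MODALITY_WEIGHTS[n] for n in names]
    return rng.choices(names, weights=weights, k=1)[0]
```

Drawing a single modality per example forces the shared encoder to stay robust to every condition type rather than overfitting to the easiest one.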
Q1
1. What is the main innovation of Hunyuan3D-Omni compared to previous 3D generation models?
It uses a completely new architecture for 3D generation
It unifies multiple control signals in a single framework using a shared encoder
It only focuses on improving image-to-3D generation quality
Q2
2. Why is the skeleton condition given higher sampling probability during training?
Because skeleton data is more abundant than other conditions
Because skeleton condition is easier to learn than others
Because pose control data is less abundant and more challenging to learn
Q3
3. Which of the following best describes the bounding box condition's unique contribution?
It helps generate more realistic textures
It allows control over aspect ratio and prevents overly thin geometry
It improves the resolution of generated 3D models