2026-01-29 Papers


Paper 1

Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

Published: 2026-01-28

Link: http://arxiv.org/pdf/2601.20614

1. 📘 Topic and Domain: The paper focuses on enhancing mathematical reasoning capabilities in large language models through reinforcement learning techniques.
2. 💡 Previous Research and New Ideas: The paper builds on Group Relative Policy Optimization (GRPO) but identifies its implicit bias against harder questions, proposing Difficulty-Aware Group Policy Optimization (DGPO) with balanced advantage estimation and Multi-Aspect Question Reformulation (MQR) for data augmentation.
3. ❓ Problem: The paper addresses the systematic under-emphasis of challenging questions in existing reinforcement learning methods: GRPO's update magnitudes peak at moderate difficulty and are suppressed for both easier and harder questions.
4. 🛠️ Methods: The authors propose the DGPO algorithm, which combines difficulty-balanced group advantage estimation (normalizing by mean absolute deviation) with difficulty-aware question-level weighting, together with the MQR strategy, which reformulates questions by adding story backgrounds, introducing abstract terminology, and nesting sub-problems.
5. 📊 Results and Evaluation: The combined framework, MathForge, achieves 42.17% average accuracy across six benchmarks with Qwen2.5-Math-7B, a 4.56-point improvement over the GRPO baseline, with consistent gains across model sizes and types, including multimodal domains.


MathForge: a dual-perspective framework.

- Algorithmic perspective (DGPO):
  - Difficulty-Balanced Group Advantage Estimation (DGAE): rectifies update-magnitude imbalance via MAD normalization
  - Difficulty-Aware Question-Level Weighting (DQW): prioritizes harder questions via exponential weighting with temperature control
- Data perspective (MQR):
  - Background reformulation: adds story context to increase complexity
  - Term reformulation: introduces abstract mathematical terminology
  - Sub-problem nesting: converts conditions into independent sub-problems
  - Key constraint: all reformulations preserve the original gold answer
- Synergistic loop: MQR expands the data frontier; DGPO learns from the resulting challenges
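The DGAE and DQW components can be sketched in a few lines. The pass-rate difficulty proxy, the exact exponential weighting form, and the `tau` default are illustrative assumptions here, not the paper's exact formulation:

```python
import math

def dgpo_advantages(rewards, tau=1.0):
    """Sketch of DGPO-style advantage estimation for one question's rollout
    group. Replaces GRPO's standard-deviation denominator with the mean
    absolute deviation (MAD) and up-weights harder questions with an
    exponential, temperature-controlled factor.

    Assumptions (not from the paper): binary rewards, pass rate as the
    difficulty proxy, weight = exp((1 - p) / tau).
    """
    g = len(rewards)
    mean_r = sum(rewards) / g
    # DGAE: normalize by mean absolute deviation instead of std dev
    mad = sum(abs(r - mean_r) for r in rewards) / g
    advantages = [(r - mean_r) / (mad + 1e-8) for r in rewards]
    # Empirical pass rate as a difficulty proxy (assumes 0/1 rewards)
    p = mean_r
    # DQW: harder questions (low pass rate) get a larger weight
    weight = math.exp((1.0 - p) / tau)
    return [weight * a for a in advantages], weight
```

A group where only one of four rollouts succeeds gets a larger question-level weight than a group where three of four succeed, which is the "harder is better" emphasis the paper argues GRPO lacks.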
Q1. What mathematical insight did the authors discover about GRPO's update magnitudes?
- Update magnitudes are highest for the easiest questions and decrease linearly with difficulty
- Update magnitudes peak at moderate difficulty (p=0.5) and are suppressed for both easier and harder questions
- Update magnitudes remain constant across all difficulty levels due to normalization
Q2. How does the Multi-Aspect Question Reformulation (MQR) strategy ensure the validity of augmented training data?
- It uses GPT-4 to regenerate new solutions for each reformulated question
- It constrains all reformulations to preserve the original gold answer while increasing difficulty
- It validates each question through multiple rounds of human expert review
Q3. What key mathematical change does DGPO make to GRPO's advantage estimation function?
- It replaces the standard deviation denominator with mean absolute deviation (MAD)
- It adds a logarithmic scaling factor to amplify harder questions
- It introduces a learnable neural network to estimate advantages dynamically

Paper 2

Advancing Open-source World Models

Published: 2026-01-28

Link: http://arxiv.org/pdf/2601.20540

1. 📘 Topic and Domain: The paper presents LingBot-World, an open-source world model for interactive video generation that bridges video synthesis and actionable simulation in computer vision and machine learning.
2. 💡 Previous Research and New Ideas: Building on video generation models and world simulators like Genie 3 and Wan2.2, the paper proposes a multi-stage evolution strategy (pre-training, middle-training, post-training) with hierarchical data captioning and mixture-of-experts architecture for long-term consistency.
3. ❓ Problem: The paper addresses the challenge of transitioning from passive video generation to interactive world simulation, tackling issues of scarce interactive data, maintaining long-term temporal coherence, and achieving real-time controllable generation.
4. 🛠️ Methods: The authors employ a scalable data engine with game/synthetic data acquisition, progressive curriculum training with MoE architecture, and causal architecture adaptation with few-step distillation for real-time inference.
5. 📊 Results and Evaluation: LingBot-World outperforms baselines on VBench (dynamic degree of 0.8857 vs. 0.7612 and 0.7217), maintains minute-level temporal consistency, supports real-time interaction at 16 FPS, and demonstrates emergent spatial memory and 3D-consistency capabilities.


LingBot-World pipeline.

- Data engine:
  - Data acquisition: general videos, game data, synthetic (Unreal Engine) data
  - Data profiling: basic filtering, semantic analysis, camera-pose labels
  - Data captioning: narrative, scene-static, and temporal captions
- Stage I (pre-training): general video prior; open-domain generation; spatiotemporal coherence
- Stage II (middle-training): world-knowledge injection; action control (MoE); long-term consistency
- Stage III (post-training): real-time interaction; causal attention; few-step distillation
- Architecture: DiT blocks with action injection plus a Plücker encoder; 28B parameters (MoE with high-noise and low-noise experts)
- Applications:
  - Promptable world events: global and local dynamics via text control
  - Action agent: learning autonomous exploration policies
  - 3D reconstruction: geometric-consistency validation
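Stage III's switch to causal attention can be made concrete with a mask sketch. The quiz mentions block causal attention, which combines local bidirectional dependencies with global causality; here is a minimal sketch of such a mask, with the block size and per-frame granularity as assumptions rather than the paper's actual configuration:

```python
def block_causal_mask(num_frames, block_size):
    """Sketch of a block causal attention mask: frames attend
    bidirectionally within their own temporal block but only causally to
    earlier blocks. This is one way to adapt a bidirectional video model
    for streaming, interactive generation.

    Returns mask[i][j] = True where frame i may attend to frame j.
    """
    mask = [[False] * num_frames for _ in range(num_frames)]
    for i in range(num_frames):
        for j in range(num_frames):
            # Allowed if j is in the same block as i, or in any earlier block
            mask[i][j] = (j // block_size) <= (i // block_size)
    return mask
```

Within a block the model keeps full bidirectional context (quality), while across blocks it never looks ahead (interactivity), which is the trade-off the post-training stage targets.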
Q1. What emergent capability does LingBot-World demonstrate without relying on explicit 3D representations?
- It can maintain spatial memory and preserve landmark integrity even after objects are out of view for 60 seconds
- It can generate photorealistic textures using only 2D convolutions
- It can automatically convert any video into a playable game without additional training
Q2. Which architectural innovation allows LingBot-World to transform from a bidirectional model to a real-time interactive system?
- Replacing all attention layers with convolutional layers for faster processing
- Using block causal attention that combines local bidirectional dependencies with global causality constraints
- Implementing a completely new transformer architecture from scratch
Q3. What unique data acquisition strategy does LingBot-World employ to overcome the scarcity of interactive training data?
- It only uses publicly available YouTube videos with manual annotations
- It relies exclusively on reinforcement learning in simulated environments
- It combines real-world footage, game engine recordings with paired control inputs, and synthetic data from Unreal Engine

Paper 3

DeepSeek-OCR 2: Visual Causal Flow

Published: 2026-01-28

Link: http://arxiv.org/pdf/2601.20552

1. 📘 Topic and Domain: The paper presents DeepSeek-OCR 2, a vision-language model for document reading and optical character recognition with a novel encoder that dynamically reorders visual tokens based on image semantics.
2. 💡 Previous Research and New Ideas: The paper builds on DeepSeek-OCR, DETR's parallelized queries, and BLIP-2's Q-former, proposing DeepEncoder V2 which replaces CLIP with an LLM-style architecture using causal attention to enable semantic-aware visual token reordering.
3. ❓ Problem: The paper addresses the limitation of conventional VLMs that process visual tokens in rigid raster-scan order, which contradicts human visual perception that follows flexible, semantically coherent scanning patterns driven by causal reasoning.
4. 🛠️ Methods: The authors use a vision tokenizer with SAM-base architecture, an LLM-style encoder (Qwen2-0.5B) with dual-stream attention (bidirectional for visual tokens, causal for learnable queries), and a DeepSeek-3B MoE decoder, trained in three stages.
5. 📊 Results and Evaluation: DeepSeek-OCR 2 achieves 91.09% overall performance on OmniDocBench v1.5, a 3.73-point improvement over the baseline, with a lower reading-order Edit Distance (0.057 vs. 0.085) while using fewer maximum visual tokens (1120 vs. 1156).


DeepSeek-OCR 2 workflow.

- Vision tokenizer: SAM-ViTDet (80M), 16x token compression
- DeepEncoder V2:
  - Visual tokens with non-causal (bidirectional) attention
  - LM vision encoder: Qwen2 (500M)
  - Causal-flow queries: learnable tokens with causal attention that reorder visual information
  - Attention mask design: non-causal and causal streams combined
- Decoder: DeepSeek-MoE, 3B parameters (500M active)
- Training pipeline:
  - Stage 1 (encoder pretraining): vision tokenizer + LM encoder, language-modeling objective, 40k iterations
  - Stage 2 (query enhancement): freeze vision tokenizer; optimize LM encoder + decoder; 15k iterations
  - Stage 3 (decoder specialization): freeze all encoder parameters; update only the decoder; 20k iterations
- Output: OCR text from reordered visual tokens
- Key innovation (causal-reasoning cascade): the encoder first reorders visual tokens, then the decoder performs causal reasoning
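The dual-stream attention (bidirectional among visual tokens, causal for the learnable queries) can be sketched as a single combined mask. Whether visual tokens can attend back to the queries is not stated here, so this sketch disallows it as an explicit assumption:

```python
def deepencoder_v2_mask(n_visual, n_query):
    """Sketch of a dual-stream attention mask in the style described for
    DeepEncoder V2: visual tokens attend bidirectionally to each other,
    while learnable causal-flow queries attend to all visual tokens plus
    earlier queries (causal among themselves).

    Assumption: visual tokens do not attend to query tokens.
    Token order: [visual tokens..., query tokens...].
    Returns mask[i][j] = True where token i may attend to token j.
    """
    n = n_visual + n_query
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_visual:
                # Visual token: bidirectional, but only over visual tokens
                mask[i][j] = j < n_visual
            else:
                # Query token: sees every visual token, causal over queries
                mask[i][j] = j < n_visual or j <= i
    return mask
```

Because each query sees all visual tokens but only earlier queries, the query sequence can emit visual information in a learned, semantically coherent order instead of the raster-scan order of the input patches.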
Q1. What is the key architectural innovation in DeepEncoder V2 that enables visual causal flow?
- Replacing CLIP with an LLM-style architecture that uses dual-stream attention mechanisms
- Increasing the number of visual tokens to capture more image details
- Using a larger vision transformer with more parameters than CLIP
Q2. How does DeepSeek-OCR 2's visual token processing differ from conventional vision-language models?
- It processes more visual tokens per image for better accuracy
- It dynamically reorders visual tokens based on semantic understanding rather than rigid raster-scan order
- It uses a faster tokenization algorithm to reduce computational cost
Q3. What performance improvement did DeepSeek-OCR 2 achieve on reading order (R-order) Edit Distance compared to the baseline?
- From 0.085 to 0.057, indicating better semantic ordering of visual content
- From 0.057 to 0.085, showing increased processing speed
- From 1156 to 1120, reducing the number of tokens needed