2025-11-13 Papers

Paper 1

TiDAR: Think in Diffusion, Talk in Autoregression

Published: 2025-11-11

Link: http://arxiv.org/pdf/2511.08923

1. 📘 Topic and Domain: The paper introduces TiDAR, a hybrid language model architecture that combines diffusion and autoregressive approaches for efficient text generation.
2. 💡 Previous Research and New Ideas: Based on previous work in diffusion language models and autoregressive models, it proposes a novel hybrid architecture that utilizes "free token slots" to combine parallel drafting from diffusion with high-quality autoregressive sampling in a single forward pass.
3. ❓ Problem: The paper addresses the challenge of achieving both high throughput and high quality in language model generation, as existing methods typically trade off between these aspects.
4. 🛠️ Methods: TiDAR uses a specially designed attention mask that enables parallel token drafting via diffusion and sequential sampling via autoregression within a single model forward pass, along with exact KV cache support.
5. 📊 Results and Evaluation: TiDAR 1.5B achieved 4.71x speedup and TiDAR 8B achieved 5.91x speedup in tokens per second compared to autoregressive models while maintaining comparable quality, outperforming both diffusion models and speculative decoding approaches.
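The structured attention mask described in point 4 can be sketched concretely. A minimal illustration, assuming the mask is causal over the prefix, bidirectional within the draft block, and lets draft tokens attend to the whole prefix; the paper's exact mask layout may differ:

```python
import numpy as np

def tidar_attention_mask(n_prefix: int, n_draft: int) -> np.ndarray:
    """Boolean mask (True = may attend): causal over the prefix,
    bidirectional within the draft block, drafts see the full prefix."""
    n = n_prefix + n_draft
    mask = np.tril(np.ones((n, n), dtype=bool))  # causal baseline
    mask[n_prefix:, n_prefix:] = True            # draft block: bidirectional
    return mask

# 4 prefix tokens followed by 3 diffusion draft slots
mask = tidar_attention_mask(n_prefix=4, n_draft=3)
```

Note that the prefix rows stay strictly causal, so the prefix's KV entries are unaffected by the drafts, which is what makes exact KV caching possible.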

TiDAR: Think in Diffusion, Talk in Autoregression

Method-flow diagram (text recovered from the figure):

- Training: dual-mode backbone trained with causal attention for the AR part and bidirectional attention for the diffusion part; combined loss L = α·L_AR + (1 − α)·L_Diff.
- Architecture: structured attention masks over prefix, draft, and pre-draft tokens; everything runs in a single forward pass with exact KV cache support.
- Inference: "think" via one-step parallel diffusion drafting that utilizes free token slots; "talk" via AR rejection sampling.
- Key innovation: a hybrid architecture pairing parallel diffusion drafting with AR-quality sampling in a single model forward, with no hyperparameter tuning and 4.71x-5.91x speedups.
- Training strategy: a full-mask strategy sets all diffusion-section tokens to mask tokens, yielding a dense loss signal and easy loss balancing.
- Attention mechanism: hybrid causal-bidirectional (causal over the prefix, bidirectional within the block), enabling both p_AR and p_Diff to be computed.
- Performance: TiDAR 1.5B reaches a 4.71x and TiDAR 8B a 5.91x speedup at competitive quality, outperforming speculative decoding.
- Evaluation: coding (HumanEval, MBPP), math (GSM8K, Minerva Math), knowledge and reasoning (MMLU, ARC, HellaSwag, PIQA), likelihood tasks, and efficiency benchmarks (native AR support, tokens/NFE, wall-clock time).
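The draft-then-verify step can be illustrated with a greedy simplification. This is a hypothetical sketch, not the paper's exact rejection-sampling rule: the AR head re-scores the drafted slots, the longest agreeing prefix is kept, and the first mismatch is replaced by the AR token:

```python
import numpy as np

def accept_drafts(draft_tokens, ar_logits):
    """Greedy verify step: keep the longest draft prefix that matches the
    AR head's argmax at each slot; at the first mismatch, substitute the
    AR token and stop (a greedy simplification of rejection sampling)."""
    accepted = []
    for t, logits in zip(draft_tokens, ar_logits):
        ar_token = int(np.argmax(logits))
        if t == ar_token:
            accepted.append(t)
        else:
            accepted.append(ar_token)  # correction token from the AR head
            break
    return accepted

# Toy AR logits over a 3-token vocabulary; argmax per slot is 2, 1, 0
logits = np.array([[0., 0., 5.],
                   [0., 5., 0.],
                   [5., 0., 0.]])
```

Because the AR head always contributes one token (accepted or corrective), every forward pass advances generation by at least one token, matching speculative-decoding guarantees.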
Q1. What is the main innovation of TiDAR compared to existing language models?
- It uses only autoregressive sampling without any parallel processing
- It combines diffusion and autoregressive approaches in a single forward pass
- It completely replaces autoregressive sampling with diffusion

Q2. What speedup did TiDAR 8B achieve, at comparable quality, over traditional autoregressive models?
- 2.91x speedup in tokens per second
- 4.71x speedup in tokens per second
- 5.91x speedup in tokens per second

Q3. What unique training strategy did TiDAR use for the diffusion section?
- Using random masks with varying corruption rates
- Setting all tokens in the diffusion section to mask tokens
- Applying gradual denoising over multiple steps
Paper 2

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Published: 2025-11-12

Link: http://arxiv.org/pdf/2511.09515

1. 📘 Topic and Domain: Vision-Language-Action (VLA) models for robotic manipulation, focusing on reinforcement learning using world models.
2. 💡 Previous Research and New Ideas: Builds on imitation learning and real-world RL approaches for VLA models, proposing a novel world model-based policy optimization framework that enables policy learning without real-world interaction.
3. ❓ Problem: Current VLA models struggle with learning from failures and self-correction, while direct reinforcement learning suffers from high sample complexity and safety concerns in real-world robotics.
4. 🛠️ Methods: Introduces WMPO (World Model-based Policy Optimization) that uses a pixel-based video-generative world model pretrained on robotic trajectories, combined with policy behavior alignment and Group Relative Policy Optimization (GRPO).
5. 📊 Results and Evaluation: WMPO outperformed baseline methods across four manipulation tasks in simulation and real-world settings, demonstrating improved sample efficiency, stronger performance, emergent self-correction behaviors, and robust generalization capabilities.
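The imagined-rollout loop at the heart of WMPO (actions from π_θ, next states from the world model p_φ, a trajectory-level score from R_ψ) can be sketched abstractly. The three models are stand-in callables here, not the paper's networks:

```python
def imagined_rollout(policy, world_model, reward_model, s0, horizon):
    """Roll out a trajectory entirely inside the world model:
    a_t ~ pi_theta(. | s_t), s_{t+1} ~ p_phi(. | s_t, a_t),
    then score the finished trajectory with R_psi."""
    states, actions = [s0], []
    s = s0
    for _ in range(horizon):
        a = policy(s)            # policy picks an action from the state
        s = world_model(s, a)    # world model imagines the next state
        actions.append(a)
        states.append(s)
    return states, actions, reward_model(states, actions)

# Toy dynamics: the "policy" doubles the state, the "world model" adds
# state and action, the "reward" sums the actions taken.
states, actions, total = imagined_rollout(
    policy=lambda s: 2 * s,
    world_model=lambda s, a: s + a,
    reward_model=lambda S, A: sum(A),
    s0=1, horizon=3)
```

No real-robot step appears anywhere in the loop, which is the source of WMPO's sample efficiency and safety.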

WMPO: World Model-based Policy Optimization for Vision-Language-Action Models

Workflow diagram (text recovered from the figure):

- Phase 1, data collection and pretraining: collect expert demonstrations, pretrain the world model on the OXE dataset, train the base VLA policy (OpenVLA-OFT), collect initial policy behavior data, and train the reward model R_ψ.
- Phase 2, policy behavior alignment: fine-tune the world model p_φ on policy behavior trajectories to address the distribution mismatch and enable faithful failure simulation, using noisy frame conditioning.
- Phase 3, world model enhancement: pixel-space video generation with frame-level action control, autoregressive trajectory generation, long-horizon rollouts, and action-frame alignment.
- Core process, on-policy RL in imagination: (1) generate imagined trajectories by sampling an initial state s₀, letting the policy π_θ predict actions, and letting the world model generate frames; (2) sample G trajectories, evaluate them with the reward model, and apply a dynamic sampling filter; (3) update the policy parameters θ with GRPO using advantages Â_i; (4) iterate, yielding lifelong learning and emergent self-correction.
- Key benefits: sample efficiency (no real-world interactions needed), on-policy RL (better performance than off-policy), emergent self-correction on failures, and iterative improvement.
- Mathematical framework: objective max_θ E_{τ~π_θ,p_φ}[R_ψ(τ)]; world model s_{t+1} ~ p_φ(s_{t+1} | s_t, a_t); policy a_t ~ π_θ(a_t | s_t); GRPO loss J(θ) = E[min(r_{i,t}(θ)Â_i, clip(r_{i,t}(θ), 1−ε, 1+ε)Â_i)] with r_{i,t}(θ) = π_θ(a_{i,t}|s_{i,t}) / π_{θ_old}(a_{i,t}|s_{i,t}).
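The GRPO pieces in the framework above, group-relative advantages Â_i and the clipped surrogate, can be written out directly. A minimal sketch, assuming rewards are standardized within each group of G trajectories:

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: standardize each trajectory's reward
    against its group's mean and std (the 'group relative' in GRPO)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

def grpo_objective(ratio, adv, eps=0.2):
    """PPO-style clipped surrogate per token:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)
```

Because advantages are computed relative to the group, no learned value function is needed; a uniform group (all rewards equal) yields zero advantage and hence no update.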
Q1. What is the main innovation of WMPO that differentiates it from previous world model approaches?
- It operates in latent space instead of pixel space
- It focuses on pixel-based predictions that align with pretrained VLA features
- It requires no world model training at all

Q2. Which emergent behavior was observed in WMPO-trained policies compared to baseline policies?
- They could only copy expert demonstrations exactly
- They frequently got stuck in repeated actions
- They demonstrated self-correction abilities when encountering failures

Q3. How does WMPO address the problem of reward assignment in long-horizon tasks?
- It uses dense reward shaping at every timestep
- It generates complete trials through clip-level autoregressive video generation
- It relies solely on human feedback for rewards
Paper 3

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Published: 2025-11-09

Link: http://arxiv.org/pdf/2511.06307

1. 📘 Topic and Domain: The paper focuses on data curation and training strategies for reinforcement learning in competitive code generation, specifically addressing how to construct effective RLVR (Reinforcement Learning with Verifiable Reward) datasets.
2. 💡 Previous Research and New Ideas: Previous research focused mainly on RLVR algorithm design and math benchmarks, while this paper introduces a novel two-stage RL framework that emphasizes data curation and curriculum learning for competitive programming.
3. ❓ Problem: The paper addresses the challenge of improving language models' performance in competitive programming tasks, where solutions must be both logically correct and computationally efficient.
4. 🛠️ Methods: The authors implement a two-stage approach: first, supervised fine-tuning followed by entropy expansion training on diverse problems, then a hard-focus curriculum learning stage using Group Relative Policy Optimization (GRPO) with increased rollouts on challenging problems.
5. 📊 Results and Evaluation: The approach achieved state-of-the-art performance among 32B parameter models, with improvements ranging from 13% to 58% across various benchmarks, demonstrating particularly strong gains on challenging problems in LeetCode and Codeforces contests.

DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Pipeline diagram (text recovered from the figure):

- Supervised fine-tuning: Qwen2.5-32B base, 470K high-quality prompts, an arena learning strategy, and twice-hard learning.
- Data curation: difficulty classification, hard-problem duplication, general-purpose coding data, and reasoning-intensive data.
- SFT analysis: low entropy, repetitive patterns, struggles on hard problems, and mode collapse.
- Stage 1, entropy expansion: 9K competitive problems, 8 rollouts per prompt, a 24K token limit, GRPO, a uniform problem distribution, and reduced repetition.
- Stage 2, hard-focus curriculum: the LiveCode V6 dataset, 64 rollouts per prompt, a 32K token limit, pre-GRPO filtering, and a progressive curriculum over challenging problems.
- Three-phase curriculum: Phase 1 keeps the 72 hardest cases (64-step budget), Phase 2 the 50 hardest (32-step budget), Phase 3 the 25 hardest (32-step budget).
- Evaluation benchmarks: LeetCode weekly contests, Codeforces weekly contests, and LiveCode V5/V6/08-11, chosen to avoid data contamination.
- Key results: 13-58% relative improvement, SOTA among 32B models, competitive with larger models, and strong scaling on MoE.
- Key insights and contributions: large rollout budgets are crucial for hard problems; entropy expansion enables robust generalization; the hard-focus curriculum pushes the problem-solving frontier; data curation matters as much as algorithm design; the two-stage framework addresses SFT limitations; standard RL struggles with difficult cases; the progressive curriculum retains the hardest instances; together these give a practical roadmap for competitive programming.
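The three-phase hard-focus curriculum amounts to ranking problems by rollout pass rate and repeatedly keeping only the hardest subset. A hypothetical sketch, where the function name and the pass-rate dictionary format are assumptions, with toy stage sizes in place of the paper's 72/50/25:

```python
def hard_focus_stages(pass_rates, stage_sizes=(72, 50, 25)):
    """Progressive hard-focus curriculum (sketch): rank problems by their
    rollout pass rate and keep the hardest `k` for each successive stage,
    mirroring DRIVE's 72 -> 50 -> 25 retention schedule."""
    ranked = sorted(pass_rates, key=pass_rates.get)  # lowest pass rate first
    return [ranked[:k] for k in stage_sizes]

# Toy example: 4 problems with measured pass rates, shrinking 3 -> 2 -> 1
stages = hard_focus_stages(
    {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.0},
    stage_sizes=(3, 2, 1))
```

Shrinking the retained set while holding the rollout budget high concentrates compute on exactly the instances standard RL fails on, which is the paper's stated rationale for the curriculum.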
Q1. What is the main innovation of the DRIVE framework compared to previous RLVR research?
- It introduces a new mathematical optimization algorithm
- It focuses on data curation and curriculum learning strategies
- It develops a new model architecture for code generation

Q2. In the second stage of RL training, what unique approach does DRIVE use to handle difficult problems?
- It increases the number of rollouts to 64 per prompt and retains the hardest cases throughout training
- It randomly shuffles all problems regardless of difficulty
- It only trains on easy problems to build foundational knowledge

Q3. What was the most significant performance improvement achieved by DRIVE?
- 13% improvement on LeetCode benchmarks
- 58% improvement on Codeforces benchmarks
- 25% improvement on all benchmarks