2025-07-25 Papers

Paper 1

Group Sequence Policy Optimization

Published: 2025-07-23

Link: http://arxiv.org/pdf/2507.18071

1. 📘 Topic and Domain: The paper introduces a new reinforcement learning algorithm called Group Sequence Policy Optimization (GSPO) for training large language models.
2. 💡 Previous Research and New Ideas: Building on the earlier GRPO (Group Relative Policy Optimization) algorithm, the paper proposes a novel sequence-level approach to reinforcement learning optimization, in place of GRPO's token-level optimization.
3. ❓ Problem: The paper aims to solve the instability and inefficiency issues in current RL algorithms like GRPO, which can lead to model collapse when training large language models.
4. 🛠️ Methods: GSPO defines importance ratios based on sequence likelihood rather than token-level weights, and performs sequence-level clipping, rewarding, and optimization.
5. 📊 Results and Evaluation: GSPO achieved superior training stability and efficiency compared to GRPO, stabilized Mixture-of-Experts (MoE) RL training without requiring complex stabilization strategies, and contributed to performance improvements in Qwen3 models.
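The length-normalized sequence-level ratio and clipped objective described above can be sketched in a few lines. This is a minimal illustration under the paper's definitions, not its implementation; all function and variable names are invented for the example.

```python
import math

def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std across the G responses
    sampled for one query. Names are illustrative, not from the paper."""
    m = sum(rewards) / len(rewards)
    sd = (sum((r - m) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - m) / (sd + 1e-8) for r in rewards]

def gspo_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped GSPO objective for one group, negated so that gradient
    descent maximizes it.

    logp_new / logp_old: per-response lists of token log-probabilities
    under the current and old policies.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        n = len(lp_new)  # response length |y_i|
        # length-normalized sequence-level importance ratio s_i(theta)
        s = math.exp((sum(lp_new) - sum(lp_old)) / n)
        # sequence-level clipping: the whole response is kept or clipped
        s_clipped = max(1.0 - eps, min(1.0 + eps, s))
        terms.append(min(s * adv, s_clipped * adv))
    return -sum(terms) / len(terms)
```

Note the contrast with GRPO: there is one ratio per response, so clipping removes or keeps entire responses rather than individual tokens.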

Group Sequence Policy Optimization

GSPO workflow overview (reconstructed from the diagram):
- GRPO issue: token-level importance ratios cause training instability.
- Key innovation: sequence-level importance ratios, with sequence-level clipping, rewarding, and optimization.
- Importance ratio (length-normalized): s_i(θ) = (π_θ(y_i|x) / π_θ_old(y_i|x))^(1/|y_i|)
- Group advantage (rewards normalized across the G responses for a query): Â_i = (r(x, y_i) - mean) / std
- Objective: J_GSPO(θ) = E[(1/G) Σ_i min(s_i(θ) Â_i, clip(s_i(θ), 1-ε, 1+ε) Â_i)]
- Training process: generate G responses per query, then optimize the clipped sequence-level objective.
- Gradient analysis: tokens within a response receive equal weight, versus GRPO's unequal token weighting.
- Stability benefits: prevents model collapse.
- MoE training: eliminates the need for the Routing Replay strategy and handles expert-routing volatility.
- Infrastructure: tolerant of precision differences, improving training-inference compatibility.
- Results: superior training efficiency and better benchmark performance.
- GSPO-token: a token-level variant for multi-turn RL.
- Application: production success in Qwen3 models.
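
Differentiating the unclipped objective above (clipping omitted for clarity) shows where the "equal token weighting" comes from; this is a sketch following the definitions above, not a formula quoted from the paper:

```latex
\nabla_\theta J_{\mathrm{GSPO}}(\theta)
  = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}
      s_i(\theta)\,\hat{A}_i \cdot \frac{1}{|y_i|}
      \sum_{t=1}^{|y_i|} \nabla_\theta \log \pi_\theta\!\left(y_{i,t}\mid x, y_{i,<t}\right)\right]
```

Every token in response i is scaled by the same factor s_i(θ) Â_i / |y_i|, whereas GRPO weights each token by its own token-level ratio, which is the source of the instability noted above.
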
Q1
1. What is the main innovation of GSPO compared to previous algorithms?
It uses token-level importance ratios
It defines importance ratio based on sequence likelihood
It eliminates the need for rewards completely
Q2
2. What surprising observation was made about clipping fractions in GSPO versus GRPO?
GSPO clips two orders of magnitude more tokens but achieves better efficiency
GSPO and GRPO had identical clipping fractions
GSPO clips fewer tokens than GRPO
Q3
3. How does GSPO benefit MoE (Mixture-of-Experts) model training?
It requires more complex routing strategies
It makes MoE training impossible
It eliminates the need for Routing Replay strategy while maintaining stability

Paper 2

Captain Cinema: Towards Short Movie Generation

Published: 2025-07-24

Link: http://arxiv.org/pdf/2507.18634

1. 📘 Topic and Domain: The paper presents "Captain Cinema," a framework for generating short movies from textual descriptions, operating in the domain of AI-generated video content and narrative storytelling.
2. 💡 Previous Research and New Ideas: Whereas previous text-to-video models could only generate 5-10 second clips, this paper introduces a novel two-stage approach combining top-down keyframe planning with bottom-up video synthesis to produce longer, narratively coherent videos.
3. ❓ Problem: The paper addresses the challenge of generating long-form, narratively coherent videos with consistent characters and scenes, as existing approaches struggle with maintaining coherence beyond short clips.
4. 🛠️ Methods: The method uses a two-stage approach: first generating keyframes using a Multimodal Diffusion Transformer with GoldenMem compression for long-context memory, then synthesizing video between keyframes using interleaved conditioning.
5. 📊 Results and Evaluation: The results show superior performance in generating visually coherent and narratively consistent short movies compared to baselines, evaluated through automated metrics and user studies, with particularly strong results in temporal dynamics and character consistency preservation.
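The two-stage structure in step 4 above can be sketched as a minimal skeleton. The stubs below merely stand in for the paper's MM-DiT keyframe generator and its interleaved-conditioning video model; every function and field name here is illustrative, not from the paper's code.

```python
def plan_keyframes(storyline, n_keyframes):
    """Top-down stage: turn a storyline into an ordered keyframe plan.
    Toy stub for the MM-DiT keyframe generator with GoldenMem memory."""
    scenes = storyline.split(". ")
    # spread the scenes across the requested number of keyframes
    return [{"index": i, "scene": scenes[i % len(scenes)]}
            for i in range(n_keyframes)]

def synthesize_clip(kf_a, kf_b):
    """Bottom-up stage: generate the footage between two keyframes.
    Toy stub for the interleaved-conditioning video model."""
    return f"clip[{kf_a['index']}->{kf_b['index']}]"

def captain_cinema(storyline, n_keyframes=4):
    """Keyframes first, then video between each consecutive pair."""
    keyframes = plan_keyframes(storyline, n_keyframes)
    return [synthesize_clip(a, b)
            for a, b in zip(keyframes, keyframes[1:])]
```

The design point the sketch captures: long-range narrative consistency is handled once, at the cheap keyframe level, so the expensive video model only ever interpolates between anchored endpoints.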

Captain Cinema: Towards Short Movie Generation

Captain Cinema framework overview (reconstructed from the diagram):
- Top-down keyframe planning: a movie storyline is turned into a sequence of keyframes by an MM-DiT with hybrid attention masking, GoldenMem progressive long-context compression, and dynamic stride sampling.
- Bottom-up video synthesis: a video generation model with multi-keyframe interleaved conditioning and long-context learning fills in the footage between keyframes, yielding a multi-scene short movie.
- Data processing pipeline: ~500 hours of movie data, scene detection, frame extraction, and Gemini annotation, producing ~300K keyframes plus video shots.
- Key technical contributions: hybrid attention (local + global processing), GoldenMem visual memory compression, progressive context training, semantic-oriented context retrieval, and multi-scene narrative coherence.
Q1
1. What is the main innovation of Captain Cinema compared to previous text-to-video models?
It uses higher resolution video generation
It combines top-down keyframe planning with bottom-up video synthesis for longer narratives
It generates videos faster than previous models
Q2
2. What is the purpose of the GoldenMem feature in Captain Cinema?
To compress and manage long-context visual memory efficiently
To improve the video rendering quality
To generate better audio for the videos
Q3
3. What potential ethical concern about this technology is mentioned in the paper?
High energy consumption
Privacy violations
Risk of hyper-realistic misinformation

Paper 3

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Published: 2025-07-24

Link: http://arxiv.org/pdf/2507.18537

1. 📘 Topic and Domain: Test-time scaling framework for Visual Auto-Regressive (VAR) image generation models.
2. 💡 Previous Research and New Ideas: Building on test-time scaling for diffusion models and LLMs, the paper proposes the first scaling framework specifically designed for VAR models' coarse-to-fine generation process.
3. ❓ Problem: How to improve image generation quality in VAR models without additional training or substantial computational costs.
4. 🛠️ Methods: Implements adaptive descending batch sizes, clustering-based diversity search for early scales, and resampling-based potential selection for late scales.
5. 📊 Results and Evaluation: Achieved 8.7% improvement in GenEval score (0.69→0.75) on the Infinity model, with consistent improvements across multiple evaluation metrics.
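The three mechanisms in step 4 above fit into one multi-scale loop, sketched below with toy stand-ins: `cluster_select` replaces DINOv2 features plus k-means++, and `score` replaces ImageReward. Only the descending batch schedule is taken from the paper; everything else is an illustrative assumption.

```python
# descending batch sizes per scale, as reported in the paper
BATCH_SCHEDULE = [8, 8, 6, 6, 6, 4, 2, 2, 2, 1, 1, 1, 1]

def cluster_select(candidates, k):
    """Early (coarse) scales: keep k structurally diverse candidates.
    Toy stand-in for DINOv2-feature k-means++ clustering."""
    step = max(1, len(candidates) // k)
    return candidates[::step][:k]

def resample_select(candidates, k, score):
    """Late (fine) scales: keep the k highest-potential candidates.
    `score` stands in for an ImageReward-style reward model."""
    return sorted(candidates, key=score, reverse=True)[:k]

def tts_var(init_candidates, expand, score, switch_scale=6):
    """Run coarse-to-fine generation with a descending batch schedule.

    `expand(c, scale)` stands in for one VAR refinement step; diversity
    search is used before `switch_scale`, reward resampling after.
    """
    candidates = init_candidates
    for scale, batch in enumerate(BATCH_SCHEDULE):
        candidates = [expand(c, scale) for c in candidates]
        if scale < switch_scale:
            candidates = cluster_select(candidates, batch)
        else:
            candidates = resample_select(candidates, batch, score)
    return candidates[0]
```

The shape of the schedule reflects the key constraint named in Q1 below: early coarse tokens cannot be revised later, so the budget is spent keeping many diverse candidates early and pruning hard once rewards become predictive.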

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

TTS-VAR framework overview (reconstructed from the diagram):
- Input: a text prompt drives the VAR model's multi-scale (coarse-to-fine) generation.
- Adaptive batch sampling: descending batch sizes across scales, [8, 8, 6, 6, 6, 4, 2, 2, 2, 1, 1, 1, 1].
- Early (coarse) scales: clustering-based diversity search using DINOv2 features and k-means++ clustering to extract structural information.
- Late (fine) scales: resampling-based potential selection using ImageReward scoring; potential scores include VALUE, MAX, SUM, and DIFF.
- Result: high-quality images, GenEval 0.69 → 0.75 (an 8.7% improvement).
Q1
1. What is the main challenge in applying test-time scaling to VAR models compared to diffusion models?
VAR models require more computational resources
Early-scale tokens in VAR cannot be refined once generated
VAR models have lower quality outputs
Q2
2. Why does TTS-VAR use clustering-based diversity search in early scales instead of reward-based selection?
To reduce computational costs
Because clustering is more accurate than rewards
Because early-scale rewards don't accurately predict final image quality
Q3
3. What unique feature of the batch size schedule does TTS-VAR implement?
Uses fixed batch sizes throughout generation
Increases batch sizes progressively
Uses larger batches in early scales and decreases them in later scales