1. 📘 Topic and Domain: The paper presents Helios, a 14B parameter autoregressive diffusion model for real-time long video generation in the domain of computer vision and generative AI.
2. 💡 Previous Research and New Ideas: The paper builds on diffusion transformers and autoregressive video generation methods, proposing Unified History Injection for infinite-length video generation, Easy Anti-Drifting strategies that avoid self-forcing, and Deep Compression Flow for efficient computation.
3. ❓ Problem: The paper tackles the challenge of generating high-quality, temporally coherent long videos in real time, addressing drifting, computational cost, and the limitations of existing models, which are either too slow for real-time use or produce low-quality results.
4. 🛠️ Methods: The authors use an autoregressive diffusion transformer with Guidance Attention blocks, Multi-Term Memory Patchification for context compression, Pyramid Unified Predictor Corrector for multi-scale generation, and Adversarial Hierarchical Distillation to reduce sampling steps from 50 to 3.
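To make the method pipeline concrete, here is a minimal sketch of chunk-wise autoregressive diffusion sampling with a compressed history context and few-step denoising. All function names and the averaging "compression" are illustrative stand-ins, not the paper's actual components (Multi-Term Memory Patchification and the distilled denoiser are far richer):

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, history, t):
    # Stand-in for one denoiser call; a real diffusion transformer
    # would condition on the compressed history via attention.
    return x - 0.5 * (x - history.mean()) / t

def compress_history(frames, keep=4):
    # Hypothetical context compression: summarize recent frames
    # into a fixed-size memory (a crude analogue of patchified memory).
    return np.stack(frames[-keep:]).mean(axis=0)

def generate(num_chunks=4, steps=3, frame_shape=(8, 8)):
    # Autoregressive loop: each new chunk starts from noise and is
    # denoised in only a few steps (3, as after distillation),
    # conditioned on compressed history rather than all past frames.
    frames = [np.zeros(frame_shape)]
    for _ in range(num_chunks):
        history = compress_history(frames)
        x = rng.standard_normal(frame_shape)  # start from pure noise
        for t in range(steps, 0, -1):         # few-step sampling
            x = denoise_step(x, history, t)
        frames.append(x)
    return frames

video = generate()
```

The key structural idea this sketch captures is that per-chunk cost stays constant as the video grows, because conditioning happens through a fixed-size compressed memory instead of the full frame history.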
5. 📊 Results and Evaluation: Helios generates minute-scale videos at 19.5 FPS on a single H100 GPU, a 128× speedup over baseline models, and outperforms existing methods on the newly introduced HeliosBench across aesthetic quality, motion smoothness, semantic alignment, and naturalness.
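As a back-of-envelope check (an interpretation, not a breakdown stated in the paper), the step reduction from 50 to 3 accounts for only part of the 128× speedup; the remainder presumably comes from Deep Compression Flow and other efficiency measures:

```python
# Speedup attributable to sampling-step reduction alone (50 -> 3 steps).
step_speedup = 50 / 3            # ~16.7x from fewer denoising steps

# Residual factor needed to reach the reported 128x overall speedup;
# plausibly from compression and architectural efficiencies (assumption).
remaining = 128 / step_speedup   # ~7.7x

print(round(step_speedup, 1), round(remaining, 1))
```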