1. 📘 Topic and Domain: Real-time interactive long video generation using frame-level autoregressive models for AI-powered video content creation.
2. 💡 Previous Research and New Ideas: Builds on diffusion models and autoregressive video generation, and introduces three techniques: KV-recache, streaming long tuning, and short window attention with a frame sink.
3. ❓ Problem: Generating high-quality long videos efficiently while supporting real-time interactive control through prompt switching.
4. 🛠️ Methods: Uses KV-recache to refresh cached states when the prompt switches, streaming long tuning to align training with long-video inference (train-long-test-long), and short window attention with a frame sink to speed up generation.
5. 📊 Results and Evaluation: Achieved 20.7 FPS on a single NVIDIA H100 GPU, supported up to 240-second video generation, and outperformed baselines on VBench benchmarks while requiring only 32 GPU-days for training.
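
To make the Methods item more concrete, here is a minimal toy sketch of two of the ideas: a short attention window that always keeps a few initial "sink" frames, and a cache that is refreshed when the prompt changes. It is an illustration under assumptions, not the authors' implementation. The names `visible_frames` and `ToyFrameCache`, the window and sink sizes, and the use of a plain prompt label in place of real key/value tensors are all hypothetical.

```python
def visible_frames(t, window, sink):
    """Return the frame indices that frame t may attend to.

    Assumption: the first `sink` frames stay visible forever, plus the
    `window` most recent frames (including frame t itself).
    """
    sink_part = set(range(min(sink, t + 1)))
    recent_part = set(range(max(t - window + 1, 0), t + 1))
    return sorted(sink_part | recent_part)


class ToyFrameCache:
    """Hypothetical stand-in for a per-frame KV cache.

    Each entry records only which prompt the cached state was computed
    under; a real model would store key/value tensors instead.
    """

    def __init__(self, window, sink, prompt):
        self.window = window
        self.sink = sink
        self.prompt = prompt
        self.entries = {}  # frame index -> prompt label

    def append(self, t):
        # Cache the new frame under the current prompt, then evict every
        # frame that falls outside the sink + short window.
        self.entries[t] = self.prompt
        keep = set(visible_frames(t, self.window, self.sink))
        self.entries = {i: p for i, p in self.entries.items() if i in keep}

    def switch_prompt(self, new_prompt):
        # Recache step (toy version): recompute every retained entry under
        # the new prompt so no stale prompt state remains in the cache.
        self.prompt = new_prompt
        self.entries = {i: new_prompt for i in self.entries}


cache = ToyFrameCache(window=4, sink=1, prompt="a cat walking")
for frame in range(11):
    cache.append(frame)
cache.switch_prompt("a cat jumping")
print(sorted(cache.entries))
```

In this sketch the cache never holds more than `window + sink` frames, which is the intuition behind the speed-up: the per-frame attention cost stays bounded no matter how long the video grows.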