1. 📘 Topic and Domain: The paper presents Seaweed-7B, a cost-effective video generation foundation model with 7 billion parameters, focusing on resource-efficient training strategies for AI video generation.
2. 💡 Previous Research and New Ideas: The paper builds on prior video generation models like Sora and MovieGen, proposing that medium-sized models can match or exceed larger models through optimized architecture, training strategies, and data curation.
3. ❓ Problem: The paper addresses the excessive computational costs of training and deploying video generation models, which typically require thousands of GPUs and substantial resources.
4. 🛠️ Methods: The authors trained a 7B-parameter diffusion transformer with a hybrid-stream architecture, using multi-stage training on mixed-resolution data, a purpose-built variational autoencoder design, and model-level optimizations to maximize training and inference efficiency.
5. 📊 Results and Evaluation: Seaweed-7B achieved performance comparable to or better than larger models trained with substantially more resources, ranking second in image-to-video generation in Elo ratings while requiring only 665,000 H100 GPU hours (27.7 days on 1,000 GPUs).
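
The cost figure in point 5 can be sanity-checked with simple arithmetic; the snippet below is only an illustrative back-of-the-envelope conversion (variable names are not from the paper):

```python
# Convert the reported training budget (665,000 H100 GPU-hours
# spread across 1,000 GPUs) into wall-clock days.
gpu_hours_total = 665_000
num_gpus = 1_000

wall_clock_hours = gpu_hours_total / num_gpus  # 665.0 hours of wall-clock time
wall_clock_days = wall_clock_hours / 24        # convert hours to days

print(f"{wall_clock_days:.1f} days")  # → 27.7 days, matching the paper's figure
```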