1. 📘 Topic and Domain: Video stylization using a unified framework that supports multiple style conditions (text, style image, and first frame) for video-to-video transformation.
2. 💡 Previous Research and New Ideas: Building on prior single-condition video stylization methods, this paper introduces a unified framework that supports multiple style conditions within one model, proposing a novel token-specific LoRA architecture together with a systematic data curation pipeline.
3. ❓ Problem: Existing video stylization methods are limited to single style conditions, suffer from style inconsistency, and lack high-quality datasets for training.
4. 🛠️ Methods: Adopts a two-stage training scheme using CT and SFT datasets, attaches token-specific LoRA modules to a pretrained image-to-video (I2V) model, and builds training data through a curation pipeline that combines image stylization with ControlNet-guided I2V generation.
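The core architectural idea, token-specific LoRA, can be sketched as a frozen linear layer augmented with a separate low-rank adapter per token type (e.g., style-condition tokens vs. video tokens). This is a minimal illustrative sketch, not the paper's implementation: the routing scheme, ranks, and class/argument names (`TokenSpecificLoRALinear`, `token_type`) are assumptions.

```python
import torch
import torch.nn as nn

class TokenSpecificLoRALinear(nn.Module):
    """Frozen linear layer with one low-rank (LoRA) adapter per token type.

    Hypothetical sketch: the paper's actual routing and ranks may differ.
    """
    def __init__(self, in_dim, out_dim, num_token_types=2, rank=4, alpha=4.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)  # pretrained weight stays frozen
        self.scale = alpha / rank
        # One (A, B) low-rank pair per token type; B starts at zero so the
        # adapter is a no-op at initialization (standard LoRA practice).
        self.lora_A = nn.Parameter(torch.randn(num_token_types, in_dim, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_token_types, rank, out_dim))

    def forward(self, x, token_type):
        # x: (batch, seq, in_dim); token_type: (batch, seq) integer ids that
        # select which adapter each token is routed through.
        out = self.base(x)
        A = self.lora_A[token_type]  # (batch, seq, in_dim, rank)
        B = self.lora_B[token_type]  # (batch, seq, rank, out_dim)
        delta = torch.einsum("bsi,bsir->bsr", x, A)
        delta = torch.einsum("bsr,bsro->bso", delta, B)
        return out + self.scale * delta
```

The per-token routing is what lets a single backbone handle text, style-image, and first-frame conditions without the adapters interfering with one another.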
5. 📊 Results and Evaluation: Outperforms competing methods on all three stylization tasks (text-guided, style-image-guided, and first-frame-guided) in style consistency and video quality, as shown by both quantitative metrics and user studies.