2025-08-06 Papers


Paper 1

LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation

Published: 2025-08-05

Link: http://arxiv.org/pdf/2508.03694

1. 📘 Topic and Domain: Ultra-long controllable video generation using multimodal guidance in computer vision and deep learning.
2. 💡 Previous Research and New Ideas: Builds on existing short-video generation models such as CogVideoX and ControlNet, and proposes new techniques for long-form generation, including unified noise initialization and global normalization of control signals.
3. ❓ Problem: Current video generation models struggle with temporal inconsistency and visual degradation when generating longer videos (up to one minute).
4. 🛠️ Methods: Developed LongVie framework using multimodal control (dense depth maps and sparse keypoints), global normalization, unified noise initialization, and degradation-aware training to generate long videos autoregressively.
5. 📊 Results and Evaluation: Achieved state-of-the-art performance on their new LongVGenBench dataset of 100 high-resolution videos, demonstrating superior long-range controllability, consistency, and visual quality compared to baselines.
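The unified noise initialization and autoregressive clip-by-clip scheme described in point 4 can be sketched as follows. This is a minimal illustration, not the paper's implementation: `denoise` is a placeholder for the underlying diffusion model, and LongVie's actual conditioning is more elaborate than carrying the last frame forward.

```python
import numpy as np

def generate_long_video(n_clips, clip_shape, denoise, seed=0):
    # One fixed noise tensor is reused for every clip (unified noise
    # initialization) instead of sampling fresh noise per clip, which the
    # paper reports improves temporal consistency across clips.
    base_noise = np.random.default_rng(seed).standard_normal(clip_shape)
    clips, prev_tail = [], None
    for _ in range(n_clips):
        clip = denoise(base_noise, cond=prev_tail)  # condition on previous clip's tail
        prev_tail = clip[-1:]                       # carry the last frame forward
        clips.append(clip)
    return np.concatenate(clips, axis=0)
```

With a conditioning-agnostic `denoise`, every clip starts from the identical noise tensor, which is the property the shared initialization is meant to provide.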

Figure: LongVie pipeline overview. Inputs (image, text prompt, control signals) are processed with global normalization (5th-95th percentile); a Multi-Modal Control DiT fuses dense control (depth maps) and sparse control (point maps) through 18 copied DiT blocks with zero-linear fusion. Degradation-aware training operates at the feature and data levels for modal balance, and unified noise initialization drives autoregressive generation of clips 1 through N (up to one minute). Applications: long-range video editing, motion and scene transfer, and mesh-to-video 3D animation. Evaluation: LongVGenBench (100 high-quality videos, 1+ minute each) with VBench metrics for consistency, quality, and temporal coherence.
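The global normalization step (5th-95th percentile) can be illustrated with a minimal sketch: percentile statistics are computed once over the whole video rather than per clip, so control values stay on one shared scale. The exact statistics and clipping behavior in the paper may differ.

```python
import numpy as np

def global_normalize(depth_clips, lo_pct=5.0, hi_pct=95.0):
    # Compute percentile statistics over the WHOLE video, then normalize
    # every clip with the same range, so control values stay comparable
    # across clips instead of drifting with per-clip statistics.
    all_vals = np.concatenate([c.ravel() for c in depth_clips])
    lo, hi = np.percentile(all_vals, [lo_pct, hi_pct])
    scale = max(hi - lo, 1e-6)  # guard against constant inputs
    return [np.clip((c - lo) / scale, 0.0, 1.0) for c in depth_clips]
```

The payoff is that the same raw depth value maps to the same normalized value in every clip, which per-clip min-max normalization would not guarantee.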
Q1. What is the main innovation in LongVie's approach to handling noise initialization compared to previous methods?
- It eliminates noise completely from the generation process
- It uses a unified noise initialization across all video segments
- It applies random noise independently to each frame

Q2. Why does LongVie use both dense (depth maps) and sparse (keypoints) control signals?
- To reduce computational costs during training
- To make the model more complex and sophisticated
- To balance detailed structure guidance with high-level semantic control

Q3. What is the approximate time required by LongVie to generate a one-minute video at 480×720 resolution?
- 5 minutes
- 45 minutes
- 2 hours

Paper 2

Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation

Published: 2025-08-05

Link: http://arxiv.org/pdf/2508.03320

1. 📘 Topic and Domain: The paper introduces Skywork UniPic, a unified autoregressive model for visual AI tasks including image understanding, text-to-image generation, and image editing.
2. 💡 Previous Research and New Ideas: Whereas prior work fragmented these tasks across separate models, the paper proposes a unified architecture with a decoupled visual-encoding strategy, using MAR for generation and SigLIP2 for understanding.
3. ❓ Problem: The paper addresses the challenge of creating a single, parameter-efficient architecture that can excel at multiple visual AI tasks while remaining deployable on commodity hardware.
4. 🛠️ Methods: The method employs a 1.5B-parameter model with four core components: MAR encoder-decoder, SigLIP2 encoder, shared language model backbone, and MLP projection layers, trained through a progressive four-stage curriculum.
5. 📊 Results and Evaluation: The model achieves state-of-the-art performance across multiple benchmarks: 0.86 on GenEval, 85.5 on DPG-Bench, 5.83 on GEditBench-EN, and 3.49 on ImgEdit-Bench, while requiring only 15GB GPU memory for 1024×1024 image generation.

Figure: Skywork UniPic workflow. A MAR encoder (1B parameters, generation) and a SigLIP2 encoder (400M parameters, understanding) feed a shared Qwen2.5-1.5B LLM backbone via MLP projection layers, with a MAR decoder (1B parameters) producing images. Four-stage progressive training: (1) MAR pretraining, 800 epochs on 130M samples at 512×512; (2) MAR-LLM alignment, 3 epochs with a frozen LLM at 1024×1024; (3) joint continued training, 3 epochs with the LLM unfrozen and loss weights λGen=1, λUnd=0.01; (4) supervised fine-tuning, 2 epochs on 3M reward-filtered samples. Data quality relies on Skywork-ImgReward (GRPO-trained, visual quality) and Skywork-EditReward (SFT-trained, edit accuracy). One 1.5B-parameter framework covers generation, understanding, and editing with no task-specific adapters: 0.86 on GenEval, 85.5 on DPG-Bench, 5.83 on GEdit-Bench, 3.49 on ImgEdit-Bench, and 1024×1024 generation in under 15 GB of GPU memory (RTX 4090). Key innovation: the decoupled encoding strategy (separate encoders for generation and understanding feeding one shared autoregressive LLM) resolves the semantic-fidelity tension while enabling cross-task knowledge transfer.
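The decoupled-encoding idea can be sketched as a toy router: a fidelity-oriented path for generation and a semantics-oriented path for understanding, each with its own projection into the shared LLM token space. The encoder and projection callables below are placeholders, not the real model components.

```python
class UniPicRouter:
    """Toy sketch of decoupled visual encoding: route images through a
    generation encoder (MAR) or an understanding encoder (SigLIP2), then
    project each into the shared LLM's token space."""

    def __init__(self, mar_encode, siglip_encode, proj_gen, proj_und):
        self.mar_encode = mar_encode        # fidelity-oriented encoder
        self.siglip_encode = siglip_encode  # semantics-oriented encoder
        self.proj_gen = proj_gen            # MLP projection, generation path
        self.proj_und = proj_und            # MLP projection, understanding path

    def encode(self, image, task):
        if task == "generate":
            return self.proj_gen(self.mar_encode(image))
        if task == "understand":
            return self.proj_und(self.siglip_encode(image))
        raise ValueError(f"unknown task: {task!r}")
```

The design point is that the two paths never share an encoder, only the downstream LLM, which is how the paper avoids forcing one representation to serve both pixel fidelity and semantics.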
Q1. What is the key innovation in Skywork UniPic's architecture that differentiates it from previous unified models?
- Using a single shared encoder for all tasks
- Decoupled encoding strategy with MAR for generation and SigLIP2 for understanding
- Multiple separate models connected through adapters

Q2. How much GPU memory does Skywork UniPic require to generate 1024×1024 images?
- Over 30 GB
- Under 15 GB
- Exactly 24 GB

Q3. Which capability emerges last during Skywork UniPic's progressive training stages?
- Text-to-image generation
- Image understanding
- Image editing

Paper 3

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Published: 2025-08-05

Link: http://arxiv.org/pdf/2508.03686

1. 📘 Topic and Domain: The paper presents CompassVerifier, a unified verification model for evaluating large language model outputs and providing reward signals for reinforcement learning, in the domain of natural language processing and model evaluation.
2. 💡 Previous Research and New Ideas: Building on prior rule-based matching and LLM-based verification methods, the paper proposes a lightweight verifier model and a comprehensive benchmark for systematically evaluating verification capabilities.
3. ❓ Problem: The paper addresses the lack of comprehensive benchmarks for evaluating verification capabilities across different LLMs and the limitations of existing verification approaches in handling complex edge cases and generalizing across domains.
4. 🛠️ Methods: The authors developed VerifierBench through multi-stage data collection and filtering, and created CompassVerifier using three key techniques: Complex Formula Augmentation, Error-Driven Adversarial Augmentation, and Generalizability Augmentation.
5. 📊 Results and Evaluation: CompassVerifier achieved state-of-the-art performance across diverse domains and tasks, with the 32B model reaching 90.8% accuracy and 87.7% F1-score, significantly outperforming larger general LLMs and baseline verifier models.

Figure: CompassVerifier overview. VerifierBench construction: 1M+ responses collected from 50+ models, a multi-stage verification pipeline, and human annotation with error analysis (30+ meta error patterns identified). CompassVerifier training: the VerifierBench base data is expanded with Complex Formula Augmentation, Error-Driven Adversarial Augmentation, and Generalizability Augmentation, then used to fine-tune Qwen2.5 models at 3B, 7B, and 32B parameters. Capabilities: multiple answer types (formulas, sequences, multiple choice), robust handling of edge cases and invalid responses, a lightweight design whose 3B model outperforms larger general LLMs, and cross-domain coverage of math, knowledge, science, and reasoning for both LLM evaluation and RL reward modeling. Results: 90.8% accuracy on VerifierBench, F1 rising from 80.4% (3B) to 87.7% (32B), +18.5 points on AIME 2024 as a reward model, robustness across prompts, and +3.6% F1 from the augmentations.
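The three-way verdict scheme (correct / incorrect / invalid) can be illustrated with a toy string-matching verifier. This is only a sketch of the labeling idea: the real CompassVerifier is a fine-tuned LLM precisely because rules like these break down on complex formulas and edge cases.

```python
import re

def verify(response: str, gold: str) -> str:
    # Three-way verdict echoing VerifierBench's label scheme, which keeps
    # a separate "invalid" category rather than folding degenerate
    # outputs into "incorrect".
    text = response.strip()
    if not text:
        return "invalid"                      # empty or truncated output
    words = text.split()
    if len(words) >= 20 and len(set(words)) <= 3:
        return "invalid"                      # repetitive degeneration
    # Crude normalization before exact match; real verification must
    # handle equivalent formulas, units, and phrasings, which is the gap
    # the learned verifier is built to close.
    norm = lambda s: re.sub(r"[\s$]+", "", s).lower()
    return "correct" if norm(text) == norm(gold) else "incorrect"
```

The separate "invalid" label matters for RL reward modeling: a truncated or repetitive response should not be rewarded or penalized the same way as a genuine wrong answer.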
Q1. What is the main innovation of CompassVerifier compared to previous verification approaches?
- It uses rule-based pattern matching exclusively
- It combines three augmentation techniques for robust verification across domains
- It relies solely on large language models for verification

Q2. How does VerifierBench handle invalid responses in its evaluation framework?
- It simply ignores them and only focuses on correct/incorrect classifications
- It treats them as incorrect responses
- It creates a separate category for invalid responses like truncated outputs or repetitive content

Q3. What was the performance improvement when CompassVerifier-7B was compared to similarly-sized Qwen2.5-7B-Instruct?
- An absolute F1-score improvement of 41.3%
- An absolute F1-score improvement of 25.5%
- An absolute F1-score improvement of 15.8%