1. 📘 Topic and Domain: Video world models with memory mechanisms for long-term consistent video generation, in the domain of computer vision and generative AI.
2. 💡 Previous Research and New Ideas: Builds on diffusion-based video generation models and proposes a novel three-part memory system (spatial, working, and episodic memory) inspired by human memory mechanisms.
3. ❓ Problem: Addresses the limited temporal context window and the resulting forgetting problem in existing video world models, which cause inconsistency when previously generated scenes are revisited.
4. 🛠️ Methods: Implements a geometry-grounded point cloud for spatial memory, recent context frames for working memory, and sparse historical keyframes for episodic memory, all integrated into a diffusion transformer architecture.
5. 📊 Results and Evaluation: Achieves significantly improved view-recall consistency (PSNR 19.10 vs. ~12.0 for baselines) and higher user-study ratings for camera accuracy, static consistency, and dynamic plausibility.
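The three-part memory design in item 4 can be sketched as a simple data structure: a growing point cloud for spatial memory, a bounded buffer of recent frames for working memory, and sparsely sampled keyframes for episodic memory. All names, interfaces, and the keyframe-sampling rule below are illustrative assumptions, not the paper's actual implementation.

```python
from collections import deque

import numpy as np


class WorldModelMemory:
    """Hypothetical sketch of a three-part memory (spatial / working / episodic)."""

    def __init__(self, working_size=8, keyframe_stride=16):
        self.spatial = []  # spatial memory: accumulated geometry-grounded 3D points
        self.working = deque(maxlen=working_size)  # working memory: recent context frames
        self.episodic = []  # episodic memory: sparse historical keyframes
        self.keyframe_stride = keyframe_stride  # assumed fixed-stride keyframe selection
        self._t = 0

    def observe(self, frame, points):
        """Ingest one generated frame plus its back-projected 3D points."""
        self.spatial.append(points)  # grow the persistent point cloud
        self.working.append(frame)  # deque drops the oldest frame automatically
        if self._t % self.keyframe_stride == 0:
            self.episodic.append((self._t, frame))  # keep a sparse keyframe with its timestep
        self._t += 1

    def condition(self):
        """Assemble the three memory streams as conditioning inputs for the generator."""
        cloud = np.concatenate(self.spatial) if self.spatial else np.empty((0, 3))
        return cloud, list(self.working), self.episodic


# Usage: simulate 20 generation steps, each emitting a frame and 5 scene points.
mem = WorldModelMemory(working_size=4, keyframe_stride=8)
for t in range(20):
    mem.observe(frame=f"frame_{t}", points=np.random.rand(5, 3))
cloud, working, episodic = mem.condition()
```

In this sketch the working memory stays small and dense while the episodic memory stays sparse and unbounded, mirroring the trade-off the summary describes: recent frames give fine-grained continuity, keyframes give cheap long-horizon recall, and the point cloud anchors both to scene geometry.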