2025-11-04 Papers


Paper 1

UniLumos: Fast and Unified Image and Video Relighting with Physics-Plausible Feedback

Published: 2025-11-03

Link: http://arxiv.org/pdf/2511.01678

1. 📘 Topic and Domain: The paper presents UniLumos, a unified framework for image and video relighting that produces physically plausible lighting effects with a flow-matching generative backbone.
2. 💡 Previous Research and New Ideas: Previous diffusion-based relighting models operate in a semantic latent space; this paper adds physics-plausible feedback by incorporating RGB-space geometry supervision into a flow-matching backbone.
3. ❓ Problem: The paper addresses the issue of unrealistic lighting effects in existing diffusion-based relighting methods, which often produce overexposed highlights, misaligned shadows, and incorrect occlusions due to lack of physical correctness.
4. 🛠️ Methods: The authors implement physics-plausible feedback using depth and normal maps extracted from the relit outputs by a frozen dense estimator, employ path consistency learning for efficient few-step inference, and develop a structured six-dimensional annotation protocol for illumination attributes.
5. 📊 Results and Evaluation: UniLumos achieved state-of-the-art relighting quality with improved physical consistency while delivering a 20x speedup for both image and video relighting, evaluated with standard metrics (PSNR, SSIM, LPIPS) and the new LumosBench controllability benchmark.
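The physics-guided loss from the paper's overview figure, L_phy = M ⊙ (‖D̂ − D‖₂/‖D‖₂ + ‖N̂ − N‖₂/‖N‖₂), can be sketched numerically. This is a minimal sketch, not the authors' implementation: the array shapes, the elementwise mask convention, and the epsilon guard are assumptions.

```python
import numpy as np

def physics_feedback_loss(d_hat, d_ref, n_hat, n_ref, mask):
    """Sketch of the physics feedback term: relative L2 error of depth
    and normal maps predicted from the relit output versus the reference,
    restricted by a foreground mask M. Geometry maps would come from a
    frozen dense estimator (e.g., Lotus) in the paper's setup."""
    eps = 1e-8  # guard against division by zero (an assumption)
    depth_term = (np.linalg.norm(mask * (d_hat - d_ref))
                  / (np.linalg.norm(d_ref) + eps))
    normal_term = (np.linalg.norm(mask * (n_hat - n_ref))
                   / (np.linalg.norm(n_ref) + eps))
    return depth_term + normal_term
```

Because the supervision lives in RGB-derived geometry maps rather than the latent space, the feedback can penalize misaligned shadows and occlusions directly, while inference itself stays geometry-free.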

Diagram summary (UniLumos workflow figure):

- LumosData construction pipeline: subject mask → Lumos augmentation → Gaussian background → caption augmentation, governed by a 6D lighting annotation protocol (direction, light source, intensity, color temperature, dynamics, optical phenomena).
- Architecture: input video V_real → frozen Wan-VAE encoder → flow-matching Wan2.1 backbone (N trainable DiT blocks) with umT5 text encoder → path consistency learning for few-step inference → frozen Wan-VAE decoder → relit output V_relit.
- Joint loss: L₀ (flow matching), L_fast (path consistency), L_phy (physics feedback). Training strategy: 80% of steps use L₀, 20% use L_fast, and 50% of the L₀ steps additionally apply L_phy.
- Physics-plausible feedback: a frozen dense estimator (e.g., Lotus) extracts geometry maps, depth D ∈ ℝ^[T,H,W] and normal N ∈ ℝ^[T,H,W]; the physics-guided loss is L_phy = M ⊙ (‖D̂ − D‖₂/‖D‖₂ + ‖N̂ − N‖₂/‖N‖₂), which aligns lighting with scene geometry and improves shadow alignment and spatial coherence. Key benefits: RGB-space supervision, geometry-free inference, physical plausibility.
- LumosBench evaluation: visual fidelity (PSNR, SSIM, LPIPS), temporal consistency (R-Motion), Lumos consistency (VLM-based attribute alignment), and dense L2 error (geometry alignment). Results: 20× speedup over baselines, state-of-the-art quality and consistency, enhanced controllability.
- Key contributions: a unified framework for image and video relighting; physics-plausible feedback with RGB-space geometry supervision; a structured 6D illumination annotation protocol; LumosBench for attribute-level controllability evaluation.
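The training strategy in the figure (80% of steps → L₀, 20% → L_fast, 50% of the L₀ steps also apply L_phy) can be sketched as a stochastic loss schedule. The percentages are from the figure; the sampling mechanics here are an assumption.

```python
import random

def pick_losses(rng):
    """Sketch of the stochastic loss schedule: each training step draws
    which objectives it optimizes. 80% of steps train flow matching
    (L0), the rest train path consistency (L_fast); half of the L0
    steps attach the physics feedback term (L_phy)."""
    losses = []
    if rng.random() < 0.8:
        losses.append("L0")
        if rng.random() < 0.5:
            losses.append("L_phy")
    else:
        losses.append("L_fast")
    return losses
```

Over many steps this yields roughly 80% L₀, 20% L_fast, and 40% L_phy participation, matching the figure's stated proportions.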
Q1
1. What is the main innovation of UniLumos compared to previous relighting methods?
It uses a completely new diffusion architecture
It incorporates RGB-space geometry feedback into flow-matching
It relies solely on latent space optimization
Q2
2. How many dimensions does UniLumos's illumination annotation protocol contain?
Four dimensions covering basic lighting attributes
Five dimensions including temporal dynamics
Six dimensions covering direction, source type, intensity, color temperature, dynamics and optical phenomena
Q3
3. What is the performance improvement in terms of speed that UniLumos achieves?
5x speedup compared to previous methods
20x speedup for both image and video relighting
50x speedup but only for image relighting

Paper 2

UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Published: 2025-11-03

Link: http://arxiv.org/pdf/2511.01295

1. 📘 Topic and Domain: The paper presents UniREditBench, a comprehensive benchmark for evaluating reasoning-based image editing models across both real-world and game-world scenarios.
2. 💡 Previous Research and New Ideas: Previous benchmarks focused mainly on single-object attribute transformations in realistic scenarios; this paper introduces new dimensions including multi-object interactions and game-world scenarios with human-defined rules, plus a dual-reference evaluation system.
3. ❓ Problem: The paper addresses the lack of comprehensive benchmarks for evaluating complex reasoning-based image editing tasks and the limitations of text-only reference evaluation methods.
4. 🛠️ Methods: The authors developed a multi-scenario data synthesis pipeline to create 2,700 curated samples across 8 primary dimensions and 18 sub-dimensions, implemented dual-reference evaluation using both textual and ground-truth image references, and created UniREdit-Data-100K dataset with chain-of-thought reasoning annotations.
5. 📊 Results and Evaluation: The fine-tuned UniREdit-Bagel model showed substantial improvements over both open-source and closed-source models in handling complex reasoning-based image editing tasks, demonstrating the effectiveness of their benchmark and dataset.
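The dual-reference idea (judging an edit against both a textual reference and a ground-truth edited image) can be sketched as a weighted combination of two judge scores. This is a hypothetical sketch: the paper uses VLM-based judging for both signals, and the 0–1 score range and equal default weighting here are assumptions.

```python
def dual_reference_score(text_score, image_score, w_text=0.5):
    """Sketch of dual-reference evaluation: `text_score` measures how
    well the edited image satisfies the textual reference (instruction
    adherence), `image_score` measures agreement with the ground-truth
    edited image. Both are assumed normalized to [0, 1]."""
    for s in (text_score, image_score):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores are expected in [0, 1]")
    return w_text * text_score + (1.0 - w_text) * image_score
```

The point of the second reference is that a text-only judge can be fooled by plausible-but-wrong edits; comparing against a known-correct target image catches those cases.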

Diagram summary (UniREditBench methodology figure):

- Real-world data synthesis: hand-crafted reference prompts (original image description, instruction, textual reference) → VLM scale-up with Gemini 2.5 Pro to generate diverse text prompts → generation of original/edited image pairs → VLM-based quality filtering and chain-of-thought (CoT) generation.
- Game-world data synthesis: game problem design (maze, Sokoban, Sudoku, etc.) → automatic image and instruction generation via Python programs → transformation of programmatic solutions into natural-language CoT → quality assurance validating logical and visual correctness.
- Unified data processing pipeline: instruction de-duplication; multi-dimensional filtering (text hallucination, instruction adherence, content preservation, visual quality, image hallucination, CoT quality); human inspection.
- Outputs: UniREditBench (2,700 samples, 8 dimensions, 18 sub-categories); UniREdit-Data-100K (100,421 training samples with high-quality CoT annotations); the fine-tuned UniREdit-Bagel model with enhanced performance, evaluated via dual-reference scoring on instruction following, visual consistency, and visual quality.
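The game-world synthesis step generates puzzle state and editing instruction together from a program, so the ground-truth edited image is known by construction. The toy sketch below only places a start and goal on an empty grid; everything beyond that idea (the real pipeline renders images, handles full game rules, and emits CoT annotations) is elided, and all names here are hypothetical.

```python
import random

def make_maze_sample(size=5, seed=0):
    """Toy sketch of programmatic game-world sample generation: a
    seeded program picks the puzzle configuration and derives the
    editing instruction from it, so instruction and expected result
    are consistent by construction."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    start, goal = rng.sample(cells, 2)  # two distinct cells
    instruction = (f"Move the player from {start} to {goal} "
                   f"and redraw the board after the move.")
    return {"grid": size, "start": start, "goal": goal,
            "instruction": instruction}
```

Because generation is deterministic given the seed, quality assurance can re-run the program to verify logical correctness of every sample.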
Q1
1. What key innovation does UniREditBench introduce in its evaluation methodology compared to previous benchmarks?
Using only textual references for evaluation
Using dual-reference evaluation with both text and ground-truth images
Using only numerical metrics for evaluation
Q2
2. How many total samples and dimensions does UniREditBench contain?
1,000 samples across 5 primary dimensions
2,700 samples across 8 primary dimensions and 18 sub-dimensions
5,000 samples across 10 primary dimensions
Q3
3. What unique aspect of game-world scenarios does UniREditBench evaluate that previous benchmarks didn't cover?
Only basic game graphics quality
Only game character animations
Logical and strategic reasoning governed by human-defined rules

Paper 3

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

Published: 2025-11-03

Link: http://arxiv.org/pdf/2511.01833

1. 📘 Topic and Domain: A comprehensive benchmark called TIR-Bench for evaluating agentic thinking-with-images reasoning capabilities in multimodal large language models.
2. 💡 Previous Research and New Ideas: Previous visual-search benchmarks test only basic operations such as localization and cropping; this paper proposes a more comprehensive benchmark testing complex tool-based image manipulation and reasoning.
3. ❓ Problem: Current benchmarks fail to fully evaluate advanced visual reasoning capabilities like intelligently creating and operating tools to transform images for problem-solving.
4. 🛠️ Methods: The authors created a 13-task benchmark requiring tool use for image processing and evaluated 22 multimodal large language models, including open-source and proprietary models both with and without tool-use capabilities.
5. 📊 Results and Evaluation: TIR-Bench proved challenging, with the best model reaching only 46%; models with tool-use capabilities significantly outperformed standard models, and agentic fine-tuning was shown to be more effective than direct fine-tuning.
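Benchmark scores of this kind are typically averaged per task and then across tasks. The sketch below shows one plausible aggregation (macro-averaging binary judge verdicts, e.g. from a GPT-4o judge); this aggregation rule is an assumption, not taken from the paper.

```python
def macro_accuracy(results):
    """Sketch of per-task score aggregation: `results` maps each task
    name to a list of judge verdicts (1 = correct, 0 = wrong). Verdicts
    are averaged within each task, then macro-averaged so every task
    counts equally regardless of its sample count."""
    per_task = {task: sum(v) / len(v) for task, v in results.items()}
    overall = sum(per_task.values()) / len(per_task)
    return per_task, overall
```

Macro-averaging matters for a 13-task suite with uneven sample counts: without it, large tasks would dominate the headline number.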

Diagram summary (TIR-Bench methodology flow chart):

- Task design: 13 diverse tasks requiring tool-based reasoning, multi-step manipulation, and dynamic visual processing.
- Data collection: 1,215 total examples from human annotation, synthetic generation, and web sourcing.
- Model evaluation: 22 MLLMs tested (open-source models, proprietary models, and tool-using agents) in a zero-shot setting, with a GPT-4o judge and accuracy/IoU analysis.
- Analysis: overall performance, function calling, a fine-tuning study, and qualitative analysis.
- 13 task categories: color VQA, low-light VQA, instrument reading, jigsaw puzzle, math VQA, maze, rotated OCR, proportion VQA, rotation game, spot-the-difference, symbolic reasoning, visual search, word search.
- Key findings: best performance is 46% (o3-TU); non-agentic models stay below 29%; tool-use capability is essential for success; agentic fine-tuning outperforms direct SFT; function calling improves with guidance; recent models handle iterative tool calling better; TIR-Bench proves universally challenging, underscoring that thinking-with-images capability is crucial.
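The "thinking-with-images" behavior the benchmark targets can be sketched as an agent loop: the model may transform the image with tools (rotate, crop, zoom, ...) before committing to an answer. All interfaces below are hypothetical; this is a sketch of the control flow, not the paper's harness.

```python
def run_agent(question, image, tools, answer_fn, max_steps=4):
    """Toy agent loop: `answer_fn(question, image)` returns either
    ("answer", value) to stop, or (tool_name, argument) to transform
    the image and look again. `tools` maps tool names to functions of
    the form f(image, argument) -> new image."""
    for _ in range(max_steps):
        action, arg = answer_fn(question, image)
        if action == "answer":
            return arg
        image = tools[action](image, arg)   # apply the chosen tool
    return answer_fn(question, image)[1]    # forced answer at budget
```

Tasks like rotated OCR illustrate why the loop helps: a model that can first rotate the image sees upright text, while a non-agentic model must read it as-is.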
Q1
1. What was the main limitation of previous visual reasoning benchmarks that TIR-Bench aimed to address?
They only tested basic operations like localization and cropping
They were too computationally expensive to run
They only worked with black and white images
Q2
2. When comparing agentic fine-tuning versus direct fine-tuning on the rotated OCR task, what was discovered?
Direct fine-tuning performed better with more data
Agentic fine-tuning showed significantly better performance that scaled with data size
Both methods performed equally well
Q3
3. What was the highest accuracy achieved by any model on the TIR-Bench benchmark?
28.9% by Gemini-2.5-Pro
46% by o3-TU
67% by GPT-4