2026-03-30 Papers


Paper 1

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

Published: 2026-03-26

Link: http://arxiv.org/pdf/2603.25502

1. 📘 Topic and Domain: The paper focuses on real-world image restoration using large-scale image editing models to handle diverse degradations.
2. 💡 Previous Research and New Ideas: Building on prior all-in-one restoration methods and large-scale image editing models, the paper introduces a two-stage training strategy that combines synthetic and real-world degradation data for better generalization.
3. ❓ Problem: Existing restoration models struggle with poor generalization due to limited training data and simplified synthetic degradations that do not reflect real-world complexity.
4. 🛠️ Methods: The paper constructs a large-scale dataset covering nine degradation types, fine-tunes an open-source image editing model (Step1X-Edit) with a progressive mixed training strategy, and establishes RealIR-Bench for non-reference evaluation.
5. 📊 Results and Evaluation: RealRestorer ranks first among open-source methods and achieves performance comparable to leading closed-source systems, excelling in deblurring and low-light enhancement while demonstrating strong zero-shot generalization.
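The progressively-mixed strategy can be sketched as a simple batch sampler. The 2:8 synthetic-to-real ratio and batch size of 32 come from the paper's SFT stage; the function name, placeholder sample IDs, and toy pool sizes below are illustrative assumptions, not the authors' code:

```python
import random

def mixed_batch(synthetic_pool, real_pool, batch_size=32, real_fraction=0.8, rng=None):
    """Draw one SFT batch with the paper's synthetic:real = 2:8 mix.

    `synthetic_pool` / `real_pool` are placeholder lists of sample IDs;
    a real pipeline would yield (degraded, clean) image pairs instead.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    n_real = round(batch_size * real_fraction)
    n_syn = batch_size - n_real
    batch = rng.sample(real_pool, n_real) + rng.sample(synthetic_pool, n_syn)
    rng.shuffle(batch)
    return batch

# Toy pools standing in for the ~1.5M synthetic and ~100K real samples.
syn = [f"syn_{i}" for i in range(100)]
real = [f"real_{i}" for i in range(100)]
b = mixed_batch(syn, real, batch_size=32)
print(sum(x.startswith("real_") for x in b))  # 26 real samples out of 32
```

In a real trainer the "progressive" part would ramp `real_fraction` up over the course of Stage 2 rather than fixing it per batch.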

RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models

RealRestorer: Workflow Overview

Data Construction:
• Synthetic data: ~1.5M samples, produced by a degradation-synthesis pipeline over clean images (using SAM-2, MiDaS, and VLMs)
• Real-world data: ~100K samples collected from the web, filtered with CLIP and assessed by VLMs
• 9 degradation types: blur, rain, noise, low-light, haze, moiré, reflection, flare, and compression

Model Architecture (Step1X-Edit DiT backbone + QwenVL text encoder):
• Flux VAE image encoder maps images into latent space (frozen)
• Dual-stream DiT processes semantic and noise streams (fine-tuned)
• QwenVL text encoder provides semantic conditioning (frozen)

Two-Stage Training:
• Stage 1 (transfer training): synthetic data only, 1024×1024 resolution, constant LR of 1e-5, batch size 16, ~500 steps; transfers editing knowledge to restoration
• Stage 2 (progressively-mixed SFT): synthetic:real = 2:8, cosine-annealed LR, batch size 32, ~1.5K steps; adapts the model to real-world degradations

RealIR-Bench (evaluation benchmark):
• 464 real-world degraded images covering the 9 degradation types; no reference images needed (VLM-based evaluation)
• Metrics: RS (Restoration Score, VLM-based quality), LPIPS (perceptual similarity), and the Final Score FS = 0.2 × (1 - LPIPS) × RS

Key Contributions:
1. RealRestorer, an open-source state-of-the-art restoration model (#1 among open-source methods, highly competitive with closed-source systems)
2. A comprehensive degradation-synthesis data-generation pipeline
3. RealIR-Bench, a non-reference benchmark with VLM-based evaluation
4. A two-stage training recipe (transfer training plus progressively-mixed SFT)
5. Strong zero-shot generalization to unseen tasks

Training Details:
• 8 NVIDIA H800 GPUs, ~1 day total training time
• Resolution 1024×1024, 28-step denoising process
• First 1/4 of the SingleStreamBlocks frozen in the SFT stage

Evaluation Scope:
• FoundIR dataset (750 paired images)
• Flare7K++ (100 flare images)
• UHDM (500 moiré images)
• SIR²+ (201 reflection images)
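As a quick worked example, the Final Score formula FS = 0.2 × (1 - LPIPS) × RS can be computed directly. The RS and LPIPS values below are made up for illustration, and a 0-100 RS scale is assumed since the scale is not stated here:

```python
def final_score(rs, lpips):
    """RealIR-Bench Final Score: FS = 0.2 * (1 - LPIPS) * RS.

    Lower LPIPS (better content consistency) and higher RS
    (VLM-judged restoration quality) both raise FS.
    """
    if not 0.0 <= lpips <= 1.0:
        raise ValueError("LPIPS is expected in [0, 1]")
    return 0.2 * (1.0 - lpips) * rs

# Hypothetical scores: at equal RS, the method with lower LPIPS wins.
print(round(final_score(rs=90.0, lpips=0.20), 2))  # 14.4
print(round(final_score(rs=90.0, lpips=0.40), 2))  # 10.8
```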
Q1. What is the key innovation in RealRestorer's two-stage training strategy?
A) Using only synthetic degradation data in both stages
B) Combining synthetic data transfer learning with real-world data fine-tuning through progressively-mixed training
C) Freezing all layers and training only the text encoder

Q2. How many degradation types does the RealIR-Bench benchmark cover?
A) Five common types
B) Seven degradation types
C) Nine degradation types including blur, rain, noise, low-light, moiré patterns, haze, compression, reflection, and flare

Q3. Which metric does RealRestorer use to measure content consistency preservation?
A) PSNR (Peak Signal-to-Noise Ratio)
B) SSIM (Structural Similarity Index)
C) LPIPS (Learned Perceptual Image Patch Similarity)

Paper 2

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Published: 2026-03-25

Link: http://arxiv.org/pdf/2603.24800

1. 📘 Topic and Domain: The paper focuses on improving Diffusion Transformers (DiTs) for text-to-image generation through parameter-efficient calibration techniques.
2. 💡 Previous Research and New Ideas: The paper builds on prior work revealing uneven contributions of DiT blocks (Stable Flow, FreeFlux) and introduces the novel hypothesis that optimal block weighting via learned scaling parameters can significantly enhance model performance.
3. ❓ Problem: The paper addresses the suboptimal block weighting of standard DiT architectures: certain blocks can introduce detrimental artifacts, and overall generation quality can be improved through post-hoc calibration.
4. 🛠️ Methods: The proposed Calibri method frames DiT calibration as a black-box reward optimization problem solved using the gradient-free CMA-ES evolutionary algorithm, optimizing only ~10² parameters through block, layer, or gate scaling.
5. 📊 Results and Evaluation: Experimental results demonstrate consistent performance improvements across FLUX, SD-3.5M, and Qwen-Image models, with up to 18% HPSv3 improvement while reducing inference steps from 30-100 to just 15 steps.

Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration

Calibri: Method Workflow

Pipeline: a text prompt is fed to a Diffusion Transformer (DiT / MM-DiT), and Calibri's learned scaling parameters enhance the generated image. Key insight: even a single learned scaling parameter can improve performance.

CMA-ES Calibration Loop (iterated until convergence):
• Step I: sample candidate scaling parameters from a Gaussian N(μ, σ²C)
• Step II: generate sample image batches with each candidate
• Step III: evaluate candidates with a reward model (HPSv3, ImageReward, or Q-Align)
• Step IV: update μ and C toward the better candidates

Calibration Granularity Levels:
• Block scaling: one shared coefficient for a block's attention + MLP layers (~57-76 parameters)
• Layer scaling: individual coefficients for each layer in a block (more flexibility)
• Gate scaling: separate visual/textual gates for MM-DiT (114-482 parameters)
• Output-level ω: a final output calibration weight for ensembling, combined with the layer coefficients

Calibri Ensemble: N calibrated models are combined as F_{cᵢ}(x, t, p) = Σᵢ ωᵢ · f^{sᵢ}(x, t, p | ∅).

Results and Benefits:
• Only ~10² parameters modified, with consistent quality improvements
• Inference steps reduced (from 30 to 15)
• Works with multiple baselines: FLUX.1-dev, Stable Diffusion 3.5 Medium, and Qwen-Image; can also be combined with Flow-GRPO
• Evaluation metrics: HPSv3 (human preference), ImageReward (IR), Q-Align (quality), and human evaluation

In short: parameter-efficient Diffusion Transformer calibration via black-box optimization.
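The four-step loop can be sketched as a tiny gradient-free optimizer. This is a simplified (μ, λ) evolution strategy without the covariance adaptation of full CMA-ES, the quadratic toy reward stands in for the "generate images, then score with a reward model" steps, and the 8-dimensional parameter vector is a toy size (the paper's block scaling uses ~57-76 coefficients), so everything here is illustrative:

```python
import random

def toy_reward(scales, target):
    """Stand-in for the real reward model (HPSv3 / ImageReward / Q-Align):
    higher when the per-block scaling coefficients are close to `target`."""
    return -sum((s - t) ** 2 for s, t in zip(scales, target))

def evolve_scales(n_blocks=8, pop=16, elite=4, sigma=0.1, iters=60, seed=0):
    """Simplified (mu, lambda) evolution strategy over per-block scales."""
    rng = random.Random(seed)
    target = [rng.uniform(0.8, 1.2) for _ in range(n_blocks)]  # unknown optimum
    mu = [1.0] * n_blocks  # start from the uncalibrated model (all scales = 1)
    for _ in range(iters):
        # Step I: sample candidate parameter vectors around the current mean.
        cands = [[m + rng.gauss(0.0, sigma) for m in mu] for _ in range(pop)]
        # Steps II-III: score candidates (a real run would render images
        # with each candidate and query the reward model).
        cands.sort(key=lambda c: toy_reward(c, target), reverse=True)
        # Step IV: move the mean toward the elite candidates; anneal sigma.
        mu = [sum(col) / elite for col in zip(*cands[:elite])]
        sigma *= 0.97
    return mu, target

mu, target = evolve_scales()
print(toy_reward(mu, target) > toy_reward([1.0] * 8, target))
```

Because only the ~10² scaling coefficients are searched and no gradients are needed, the same loop works for any black-box reward, which is the core design choice behind Calibri.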
Q1. What did the paper discover about the contributions of DiT blocks to generation quality?
A) All DiT blocks contribute equally to generation
B) Some DiT blocks can introduce detrimental artifacts that reduce quality
C) DiT blocks only contribute to inference speed, not quality

Q2. What optimization algorithm does Calibri use to find optimal scaling coefficients?
A) CMA-ES (Covariance Matrix Adaptation Evolution Strategy)
B) AdamW (Adaptive Moment Estimation)
C) PPO (Proximal Policy Optimization)

Q3. How many inference steps does Calibri reduce the generation process to across different models?
A) 5 steps
B) 15 steps
C) 50 steps

Paper 3

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Published: 2026-03-25

Link: http://arxiv.org/pdf/2603.24472

1. 📘 Topic and Domain: The paper investigates self-distillation in large language models (LLMs) for mathematical reasoning tasks, focusing on how post-training methods affect reasoning capability.
2. 💡 Previous Research and New Ideas: Based on prior work showing self-distillation improves performance in domains like agentic environments and scientific reasoning, the paper introduces a new hypothesis that performance degradation in math reasoning stems from suppression of epistemic verbalization—the model's expression of uncertainty during reasoning.
3. ❓ Problem: The paper addresses why self-distillation, while effective in some domains, can degrade reasoning performance in mathematical tasks despite guiding models toward correct answers.
4. 🛠️ Methods: The authors use controlled experiments varying information richness in teacher conditioning and task coverage, analyzing how different conditioning contexts (unguided vs. solution-guided generation) affect epistemic token usage and out-of-distribution performance across multiple models (Qwen3-8B, DeepSeek-Distill-Qwen-7B, and Olmo3-7B-Instruct).
5. 📊 Results and Evaluation: Self-distillation with rich conditioning contexts reduces epistemic verbalization and response length, enabling rapid in-domain optimization with limited task coverage but causing up to 40% performance degradation on OOD benchmarks (AIME24, AMC23); GRPO maintains or improves performance while SDPO degrades it, and performance drops correlate with reduced uncertainty expression.

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs?

Research Methodology Flowchart: Self-Distillation and LLM Reasoning Capability

🔬 Observation: self-distillation improves chemistry performance but degrades math performance.
❓ Key question: why does performance degrade despite training toward correct answers?
💡 Hypothesis: suppression of epistemic verbalization (the expression of uncertainty during reasoning).

Two Key Factors Identified:
• 📊 Factor 1, information richness: richer teacher context yields more confident and concise reasoning with less uncertainty; measured by the conditional mutual information I(y; c | x).
• 📚 Factor 2, task coverage: small coverage gives rapid in-domain gains, while large coverage degrades OOD performance; diverse tasks need uncertainty expression.

🧪 Experimental Setup:
• Models: DeepSeek-R1-Distill-Qwen-7B, Qwen3-8B, Olmo3-7B-Instruct
• Methods: GRPO (baseline), SDPO (self-distillation), off-policy and on-policy variants
• Datasets: DAPO-Math-17k; AIME24, AIME25, AMC23; MATH500
• Metrics: response length, accuracy (Acc@16, Pass@16), epistemic token count

🔍 Key Findings:
1. Rich context suppresses uncertainty: a teacher conditioned on the full solution (c = s) produces confident, concise reasoning with minimal epistemic tokens.
2. Training on concise responses causes degradation: SFT on solution-guided responses (D_sg) causes up to a 40% performance drop despite correct answers in the training data.
3. Task coverage matters: with small |D|, SDPO works well; with large |D|, GRPO outperforms SDPO. Epistemic expression is crucial for OOD performance.

✅ Conclusion: exposing appropriate levels of uncertainty is crucial for robust reasoning; post-training should optimize reasoning behavior beyond merely reinforcing correct answers.

Epistemic tokens (from Kim et al., 2026): wait, hmm, perhaps, maybe, actually, alternatively, seems, might, likely, check.
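The epistemic-token metric can be approximated with a simple counter over that marker list. The regex-based word matching and the sample responses below are assumptions for illustration, not the paper's exact counting procedure:

```python
import re

# Marker list as given in the paper's figure (after Kim et al., 2026).
EPISTEMIC_TOKENS = {
    "wait", "hmm", "perhaps", "maybe", "actually",
    "alternatively", "seems", "might", "likely", "check",
}

def epistemic_count(response: str) -> int:
    """Count epistemic (uncertainty-expressing) tokens in a model response."""
    words = re.findall(r"[a-z]+", response.lower())
    return sum(w in EPISTEMIC_TOKENS for w in words)

# A hedged response versus a confident one, mirroring the paper's contrast
# between unguided and solution-guided generation.
hedged = "Hmm, wait - maybe the answer is 12? Let me check; it seems likely."
confident = "The answer is 12."
print(epistemic_count(hedged), epistemic_count(confident))  # 6 0
```

Tracking this count alongside accuracy is what lets the paper correlate performance drops with reduced uncertainty expression.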
Q1. What is 'epistemic verbalization' as defined in this paper?
A) The model's expression of uncertainty during reasoning (e.g., tokens like 'wait', 'hmm', 'perhaps')
B) The process of compressing knowledge from a larger model into a smaller one
C) The verification of mathematical answers using automated theorem provers

Q2. Why does self-distillation with rich conditioning context (full solutions) sometimes hurt out-of-distribution performance?
A) It makes the model too large and slow for practical deployment
B) It suppresses epistemic verbalization, removing valuable uncertainty signals needed for generalization
C) It causes the model to forget language capabilities and focus only on math

Q3. Based on the paper's findings, what happens when task coverage is increased from small to large in SDPO training?
A) Performance improves consistently as more diverse problems are covered
B) The concise reasoning style from SDPO becomes more beneficial
C) The aggressive suppression of uncertainty becomes harmful as diversity increases

Today's Reading Tips

Start with the RealRestorer paper for its practical open-source restoration system and new benchmark; it builds on large‑scale image editing models that share foundations with the Calibri method, which calibrates diffusion transformers for text‑to‑image generation. The third paper on self‑distillation in LLMs addresses a different domain (language‑model reasoning) and can be read later if you are interested in post‑training analysis.