2025-04-11 Papers


Paper 1

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

Published: 2025-04-10

Link: http://arxiv.org/pdf/2504.07960

1. 📘 Topic and Domain: Universal image generation framework called VisualCloze that leverages visual in-context learning to handle diverse image generation tasks within a single model.
2. 💡 Previous Research and New Ideas: Builds on diffusion models and task-specific image generation approaches; proposes visual in-context learning, where the model learns tasks from visual demonstrations rather than relying solely on language instructions.
3. ❓ Problem: Addressing limitations of current image generation approaches that either require task-specific models or face challenges with task ambiguity, sparse task distributions, and lack of generalization to unseen tasks.
4. 🛠️ Methods: Creating a graph-structured dataset (Graph200K) with interrelated tasks, formulating image generation as an image infilling problem, and fine-tuning FLUX.1-Fill-dev to support visual in-context learning where tasks are demonstrated through examples.
5. 📊 Results and Evaluation: The model successfully handles various in-domain tasks with reduced ambiguity, generalizes to unseen tasks, enables task unification, and supports reverse generation, outperforming comparable methods in conditional generation, style transfer, and subject-driven image generation tasks.
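The infilling formulation in item 4 can be sketched in code. This is a minimal toy illustration using nested lists as stand-in "images", not the paper's actual FLUX-based implementation; the function name and shapes are assumptions for illustration only.

```python
# Minimal sketch of VisualCloze's "generation as infilling" setup:
# a sample is a grid of C in-context rows (each with L images) plus a
# query row whose last (target) cell is left blank, and a mask marks
# the region the infilling model must generate.

def build_grid(demos, query_conditions, image_shape=(2, 2)):
    """demos: list of C rows, each a list of L images.
    query_conditions: the L-1 condition images of the query row."""
    h, w = image_shape
    blank = [[0.0] * w for _ in range(h)]          # placeholder target
    grid = [list(row) for row in demos]
    grid.append(list(query_conditions) + [blank])  # query row + blank target
    mask = [[0] * len(row) for row in grid]        # 1 = region to infill
    mask[-1][-1] = 1
    return grid, mask

img = [[1.0, 1.0], [1.0, 1.0]]
grid, mask = build_grid(demos=[[img, img]], query_conditions=[img])
print(len(grid), len(grid[0]))  # 2 rows (1 demo + query), 2 images per row
print(mask)                     # [[0, 0], [0, 1]]
```

A real system would concatenate pixel tensors and pass the grid plus mask to the pre-trained infilling model; here only the layout logic is shown.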

VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

VisualCloze Methodology Flowchart

Problem: task-specific models lack efficiency, while universal models face instruction, distribution, and architecture issues. Solution: the VisualCloze framework.

1. Visual In-Context Learning (VICL). Input format: C in-context examples (demos), each consisting of L images (conditions + target), plus one query with L-1 condition images and one blank target. Goal: learn the task from visual examples, not just text.
2. Unified Task as Infilling. Process: concatenate all input images into a grid, mask the target image region (M), and use an infilling model to generate the masked region. Objective: `X_hat = f(X_grid | T_layout, M)`. Benefit: aligns with pre-trained infilling models.
3. Graph200K Dataset. Built on Subjects200K; images are nodes and annotations are edges; covers 5 meta-tasks (conditional generation, editing, restoration, style, IP). Increases task density and overlap. Benefit: promotes learning transferable knowledge.
4. Model and Training. Base model: FLUX.1-Fill-dev (infilling). Fine-tuning: LoRA (rank 256, minimal changes). Training data: Graph200K plus others (VITON, etc.). Positional embedding: 3D-RoPE for varying aspect ratios. Benefit: leverages strong priors at low cost.

Key capabilities enabled by VisualCloze:
- Improved seen tasks: reduced ambiguity, better performance.
- Unseen task generalization: adapts to new tasks via VICL examples.
- Task unification: combines multiple sub-tasks into a single step.
- Reverse generation: infers conditions from the target image.
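The graph structure behind Graph200K can be sketched as follows. The node and annotation names below are illustrative stand-ins, not the dataset's actual schema; the point is only how shared nodes make many condition-to-target tasks overlap.

```python
# Toy sketch of a Graph200K-style structure: each image is a node, and
# its annotations (depth map, edge map, stylized version, ...) hang off
# it as edges. Any ordered pair of annotations of one image yields a
# condition -> target task, so tasks densely overlap on shared images.

from collections import defaultdict

graph = defaultdict(dict)  # image node -> {annotation_type: annotation id}

def add_annotation(image_id, ann_type, ann_id):
    graph[image_id][ann_type] = ann_id

add_annotation("img_001", "depth", "img_001_depth")
add_annotation("img_001", "canny", "img_001_canny")
add_annotation("img_001", "style", "img_001_style")

def derivable_tasks(image_id):
    """Each ordered pair of annotations gives one condition->target task."""
    anns = list(graph[image_id])
    return [(src, dst) for src in anns for dst in anns if src != dst]

print(derivable_tasks("img_001"))  # 6 tasks from just 3 annotations
```

With k annotation types per image, k*(k-1) ordered task pairs share the same node, which is the "task density and overlap" the summary credits with promoting transferable knowledge.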
Q1
1. What is the main innovation of VisualCloze compared to previous universal image generation approaches?
Using a larger and more diverse training dataset
Visual in-context learning instead of relying on language instructions
Developing a completely new diffusion model architecture
Q2
2. What problem does the Graph200K dataset address in the context of visual tasks?
The lack of high-quality training images
The sparsity and isolation of visual tasks that limits knowledge transfer
The computational complexity of training large generative models
Q3
3. Which of the following capabilities was NOT demonstrated by VisualCloze?
Generating frontal faces from side-view images (unseen task)
Reverse generation (inferring conditions from target images)
Real-time video generation with temporal consistency

Paper 2

MM-IFEngine: Towards Multimodal Instruction Following

Published: 2025-04-10

Link: http://arxiv.org/pdf/2504.07957


MM-IFEngine Workflow

MM-IFEngine: image-instruction pair generation.
1. Diverse image sources (CC3M, ALLaVA, UI, geometry, charts).
2. Step 1: image filtering (resolution, semantics).
3. Step 2: task generation (GPT-4o, or refining existing tasks).
4. Step 3: constraint integration (an LLM generates and validates constraints drawn from a pool of 32 types in 6 categories), yielding high-quality image-instruction pairs.

Dataset generation:
- Generate responses with InternVL2.5-78B, then post-process (filter) to obtain MM-IFInstruct-23k (for SFT).
- Generate rejected responses with Qwen2-VL-7B under degraded settings (remove 33/66/100% of constraints, or remove the image) to obtain MM-IFDPO-23k (for DPO).

MM-IFEval benchmark creation: human annotation plus LLM conflict checking yields 400 questions (300 compose-level + 100 perception-level).

MM-IFEval hybrid evaluation method:
- Rule-based verification for objective constraints (e.g., word count, format, numbers).
- LLM-based direct judgment for clear constraints (e.g., keyword mention).
- LLM-based comparative judgment for subjective constraints (e.g., tone, style, role-play).

Model training and evaluation: base MLLMs (e.g., LLaVA, Qwen2) are trained with SFT (on MM-IFInstruct-23k) and DPO (on MM-IFDPO-23k), then the fine-tuned MLLMs are evaluated on benchmarks (MM-IFEval, MIA, IFEval, VQA).
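The rule-based side of the hybrid evaluation can be sketched as simple programmatic checks. The constraint schema (dict keys and type names) below is an assumption for illustration, not the benchmark's actual format.

```python
# Sketch of rule-based verification for objective constraints such as
# word count or a required numbered-list format, in the spirit of
# MM-IFEval's hybrid evaluation. Subjective constraints (tone, style)
# would instead go to an LLM judge.

import re

def check_constraint(response, constraint):
    kind = constraint["type"]
    if kind == "max_words":
        return len(response.split()) <= constraint["limit"]
    if kind == "must_mention":
        return constraint["keyword"].lower() in response.lower()
    if kind == "numbered_list":
        # require at least N lines starting with "1.", "2.", ...
        items = re.findall(r"^\d+\.", response, re.MULTILINE)
        return len(items) >= constraint["items"]
    raise ValueError(f"unknown constraint type: {kind}")

resp = "1. The chart shows revenue.\n2. Growth is steady."
print(check_constraint(resp, {"type": "max_words", "limit": 20}))     # True
print(check_constraint(resp, {"type": "numbered_list", "items": 2}))  # True
```

Checks like these are deterministic and cheap, which is why benchmarks reserve LLM judges for constraints that rules cannot verify.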
Q1
1. What is the primary innovation of MM-IFEngine compared to existing instruction following benchmarks?
It uses only proprietary models for evaluation
It focuses exclusively on text-based constraints
It incorporates both compose-level and perception-level constraints with strong visual correlations
Q2
2. How many distinct constraint categories are included in MM-IFEval?
8 categories with an average of 2.6 constraints per question
32 categories with an average of 5.1 constraints per question
16 categories with an average of 3.5 constraints per question
Q3
3. What evaluation strategy does MM-IFEval use that makes it more precise than previous benchmarks?
It relies exclusively on GPT-4o for all evaluations
A hybrid approach combining rule-based verification and judge models
It uses only human evaluators to ensure accuracy

Paper 3

HoloPart: Generative 3D Part Amodal Segmentation

Published: 2025-04-10

Link: http://arxiv.org/pdf/2504.07943

1. 📘 Topic and Domain: The paper introduces "3D part amodal segmentation," a novel task in 3D computer vision that decomposes 3D shapes into complete semantic parts, even when parts are occluded.
2. 💡 Previous Research and New Ideas: The paper builds on existing 3D part segmentation techniques but extends beyond them by proposing a diffusion-based model (HoloPart) that can complete partial segments into full 3D parts, similar to how 2D amodal segmentation has evolved for images.
3. ❓ Problem: The paper solves the challenge of generating complete 3D parts from incomplete surface segments, addressing key difficulties in inferring occluded geometry, maintaining global shape consistency, and handling diverse shapes with limited training data.
4. 🛠️ Methods: The authors use a two-stage approach: first applying existing 3D part segmentation to obtain initial surface patches, then using their novel HoloPart diffusion model with local attention and context-aware attention mechanisms to complete these segments into full 3D parts.
5. 📊 Results and Evaluation: HoloPart significantly outperforms state-of-the-art shape completion methods on new benchmarks based on ABO and PartObjaverse-Tiny datasets, demonstrating superior performance in Chamfer Distance, IoU, and F-Score metrics, while enabling applications in geometry editing, animation, and material assignment.
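The two-stage approach in item 4 can be outlined as a small pipeline skeleton. Every function here is a stand-in stub with hypothetical names; the real system uses a SAMPart3D-style segmenter and a fine-tuned latent diffusion model, neither of which is reproduced here.

```python
# High-level skeleton of HoloPart's two-stage pipeline: segment first,
# then complete each incomplete surface patch into a full 3D part.

def segment_parts(shape):
    """Stage 1 (stub): existing 3D part segmentation yields incomplete
    surface patches; the real paper uses an off-the-shelf segmenter."""
    return [{"id": i, "surface": f"patch_{i}"} for i in range(3)]

def complete_part(patch, whole_shape):
    """Stage 2 (stub): diffusion-based completion conditioned on local
    detail (local attention) and the whole shape (context-aware attention)."""
    return {"id": patch["id"], "mesh": f"complete_{patch['id']}"}

def holopart(shape):
    patches = segment_parts(shape)                     # Stage 1
    return [complete_part(p, shape) for p in patches]  # Stage 2, per patch

parts = holopart("input_mesh")
print([p["mesh"] for p in parts])  # ['complete_0', 'complete_1', 'complete_2']
```

The structural point is that completion runs independently per segment while still conditioning on the whole shape, which is how the method balances local detail against global consistency.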


HoloPart Methodology: 3D Part Amodal Segmentation

Input: a 3D shape (mesh or point cloud).

Stage 1: Initial part segmentation. Apply an existing method (e.g., SAMPart3D) to obtain incomplete segments {si}, the whole shape (X), and a mask (M).

Stage 2: HoloPart part completion (for each segment si):
1. Attention encoding: context-aware attention over (S0, X, M) yields co; local attention over (S0, S) yields cl.
2. Part diffusion model (vθ), pretrained on whole objects and fine-tuned on parts, takes noise (ε), timestep (t), co, and cl, and iteratively denoises (with CFG) to a complete part latent (z_part).
3. Decoding and mesh extraction: the VAE decoder (D) produces an occupancy field, and Marching Cubes extracts the complete part (pi).

Output: the set of complete parts {p1, ..., pn}, i.e., the 3D part amodal segmentation.

Pretraining: the VAE and diffusion model are trained on a large dataset of WHOLE shapes to learn general 3D priors.

Data curation: ABO and Objaverse (filtered) are used to create whole-part pairs for fine-tuning HoloPart.
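The final decoding step (occupancy field to mesh) can be illustrated with a toy thresholding pass. The paper extracts a mesh with Marching Cubes; the sketch below stops at collecting occupied voxels, since a full Marching Cubes implementation is out of scope here.

```python
# Toy sketch of the decoding stage: a decoded occupancy field is
# thresholded into occupied voxels, which a mesh-extraction algorithm
# (Marching Cubes in the paper) would then turn into a surface.

def occupied_voxels(occupancy, threshold=0.5):
    """occupancy: 3-D nested list of values in [0, 1]; returns the
    (x, y, z) indices whose occupancy exceeds the threshold."""
    return [(x, y, z)
            for x, plane in enumerate(occupancy)
            for y, row in enumerate(plane)
            for z, v in enumerate(row)
            if v > threshold]

field = [[[0.9, 0.1], [0.2, 0.8]],
         [[0.0, 0.0], [0.7, 0.3]]]
print(occupied_voxels(field))  # [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
```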
Q1
1. What is the key innovation that distinguishes HoloPart from traditional 3D part segmentation methods?
It uses a larger training dataset with more diverse 3D shapes
It completes the geometry of occluded parts rather than just identifying visible surface patches
It performs segmentation in a single end-to-end process instead of using a two-stage approach
Q2
2. Which two key attention mechanisms does HoloPart incorporate to balance local details and global context?
Temporal attention and spatial attention
Cross-modal attention and self-supervised attention
Local attention and shape context-aware attention
Q3
3. What practical downstream application is NOT mentioned as a benefit of 3D part amodal segmentation in the paper?
Geometry editing and material assignment
Animation of individual parts
Facial recognition and biometric authentication