1. 📘 Topic and Domain: The paper focuses on physics-aware image editing in computer vision, specifically addressing how to generate physically plausible edits that obey natural laws.
2. 💡 Previous Research and New Ideas: The paper builds on instruction-based image editing methods like Qwen-Image-Edit and proposes reformulating editing as continuous physical state transitions rather than discrete mappings, introducing learnable transition queries to capture dynamics from video data.
3. ❓ Problem: Current image editing models achieve high semantic fidelity but frequently violate physical principles (e.g., incorrect refraction, implausible material deformation), treating editing as a black-box transformation without considering underlying physical laws.
4. 🛠️ Methods: The authors construct PhysicTran38K, a 38K-sample dataset curated from videos and annotated with physics categories; build the PhysicEdit framework around a dual-thinking mechanism (a frozen Qwen2.5-VL for textual reasoning plus learnable transition queries for visual guidance); and apply timestep-aware modulation to condition diffusion generation.
5. 📊 Results and Evaluation: PhysicEdit achieves 64.86% on PICABench (5.9% improvement over baseline) and 72.16% on KRISBench (10.1% improvement), outperforming all evaluated open-source models and remaining competitive with proprietary models in physical realism and knowledge-grounded editing.
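The summary's method description (learnable transition queries plus timestep-aware modulation) can be illustrated with a minimal sketch. Note this is an illustrative reconstruction, not the paper's actual implementation: the class name `TransitionQueries`, the cross-attention formulation, the sinusoidal timestep embedding, and the scale/shift modulation head are all assumptions about how such components are typically wired together.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def timestep_embedding(t, dim):
    # Standard sinusoidal embedding of the diffusion timestep t (assumed form).
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    ang = t * freqs
    return np.concatenate([np.sin(ang), np.cos(ang)])

class TransitionQueries:
    """Hypothetical sketch: a small set of learnable query vectors
    cross-attends over source-image features to distill transition
    guidance, which is then modulated by a timestep-dependent
    scale/shift (FiLM-style) before conditioning the diffusion model."""

    def __init__(self, n_queries=8, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.queries = rng.normal(0, 0.02, (n_queries, dim))  # learnable queries
        self.w_mod = rng.normal(0, 0.02, (dim, 2 * dim))      # scale/shift head
        self.dim = dim

    def __call__(self, feats, t):
        # feats: (n_tokens, dim) image features; t: diffusion timestep.
        attn = softmax(self.queries @ feats.T / np.sqrt(self.dim))
        guidance = attn @ feats                      # (n_queries, dim)
        emb = timestep_embedding(t, self.dim)        # (dim,)
        scale, shift = np.split(emb @ self.w_mod, 2)
        return guidance * (1 + scale) + shift        # timestep-aware modulation

# Usage: 16 image tokens of width 64, queried at timestep 500.
tq = TransitionQueries()
out = tq(np.random.default_rng(1).normal(size=(16, 64)), t=500)
```

In this sketch the queries play the role the summary ascribes to them: a compact, learnable interface that extracts state-transition cues from visual features, with the timestep modulation letting the guidance vary across the denoising trajectory.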