2025-07-17 Papers


Paper 1

PhysX: Physical-Grounded 3D Asset Generation

Published: 2025-07-16

Link: http://arxiv.org/pdf/2507.12465

1. 📘 Topic and Domain: Physical-grounded 3D asset generation, combining computer vision, 3D modeling, and physics simulation.
2. 💡 Previous Research and New Ideas: Building on existing 3D datasets such as ShapeNet and PartNet, which focus mainly on geometry and appearance, this paper introduces the first comprehensive physics-annotated 3D dataset and generation framework.
3. ❓ Problem: Current 3D generative models overlook physical properties of objects, limiting their real-world applications in simulation and embodied AI.
4. 🛠️ Methods: Developed PhysXNet (a physics-annotated 3D dataset with 26K objects) via a human-in-the-loop annotation pipeline, and PhysXGen (a dual-branch framework) that jointly models geometry and physics during generation.
5. 📊 Results and Evaluation: The framework outperformed baselines across multiple metrics including geometry quality (PSNR, CD, F-Score) and physics predictions (scale, material, affordance, kinematics, descriptions), while maintaining good generalization capability.
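The five physical property groups annotated per object (point 4 above) could be carried on a record like the following hypothetical schema; the field names and units are illustrative assumptions, not the dataset's actual format:

```python
from dataclasses import dataclass

@dataclass
class PhysicsAnnotation:
    # Hypothetical per-object record mirroring PhysXNet's five property groups
    absolute_scale: float   # object size (metres assumed for illustration)
    material: str           # e.g. "wood", "metal", "plastic"
    affordance_rank: dict   # affordance name -> priority score
    kinematics: dict        # joint type and motion range parameters
    description: str        # free-text functional description

asset = PhysicsAnnotation(
    absolute_scale=0.45,
    material="plastic",
    affordance_rank={"grasp": 0.9, "press": 0.3},
    kinematics={"joint": "revolute", "range_deg": (0, 120)},
    description="kettle lid that rotates open",
)
```

A record like this is what a human-in-the-loop pipeline (VLM proposal plus expert correction) would fill in per object.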

Workflow (figure summary):

Phase 1: PhysXNet dataset
• Raw 3D assets (PartNet) pass through a human-in-the-loop annotation pipeline (VLM + expert review).
• Annotated physical properties: absolute scale, material, affordance, kinematics, function description.
• Result: PhysXNet (26K objects) and PhysXNet-XL (6M).

Phase 2: PhysXGen framework
• Image input feeds a dual-branch VAE: structural encoder and physical encoder produce a structural latent and a physical latent.
• Latent generation: structural flow transformer and physical flow transformer, jointly trained with a CFM loss.
• Decoding: structural decoder and physical decoder with a residual connection and joint optimization.
• Output: a physical 3D asset with geometry & texture, material properties, kinematic parameters, physical dimensions, affordance ranking, and function descriptions.
• Applications: simulation, robotics, embodied AI.
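As a toy illustration of the residual connection between the two decoder branches, the physical branch below is conditioned on the structural branch's features, reflecting the idea that physics correlates with geometry. This is not the authors' implementation; both function bodies and all numbers are made up:

```python
# Toy sketch of a dual-branch decode with a residual connection.

def structural_decoder(z_struct):
    # stand-in decode: scale the structural latent into "geometry features"
    return [2 * v for v in z_struct]

def physical_decoder(z_phys, struct_feats):
    # residual connection: physics prediction conditioned on geometry features
    return [p + 0.5 * s for p, s in zip(z_phys, struct_feats)]

z_struct = [0.1, -0.2, 0.3]   # structural latent (toy values)
z_phys = [0.0, 0.5, -0.1]     # physical latent (toy values)

geometry = structural_decoder(z_struct)
physics = physical_decoder(z_phys, geometry)
```

Training both branches jointly, rather than bolting a physics head onto a frozen geometry model, is what lets the framework exploit this correlation.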
Q1
1. What is the key innovation of PhysXNet compared to previous 3D datasets?
It has more 3D objects than any previous dataset
It contains comprehensive physics-based annotations including material, scale, and kinematics
It focuses only on geometric properties of 3D objects
Q2
2. How does PhysXGen achieve both physical accuracy and geometric quality in generated 3D assets?
By completely replacing geometric features with physical properties
By using separate networks for physics and geometry
By using a dual-branch architecture that jointly models correlations between physical and structural features
Q3
3. What is the scale difference between the base PhysXNet and PhysXNet-XL datasets?
PhysXNet-XL has 10 times more objects
PhysXNet-XL has 100 times more objects
PhysXNet-XL has over 200 times more objects

Paper 2

MMHU: A Massive-Scale Multimodal Benchmark for Human Behavior Understanding

Published: 2025-07-16

Link: http://arxiv.org/pdf/2507.12463

1. 📘 Topic and Domain: This paper introduces MMHU, a large-scale multimodal benchmark dataset for understanding human behavior in autonomous driving scenarios.
2. 💡 Previous Research and New Ideas: Previous research focused on individual aspects of human behavior (motion, intention, trajectory) in driving, but this paper proposes the first unified comprehensive dataset combining multiple behavior aspects with rich annotations.
3. ❓ Problem: The lack of a unified benchmark dataset for evaluating algorithms that comprehensively understand human behaviors in autonomous driving scenarios, which is crucial for driving safety.
4. 🛠️ Methods: The authors developed a human-in-the-loop annotation pipeline to collect and label 57k human instances from diverse video sources (Waymo, YouTube, self-collected), providing motion data, trajectories, text descriptions, and critical behavior labels.
5. 📊 Results and Evaluation: Training on the dataset improved performance across multiple tasks: motion prediction error (MPJPE) dropped by 9.49, intention prediction accuracy increased by 7.4%, behavior QA accuracy rose by 15.96%, and motion generation showed clear qualitative improvements in driving scenarios.
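The human-in-the-loop pipeline (point 4 above) can be sketched as a seed-then-scale loop: label a small subset by hand, fine-tune a VLM on it, then auto-label the rest. The labeler functions below are hypothetical stand-ins, not the authors' actual models:

```python
# Hedged sketch of a human-in-the-loop annotation loop.

def human_label(clips):
    # stand-in for expert annotation of the seed subset
    return {c: f"human:{c}" for c in clips}

def finetune_vlm(seed_labels):
    # stand-in for VLM fine-tuning; returns an "auto-labeler"
    return lambda clip: f"vlm:{clip}"

def annotate(clips, human_fraction=0.1):
    n_seed = max(1, int(len(clips) * human_fraction))
    seed = human_label(clips[:n_seed])          # ~10% human-labeled subset
    vlm = finetune_vlm(seed)                    # fine-tune VLM on the seed
    auto = {c: vlm(c) for c in clips[n_seed:]}  # automated labeling at scale
    return seed, auto

seed, auto = annotate([f"clip{i}" for i in range(20)])
```

The payoff is scale: a 10% hand-labeled seed can bootstrap labels for the remaining 90% with only quality-assurance spot checks.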

Workflow (figure summary):

• Data collection: Waymo dataset (73K frames), YouTube videos (318K frames), self-collected (2.4M frames); total 1.73M frames, 57K instances.
• Video processing: human detection & filtering, video cutting (>10 seconds), individual tracking, frame-rate unification (10 FPS).
• Motion reconstruction: SMPL parameter extraction, motion completion (interpolation), trajectory generation, 3D human motion sequences.
• Hierarchical text annotation: low-level descriptions (joint-wise motion details) and high-level descriptions (semantic behavior summaries).
• Critical behavior recognition: 13 driving-critical behaviors (walking pets, talking, using phone, crossing street, wheelchair, bicycle, scooter, skateboard, motorcycle, umbrella, headphones, carrying items, using stroller).
• Human-in-the-loop annotation pipeline: 10% human-labeled subset, VLM fine-tuning, automated labeling, quality assurance, scalable annotation.
• Dataset splits: MMHU-V 47K (VLM-labeled), MMHU-H 9.5K (human-labeled), MMHU-T 840 (testing).
• Supported tasks: motion prediction (historical to future motion, MPJPE evaluation, trajectory forecasting, physical plausibility); motion generation (text to motion, FID & multi-modality, driving-scene specific, data augmentation); behavior VQA (multimodal understanding, 13 critical behaviors, binary classification, safety-oriented); intention prediction (street-crossing intent, temporal analysis, accuracy & F1-score).
• Key achievements: a comprehensive human behavior understanding benchmark for autonomous driving; significant performance improvements across all tasks when training with MMHU; a scalable annotation pipeline with minimal human effort.
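The "motion completion (interpolation)" step above can be illustrated on a 1-D track with missing frames; real SMPL completion operates on full pose parameters, so this is only a minimal sketch of the gap-filling idea:

```python
# Minimal linear-interpolation sketch for filling gaps in a tracked value.

def complete_track(frames):
    """Fill None gaps by linear interpolation between known frames."""
    out = list(frames)
    known = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)            # fractional position in the gap
            out[i] = out[a] + t * (out[b] - out[a])
    return out

track = [0.0, None, None, 3.0, None, 5.0]   # detections lost at frames 1, 2, 4
filled = complete_track(track)
```

After unifying clips to a common frame rate (10 FPS in the figure), interpolation like this yields continuous motion sequences suitable for prediction and generation tasks.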
Q1
1. What is the primary innovation of the MMHU dataset compared to previous datasets?
It has more video hours than any previous dataset
It unifies multiple aspects of human behavior with comprehensive annotations
It only focuses on accident scenarios
Q2
2. How many critical behaviors does MMHU recognize and label in its annotation system?
7 behaviors
10 behaviors
13 behaviors
Q3
3. What unique annotation approach did the authors use to bridge the gap between SMPL parameters and semantic descriptions?
Direct manual annotation by experts
Fully automated AI labeling
Hierarchical text annotation with low-level and high-level descriptions

Paper 3

SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Published: 2025-07-16

Link: http://arxiv.org/pdf/2507.12415

1. 📘 Topic and Domain: Evaluating Large Language Models' ability to optimize code performance in real-world software repositories through the SWE-Perf benchmark.
2. 💡 Previous Research and New Ideas: Builds on code-correctness benchmarks such as SWE-Bench and on function-level optimization work; introduces the first repository-level code performance optimization benchmark.
3. ❓ Problem: Addressing the gap in evaluating LLMs' capability to enhance code performance at the repository level, which requires more complex optimization than function-level improvements.
4. 🛠️ Methods: Created SWE-Perf benchmark with 140 curated instances from GitHub pull requests, including codebases, target functions, tests, and expert patches, evaluated under both file-level (oracle) and repo-level (realistic) settings.
5. 📊 Results and Evaluation: All tested LLMs showed significant performance gaps compared to expert-level optimization, with OpenHands performing best but still trailing expert performance by 8.59 percentage points (2.26% vs. 10.85%), highlighting substantial room for improvement in LLMs' code optimization capabilities.
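The benchmark's runtime-verification protocol (warm-up runs, 20 repetitions, outlier filtering) can be sketched as a timing harness. The 3x-median outlier rule below is an illustrative stand-in, not SWE-Perf's exact criterion:

```python
import statistics
import time

# Hedged sketch of a stability-oriented timing harness.

def measure(fn, reps=20, warmup=3):
    for _ in range(warmup):      # warm-up runs absorb cache/startup effects
        fn()
    times = []
    for _ in range(reps):        # 20 repetitions for statistical stability
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    med = statistics.median(times)
    # drop runs far above the median as outliers (illustrative 3x rule)
    kept = [t for t in times if t <= 3 * med]
    return statistics.mean(kept)

runtime = measure(lambda: sum(range(1000)))
```

Without warm-up and outlier filtering, single-run timings are too noisy to distinguish a genuine optimization from measurement jitter, which is why the pipeline discards unstable instances.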

Workflow (figure summary):

• Phase 1, collect pull requests: crawl PRs from popular GitHub repos and filter by attributes (102K reduced to 19.8K PRs).
• Phase 2, measure performance: build Docker environments, execute unit tests, record runtimes (34K codebases).
• Phase 3, identify optimizations: filter performance-related PRs, select relevant tests, apply a ratio < 0.3 threshold (1.7K instances).
• Phase 4, verify improvements: add warm-up, run 20 repetitions, filter outliers (140 final instances).
• Task formulation: input is a codebase plus target functions; output is a performance-optimization patch; two settings, oracle (file-level) and realistic (repo-level).
• Evaluation methods: the oracle setting uses direct LLM prompting across 10 popular models; the realistic setting uses Agentless (pipeline) and OpenHands (agent).
• Three-level evaluation metrics: Apply (patch applies successfully), Correctness (all unit tests pass), Performance (statistical runtime gain).
• Key findings: significant gap between LLMs and experts; OpenHands outperforms other methods; expert performance 10.85% vs. best model performance 2.26%.
• Dataset characteristics: 140 instances from 9 repositories; average 447 files and 170K lines per codebase; average performance ratio 10.9%; expert patches edit 131 lines on average.
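The three-level metric can be viewed as a gate: a patch only earns a performance score if it first applies cleanly and then passes every unit test. A toy version, with the gain formula as an illustrative assumption rather than the benchmark's exact statistic:

```python
# Toy gate mirroring the Apply -> Correctness -> Performance metric levels.

def evaluate(patch_applies, tests_pass, old_runtime, new_runtime):
    if not patch_applies:                       # level 1: patch must apply
        return {"apply": False, "correct": False, "gain": 0.0}
    if not tests_pass:                          # level 2: all tests must pass
        return {"apply": True, "correct": False, "gain": 0.0}
    # level 3: relative runtime gain, clamped so regressions score zero
    gain = max(0.0, (old_runtime - new_runtime) / old_runtime)
    return {"apply": True, "correct": True, "gain": gain}

result = evaluate(True, True, old_runtime=10.0, new_runtime=8.0)
```

Gating this way means a fast-but-broken patch scores nothing, which is what makes repository-level optimization harder than raw speedup hunting.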
Q1
1. What is the main innovation of SWE-Perf compared to previous code optimization benchmarks?
It focuses on individual function optimization
It evaluates repository-level performance optimization
It only tests code correctness
Q2
2. In the data collection process, how many repetitions were performed to ensure runtime stability?
3 repetitions
10 repetitions
20 repetitions
Q3
3. Which approach showed the best performance in optimizing code among the tested methods?
Agentless pipeline-based approach
Direct model oracle approach
OpenHands agent-based system