2025-06-25 Papers


Paper 1

Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs

Published: 2025-06-23

Link: http://arxiv.org/pdf/2506.19290

1. 📘 Topic and Domain: The paper focuses on developing data scaling laws and datasets for software engineering tasks using Large Language Models (LLMs), specifically in the domain of automated code fixing and software development.
2. 💡 Previous Research and New Ideas: Based on previous work in code generation and software engineering benchmarks like SWE-bench, the paper proposes a new automated data curation pipeline that systematically scales both volume and diversity of software engineering datasets.
3. ❓ Problem: The paper addresses the lack of high-quality, large-scale training data for software engineering tasks, which has led to open-source LLMs consistently underperforming compared to proprietary models.
4. 🛠️ Methods: The authors developed a three-stage pipeline consisting of: (1) data collection and pre-filtering from GitHub repositories, (2) execution-based validation and runtime environment setup, and (3) agent trajectory generation, resulting in the Skywork-SWE dataset with 10,169 validated instances from 2,531 repositories.
5. 📊 Results and Evaluation: Their Skywork-SWE model achieved 38.0% pass@1 accuracy on the SWE-bench Verified benchmark without verifiers, and 47.0% with test-time scaling, establishing a new state of the art among Qwen2.5-Coder-32B-based LLMs while demonstrating a clear data scaling law for software engineering tasks.
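The log-linear scaling behaviour can be sketched numerically: under a law of the form pass@1 = a·log(N) + b, every doubling of the trajectory count adds a constant a·log(2) points of accuracy. The snippet below fits such a law; the intermediate data points are illustrative placeholders, not figures from the paper (only the 8,209-trajectory count and the 38.0% endpoint are reported).

```python
import numpy as np

# Hypothetical scaling measurements: trajectory counts vs. pass@1 (%).
# Only the final point (8,209 trajectories, 38.0%) comes from the paper;
# the intermediate points are illustrative of a log-linear trend.
n_traj = np.array([500, 1000, 2000, 4000, 8209])
pass_at_1 = np.array([24.0, 27.5, 31.0, 34.5, 38.0])

# Fit pass@1 = a * log(N) + b: a log-linear law is a line in log(N).
a, b = np.polyfit(np.log(n_traj), pass_at_1, 1)

def predicted(n):
    return a * np.log(n) + b

# With no saturation, each doubling of data adds a * log(2) points.
gain_per_doubling = a * np.log(2)
```

A saturating curve would instead show this per-doubling gain shrinking as N grows; the paper reports no such flattening up to its full dataset.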

Figure: Skywork-SWE data curation pipeline and results overview.

Stage A: Data Collection & Pre-filtering
- A.1: Repository metadata collection (151,472 repos)
- A.2: PR collection and filtering (146,568 instances)
- A.3: Installation-based validation (23,389 valid)

Stage B: Environment Setup & Validation
- B.1: Command configuration (Python 3.9, pytest)
- B.2: Docker runtime environment setup
- B.3: Execution-based validation (10,169 final)

Stage C: Agent Trajectory Generation
- C.1: Trajectory rollout (multiple LLMs)
- C.2: Trajectory validation (patch testing)
- C.3: Collection (8,209 successful trajectories)

Model training: Qwen2.5-Coder-32B-Instruct base, supervised fine-tuning on the 8,209 trajectories.
Key findings: log-linear data scaling law; 38.0% pass@1 on SWE-bench Verified; 47.0% with test-time scaling; state-of-the-art performance among sub-32B models.
Dataset statistics: 10,169 task instances from 2,531 GitHub repositories, each with a runtime-validated Docker environment and multi-turn, long-context agent trajectories.
Agent framework: OpenHands v0.32.0, up to 100 interaction rounds per instance.
Q1. What is the main innovation in the Skywork-SWE dataset compared to previous software engineering datasets?
a) It includes more programming languages than previous datasets
b) It has automated execution validation and runtime environments for each instance
c) It focuses only on small code fixes and simple bug patches

Q2. What interesting phenomenon did the researchers discover about data scaling in software engineering tasks?
a) Performance decreased with more training data
b) Performance plateaued after a certain amount of data
c) Performance continued to improve log-linearly with more training data, showing no saturation

Q3. What was the biggest challenge in collecting the dataset according to the experimental analysis?
a) The high cost of GPU resources for training
b) The low success rate of data collection, with even advanced proprietary LLMs achieving only a 20.23% resolve rate
c) Lack of access to GitHub repositories

Paper 2

ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Published: 2025-06-23

Link: http://arxiv.org/pdf/2506.18792

1. 📘 Topic and Domain: Dynamic novel view synthesis and 4D reconstruction from monocular video inputs in computer vision.
2. 💡 Previous Research and New Ideas: Builds on previous work in neural radiance fields, Gaussian splatting, and diffusion models, introducing a novel diffusion-aware reconstruction framework that leverages personalized diffusion models for enhanced view synthesis.
3. ❓ Problem: Solving the challenge of generating high-quality, photorealistic views of moving subjects from arbitrary viewpoints using only monocular video input, where disentangling structure from motion is ill-posed.
4. 🛠️ Methods: Uses a three-stage approach: initial monocular reconstruction, personalized diffusion model enhancement of novel views, and diffusion-aware reconstruction with dynamic region focusing and camera pose optimization.
5. 📊 Results and Evaluation: Outperformed state-of-the-art baselines on the DyCheck benchmark in visual quality and geometric consistency, showing substantial improvements in PSNR, SSIM, and LPIPS metrics, particularly in dynamic regions.
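The dynamic-region-focused supervision in the third stage can be illustrated with a minimal sketch. This is not the paper's implementation: it substitutes a masked L2 term for the actual perceptual and L_dyn losses, and the mask (e.g. derived from Track Anything) marks pixels of moving subjects so that diffusion-enhanced targets supervise only those regions, since static regions already receive effective multi-view supervision across time.

```python
import numpy as np

def diffusion_aware_loss(render, enhanced_target, dyn_mask, lam_dyn=1.0):
    """Masked-L2 stand-in for ViDAR's dynamic-region loss (sketch only).

    render          : (H, W, C) rendered novel view
    enhanced_target : (H, W, C) diffusion-enhanced target view
    dyn_mask        : (H, W) binary mask of dynamic (moving) pixels
    """
    diff = (render - enhanced_target) ** 2
    masked = diff * dyn_mask[..., None]        # supervise dynamic pixels only
    denom = max(dyn_mask.sum(), 1) * render.shape[-1]
    return lam_dyn * masked.sum() / denom
```

Restricting the diffusion-based guidance to the masked region is what lets the enhanced (but not perfectly multi-view-consistent) diffusion outputs improve moving subjects without degrading the already well-constrained static background.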

Figure: ViDAR workflow overview.

- Input: monocular video.
- Stage 1: Monocular 4D reconstruction (MoSca baseline) plus Track Anything segmentation.
- Stage 2: Novel camera sampling (18 cameras per frame), then scene-specific novel-view enhancement with a personalised diffusion model (DreamBooth + SDXL) using multi-step denoising.
- Stage 3: Diffusion-aware reconstruction combining dynamic-region focus (L_dyn loss), camera pose optimization (L_cam loss), and a perceptual loss.
- Output: enhanced 4D reconstruction with high-quality novel views.

Key innovations: personalized diffusion enhancement, dynamic-region-focused guidance, joint camera pose optimization, spatio-temporal consistency, geometric awareness, and multi-view supervision.
Q1. What is the main challenge addressed by ViDAR when working with monocular video input?
a) Processing large video files efficiently
b) Disentangling structure from motion in the scene
c) Generating high-resolution outputs

Q2. Which component of ViDAR's architecture is responsible for improving the visual quality of rendered novel views?
a) Track Anything Gaussian classification
b) Camera pose optimization
c) Personalized diffusion model

Q3. Why does ViDAR apply diffusion-based guidance only to dynamic regions of the scene?
a) To reduce computational costs
b) Because static regions already have effective multi-view supervision across time
c) To maintain consistent lighting conditions

Paper 3

Matrix-Game: Interactive World Foundation Model

Published: 2025-06-23

Link: http://arxiv.org/pdf/2506.18701

1. 📘 Topic and Domain: Interactive world foundation model for controllable game world generation, specifically focused on Minecraft environments.
2. 💡 Previous Research and New Ideas: Based on video diffusion models and world modeling research, proposes a new two-stage training pipeline combining unlabeled pretraining for environment understanding with action-labeled training for interactive generation.
3. ❓ Problem: Addresses the challenges of acquiring high-quality training data, achieving fine-grained controllability, and establishing standardized evaluation benchmarks for interactive world generation.
4. 🛠️ Methods: Uses a 17B-parameter model trained on the Matrix-Game-MC dataset (2,700+ hours of unlabeled and 1,000+ hours of labeled gameplay), employing diffusion transformers and autoregressive generation with keyboard/mouse control signals.
5. 📊 Results and Evaluation: Outperforms existing open-source Minecraft world models across all GameWorld Score metrics, particularly in controllability and physical consistency, validated through both quantitative benchmarks and double-blind human evaluations.
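The autoregressive, action-conditioned rollout can be sketched as follows. The `denoise` function here is a hypothetical stand-in for the 17B diffusion transformer (which actually runs flow-matching denoising); the sketch shows only the control flow: each generated latent frame is conditioned on the previous latent state plus one keyboard/mouse action.

```python
import numpy as np

# Hypothetical stand-in for the diffusion transformer: blends the previous
# latent state with a deterministic embedding of the control action. The
# real model instead runs flow-matching denoising conditioned on both.
def denoise(context, action, latent_dim=8):
    action_emb = np.full(latent_dim, (sum(map(ord, action)) % 7) / 7.0)
    return 0.9 * context + 0.1 * action_emb

# Autoregressive rollout: each step conditions on the latest latent frame
# plus one keyboard/mouse action, mirroring the Image-to-World setup where
# generation starts from a single reference frame.
def rollout(first_frame_latent, actions):
    frames = [first_frame_latent]
    for act in actions:                  # e.g. "W", "jump", "mouse_left"
        frames.append(denoise(frames[-1], act))
    return np.stack(frames)

clip = rollout(np.zeros(8), ["W", "W", "jump"])  # initial frame + 3 steps
```

Because each step consumes only prior frames and the current action, the same loop supports interactive use: a player's live inputs can be fed in one action at a time.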

Figure: Matrix-Game methodology overview.

- Data construction: Matrix-Game-MC dataset (2,700+ hours of unlabeled video; 1,000+ hours of labeled clips with keyboard and mouse actions), processed with video-quality, menu-state, motion, and camera-movement filters.
- Stage 1 (unlabeled training): game world understanding from 2,700 hours of Minecraft video (visual and physical learning).
- Stage 2 (action-labeled training): interactive generation from 1,200 hours of labeled clips with control-module integration.
- Model architecture: 3D causal VAE visual encoder, diffusion transformer, control module, and autoregressive generation; Image-to-World paradigm, 17B parameters, flow-matching training, real-time control.
- GameWorld Score benchmark: visual quality, temporal quality, action controllability, physical understanding.
- Results: outperforms OASIS and MineWorld; 95% keyboard accuracy, 95% mouse accuracy, superior visual quality, physical consistency, and 96.3% human preference for overall quality.
- Applications and capabilities: controllable generation, multi-scenario support, long-term consistency, real-time interaction, world simulation.
Q1. What is the main innovation in Matrix-Game's training approach compared to previous models?
a) Using only labeled data from professional gamers
b) A two-stage pipeline combining unlabeled pretraining with action-labeled training
c) Training exclusively on procedurally generated synthetic data

Q2. In the GameWorld Score benchmark, what was Matrix-Game's most significant improvement over existing models?
a) Visual quality and aesthetics
b) Temporal consistency and motion smoothness
c) Action controllability and physical consistency

Q3. What is one of the main limitations or failure cases identified for Matrix-Game?
a) Inability to handle keyboard inputs accurately
b) Poor performance in common Minecraft biomes
c) Struggles with physics understanding in complex interactions