1. 📘 Topic and Domain: The paper presents a real-world grounded video world simulation model that generates city-scale videos anchored in actual urban environments, specifically Seoul.
2. 💡 Previous Research and New Ideas: Building on pretrained video world models and diffusion transformers, the paper introduces retrieval-augmented generation that uses street-view images to ground video generation in real locations rather than imagined environments (a retrieval sketch follows after this list).
3. ❓ Problem: The paper addresses the limitation that existing world models operate in entirely imagined environments, proposing to generate temporally consistent, spatially faithful videos grounded in actual physical locations.
4. 🛠️ Methods: The authors use cross-temporal pairing to handle temporal misalignment, synthetic urban datasets for trajectory diversity, view interpolation for sparse data, and a Virtual Lookahead Sink mechanism for long-horizon stability (a rollout sketch follows after this list).
5. 📊 Results and Evaluation: SWM, the proposed model, outperforms existing world models on benchmarks across Seoul, Busan, and Ann Arbor in visual quality, camera adherence, temporal coherence, and structural fidelity, maintaining stable generation over trajectories reaching hundreds of meters.
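
To make the retrieval-augmented grounding in item 2 concrete, here is a minimal sketch of one way street-view retrieval could work: given a query location, the k nearest panoramas are looked up by geographic distance and would then be encoded as conditioning context for the video diffusion transformer. The function names (`retrieve_street_views`, `haversine_m`), the brute-force nearest-neighbour search, and the toy data are illustrative assumptions, not the paper's actual retrieval pipeline.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between WGS84 points."""
    r = 6_371_000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = np.radians(lat2 - lat1), np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def retrieve_street_views(query_latlon, db_latlon, db_images, k=4):
    """Return the k street-view panoramas closest to the query location.

    db_latlon: (N, 2) array of panorama coordinates (lat, lon).
    db_images: (N, H, W, 3) array of the corresponding panoramas.
    The retrieved images would then be encoded and fed to the video
    diffusion transformer as grounding context alongside the trajectory.
    """
    d = haversine_m(query_latlon[0], query_latlon[1],
                    db_latlon[:, 0], db_latlon[:, 1])
    idx = np.argsort(d)[:k]
    return db_images[idx], d[idx]

# Toy usage: 100 random panoramas scattered around a point in Seoul.
rng = np.random.default_rng(0)
db_latlon = np.array([37.5665, 126.9780]) + rng.normal(0, 0.002, size=(100, 2))
db_images = rng.integers(0, 255, size=(100, 8, 16, 3), dtype=np.uint8)
views, dists = retrieve_street_views((37.5665, 126.9780), db_latlon, db_images)
print(views.shape, np.round(dists, 1))
```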
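
Item 4's Virtual Lookahead Sink is described here only at a high level, so the following is a hedged sketch assuming it behaves like an attention-sink mechanism: a few persistent tokens stay visible in the key/value cache for the whole rollout while per-frame entries are evicted from a rolling window, which is one way long-horizon generation can be kept stable. All names (`attend_with_sink`, `sink_k`, `sink_v`) and the single-head NumPy attention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_sink(q, k_cache, v_cache, sink_k, sink_v):
    """Single-head attention over a rolling frame cache plus persistent
    'sink' tokens.  The sink keys/values are always visible, so evicting
    old frames from the cache does not collapse the attention
    distribution; this is the intuition for stable long rollouts.

    q:                (T, d) queries for the frame being generated
    k_cache, v_cache: (C, d) keys/values of recent frames (rolling window)
    sink_k, sink_v:   (S, d) persistent sink keys/values
    """
    keys = np.concatenate([sink_k, k_cache], axis=0)
    vals = np.concatenate([sink_v, v_cache], axis=0)
    scores = q @ keys.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ vals

# Toy rollout: generate 20 'frames', keeping only the last 8 in cache
# while 2 sink tokens persist for the whole trajectory.
rng = np.random.default_rng(0)
d, window, n_sink = 16, 8, 2
sink_k, sink_v = rng.normal(size=(n_sink, d)), rng.normal(size=(n_sink, d))
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for step in range(20):
    q = rng.normal(size=(1, d))
    out = attend_with_sink(q, k_cache, v_cache, sink_k, sink_v)
    k_cache = np.concatenate([k_cache, rng.normal(size=(1, d))])[-window:]
    v_cache = np.concatenate([v_cache, rng.normal(size=(1, d))])[-window:]
print(out.shape)  # (1, 16)
```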