2025-10-29 Papers

Paper 1

InteractComp: Evaluating Search Agents With Ambiguous Queries

Published: 2025-10-28

Link: http://arxiv.org/pdf/2510.24668

1. 📘 Topic and Domain: Evaluating language models' ability to handle ambiguous search queries through interactive clarification in information retrieval and natural language processing.
2. 💡 Previous Research and New Ideas: Based on existing search agent benchmarks like GAIA and BrowseComp, but introduces a novel focus on interaction capabilities during search, which previous benchmarks overlooked.
3. ❓ Problem: Current search agents assume user queries are complete and unambiguous, failing to handle real-world scenarios where queries require clarification through interaction.
4. 🛠️ Methods: Created InteractComp benchmark with 210 expert-curated questions across 9 domains using a target-distractor methodology that creates genuine ambiguity resolvable only through interaction with users.
5. 📊 Results and Evaluation: The best model achieved only 13.73% accuracy with interaction available versus 71.50% with complete context, revealing systematic overconfidence rather than reasoning deficits, while forced interaction improved performance from 14% to 40%.

Figure: InteractComp benchmark overview (reconstructed from the paper's diagram)

- Data construction: target-distractor methodology; 210 expert-curated questions across 9 domains; answers are easy to verify but require interaction to disambiguate.
- Two-stage verification: (1) completeness validation, (2) interaction-necessity validation; manual plus automated checks.
- Agent architecture: ReAct framework in three configurations: answer-only, answer + search, and answer + search + interact.
- Model evaluation: 17 models tested, proprietary and open-weight (GPT-5, DeepSeek-R1, Claude-4, Qwen-2.5, etc.).
- Ablation across the three evaluation modes: answer-only 5.18%, search-only 8.81%, with full context 71.50%, a 13.8x performance gap.
- Scaling analysis: raising round limits from 5 to 10 to 20 yields minimal improvement; models underutilize interaction opportunities.
- Forced interaction: requiring a minimum of 2-10 questions lifts GPT-5 from 20% to 40%, with dramatic gains when interaction is mandatory.
- Longitudinal study (15 months): BrowseComp performance improved 7x while InteractComp stagnated at 6-14%, exposing a critical blind spot.
- Key findings: systematic overconfidence (13.73% accuracy vs. 71.50% with context), not a capability deficit; diverse interaction strategies across models (interaction rates 0.25%-73.95%) with better calibration when models do interact; clean reward signals from search outcomes make the benchmark suitable for RLVR-style training.
- Impact: while search performance improved 7x, interaction capabilities stagnated; InteractComp provides a foundation for training uncertainty-aware, interactive agents.
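The forced-interaction setting described above (the agent must ask a minimum number of clarification questions before any answer is accepted) can be sketched as a simple agent loop. This is a minimal illustration only: `run_forced_interaction`, `agent_step`, and `toy_policy` are hypothetical names, not the benchmark's actual harness.

```python
# Sketch of forced interaction: block "answer" actions until the agent has
# asked at least `min_questions` clarification questions.

def run_forced_interaction(agent_step, min_questions=2, max_rounds=20):
    """Run an agent loop; premature answers are rejected and cost a round."""
    questions_asked = 0
    for _ in range(max_rounds):
        action, payload = agent_step(questions_asked)
        if action == "interact":
            questions_asked += 1      # ask the user, receive a clarification
        elif action == "answer":
            if questions_asked >= min_questions:
                return payload        # answer allowed only after enough questions
            # otherwise the answer is rejected; the agent must try again
    return None                       # round budget exhausted

# Toy policy: ask clarification questions until the minimum is met, then answer.
def toy_policy(questions_asked):
    if questions_asked < 2:
        return ("interact", "Which of the two candidate entities do you mean?")
    return ("answer", "final answer")

print(run_forced_interaction(toy_policy, min_questions=2))
```

An overconfident policy that answers immediately would have its answers rejected every round and exhaust the budget, which mirrors the paper's point that models underutilize interaction unless forced.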
Q1
1. What is the most striking finding about model performance in the InteractComp benchmark?
Models failed completely with 0% accuracy on all tasks
Models achieved high accuracy (71.50%) with complete context but only 13.73% with interaction available
Models performed equally well with or without interaction capabilities
Q2
2. How does the InteractComp benchmark create genuinely ambiguous questions?
By using random word generators to create confusing queries
By intentionally using grammatically incorrect sentences
By pairing a lesser-known target with a popular distractor that shares overlapping attributes
Q3
3. What surprising trend was revealed in the 15-month longitudinal study of model development?
Both search and interaction capabilities improved dramatically
While BrowseComp performance improved seven-fold, InteractComp performance remained stagnant
All model capabilities decreased over time

Paper 2

Tongyi DeepResearch Technical Report

Published: 2025-10-28

Link: http://arxiv.org/pdf/2510.24701

1. 📘 Topic and Domain: The paper presents Tongyi DeepResearch, an open-source large language model designed specifically for autonomous deep information-seeking research tasks.
2. 💡 Previous Research and New Ideas: Based on previous work in LLMs and agent systems, it introduces a novel end-to-end agentic training framework combining mid-training and post-training phases, along with automated data synthesis and customized environments.
3. ❓ Problem: The paper aims to develop an efficient, open-source AI research agent capable of conducting complex, multi-step reasoning and information-seeking tasks that would typically take humans several hours.
4. 🛠️ Methods: The authors used a combination of agentic mid-training, post-training, automated data synthesis pipeline, and stage-specific environments, built on a 30.5B parameter model that activates only 3.3B parameters per token.
5. 📊 Results and Evaluation: The model achieved state-of-the-art performance across multiple benchmarks, including 32.9 on Humanity's Last Exam, 43.4 on BrowseComp, 72.2 on WebWalkerQA, and others, outperforming both open-source and proprietary systems.
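The 30.5B-total / 3.3B-active parameter split comes from sparse mixture-of-experts routing: a router selects a few experts per token, so only their weights participate in that token's forward pass. A toy sketch of the idea, with illustrative sizes (8 tiny experts, top-1 routing) that are not the model's real configuration:

```python
import numpy as np

# Toy mixture-of-experts layer: total parameter count spans all experts,
# but each token only "activates" the experts its router selects.
rng = np.random.default_rng(0)
n_experts, k = 8, 1                # 8 experts, route each token to its top 1
d = 4                              # toy hidden size
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))

def moe_forward(x):
    """Route token x to its top-k experts; only those weights are active."""
    scores = x @ router
    top = np.argsort(scores)[-k:]              # indices of chosen experts
    out = sum(x @ experts[i] for i in top) / k
    active = k * d * d                          # expert params touched this token
    total = n_experts * d * d                   # expert params in the whole layer
    return out, active, total

x = rng.normal(size=d)
out, active, total = moe_forward(x)
print(f"active/total expert params: {active}/{total}")  # 16/128 in this toy
```

The same ratio logic, scaled up, is how a 30.5B model can run with roughly 3.3B parameters active per token.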

Figure: Tongyi DeepResearch training pipeline (reconstructed from the report's diagram)

- Base model: Qwen3-30B-A3B.
- Agentic mid-training: Agentic CPT stage 1 (32K context) and stage 2 (128K context); large-scale agent-behavior data covering question planning, reasoning, and decision-making; environment scaling and function-calling.
- Agentic post-training: high-quality data synthesis (graph construction, uncertainty injection); supervised fine-tuning in ReAct mode and context-management mode; agentic reinforcement learning in a simulated environment (Wikipedia RAG) and a real-world environment (search, visit, etc.) with integrated tools (Search, Visit, Python, Scholar, Parser), using the GRPO algorithm with on-policy RL and dynamic data curation; model merging via parameter interpolation across multiple variants.
- Result: Tongyi DeepResearch, 30.5B total / 3.3B active parameters, state-of-the-art performance.
- Context management: Markovian state reconstruction with dynamic summary updates, S(t) = π(S(t-1), a(t), o(t)), preventing context overflow in long-horizon tasks.
- Design principles: synthetic-data-centric scaling, environment-interaction learning, end-to-end agent training, automated data generation.
- Benchmark results: HLE 32.9, BrowseComp 43.4, GAIA 70.9, WebWalkerQA 72.2, xbench 75.0, FRAMES 90.6.
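The context-management rule above, S(t) = π(S(t-1), a(t), o(t)), can be sketched as a loop that rewrites a bounded summary state each step instead of appending every action and observation to an ever-growing prompt. A minimal sketch, assuming a placeholder `summarize` function in place of the model's actual learned state reconstruction:

```python
# Sketch of Markovian context management: the agent carries a bounded summary
# state S that is reconstructed each step from (previous state, action,
# observation), so the prompt never grows with the horizon.

MAX_STATE_CHARS = 200  # illustrative context budget

def summarize(prev_state, action, observation):
    """Placeholder for the LLM's state-reconstruction call π(...)."""
    merged = f"{prev_state} | did {action}, saw {observation}"
    # Keep only the most recent characters: a crude stand-in for the
    # model's learned compression of what still matters.
    return merged[-MAX_STATE_CHARS:]

def rollout(steps):
    state = "task: find the report's benchmark scores"
    for action, observation in steps:
        state = summarize(state, action, observation)  # S(t) = π(S(t-1), a(t), o(t))
        assert len(state) <= MAX_STATE_CHARS           # context never overflows
    return state

steps = [("search('HLE score')", "HLE: 32.9"),
         ("search('BrowseComp')", "BrowseComp: 43.4"),
         ("visit(arxiv)", "confirmed results table")]
final = rollout(steps)
print(len(final) <= MAX_STATE_CHARS)
```

The key property is that state size is bounded regardless of how many tool calls the rollout makes, which is what the report credits for stable long-horizon behavior.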
Q1
1. What is the most innovative aspect of Tongyi DeepResearch's training approach?
Using only post-training phase like other models
Combining agentic mid-training and post-training phases
Relying solely on human-annotated training data
Q2
2. How many total parameters does Tongyi DeepResearch have, and how many are actually activated per token?
30.5B total, with 30.5B activated per token
3.3B total, with 3.3B activated per token
30.5B total, with 3.3B activated per token
Q3
3. What unique feature of the model's data synthesis pipeline makes it more efficient than traditional approaches?
It requires extensive human annotation
It only works with small datasets
It is fully automated and requires no human annotation

Paper 3

Uniform Discrete Diffusion with Metric Path for Video Generation

Published: 2025-10-28

Link: http://arxiv.org/pdf/2510.24717

1. 📘 Topic and Domain: Video generation using discrete diffusion models in the computer vision and deep learning domain.
2. 💡 Previous Research and New Ideas: Based on continuous diffusion models and discrete tokenization approaches, proposes a novel framework called URSA that bridges discrete and continuous methods through iterative global refinement of discrete tokens.
3. ❓ Problem: Addresses the gap between discrete and continuous video generation approaches, particularly the challenges of error accumulation and long-context inconsistency in discrete methods.
4. 🛠️ Methods: Introduces Uniform discRete diffuSion with metric pAth (URSA) featuring a Linearized Metric Path and Resolution-dependent Timestep Shifting mechanism, along with asynchronous temporal fine-tuning for multi-task capabilities.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance with a text-to-video score of 82.4 on VBench, image-to-video score of 86.2, and text-to-image score of 86.0 on DPG-Bench, demonstrating competitive results against both discrete and continuous approaches.

Figure: URSA framework overview (reconstructed from the paper's diagram)

- Inputs: video/image input with text-prompt conditioning; discrete tokenization via Cosmos/IBQ tokenizers.
- Linearized Metric Path: p_t(x | x_1) = softmax(-β_t d(x, x_1)) with β_t = c · (t/(1-t))^α, controlling perturbation linearly in t.
- Resolution-dependent Timestep Shifting: t̃ = t/(t + λ(1-t)); λ > 1 gives stronger perturbation, λ < 1 more gradual, adapting the schedule to resolution.
- Asynchronous timestep scheduling: t_i ~ U(0,1) sampled independently for each frame, enabling multi-task learning through frame-wise independence.
- Iterative global refinement: from categorical noise x_0 ~ Unif([K]) through intermediate states x_t and refined states x_{t+1} to clean data x_1.
- Training: cross-entropy loss on predicted tokens, L = E[-log p(x_1 | x_t, e)], on an LLM backbone (Qwen3 architecture).
- Sampling: Euler solver for the velocity field with 25-50 iterative refinement steps.
- Results: text-to-video VBench 82.4; image-to-video VBench++ 86.2; text-to-image DPG-Bench 86.0; long-video generation of 40s and beyond.
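The two formulas above can be made concrete numerically. The sketch below evaluates the metric-path distribution p_t(x | x_1) = softmax(-β_t d(x, x_1)) and the timestep shift t̃ = t/(t + λ(1-t)); the values of c, α, λ and the index-distance metric d are illustrative choices, not the paper's tuned settings.

```python
import numpy as np

def metric_path(x1, K, t, c=1.0, alpha=1.0):
    """p_t(x | x_1) = softmax(-beta_t * d(x, x_1)) over a vocabulary of K tokens."""
    beta_t = c * (t / (1.0 - t)) ** alpha        # beta -> 0 as t -> 0, -> inf as t -> 1
    d = np.abs(np.arange(K) - x1).astype(float)  # toy metric: token-index distance
    logits = -beta_t * d
    p = np.exp(logits - logits.max())            # numerically stable softmax
    return p / p.sum()

def shift_timestep(t, lam):
    """t_tilde = t / (t + lam * (1 - t)); lam > 1 strengthens perturbation."""
    return t / (t + lam * (1.0 - t))

# Near t = 0 the path is close to uniform noise; near t = 1 it concentrates
# on the clean token x_1, which is the interpolation the figure describes.
p_early = metric_path(x1=5, K=10, t=0.01)
p_late = metric_path(x1=5, K=10, t=0.99)
print(p_early.max(), p_late.argmax())
```

Sweeping t from 0 to 1 traces the full noise-to-data path, and applying `shift_timestep` before the sweep reshapes where along that path the model spends its refinement steps.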
Q1
1. What is the main innovation of URSA that helps bridge the gap between discrete and continuous approaches?
The use of a large language model architecture
Iterative global refinement of discrete tokens
Increased model parameter count
Q2
2. Which feature allows URSA to handle multiple video generation tasks within a single model?
Resolution-dependent Timestep Shifting
Linearized Metric Path
Asynchronous temporal fine-tuning
Q3
3. What is the key advantage of URSA's approach compared to traditional discrete methods like autoregressive and masked diffusion models?
It requires more computational resources
It processes tokens sequentially
It allows refinement of already generated tokens