2025-10-27 Papers


Paper 1

DeepAgent: A General Reasoning Agent with Scalable Toolsets

Published: 2025-10-24

Link: http://arxiv.org/pdf/2510.21618

1. 📘 Topic and Domain: The paper introduces DeepAgent, an end-to-end deep reasoning agent that can autonomously use various tools and interact with environments, falling within the domain of AI agents and large language models.
2. 💡 Previous Research and New Ideas: Based on previous work in LLM-powered agents like ReAct and Plan-and-Solve, it proposes a novel unified reasoning process that integrates tool discovery and execution, moving away from predefined workflows.
3. ❓ Problem: The paper addresses the limitations of existing agent frameworks that rely on predefined workflows and limited tool sets, which constrains their ability to handle real-world tasks requiring flexible tool use and long-horizon interactions.
4. 🛠️ Methods: The paper implements autonomous memory folding to compress interaction history, uses a brain-inspired memory architecture (episodic, working, and tool memories), and develops ToolPO, an end-to-end reinforcement learning strategy trained against LLM-simulated APIs.
5. 📊 Results and Evaluation: DeepAgent consistently outperformed baseline methods across eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrating superior performance in both labeled-tool and open-set tool retrieval scenarios.
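The brain-inspired memory in point 4 can be sketched as a minimal data shape. This is an illustrative stand-in: the paper performs folding with an auxiliary LLM, whereas this toy version aggregates deterministically, and the field contents and the `fold_history` helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FoldedMemory:
    """Brain-inspired structured memory (names from the paper; fields illustrative)."""
    episodic: list = field(default_factory=list)   # key events and decisions so far
    working: str = ""                              # current sub-goal / active context
    tools: dict = field(default_factory=dict)      # tool name -> times used

def fold_history(history: list, memory: FoldedMemory) -> FoldedMemory:
    """Toy stand-in for DeepAgent's folding step (done by an auxiliary LLM
    in the paper); here we just summarize the raw interaction log."""
    for step in history:
        if step["action"] == "call":
            memory.tools[step["tool"]] = memory.tools.get(step["tool"], 0) + 1
        memory.episodic.append(f'{step["action"]}: {step.get("summary", "")}')
    if history:
        memory.working = history[-1].get("summary", "")
    return memory

mem = fold_history(
    [{"action": "search", "summary": "found weather API"},
     {"action": "call", "tool": "get_weather", "summary": "queried Paris forecast"}],
    FoldedMemory(),
)
```

The point of the structure is that a long raw history can be replaced by this compact object, keeping the context window small across long-horizon interactions.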

DeepAgent methodology flow chart (figure summary):
- Training data: ToolBench, ALFWorld, WebShop, DeepMath
- Architecture: a main reasoning process plus an auxiliary LLM, running a unified thinking process on a QwQ-32B backbone
- Tool management: dynamic tool search and tool-call execution
- Memory folding (JSON schema) into episodic, working, and tool memories
- Four action types: Think (internal reasoning), Search (tool discovery), Call (tool execution), Fold (memory compression)
- ToolPO: end-to-end reinforcement learning with LLM-simulated APIs, tool-call advantage, fine-grained credit assignment, and a clipped surrogate objective
- Evaluation benchmarks: general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE)
- Key innovations: end-to-end reasoning with dynamic tool discovery; autonomous memory folding; brain-inspired structured memory (episodic, working, tool); ToolPO training with LLM-simulated APIs; fine-grained advantage attribution for tool calls; scales to 16k+ tools with superior performance
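The four action types (Think, Search, Call, Fold) suggest a simple control loop. The sketch below uses a hand-written toy policy, not the paper's learned behavior; the state fields and the fold threshold are assumptions for illustration.

```python
ACTIONS = ("think", "search", "call", "fold")  # the four action types from the figure

def toy_agent_step(state: dict) -> str:
    """Illustrative hand-coded policy; the real agent emits these actions
    autoregressively from its reasoning backbone."""
    if len(state["history"]) >= state["fold_threshold"]:
        return "fold"                 # history too long: compress it
    if not state["history"]:
        return "think"                # start with internal reasoning
    if not state["tools_found"]:
        return "search"               # discover a tool before calling one
    return "call"

state = {"history": [], "tools_found": [], "fold_threshold": 4}
trace = []
for _ in range(6):
    action = toy_agent_step(state)
    trace.append(action)
    state["history"].append(action)
    if action == "search":
        state["tools_found"].append("toy_tool")
    elif action == "fold":
        state["history"].clear()      # folding compresses the interaction history
```

Note how folding resets the raw history mid-episode; in DeepAgent the cleared context would survive inside the structured memory rather than being discarded.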
Q1. What is the main innovation of DeepAgent compared to traditional agent frameworks?
A) It uses more advanced language models
B) It integrates tool discovery and execution within a single reasoning process
C) It can handle more types of tasks

Q2. Which component helps DeepAgent manage long interaction histories efficiently?
A) Autonomous memory folding mechanism
B) Tool simulator
C) Web search capability

Q3. What is the purpose of using LLM-simulated APIs in the ToolPO training strategy?
A) To reduce computational costs
B) To improve model accuracy
C) To enable more stable and efficient training without relying on unstable real-world APIs

Paper 2

Efficient Long-context Language Model Training by Core Attention Disaggregation

Published: 2025-10-20

Link: http://arxiv.org/pdf/2510.18121

1. 📘 Topic and Domain: The paper presents a technique called Core Attention Disaggregation (CAD) for improving long-context Large Language Model training efficiency.
2. 💡 Previous Research and New Ideas: Based on previous work in data parallelism and pipeline parallelism for LLM training, it proposes the novel idea of separating core attention computation from other model components to enable independent scaling.
3. ❓ Problem: The paper addresses the load imbalance issue in long-context LLM training caused by the quadratic growth of attention computation versus linear growth of other components.
4. 🛠️ Methods: The authors implemented CAD in a system called DistCA that uses token-level task scheduling, ping-pong execution for overlapping communication with computation, and in-place attention servers to optimize memory usage.
5. 📊 Results and Evaluation: Testing on up to 512 H200 GPUs with 512K context length showed up to 1.35x improvement in training throughput while eliminating data/pipeline parallelism stragglers and maintaining balanced compute and memory usage.
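A back-of-the-envelope FLOP count shows why the imbalance in point 3 grows with context length. The constant factors below are rough textbook conventions, not figures from the paper:

```python
def attention_flops(n: int, d: int) -> int:
    # Core attention (QK^T plus weights @ V): ~2 matmuls of n x n x d,
    # at 2 FLOPs per multiply-add -> ~4 * n^2 * d.
    return 4 * n * n * d

def mlp_flops(n: int, d: int, expansion: int = 4) -> int:
    # Two linear layers with a 4x hidden expansion: ~4 * n * d * (4d).
    return 4 * n * d * expansion * d

d = 4096
for n in (8_192, 131_072, 524_288):  # 8K, 128K, 512K tokens
    ratio = attention_flops(n, d) / mlp_flops(n, d)
    print(f"{n:>9} tokens: attention/MLP FLOP ratio ~ {ratio:.1f}")
```

Under these assumptions the ratio simplifies to n / (4d), so attention goes from a fraction of the MLP cost at 8K tokens to dominating it by roughly 32x at 512K, which is exactly the straggler-inducing imbalance CAD targets.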

Core Attention Disaggregation (CAD) workflow (figure summary):
- Problem: load imbalance in long-context training
- Key observations: core attention is (1) stateless and (2) composable
- Method: Core Attention Disaggregation, implemented in the DistCA system
- Pipeline: document processing → CA-task generation → communication-aware scheduler → attention servers (in-place servers, ping-pong execution, pipeline support)
- Core attention computation: softmax(QK^T)V
- Results (512 H200 GPUs, 512K context length): near-perfect load balance across servers, fully hidden communication overhead, up to 1.35x throughput improvement, and elimination of DP/PP stragglers
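The statelessness and composability observations can be checked directly: given the full K and V, attention over any chunk of query tokens is independent of the other chunks, so CA-tasks can be split and rebalanced at token granularity. A minimal NumPy check with toy sizes (not the DistCA implementation):

```python
import numpy as np

def core_attention(q, k, v):
    """Stateless core attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.normal(size=(3, n, d))

full = core_attention(q, k, v)
# Composability along the query/token axis: each query chunk is an
# independent CA-task that could run on any attention server.
chunks = [core_attention(q[i:i + 2], k, v) for i in range(0, n, 2)]
assert np.allclose(full, np.concatenate(chunks))
```

Because each chunk needs no state beyond its own Q rows (plus K and V), a scheduler is free to pack these tasks onto servers purely by compute load, which is what makes the near-perfect balance in the figure possible.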
Q1. What is the main problem that Core Attention Disaggregation (CAD) aims to solve?
A) High memory usage in LLM training
B) Load imbalance between quadratic attention and linear components
C) Slow communication between GPUs

Q2. What unique characteristic of core attention makes CAD effective?
A) Its ability to run on multiple GPUs simultaneously
B) Its requirement for large memory allocation
C) Its statelessness and composability at the token level

Q3. What was the maximum throughput improvement achieved by DistCA in the experiments?
A) 1.35x
B) 2.5x
C) 1.15x

Paper 3

Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1

Published: 2025-10-22

Link: http://arxiv.org/pdf/2510.19600

1. 📘 Topic and Domain: Automated generation of academic project webpages from research papers using AI agents, in the domain of scientific communication and natural language processing.
2. 💡 Previous Research and New Ideas: Based on prior work in automated presentation generation (slides, posters, videos), this paper introduces a novel multi-agent collaborative approach for webpage generation with human-in-the-loop refinement.
3. ❓ Problem: Researchers spend significant time manually creating project webpages to communicate their work, which takes away from core research activities and results in inconsistent quality.
4. 🛠️ Methods: Implements AutoPage, a multi-agent system with three phases: narrative planning, multimodal content generation, and interactive page rendering, incorporating verification mechanisms and optional human feedback checkpoints.
5. 📊 Results and Evaluation: AutoPage generates high-quality webpages in under 15 minutes for less than $0.1, outperforming baselines across content accuracy, visual quality, and user preference metrics as evaluated on their new PageBench benchmark.

AutoPage: human-agent collaborative paper-to-page crafting (figure summary):
- Input: paper (PDF); Output: project page (HTML)
- Phase 1, narrative planning & structuring: paper content parser → page content planner → content checker
- Phase 2, multimodal content generation: text content generator → visual content generator → content checker
- Phase 3, interactive page rendering: page template matcher → HTML generator → HTML checker
- Optional human-feedback checkpoint after each phase
- Key features: coarse-to-fine pipeline (hierarchical generation from narrative to visual elements); specialized multi-agent system with one agent per phase; checker agents to prevent hallucination; optional human checkpoints for author alignment; model agnostic
- Performance highlights: under 15 minutes and under $0.1 per page; PageBench dataset of 1500+ pages; 7.16/10 user preference
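The checker-gated, three-phase flow above can be expressed as a short control loop. Everything here is a toy stand-in: the f-string "generators" and the assert "checker" replace the paper's LLM agents, and the `human_feedback` callback models the optional checkpoints.

```python
def run_pipeline(paper_text: str, human_feedback=None):
    """Toy sketch of AutoPage's three-phase, checker-gated pipeline."""
    phases = ["narrative_planning", "content_generation", "page_rendering"]
    artifact = paper_text
    log = []
    for phase in phases:
        artifact = f"{phase}({artifact})"      # generation agent for this phase
        assert artifact.startswith(phase)      # checker agent gates the output
        if human_feedback is not None:         # optional human checkpoint
            artifact = human_feedback(phase, artifact)
        log.append(phase)
    return artifact, log

page, log = run_pipeline("paper.pdf")
```

The design point the sketch captures is that each phase's output is validated (and optionally human-edited) before the next phase consumes it, so errors do not silently propagate from plan to final HTML.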
Q1. What is the main innovation of AutoPage compared to previous automated research communication tools?
A) It is the first system to generate any kind of research presentation
B) It incorporates human feedback and verification in a multi-agent collaborative process
C) It can generate webpages completely automatically without any human input

Q2. What are the approximate cost and time required for AutoPage to generate a project webpage?
A) 1 hour and $1.00
B) 30 minutes and $0.50
C) 15 minutes and $0.10

Q3. Which phase of AutoPage's pipeline comes first in the generation process?
A) Interactive page rendering
B) Multimodal content generation
C) Narrative planning and structuring