2025-10-27 Papers


Paper 1

DeepAgent: A General Reasoning Agent with Scalable Toolsets

Published: 2025-10-24

Link: http://arxiv.org/pdf/2510.21618

1. 📘 Topic and Domain: The paper introduces DeepAgent, an end-to-end deep reasoning agent that can autonomously use various tools and interact with environments, falling within the domain of AI agents and large language models.
2. 💡 Previous Research and New Ideas: Based on previous work in LLM-powered agents like ReAct and Plan-and-Solve, it proposes a novel unified reasoning process that integrates tool discovery and execution, moving away from predefined workflows.
3. ❓ Problem: The paper addresses the limitations of existing agent frameworks that rely on predefined workflows and limited tool sets, which constrains their ability to handle real-world tasks requiring flexible tool use and long-horizon interactions.
4. 🛠️ Methods: The paper implements autonomous memory folding to compress interaction history, uses a brain-inspired memory architecture (episodic, working, and tool memories), and develops ToolPO, an end-to-end reinforcement learning strategy trained against LLM-simulated APIs.
5. 📊 Results and Evaluation: DeepAgent consistently outperformed baseline methods across eight benchmarks, including general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE), demonstrating superior performance in both labeled-tool and open-set tool retrieval scenarios.
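The brain-inspired memory in point 4 can be sketched as a minimal data shape. This is an illustrative stand-in: the paper performs folding with an auxiliary LLM, whereas this toy version aggregates deterministically, and the field contents and the `fold_history` helper are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class FoldedMemory:
    """Brain-inspired structured memory (names from the paper; fields illustrative)."""
    episodic: list = field(default_factory=list)   # key events and decisions so far
    working: str = ""                              # current sub-goal / active context
    tools: dict = field(default_factory=dict)      # tool name -> times used

def fold_history(history: list, memory: FoldedMemory) -> FoldedMemory:
    """Toy stand-in for DeepAgent's folding step (done by an auxiliary LLM
    in the paper); here we just summarize the raw interaction log."""
    for step in history:
        if step["action"] == "call":
            memory.tools[step["tool"]] = memory.tools.get(step["tool"], 0) + 1
        memory.episodic.append(f'{step["action"]}: {step.get("summary", "")}')
    if history:
        memory.working = history[-1].get("summary", "")
    return memory

mem = fold_history(
    [{"action": "search", "summary": "found weather API"},
     {"action": "call", "tool": "get_weather", "summary": "queried Paris forecast"}],
    FoldedMemory(),
)
```

The point of the structure is that a long raw history can be replaced by this compact object, keeping the context window small across long-horizon interactions.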

DeepAgent methodology flow chart (figure summary):
- Training data: ToolBench, ALFWorld, WebShop, DeepMath
- Architecture: a main reasoning process plus an auxiliary LLM, running a unified thinking process on a QwQ-32B backbone
- Tool management: dynamic tool search and tool-call execution
- Memory folding (JSON schema) into episodic, working, and tool memories
- Four action types: Think (internal reasoning), Search (tool discovery), Call (tool execution), Fold (memory compression)
- ToolPO: end-to-end reinforcement learning with LLM-simulated APIs, tool-call advantage, fine-grained credit assignment, and a clipped surrogate objective
- Evaluation benchmarks: general tool-use tasks (ToolBench, API-Bank, TMDB, Spotify, ToolHop) and downstream applications (ALFWorld, WebShop, GAIA, HLE)
- Key innovations: end-to-end reasoning with dynamic tool discovery; autonomous memory folding; brain-inspired structured memory (episodic, working, tool); ToolPO training with LLM-simulated APIs; fine-grained advantage attribution for tool calls; scales to 16k+ tools with superior performance
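The four action types (Think, Search, Call, Fold) suggest a simple control loop. The sketch below uses a hand-written toy policy, not the paper's learned behavior; the state fields and the fold threshold are assumptions for illustration.

```python
ACTIONS = ("think", "search", "call", "fold")  # the four action types from the figure

def toy_agent_step(state: dict) -> str:
    """Illustrative hand-coded policy; the real agent emits these actions
    autoregressively from its reasoning backbone."""
    if len(state["history"]) >= state["fold_threshold"]:
        return "fold"                 # history too long: compress it
    if not state["history"]:
        return "think"                # start with internal reasoning
    if not state["tools_found"]:
        return "search"               # discover a tool before calling one
    return "call"

state = {"history": [], "tools_found": [], "fold_threshold": 4}
trace = []
for _ in range(6):
    action = toy_agent_step(state)
    trace.append(action)
    state["history"].append(action)
    if action == "search":
        state["tools_found"].append("toy_tool")
    elif action == "fold":
        state["history"].clear()      # folding compresses the interaction history
```

Note how folding resets the raw history mid-episode; in DeepAgent the cleared context would survive inside the structured memory rather than being discarded.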
Q1. What is the main innovation of DeepAgent compared to traditional agent frameworks?
A) It uses more advanced language models
B) It integrates tool discovery and execution within a single reasoning process
C) It can handle more types of tasks

Q2. Which component helps DeepAgent manage long interaction histories efficiently?
A) Autonomous memory folding mechanism
B) Tool simulator
C) Web search capability

Q3. What is the purpose of using LLM-simulated APIs in the ToolPO training strategy?
A) To reduce computational costs
B) To improve model accuracy
C) To enable more stable and efficient training without relying on unstable real-world APIs

Paper 2

Efficient Long-context Language Model Training by Core Attention Disaggregation

Published: 2025-10-20

Link: http://arxiv.org/pdf/2510.18121

1. 📘 Topic and Domain: The paper presents a technique called Core Attention Disaggregation (CAD) for improving long-context Large Language Model training efficiency.
2. 💡 Previous Research and New Ideas: Based on previous work in data parallelism and pipeline parallelism for LLM training, it proposes the novel idea of separating core attention computation from other model components to enable independent scaling.
3. ❓ Problem: The paper addresses the load imbalance issue in long-context LLM training caused by the quadratic growth of attention computation versus linear growth of other components.
4. 🛠️ Methods: The authors implemented CAD in a system called DistCA that uses token-level task scheduling, ping-pong execution for overlapping communication with computation, and in-place attention servers to optimize memory usage.
5. 📊 Results and Evaluation: Testing on up to 512 H200 GPUs with 512K context length showed up to 1.35x improvement in training throughput while eliminating data/pipeline parallelism stragglers and maintaining balanced compute and memory usage.
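A back-of-the-envelope FLOP count shows why the imbalance in point 3 grows with context length. The constant factors below are rough textbook conventions, not figures from the paper:

```python
def attention_flops(n: int, d: int) -> int:
    # Core attention (QK^T plus weights @ V): ~2 matmuls of n x n x d,
    # at 2 FLOPs per multiply-add -> ~4 * n^2 * d.
    return 4 * n * n * d

def mlp_flops(n: int, d: int, expansion: int = 4) -> int:
    # Two linear layers with a 4x hidden expansion: ~4 * n * d * (4d).
    return 4 * n * d * expansion * d

d = 4096
for n in (8_192, 131_072, 524_288):  # 8K, 128K, 512K tokens
    ratio = attention_flops(n, d) / mlp_flops(n, d)
    print(f"{n:>9} tokens: attention/MLP FLOP ratio ~ {ratio:.1f}")
```

Under these assumptions the ratio simplifies to n / (4d), so attention goes from a fraction of the MLP cost at 8K tokens to dominating it by roughly 32x at 512K, which is exactly the straggler-inducing imbalance CAD targets.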

Core Attention Disaggregation (CAD) workflow (figure summary):
- Problem: load imbalance in long-context training
- Key observations: core attention is (1) stateless and (2) composable
- Method: Core Attention Disaggregation, implemented in the DistCA system
- Pipeline: document processing → CA-task generation → communication-aware scheduler → attention servers (in-place servers, ping-pong execution, pipeline support)
- Core attention computation: softmax(QK^T)V
- Results (512 H200 GPUs, 512K context length): near-perfect load balance across servers, fully hidden communication overhead, up to 1.35x throughput improvement, and elimination of DP/PP stragglers
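The statelessness and composability observations can be checked directly: given the full K and V, attention over any chunk of query tokens is independent of the other chunks, so CA-tasks can be split and rebalanced at token granularity. A minimal NumPy check with toy sizes (not the DistCA implementation):

```python
import numpy as np

def core_attention(q, k, v):
    """Stateless core attention: softmax(Q K^T / sqrt(d)) V."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = rng.normal(size=(3, n, d))

full = core_attention(q, k, v)
# Composability along the query/token axis: each query chunk is an
# independent CA-task that could run on any attention server.
chunks = [core_attention(q[i:i + 2], k, v) for i in range(0, n, 2)]
assert np.allclose(full, np.concatenate(chunks))
```

Because each chunk needs no state beyond its own Q rows (plus K and V), a scheduler is free to pack these tasks onto servers purely by compute load, which is what makes the near-perfect balance in the figure possible.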
Q1. What is the main problem that Core Attention Disaggregation (CAD) aims to solve?
A) High memory usage in LLM training
B) Load imbalance between quadratic attention and linear components
C) Slow communication between GPUs

Q2. What unique characteristic of core attention makes CAD effective?
A) Its ability to run on multiple GPUs simultaneously
B) Its requirement for large memory allocation
C) Its statelessness and composability at the token level

Q3. What was the maximum throughput improvement achieved by DistCA in the experiments?
A) 1.35x
B) 2.5x
C) 1.15x

Paper 3

Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1

Published: 2025-10-22

Link: http://arxiv.org/pdf/2510.19600

1. 📘 Topic and Domain: Automated generation of academic project webpages from research papers using AI agents, in the domain of scientific communication and natural language processing.
2. 💡 Previous Research and New Ideas: Based on prior work in automated presentation generation (slides, posters, videos), this paper introduces a novel multi-agent collaborative approach for webpage generation with human-in-the-loop refinement.
3. ❓ Problem: Researchers spend significant time manually creating project webpages to communicate their work, which takes away from core research activities and results in inconsistent quality.
4. 🛠️ Methods: Implements AutoPage, a multi-agent system with three phases: narrative planning, multimodal content generation, and interactive page rendering, incorporating verification mechanisms and optional human feedback checkpoints.
5. 📊 Results and Evaluation: AutoPage generates high-quality webpages in under 15 minutes for less than $0.1, outperforming baselines across content accuracy, visual quality, and user preference metrics as evaluated on their new PageBench benchmark.

AutoPage: human-agent collaborative paper-to-page crafting (figure summary):
- Input: paper (PDF); Output: project page (HTML)
- Phase 1, narrative planning & structuring: paper content parser → page content planner → content checker
- Phase 2, multimodal content generation: text content generator → visual content generator → content checker
- Phase 3, interactive page rendering: page template matcher → HTML generator → HTML checker
- Optional human-feedback checkpoint after each phase
- Key features: coarse-to-fine pipeline (hierarchical generation from narrative to visual elements); specialized multi-agent system with one agent per phase; checker agents to prevent hallucination; optional human checkpoints for author alignment; model agnostic
- Performance highlights: under 15 minutes and under $0.1 per page; PageBench dataset of 1500+ pages; 7.16/10 user preference
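The checker-gated, three-phase flow above can be expressed as a short control loop. Everything here is a toy stand-in: the f-string "generators" and the assert "checker" replace the paper's LLM agents, and the `human_feedback` callback models the optional checkpoints.

```python
def run_pipeline(paper_text: str, human_feedback=None):
    """Toy sketch of AutoPage's three-phase, checker-gated pipeline."""
    phases = ["narrative_planning", "content_generation", "page_rendering"]
    artifact = paper_text
    log = []
    for phase in phases:
        artifact = f"{phase}({artifact})"      # generation agent for this phase
        assert artifact.startswith(phase)      # checker agent gates the output
        if human_feedback is not None:         # optional human checkpoint
            artifact = human_feedback(phase, artifact)
        log.append(phase)
    return artifact, log

page, log = run_pipeline("paper.pdf")
```

The design point the sketch captures is that each phase's output is validated (and optionally human-edited) before the next phase consumes it, so errors do not silently propagate from plan to final HTML.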
Q1. What is the main innovation of AutoPage compared to previous automated research communication tools?
A) It is the first system to generate any kind of research presentation
B) It incorporates human feedback and verification in a multi-agent collaborative process
C) It can generate webpages completely automatically without any human input

Q2. What are the approximate cost and time required for AutoPage to generate a project webpage?
A) 1 hour and $1.00
B) 30 minutes and $0.50
C) 15 minutes and $0.10

Q3. Which phase of AutoPage's pipeline comes first in the generation process?
A) Interactive page rendering
B) Multimodal content generation
C) Narrative planning and structuring