1. 📘 Topic and Domain: The paper focuses on enabling direct latent space collaboration between large language models in multi-agent systems, within the domain of natural language processing and artificial intelligence.
2. 💡 Previous Research and New Ideas: Building on prior research on text-based multi-agent LLM systems and single-model latent reasoning, this paper proposes LatentMAS, a novel framework that enables pure latent-space collaboration among multiple LLM agents without any text-based mediation.
3. ❓ Problem: The paper aims to overcome the inefficiencies and information bottlenecks of text-based collaboration between LLM agents by enabling them to collaborate directly in continuous latent space rather than through natural language.
4. 🛠️ Methods: The paper introduces LatentMAS, an end-to-end, training-free framework that combines auto-regressive latent thought generation via each agent's last-layer hidden embeddings with cross-agent latent working memory transfer via shared KV caches.
5. 📊 Results and Evaluation: Across 9 benchmarks spanning math, science, commonsense reasoning, and code generation, LatentMAS achieved up to 14.6% higher accuracy, reduced output token usage by 70.8%-83.7%, and delivered 4x-4.3x faster end-to-end inference compared to baselines.
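The two mechanisms in point 4 can be sketched with a toy model. This is a minimal illustrative sketch, not the paper's implementation: the linear "layer," the function names, and the cache layout are all stand-ins, and a real system would use an actual transformer's hidden states and attention KV caches.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # hidden size of the toy model (hypothetical)

# Stand-ins for a transformer's key/value/output projections (illustrative only).
W_k, W_v, W_h = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

def step(x, kv_cache):
    """One decoding step: append this position's key/value pair to the
    cache and return a 'last-layer hidden state' (here a linear map + tanh)."""
    kv_cache.append((W_k @ x, W_v @ x))
    return np.tanh(W_h @ x)

def latent_thoughts(x0, n_steps, kv_cache):
    """Auto-regressive latent reasoning: instead of sampling a token and
    re-embedding it, feed the last hidden state back as the next input."""
    h = x0
    for _ in range(n_steps):
        h = step(h, kv_cache)
    return h

# Agent A "thinks" in latent space, filling its working memory (KV cache).
cache_a = []
h_a = latent_thoughts(rng.standard_normal(D), n_steps=4, kv_cache=cache_a)

# Agent B is seeded with A's KV cache (latent working memory transfer),
# so it conditions on A's thoughts with no text round-trip in between.
cache_b = list(cache_a)
h_b = latent_thoughts(h_a, n_steps=2, kv_cache=cache_b)

print(len(cache_a), len(cache_b))  # A wrote 4 entries; B extends them to 6
```

The sketch shows why the approach avoids the text bottleneck: agent B receives A's full latent working memory rather than a lossy natural-language summary, and no tokens are decoded at all during the hand-off.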