1. 📘 Topic and Domain: The paper presents LongCat-Flash-Thinking-2601, an open-source 560-billion-parameter Mixture-of-Experts (MoE) model built for agentic reasoning, within the domain of large language models.
2. 💡 Previous Research and New Ideas: The paper builds on LongCat-Flash-Chat's pre-training recipe and extends it with novel ideas, including environment scaling for multi-domain training, robust training under noisy environments, and a Heavy Thinking mode that jointly scales reasoning depth and width at test time (a sketch of one plausible realization follows this list).
3. ❓ Problem: The paper aims to enable models to perform complex real-world tasks through adaptive interaction with external environments, addressing existing models' weaknesses on long-horizon trajectories and in interacting with heterogeneous environments.
4. 🛠️ Methods: The authors use a unified training framework that combines domain-parallel expert training, scalable environment construction across more than 20 domains, asynchronous reinforcement learning via the DORA system, curriculum-based noise injection (also sketched after this list), and a two-stage Heavy Thinking mode for test-time scaling.
5. 📊 Results and Evaluation: LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on agentic benchmarks (73.1% on BrowseComp, 77.7% on RWSearch, 88.2% on τ2-Bench, and 29.3% on VitaBench) while remaining competitive on general reasoning tasks, demonstrating strong generalization and robustness to real-world noise.
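
To make the Heavy Thinking mode from items 2 and 4 concrete, here is a minimal sketch of one plausible two-stage realization: "width" as the number of independently sampled reasoning traces, "depth" as the per-call token budget, and a synthesis pass as the second stage. The `generate` function, the prompts, and the `width`/`depth_tokens` parameters are all assumptions for illustration, not the paper's actual implementation.

```python
import concurrent.futures

def generate(prompt: str, max_tokens: int) -> str:
    # Hypothetical stand-in for the model-serving API (an assumption, not
    # the paper's interface); a real inference client call would go here.
    return f"[model output for a {len(prompt)}-char prompt, budget {max_tokens}]"

def heavy_thinking(question: str, width: int = 8, depth_tokens: int = 32_000) -> str:
    # Stage 1 (width): sample several independent reasoning traces in parallel.
    stage1_prompt = f"Think step by step and answer:\n{question}"
    with concurrent.futures.ThreadPoolExecutor(max_workers=width) as pool:
        traces = list(pool.map(
            lambda _: generate(stage1_prompt, max_tokens=depth_tokens),
            range(width),
        ))
    # Stage 2 (depth): one long-budget synthesis pass that cross-checks the
    # drafts and resolves disagreements before committing to an answer.
    drafts = "\n\n".join(f"Draft {i + 1}:\n{t}" for i, t in enumerate(traces))
    stage2_prompt = (
        f"Question:\n{question}\n\nCandidate reasoning drafts:\n{drafts}\n\n"
        "Cross-check the drafts, resolve disagreements, and give one final answer."
    )
    return generate(stage2_prompt, max_tokens=depth_tokens)

if __name__ == "__main__":
    print(heavy_thinking("What is 17 * 24?", width=4, depth_tokens=8_000))
```

The point of the two-stage structure is that width and depth are scaled jointly: more parallel drafts only help if the second stage has enough token budget to reconcile them.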
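
Similarly, the curriculum-based noise injection from item 4 could look like the following environment wrapper, which starts training on clean tool observations and linearly ramps up the corruption probability. The wrapper class, noise modes, and linear schedule are hypothetical; the paper's actual noise taxonomy and curriculum are not reproduced here.

```python
import random

class NoisyEnvWrapper:
    """Wraps an agent environment and corrupts tool observations with a
    probability that ramps up linearly over training (the linear schedule
    and noise modes are assumptions for illustration)."""

    def __init__(self, env, max_noise_rate: float = 0.3, warmup_steps: int = 10_000):
        self.env = env
        self.max_noise_rate = max_noise_rate
        self.warmup_steps = warmup_steps
        self.global_step = 0

    def _noise_rate(self) -> float:
        # Linear curriculum: start clean, approach max_noise_rate.
        return self.max_noise_rate * min(1.0, self.global_step / self.warmup_steps)

    def _corrupt(self, observation: str) -> str:
        # Illustrative noise modes: truncated output, tool error, empty reply.
        mode = random.choice(["truncate", "error", "empty"])
        if mode == "truncate":
            return observation[: max(1, len(observation) // 2)]
        if mode == "error":
            return "ToolError: upstream service timed out"
        return ""

    def step(self, action):
        self.global_step += 1
        observation, reward, done = self.env.step(action)
        if random.random() < self._noise_rate():
            observation = self._corrupt(observation)
        return observation, reward, done

class EchoToolEnv:
    """Toy stand-in environment: 'executing' a tool call just echoes it."""
    def step(self, action):
        return f"result of {action}", 0.0, False

if __name__ == "__main__":
    env = NoisyEnvWrapper(EchoToolEnv(), max_noise_rate=0.5, warmup_steps=100)
    for i in range(3):
        obs, _, _ = env.step(f"search(query_{i})")
        print(obs)
```

Wrapping the environment rather than the policy keeps the RL loop unchanged: the agent simply sees progressively noisier observations, which is one way to train the robustness to real-world noise that item 5 reports.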