2025-11-27 Papers


Paper 1

ROOT: Robust Orthogonalized Optimizer for Neural Network Training

Published: 2025-11-25

Link: http://arxiv.org/pdf/2511.20626

1. 📘 Topic and Domain: Development of a robust optimization algorithm (ROOT) for training large language models, focusing on improving stability and efficiency in deep learning optimization.
2. 💡 Previous Research and New Ideas: Builds on the Muon optimizer and Newton-Schulz iteration methods, proposing dimension-adaptive coefficients for matrix orthogonalization together with an outlier-suppression mechanism.
3. ❓ Problem: Addressing two key limitations in existing optimizers: dimensional fragility in orthogonalization precision and vulnerability to outlier-induced noise during training.
4. 🛠️ Methods: Implements adaptive Newton iteration with dimension-specific coefficients for robust orthogonalization, and soft-thresholding for outlier suppression in gradient updates.
5. 📊 Results and Evaluation: Achieved superior performance across academic benchmarks compared to Muon and AdamW baselines, with improved convergence speed and training stability, demonstrating an average accuracy of 60.12% across various tasks.
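The soft-thresholding step named in the methods can be sketched in a few lines of NumPy. The threshold ε = 1.0 below is purely illustrative; the paper selects the threshold from a percentile of the update statistics:

```python
import numpy as np

def soft_threshold(x, eps):
    """Elementwise soft-thresholding: T_eps[x]_i = sign(x_i) * max(|x_i| - eps, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

# Outlier separation in the style of ROOT: O_t = T_eps[M_t], B_t = M_t - O_t
M = np.array([0.1, -0.2, 3.0, -4.0])   # momentum with two large outlier entries
eps = 1.0                              # illustrative threshold
O = soft_threshold(M, eps)             # outlier part: only the magnitude beyond eps
B = M - O                              # bounded part: every entry clipped to [-eps, eps]
```

Only the bounded component B would then be orthogonalized, which keeps extreme entries from dominating the update.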

ROOT workflow overview

Problem identification
• Dimensional fragility in orthogonalization precision
• Vulnerability to outlier-induced noise

Adaptive Newton iteration with dimension-specific coefficients:
X_k = a(m,n) X_{k-1} + b(m,n) X_{k-1} (X_{k-1}^T X_{k-1}) + c(m,n) X_{k-1} (X_{k-1}^T X_{k-1})^2

Proximal optimization: soft-thresholding for outlier suppression,
T_ε[x]_i = sign(x_i) · max(|x_i| − ε, 0), with momentum decomposition M_t = B_t + O_t

ROOT algorithm
1. Compute gradient G_t
2. Update momentum: M_t = μ M_{t-1} + G_t
3. Separate outliers: O_t = T_ε[M_t], B_t = M_t − O_t
4. Robust orthogonalization: B_t^orth = AdaNewton(B_t)
5. Update parameters: θ_t = θ_{t-1} − η B_t^orth

Experimental setup and evaluation
• FineWeb-Edu dataset; 1B-parameter Transformer; 10B/100B training tokens
• Zero-shot benchmarks: HellaSwag, ARC, BoolQ, PIQA, SciQ, WINO
• Ablations: threshold sensitivity, coefficient calibration, vision-task generalization

Key results and contributions
• Theoretical guarantee E^(m,n) ≤ E^std: provably better orthogonalization
• 60.12% average accuracy vs. 59.59% (Muon) and 59.05% (AdamW)
• Superior convergence speed and robustness in noisy scenarios
• Contributions: algorithmic robustness, optimization robustness, a unified framework, extensive validation
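As a sanity check, the update loop can be sketched in NumPy. The fixed Newton-Schulz coefficients below are the widely used Muon defaults, serving only as stand-ins: ROOT's contribution is precisely that a, b, c are calibrated per matrix shape (m, n), which is not reproduced here, and the learning rate, momentum, and threshold are illustrative.

```python
import numpy as np

def adaptive_newton_orth(G, coeffs=(3.4445, -4.7750, 2.0315), steps=5):
    """Newton-Schulz-style orthogonalization: X_k = a*X + b*X(X^T X) + c*X(X^T X)^2.
    The fixed (a, b, c) are the Muon defaults; ROOT instead calibrates them
    per matrix shape (m, n)."""
    a, b, c = coeffs
    X = G / (np.linalg.norm(G) + 1e-7)   # normalize so the iteration converges
    for _ in range(steps):
        A = X.T @ X
        X = a * X + X @ (b * A + c * (A @ A))
    return X

def root_step(theta, grad, M, mu=0.95, lr=0.02, eps=1.0):
    """One ROOT-style update (sketch; hyperparameters are illustrative)."""
    M = mu * M + grad                                   # 2. momentum
    O = np.sign(M) * np.maximum(np.abs(M) - eps, 0.0)   # 3. outliers O_t = T_eps[M_t]
    B = M - O                                           #    bounded part B_t
    return theta - lr * adaptive_newton_orth(B), M      # 4.-5. orthogonalize, step
```

After a few iterations the singular values of the orthogonalized matrix cluster near 1, which is what makes the update scale-robust.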
Q1. What are the two main robustness limitations that ROOT aims to address?
• Memory efficiency and computational speed
• Dimensional fragility in orthogonalization and vulnerability to outlier noise
• Model convergence and gradient vanishing
Q2. In the ROOT optimizer's soft-thresholding mechanism, what percentile threshold value was identified as optimal for LLM pre-training?
• p = 0.85
• p = 0.90
• p = 0.99
Q3. When testing ROOT on vision tasks using CIFAR-10, what unexpected finding was discovered?
• The optimizer failed completely on vision tasks
• It performed best with a higher quantile threshold of 0.95
• It achieved the highest accuracy (88.44%) with a lower quantile threshold of 0.85

Paper 2

Latent Collaboration in Multi-Agent Systems

Published: 2025-11-25

Link: http://arxiv.org/pdf/2511.20639

1. 📘 Topic and Domain: The paper focuses on enabling direct latent space collaboration between large language models in multi-agent systems, within the domain of natural language processing and artificial intelligence.
2. 💡 Previous Research and New Ideas: Based on previous research on text-based multi-agent LLM systems and single-model latent reasoning, this paper proposes a novel framework called LatentMAS that enables pure latent collaboration among multiple LLM agents without requiring text-based mediation.
3. ❓ Problem: The paper aims to overcome the inefficiencies and information bottlenecks of text-based collaboration between LLM agents by enabling them to collaborate directly in continuous latent space rather than through natural language.
4. 🛠️ Methods: The paper introduces LatentMAS, an end-to-end training-free framework that combines auto-regressive latent thoughts generation through last-layer hidden embeddings and cross-agent latent working memory transfer through shared KV caches.
5. 📊 Results and Evaluation: Across 9 benchmarks spanning math, science, commonsense reasoning and code generation, LatentMAS achieved up to 14.6% higher accuracy, reduced output token usage by 70.8%-83.7%, and provided 4×-4.3× faster end-to-end inference compared to baselines.
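The efficiency claim can be illustrated with a back-of-the-envelope calculation comparing the information in one sampled token against the raw capacity of one hidden-state vector. The hidden size, vocabulary size, and precision below are typical for a mid-sized LLM, not values from the paper:

```python
import math

# Illustrative model dimensions (assumptions, not from the paper)
d_h = 4096           # hidden-state width: floats carried by one latent thought
vocab = 128_000      # vocabulary size
bits_per_float = 16  # bf16 activations

bits_per_token = math.log2(vocab)       # information in one sampled token id
bits_per_latent = d_h * bits_per_float  # raw capacity of one hidden state
ratio = bits_per_latent / bits_per_token

print(f"one latent step can carry roughly {ratio:.0f}x more bits than one token")
```

This is only a capacity bound, but it gives intuition for why latent collaboration can cut token usage so sharply.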

LatentMAS workflow overview

Agent 1 – latent reasoning
• Auto-regressively generates m latent thoughts from last-layer hidden states
• Input-output alignment via W_a

Latent working memory
• KV caches extracted from all layers, preserving the input and latent thoughts
• Layer-wise concatenation enables lossless information transfer

Agent 2 (and beyond) – latent reasoning
• Inherits the working memory and conditions on the previous agent
• Generates new latent thoughts and updates the KV caches

Collaborative frameworks
• Sequential MAS: Planner → Critic → Refiner → Solver
• Hierarchical MAS: Math/Science/Code agents → Summarizer

Theoretical foundations
• Reasoning expressiveness: O(d_h/log|V|) more efficient than text
• Communication fidelity: lossless transfer, with lower complexity than text-based MAS

Performance (9 benchmarks: math, science, commonsense reasoning, code)
• Up to 14.6% higher accuracy; 70.8%–83.7% fewer output tokens; 4×–4.3× faster end-to-end inference
• Training-free; only the last agent decodes to text; semantic consistency verified

Key innovation: agents communicate through continuous latent space instead of discrete text tokens, enabling richer and more efficient multi-agent reasoning.
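A toy illustration of the cross-agent working-memory transfer: each agent appends its raw latent states to a shared per-layer cache, and the next agent conditions on everything accumulated so far. Every class and projection here is a hypothetical stand-in, not the paper's implementation, which operates on a real transformer's KV tensors:

```python
import numpy as np

class LatentAgent:
    """Toy stand-in for an LLM agent that 'thinks' in latent vectors."""
    def __init__(self, n_layers, d_h, seed):
        rng = np.random.default_rng(seed)
        # Hypothetical per-layer projections playing the role of a frozen LLM.
        self.W = [rng.standard_normal((d_h, d_h)) / np.sqrt(d_h)
                  for _ in range(n_layers)]

    def think(self, kv_cache, h, m_steps):
        """Append m latent steps; each step conditions on the shared cache."""
        for _ in range(m_steps):
            for layer, W in enumerate(self.W):
                # Attend (crudely, via a mean) over what earlier agents left behind.
                ctx = np.mean(kv_cache[layer], axis=0) if kv_cache[layer] else 0.0
                h = np.tanh(W @ (h + ctx))
                kv_cache[layer].append(h)  # raw latent state: nothing decoded to text
        return h

n_layers, d_h = 2, 8
cache = {layer: [] for layer in range(n_layers)}  # shared latent working memory
h = np.ones(d_h)                                  # stands in for the encoded question
h = LatentAgent(n_layers, d_h, seed=0).think(cache, h, m_steps=3)  # e.g. planner
h = LatentAgent(n_layers, d_h, seed=1).think(cache, h, m_steps=3)  # e.g. solver
# Only after the final agent would the system decode latent state back to text.
```

The design point this mimics: because the cache stores raw hidden states rather than sampled tokens, nothing is lost to the decode-then-re-encode round trip of text-based MAS.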
Q1. What is the main innovation of LatentMAS compared to previous multi-agent LLM systems?
• It enables agents to collaborate through text-based communication
• It allows agents to collaborate directly in continuous latent space
• It requires extensive training of the agents before deployment
Q2. According to the paper's results, what percentage of token-usage reduction was achieved by LatentMAS?
• 30–40% reduction
• 50–60% reduction
• 70.8–83.7% reduction
Q3. How does LatentMAS achieve lossless information transfer between agents?
• By converting all information into text format
• By using shared latent working memory stored in KV caches
• By training specialized neural networks for information transfer

Paper 3

Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy

Published: 2025-11-26

Link: http://arxiv.org/pdf/2511.21579

1. 📘 Topic and Domain: Audio-visual content generation using diffusion models, focusing on synchronizing audio and video generation for both speech and environmental sounds.
2. 💡 Previous Research and New Ideas: Based on previous joint audio-video generation models that struggled with synchronization; proposes new cross-task synergy training and enhanced audio-visual alignment mechanisms.
3. ❓ Problem: Poor audio-video synchronization in existing open-source models due to "Correspondence Drift" during joint training and inefficient attention mechanisms.
4. 🛠️ Methods: Introduces three key innovations: Cross-Task Synergy training combining joint and single-modality generation, Global-Local Decoupled Interaction Module for temporal alignment, and Synchronization-Enhanced CFG for better audio-visual correspondence.
5. 📊 Results and Evaluation: Achieved state-of-the-art performance on their new Harmony-Bench dataset, significantly outperforming existing methods in audio-visual synchronization while maintaining high generation quality for both speech and environmental sounds.

Harmony workflow overview

Inputs: video and audio clips, speech transcripts, reference audio

Dual-branch architecture
• Video branch: pre-trained Wan2.2-5B
• Audio branch: MM-DiT with multiple encoders

Cross-Task Synergy training
• Joint generation: audio + video
• Audio-driven video: clean audio → video
• Video-driven audio: clean video → audio

Global-Local Decoupled Interaction Module
• RoPE-aligned frame-wise attention for precise temporal synchronization
• Global style alignment to resolve temporal misalignment

Synchronization-Enhanced CFG (inference)
• Mute-audio and static-video negative anchors
• Amplifies the audio-visual alignment signal during sampling

Output: high-fidelity video with clear audio synthesis and precise temporal alignment, resolving Correspondence Drift and achieving state-of-the-art audio-visual synchronization through cross-modal learning.
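The Synchronization-Enhanced CFG idea can be sketched as standard classifier-free guidance extended with two extra negative anchors. The additive combination rule and the weights below are illustrative assumptions; Harmony's exact formulation may differ:

```python
import numpy as np

def sync_enhanced_cfg(eps_cond, eps_uncond, eps_mute_audio, eps_static_video,
                      w_cfg=5.0, w_sync=2.0):
    """Classifier-free guidance with desynchronization negative anchors.
    Standard CFG pushes the denoiser output toward the condition; the two
    extra terms additionally push it away from silent-audio and frozen-video
    modes. Weights and the additive rule here are illustrative."""
    guided = eps_uncond + w_cfg * (eps_cond - eps_uncond)
    guided += w_sync * (eps_cond - eps_mute_audio)    # repel the mute-audio anchor
    guided += w_sync * (eps_cond - eps_static_video)  # repel the static-video anchor
    return guided
```

Under this scheme the model is evaluated once per anchor at each diffusion step (conditional, unconditional, mute audio, static video) and the outputs are combined as above, amplifying the alignment signal at inference time.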
Q1. What is the main challenge called that causes poor audio-video synchronization during joint training?
• Temporal Misalignment
• Correspondence Drift
• Modal Desynchronization
Q2. How many test cases does the Harmony-Bench dataset contain for evaluating the model's performance?
• 50 test cases
• 100 test cases
• 150 test cases
Q3. Which component of Harmony's architecture helps achieve both holistic style consistency and precise temporal alignment?
• Cross-Task Synergy training
• Global-Local Decoupled Interaction Module
• Synchronization-Enhanced CFG