2025-10-31 Papers

Paper 1

The End of Manual Decoding: Towards Truly End-to-End Language Models

Published: 2025-10-30

Link: http://arxiv.org/pdf/2510.26697

1. 📘 Topic and Domain: The paper focuses on improving language model decoding by introducing AutoDeco, a novel architecture that enables truly end-to-end generation in the domain of natural language processing.
2. 💡 Previous Research and New Ideas: Previous research relied on static, manually-tuned decoding parameters (temperature, top-p); the paper proposes a new dynamic approach where the model learns to predict its own decoding parameters during generation.
3. ❓ Problem: The paper addresses the inefficiency and suboptimality of manual decoding hyperparameter selection in language models: static settings require laborious hand-tuning and cannot adapt to different contexts within a single generation.
4. 🛠️ Methods: The authors developed AutoDeco, which augments transformers with lightweight prediction heads that dynamically predict temperature and top-p values at each generation step, using a differentiable soft top-p mechanism for training.
5. 📊 Results and Evaluation: AutoDeco outperformed standard decoding methods across eight benchmarks, matched oracle-tuned baselines without task-specific tuning, added only 1-2% latency overhead, and demonstrated an emergent ability to adjust generation style based on natural language commands.
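A minimal sketch of per-step dynamic decoding in the spirit of AutoDeco: lightweight heads read the hidden state and predict temperature and top-p before each token is sampled. The heads here (`temp_head`, `top_p_head`), their formulas, and all shapes are illustrative stand-ins, not the paper's trained MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)

def temp_head(h):
    """Hypothetical stand-in for the temperature head: maps a hidden
    state to a temperature in (0, 2). The real head is a small MLP."""
    return 2.0 / (1.0 + np.exp(-h.mean()))

def top_p_head(h, temperature):
    """Hypothetical stand-in for the top-p head: maps the hidden state
    (and the predicted temperature) to a nucleus mass in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(h.mean() + temperature)))

def autodeco_step(logits, h):
    """One decoding step with dynamically predicted temperature and top-p."""
    T = temp_head(h)
    p_nucleus = top_p_head(h, T)
    # Temperature-scaled softmax over the vocabulary.
    z = logits / T
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # Hard top-p (nucleus) filtering at inference time: keep the smallest
    # set of top tokens whose cumulative mass reaches p_nucleus.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, p_nucleus) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    filtered /= filtered.sum()
    return rng.choice(len(probs), p=filtered), T, p_nucleus

logits = rng.normal(size=50)   # toy logits over a 50-token vocabulary
h = rng.normal(size=16)        # toy hidden state
token, T, p = autodeco_step(logits, h)
```

Because the heads condition on the current hidden state, every generation step can use a different temperature and nucleus size, which is the behavior the static baselines cannot express.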

The End of Manual Decoding: Towards Truly End-to-End Language Models

AutoDeco: End-to-End Language Model Workflow

Training Strategy
• Differentiable soft top-p with temperature scaling
• End-to-end optimization with easy-token masking
• Dynamic fine-tuning

AutoDeco Heads
• Temperature head and top-p head: lightweight MLPs
• Context-specific parameter prediction

Dynamic Inference
• Hidden state computation → parameter prediction → internal probability modification
• 1-2% latency overhead

Emergent Control
• Natural language commands for instruction-based decoding control (95%+ consistency)

Core Mathematical Framework
1. Temperature scaling: p = softmax(l / T̂)
2. Differentiable soft mask (on sorted probabilities): m = exp(-α · ReLU(c - P̂))
3. Final distribution: p̃ = (p ⊙ m) / (Σ(p ⊙ m) + ε)
4. Parameter prediction: T̂_t = temp_head(h_t), P̂_t = top_p_head(h_t, T̂_t)

Data Processing
• DeepMath-103K dataset with reject-sampling trajectories
• 400 training steps, 6K training samples

Evaluation Benchmarks
• Math: AIME, BRUMO25, HMMT25
• General: GPQA, MMLU-Pro
• Code: LiveCodeBench
• Instruction: IFEval

Model Families
• Llama-Nemotron-8B, R1-Distill-Qwen-7B, Qwen3-30B-A3B-Instruct, OpenAI-GPT-OSS-20B

Key Results
• Consistently outperforms static decoding methods
• Matches oracle-tuned baselines without test-set access
• Emergent natural-language control capability
• Minimal computational overhead (1-2% latency)
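The four equations of the mathematical framework above can be sketched directly in NumPy. The mask-sharpness `alpha` and the `eps` value are assumptions; everything else follows the stated formulas.

```python
import numpy as np

def soft_top_p(logits, T_hat, P_hat, alpha=100.0, eps=1e-9):
    """Differentiable 'soft' top-p following the framework's four equations.

    T_hat and P_hat play the roles of the predicted temperature and top-p;
    alpha controls how sharply mass beyond P_hat is suppressed (assumed value).
    """
    # 1. Temperature scaling: p = softmax(l / T_hat)
    z = logits / T_hat
    p = np.exp(z - z.max())
    p /= p.sum()
    # Sort probabilities in descending order; c is the cumulative mass.
    order = np.argsort(p)[::-1]
    c = np.cumsum(p[order])
    # 2. Soft mask in sorted order: m = exp(-alpha * ReLU(c - P_hat))
    m_sorted = np.exp(-alpha * np.maximum(c - P_hat, 0.0))
    # Undo the sort so the mask lines up with the original vocabulary order.
    m = np.empty_like(m_sorted)
    m[order] = m_sorted
    # 3. Final distribution: p_tilde = (p ⊙ m) / (Σ(p ⊙ m) + eps)
    return (p * m) / ((p * m).sum() + eps)

probs = soft_top_p(np.array([3.0, 1.0, 0.5, -2.0]), T_hat=0.8, P_hat=0.9)
```

Unlike a hard top-p cutoff, the exponential mask decays smoothly with the excess cumulative mass, so gradients can flow back into the heads that predict T̂ and P̂.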
Q1
1. What surprising emergent capability did AutoDeco demonstrate during experiments?
The ability to write poetry without training
The ability to interpret natural language commands to adjust its generation style
The ability to translate between multiple languages automatically
Q2
2. What is the key innovation in AutoDeco's training process that enables end-to-end learning?
Using a larger training dataset
Implementing a new transformer architecture
Introducing a differentiable 'soft' top-p mechanism
Q3
3. What practical advantage does AutoDeco have over traditional decoding methods in terms of computational overhead?
It adds only 1-2% latency with minimal memory impact
It reduces computation time by 50%
It requires no additional computational resources
Paper 2

Kimi Linear: An Expressive, Efficient Attention Architecture

Published: 2025-10-30

Link: http://arxiv.org/pdf/2510.26692

1. 📘 Topic and Domain: The paper develops Kimi Linear, a hybrid linear attention architecture for large language models, in the domain of efficient attention mechanisms and model architecture design.
2. 💡 Previous Research and New Ideas: Building on Gated DeltaNet and prior linear attention mechanisms, the paper introduces Kimi Delta Attention (KDA), whose finer-grained gating makes more effective use of the finite RNN memory.
3. ❓ Problem: The paper addresses the computational inefficiency of standard attention in LLMs, whose quadratic time complexity is particularly costly in long-context and reinforcement learning scenarios.
4. 🛠️ Methods: The authors implement a hybrid architecture combining KDA with Multi-Head Latent Attention (MLA) in a 3:1 ratio, using specialized DPLR transition matrices and a chunkwise algorithm for efficient computation.
5. 📊 Results and Evaluation: Kimi Linear outperforms full-attention models across a range of tasks, reduces KV cache usage by up to 75%, and achieves up to 6× faster decoding throughput at 1M context length while maintaining superior performance.
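A minimal sketch of the gated delta-rule recurrence KDA builds on, S_t = Diag(α_t)(I − βkk^T)S_{t−1} + βkv^T: the per-channel decay α_t is the fine-grained gate, and the βkk^T term erases the old value stored under key k before writing the new one. Shapes, the unit-norm key, and the scalar β are illustrative assumptions here.

```python
import numpy as np

def kda_state_update(S, k, v, alpha, beta):
    """One recurrent step of a DeltaNet-style update with per-channel decay.

    S:     (d_k, d_v) matrix-valued memory state
    k:     (d_k,) unit-norm key
    v:     (d_v,) value
    alpha: (d_k,) per-channel decay in [0, 1] (the fine-grained gate)
    beta:  scalar write strength in [0, 1]
    """
    # Decay the old state channel-wise, after erasing the slot addressed by k.
    S = np.diag(alpha) @ (np.eye(len(k)) - beta * np.outer(k, k)) @ S
    # Write the new key-value association.
    return S + beta * np.outer(k, v)

d_k, d_v = 4, 3
rng = np.random.default_rng(1)
S = np.zeros((d_k, d_v))
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)          # unit-norm key (assumption)
v = rng.normal(size=d_v)
S = kda_state_update(S, k, v, alpha=np.full(d_k, 0.9), beta=1.0)
# With beta = 1 and a fresh state, reading with the same key recovers v.
readout = k @ S
```

The state S stays a fixed (d_k, d_v) matrix no matter how many tokens are processed, which is where the linear-time, constant-memory behavior comes from.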

Kimi Linear: An Expressive, Efficient Attention Architecture

Kimi Linear Architecture Workflow

Input Processing
• Input tokens → neural parameterization: q, k, v, α, β computation

Kimi Delta Attention (KDA)
• Fine-grained gating with channel-wise decay and DPLR optimization
• Chunkwise algorithm using WY representation and UT transform
• State update: S_t = Diag(α_t)(I − βkk^T)S_{t−1} + βkv^T
• Output processing: RMSNorm + gating

Hybrid Architecture
• 3:1 KDA:MLA ratio
• MLA layers provide global attention, with NoPE (no position embedding)

Training Components
• Pretraining (1.4T/5.7T tokens)
• Multi-stage SFT
• RL with PTX loss

Evaluation Benchmarks
• Short context: MMLU, BBH, GSM8K
• Long context: RULER, RepoQA
• Math & code: AIME, LiveCodeBench
• Synthetic tasks: palindrome, MQAR, stack operations

Efficiency Gains
• 75% KV cache reduction
• 6× decoding speedup (1M tokens)
• Linear time complexity

Key Results
• Outperforms the MLA baseline (MMLU-Pro: 51.0 vs 47.2; RULER: 84.3 vs 81.3)

Scaling Laws
• 1.16× computational efficiency, MoE architecture, 653M to 1.7B params

Open Source
• KDA kernels, vLLM integration, pre-trained checkpoints

Final Model
• 48B total params, 3B activated params, 1M context support

Key Innovation Summary
• KDA: fine-grained channel-wise gating mechanism
• Chunkwise parallelization with DPLR optimization
• Hybrid 3:1 architecture balancing quality and efficiency
• Superior performance across short- and long-context tasks
• Significant efficiency gains: 75% memory reduction, 6× speedup
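The 3:1 layer ratio gives a quick back-of-envelope explanation of the ~75% KV-cache figure: only the MLA layers keep a per-token cache that grows with context, so roughly one layer in four contributes. This sketch assumes the constant-size KDA state is negligible at long context.

```python
def kv_cache_fraction(kda_per_group=3, mla_per_group=1):
    """Fraction of a full-attention model's growing KV cache that remains
    when only the MLA layers in each group keep a per-token cache."""
    total = kda_per_group + mla_per_group
    return mla_per_group / total

saving = 1.0 - kv_cache_fraction()   # → 0.75, i.e. ~75% reduction
```

The same arithmetic shows why the ratio is a quality-efficiency dial: a 1:1 mix would cut the cache by only 50%, while pure KDA would give 100% but lose the periodic global-attention layers.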
Q1
1. What is the main innovation of Kimi Linear's architecture compared to previous approaches?
Using pure linear attention without any global attention
Implementing a channel-wise gating mechanism in KDA
Removing all position embeddings from the model
Q2
2. What ratio does Kimi Linear use between KDA layers and global attention layers?
1:1 ratio for balanced computation
2:1 ratio to minimize memory usage
3:1 ratio for optimal performance-efficiency trade-off
Q3
3. What is the primary advantage of Kimi Linear in terms of computational efficiency?
Reduces training time by 75% compared to standard models
Achieves up to 6× faster decoding throughput for 1M context
Eliminates the need for GPU acceleration
Paper 3

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

Published: 2025-10-30

Link: http://arxiv.org/pdf/2510.26768

1. 📘 Topic and Domain: The paper introduces AMO-Bench, a mathematical reasoning benchmark for evaluating Large Language Models' (LLMs) performance on high-difficulty math problems at or above International Mathematical Olympiad level.
2. 💡 Previous Research and New Ideas: Based on existing math benchmarks like AIME and MATH500 where LLMs are reaching performance saturation, this paper proposes a more challenging benchmark with entirely original problems that are cross-validated by experts.
3. ❓ Problem: The paper addresses the limitation of existing math benchmarks becoming less effective for evaluating top-tier LLMs due to performance saturation and potential data memorization issues.
4. 🛠️ Methods: The authors created 50 original math problems verified by experts, designed automatic grading methods combining parser-based and LLM-based approaches, and evaluated 26 different LLMs on the benchmark.
5. 📊 Results and Evaluation: The best-performing model achieved only 52.4% accuracy on AMO-Bench with most LLMs scoring below 40%, demonstrating significant room for improvement while showing promising scaling trends with increased test-time compute.
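A minimal sketch of the combined grading idea: dispatch numeric final answers to a parser-based check and everything else to an LLM judge. All function names are hypothetical, the parser is a toy regex, and the LLM judge is replaced by a normalized string match purely for illustration.

```python
import re

def parse_numeric(answer):
    """Toy parser: pull the last numeric value out of a model answer."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", answer)
    return float(matches[-1]) if matches else None

def llm_grade(answer, reference):
    """Stand-in for an LLM judge; here just a normalized string match."""
    return answer.strip().lower() == reference.strip().lower()

def grade(answer, reference, answer_type):
    """Dispatch: parser-based for numerical answers, LLM-based otherwise."""
    if answer_type == "numerical":
        parsed = parse_numeric(answer)
        return parsed is not None and abs(parsed - float(reference)) < 1e-9
    return llm_grade(answer, reference)

assert grade("The answer is 42.", "42", "numerical")
```

Splitting the grader this way keeps the cheap, deterministic parser on the easy answer types while reserving the expensive judge for set, variable, and descriptive answers that a regex cannot verify.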

AMO-Bench: Large Language Models Still Struggle in High School Math Competitions

AMO-Bench Construction and Evaluation Workflow

Data Creation
• Problems written by human experts from top universities
• Quality review: data correctness, MO syllabus validation
• Originality review: existing competitions, web search
• Difficulty review: IMO standard, model performance

AMO-Bench Composition
• 50 original problems
• Categories: algebra (22%), functions (26%), combinatorics (24%), number theory (18%)
• Answer types: numerical, set, variable, descriptive

Final-Answer-Based Grading
• Parser-based (39 problems), LLM-based (11 problems)

Evaluation Setup
• 26 LLMs evaluated, 32 samples per model, temperature 0.7-1.0
• Model types: proprietary, open source, reasoning/non-reasoning
• Metrics: AVG@32 (main), Pass@K (potential analysis), token-usage efficiency study

Results Analysis
• Best: 52.4% (GPT-5); most models < 40%
• High token usage
• Key findings: test-time scaling, Pass@32 > 70%, substantial room for improvement

Benchmark Comparison
• AMO-Bench vs AIME24/25, HMMT25, MATH500: significantly more challenging

Future Research Directions
• Advanced reasoning capabilities, test-time compute scaling, mathematical problem solving, error analysis and improvement
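The two sampling metrics can be computed as below. AVG@k is just the mean accuracy over k samples; for Pass@k this uses the standard unbiased combinatorial estimator, which is an assumption here since the paper's exact formula is not quoted.

```python
from math import comb

def avg_at_k(correct_flags):
    """AVG@k: mean accuracy over k sampled solutions to one problem."""
    return sum(correct_flags) / len(correct_flags)

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    given n samples of which c are correct."""
    if n - c < k:
        # Fewer incorrect samples than k: every size-k subset has a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, a model with 8 correct answers out of 32 samples scores AVG@32 = 0.25 on that problem but Pass@32 = 1.0, which is why the paper can report Pass@32 above 70% even though most models average below 40%.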
Q1
1. What unique feature distinguishes AMO-Bench from other mathematical benchmarks?
It only includes problems from past International Mathematical Olympiads
All problems are entirely original and newly crafted by experts
It focuses exclusively on geometry problems
Q2
2. What was the most surprising finding about LLMs' performance on AMO-Bench?
Models required significantly more output tokens compared to other benchmarks
All models achieved perfect scores
Open-source models consistently outperformed proprietary models
Q3
3. How does AMO-Bench handle the grading of solutions?
All solutions are manually graded by experts
Only parser-based automatic grading is used
Combines parser-based and LLM-based grading depending on answer type