1. **📘 Topic and Domain:** The paper addresses efficient inference-time scaling for large language models through architectural innovation in decoder-decoder Transformer designs.
2. **💡 Previous Research and New Ideas:** Building on the YOCO (You Only Cache Once) decoder-decoder architecture and Universal Transformer, the paper proposes combining YOCO with recursive computation via a Universal Self-Decoder that iterates efficient-attention layers.
3. **❓ Problem:** Standard Transformers and prior recursive approaches like Universal Transformer suffer from high computational overhead and linearly growing KV cache as depth increases, making efficient inference-time scaling difficult.
4. **🛠️ Methods:** YOCO-U replaces the static Self-Decoder with a Universal Self-Decoder that performs T iterations of parameter-shared efficient self-attention (e.g., sliding-window attention) to enhance representational depth while keeping the Cross-Decoder unchanged for constant global KV cache.
5. **📊 Results and Evaluation:** YOCO-U achieves 0.033 lower loss than YOCO at equal FLOPs, requires ~62% fewer training tokens for comparable performance, and maintains efficient inference with linear pre-filling and negligible KV cache overhead, while outperforming baselines on general and long-context benchmarks.
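The Methods item above can be sketched in toy form: one set of attention parameters created once and applied for T recursive iterations, with a causal sliding-window mask keeping each token's attention local. This is a hedged, hypothetical single-head NumPy sketch, not the paper's implementation; layer norms, FFN sub-layers, and the Cross-Decoder are omitted, and all shapes and the window size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, window, T = 8, 4, 3  # toy model width, window size, iteration count

# Parameters are created ONCE and shared across all T iterations --
# the defining property of a Universal (recursive) Self-Decoder.
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sliding_window_attn(x):
    """Single-head causal sliding-window attention with the shared weights."""
    n = x.shape[0]
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    i = np.arange(n)
    # Causal window: token i attends only to positions [i - window + 1, i].
    mask = (i[None, :] <= i[:, None]) & (i[None, :] > i[:, None] - window)
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ v

def universal_self_decoder(x, iters=T):
    # The SAME attention function (same weights) iterated `iters` times,
    # with a residual connection each step: depth grows, parameters don't.
    for _ in range(iters):
        x = x + sliding_window_attn(x)
    return x

x = rng.standard_normal((16, d))
y = universal_self_decoder(x)
print(y.shape)  # (16, 8)
```

Because the windowed attention is local and the weights are reused, raising T deepens the computation without adding parameters, while the (unchanged) Cross-Decoder would consume the global KV cache exactly as in YOCO.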
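The KV-cache contrast in the Problem and Results items comes down to simple arithmetic: a standard decoder caches K/V in every layer, while a YOCO-style decoder-decoder caches the global K/V once and reuses it across the Cross-Decoder, so iterating the Self-Decoder adds no global cache. The sketch below is a back-of-envelope illustration with assumed (not the paper's) shapes.

```python
# Illustrative KV-cache comparison; all sizes below are assumptions,
# not configurations from the paper.

def kv_cache_bytes(n_layers_cached, seq_len, n_heads, head_dim, dtype_bytes=2):
    """Bytes held in the KV cache: 2 tensors (K and V) per cached layer."""
    return 2 * n_layers_cached * seq_len * n_heads * head_dim * dtype_bytes

seq_len, n_heads, head_dim = 32_768, 16, 128

# Standard decoder: each of (say) 24 layers caches its own K/V,
# so cache size grows linearly with depth.
standard = kv_cache_bytes(24, seq_len, n_heads, head_dim)

# YOCO-style: global K/V cached once; recursing the Self-Decoder
# T times leaves this figure unchanged.
yoco_like = kv_cache_bytes(1, seq_len, n_heads, head_dim)

print(standard // 2**20, "MiB vs", yoco_like // 2**20, "MiB")  # 6144 MiB vs 256 MiB
```

Under these assumed shapes the per-layer cache dominates memory at long context, which is why a constant global cache is what makes deep, recursive computation affordable at inference time.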