1. 📘 Topic and Domain: Step-Audio 2 is an end-to-end multi-modal large language model for audio understanding and speech conversation in the domain of artificial intelligence and speech processing.
2. 💡 Previous Research and New Ideas: Building on previous large audio-language models (LALMs) such as GPT-4o, Qwen-Audio, and Step-Audio, it introduces two new ideas: integrating discrete audio token generation directly into language modeling, and incorporating retrieval-augmented generation with external tools.
3. ❓ Problem: The paper addresses challenges in achieving natural and intelligent speech interaction, particularly handling paralinguistic information (e.g., emotion, speaking style) and accessing real-world textual and acoustic knowledge.
4. 🛠️ Methods: The authors combine a latent audio encoder, reasoning-centric reinforcement learning, and multi-stage training on 680 billion text tokens and 8 million hours of audio data, and integrate retrieval-augmented generation with external tools such as web search and audio search.
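The tool-augmented generation described above can be sketched as a simple dispatch loop. This is a hypothetical illustration, not the paper's actual API: the function names (`run_with_tools`, `web_search`) and the action dictionary format are assumptions; the general pattern is that the model either emits a final answer or requests a tool call, whose result is appended to the context before generation resumes.

```python
# Hypothetical sketch of tool-augmented (retrieval-augmented) generation.
# All names and message formats here are illustrative assumptions,
# not Step-Audio 2's real interface.

def run_with_tools(model, tools, user_turn, max_steps=4):
    """Alternate between model generation and tool calls until the
    model produces a final answer or the step budget is exhausted."""
    context = [user_turn]
    for _ in range(max_steps):
        action = model(context)  # model decides: answer, or call a tool
        if action["type"] == "answer":
            return action["text"]
        # e.g. "web_search" for textual knowledge, "audio_search"
        # for acoustic knowledge such as voice timbres
        result = tools[action["tool"]](action["query"])
        context.append({"role": "tool", "content": result})
    return "No answer within step budget."
```

A toy model that first searches the web and then answers would traverse the loop twice, with the search result visible in its context on the second pass.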
5. 📊 Results and Evaluation: Step-Audio 2 achieved state-of-the-art performance across various benchmarks, including automatic speech recognition (3.18% WER on English, 3.11% CER on Chinese), audio understanding (77.4% on MMAU), and speech conversation tasks, outperforming both open-source and commercial solutions.
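The WER and CER figures above are standard edit-distance metrics: the minimum number of insertions, deletions, and substitutions needed to turn the hypothesis into the reference, divided by the reference length, over words (WER) or characters (CER, used for Chinese, which lacks word boundaries). A minimal sketch of how such scores are computed:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all of ref's first i tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all of hyp's first j tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)]

def wer(reference, hypothesis):
    """Word error rate: edit distance over whitespace-split words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: same computation over characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat", "the cat sit")` is 1/3: one substitution against a three-word reference. Note that real ASR evaluations also apply text normalization (casing, punctuation) before scoring, which is omitted here.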