2025-11-12 Papers


Paper 1

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Published: 2025-11-10

Link: http://arxiv.org/pdf/2511.07327

1. 📘 Topic and Domain: The paper presents IterResearch, a novel paradigm for long-horizon deep research agents that can autonomously gather and synthesize information through iterative exploration in the domain of AI agents and information seeking.
2. 💡 Previous Research and New Ideas: Whereas previous mono-contextual deep research approaches accumulate all information in a single, ever-expanding context window, this paper proposes an iterative paradigm that reconstructs the workspace after each interaction to maintain consistent reasoning capacity.
3. ❓ Problem: The paper aims to solve the limitations of existing deep research agents that suffer from context suffocation and noise contamination when handling long-horizon tasks requiring extensive information gathering and synthesis.
4. 🛠️ Methods: The authors reformulate long-horizon research as a Markov Decision Process with strategic workspace reconstruction and develop Efficiency-Aware Policy Optimization with geometric reward discounting for training.
5. 📊 Results and Evaluation: The approach achieved an average 14.5 percentage point improvement across six benchmarks compared to existing open-source agents, demonstrated unprecedented scaling to 2048 interactions, and improved frontier models by up to 19.2pp when used as a prompting strategy.
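The geometric reward discounting in Efficiency-Aware Policy Optimization (point 4) can be sketched as follows. This is a minimal illustration of the formula rₜ = γᵀ⁻ᵗ · Rᵀ, not code from the paper; the gamma value is illustrative.

```python
# Sketch of EAPO-style geometric reward discounting: a single terminal
# reward R_T is spread backwards over T rounds, with earlier rounds
# discounted more heavily. Gamma is a hypothetical choice.

def discounted_round_rewards(final_reward: float, num_rounds: int,
                             gamma: float = 0.95) -> list[float]:
    """Return r_t = gamma^(T - t) * R_T for t = 1..T.

    The final round receives the full reward (gamma^0 = 1); earlier
    rounds receive geometrically smaller credit.
    """
    return [gamma ** (num_rounds - t) * final_reward
            for t in range(1, num_rounds + 1)]
```

With gamma = 0.5 and four rounds, a terminal reward of 1.0 yields per-round rewards [0.125, 0.25, 0.5, 1.0], so credit concentrates near the rounds that produced the answer.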

Figure: IterResearch Markovian state reconstruction workflow.

- MDP framework: state space S = (q, Mᵗ, {aᵗ⁻¹, TRᵗ⁻¹}) as a bounded workspace; decision space D = (Think, Report, Action) with structured output; environment E exposing tools (Google Search, Scholar, Web Browser, Python).
- Iterative deep-research round t: the workspace holds the question q, the evolving report Mᵗ, and the latest context {aᵗ⁻¹, TRᵗ⁻¹}; the agent policy π generates a decision dᵗ (Think: reasoning; Report: updated Mᵗ⁺¹; Action: tool call aᵗ).
- Transition function T: workspace reconstruction sᵗ⁺¹ = (q, Mᵗ⁺¹, {aᵗ, TRᵗ}); strategic forgetting keeps the per-round context at O(1) size.
- EAPO training: efficiency-aware reward shaping rₜ = γᵀ⁻ᵗ · Rᵀ (geometric discounting), adaptive downsampling to handle variable trajectory lengths, and GSPO integration for policy optimization.
- Key advantages: interaction scaling up to 2048 interactions with a constant workspace; context management (no context suffocation, noise filtering); cross-paradigm knowledge transfer as a prompting strategy; +14.5pp average performance gain.
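The transition sᵗ⁺¹ = (q, Mᵗ⁺¹, {aᵗ, TRᵗ}) can be sketched as a minimal data structure; all names here are illustrative, not the paper's implementation.

```python
# Minimal sketch of Markovian workspace reconstruction: each round the
# workspace is rebuilt from only the question, the evolving report, and
# the most recent action/result pair, so context size stays O(1)
# regardless of how many rounds have passed.

from dataclasses import dataclass

@dataclass
class Workspace:
    question: str     # q, fixed for the whole task
    report: str       # evolving central memory M_t
    last_action: str  # a_{t-1}
    last_result: str  # tool result TR_{t-1}

def reconstruct(ws: Workspace, new_report: str,
                action: str, tool_result: str) -> Workspace:
    """s_{t+1} = (q, M_{t+1}, {a_t, TR_t}): carry forward only the
    updated report and the latest interaction; older raw context is
    strategically forgotten."""
    return Workspace(ws.question, new_report, action, tool_result)
```

The key property is that `reconstruct` never appends to a growing history: whatever must survive a round has to be distilled into the report.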
Q1
1. What is the main limitation of mono-contextual deep research approaches that IterResearch aims to solve?
High computational costs and slow processing speed
Context suffocation and noise contamination in long-horizon tasks
Inability to handle multiple languages and data formats
Q2
2. How does IterResearch maintain consistent reasoning capacity across extended interactions?
By using larger context windows and more powerful models
By distributing the workload across multiple parallel agents
By reconstructing the workspace after each interaction with an evolving report
Q3
3. What impressive scaling capability did IterResearch demonstrate in the experiments?
It scaled to handle 2048 interactions while maintaining performance
It processed data 100 times faster than previous approaches
It reduced memory usage by 90% compared to baseline methods

Paper 2

Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora

Published: 2025-11-10

Link: http://arxiv.org/pdf/2511.07080

1. 📘 Topic and Domain: Development of Wasm, a data processing pipeline for creating structured Arabic multimodal corpora from web content, in the domain of natural language processing and multimodal machine learning.
2. 💡 Previous Research and New Ideas: Builds on the OBELICS framework for multimodal data processing, introducing Arabic-specific adaptations and a structure-preserving conversion to Markdown that maintains document hierarchy.
3. ❓ Problem: The lack of high-quality Arabic multimodal datasets that preserve document structure, which limits the development of Arabic language models and multimodal models.
4. 🛠️ Methods: Implements a multi-stage pipeline including metadata extraction, HTML processing, content structuring, and quality filtering with Arabic-specific adjustments to perplexity modeling and node-level deduplication.
5. 📊 Results and Evaluation: Produced a flexible framework that successfully preserves both text and visual content structure, with comparative analysis showing improved filtering performance over existing approaches, though specific quantitative results were not extensively detailed.
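The node-level deduplication step (point 4) compares HTML nodes with a Needleman-Wunsch alignment at an 80% similarity threshold. Below is a minimal sketch assuming token-level alignment with match score 1 and zero gap/mismatch penalties (which reduces to a normalised longest-common-subsequence score); the paper's actual scoring parameters are not specified here.

```python
# Sketch of node-level deduplication: an HTML node is dropped if its
# text aligns above the similarity threshold with a node already kept
# (e.g. repeated ads or boilerplate). Parameters are illustrative.

def nw_similarity(a: list[str], b: list[str]) -> float:
    """Needleman-Wunsch alignment score normalised to [0, 1].
    With match=1 and zero gap/mismatch penalties this equals
    LCS length divided by the longer sequence length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(max(prev[j - 1] + (x == y), prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))

def dedup_nodes(nodes: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a node only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for node in nodes:
        toks = node.split()
        if all(nw_similarity(toks, k.split()) < threshold for k in kept):
            kept.append(node)
    return kept
```

Working at the node level rather than the document level is what lets the pipeline strip repeated boilerplate while keeping the unique content of each page intact.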

Figure: Wasm pipeline for constructing Arabic multimodal corpora.

- Metadata extraction: Common Crawl dumps; Arabic content filtering; URL, WARC, and metadata collection.
- HTML processing: WARC file retrieval; noise removal; CSS/comments cleanup.
- Structure conversion: HTML to Markdown; text/visual separation; hierarchy preservation.
- Tag-level filtering with Arabic-adapted thresholds: relaxed word-repetition limits, stricter language ID, custom KenLM perplexity, and removal by stopword/punctuation ratio.
- Visual data filtering: image URL collection, site-level blacklisting (a conservative approach), storage optimization, and safety filtering.
- Tag deduplication: Needleman-Wunsch algorithm with an 80% similarity threshold removes duplicate ads while preserving documents.
- Document filtering: same criteria as the tag level, with parameters recalibrated for document-wide characteristics.
- Output: a structured Arabic multimodal dataset in Markdown with interleaved text and images.

Key methodological innovations:
- Structured data preservation: maintains DOM hierarchy, semantic relationships, image-caption associations, section hierarchies, and contextual dependencies.
- Enhanced perplexity assessment: a custom KenLM model trained on a curated corpus covering multiple Arabic dialects, focused on human-authored text for better quality filtering.
- Granular node-level deduplication: operates on individual HTML nodes, preserving unique content while removing boilerplate, improving content diversity and processing efficiency.
Q1
1. What is the main innovation of Wasm compared to existing Arabic corpora?
It processes data faster than other pipelines
It preserves document structure and interleaved text-image relationships
It handles a larger volume of Arabic text
Q2
2. How does Wasm's approach to perplexity filtering differ from OBELICS?
It uses a custom KenLM model trained on diverse Arabic dialects
It completely removes perplexity filtering
It applies the same English perplexity model to Arabic
Q3
3. What unique deduplication strategy does Wasm implement?
Document-level deduplication using MinHash
Global deduplication across all documents
Node-level deduplication using Needleman-Wunsch algorithm

Paper 3

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Published: 2025-11-08

Link: http://arxiv.org/pdf/2511.06221

1. 📘 Topic and Domain: Development of VibeThinker-1.5B, a small language model for logical reasoning in mathematics and coding, challenging the assumption that large models are necessary for strong reasoning capabilities.
2. 💡 Previous Research and New Ideas: Builds on the reasoning paradigm established by OpenAI's o1 and subsequent large language models, proposing a novel "Spectrum-to-Signal Principle" that lets a small model achieve reasoning ability comparable to far larger models.
3. ❓ Problem: Addressing the industry assumption that scaling model parameters is essential for enhancing logical reasoning capabilities, aiming to achieve comparable performance with a much smaller and cost-effective model.
4. 🛠️ Methods: Implemented a two-stage approach: "Two-Stage Diversity-Exploring Distillation" for SFT phase to generate diverse solutions, followed by "MaxEnt-Guided Policy Optimization" for RL phase to amplify correct reasoning paths.
5. 📊 Results and Evaluation: VibeThinker-1.5B outperformed larger models on mathematical benchmarks (AIME24: 80.3, AIME25: 74.4, HMMT25: 50.4) and coding tasks (LiveCodeBench V6: 51.1), surpassing models 400 times larger while costing only $7,800 to train.
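The SFT stage (point 4) maximizes Pass@K to reward solution diversity. As a reference point, Pass@K is commonly estimated with the standard unbiased estimator below; this is the general formula, not code from the paper.

```python
# Standard unbiased Pass@K estimator: given n sampled solutions of
# which c are correct, the probability that at least one of k draws
# (without replacement) is correct is 1 - C(n-c, k) / C(n, k).

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of Pass@K from n samples with c correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains
        # at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Optimizing for Pass@K (rather than Pass@1) favors models whose samples cover many distinct solution paths, which is exactly the "broad spectrum" the SSP's first stage aims to create.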

Figure: VibeThinker-1.5B training pipeline under the Spectrum-to-Signal Principle (SSP): diversity generation followed by signal amplification.

- SFT phase (spectrum generation): Two-Stage Diversity-Exploring Distillation with domain-aware diversity probing (algebra, geometry, calculus, statistics) and expert model fusion via weighted linear combination, maximizing Pass@K to create a broad spectrum of solutions.
- RL phase (signal amplification): MaxEnt-Guided Policy Optimization (MGPO) applies the maximum-entropy principle with target p_c(q) = 0.5 (optimal uncertainty), entropy-deviation regularization (KL divergence from the max-entropy state), and an entropy-weighted enhancement of GRPO to amplify correct signals from the spectrum.
- Multi-stage training: math reasoning at 16K context, then math reasoning at 32K context, then code generation.
- Key results: AIME25 74.4 (vs DeepSeek R1: 70.0); HMMT25 50.4 (vs 41.7); LiveCodeBench V6 51.1; training cost $7,800 (vs $294K); 400× smaller than DeepSeek R1; democratizes AI research access.
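MGPO weights training problems by how close the model's current pass rate p_c(q) is to the maximum-entropy point 0.5. A minimal sketch of that idea, assuming a simple normalised Bernoulli-entropy weight; the paper derives its regularizer from a KL divergence to the max-entropy state, so this exact function is illustrative.

```python
# Illustrative MaxEnt-style problem weighting: a problem the model
# solves ~50% of the time carries maximal uncertainty (entropy) and
# thus maximal learning signal; problems it always or never solves
# carry none. The specific weight function is an assumption.

import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) pass/fail outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def maxent_weight(pass_rate: float) -> float:
    """Normalised weight in [0, 1]; peaks at pass_rate = 0.5."""
    return bernoulli_entropy(pass_rate) / math.log(2)
```

Under this weighting, already-mastered and currently-impossible problems contribute little gradient, concentrating RL compute on the frontier of the model's ability.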
Q1
1. What is the primary innovation that allows VibeThinker-1.5B to achieve strong reasoning capabilities despite its small size?
Using extremely large training datasets
The Spectrum-to-Signal Principle with two-stage optimization
Copying the architecture of larger models
Q2
2. What was the most significant limitation of VibeThinker-1.5B compared to larger models?
Its performance on coding tasks
Its mathematical reasoning abilities
Its general knowledge capabilities on GPQA benchmark
Q3
3. What is the potential broader impact of VibeThinker-1.5B's success according to the paper?
It could democratize AI research by making it more accessible to organizations with limited resources
It could make language models run faster on mobile devices
It could reduce the environmental impact of AI training