2025-11-12 Papers


Paper 1

IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

Published: 2025-11-10

Link: http://arxiv.org/pdf/2511.07327

1. 📘 Topic and Domain: The paper presents IterResearch, a novel paradigm for long-horizon deep research agents that can autonomously gather and synthesize information through iterative exploration in the domain of AI agents and information seeking.
2. 💡 Previous Research and New Ideas: Whereas previous mono-contextual deep research approaches accumulate all information in a single, ever-expanding context window, this paper proposes an iterative paradigm that reconstructs the workspace after each interaction to maintain consistent reasoning capacity.
3. ❓ Problem: The paper aims to solve the limitations of existing deep research agents that suffer from context suffocation and noise contamination when handling long-horizon tasks requiring extensive information gathering and synthesis.
4. 🛠️ Methods: The authors reformulate long-horizon research as a Markov Decision Process with strategic workspace reconstruction and develop Efficiency-Aware Policy Optimization with geometric reward discounting for training.
5. 📊 Results and Evaluation: The approach achieved an average 14.5 percentage point improvement across six benchmarks compared to existing open-source agents, demonstrated unprecedented scaling to 2048 interactions, and improved frontier models by up to 19.2pp when used as a prompting strategy.
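The geometric reward discounting in Efficiency-Aware Policy Optimization (point 4) can be sketched as follows. This is a minimal illustration of the formula rₜ = γᵀ⁻ᵗ · Rᵀ, not code from the paper; the gamma value is illustrative.

```python
# Sketch of EAPO-style geometric reward discounting: a single terminal
# reward R_T is spread backwards over T rounds, with earlier rounds
# discounted more heavily. Gamma is a hypothetical choice.

def discounted_round_rewards(final_reward: float, num_rounds: int,
                             gamma: float = 0.95) -> list[float]:
    """Return r_t = gamma^(T - t) * R_T for t = 1..T.

    The final round receives the full reward (gamma^0 = 1); earlier
    rounds receive geometrically smaller credit.
    """
    return [gamma ** (num_rounds - t) * final_reward
            for t in range(1, num_rounds + 1)]
```

With gamma = 0.5 and four rounds, a terminal reward of 1.0 yields per-round rewards [0.125, 0.25, 0.5, 1.0], so credit concentrates near the rounds that produced the answer.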

Figure: IterResearch Markovian state reconstruction workflow.

- MDP framework: state space S = (q, Mᵗ, {aᵗ⁻¹, TRᵗ⁻¹}) as a bounded workspace; decision space D = (Think, Report, Action) with structured output; environment E exposing tools (Google Search, Scholar, Web Browser, Python).
- Iterative deep-research round t: the workspace holds the question q, the evolving report Mᵗ, and the latest context {aᵗ⁻¹, TRᵗ⁻¹}; the agent policy π generates a decision dᵗ (Think: reasoning; Report: updated Mᵗ⁺¹; Action: tool call aᵗ).
- Transition function T: workspace reconstruction sᵗ⁺¹ = (q, Mᵗ⁺¹, {aᵗ, TRᵗ}); strategic forgetting keeps the per-round context at O(1) size.
- EAPO training: efficiency-aware reward shaping rₜ = γᵀ⁻ᵗ · Rᵀ (geometric discounting), adaptive downsampling to handle variable trajectory lengths, and GSPO integration for policy optimization.
- Key advantages: interaction scaling up to 2048 interactions with a constant workspace; context management (no context suffocation, noise filtering); cross-paradigm knowledge transfer as a prompting strategy; +14.5pp average performance gain.
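The transition sᵗ⁺¹ = (q, Mᵗ⁺¹, {aᵗ, TRᵗ}) can be sketched as a minimal data structure; all names here are illustrative, not the paper's implementation.

```python
# Minimal sketch of Markovian workspace reconstruction: each round the
# workspace is rebuilt from only the question, the evolving report, and
# the most recent action/result pair, so context size stays O(1)
# regardless of how many rounds have passed.

from dataclasses import dataclass

@dataclass
class Workspace:
    question: str     # q, fixed for the whole task
    report: str       # evolving central memory M_t
    last_action: str  # a_{t-1}
    last_result: str  # tool result TR_{t-1}

def reconstruct(ws: Workspace, new_report: str,
                action: str, tool_result: str) -> Workspace:
    """s_{t+1} = (q, M_{t+1}, {a_t, TR_t}): carry forward only the
    updated report and the latest interaction; older raw context is
    strategically forgotten."""
    return Workspace(ws.question, new_report, action, tool_result)
```

The key property is that `reconstruct` never appends to a growing history: whatever must survive a round has to be distilled into the report.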
Q1
1. What is the main limitation of mono-contextual deep research approaches that IterResearch aims to solve?
High computational costs and slow processing speed
Context suffocation and noise contamination in long-horizon tasks
Inability to handle multiple languages and data formats
Q2
2. How does IterResearch maintain consistent reasoning capacity across extended interactions?
By using larger context windows and more powerful models
By distributing the workload across multiple parallel agents
By reconstructing the workspace after each interaction with an evolving report
Q3
3. What impressive scaling capability did IterResearch demonstrate in the experiments?
It scaled to handle 2048 interactions while maintaining performance
It processed data 100 times faster than previous approaches
It reduced memory usage by 90% compared to baseline methods

Paper 2

Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora

Published: 2025-11-10

Link: http://arxiv.org/pdf/2511.07080

1. 📘 Topic and Domain: Development of Wasm, a data processing pipeline for creating structured Arabic multimodal corpora from web content, in the domain of natural language processing and multimodal machine learning.
2. 💡 Previous Research and New Ideas: Builds on the OBELICS framework for multimodal data processing, introducing Arabic-specific adaptations and a structure-preserving conversion to Markdown that maintains document hierarchy.
3. ❓ Problem: The lack of high-quality Arabic multimodal datasets that preserve document structure, which limits the development of Arabic language models and multimodal models.
4. 🛠️ Methods: Implements a multi-stage pipeline including metadata extraction, HTML processing, content structuring, and quality filtering with Arabic-specific adjustments to perplexity modeling and node-level deduplication.
5. 📊 Results and Evaluation: Produced a flexible framework that successfully preserves both text and visual content structure, with comparative analysis showing improved filtering performance over existing approaches, though specific quantitative results were not extensively detailed.
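The node-level deduplication step (point 4) compares HTML nodes with a Needleman-Wunsch alignment at an 80% similarity threshold. Below is a minimal sketch assuming token-level alignment with match score 1 and zero gap/mismatch penalties (which reduces to a normalised longest-common-subsequence score); the paper's actual scoring parameters are not specified here.

```python
# Sketch of node-level deduplication: an HTML node is dropped if its
# text aligns above the similarity threshold with a node already kept
# (e.g. repeated ads or boilerplate). Parameters are illustrative.

def nw_similarity(a: list[str], b: list[str]) -> float:
    """Needleman-Wunsch alignment score normalised to [0, 1].
    With match=1 and zero gap/mismatch penalties this equals
    LCS length divided by the longer sequence length."""
    if not a or not b:
        return 0.0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(max(prev[j - 1] + (x == y), prev[j], curr[j - 1]))
        prev = curr
    return prev[-1] / max(len(a), len(b))

def dedup_nodes(nodes: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a node only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for node in nodes:
        toks = node.split()
        if all(nw_similarity(toks, k.split()) < threshold for k in kept):
            kept.append(node)
    return kept
```

Working at the node level rather than the document level is what lets the pipeline strip repeated boilerplate while keeping the unique content of each page intact.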

Figure: Wasm pipeline for constructing Arabic multimodal corpora.

- Metadata extraction: Common Crawl dumps; Arabic content filtering; URL, WARC, and metadata collection.
- HTML processing: WARC file retrieval; noise removal; CSS/comments cleanup.
- Structure conversion: HTML to Markdown; text/visual separation; hierarchy preservation.
- Tag-level filtering with Arabic-adapted thresholds: relaxed word-repetition limits, stricter language ID, custom KenLM perplexity, and removal by stopword/punctuation ratio.
- Visual data filtering: image URL collection, site-level blacklisting (a conservative approach), storage optimization, and safety filtering.
- Tag deduplication: Needleman-Wunsch algorithm with an 80% similarity threshold removes duplicate ads while preserving documents.
- Document filtering: same criteria as the tag level, with parameters recalibrated for document-wide characteristics.
- Output: a structured Arabic multimodal dataset in Markdown with interleaved text and images.

Key methodological innovations:
- Structured data preservation: maintains DOM hierarchy, semantic relationships, image-caption associations, section hierarchies, and contextual dependencies.
- Enhanced perplexity assessment: a custom KenLM model trained on a curated corpus covering multiple Arabic dialects, focused on human-authored text for better quality filtering.
- Granular node-level deduplication: operates on individual HTML nodes, preserving unique content while removing boilerplate, improving content diversity and processing efficiency.
Q1
1. What is the main innovation of Wasm compared to existing Arabic corpora?
It processes data faster than other pipelines
It preserves document structure and interleaved text-image relationships
It handles a larger volume of Arabic text
Q2
2. How does Wasm's approach to perplexity filtering differ from OBELICS?
It uses a custom KenLM model trained on diverse Arabic dialects
It completely removes perplexity filtering
It applies the same English perplexity model to Arabic
Q3
3. What unique deduplication strategy does Wasm implement?
Document-level deduplication using MinHash
Global deduplication across all documents
Node-level deduplication using Needleman-Wunsch algorithm

Paper 3

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Published: 2025-11-08

Link: http://arxiv.org/pdf/2511.06221

1. 📘 Topic and Domain: Development of VibeThinker-1.5B, a small language model for logical reasoning in mathematics and coding, challenging the assumption that large models are necessary for strong reasoning capabilities.
2. 💡 Previous Research and New Ideas: Builds on the reasoning paradigm established by OpenAI's o1 and subsequent large language models, proposing a novel "Spectrum-to-Signal Principle" that lets a small model achieve reasoning ability comparable to far larger models.
3. ❓ Problem: Addressing the industry assumption that scaling model parameters is essential for enhancing logical reasoning capabilities, aiming to achieve comparable performance with a much smaller and cost-effective model.
4. 🛠️ Methods: Implemented a two-stage approach: "Two-Stage Diversity-Exploring Distillation" for SFT phase to generate diverse solutions, followed by "MaxEnt-Guided Policy Optimization" for RL phase to amplify correct reasoning paths.
5. 📊 Results and Evaluation: VibeThinker-1.5B outperformed larger models on mathematical benchmarks (AIME24: 80.3, AIME25: 74.4, HMMT25: 50.4) and coding tasks (LiveCodeBench V6: 51.1), surpassing models 400 times larger while costing only $7,800 to train.
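The SFT stage (point 4) maximizes Pass@K to reward solution diversity. As a reference point, Pass@K is commonly estimated with the standard unbiased estimator below; this is the general formula, not code from the paper.

```python
# Standard unbiased Pass@K estimator: given n sampled solutions of
# which c are correct, the probability that at least one of k draws
# (without replacement) is correct is 1 - C(n-c, k) / C(n, k).

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of Pass@K from n samples with c correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every k-subset contains
        # at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Optimizing for Pass@K (rather than Pass@1) favors models whose samples cover many distinct solution paths, which is exactly the "broad spectrum" the SSP's first stage aims to create.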

Figure: VibeThinker-1.5B training pipeline under the Spectrum-to-Signal Principle (SSP): diversity generation followed by signal amplification.

- SFT phase (spectrum generation): Two-Stage Diversity-Exploring Distillation with domain-aware diversity probing (algebra, geometry, calculus, statistics) and expert model fusion via weighted linear combination, maximizing Pass@K to create a broad spectrum of solutions.
- RL phase (signal amplification): MaxEnt-Guided Policy Optimization (MGPO) applies the maximum-entropy principle with target p_c(q) = 0.5 (optimal uncertainty), entropy-deviation regularization (KL divergence from the max-entropy state), and an entropy-weighted enhancement of GRPO to amplify correct signals from the spectrum.
- Multi-stage training: math reasoning at 16K context, then math reasoning at 32K context, then code generation.
- Key results: AIME25 74.4 (vs DeepSeek R1: 70.0); HMMT25 50.4 (vs 41.7); LiveCodeBench V6 51.1; training cost $7,800 (vs $294K); 400× smaller than DeepSeek R1; democratizes AI research access.
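MGPO weights training problems by how close the model's current pass rate p_c(q) is to the maximum-entropy point 0.5. A minimal sketch of that idea, assuming a simple normalised Bernoulli-entropy weight; the paper derives its regularizer from a KL divergence to the max-entropy state, so this exact function is illustrative.

```python
# Illustrative MaxEnt-style problem weighting: a problem the model
# solves ~50% of the time carries maximal uncertainty (entropy) and
# thus maximal learning signal; problems it always or never solves
# carry none. The specific weight function is an assumption.

import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) pass/fail outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def maxent_weight(pass_rate: float) -> float:
    """Normalised weight in [0, 1]; peaks at pass_rate = 0.5."""
    return bernoulli_entropy(pass_rate) / math.log(2)
```

Under this weighting, already-mastered and currently-impossible problems contribute little gradient, concentrating RL compute on the frontier of the model's ability.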
Q1
1. What is the primary innovation that allows VibeThinker-1.5B to achieve strong reasoning capabilities despite its small size?
Using extremely large training datasets
The Spectrum-to-Signal Principle with two-stage optimization
Copying the architecture of larger models
Q2
2. What was the most significant limitation of VibeThinker-1.5B compared to larger models?
Its performance on coding tasks
Its mathematical reasoning abilities
Its general knowledge capabilities on GPQA benchmark
Q3
3. What is the potential broader impact of VibeThinker-1.5B's success according to the paper?
It could democratize AI research by making it more accessible to organizations with limited resources
It could make language models run faster on mobile devices
It could reduce the environmental impact of AI training