2025-09-18 Papers


Paper 1

Hala Technical Report: Building Arabic-Centric Instruction & Translation Models at Scale

Published: 2025-09-17

Link: http://arxiv.org/pdf/2509.14008

1. 📘 Topic and Domain: Development of Arabic-centric large language models (HALA) focusing on instruction-following and translation capabilities.
2. 💡 Previous Research and New Ideas: Based on existing multilingual LLMs and Arabic NLP work (e.g., AraBERT, JAIS), it proposes a translate-and-tune pipeline for creating specialized Arabic models.
3. ❓ Problem: Addressing the scarcity of high-quality Arabic instruction data and the need for better Arabic-centric language models.
4. 🛠️ Methods: Quantized the teacher translator to FP8 (≈2x faster inference), built million-scale bilingual supervision (≈1.26M EN→AR pairs), used a lightweight fine-tuned translator to render English instruction datasets into Arabic, and fine-tuned models at various scales (350M to 9B parameters), merging each back into its base with slerp.
5. 📊 Results and Evaluation: HALA models achieved state-of-the-art results in both nano (≤2B) and small (7B-9B) categories on Arabic benchmarks, outperforming base models while maintaining general capabilities.
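
The slerp merge in step 4 interpolates along the arc between the fine-tuned and base weights rather than along a straight line. A minimal per-tensor sketch (the actual merge is done with MergeKit; the flattened-tensor treatment and names `w_base`/`w_tuned` here are illustrative assumptions):

```python
import numpy as np

def slerp(w_base, w_tuned, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors."""
    a = w_base / (np.linalg.norm(w_base) + eps)
    b = w_tuned / (np.linalg.norm(w_tuned) + eps)
    omega = np.arccos(np.clip(np.dot(a, b), -1.0, 1.0))  # angle between tensors
    if omega < eps:  # nearly parallel: slerp degenerates to plain lerp
        return (1 - t) * w_base + t * w_tuned
    return (np.sin((1 - t) * omega) * w_base
            + np.sin(t * omega) * w_tuned) / np.sin(omega)
```

At t = 0.5 (the setting reported here), the merge sits midway along the arc, which is how the pipeline balances Arabic-specific gains against base-model strengths.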

Pipeline overview (figure summary):

- Teacher translator: Cohere Command-A-Translate, quantized to FP8 for 2x faster inference with no quality loss.
- Bilingual supervision: EN→AR translations of Open-Orca (405K pairs) and filtered OPUS-100, 1.26M examples in total after quality filtering.
- Lightweight translator: LiquidAI LFM2-1.2B fine-tuned on the bilingual data.
- Large-scale translation: the lightweight translator converts Hermes-3, SCP-116K, ReAlign-Alpaca, LaMini, Tulu-3, and Synthetic-Instruct-GPT-J into an Arabic instruction corpus of ~4.5M high-fidelity samples.
- Models: HALA-350M, HALA-700M, and HALA-1.2B (LiquidAI LFM2 bases) plus HALA-9B (FANAR architecture), each fine-tuned and then merged.
- Merging: slerp via MergeKit at t = 0.5, balancing Arabic gains with base-model strengths.
- Evaluation: AlGhafa, AraTrust, ArabicMMLU, ArbMMLU-HT, EXAMS, and MadinahQA; SOTA in the nano (≤2B) and small (≤9B) categories.
- Open release: models, data, code, evaluation, and training recipes.
- Key innovations: FP8 quantization, the translate-and-tune pipeline, million-scale bilingual supervision, slerp merging, and a language-centric "depth over breadth" approach.
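
FP8 in the e4m3 format keeps a 4-bit exponent and a 3-bit mantissa, with a maximum normal value of 448. A rough fake-quantization sketch of the rounding this imposes (ignoring subnormals and NaN encoding; the per-tensor scaling scheme here is an assumption, not the report's recipe):

```python
import numpy as np

def fake_quant_fp8_e4m3(x):
    """Simulate e4m3 rounding: scale into +-448, keep a 3-bit mantissa."""
    x = np.asarray(x, dtype=np.float32)
    amax = float(np.max(np.abs(x)))
    scale = 448.0 / max(amax, 1e-12)          # per-tensor scale into e4m3 range
    y = np.clip(x * scale, -448.0, 448.0)
    e = np.floor(np.log2(np.maximum(np.abs(y), 2.0 ** -6)))  # exponent per value
    step = 2.0 ** (e - 3)                     # value spacing with a 3-bit mantissa
    y = np.round(y / step) * step
    return (y / scale).astype(np.float32)
```

Power-of-two multiples of the tensor maximum round-trip exactly; everything else lands within roughly 1/16 relative error, which is why the report can claim 2x faster inference with no measurable quality loss.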
Q1. What innovative compression technique did HALA use to improve translation efficiency?
- Quantized model weights to 8-bit floating point (FP8)
- Used binary quantization
- Applied model pruning techniques

Q2. What was the primary challenge that HALA aimed to address in Arabic NLP?
- Lack of computing resources
- Scarcity of high-quality Arabic instruction data
- Poor model architecture design

Q3. Which technique did HALA use to balance Arabic specialization with base-model strengths?
- Layer freezing
- Knowledge distillation
- Spherical linear interpolation (slerp) merging

Paper 2

WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Published: 2025-09-16

Link: http://arxiv.org/pdf/2509.13305

1. 📘 Topic and Domain: Development of WebSailor-V2, an improved open-source web agent for autonomous information-seeking and research tasks.
2. 💡 Previous Research and New Ideas: Based on the original WebSailor framework and ReAct paradigm, it introduces novel ideas including SailorFog-QA-V2 (an enhanced dataset with complex knowledge graphs) and a dual-environment reinforcement learning framework combining simulated and real-world training.
3. ❓ Problem: The paper aims to close the performance gap between open-source and proprietary web agents while addressing challenges in data quality and training scalability for autonomous research agents.
4. 🛠️ Methods: The authors use a comprehensive pipeline including: (1) SailorFog-QA-V2 dataset construction with dense knowledge graphs, (2) Supervised Fine-Tuning for initial training, and (3) a dual-environment Reinforcement Learning approach with both simulated and real-world components.
5. 📊 Results and Evaluation: WebSailor-V2 achieved state-of-the-art results on multiple benchmarks, scoring 35.3 on BrowseComp-EN, 44.1 on BrowseComp-ZH, and 30.6 on HLE, outperforming existing open-source agents and matching or exceeding some proprietary systems despite using a smaller model (30B parameters).
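
Step (1) of the methods seeds QA generation by random-walking over a dense knowledge graph to extract subgraphs. A toy sketch of that extraction step (the adjacency-list representation and helper name are assumptions, not the paper's code):

```python
import random

def random_walk_subgraph(graph, start, steps, seed=0):
    """Extract a subgraph by random-walking an adjacency-list graph."""
    rng = random.Random(seed)
    nodes, edges = {start}, set()
    current = start
    for _ in range(steps):
        neighbors = graph.get(current, [])
        if not neighbors:          # dead end: stop the walk
            break
        nxt = rng.choice(neighbors)
        edges.add((current, nxt))
        nodes.add(nxt)
        current = nxt
    return nodes, edges
```

QA pairs would then be generated over the extracted `(nodes, edges)`; because the underlying graph is densely interconnected and cyclic rather than tree-like, walks can revisit entities and surface multi-hop relations.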

Methodology overview (figure summary):

- Data construction: build dense knowledge graphs, extract subgraphs via random walks, and generate diverse-uncertainty QA pairs, yielding SailorFog-QA-V2.
- Training pipeline: SFT cold start on Qwen3-30B-A3B, then a dual-environment RL framework with GRPO-based policy optimization.
- RL environments: a simulated environment (Wikipedia) plus a real environment (web APIs), with automated data curation across both.
- Agent framework: ReAct with Search, Visit, Scholar, and Python tools.
- Symbiotic feedback loop: data and policy co-evolve for continuous improvement.
- Results: BrowseComp-EN 35.3, BrowseComp-ZH 44.1, HLE 30.6; state-of-the-art among open-source agents.
- Key innovations: dense knowledge graphs with cyclic structures, a scalable sim-plus-real RL framework, diverse uncertainty beyond obfuscation, and the data-policy feedback loop.
- WebSailor-V2-30B-A3B is competitive with proprietary systems and outperforms the 671B DeepSeek-V3.1.
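
The dual-environment design can be pictured as a rollout collector that mixes the cheap simulated (Wikipedia) environment with the real web-API environment. A minimal sketch, where `run_episode` and the fixed mixing ratio are illustrative assumptions rather than the paper's actual scheduler:

```python
import random

def collect_rollouts(policy, sim_env, real_env, n_rollouts, sim_ratio=0.5, seed=0):
    """Gather RL rollouts from a mix of simulated and real environments."""
    rng = random.Random(seed)
    rollouts = []
    for _ in range(n_rollouts):
        # the fast simulated environment absorbs most exploration, while the
        # real environment keeps the policy grounded in live web behavior
        env = sim_env if rng.random() < sim_ratio else real_env
        rollouts.append(env.run_episode(policy))
    return rollouts
```

The split addresses the scalability problem named above: most training throughput comes from the simulated side, which is stable and cheap, without losing contact with real-world tool behavior.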
Q1. What is the most innovative aspect of WebSailor-V2's data generation approach compared to previous methods?
- It uses more complex obfuscation techniques
- It creates densely interconnected cyclic knowledge graphs rather than tree-like structures
- It generates larger volumes of training data

Q2. Despite using only a 30B parameter model, how did WebSailor-V2 achieve competitive performance with larger models?
- By using more sophisticated prompting techniques
- By implementing a complex multi-agent architecture
- By focusing on enhancing core information retrieval and synthesis capabilities through better data and training

Q3. What unique approach did WebSailor-V2 take to handle the challenges of RL training?
- Used only real-world environment training
- Developed a dual-environment system with both simulated and real-world components
- Relied solely on supervised learning without RL

Paper 3

ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization

Published: 2025-09-16

Link: http://arxiv.org/pdf/2509.13313

1. 📘 Topic and Domain: The paper introduces ReSum, a paradigm for enabling long-horizon web search capabilities in Large Language Model (LLM) agents through context summarization.
2. 💡 Previous Research and New Ideas: Based on the ReAct paradigm for LLM agents, it proposes a novel approach of periodically summarizing conversation history to overcome context window limitations.
3. ❓ Problem: The paper addresses the fundamental limitation of context window size in LLM-based web agents that prevents them from conducting extended multi-turn exploration needed for complex queries.
4. 🛠️ Methods: The paper develops ReSumTool-30B for specialized summarization and ReSum-GRPO, an algorithm that integrates GRPO with segmented trajectory training to help agents adapt to summary-based reasoning.
5. 📊 Results and Evaluation: ReSum achieved an average 4.5% improvement over ReAct across three benchmarks, with further gains up to 8.2% after ReSum-GRPO training, enabling WebResummer-30B to achieve 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en.
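
The control flow behind points 3 and 4 can be sketched as a ReAct loop that resets its history whenever the context fills up; `act`, `summarize`, and `token_count` are hypothetical callables standing in for the policy model, ReSumTool-30B, and a tokenizer:

```python
def resum_loop(query, act, summarize, token_count, limit, max_turns=50):
    """ReAct loop that resets its context to (query, summary) on overflow."""
    history = [query]                      # H0 = (q)
    for _ in range(max_turns):
        thought_action, observation, answer = act(history)  # (tau_t, a_t), o_t
        if answer is not None:
            return answer
        history += [thought_action, observation]
        if token_count(history) > limit:   # context limit reached
            summary = summarize(history)   # s ~ pi_sum(. | H_t)
            history = [query, summary]     # reset: H_t <- (q, s)
    return None
```

Because the loop only swaps the history for `(q, s)`, any ReAct agent can adopt it without architectural changes, which is the plug-and-play property the paper emphasizes.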

Paradigm overview (figure summary):

- Setup: given a user query q, a traditional ReAct agent eventually overflows its context; ReSum instead summarizes periodically.
- Loop: initialize H₀ = (q); think and act (τₜ, aₜ); receive observation oₜ = R(aₜ); on reaching the context limit, ReSumTool-30B draws s ~ π_sum(·|Hₜ), extracting key evidence and identifying gaps; reset the context Hₜ ← (q, s) and continue until an answer is generated.
- ReSum-GRPO training: segment each trajectory into H⁽¹⁾, …, H⁽ᴷ⁺¹⁾ and broadcast the trajectory advantage to every segment (Âₘ⁽ⁱ⁾ = Âₘ), teaching summary-conditioned reasoning with Search and Visit tools.
- Performance: +4.5% on average over ReAct, up to +8.2% with ReSum-GRPO; WebResummer-30B reaches 33.3% on BrowseComp-zh and 18.3% on BrowseComp-en with only 1K training samples.
- Key benefits: indefinite exploration, context-constraint bypass, plug-and-play compatibility, minimal ReAct modifications, specialized summarization.
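
In ReSum-GRPO, each full trajectory's reward is normalized within its rollout group and the resulting advantage is copied to every segment produced by summarization (Âₘ⁽ⁱ⁾ = Âₘ). A minimal sketch, assuming scalar terminal rewards (the function name and inputs are illustrative):

```python
import numpy as np

def broadcast_segment_advantages(rewards, segment_counts, eps=1e-8):
    """Group-relative advantages, copied to every segment of each trajectory."""
    r = np.asarray(rewards, dtype=float)
    adv = (r - r.mean()) / (r.std() + eps)   # GRPO group normalization
    # each trajectory was cut into K+1 segments; all segments share its advantage
    return [np.full(k, a) for a, k in zip(adv, segment_counts)]
```

Broadcasting lets the standard GRPO objective train over summary-conditioned segments without redesigning the reward, since credit for the final answer flows to every segment of the trajectory.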
Q1. What is the main challenge that ReSum aims to address in LLM-based web agents?
- Slow processing speed of web search results
- Limited context window preventing extended exploration
- High computational costs of running web agents

Q2. How does ReSumTool-30B improve upon generic LLM summarization capabilities?
- By using a much larger model architecture
- By adding complex architectural modifications
- By specializing in extracting key evidence and identifying information gaps

Q3. What makes ReSum-GRPO different from standard GRPO training?
- It uses a completely different reward calculation method
- It segments long trajectories and broadcasts advantages across segments
- It requires 10x more training data than standard GRPO