2026-03-19 Papers


Paper 1

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Published: 2026-03-16

Link: http://arxiv.org/pdf/2603.15726

1. 📘 Topic and Domain: The paper presents MiroThinker-1.7 and MiroThinker-H1, research agents designed for complex long-horizon reasoning tasks in the domain of agentic AI systems.
2. 💡 Previous Research and New Ideas: The paper builds on the ReAct paradigm and agentic LLMs (e.g., GPT-5.4, Claude-4.6), proposing agentic mid-training for atomic capabilities and verification-centric reasoning with local and global verifiers for reliable multi-step problem solving.
3. ❓ Problem: The paper targets the observation that simply scaling the interaction length of agent trajectories accumulates noise and errors rather than improving reasoning quality on complex real-world tasks.
4. 🛠️ Methods: The authors use a four-stage training pipeline (mid-training, supervised fine-tuning, preference optimization, reinforcement learning) with dual-loop agent architecture, sliding-window context management, and verification mechanisms at both local and global levels.
5. 📊 Results and Evaluation: MiroThinker-H1 achieves state-of-the-art performance, scoring 88.2 on BrowseComp and 88.5 on GAIA, outperforming commercial agents while requiring 43% fewer interaction rounds than previous versions; evaluation uses LLM-as-Judge across multiple benchmarks.
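The sliding-window context management described in the methods (keep only the most recent K=5 tool observations and truncate long results) could look roughly like the sketch below; the class name and character budget are illustrative, not from the paper.

```python
from collections import deque

class SlidingWindowContext:
    """Keep only the most recent K tool observations in the agent's context,
    truncating each observation to a fixed character budget (illustrative)."""

    def __init__(self, k=5, max_chars=2000):
        self.max_chars = max_chars
        self.window = deque(maxlen=k)  # older observations fall off automatically

    def add(self, observation: str) -> None:
        # Truncate oversized tool outputs before adding them to the window.
        if len(observation) > self.max_chars:
            observation = observation[: self.max_chars] + "[truncated]"
        self.window.append(observation)

    def render(self) -> str:
        # Concatenate the surviving observations for the next model call.
        return "\n".join(self.window)
```

The point of the deque with `maxlen` is that eviction is automatic: the agent never carries more than K observations forward, which is one way to keep long trajectories from drowning in accumulated tool output.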

Figure: MiroThinker-1.7 & H1 method workflow

- High-quality QA construction: a corpus-based pipeline (subgraph sampling, broad coverage) and a WebHop pipeline (web expansion, difficulty control)
- 4-stage training pipeline:
  1. Agentic mid-training: planning boosting, reasoning sculpting, summarization
  2. Supervised fine-tuning: expert trajectories, multi-turn interaction, tool execution
  3. Preference optimization: DPO training, correctness ranking, quality filtering
  4. Reinforcement learning: GRPO optimization, entropy control, creative exploration
- Model variants: MiroThinker-1.7-mini (3B activated params, preference distillation, efficient performance); MiroThinker-1.7 (full-scale model, strong atomic abilities, effective interaction); MiroThinker-H1 (heavy-duty reasoning with local and global verification)
- Agentic workflow: ReAct interaction loop (Thought → Action → Observation); context management via sliding window (K=5) with result truncation
- Tool interface: information retrieval (google_search, scrape_extract); code execution (run_python_code, run_command); file transfer
- H1 verification: Local Verifier (step-level audit, error correction); Global Verifier (trajectory audit, solution selection)
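The H1 verification scheme (a Local Verifier auditing individual steps, a Global Verifier auditing complete trajectories and selecting a solution) can be sketched as a generic control loop. All callables below are hypothetical stand-ins; the paper's actual LLM-based verifier prompts and interfaces are not reproduced here.

```python
def run_with_verification(solve_step, local_verify, global_verify,
                          candidates=3, max_steps=10):
    """Sketch of verification-centric reasoning: a local verifier audits
    each step before it is committed to the trajectory, and a global
    verifier scores complete trajectories to select the final solution."""
    trajectories = []
    for _ in range(candidates):
        traj = []
        for _ in range(max_steps):
            action = solve_step(traj)
            if not local_verify(traj, action):
                continue  # step-level audit failed: discard this step and retry
            traj.append(action)
            if action.get("final"):
                break
        trajectories.append(traj)
    # Trajectory-level audit and solution selection.
    return max(trajectories, key=global_verify)
```

The two verification levels play different roles: the local check prevents bad steps from poisoning the rest of the rollout, while the global check arbitrates between whole candidate solutions.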
Q1
1. What is the key insight behind MiroThinker's approach to improving long-horizon reasoning?
Increasing the context window size to 512K tokens for better memory retention
Scaling effective interaction quality rather than simply increasing interaction length
Using multiple parallel agents to explore different solution paths simultaneously
Q2
2. How does MiroThinker-H1's verification-centric reasoning mode work?
It uses blockchain technology to verify each reasoning step cryptographically
It employs human experts to validate intermediate results during inference
It integrates Local and Global Verifiers to audit step-level and complete reasoning processes
Q3
3. What surprising result did MiroThinker-1.7-mini achieve compared to MiroThinker-1.5?
16.7% better performance with 43% fewer interaction rounds on average
50% reduction in training time with identical performance metrics
Triple the context length capacity while maintaining the same parameter count

Paper 2

MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

Published: 2026-03-17

Link: http://arxiv.org/pdf/2603.17187

1. 📘 Topic and Domain: The paper presents MetaClaw, a continual meta-learning framework for deployed LLM agents that enables them to evolve and adapt in real-world usage through skill synthesis and policy optimization.
2. 💡 Previous Research and New Ideas: The paper builds on memory-based methods (Reflexion), skill-based approaches (Voyager, ExpeL), and RL-based LLM training (RLHF, GRPO), proposing a novel dual-mechanism approach that combines gradient-free skill evolution with opportunistic gradient-based policy optimization while maintaining strict support-query data separation.
3. ❓ Problem: The paper addresses the fundamental tension that deployed LLM agents remain static after training while user needs and task distributions evolve continuously; performance degrades, yet conventional retraining would require interrupting the service.
4. 🛠️ Methods: MetaClaw employs two complementary mechanisms: skill-driven fast adaptation that analyzes failures to synthesize reusable behavioral instructions with zero downtime, and opportunistic policy optimization that performs RL-based LoRA fine-tuning during user-inactive windows detected by monitoring sleep schedules, system inactivity, and calendar events.
5. 📊 Results and Evaluation: On MetaClaw-Bench (934 questions), skill adaptation improved accuracy by up to 32% in relative terms; the full pipeline advanced Kimi-K2.5 from 21.4% to 40.6% accuracy with an 8.25× gain in task completion; and on AutoResearchClaw, skill injection alone improved composite robustness by 18.3%.
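The gradient-free skill update (S_{g+1} = S_g ∪ ΔS) with version-based support-query separation can be sketched as below; the class and method names are illustrative, not the paper's API.

```python
class SkillStore:
    """Illustrative sketch of versioned skill evolution: each skill update
    bumps a generation counter g, and trajectories collected under an older
    generation are flushed so they never serve as RL query data."""

    def __init__(self):
        self.generation = 0
        self.skills = []        # S_g: reusable behavioral instructions
        self.trajectories = []  # (generation, trajectory) pairs

    def record(self, trajectory):
        # Every trajectory is tagged with the skill generation it ran under.
        self.trajectories.append((self.generation, trajectory))

    def evolve(self, new_skills):
        # S_{g+1} = S_g ∪ ΔS: skills are injected via the prompt, zero downtime.
        self.skills.extend(new_skills)
        self.generation += 1
        # Support-query separation: flush samples with version <= g, so stale
        # pre-adaptation failures cannot contaminate later gradient updates.
        self.trajectories = [(g, t) for g, t in self.trajectories
                             if g >= self.generation]
```

The flush on `evolve` is what enforces the strict separation: only trajectories produced *after* a skill update survive as query data for policy optimization.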

Figure: MetaClaw continual meta-learning framework

- A task stream τ₁, τ₂, τ₃, ... feeds the meta-model M(θ, S); the agent executes π_θ(·|τ, Retrieve(S, τ)), producing a trajectory a₁, a₂, ..., aₙ
- Skill-driven fast adaptation: failure analysis over support data D_sup; the skill evolver E runs LLM analysis and updates the skill set, S_{g+1} = S_g ∪ ΔS, with zero downtime
- Opportunistic policy optimization: post-adaptation query data D_qry; the OMLS scheduler detects idle periods and triggers cloud RL training that updates θ via LoRA
- Support-query separation: skill-generation versioning flushes samples with version ≤ g
- Three idle signals: sleep, inactivity, calendar
- Key features: dual-timescale adaptation, fast (skills) and slow (weights); zero-downtime skill injection via prompt modification; opportunistic training during user-idle periods; strict support-query data separation via versioning
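The OMLS idle check combines the three signals named in the paper (sleep schedule, system inactivity, calendar events). A minimal sketch, where the thresholds and the function signature are assumed defaults rather than values from the paper:

```python
from datetime import time

def is_idle(now, last_input_minutes, calendar_busy,
            sleep_start=time(23, 0), sleep_end=time(7, 0),
            inactivity_threshold=30):
    """Hypothetical OMLS-style idle check: train only when the user is
    likely asleep or has been inactive, and never during a calendar event.
    `now` is a datetime.time; `last_input_minutes` is minutes since the
    last keyboard/mouse input."""
    asleep = now >= sleep_start or now < sleep_end  # window wraps midnight
    inactive = last_input_minutes >= inactivity_threshold
    return (asleep or inactive) and not calendar_busy
```

In a real deployment these inputs would come from OS idle-time APIs and a calendar integration; the point of the sketch is the policy shape: either idleness signal can open a training window, but a scheduled event always closes it.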
Q1
1. What unique scheduling mechanism does MetaClaw use to avoid disrupting user experience during policy optimization?
A distributed computing cluster that runs training in parallel without affecting the main agent
An Opportunistic Meta-Learning Scheduler (OMLS) that monitors sleep hours, keyboard inactivity, and Google Calendar events
A predictive algorithm that forecasts when users will need the agent and schedules training accordingly
Q2
2. Why does MetaClaw maintain strict separation between 'support data' and 'query data' through its skill generation versioning mechanism?
To prevent stale rewards from pre-adaptation failures contaminating gradient updates for policy optimization
To comply with data privacy regulations by keeping user interactions separate from training data
To reduce memory consumption by archiving old trajectories that are no longer needed
Q3
3. When MetaClaw was applied to AutoResearchClaw's 23-stage research pipeline, what was the primary improvement mechanism and its impact?
Full RL training reduced pipeline execution time by 23% through optimized stage transitions
Memory augmentation allowed the system to cache previous research papers for faster retrieval
Skill injection alone improved composite robustness by 18.3% without any gradient updates

Paper 3

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Published: 2026-03-17

Link: http://arxiv.org/pdf/2603.16859

1. 📘 Topic and Domain: The paper introduces SocialOmni, a benchmark for evaluating audio-visual social interactivity in omni-modal large language models (OLMs) during multi-party conversations.
2. 💡 Previous Research and New Ideas: The paper builds on existing OLM benchmarks that focus on static accuracy-centric tasks, proposing a new evaluation framework that operationalizes social interactivity across three dimensions: who (speaker identification), when (interruption timing), and how (natural response generation).
3. ❓ Problem: The paper addresses the gap in evaluating OLMs' conversational social competence, as current benchmarks fail to assess models' ability to navigate dynamic dialogue cues, determine appropriate turn-taking timing, and generate socially coherent responses in real-time multi-party settings.
4. 🛠️ Methods: The authors created a benchmark with 2,000 perception samples and 209 interaction-generation instances across 15 dialogue domains, using multiple-choice questions for speaker identification and LLM-as-judge protocols for evaluating turn-timing decisions and response quality.
5. 📊 Results and Evaluation: Testing 12 OLMs revealed that no single model dominates all three axes, with a pronounced decoupling between perceptual accuracy and generation quality: models that excel at speaker identification often fail at natural interruption generation, confirming that understanding-centric metrics alone cannot characterize conversational competence.
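The "when" axis buckets a model's interruption time into Early/On-time/Late categories relative to a reference turn boundary. A minimal sketch of such a categorization; the tolerance window is an assumed value, not taken from the paper:

```python
def timing_category(predicted_t, reference_t, tolerance=0.5):
    """Classify a turn-taking decision as Early (E), On-time (O), or
    Late (L) relative to the reference turn boundary, in seconds.
    The tolerance is an illustrative default."""
    delta = predicted_t - reference_t
    if delta < -tolerance:
        return "E"  # interrupted before the speaker finished
    if delta > tolerance:
        return "L"  # missed the conversational window
    return "O"      # within the acceptable window
```

This three-way bucketing directly mirrors the two dominant failure modes the benchmark reports: aggressive models pile up in E, conservative ones in L.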

Figure: SocialOmni benchmark workflow

- Data collection: 3,000+ raw videos spanning 15 dialogue subcategories and 4 domains, all under CC-BY licenses
- Preprocessing: extract 10-30 s clips, filter for audio clarity and face visibility, run ASR transcription
- Task I (Who, perception): speaker identification at timestamp t; 4-way multiple choice; 2,000 perception samples; consistent/inconsistent splits testing audio-visual alignment
- Task II (When & How, generation): binary turn-taking decision plus a context-appropriate utterance; 209 generation instances; multi-reference continuations; temporal-constraint evaluation
- Evaluation metrics: Who (Top-1 accuracy, Macro-F1), When (timing categories E/O/L), How (LLM-as-judge scoring), plus consistency-gap analysis and response-coverage metrics
- Model testing: 12 OLMs, including GPT-4o, the Gemini series, the Qwen series, OmniVinci, VITA, Baichuan-Omni, and MiniOmni2
- Key findings: no single model dominates all axes; perception-generation decoupling; open-source models lag commercial systems; cross-modal temporal incoherence
- Common failure patterns: in perception, cross-modal temporal incoherence and correct transcription with the wrong speaker; in generation, premature interruption and contextually incoherent continuation
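The Macro-F1 metric listed for the "who" task averages per-speaker F1 scores, so rarely-speaking participants count as much as dominant ones. A stdlib-only sketch of the standard definition:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over speaker labels: compute F1 per class
    independently, then take the unweighted mean across classes."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(2 * precision * recall / (precision + recall)
                         if precision + recall else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

Reporting Macro-F1 alongside Top-1 accuracy guards against a model that scores well simply by always predicting the most frequent speaker in a clip.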
Q1
1. What surprising pattern did SocialOmni reveal about the relationship between a model's perceptual accuracy and its conversational abilities?
Models with high speaker identification accuracy consistently generate the most natural interruptions
There is a pronounced decoupling - models excelling at speaker identification often fail at generating appropriate interruptions
Perceptual accuracy perfectly predicts a model's ability to time conversational turns correctly
Q2
2. Which model achieved the best performance on each of SocialOmni's three evaluation axes?
GPT-4o dominated all three axes: who, when, and how
Gemini 3 Pro Preview excelled at all tasks due to its unified architecture
Different models led each axis: Qwen3-Omni (who), Gemini 3 Pro Preview (when), and Gemini 2.5 Flash (how)
Q3
3. What are the two dominant failure patterns SocialOmni identified in the 'when' (turn-taking timing) task?
Aggressive models frequently interrupt too early, while conservative models miss conversational windows entirely
All models consistently respond exactly on time but with inappropriate content
Models only fail when audio and visual cues are perfectly synchronized