2025-11-24 Papers


Paper 1

Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Published: 2025-11-20

Link: http://arxiv.org/pdf/2511.16043

1. 📘 Topic and Domain: The paper focuses on developing self-evolving AI agents using Large Language Models (LLMs) without human-curated training data, in the domain of artificial intelligence and machine learning.
2. 💡 Previous Research and New Ideas: Building on previous self-play and self-challenging approaches for LLMs, the paper proposes a novel framework, Agent0, that introduces tool integration and multi-step co-evolution between two specialized agents.
3. ❓ Problem: The paper addresses the limitation of LLM agents requiring massive human-curated datasets for training, which creates scalability bottlenecks and tethers AI development to human knowledge boundaries.
4. 🛠️ Methods: Uses two co-evolving agents initialized from the same base LLM: a curriculum agent that generates increasingly challenging tasks and an executor agent that learns to solve them, with integrated tools and multi-turn interactions supported by Group Relative Policy Optimization (GRPO).
5. 📊 Results and Evaluation: Agent0 improved mathematical reasoning performance by 18% and general reasoning performance by 24% on the Qwen3-8B-Base model across ten benchmarks, demonstrating substantial capability gains through the co-evolutionary process.
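Per the paper's framework figure, the curriculum agent is rewarded for generating tasks that elicit executor uncertainty (R_unc) and tool use (R_tool), with a repetition penalty (R_rep). A minimal sketch of how such a composite task reward could be combined; the weights, the vote-based uncertainty measure, and the function itself are illustrative assumptions, not the paper's exact formulas:

```python
from collections import Counter

def curriculum_reward(executor_answers, tool_calls,
                      w_unc=1.0, w_tool=0.5, w_rep=0.5):
    """Composite task reward for the curriculum agent (illustrative weights).

    executor_answers: final answers from several executor rollouts on one task.
    tool_calls: total tool invocations across those rollouts.
    """
    counts = Counter(executor_answers)
    n = len(executor_answers)
    # R_unc: self-consistency uncertainty; largest when rollouts disagree.
    p_majority = counts.most_common(1)[0][1] / n
    r_unc = 1.0 - p_majority
    # R_tool: reward tasks that elicit tool use (capped at one call per rollout).
    r_tool = min(tool_calls / n, 1.0)
    # R_rep: penalize degenerate tasks where every rollout gives one answer.
    r_rep = -1.0 if len(counts) == 1 else 0.0
    return w_unc * r_unc + w_tool * r_tool + w_rep * r_rep
```

Under this sketch, a task whose rollouts split evenly and all invoke the tool scores highest, which matches the paper's goal of steering the curriculum toward the executor's capability frontier.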

[Figure: Agent0 self-evolving framework. A base LLM (π_base) initializes a curriculum agent (π_θ) and an executor agent (π_φ) that co-evolve from round t to t+1. Curriculum evolution: task generation rewarded by executor uncertainty (R_unc), tool use (R_tool), and a repetition penalty (R_rep), updated with GRPO. Executor evolution: dataset filtering, multi-turn rollouts with an external code-interpreter tool, and an ADPO update. The virtuous cycle yields +18% on math and +24% on general reasoning with zero human data.]
Q1
1. What is the main innovation of Agent0 compared to previous self-evolving frameworks?
It uses human-curated datasets for training
It combines tool integration with multi-round co-evolution between specialized agents
It relies on single-round interactions between identical agents
Q2
2. How does the curriculum agent determine the quality of its generated tasks?
By measuring how many tasks the executor agent completes successfully
By checking if tasks match predefined templates
By using executor's uncertainty and tool-use frequency as reward signals
Q3
3. What was the most significant performance improvement achieved by Agent0 in the experiments?
18% improvement in general reasoning
24% improvement in mathematical reasoning
24% improvement in general reasoning

Paper 2

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Published: 2025-11-20

Link: http://arxiv.org/pdf/2511.16334

1. 📘 Topic and Domain: Large multimodal reasoning models (LMRMs) and training recipes for combining supervised fine-tuning (SFT) and reinforcement learning (RL) to enhance visual-language reasoning capabilities.
2. 💡 Previous Research and New Ideas: Building on recent advances in reinforcement learning with verifiable rewards (RLVR) for language models, the paper proposes a transparent, reproducible training recipe that combines SFT and RL specifically for multimodal reasoning.
3. ❓ Problem: The lack of transparent, reproducible training pipelines and data curation processes for building multimodal reasoning models, which limits understanding of how these models are developed.
4. 🛠️ Methods: Developed a two-stage recipe: 1) SFT stage using 874K high-quality samples with step-by-step validation, and 2) RL stage using 74K samples across diverse domains with carefully designed reward functions and optimization strategies.
5. 📊 Results and Evaluation: Achieved an average 11.6% improvement over the Qwen2.5-VL-7B-Instruct baseline across nine multimodal reasoning benchmarks, with particularly strong scores on MathVista (79.5%), WeMath (79.0%), and LogicVista (72.6%).
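The RL stage, per the paper's figure, blends a verifiable accuracy term with a format term: R = (1-λ)R_acc + λR_fmt. A minimal sketch under assumed conventions (answers wrapped in \boxed{...}, an illustrative λ = 0.1); the paper's actual verifier is more involved:

```python
import re

def rl_reward(response, gold_answer, lam=0.1):
    """Blend accuracy and format rewards: R = (1 - lam)*R_acc + lam*R_fmt.

    Assumes the answer is wrapped in \\boxed{...}; lam is an
    illustrative value, not taken from the paper.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", response)
    # R_fmt: the response follows the expected answer format.
    r_fmt = 1.0 if match else 0.0
    # R_acc: the extracted answer matches the verifiable gold answer.
    r_acc = 1.0 if match and match.group(1).strip() == gold_answer else 0.0
    return (1 - lam) * r_acc + lam * r_fmt
```

Keeping λ small means a well-formatted but wrong answer earns only a small reward, so correctness dominates the optimization signal.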

[Figure: OpenMMReasoner training recipe. SFT pipeline: 103k raw questions distilled from a Qwen3-VL-235B teacher with ×8 sampling into 583k samples, step-by-step answer verification, and cross-domain mixing (general VQA + math VQA + text QA) into the final 874k SFT dataset for a cold-start model. RL pipeline: 74k multi-domain samples, the GSPO algorithm, reward R = (1-λ)R_acc + λR_fmt, ×16 rollouts at temperature 1.0, balancing stability and efficiency. Key insights: answer diversity enhances reasoning performance; teacher-model selection is crucial for data quality; cross-domain knowledge improves generalization; GSPO outperforms other RL algorithms; token efficiency matters for practical deployment; reasoning ability transfers across domains. Results: MathVista 79.5% (+10.3% vs. baseline), MathVerse 63.8% (+38.2%), WeMath 79.0% (+57.2%); 11.6% average improvement; consistent SOTA performance; fully open-sourced pipeline.]
Q1
1. What is the main innovation that distinguishes OpenMMReasoner from previous approaches?
It uses a larger dataset than previous models
It provides a fully transparent and reproducible training pipeline combining SFT and RL
It achieves better performance through a more complex model architecture
Q2
2. During the SFT stage, what was discovered about data filtering strategies?
Strict filtering of data improved model performance significantly
Length-based filtering was more effective than difficulty-based filtering
Over-filtering reduced diversity and actually hurt performance
Q3
3. What unexpected benefit emerged during the RL training phase?
The model developed faster processing speeds
The model showed improved textual reasoning abilities without specific training
The model required less computational resources than expected

Paper 3

First Frame Is the Place to Go for Video Content Customization

Published: 2025-11-19

Link: http://arxiv.org/pdf/2511.15700

1. 📘 Topic and Domain: The paper explores video content customization through a novel perspective on the role of first frames in video generation models, focusing on computer vision and deep learning.
2. 💡 Previous Research and New Ideas: Building on existing video generation models such as Wan2.2, the paper proposes that the first frame acts as a conceptual memory buffer for storing visual entities, rather than merely a temporal starting point.
3. ❓ Problem: The paper aims to solve the challenge of incorporating multiple reference images into pre-trained video generation models without architectural modifications or large-scale fine-tuning.
4. 🛠️ Methods: The authors develop FFGo, a lightweight add-on that uses Vision-Language Models for data curation and LoRA adaptation with just 20-50 training examples to invoke the model's innate ability to mix subjects through the first frame.
5. 📊 Results and Evaluation: In a user study with 200 annotations from 40 users, FFGo outperformed baseline models on object identity, scene identity, and overall quality, and was ranked first in 81.2% of cases despite using minimal training data.
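The inference step described above, conditioning on a composed first frame I_mix and then discarding the conditioning frames F_c while keeping the generated frames F_g, can be sketched as follows; the function names and model interface are illustrative assumptions, not the paper's API:

```python
def ffgo_inference(i2v_model, i_mix, transition_prompt, n_cond=1):
    """Illustrative FFGo-style inference: V_mix = g(I_mix, C_trans).

    i2v_model: a pre-trained image-to-video generator (e.g. a Wan2.2-style
    I2V model) that returns a list of frames conditioned on the first frame.
    i_mix: composite first frame holding all reference entities.
    n_cond: number of leading conditioning frames (F_c) to drop.
    """
    frames = i2v_model(i_mix, transition_prompt)
    # Remove F_c, keep F_g: only the generated frames form the final video.
    return frames[n_cond:]
```

The trimming step matters because the composite first frame is a memory buffer rather than real content: it seeds the entities but should not appear in the delivered video.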

[Figure: FFGo pipeline. Phase 1, dataset curation: 2,000 videos manually selected, elements identified with a VLM and processed with SAM 2, yielding 50 training examples. Phase 2, few-shot LoRA adaptation: Wan2.2-I2V-A14B with LoRA rank 128, a <transition> token plus text prompt, and I_mix input (about 5 hours on 2×H200), giving the adapted model θ+Δθ. Phase 3, inference: V_mix = g(I_mix, C_trans); conditioning frames F_c are removed and generated frames F_g kept, producing a clean customized video. Key innovation: the first frame serves as a conceptual memory buffer storing visual entities for later reuse during generation, with no architectural modifications, only 20-50 LoRA training examples, and the pre-trained model's rich generative priors preserved. Applications: robot manipulation, driving simulation, filmmaking, aerial-view simulation, product demos, multi-object mixing. Evaluation: user study with 200 annotations from 40 users; FFGo ranked first in 81.2% of cases, outperforming VACE and SkyReels-A2 (overall quality 4.28/5, object identity 4.53/5, scene identity 4.58/5, average rank 1.21).]
Q1
1. What is the key insight about the first frame's role in video generation that this paper introduces?
It serves only as a temporal starting point
It acts as a conceptual memory buffer for storing visual entities
It determines the video's final resolution
Q2
2. How many training examples did FFGo require to achieve state-of-the-art performance?
Over 1 million examples
500-1000 examples
Only 20-50 examples
Q3
3. What unique advantage does FFGo have over baseline models like VACE and SkyReels-A2?
It runs much faster on standard hardware
It can handle an unlimited number of reference inputs
It requires no architectural modifications to the base model