2025-10-01 Papers


Paper 1

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Published: 2025-09-29

Link: http://arxiv.org/pdf/2509.25541

1. 📘 Topic and Domain: Vision-language model (VLM) self-improvement through gamified self-play training, focusing on computer vision and machine learning.
2. 💡 Previous Research and New Ideas: Building on reinforcement learning and self-play approaches such as AlphaGo, the paper introduces Vision-Zero, a framework that enables VLMs to improve through competitive visual games without human annotation.
3. ❓ Problem: Addresses the high cost and scalability limitations of current VLM training methods that rely heavily on human-curated datasets and annotations.
4. 🛠️ Methods: Implements a "Who Is the Spy" game framework where models engage in strategic reasoning across multiple roles, combined with Iterative Self-Play Policy Optimization (Iterative-SPO) that alternates between self-play and reinforcement learning.
5. 📊 Results and Evaluation: Achieved state-of-the-art performance on reasoning, chart question answering, and vision-centric understanding tasks, surpassing models trained on human-annotated datasets while significantly reducing training costs.
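The alternation described in item 4 is gated on smoothed decision-stage statistics. Below is a minimal sketch of one plausible gating rule, using the hysteresis thresholds reported in the paper's figure (τ_acc = 0.9, τ_err = 0.4, τ_na = 0.5); the EMA coefficient and the exact switching condition are assumptions, not the paper's specification:

```python
def ema(prev, x, alpha=0.1):
    """Exponential moving average used to smooth the gating statistics."""
    return alpha * x + (1 - alpha) * prev

def next_stage(stage, acc, err, na, tau_acc=0.9, tau_err=0.4, tau_na=0.5):
    """Hysteresis switching between self-play and RLVR training.

    Assumed rule: train the decision stage (RLVR) while voting degrades
    (high error or abstention rate), and return to clue-stage self-play
    once the smoothed decision accuracy clears tau_acc.
    """
    if stage == "self_play" and (err > tau_err or na > tau_na):
        return "rlvr"
    if stage == "rlvr" and acc > tau_acc:
        return "self_play"
    return stage
```

Hysteresis (separate thresholds for switching in each direction) prevents the trainer from oscillating between stages on noisy batch statistics.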

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Key points from the paper's overview figure:

• Label-free data input: CLEVR synthetic scenes, chart data (ChartQA), and real-world images; the game environment is domain-agnostic.
• "Who Is the Spy?" game: a Clue Stage (strategic reasoning) and a Decision Stage (spy identification), with multi-role interaction.
• Iterative-SPO algorithm: alternates self-play training on the Clue Stage with RLVR on the Decision Stage for sustainable improvement.
• Clue Stage (self-play) zero-sum rewards: spy r_s = -β(v_s - v̄_c); civilian r_cj = (β/n_c)(v_s - v̄_c) - λ(v_cj - v̄_c); Role Advantage Estimation (RAE) with a KL-regularized policy gradient.
• Decision Stage (RLVR) discrete reward: +1 for a correct vote, -0.5 for uncertain (N/A), -1 for a wrong vote; group normalization with GRPO.
• Stage switching: hysteresis thresholds (τ_acc = 0.9, τ_err = 0.4, τ_na = 0.5) applied to exponential moving averages.
• Training loop: 1. generate image pairs (I_c, I_s) → 2. multi-agent gameplay → 3. collect votes and clues → 4. compute rewards → 5. update policies → 6. switch stages based on performance → 7. iterate for sustained improvement.
• Key results: SOTA on reasoning tasks, superior chart QA performance, mitigates negative transfer, cost-efficient training.
• Key advantages: zero human annotation, domain-agnostic inputs, scalable self-improvement, multi-capability enhancement.
• Performance gains: MathVision +3%, ChartQA +1.1%, LogicVista +2.9%; win rate 50% → 71%.
• Claimed as the first zero-human-in-the-loop VLM training paradigm with sustainable performance gains.
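The clue-stage rewards form a zero-sum game between the spy and the civilians. A minimal sketch of the reward split, using the formulas from the paper's figure; the β and λ values are placeholders, and v denotes the fraction of votes each player receives:

```python
def clue_stage_rewards(v_s, v_c, beta=1.0, lam=0.5):
    """Zero-sum clue-stage rewards split between the spy and n_c civilians.

    Spy:      r_s  = -beta * (v_s - mean(v_c))
    Civilian: r_cj = (beta / n_c) * (v_s - mean(v_c)) - lam * (v_cj - mean(v_c))
    """
    n_c = len(v_c)
    v_bar = sum(v_c) / n_c                      # mean civilian vote share
    r_spy = -beta * (v_s - v_bar)               # spy is penalized for drawing votes
    r_civ = [(beta / n_c) * (v_s - v_bar) - lam * (v_cj - v_bar) for v_cj in v_c]
    return r_spy, r_civ
```

Note that the civilian deviation terms sum to zero, so the spy's penalty is exactly offset by the civilians' collective reward regardless of λ, making the game zero-sum by construction.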
Q1
1. What is the main innovation of Vision-Zero's training approach compared to traditional VLM training methods?
It uses pre-existing human-annotated datasets more efficiently
It relies on competitive gameplay between models without human annotation
It combines multiple VLMs to cross-validate each other's outputs
Q2
2. In the 'Who Is the Spy' game framework of Vision-Zero, what happens during the Clue Stage?
Models directly identify the spy among players
Players vote on who they think is lying
Players provide verbal descriptions of their images while trying to avoid suspicion
Q3
3. Why does Vision-Zero alternate between Self-Play and RLVR in its Iterative-SPO algorithm?
To reduce computational costs during training
To prevent the model from stagnating in strategic equilibrium and ensure continuous improvement
To validate the model's performance against human benchmarks

Paper 2

OceanGym: A Benchmark Environment for Underwater Embodied Agents

Published: 2025-09-30

Link: http://arxiv.org/pdf/2509.26536

1. 📘 Topic and Domain: Embodied AI; OceanGym is a benchmark environment for testing and evaluating agents in simulated underwater settings.
2. 💡 Previous Research and New Ideas: Based on prior work in embodied AI and simulation environments for ground/aerial domains, this paper introduces the first comprehensive benchmark specifically for underwater scenarios.
3. ❓ Problem: The paper addresses the lack of standardized testing environments for underwater AI agents, which face unique challenges like low visibility, dynamic currents, and complex perception requirements.
4. 🛠️ Methods: The authors created a simulated underwater environment with 8 task domains, using Multi-modal Large Language Models (MLLMs) as agents that integrate perception, memory, and decision-making capabilities.
5. 📊 Results and Evaluation: Results showed significant performance gaps between MLLMs and human experts, with MLLMs struggling particularly in low-visibility conditions (14.8% success rate) and having difficulties with sonar data interpretation, object distinction, and consistent decision-making over extended missions.

OceanGym: A Benchmark Environment for Underwater Embodied Agents

Key points from the paper's workflow figure:

• Environment setup: Unreal Engine 5.3, an 800 m × 800 m ocean.
• Task categories: perception tasks and decision tasks.
• Agent framework: MLLM-driven and memory-augmented; evaluation uses distance-based scoring and accuracy metrics.
• Perception tasks: multi-view and context-based perception over RGB + sonar images from 6-direction sensors.
• Decision tasks: 8 underwater scenarios (search & inspection, navigation & docking) with continuous 3D control.
• Agent components: perception encoder, memory system, action decoder, language encoder.
• Data flow: input (RGB + sonar + instructions) → perception (MLLM processing) → memory (sliding window of K steps) → decision (action selection) → output (actions, responses).
• Environment conditions: shallow water (50 m) and deep water (500 m), high and low illumination, dynamic ocean currents, limited visibility.
• Evaluated models: GPT-4o-mini, Gemini-2.5, Qwen2.5-VL-7B, MiniCPM-V-4.5, compared against human performance.
• Key findings: large MLLM-human performance gap (14.8% success rate in deep water); MLLMs struggle with sonar interpretation relative to human experts; cross-task memory transfer improves performance in challenging conditions; extended exploration improves performance until it plateaus.
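The perceive → remember → decide data flow can be sketched as a simple agent loop. Everything here is illustrative: `env`, `mllm`, and the observation keys are hypothetical stand-ins, not OceanGym's actual interfaces:

```python
from collections import deque

def run_agent(env, mllm, instruction, k=8, max_steps=100):
    """Hypothetical OceanGym-style loop: perceive RGB + sonar, keep a
    sliding window of the last k steps, and ask an MLLM for each action."""
    memory = deque(maxlen=k)  # sliding-window memory over the last k steps
    obs = env.reset()
    for _ in range(max_steps):
        prompt = {"instruction": instruction,
                  "rgb": obs["rgb"], "sonar": obs["sonar"],
                  "history": list(memory)}
        action = mllm(prompt)          # MLLM maps multimodal state -> action
        obs, done = env.step(action)
        memory.append((action, obs.get("summary", "")))
        if done:
            break
    return memory
```

The bounded deque mirrors the figure's "sliding window K steps" memory: only the most recent k step summaries are fed back into the prompt, which keeps context length constant over long missions.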
Q1
1. What was the most significant challenge faced by MLLMs in underwater environments according to the paper's results?
Battery life limitations of underwater vehicles
Low visibility conditions leading to poor performance
Communication delays with surface control stations
Q2
2. Which unique feature of OceanGym sets it apart from other embodied AI benchmarks?
Its focus on aerial drone navigation
Its integration with real underwater vehicles
Its combination of both optical and sonar data processing
Q3
3. What was the performance gap between human experts and MLLMs in shallow water environments?
Humans achieved 100% while MLLMs averaged 18.4%
Humans achieved 80% while MLLMs averaged 50%
Humans achieved 90% while MLLMs averaged 30%

Paper 3

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Published: 2025-09-30

Link: http://arxiv.org/pdf/2509.25760

1. 📘 Topic and Domain: The paper focuses on developing truthful Large Language Models (LLMs) through reinforcement learning, addressing hallucination and uncertainty in natural language processing.
2. 💡 Previous Research and New Ideas: Based on previous work in LLM fine-tuning and reinforcement learning, it proposes a novel ternary reward system that distinguishes between correct answers, hallucinations, and abstentions, unlike traditional binary reward approaches.
3. ❓ Problem: The paper aims to solve LLMs' tendency to hallucinate or provide incorrect information rather than admitting uncertainty when faced with questions beyond their knowledge.
4. 🛠️ Methods: The authors implement TruthRL using GRPO (Group Relative Policy Optimization) with a ternary reward system that rewards correct answers, penalizes hallucinations, and treats abstentions neutrally.
5. 📊 Results and Evaluation: Compared to vanilla RL, TruthRL reduced hallucinations by 28.9% and improved truthfulness by 21.1% across four knowledge-intensive benchmarks, demonstrating consistent gains across various backbone models in both retrieval and non-retrieval setups.
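TruthRL's optimizer, GRPO, scores each sampled response relative to the other responses in its group rather than with a learned value function. A sketch of the standard group-normalized advantage computation (the paper's exact normalization details may differ):

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantage estimation: sample a group of G responses for
    the same prompt, then normalize each response's reward by the group
    mean and standard deviation."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards)
    if sd == 0:  # identical rewards carry no learning signal
        return [0.0] * len(rewards)
    return [(r - mu) / sd for r in rewards]
```

With the ternary reward, a group mixing correct answers, abstentions, and hallucinations automatically yields positive advantages for the correct responses and negative ones for the hallucinations.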

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Key points from the paper's methodology figure:

• Problem: LLMs hallucinate instead of admitting uncertainty.
• Knowledge-boundary probing: sample 256 responses, identify out-of-knowledge (OOK) questions, and relabel them with "I don't know".
• Baselines: vanilla SFT, RFT (rejection sampling), R-Tuning, and vanilla RL with a binary reward.
• TruthRL framework: GRPO with a ternary reward; truthfulness scored as T = w₁·Acc + w₂·Unc − w₃·Hall, optimizing truthfulness directly rather than accuracy alone.
• Ternary reward design: +1 for correct answers, 0 for uncertain responses, −1 for hallucinations; encourages abstention over guessing.
• Enhanced variants: knowledge-enhanced (+1 for abstention on OOK questions) and reasoning-enhanced (additional reasoning-quality evaluation signals).
• Training process: online RL with GRPO — group sampling of G responses, advantage estimation, policy optimization.
• Evaluation setup: datasets CRAG, NQ, HotpotQA, MuSiQue; models Llama3.1-8B and Qwen2.5-7B; settings with and without retrieval; metrics are truthfulness and hallucination.
• Key results: hallucination rate down 28.9%, truthfulness up 21.1%, better knowledge-boundary recognition.
• Analysis: robust to hallucination-baiting, more confident on correct answers, scalable across model sizes; the simple ternary reward beats more complex designs.
• Ablations and comparisons: binary vs. ternary rewards, online vs. offline RL, reward-design variants; against vanilla SFT/RL and knowledge-enhanced baselines; robustness tested across LLM judges, model scales, and hallucination-baiting questions.
• Future directions: reasoning-quality integration, multi-objective optimization, advanced reward designs.
• Core innovation: shifting from accuracy-driven to truthfulness-driven training, with an explicit reward distinction between abstention and hallucination that lets models recognize their knowledge boundaries.
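The ternary reward and the truthfulness score from the paper's figure can be written down directly. The weights w₁, w₂, w₃ default to (1, 0, 1) here purely for illustration; the paper's actual weighting may differ:

```python
def ternary_reward(response_type):
    """TruthRL's ternary reward: reward correct answers, penalize
    hallucinations, and treat abstentions neutrally."""
    return {"correct": 1.0, "abstain": 0.0, "hallucination": -1.0}[response_type]

def truthfulness(acc, unc, hall, w=(1.0, 0.0, 1.0)):
    """Truthfulness score from the figure: T = w1*Acc + w2*Unc - w3*Hall,
    over a model's rates of accurate, uncertain, and hallucinated answers."""
    w1, w2, w3 = w
    return w1 * acc + w2 * unc - w3 * hall
```

Unlike a binary correct/incorrect reward, the zero reward for abstention makes "I don't know" strictly better than a wrong guess, which is exactly the incentive the paper attributes to its hallucination reductions.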
Q1
1. What is the key innovation in TruthRL's reward system compared to traditional approaches?
It uses a binary reward system of correct/incorrect only
It uses a ternary reward system distinguishing between correct answers, hallucinations, and abstentions
It only rewards correct answers and ignores all other responses
Q2
2. What was the primary impact of implementing TruthRL on model performance?
It improved accuracy but increased hallucinations
It reduced accuracy but eliminated all hallucinations
It reduced hallucinations by 28.9% while improving truthfulness by 21.1%
Q3
3. When evaluating on difficult questions where almost no method provides correct answers, how did TruthRL perform?
It produced minimal hallucinations (15.5%) while generating uncertain responses for most cases (84.5%)
It achieved 100% accuracy on all difficult questions
It produced high hallucinations similar to other baseline models