2026-02-11 Papers

Paper 1

Code2World: A GUI World Model via Renderable Code Generation

Published: 2026-02-10

Link: http://arxiv.org/pdf/2602.09856

1. 📘 Topic and Domain: This paper introduces Code2World, a GUI world model that predicts next user interface states through renderable HTML code generation for autonomous GUI agents.
2. 💡 Previous Research and New Ideas: The paper builds on existing text-based and pixel-based GUI world models but proposes a novel "renderable code generation" paradigm that uses structured HTML code as an intermediate representation to achieve both high visual fidelity and fine-grained structural controllability.
3. ❓ Problem: The paper aims to solve the limitation of existing GUI agents that lack predictive foresight, operating without simulating action consequences, which leads to costly corrections and potential failures in high-risk scenarios.
4. 🛠️ Methods: The authors use a two-stage training approach: supervised fine-tuning on the synthesized AndroidCode dataset (80K+ samples), followed by Render-Aware Reinforcement Learning with dual rewards for visual semantic fidelity and action consistency.
5. 📊 Results and Evaluation: Code2World-8B achieves state-of-the-art performance in next-UI prediction, rivaling GPT-5 and Gemini-3-Pro-Image, and significantly enhances downstream GUI agents, yielding a +9.5% improvement on AndroidWorld navigation tasks when used as a plug-and-play simulator.
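
The RARL stage combines its two reward signals and optimizes with GRPO, which normalizes each sampled rollout's reward against its group. A minimal sketch of the dual-reward combination and the group-relative advantage step; the function names and the 50/50 weighting are illustrative assumptions, not details from the paper:

```python
import statistics

def rarl_reward(visual_score: float, action_score: float,
                w_visual: float = 0.5, w_action: float = 0.5) -> float:
    """Combine the two RARL reward components.

    visual_score: VLM-as-Judge rating of how well the rendered
        prediction matches the ground-truth next screen.
    action_score: whether the predicted UI reflects the consequence
        of the executed action. The 50/50 weighting is an assumption.
    """
    return w_visual * visual_score + w_action * action_score

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages: normalize each rollout's reward
    by the mean/std of its sampled group (the core of GRPO)."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mu) / sigma for r in rewards]

# Example: four sampled HTML predictions for the same (state, action) pair
rewards = [rarl_reward(v, a) for v, a in [(0.9, 1.0), (0.7, 1.0), (0.4, 0.0), (0.8, 1.0)]]
advs = grpo_advantages(rewards)
```

The normalization means only relative quality within the group matters, so no learned value model is needed.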

[Figure: Code2World method overview]
• Data construction: AndroidCode dataset of 80K+ screen-action pairs, synthesized with GPT-5 and revised via visual feedback until SigLIP score > 0.9.
• Stage 1 (SFT): supervised fine-tuning of a Qwen3-VL-8B backbone to learn HTML syntax, UI layout logic, and code generation.
• Stage 2 (RARL): Render-Aware RL with a visual semantic reward and an action consistency reward, optimized with GRPO and scored by a VLM-as-Judge.
• Input processing: current GUI I_t + action a_t + goal G, with visual prompting (red circles/arrows) and semantic action expansion.
• Code generation: predicted HTML code Ĉ_{t+1}, using semantic placeholders for images and inline SVG for icons.
• Browser rendering: R(Ĉ_{t+1}) → Î_{t+1}, a deterministic, high-fidelity rendering of the predicted GUI state.
• Evaluation: functional logic (S_ad, S_id) and visual quality (S_ele, S_lay) under a VLM-as-Judge framework.
• GUI agent enhancement: Propose-Simulate-Select pipeline, plug-and-play integration, +9.5% improvement on AndroidWorld.
• Key results: rivals GPT-5 and Gemini-3-Pro-Image in next-UI prediction; improves both offline and online navigation; generalizes to unseen applications; the lightweight 8B model outperforms 100B+ baselines.
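
The plug-and-play use of the world model follows the Propose-Simulate-Select idea: the agent proposes candidate actions, Code2World simulates the next screen for each, and the action with the best simulated outcome is executed. A schematic sketch, with toy stand-ins for the world model and scorer (the real components are the trained models, not these lambdas):

```python
from typing import Callable

def propose_simulate_select(
    state: str,
    goal: str,
    candidates: list[str],
    world_model: Callable[[str, str], str],  # (state, action) -> predicted next screen
    scorer: Callable[[str, str], float],     # (predicted screen, goal) -> progress score
) -> str:
    """Pick the candidate action whose simulated next state scores best."""
    def simulated_score(action: str) -> float:
        predicted = world_model(state, action)
        return scorer(predicted, goal)
    return max(candidates, key=simulated_score)

# Toy stand-ins: the "world model" appends the action to the state,
# and the scorer counts goal keywords present in the predicted screen.
wm = lambda s, a: f"{s} -> {a}"
score = lambda screen, goal: sum(w in screen for w in goal.split())
best = propose_simulate_select("home_screen", "open settings wifi",
                               ["tap(mail)", "tap(settings)", "scroll()"], wm, score)
```

Because simulation happens before execution, mistakes are caught in the predicted screen rather than on the real device, which is the paper's argument for foresight in high-risk scenarios.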
Q1. What is the key innovation of Code2World compared to existing GUI world models?
a) Using renderable HTML code generation as an intermediate representation
b) Training on larger datasets with more GPU resources
c) Implementing faster pixel-based diffusion models

Q2. What are the two reward components used in Code2World's Render-Aware Reinforcement Learning strategy?
a) Speed reward and accuracy reward
b) Visual semantic reward and action consistency reward
c) Memory efficiency reward and computational cost reward

Q3. How much improvement does Code2World provide when enhancing Gemini-2.5-Flash on AndroidWorld navigation tasks?
a) +15.2% improvement in success rate
b) +6.8% improvement in success rate
c) +9.5% improvement in success rate

Paper 2

Chain of Mindset: Reasoning with Adaptive Cognitive Modes

Published: 2026-02-10

Link: http://arxiv.org/pdf/2602.10063

1. 📘 Topic and Domain: The paper introduces Chain of Mindset (CoM), a framework for large language model reasoning that enables dynamic switching between different cognitive modes during problem-solving across mathematics, coding, and multimodal reasoning tasks.
2. 💡 Previous Research and New Ideas: The paper builds on cognitive science research identifying distinct reasoning modes (spatial, convergent, divergent thinking) and existing LLM reasoning methods like Chain-of-Thought, proposing the novel idea of step-level adaptive mindset orchestration where models can dynamically switch between four heterogeneous cognitive modes within a single reasoning process.
3. ❓ Problem: The paper addresses the limitation that existing LLM reasoning methods apply a single fixed mindset throughout problem-solving, which prevents models from adapting their cognitive approach when different stages of the same problem require fundamentally different reasoning strategies.
4. 🛠️ Methods: The authors developed a three-layer architecture with a Meta-Agent that orchestrates four specialized mindsets (Spatial, Convergent, Divergent, Algorithmic), combined with a bidirectional Context Gate mechanism that filters information flow between components to prevent interference during mindset transitions.
5. 📊 Results and Evaluation: CoM achieved state-of-the-art performance across six challenging benchmarks, outperforming the strongest baseline by 4.96% with Qwen3-VL-32B-Instruct and by 4.72% with Gemini-2.0-Flash as backbones; results were evaluated with pass@1 accuracy while maintaining computational efficiency.
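
The plan-call-internalize loop can be sketched as a dispatch loop: the Meta-Agent picks a mindset per step, the input gate filters what that mindset sees, and the output gate distills its result back into the running state. Everything below is a toy stand-in for LLM calls; the gating heuristics are illustrative only:

```python
from typing import Callable

# Four mindset modules (stand-in callables; the real CoM modules are LLM calls)
MINDSETS: dict[str, Callable[[str], str]] = {
    "spatial":     lambda ctx: f"diagram({ctx})",
    "convergent":  lambda ctx: f"deduce({ctx})",
    "divergent":   lambda ctx: f"branch({ctx})",
    "algorithmic": lambda ctx: f"compute({ctx})",
}

def input_gate(state: str) -> str:
    """Filter the running context to what the selected mindset needs
    (crude truncation here; CoM uses semantic filtering)."""
    return state[-200:]

def output_gate(result: str) -> str:
    """Distill a module's raw output into a compact insight."""
    return result.strip()

def chain_of_mindset(problem: str, policy, max_steps: int = 4) -> str:
    """Plan-call-internalize loop: the Meta-Agent makes the cognitive
    decision pi(s_t) each step; gated context flows into the chosen
    mindset and the distilled insight flows back into the state."""
    state = problem
    for _ in range(max_steps):
        choice = policy(state)
        if choice == "answer":
            break
        insight = output_gate(MINDSETS[choice](input_gate(state)))
        state = f"{state} | {insight}"  # internalize the insight
    return state

# Toy policy: use the algorithmic mindset once, then answer
policy = lambda s: "algorithmic" if "|" not in s else "answer"
trace = chain_of_mindset("2+2", policy)
```

The two gates are what keep mindset transitions clean: a module never sees the full raw context, and the loop never absorbs a module's full raw output.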

[Figure: Chain of Mindset (CoM) framework]
• Meta-Agent: makes the cognitive decision π(s_t) and drives a plan-call-internalize loop.
• Context Gate: an input gate filters context into the selected mindset; an output gate distills its results back.
• Four mindsets: Spatial (visual imagination, text→image, structure grounding); Convergent (logical synthesis, focused analysis, deep reasoning); Divergent (branch generation, parallel exploration of multiple paths); Algorithmic (code generation, precise calculation, iterative verification).
• Iterative flow with feedback: cognitive decision → mindset selection → context filtering → mindset execution → insight integration → final answer.
• Key features: training-free, dynamic mindset switching, bidirectional context filtering, state-dependent adaptation.
Q1. What is the fundamental limitation of existing LLM reasoning methods that Chain of Mindset (CoM) aims to address?
a) They require extensive training data and computational resources
b) They apply a single fixed mindset throughout problem-solving instead of adapting cognitive approaches
c) They cannot handle multimodal inputs like images and text simultaneously

Q2. Which component in the CoM framework is responsible for preventing cross-module information interference during mindset transitions?
a) The Meta-Agent that orchestrates reasoning decisions
b) The Context Gate with bidirectional semantic filtering
c) The four specialized mindset modules themselves

Q3. In the ablation study, removing which component caused the largest overall performance drop of 8.24%?
a) The Divergent mindset module
b) The Spatial mindset module
c) The Context Gate mechanism

Paper 3

UI-Venus-1.5 Technical Report

Published: 2026-02-09

Link: http://arxiv.org/pdf/2602.09082

1. 📘 Topic and Domain: This paper presents UI-Venus-1.5, a unified end-to-end GUI (Graphical User Interface) agent designed for automating interactions in digital environments across mobile and web platforms.
2. 💡 Previous Research and New Ideas: The paper builds on the authors' previous UI-Venus-1.0 model and proposes three key advances: comprehensive Mid-Training with 10 billion tokens across 30+ datasets, Online Reinforcement Learning with full-trajectory rollouts, and a unified single GUI agent constructed via model merging of domain-specific models.
3. ❓ Problem: The paper aims to solve the challenge of achieving both broad generality and consistently strong task performance in GUI agents, addressing the gap between step-level and trace-level accuracy during training and the need for practical deployment-ready agents.
4. 🛠️ Methods: The authors used a four-stage training pipeline including Mid-Training for GUI knowledge injection, Offline-RL for task-specific optimization, Online-RL using Group Relative Policy Optimization (GRPO) for complex navigation, and model merging (specifically TIES-Merge) to unify specialized models into a single agent.
5. 📊 Results and Evaluation: UI-Venus-1.5 achieved state-of-the-art performance on multiple benchmarks including ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous baselines, and demonstrated robust navigation capabilities across Chinese mobile apps in real-world scenarios.
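
TIES-Merge's three steps (trim each task vector, elect a per-parameter sign, then merge only agreeing deltas) can be sketched over flat parameter lists; this is a simplified illustration of the general procedure, not the paper's implementation:

```python
def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def ties_merge(base: list[float], models: list[list[float]],
               keep_frac: float = 0.5, lam: float = 1.0) -> list[float]:
    """TIES-Merge sketch: trim each specialist's task vector to its
    largest-magnitude entries, elect a sign per parameter, and average
    only the deltas that agree with the elected sign."""
    # 1. Task vectors: each specialist's delta from the shared base
    taus = [[m - b for m, b in zip(model, base)] for model in models]
    # 2. Trim: zero out all but the top keep_frac entries by magnitude
    trimmed = []
    for tau in taus:
        k = max(1, int(len(tau) * keep_frac))
        cutoff = sorted(map(abs, tau), reverse=True)[k - 1]
        trimmed.append([t if abs(t) >= cutoff else 0.0 for t in tau])
    # 3. Elect sign: sign of the summed trimmed mass per parameter
    elected = [sign(sum(t[i] for t in trimmed)) for i in range(len(base))]
    # 4. Disjoint merge: mean of deltas agreeing with the elected sign
    deltas = []
    for i, s in enumerate(elected):
        agree = [t[i] for t in trimmed if t[i] != 0.0 and sign(t[i]) == s]
        deltas.append(sum(agree) / len(agree) if agree else 0.0)
    return [b + lam * d for b, d in zip(base, deltas)]

# Example: merge two specialists (say, grounding and web) over 4 toy parameters
merged = ties_merge([0.0] * 4, [[1.0, 0.2, 0.0, 0.0], [0.8, -0.1, 0.6, 0.0]])
```

The sign election is what prevents a grounding-specific update and a navigation-specific update from cancelling each other out, which is why merging loses little per-domain performance.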

[Figure: UI-Venus-1.5 training pipeline]
• Base model: Qwen3-VL.
• Stage 1 (Mid-Training): 10B tokens across 30+ datasets; GUI knowledge injection via a data refinement pipeline and real-device data generation; establishes foundational GUI semantics.
• Stage 2 (Offline-RL): GRPO for grounding (format + point rewards) and for web/mobile navigation; reward components: format reward + action reward + coordinate reward.
• Stage 3 (Online-RL): Device-as-a-Service (DaaS) infrastructure, full-trajectory GRPO rollouts, dynamic + static task generation, completion + penalty reward design.
• Stage 4 (Model merge): TIES-Merge consolidates the domain-specific models into a single unified agent (2B | 8B | 30B-A3B) via parameter interpolation with minimal performance loss.
• State-of-the-art results: ScreenSpot-Pro 69.6%, VenusBench-GD 75.0%, AndroidWorld 77.6%, OSWorld-G-R 76.4%, UI-Vision 54.7%, WebVoyager 76.0%; robust navigation across 40+ Chinese mobile apps.
• Flow: foundation GUI knowledge → enhanced training → specialized models → unified agent.
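
The Offline-RL reward composition (format + action + coordinate) can be illustrated with a toy grounding reward; the output format, distance tolerance, and equal weighting below are assumptions for the sketch, not the paper's exact design:

```python
import re
import math

def grounding_reward(response: str, gt_action: str,
                     gt_point: tuple[float, float],
                     tol: float = 0.05) -> float:
    """Illustrative composite reward for the Offline-RL grounding stage:
    format reward (parseable output) + action reward (correct action
    type) + coordinate reward (predicted point near the target)."""
    m = re.match(r"(\w+)\(([\d.]+),\s*([\d.]+)\)", response.strip())
    if m is None:
        return 0.0  # unparseable output: format reward fails, nothing else is scored
    r_format = 1.0
    action, x, y = m.group(1), float(m.group(2)), float(m.group(3))
    r_action = 1.0 if action == gt_action else 0.0
    dist = math.hypot(x - gt_point[0], y - gt_point[1])
    r_coord = 1.0 if dist <= tol else 0.0
    return (r_format + r_action + r_coord) / 3.0
```

Gating everything behind the format check keeps GRPO from rewarding outputs the agent runtime could not even execute.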
Q1. What is the primary architectural innovation that allows UI-Venus-1.5 to handle both mobile and web environments in a single model?
a) Using separate neural networks for each platform and switching between them
b) Model merging strategy that unifies domain-specific models (grounding, web, and mobile) into one cohesive checkpoint
c) Training multiple models simultaneously and selecting the best one at runtime

Q2. According to the paper, what was the key limitation observed during Offline-RL training that motivated the addition of Online-RL?
a) The model was overfitting to the training data and couldn't generalize
b) Step-level success rates increased steadily while trace-level success rates eventually peaked and declined
c) The computational cost was too high for practical deployment

Q3. What is the scale of the Mid-Training corpus used in UI-Venus-1.5, and what is its primary purpose?
a) 5 billion tokens from 15+ datasets to improve general language understanding
b) 10 billion tokens from 30+ datasets to establish foundational GUI semantics and bridge the gap between general vision-language models and GUI-specific understanding
c) 20 billion tokens from 50+ datasets to enable multilingual GUI interaction