1. 📘 Topic and Domain: Agentic Entropy-Balanced Policy Optimization (AEPO) for reinforcement learning in large language models (LLMs), specifically focusing on web agent training and tool use capabilities.
2. 💡 Previous Research and New Ideas: Builds on prior agentic RL methods that use entropy signals to guide tool exploration, and introduces entropy balancing in both the rollout and policy-update phases to address the limitations of over-relying on entropy.
3. ❓ Problem: Addresses two key failure modes of entropy-based RL: "High-Entropy Rollout Collapse," where sampling over-branches along a few high-entropy paths, and "High-Entropy Token Gradient Clipping," where the gradients of valuable exploratory (high-entropy) tokens are clipped away during training.
4. 🛠️ Methods: Implements two core components: (1) a dynamic entropy-balanced rollout mechanism that adaptively allocates the sampling budget and penalizes consecutive high-entropy branching steps, and (2) entropy-balanced policy optimization that uses stop-gradient operations in the clipping term to preserve gradients on high-entropy tokens.
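The rollout mechanism above can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulas: the function names, the budget-allocation rule, and the consecutive-branching penalty are all assumptions made for clarity.

```python
import math

def token_entropy(probs):
    """Shannon entropy (natural log) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def allocate_rollout_budget(total_budget, question_entropy, entropy_threshold):
    """Split the sampling budget between global rollouts (full trajectories)
    and branch rollouts (resampling at high-entropy tool-call steps).
    Hypothetical rule: higher question-level entropy reserves more budget
    for branching, capped at half the total."""
    branch_frac = min(1.0, question_entropy / entropy_threshold) * 0.5
    branch = int(round(total_budget * branch_frac))
    return total_budget - branch, branch

def should_branch(step_entropies, threshold, max_consecutive=1):
    """Decide at which steps to branch. High-entropy steps trigger branching,
    but consecutive high-entropy branching is penalized (here: simply skipped
    after `max_consecutive` in a row) to avoid High-Entropy Rollout Collapse."""
    consecutive = 0
    decisions = []
    for h in step_entropies:
        if h > threshold and consecutive < max_consecutive:
            decisions.append(True)
            consecutive += 1
        else:
            decisions.append(False)
            if h <= threshold:
                consecutive = 0  # a low-entropy step resets the streak
    return decisions
```

In this sketch, a run of high-entropy steps branches only once before a low-entropy step resets the streak, which caps the over-branching behavior described in point 3.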
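The gradient-preservation idea can be illustrated with the per-token derivative of a PPO-style clipped objective. This is a simplified, hypothetical sketch: it computes the analytic gradient with respect to the importance ratio directly (standing in for what a stop-gradient rewrite would achieve in an autodiff framework), and the entropy threshold and function names are assumptions, not the paper's exact formulation.

```python
def clipped_grad(ratio, adv, eps=0.2):
    """Gradient w.r.t. the ratio r of the standard clipped objective
    min(r*A, clip(r, 1-eps, 1+eps)*A). When the clipped branch is
    active, the gradient is zero and the token stops learning."""
    if (adv > 0 and ratio > 1 + eps) or (adv < 0 and ratio < 1 - eps):
        return 0.0  # gradient clipped away
    return adv

def entropy_balanced_grad(ratio, adv, entropy, ent_thresh, eps=0.2):
    """Hypothetical entropy-balanced variant: for high-entropy tokens the
    clip bound is treated as a stop-gradient constant, so the gradient
    through the ratio survives; low-entropy tokens use standard clipping."""
    if entropy > ent_thresh:
        return adv  # exploratory token: gradient preserved
    return clipped_grad(ratio, adv, eps)
```

The contrast is the point: a high-entropy token whose ratio has drifted outside the clip range contributes zero gradient under standard PPO clipping but keeps its full advantage-weighted gradient under the entropy-balanced rule.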
5. 📊 Results and Evaluation: Outperforms 7 mainstream RL algorithms across 14 datasets. With Qwen3-14B it achieves Pass@1 scores of 47.6% on GAIA, 11.2% on HLE, and 43.0% on WebWalkerQA, and Pass@5 scores of 65.0%, 26.0%, and 70.0% respectively.