2025-04-22 Papers


Paper 1

FlowReasoner: Reinforcing Query-Level Meta-Agents

Published: 2025-04-21

Link: http://arxiv.org/pdf/2504.15257

1. 📘 Topic and Domain: The paper introduces FlowReasoner, a query-level meta-agent for automating the design of personalized multi-agent systems in the domain of AI agent systems.
2. 💡 Previous Research and New Ideas: The paper builds on previous task-level meta-agents that create fixed workflows for specific tasks, proposing instead a query-level approach that generates a unique multi-agent system for each individual user query through reasoning-based optimization.
3. ❓ Problem: The paper addresses the limitation of existing multi-agent systems that are either manually designed (requiring significant human effort) or task-level automated (creating one-size-fits-all systems that lack adaptability to individual queries).
4. 🛠️ Methods: The authors distill reasoning abilities from DeepSeek R1 to endow FlowReasoner with basic multi-agent system generation capabilities, then enhance it through reinforcement learning with external execution feedback using a multi-purpose reward focused on performance, complexity, and efficiency.
5. 📊 Results and Evaluation: FlowReasoner outperforms existing methods across engineering and competition code benchmarks, notably surpassing o1-mini by 10.52% accuracy across three benchmarks, while demonstrating superior adaptability by generating personalized workflows tailored to specific queries.
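The multi-purpose reward in step 4 can be sketched as a weighted combination of the three signals. The function name, argument names, weights, and linear penalty form below are illustrative assumptions, not values from the paper:

```python
def multi_purpose_reward(pass_rate, n_agents, exec_seconds,
                         w_perf=1.0, w_complexity=0.1, w_efficiency=0.05):
    """Illustrative scalar reward for an RL-trained meta-agent.

    pass_rate: fraction of tests the generated system passes (0..1).
    n_agents: number of agents in the generated workflow (complexity proxy).
    exec_seconds: wall-clock time to run the workflow (efficiency proxy).
    Weights and penalty forms are assumptions, not the paper's values.
    """
    return (w_perf * pass_rate
            - w_complexity * n_agents
            - w_efficiency * exec_seconds)
```

The key design point is that performance is rewarded while workflow size and runtime are penalized, so the meta-agent is pushed toward the simplest system that still solves the query.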


Methodology flowchart: training and inference pipeline for the query-level meta-agent.

Phase 1: Training FlowReasoner
1. Reasoning data distillation: a teacher LLM (DeepSeek-R1, 671B) generates multi-round reasoning and system data, plus initial feedback.
2. SFT warmup: the student LLM (DeepSeek-R1-Distill-Qwen-7B) is finetuned on the distilled data D to acquire basic reasoning ability.
3. Reinforce reasoning via RL (GRPO): starting from the SFT model and user queries q, (a) sample multiple trajectories o_i, (b) execute them in a sandbox, (c) collect external feedback as a multi-purpose reward combining performance (pass rate) with complexity and diversity terms, and (d) update the policy via GRPO, yielding the trained FlowReasoner model.

Phase 2: Inference with FlowReasoner
Given a new user query q, the trained model performs deliberative reasoning (l-round optimization), iteratively refining its design with external feedback (e.g., pass rate) until it outputs a query-specific multi-agent system S*_query. Executing S*_query on q produces the final answer a.
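The GRPO step scores each sampled trajectory relative to the group it was sampled with. A minimal sketch of that group-relative advantage (function name and the zero-variance fallback are assumptions, not from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each trajectory's reward
    against the mean and population std of its sampling group."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:  # identical rewards carry no learning signal
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]
```

Because the baseline is the group mean, no separate value network is needed: trajectories that beat their siblings get positive advantage, the rest get negative.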
Q1. What is the key difference between FlowReasoner and previous task-level meta-agents?
- FlowReasoner uses more complex search algorithms
- FlowReasoner generates a personalized multi-agent system for each individual user query
- FlowReasoner requires less computational resources

Q2. How does FlowReasoner enhance its reasoning capabilities after the initial training?
- Through manual optimization by human experts
- Through Monte Carlo Tree Search (MCTS)
- Through reinforcement learning with external execution feedback

Q3. In the experimental evaluation, by what percentage did FlowReasoner outperform the o1-mini model across three benchmarks?
- 5.26%
- 10.52%
- 15.78%

Paper 2

Learning to Reason under Off-Policy Guidance

Published: 2025-04-21

Link: http://arxiv.org/pdf/2504.14945

1. 📘 Topic and Domain: The paper focuses on enhancing large language models' reasoning capabilities through reinforcement learning that integrates off-policy guidance.
2. 💡 Previous Research and New Ideas: The paper builds on zero-RL approaches that train reasoning models using only on-policy rollouts and rule-based rewards, and proposes LUFFY, a framework that incorporates off-policy reasoning traces from stronger models to expand learning beyond the model's initial capabilities.
3. ❓ Problem: The paper addresses the limitation of existing zero-RL methods which constrain learning to a model's own outputs, preventing acquisition of reasoning abilities beyond its initial capabilities.
4. 🛠️ Methods: The authors use a mixed-policy approach that combines off-policy demonstrations with on-policy rollouts during training, employing policy shaping via regularized importance sampling to emphasize low-probability but crucial actions.
5. 📊 Results and Evaluation: LUFFY achieves an average gain of over +7.0 points across six math benchmarks and +6.2 points on out-of-distribution tasks, outperforming both imitation-based supervised fine-tuning and existing zero-RL methods in both performance and generalization.
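The policy-shaping transform f(x) = x/(x + γ) from the methods can be written directly; the γ = 0.1 default below is an assumed placeholder, not necessarily the paper's value:

```python
def shape_off_policy_ratio(ratio, gamma=0.1):
    """LUFFY-style policy shaping f(r) = r / (r + gamma).

    Applied to the off-policy importance ratio r = pi_theta / pi_phi,
    it keeps the weight of low-probability tokens from vanishing:
    f has its steepest slope near r = 0, so rare-but-crucial actions
    in the expert traces still receive a strong learning signal.
    gamma here is an assumed placeholder value.
    """
    return ratio / (ratio + gamma)
```

Note that f(γ) = 0.5 and f saturates toward 1 for large r, which caps the influence of tokens the student already predicts confidently while amplifying the gradient on tokens it barely assigns probability to.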


Method flowchart:

1. Start from a base policy model π_θ_old (e.g., Qwen2.5-Math) and collect off-policy traces τ_j ~ π_φ from a stronger model (e.g., DeepSeek-R1).
2. Generate on-policy rollouts τ_i ~ π_θ_old and combine them with the off-policy traces.
3. Compute a mixed advantage Â over the combined set from the trajectory rewards R(τ).
4. Optimize the LUFFY objective:
   - On-policy signal: importance ratio r_i,t = π_θ / π_θ_old; objective term r_i,t · Â, with clipping removed to allow larger updates.
   - Off-policy signal: base importance ratio r̂_j,t = π_θ / π_φ, reshaped by policy shaping f(x) = x / (x + γ), which boosts tokens with low π_θ; objective term f(r̂_j,t) · Â.
5. Update π_θ, yielding the trained LUFFY model.
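The mixed advantage over the union of on- and off-policy samples can be sketched as below; the simple mean-reward baseline is an assumption, and the paper's exact normalization may differ:

```python
def mixed_advantages(on_rewards, off_rewards):
    """Advantage of every trajectory, on- and off-policy alike,
    measured against the mean reward of the combined set."""
    combined = list(on_rewards) + list(off_rewards)
    baseline = sum(combined) / len(combined)
    return ([r - baseline for r in on_rewards],
            [r - baseline for r in off_rewards])
```

Sharing one baseline across both sample sources is what makes the expert traces informative: when they earn higher reward than the student's own rollouts, their tokens receive positive advantage and pull the policy toward them.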
Q1. What is the primary limitation of existing zero-RL methods that LUFFY aims to overcome?
- High computational cost and training instability
- Inability to learn reasoning abilities beyond the model's initial capabilities
- Poor performance on simple mathematical problems

Q2. How does LUFFY's policy shaping mechanism enhance learning from off-policy traces?
- By eliminating all low-probability actions from the model's policy
- By assigning more importance to high-probability actions only
- By amplifying learning signals for low-probability but crucial actions

Q3. What advantage did LUFFY demonstrate over supervised fine-tuning (SFT) in the experimental results?
- Superior generalization capability, especially on out-of-distribution tasks
- Significantly faster training times with less computational resources
- Ability to completely eliminate hallucinations in mathematical reasoning

Paper 3

StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians

Published: 2025-04-21

Link: http://arxiv.org/pdf/2504.15281

1. 📘 Topic and Domain: The paper presents StyleMe3D, a framework for transferring artistic styles to 3D Gaussian Splatting representations while preserving geometric integrity.
2. 💡 Previous Research and New Ideas: The paper builds upon 3D Gaussian Splatting and existing style transfer techniques, proposing a novel approach that integrates multi-modal style conditioning, multi-level semantic alignment, and perceptual quality enhancement.
3. ❓ Problem: The paper addresses the challenge of stylizing 3D Gaussian Splatting scenes with artistic styles while maintaining geometric details, semantic coherence, and visual harmony.
4. 🛠️ Methods: The authors use four key components: Dynamic Style Score Distillation (DSSD) for semantic alignment, Contrastive Style Descriptor (CSD) for content-aware textures, Simultaneously Optimized Scale (SOS) for detail preservation, and 3D Gaussian Quality Assessment (3DG-QA) for aesthetic quality.
5. 📊 Results and Evaluation: StyleMe3D outperforms state-of-the-art methods in preserving geometric details and ensuring stylistic consistency across scenes, achieving better PSNR, SSIM, and LPIPS scores (for LPIPS, lower is better) while maintaining real-time rendering capabilities.
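The four losses from the methods are combined into a single training objective; a sketch with placeholder λ weights (the actual weights are not given here and are assumptions):

```python
def styleme3d_total_loss(l_style, l_sos, l_csd, l_3dg_qa,
                         lambdas=(1.0, 0.5, 0.5, 0.1)):
    """Weighted sum L = λ1·L_style + λ2·L_SOS + λ3·L_CSD + λ4·L_3DG-QA.
    The lambda values here are illustrative placeholders."""
    l1, l2, l3, l4 = lambdas
    return l1 * l_style + l2 * l_sos + l3 * l_csd + l4 * l_3dg_qa
```

Each λ trades off one level of the stylization: semantics (DSSD), texture (SOS), style fidelity (CSD), and global aesthetics (3DG-QA).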


Workflow flowchart: stylizing 3D Gaussians.

1. Inputs: a pre-trained 3D GS scene with fixed geometry Θ_geo, and a style reference (image or text prompt).
2. Style purification (using CLIP): isolate style embeddings and remove content, producing a purified style embedding.
3. Main optimization loop (only the colors Θ_color are optimized): render an image I_v from the current 3D GS and score it with four losses:
   - DSSD, Dynamic Style Score Distillation (Stable Diffusion prior): high-level semantics, with dynamic CFG and timesteps, including style outpainting (PSO); loss L_style.
   - SOS, Simultaneously Optimized Scale (VGG prior): low-level texture details via multi-scale Gram matrices; loss L_SOS.
   - CSD, Contrastive Style Descriptor (ViT prior): mid-level style fidelity via cosine similarity on style features; loss L_CSD.
   - 3DG-QA, 3D Gaussian Quality Assessment (CLIP-IQA prior): global aesthetic quality via antonym prompts and artifact removal; loss L_3DG-QA.
4. Combine the losses, L_final = λ1·L_style + λ2·L_SOS + λ3·L_CSD + λ4·L_3DG-QA, compute the gradients ∇Θ_color, and update the colors; iterate until convergence.
5. Output: a stylized 3D Gaussian Splatting scene with optimized Θ_color.
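The main loop optimizes only the color parameters while geometry stays fixed. A minimal gradient-descent sketch of that color-only update; the interface (a flat parameter list and a user-supplied gradient callable), learning rate, and step count are all assumptions:

```python
def stylize_colors(theta_color, color_grad, lr=0.01, steps=100):
    """Gradient descent on Theta_color only; Theta_geo is never touched.

    theta_color: flat list of color parameters.
    color_grad: callable returning dL_final/dTheta_color for the
                current colors (render + four losses, assumed given).
    """
    for _ in range(steps):
        g = color_grad(theta_color)
        theta_color = [c - lr * gc for c, gc in zip(theta_color, g)]
    return theta_color
```

With a quadratic toy loss L = Σ c² (gradient 2c), the colors decay geometrically toward the minimum, which is the same fixed-geometry update pattern the flowchart describes for the real rendered losses.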
Q1. What is the primary innovation of StyleMe3D compared to previous 3D stylization approaches?
- Using only VGG-based feature extraction for style transfer
- Integration of Stable Diffusion into 3D Gaussian Splatting optimization
- Complete modification of geometry during stylization

Q2. Which component of StyleMe3D is specifically designed to extract medium-level style descriptors for content-aware stylization?
- Dynamic Style Score Distillation (DSSD)
- Contrastive Style Descriptor (CSD)
- 3D Gaussian Quality Assessment (3DG-QA)

Q3. During the stylization process in StyleMe3D, which parameters of the 3D Gaussian Splatting representation are optimized?
- Only the geometric parameters
- Only the color parameters
- Both geometric and color parameters