1. 📘 Topic and Domain: A data-centric study examining multi-domain reasoning capabilities in large language models (LLMs) trained with reinforcement learning across three domains: mathematical reasoning, code generation, and logical puzzle solving.
2. 💡 Previous Research and New Ideas: Builds on prior work in Reinforcement Learning with Verifiable Rewards (RLVR), which focused on single domains; introduces an investigation into cross-domain interactions and generalization when multiple domains are trained together.
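To make the RLVR setting concrete: instead of a learned reward model, each response is scored by a programmatic verifier. The sketch below is a minimal, hypothetical example of such a verifier for math problems; the answer-extraction heuristic (taking the last number in the response) is an assumption for illustration, not the paper's actual parser.

```python
import re

def verifiable_math_reward(response: str, gold: str) -> float:
    """Binary verifiable reward: 1.0 if the response's final answer
    matches the gold answer, else 0.0.

    Assumption (hypothetical): the final answer is the last number
    appearing in the response text.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == gold else 0.0

# Usage: a correct and an incorrect sampled response.
r_good = verifiable_math_reward("So the total is 42.", "42")  # → 1.0
r_bad = verifiable_math_reward("I am not sure.", "42")        # → 0.0
```

Because the reward is computed by checking rather than by a learned model, it cannot be gamed the way a reward model can, which is what makes RLVR attractive for math, code, and puzzle domains with checkable answers.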
3. ❓ Problem: Understanding how different reasoning domains interact and influence each other during reinforcement learning training, including potential mutual enhancements and conflicts between domains.
4. 🛠️ Methods: Used the Group Relative Policy Optimization (GRPO) algorithm with Qwen-2.5-7B models, running experiments across single-, dual-, and triple-domain combinations while analyzing the effects of curriculum learning, reward design, and training language.
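GRPO's core idea is to drop the learned value function of PPO and instead compute advantages relative to a group of responses sampled for the same prompt: each response's reward is normalized by the group's mean and standard deviation. A minimal sketch of that advantage computation (the function name is illustrative, not from the paper):

```python
import statistics

def grpo_advantages(rewards: list[float]) -> list[float]:
    """Group-relative advantages as used in GRPO: for a group of
    responses sampled from one prompt, subtract the group mean
    reward and divide by the group standard deviation."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses scored identically: no learning signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]

# Usage: 4 responses to one prompt scored by a binary verifier.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# Correct responses get positive advantage, incorrect get negative.
```

These advantages then weight the policy-gradient update in place of PPO's critic-based estimates, which pairs naturally with the binary verifiable rewards used across the three domains.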
5. 📊 Results and Evaluation: Found that the puzzle and math domains mutually reinforce each other; code reasoning has mixed cross-domain effects; combining diverse data yields more robust overall performance; template consistency between training and evaluation is critical; and training in Chinese underperforms training in English. Evaluations cover MATH500, HumanEval, CountDown, and other benchmarks.