2025-07-24 Papers


Paper 1

DesignLab: Designing Slides Through Iterative Detection and Correction

Published: 2025-07-23

Link: http://arxiv.org/pdf/2507.17202

1. 📘 Topic and Domain: Automated presentation slide design refinement using AI, within the domain of visual design and human-computer interaction.
2. 💡 Previous Research and New Ideas: Based on previous work in automated layout generation and design tools, introduces a novel iterative approach with separate reviewer and contributor roles for progressive refinement.
3. ❓ Problem: Non-experts struggle to create high-quality presentation slides due to complex design choices, while existing automated tools lack the ability to iteratively refine their output.
4. 🛠️ Methods: Fine-tuned large language models for two roles: a design reviewer that detects design flaws and a design contributor that corrects them, using JSON-formatted slide representations and simulated draft-to-final pairs for training.
5. 📊 Results and Evaluation: The system outperformed existing design generation methods including commercial tools in both user studies and GPT-4 evaluations, with most slides requiring 2-3 iterations to reach optimal design quality.
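The draft-simulation step in the methods above can be sketched as follows. This is a minimal illustration: the element schema, attribute names, and perturbation magnitudes are assumptions, not the paper's exact scheme — only the idea of perturbing polished slides (remove, shift, alter) to simulate rough drafts comes from the paper.

```python
import copy
import random

def perturb_slide(slide: dict, seed: int = 0) -> dict:
    """Simulate a rough draft from a polished slide by applying one
    perturbation of each kind: remove, shift, alter (illustrative sketch)."""
    rng = random.Random(seed)
    draft = copy.deepcopy(slide)  # keep the polished original intact
    for op in ("remove", "shift", "alter"):
        if not draft["elements"]:
            break
        idx = rng.randrange(len(draft["elements"]))
        if op == "remove":
            draft["elements"].pop(idx)          # drop an element entirely
        elif op == "shift":
            el = draft["elements"][idx]
            el["x"] += rng.randint(-40, 40)     # displace the element
            el["y"] += rng.randint(-40, 40)
        else:
            draft["elements"][idx]["font_size"] = rng.choice([10, 14, 18, 32])
    return draft
```

Training pairs are then (perturbed draft, polished original), which is how the reviewer and contributor learn to detect and undo exactly these kinds of flaws.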

DesignLab: Designing Slides Through Iterative Detection and Correction

[Workflow figure] An initial rough draft is converted into a structured JSON format; for training, polished slides are perturbed (elements removed, shifted, or altered) to simulate drafts. Two fine-tuned roles operate on slides: a design reviewer that detects issues and marks flawed elements as TENTATIVE, and a design contributor that fixes the TENTATIVE elements. The refinement loop runs review → contribute → re-review until no issues remain, yielding a polished, professional-quality design; an interactive mode supports user selection and design branching. Key features: decomposed roles, progressive refinement, error detection, automated correction.
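The review-contribute-re-review cycle reduces to a simple loop. In this sketch, `reviewer` and `contributor` are placeholders for the paper's two fine-tuned models, and their call signatures are assumptions for illustration:

```python
def refine(slide, reviewer, contributor, max_iters=5):
    """Iterative detect-and-correct loop. Repeats until the reviewer
    finds no flaws or the iteration budget runs out (the paper reports
    most slides converge within 2-3 iterations)."""
    for _ in range(max_iters):
        issues = reviewer(slide)            # detect flawed (TENTATIVE) elements
        if not issues:
            return slide                    # nothing left to fix: polished
        slide = contributor(slide, issues)  # correct only the marked elements
    return slide
```

Decomposing the roles this way is the paper's key design choice: detection and correction are easier to learn separately than as one end-to-end generation step.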
Q1. What is the key innovation in DesignLab's approach compared to existing presentation design tools?
• Using advanced graphics processing algorithms
• Separating the design process into reviewer and contributor roles
• Generating completely new slide content from scratch
Q2. How does DesignLab simulate rough drafts for training?
• By collecting real rough drafts from users
• By using AI to generate random slide layouts
• By introducing controlled perturbations to polished slides
Q3. What is a current limitation of DesignLab according to the paper?
• It cannot handle slides with multiple pages
• It struggles with complex data structures like tables and graphs
• It only works with specific presentation software

Paper 2

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

Published: 2025-07-23

Link: http://arxiv.org/pdf/2507.17512

1. 📘 Topic and Domain: A data-centric study examining multi-domain reasoning capabilities in large language models (LLMs) using reinforcement learning across mathematical reasoning, code generation, and logical puzzle solving domains.
2. 💡 Previous Research and New Ideas: Based on previous research in Reinforcement Learning with Verifiable Rewards (RLVR) which focused on single domains; introduces new investigation into cross-domain interactions and generalization capabilities.
3. ❓ Problem: Understanding how different reasoning domains interact and influence each other during reinforcement learning training, including potential mutual enhancements and conflicts between domains.
4. 🛠️ Methods: Used Group Relative Policy Optimization (GRPO) algorithm with Qwen-2.5-7B models, conducting experiments across single-domain, dual-domain, and triple-domain combinations while analyzing impacts of curriculum learning, reward designs, and training languages.
5. 📊 Results and Evaluation: Found that the puzzle and math domains support each other, while code reasoning shows mixed cross-domain effects; combining diverse data yields more robust overall performance; template consistency between training and evaluation is critical; and Chinese-language training underperforms English-language training, with detailed evaluations across MATH500, HumanEval, CountDown, and other benchmarks.
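GRPO's central trick — scoring each sampled response against its own sampling group rather than against a learned value critic — can be sketched as a simplified advantage computation. This uses the population standard deviation for simplicity; real implementations differ in normalization and clipping details:

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages: normalize each response's reward by the
    mean and std of the rewards within its sampling group, removing the
    need for a value critic (simplified sketch)."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in group_rewards]
```

For a group with verifiable rewards [1, 0, 1, 0], the correct responses receive advantage +1.0 and the incorrect ones -1.0, so the policy is pushed toward whatever distinguishes the correct samples.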

Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning

[Workflow figure]
• Data: math (DeepScaleR 10k, CountDown 10k), code (CodeR1-12k, built from LeetCode + TACO), puzzle (Knights-and-Knaves 5.4k, Logic Puzzle Baron 2.4k).
• Training configuration: Qwen2.5-7B base/instruct models, GRPO optimization, the DeepSeek R1 template, binary or partial rewards.
• Experimental design: single-domain (math, code, or puzzle only), dual-domain (math+puzzle, math+code, puzzle+code), and triple-domain (math+code+puzzle) runs, plus studies of template variation, curriculum learning, and reward design.
• Evaluation: math (MATH500, AIME24, CountDown), code (HumanEval, MBPP), puzzle (KK, ZebraLogicBench).
• Key findings: math and puzzle show synergy while code has mixed cross-domain effects; multi-domain data improves overall performance; SFT boosts RL; template consistency is critical; curriculum learning helps; reward design is task-dependent (binary for simple tasks, partial for complex ones, with finer granularity needed); RLVR is language-sensitive, with English outperforming Chinese.
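The binary-versus-partial reward distinction from the findings can be illustrated with two toy verifiable-reward functions. The exact-match and sub-goal logic here are illustrative assumptions, not the paper's actual verifiers:

```python
def binary_reward(pred: str, gold: str) -> float:
    """Binary verifiable reward: full credit only for an exact final answer,
    which the paper finds works well for simple tasks."""
    return 1.0 if pred.strip() == gold.strip() else 0.0

def partial_reward(pred_steps, gold_steps) -> float:
    """Partial reward: credit proportional to the verifiable sub-goals
    satisfied, better suited to complex tasks per the findings."""
    hits = sum(p == g for p, g in zip(pred_steps, gold_steps))
    return hits / max(len(gold_steps), 1)
```

The trade-off the paper highlights: binary rewards are unambiguous but sparse, while partial rewards give a denser learning signal at the cost of needing finer-grained verification.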
Q1. According to the paper, which combination of domains showed the most promising mutual support in reinforcement learning?
• Math and Code
• Puzzle and Math
• Code and Puzzle
Q2. What unexpected finding did the researchers discover about template consistency in their experiments?
• Templates had no impact on model performance
• Mismatched templates between training and testing severely degraded performance
• Complex templates performed better than simple ones
Q3. When training the model with the CodeR1 dataset, what interesting cross-domain effect was observed?
• It improved performance equally across all domains
• It strengthened reasoning transfer for the instruct model but constrained the base model's reasoning
• It completely eliminated the model's ability to solve math problems

Paper 3

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

Published: 2025-07-19

Link: http://arxiv.org/pdf/2507.14683

1. 📘 Topic and Domain: The paper focuses on developing MiroMind-M1, an open-source language model series specifically designed for mathematical reasoning through supervised fine-tuning and reinforcement learning.
2. 💡 Previous Research and New Ideas: Based on previous work in reasoning language models (RLMs) and reinforcement learning approaches, the paper proposes CAMPO (Context-Aware Multi-Stage Policy Optimization), a novel algorithm that integrates length-progressive training with adaptive repetition penalties.
3. ❓ Problem: The paper addresses the lack of transparency and reproducibility in high-performing reasoning language models, as most successful models are closed-source and their training details are not publicly available.
4. 🛠️ Methods: The authors used a two-stage training approach: first supervised fine-tuning on 719K math problems with verified chain-of-thought trajectories, followed by reinforcement learning with verifiable rewards on 62K challenging problems using their CAMPO algorithm.
5. 📊 Results and Evaluation: MiroMind-M1 achieved state-of-the-art or competitive performance among Qwen-2.5-based open-source models on mathematical benchmarks (AIME24, AIME25, MATH500), with MiroMind-M1-RL-7B reaching 73.4% on AIME24 and 57.8% on AIME25, while demonstrating superior token efficiency.

MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization

[Workflow figure]
• Data collection: OpenR1 (418k), OpenThoughts (56k), Light-R1 (76k), Synthetic-1 (247k), 719K samples total, followed by deduplication, decontamination, length analysis, and quality filtering.
• SFT: Qwen2.5-Math-7B base, 3 epochs, no packing, LR 5e-5, batch size 128, max length 32K → MiroMind-M1-SFT-7B.
• RL data: NuminaMath-1.5 (896K), Skywork-OR1 (105K), Big-Math (50K), DAPO-Math (17K), filtered by style → duplicates → difficulty → length down to 62K problems.
• CAMPO (Context-Aware Multi-Stage Policy Optimization): progressive context lengths (16K → 32K → 49K), a repetition penalty f(oᵢ) to reduce redundancy, and an enhanced math verifier with a cascade design.
• Models: MiroMind-M1-RL-7B (from SFT-7B, two-stage RL; AIME24 73.4, AIME25 57.8) and MiroMind-M1-RL-32B (from DeepSeek-R1, three-stage RL; AIME24 77.5, AIME25 65.6).
• Evaluation: AIME24/25, MATH-500, token efficiency, avg@k metrics.
• Key insights: the no-packing strategy outperforms packing in SFT; longer reasoning trajectories improve performance; multi-stage training improves efficiency and token usage; models, datasets, training configs, and the improved verifier are released as open source.
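The length-progressive stages (16K → 32K → 49K) amount to a step-wise context-length schedule. The stage widths below are illustrative assumptions — only the three context lengths come from the paper:

```python
STAGE_LENGTHS = [16_384, 32_768, 49_152]  # progressive max context lengths

def max_context_for_step(step: int, steps_per_stage: int = 1_000) -> int:
    """Return the max context length for a training step under a simple
    fixed-width stage schedule (sketch; real stage boundaries would be
    set by training dynamics, not a fixed step count)."""
    stage = min(step // steps_per_stage, len(STAGE_LENGTHS) - 1)
    return STAGE_LENGTHS[stage]
```

Starting with a short context and widening it lets early training spend tokens efficiently while still allowing long reasoning trajectories in later stages, which matches the paper's token-efficiency findings.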
Q1. What is the main innovation in the CAMPO algorithm proposed in this paper?
• Integration of length-progressive training with adaptive repetition penalties
• Using a larger dataset for supervised fine-tuning
• Implementing a new tokenization method
Q2. How many stages were used in the training process of MiroMind-M1?
• One stage using only reinforcement learning
• Two stages: supervised fine-tuning followed by reinforcement learning
• Three stages: pre-training, fine-tuning, and testing
Q3. What was a unique aspect of the model's evaluation process that demonstrated its efficiency?
• It processed more math problems per second
• It achieved higher accuracy with fewer training epochs
• It achieved similar or better results while using fewer tokens in its responses