1. 📘 Topic and Domain: The paper focuses on improving multimodal large language models' ability to handle complex visual-language tasks through a novel compositional training approach.
2. 💡 Previous Research and New Ideas: The paper builds on previous visual instruction tuning research but proposes a new approach called COMPACT that explicitly controls for compositional complexity in training data rather than just scaling data volume.
3. ❓ Problem: The paper addresses the observation that current multimodal models struggle with complex tasks that require multiple capabilities simultaneously (e.g., recognizing objects, counting them, and understanding their spatial relationships at once).
4. 🛠️ Methods: The authors develop a data generation pipeline that combines 10 atomic visual capabilities into progressively more complex training examples (k = 1, 2, or 3 combined capabilities), using Gemini both to generate and to verify the examples.
5. 📊 Results and Evaluation: Using only 10% of the standard visual instruction tuning data, COMPACT achieves performance comparable to or better than full-scale visual instruction tuning, with particularly strong gains on complex tasks (83.3% improvement on MMStar and 94.0% on MM-Vet for tasks requiring 4+ capabilities).
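The compositional data generation described in item 4 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the capability labels and prompt wording are hypothetical placeholders, and the actual COMPACT pipeline's taxonomy and prompts may differ.

```python
import itertools
import random

# Illustrative placeholders for the 10 atomic visual capabilities;
# the paper's actual capability taxonomy may use different categories.
ATOMIC_CAPABILITIES = [
    "object_recognition", "counting", "spatial_relationships",
    "attribute_recognition", "text_reading", "action_understanding",
    "scene_understanding", "relational_reasoning",
    "fine_grained_details", "knowledge_grounding",
]

def sample_capability_combos(k_values=(1, 2, 3), per_k=5, seed=0):
    """Sample capability subsets at each compositional complexity k.

    k controls how many atomic capabilities a generated example must
    exercise jointly, mirroring the paper's k = 1, 2, 3 progression.
    """
    rng = random.Random(seed)
    combos = []
    for k in k_values:
        pool = list(itertools.combinations(ATOMIC_CAPABILITIES, k))
        combos.extend(rng.sample(pool, per_k))
    return combos

def build_generation_prompt(image_id, capabilities):
    """Format a (hypothetical) prompt asking a generator model such as
    Gemini to write a question that requires all listed capabilities,
    then answer and verify it against the image."""
    caps = ", ".join(capabilities)
    return (
        f"For image {image_id}, write a question whose answer requires "
        f"jointly using these capabilities: {caps}. "
        "Then answer it, and verify the answer against the image."
    )
```

A driver would iterate `sample_capability_combos()` over an image pool, send each `build_generation_prompt(...)` output to the generator model, and keep only examples that pass the verification step.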