1. 📘 Topic and Domain: The paper explores scaling laws for Quantization-Aware Training (QAT) in Large Language Models (LLMs), focusing on how quantized-model performance scales with model size, training-data volume, and quantization granularity.
2. 💡 Previous Research and New Ideas: Building on classical scaling laws such as Kaplan et al. and Chinchilla, the paper proposes a unified QAT scaling law that jointly models model size, training-data volume, and quantization granularity (group size), whereas prior quantization scaling work considered model size alone.
3. ❓ Problem: The paper addresses the limited understanding of how QAT behaves at 4-bit precision (W4A4), specifically how quantization error scales with model size, training-data volume, and quantization granularity.
4. 🛠️ Methods: The authors ran 268 QAT experiments spanning a range of model sizes and training configurations, decomposed the overall quantization error into separate weight and activation components, and fit a mathematical model that predicts quantization error from model size, data volume, and granularity.
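The predictive model described above can be illustrated with a power-law ansatz in which error falls with model size N and grows with training tokens D and group size G. The functional form, exponents, and constant below are placeholder assumptions for illustration, not the paper's fitted law:

```python
# Illustrative power-law ansatz for QAT quantization error.
# k, alpha, beta, gamma are made-up placeholder values, not the
# coefficients fitted in the paper.

def quant_error(N: float, D: float, G: float,
                k: float = 1.0, alpha: float = 0.3,
                beta: float = 0.1, gamma: float = 0.2) -> float:
    """Predicted quantization error: k * D^beta * G^gamma / N^alpha."""
    return k * (D ** beta) * (G ** gamma) / (N ** alpha)

# The predicted trends match the paper's qualitative findings:
base = quant_error(N=1e9, D=1e11, G=64)
print(quant_error(N=1e10, D=1e11, G=64) < base)   # larger model  -> lower error
print(quant_error(N=1e9, D=1e12, G=64) > base)    # more tokens   -> higher error
print(quant_error(N=1e9, D=1e11, G=128) > base)   # coarser group -> higher error
```

In practice such coefficients would be obtained by least-squares fitting the ansatz to the measured errors from the 268 experiments.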
5. 📊 Results and Evaluation: The study finds that quantization error decreases as model size grows but increases with more training tokens and with coarser quantization granularity. It also identifies activation quantization in the FC2 layer as the primary bottleneck for W4A4 QAT performance.
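The role of quantization granularity can be made concrete with a minimal sketch of group-wise symmetric INT4 fake quantization, the kind of operation whose error the paper models. This is a generic illustration, not the paper's exact quantizer; the group size G is the granularity, with one scale shared per G consecutive weights:

```python
import numpy as np

def quantize_int4_groupwise(w: np.ndarray, G: int) -> np.ndarray:
    """Fake-quantize a 1-D weight vector to 4 bits with per-group scales."""
    assert w.size % G == 0
    groups = w.reshape(-1, G)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0  # int4 range [-8, 7]
    scales = np.where(scales == 0, 1.0, scales)               # avoid div-by-zero
    q = np.clip(np.round(groups / scales), -8, 7)
    return (q * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.normal(size=4096)
for G in (32, 128, 1024):  # finer -> coarser granularity
    mse = np.mean((w - quantize_int4_groupwise(w, G)) ** 2)
    print(f"G={G:5d}  MSE={mse:.3e}")
```

Coarser groups force more weights to share a single scale set by the group's largest magnitude, which typically increases rounding error, matching the trend reported in the results.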