2025-10-14 Papers


Paper 1

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Published: 2025-10-13

Link: http://arxiv.org/pdf/2510.11696

1. 📘 Topic and Domain: The paper focuses on quantization-enhanced reinforcement learning for Large Language Models (LLMs), specifically in the domain of model optimization and training efficiency.
2. 💡 Previous Research and New Ideas: Based on previous research in LLM quantization and reinforcement learning, the paper introduces the novel idea that quantization noise can actually benefit RL training by increasing policy entropy and exploration, contrary to its typically negative effects in supervised fine-tuning.
3. ❓ Problem: The paper addresses the high computational and memory costs of RL training for LLMs, which requires substantial GPU memory and long rollout durations.
4. 🛠️ Methods: The paper introduces QeRL, combining NVFP4 quantization with Low-Rank Adaptation (LoRA) and implementing an Adaptive Quantization Noise mechanism that dynamically adjusts noise during training to enhance exploration.
5. 📊 Results and Evaluation: QeRL achieves a 1.5× speedup in the rollout phase, enables RL training of a 32B LLM on a single H100 GPU, and matches full-parameter fine-tuning performance on mathematical benchmarks (90.8% on GSM8K, 77.4% on MATH 500).
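The quantized-base-plus-LoRA forward pass described in item 4 can be sketched in a few lines. The snippet below is an illustrative NumPy mock-up, not the paper's implementation: `fake_quantize_4bit` is a stand-in for NVFP4 (which actually uses FP4 values with FP8 per-block scales and hardware Marlin kernels), and all function and variable names are my own.

```python
import numpy as np

def fake_quantize_4bit(w, block=16):
    """Simulate 4-bit block quantization of a frozen weight matrix.
    Illustrative stand-in only: real NVFP4 uses FP4 values with FP8
    per-block scales and hardware (Marlin) kernels."""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # map to signed 4-bit range
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape).astype(w.dtype)

def qlora_forward(x, w_frozen, lora_a, lora_b, alpha=1.0):
    """y = x @ (Q(W) + alpha * A @ B): quantized frozen base plus trainable LoRA.
    Only lora_a and lora_b would receive gradients during RL training."""
    return x @ (fake_quantize_4bit(w_frozen) + alpha * lora_a @ lora_b)

rng = np.random.default_rng(0)
d, r = 64, 8
w = rng.standard_normal((d, d)).astype(np.float32)
lora_a = rng.standard_normal((d, r)).astype(np.float32) * 0.01
lora_b = np.zeros((r, d), dtype=np.float32)  # zero-init B: LoRA is a no-op at start
x = rng.standard_normal((4, d)).astype(np.float32)
y = qlora_forward(x, w, lora_a, lora_b)
```

Because `lora_b` is zero-initialized, the adapter contributes nothing at the first step; training moves only the small `A`/`B` matrices while the quantized base stays frozen.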

QeRL Framework Overview (figure summary)

• Combines NVFP4 quantization with LoRA for efficient RL training
• Reduces memory usage to 25-30% while achieving a 1.5× speedup
• Enables 32B model training on a single H100 80GB GPU

NVFP4 Quantization
• 4-bit floating-point format with FP8 scaling factors
• Hardware accelerated; Marlin kernel support

LoRA Integration
• Low-rank adaptation: frozen main weights, trainable adapters
• Parameter efficient

Adaptive Quantization Noise (AQN)
• Dynamic noise injection with an exponential decay schedule
• Enhanced exploration

RL Algorithms
• GRPO support and DAPO compatibility
• Policy optimization with reward-based training

Core Innovation: Quantization Enhances Exploration
1. Quantization noise increases policy entropy; higher entropy yields better exploration in RL
2. Static quantization noise is replaced by dynamic AQN with exponential decay: σ(k) = σ_start × (σ_end/σ_start)^((k-1)/(K-1))
3. Noise is shared via LayerNorm integration, a zero-parameter-overhead implementation

Training Pipeline
• Rollout phase: NVFP4 + LoRA for fast generation
• Reward computation: rule-based
• Logit evaluation: 16-bit precision
• Gradient update: LoRA adapters
• AQN adjustment: dynamic noise

Key Results
• GSM8K: 90.8% accuracy (Qwen2.5-7B), matching full fine-tuning
• MATH 500: 77.4% accuracy, superior to 16-bit LoRA and QLoRA
• 1.5× rollout speedup with 60-75% memory reduction
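The AQN decay schedule σ(k) = σ_start × (σ_end/σ_start)^((k-1)/(K-1)) translates directly into code. This is a minimal sketch of the stated formula; the default σ values are placeholder assumptions, not the paper's settings.

```python
def aqn_sigma(k, K, sigma_start=1e-2, sigma_end=1e-4):
    """Adaptive Quantization Noise std at RL step k of K, following
    sigma(k) = sigma_start * (sigma_end / sigma_start) ** ((k - 1) / (K - 1)).
    Default sigma values are placeholders, not the paper's settings."""
    return sigma_start * (sigma_end / sigma_start) ** ((k - 1) / (K - 1))

# Noise starts at sigma_start (k = 1) and decays smoothly to sigma_end (k = K).
schedule = [aqn_sigma(k, K=10) for k in range(1, 11)]
```

Early training keeps noise (and hence policy entropy) high for exploration; the exponential decay then anneals it as the policy converges.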
Q1
1. What is the key counterintuitive finding about quantization noise in this paper?
It always degrades model performance in both supervised and reinforcement learning
It helps increase policy entropy and exploration in reinforcement learning, unlike in supervised learning
It has no effect on model training or performance
Q2
2. What unique technical capability does QeRL enable?
Training a 32B LLM model using RL on a single H100 80GB GPU
Completely eliminating the need for GPU memory
Converting all LLMs to 1-bit precision
Q3
3. How does QeRL handle the quantization noise during training?
It maintains a constant level of noise throughout training
It completely eliminates all quantization noise
It dynamically adjusts noise levels using an Adaptive Quantization Noise mechanism

Paper 2

DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

Published: 2025-10-13

Link: http://arxiv.org/pdf/2510.11712

1. 📘 Topic and Domain: The paper focuses on high-fidelity panoramic image generation via hybrid training, in the domain of computer vision and generative deep learning.
2. 💡 Previous Research and New Ideas: Building on DiT (Diffusion Transformer) models and prior panoramic generation methods, the paper proposes a novel hybrid training approach that combines perspective and panoramic data across multiple representation levels.
3. ❓ Problem: It addresses the challenge of maintaining both geometric fidelity and photorealism in panoramic image generation, which has been limited by the scarcity of high-quality panoramic training data.
4. 🛠️ Methods: The paper implements a hybrid training framework with image-level regularization (perspective image guidance and panoramic refinement) and token-level supervision (circular padding, yaw loss, and cube loss).
5. 📊 Results and Evaluation: DiT360 achieves state-of-the-art performance across eleven quantitative metrics, demonstrating superior boundary consistency, image fidelity, and perceptual quality in text-to-panorama generation, inpainting, and outpainting tasks.
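The yaw loss in item 4 can be illustrated as a rotation-consistency penalty: rolling an equirectangular panorama along its width is a yaw rotation, so an equivariant generator should commute with the roll. Below is a hedged NumPy sketch of that idea; the function name is my own and the paper's exact formulation may differ.

```python
import numpy as np

def yaw_consistency_loss(model, pano, shift):
    """Rotation-consistency penalty: a horizontal roll of an equirectangular
    panorama is a yaw rotation, so model(roll(x)) should match roll(model(x)).
    Illustrative only; the paper's yaw loss may be formulated differently."""
    out_of_rolled = model(np.roll(pano, shift, axis=-1))
    rolled_output = np.roll(model(pano), shift, axis=-1)
    return float(np.mean((out_of_rolled - rolled_output) ** 2))

pano = np.arange(12.0).reshape(3, 4)  # toy [H, W] panorama
```

A pixel-wise model commutes with the roll and incurs zero loss; any model that treats image columns asymmetrically is penalized, which is the supervision signal the loss provides.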

DiT360 Method Workflow (figure summary)

Inputs: perspective images + panoramic images

Image-Level Regularization
• Perspective image guidance: re-projection to the ERP domain
• Panoramic refinement: inpainting polar regions

Backbone: DiT360 diffusion transformer with LoRA + flow scheduler

Token-Level Supervision
• Circular padding: boundary continuity
• Yaw loss: rotation consistency supervision
• Cube loss: distortion awareness supervision
• Hybrid loss: L = L_MSE + λ₁·L_cube + λ₂·L_yaw

Applications: text-to-panorama, inpainting, outpainting

Key Innovations
• Hybrid training on perspective + panoramic data
• Multi-level supervision (image + token level)
• Geometry-aware constraints for distortion handling
• Enhanced photorealism and geometric fidelity
• Seamless boundary continuity
• Superior performance across multiple metrics
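Circular padding, listed above for boundary continuity, is straightforward to sketch: wrap the feature map along its width so the left/right seam of the equirectangular image stays continuous under convolution. The NumPy function below illustrates the general technique; the function name and the choice of replicate padding for height are my assumptions, not the paper's exact code.

```python
import numpy as np

def circular_pad(feat, pad):
    """Circularly pad an equirectangular feature map [..., H, W] along width
    so convolutions see a continuous left/right seam; height is
    replicate-padded (an assumption for illustration)."""
    wrapped = np.concatenate(
        [feat[..., :, -pad:], feat, feat[..., :, :pad]], axis=-1
    )
    top = np.repeat(wrapped[..., :1, :], pad, axis=-2)
    bottom = np.repeat(wrapped[..., -1:, :], pad, axis=-2)
    return np.concatenate([top, wrapped, bottom], axis=-2)

feat = np.arange(24.0).reshape(2, 3, 4)  # [channels, H, W]
out = circular_pad(feat, pad=1)
```

In PyTorch the width wrap corresponds to `F.pad(x, (pad, pad, 0, 0), mode='circular')` on a 4D tensor, applied before each convolution whose receptive field should cross the seam.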
Q1
1. What is the main challenge that DiT360 aims to address in panoramic image generation?
Slow processing speed of panoramic images
Limited availability of high-quality panoramic training data
High computational requirements for image generation
Q2
2. Which of the following is NOT one of the token-level supervision mechanisms used in DiT360?
Circular padding for boundary continuity
Temporal consistency loss
Yaw loss for rotational robustness
Q3
3. How does DiT360 handle the hybrid training approach?
By only using synthetic panoramic data
By combining limited panoramic data with high-quality perspective images
By converting all images to standard perspective views

Paper 3

Demystifying Reinforcement Learning in Agentic Reasoning

Published: 2025-10-13

Link: http://arxiv.org/pdf/2510.11701

1. 📘 Topic and Domain: The paper investigates reinforcement learning (RL) for agentic reasoning in large language models, focusing on how LLMs can effectively use external tools during reasoning.
2. 💡 Previous Research and New Ideas: Based on previous work in RL for language models and tool-integrated reasoning, it proposes new insights around data curation, algorithm design, and reasoning modes for agentic RL.
3. ❓ Problem: The paper aims to demystify and improve reinforcement learning for agentic reasoning by addressing challenges in data quality, algorithm optimization, and reasoning strategies.
4. 🛠️ Methods: The authors conduct systematic experiments analyzing three key aspects: real vs synthetic training data, exploration-friendly RL techniques (like clip higher and reward shaping), and different reasoning modes for tool use.
5. 📊 Results and Evaluation: Their approach enables a 4B-parameter model to outperform 32B models on challenging benchmarks such as AIME 2024 and 2025 (70.93% and 68.13% accuracy, respectively), while establishing practical guidelines for effective agentic RL training.
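Of the exploration-friendly techniques in item 4, overlong reward shaping is easy to sketch: the reward is softly penalized as a response enters a buffer zone before the hard length limit, rather than being cut off abruptly. The function below is illustrative, in the spirit of DAPO-style soft overlong punishment; the specific lengths and penalty magnitude are placeholder assumptions.

```python
def shape_overlong_reward(reward, length, max_len=512, buffer=128):
    """Soft penalty for responses approaching the length limit:
    no penalty below the buffer zone, a linear ramp inside it, and a
    full -1.0 penalty at or past max_len. All numbers are illustrative
    placeholders, not the paper's settings."""
    soft_start = max_len - buffer
    if length <= soft_start:
        return reward
    if length >= max_len:
        return reward - 1.0
    return reward - (length - soft_start) / buffer
```

The linear ramp gives the policy a graded signal to shorten its reasoning before the hard cutoff, instead of a cliff-edge penalty that destabilizes training.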

Demystifying Reinforcement Learning in Agentic Reasoning (figure summary)

Data Perspective
• Real end-to-end trajectories vs. synthetic stitch-style data
• High-diversity datasets; model-aware data selection maintains exploration
• Stronger SFT initialization yields better gradient signals

Algorithm Perspective
• GRPO-based techniques: clip higher, overlong reward shaping, token-level loss
• Exploration-exploitation balance via entropy management
• Pass@k vs. Average@k

Reasoning Mode
• Tool-call strategies: deliberative vs. reactive
• Quality over quantity; fewer but more effective calls
• Long-CoT integration; internal vs. external reasoning; tool-efficiency optimization

Agentic RL Training Pipeline
• SFT stage: 3k real trajectories
• RL training: GRPO-TCR on 30k diverse data
• Tool integration: code interpreter, multi-turn reasoning
• Result: DemyAgent-4B, SOTA performance at 4B parameters

Key Insights and Takeaways
• Data: real trajectories beat synthetic; diversity maintains entropy; model-aware selection is crucial; end-to-end learning signals
• Algorithm: clip higher improves exploration; balanced entropy is essential; token-level loss is effective; Pass@k and Average@k jointly improve
• Reasoning: deliberative beats reactive mode; quality over quantity; long-CoT needs SFT alignment; tool efficiency matters most

Evaluation Benchmarks: AIME 2024/2025, GPQA-Diamond, LiveCodeBench-v6

A 4B model achieves SOTA performance via systematic RL optimization: simple yet effective practices for stable, efficient agentic reasoning.
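The "clip higher" technique listed under the algorithm perspective can be sketched as an asymmetric variant of the standard PPO/GRPO clipped objective: raising the upper clipping bound lets low-probability tokens gain probability mass faster, which encourages exploration. The snippet below is an illustrative sketch; the epsilon values are placeholder assumptions, not the paper's settings.

```python
import numpy as np

def clip_higher_loss(ratio, advantage, eps_low=0.2, eps_high=0.28):
    """PPO/GRPO-style clipped policy loss with an asymmetric upper bound
    ('clip higher'). ratio is pi_new / pi_old per token; advantage is the
    (group-normalized, in GRPO) advantage estimate. Epsilon values are
    illustrative placeholders."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantage
    # Negate because optimizers minimize; the objective itself is maximized.
    return -np.minimum(unclipped, clipped).mean()
```

With a symmetric epsilon, a token whose ratio exceeds 1.2 gets no further upside; the larger upper bound extends that headroom on positive-advantage tokens while leaving the downside clip unchanged.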
Q1
1. What is the key finding about training data quality in agentic reasoning?
Synthetic data is more effective than real trajectories
Real end-to-end trajectories provide stronger initialization than synthetic data
The source of training data has no significant impact on performance
Q2
2. According to the paper, which reasoning mode is most effective for agentic LLMs?
Reactive Mode with frequent tool calls and minimal thinking
Deliberative Mode with fewer but more targeted tool calls
Mixed Mode alternating between quick and deep thinking
Q3
3. What surprising result did the paper demonstrate about model size?
Larger models always perform better at agentic reasoning
Model size has no impact on agentic reasoning ability
A 4B parameter model could outperform 32B models with proper training