2025-07-28 Papers


Paper 1

The Invisible Leash: Why RLVR May Not Escape Its Origin

Published: 2025-07-20

Link: http://arxiv.org/pdf/2507.14843

1. 📘 Topic and Domain: The paper examines the limitations of Reinforcement Learning with Verifiable Rewards (RLVR) in large language models, specifically focusing on reasoning capabilities and model behavior.
2. 💡 Previous Research and New Ideas: Building on recent advances in large reasoning models trained with RLVR, the paper proposes a theoretical framework showing that RLVR is constrained by the base model's support and operates as a conservative reweighting mechanism.
3. ❓ Problem: The paper investigates whether RLVR truly expands a model's reasoning capabilities or merely amplifies existing high-reward outputs from the base model.
4. 🛠️ Methods: The authors conduct theoretical analysis and empirical experiments across various reasoning tasks, examining empirical support dynamics, entropy metrics, and performance on mathematical and non-mathematical reasoning benchmarks.
5. 📊 Results and Evaluation: Results show that while RLVR improves pass@1 accuracy, it tends to shrink rather than expand the model's empirical support, with entropy reduction leading to narrower solution spaces and potentially missing valid solutions accessible to the base model.
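The support argument in points 2-5 can be illustrated with a toy sketch (all numbers here are hypothetical, and this is a simplification, not the paper's training procedure): RLVR-style reweighting scales base probabilities by exponentiated reward, so any answer with zero probability under the base model stays at zero, and concentrating mass on rewarded answers shrinks answer-level entropy in this example.

```python
import math

def reweight(base_probs, rewards, beta=2.0):
    """Conservative reweighting of a discrete base distribution.

    Each answer's probability is scaled by exp(beta * reward) and
    renormalized. Answers outside the base support (p == 0) remain at 0,
    so supp(pi_theta) stays a subset of supp(q).
    """
    weights = [p * math.exp(beta * r) for p, r in zip(base_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical base model over 4 candidate answers; answer 3 is
# outside the base support (probability 0).
q = [0.5, 0.3, 0.2, 0.0]
r = [1, 0, 1, 1]          # verifiable rewards in {0, 1}

pi = reweight(q, r)
print(pi[3])              # still 0.0: reward alone cannot create support
print(entropy(pi) < entropy(q))   # True here: mass concentrates
```

The invisible leash is visible in the last two lines: no amount of reward signal moves probability onto an answer the base model never samples.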

Figure summary: RLVR workflow analysis

• Pipeline: base model q(y|x) → RLVR training with verifiable rewards R(x, y) ∈ {0, 1} → RLVR model π_θ(y|x)
• Theoretical analysis: support preservation (supp(π_θ) ⊆ supp(q)), conservative updates, entropy-reward tradeoff (H[π_θ] ≤ H[q])
• Empirical analysis: support preservation, shrinkage vs. expansion, entropy dynamics
• Key findings: improved pass@1 accuracy but lower pass@k; shrinkage exceeds expansion; answer-level entropy declines while token-level entropy varies; precision up, diversity down; a conservative reweighting mechanism overall
• Experimental setup: math and general reasoning tasks; ProRL vs. base model; high-k sampling with k ∈ {1024, 2048, 4096, 8192, 16384}
• Conclusions: RLVR acts as a conservative reweighting mechanism within the base model's support; breaking the "invisible leash" requires explicit exploration mechanisms, e.g., hybrid strategies
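The pass@k figures behind the high-k sampling setup (k up to 16384) are conventionally computed with the standard unbiased estimator over n sampled completions; a minimal sketch, assuming n samples of which c are correct (the example numbers are hypothetical, not results from the paper):

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator: the probability that at least one of
    k completions drawn without replacement from n samples is correct,
    given that c of the n samples are correct."""
    if n - c < k:        # fewer than k incorrect samples: always a hit
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical run: 16384 samples per problem, 40 of them correct
for k in (1, 1024, 16384):
    print(k, pass_at_k(16384, 40, k))
```

Note how the same (n, c) pair yields very different numbers as k grows, which is why pass@1 can improve even while pass@k at large k degrades.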
Q1. What is the main trade-off identified in RLVR according to the paper?
a) Speed versus accuracy
b) Precision versus exploration diversity
c) Model size versus performance

Q2. When comparing token-level entropy and answer-level entropy in RLVR models, what interesting phenomenon was observed?
a) Both types of entropy always decreased together
b) Token-level entropy sometimes increased while answer-level entropy consistently declined
c) Both types of entropy remained constant throughout training

Q3. According to the paper's theoretical framework, why can't RLVR discover completely new solutions?
a) Because the model is too small to generate new solutions
b) Because it cannot sample solutions with zero initial probability from the base model
c) Because the training data is insufficient

Paper 2

Pixels, Patterns, but No Poetry: To See The World like Humans

Published: 2025-07-21

Link: http://arxiv.org/pdf/2507.16863

1. 📘 Topic and Domain: The paper focuses on evaluating and testing the visual perception capabilities of Multimodal Large Language Models (MLLMs) through a new benchmark called Turing Eye Test (TET).
2. 💡 Previous Research and New Ideas: Where previous research focused on the reasoning capabilities of MLLMs, this paper shifts attention to fundamental visual perception, probing it directly through specialized perceptual tasks.
3. ❓ Problem: The paper addresses whether MLLMs can truly perceive visual information like humans do, revealing a fundamental gap between machine and human perception capabilities.
4. 🛠️ Methods: The authors created four diagnostic tasks (HiddenText, 3DCaptcha, ColorBlind, and ChineseLigatures) and evaluated 15 state-of-the-art MLLMs using Pass@1 and Pass@K metrics, along with analyzing model behavior through Grad-CAM visualization.
5. 📊 Results and Evaluation: Results showed catastrophic failures of current MLLMs on these perceptual tasks, with most models achieving near-zero success rates, while fine-tuning the vision tower enabled rapid adaptation, suggesting the limitation lies in visual perception rather than reasoning capabilities.

Figure summary: Turing Eye Test (TET) methodology flow

• Dataset creation: four specialized tasks, all synthetic visual challenges: HiddenText (150 images), 3DCaptcha (150 images), ColorBlind (150 images), ChineseLigatures (40 phrases)
• Model evaluation: 15 state-of-the-art MLLMs spanning unified models (Show-o2, Bagel), API models (Claude, Gemini, o1), and open-source models (Qwen, InternVL); metrics Pass@1 and Pass@32; temperature 0.3, max tokens 16384
• Results: catastrophic failures, with most models at 0% Pass@1, peak improvement below 4%, and minimal Pass@K variance; visual perception identified as the bottleneck
• Grad-CAM analysis: attention visualization over the vision encoder (ViT) and language backbone (LLM) shows models failing to locate target regions, with attention scattered incorrectly
• Supervised fine-tuning: five configurations (full parameters; vision encoder only; vision + adapter; language backbone only; adapter only); tuning the vision encoder proved essential
• In-context learning: 3-example same-domain demonstrations yielded virtually no improvement (knowledge ≠ perception)
• Image processing: downsampling and blurring analysis; downsampling helps, pointing to a vision-patch limitation
• Key insights: generalization of the vision tower, not language reasoning, is the bottleneck; fine-tuning the vision encoder enables rapid adaptation to perceptual tasks; current MLLMs lack human-like visual perception
• Conclusion: TET reveals fundamental visual perception limitations in current MLLMs; future work targets enhanced visual generalization methods and the full TET benchmark
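The finding that downsampling helps on HiddenText can be illustrated with a minimal block-averaging sketch (the image and the `downsample` helper are hypothetical illustrations, not the paper's preprocessing code): averaging s×s pixel blocks merges fine-grained strokes into coarser patterns that fit within fewer vision patches.

```python
def downsample(image, s):
    """Downsample a grayscale image (list of equal-length rows) by
    averaging non-overlapping s x s blocks."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - h % s, s):
        row = []
        for j in range(0, w - w % s, s):
            block = [image[i + di][j + dj] for di in range(s) for dj in range(s)]
            row.append(sum(block) / (s * s))
        out.append(row)
    return out

# Toy 4x4 image with a fine checkerboard texture (values 0 or 255)
img = [[255 * ((i + j) % 2) for j in range(4)] for i in range(4)]
small = downsample(img, 2)
print(small)   # 2x2 image; each block averages the fine texture away
```

The high-frequency checkerboard collapses to a uniform gray, a crude analogue of how downsampling simplifies hidden-text patterns for the vision tower.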
Q1. What is the main insight revealed by the fine-tuning experiments in the paper?
a) MLLMs lack sufficient training data for visual tasks
b) The limitation lies in the vision tower's perception capabilities rather than reasoning
c) The language backbone needs more parameters to improve performance

Q2. In the HiddenText experiment, what happened when images were downsampled?
a) Performance got worse due to loss of detail
b) Performance improved as it simplified the character patterns
c) There was no significant change in performance

Q3. What unique aspect differentiates TET from previous MLLM benchmarks?
a) It focuses on testing visual perception rather than reasoning capabilities
b) It uses a larger dataset than previous benchmarks
c) It only tests Chinese language understanding

Paper 3

∇NABLA: Neighborhood Adaptive Block-Level Attention

Published: 2025-07-17

Link: http://arxiv.org/pdf/2507.13546

1. 📘 Topic and Domain: Video generation using transformer models, specifically focusing on optimizing attention mechanisms in video diffusion transformers.
2. 💡 Previous Research and New Ideas: Building on previous work in sparse attention mechanisms and Sliding Tile Attention (STA), the paper proposes NABLA, an adaptive approach that dynamically determines attention patterns rather than using fixed ones.
3. ❓ Problem: Addresses the quadratic computational complexity of full attention mechanisms in video generation transformers, which becomes a bottleneck for high-resolution and long-duration videos.
4. 🛠️ Methods: Implements a Neighborhood Adaptive Block-Level Attention mechanism that uses downsampling and thresholding to dynamically select important attention blocks, combined with STA for optimal performance.
5. 📊 Results and Evaluation: Achieved 2.7× faster training and inference compared to baseline models while maintaining equivalent quality metrics (CLIP score, VBench score, human evaluation), with successful validation through both objective metrics and human evaluation studies.
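The bottleneck in point 3 is easy to quantify: attention cost scales with the square of the token count S = T·H·W, so at 90% sparsity only a tenth of the query-key pairs are computed. A back-of-the-envelope sketch (the grid dimensions and sparsity level are hypothetical, not figures from the paper):

```python
def attention_pairs(t, h, w):
    """Number of query-key pairs in full self-attention over a
    T x H x W video token grid."""
    s = t * h * w
    return s * s

full = attention_pairs(32, 60, 104)   # hypothetical latent video grid
sparse = full * (1 - 0.90)            # keep 10% of pairs at 90% sparsity
print(f"full: {full:.3e}  sparse: {sparse:.3e}  ratio: {full / sparse:.0f}x")
```

Doubling spatial or temporal resolution quadruples the pair count, which is why adaptive sparsity matters most for high-resolution, long-duration video.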

Figure summary: ∇NABLA workflow

• Input: video tokens of shape T×H×W×D; token reordering via fractal flattening with patch size P×P; projections Q = XW_Q, K = XW_K, V = XW_V
• NABLA mask computation: (1) block averaging to Q_a, K_a ∈ R^(h×S/N×D); (2) reduced attention A = softmax(Q_a K_a^T / √D); (3) CDF computation: vals, order = sort(A); (4) binarization: M = cumsum(vals) ≥ 1 − thr; (5) reorder M to obtain the sparse mask M_∇
• STA mask: Sliding Tile Attention with window (W_T, W_H, W_W) yields M_STA
• Mask combination: M = M_∇ ∨ M_STA (logical OR)
• Sparse attention: FlexAttention(Q, K, V, M), implemented in PyTorch
• Performance: 2.7× speedup; 80-92% sparsity; quality maintained (CLIP/VBench); training: 1.46× speedup
• Key features: dynamic threshold selection via CDF; hardware-agnostic FlexAttention integration; adaptive sparsity without custom CUDA kernels; complementary with STA for optimal quality
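The mask-computation steps above can be sketched in plain Python on a toy block-level score matrix (assumptions: tiny dimensions, a single head, and a single global CDF threshold over the whole map; the paper's version operates on block-averaged Q/K inside FlexAttention, so treat this purely as an illustration of steps 3-5):

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def nabla_mask(scores, thr=0.8):
    """Binarize a reduced (block-level) attention map: sort block
    probabilities, take the cumulative sum, and keep the high-mass
    blocks that together cover at least `thr` of the attention mass."""
    probs = [p for row in scores for p in softmax(row)]
    order = sorted(range(len(probs)), key=lambda i: probs[i])  # ascending
    total = sum(probs)
    mask = [False] * len(probs)
    csum = 0.0
    for i in order:
        csum += probs[i]
        # blocks past the low-mass tail (cumsum >= (1 - thr) * total) survive
        if csum >= (1 - thr) * total:
            mask[i] = True
    return mask

def combine(m_nabla, m_sta):
    """M = M_nabla OR M_sta: union of adaptive and sliding-tile masks."""
    return [a or b for a, b in zip(m_nabla, m_sta)]

# Toy 2x4 block-level score matrix (hypothetical values)
scores = [[4.0, 0.1, 0.0, 0.2],
          [0.0, 3.0, 3.0, 0.1]]
m = nabla_mask(scores, thr=0.8)
sta = [True, False, False, False,   # e.g. a local sliding-window mask
       False, True, False, False]
print(combine(m, sta))
```

Dropping only the low-probability tail is what makes the sparsity adaptive: uniform rows keep many blocks, peaked rows keep few, with no custom CUDA kernel required.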
Q1. What is the main innovation of NABLA compared to previous sparse attention approaches?
a) It uses custom CUDA kernels for faster computation
b) It dynamically adapts attention patterns based on content
c) It reduces the input resolution to save memory

Q2. What speed improvement did NABLA achieve while maintaining quality metrics?
a) 1.5× faster than baseline
b) 2.7× faster than baseline
c) 4× faster than baseline

Q3. When combining NABLA with STA (Sliding Tile Attention), what was the main benefit?
a) It reduced computational costs by 95%
b) It improved the visual quality metrics significantly
c) It helped mitigate boundary artifacts while maintaining efficiency