2025-06-24 Papers


Paper 1

Light of Normals: Unified Feature Representation for Universal Photometric Stereo

Published: 2025-06-23

Link: http://arxiv.org/pdf/2506.18882

1. 📘 Topic and Domain: Universal photometric stereo - a computer vision technique for reconstructing 3D surface normals from multiple images captured under varying lighting conditions.
2. 💡 Previous Research and New Ideas: Based on previous encoder-decoder approaches like UniPS and SDM-UniPS; introduces new light register tokens and wavelet transforms to better decouple lighting from surface features.
3. ❓ Problem: Addresses two key challenges: 1) Decoupling illumination variations from surface normal features, and 2) Preserving high-frequency geometric details in complex surfaces.
4. 🛠️ Methods: Proposes the LINO-UniPS architecture, which combines light register tokens and global cross-image attention for lighting-normal decoupling, a wavelet transform for high-frequency detail preservation, and a normal-gradient confidence loss; also introduces the PS-Verse training dataset with graded geometric complexity.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance on public benchmarks (DiLiGenT, LUCES), with improved feature consistency (higher CSIM/SSIM scores) and better normal reconstruction accuracy compared to existing methods, especially for complex geometries.
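The wavelet-aware downsampling in item 4 is easiest to see with a single-level 2D Haar transform. The pure-Python sketch below is illustrative only (not the paper's implementation); it shows why the detail sub-bands retain high-frequency information that plain bilinear averaging would discard:

```python
def haar_dwt2(img):
    """Single-level 2D Haar DWT of an H x W grid (H, W even).

    Returns four half-resolution sub-bands: LL (low-pass average) plus
    LH/HL/HH detail bands, so the 2x downsampling loses no information,
    unlike plain bilinear averaging."""
    h, w = len(img), len(img[0])
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4.0  # approximation
            lh[i // 2][j // 2] = (a - b + c - d) / 4.0  # horizontal detail
            hl[i // 2][j // 2] = (a + b - c - d) / 4.0  # vertical detail
            hh[i // 2][j // 2] = (a - b - c + d) / 4.0  # diagonal detail
    return ll, lh, hl, hh

# A sharp edge inside one 2x2 block: the average (LL) blurs it to 0.5,
# but the LH detail band records it, so the inverse transform (IDWT,
# as in the paper's WaveUpSampler) can restore it exactly.
ll, lh, hl, hh = haar_dwt2([[0, 1],
                            [0, 1]])
# ll == [[0.5]], lh == [[-0.5]]
```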

LINO-UniPS workflow (figure summary):
• Pipeline: multi-light images (B×F×H×W×3) → light-registered wavelet-aware downsampler (DWT + bilinear) → DINOv2 backbone feature extraction with light register tokens (HDRI, point, area) → enhanced light-normal contextual attention (4 interleaved frame-axis / light-axis / global blocks) → light aligner (cosine-similarity loss) → DPT-based fusion module (multi-level aggregation) → WaveUpSampler (IDWT + fusion, H×W×C) → pixel-sampling transformer decoder (similar to SDM-UniPS) → surface normals (H×W×3), trained with a normal-gradient perception loss using the confidence map C = e^G̃.
• PS-Verse dataset: 100K scenes with graduated complexity, 17,805 textured 3D models, normal maps for fine details, diverse lighting conditions.
• Key innovations: light register tokens for decoupling; wavelet-transform detail preservation; global cross-image attention; normal-gradient confidence loss.
• Performance: SOTA on DiLiGenT and LUCES; higher CSIM/SSIM scores; better feature consistency; superior generalization.
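The normal-gradient confidence idea (C = e^G̃) can be sketched as a per-pixel weight on the reconstruction loss. The forward-difference gradient, single channel, and L1 form below are simplifications for illustration, not the paper's exact loss:

```python
import math

def gradient_confidence(normals):
    """Confidence map C = exp(G) from the gradient magnitude G of a
    scalar (H x W) normal component; pixels near sharp geometric edges
    get exponentially larger weight."""
    h, w = len(normals), len(normals[0])
    conf = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            gx = normals[i][min(j + 1, w - 1)] - normals[i][j]
            gy = normals[min(i + 1, h - 1)][j] - normals[i][j]
            conf[i][j] = math.exp(math.hypot(gx, gy))  # C = e^G
    return conf

def confidence_weighted_l1(pred, gt, conf):
    """Mean L1 error, up-weighted near high-gradient (edge) pixels."""
    h, w = len(gt), len(gt[0])
    total = sum(conf[i][j] * abs(pred[i][j] - gt[i][j])
                for i in range(h) for j in range(w))
    return total / sum(conf[i][j] for i in range(h) for j in range(w))
```

On a flat region the weights collapse to e^0 = 1 and the loss reduces to a plain mean L1; near an edge the errors count more, which is the point of the confidence term.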
Q1. What are the two fundamental challenges that LINO-UniPS aims to address in universal photometric stereo?
A. Low computational efficiency and limited dataset size
B. Deep coupling between illumination and surface normal features, and preservation of high-frequency geometric details
C. Camera calibration errors and insufficient lighting conditions
Q2. What innovative technique does LINO-UniPS use to preserve high-frequency surface details during feature processing?
A. Wavelet transform-based sampling instead of traditional bilinear interpolation
B. Multiple convolutional layers with residual connections
C. Gaussian blur filters applied to input images
Q3. How many complexity levels does the PS-Verse dataset contain, and what makes Level 5 unique?
A. 4 levels total, with Level 4 featuring the most complex lighting conditions
B. 6 levels total, with Level 5 containing only metallic materials
C. 5 levels total, with Level 5 being the first to use normal maps for enhanced surface details

Paper 2

LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning

Published: 2025-06-23

Link: http://arxiv.org/pdf/2506.18841

1. 📘 Topic and Domain: Ultra-long text generation with large language models via reinforcement learning, in the domain of natural language processing.
2. 💡 Previous Research and New Ideas: Based on previous approaches like LongWriter that used supervised fine-tuning on synthetic data, this paper proposes a novel incentivization-based approach using reinforcement learning without relying on annotated or synthetic data.
3. ❓ Problem: The paper aims to solve the challenges of ultra-long text generation, including maximum length limits and quality degradation as sequence length increases in large language models.
4. 🛠️ Methods: The authors use Group Relative Policy Optimization (GRPO) for RL training, with specialized reward models targeting length control, writing quality, and structural formatting, combined with continual pretraining and a "think" prompting strategy.
5. 📊 Results and Evaluation: LongWriter-Zero, trained from Qwen2.5-32B, outperformed traditional SFT methods and achieved state-of-the-art results on WritingBench and Arena-Write benchmarks, surpassing even 100B+ models like DeepSeek R1 and Qwen3-235B.
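GRPO's core trick, critic-free group-relative advantages, can be sketched in a few lines. This is a minimal sketch of the general algorithm's advantage computation, not the authors' training code:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each of the G responses
    sampled for the same prompt is scored against its own group's mean
    and std, so no learned critic/value model is needed."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    if sigma == 0:  # degenerate group: identical rewards carry no signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Four completions of one prompt, scored by the reward models:
adv = grpo_advantages([0.2, 0.4, 0.6, 0.8])
```

Normalizing within the group is also what lets the length, quality, and format rewards be balanced on the advantage level rather than by hand-tuned weights.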

LongWriter-Zero overview (figure summary):
• Pipeline: Qwen2.5-32B base → continual pretraining on 30B tokens (long books and articles, 1% CoT data) → RL training (GRPO, "think" prompt, 150 RL steps) → LongWriter-Zero (SOTA performance).
• RQ1 (reward design): length reward via target-range matching, writing quality and coherence reward, format reward for structure and consistency, balanced through advantages.
• RQ2 (test-time scaling): a <think> planning stage before the <answer> outperforms direct output without thinking.
• RQ3 (impact analysis, Arena Elo): base without think ≈ 700, base with think ≈ 1200, continual pretraining with think ≈ 1400; continual pretraining raises the performance ceiling.
• Training setup: queries from WildChat-1M and LMSYS-Chat-1M; GRPO with group advantages and normalized rewards; max 14K tokens, T=0.8, top-p=1.0, 8 nodes with 8×H800 each.
• Key results: WritingBench 8.69 (best overall score), Arena-Write Elo 1447 (highest); RL outperforms SFT and beats larger 100B+ models.
• Key innovation: first RL-only approach for ultra-long text generation; no synthetic-data dependency; multi-reward balancing; test-time scaling.
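The length reward targets a range rather than an exact count. A hypothetical shaping function (the paper's exact form may differ; the linear decay here is an assumption) might look like:

```python
def length_reward(n_tokens, lo, hi):
    """Hypothetical target-range length reward: full reward inside
    [lo, hi], decaying linearly to 0 as the output drifts away from the
    range. The exact shaping used in the paper may differ."""
    if lo <= n_tokens <= hi:
        return 1.0
    if n_tokens < lo:
        return max(0.0, n_tokens / lo)               # too short
    return max(0.0, 1.0 - (n_tokens - hi) / hi)      # too long

# e.g. a query asking for roughly 4K-8K tokens:
scores = [length_reward(n, 4000, 8000) for n in (2000, 5000, 12000)]
# -> [0.5, 1.0, 0.5]
```

A range-based reward avoids penalizing every deviation from a single target count, which would make the reward needlessly noisy for open-ended writing queries.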
Q1. What is the key innovation of LongWriter-Zero compared to previous approaches like LongWriter?
A. It uses reinforcement learning from scratch without relying on synthetic or annotated data
B. It increases the maximum token length from 10K to 50K tokens
C. It combines multiple language models to generate longer texts
Q2. Which three components does the paper identify as critical for maximizing RL effectiveness in long-form generation?
A. Data augmentation, model scaling, and hardware optimization
B. Reward design, test-time scaling, and continual pretraining
C. Prompt engineering, fine-tuning, and ensemble methods
Q3. What surprising result did LongWriter-Zero achieve despite having only 32B parameters?
A. It matched the performance of GPT-4 on mathematical reasoning tasks
B. It outperformed 100B+ models like DeepSeek R1 and Qwen3-235B on long-form writing benchmarks
C. It reduced training time by 90% compared to traditional supervised fine-tuning

Paper 3

RLPR: Extrapolating RLVR to General Domains without Verifiers

Published: 2025-06-22

Link: http://arxiv.org/pdf/2506.18254

1. 📘 Topic and Domain: Reinforcement learning for language models, specifically extending RLVR (Reinforcement Learning with Verifiable Rewards) to general domains beyond mathematics and code.
2. 💡 Previous Research and New Ideas: Builds on RLVR, which relies on domain-specific verifiers for reward signals; proposes using the LLM's intrinsic probability of generating the correct answer as the reward signal instead of external verifiers.
3. ❓ Problem: RLVR's reliance on domain-specific verifiers limits its scalability and application to general domains, as creating verifiers for diverse natural language tasks is prohibitively complex.
4. 🛠️ Methods: Introduces RLPR framework that uses token probabilities of reference answers as rewards, implements reward debiasing to remove question/answer biases, and applies standard deviation filtering to stabilize training.
5. 📊 Results and Evaluation: Achieved consistent improvements across both mathematical and general reasoning tasks, surpassing verifier-based methods by 1.6 points on average across seven benchmarks and outperforming concurrent verifier-free approaches by 7.6 points on TheoremQA.
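The probability reward and debiasing steps in item 4 reduce to a few lines; the sketch below follows the formulas in the paper's diagram, but the numeric values are illustrative, not from the paper:

```python
def prob_reward(token_probs):
    """RLPR reward r = (1/|y*|) * sum(p_i): the mean probability the
    policy assigns to each token of the reference answer y*. Using the
    mean (not the product / sequence likelihood) keeps the reward robust
    to answer length and minor token variations."""
    return sum(token_probs) / len(token_probs)

def debias(r, r_direct):
    """r_hat = clip(r - r', 0, 1): subtract the reward r' obtained when
    the model answers directly without reasoning, removing bias coming
    from the question and reference answer themselves."""
    return min(1.0, max(0.0, r - r_direct))

# Illustrative numbers:
r = prob_reward([0.9, 0.8, 0.7])    # scored with the sampled reasoning
r0 = prob_reward([0.5, 0.4, 0.3])   # direct answer, no reasoning
r_hat = debias(r, r0)
```

The subtraction rewards only the lift that the sampled reasoning itself contributes, not how easy the question already was.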

RLPR methodology (figure summary):
• Problem and insight: RLVR is limited to verifiable domains; an LLM's intrinsic probability of the reference answer indicates reasoning quality, enabling a verifier-free framework for general domains.
• Probability reward (PR): r = (1/|y*|) Σ p_i, the mean token probability of the reference answer y*.
• Reward debiasing: r̂ = clip(r - r', 0, 1), removing bias from the question and reference answer (r' is the reward of a direct answer without reasoning).
• Std-dev filtering: an adaptive curriculum that dynamically removes low-variance prompts.
• RL training with GRPO: ∇J_RLPR(θ) = E[r̂ ∇ log π_θ(o|x)], optimizing the expected probability reward.
• Results: general domains MMLU-Pro 56.0, TheoremQA 55.4; math domains MATH-500 78.0, Minerva 56.5; outperforms General-Reasoner.
• Key components: mean token probabilities rather than sequence likelihood (robust to length variation and synonyms); debiasing computed as an advantage over a direct no-reasoning answer; adaptive filtering with an exponential-moving-average threshold; domain-agnostic, no external verifiers needed.
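The adaptive std-dev filter can be sketched with an exponential-moving-average threshold; the alpha value and exact update rule below are assumptions for illustration:

```python
import statistics

class StdFilter:
    """Sketch of RLPR's adaptive prompt filter: keep a prompt only when
    the std of its group's rewards exceeds a threshold tracked as an
    exponential moving average of observed stds. (alpha and the update
    rule here are assumptions, not the paper's exact settings.)"""

    def __init__(self, alpha=0.1):
        self.alpha = alpha
        self.threshold = 0.0

    def keep(self, group_rewards):
        std = statistics.pstdev(group_rewards)
        decision = std >= self.threshold  # low-variance groups give
                                          # near-zero GRPO advantages
        # Fold the newly observed std into the EMA threshold.
        self.threshold = (1 - self.alpha) * self.threshold + self.alpha * std
        return decision

f = StdFilter()
first = f.keep([0.2, 0.8])   # spread-out rewards: informative, kept
second = f.keep([0.5, 0.5])  # identical rewards: filtered out
```

Because the threshold tracks recent variance, the filter tightens as training stabilizes, acting as an implicit curriculum over prompts.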
Q1. What is the primary limitation that prevents existing RLVR methods from being applied to general domains?
A. Heavy reliance on domain-specific verifiers that are complex to create for natural language tasks
B. Insufficient computational resources for training on diverse datasets
C. Lack of high-quality training data in general domains
Q2. In RLPR's probability-based reward calculation, why does the paper use mean token probabilities (r = (1/|y*|) Σ p_i) instead of the normalized sequence likelihood (product of probabilities)?
A. Mean probabilities require less computational overhead during training
B. Sequence likelihood is overly sensitive to minor variations and introduces high variance
C. Mean probabilities work better with the GRPO algorithm's group normalization
Q3. How much improvement did RLPR achieve over the concurrent verifier-free method VeriFree on TheoremQA?
A. 1.6 points
B. 4.2 points
C. 7.6 points