2025-09-24 Papers


Paper 1

Reinforcement Learning on Pre-Training Data

Published: 2025-09-23

Link: http://arxiv.org/pdf/2509.19249

1. 📘 Topic and Domain: A new training paradigm called Reinforcement Learning on Pre-Training Data (RLPT) for optimizing Large Language Models.
2. 💡 Previous Research and New Ideas: Builds on prior reinforcement-learning approaches such as RLHF and RLVR, which rely on costly human annotation; this paper instead proposes deriving reinforcement-learning rewards directly from pre-training data, with no human feedback.
3. ❓ Problem: The growing disparity between scalable computational resources and the finite supply of high-quality text data, which constrains conventional LLM training approaches.
4. 🛠️ Methods: Introduces a next-segment reasoning objective with two tasks, Autoregressive Segment Reasoning (ASR) and Middle Segment Reasoning (MSR), that rewards the model for accurately predicting subsequent text segments from context.
5. 📊 Results and Evaluation: When applied to Qwen3-4B-Base, RLPT achieved significant improvements across multiple benchmarks (3.0-8.1 points on general domain tasks and 5.3-6.6 points on mathematical reasoning tasks) with favorable scaling behavior.
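The reward-and-objective idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the paper uses a generative reward model to judge semantic consistency, and a simple exact-prefix check stands in for it here (an assumption).

```python
# Sketch of RLPT's binary segment reward and combined objective estimate.
# A generative reward model scores semantic consistency in the paper;
# a plain prefix-match check stands in for it here (an assumption).

def segment_reward(predicted: str, reference: str) -> int:
    """Return 1 if the prediction matches a prefix of the reference segment, else 0."""
    pred = predicted.strip()
    ref = reference.strip()
    return 1 if pred and ref.startswith(pred) else 0

def rlpt_objective(asr_rewards, msr_rewards, lam=0.5):
    """Monte-Carlo estimate of J_RLPT = E_ASR[r] + lam * E_MSR[r]."""
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(asr_rewards) + lam * mean(msr_rewards)

# Toy rollouts: two ASR samples, one MSR sample.
asr = [segment_reward("The cat sat", "The cat sat on the mat."),
       segment_reward("A dog ran", "The cat sat on the mat.")]
msr = [segment_reward("on the mat", "on the mat.")]
print(rlpt_objective(asr, msr, lam=0.5))  # 0.5 + 0.5*1.0 = 1.0
```

The binary 1/0 score mirrors the prefix reward strategy the paper describes; λ weights the MSR contribution against ASR.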


RLPT pipeline (reconstructed from the paper's overview figure):

Data preparation:
- Web text collection, deduplication, PII masking, quality filtering
- Sentence-level segmentation (NLTK toolkit): raw text → (s_<i, s_i, s_{i+1}) triples

Cold-start SFT:
- Initializes instruction-following capability
- Batch size 1024, LR 2×10⁻⁵, 3 epochs

Next-segment reasoning tasks:
- ASR (Autoregressive Segment Reasoning): predict s_i from s_<i; complete the next sentence given context, aligning with autoregressive generation
- MSR (Middle Segment Reasoning): predict s_i from (s_<i, s_{i+1}); fill masked content using bidirectional context

RL training:
- Policy π_θ, GRPO optimization, no KL regularization
- Batch size 512, 8 samples per prompt, temperature 1.0, LR 1×10⁻⁶

Generative reward model:
- Checks semantic consistency of the predicted vs. reference segment
- Prefix reward strategy: score 1 (match) / 0 (no match); a self-supervised reward signal

Objective:
- J_RLPT(θ) = E_ASR[r(o, s_i)] + λ·E_MSR[r(o, s_i)], with interleaved ASR and MSR tasks and λ ∈ (0, 1) balancing their contributions

Evaluation:
- General domain: MMLU, MMLU-Pro, GPQA-Diamond, KOR-Bench, OlympiadBench (accuracy metric)
- Math reasoning: MATH-500, AMC23, Minerva Math, AIME24, AIME25 (Pass@k metric, n=64, temperature 0.6)

Extensions and scaling:
- RLPT as a foundation for RLVR training: additional 2.3–3.7 point improvements on AIME benchmarks
- Power-law scaling with training tokens; favorable scaling trend with potential for continued gains

Key benefits: no human annotation required, scalable on pre-training data, enhanced reasoning capabilities.
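The segmentation step above can be sketched as follows. The paper uses the NLTK toolkit for sentence splitting; a simple regex splitter stands in here so the example is self-contained (an assumption).

```python
import re

# Sketch of sentence-level segmentation into (s_<i, s_i, s_{i+1}) triples.
# The paper uses NLTK; a regex splitter stands in here for self-containment.

def split_sentences(text: str):
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s.strip()]

def make_triples(text: str):
    """Yield (context, target, next-segment) triples from raw text."""
    sents = split_sentences(text)
    triples = []
    for i in range(1, len(sents) - 1):
        context = " ".join(sents[:i])   # s_<i
        target = sents[i]               # s_i  (ASR predicts this from context)
        nxt = sents[i + 1]              # s_{i+1} (MSR additionally conditions on this)
        triples.append((context, target, nxt))
    return triples

doc = "RL scales compute. Data is finite. RLPT mines rewards from raw text."
for t in make_triples(doc):
    print(t)
```

ASR consumes only the first element of each triple as conditioning; MSR uses the first and third, treating the middle segment as masked content.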
Q1. What is the main innovation of RLPT compared to previous reinforcement learning approaches?
A) It uses human feedback in a more efficient way
B) It eliminates the need for human annotation by deriving rewards from pre-training data
C) It focuses only on mathematical reasoning tasks

Q2. Which of the following components is NOT one of the two main tasks in RLPT's next-segment reasoning objective?
A) Autoregressive Segment Reasoning (ASR)
B) Middle Segment Reasoning (MSR)
C) Terminal Segment Reasoning (TSR)

Q3. When RLPT was applied to Qwen3-4B-Base, which benchmark showed the highest absolute improvement?
A) MMLU (3.0 points)
B) GPQA-Diamond (8.1 points)
C) AIME24 (6.6 points)

Paper 2

Do You Need Proprioceptive States in Visuomotor Policies?

Published: 2025-09-23

Link: http://arxiv.org/pdf/2509.18644

1. 📘 Topic and Domain: Visuomotor policies for robotic manipulation, investigating whether proprioceptive state inputs are necessary for effective robot control.
2. 💡 Previous Research and New Ideas: Based on traditional imitation-learning visuomotor policies that use both visual and proprioceptive state inputs; proposes a novel "State-free Policy" that relies solely on visual inputs.
3. ❓ Problem: Addresses the limitation of state-based policies that overfit to training trajectories and show poor spatial generalization when manipulating objects in new positions.
4. 🛠️ Methods: Implements a State-free Policy using relative end-effector action space and dual wide-angle wrist cameras for full task observation, removing proprioceptive state inputs entirely.
5. 📊 Results and Evaluation: Achieved significantly improved spatial generalization (85% success in height generalization vs 0% with state input, 64% in horizontal generalization vs 6%), better data efficiency, and enhanced cross-embodiment adaptation across various robotic manipulation tasks.
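The relative end-effector action space named above can be sketched as a delta between consecutive EEF poses, so the policy never needs the absolute proprioceptive state. This is a minimal illustration with positions only and toy values (full poses would also use relative rotations), not the paper's implementation.

```python
import numpy as np

# Sketch of the relative EEF action space: actions are per-step pose deltas,
# removing any dependence on the absolute proprioceptive state.
# Positions only; values are toy data, not from the paper.

def absolute_to_relative(positions: np.ndarray) -> np.ndarray:
    """Convert absolute EEF positions (T, 3) to per-step delta actions (T-1, 3)."""
    return np.diff(positions, axis=0)

def rollout_from_relative(start: np.ndarray, deltas: np.ndarray) -> np.ndarray:
    """Recover absolute positions from a start pose and relative actions."""
    return start + np.cumsum(deltas, axis=0)

traj = np.array([[0.00, 0.00, 0.10],
                 [0.00, 0.05, 0.10],
                 [0.02, 0.05, 0.15]])
deltas = absolute_to_relative(traj)
recovered = rollout_from_relative(traj[0], deltas)
print(np.allclose(recovered, traj[1:]))  # True
```

Because the deltas are frame-relative, the same action sequence transfers across start positions, which is consistent with the spatial-generalization argument in the summary.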


State-free visuomotor policy workflow (reconstructed from the paper's overview figure):

Problem identification:
- Proprioceptive state input causes overfitting to training trajectories and poor spatial generalization

Key conditions for removing state input:
1. Relative end-effector (EEF) action space (relative EEF works; absolute EEF and joint-angle spaces do not)
2. Full task observation, via dual wide-angle wrist cameras (120° × 120° FOV)

State-free policy:
- Remove proprioceptive states entirely; vision-only input
- Architecture-agnostic: validated with π₀, ACT, and Diffusion Policy

Evaluation:
- Task types: pick & place, shirt folding, whole-body manipulation
- Real-world tasks: Pick Pen, Pick Bottle, Put Lid, Fold Shirt, Fetch Bottle
- Multiple robot embodiments (dual-arm systems, whole-body robots); LIBERO benchmark in simulation

Key results:
- Spatial generalization: height 0% → 85% success; horizontal 6% → 64% success
- Maintained in-domain performance; fewer demonstrations needed (reduced overfitting)
- Better cross-embodiment transfer, with no state-space alignment required
- Consistent improvements across all architectures and tasks

Implementation details:
- Remove proprioceptive state input; use the relative EEF action space
- Deploy dual wide-angle wrist cameras to ensure full task observation
- Optional: remove the overhead camera

Future insights:
- Rethink sensor design: overhead cameras may be harmful, wrist cameras sufficient
- A foundation for generalizable robotic learning systems
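The "full task observation" condition amounts to the wrist cameras keeping the task-relevant points in view. A minimal sketch of such an FOV check, using a simplified symmetric pinhole model with the 120° × 120° field of view mentioned above (the geometry is an assumption, not the paper's calibration):

```python
import math

# Sketch of a field-of-view check for the "full task observation" condition:
# does a target point fall inside a wrist camera's 120° x 120° FOV?
# Simplified symmetric pinhole model (an assumption, not the paper's setup).

def in_fov(point_cam, fov_deg=120.0):
    """point_cam: (x, y, z) in the camera frame, z pointing forward."""
    x, y, z = point_cam
    if z <= 0:
        return False  # behind the camera
    half = math.radians(fov_deg / 2.0)
    # Angular offset from the optical axis in each image direction.
    return abs(math.atan2(x, z)) <= half and abs(math.atan2(y, z)) <= half

print(in_fov((0.1, 0.0, 0.3)))   # near the optical axis: True
print(in_fov((1.0, 0.0, 0.2)))   # ~79° off-axis, outside the 60° half-angle: False
```

A wide FOV is what lets the dual wrist cameras alone cover the workspace, so no overhead camera (or proprioceptive state) is needed.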
Q1. What was the key finding about the overhead camera in State-free Policies?
A) It was essential for successful task completion
B) It actually reduced performance in challenging scenarios
C) It had no impact on performance either way

Q2. Which action representation space proved most effective for State-free Policies?
A) Absolute joint-angle action space
B) Relative joint-angle action space
C) Relative end-effector action space

Q3. What was an unexpected benefit of State-free Policies?
A) They required more training data than state-based policies
B) They enabled better cross-embodiment adaptation between different robots
C) They only worked with simple manipulation tasks

Paper 3

VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction

Published: 2025-09-23

Link: http://arxiv.org/pdf/2509.19297

1. 📘 Topic and Domain: Feed-forward 3D Gaussian Splatting for novel view synthesis using voxel-aligned prediction instead of traditional pixel-aligned approaches.
2. 💡 Previous Research and New Ideas: Based on previous pixel-aligned Gaussian Splatting methods, proposes a new voxel-aligned paradigm that predicts Gaussians from a 3D voxel grid rather than 2D pixels.
3. ❓ Problem: Addresses limitations of pixel-aligned methods including view-dependent density distributions, heavy reliance on input view numbers, and alignment errors in occluded or low-texture regions.
4. 🛠️ Methods: Uses a multi-view transformer for feature extraction, constructs 3D voxel features through unprojection, refines them with a sparse 3D U-Net, and predicts Gaussian parameters directly from the voxel grid.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance on RealEstate10K and ScanNet datasets with higher PSNR/SSIM scores while using fewer Gaussians, demonstrating better geometric consistency and efficiency than pixel-aligned methods.
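The unprojection step in the methods above can be sketched as lifting per-pixel depths into 3D points via the camera intrinsics. The intrinsics and depth values below are illustrative, not from the paper, and the transform to world space (via the camera pose) is omitted for brevity.

```python
import numpy as np

# Sketch of the unprojection step used to build 3D voxel features:
# per-pixel depths are lifted into camera-space 3D points via intrinsics K.
# Values below are illustrative, not from the paper.

def unproject(depth: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Lift a depth map (H, W) to camera-space 3D points (H, W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

# Toy intrinsics for a 4x4 image with principal point at pixel (2, 2).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 2.0],
              [0.0,   0.0, 1.0]])
depth = np.full((4, 4), 2.0)
pts = unproject(depth, K)
print(pts.shape)   # (4, 4, 3)
print(pts[2, 2])   # point on the principal ray: [0. 0. 2.]
```

In the full pipeline these points would be mapped to world space with each camera pose before voxelization, so features from all views land in one shared grid.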


VolSplat workflow (reconstructed from the paper's overview figure):

Inputs:
- Multi-view images {I₁, I₂, ..., Iₙ} with camera poses {P₁, P₂, ..., Pₙ}

2D feature extraction:
- ResNet backbone with cross-view attention
- Cost volume via plane sweeping and feature matching

Depth prediction:
- Depth module producing per-view depth maps

3D feature construction:
- Unproject 2D features to world space, then voxelize
- Feature aggregation: V_{i,j,k} = avg(features)

Feature refinement:
- Sparse 3D U-Net with residual learning: V' = V + R(V)
- Multi-scale fusion

Voxel-aligned Gaussian prediction:
- Per-voxel Gaussians {μ, α, Σ, c} with adaptive density

Rendering and supervision:
- 3D Gaussian Splatting for novel-view rendering and view-consistent 3D reconstruction
- Loss: L = L_MSE + λ·L_LPIPS (photometric + perceptual)

Key innovation (voxel-aligned vs. pixel-aligned): multi-view consistency, adaptive density, reduced alignment errors.
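The per-voxel averaging V_{i,j,k} = avg(features) can be sketched as binning world-space points into a grid and averaging their features. Grid dimensions and voxel size below are illustrative, and the subsequent sparse 3D U-Net refinement (V' = V + R(V)) is omitted.

```python
import numpy as np

# Sketch of voxel feature aggregation V_{i,j,k} = avg(features): world-space
# points carrying per-point features are binned into a voxel grid and averaged.
# Grid size and voxel size are illustrative; the paper's sparse 3D U-Net
# refinement (V' = V + R(V)) is omitted here.

def voxelize(points, feats, voxel_size=0.5, grid=(4, 4, 4)):
    """Average point features per voxel. points: (N, 3), feats: (N, C)."""
    idx = np.clip((points / voxel_size).astype(int), 0, np.array(grid) - 1)
    V = np.zeros(grid + (feats.shape[1],))
    counts = np.zeros(grid)
    # Unbuffered scatter-add so repeated voxel indices accumulate correctly.
    np.add.at(V, (idx[:, 0], idx[:, 1], idx[:, 2]), feats)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    occupied = counts > 0
    V[occupied] /= counts[occupied][:, None]
    return V, occupied

# Two points share voxel (0,0,0); one lands in voxel (3,0,0).
pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.1, 0.1], [1.6, 0.1, 0.1]])
feats = np.array([[1.0], [3.0], [5.0]])
V, occ = voxelize(pts, feats)
print(V[0, 0, 0], V[3, 0, 0])  # [2.] [5.]
```

Predicting Gaussians from occupied voxels rather than pixels is what decouples the number of Gaussians from the number of input views, matching the adaptive-density claim in the summary.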
Q1. What is the key innovation of VolSplat compared to previous feed-forward 3D Gaussian Splatting methods?
A) It uses more input camera views
B) It predicts Gaussians from a 3D voxel grid instead of 2D pixels
C) It has a larger neural network architecture

Q2. According to the experimental results, what advantage does VolSplat demonstrate over pixel-aligned methods when handling scene complexity?
A) It requires more Gaussians to represent scenes
B) It only works well for simple scenes
C) It adaptively controls Gaussian density based on scene complexity

Q3. When tested on the ACID dataset without fine-tuning (cross-dataset generalization), what characteristic did VolSplat demonstrate?
A) It completely failed to generalize
B) It showed higher sensitivity to domain shifts
C) It maintained significantly better performance than pixel-aligned methods