1. 📘 Topic and Domain: Theoretical unification of large language model post-training methods, specifically focusing on supervised fine-tuning (SFT) and reinforcement learning (RL) approaches in machine learning.
2. 💡 Previous Research and New Ideas: Builds on existing SFT and RL post-training methods and proposes a unified theoretical framework showing that both are instances of a single optimization process rather than contradictory objectives.
3. ❓ Problem: Addresses the lack of theoretical understanding of why SFT and RL can be effectively combined in LLM training, and aims to create a more efficient alternative to the resource-intensive sequential SFT-then-RL pipeline.
4. 🛠️ Methods: Introduces a Unified Policy Gradient Estimator (UPGE) that decomposes post-training gradients into four components (stabilization mask, reference-policy denominator, advantage estimate, and likelihood gradient), and develops the Hybrid Post-Training (HPT) algorithm, which dynamically switches between SFT and RL losses based on the model's own performance feedback.
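The dynamic switching in HPT can be sketched minimally as below. This is an illustration of the idea only, not the paper's implementation: the threshold value, function names, and the reward-list interface are all hypothetical assumptions.

```python
def choose_loss(rollout_rewards, threshold=0.5):
    """Pick the training signal for one prompt, in the spirit of HPT:
    if the policy's own rollouts already succeed often enough, apply the
    RL (policy-gradient) loss; otherwise fall back to SFT on the expert
    demonstration. `threshold` is a hypothetical hyperparameter.

    rollout_rewards: list of 0/1 correctness scores for sampled rollouts.
    """
    success_rate = sum(rollout_rewards) / len(rollout_rewards)
    return "RL" if success_rate >= threshold else "SFT"


# Example: a prompt the model mostly fails -> imitate the demonstration.
print(choose_loss([0, 0, 0, 1]))  # SFT
# A prompt the model mostly solves -> reinforce its own exploration.
print(choose_loss([1, 1, 0, 1]))  # RL
```

The design intuition is that SFT supplies dense supervision where the policy cannot yet produce correct rollouts, while RL refines behavior once the policy can; the switch makes that trade-off per prompt rather than in a fixed SFT-then-RL sequence.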
5. 📊 Results and Evaluation: HPT consistently outperformed baselines across six mathematical reasoning benchmarks and two out-of-distribution suites, achieving a 7-point gain over the strongest baseline on AIME 2024 using Qwen2.5-Math-7B, and showed substantial improvements on smaller models like Qwen2.5-Math-1.5B and Llama3.1-8B.