2025-03-25 Papers

Paper 1

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders

Published: 2025-03-24

Link: http://arxiv.org/pdf/2503.18878

1. 📘 Topic and Domain: Interpreting reasoning mechanisms in Large Language Models (LLMs), using Sparse Autoencoders (SAEs) to identify and analyze the specific features responsible for reasoning capabilities.

2. 💡 Previous Research and New Ideas: Builds on work showing that LLMs represent concepts as linear directions in activation space; introduces a novel approach that uses Sparse Autoencoders to isolate reasoning-related features.

3. ❓ Problem: Understanding how reasoning capabilities are internally encoded within Large Language Models, which has remained unexplored despite advances in LLM reasoning abilities.

4. 🛠️ Methods: Used Sparse Autoencoders to decompose model activations, developed the ReasonScore metric to identify reasoning features, and validated those features through empirical analysis, interpretability techniques, and feature steering experiments.

5. 📊 Results and Evaluation: Identified 30 features responsible for reasoning and demonstrated that amplifying these features systematically improved reasoning performance across multiple benchmarks while increasing output length by 14-29%.
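
To make the Methods item above more concrete, here is a minimal sketch of how a sparse autoencoder decomposes an activation vector and how a single feature can be amplified ("steered"). It is an illustration under stated assumptions, not the paper's implementation: the `TinySAE` class, its random untrained weights, the `steer` helper, and the additive steering rule are all hypothetical, and the ReasonScore metric is not reproduced here.

```python
import random


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


class TinySAE:
    """Toy sparse autoencoder with random (untrained) weights, for illustration only."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps d_model -> d_features; decoder maps d_features -> d_model.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, activation):
        """Return non-negative feature activations (ReLU of an affine map)."""
        pre = [p + b for p, b in zip(matvec(self.w_enc, activation), self.b_enc)]
        return [max(0.0, p) for p in pre]

    def decode(self, features):
        """Reconstruct the model activation from feature activations."""
        return [r + b for r, b in zip(matvec(self.w_dec, features), self.b_dec)]


def steer(sae, activation, feature_idx, strength):
    """Add `strength` times one feature's decoder direction to the activation.

    Assumption: steering is modelled as a simple additive shift; the paper's
    exact steering rule may differ.
    """
    direction = [row[feature_idx] for row in sae.w_dec]
    return [a + strength * d for a, d in zip(activation, direction)]


sae = TinySAE(d_model=4, d_features=6)
activation = [0.5, -1.0, 0.25, 2.0]
features = sae.encode(activation)
steered = steer(sae, activation, feature_idx=2, strength=3.0)
```

In a real setting the SAE would be trained on activations from a chosen layer of the model, and the steered activation would be written back into the forward pass during generation.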

Paper 2

Video-T1: Test-Time Scaling for Video Generation

Published: 2025-03-24

Link: http://arxiv.org/pdf/2503.18942

1. 📘 Topic and Domain: The paper explores test-time scaling (TTS) for video generation, operating in the domain of computer vision and generative AI.

2. 💡 Previous Research and New Ideas: Building on previous research in LLM test-time scaling and video diffusion models, the paper proposes a novel framework that reinterprets video generation as a path-searching problem from Gaussian noise space to the target video distribution.

3. ❓ Problem: The paper aims to improve video generation quality without expensive model retraining by spending additional computation at inference time.

4. 🛠️ Methods: The authors develop two approaches: a random linear search strategy and a more efficient Tree-of-Frames (ToF) search method that adaptively expands and prunes video branches in an autoregressive manner, guided by test-time verifiers.

5. 📊 Results and Evaluation: The experiments demonstrated that increasing test-time computation consistently led to significant improvements in video quality and human-preference alignment across different benchmark dimensions, with ToF search achieving results comparable to random linear search at lower computational cost.
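
The random linear search mentioned in the Methods item can be pictured as a best-of-N loop: sample several noise seeds, generate a candidate from each, and keep the one a verifier scores highest. The sketch below assumes that reading. `toy_generate` and `toy_verifier` are hypothetical stand-ins for a video diffusion model and a test-time verifier, and the Tree-of-Frames search is not shown.

```python
import random


def toy_generate(prompt, noise_seed, num_frames=8):
    """Hypothetical stand-in for a video generator: one float per 'frame'."""
    rng = random.Random(noise_seed)
    return [rng.random() for _ in range(num_frames)]


def toy_verifier(prompt, video):
    """Hypothetical stand-in for a verifier: higher is better (mean close to 0.5)."""
    mean = sum(video) / len(video)
    return -abs(mean - 0.5)


def random_linear_search(generate, verifier, prompt, num_candidates, seed=0):
    """Sample `num_candidates` noise seeds and return the best-scoring candidate."""
    rng = random.Random(seed)
    best_video, best_score = None, float("-inf")
    for _ in range(num_candidates):
        noise_seed = rng.randrange(2**32)
        video = generate(prompt, noise_seed)
        score = verifier(prompt, video)
        if score > best_score:
            best_video, best_score = video, score
    return best_video, best_score


best_video, best_score = random_linear_search(
    toy_generate, toy_verifier, "a cat surfing", num_candidates=8
)
```

With a fixed search seed, a larger `num_candidates` evaluates a superset of the candidates seen with a smaller budget, so the best score can only stay the same or improve. That is the basic sense in which extra test-time computation buys quality in this sketch.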

Paper 3

Aether: Geometric-Aware Unified World Modeling

Published: 2025-03-24

Link: http://arxiv.org/pdf/2503.18945

1. 📘 Topic and Domain: A unified world modeling framework called AETHER for 4D reconstruction, video prediction, and visual planning in computer vision and AI.

2. 💡 Previous Research and New Ideas: Builds on video generation models such as CogVideoX; introduces a novel integration of geometric reconstruction with generative modeling by incorporating depth estimation, camera pose tracking, and action-conditioned prediction.

3. ❓ Problem: Addresses the challenge of developing AI systems with human-like spatial reasoning capabilities by unifying reconstruction, prediction and planning in a single model.

4. 🛠️ Methods: Uses a multi-task learning approach that combines video diffusion models with depth and camera pose estimation, is trained on synthetic 4D data produced by a custom annotation pipeline, and employs geometric-aware raymap representations for camera trajectories.

5. 📊 Results and Evaluation: Achieves state-of-the-art performance in zero-shot reconstruction tasks, outperforming specialized models, and demonstrates effective video prediction and visual planning capabilities when tested on both synthetic and real-world data.
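
As background for the raymap representation named in the Methods item, the sketch below builds a generic per-pixel raymap from pinhole camera parameters: each pixel stores a unit ray direction in world coordinates followed by the camera origin. This is the textbook construction, given as an assumption; the function name, the 6-channel layout, and the integer pixel-coordinate convention are illustrative, and AETHER's actual parameterization may differ.

```python
import math


def raymap(fx, fy, cx, cy, rotation_c2w, translation_c2w, width, height):
    """Return a height x width grid of [dx, dy, dz, ox, oy, oz] rays.

    rotation_c2w is a 3x3 camera-to-world rotation (list of rows) and
    translation_c2w is the camera centre in world coordinates.
    """
    rays = []
    for v in range(height):
        row = []
        for u in range(width):
            # Back-project the pixel through the pinhole intrinsics.
            d_cam = [(u - cx) / fx, (v - cy) / fy, 1.0]
            # Rotate the direction into world coordinates and normalise it.
            d_world = [
                sum(rotation_c2w[i][j] * d_cam[j] for j in range(3)) for i in range(3)
            ]
            norm = math.sqrt(sum(c * c for c in d_world))
            d_world = [c / norm for c in d_world]
            row.append(d_world + list(translation_c2w))
        rays.append(row)
    return rays


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rays = raymap(
    fx=2.0, fy=2.0, cx=1.0, cy=1.0,
    rotation_c2w=identity, translation_c2w=[0.0, 0.0, 0.0],
    width=3, height=3,
)
```

For a camera trajectory, one such grid would be computed per frame from that frame's pose, giving a dense image-shaped encoding of the camera.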