1. 📘 Topic and Domain: The paper presents tttLRM, a Large Reconstruction Model that leverages Test-Time Training (TTT) for high-resolution, long-context, and autoregressive 3D reconstruction from multiple images.
2. 💡 Previous Research and New Ideas: The paper builds on Large Reconstruction Models (LRMs) and Test-Time Training approaches, proposing a novel architecture that treats TTT fast weights as an implicit 3D representation that can be decoded into explicit formats such as 3D Gaussian Splatting, while keeping computational complexity linear in the number of input tokens.
3. ❓ Problem: The paper addresses a limitation of existing 3D reconstruction methods, which either require slow per-scene optimization or are restricted to a small number of input views due to the quadratic complexity of self-attention.
4. 🛠️ Methods: The authors use LaCT (Large Chunk Test-Time Training) blocks with linear complexity, encode input images as tokens that update the fast weights during inference, and query the fast weights with virtual tokens, decoding them into explicit 3D representations such as Gaussian splats.
5. 📊 Results and Evaluation: The method achieves state-of-the-art performance on object-level (GSO) and scene-level (DL3DV-140, Tanks and Temples) datasets, outperforming GS-LRM and Long-LRM on PSNR/SSIM/LPIPS metrics while supporting up to 64 input views and running hundreds of times faster than optimization-based methods.
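The fast-weight mechanism in point 4 can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple linear fast-weight memory updated with one gradient step of a reconstruction loss per chunk of tokens (a common TTT formulation), then read out with query tokens; all names and shapes are illustrative.

```python
import numpy as np

def chunk_update(W, keys, values, lr=0.1):
    # Illustrative large-chunk TTT-style update: one gradient step of the
    # loss ||keys @ W - values||^2 over the whole chunk at once, so cost
    # grows linearly with the number of input tokens, not quadratically.
    grad = keys.T @ (keys @ W - values) / len(keys)
    return W - lr * grad

def query(W, q):
    # Read out the fast weights (the "implicit 3D representation") with
    # virtual query tokens; a real model would further decode these
    # outputs into explicit Gaussian-splat parameters.
    return q @ W

rng = np.random.default_rng(0)
d = 8
W = np.zeros((d, d))  # fast weights, freshly initialized per scene
# Tokens from input views arrive chunk by chunk; each chunk updates W once.
for _ in range(3):
    keys = rng.normal(size=(16, d))
    values = rng.normal(size=(16, d))
    W = chunk_update(W, keys, values)

out = query(W, rng.normal(size=(4, d)))
print(out.shape)  # → (4, 8)
```

The point of the sketch is the complexity argument: each chunk touches `W` once with a matrix product, so processing more views adds cost linearly, unlike attention over all view tokens jointly.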