2026-01-02 Papers

Paper 1

mHC: Manifold-Constrained Hyper-Connections

Published: 2025-12-31

Link: http://arxiv.org/pdf/2512.24880

1. 📘 Topic and Domain: The paper proposes a new neural network architecture called Manifold-Constrained Hyper-Connections (mHC) in the domain of deep learning model design, specifically focused on improving residual connections in large language models.
2. 💡 Previous Research and New Ideas: The paper builds upon Hyper-Connections (HC), which expanded the residual stream width, and proposes a new framework that projects residual connections onto a structured manifold (the Birkhoff polytope of doubly stochastic matrices) to maintain stability while preserving HC's performance benefits.
3. ❓ Problem: The paper addresses the instability and scalability issues in HC caused by compromised identity mapping properties when expanding residual stream width and diversifying connectivity patterns.
4. 🛠️ Methods: The paper employs the Sinkhorn-Knopp algorithm to project residual mappings onto the Birkhoff polytope (doubly stochastic matrices), while incorporating kernel fusion and infrastructure optimizations for efficiency.
5. 📊 Results and Evaluation: The method achieved superior stability and scalability compared to HC while maintaining performance advantages, with only 6.7% additional time overhead when tested on language model pre-training tasks across various model sizes (3B, 9B, and 27B parameters).
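The projection in the methods above can be sketched with a few Sinkhorn-Knopp iterations: alternately normalizing rows and columns of a positive matrix drives it toward the Birkhoff polytope of doubly stochastic matrices. A minimal NumPy sketch under assumed details, not the paper's fused TileLang kernel; the exponential parameterization of the raw weights is an assumption for illustration:

```python
import numpy as np

def sinkhorn_knopp(A, n_iters=50):
    """Project a positive matrix toward the Birkhoff polytope
    (doubly stochastic matrices) by alternately normalizing
    rows and columns."""
    M = np.asarray(A, dtype=float)
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)  # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)  # columns sum to 1
    return M

# Toy residual-mixing matrix for an expansion rate n = 4
# (exp() keeps entries positive; assumed parameterization).
rng = np.random.default_rng(0)
H_raw = np.exp(rng.normal(size=(4, 4)))
H_res = sinkhorn_knopp(H_raw)

# Doubly stochastic: rows and columns sum to 1, and the spectral
# norm is bounded by 1, which restores identity-like stability.
print(np.allclose(H_res.sum(axis=1), 1.0, atol=1e-6))  # True
print(np.allclose(H_res.sum(axis=0), 1.0, atol=1e-6))  # True
```

Because products of doubly stochastic matrices are again doubly stochastic, the norm bound survives layer composition, which is the closure property the paper leans on for deep stacks.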

mHC: Manifold-Constrained Hyper-Connections

Workflow diagram (flattened figure; recoverable content):

- Problem analysis: HC instability and system overhead.
- Core method: constrain the residual mapping H^res to the Birkhoff polytope (doubly stochastic matrices; row and column sums = 1) via Sinkhorn-Knopp manifold projection, applied to both dynamic and static mappings.
- Infrastructure design: kernel fusion (TileLang), recomputation for memory savings, DualPipe communication.
- Theoretical properties: norm preservation (||H|| ≤ 1), compositional closure (stable products), geometric interpretation as a convex hull.
- Experimental validation: training stability analysis, scaling experiments (3B-27B), performance benchmarks on 8 tasks, 6.7% system overhead.
- Key results: stable training, superior scalability, minimal overhead; performance gains on 8 benchmarks; roughly 3 orders of magnitude improvement in stability.
- Takeaway: mHC constrains HC residual mappings to doubly stochastic matrices via the Sinkhorn-Knopp algorithm, restoring the identity mapping property while maintaining multi-stream information exchange.
Q1
1. What is the primary mathematical technique used in mHC to ensure stability of residual connections?
Sinkhorn-Knopp algorithm with doubly stochastic matrices
Matrix multiplication with identity mapping
Random projection onto manifold spaces
Q2
2. In the experimental results, what was the additional time overhead when using mHC with expansion rate n=4?
15.3%
6.7%
22.4%
Q3
3. Which problem in Hyper-Connections (HC) did mHC primarily address?
High computational costs in terms of FLOPs
Limited model capacity for learning
Signal instability due to compromised identity mapping

Paper 2

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Published: 2025-12-29

Link: http://arxiv.org/pdf/2512.23447

1. 📘 Topic and Domain: The paper focuses on improving Mixture-of-Experts (MoE) language models by enhancing the coupling between expert modules and router components.
2. 💡 Previous Research and New Ideas: Based on previous MoE architectures and routing mechanisms, the paper proposes a novel expert-router coupling (ERC) loss that ensures better alignment between router decisions and expert capabilities.
3. ❓ Problem: The paper addresses the lack of explicit constraints in MoE models that ensure router decisions align well with expert capabilities, which limits model performance.
4. 🛠️ Methods: The authors introduce an ERC loss that treats each expert's router embedding as a proxy token, feeds perturbed embeddings through experts to obtain activations, and enforces constraints to ensure proper coupling between routers and experts.
5. 📊 Results and Evaluation: Through pre-training experiments on models from 3B to 15B parameters using trillions of tokens, the ERC loss significantly improved model performance while maintaining computational efficiency, with only 0.2-0.8% overhead during training.

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Expert-Router Coupling (ERC) loss, method flow (flattened figure; recoverable content):

- Step 1, clustering view: router parameters R ∈ ℝⁿˣᵈ are viewed as cluster centers; R[i] is the center for token cluster X_i.
- Step 2, perturbed proxies: R̃[i] = R[i] ⊙ δᵢ, with bounded noise δᵢ ~ U(1−ε, 1+ε).
- Step 3, activations: M[i,j] = ||R̃[i] · W^j_g||, giving a matrix M ∈ ℝⁿˣⁿ of n² activation norms.
- Step 4, ERC loss enforcing two constraints, M[i,j] < αM[i,i] and M[j,i] < αM[i,i]:
  L_ERC = (1/n²) Σᵢ Σⱼ≠ᵢ [max(M[i,j] − αM[i,i], 0) + max(M[j,i] − αM[i,i], 0)], where α ∈ [0,1] controls coupling strength.
- Effect 1, expert specialization: proxy token R̃[i] elicits the strongest activation from expert i versus all others, so expert i is optimized for token cluster X_i.
- Effect 2, precise routing: expert i is more activated by its own proxy R̃[i] than by any other R̃[j], so R[i] aligns with expert i's capabilities and routing decisions improve.
- Efficiency: training adds only 2n²Dd FLOPs (versus 2T(n−K)dr for AoE), i.e. 0.2-0.8% training overhead, with zero inference overhead.
- Key results: 3B to 15B parameter models, significant performance gains, controllable specialization via α, quantitative tracking via ε.
- Key innovation: lightweight expert-router coupling that treats router embeddings as cluster centers, uses bounded noise for proxy tokens, and enforces bidirectional constraints between experts and routers with minimal overhead.
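The four steps above can be sketched directly in NumPy. This is an illustrative toy, not the paper's training code: the shape of each expert's gate projection W_g, the toy dimensions, and the example values of α and ε are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 4, 8, 16       # experts, model dim, expert hidden dim (toy sizes)
alpha, eps = 0.8, 0.1    # coupling strength and noise level (example values)

R = rng.normal(size=(n, d))        # router embeddings, viewed as cluster centers
W_g = rng.normal(size=(n, d, D))   # each expert's gate projection (assumed shape)

# Step 2: perturbed proxy tokens with bounded multiplicative noise.
delta = rng.uniform(1 - eps, 1 + eps, size=(n, d))
R_tilde = R * delta

# Step 3: activation-norm matrix M[i, j] = ||R_tilde[i] @ W_g[j]||.
M = np.array([[np.linalg.norm(R_tilde[i] @ W_g[j]) for j in range(n)]
              for i in range(n)])

# Step 4: hinge penalties whenever an off-diagonal activation exceeds
# alpha times the diagonal (own-expert) activation, in both directions.
L_erc = 0.0
for i in range(n):
    for j in range(n):
        if i != j:
            L_erc += max(M[i, j] - alpha * M[i, i], 0.0)
            L_erc += max(M[j, i] - alpha * M[i, i], 0.0)
L_erc /= n * n
print(L_erc >= 0.0)  # the loss is a sum of hinges, hence nonnegative
```

The loss vanishes exactly when every expert responds most strongly to its own proxy token (within the margin α), which is the coupling condition the paper enforces.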
Q1
1. What is the main innovation of the ERC loss in improving MoE models?
It reduces the total number of experts needed in the model
It ensures better alignment between router decisions and expert capabilities
It increases the processing speed of each expert module
Q2
2. What is the computational overhead introduced by the ERC loss during training?
20-30% additional computational cost
5-10% additional computational cost
0.2-0.8% additional computational cost
Q3
3. What happens to model performance when the ERC loss parameter α is set too low?
The model crashes during training
Performance improves dramatically
Performance degrades due to over-specialization of experts

Paper 3

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Published: 2025-12-31

Link: http://arxiv.org/pdf/2512.25073

1. 📘 Topic and Domain: The paper presents GaMO, a geometry-aware multi-view diffusion outpainting method for 3D scene reconstruction from sparse camera views in computer vision.
2. 💡 Previous Research and New Ideas: Previous work focused on novel view generation and regularization techniques for sparse-view reconstruction, while this paper introduces a new outpainting approach that expands existing views rather than generating new ones.
3. ❓ Problem: The paper addresses the challenge of reconstructing complete 3D scenes from limited input views, which typically results in holes, ghosting artifacts, and geometric inconsistencies.
4. 🛠️ Methods: The method uses a three-stage pipeline: coarse 3D initialization to obtain geometry priors, geometry-aware multi-view outpainting using a diffusion model with mask latent blending and iterative mask scheduling, and final 3D Gaussian Splatting refinement.
5. 📊 Results and Evaluation: The approach achieves state-of-the-art performance on Replica and ScanNet++ datasets across 3, 6, and 9 input views, with significant improvements in PSNR, SSIM, and LPIPS metrics while being 25x faster than previous methods.
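The "mask latent blending" named in the methods above can be sketched as follows: at each denoising step, regions already covered by the coarse render are replaced with a re-noised version of that render's latent, while masked (to-be-outpainted) regions keep the model's current estimate. A minimal NumPy sketch; `add_noise` is a hypothetical stand-in for the diffusion forward process and does not reproduce the paper's DDIM schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0, t, T=50):
    """Toy stand-in for the diffusion forward process at step t."""
    a = 1.0 - t / T  # toy signal-retention schedule (assumption)
    return np.sqrt(a) * x0 + np.sqrt(1 - a) * rng.normal(size=x0.shape)

H = W = C = 4
clean_latent = rng.normal(size=(H, W, C))  # latent of the coarse render
model_latent = rng.normal(size=(H, W, C))  # current denoising estimate
mask = np.zeros((H, W, 1))
mask[:, W // 2:] = 1.0                     # 1 = region to outpaint

# Mask latent blending: keep the model's prediction where outpainting,
# re-inject the (re-noised) known content everywhere else.
t = 25
blended = mask * model_latent + (1 - mask) * add_noise(clean_latent, t)
print(blended.shape)  # (4, 4, 4)
```

Repeating this at scheduled timesteps (the paper's iterative mask scheduling) keeps the known region anchored to the coarse geometry while the diffusion model fills the enlarged field of view.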

GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction

Workflow diagram (flattened figure; recoverable content):

- Input: sparse views {I_i, Π_i}.
- Coarse 3D initialization: DUSt3R point cloud → coarse 3DGS training → opacity mask M (threshold η = 0.6) and coarse render I_coarse.
- Multi-view conditioning: Plücker ray embeddings, canonical coordinate maps, geometric correspondence, appearance features; FOV scaling S_k = 0.6; clean/noisy latents.
- Denoising process: mask latent blending, iterative mask scheduling (timesteps t₁ = 35, t₂ = 25, t₃ = 15), noise resampling (R = 3), DDIM sampling (T = 50).
- Output: outpainted views with enlarged FOV {S_j^out}.
- 3DGS refinement: point re-initialization and joint training with an L1 + D-SSIM + LPIPS perceptual loss.
- Key technical components: geometry-aware conditioning, multi-view diffusion model (MVGenMaster), zero-shot inference (no training), progressive mask scheduling.
- Advantages over novel view generation: preserves geometric consistency, better coverage beyond the periphery, no complex trajectory planning, eliminates multi-view misalignment.
- Performance: SOTA on Replica and ScanNet++; superior PSNR, SSIM, LPIPS; processing time under 10 minutes (25x speedup); works with 3, 6, or 9 input views; high-quality novel views with reduced holes and better consistency.
- Pipeline: input → coarse init → conditioning → outpainting → refinement.
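The Plücker ray embeddings in the conditioning stage encode each camera as a per-pixel 6-vector. A common construction, sketched here under assumptions (a pinhole camera, and the usual convention of concatenating the unit ray direction d with the moment o × d; GaMO's exact convention may differ):

```python
import numpy as np

def plucker_rays(K, R, t, H, W):
    """Per-pixel Plücker embedding (d, o x d) for a pinhole camera.

    K: 3x3 intrinsics; R: 3x3 world-from-camera rotation;
    t: camera center in world coordinates. Returns an (H, W, 6) map."""
    o = np.asarray(t, dtype=float)                        # ray origin
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs + 0.5, ys + 0.5,
                    np.ones_like(xs, dtype=float)], axis=-1)
    dirs_cam = pix @ np.linalg.inv(K).T                   # back-project pixels
    dirs = dirs_cam @ R.T                                 # rotate to world frame
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)  # unit directions d
    moments = np.cross(np.broadcast_to(o, dirs.shape), dirs)  # o x d
    return np.concatenate([dirs, moments], axis=-1)

# Toy camera at the world origin with identity rotation (assumed values).
K = np.array([[50.0, 0.0, 32.0], [0.0, 50.0, 32.0], [0.0, 0.0, 1.0]])
emb = plucker_rays(K, np.eye(3), np.zeros(3), 64, 64)
print(emb.shape)  # (64, 64, 6)
```

Because the embedding depends only on ray geometry, it gives the diffusion model a pose conditioning signal that is independent of image content, which is what makes the outpainted views geometrically consistent across cameras.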
Q1
1. What is the key innovation of GaMO compared to previous approaches for sparse-view 3D reconstruction?
It uses a completely new 3D rendering technique
It expands existing views through outpainting instead of generating novel views
It introduces a new type of camera sensor
Q2
2. What is the speed improvement achieved by GaMO compared to state-of-the-art diffusion-based methods?
5x faster
15x faster
25x faster
Q3
3. Which component of GaMO's pipeline helps prevent boundary artifacts and ensure smooth blending?
Noise resampling
Coarse 3D initialization
3DGS refinement