2026-02-19 Papers


Paper 1

Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation

Published: 2026-02-18

Link: http://arxiv.org/pdf/2602.16705

1. 📘 Topic and Domain: The paper focuses on humanoid robot control for open-vocabulary visual loco-manipulation, combining perception and whole-body control for object grasping tasks.
2. 💡 Previous Research and New Ideas: The paper builds on existing humanoid tracking methods and vision-language models, proposing HERO, a modular system that combines accurate residual-aware end-effector tracking with large pre-trained vision models for generalization.
3. ❓ Problem: The paper addresses the challenge of enabling humanoid robots to accurately manipulate novel objects in novel environments using only onboard sensors, which requires both precise end-effector control and strong visual generalization.
4. 🛠️ Methods: The authors use a modular approach combining inverse kinematics, learned neural forward models for accurate pose estimation, motion planning with replanning, goal adjustment, and integration with pre-trained vision models (Grounding DINO, SAM, AnyGrasp).
5. 📊 Results and Evaluation: HERO achieves 2.5 cm end-effector tracking error (3.2× better than baselines) and an 83.8% success rate in real-world open-vocabulary object grasping across diverse objects and environments.
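The residual-aware tracking idea in the methods above can be sketched as analytical forward kinematics plus a learned correction. This is a minimal illustration, not the paper's implementation: the FK chain and the residual model here are hypothetical stand-ins (a toy FK map and a linear residual), and the function names are invented for this sketch.

```python
import numpy as np

def analytical_fk(joint_angles):
    # Placeholder analytical forward kinematics mapping joint angles to an
    # end-effector position. A real humanoid would chain link transforms here.
    return np.array([np.sum(np.cos(joint_angles)),
                     np.sum(np.sin(joint_angles)),
                     0.1 * np.sum(joint_angles)])

def residual_model(joint_angles, weights):
    # Stand-in for the learned residual network η: a linear map from joint
    # angles to a small 3-D position correction, trained on real robot data.
    return weights @ joint_angles

def corrected_fk(joint_angles, weights):
    # Residual-aware FK: analytical estimate plus a learned correction that
    # absorbs systematic hardware errors (reported as ~1.76 cm on the G1).
    return analytical_fk(joint_angles) + residual_model(joint_angles, weights)

q = np.zeros(7)
w = np.zeros((3, 7))  # untrained residual: correction is zero
assert np.allclose(corrected_fk(q, w), analytical_fk(q))
```

The point of the decomposition is that the analytical model carries most of the structure, so the learned component only has to fit a small, smooth error term.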

Figure: HERO workflow. RGB-D input and a language query feed open-vocabulary perception (Grounding DINO, SAM-3) and grasp synthesis (AnyGrasp); the resulting grasp is retargeted to an end-effector goal pose, converted to upper-body goals via inverse kinematics, planned into a reference trajectory with cuRobo, corrected by neural forward models (FK correction η, odometry ξ), and tracked by a whole-body MLP policy πt over 29 DoF. Goal adjustment and replanning run every 6 seconds until the object is grasped, combining systematic error correction with closed-loop replanning.
Q1. What was the primary innovation that enabled HERO to achieve 3.2× better end-effector tracking accuracy compared to previous methods?
a) Using a larger neural network with more parameters for direct end-to-end control
b) Combining residual neural forward models with classical robotics components like inverse kinematics
c) Training exclusively on real-world data collected from human demonstrations

Q2. Why did the authors find it necessary to develop learned neural forward kinematics models instead of using analytical forward kinematics?
a) Analytical forward kinematics on the G1 humanoid had systematic errors of 1.76 cm due to hardware inaccuracies
b) The robot's geometry was too complex to compute analytical forward kinematics
c) Neural models were faster to compute than analytical methods during real-time control

Q3. What was a surprising limitation discovered about the humanoid's egocentric vision during whole-body reaching motions?
a) The RGB-D camera resolution was too low to detect small objects accurately
b) Objects often went out of view due to large body movements, making closed-loop visual adjustment infeasible
c) The vision models couldn't process depth information fast enough for real-time control

Paper 2

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?

Published: 2026-02-15

Link: http://arxiv.org/pdf/2602.14111

1. 📘 Topic and Domain: Evaluating Sparse Autoencoders (SAEs) for neural network interpretability, specifically testing whether SAEs genuinely learn meaningful feature decompositions in language models.
2. 💡 Previous Research and New Ideas: Based on prior work using SAEs to decompose neural activations into interpretable features (Bricken et al., 2023), the paper proposes novel sanity checks using random baselines to test if SAEs truly discover meaningful features or merely optimize metrics.
3. ❓ Problem: The paper addresses the fundamental question of whether SAEs actually recover true underlying features in neural networks or if their apparent success on standard metrics is misleading.
4. 🛠️ Methods: Two complementary approaches: (1) synthetic experiments with known ground-truth features to test SAE recovery, and (2) comparing fully-trained SAEs against three frozen random baselines (Frozen Decoder, Soft-Frozen Decoder, Frozen Encoder) on real LLM activations.
5. 📊 Results and Evaluation: SAEs recover only 9% of true features despite 71% explained variance in synthetic tests; frozen random baselines match fully-trained SAEs on interpretability (0.87 vs 0.90), sparse probing (0.69 vs 0.72), and causal editing (0.73 vs 0.72), suggesting SAEs don't reliably learn meaningful features.
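The Frozen Decoder baseline in the methods above can be sketched as a TopK-style sparse autoencoder whose decoder is fixed at its random initialization while only the encoder would be trained. This is a minimal illustration under assumed dimensions and a simplified single-input TopK rule; the variable names and sizes are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae, k = 16, 64, 4  # illustrative sizes, not the paper's

# Encoder weights would be trained; decoder weights stay frozen at their
# random initialization (the "Frozen Decoder" baseline).
W_enc = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_model, d_sae))
W_dec /= np.linalg.norm(W_dec, axis=0, keepdims=True)  # unit-norm columns

def encode_topk(x):
    # TopK-style sparsity: keep only the k largest pre-activations,
    # rectified, and zero out the rest.
    pre = W_enc @ x
    z = np.zeros_like(pre)
    idx = np.argsort(pre)[-k:]
    z[idx] = np.maximum(pre[idx], 0.0)
    return z

def reconstruct(x):
    # Reconstruction through the frozen random decoder.
    return W_dec @ encode_topk(x)

x = rng.normal(size=d_model)
z = encode_topk(x)
assert np.count_nonzero(z) <= k   # sparsity constraint holds
assert reconstruct(x).shape == (d_model,)
```

The paper's finding is that baselines of this shape, where the decoder directions are never learned, match fully-trained SAEs on downstream interpretability metrics, which is why reconstruction quality alone is a weak sanity check.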

Figure: SAE sanity-check workflow. The research question (do SAEs learn meaningful features?) is tested in two case studies: (1) synthetic data with known ground-truth features, training BatchTopK and JumpReLU SAEs; and (2) real LLM activations, comparing fully-trained SAEs against three frozen random baselines (Frozen Decoder, Soft-Frozen Decoder, Frozen Encoder). Evaluation covers feature recovery, explained variance, interpretability, sparse probing, and causal editing. Synthetic results: 71% explained variance but only 9% feature recovery. Real LLM results: frozen baselines match fully-trained SAEs. Conclusion: SAEs fail to reliably decompose model mechanisms; reconstruction quality does not imply meaningful feature learning, and random components achieve comparable performance. Recommendation: use frozen baselines as sanity checks for SAEs.
Q1. In the synthetic experiment with known ground-truth features, what paradoxical result did the authors discover about SAE performance?
a) SAEs achieved 71% explained variance but only recovered 9% of true features
b) SAEs recovered 71% of features but only achieved 9% explained variance
c) SAEs failed completely with both 0% explained variance and 0% feature recovery

Q2. What is the 'Soft-Frozen Decoder' baseline designed to test?
a) Whether SAEs can function without any decoder weights at all
b) Whether SAEs operate in a 'lazy training' regime where decoder vectors remain close to random initialization
c) Whether SAEs require perfectly orthogonal decoder vectors to achieve interpretability

Q3. What surprising finding emerged when comparing frozen random baselines to fully-trained SAEs on real LLM activations?
a) Frozen baselines completely failed on all metrics, proving SAEs are essential
b) Frozen baselines matched fully-trained SAEs across interpretability, sparse probing, and causal editing metrics
c) Frozen baselines outperformed fully-trained SAEs by learning better features through random initialization

Paper 3

Experiential Reinforcement Learning

Published: 2026-02-14

Link: http://arxiv.org/pdf/2602.13949

1. 📘 Topic and Domain: The paper focuses on reinforcement learning for language models, specifically in the domain of agentic reasoning and sparse-reward environments.
2. 💡 Previous Research and New Ideas: The paper builds on standard reinforcement learning with verifiable rewards (RLVR) and proposes Experiential Reinforcement Learning (ERL), which adds an explicit experience-reflection-consolidation loop where models generate self-reflections to guide improved second attempts.
3. ❓ Problem: The paper addresses the challenge of learning from sparse and delayed environmental feedback in reinforcement learning, where models struggle to implicitly infer how failures should translate into behavioral improvements.
4. 🛠️ Methods: ERL employs a four-stage process: initial attempt, self-reflection generation based on feedback, refined second attempt guided by reflection, and internalization through selective distillation to consolidate improvements into the base policy.
5. 📊 Results and Evaluation: ERL outperforms RLVR across all tested environments, achieving gains of up to +81% in Sokoban, +27% in FrozenLake, and +11% in HotpotQA, demonstrating improved learning efficiency and final performance.
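The four-stage ERL loop above, including the gated reflection that only fires when the first attempt fails (r^(1) < τ), can be sketched as a single rollout function. The `policy` and `env` interfaces below are hypothetical stand-ins invented for this sketch, not the paper's API, and the RL and distillation updates are only indicated in comments.

```python
def erl_episode(policy, env, task, tau=1.0):
    """One ERL rollout, sketched: attempt -> (gated) reflect -> retry.

    `policy.act`, `policy.reflect`, and `env.evaluate` are illustrative
    interfaces assumed for this example.
    """
    y1 = policy.act(task)                  # first attempt y^(1)
    feedback, r1 = env.evaluate(task, y1)  # sparse environment reward r^(1)
    if r1 >= tau:
        # Gated reflection: successful attempts skip the reflection step,
        # keeping the on-policy learning signal clean and avoiding
        # reward hacking on trajectories that already succeeded.
        return [(y1, r1)]
    # Self-reflection conditioned on the failed attempt and its feedback.
    delta = policy.reflect(task, y1, feedback, r1)
    y2 = policy.act(task, reflection=delta)  # refined second attempt y^(2)
    _, r2 = env.evaluate(task, y2)
    # Both attempts would feed the RL update L_policy; a selective
    # distillation loss L_distill later internalizes the
    # reflection-conditioned behavior into the base policy.
    return [(y1, r1), (y2, r2)]
```

A trainer would collect these rollouts and apply the policy and distillation losses over them; the gating threshold τ decides which trajectories ever enter the reflection branch.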

Figure: ERL workflow. Given an input task x, the policy π_θ produces a first attempt y^(1) ~ π_θ(·|x); the environment returns feedback (f^(1), r^(1)); the model generates a self-reflection Δ ~ π_θ(·|x, y^(1), f^(1), r^(1), m) conditioned on memory m; a second attempt y^(2) ~ π_θ(·|x, Δ) is evaluated by the environment; the RL update L_policy(θ) and an internalization loss L_distill(θ) then consolidate the improvement into the base policy, closing the experience → reflection → consolidation loop.
Q1. What is the key innovation that distinguishes Experiential Reinforcement Learning (ERL) from standard RLVR?
a) ERL uses larger language models with more parameters to improve performance
b) ERL embeds an experience-reflection-consolidation loop where models generate self-reflections to guide improved attempts
c) ERL provides denser rewards by adding intermediate checkpoints in the environment

Q2. Why does ERL employ a 'gated reflection' mechanism that only triggers when the first attempt fails (r^(1) < τ)?
a) To reduce computational costs by limiting the number of reflection steps
b) To prevent reward hacking and maintain stable on-policy learning signals for successful trajectories
c) To ensure the model always generates exactly two attempts per task

Q3. In which environment did ERL show the most dramatic improvement over RLVR, and what was the likely reason?
a) HotpotQA with +11% improvement, because it had the most complex language understanding requirements
b) FrozenLake with +27% improvement, because it had the simplest grid structure
c) Sokoban with up to +81% improvement, because it required long-horizon planning and recovery from compounding errors