1. 📘 Topic and Domain: The paper introduces LMEB (Long-horizon Memory Embedding Benchmark), a comprehensive evaluation framework for text embeddings focused on long-term, context-dependent memory retrieval tasks.
2. 💡 Previous Research and New Ideas: The paper builds on existing text embedding benchmarks such as MTEB, BEIR, and MIRACL, which focus on traditional passage retrieval, and proposes a new benchmark that specifically evaluates models' ability to handle fragmented, temporally distant, and context-dependent memory retrieval across four memory types: episodic, dialogue, semantic, and procedural.
3. ❓ Problem: Current embedding benchmarks fail to adequately evaluate models' capacity to handle long-horizon memory retrieval tasks that involve recalling fragmented, context-dependent information over extended periods, leaving a gap in understanding how models perform in memory-intensive scenarios.
4. 🛠️ Methods: The authors compiled 22 datasets spanning 4 memory types, yielding 193 zero-shot retrieval tasks; they evaluated 15 embedding models (from 239M to 12B parameters) with NDCG@10 and Recall@10, and analyzed the correlation between LMEB and MTEB performance.
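The two metrics above are standard in retrieval evaluation; the sketch below (not from the paper, a minimal illustrative implementation) shows how NDCG@10 and Recall@10 are conventionally computed per query:

```python
import math

def ndcg_at_k(retrieved, relevance, k=10):
    """NDCG@k for one query.

    retrieved: ranked list of doc ids returned by the model.
    relevance: dict mapping doc id -> graded relevance (0 if absent).
    """
    # DCG: relevance gain discounted by log2 of the rank position.
    dcg = sum(
        relevance.get(doc, 0) / math.log2(rank + 2)
        for rank, doc in enumerate(retrieved[:k])
    )
    # Ideal DCG: the best achievable ordering of the relevance labels.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(retrieved, relevance, k=10):
    """Fraction of relevant docs that appear in the top-k results."""
    relevant = {doc for doc, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved[:k])) / len(relevant)
```

Benchmark-level scores are then obtained by averaging the per-query values over each dataset (and, for the paper's "Mean (Dataset)" figure, over datasets).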
5. 📊 Results and Evaluation: The best model achieved a mean-over-datasets NDCG@10 of 61.41, larger models did not consistently outperform smaller ones, and LMEB scores were effectively orthogonal to MTEB scores (Pearson correlation: -0.115, Spearman: -0.130), indicating that strong traditional passage retrieval performance does not generalize to long-horizon memory retrieval.
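The orthogonality claim rests on the two correlation coefficients reported above. As a reference for how they differ, a minimal stdlib-only sketch (not the paper's analysis code; in practice one would use `scipy.stats`): Pearson measures linear association between the raw benchmark scores, while Spearman applies Pearson to their ranks, so it captures any monotonic relationship.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson on rank-transformed scores."""
    def ranks(v):
        # Average ranks for ties (1-based).
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    return pearson(ranks(x), ranks(y))
```

With one (MTEB score, LMEB score) pair per model, values near zero for both coefficients, as reported, mean a model's MTEB rank tells us little about its LMEB rank.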