1. 📘 Topic and Domain: The paper presents UniversalRAG, a framework for retrieval-augmented generation that works across multiple modalities (text, image, video) and granularities of information.
2. 💡 Previous Research and New Ideas: Previous RAG approaches were limited to single modalities or unified embeddings that suffered from modality gaps; this paper proposes modality-aware routing and multi-granular retrieval.
3. ❓ Problem: The paper aims to address a key limitation of existing RAG systems: they cannot effectively handle queries that require different knowledge sources (text, images, videos) or different levels of detail within a source.
4. 🛠️ Methods: The paper implements a routing mechanism that dynamically selects the most appropriate modality and granularity for each query. It maintains separate embedding spaces per modality to sidestep the modality gap of unified embeddings, and offers both a training-free router and a trained router.
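To make the routing idea concrete, here is a minimal sketch, not the paper's implementation: a toy training-free router (a keyword heuristic standing in for the paper's prompted LLM or trained classifier) maps each query to one of several hypothetical modality/granularity corpora, and retrieval then happens only within the selected corpus. All route names, keywords, and the `corpora` structure are illustrative assumptions.

```python
# Hypothetical sketch of modality- and granularity-aware routing.
# Route labels are assumed for illustration: fine/coarse text ("paragraph"/"document"),
# image, fine/coarse video ("clip"/"video"), or "none" (answer from the LLM alone).

def route(query: str) -> str:
    """Toy keyword router standing in for a prompted-LLM or trained router."""
    q = query.lower()
    if any(w in q for w in ("photo", "look like", "diagram")):
        return "image"
    if any(w in q for w in ("scene", "moment", "clip")):
        return "clip"       # fine-grained video
    if any(w in q for w in ("movie", "full video", "episode")):
        return "video"      # coarse-grained video
    if any(w in q for w in ("history of", "in depth", "biography")):
        return "document"   # coarse-grained text
    if any(w in q for w in ("when", "who", "where", "what year")):
        return "paragraph"  # fine-grained text
    return "none"           # no retrieval; rely on parametric knowledge

def retrieve_and_answer(query: str, corpora: dict) -> str:
    """Retrieve from the routed corpus only, then build the generation prompt."""
    r = route(query)
    if r == "none":
        return f"LLM({query})"
    context = corpora.get(r, [""])[0]  # top-1 retrieval stub per-modality index
    return f"LLM({query} | context={context})"
```

The key design point this sketch mirrors is that each modality keeps its own index, so a query is never scored against embeddings from a different modality.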
5. 📊 Results and Evaluation: UniversalRAG outperformed baseline approaches across 8 multimodal benchmarks; trained routers achieved better performance on in-domain queries, while the training-free GPT-4o router generalized better to out-of-domain queries.