1. 📘 Topic and Domain: Language-centric omnimodal representation learning in multimodal large language models (MLLMs), focusing on cross-modal alignment and embedding capabilities.
2. 💡 Previous Research and New Ideas: Building on prior CLIP-style and MLLM-based embedding approaches, this work proposes that MLLMs already acquire implicit cross-modal alignment during generative pretraining, so only lightweight contrastive-learning refinement is needed to surface it as an embedding capability.
3. ❓ Problem: Explaining why MLLM-based embedding models outperform traditional CLIP-style models, and using that explanation to develop more efficient methods for cross-modal representation learning.
4. 🛠️ Methods: Developed the LCO-EMB framework, which refines MLLM embeddings via lightweight contrastive learning on language-centric paired data; analyzed the latent cross-modal alignment through anisotropy and kernel-similarity studies; and validated the framework on diverse benchmarks.
5. 📊 Results and Evaluation: Achieved state-of-the-art performance across diverse modalities and benchmarks, identified a Generation-Representation Scaling Law showing that representation quality scales with generative ability, and further validated the findings on a challenging visual-document retrieval task.
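The two core ingredients named in the methods item above can be sketched concretely: a contrastive (InfoNCE-style) refinement objective over paired embeddings, and an anisotropy probe that measures how collapsed an embedding space is. This is a minimal illustrative sketch, not the paper's actual implementation; the function names, the symmetric loss formulation, and the temperature value are assumptions for illustration.

```python
import numpy as np

def info_nce_loss(text_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE contrastive loss over L2-normalized paired embeddings.

    text_emb, other_emb: (N, D) arrays of language-centric paired embeddings;
    row i of each array forms a positive pair, all other rows act as negatives.
    (Illustrative sketch; the temperature of 0.07 is an assumed default.)
    """
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    o = other_emb / np.linalg.norm(other_emb, axis=1, keepdims=True)
    logits = t @ o.T / temperature  # (N, N) cosine-similarity logits
    labels = np.arange(len(t))

    def xent(lg):
        # Numerically stable cross-entropy with the diagonal as targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average of text→other and other→text directions.
    return 0.5 * (xent(logits) + xent(logits.T))

def anisotropy(emb):
    """Mean pairwise cosine similarity across distinct embeddings.

    Values near 1 indicate a collapsed (highly anisotropic) embedding space;
    values near 0 indicate directions spread roughly uniformly.
    """
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = e @ e.T
    n = len(e)
    return (sims.sum() - n) / (n * (n - 1))  # exclude the diagonal
```

A quick sanity check of the intended behavior: well-aligned pairs should yield a lower contrastive loss than random pairs, and identical embeddings should give anisotropy close to 1.

```python
rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
loss_random = info_nce_loss(a, rng.normal(size=(8, 16)))
loss_aligned = info_nce_loss(a, a + 0.01 * rng.normal(size=(8, 16)))
print(loss_aligned < loss_random)  # aligned pairs are easier to match
```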