1. 📘 Topic and Domain: The paper focuses on developing high-performance Multimodal Reward Models (MRMs) for aligning Multimodal Large Language Models (MLLMs) with human preferences, situating the work in AI alignment and multimodal machine learning.
2. 💡 Previous Research and New Ideas: Building on reward-modeling approaches developed for text-only LLMs, the paper offers a systematic guide to building MRMs: it investigates every crucial component of the development pipeline and introduces BaseReward, a simple yet effective architecture.
3. ❓ Problem: The paper addresses the lack of a systematic guide for building state-of-the-art Multimodal Reward Models, which are crucial for aligning MLLMs with human preferences.
4. 🛠️ Methods: The authors conducted comprehensive experimental analyses of each stage of the pipeline: the reward-modeling paradigm, reward-head architecture, training strategies, data curation, backbone-model selection, and ensemble methods. The result is BaseReward, built on a Qwen2.5-VL backbone with an optimized two-layer reward head (hedged sketches of the head and of the standard pairwise training objective follow this list).
5. 📊 Results and Evaluation: BaseReward established new state-of-the-art performance on major benchmarks, including MM-RLHF-Reward Bench (11% improvement) and VL-Reward Bench (18% improvement), and it yielded consistent performance gains when integrated into reinforcement-learning pipelines across various tasks.
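
Below is a minimal sketch of the architecture described in item 4, assuming a recent `transformers` release that registers Qwen2.5-VL under `AutoModel`. The head width, activation, and last-token pooling are illustrative placeholders, not BaseReward's published configuration:

```python
import torch
import torch.nn as nn
from transformers import AutoModel


class MultimodalRewardModel(nn.Module):
    """Multimodal backbone + two-layer scalar reward head (illustrative sketch)."""

    def __init__(self, backbone_name: str = "Qwen/Qwen2.5-VL-7B-Instruct",
                 head_dim: int = 1024):  # head width is an assumed placeholder
        super().__init__()
        # Backbone encodes the interleaved (image, prompt, response) sequence.
        self.backbone = AutoModel.from_pretrained(backbone_name)
        # hidden_size lookup may differ across transformers versions/configs
        d_model = self.backbone.config.hidden_size
        # Two-layer reward head mapping a pooled hidden state to one scalar.
        self.reward_head = nn.Sequential(
            nn.Linear(d_model, head_dim),
            nn.GELU(),  # activation choice is an assumption
            nn.Linear(head_dim, 1),
        )

    def forward(self, input_ids, attention_mask, **mm_inputs):
        hidden = self.backbone(input_ids=input_ids,
                               attention_mask=attention_mask,
                               **mm_inputs).last_hidden_state
        # Pool the hidden state of each sequence's last non-padding token,
        # mirroring common practice for decoder-only reward models.
        last = attention_mask.sum(dim=1) - 1
        pooled = hidden[torch.arange(hidden.size(0), device=hidden.device), last]
        return self.reward_head(pooled).squeeze(-1)  # shape: (batch,)
```

At inference, a preference decision reduces to comparing two scalar scores; details such as pooling and head depth are among the components the paper's ablations examine.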
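For the training side, the pairwise Bradley-Terry objective is the standard paradigm for reward models of this kind; the summary does not confirm BaseReward's exact loss, so this is a generic sketch of that objective:

```python
import torch
import torch.nn.functional as F


def pairwise_preference_loss(r_chosen: torch.Tensor,
                             r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected),
    # which pushes the preferred response's reward above the rejected one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()


# Toy check with dummy scores (real scores would come from the reward model):
r_c, r_r = torch.tensor([1.2, 0.7]), torch.tensor([0.3, 0.9])
print(pairwise_preference_loss(r_c, r_r))  # lower when chosen outscores rejected
```

In an RL pipeline (item 5), the trained model's scalar scores then serve as the reward signal for policy optimization of the MLLM being aligned.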