1. 📘 Topic and Domain: This paper introduces MMHU, a large-scale multimodal benchmark dataset for understanding human behavior in autonomous driving scenarios.
2. 💡 Previous Research and New Ideas: Previous research addressed individual aspects of human behavior in driving (motion, intention, trajectory) in isolation; this paper proposes the first unified, comprehensive dataset that combines multiple behavior aspects with rich annotations.
3. ❓ Problem: The lack of a unified benchmark dataset for evaluating algorithms that comprehensively understand human behaviors in autonomous driving scenarios, which is crucial for driving safety.
4. 🛠️ Methods: The authors developed a human-in-the-loop annotation pipeline to collect and label 57k human instances from diverse video sources (Waymo, YouTube, and self-collected footage), providing motion data, trajectories, text descriptions, and critical-behavior labels.
5. 📊 Results and Evaluation: Training on the dataset improved performance across multiple tasks - motion prediction error (MPJPE, lower is better) decreased by 9.49, intention prediction accuracy increased by 7.4%, behavior question-answering accuracy rose by 15.96%, and motion generation showed clear qualitative improvements in driving scenarios.
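For context on the motion-prediction metric cited above: MPJPE (Mean Per-Joint Position Error) averages the Euclidean distance between predicted and ground-truth joint positions over all joints and frames. A minimal sketch, assuming arrays of shape (frames, joints, 3); the function name and shapes are illustrative, not from the paper:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error (lower is better).

    pred, gt: arrays of shape (frames, joints, 3) holding 3D
    joint positions. Returns the Euclidean distance between
    corresponding joints, averaged over joints and frames.
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy check: every predicted joint offset by 1.0 along x
gt = np.zeros((2, 17, 3))
pred = gt.copy()
pred[..., 0] += 1.0
print(mpjpe(pred, gt))  # → 1.0
```

A reported drop of 9.49 in MPJPE thus means predicted joints moved, on average, 9.49 units closer to the ground truth (the paper's unit, typically millimeters, is not stated in this summary).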