1. 📘 Topic and Domain: The paper proposes a multi-agent framework called MAPS for solving multimodal scientific problems that involve both text and diagrams in fields like mathematics, physics, and chemistry.
2. 💡 Previous Research and New Ideas: Based on existing work in multimodal large language models (MLLMs), the paper introduces a novel multi-agent framework inspired by Big Seven Personality theory and Socratic guidance, representing a first attempt at using personality traits for agent specialization.
3. ❓ Problem: The paper addresses two key challenges in multimodal scientific problem-solving: the difficulty of multi-modal comprehensive reasoning and the lack of reflective/rethinking capabilities in existing models.
4. 🛠️ Methods: The paper implements a framework with seven distinct agents based on personality traits, using a progressive four-agent solving strategy and a Critic agent inspired by Socratic questioning to guide problem-solving through structured stages with continuous feedback.
5. 📊 Results and Evaluation: The framework achieved superior results across EMMA, Olympiad, and MathVista datasets, outperforming state-of-the-art models by 15.84% and slightly exceeding human expert performance by 3.58%.