2025-03-24 Papers

Paper 1

MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

Published: 2025-03-21

Link: http://arxiv.org/pdf/2503.16905

1. 📘 Topic and Domain: The paper proposes a multi-agent framework called MAPS for solving multimodal scientific problems that involve both text and diagrams in fields like mathematics, physics, and chemistry.
2. 💡 Previous Research and New Ideas: Building on existing work in multimodal large language models (MLLMs), the paper introduces a multi-agent framework inspired by Big Seven Personality theory and Socratic guidance, which the authors describe as a first attempt at using personality traits for agent specialization.
3. ❓ Problem: The paper addresses two key challenges in multimodal scientific problem solving: the difficulty of comprehensive multimodal reasoning and the lack of reflection and rethinking capabilities in existing models.
4. 🛠️ Methods: The paper implements a framework with seven distinct agents based on personality traits, using a progressive four-agent solving strategy and a Critic agent inspired by Socratic questioning to guide problem-solving through structured stages with continuous feedback.
5. 📊 Results and Evaluation: The framework achieved superior results across the EMMA, Olympiad, and MathVista datasets, outperforming state-of-the-art models by 15.84% and exceeding human expert performance by 3.58%.
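The staged solve-then-critique loop described in the Methods item can be sketched in plain Python. Everything here is an illustrative assumption: the toy functions stand in for the paper's LLM-backed agents, and only the Interpreter, Aligner, and Critic names come from the summary above (the `solver` stage and all logic are hypothetical).

```python
# Minimal sketch of a progressive multi-agent solve loop with a Socratic
# Critic: stages run in order, the Critic reviews the result, and a rejected
# answer is retried with the Critic's guiding question fed back as a hint.
from typing import Callable, Dict, List, Tuple

Agent = Callable[[Dict], Dict]

def solve(problem: str, stages: List[Agent],
          critic: Callable[[Dict], Tuple[bool, str]],
          max_rounds: int = 3) -> Dict:
    state = {"problem": problem, "hints": []}
    for _ in range(max_rounds):
        for stage in stages:                 # progressive solving stages
            state = stage(state)
        accepted, question = critic(state)   # Socratic review of the answer
        if accepted:
            return state
        state["hints"].append(question)      # feed the question back in
    return state

# Toy agents (hypothetical stand-ins, not the paper's prompts):
def interpreter(s): s["parsed"] = s["problem"].lower(); return s
def aligner(s):     s["aligned"] = s["parsed"].split(); return s
def solver(s):      s["answer"] = len(s["aligned"]) + len(s["hints"]); return s
def critic(s):      return (s["answer"] >= 3, "What detail was missed?")

result = solve("Count The Words Here", [interpreter, aligner, solver], critic)
print(result["answer"])  # 4: accepted on the first round, no hints needed
```

In the real framework each stage would be a separate model call with its own persona prompt; the point of the sketch is only the control flow, i.e. that the Critic's question re-enters the loop as extra context rather than terminating it.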
Q1
1. What personality trait corresponds to the Critic agent in the MAPS framework?
Self-Esteem
Sensitivity
Conscientiousness
Q2
2. What was the most significant performance drop observed in the ablation studies when removing a component?
Removing the Critic agent (7.05% drop)
Removing the Aligner agent (10.86% drop)
Removing the Interpreter agent (16.09% drop)
Q3
3. According to the time efficiency analysis, which type of problems were solved fastest by MAPS?
Open-ended questions with text answers
Multiple-choice questions with integer answers
Complex problems with diagram interpretation

Paper 2

MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization

Published: 2025-03-21

Link: http://arxiv.org/pdf/2503.16874

1. 📘 Topic and Domain: Automated prompt optimization (APO) for large language models in natural language processing.
2. 💡 Previous Research and New Ideas: Based on previous research in prompt optimization techniques like generation-search and meta prompts, this paper proposes a novel multi-agent framework incorporating Socratic dialogue for systematic prompt optimization.
3. ❓ Problem: The paper aims to solve two key issues in existing APO methods: limited flexibility of fixed templates and inefficient search in prompt spaces.
4. 🛠️ Methods: The paper develops MARS, a multi-agent framework with seven specialized agents including a Planner for optimization path design and a Teacher-Critic-Student system that uses Socratic guidance dialogue patterns for iterative prompt refinement.
5. 📊 Results and Evaluation: MARS outperformed previous state-of-the-art methods by 6.04% on general tasks and 6.42% on domain-specific tasks, demonstrating superior effectiveness in prompt optimization across multiple datasets and evaluation metrics.
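The Teacher-Critic-Student refinement loop can be sketched as follows. This is a hedged sketch under stated assumptions: the `teacher`, `critic`, and `score` toy functions are hypothetical stand-ins for LLM agents and a downstream evaluation, not the paper's implementation.

```python
# Sketch of Socratic prompt refinement: the Critic poses a guiding question,
# the Teacher revises the prompt in response, and the Student's task score
# decides whether the revision is kept as the new best prompt.
from typing import Callable, Tuple

def optimize(seed: str,
             teacher: Callable[[str, str], str],
             critic: Callable[[str], str],
             student_score: Callable[[str], int],
             rounds: int = 4) -> Tuple[str, int]:
    prompt, best, best_s = seed, seed, student_score(seed)
    for _ in range(rounds):
        question = critic(prompt)            # Socratic guiding question
        prompt = teacher(prompt, question)   # Teacher revises the prompt
        s = student_score(prompt)            # Student's task performance
        if s > best_s:
            best, best_s = prompt, s
    return best, best_s

# Toy components (hypothetical; real agents would be LLM calls):
def critic(p):     return "Can you be more specific?"
def teacher(p, q): return p + " step by step"
def score(p):      return min(len(p.split()), 8)  # proxy quality score

best, s = optimize("Answer the question", teacher, critic, score)
```

The sketch mirrors the framework's key idea that search moves through dialogue turns rather than template slots: each candidate prompt is a response to a question about the previous one, and only score improvements are retained.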
Q1
1. What unique dialogue pattern does MARS employ for prompt optimization?
Manager-Student-Teacher pattern
Teacher-Critic-Student Socratic pattern
Planner-Executor-Validator pattern
Q2
2. In the experimental results, what was MARS's performance improvement over previous state-of-the-art methods for domain-specific tasks?
4.23%
5.31%
6.42%
Q3
3. Which of these is NOT one of the main issues that MARS aims to address in existing APO methods?
Limited flexibility of fixed templates
Inefficient search in prompt spaces
High computational resource requirements

Paper 3

When Less is Enough: Adaptive Token Reduction for Efficient Image Representation

Published: 2025-03-20

Link: http://arxiv.org/pdf/2503.16660

1. 📘 Topic and Domain: The paper focuses on adaptive token reduction for efficient image representation in vision transformers and multimodal models.
2. 💡 Previous Research and New Ideas: Based on previous token pruning and merging methods in vision transformers, the paper proposes a novel autoencoder-based approach with Gumbel-Softmax selection to identify and retain only the most informative visual tokens.
3. ❓ Problem: The paper addresses the challenge of reducing computational costs in vision encoders that typically generate large numbers of visual tokens, many of which may be redundant or irrelevant.
4. 🛠️ Methods: The authors implement a trainable autoencoder with a Gumbel-Softmax mechanism to select informative features, consisting of a Feature Selector that creates binary masks and a Feature Reconstructor that restores the masked tokens.
5. 📊 Results and Evaluation: Testing with LLaVA-NEXT and LLaVA-OneVision models showed that up to 50% of visual features could be removed with minimal performance loss on OCR tasks, while general domain tasks maintained performance even with only 30% of tokens retained.
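The Feature Selector's core operation, a hard keep/drop decision per token sampled via Gumbel-Softmax, can be sketched in pure Python. This shows only the sampling side; the straight-through gradient estimator used for training and the Feature Reconstructor are not shown, and the two-class `(drop, keep)` logit layout is an assumption for illustration.

```python
# Gumbel-Softmax hard sampling over a {drop, keep} choice per visual token:
# add Gumbel(0, 1) noise to each class logit, temperature-scale, then take
# the argmax to obtain a binary keep mask.
import math
import random

def gumbel_softmax_keep(logits, tau=1.0, rng=None):
    """logits: list of (drop_logit, keep_logit) pairs, one per token."""
    rng = rng or random.Random(0)
    mask = []
    for drop_l, keep_l in logits:
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1)
        g = [-math.log(-math.log(rng.random() + 1e-12)) for _ in range(2)]
        z = [(drop_l + g[0]) / tau, (keep_l + g[1]) / tau]
        m = max(z)  # shift for numerical stability before exponentiating
        p_keep = math.exp(z[1] - m) / (math.exp(z[0] - m) + math.exp(z[1] - m))
        mask.append(1.0 if p_keep > 0.5 else 0.0)  # hard (argmax) sample
    return mask

# Five tokens with increasing keep-logits: higher scores are kept more often.
logits = [(0.0, k) for k in (-3.0, -1.0, 0.0, 1.0, 3.0)]
mask = gumbel_softmax_keep(logits, tau=0.5)
```

Lower temperatures `tau` make the samples more nearly deterministic, which is why such selectors typically anneal `tau` during training before switching to hard selection at inference.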
Q1
1. What is the main innovation in the paper's approach to token reduction?
Using random selection of visual tokens
Implementing an autoencoder with Gumbel-Softmax selection mechanism
Simply removing tokens based on their position
Q2
2. According to the experimental results, what percentage of visual features could be removed with minimal performance loss in OCR-based tasks?
Up to 90%
Up to 30%
Up to 50%
Q3
3. On which type of tasks did the trained feature selector show the most significant improvement over random selection?
General scene understanding tasks
OCR and text-based image tasks
Mathematical reasoning tasks