1. 📘 Topic and Domain: This paper introduces GENIUS, a benchmark for evaluating Generative Fluid Intelligence (GFI) in unified multimodal models, focusing on their ability to perform dynamic reasoning and adaptation in visual generation tasks rather than just retrieving pre-trained knowledge.
2. 💡 Previous Research and New Ideas: The paper builds on the Cattell-Horn-Carroll theory of intelligence, which distinguishes Crystallized Intelligence (knowledge retrieval) from Fluid Intelligence (novel problem solving). It proposes the first formal definition of, and benchmark for, Generative Fluid Intelligence, organized around three core dimensions: Implicit Pattern Induction, Ad-hoc Constraint Execution, and Contextual Knowledge Adaptation.
3. ❓ Problem: The paper addresses the gap in evaluating whether current unified multimodal models possess true general intelligence for visual generation, as existing benchmarks primarily assess memorized knowledge rather than the ability to reason, adapt, and solve novel visual generation problems on the fly.
4. 🛠️ Methods: The authors constructed a manually curated benchmark of 510 expert-designed samples spanning 5 tasks and 20 sub-tasks, evaluated with a hybrid LMM-as-a-judge protocol scoring three metrics (Rule Compliance, Visual Consistency, Aesthetic Quality). They also proposed a training-free attention adjustment mechanism, grounded in a theoretical analysis that treats in-context learning as implicit fine-tuning; a sketch of the idea follows.
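The paper's exact formulation is not reproduced here; the following is a minimal, hypothetical sketch of the underlying idea. Under the view of in-context learning as implicit fine-tuning, the attention a query pays to in-context exemplar tokens behaves like an implicit gradient update, so additively boosting those attention logits strengthens that update without any weight training. The function name `context_boosted_attention` and the hyperparameter `beta` are illustrative assumptions, not names from the paper.

```python
import torch
import torch.nn.functional as F

def context_boosted_attention(q, k, v, context_mask, beta=1.0):
    """Single-head scaled dot-product attention with a training-free
    additive boost on the logits of in-context exemplar tokens.

    Hypothetical sketch: raising the exemplar logits by `beta` (an
    assumed hyperparameter) increases the attention mass on the
    in-context examples, strengthening the implicit update they induce,
    with no weight updates to the model itself.

    q: (seq_len, d)        query states
    k, v: (ctx_len, d)     key/value states over prompt + exemplars
    context_mask: (ctx_len,) bool, True where the token belongs to the
                  in-context exemplars
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / d ** 0.5    # (seq_len, ctx_len)
    logits = logits + beta * context_mask.float()  # boost exemplar tokens
    weights = F.softmax(logits, dim=-1)
    return weights @ v

# Toy usage: 4 query positions attend over 10 tokens, of which the
# last 6 are in-context exemplar tokens.
q = torch.randn(4, 32)
k = torch.randn(10, 32)
v = torch.randn(10, 32)
mask = torch.tensor([False] * 4 + [True] * 6)
out = context_boosted_attention(q, k, v, mask, beta=0.5)  # (4, 32)
```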
5. 📊 Results and Evaluation: A systematic evaluation of 12 models revealed significant performance deficits: even the best proprietary model (Nano Banana Pro) achieved only a 57.19% overall score, showing that current models struggle with fluid-intelligence tasks and often prioritize aesthetic quality over logical rule compliance. The proposed attention adjustment mechanism delivered consistent improvements across all tasks.