2025-11-21 Papers

Paper 1

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

Published: 2025-11-20

Link: http://arxiv.org/pdf/2511.16668

1. 📘 Topic and Domain: The paper introduces V-ReasonBench, a benchmark suite for evaluating reasoning capabilities in video generation models across four dimensions: structured problem-solving, spatial cognition, pattern-based inference, and physical dynamics.
2. 💡 Previous Research and New Ideas: Building on the Chain-of-Frame (CoF) paradigm in video generation and Chain-of-Thought prompting in language models, the paper proposes a unified benchmark framework that evaluates reasoning ability rather than visual quality alone.
3. ❓ Problem: The paper addresses the lack of systematic and reliable evaluation methods for assessing reasoning capabilities in video generation models.
4. 🛠️ Methods: The benchmark uses a hybrid evaluation strategy combining mask-based, grid-based, and VLM-based evaluation methods across 326 reasoning instances, with pass@k as the primary metric for assessment.
5. 📊 Results and Evaluation: Testing six state-of-the-art video models revealed varying strengths across different reasoning dimensions, with Sora-2 leading overall (43.86% average), followed by Hailuo-02 (37.52%), while the benchmark achieved 97.09% alignment with human judgment.
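Since each instance is scored with pass@5 over 5 generated videos, the metric can be sketched with the standard unbiased pass@k estimator. This is the common combinatorial form, assuming n generations per instance of which c are judged correct; it is not necessarily the paper's exact implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples
    drawn (without replacement) from n generations, c of them correct,
    passes. Computed as 1 minus the chance all k draws are incorrect."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = k = 5 (as in the benchmark), this reduces to 1.0 whenever any of the five videos passes and 0.0 otherwise; the general form matters when n exceeds k.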

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

Evaluation pipeline (reconstructed from the paper's overview figure):

- Data preparation: 326 reasoning instances (652 image pairs) across four dimensions:
  - Structured problem-solving: arithmetic operation, code execution, Sudoku, Tic-Tac-Toe
  - Spatial cognition: shape fitting, visual symmetry, color connection
  - Pattern-based inference: sequence completion, analogy solving, rule following
  - Physical dynamics: block sliding, communicating vessels, temperature deformation
- Models evaluated: Sora-2, Veo-3.1, Hailuo-02, Kling-2.5, Vidu-Q2, Seedance-1.0
- Chain-of-Frame (CoF) reasoning: initial image → intermediate frames → final frame; 5 videos generated per instance (pass@5 evaluation)
- Evaluation strategies:
  - Mask-based (pixel-level MSE, for clear object boundaries): sequence, analogy, physics tasks
  - Grid-based (cell-wise accuracy, for structured layouts): symmetry, rule following
  - VLM-based (Gemini-2.5-Pro assessment, for simple visual outputs): math, code, shapes
- Results and analysis: pass@5 scores across the four dimensions over 9,780 generated videos; 97.09% alignment with human preference; analyses of duration effects, hallucination patterns, and failure modes
- Key findings: Sora-2 leads structured reasoning (72%); Hailuo-02 excels at physical dynamics (36.67%); video models show distinct reasoning patterns and systematic failure modes
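The mask-based and grid-based checks are simple to sketch. A minimal illustration, assuming images arrive as flattened lists of pixel intensities and grids as flat lists of cell values; the function names and data layout are hypothetical, not the authors' code:

```python
def masked_mse(pred, target, mask):
    """Mean squared error restricted to pixels where mask is truthy.
    Assumes the mask selects at least one pixel."""
    sq_errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(sq_errors) / len(sq_errors)

def grid_accuracy(pred_cells, target_cells):
    """Fraction of grid cells where the predicted value matches the target."""
    matches = sum(p == t for p, t in zip(pred_cells, target_cells))
    return matches / len(pred_cells)
```

Scoring only the masked region is what makes the first check suitable for tasks with clear object boundaries, while cell-wise matching mirrors the grid-based comparison used for symmetry and rule-following tasks.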
Q1
1. What unexpected finding emerged regarding video duration in the Chain-of-Frame (CoF) reasoning process?
Longer durations consistently improved reasoning accuracy
Longer durations often introduced irrelevant content without improving reasoning
Video duration had no effect on reasoning performance
Q2
2. Why did the benchmark avoid relying solely on Vision-Language Models (VLMs) for evaluation?
VLMs were too computationally expensive
VLMs struggled with interpreting grid-structured and densely laid out visual content
VLMs could not process video content
Q3
3. Among the four reasoning dimensions tested, which showed the highest performance gap between top-performing Sora-2 and other models?
Structured Problem-Solving (72% vs average ~16%)
Physical Dynamics (26.67% vs average ~32%)
Spatial Cognition (36.76% vs average ~17%)
Paper 2

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

Published: 2025-11-20

Link: http://arxiv.org/pdf/2511.16669

1. 📘 Topic and Domain: Video-Next-Event Prediction (VNEP), using video generation as an answer modality for predicting future events in video understanding and AI reasoning.
2. 💡 Previous Research and New Ideas: Builds on Next-Event Prediction (NEP), which answers with textual descriptions of future events; introduces a new paradigm in which the answer is a generated video that demonstrates the predicted event.
3. ❓ Problem: Text-only answers struggle to convey complex physical actions and procedures; the task calls for more intuitive, customized visual demonstrations.
4. 🛠️ Methods: Proposes the VANS model, which uses Joint-GRPO (Group Relative Policy Optimization) to align a Vision-Language Model (VLM) with a Video Diffusion Model (VDM) through reinforcement learning, trained on the newly created VANS-Data-100K dataset.
5. 📊 Results and Evaluation: Achieves state-of-the-art performance in both event prediction accuracy and video generation quality, with significant improvements in ROUGE-L scores (0.3631), CLIP-V scores (0.8021), and reduced FVD (78.32) compared to baseline methods.
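GRPO's core step, which Joint-GRPO applies in a coordinated way to the VLM and VDM, replaces a learned value function with group-relative reward normalization. A minimal sketch of that advantage computation, illustrative only; in the paper each sample's scalar reward would combine the stage-specific terms (rf, rt1, rv1 in Stage 1; rv2, rc2 in Stage 2):

```python
def grpo_advantages(rewards):
    """Group Relative Policy Optimization advantage: normalize each
    sampled response's reward against the group's mean and standard
    deviation, so no separate value network is required."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # fall back to 1.0 when all rewards are identical
    return [(r - mean) / std for r in rewards]
```

Responses scoring above the group mean get positive advantages and are reinforced; below-mean responses are penalized, which is what lets the two models' reward signals be balanced within each sampled group.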

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

VANS next-video-event prediction workflow (reconstructed from the paper's overview figure):

- Data preparation: raw data collection → shot split & crop → clip selection → QA pair generation → VANS-Data-100K
- Model architecture: VLM (Qwen2.5-VL-3B) and VDM (Wan-2.1-1.3B) with ViT + VAE encoders and cross-modal conditioning
- Supervised fine-tuning: VLM 10K steps, VDM 20K steps, learning rate 5e-5
- Joint-GRPO training pipeline:
  - Stage 1 (VLM tuning): visualization-friendly caption generation; rewards: format rf, text fidelity rt1, video fidelity rv1
  - Stage 2 (VDM adaptation): context-faithful video generation; rewards: video fidelity rv2, semantic alignment rc2
- Evaluation metrics: text (BLEU, ROUGE-L), video (FVD, CLIP-V), semantic (CLIP-T), on procedural and predictive tasks
- Input/output flow: input video + question → VLM reasoning → caption → VDM generation → output video
- Key innovation: Joint-GRPO bridges the semantic-to-visual gap via coordinated RL training
- Results: state-of-the-art performance on VNEP; procedural ROUGE-L 0.3631, predictive CLIP-V 0.7872
Q1
1. What is the main innovation of VNEP compared to traditional Next-Event Prediction (NEP)?
It uses more advanced language models for prediction
It generates video demonstrations instead of text descriptions
It processes videos at a higher resolution
Q2
2. What is the key challenge addressed by the Joint-GRPO strategy in the VANS model?
Reducing computational costs of video generation
Improving video resolution quality
Aligning the semantic understanding of VLM with visual generation of VDM
Q3
3. The VANS-Data-100K dataset contains what ratio of procedural to predictive samples?
50K : 50K
30K : 70K
70K : 30K
Paper 3

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Published: 2025-11-19

Link: http://arxiv.org/pdf/2511.15593

1. 📘 Topic and Domain: Study of ideation diversity in AI research agents' performance on machine learning tasks using the MLE-bench benchmark.
2. 💡 Previous Research and New Ideas: Based on previous work on AI research agents and automated machine learning tools, proposes new methods to quantify and control agents' ideation diversity.
3. ❓ Problem: Understanding what factors drive success in AI research agents' performance, specifically focusing on whether ideation diversity is a key bottleneck.
4. 🛠️ Methods: Analyzed 11,000 agent trajectories across different models and scaffolds, measured ideation diversity as the Shannon entropy of the distribution of proposed model architectures, and ran controlled experiments that modified prompts to raise or lower diversity.
5. 📊 Results and Evaluation: Found strong correlation between ideation diversity and agent performance, with higher-diversity agents achieving better results across multiple evaluation metrics, demonstrating that ideation diversity is indeed a key performance factor.
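The diversity metric itself is just Shannon entropy over the architectures named in an agent's initial draft ideas. A minimal sketch, assuming the architectures have already been extracted as strings; the function name is hypothetical:

```python
from collections import Counter
from math import log2

def ideation_diversity(draft_architectures):
    """Shannon entropy (in bits) of the architecture distribution across
    an agent's initial draft ideas; higher entropy means more diverse
    ideation (0.0 when every draft proposes the same architecture)."""
    counts = Counter(draft_architectures)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

Five drafts all proposing a CNN score 0.0 bits, while five drafts split evenly across distinct architectures approach log2(5) ≈ 2.32 bits, matching the study's intuition that varied initial proposals signal a healthier search.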

What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity

Ideation-diversity study workflow (reconstructed from the paper's overview figure):

- Large-scale data collection: 11,000 agent trajectories; 6 LLM backbones × 2 scaffolds; 75 MLE-bench tasks; 264,000 GPU hours; 10-20 random seeds
- Diversity measurement: extract the ML architectures proposed in each agent's initial 5 draft ideas and compute Shannon entropy over the architecture distribution (a tree-level diversity metric)
- Correlation analysis: diversity vs. performance, Pearson r = 0.57 (p < 0.001) against medal rate; high-performing models show higher diversity
- Controlled experiment: baseline vs. low-diversity prompting on 22 MLE-bench Lite tasks (2 scaffolds × 10 seeds); diversity reduced by removing sibling-memory diversity, prompt-adaptive complexity, and diversity mentions, by prompting for similar ideas, and by varying temperature
- Causal validation: medal rates drop 6.9% (AIRA Greedy) and 8.4% (AIRA MCTS); the share of runs using ≤2 architectures rises from 40% to 70%
- Robustness: the effect holds under valid-submission rate, average normalized score, percentile rankings, and ELO-based rankings
- Implementation bottleneck: diversity helps agents design implementable solutions and de-risks implementation failures; execution time correlates with performance
- Setup: backbones o3, GPT-OSS (20B/120B), Llama Maverick, Devstral, CWM, DeepSeek-R1; scaffolds AIDE (greedy tree search), AIRA Greedy, AIRA MCTS; architectures include CNN, Transformer, GBDT, EfficientNet, ResNet, ViT, LightGBM, ConvNeXt
- Conclusion: ideation diversity is a key bottleneck in AI research agent performance; higher diversity in initial architecture ideas leads to better task performance, and the causal relationship is confirmed through controlled experimentation
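The headline Pearson r = 0.57 between per-trajectory diversity and medal rate is a standard sample correlation. A minimal sketch of the computation, with a hypothetical helper name and assuming paired per-trajectory scores:

```python
def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences:
    covariance of the deviations divided by the product of their norms."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    norm_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (norm_x * norm_y)
```

In practice one would feed in each trajectory's entropy score and its task outcome; the study pairs this with a prompt-manipulation experiment precisely because correlation alone cannot establish that diversity causes the performance gap.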
Q1
1. How did the researchers measure ideation diversity in AI research agents?
By counting the number of unique keywords in agent outputs
By calculating Shannon entropy on the distribution of model architectures
By measuring the time taken to generate different ideas
Q2
2. What happened when the researchers deliberately reduced ideation diversity in their controlled experiment?
The agents completed tasks faster but with lower accuracy
The agents' performance improved slightly
The agents showed significant decrease in performance across multiple metrics
Q3
3. Which of the following agent scaffolds demonstrated higher ideation diversity in the study?
AIDE, with 70% of initial drafts using just GBDT and CNN
AIRAGreedy, with a more balanced distribution of different architectures
Both scaffolds showed equal levels of diversity