2025-06-19 Papers

Paper 1

Sekai: A Video Dataset towards World Exploration

Published: 2025-06-18

Link: http://arxiv.org/pdf/2506.15675

1. 📘 Topic and Domain: A large-scale video dataset called Sekai for world exploration, focusing on computer vision and video generation.
2. 💡 Previous Research and New Ideas: Based on existing video generation datasets that have limitations in location diversity and duration; proposes a new dataset with worldwide coverage, longer durations, and rich annotations.
3. ❓ Problem: Existing video generation datasets are not well-suited for world exploration training due to limited locations, short duration, static scenes, and lack of exploration-related annotations.
4. 🛠️ Methods: Developed a curation pipeline to collect, pre-process, and annotate videos from YouTube and video games, including shot detection, quality filtering, and comprehensive annotation of location, scene type, weather, crowd density, captions, and camera trajectories.
5. 📊 Results and Evaluation: Created a dataset of over 5,000 hours of video from 750 cities across 100+ countries, with quality demonstrated through statistical analysis and by successfully training YUME, an interactive world exploration model.
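The pre-processing stage above splits raw footage at shot boundaries and keeps only clips that pass filtering. A minimal sketch of that idea, using toy frame-difference thresholding rather than the paper's GPU-accelerated TransNetV2; the function names, threshold, and minimum clip length are illustrative assumptions:

```python
def detect_shot_boundaries(frame_diffs, threshold=0.5):
    """Return frame indices where the inter-frame difference exceeds a
    threshold, marking likely shot boundaries (hypothetical stand-in
    for a learned detector such as TransNetV2)."""
    return [i for i, d in enumerate(frame_diffs) if d > threshold]

def extract_clips(num_frames, boundaries, min_len=30):
    """Split [0, num_frames) at the boundaries and keep only clips long
    enough to survive a simple length-based quality filter."""
    cuts = [0] + boundaries + [num_frames]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b - a >= min_len]

# Toy per-frame difference scores with spikes at frames 100 and 150.
diffs = [0.1] * 300
diffs[100] = 0.9
diffs[150] = 0.8
boundaries = detect_shot_boundaries(diffs)   # [100, 150]
clips = extract_clips(300, boundaries)       # three clips survive
```

In the real pipeline the difference scores would come from a neural shot detector, and further quality, subtitle, and camera-trajectory filters would prune the surviving clips.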

Figure: Sekai dataset creation pipeline: video collection (YouTube + game videos) → pre-processing (shot boundary detection, clip extraction; quality, subtitle, and camera-trajectory filtering) → annotation (location, category and caption, camera trajectories, weather and scene, time and crowd density) → video sampling (quality sampling; content, location, category, and camera-trajectory diversity) → Sekai dataset (5,000+ hours of videos).
Q1
1. What makes Sekai dataset unique compared to existing video datasets?
It only contains video game footage
It has longer video durations and worldwide coverage with rich annotations
It focuses exclusively on drone footage
Q2
2. In the video pre-processing pipeline, what innovative approach did the researchers take for shot boundary detection?
They manually reviewed each video
They used AI to detect scene changes
They refactored TransNetV2 with GPU acceleration making it 5x faster
Q3
3. What is the meaning behind the dataset and model names chosen by the researchers?
They are random combinations of letters
They are acronyms of technical terms
They are Japanese words - Sekai means 'world' and YUME means 'dream'
Paper 2

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Published: 2025-06-18

Link: http://arxiv.org/pdf/2506.15681

1. 📘 Topic and Domain: Vision-language model distillation for transferring knowledge from large models to smaller ones in multimodal AI systems.
2. 💡 Previous Research and New Ideas: Based on traditional knowledge distillation techniques but proposes a novel "Recalibrator" component to overcome token type incompatibility between different models.
3. ❓ Problem: The challenge of distilling knowledge between vision-language models with different token types (vocabulary sizes, token splits, and ordering schemes), which current methods cannot handle.
4. 🛠️ Methods: Introduces GenRecal framework with a Recalibrator that aligns and adapts feature representations between heterogeneous VLMs through a three-stage training process.
5. 📊 Results and Evaluation: The framework outperformed baselines on multiple benchmarks, surpassing both open- and closed-source VLMs while enabling distillation between previously incompatible model architectures.
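The Recalibrator's core job of mapping student features into the teacher's representation space so the two can be compared can be sketched in minimal form. This is a plain-Python toy: the projection matrix, feature dimensions, and mean-squared-error objective are illustrative assumptions, not the paper's implementation:

```python
def project(vec, mat):
    """Multiply a feature vector by a projection matrix
    (rows index the input dimension, columns the output dimension)."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]

def alignment_loss(student_feats, teacher_feats, projection):
    """Mean squared error between projected student features and teacher
    features -- a toy stand-in for a feature-alignment objective that lets
    models with different token/feature spaces be compared."""
    total, n = 0.0, 0
    for s, t in zip(student_feats, teacher_feats):
        p = project(s, projection)
        total += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
        n += len(t)
    return total / n

# Toy setup: student tokens live in 2-d, teacher tokens in 3-d.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
proj = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # aligns them exactly here
loss = alignment_loss(student, teacher, proj)
```

In practice the projection would be a learned module trained jointly with the distillation objective, which is what makes the approach work across heterogeneous token types.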

Figure: GenRecal overview: input (image + text prompt) → teacher VLM (large, 72B–78B) and student VLM (small, 1B–8B) → Recalibrator (feature alignment, token-type adaptation) → knowledge-transfer process (1. feature-representation alignment, 2. token-type-compatible distillation, 3. multi-stage training) → enhanced small VLM (improved performance with reduced model size; general-purpose distillation capability).
Q1
1. What is the main innovation of GenRecal that allows it to overcome limitations of traditional distillation methods?
A larger training dataset
The Recalibrator component that aligns feature representations
Using multiple teacher models simultaneously
Q2
2. According to the paper, what happens if the regularization term is removed from GenRecal's training process?
Training becomes faster but less accurate
The model fails to explicitly align features between large and small VLMs
Memory usage increases significantly
Q3
3. What is a key real-world application benefit of GenRecal?
It enables deployment of efficient VLMs on resource-constrained devices
It improves image recognition accuracy
It reduces training time for large models
Paper 3

ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs

Published: 2025-06-18

Link: http://arxiv.org/pdf/2506.15211

1. 📘 Topic and Domain: The paper explores how abstract reasoning prototypes enable cross-domain generalization in Large Language Models (LLMs), focusing on logical reasoning and planning capabilities.
2. 💡 Previous Research and New Ideas: Building on prior work on long chain-of-thought reasoning and large reasoning model (LRM) training, the paper introduces the concept of "reasoning prototypes" as fundamental patterns that enable cross-domain transfer.
3. ❓ Problem: The paper aims to understand and enhance the underlying mechanisms that allow LLMs trained on specific reasoning tasks to transfer their abilities to different types of problems.
4. 🛠️ Methods: The authors developed ProtoReasoning framework using Prolog for logical reasoning and PDDL for planning tasks, with automated prototype construction and verification systems.
5. 📊 Results and Evaluation: The approach achieved significant improvements across multiple benchmarks: 4.7% on logical reasoning (Enigmata-Eval), 6.3% on planning tasks, 4.0% on general reasoning (MMLU), and 1.0% on mathematics (AIME24).
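The verification system above accepts a model's output only if it can be checked by executing it against a formal representation. A minimal stdlib-only sketch of that verify-by-execution idea, using a toy forward-chaining checker in place of SWI-Prolog; the rule encoding and all names are illustrative assumptions:

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from the rules, where each rule is
    (body_atoms, head_atom): if all body atoms hold, the head is added."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

def verify(answer, facts, rules):
    """Accept a candidate answer only if it is logically entailed --
    the role a Prolog engine plays as an external verifier."""
    return answer in forward_chain(facts, rules)

# Toy logic prototype:  human(socrates).  mortal(X) :- human(X).
facts = {("human", "socrates")}
rules = [([("human", "socrates")], ("mortal", "socrates"))]
ok = verify(("mortal", "socrates"), facts, rules)
```

A real Prolog engine handles variables, unification, and backtracking, but the training signal is the same: keep only model outputs that an external checker can prove correct.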

Figure: ProtoReasoning framework: input problems → prototype constructor (Prolog for logic, PDDL for planning) → verification system (SWI-Prolog, VAL validator) → training process (1. teacher-model distillation, 2. difficulty stratification, 3. quality filtration) → enhanced reasoning capabilities.
Q1
1. What is the main innovation of ProtoReasoning compared to previous approaches?
It uses reinforcement learning with verifiable rewards
It introduces abstract reasoning prototypes as the foundation for cross-domain generalization
It implements a new type of transformer architecture
Q2
2. In the ablation study, what was the key finding about prototype-based training?
It performed significantly worse than natural language training
It only worked well for mathematical problems
It achieved comparable performance to natural language training, validating the prototype hypothesis
Q3
3. Which prototype representation system did the paper use for planning tasks?
PDDL (Planning Domain Definition Language)
Python scripting
SQL queries