1. 📘 Topic and Domain: Sekai, a large-scale video dataset for world exploration, situated in computer vision and video generation.
2. 💡 Previous Research and New Ideas: Builds on existing video generation datasets, which suffer from limited location diversity and short durations; proposes a new dataset with worldwide coverage, longer clips, and rich annotations.
3. ❓ Problem: Existing video generation datasets are poorly suited for training world exploration models because of limited locations, short durations, static scenes, and a lack of exploration-related annotations.
4. 🛠️ Methods: Developed a curation pipeline to collect, pre-process, and annotate videos from YouTube and video games, including shot detection, quality filtering, and comprehensive annotation of location, scene type, weather, crowd density, captions, and camera trajectories.
5. 📊 Results and Evaluation: Created a dataset of over 5,000 hours of video from 750 cities in more than 100 countries, with quality demonstrated through statistical analysis and by successfully training YUME, an interactive world exploration model.
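The curation pipeline in point 4 can be illustrated with a minimal sketch: split a frame sequence into shots wherever consecutive frames differ sharply, then quality-filter by dropping shots that are too short. This is an illustrative toy, not the paper's implementation; all function names, thresholds, and the flat-list frame representation are assumptions.

```python
# Toy sketch of a shot-detection + quality-filtering stage, as in the
# curation pipeline described above. Frames are flat lists of pixel
# intensities; thresholds and names are illustrative, not from the paper.

def frame_diff(a, b):
    """Mean absolute pixel difference between two frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_shots(frames, cut_threshold=50.0):
    """Split a frame sequence into (start, end) shots at abrupt changes."""
    shots, start = [], 0
    for i in range(1, len(frames)):
        if frame_diff(frames[i - 1], frames[i]) > cut_threshold:
            shots.append((start, i))
            start = i
    shots.append((start, len(frames)))
    return shots

def filter_shots(shots, min_frames=2):
    """Quality filter: keep only shots long enough to be useful."""
    return [s for s in shots if s[1] - s[0] >= min_frames]

# Toy example: two static scenes separated by a hard cut.
frames = [[10, 10, 10]] * 3 + [[200, 200, 200]] * 3
shots = filter_shots(detect_shots(frames))
print(shots)  # → [(0, 3), (3, 6)]
```

A production pipeline would instead use a content-aware shot detector and learned quality models, followed by the annotation stages (location, scene type, weather, crowd density, captions, camera trajectories) the summary lists.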