1. 📘 Topic and Domain: Automatic generation of academic presentation videos from research papers using AI agents, in the domain of computer vision and AI for research.
2. 💡 Previous Research and New Ideas: Builds on prior work in slide generation and video synthesis, and proposes the first comprehensive framework for generating complete academic presentations, including slides, speech, a talking head, and cursor movements.
3. ❓ Problem: Creating academic presentation videos is highly labor-intensive: slide design, recording, and editing can take hours for a 2-10 minute video.
4. 🛠️ Methods: Developed PaperTalker, a multi-agent framework that integrates slide generation with layout refinement, subtitling, speech synthesis, cursor grounding, and talking-head rendering, while enabling parallel slide-wise generation.
5. 📊 Results and Evaluation: Evaluated on a new benchmark, Paper2Video (101 paper-presentation pairs), with four novel metrics (Meta Similarity, PresentArena, PresentQuiz, IP Memory). The system outperformed human-made presentations by 10% in PresentQuiz accuracy and achieved comparable ratings in user studies.
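
The parallel slide-wise generation mentioned under Methods can be illustrated with a minimal sketch. This is not PaperTalker's actual code: `SlideAssets`, `render_slide`, and `generate_presentation` are hypothetical names, and each stage (subtitling, speech synthesis, cursor grounding, talking-head rendering) is replaced by a placeholder string. The only point it shows is that slides which do not depend on one another can be processed concurrently and then reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SlideAssets:
    """Per-slide outputs; every field is a placeholder, not real media."""
    index: int
    subtitle: str
    speech: str
    cursor: str
    talking_head: str


def render_slide(index: int, slide_text: str) -> SlideAssets:
    # Hypothetical per-slide pipeline: each stage is a stand-in string.
    subtitle = f"Narration for: {slide_text}"
    speech = f"audio({subtitle})"
    return SlideAssets(
        index=index,
        subtitle=subtitle,
        speech=speech,
        cursor=f"cursor({slide_text})",
        talking_head=f"video({speech})",
    )


def generate_presentation(slides: list[str], max_workers: int = 4) -> list[SlideAssets]:
    # Slides are assumed independent, so they can be rendered concurrently.
    # Executor.map returns results in input order, keeping slides in sequence.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_slide, range(len(slides)), slides))


if __name__ == "__main__":
    for assets in generate_presentation(["Intro", "Method", "Results"]):
        print(assets.index, assets.subtitle)
```

A thread pool is chosen here on the assumption that the real stages would mostly wait on model or service calls; CPU-bound rendering would more likely call for a process pool or separate workers.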