Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

Abstract

Promoting academic papers has become an important means of enhancing research visibility. However, existing automated methods struggle with limited storytelling, insufficient aesthetic quality, and constrained self-adjustment, making it difficult to achieve efficient and engaging dissemination. At the heart of these challenges is a simple principle: you cannot improve what you cannot evaluate properly. To address this, we introduce EvoPresent, a self-improvement agent framework that unifies coherent narratives, aesthetic-aware designs, and realistic presentation delivery via virtual characters. Central to EvoPresent is PresAesth, a multi-task reinforcement learning (RL) aesthetic model that provides reliable aesthetic scoring, defect adjustment, and comparative feedback, enabling iterative self-improvement even under limited aesthetic training data. To systematically evaluate such methods, we introduce the EvoPresent Benchmark, which comprises two parts: Presentation Generation Quality, built on 650 top-tier AI conference papers with multimodal resources (slides, videos, and scripts) to assess both content and design; and Aesthetic Awareness, consisting of 2,000 slide pairs with varying aesthetic levels, supporting joint training and evaluation on scoring, defect adjustment, and comparison. Our findings highlight that (i) high-quality feedback is essential for agent self-improvement, while initial capability alone does not guarantee effective self-correction; (ii) automated generation pipelines exhibit a trade-off between visual design and content construction; and (iii) multi-task RL training shows stronger generalization in aesthetic awareness tasks.

🎥 Demo

Interactive Demo

Interactive slide interface generated by EvoPresent with real-time navigation and exploration capabilities.

Video Demonstration

Generated video featuring a virtual presenter delivering academic content with natural gestures and speech.

💻 EvoPresent Agent Pipeline

Overview of the EvoPresent framework. (a) EvoPresent first performs content extraction and voice generation, then constructs the storyline and script, followed by content enhancement using image generation and knowledge retrieval. Design and rendering are handled next, and the aesthetic checker evaluates the initial slide and provides adjustments. (b) PresAesth is trained on a human-preference aesthetic dataset via multiple tasks (scoring, defect adjustment, and comparison). (c) The PresAesth model guides the agent framework in iterative self-improvement.

✨ Aesthetic Judgement for Self-Improvement

The key to EvoPresent's high-quality output is its iterative "draft-feedback-refinement" cycle, which is managed by a dedicated Checker Agent. This agent acts as an AI design critic, leveraging our specialized aesthetic model, PresAesth, to evaluate and progressively refine the presentation.
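The draft-feedback-refinement cycle described above can be sketched as a small loop. This is a minimal illustration rather than the actual EvoPresent implementation: `Feedback`, `refine_until_good`, `toy_check`, `toy_revise`, the 0-10 score scale, and the stopping threshold are all hypothetical stand-ins. In the real framework, the checker role would be played by the Checker Agent querying PresAesth.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    score: float          # absolute aesthetic score (0-10 scale is an assumption)
    defects: List[str]    # named defects, e.g. ["layout", "typography"]


def refine_until_good(
    draft: dict,
    check: Callable[[dict], Feedback],
    revise: Callable[[dict, List[str]], dict],
    threshold: float = 8.0,
    max_rounds: int = 3,
) -> Tuple[dict, Feedback]:
    """Iteratively revise a slide, keeping a revision only if it scores higher."""
    best, best_fb = draft, check(draft)
    for _ in range(max_rounds):
        # Stop once the slide is good enough or no defects are reported.
        if best_fb.score >= threshold or not best_fb.defects:
            break
        candidate = revise(best, best_fb.defects)
        cand_fb = check(candidate)
        # Score comparison stands in for a pairwise "which is better" judgment.
        if cand_fb.score <= best_fb.score:
            break
        best, best_fb = candidate, cand_fb
    return best, best_fb


# Toy stand-ins (hypothetical): a slide is a dict carrying a set of defects.
def toy_check(slide: dict) -> Feedback:
    defects = sorted(slide["defects"])
    return Feedback(score=10.0 - 2.0 * len(defects), defects=defects)


def toy_revise(slide: dict, defects: List[str]) -> dict:
    # Fix one reported defect per round, returning a new slide.
    return {**slide, "defects": set(slide["defects"]) - {defects[0]}}


if __name__ == "__main__":
    draft = {"title": "Intro", "defects": {"layout", "typography"}}
    final, fb = refine_until_good(draft, toy_check, toy_revise, threshold=10.0, max_rounds=5)
    print(final, fb)
```

Keeping a revision only when it beats the current best is one simple way to prevent a noisy revision step from degrading the slide; other acceptance rules are equally plausible.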

At its core, PresAesth is trained in a multi-task setup with Group Relative Policy Optimization (GRPO) on a dataset of human aesthetic preferences. This unified training lets a single model perform three distinct tasks: it assigns an absolute aesthetic score, identifies specific design defects such as poor layout or typography, and makes pairwise comparisons to determine the better of two slide versions. This multi-faceted feedback is crucial, as it powers the self-improvement loop and allows the framework to iteratively polish its own output.
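For intuition about the training signal, GRPO is commonly described as scoring a group of sampled responses to the same prompt and normalizing each reward against the group's mean and standard deviation. The sketch below assumes that standard formulation. The `comparison_reward` function is a hypothetical example of a per-task reward and does not reflect the actual reward design used for PresAesth; the use of the population standard deviation and the `eps` constant are likewise assumptions.

```python
from statistics import mean, pstdev
from typing import List


def comparison_reward(choice: str, preferred: str) -> float:
    """Hypothetical reward for the pairwise task: 1 if the model picks
    the human-preferred slide, else 0."""
    return 1.0 if choice == preferred else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled answers to "which slide is better?", human preference is "A".
    choices = ["A", "B", "B", "A"]
    rewards = [comparison_reward(c, "A") for c in choices]
    print(group_relative_advantages(rewards))
```

When every response in a group receives the same reward, all advantages are zero, so that group contributes no learning signal under this formulation.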

EvoPresent Benchmark

The EvoPresent Benchmark offers a comprehensive suite for evaluating both presentation generation and aesthetic models. Its data sources are twofold: first, curated materials from top-tier AI conferences, including slides, videos, and scripts; second, a specialized dataset of paired slides with varying aesthetic quality. Correspondingly, its evaluation metrics assess two key areas: content fidelity and design quality are measured against the conference materials, while the model's capabilities in absolute scoring, defect identification, and pairwise comparison are tested using the paired aesthetic slides. This structure enables rigorous and reproducible evaluation for both content generation and aesthetic judgment.
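As a rough illustration of how the aesthetic-awareness tasks could be scored, the sketch below computes accuracy for pairwise comparison and mean absolute error for absolute scoring. These two metrics and the function names are assumptions chosen for illustration; the benchmark's actual metric definitions may differ.

```python
from typing import Sequence


def pairwise_accuracy(predicted: Sequence[str], preferred: Sequence[str]) -> float:
    """Fraction of slide pairs where the model's choice matches the human preference."""
    if len(predicted) != len(preferred) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == g for p, g in zip(predicted, preferred))
    return hits / len(predicted)


def mean_absolute_error(pred_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Average absolute gap between predicted and human aesthetic scores."""
    if len(pred_scores) != len(human_scores) or not pred_scores:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - h) for p, h in zip(pred_scores, human_scores)) / len(pred_scores)


if __name__ == "__main__":
    print(pairwise_accuracy(["A", "B", "A", "B"], ["A", "B", "B", "B"]))
    print(mean_absolute_error([7.0, 5.0], [8.0, 5.0]))
```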

🎨 Aesthetic Comparison

BibTeX

@misc{liu2025presentingpaperartselfimprovement,
    title={Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations},
    author={Chengzhi Liu and Yuzhe Yang and Kaiwen Zhou and Zhen Zhang and Yue Fan and Yannan Xie and Peng Qi and Xin Eric Wang},
    year={2025},
    eprint={2510.05571},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2510.05571}
}