AeSlides is a reinforcement learning framework designed to improve the aesthetic layout of slides generated by large language models (LLMs) by addressing the modality gap between text-centric generation and visual quality. It uses verifiable metrics to quantify slide layout quality and a GRPO-based reinforcement learning method to optimize models for aesthetically coherent layouts.
| Category | Detail |
|---|---|
| Released by | Not yet disclosed. |
| Release date | Not yet disclosed. |
| What it is | A reinforcement learning framework for aesthetic slide generation. |
| Who it is for | Developers and researchers working on LLM-based presentation tools. |
| Where to get it | https://github.com/ympan0508/aeslides |
| Price | Not yet disclosed. |
- AeSlides is a reinforcement learning framework for LLM-based slide generation that addresses the modality gap between text-centric generation and visual aesthetics.
- It introduces verifiable metrics that quantify slide layout quality accurately and efficiently.
- A GRPO-based reinforcement learning method directly optimizes models for aesthetically coherent layouts.
- AeSlides improves aspect ratio compliance to 85% and reduces whitespace, element collisions, and visual imbalance.
- Human evaluators rated AeSlides-generated slides substantially higher in overall quality than other methods.
What is AeSlides
AeSlides is a reinforcement learning framework that incentivizes aesthetic layout in large language model (LLM)-based slide generation [arXiv:2604.22840]. It aims to bridge the gap between text-centric generation and the visual quality requirements of slides [arXiv:2604.22840]. The framework uses verifiable rewards to provide explicit aesthetic supervision during the generation process [arXiv:2604.22840].
What is new vs the previous version
AeSlides introduces explicit aesthetic principles as supervision, which was previously unexplored [arXiv:2604.22840]. Existing solutions often rely on heavy visual reflection or large-scale dataset fine-tuning [arXiv:2604.22840]. These methods incur high inference costs or provide weak aesthetic supervision [arXiv:2604.22840].
- Verifiable Metrics: AeSlides introduces a suite of meticulously designed verifiable metrics [arXiv:2604.22840]. These metrics quantify slide layout quality accurately and efficiently [arXiv:2604.22840].
- Direct Optimization: It employs a GRPO-based reinforcement learning method for direct optimization [arXiv:2604.22840]. This method specifically targets aesthetically coherent layouts [arXiv:2604.22840].
- Efficiency: The approach offers an efficient and scalable way to align slide generation with human aesthetic preferences [arXiv:2604.22840].
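The paper does not publish its exact metric formulas, but the idea of "verifiable" layout metrics can be illustrated with deterministic checks over element bounding boxes. The following sketch is hypothetical: the function names, the 16:9 target, and the 5% tolerance are illustrative assumptions, not AeSlides' actual metric suite.

```python
# Illustrative (hypothetical) verifiable layout metrics: each check is a
# deterministic function of element bounding boxes, so it can serve as an
# objective, low-cost reward signal without a visual judge model.

def overlaps(a, b):
    """True if axis-aligned boxes a and b, given as (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def collision_penalty(boxes):
    """Fraction of element pairs that collide (0.0 means no collisions)."""
    pairs = [(i, j) for i in range(len(boxes)) for j in range(i + 1, len(boxes))]
    if not pairs:
        return 0.0
    hits = sum(overlaps(boxes[i], boxes[j]) for i, j in pairs)
    return hits / len(pairs)

def aspect_ratio_ok(width, height, target=16 / 9, tol=0.05):
    """Verifies that an element or canvas matches the target aspect ratio."""
    return abs(width / height - target) / target <= tol
```

Because every check is a pure function of the generated layout, the same code can score thousands of candidate slides during training without human review.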
How does AeSlides work
AeSlides operates by integrating verifiable aesthetic metrics into a reinforcement learning framework.
- Define Verifiable Metrics: AeSlides first establishes a set of verifiable metrics [arXiv:2604.22840]. These metrics quantify specific aspects of slide layout quality [arXiv:2604.22840].
- Quantify Layout Quality: The metrics capture key layout issues in an accurate, efficient, and low-cost manner [arXiv:2604.22840].
- Develop GRPO-based RL: A GRPO-based reinforcement learning method is then developed [arXiv:2604.22840].
- Optimize Slide Generation: This method directly optimizes slide generation models [arXiv:2604.22840]. The optimization targets aesthetically coherent layouts [arXiv:2604.22840].
- Incentivize Aesthetics: The verifiable metrics serve as rewards, incentivizing the LLM to produce more aesthetic designs [arXiv:2604.22840].
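The steps above can be sketched in code. GRPO's core move is to sample a group of candidate outputs per prompt, score each with the reward function, and normalize rewards within the group to obtain relative advantages. This is a minimal sketch of that normalization step, assuming scalar rewards from a verifiable metric; it is not the paper's implementation.

```python
# Hypothetical sketch of the GRPO reward step: sample a group of candidate
# layouts for one prompt, score each with the verifiable metrics, then
# normalize within the group so above-average layouts get positive advantage.

import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantage: (r - group mean) / group std, per group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All candidates scored identically: no learning signal this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: four sampled layouts for the same prompt, scored in [0, 1].
rewards = [0.9, 0.4, 0.6, 0.5]
advantages = group_relative_advantages(rewards)
```

Layouts scoring above the group mean receive positive advantage and are reinforced; the rest are suppressed, so no learned value model is needed.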
Benchmarks and evidence
| Metric | Before AeSlides | With AeSlides | Improvement | Source |
|---|---|---|---|---|
| Aspect Ratio Compliance | 36% | 85% | +49 pp | [arXiv:2604.22840] |
| Whitespace Reduction | Not yet disclosed. | Not yet disclosed. | 44% | [arXiv:2604.22840] |
| Element Collisions Reduction | Not yet disclosed. | Not yet disclosed. | 43% | [arXiv:2604.22840] |
| Visual Imbalance Reduction | Not yet disclosed. | Not yet disclosed. | 28% | [arXiv:2604.22840] |
| Human Evaluation Score (Overall Quality) | 3.31 | 3.56 | +7.6% | [arXiv:2604.22840] |
AeSlides, trained with 5K prompts on GLM-4.7-Flash, significantly improved aspect ratio compliance [arXiv:2604.22840]. It reduced whitespace by 44%, element collisions by 43%, and visual imbalance by 28% [arXiv:2604.22840]. Human evaluation showed a 7.6% increase in overall quality, outperforming reflection-based and model-based reward optimization approaches [arXiv:2604.22840]. It even edged out Claude-Sonnet-4.5 in human evaluations [arXiv:2604.22840].
Who should care
Builders
Builders of AI presentation tools should care about AeSlides for its method of improving visual aesthetics [arXiv:2604.22840]. The verifiable reward system offers a new paradigm for efficient and scalable aesthetic alignment [arXiv:2604.22840]. This could lead to more sophisticated and user-friendly slide generation capabilities.
Enterprise
Enterprise users seeking high-quality, automated presentation generation will find AeSlides relevant. Improved aesthetic layouts can enhance professional communication and brand consistency. This technology can streamline content creation workflows within organizations.
End users
End users who rely on AI for creating presentations will benefit from AeSlides. They can expect more visually appealing and coherent slides with less manual adjustment. This directly addresses the common issue of aesthetically suboptimal layouts from current LLM tools [arXiv:2604.22840].
Investors
Investors in AI and productivity software should note AeSlides’ potential impact. Enhancing the visual quality of LLM outputs can increase market adoption for slide generation tools. This framework represents a valuable advancement in practical AI applications.
How to use AeSlides today
The repository for AeSlides is available at https://github.com/ympan0508/aeslides [arXiv:2604.22840]. Developers can access the code to implement or adapt the framework. This allows for direct experimentation and integration into existing LLM-based slide generation pipelines.
AeSlides vs competitors
| Feature | AeSlides | Model-based Reward Optimization | Reflection-based Agentic Approaches | Claude-Sonnet-4.5 |
|---|---|---|---|---|
| Aesthetic Supervision | Explicit, verifiable metrics [arXiv:2604.22840] | Indirect, model-dependent [arXiv:2604.22840] | Heavy visual reflection [arXiv:2604.22840] | Not yet disclosed. |
| Inference Cost | Low-cost [arXiv:2604.22840] | Not yet disclosed. | High [arXiv:2604.22840] | Not yet disclosed. |
| Optimization Method | GRPO-based reinforcement learning [arXiv:2604.22840] | Not yet disclosed. | Not yet disclosed. | Not yet disclosed. |
| Aspect Ratio Compliance | 85% [arXiv:2604.22840] | Not yet disclosed. | Not yet disclosed. | Not yet disclosed. |
| Human Evaluation (Overall Quality) | 3.56 (+7.6% improvement) [arXiv:2604.22840] | Inferior to AeSlides [arXiv:2604.22840] | Inferior to AeSlides [arXiv:2604.22840] | Inferior to AeSlides [arXiv:2604.22840] |
AeSlides distinguishes itself by using explicit, verifiable aesthetic principles for supervision [arXiv:2604.22840]. This contrasts with model-based reward optimization and reflection-based agentic approaches, which provide weaker or more costly supervision [arXiv:2604.22840]. Its GRPO-based reinforcement learning method directly optimizes for aesthetic layouts, leading to superior human evaluation scores [arXiv:2604.22840].
Risks, limits, and myths
- Subjectivity of Aesthetics: Aesthetic preferences can be subjective and vary across cultures [arXiv:2604.22840]. AeSlides’ metrics aim for general principles, but edge cases may exist.
- Training Data Dependency: The effectiveness of AeSlides still depends on the quality and diversity of its training prompts [arXiv:2604.22840].
- Computational Resources: While efficient, reinforcement learning frameworks can still require significant computational resources for training.
- Myth: LLMs inherently understand visual aesthetics. Fact: LLMs are text-centric, while slide quality is governed by visual aesthetics, creating a modality gap [arXiv:2604.22840]. AeSlides aims to bridge this gap.
FAQ
- What problem does AeSlides solve?
- AeSlides solves the problem of aesthetically suboptimal layouts in LLM-generated slides, which arises from the modality gap between text-centric generation and visual quality [arXiv:2604.22840].
- How does AeSlides improve slide aesthetics?
- AeSlides improves slide aesthetics by using verifiable metrics to quantify layout quality and a GRPO-based reinforcement learning method to directly optimize for aesthetically coherent layouts [arXiv:2604.22840].
- What are “verifiable rewards” in AeSlides?
- Verifiable rewards in AeSlides are based on meticulously designed metrics that accurately, efficiently, and at low cost quantify slide layout quality [arXiv:2604.22840].
- Which LLM was used to train AeSlides?
- AeSlides was trained using 5K prompts on GLM-4.7-Flash [arXiv:2604.22840].
- How much did AeSlides improve aspect ratio compliance?
- AeSlides improved aspect ratio compliance from 36% to 85% [arXiv:2604.22840].
- Did human evaluators prefer slides generated by AeSlides?
- Yes, human evaluators showed a substantial improvement in overall quality, increasing scores from 3.31 to 3.56 (+7.6%) for AeSlides-generated slides [arXiv:2604.22840].
- Is AeSlides open source?
- Yes, the repository for AeSlides is available at https://github.com/ympan0508/aeslides [arXiv:2604.22840].
- Can AeSlides be used with other LLMs?
- Not yet disclosed. The paper specifies training on GLM-4.7-Flash [arXiv:2604.22840].
Glossary
- Large Language Model (LLM)
- A type of artificial intelligence model trained on vast amounts of text data to understand and generate human-like language [Wikipedia].
- Reinforcement Learning (RL)
- A machine learning paradigm where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward.
- Modality Gap
- The discrepancy between the primary input/output modality of a system (e.g., text for LLMs) and the modality required for high-quality output (e.g., visual aesthetics for slides) [arXiv:2604.22840].
- GRPO
- Group Relative Policy Optimization, a reinforcement learning algorithm that estimates advantages by normalizing rewards within a group of sampled responses rather than using a learned value function [arXiv:2604.22840].
- Verifiable Metrics
- Quantifiable measures that can be objectively checked or proven, used in AeSlides to assess slide layout quality [arXiv:2604.22840].
Sources
- AeSlides: Incentivizing Aesthetic Layout in LLM-Based Slide Generation via Verifiable Rewards
- Large language model – Wikipedia