TensorRT-LLM v1.3.0rc13 is a release candidate that introduces initial support and optimizations for Nemotron 3 Nano Omni models. It also adds audio extraction from video and optimizes ViT attention for Nemotron and Nemotron Nano VL models, improving performance and reducing initialization memory.
| Attribute | Detail |
|---|---|
| Released by | tensorrt-llm (NVIDIA) |
| Release date | |
| What it is | A release candidate for TensorRT-LLM with model support and optimizations. |
| Who it is for | Developers and researchers using NVIDIA’s TensorRT-LLM and Nemotron models. |
| Where to get it | The NVIDIA/TensorRT-LLM GitHub repository, where release candidates are tagged. |
| Price | Not yet disclosed. |
- TensorRT-LLM v1.3.0rc13 adds initial support and optimizations for Nemotron 3 Nano Omni models.
- Audio extraction from video is added for Nemotron and Nemotron Nano VL models.
- ViT attention is optimized for Nemotron and Nemotron Nano VL models.
- Initialization memory is reduced for Nemotron and Nemotron Nano VL models.
- Known issues with audio-from-video and chunked prefill for video are being addressed.
- Per-model VisualGen example scripts, shared configs, and metadata updates are added.
What is TensorRT-LLM v1.3.0rc13
TensorRT-LLM v1.3.0rc13 is a release candidate for NVIDIA’s TensorRT-LLM library, focusing on enhancing model compatibility and performance. This version specifically targets improvements for Nemotron 3 Nano Omni and for Nemotron and Nemotron Nano VL models.
What is new vs the previous version
TensorRT-LLM v1.3.0rc13 introduces several key updates compared to previous versions:
- Model Support: Initial support and optimizations for Nemotron 3 Nano Omni models are included.
- Audio Extraction: Audio extraction from video is now supported for Nemotron and Nemotron Nano VL models.
- ViT Attention Optimization: ViT attention is optimized for Nemotron and Nemotron Nano VL models.
- Memory Reduction: Initialization memory is reduced for Nemotron and Nemotron Nano VL models.
- VisualGen Examples: Per-model VisualGen example scripts, shared configs, and metadata updates are added.
How does TensorRT-LLM v1.3.0rc13 work
TensorRT-LLM v1.3.0rc13 works by integrating the new features and optimizations directly into the TensorRT-LLM framework. Initial optimizations enable Nemotron 3 Nano Omni support, updated preprocessing code handles audio extraction from video sources, and the ViT attention mechanism is optimized alongside reduced initialization memory for Nemotron and Nemotron Nano VL models.
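For context on what "ViT attention" refers to, the sketch below implements standard scaled dot-product attention, softmax(QK^T / sqrt(d))V, in plain Python. It is an illustrative toy of the mechanism being optimized, not TensorRT-LLM's optimized kernel.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V,
    # with Q, K, V given as lists of row vectors.
    d = len(Q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in Q:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return out
```

Each query row attends most strongly to the key it aligns with, so the output mixes value rows weighted by similarity; production kernels fuse these steps to cut memory traffic, which is the kind of work an "optimized ViT attention" path does.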
Benchmarks and evidence
| Feature/Optimization | Impact | Source |
|---|---|---|
| Nemotron 3 Nano Omni Support | Initial optimizations provided. | tensorrt-llm release notes |
| Audio Extraction from Video | Added for Nemotron and Nemotron Nano VL models. | tensorrt-llm release notes |
| ViT Attention Optimization | Improved performance for Nemotron and Nemotron Nano VL models. | tensorrt-llm release notes |
| Initialization Memory Reduction | Reduced memory footprint for Nemotron and Nemotron Nano VL models. | tensorrt-llm release notes |
| VisualGen Example Scripts | Added per-model scripts, configs, and metadata updates. | tensorrt-llm release notes |
Who should care
Builders
Builders developing applications with Nemotron models should care about TensorRT-LLM v1.3.0rc13. The new optimizations and model support can improve their application’s performance and efficiency. Developers working with multi-modal AI will benefit from audio extraction features.
Enterprise
Enterprises leveraging NVIDIA’s AI ecosystem for large-scale deployments should care. The memory reductions and performance optimizations can lead to cost savings. Enhanced model support expands the range of deployable AI solutions.
End users
End users will experience improved performance and new capabilities in applications powered by Nemotron models. Faster inference and reduced memory usage can lead to a smoother user experience. New features like audio extraction from video enable richer multi-modal interactions.
Investors
Investors in AI and NVIDIA should note the continuous development and optimization of TensorRT-LLM. These updates indicate ongoing innovation and commitment to the AI inference market. Improved model efficiency can drive broader adoption of NVIDIA’s platforms.
How to use TensorRT-LLM v1.3.0rc13 today
To use TensorRT-LLM v1.3.0rc13, developers would typically update their TensorRT-LLM installation to the release-candidate tag, either by installing the pre-release package or by cloning the repository and building from source. Specific instructions for integrating Nemotron 3 Nano Omni or utilizing audio extraction would be in the documentation, and the new VisualGen example scripts are a good starting point.
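Because rc13 is a pre-release, package installers that resolve only stable versions will skip it (pip, for example, requires its `--pre` flag to consider release candidates). The hypothetical helper below shows one way to recognize a PEP 440-style version string such as `1.3.0rc13` as a release candidate before pinning it; the function names are illustrative and not part of TensorRT-LLM.

```python
import re

# PEP 440-style pattern: a base release plus an optional a/b/rc pre-release tag.
_VERSION_RE = re.compile(r"^(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?$")

def parse_version(version: str):
    """Split a version like '1.3.0rc13' into (base, pre_tag, pre_number)."""
    m = _VERSION_RE.match(version)
    if not m:
        raise ValueError(f"unrecognized version: {version!r}")
    base, tag, num = m.groups()
    return base, tag, int(num) if num is not None else None

def is_release_candidate(version: str) -> bool:
    # True when the version string carries an 'rc' pre-release tag.
    return parse_version(version)[1] == "rc"
```

For example, `is_release_candidate("1.3.0rc13")` returns `True`, signalling that a deployment pipeline should opt in to pre-releases explicitly rather than pick the tag up by accident.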
TensorRT-LLM v1.3.0rc13 vs competitors
Note that GammaOS Next and RapidPipeline for 3ds Max are unrelated products that happen to share the v1.3.0 version number; they are not LLM inference libraries, so the comparison below mainly highlights how different these releases are in scope.
| Feature | TensorRT-LLM v1.3.0rc13 | GammaOS Next v1.3.0 | RapidPipeline for 3ds Max v1.3.0 |
|---|---|---|---|
| Primary Focus | LLM inference optimization, Nemotron support | Handheld gaming OS (Android 14) | 3ds Max integration and pipeline tools |
| Model Support | Nemotron 3 Nano Omni, Nemotron, Nemotron Nano VL | Not applicable | Not applicable |
| Key Enhancements | Audio extraction, ViT attention optimization, memory reduction | Tuned for RK3576, LineageOS 21 | Categorized actions, auto token auth |
| Release Date | Not yet disclosed. | Not yet disclosed. | Not yet disclosed. |
| Target Platform | NVIDIA GPUs | Anbernic RG Vita Pro (RK3576) | 3ds Max environment |
Risks, limits, and myths
- Known Issues: Audio-from-video and chunked prefill for video have known issues that the development team is actively addressing.
- Release Candidate Status: As a release candidate (rc13), this version might contain bugs or incomplete features. It is not a final stable release.
- Hardware Dependency: TensorRT-LLM is optimized for NVIDIA GPUs, limiting its direct applicability to other hardware. Performance benefits are tied to NVIDIA’s ecosystem.
- Myth, "all LLMs are supported equally": TensorRT-LLM focuses on specific models and architectures; not all LLMs receive the same level of optimization.
FAQ
- What is the release date of TensorRT-LLM v1.3.0rc13?
  The release date has not yet been disclosed.
- Which models are supported in TensorRT-LLM v1.3.0rc13?
  It supports Nemotron 3 Nano Omni, Nemotron, and Nemotron Nano VL models.
- What new features are included in this release?
  New features include audio extraction from video and optimized ViT attention.
- Are there any known issues with TensorRT-LLM v1.3.0rc13?
  Yes, known issues exist for audio-from-video and chunked prefill for video.
- Does this release reduce memory usage?
  Yes, initialization memory is reduced for Nemotron and Nemotron Nano VL models.
- What are VisualGen example scripts?
  They are per-model example scripts with shared configs and metadata updates.
- Is TensorRT-LLM v1.3.0rc13 a stable release?
  No, it is a release candidate (rc13), not a final stable release.
- Can TensorRT-LLM v1.3.0rc13 be used with non-NVIDIA hardware?
  TensorRT-LLM is optimized for NVIDIA GPUs, so its benefits are primarily on NVIDIA hardware.
Glossary
- TensorRT-LLM
- An open-source library by NVIDIA for optimizing and deploying large language models (LLMs) for inference.
- Nemotron 3 Nano Omni
- A specific AI model developed by NVIDIA, receiving initial support and optimizations in this release.
- ViT Attention
- Vision Transformer attention, a mechanism used in models processing visual data, now optimized.
- Release Candidate (RC)
- A software version that is potentially a final product but still subject to minor changes or bug fixes.
- Chunked Prefill
- A technique for processing input data in chunks, particularly relevant for long sequences in LLMs.
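The chunked-prefill entry above can be made concrete with a toy: instead of running the whole prompt through the model in a single prefill pass, the scheduler feeds it in fixed-size chunks, bounding per-step activation memory for long sequences. This is a conceptual illustration of the idea only, not TensorRT-LLM's scheduler.

```python
def chunked_prefill(prompt_tokens, chunk_size):
    """Yield the prompt in fixed-size chunks, in the order a
    chunked-prefill scheduler would feed them to the model."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(prompt_tokens), chunk_size):
        yield prompt_tokens[start:start + chunk_size]
```

A 10-token prompt with a chunk size of 4 is processed as three steps of 4, 4, and 2 tokens; the interaction of this chunking with video inputs is where the known issue noted earlier arises.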