Frontier Signal

Ollama v0.21.1 Release: Kimi CLI, MLX Improvements

Ollama v0.21.1 introduces Kimi CLI for agentic execution, enhances MLX performance with faster sampling and logprobs, and improves thread safety.

Ollama v0.21.1 introduces the Kimi CLI for agentic execution tasks and significantly enhances MLX runner performance; the release date has not been disclosed. This update improves sampling speed, adds logprobs support, and refines prompt tokenization and thread safety for MLX-compatible models, alongside other fixes.

Fact | Detail
Released by | Ollama
Release date | Not yet disclosed
What it is | An update to the Ollama platform, introducing Kimi CLI and various performance enhancements.
Who it is for | Developers and users working with local AI models, especially those using MLX or agentic systems.
Where to get it | Not yet disclosed.
Price | Not yet disclosed.
  • Ollama v0.21.1 introduces the Kimi CLI, which enables multi-agent systems for long-horizon agentic execution tasks [Source event].
  • The MLX runner now supports logprobs for compatible models [Source event].
  • MLX sampling is faster thanks to fused top-P and top-K in a single sort pass [Source event].
  • MLX prompt tokenization is improved by moving it into request handler goroutines [Source event].
  • GLM4 MoE Lite performance is enhanced with a fused sigmoid router head [Source event].
  • Structured outputs for Gemma 4 models are fixed when think=false [Source event].
  • The model picker in the macOS app no longer shows stale models after switching chats [Source event].

What is Ollama v0.21.1

Ollama v0.21.1 is an update to the Ollama platform that introduces the Kimi CLI and various performance and stability improvements [Source event]; its release date has not been disclosed. This version focuses on enhancing the MLX runner, improving agentic execution capabilities, and fixing specific bugs [Source event].

What is new vs the previous version

Ollama v0.21.1 introduces several key enhancements compared to its predecessor, v0.21.0 [Source event].

  • Kimi CLI Integration: Users can now install and run the Kimi CLI directly through Ollama [Source event]. The Kimi CLI, with Kimi K2.6, excels at long-horizon agentic execution tasks using a multi-agent system [Source event].
  • MLX Runner Logprobs Support: The MLX runner now supports logprobs for compatible models [Source event].
  • Faster MLX Sampling: MLX sampling is faster due to fused top-P and top-K in a single sort pass [Source event]. Repeat penalties are also applied within the sampler [Source event].
  • Improved MLX Prompt Tokenization: Prompt tokenization for MLX is enhanced by moving it into request handler goroutines [Source event].
  • Better MLX Thread Safety: Array management in MLX now has improved thread safety [Source event].
  • GLM4 MoE Lite Performance: GLM4 MoE Lite performance is improved with a fused sigmoid router head [Source event].
  • Fixed Model Picker: The macOS app’s model picker no longer shows stale models after switching chats [Source event].
  • Fixed Gemma 4 Structured Outputs: Structured outputs for Gemma 4 are fixed when think=false [Source event].
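To make the logprobs item above concrete: a logprob is simply the natural log of a token's probability. The following is a minimal, generic sketch of deriving logprobs from raw model logits with a numerically stable log-softmax; it is illustrative only and is not Ollama's MLX implementation (the toy logit values are made up):

```python
import math

def log_softmax(logits):
    """Convert raw logits to log-probabilities (logprobs), numerically stably."""
    m = max(logits)  # subtract the max before exp to avoid overflow
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

# Toy vocabulary of four tokens with made-up logits (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
logprobs = log_softmax(logits)

# Exponentiating each logprob recovers probabilities that sum to 1.
probs = [math.exp(lp) for lp in logprobs]
```

Tools that expose logprobs typically return values like these per generated token, which is useful for scoring, reranking, and confidence estimation.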

How does Ollama v0.21.1 work

Ollama v0.21.1 integrates new features and optimizations into its existing framework [Source event].

  1. Kimi CLI Launch: Users launch the Kimi CLI via the ollama launch kimi command, specifying a model like kimi-k2.6:cloud [Source event].
  2. Multi-Agent Execution: The Kimi CLI, powered by Kimi K2.6, utilizes a multi-agent system for complex, long-horizon tasks [Source event].
  3. MLX Runner Enhancements: The MLX runner processes models with improved sampling efficiency [Source event]. Fused top-P and top-K operations streamline the sampling process [Source event].
  4. Logprobs Calculation: Compatible models benefit from logprobs support within the MLX runner [Source event].
  5. Optimized Tokenization: Prompt tokenization is handled by dedicated request handler goroutines, improving efficiency [Source event].
  6. Thread-Safe Array Management: MLX ensures better thread safety for array operations, contributing to stability [Source event].
  7. GLM4 MoE Lite Optimization: A fused sigmoid router head boosts the performance of GLM4 MoE Lite models [Source event].
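The fused sampling idea in step 3 can be sketched as follows. This is a minimal illustration of applying top-K and top-P filtering in a single descending sort pass, not Ollama's actual MLX sampler; the function name, parameter values, and repeat-penalty omission are all simplifications:

```python
import math
import random

def fused_top_k_top_p(logits, k=3, top_p=0.9, rng=random.Random(0)):
    """Sample a token index, applying top-K and top-P cutoffs in one sort pass."""
    # Softmax over the logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # One descending sort; walk it once, stopping at K tokens or at cumulative
    # mass top_p, whichever comes first -- this is the "fused" part.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    candidates, cum = [], 0.0
    for i in order:
        candidates.append(i)
        cum += probs[i]
        if len(candidates) >= k or cum >= top_p:
            break

    # Renormalize over the surviving candidates and sample one of them.
    mass = sum(probs[i] for i in candidates)
    r, acc = rng.random() * mass, 0.0
    for i in candidates:
        acc += probs[i]
        if r <= acc:
            return i
    return candidates[-1]

token = fused_top_k_top_p([3.0, 2.5, 0.1, -2.0, -3.0])
```

A real sampler would also apply repeat penalties before filtering, as the release notes say Ollama's does; that step is omitted here for brevity.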

Benchmarks and evidence

Specific benchmark results for Ollama v0.21.1 are not yet disclosed.

Feature/Improvement | Claimed benefit | Evidence source
Kimi CLI with Kimi K2.6 | Excels at long-horizon agentic execution tasks through a multi-agent system. | [Source event]
Faster MLX sampling | Achieved with fused top-P and top-K in a single sort pass. | [Source event]
Improved MLX prompt tokenization | By moving tokenization into request handler goroutines. | [Source event]
GLM4 MoE Lite performance | Improved with a fused sigmoid router head. | [Source event]

Who should care

Builders

Builders should care about Ollama v0.21.1 for its enhanced MLX capabilities and the new Kimi CLI [Source event]. The improved MLX sampling and logprobs support offer more control and speed for model development [Source event]. The Kimi CLI provides a robust multi-agent system for complex automation tasks [Source event].

End users

End users of Ollama’s macOS app will benefit from the fixed model picker, ensuring a smoother experience [Source event]. Users interacting with Gemma 4 models will find structured outputs more reliable [Source event].

How to use Ollama v0.21.1 today

To use the Kimi CLI with Ollama v0.21.1, you can launch it via the command line [Source event].

Launch Kimi CLI:

ollama launch kimi --model kimi-k2.6:cloud

This command installs and runs the Kimi CLI, leveraging the kimi-k2.6:cloud model [Source event].

Risks, limits, and myths

  • Myth: Ollama v0.21.1 is a standalone game. Fact: Ollama v0.21.1 is an AI platform update, not related to games like “Simple Days v0.21.1” by Mega Lono [Source event, 1, 2, 3, 4].
  • Limit: Kimi CLI requires specific models. The Kimi CLI is demonstrated with kimi-k2.6:cloud for optimal agentic execution [Source event].
  • Risk: Compatibility with older models. While MLX improvements are noted, specific compatibility details for all older models are not explicitly stated [Source event].

FAQ

What is the release date of Ollama v0.21.1?
The release date of Ollama v0.21.1 has not been disclosed [Source event].
What is Kimi CLI in Ollama v0.21.1?
Kimi CLI is a command-line interface that can be installed and run through Ollama, excelling at long-horizon agentic execution tasks [Source event].
Which model does Kimi CLI use for agentic tasks?
The Kimi CLI with Kimi K2.6 excels at agentic execution tasks through a multi-agent system [Source event].
How does Ollama v0.21.1 improve MLX performance?
Ollama v0.21.1 improves MLX performance with faster sampling, logprobs support, and better prompt tokenization [Source event].
Are there any fixes for Gemma 4 in this release?
Yes, structured outputs for Gemma 4 are fixed when think=false in Ollama v0.21.1 [Source event].
What is the improvement for GLM4 MoE Lite?
GLM4 MoE Lite sees performance improvement with a fused sigmoid router head in Ollama v0.21.1 [Source event].
Does Ollama v0.21.1 fix issues in the macOS app?
Yes, the model picker in the macOS app no longer shows stale models after switching chats [Source event].
Is Ollama v0.21.1 related to any games?
No, Ollama v0.21.1 is an AI platform update and is not related to games like “Simple Days v0.21.1” [1, 2, 3, 4].

Glossary

Agentic Execution
The capability of an AI system to perform complex, multi-step tasks autonomously, often involving planning and decision-making [Source event].
MLX Runner
A component within Ollama responsible for executing machine learning models, particularly those optimized for Apple silicon [Source event].
Logprobs
Logarithmic probabilities, often used in language models to represent the likelihood of generated tokens [Source event].
Top-P Sampling
A text generation technique that samples from the smallest set of tokens whose cumulative probability exceeds a threshold P [Source event].
Top-K Sampling
A text generation technique that samples from the K most likely next tokens [Source event].
Goroutines
Lightweight, concurrently executing functions in the Go programming language, used for efficient task handling [Source event].
GLM4 MoE Lite
A specific variant of the GLM4 model architecture, likely a Mixture-of-Experts (MoE) model, optimized for efficiency [Source event].
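To make the Top-P and Top-K glossary entries concrete, here is a small comparison on a toy distribution (the probabilities are illustrative and not from any real model):

```python
def top_k_set(probs, k):
    """Indices of the K most probable tokens."""
    return set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])

def top_p_set(probs, p):
    """Smallest set of tokens, by descending probability, whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cum = set(), 0.0
    for i in order:
        chosen.add(i)
        cum += probs[i]
        if cum >= p:
            break
    return chosen

# A peaked toy distribution: top-K always keeps exactly K tokens,
# while top-P adapts the candidate set to the shape of the distribution.
probs = [0.70, 0.15, 0.08, 0.04, 0.03]
print(top_k_set(probs, 3))     # keeps 3 tokens: {0, 1, 2}
print(top_p_set(probs, 0.80))  # only 2 tokens needed to reach 80% mass: {0, 1}
```

Fusing the two, as this release does for MLX, means both cutoffs are applied during a single sorted walk instead of two separate filtering passes.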

Explore the Kimi CLI within Ollama to leverage its multi-agent capabilities for complex tasks.

Author

  • siego237

    Writes for FrontierWisdom on AI systems, automation, decentralized identity, and frontier infrastructure, with a focus on turning emerging technology into practical playbooks, implementation roadmaps, and monetization strategies for operators, builders, and consultants.

