Ministral 3B: Mistral's Smallest Dense Mobile LLM

3B parameters, 32K context, image understanding, Apache 2.0 — Ministral 3B is Mistral AI's smallest dense model, built for phones, IoT, and edge hardware.

Last reviewed: May 2026
Parameters: 3 B
Size (quantized): 2 GB
Context length: 32,768 tokens
Modality: text + vision
License: Apache 2.0
Min RAM: 4 GB
Version: Ministral 3B
Released: 2025-12

What is it?

Ministral 3B is the smallest member of Mistral AI’s Ministral 3 family, released in December 2025. The Ministral 3 line ships dense models at 3 B, 8 B, and 14 B parameters — all under Apache 2.0, all with optional image understanding. Unlike the larger Mistral Small 4 (a 119 B MoE for servers), Ministral was designed from the ground up for edge deployment: phones, lightweight laptops, IoT hardware. The 3 B variant trades some raw capability for the ability to run almost anywhere with a CPU and 4 GB of RAM.

Core specs at a glance

(See the spec card above.)

What devices can run it?

At Q4 quantization, the 3 B variant occupies roughly 2 GB of storage and needs about 2-4 GB of free RAM at run time. That covers Pixel 8 and newer, iPhone 15 Pro and newer, most Android phones with 4 GB+ RAM released since 2023, and consumer laptops with at least 4 GB of RAM, including older Intel/AMD CPUs and Apple silicon. Mistral specifically optimized for CPU-only inference, so devices without dedicated NPUs still see usable token throughput (10-20 tok/s on a modern laptop CPU).
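As a rough sanity check on the figures above, the arithmetic can be sketched in a few lines of Python. The function names are hypothetical, and the 4.5 bits-per-weight value is an assumed illustrative average for 4-bit quantization once scale metadata is included, not a number published by Mistral. The 150-token response length is likewise just an example.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady decode rate."""
    return n_tokens / tokens_per_second


# Assumption: 3e9 parameters at ~4.5 effective bits per weight.
size_gb = quantized_size_gb(3e9, 4.5)
print(f"Estimated weight size: {size_gb:.2f} GB")  # 1.69 GB, in line with "roughly 2 GB"

# Assumption: a 150-token summary at the article's 10-20 tok/s range.
for rate in (10, 20):
    print(f"{rate} tok/s -> {generation_seconds(150, rate):.1f} s")
```

The estimate covers weights only. Runtime memory also includes the KV cache and activations, which is one reason the RAM requirement is higher than the file size.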

Strengths and limitations

Strengths. Strong CPU-only performance: many on-device peers assume NPU offload, while Ministral runs well even on older hardware. The Apache 2.0 license matches Gemma 4 and Qwen for contract simplicity. The model is trained to “generate fewer unnecessary tokens”; the practical benefit is faster, cheaper responses. Image understanding is a free upgrade over text-only peers like Llama 3.2 Mobile or DeepSeek-R1 Distill.

Limitations. No audio modality (Gemma 4 and Phi-4-multimodal both offer it). 32 K context is a quarter of Gemma 4’s 128 K and an eighth of Qwen 3.5’s 262 K, so long-document workloads should pick a different model. Vision capability is solid but not specialized like MiniCPM-V 4.0.
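To make the context gap concrete, here is a minimal sketch that checks which of the models named above can hold a given prompt. The window sizes are the rounded figures quoted in this article, not exact limits, and `models_that_fit` and the 1,024-token output reserve are hypothetical choices for illustration.

```python
# Context windows as quoted in this article (rounded, not exact limits).
CONTEXT_TOKENS = {
    "Ministral 3B": 32_768,
    "Gemma 4": 128_000,
    "Qwen 3.5": 262_000,
}


def models_that_fit(prompt_tokens: int, output_reserve: int = 1_024) -> list[str]:
    """Return models whose window holds the prompt plus a reserved output budget."""
    needed = prompt_tokens + output_reserve
    return [name for name, window in CONTEXT_TOKENS.items() if window >= needed]


print(models_that_fit(8_000))    # ['Ministral 3B', 'Gemma 4', 'Qwen 3.5']
print(models_that_fit(60_000))   # ['Gemma 4', 'Qwen 3.5']
print(models_that_fit(200_000))  # ['Qwen 3.5']

# Relative window sizes compared with Ministral 3B.
base = CONTEXT_TOKENS["Ministral 3B"]
for name, window in CONTEXT_TOKENS.items():
    print(f"{name}: {window / base:.1f}x")
```

Token counts depend on each model's tokenizer, so the same document will not produce identical counts across these models. Treat the check as a first-pass filter.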

When to choose it (and when not to)

Choose Ministral 3B if: you need a balanced text+vision model that runs on a wide range of hardware, especially CPU-only laptops; you want Apache 2.0 license simplicity; your workload favors short, focused outputs (classification, routing, summarization, voice notes); your latency budget is tight.

Skip it if: you need long-context support (Gemma 4 at 128 K or Qwen 3.5 at 262 K are better); you need audio (Gemma 4 or Phi-4-multimodal); you need state-of-the-art vision benchmarks (MiniCPM-V 4.0 outperforms in pure vision tasks).
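The rules of thumb above can be written down as a small lookup. This is only a sketch: `suggest_models` and the requirement keys are hypothetical names, and the alternatives are simply the ones this article names for each unmet need.

```python
# Alternatives this article names for each need Ministral 3B does not cover.
ALTERNATIVES = {
    "audio": {"Gemma 4", "Phi-4-multimodal"},
    "long_context": {"Gemma 4", "Qwen 3.5"},
    "top_vision": {"MiniCPM-V 4.0"},
}


def suggest_models(*needs: str) -> set[str]:
    """Return the models that satisfy every listed need.

    With no special needs, Ministral 3B is the suggestion. An empty set
    means no single model named in the article covers all the needs.
    Unknown keys raise KeyError.
    """
    candidate_sets = [ALTERNATIVES[need] for need in needs]
    if not candidate_sets:
        return {"Ministral 3B"}
    return set.intersection(*candidate_sets)


print(suggest_models())                         # {'Ministral 3B'}
print(suggest_models("audio", "long_context"))  # {'Gemma 4'}
print(suggest_models("audio", "top_vision"))    # set()
```

Intersecting the sets, rather than checking needs in a fixed order, keeps the result independent of which requirement is listed first.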

How it compares to similar on-device models

Closest peers are Microsoft Phi-4-multimodal (larger, more powerful, MIT, also adds audio) and Gemma 4 E2B (smaller, also Apache 2.0, longer context, also has audio). Ministral 3B’s distinguishing trait is excellent CPU-only performance and a focus on terse, efficient outputs — Phi and Gemma both implicitly target NPU-equipped flagships. For a side-by-side, see the leaderboard.

In a real Cove app

Cove Voice uses Gemma 4 to summarize voice notes. Ministral 3B would be a strong alternative in this exact niche — it’s tuned for terse outputs, runs on more diverse hardware (Cove ships to many older laptops via the desktop builds), and Apache 2.0 simplifies licensing. We picked Gemma 4 because we needed the same model for image understanding in Cove Photo, but for a Cove app that was voice-only, Ministral 3B would be on the shortlist.


FAQ

Is Ministral 3B the same as Mistral Small?

No. Mistral Small 4 (released March 2026) is a 119B-parameter MoE model targeting servers and large workstations. Ministral 3B is a separate, much smaller dense model for phones, edge, and IoT. The naming is confusing because Mistral repurposed 'Small' for the server tier.

What devices can run Ministral 3B?

Pixel 8 and newer, iPhone 15 Pro and newer, most modern Android phones with 4 GB+ RAM, and consumer laptops including Apple silicon. The 3B variant runs comfortably on CPUs alone for many use cases, and it is particularly suited to lightweight classification and routing tasks where startup speed matters.

Does Ministral 3B support images?

Yes. All models in the Ministral 3 family (3B / 8B / 14B) ship with image understanding capabilities. The 3B variant trades some image accuracy for a smaller footprint, but it is genuinely multimodal, unlike text-only models such as Llama 3.2 Mobile or DeepSeek-R1 Distill.

What's the license?

Apache 2.0 — Mistral has consistently shipped open weights under Apache 2.0 across the Ministral 3 family, including all base, instruct, and reasoning variants. This makes it one of the most contract-friendly mobile LLMs alongside Gemma 4 and Qwen 3.5.

How does it compare to Phi-4-multimodal or Gemma 4?

Ministral 3B is smaller than Phi-4-multimodal (3B vs 5.6B) and comparable in size to Gemma 4 E2B (2.3B effective). It runs on more modest hardware than Phi but lacks Gemma's audio modality. Pick Ministral if you want a balanced text+vision dense model with predictable latency and broad device reach.

Citations