What is it?
Ministral 3B is the smallest member of Mistral AI’s Ministral 3 family, released in December 2025. The Ministral 3 line ships dense models at 3 B, 8 B, and 14 B parameters — all under Apache 2.0, all with optional image understanding. Unlike the larger Mistral Small 4 (a 119 B MoE for servers), Ministral was designed from the ground up for edge deployment: phones, lightweight laptops, IoT hardware. The 3 B variant trades some raw capability for the ability to run almost anywhere with a CPU and 4 GB of RAM.
Core specs at a glance
(See the spec card above.)
What devices can run it?
The 3 B variant at Q4 quantization takes roughly 2 GB of storage and needs about 2–4 GB of free RAM at runtime. That covers the Pixel 8 and newer, the iPhone 15 Pro and newer, most Android phones with 4 GB+ of RAM released since 2023, and virtually any consumer laptop, including older Intel/AMD machines and Apple silicon Macs. Mistral specifically optimized for CPU-only inference, so devices without a dedicated NPU still see usable throughput: 10–20 tok/s on a modern laptop CPU.
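As a rough sanity check on the storage figure, a quantized model's weights take about parameter count × bits per weight ÷ 8 bytes. The sketch below assumes a nominal 3.0e9 parameters, about 4.5 effective bits per weight for a Q4_0-style format (4-bit values plus a 16-bit scale per block of 32 weights), about 8.5 for a Q8_0-style format, and a hypothetical 1 GB allowance for KV cache, activations, and runtime. None of these are official Mistral numbers, so treat the result as a ballpark.

```python
# Back-of-envelope memory estimate for a quantized model.
# Assumptions (not official Mistral figures): a nominal 3.0e9 parameters,
# ~4.5 effective bits per weight for a Q4_0-style format (4-bit values
# plus one 16-bit scale per block of 32), and a hypothetical flat
# allowance for KV cache, activations, and runtime overhead.

GB = 1e9  # decimal gigabytes, matching the units used in the text


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return num_params * bits_per_weight / 8


def estimate_ram_gb(num_params: float, bits_per_weight: float,
                    overhead_gb: float) -> float:
    """Weights plus an assumed fixed allowance for everything else, in GB."""
    return weight_bytes(num_params, bits_per_weight) / GB + overhead_gb


if __name__ == "__main__":
    params = 3.0e9  # nominal "3 B"; the exact count may differ
    for label, bpw in [("Q4_0-style", 4.5), ("Q8_0-style", 8.5), ("FP16", 16.0)]:
        print(f"{label:>10}: weights ≈ {weight_bytes(params, bpw) / GB:.2f} GB")
    # The 1 GB overhead is a placeholder; longer contexts grow the KV cache.
    print(f"Q4 total ≈ {estimate_ram_gb(params, 4.5, 1.0):.2f} GB")
```

Under these assumptions the Q4 weights come to about 1.7 GB and the total to about 2.7 GB, in the same range as the figures above.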
Strengths and limitations
Strengths. Strong CPU-only performance: many on-device peers assume NPU offload, while Ministral runs well even on older hardware. Its Apache 2.0 license matches Gemma 4 and Qwen, which keeps licensing simple. It was trained to “generate fewer unnecessary tokens,” which in practice means faster, cheaper responses. Image understanding is a free upgrade over text-only peers like Llama 3.2 Mobile or DeepSeek-R1 Distill.
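To make the “fewer tokens” benefit concrete: on-device decoding emits one token at a time, so generation time scales roughly with output length. This minimal sketch divides token count by throughput, using the 10–20 tok/s laptop range from the device section. The 300- and 120-token answer lengths are made-up illustrations, and prompt processing (prefill) time is ignored.

```python
# Rough decode-time estimate: generation is sequential, so wall-clock time
# grows about linearly with the number of output tokens.
# Assumptions: constant throughput, prefill time ignored, and illustrative
# (not measured) answer lengths.


def decode_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate output_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for rate in (10.0, 20.0):  # laptop-CPU range quoted in the text
        verbose = decode_seconds(300, rate)  # hypothetical chatty answer
        terse = decode_seconds(120, rate)    # hypothetical focused answer
        print(f"{rate:>4.0f} tok/s: 300 tokens ≈ {verbose:.1f}s, "
              f"120 tokens ≈ {terse:.1f}s, saved ≈ {verbose - terse:.1f}s")
```

At 10 tok/s the terse answer arrives in 12 s instead of 30 s, which is why output length matters as much as raw speed on CPU-only devices.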
Limitations. No audio modality (Gemma 4 and Phi-4-multimodal both offer it). Its 32 K context is a quarter of Gemma 4’s 128 K and roughly an eighth of Qwen 3.5’s 262 K, so long-document workloads should pick a different model. Vision capability is solid but less specialized than MiniCPM-V 4.0.
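Before routing a long document to Ministral 3B, it helps to estimate whether it fits at all. The sketch below is a heuristic, not a tokenizer: it assumes about 4 characters per token (a common rule of thumb for English text), reads “K” as 1,000 tokens, and reserves a hypothetical 1,000 tokens for the model's answer. Use the model's own tokenizer for an exact count near the limit.

```python
import math

# Rough check of whether a document fits in a model's context window.
# Assumptions: ~4 characters per token (rule of thumb, not a real
# tokenizer) and "K" read as 1,000 tokens; actual limits may be powers
# of two such as 32,768.

CHARS_PER_TOKEN = 4.0

CONTEXT_WINDOWS = {  # limits as stated in this article
    "Ministral 3B": 32_000,
    "Gemma 4": 128_000,
    "Qwen 3.5": 262_000,
}


def estimated_tokens(text: str) -> int:
    """Heuristic token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits(text: str, model: str, reserve_for_output: int = 1_000) -> bool:
    """True if the estimated prompt plus reserved output tokens fits the window."""
    return estimated_tokens(text) + reserve_for_output <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    document = "x" * 200_000  # stand-in for a ~200,000-character report
    for name in CONTEXT_WINDOWS:
        print(f"{name}: ~{estimated_tokens(document):,} tokens, "
              f"fits={fits(document, name)}")
```

With these assumptions the 200,000-character example comes to about 50,000 tokens, which overflows the 32 K window but fits the other two.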
When to choose it (and when not to)
Choose Ministral 3B if: you need a balanced text+vision model that runs on a wide range of hardware, especially CPU-only laptops; you want Apache 2.0 license simplicity; your workload favors short, focused outputs (classification, routing, summarization of documents or voice-note transcripts); your latency budget is tight.
Skip it if: you need long-context support (Gemma 4 at 128 K or Qwen 3.5 at 262 K are better fits); you need audio input (Gemma 4 or Phi-4-multimodal); you need state-of-the-art vision performance (MiniCPM-V 4.0 outperforms it on pure vision tasks).
How it compares to similar on-device models
Its closest peers are Microsoft Phi-4-multimodal (larger and more capable, MIT-licensed, and adds audio) and Gemma 4 E2B (smaller, also Apache 2.0, with longer context and audio). Ministral 3B’s distinguishing traits are excellent CPU-only performance and a focus on terse, efficient outputs, whereas Phi and Gemma both implicitly target NPU-equipped flagships. For a side-by-side comparison, see the leaderboard.
In a real Cove app
Cove Voice uses Gemma 4 to summarize voice notes. Ministral 3B would be a strong alternative in this exact niche: it’s tuned for terse outputs, runs on more diverse hardware (Cove ships to many older laptops via the desktop builds), and Apache 2.0 keeps licensing simple. Because it has no audio input, it would need a separate speech-to-text step first. We picked Gemma 4 because we needed the same model for image understanding in Cove Photo, but for a voice-only Cove app, Ministral 3B would be on the shortlist.