On-Device AI Leaderboard 2026: Best Mobile LLMs Compared

Last reviewed: May 2026

Model	Vendor	Params	Quantized size	Context (tokens)	Modality	License	Min RAM	In Cove?
Gemma 4 E2B	Google DeepMind	2.3B	1.5 GB	128,000	text+vision+audio	apache-2.0	4 GB	✓
Microsoft Phi-4 multimodal	Microsoft Research	5.6B	3.5 GB	128,000	text+vision+audio	mit	6 GB	✓
Apple Foundation Models	Apple	3B	Not disclosed	Not disclosed	text+vision	apple-proprietary	8 GB	✓
Llama 3.2 Mobile	Meta AI	3B	2 GB	128,000	text	llama-community	6 GB	✓
Qwen 3.5 2B	Alibaba Cloud	2B	1.5 GB	262,000	text+vision	apache-2.0	4 GB	✓
Ministral 3B	Mistral AI	3B	2 GB	32,768	text+vision	apache-2.0	4 GB	✓
DeepSeek R1 Distill (Qwen 1.5B)	DeepSeek	1.5B	1 GB	32,768	text	apache-2.0	4 GB	✓
MiniCPM-V 4.0	ModelBest / OpenBMB	4.1B	2.5 GB	32,768	text+vision	modelbest-terms	4 GB	✓

Methodology

How we built this leaderboard. All 8 models are evaluated on the same dimensions (parameters, quantized size, context window, modality, license, and minimum device RAM), sourced from official model cards (Hugging Face, vendor blogs, official documentation) as of the last-reviewed date shown above. We do not run our own benchmarks; instead, we cross-reference 2-3 authoritative sources per data point and, where a vendor's claim conflicts with a third-party reproduction, prefer the vendor's figure. Real-world numbers may differ from those listed here by roughly 10-20% depending on quantization scheme (Q4_K_M, AWQ, and GPTQ all behave differently), runtime (LiteRT, MediaPipe, ExecuTorch, llama.cpp, Core ML), and device thermal throttling. Each model card carries its own `lastReviewed` field; this page is refreshed every quarter. Conflicts and ambiguities are tracked in our open GitHub repo.
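The relationship between parameter count and quantized size can be approximated with simple arithmetic. The following is a minimal sketch, assuming the weights dominate the file size and ignoring metadata, tensors kept at higher precision, and runtime buffers. The function name `estimate_weights_gb` and the 4.0 bits-per-weight figure are illustrative assumptions, not values taken from any model card.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight-file size in decimal GB: parameters x bits per weight / 8."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts come from the table above. 4.0 bits/weight is an assumed
# nominal 4-bit scheme, not a claim about any specific published file.
for name, params in [("Gemma 4 E2B", 2.3), ("Microsoft Phi-4 multimodal", 5.6)]:
    size = estimate_weights_gb(params, 4.0)
    print(f"{name}: ~{size:.2f} GB at 4 bits/weight")
```

The listed sizes (1.5 GB and 3.5 GB) are higher than this nominal 4-bit estimate. One plausible reason is that real quantized files often keep some tensors at higher precision, so treat the estimate as a lower bound rather than a prediction.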

Pick a model by use case

Translation

Gemma 4 E2B: multimodal text+vision+audio in 1.5 GB; the most balanced general-purpose pick
Apple Foundation Models: native to iOS 26, so zero install for Apple users
Qwen 3.5 2B: 262K context for long documents; strong on Chinese and multilingual text

Voice & notes

DeepSeek R1 Distill (Qwen 1.5B): 1.5B reasoning specialist that runs on phones with 4 GB of RAM
Ministral 3B: solid all-rounder for daily note tasks
Microsoft Phi-4 multimodal: handles voice, text, and images in one model

Vision & photo

MiniCPM-V 4.0: specialized in vision tasks; a 4.1B model that punches above its weight
Gemma 4 E2B: native vision and audio in just 1.5 GB; runs on most flagship phones
Microsoft Phi-4 multimodal: 5.6B multimodal model; strongest reasoning when paired with vision

Pick a model by device

Flagship (8GB+ RAM)

Run the largest mobile-optimized models comfortably

Mid-range (6GB RAM)

Sweet spot for size/capability balance

Older devices (4GB RAM)

Smaller models that still deliver real value
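
The three device tiers map directly onto the Min RAM column of the leaderboard table. The following is a minimal sketch of that lookup, with the data copied from the table and a hypothetical helper name (`models_for_ram`). It compares only against the listed minimum and does not account for memory already used by the OS or other apps.

```python
# (model, minimum device RAM in GB), copied from the leaderboard table.
MIN_RAM_GB = [
    ("Gemma 4 E2B", 4),
    ("Microsoft Phi-4 multimodal", 6),
    ("Apple Foundation Models", 8),
    ("Llama 3.2 Mobile", 6),
    ("Qwen 3.5 2B", 4),
    ("Ministral 3B", 4),
    ("DeepSeek R1 Distill (Qwen 1.5B)", 4),
    ("MiniCPM-V 4.0", 4),
]


def models_for_ram(device_ram_gb: float) -> list[str]:
    """Return models whose listed minimum RAM fits within the device's RAM."""
    return [name for name, min_ram in MIN_RAM_GB if min_ram <= device_ram_gb]


# One line per tier: older devices (4 GB), mid-range (6 GB), flagship (8 GB+).
for tier in (4, 6, 8):
    print(f"{tier} GB: {', '.join(models_for_ram(tier))}")
```

By the listed minimums, five of the eight models fit the 4 GB tier, seven fit the 6 GB tier, and all eight fit an 8 GB device.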
