On-Device AI Leaderboard 2026: Best Mobile LLMs Compared

Last reviewed: May 2026
| Model | Vendor | Params | Size (quantized) | Context (tokens) | Modality | License | Min RAM |
|---|---|---|---|---|---|---|---|
| Gemma 4 E2B | Google DeepMind | 2.3B | 1.5 GB | 128,000 | text+vision+audio | apache-2.0 | 4 GB |
| Microsoft Phi-4 multimodal | Microsoft Research | 5.6B | 3.5 GB | 128,000 | text+vision+audio | mit | 6 GB |
| Apple Foundation Models | Apple | 3B | Not disclosed | Not disclosed | text+vision | apple-proprietary | 8 GB |
| Llama 3.2 Mobile | Meta AI | 3B | 2 GB | 128,000 | text | llama-community | 6 GB |
| Qwen 3.5 2B | Alibaba Cloud | 2B | 1.5 GB | 262,000 | text+vision | apache-2.0 | 4 GB |
| Ministral 3B | Mistral AI | 3B | 2 GB | 32,768 | text+vision | apache-2.0 | 4 GB |
| DeepSeek R1 Distill (Qwen 1.5B) | DeepSeek | 1.5B | 1 GB | 32,768 | text | apache-2.0 | 4 GB |
| MiniCPM-V 4.0 | ModelBest / OpenBMB | 4.1B | 2.5 GB | 32,768 | text+vision | modelbest-terms | 4 GB |

Methodology

How we built this leaderboard. All 8 models are evaluated against the same dimensions: parameters, quantized size, context window, modality, license, and minimum device RAM. The values are sourced from official model cards (Hugging Face, vendor blogs, official documentation) as of the last-reviewed date shown above. We do not run our own benchmarks; instead, we cross-reference 2-3 authoritative sources per data point and, where the vendor's own claim conflicts with a third-party reproduction, we prefer the vendor's figure. Real-world numbers may differ from those listed here by 10-20% in either direction, depending on quantization scheme (Q4_K_M, AWQ, and GPTQ all behave differently), runtime (LiteRT, MediaPipe, ExecuTorch, llama.cpp, Core ML), and device thermal throttling. Each model card carries its own `lastReviewed` field; this page is refreshed every quarter. Conflicts and ambiguities are tracked in our open GitHub repo.
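To make the tolerance above concrete, here is a minimal sketch that turns a listed quantized size into a plausible range. It assumes the 10-20% divergence applies to on-disk size, which the methodology does not state explicitly. The function name `size_band` and its default tolerance are illustrative, not part of any tool we publish.

```python
def size_band(listed_gb: float, tolerance: float = 0.20) -> tuple[float, float]:
    """Return (low, high) bounds in GB around a listed quantized size.

    `tolerance` is a fraction, e.g. 0.20 for the upper end of the
    10-20% divergence mentioned in the methodology (an assumption about
    how that figure applies to size).
    """
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    low = round(listed_gb * (1 - tolerance), 2)
    high = round(listed_gb * (1 + tolerance), 2)
    return (low, high)


# Gemma 4 E2B is listed at 1.5 GB; at 20% tolerance, plan for roughly 1.2-1.8 GB.
print(size_band(1.5))
# A 2 GB model at the tighter 10% tolerance.
print(size_band(2.0, 0.10))
```

When budgeting storage or memory, the upper bound is usually the safer planning number.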

Pick a model by use case

Translation

  • Gemma 4 E2B: Multimodal text+vision+audio in 1.5 GB; the most balanced general-purpose pick
  • Apple Foundation Models: Native to iOS 26, so there is nothing to install for Apple users
  • Qwen 3.5 2B: 262K context for long documents; strong on Chinese and multilingual text

Vision & photo

  • MiniCPM-V 4.0: Specialized in vision tasks; a 4.1B model that punches above its weight
  • Gemma 4 E2B: Native vision and audio in just 1.5 GB; needs only 4 GB of RAM, so it runs on most flagship phones
  • Microsoft Phi-4 multimodal: 5.6B multimodal model; strongest reasoning when paired with vision
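The use-case picks above can also be derived mechanically from the leaderboard columns. The following is a rough sketch, not a tool we ship: it copies the modality, size, and minimum-RAM values from the table and filters them by the modalities a task needs and the RAM a device has. The `Model` class and the `shortlist` function are hypothetical names introduced for illustration, and sorting by smallest size first is one possible ranking, not the editorial ranking used in the lists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    size_gb: Optional[float]  # None where the vendor does not disclose it
    modalities: frozenset
    min_ram_gb: int


# Values copied from the leaderboard table above.
MODELS = [
    Model("Gemma 4 E2B", 1.5, frozenset({"text", "vision", "audio"}), 4),
    Model("Microsoft Phi-4 multimodal", 3.5, frozenset({"text", "vision", "audio"}), 6),
    Model("Apple Foundation Models", None, frozenset({"text", "vision"}), 8),
    Model("Llama 3.2 Mobile", 2.0, frozenset({"text"}), 6),
    Model("Qwen 3.5 2B", 1.5, frozenset({"text", "vision"}), 4),
    Model("Ministral 3B", 2.0, frozenset({"text", "vision"}), 4),
    Model("DeepSeek R1 Distill (Qwen 1.5B)", 1.0, frozenset({"text"}), 4),
    Model("MiniCPM-V 4.0", 2.5, frozenset({"text", "vision"}), 4),
]


def shortlist(required, device_ram_gb):
    """Models that support every required modality and fit the device RAM.

    Results are ordered smallest first; undisclosed sizes sort last.
    """
    needed = frozenset(required)
    fits = [
        m for m in MODELS
        if needed <= m.modalities and m.min_ram_gb <= device_ram_gb
    ]
    return sorted(fits, key=lambda m: (m.size_gb is None, m.size_gb or 0.0, m.name))


# Vision-capable models for a phone with 4 GB of RAM.
for model in shortlist({"vision"}, device_ram_gb=4):
    print(model.name, model.size_gb)
```

Raising `device_ram_gb` to 6 adds Microsoft Phi-4 multimodal to the vision shortlist, and 8 adds Apple Foundation Models. Platform constraints such as Apple's models being tied to iOS are not captured by this filter.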

Pick a model by device