What is it?
Llama 3.2 Mobile is Meta’s purpose-built family of small on-device language models, released in September 2024. Meta shipped the 1 B and 3 B variants as text-only models, distinct from the same generation’s 11 B and 90 B vision models, which target larger hardware. The mobile pair was created by structured pruning of Llama 3.1 8 B combined with knowledge distillation: logits from the Llama 3.1 8 B and 70 B models were used as token-level targets during pretraining, which lets the small models retain a surprising amount of reasoning capability for their size. As of mid-2026, these remain Meta’s primary mobile offering; there is no direct Llama 4 successor in this size class yet.
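The token-level distillation described above can be sketched as follows. This is a minimal, generic version in the style of Hinton-style knowledge distillation, not Meta’s training code: the exact loss, temperature and mixing weight Meta used are not stated here, so `temperature=2.0` and `alpha=0.5` are illustrative assumptions, and the function names are hypothetical. At each position the teacher’s logits are softened with a temperature and used as a soft target through KL divergence, then blended with ordinary next-token cross-entropy on the true token. It runs in pure Python over toy vocabularies; real training does the same arithmetic with tensors over the full vocabulary.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def token_distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    target_id: int,
    temperature: float = 2.0,  # illustrative value, not Meta's
    alpha: float = 0.5,        # illustrative weight on the teacher (soft) term
) -> float:
    """Blend of soft (teacher) and hard (ground-truth) loss at one token position."""
    student_soft = log_softmax(student_logits, temperature)
    teacher_soft = log_softmax(teacher_logits, temperature)
    # KL(teacher || student): the teacher's distribution is the token-level target.
    kl = sum(math.exp(t) * (t - s) for t, s in zip(teacher_soft, student_soft))
    # Multiply by T^2 so the soft term's gradients keep a comparable scale as T changes.
    soft_loss = kl * temperature ** 2
    # Ordinary next-token cross-entropy against the true token, at T = 1.
    hard_loss = -log_softmax(student_logits)[target_id]
    return alpha * soft_loss + (1.0 - alpha) * hard_loss


def sequence_distillation_loss(student_rows, teacher_rows, target_ids, **kwargs) -> float:
    """Mean per-token loss over a sequence; each row holds one position's logits."""
    losses = [
        token_distillation_loss(s, t, y, **kwargs)
        for s, t, y in zip(student_rows, teacher_rows, target_ids)
    ]
    return sum(losses) / len(losses)


# Toy usage: a 3-token vocabulary and a 2-position sequence.
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
teacher = [[2.5, 0.0, -2.0], [0.0, 0.0, 4.0]]
print(sequence_distillation_loss(student, teacher, [0, 2]))
```

The soft term carries information the hard label lacks: how plausible the teacher finds every other token, which is the signal the paragraph credits for the small models’ reasoning quality.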
Core specs at a glance
(See spec card above.)
What devices can run it?
The 3 B variant runs comfortably on Pixel 8 and newer, iPhone 15 Pro and newer, and Android phones with a Snapdragon 8 Gen 3 or newer. Meta partnered with Qualcomm and MediaTek for day-one launch optimization, and both models are optimized for Arm CPUs. Like the rest of the Llama 3 family, they use Grouped-Query Attention (GQA), in which several query heads share each key/value head; this shrinks the KV cache, which matters on memory-constrained phones. The 1 B variant relaxes the hardware requirements significantly: it runs on phones from 2022 onwards with at least 4 GB of RAM. Both variants are deployable via llama.cpp, MLC, or PyTorch ExecuTorch.
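The RAM figures above are easier to reason about with a back-of-the-envelope estimate. The sketch below computes weight memory and KV-cache memory and shows what GQA saves. The architecture numbers (layer count, head counts, head size, parameter counts) are assumptions taken from the published Hugging Face configs and model card, so check them against the checkpoint you actually ship; `ModelShape`, `kv_cache_bytes` and `weight_bytes` are hypothetical helper names. The estimate ignores activations, quantization-scale overhead and runtime buffers, so treat it as a lower bound.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    """Architecture numbers that drive memory use (assumed from the public configs)."""
    name: str
    params: float   # approximate total parameter count
    layers: int     # num_hidden_layers
    q_heads: int    # num_attention_heads
    kv_heads: int   # num_key_value_heads: query heads share these under GQA
    head_dim: int


LLAMA_32_1B = ModelShape("Llama 3.2 1B", 1.23e9, layers=16, q_heads=32, kv_heads=8, head_dim=64)
LLAMA_32_3B = ModelShape("Llama 3.2 3B", 3.21e9, layers=28, q_heads=24, kv_heads=8, head_dim=128)


def kv_cache_bytes(m: ModelShape, tokens: int, bytes_per_elem: int = 2, gqa: bool = True) -> int:
    """Bytes to cache keys and values for `tokens` positions (fp16 by default)."""
    heads = m.kv_heads if gqa else m.q_heads  # gqa=False models plain multi-head attention
    # Per layer and per token: one K and one V vector for each cached head.
    return 2 * m.layers * heads * m.head_dim * bytes_per_elem * tokens


def weight_bytes(m: ModelShape, bits_per_weight: float) -> float:
    """Bytes for the weights alone at a given quantization width."""
    return m.params * bits_per_weight / 8


for model in (LLAMA_32_1B, LLAMA_32_3B):
    print(f"{model.name}: weights at 4 bits ~ {weight_bytes(model, 4) / GIB:.2f} GiB")
    for ctx in (4_096, 131_072):
        gqa = kv_cache_bytes(model, ctx) / GIB
        mha = kv_cache_bytes(model, ctx, gqa=False) / GIB
        print(f"  KV cache @ {ctx:>7,} tokens: {gqa:6.2f} GiB with GQA vs {mha:6.2f} GiB without")
```

Under these assumptions the 3 B model’s fp16 cache comes to 14 GiB at the full 131,072-token window (42 GiB without GQA, because each of the 24 query heads would need its own K/V), and the 1 B model’s comes to 4 GiB. The 128 K window is therefore a model capability that a 6 GB phone cannot use in full at fp16; on-device apps have to cap the context well below it or shrink the cache further, for example by quantizing it.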
Strengths and limitations
Strengths. Large 128 K context, the same as Gemma 4 E2B and ahead of most other on-device peers. Strong reasoning for its parameter count thanks to distillation from the Llama 3.1 8 B and 70 B teachers. Mature ecosystem: llama.cpp, LM Studio, Ollama, MLC, and a large catalog of community fine-tunes. Day-one optimization for Qualcomm and MediaTek mobile chips. Open weights, easy to fine-tune.
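Limitations. Text-only: no images, no audio. The Llama Community License requires services with more than 700 M monthly active users to request a separate license from Meta, which complicates contracts with mega-services. No clear successor in the Llama 4 generation. Multilingual quality varies: Meta officially supports eight languages (English, German, French, Italian, Portuguese, Hindi, Spanish and Thai), and English and the major European languages are strongest.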
When to choose it (and when not to)
Choose Llama 3.2 Mobile if: your workload is text-only (chat, summarization, classification, RAG); you want the broadest open-source ecosystem and tooling; you need an open-weights mobile model with 128 K context; your target devices have at least 6 GB of RAM (3 B) or 4 GB (1 B).
Skip it if: your workload includes images or audio (Gemma 4, Phi-4-multimodal, or MiniCPM-V are better); you operate a service with more than 700 M MAU and need permissive licensing without negotiation (Apache 2.0 alternatives like Gemma 4, Qwen 3.5, or Mistral fit better); you want a small model specialized for step-by-step reasoning (DeepSeek-R1 Distill 1.5 B is a better fit, though at 1.5 B it is larger than Llama 3.2 1 B).
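The checklist above can be turned into a simple install-time gate that decides which variant, if any, an app should download. This is a sketch under stated assumptions: the RAM floors and the 700 M MAU threshold come from this section, the license check is a simplified reading of the clause, and `AppProfile` and `pick_llama_32_variant` are hypothetical names. A production gate would also check the chipset and free memory at runtime, not just installed RAM.

```python
from dataclasses import dataclass
from typing import Optional

# Simplified reading of the Llama Community License: services above this many
# monthly active users must request a separate license from Meta.
MAU_LICENSE_THRESHOLD = 700_000_000


@dataclass(frozen=True)
class AppProfile:
    """Hypothetical description of a target app and device."""
    device_ram_gb: float
    needs_images: bool = False
    needs_audio: bool = False
    monthly_active_users: int = 0


def pick_llama_32_variant(app: AppProfile) -> Optional[str]:
    """Return "3B", "1B", or None when the checklist says to skip Llama 3.2."""
    if app.needs_images or app.needs_audio:
        return None  # text-only model: pick a multimodal alternative instead
    if app.monthly_active_users > MAU_LICENSE_THRESHOLD:
        return None  # would need a negotiated license from Meta
    if app.device_ram_gb >= 6:
        return "3B"
    if app.device_ram_gb >= 4:
        return "1B"
    return None  # below the stated RAM floor for either variant


print(pick_llama_32_variant(AppProfile(device_ram_gb=8)))
print(pick_llama_32_variant(AppProfile(device_ram_gb=4)))
print(pick_llama_32_variant(AppProfile(device_ram_gb=12, needs_images=True)))
```

Checking modality and licensing before RAM mirrors the section’s ordering: those two conditions rule the model out regardless of hardware, while RAM only decides between the two sizes.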
How it compares to similar on-device models
The closest peers are Gemma 4 E2B (smaller, multimodal, Apache 2.0) and Ministral 3B (similar size, also multimodal, Apache 2.0). Llama 3.2 wins on ecosystem maturity and matches Gemma 4 E2B’s 128 K context, but loses on modality and license simplicity. For a side-by-side comparison, see the leaderboard.
In a real Cove app
Cove Voice uses Gemma 4 today for summarizing voice notes, a workload that fits Llama 3.2’s text-only profile equally well. We chose Gemma 4 because we need the same model to handle photo Q&A in Cove Photo, and Llama 3.2 Mobile cannot process images. If a future Cove app were text-only (say, a journaling assistant), Llama 3.2 3 B would be a strong alternative, particularly for users who want the full ecosystem of tooling and Hugging Face fine-tunes.