What is it?
Qwen 3.5 2B is Alibaba Cloud’s mobile-first member of the Qwen 3.5 Small Series, released March 1, 2026. The series ships in four sizes — 0.8 B, 2 B, 4 B, and 9 B — and unlike most “smaller-than-flagship” model families, Qwen 3.5 Small was designed from scratch for on-device deployment rather than distilled from a larger sibling. The 2 B variant is the sweet spot for phones: small enough to run on 4 GB-RAM mid-range devices, large enough to deliver useful reasoning and multilingual capability.
Core specs at a glance
(See the spec card above.)
What devices can run it?
At Q4 quantization the 2 B variant occupies roughly 1.5 GB of storage and needs about 2-3 GB of free RAM at runtime once context is included. That puts the floor at most modern Android phones with 4 GB+ RAM and any iPhone from the 15 Pro generation onwards. On flagship hardware (Pixel 8 Pro, iPhone 17 Pro, Galaxy S24 Ultra) it generates 30-50 tokens per second (tok/s). Mid-range phones reach 15-25 tok/s, which is still very usable for chat-style interactions.
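The figures above can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate, not a measurement: the function names are hypothetical, and the 4.5 effective bits per weight and the 200-token reply length are assumptions chosen for illustration. It counts weights only, so real Q4 files typically come out larger than the raw estimate (quantized formats usually keep some tensors at higher precision and add metadata).

```python
# Back-of-envelope sizing for a quantized on-device model.
# Every constant passed in below is an illustrative assumption.

def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


def reply_seconds(reply_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a reply, ignoring prompt processing."""
    return reply_tokens / tokens_per_second


# Weights only: a lower bound on the on-disk size.
print(f"2 B at 4.0 bits/weight: {weight_size_gb(2, 4.0):.2f} GB")
print(f"2 B at 4.5 bits/weight: {weight_size_gb(2, 4.5):.2f} GB")

# Assumed 200-token reply at the throughput ranges quoted in the text.
for label, low, high in [("flagship", 30, 50), ("mid-range", 15, 25)]:
    fastest = reply_seconds(200, high)
    slowest = reply_seconds(200, low)
    print(f"{label}: {fastest:.1f}-{slowest:.1f} s per 200-token reply")
```

Under these assumptions, a 200-token reply takes a few seconds on flagship hardware and roughly twice as long on mid-range phones, which is why the lower throughput range remains workable for chat but may feel slow for longer generations.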
Strengths and limitations
Strengths. Industry-leading 262 K context window for an on-device model, about twice Gemma 4 E2B’s 128 K. Native support for 200+ languages, with particular strength in Chinese, Japanese, Korean, and English. The hybrid Gated Delta + sparse Mixture-of-Experts architecture delivers strong performance per active parameter. The Apache 2.0 license simplifies enterprise contracts.
Limitations. No native audio modality (Gemma 4 and Phi-4-multimodal both add audio). Marginally weaker than Gemma 4 on pure English benchmarks. Vision capability is solid but trails MiniCPM-V 4.0, which was trained specifically for multimodal work. Because the MoE architecture routes each token to a different set of experts, per-token latency varies, which can complicate real-time applications.
When to choose it (and when not to)
Choose Qwen 3.5 2B if: your users include significant Chinese, Japanese, or Korean speakers; you need very long context (legal documents, codebases, full chat histories); you target broad device coverage including 4 GB-RAM phones; you want Apache 2.0 license simplicity.
Skip it if: you need on-device audio (consider Gemma 4 or Phi-4-multimodal); your workload is vision-heavy and benchmark accuracy matters most (MiniCPM-V 4.0 is specialized for vision); you need predictable latency for real-time use cases (dense models such as Llama 3.2 3B have a more uniform per-token cost).
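The two lists above can be read as a simple decision rule. The sketch below encodes them as one; the `Requirements` fields and the `suggest_model` function are hypothetical names, and checking the "skip" conditions before the "choose" conditions is an assumption about priority that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical checklist mirroring the criteria in the text."""
    needs_audio: bool = False
    vision_heavy: bool = False
    needs_predictable_latency: bool = False
    serves_cjk_users: bool = False
    needs_long_context: bool = False
    targets_4gb_phones: bool = False
    wants_apache_license: bool = False


def suggest_model(req: Requirements) -> str:
    # Assumed priority: disqualifiers first, in the order the text lists them.
    if req.needs_audio:
        return "Gemma 4 or Phi-4-multimodal"
    if req.vision_heavy:
        return "MiniCPM-V 4.0"
    if req.needs_predictable_latency:
        return "Llama 3.2 3B"
    # Any one of the "choose" criteria is treated as sufficient.
    if (req.serves_cjk_users or req.needs_long_context
            or req.targets_4gb_phones or req.wants_apache_license):
        return "Qwen 3.5 2B"
    return "no strong preference"


print(suggest_model(Requirements(serves_cjk_users=True, needs_long_context=True)))
print(suggest_model(Requirements(needs_audio=True, serves_cjk_users=True)))
```

In practice these criteria are trade-offs, not hard gates. A team with both audio and CJK requirements, for example, would need to weigh them against each other and not simply take the first match.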
How it compares to similar on-device models
Closest peers are Gemma 4 E2B (smaller; multimodal across text, vision, and audio; Apache 2.0; 128 K context) and MiniCPM-V 4.0 (larger at 4 B parameters, specialized for vision). Qwen wins on context length and multilingual reach; Gemma wins on audio; MiniCPM wins on vision tasks. For a side-by-side, see the leaderboard.
In a real Cove app
Cove Travel handles dozens of language pairs offline using Gemma 4. For Mandarin, Cantonese, Japanese, and Korean translation specifically, Qwen 3.5 2B would be a stronger base: its training data is weighted more heavily toward East Asian languages than that of any other current open-weight on-device model. If a future Cove release ships a “Cove China” variant tuned for the domestic market, Qwen 3.5 2B would be our starting point.