What is it?
DeepSeek-R1-Distill-Qwen-1.5B is the smallest member of DeepSeek’s R1 distill family, released in January 2025 alongside the full DeepSeek-R1 model. The distill starts from Qwen-2.5-Math-1.5B as its base model and fine-tunes it on roughly 800,000 samples, most of them chain-of-thought reasoning traces, generated by the much larger 671 B-parameter R1 teacher. The result is a 1.5 B-parameter model that explicitly reasons step by step on math, code, and logic tasks, at a fraction of R1’s cost and on far more accessible hardware.
Core specs at a glance
(See the spec card above, which is populated from structured data.)
What devices can run it?
At Q4 quantization the 1.5 B variant is roughly a 1 GB download and runs on most recent consumer hardware: Pixel 7 and newer, iPhone 14 and newer, Snapdragon Copilot+ PCs, modern Intel/AMD laptops on CPU alone, and Apple silicon Macs. On CPU alone, expect 5-10 tokens per second, which is slow but workable for testing and lightweight tasks. On Apple silicon laptops or modest GPUs you’ll see 50-60 tok/s. Snapdragon NPUs with ONNX optimization deliver a time-to-first-token under 70 ms for short prompts.
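The "roughly 1 GB" figure can be sanity-checked with back-of-the-envelope arithmetic: weight-file size is approximately parameter count times bits per weight. The sketch below is an estimate, not a measurement. It assumes the nominal 1.5 B parameter count from the model name and an effective 4.5 bits per weight for a Q4-style format (an assumption: 4-bit schemes usually store extra scale metadata, and real files also carry tokenizer data and some higher-precision tensors).

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in gigabytes (10^9 bytes)."""
    if n_params < 0 or bits_per_weight <= 0:
        raise ValueError("n_params must be >= 0 and bits_per_weight > 0")
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter count taken from the model name (assumption: the
# exact count, including embeddings, differs somewhat).
N_PARAMS = 1.5e9

# Bits per weight for each format. The 4.5 figure for Q4 is an assumed
# effective rate, not a value taken from any specific file.
FORMATS = {"FP16": 16.0, "Q8": 8.0, "Q4 (assumed 4.5 bits)": 4.5}

for label, bits in FORMATS.items():
    print(f"{label}: ~{quantized_size_gb(N_PARAMS, bits):.2f} GB")
```

Under these assumptions the Q4 estimate lands a little under 1 GB, consistent with the download size quoted above, while the unquantized FP16 weights would be about 3 GB.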
Strengths and limitations
Strengths. Genuine chain-of-thought reasoning in a 1.5 B-parameter footprint, which is rare among on-device peers. Permissive licensing: DeepSeek publishes the distilled weights under the MIT License, and the Qwen-2.5 base they derive from is Apache 2.0. Tiny enough to run alongside other models on the same device. Strong on structured math and code; slots into reasoning-augmented agents without prompt-engineering tricks.
Limitations. Quality is bounded by parameter count. An AIME 2024 pass@1 of 28.9% versus the full R1’s roughly 80% is a meaningful gap, so don’t expect frontier-grade results. Text-only: no vision, no audio. Less fluent than equivalent-size general-purpose chat models on open-ended tasks. Latency increases with reasoning depth, because chain-of-thought output is verbose by design.
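To make the latency limitation concrete, the sketch below converts output length into wall-clock decode time using the throughput ranges quoted earlier in this article (5-10 tok/s on CPU, 50-60 tok/s on Apple silicon or a modest GPU). The output lengths are hypothetical, chosen only to contrast a short direct answer with longer reasoning traces, and the model ignores prompt processing and time-to-first-token.

```python
def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to decode n_tokens at a steady throughput (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


# Throughput ranges quoted in this article, in tokens per second.
THROUGHPUT = {
    "CPU only": (5, 10),
    "Apple silicon / modest GPU": (50, 60),
}

# Hypothetical output lengths: a short direct answer versus
# increasingly long chain-of-thought traces.
OUTPUT_TOKENS = (100, 1_000, 4_000)

for device, (slow, fast) in THROUGHPUT.items():
    for n in OUTPUT_TOKENS:
        best = generation_seconds(n, fast)
        worst = generation_seconds(n, slow)
        print(f"{device}: {n} tokens takes ~{best:.0f}-{worst:.0f} s")
```

With these numbers, a 1,000-token reasoning trace takes roughly 100 to 200 seconds on CPU alone but under half a minute on the faster hardware, which is why capping the output length matters most on low-end devices.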
When to choose it (and when not to)
Choose R1 Distill 1.5B if: your workload is reasoning-dominant (math homework helpers, code assistants, logical agents); you need to ship to low-end hardware (4 GB RAM laptops, mid-range phones); you want explicit chain-of-thought output for transparency; or a permissive open-weight license matters.
Skip it if: your workload is open-ended chat (Gemma 4 or Qwen 3.5 are better generalists); you need multimodality (Gemma 4, Phi-4-multimodal, MiniCPM-V); you need frontier reasoning quality (full DeepSeek-R1 in the cloud, or wait for the next distill generation).
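If explicit chain-of-thought output is the reason you pick this model, you will usually want to separate the reasoning trace from the final answer before showing either to a user. R1-family models delimit their reasoning with <think> and </think> tags. The following is a minimal sketch of that split: the names ReasonedOutput and split_reasoning are hypothetical helpers, not part of any library, and the handling of a missing opening tag assumes a runtime whose chat template has already emitted <think> as part of the prompt.

```python
from typing import NamedTuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


class ReasonedOutput(NamedTuple):
    """Hypothetical container: the reasoning trace and the final answer."""
    reasoning: str
    answer: str


def split_reasoning(text: str) -> ReasonedOutput:
    """Split raw model output into (reasoning, answer) around the think tags."""
    head, sep, tail = text.partition(THINK_CLOSE)
    if not sep:
        stripped = text.strip()
        if stripped.startswith(THINK_OPEN):
            # Generation stopped mid-reasoning: there is no answer yet.
            return ReasonedOutput(stripped[len(THINK_OPEN):].strip(), "")
        # No tags at all: treat the whole text as the answer.
        return ReasonedOutput("", stripped)
    reasoning = head.strip()
    if reasoning.startswith(THINK_OPEN):
        # Drop the opening tag when the runtime includes it in the output.
        reasoning = reasoning[len(THINK_OPEN):].strip()
    return ReasonedOutput(reasoning, tail.strip())


raw = "<think>\n12 * 12 = 144, minus 44 leaves 100.\n</think>\n\nThe result is 100."
result = split_reasoning(raw)
print("Reasoning:", result.reasoning)
print("Answer:", result.answer)
```

Keeping the two parts separate lets an app show only the answer by default and expose the reasoning on demand, and it makes truncated generations (reasoning with no answer) easy to detect.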
How it compares to similar on-device models
The closest peers are Qwen 3.5 2B (general-purpose, multilingual, multimodal, 262K context) and Ministral 3B (general-purpose, Apache 2.0, image-capable). R1 Distill differs by being explicitly reasoning-tuned at a smaller size. For a full side-by-side comparison, see the leaderboard.
In a real Cove app
Cove Voice uses Gemma 4 to summarize voice notes. That is general-purpose, chat-style summarization, where Gemma’s broader fluency wins. R1 Distill 1.5B would be the model to pick for reasoning-heavy add-ons: extracting action items with explicit logic, structured task decomposition, or math-related transcripts. We’ve prototyped it as a future Cove Voice mode for power users who want chain-of-thought summaries instead of bullet lists.