DeepSeek R1 Distill (Qwen 1.5B): Reasoning on 4 GB Phones

Parameters	1.5 B
Size (quantized)	1 GB
Context length	32,768 tokens
Modality	text
License	apache-2.0
Min RAM	4 GB
Version	DeepSeek-R1-Distill-Qwen-1.5B
Released	2025-01

What is it?

DeepSeek-R1-Distill-Qwen-1.5B is the smallest member of DeepSeek’s R1 distill family, released in January 2025 alongside the full DeepSeek-R1 model. The distill takes Qwen-2.5-Math-1.5B as its base model and fine-tunes it on 800,000 chain-of-thought reasoning samples generated by the much larger 671 B-parameter R1 teacher. The result is a 1.5 B-parameter model that explicitly reasons step-by-step on math, code, and logic tasks, at a fraction of R1’s cost and on hardware as modest as a 4 GB phone or laptop.

Core specs at a glance

See the spec card at the top of this page.

What devices can run it?

The 1.5 B variant at Q4 quantization is roughly a 1 GB download and runs on most recent hardware with 4 GB or more of RAM: Pixel 7 and newer, iPhone 14 and newer, Snapdragon Copilot+ PCs, modern Intel/AMD laptops on CPU alone, and Apple silicon Macs. On a CPU alone, expect 5-10 tokens per second, which is slow but functional for testing and lightweight tasks. On Apple silicon laptops or modest GPUs, expect 50-60 tok/s. Snapdragon NPUs with ONNX optimization deliver time-to-first-token under 70 ms for short prompts.
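To see where the "roughly 1 GB" figure comes from, here is a back-of-the-envelope sketch. The 4.5 effective bits per weight and the 2 GB reserved for the OS, app, and KV cache are assumptions for illustration, not measured values; real Q4 files vary by format and by how embeddings are stored.

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB (10^9 bytes) for a quantized model."""
    return n_params * bits_per_weight / 8 / 1e9


def fits_in_ram(model_gb: float, device_ram_gb: float, reserved_gb: float = 2.0) -> bool:
    """True if the weights fit after reserving memory for everything else.

    reserved_gb is a hypothetical budget for the OS, the app, and the KV cache.
    """
    return model_gb <= device_ram_gb - reserved_gb


# 1.5 B parameters from the spec card; 4.5 bits/weight is an assumed Q4 average.
size = quantized_size_gb(1.5e9, 4.5)
print(f"{size:.2f} GB")                      # prints "0.84 GB"
print(fits_in_ram(size, device_ram_gb=4.0))  # prints "True"
```

The estimate lands a little under the 1 GB download size, which is consistent with file metadata and with some tensors being kept at higher precision.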

Strengths and limitations

Strengths. Genuine chain-of-thought reasoning at a 1.5 B parameter footprint, which is rare among on-device models of this size. Apache 2.0 license inherited from the Qwen-2.5 base. Small enough to run alongside other models on the same device. Strong on structured math and code, and fits naturally into reasoning-augmented agents without prompt-engineering tricks.

Limitations. Quality is bounded by parameter count. AIME 2024 pass@1 of 28.9% versus the full R1’s roughly 80% is a meaningful gap — don’t expect frontier-grade results. Text-only — no vision, no audio. Less fluent than equivalent-size general-purpose chat models on open-ended tasks. Latency increases with reasoning depth: chain-of-thought is verbose by design.
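The latency point is easy to quantify. The sketch below assumes a simple model: response time is time-to-first-token plus total generated tokens divided by throughput. The token counts are hypothetical; the throughput figures are the ones quoted in this article.

```python
def response_seconds(
    reasoning_tokens: int,
    answer_tokens: int,
    tokens_per_second: float,
    ttft_seconds: float = 0.0,
) -> float:
    """Estimated wall-clock time for one reply, including the reasoning trace."""
    total_tokens = reasoning_tokens + answer_tokens
    return ttft_seconds + total_tokens / tokens_per_second


# Hypothetical reply: 800 reasoning tokens followed by a 100-token answer.
print(response_seconds(800, 100, tokens_per_second=5))   # prints "180.0" (slow CPU)
print(response_seconds(800, 100, tokens_per_second=50))  # prints "18.0" (Apple silicon)
print(response_seconds(0, 100, tokens_per_second=5))     # prints "20.0" (no reasoning trace)
```

Under these assumptions the reasoning trace, not the answer, dominates the wait, so capping the maximum generated tokens is the main lever on slow hardware.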

When to choose it (and when not to)

Choose R1 Distill 1.5B if: your workload is reasoning-dominant (math homework helpers, code assistants, logical agents); you need to ship to low-end hardware (4 GB RAM laptops, mid-range phones); you want explicit chain-of-thought output for transparency; Apache 2.0 license matters.

Skip it if: your workload is open-ended chat (Gemma 4 or Qwen 3.5 are better generalists); you need multimodality (Gemma 4, Phi-4-multimodal, MiniCPM-V); you need frontier reasoning quality (full DeepSeek-R1 in the cloud, or wait for the next distill generation).
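If you pick the model for its explicit chain-of-thought output, you will usually want to show the reasoning and the final answer separately. This is a minimal sketch, assuming the model wraps its reasoning in `<think>` and `</think>` tags as R1-style models commonly do. Some chat templates put the opening tag in the prompt, so the code only requires the closing tag. The function name `split_reasoning` is our own, not part of any library.

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split a raw completion into (reasoning, answer).

    Assumes reasoning ends at the first "</think>" tag. If the tag is
    missing, the whole output is treated as the answer.
    """
    head, sep, tail = output.partition("</think>")
    if not sep:
        return "", output.strip()
    # Drop the opening tag if the model emitted it.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>\n2 + 2 is 4.\n</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(reasoning)  # prints "2 + 2 is 4."
print(answer)     # prints "The answer is 4."
```

Check the tag format against the chat template of the runtime you actually ship before relying on it.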

How it compares to similar on-device models

The closest peers are Qwen 3.5 2B (general-purpose, multilingual, multimodal, 262K context) and Ministral 3B (general-purpose, also Apache 2.0, image-capable). R1 Distill differs by being explicitly reasoning-tuned at a smaller size. For a full side-by-side comparison, see the leaderboard.

In a real Cove app

Cove Voice uses Gemma 4 to summarize voice notes — that’s general-purpose chat-style summarization, where Gemma’s broader fluency wins. R1 Distill 1.5B would be the model to pick for reasoning-heavy add-ons: extracting action items with explicit logic, structured task decomposition, or math-related transcripts. We’ve prototyped it as a future Cove Voice mode for power users who want chain-of-thought summaries instead of bullet lists.

FAQ

What does 'distilled' mean here?

DeepSeek created the 1.5B variant by fine-tuning the Qwen-2.5-Math-1.5B base model on 800,000 reasoning samples generated by the full 671B-parameter DeepSeek-R1 teacher. The student inherits R1's chain-of-thought reasoning style but runs at a fraction of the cost, and on dramatically smaller hardware.

Is it as smart as the full DeepSeek R1?

No. On AIME 2024, the 1.5B distill scores 28.9% pass@1 (52.7% with consensus@64), versus roughly 80% pass@1 for the full R1. On MATH-500, it scores 83.9% versus R1's 97.3%. It's still genuinely doing chain-of-thought reasoning, but quality is bounded by the small parameter count. Use it where you want reasoning patterns, not parity with frontier models.
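The gap between pass@1 and consensus@64 is easier to see with a toy example. This sketch assumes pass@1 is estimated as the mean correctness of sampled answers, and that consensus means a majority vote over those samples. The sample answers are made up for illustration and are not benchmark data.

```python
from collections import Counter


def pass_at_1(samples: list[str], reference: str) -> float:
    """Fraction of sampled answers that match the reference."""
    return sum(s == reference for s in samples) / len(samples)


def consensus_correct(samples: list[str], reference: str) -> bool:
    """True if the most common sampled answer matches the reference."""
    top_answer, _count = Counter(samples).most_common(1)[0]
    return top_answer == reference


# Eight hypothetical samples for one problem whose correct answer is "204".
samples = ["204", "204", "17", "204", "96", "33", "204", "58"]
print(pass_at_1(samples, "204"))          # prints "0.5"
print(consensus_correct(samples, "204"))  # prints "True"
```

Wrong answers tend to scatter while correct ones agree, so voting can recover a problem that a single sample gets right only half the time. The cost is generating many samples per question, which matters on a phone.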

What devices can run it?

Almost anything with 4 GB+ RAM: Pixel 7 and newer, iPhone 14 and newer, Snapdragon Copilot+ PCs, and any modern laptop CPU. Throughput ranges from 5-10 tok/s on plain CPUs up to 50-60 tok/s on Apple silicon laptops. Time-to-first-token is under 70 ms on Snapdragon NPUs for short prompts.

Is the license really Apache 2.0?

For this specific distill, yes. The Qwen-distilled variants inherit Qwen-2.5's Apache 2.0 base license. Note that the full DeepSeek-R1 weights themselves are MIT, and Llama-distilled variants follow the Llama Community License — the licensing depends on which base model was distilled.

Why pick this over Llama 3.2 1B or Gemma 4?

Pick R1 Distill when reasoning is the dominant axis: math, code, logic puzzles. Llama 3.2 1B and Gemma 4 are stronger general-purpose chat models. R1 Distill explicitly trades general fluency for chain-of-thought capability per parameter, which is unique among 1.5B-class on-device options.
