DeepSeek R1 0528
DeepSeek‑R1‑0528 is an optimized upgrade of DeepSeek’s open‑source reasoning model, released on May 28, 2025, with notable improvements across math, coding, and logical reasoning benchmarks. The gains position it as a leader among open models and a credible challenger to closed systems such as OpenAI’s o3 and Google’s Gemini 2.5 Pro.
🧠 DeepSeek R1 0528 — Spec Overview
🔧 Architecture & Core Specs
- Architecture: Sparse Mixture-of-Experts (MoE) Transformer
- Total Parameters: ~685 billion
- Active Parameters per Inference: ~37 billion (8 routed experts plus 1 shared expert per token; see the routing sketch after this list)
- Layers: 61
- Hidden Size: 7168
- Feedforward Dimensions: 18,432 (dense layers) / 2,048 per routed expert (MoE layers)
- Attention Heads: 128
- Attention Type: Multi-head Latent Attention (MLA)
- Activation Function: SwiGLU
- Normalization: RMSNorm
- Positional Encoding: Rotary Positional Embeddings (RoPE)
- Precision: Trained with FP8 mixed precision; inference in FP8, BF16, or low-bit integer formats (INT8/INT4)
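The sparse-activation numbers above are easiest to see in code. Below is a minimal sketch of top-k expert routing with a softmax gate; the layer sizes, gate type, and expert count are illustrative toys, not the real DeepSeek configuration.

```python
import numpy as np

def moe_layer(x, router_w, experts, k=8):
    """Sketch of top-k MoE routing: each token only runs through k experts.

    x        : (hidden,) one token's hidden state
    router_w : (hidden, n_experts) router weights
    experts  : list of callables, one per expert FFN
    """
    logits = x @ router_w                      # router score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                       # softmax over the selected experts
    # Only k expert FFNs execute; all other expert parameters stay idle,
    # which is why only ~37B of the ~685B parameters are active per token.
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

# Toy demo: 16 tiny "experts", only 8 run per token.
rng = np.random.default_rng(0)
hidden, n_experts = 64, 16
experts = [(lambda W: (lambda v: np.tanh(v @ W)))(rng.normal(size=(hidden, hidden)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(hidden, n_experts))
out = moe_layer(rng.normal(size=hidden), router_w, experts, k=8)
print(out.shape)  # (64,)
```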
📏 Context & Tokenization
- Max Context Length: 128,000 tokens
- Average Chain-of-Thought Depth: ~23,000 tokens per problem
- Tokenizer: Custom byte-level BPE (~128K vocabulary), optimized for multilingual and code efficiency
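To sanity-check prompt sizes against the 128K window, you can count tokens with the model’s own tokenizer. A minimal sketch using Hugging Face `transformers`; the repo id `deepseek-ai/DeepSeek-R1-0528` and the 8K output reserve are assumptions for illustration.

```python
from transformers import AutoTokenizer

# Assumed Hugging Face repo id for the released checkpoint.
MODEL_ID = "deepseek-ai/DeepSeek-R1-0528"
MAX_CONTEXT = 128_000  # tokens

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

def fits_in_context(prompt: str, reserve_for_output: int = 8_000) -> bool:
    """Return True if the prompt leaves room for the reply inside the window."""
    n_tokens = len(tokenizer.encode(prompt))
    print(f"Prompt uses {n_tokens} tokens")
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_in_context("Prove that the square root of 2 is irrational."))
```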
📊 Benchmarks & Performance
| Benchmark | Score |
|---|---|
| AIME 2025 | 87.5% (up from 70% in earlier versions) |
| AIME 2024 | 91.4% |
| MMLU-Redux (EM) | 93.4% |
| HumanEval (code generation) | ~90% |
| LiveCodeBench | 73.3% |
| GPQA-Diamond | 81.0% |
| ARC Challenge | ~84–86% |
| Codeforces (Div. 1 equivalent) | ~1930 Elo rating |
🛠 Tooling & Inference
- Supported Output Formats: Text, JSON, function-calling structured output
- Works With: LLaMA.cpp, vLLM, SGLang, Text Gen Web UI, LM Studio, Ollama
- Quantization Options:
- FP8 / BF16 / INT8 / INT4
- GGUF & AWQ available
- Quantized Model Size: ~160–180 GB
- Full Model (native FP8 weights): ~700 GB
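Because the servers listed above (vLLM, SGLang, Ollama, etc.) expose OpenAI-compatible endpoints, a structured-output call looks the same regardless of backend. A minimal sketch follows; the local port, model name, and sampling settings are assumptions, not part of the spec above.

```python
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (vLLM, SGLang, Ollama, ...)
# on port 8000, serving the model under this name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "Answer with a single JSON object."},
        {"role": "user", "content": "Give the prime factorization of 360 as JSON."},
    ],
    response_format={"type": "json_object"},  # structured (JSON) output mode
    temperature=0.6,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```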
⚙️ Hardware Requirements
| Hardware | Performance |
|---|---|
| No GPU / CPU-only | ~1 token/sec with ~180 GB of system RAM |
| 1× RTX 4090 (24 GB) | ~3–5 tokens/sec (low-bit quantization with CPU offload) |
| M3 Ultra (Apple Silicon) | Real-time quantized inference under 200 W |
| Multi-GPU server (e.g., 8× H200, or multiple 8× H100 nodes) | Needed for full-precision (FP8/BF16) inference |
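The rows above follow roughly from parameter count × bytes per weight plus runtime overhead. A back-of-the-envelope sketch; the 15% overhead factor and the ~0.25 bytes/weight figure for low-bit GGUF builds are assumptions for illustration, not measured numbers.

```python
# Rough weight-memory estimate: parameters x bytes-per-weight, plus overhead.
PARAMS = 685e9  # total parameters (~685B)

BYTES_PER_WEIGHT = {
    "FP16/BF16": 2.0,
    "FP8":       1.0,
    "INT8":      1.0,
    "INT4":      0.5,
    "~2-bit GGUF": 0.25,  # dynamic low-bit quants used for single-GPU setups
}

OVERHEAD = 1.15  # assumed ~15% for KV cache, activations, runtime buffers

for fmt, bytes_per_w in BYTES_PER_WEIGHT.items():
    gb = PARAMS * bytes_per_w * OVERHEAD / 1e9
    print(f"{fmt:>12}: ~{gb:,.0f} GB")
```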
🧬 Training Info
- Training Tokens: ~14.8 trillion
- Training Method: Reinforcement learning (GRPO) with rule-based rewards, plus CoT-guided supervised fine-tuning (see the sketch after this list)
- Training Hardware: 2,048× H800 GPUs with MoE acceleration
- Estimated Training Cost: $5–6 million
- Trained By: DeepSeek (China-based lab)
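The group-relative part of GRPO is simple to show: several answers are sampled per prompt, scored with rule-based rewards, and each answer’s advantage is its reward normalized against its own group. A minimal sketch of just that advantage step; this is a simplification, since the full objective also includes the clipped policy-gradient and KL terms.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sample's reward by the
    mean and standard deviation of its own group (one group = one prompt)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: 6 sampled answers to one math problem, reward 1.0 if the
# final answer is verifiably correct, else 0.0 (rule-based reward).
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(grpo_advantages(rewards))
# Correct answers get positive advantage, incorrect ones negative;
# the policy gradient then pushes probability mass toward the former.
```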
✅ Strengths
- Top-tier performance in math, logic, and code
- Sparse activation = efficiency with scale
- Long-form CoT + 128K token context
- MIT license (commercial use allowed)
- Ideal for structured reasoning, RAG, and data agents
⚠️ Limitations
- Still hallucinates in edge cases
- Larger quantized models require high RAM/GPU VRAM
- Verbose by default (can be mitigated with prompt engineering or variants)
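For the verbosity point, the usual mitigation is a terse system prompt plus a hard token cap. A minimal sketch, reusing the assumed local OpenAI-compatible server and model name from the inference example above.

```python
from openai import OpenAI

# Same assumptions as before: local OpenAI-compatible server, assumed model name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system",
         "content": "Think as long as you need, but keep the final answer "
                    "under five sentences. No preamble, no recap."},
        {"role": "user", "content": "Why does binary search require sorted input?"},
    ],
    max_tokens=4096,   # hard cap on reasoning + answer length
    temperature=0.6,
)
print(response.choices[0].message.content)
```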
🧠 TL;DR Summary
- One of the strongest open models in existence today
- 685B-parameter MoE, with only ~37B active per token
- 128K-token context; excels at math, code, and logic
- Runs on a 24 GB GPU with low-bit quantization
- Open-source (MIT license) and free to use