DeepSeek R1 0528
DeepSeek‑R1‑0528 is an optimized upgrade of DeepSeek’s open‑source reasoning model, released on May 28, 2025, with notable improvements across math, coding, and logical reasoning benchmarks. The gains position it as a leader among open models and a credible challenger to closed systems such as OpenAI’s o3 and Google’s Gemini 2.5 Pro.
🧠 DeepSeek R1 0528 — Spec Overview
🔧 Architecture & Core Specs
- Architecture: Sparse Mixture-of-Experts (MoE) Transformer
- Total Parameters: ~685 billion
- Active Parameters per Inference: ~37 billion (8 routed experts plus 1 shared expert per token; see the routing sketch after this list)
- Layers: 61
- Hidden Size: 7168
- Feedforward Dimensions: 18,432 (dense layers) / 2,048 per routed expert (MoE layers)
- Attention Heads: 128
- Attention Type: Multi-head Latent Attention (MLA)
- Activation Function: SwiGLU
- Normalization: RMSNorm
- Positional Encoding: Rotary Positional Embeddings (RoPE)
- Precision: Trained with FP8 mixed precision; inference in FP8, BF16, or low-bit integer formats (INT8/INT4)
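The sparse-activation numbers above are easiest to see in code. Below is a minimal sketch of top-k expert routing with a softmax gate; the layer sizes, gate type, and expert count are illustrative toys, not the real DeepSeek configuration.

```python
import numpy as np

def moe_layer(x, router_w, experts, k=8):
    """Sketch of top-k MoE routing: each token only runs through k experts.

    x        : (hidden,) one token's hidden state
    router_w : (hidden, n_experts) router weights
    experts  : list of callables, one per expert FFN
    """
    logits = x @ router_w                      # router score per expert
    topk = np.argsort(logits)[-k:]             # indices of the k best experts
    gates = np.exp(logits[topk])
    gates /= gates.sum()                       # softmax over the selected experts
    # Only k expert FFNs execute; all other expert parameters stay idle,
    # which is why only ~37B of the ~685B parameters are active per token.
    return sum(g * experts[i](x) for g, i in zip(gates, topk))

# Toy demo: 16 tiny "experts", only 8 run per token.
rng = np.random.default_rng(0)
hidden, n_experts = 64, 16
experts = [(lambda W: (lambda v: np.tanh(v @ W)))(rng.normal(size=(hidden, hidden)))
           for _ in range(n_experts)]
router_w = rng.normal(size=(hidden, n_experts))
out = moe_layer(rng.normal(size=hidden), router_w, experts, k=8)
print(out.shape)  # (64,)
```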
📏 Context & Tokenization
- Max Context Length: 128,000 tokens
- Average Chain-of-Thought Depth: ~23,000 tokens per problem
- Tokenizer: Custom byte-level BPE (~128K vocabulary), optimized for multilingual and code efficiency
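To sanity-check prompt sizes against the 128K window, you can count tokens with the model’s own tokenizer. A minimal sketch using Hugging Face `transformers`; the repo id `deepseek-ai/DeepSeek-R1-0528` and the 8K output reserve are assumptions for illustration.

```python
from transformers import AutoTokenizer

# Assumed Hugging Face repo id for the released checkpoint.
MODEL_ID = "deepseek-ai/DeepSeek-R1-0528"
MAX_CONTEXT = 128_000  # tokens

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

def fits_in_context(prompt: str, reserve_for_output: int = 8_000) -> bool:
    """Return True if the prompt leaves room for the reply inside the window."""
    n_tokens = len(tokenizer.encode(prompt))
    print(f"Prompt uses {n_tokens} tokens")
    return n_tokens + reserve_for_output <= MAX_CONTEXT

print(fits_in_context("Prove that the square root of 2 is irrational."))
```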
📊 Benchmarks & Performance
| Benchmark | Score |
|---|---|
| AIME 2025 | 87.5% (up from 70% in earlier versions) |
| AIME 2024 | 91.4% |
| MMLU-Redux (EM) | 93.4% |
| HumanEval (code generation) | ~90% |
| LiveCodeBench | 73.3% |
| GPQA-Diamond | 81.0% |
| ARC Challenge | ~84–86% |
| Codeforces (Div. 1 equivalent) | ~1930 Elo rating |
🛠 Tooling & Inference
- Supported Output Formats: Text, JSON, function-calling structured output
- Works With: LLaMA.cpp, vLLM, SGLang, Text Gen Web UI, LM Studio, Ollama
- Quantization Options:
- FP8 / BF16 / INT8 / INT4
- GGUF & AWQ available
- Quantized Model Size: ~160–180 GB
- Full Model (native FP8 weights): ~700 GB
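Because the servers listed above (vLLM, SGLang, Ollama, etc.) expose OpenAI-compatible endpoints, a structured-output call looks the same regardless of backend. A minimal sketch follows; the local port, model name, and sampling settings are assumptions, not part of the spec above.

```python
from openai import OpenAI

# Assumption: a local OpenAI-compatible server (vLLM, SGLang, Ollama, ...)
# on port 8000, serving the model under this name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "Answer with a single JSON object."},
        {"role": "user", "content": "Give the prime factorization of 360 as JSON."},
    ],
    response_format={"type": "json_object"},  # structured (JSON) output mode
    temperature=0.6,
    max_tokens=2048,
)
print(response.choices[0].message.content)
```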
⚙️ Hardware Requirements
| Hardware | Performance |
|---|---|
| No GPU / CPU-only | ~1 token/sec with ~180 GB of system RAM |
| 1× RTX 4090 (24 GB) | ~3–5 tokens/sec (low-bit quantization with CPU offload) |
| M3 Ultra (Apple Silicon) | Real-time quantized inference under 200 W |
| Multi-GPU server (e.g., 8× H200, or multiple 8× H100 nodes) | Needed for full-precision (FP8/BF16) inference |
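The rows above follow roughly from parameter count × bytes per weight plus runtime overhead. A back-of-the-envelope sketch; the 15% overhead factor and the ~0.25 bytes/weight figure for low-bit GGUF builds are assumptions for illustration, not measured numbers.

```python
# Rough weight-memory estimate: parameters x bytes-per-weight, plus overhead.
PARAMS = 685e9  # total parameters (~685B)

BYTES_PER_WEIGHT = {
    "FP16/BF16": 2.0,
    "FP8":       1.0,
    "INT8":      1.0,
    "INT4":      0.5,
    "~2-bit GGUF": 0.25,  # dynamic low-bit quants used for single-GPU setups
}

OVERHEAD = 1.15  # assumed ~15% for KV cache, activations, runtime buffers

for fmt, bytes_per_w in BYTES_PER_WEIGHT.items():
    gb = PARAMS * bytes_per_w * OVERHEAD / 1e9
    print(f"{fmt:>12}: ~{gb:,.0f} GB")
```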
🧬 Training Info
- Training Tokens: ~14.8 trillion
- Training Method: Reinforcement learning (GRPO) with rule-based rewards, plus CoT-guided supervised fine-tuning (see the sketch after this list)
- Training Hardware: 2,048× H800 GPUs with MoE acceleration
- Estimated Training Cost: $5–6 million
- Trained By: DeepSeek (China-based lab)
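The group-relative part of GRPO is simple to show: several answers are sampled per prompt, scored with rule-based rewards, and each answer’s advantage is its reward normalized against its own group. A minimal sketch of just that advantage step; this is a simplification, since the full objective also includes the clipped policy-gradient and KL terms.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sample's reward by the
    mean and standard deviation of its own group (one group = one prompt)."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: 6 sampled answers to one math problem, reward 1.0 if the
# final answer is verifiably correct, else 0.0 (rule-based reward).
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(grpo_advantages(rewards))
# Correct answers get positive advantage, incorrect ones negative;
# the policy gradient then pushes probability mass toward the former.
```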
✅ Strengths
- Top-tier performance in math, logic, and code
- Sparse activation = efficiency with scale
- Long-form CoT + 128K token context
- MIT license (commercial use allowed)
- Ideal for structured reasoning, RAG, and data agents
⚠️ Limitations
- Still hallucinates in edge cases
- Larger quantized models require high RAM/GPU VRAM
- Verbose by default (can be mitigated with prompt engineering or variants)
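For the verbosity point, the usual mitigation is a terse system prompt plus a hard token cap. A minimal sketch, reusing the assumed local OpenAI-compatible server and model name from the inference example above.

```python
from openai import OpenAI

# Same assumptions as before: local OpenAI-compatible server, assumed model name.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system",
         "content": "Think as long as you need, but keep the final answer "
                    "under five sentences. No preamble, no recap."},
        {"role": "user", "content": "Why does binary search require sorted input?"},
    ],
    max_tokens=4096,   # hard cap on reasoning + answer length
    temperature=0.6,
)
print(response.choices[0].message.content)
```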
🧠 TL;DR Summary
- One of the strongest open models in existence today
- 685B-parameter MoE, with only ~37B active per token
- 128K-token context; excels at math, code, and logic
- Runs on a 24 GB GPU with low-bit quantization
- Open-source (MIT license) and free to use