Sorcerer LM 8×22B

Sorcerer LM 8×22B is a fine-tuned beast built on top of WizardLM-2’s massive MoE engine, with a razor-sharp focus on roleplay, storytelling, and expressive dialogue. Think of it as the GPT-4 of fantasy writing—stylized, immersive, and emotionally fluent.

🧠 Model Summary

  • Name: Sorcerer LM 8×22B
  • Architecture: Mixture of Experts (MoE)
  • Base Model: WizardLM-2 8×22B
  • Total Parameters: ~141 billion
  • Active Experts per Token: 2 out of 8 (each ~22B)
  • Effective Parameters per Inference: ~44 billion
  • Model Type: Decoder-only transformer
  • Fine-tune Method: LoRA (Low-Rank Adaptation)
  • LoRA Config: r=16, α=32, 16-bit adapters (see the config sketch after this list)
  • Training Epochs: 2
  • Specialization: Roleplay, creative writing, story immersion
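
For concreteness, here is a minimal sketch of what an r=16, α=32 LoRA setup could look like with Hugging Face PEFT. The checkpoint ID, dropout value, and target modules are illustrative assumptions, not confirmed details of the actual fine-tune.

```python
# Hedged sketch: attaching an r=16, alpha=32 LoRA adapter to a Mixtral-style
# MoE base with Hugging Face PEFT. Checkpoint ID, dropout, and target_modules
# are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

BASE = "alpindale/WizardLM-2-8x22B"  # assumed base checkpoint name

lora_cfg = LoraConfig(
    r=16,                 # low-rank dimension from the model summary
    lora_alpha=32,        # scaling factor (alpha)
    lora_dropout=0.05,    # assumed; not documented
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of the ~141B weights train
```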

⚙️ Technical Specs

  • Context Length: 16k–32k tokens (the base model supports up to 64k)
  • Precision: FP16 / bf16 for inference, 16-bit LoRA adapters
  • Tokenizer: GPT-style BPE (inherited from the WizardLM/Mistral tokenizer lineage)
  • Sampling Settings: Temperature ~1.0–1.2, Top-p = 1.0, Typical-p ~0.7, Penalty ~0.6 (see the generation sketch below)
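
As a rough sketch, these samplers map onto Hugging Face transformers' `generate()` arguments as shown below. The checkpoint ID is an assumption, and the "~0.6 penalty" value is ambiguous (repetition vs. presence penalty), so it is omitted rather than guessed.

```python
# Hedged sketch: applying the sampler settings above with Hugging Face
# transformers. Checkpoint ID is assumed; the ambiguous "~0.6 penalty"
# setting is intentionally left out.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "rAIfle/SorcererLM-8x22b-bf16"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The sorcerer lowered her hood and said,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.1,    # middle of the suggested 1.0-1.2 range
    top_p=1.0,          # nucleus sampling effectively disabled
    typical_p=0.7,      # typical-p trims locally atypical tokens
    max_new_tokens=200,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```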

🔧 Model Origin

  • Built From: WizardLM-2 8×22B (Mixtral-style MoE)
  • Purpose: Enhance narrative depth and expressiveness; tailored for RP and storytelling use cases
  • Training Dataset: Cleaned logs from C2-style conversation datasets (RP-heavy), deduplicated for story coherence
  • Base Training: WizardLM’s alignment approach (Evol-Instruct, AI-align-AI, RLEIF pipeline)

💬 Inference Behavior

  • Strong at multi-character dialogue, emotion expression, scene construction
  • Handles long-memory threads and narrative consistency
  • Balanced between helpful assistant and stylized character mode
  • Uses top-2 softmax routing for expert selection, so only a fraction of the experts run per token and inference stays far cheaper than a dense model of the same size (see the routing sketch after this list)
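
To make the routing idea concrete, here is a toy sketch of Mixtral-style top-2 softmax routing in PyTorch. The dimensions and expert MLP shapes are illustrative only and far smaller than the real model.

```python
# Hedged sketch of top-2 softmax routing: a gating layer scores all experts,
# only the top 2 run per token, and their outputs are combined with
# renormalized softmax weights. Toy dimensions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.gate(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(Top2MoE()(tokens).shape)  # torch.Size([5, 64])
```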

📌 Deployment Details

  • Access: Hosted APIs such as OpenRouter, LangDB, and Infermatic (see the API sketch below)
  • Local Use: Full GGUF/ggml and 4-bit quantized builds are not yet confirmed but are expected
  • Use Cases: AI Dungeon-style games, novel co-writing, VTuber or NPC simulators, immersive assistants
  • Pricing (est.): ~$4.50 per million tokens (input/output) on OpenRouter-like APIs
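
For a hosted setup, a minimal call through OpenRouter's OpenAI-compatible endpoint might look like the following. The model slug is an assumption; check the provider's catalog for the exact identifier and current per-token pricing.

```python
# Hedged sketch: calling a hosted endpoint via OpenRouter's OpenAI-compatible
# API. The model slug below is an assumption, not a confirmed identifier.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="raifle/sorcererlm-8x22b",  # assumed slug
    messages=[
        {"role": "system", "content": "You are a dramatic tavern-keeper NPC."},
        {"role": "user", "content": "A hooded stranger asks about the old mine."},
    ],
    temperature=1.1,
    top_p=1.0,
    max_tokens=300,
)
print(response.choices[0].message.content)
```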

🧪 Strengths

  • Immersive storytelling with a Claude-like emotional IQ
  • Better narrative coherence than raw Mixtral/Mistral models
  • LoRA-tuned for personality, expressive dialogue, vivid imagery
  • Good reasoning performance retained from WizardLM-2 base

⚠️ Limitations

  • Not multimodal — no image or audio understanding
  • Higher token cost due to MoE architecture
  • Still early-stage LoRA; might hallucinate on factual prompts
  • Limited documentation for self-hosting or embedding

🧙 Summary

Sorcerer LM 8×22B takes WizardLM-2's massive MoE engine and points it squarely at roleplay, storytelling, and expressive dialogue. If you want stylized, immersive, emotionally fluent fantasy writing, this is the model built for it.
