Sorcerer LM 8×22B

Sorcerer LM 8×22B is a fine-tuned beast built on top of WizardLM-2’s massive MoE engine, with a razor-sharp focus on roleplay, storytelling, and expressive dialogue. Think of it as the GPT-4 of fantasy writing—stylized, immersive, and emotionally fluent.

🧠 Model Summary

  • Name: Sorcerer LM 8×22B
  • Architecture: Mixture of Experts (MoE)
  • Base Model: WizardLM-2 8×22B
  • Total Parameters: ~141 billion
  • Active Experts per Token: 2 out of 8 (each ~22B)
  • Effective Parameters per Inference: ~44 billion
  • Model Type: Decoder-only transformer
  • Fine-tune Method: LoRA (Low-Rank Adaptation)
  • LoRA Config: r=16, α=32, 16-bit adapters (see the config sketch after this list)
  • Training Epochs: 2
  • Specialization: Roleplay, creative writing, story immersion
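
For concreteness, here is a minimal sketch of what an r=16, α=32 LoRA setup could look like with Hugging Face PEFT. The checkpoint ID, dropout value, and target modules are illustrative assumptions, not confirmed details of the actual fine-tune.

```python
# Hedged sketch: attaching an r=16, alpha=32 LoRA adapter to a Mixtral-style
# MoE base with Hugging Face PEFT. Checkpoint ID, dropout, and target_modules
# are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

BASE = "alpindale/WizardLM-2-8x22B"  # assumed base checkpoint name

lora_cfg = LoraConfig(
    r=16,                 # low-rank dimension from the model summary
    lora_alpha=32,        # scaling factor (alpha)
    lora_dropout=0.05,    # assumed; not documented
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention-only targets
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a tiny fraction of the ~141B weights train
```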

⚙️ Technical Specs

  • Context Length: 16k–32k tokens (the base model supports up to 64k)
  • Precision: FP16 / bf16 for inference, 16-bit LoRA adapters
  • Tokenizer: GPT-style BPE (inherited from the WizardLM/Mistral tokenizer lineage)
  • Sampling Settings: Temperature ~1.0–1.2, Top-p = 1.0, Typical-p ~0.7, Penalty ~0.6 (see the generation sketch below)
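
As a rough sketch, these samplers map onto Hugging Face transformers' `generate()` arguments as shown below. The checkpoint ID is an assumption, and the "~0.6 penalty" value is ambiguous (repetition vs. presence penalty), so it is omitted rather than guessed.

```python
# Hedged sketch: applying the sampler settings above with Hugging Face
# transformers. Checkpoint ID is assumed; the ambiguous "~0.6 penalty"
# setting is intentionally left out.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "rAIfle/SorcererLM-8x22b-bf16"  # assumed repo name

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(
    CHECKPOINT, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "The sorcerer lowered her hood and said,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(
    **inputs,
    do_sample=True,
    temperature=1.1,    # middle of the suggested 1.0-1.2 range
    top_p=1.0,          # nucleus sampling effectively disabled
    typical_p=0.7,      # typical-p trims locally atypical tokens
    max_new_tokens=200,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```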

🔧 Model Origin

  • Built From: WizardLM-2 8×22B (Mixtral-style MoE)
  • Purpose: Enhance narrative depth and expressiveness; tailored for RP and storytelling use cases
  • Training Dataset: Cleaned logs from C2-style conversation datasets (RP-heavy), deduplicated for story coherence
  • Base Training: WizardLM’s alignment approach (Evol-Instruct, AI-align-AI, RLEIF pipeline)

💬 Inference Behavior

  • Strong at multi-character dialogue, emotion expression, scene construction
  • Handles long-memory threads and narrative consistency
  • Balanced between helpful assistant and stylized character mode
  • Uses top-2 softmax routing for expert selection, so only a fraction of the experts run per token and inference stays far cheaper than a dense model of the same size (see the routing sketch after this list)
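
To make the routing idea concrete, here is a toy sketch of Mixtral-style top-2 softmax routing in PyTorch. The dimensions and expert MLP shapes are illustrative only and far smaller than the real model.

```python
# Hedged sketch of top-2 softmax routing: a gating layer scores all experts,
# only the top 2 run per token, and their outputs are combined with
# renormalized softmax weights. Toy dimensions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.gate(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(5, 64)
print(Top2MoE()(tokens).shape)  # torch.Size([5, 64])
```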

📌 Deployment Details

  • Access: Hosted APIs such as OpenRouter, LangDB, and Infermatic (see the API sketch below)
  • Local Use: Full GGUF/ggml and 4-bit quantized builds are not yet confirmed but are expected
  • Use Cases: AI Dungeon-style games, novel co-writing, VTuber or NPC simulators, immersive assistants
  • Pricing (est.): ~$4.50 per million tokens (input/output) on OpenRouter-like APIs
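
For a hosted setup, a minimal call through OpenRouter's OpenAI-compatible endpoint might look like the following. The model slug is an assumption; check the provider's catalog for the exact identifier and current per-token pricing.

```python
# Hedged sketch: calling a hosted endpoint via OpenRouter's OpenAI-compatible
# API. The model slug below is an assumption, not a confirmed identifier.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

response = client.chat.completions.create(
    model="raifle/sorcererlm-8x22b",  # assumed slug
    messages=[
        {"role": "system", "content": "You are a dramatic tavern-keeper NPC."},
        {"role": "user", "content": "A hooded stranger asks about the old mine."},
    ],
    temperature=1.1,
    top_p=1.0,
    max_tokens=300,
)
print(response.choices[0].message.content)
```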

🧪 Strengths

  • Immersive storytelling with a Claude-like emotional IQ
  • Better narrative coherence than raw Mixtral/Mistral models
  • LoRA-tuned for personality, expressive dialogue, vivid imagery
  • Good reasoning performance retained from WizardLM-2 base

⚠️ Limitations

  • Not multimodal — no image or audio understanding
  • Higher token cost due to MoE architecture
  • Still early-stage LoRA; might hallucinate on factual prompts
  • Limited documentation for self-hosting or embedding

🧙 Summary

Sorcerer LM 8×22B takes WizardLM-2's massive MoE engine and points it squarely at roleplay, storytelling, and expressive dialogue. If you want stylized, immersive, emotionally fluent fantasy writing, this is the model built for it.
