Running on Morph’s custom kernels and inference stack optimized for codegen. OpenAI-compatible. Same key as Fast Apply and Compact. Base URLDocumentation Index
Fetch the complete documentation index at: https://docs.morphllm.com/llms.txt
Use this file to discover all available pages before exploring further.
https://api.morphllm.com/v1.
| Model | ID | Speed | Context | In / Out per 1M | Modalities |
|---|---|---|---|---|---|
| Qwen 3.5 397B | morph-qwen35-397b | ~200 tok/s | 262k | 3.50 | text + image |
| MiniMax M2.7 | morph-minimax27-230b | ~90 tok/s | 200k | 1.20 | text |
| Qwen 3.6 27B | morph-qwen36-27b | ~100 tok/s | 131k | 2.40 | text |
| DeepSeek V4 Flash beta | morph-dsv4flash | ~150 tok/s | 393k | 0.40 | text |
tools, response_format (JSON mode + JSON schema), structured outputs, logprobs, and reasoning.
Automatic prefix caching is on for all models. No configuration needed. Use Model Router to pick automatically per request.
Quick Start
- Python
- TypeScript
- Vercel AI SDK
- cURL
Tools and Structured Output
reasoning: { effort: "medium" } ("low" / "high"). Reasoning tokens bill as output.
Pricing
Per-token, no minimums. The table above is canonical. Live rates:/v1/models.
- Images (Qwen 397B only) bill as text tokens at the input rate
- 4xx requests are not billed; partial generations bill for tokens returned
Pitfalls
Latency worse than expected
Latency worse than expected
TPS numbers are generation throughput, not end-to-end. With 30k tokens of context, prefill dominates first-token wait even with caching. For agent loops, keep a smaller working context with Compact rather than filling the full window.
Tool calls not working
Tool calls not working
These models use OpenAI tool-call shape, not Anthropic
tool_use blocks or Gemini functionDeclarations. Use the OpenAI SDK or @ai-sdk/openai pointed at our base URL.JSON mode returns prose
JSON mode returns prose
Pass
response_format: { type: "json_object" } and say “respond in JSON” in your prompt. For strict shape control: response_format: { type: "json_schema", json_schema: { ... } }.See Also
- Model Router — auto-route between these and frontier models per request
- Compact — shrink context before paying for it
- WarpGrep — code search for retrieval when context is the bottleneck