Open Source Models

These models run on Morph’s custom kernels and an inference stack optimized for codegen. The API is OpenAI-compatible at https://api.morphllm.com/v1 and Anthropic-compatible at /v1/messages (see Endpoints).

Model	Model ID	Context
Kimi K3 2.8T	`morph-kimik3`	1M
Kimi K3 2.8T Fast	`morph-kimik3-fast`	1M
GLM-5.2 744B	`morph-glm52-744b`	1M
MiniMax M3 428B	`morph-minimax3-428b`	256k
DeepSeek V4 Flash (beta)	`morph-dsv4flash`	1M
Qwen 3.6 27B	`morph-qwen36-27b`	131k
Gemma 4 31B (multimodal)	`morph-gemma4-31b`	175k

Throughput runs ~90–200 tok/s depending on model and load. All models support tools, response_format (JSON mode + JSON schema), structured outputs, logprobs, and reasoning. Per-token rates are on the pricing page and live at /api/models/json.

Quick Start

Examples follow, in order, for Python, TypeScript, the Anthropic SDK, the Vercel AI SDK, and cURL.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.morphllm.com/v1",
)

response = client.chat.completions.create(
    model="morph-glm52-744b",
    messages=[
        {"role": "system", "content": "You are a senior backend engineer."},
        {"role": "user", "content": "Refactor this Express handler to use async/await: ..."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.morphllm.com/v1",
});

const stream = await client.chat.completions.create({
  model: "morph-glm52-744b",
  messages: [{ role: "user", content: "Write a tiny rate limiter in TS." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_API_KEY",
    base_url="https://api.morphllm.com",  # no /v1 suffix
)

message = client.messages.create(
    model="morph-glm52-744b",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a tiny rate limiter in TS."}],
)
print(message.content[-1].text)

Every open-source chat model serves the Anthropic Messages API at /v1/messages — same models, same billing, same rate limits. Streaming, tool_use blocks, and thinking all work; details in Endpoints.

import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const morph = createOpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.morphllm.com/v1",
});

const { text } = await generateText({
  model: morph("morph-glm52-744b"),
  prompt: "Summarize this PR diff in one paragraph: ...",
});

curl -X POST "https://api.morphllm.com/v1/chat/completions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "morph-glm52-744b",
    "messages": [
      {"role": "user", "content": "Write a SQL query that finds the top 5 customers by revenue last quarter."}
    ],
    "temperature": 0.2
  }'

Tools and Structured Output

const response = await client.chat.completions.create({
  model: "morph-glm52-744b",
  messages: [{ role: "user", content: "What's the weather in SF?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  response_format: { type: "json_object" },
});

Reasoning is off by default. Enable it per request with reasoning: { effort: "medium" }; "low" and "high" are also accepted. Reasoning tokens are billed as output tokens. Automatic prefix caching is on for all models, with per-request TTL control. To have a model picked automatically for each request, use Model Router.
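
A minimal sketch of sending the reasoning option from the OpenAI Python SDK. The SDK's chat.completions.create has no named `reasoning` argument, so this assumes the server reads `reasoning` from the top level of the request body and forwards it through the SDK's `extra_body` parameter. The helper name `reasoning_kwargs` is hypothetical.

```python
from typing import Any, Dict, Optional

VALID_EFFORTS = ("low", "medium", "high")  # the three values named in the text


def reasoning_kwargs(effort: Optional[str]) -> Dict[str, Any]:
    """Build keyword arguments that switch reasoning on for one request.

    Returns an empty dict when effort is None, which leaves reasoning off
    (the documented default).
    """
    if effort is None:
        return {}
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    # Assumption: `reasoning` is accepted as a top-level key of the JSON body.
    return {"extra_body": {"reasoning": {"effort": effort}}}


# Usage with the client from Quick Start (needs network access and a real key):
# response = client.chat.completions.create(
#     model="morph-glm52-744b",
#     messages=[{"role": "user", "content": "Find the bug in this function: ..."}],
#     **reasoning_kwargs("medium"),
# )
```

Because reasoning tokens count as output, starting at "low" and raising the effort only where answers fall short keeps cost predictable.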

Service Tiers

GLM-5.2 supports the OpenAI service_tier parameter.

Tier	Behavior
`default` (or `auto`, or omitted)	Standard processing. What every request gets today.
`standby`	Best-effort capacity. Served when the fleet has headroom, rejected with a retryable 429 when it doesn’t. No latency target, no SLA.

response = client.chat.completions.create(
    model="morph-glm52-744b",
    messages=[{"role": "user", "content": "Label these 500 rows: ..."}],
    service_tier="standby",
)

How standby works:

A standby request is admitted only while the fleet is under roughly a quarter of its serving capacity. When no region qualifies, you get 429 with error.code: "resource_unavailable" and a Retry-After header. Nothing is generated and nothing is billed. Expect most standby throughput off-peak.
On a 429, retry with exponential backoff. If you need the result now, resend with service_tier: "default".
The response echoes the tier that served it in a service_tier field (on the final usage chunk when streaming).
Streaming, tools, and structured output work the same as default.
Unknown tier values return 400 listing the accepted ones.
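
The retry behavior described above can be sketched as a small wrapper. It is an illustration, not an official client: `call_with_standby` and the `send` transport are hypothetical names, `send` is assumed to return a (status, headers, body) tuple, and Retry-After is assumed to carry whole seconds.

```python
import random
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: takes the JSON payload, returns (status, headers, body).
Send = Callable[[Dict[str, Any]], Tuple[int, Dict[str, str], Dict[str, Any]]]


def call_with_standby(
    send: Send,
    payload: Dict[str, Any],
    max_standby_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Try the standby tier with backoff, then fall back to the default tier."""
    for attempt in range(max_standby_attempts):
        status, headers, body = send({**payload, "service_tier": "standby"})
        if status != 429:
            return body  # served, or a non-429 error for the caller to inspect
        if attempt == max_standby_attempts - 1:
            break  # out of standby attempts, skip the final wait
        # Prefer the server's Retry-After hint; otherwise back off exponentially.
        lowered = {k.lower(): v for k, v in headers.items()}
        hint = lowered.get("retry-after", "")
        delay = float(hint) if hint.isdigit() else base_delay * (2 ** attempt)
        # Cap the wait and add jitter so parallel workers do not retry in lockstep.
        sleep(min(delay, max_delay) + random.uniform(0, 0.25))
    # Standby capacity never opened up: resend on the default tier.
    _, _, body = send({**payload, "service_tier": "default"})
    return body
```

Passing `sleep` and `send` in as arguments keeps the loop testable without network access. For batch jobs that can wait indefinitely, the final fallback to "default" could be dropped in favor of raising an error.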

Use standby for evals, batch labeling, data generation, and anything a retry loop can absorb. Keep interactive and agent-loop traffic on default: under load, default requests are served in full while standby requests are shed in ~200ms. Standby bills at the standard per-token rates today. The parameter is available on GLM-5.2 (morph-glm52-744b); sending service_tier to other models has no effect.

Pitfalls

Latency worse than expected

The tok/s figures measure generation throughput, not end-to-end latency. With 30k tokens of context, prefill dominates the wait for the first token even with caching. For agent loops, keep a smaller working context with Compact rather than filling the full window.
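
To see which part of a slow request is prefill and which is generation, it can help to time the stream yourself. The sketch below is a hypothetical helper (`stream_timing` is not part of any SDK) that splits one streamed response into first-token wait and an approximate generation rate from timestamps you record as chunks arrive.

```python
from typing import Dict, Sequence


def stream_timing(
    request_sent: float, chunk_times: Sequence[float], output_tokens: int
) -> Dict[str, float]:
    """Split one streamed response into first-token wait and generation rate.

    request_sent and chunk_times are timestamps in seconds from the same clock
    (for example time.monotonic()); chunk_times holds the arrival time of each
    content chunk, in order.
    """
    if not chunk_times:
        raise ValueError("no chunks received")
    ttft = chunk_times[0] - request_sent           # dominated by prefill on long prompts
    generation = chunk_times[-1] - chunk_times[0]  # decoding time after the first chunk
    # Approximate rate: treats all output tokens as produced during `generation`.
    tok_per_s = output_tokens / generation if generation > 0 else float("nan")
    return {
        "ttft_s": ttft,
        "generation_s": generation,
        "tok_per_s": tok_per_s,
        "total_s": chunk_times[-1] - request_sent,
    }


# Illustrative numbers only, not measurements: first chunk after 2.5 s,
# last chunk at 6.5 s, 400 output tokens.
timing = stream_timing(0.0, [2.5, 4.0, 6.5], 400)
```

In this made-up case the generation rate is 100 tok/s, yet more than a third of the 6.5 s total is spent waiting for the first token, which is the gap a smaller working context narrows.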

Tool calls not working

On /v1/chat/completions these models use OpenAI tool-call shape; Anthropic tool_use blocks work on /v1/messages. Match the tool format to the endpoint. Gemini functionDeclarations work on neither.
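
If the same tool has to be sent to both endpoints, a small converter avoids maintaining two definitions. This sketch maps the OpenAI function-tool shape to the Anthropic Messages tool shape (`name`, `description`, `input_schema`); the function name `openai_tool_to_anthropic` is hypothetical.

```python
from typing import Any, Dict


def openai_tool_to_anthropic(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one OpenAI-style function tool into the Anthropic tool shape."""
    if tool.get("type") != "function":
        raise ValueError("only function tools can be converted")
    fn = tool["function"]
    converted: Dict[str, Any] = {
        "name": fn["name"],
        # Anthropic names the JSON Schema for the arguments `input_schema`.
        "input_schema": fn.get("parameters", {"type": "object", "properties": {}}),
    }
    if "description" in fn:
        converted["description"] = fn["description"]
    return converted


# The get_weather tool from the example above, in OpenAI shape.
get_weather_openai = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Pass this one in the `tools` list of a /v1/messages request instead.
get_weather_anthropic = openai_tool_to_anthropic(get_weather_openai)
```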

JSON mode returns prose

Pass response_format: { type: "json_object" } and say “respond in JSON” in your prompt. For strict shape control: response_format: { type: "json_schema", json_schema: { ... } }.
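
A sketch of the strict-schema variant, assuming the endpoint follows OpenAI's json_schema layout (a `name`, a `schema`, and a `strict` flag); the exact fields accepted here are not confirmed by this page. The helper names and the labeling schema are hypothetical.

```python
import json
from typing import Any, Dict


def json_schema_format(name: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    """Build a response_format value in the OpenAI json_schema layout (assumed)."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "schema": schema, "strict": True},
    }


def parse_json_reply(content: str) -> Any:
    """Parse the model's reply, failing loudly if it came back as prose."""
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"expected JSON, got: {content[:80]!r}") from exc


# Hypothetical schema for a row-labeling task.
label_format = json_schema_format(
    "row_label",
    {
        "type": "object",
        "properties": {
            "label": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["label", "confidence"],
        "additionalProperties": False,
    },
)

# Usage with the Python client from Quick Start (needs network access):
# response = client.chat.completions.create(
#     model="morph-glm52-744b",
#     messages=[{"role": "user", "content": "Label this row. Respond in JSON: ..."}],
#     response_format=label_format,
# )
# row = parse_json_reply(response.choices[0].message.content)
```

Parsing through a helper that raises on prose turns a silent formatting failure into an error that a retry loop can catch.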
