Morph

Specialized models and subagents for AI coding agents: Fast Apply edits files at 10,500 tok/s, WarpGrep searches codebases in ~6 s, and Compact compresses context at 33,000 tok/s. Base URL: https://api.morphllm.com/v1 (OpenAI-compatible; works with any OpenAI SDK).

When to use which product

| Need | Product | Model ID |
| --- | --- | --- |
| Apply a code edit to a file | Fast Apply | morph-v3-fast (default) or morph-v3-large (complex edits) |
| Let the system pick fast vs large | Router + Apply | auto |
| Search a local codebase | WarpGrep | morph-warp-grep-v1 |
| Search a public GitHub repo | WarpGrep (GitHub mode) | morph-warp-grep-v1 |
| Compress chat history / context | Compact | morph-compactor |
| Generate code embeddings | Embedding | morph-embedding-v4 |
| Rerank search results | Rerank | morph-rerank-v4 |
| Route prompts by complexity | Router | morph-routers |

Instructions for agents

  • All endpoints are OpenAI-compatible. Use base_url: https://api.morphllm.com/v1 with any OpenAI SDK.
  • Always set temperature: 0 for Apply and WarpGrep calls.
  • Apply is for edits, not file creation. It merges a partial update into an existing file. Do not send empty <code> blocks.
  • Use // ... existing code ... markers in update snippets to indicate unchanged regions. This is required.
  • Apply message format: Single user message containing <instruction> (what the edit does), <code> (original file), and <update> (partial edit with markers) XML tags.
  • Include <instruction> in Apply calls. Accuracy jumps from 92% to 98% when you describe what the edit does.
  • WarpGrep has built-in tools. Do NOT pass a tools array in your request. The model uses grep_search, read, list_directory, glob, finish internally.
  • WarpGrep for local repos requires ripgrep installed and a <repo_structure> block in the user message.
  • Compact preserves exact bytes. Every surviving line is byte-for-byte identical to input. Use the query param to tell it what matters for the next LLM call.
  • Compact supports <keepContext> tags to protect critical sections from compression.
  • Router returns a model recommendation, not a completion. Use it to decide which downstream model to call, then call that model separately.
  • Never guess model names. The current models are listed in the table above. Do not invent model IDs like morph-v2 or morph-fast.

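WarpGrep example

A minimal sketch of building a local-repo WarpGrep request, assuming the <repo_structure> block is embedded at the top of the user message. The helper name and the example repo layout are illustrative; the resulting payload can be sent with any OpenAI SDK pointed at the base URL.

```python
# Sketch: build a WarpGrep request payload. The helper name and the
# example repo layout are illustrative assumptions.
def build_warp_grep_request(query: str, repo_structure: str) -> dict:
    user_message = (
        f"<repo_structure>\n{repo_structure}\n</repo_structure>\n\n{query}"
    )
    return {
        "model": "morph-warp-grep-v1",
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0,
        # No "tools" key: grep_search, read, list_directory, glob, and
        # finish are built in, and passing a tools array breaks search.
    }

request = build_warp_grep_request(
    "Where is rate limiting enforced?",
    "src/\n  api/\n    middleware.ts\n  core/",
)
```

Send the payload with client.chat.completions.create(**request) using the same client setup as the Apply example below. Local WarpGrep also requires ripgrep to be installed.
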
Gotchas

  • Sending the full updated file as <update> instead of a partial snippet wastes tokens and defeats the purpose of Apply.
  • Forgetting // ... existing code ... markers causes Apply to treat the snippet as the complete file content.
  • Passing a tools array to WarpGrep overrides its built-in tools and breaks search.
  • Using Apply to create new files (empty <code>) gives poor results. Write the file directly instead.
  • Compact without query uses the last user message as the relevance signal. For best compression, always pass query explicitly.

Apply example

from openai import OpenAI

client = OpenAI(base_url="https://api.morphllm.com/v1", api_key="your-key")

response = client.chat.completions.create(
    model="morph-v3-fast",
    messages=[{
        "role": "user",
        "content": "<instruction>Add error handling for division by zero</instruction>\n<code>function divide(a, b) {\n  return a / b;\n}</code>\n<update>function divide(a, b) {\n  if (b === 0) throw new Error('Division by zero');\n  return a / b;\n}</update>"
    }],
    temperature=0
)
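
Compact example

A sketch of a Compact request with an explicit query and a <keepContext>-protected section. The exact request shape (in particular whether query travels as a top-level field) is an assumption; the transcript content is illustrative.

```python
# Sketch: build a Compact request. Whether "query" is a top-level request
# field is an assumption -- check the Compact docs for the exact shape.
def build_compact_request(transcript: str, query: str) -> dict:
    return {
        "model": "morph-compactor",
        "messages": [{"role": "user", "content": transcript}],
        # Relevance signal: what the *next* LLM call will need.
        "query": query,
    }

transcript = (
    # <keepContext> protects this line from compression.
    "<keepContext>DATABASE_URL=postgres://localhost/app</keepContext>\n"
    "user: how do I connect to the database?\n"
    "assistant: set DATABASE_URL, then call connect() at startup.\n"
)
request = build_compact_request(transcript, "database connection setup")
```

Because Compact preserves exact bytes, every line that survives is byte-for-byte identical to the input, so the compressed output can be dropped straight back into the message history.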

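Router example

Router returns a model recommendation rather than a completion, so using it is a two-step flow: ask Router which model fits, then call that model yourself. The sketch below covers the selection step only, and assumes the recommendation arrives as a plain model-ID string.

```python
# Sketch: pick the downstream Apply model from a Router recommendation.
# Assumes the recommendation is a plain model-ID string.
KNOWN_APPLY_MODELS = {"morph-v3-fast", "morph-v3-large"}

def pick_apply_model(recommendation: str) -> str:
    # Fall back to the fast model for anything unrecognized -- never
    # invent model IDs.
    if recommendation in KNOWN_APPLY_MODELS:
        return recommendation
    return "morph-v3-fast"

# Step 1 (not shown): call Router and read its recommendation.
# Step 2: call the recommended model with your Apply-format message.
model = pick_apply_model("morph-v3-large")
```

Alternatively, pass model="auto" to let the system pick between fast and large for you, as listed in the table above.
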
Resources