Morph

Specialized models and subagents for AI coding agents: Fast Apply edits files at 10,500 tok/s, WarpGrep searches codebases in ~6 s, and Compact compresses context at 33,000 tok/s. Base URL: https://api.morphllm.com/v1 (OpenAI-compatible; works with any OpenAI SDK).

When to use which product

| Need | Product | Model ID |
| --- | --- | --- |
| Apply a code edit to a file | Fast Apply | morph-v3-fast (default) or morph-v3-large (complex edits) |
| Let the system pick fast vs large | Router + Apply | auto |
| Search a local codebase | WarpGrep | morph-warp-grep-v1 |
| Search a public GitHub repo | WarpGrep (GitHub mode) | morph-warp-grep-v1 |
| Compress chat history / context | Compact | morph-compactor |
| Generate code embeddings | Embedding | morph-embedding-v4 |
| Rerank search results | Rerank | morph-rerank-v4 |
| Route prompts by complexity | Router | morph-routers |

Instructions for agents

  • All endpoints are OpenAI-compatible. Use base_url: https://api.morphllm.com/v1 with any OpenAI SDK.
  • Always set temperature: 0 for Apply and WarpGrep calls.
  • Apply is for edits, not file creation. It merges a partial update into an existing file. Do not send empty <code> blocks.
  • Use // ... existing code ... markers in update snippets to indicate unchanged regions. This is required.
  • Apply message format: Single user message containing <instruction> (what the edit does), <code> (original file), and <update> (partial edit with markers) XML tags.
  • Include <instruction> in Apply calls. Accuracy jumps from 92% to 98% when you describe what the edit does.
  • WarpGrep has built-in tools. Do NOT pass a tools array in your request. The model uses grep_search, read, list_directory, glob, finish internally.
  • WarpGrep for local repos requires ripgrep installed and a <repo_structure> block in the user message.
  • Compact preserves exact bytes. Every surviving line is byte-for-byte identical to input. Use the query param to tell it what matters for the next LLM call.
  • Compact supports <keepContext> tags to protect critical sections from compression.
  • Router returns a model recommendation, not a completion. Use it to decide which downstream model to call, then call that model separately.
  • Never guess model names. The current models are listed in the table above. Do not invent model IDs like morph-v2 or morph-fast.

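WarpGrep example

A minimal sketch of building a local-repo WarpGrep request, assuming the <repo_structure> block is embedded at the top of the user message. The helper name and the example repo layout are illustrative; the resulting payload can be sent with any OpenAI SDK pointed at the base URL.

```python
# Sketch: build a WarpGrep request payload. The helper name and the
# example repo layout are illustrative assumptions.
def build_warp_grep_request(query: str, repo_structure: str) -> dict:
    user_message = (
        f"<repo_structure>\n{repo_structure}\n</repo_structure>\n\n{query}"
    )
    return {
        "model": "morph-warp-grep-v1",
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0,
        # No "tools" key: grep_search, read, list_directory, glob, and
        # finish are built in, and passing a tools array breaks search.
    }

request = build_warp_grep_request(
    "Where is rate limiting enforced?",
    "src/\n  api/\n    middleware.ts\n  core/",
)
```

Send the payload with client.chat.completions.create(**request) using the same client setup as the Apply example below. Local WarpGrep also requires ripgrep to be installed.
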
Gotchas

  • Sending the full updated file as <update> instead of a partial snippet wastes tokens and defeats the purpose of Apply.
  • Forgetting // ... existing code ... markers causes Apply to treat the snippet as the complete file content.
  • Passing a tools array to WarpGrep overrides its built-in tools and breaks search.
  • Using Apply to create new files (empty <code>) gives poor results. Write the file directly instead.
  • Compact without query uses the last user message as the relevance signal. For best compression, always pass query explicitly.

Apply example

from openai import OpenAI

client = OpenAI(base_url="https://api.morphllm.com/v1", api_key="your-key")

response = client.chat.completions.create(
    model="morph-v3-fast",
    messages=[{
        "role": "user",
        "content": "<instruction>Add error handling for division by zero</instruction>\n<code>function divide(a, b) {\n  return a / b;\n}</code>\n<update>function divide(a, b) {\n  if (b === 0) throw new Error('Division by zero');\n  return a / b;\n}</update>"
    }],
    temperature=0
)
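
Compact example

A sketch of a Compact request with an explicit query and a <keepContext>-protected section. The exact request shape (in particular whether query travels as a top-level field) is an assumption; the transcript content is illustrative.

```python
# Sketch: build a Compact request. Whether "query" is a top-level request
# field is an assumption -- check the Compact docs for the exact shape.
def build_compact_request(transcript: str, query: str) -> dict:
    return {
        "model": "morph-compactor",
        "messages": [{"role": "user", "content": transcript}],
        # Relevance signal: what the *next* LLM call will need.
        "query": query,
    }

transcript = (
    # <keepContext> protects this line from compression.
    "<keepContext>DATABASE_URL=postgres://localhost/app</keepContext>\n"
    "user: how do I connect to the database?\n"
    "assistant: set DATABASE_URL, then call connect() at startup.\n"
)
request = build_compact_request(transcript, "database connection setup")
```

Because Compact preserves exact bytes, every line that survives is byte-for-byte identical to the input, so the compressed output can be dropped straight back into the message history.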

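Router example

Router returns a model recommendation rather than a completion, so using it is a two-step flow: ask Router which model fits, then call that model yourself. The sketch below covers the selection step only, and assumes the recommendation arrives as a plain model-ID string.

```python
# Sketch: pick the downstream Apply model from a Router recommendation.
# Assumes the recommendation is a plain model-ID string.
KNOWN_APPLY_MODELS = {"morph-v3-fast", "morph-v3-large"}

def pick_apply_model(recommendation: str) -> str:
    # Fall back to the fast model for anything unrecognized -- never
    # invent model IDs.
    if recommendation in KNOWN_APPLY_MODELS:
        return recommendation
    return "morph-v3-fast"

# Step 1 (not shown): call Router and read its recommendation.
# Step 2: call the recommended model with your Apply-format message.
model = pick_apply_model("morph-v3-large")
```

Alternatively, pass model="auto" to let the system pick between fast and large for you, as listed in the table above.
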
Resources