# Morph
Specialized models and subagents for AI coding agents. Fast Apply edits files at 10,500 tok/s, WarpGrep searches codebases in ~6s, Compact compresses context at 33,000 tok/s. Base URL: `https://api.morphllm.com/v1` (OpenAI-compatible, works with any OpenAI SDK).
## When to use which product
| Need | Product | Model ID |
|---|---|---|
| Apply a code edit to a file | Fast Apply | morph-v3-fast (default) or morph-v3-large (complex edits) |
| Let the system pick fast vs large | Router + Apply | auto |
| Search a local codebase | WarpGrep | morph-warp-grep-v1 |
| Search a public GitHub repo | WarpGrep (GitHub mode) | morph-warp-grep-v1 |
| Compress chat history / context | Compact | morph-compactor |
| Generate code embeddings | Embedding | morph-embedding-v4 |
| Rerank search results | Rerank | morph-rerank-v4 |
| Route prompts by complexity | Router | morph-routers |
## Instructions for agents
- All endpoints are OpenAI-compatible. Use `base_url: https://api.morphllm.com/v1` with any OpenAI SDK.
- Always set `temperature: 0` for Apply and WarpGrep calls.
- Apply is for edits, not file creation. It merges a partial update into an existing file. Do not send empty `<code>` blocks.
- Use `// ... existing code ...` markers in update snippets to indicate unchanged regions. This is required.
- Apply message format: a single user message containing `<instruction>` (what the edit does), `<code>` (the original file), and `<update>` (the partial edit with markers) XML tags.
- Include `<instruction>` in Apply calls. Accuracy jumps from 92% to 98% when you describe what the edit does.
- WarpGrep has built-in tools. Do NOT pass a `tools` array in your request. The model uses `grep_search`, `read`, `list_directory`, `glob`, and `finish` internally.
- WarpGrep for local repos requires `ripgrep` installed and a `<repo_structure>` block in the user message.
- Compact preserves exact bytes. Every surviving line is byte-for-byte identical to the input. Use the `query` param to tell it what matters for the next LLM call.
- Compact supports `<keepContext>` tags to protect critical sections from compression.
- Router returns a model recommendation, not a completion. Use it to decide which downstream model to call, then call that model separately.
- Never guess model names. The current models are listed in the table above. Do not invent model IDs like `morph-v2` or `morph-fast`.
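The WarpGrep constraints above (OpenAI-compatible endpoint, `temperature: 0`, no `tools` array, a `<repo_structure>` block in the user message) can be sketched as follows. The repo listing and the question are illustrative assumptions, and the API call itself is gated on a `MORPH_API_KEY` environment variable:

```python
import os

# Toy repo listing (assumption): in practice, generate this from the real repo.
repo_structure = (
    "src/\n"
    "  auth/\n"
    "    login.py\n"
    "    tokens.py\n"
    "  api/\n"
    "    routes.py\n"
)

# WarpGrep user message: <repo_structure> block followed by the search question.
message = (
    f"<repo_structure>\n{repo_structure}</repo_structure>\n"
    "Where is JWT token refresh implemented?"
)

if os.environ.get("MORPH_API_KEY"):
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(
        base_url="https://api.morphllm.com/v1",
        api_key=os.environ["MORPH_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="morph-warp-grep-v1",
        temperature=0,  # required for WarpGrep
        messages=[{"role": "user", "content": message}],
        # Note: no tools= argument; grep_search/read/list_directory/glob/finish
        # are built into the model and must not be overridden.
    )
    print(resp.choices[0].message.content)
```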
## Gotchas
- Sending the full updated file as `<update>` instead of a partial snippet wastes tokens and defeats the purpose of Apply.
- Forgetting `// ... existing code ...` markers causes Apply to treat the snippet as the complete file content.
- Passing a `tools` array to WarpGrep overrides its built-in tools and breaks search.
- Using Apply to create new files (empty `<code>`) gives poor results. Write the file directly instead.
- Compact without `query` uses the last user message as the relevance signal. For best compression, always pass `query` explicitly.
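A minimal Compact sketch combining the `<keepContext>` and `query` points above. The conversation content is illustrative, and the placement of `query` in `extra_body` is an assumption about how the extra parameter rides on the OpenAI-compatible endpoint; check the docs for the exact field:

```python
import os

# Wrap a critical section in <keepContext> so Compact never compresses it.
system_prompt = (
    "<keepContext>\n"
    "You are a coding agent. Never delete tests.\n"
    "</keepContext>\n"
    "General project background that may be compressed..."
)

history = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explore src/payments/ and summarize charge()"},
    {"role": "assistant", "content": "charge() lives in src/payments/stripe.py ..."},
]

if os.environ.get("MORPH_API_KEY"):
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(
        base_url="https://api.morphllm.com/v1",
        api_key=os.environ["MORPH_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="morph-compactor",
        messages=history,
        # Assumption: `query` is passed as an extra body parameter.
        extra_body={"query": "add retry logic to charge()"},
    )
    # Surviving lines are byte-for-byte identical to the input.
    print(resp.choices[0].message.content)
```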
## Apply example
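A minimal Python sketch of an Apply call following the message format above (single user message with `<instruction>`, `<code>`, and `<update>` tags). The example file and edit snippet are illustrative; the API call is gated on a `MORPH_API_KEY` environment variable:

```python
import os

def build_apply_message(instruction: str, code: str, update: str) -> str:
    """Build the single user message Apply expects:
    <instruction>, <code>, and <update> XML tags."""
    return (
        f"<instruction>{instruction}</instruction>\n"
        f"<code>{code}</code>\n"
        f"<update>{update}</update>"
    )

original_file = 'def greet(name):\n    print("hello " + name)\n'

# Partial edit only: unchanged regions are marked, never restated.
update_snippet = '// ... existing code ...\n    print(f"hello {name}!")\n'

message = build_apply_message(
    "Use an f-string and add an exclamation mark to the greeting",
    original_file,
    update_snippet,
)

if os.environ.get("MORPH_API_KEY"):
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(
        base_url="https://api.morphllm.com/v1",
        api_key=os.environ["MORPH_API_KEY"],
    )
    resp = client.chat.completions.create(
        model="morph-v3-fast",  # or morph-v3-large for complex edits
        temperature=0,          # required for Apply
        messages=[{"role": "user", "content": message}],
    )
    print(resp.choices[0].message.content)  # the fully merged file
```

Including the `<instruction>` tag is what lifts Apply accuracy from 92% to 98%, so always describe what the edit does rather than sending the snippet alone.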
## Resources
- Full documentation index: https://docs.morphllm.com/llms.txt
- Complete docs in one file (~9k tokens): https://docs.morphllm.com/llms-full.txt
- MCP server: `npx @morphllm/morphmcp@latest`
- Dashboard & API keys: https://morphllm.com/dashboard