The problem
Coding agents waste most of their compute on things that aren’t reasoning. Cognition measured this: their agent spent 60% of turns searching for code. Anthropic found multi-agent architectures improve task completion by 90% when mechanical work runs in specialized subprocesses. And if you’ve built an agent that edits files, you’ve seen the failure mode: your model rewrites a 500-line file to change 3 lines, burns tokens, and introduces drift. These aren’t reasoning problems. They’re mechanical problems, and they have mechanical solutions.

| Without Morph | With Morph |
|---|---|
| Agent rewrites a 500-line file to change 3 lines. Costs ~$0.12, takes 8 seconds. | Agent sends a 10-line edit snippet. Merged in under a second at 10,500 tok/s, 98% accuracy. |
| Agent spends 60% of turns searching. Results fill the context window. | WarpGrep searches in a separate context. Finds code in 3.8 steps. Main context stays clean. |
| After 50 turns, chat history is 80% filler. Model starts forgetting. | Compact removes irrelevant lines at 33,000 tok/s. 50-70% reduction. Every surviving line is verbatim. |
Get running in 30 seconds
Install the MCP server. One command, and edit_file + codebase_search appear in your editor.
Full MCP setup guide
Per-client configuration, CLAUDE.md prompts, and troubleshooting
Building an agent? Use the SDK.
OpenAI-compatible API. Point any OpenAI SDK at https://api.morphllm.com/v1.
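Concretely, "OpenAI-compatible" just means overriding the base URL. A minimal sketch using a plain request builder (an OpenAI SDK is configured the same way); the model name "morph-v3-fast" is illustrative — check the docs for current model IDs:

```typescript
const MORPH_BASE_URL = "https://api.morphllm.com/v1";

// Build a standard OpenAI-style chat completion request against Morph's endpoint.
function chatCompletionRequest(apiKey: string, model: string, content: string) {
  return {
    url: `${MORPH_BASE_URL}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages: [{ role: "user", content }] }),
    },
  };
}

// Usage (Node 18+):
//   const { url, init } = chatCompletionRequest(apiKey, "morph-v3-fast", prompt);
//   const res = await fetch(url, init);
```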
Products
| Product | What it does | Speed | Key metric |
|---|---|---|---|
| Fast Apply | Merges edit snippets into files | 10,500 tok/s | 98% accuracy |
| WarpGrep | Searches code in an isolated context window | ~3.8 steps | #1 SWE-Bench Pro |
| Compact | Removes irrelevant lines from chat history | 33,000 tok/s | 50-70% reduction, verbatim |
| Router | Routes prompts to the right model tier | ~430ms | $0.001/request |
Fast Apply: how it works
Your agent describes a change as a lazy edit snippet (just the changed lines, with // ... existing code ... markers). Fast Apply merges that snippet into the original file and returns the result. 98% accuracy. Sub-second latency on typical files. This is the same approach Cursor uses. Full guide →
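A sketch of what a Fast Apply call looks like over the OpenAI-compatible API. The model name and the tag-based message format are illustrative assumptions here — the Fast Apply guide is the reference for the exact contract:

```typescript
const originalFile = `export function area(r: number) {
  return 3.14 * r * r;
}

export function circumference(r: number) {
  return 2 * 3.14 * r;
}`;

// The lazy edit: only the changed lines; everything else elided with markers.
const editSnippet = `// ... existing code ...
export function circumference(r: number) {
  return 2 * Math.PI * r;
}`;

// Wrap original code + edit snippet into a single user message for the apply model.
function buildApplyMessage(code: string, update: string): string {
  return `<code>${code}</code>\n<update>${update}</update>`;
}

const body = JSON.stringify({
  model: "morph-v3-fast", // illustrative model name
  messages: [{ role: "user", content: buildApplyMessage(originalFile, editSnippet) }],
});
// POST this body to https://api.morphllm.com/v1/chat/completions;
// the response content is the full merged file.
```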
WarpGrep: how it works
WarpGrep is a separate LLM that searches your codebase in its own context window. It takes a natural language query, issues 8 parallel tool calls per turn, and returns file/line-range spans in ~3.8 steps (under 6 seconds on most repos). The key detail: it runs in isolation. Your main agent’s context stays clean. No 200-file grep dumps polluting the conversation. Paired with Opus, Codex, or MiniMax, WarpGrep reaches #1 on SWE-Bench Pro, 15.6% cheaper and 28% faster than single-model approaches. Full guide →
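On the consuming side, your agent only ever sees the returned spans. A sketch of turning spans into context, assuming an illustrative span shape of `{file, startLine, endLine}` (see the WarpGrep guide for the real response schema):

```typescript
interface Span {
  file: string;
  startLine: number; // 1-indexed, inclusive
  endLine: number;
}

// Slice only the referenced lines out of file contents, so relevant code
// (not a 200-file grep dump) is what enters the main agent's context.
function spansToContext(files: Record<string, string>, spans: Span[]): string {
  return spans
    .map((s) => {
      const lines = files[s.file].split("\n").slice(s.startLine - 1, s.endLine);
      return `${s.file}:${s.startLine}-${s.endLine}\n${lines.join("\n")}`;
    })
    .join("\n\n");
}
```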
Compact: how it works
Shrinks chat history and code context before sending it to your LLM. 100K tokens compress in under 2 seconds. 50-70% reduction. Every surviving line is byte-for-byte identical to the original. The optional query parameter makes compression much better. It tells the model what the user is about to ask, so query="auth middleware" keeps auth code and drops DB setup. 1M token context window. You can compress entire repositories in a single call. Full guide →

Common gotchas
My agent rewrites the whole file instead of using edit snippets
Fast Apply only helps if your agent outputs partial edits. You need to update your agent’s system prompt to use // ... existing code ... markers. Without this, your agent generates full-file rewrites and there’s nothing for Fast Apply to merge. See the prompt templates.
WarpGrep results seem incomplete
WarpGrep needs ripgrep installed locally for codebase search. If ripgrep isn’t on PATH, searches will fail silently. GitHub search runs on the cloud and doesn’t need ripgrep.
Compact is dropping lines I need
Use the query parameter. Without it, Compact makes generic compression decisions. With a specific query like "database connection pooling", it keeps the relevant lines and drops the rest.
I'm using Python, not TypeScript
The Morph API is OpenAI-compatible. Use the OpenAI Python SDK, point it at https://api.morphllm.com/v1, and pass your Morph API key. See the quickstart for Python examples. WarpGrep has a dedicated Python guide.

If you’re coming from…
- Claude Code / Codex
- Cursor
- Aider / Continue
- Building your own agent
Install the MCP server.
edit_file and codebase_search appear as tools automatically. No code changes. MCP quickstart →

Next steps
Fast Apply Quickstart
Prompt templates, code examples, verification
WarpGrep Guide
Codebase search, GitHub search, streaming
Compact Guide
Query-conditioned compression, keepContext tags
MCP Integration
Claude Code, Cursor, Codex, VS Code
SDK Reference
Full TypeScript SDK documentation
API Playground
Test with live examples
Enterprise
Dedicated instances, self-hosted deployments, zero data retention. 99.9% uptime SLA, SOC2, SSO.

Talk to Sales
Custom deployments and volume pricing