
The problem

Coding agents waste most of their compute on things that aren’t reasoning. Cognition measured this: their agent spent 60% of turns searching for code. Anthropic found that multi-agent architectures improved task completion by over 90% versus a single-agent baseline when mechanical work runs in specialized subagents. And if you’ve built an agent that edits files, you’ve seen the failure mode: your model rewrites a 500-line file to change 3 lines, burns tokens, and introduces drift. These aren’t reasoning problems. They’re mechanical problems, and they have mechanical solutions.
| Without Morph | With Morph |
| --- | --- |
| Agent rewrites a 500-line file to change 3 lines. Costs ~$0.12, takes 8 seconds. | Agent sends a 10-line edit snippet. Merged in under a second at 10,500 tok/s with 98% accuracy. |
| Agent spends 60% of turns searching; results fill the context window. | WarpGrep searches in a separate context and finds code in ~3.8 steps. Main context stays clean. |
| After 50 turns, chat history is 80% filler and the model starts forgetting. | Compact removes irrelevant lines at 33,000 tok/s for a 50-70% reduction. Every surviving line is verbatim. |

Get running in 30 seconds

Install the MCP server. One command, and edit_file + codebase_search appear in your editor.
```shell
npx -y @morphllm/morph-setup --morph-api-key YOUR_API_KEY
```
This auto-detects Claude Code, Cursor, Codex, and VS Code, then configures them all.
No API key yet? Grab one from your dashboard. Free tier included.

Full MCP setup guide

Per-client configuration, CLAUDE.md prompts, and troubleshooting

Building an agent? Use the SDK.

OpenAI-compatible API. Point any OpenAI SDK at https://api.morphllm.com/v1, or install the TypeScript SDK:

```shell
npm install @morphllm/morphsdk
```

```typescript
import { MorphClient } from '@morphllm/morphsdk';

const morph = new MorphClient({ apiKey: process.env.MORPH_API_KEY });

// Edit a file: 10,500 tok/s, 98% accuracy
const edit = await morph.fastApply.execute({
  target_filepath: 'src/auth.ts',
  instructions: 'Add null check before session creation',
  code_edit: '// ... existing code ...\nif (!user) throw new Error("Not found");\n// ... existing code ...'
});

// Search a codebase: ~3.8 steps, 8 parallel tool calls per turn
const search = await morph.warpGrep.execute({
  searchTerm: 'Find authentication middleware',
  repoRoot: '.'
});

// Compress context: 33,000 tok/s, 50-70% reduction
const compact = await morph.compact({
  input: chatHistory,
  query: 'JWT token validation'
});
```
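If you'd rather not take on the SDK dependency, you can hit the OpenAI-compatible endpoint directly. This sketch only builds the request payload; the model id (`morph-v3-large`) and the `<instruction>`/`<code>`/`<update>` message format are assumptions here, so check the quickstart for the exact contract before relying on them.

```typescript
// Sketch of a raw OpenAI-compatible request to Fast Apply, without the SDK.
// Model id and message format are assumptions -- verify against the quickstart.
interface ApplyRequest {
  url: string;
  body: {
    model: string;
    messages: { role: 'user'; content: string }[];
  };
}

function buildApplyRequest(
  instructions: string,
  originalCode: string,
  codeEdit: string
): ApplyRequest {
  return {
    url: 'https://api.morphllm.com/v1/chat/completions',
    body: {
      model: 'morph-v3-large', // assumed model id
      messages: [
        {
          role: 'user',
          content:
            `<instruction>${instructions}</instruction>\n` +
            `<code>${originalCode}</code>\n` +
            `<update>${codeEdit}</update>`,
        },
      ],
    },
  };
}

// POST req.body as JSON with your Morph API key as a Bearer token.
const req = buildApplyRequest(
  'Add null check before session creation',
  'function createSession(user) { return db.sessions.create(user); }',
  '// ... existing code ...\nif (!user) throw new Error("Not found");\n// ... existing code ...'
);
```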

Products

| Product | What it does | Speed | Key metric |
| --- | --- | --- | --- |
| Fast Apply | Merges edit snippets into files | 10,500 tok/s | 98% accuracy |
| WarpGrep | Searches code in an isolated context window | ~3.8 steps | #1 SWE-Bench Pro |
| Compact | Removes irrelevant lines from chat history | 33,000 tok/s | 50-70% reduction, verbatim |
| Router | Routes prompts to the right model tier | ~430ms | $0.001/request |
Your agent describes a change as a lazy edit snippet (just the changed lines, with // ... existing code ... markers). Fast Apply merges that snippet into the original file and returns the result: 98% accuracy, sub-second latency on typical files. This is the same approach Cursor uses.
If your agent omits // ... existing code ... markers, Fast Apply treats missing sections as deletions. Make sure your agent prompt includes the marker format. See the quickstart for prompt templates.
Full guide →
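For concreteness, here is what a lazy edit snippet looks like for a small change. The file contents are hypothetical; the point is the shape: only the changed lines, with markers standing in for everything Fast Apply should keep from the original file.

```typescript
// A hypothetical lazy edit snippet for Fast Apply's code_edit parameter.
const codeEdit = [
  '// ... existing code ...',
  'if (!user) {',
  '  throw new Error("User not found");',
  '}',
  '// ... existing code ...',
].join('\n');

// Every unchanged region must be covered by a marker; per the warning above,
// a region with no marker is treated as a deletion.
const hasMarkers = codeEdit.includes('// ... existing code ...');
```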
WarpGrep is a separate LLM that searches your codebase in its own context window. It takes a natural-language query, issues 8 parallel tool calls per turn, and returns file/line-range spans in ~3.8 steps (under 6 seconds on most repos).

The key detail: it runs in isolation. Your main agent’s context stays clean, with no 200-file grep dumps polluting the conversation.

Paired with Opus, Codex, or MiniMax, WarpGrep reaches #1 on SWE-Bench Pro, 15.6% cheaper and 28% faster than single-model approaches.
WarpGrep also searches public GitHub repos without cloning. Pass a GitHub URL instead of a local path.
Full guide →
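Since the same parameter can point at either a local checkout or a public GitHub repo, a small guard helps decide which mode you're in. This is a sketch under the assumption (from the tip above) that `repoRoot` accepts either form; the helper name is mine, not part of the SDK.

```typescript
// Decide whether a WarpGrep target is a public GitHub URL (searched in the
// cloud, no clone needed) or a local path (searched with local tooling).
function isGithubTarget(repoRoot: string): boolean {
  try {
    const u = new URL(repoRoot);
    return u.hostname === 'github.com';
  } catch {
    return false; // not a URL at all -> treat as a local path
  }
}
```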
Compact shrinks chat history and code context before sending it to your LLM. 100K tokens compress in under 2 seconds with a 50-70% reduction, and every surviving line is byte-for-byte identical to the original.

The optional query parameter makes compression much better: it tells the model what the user is about to ask, so query="auth middleware" keeps auth code and drops DB setup.

With a 1M-token context window, you can compress entire repositories in a single call.

Full guide →
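The quoted 50-70% reduction translates directly into input-token savings on every subsequent call. A back-of-envelope sketch, using a placeholder input price ($3 per million tokens; substitute your downstream model's actual rate):

```typescript
// Back-of-envelope savings from Compact's quoted 50-70% reduction.
// The dollar rate is a placeholder, not a real price.
function compactSavings(
  contextTokens: number,
  reductionRate: number, // 0.5 to 0.7 per the figures above
  dollarsPerMillionTokens: number
): { tokensRemoved: number; dollarsSavedPerCall: number } {
  const tokensRemoved = Math.round(contextTokens * reductionRate);
  return {
    tokensRemoved,
    dollarsSavedPerCall: (tokensRemoved / 1_000_000) * dollarsPerMillionTokens,
  };
}

// A 100K-token history at a 60% reduction and $3/M input tokens
// drops 60,000 tokens from every later request.
const s = compactSavings(100_000, 0.6, 3);
```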

Common gotchas

Fast Apply only helps if your agent outputs partial edits. You need to update your agent’s system prompt to use // ... existing code ... markers. Without this, your agent generates full-file rewrites and there’s nothing for Fast Apply to merge. See the prompt templates.
WarpGrep needs ripgrep installed locally for codebase search. If ripgrep isn’t on PATH, searches will fail silently. GitHub search runs on the cloud and doesn’t need ripgrep.
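Because the failure is silent, it's worth checking for ripgrep at startup rather than discovering empty search results later. A minimal Node.js sketch (the warning message is mine; GitHub search skips this check entirely):

```typescript
// Guard against the silent-failure mode above: verify ripgrep is on PATH
// before wiring up local codebase search.
import { spawnSync } from 'node:child_process';

function hasRipgrep(): boolean {
  const result = spawnSync('rg', ['--version'], { stdio: 'ignore' });
  return result.error === undefined && result.status === 0;
}

if (!hasRipgrep()) {
  console.warn('ripgrep (rg) not found on PATH; local WarpGrep searches will fail.');
}
```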
Always pass Compact’s query parameter. Without it, Compact makes generic compression decisions; with a specific query like "database connection pooling", it keeps the relevant lines and drops the rest.
The Morph API is OpenAI-compatible. Use the OpenAI Python SDK, point it at https://api.morphllm.com/v1, and pass your Morph API key. See the quickstart for Python examples. WarpGrep has a dedicated Python guide.

If you’re coming from…

Install the MCP server. edit_file and codebase_search appear as tools automatically. No code changes. MCP quickstart →

Next steps

Fast Apply Quickstart

Prompt templates, code examples, verification

WarpGrep Guide

Codebase search, GitHub search, streaming

Compact Guide

Query-conditioned compression, keepContext tags

MCP Integration

Claude Code, Cursor, Codex, VS Code

SDK Reference

Full TypeScript SDK documentation

API Playground

Test with live examples

Enterprise

Dedicated instances, self-hosted deployments, zero data retention. 99.9% uptime SLA, SOC2, SSO.

Talk to Sales

Custom deployments and volume pricing