morph-compactor is query-conditioned. Pass the upcoming user question, and it keeps lines relevant to that question while dropping the rest. The same source file compressed with query="auth middleware" vs query="database schema" produces different output.
Nothing gets rewritten. Every surviving line is byte-for-byte identical to your input.
| Spec | Value |
|---|---|
| Model ID | `morph-compactor` |
| APIs | `POST /v1/responses` (OpenAI-compatible), `POST /v1/compress` (with line ranges) |
| Speed | 15,000 tok/s |
| Context window | 262K tokens |
| Typical reduction | 50-70% fewer tokens |
| Verbatim accuracy | 98% |
| Query-conditioned | Yes, via optional `query`/`prompt`/`instructions` parameter |
Quick Start
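A minimal sketch of a compaction request using only the Python standard library. The endpoint, `model`, `input`, and `instructions` fields follow the docs above; the `Authorization` header format and the placeholder `MORPH_API_KEY` are assumptions.

```python
import json
import urllib.request

# Build a compaction request for the OpenAI-compatible /v1/responses endpoint.
# MORPH_API_KEY is a placeholder; substitute your real key before sending.
payload = {
    "model": "morph-compactor",
    "input": "user: hi!\nassistant: hello! how can I help?\nuser: explain our auth middleware",
    "instructions": "auth middleware",  # upcoming question -> query-conditioned mode
}

req = urllib.request.Request(
    "https://api.morphllm.com/v1/responses",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer MORPH_API_KEY",
    },
    method="POST",
)

# To actually send it (requires a valid API key):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

The same request works through any OpenAI SDK by pointing its base URL at `https://api.morphllm.com/v1`.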
Query-Conditioned Compression
Pass the upcoming user question as `prompt`. The model tokenizes the query, scores every token in the input for relevance to that query, and drops low-scoring lines. Different queries on the same input produce different output.

If you omit `prompt`, the model auto-detects the query from the last user message.
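The mechanism can be pictured with a toy version: score each line by overlap with the query and drop the low scorers. This is an illustration of the idea only, not the model's learned relevance scorer.

```python
def toy_compact(text: str, query: str) -> str:
    """Keep lines sharing at least one word with the query.
    A toy stand-in for morph-compactor's learned token-level scoring."""
    query_words = set(query.lower().split())
    kept = [
        line for line in text.splitlines()
        if query_words & set(line.lower().split())
    ]
    # Surviving lines are byte-for-byte identical to the input.
    return "\n".join(kept)

source = (
    "auth middleware checks the JWT\n"
    "schema has a users table\n"
    "middleware rejects expired tokens"
)
print(toy_compact(source, "auth middleware"))
```

Running the same `source` with `query="database schema"` would instead keep the schema line and drop the middleware lines.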
Two Endpoints
`/v1/responses` returns compressed text. Works with any OpenAI SDK.

`/v1/compress` returns compressed messages plus `compacted_line_ranges` per message, showing the exact 1-indexed line ranges that were filtered out. Use this when you need to track what was removed.
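For example, the dropped lines can be recovered from a `compacted_line_ranges` value like the one below. The exact response shape here is an assumption based on the description above (1-indexed, inclusive ranges).

```python
original = "\n".join([
    "hi there!",        # line 1
    "def auth(): ...",  # line 2
    "thanks, bye!",     # line 3
])

# Hypothetical /v1/compress response fragment for this message:
# 1-indexed, inclusive line ranges that the compactor filtered out.
compacted_line_ranges = [[1, 1], [3, 3]]

lines = original.splitlines()
removed = [
    lines[i - 1]                      # convert 1-indexed to 0-indexed
    for start, end in compacted_line_ranges
    for i in range(start, end + 1)
]
print(removed)  # the filler lines that were deleted
```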
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | string or array | required | Chat history or text |
| `prompt` / `instructions` | string | - | Upcoming question for query-aware mode |
| `model` | string | `morph-compactor` | Model ID |
`prompt` in the SDK maps to `instructions` in the raw API.

FAQ
How is this different from summarization?
Summarization rewrites. Compaction deletes. Every sentence that survives compaction is character-for-character identical to your original. No drift, no hallucinated context.
When should I compact?
Agent loops past 50 turns. Multi-session chat where you carry history forward. Any pipeline where you pay for tokens that are greetings and filler.
String input or conversation array?
Both work. A conversation array preserves role structure through compaction.
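Both shapes as request bodies, following the parameter table above (the array form mirrors standard OpenAI chat messages; that shape is an assumption here):

```python
# String input: one flat transcript.
string_body = {
    "model": "morph-compactor",
    "input": "user: hi\nassistant: hello\nuser: show me the database schema",
}

# Conversation array: role structure is preserved through compaction.
array_body = {
    "model": "morph-compactor",
    "input": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
        {"role": "user", "content": "show me the database schema"},
    ],
}
```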
Can I use the standard OpenAI SDK?
Yes. Point `baseURL` at `https://api.morphllm.com/v1` and call `client.responses.create()` with `model: "morph-compactor"`.

How fast?
15,000 tok/s. 10K tokens compresses in under a second. 140K tokens in about 9 seconds.