Compact is a context compaction tool. It takes long conversation histories and drops unnecessary sentences — greetings, filler, repeated information — while preserving everything else character-for-character. Unlike summarization, nothing gets rewritten.
How it works: You pass a conversation to Compact. The model scans every sentence and decides: keep or drop. Filler like “Sure!”, “Got it”, duplicated facts, and superseded information get cut. Numbers, code blocks, first mentions, corrections, and unanswered questions stay. The output looks like the input with holes cut out.
```
Your Agent                                   Compact
    │                                           │
    │  50-turn conversation                     │
    │  (12K tokens of context)                  │
    │ ──────────────────────────────────────────▶
    │                                           │ ← drops greetings, filler
    │                                           │ ← drops duplicate explanations
    │                                           │ ← drops superseded statements
    │                                           │ ← keeps code, numbers, questions
    │ ◀──────────────────────────────────────────
    │  [compacted context]                      │
    │  (4K tokens, ~100ms)                      │
```
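The keep/drop pass can be sketched as a toy message filter (illustration only; the real model also cuts individual sentences inside messages, and its filler detection is learned, not regex-based):

```typescript
// Toy illustration of the keep/drop decision (NOT the actual model):
// drop messages that are pure filler, keep everything else verbatim.
const FILLER = [/^sure!?$/i, /^got it\.?$/i, /^thanks!?$/i];

type Msg = { role: "user" | "assistant"; content: string };

function toyCompact(messages: Msg[]): Msg[] {
  // Kept messages pass through untouched — nothing is rewritten.
  return messages.filter(
    (m) => !FILLER.some((rx) => rx.test(m.content.trim()))
  );
}
```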
Why compact instead of summarize? Summarization rewrites content — the LLM downstream sees paraphrased context, which can introduce drift and hallucination. Compaction preserves exact original phrasing for everything that remains.
Uses the OpenAI Responses API format — any existing OpenAI SDK works out of the box.
morph-compactor runs at 3,300+ tok/s with 98% verbatim accuracy. Typical compaction reduces token count by 50-70%.
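From the `usage` fields the API returns, the reduction is straightforward to compute (`reductionPct` is a local helper, not an SDK export):

```typescript
// Percentage of tokens removed, from the usage numbers in the response.
function reductionPct(inputTokens: number, outputTokens: number): number {
  return Math.round((1 - outputTokens / inputTokens) * 100);
}

// 12K tokens in, 4K out → 67% reduction
```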
Quick Start
The same request, four ways: MorphClient, the OpenAI SDK (TypeScript), Python, or cURL.
MorphClient:

```typescript
import { MorphClient } from '@morphllm/morphsdk';

const morph = new MorphClient({ apiKey: "YOUR_API_KEY" });

const result = await morph.responses.compact({
  input: [
    { role: "user", content: "Hi, I need help with my project." },
    { role: "assistant", content: "Sure! I'd be happy to help. What are you working on?" },
    { role: "user", content: "A Node.js app. I set up the DB and frontend. Now I need JWT auth." },
    { role: "assistant", content: "I can help with that. Are you using sessions or JWTs?" },
    { role: "user", content: "JWTs. Show me the middleware setup." },
  ],
});

console.log(result.output); // Compacted — filler removed
console.log(result.usage);  // { input_tokens, output_tokens, total_tokens }
```
OpenAI SDK:

```typescript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.morphllm.com/v1",
});

const response = await client.responses.create({
  model: "morph-compactor",
  input: [
    { role: "user", content: "Hi, I need help with my project." },
    { role: "assistant", content: "Sure! I'd be happy to help. What are you working on?" },
    { role: "user", content: "A Node.js app. I set up the DB and frontend. Now I need JWT auth." },
    { role: "assistant", content: "I can help with that. Are you using sessions or JWTs?" },
    { role: "user", content: "JWTs. Show me the middleware setup." },
  ],
});

console.log(response.output[0].content[0].text);
```
Python:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.morphllm.com/v1",
)

response = client.responses.create(
    model="morph-compactor",
    input=[
        {"role": "user", "content": "Hi, I need help with my project."},
        {"role": "assistant", "content": "Sure! I'd be happy to help. What are you working on?"},
        {"role": "user", "content": "A Node.js app. I set up the DB and frontend. Now I need JWT auth."},
        {"role": "assistant", "content": "I can help with that. Are you using sessions or JWTs?"},
        {"role": "user", "content": "JWTs. Show me the middleware setup."},
    ],
)

print(response.output[0].content[0].text)
```
cURL:

```bash
curl -X POST "https://api.morphllm.com/v1/responses" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "morph-compactor",
    "input": [
      {"role": "user", "content": "Hi, I need help with my project."},
      {"role": "assistant", "content": "Sure! I'\''d be happy to help. What are you working on?"},
      {"role": "user", "content": "A Node.js app. I set up the DB and frontend. Now I need JWT auth."},
      {"role": "assistant", "content": "I can help with that. Are you using sessions or JWTs?"},
      {"role": "user", "content": "JWTs. Show me the middleware setup."}
    ]
  }'
```
Prompt-Aware Compaction
Pass `prompt` to tell the compactor what the user is about to ask. The model keeps context relevant to that question and drops the rest more aggressively:
```typescript
const result = await morph.responses.compact({
  input: longConversation,
  prompt: "How do I validate JWT tokens in Express middleware?",
});
// DB setup context dropped, JWT/auth context preserved
```
Without `prompt`, the model does general compaction: it drops obvious filler but preserves all substantive content equally. With `prompt`, it biases toward the upcoming question.

`prompt` maps to `instructions` in the raw Responses API.
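Assuming only that documented mapping, an SDK-level call translates to a raw request body roughly like this (a sketch; `toResponsesBody` is illustrative, not real SDK internals):

```typescript
// Sketch of the SDK → raw Responses API translation. Only the
// model/input/instructions mapping comes from the docs above.
type CompactOptions = {
  input: unknown;
  prompt?: string;
  model?: string;
};

function toResponsesBody(opts: CompactOptions): Record<string, unknown> {
  return {
    model: opts.model ?? "morph-compactor",
    input: opts.input,
    // prompt-aware compaction: `prompt` becomes `instructions`
    ...(opts.prompt !== undefined ? { instructions: opts.prompt } : {}),
  };
}
```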
Direct Usage
Use `CompactClient` directly when you don't need the full `MorphClient`:
```typescript
import { CompactClient } from '@morphllm/morphsdk';

const compact = new CompactClient({ morphApiKey: "YOUR_API_KEY" });

const result = await compact.compact({
  input: longConversation,
  prompt: "What was discussed about deployment?",
  temperature: 0.1,
  max_output_tokens: 4096,
});
```
Edge / Cloudflare Workers
```typescript
import { CompactClient } from '@morphllm/morphsdk/edge';

export default {
  async fetch(request: Request, env: Env) {
    const compact = new CompactClient({ morphApiKey: env.MORPH_API_KEY });
    const { input, prompt } = await request.json();
    const result = await compact.compact({ input, prompt });
    return Response.json({ output: result.output, usage: result.usage });
  },
};
```
The edge entry point has zero Node.js dependencies.
Configuration
| Option | Type | Default | Description |
|---|---|---|---|
| `input` | string or array | Required | Conversation to compact — plain string or `{role, content}[]` |
| `prompt` | string | — | User's upcoming message. Enables prompt-aware compaction |
| `model` | string | `morph-compactor` | Model ID |
| `temperature` | number | — | Sampling temperature (0-1) |
| `max_output_tokens` | integer | — | Cap on output length |
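A client-side sanity check mirroring these constraints might look like this (`validateCompactConfig` is a hypothetical helper, not part of the SDK; the API performs its own validation):

```typescript
// Hypothetical local check of the options table: input required,
// temperature in [0, 1], max_output_tokens an integer.
function validateCompactConfig(cfg: {
  input?: unknown;
  temperature?: number;
  max_output_tokens?: number;
}): string[] {
  const errors: string[] = [];
  if (cfg.input === undefined) errors.push("input is required");
  if (cfg.temperature !== undefined && (cfg.temperature < 0 || cfg.temperature > 1))
    errors.push("temperature must be in [0, 1]");
  if (cfg.max_output_tokens !== undefined && !Number.isInteger(cfg.max_output_tokens))
    errors.push("max_output_tokens must be an integer");
  return errors;
}
```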
API
```typescript
import type { CompactInput, CompactResult, CompactConfig } from '@morphllm/morphsdk';
```
Error Handling
```typescript
try {
  const result = await morph.responses.compact({ input: conversationHistory });
  console.log(result.output);
} catch (error) {
  // Standard OpenAI SDK errors — same as any responses.create() call
  console.error(error.message);
}
```
Use Cases
- Agent loops — compress 50+ turns of tool calls and reasoning before the next inference call
- Cost reduction — strip filler before sending to expensive models. Pay for signal, not noise
- Context window pressure — fit more relevant history when approaching model limits
- Multi-session chat — carry compacted history between sessions instead of raw transcripts
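For the agent-loop case, one common pattern is to compact only once the history crosses a token budget. A sketch, where the 4-chars-per-token estimate is a rough placeholder heuristic and `compactFn` stands in for the real `morph.responses.compact` call:

```typescript
type Turn = { role: string; content: string };

// Rough token estimate: ~4 characters per token (placeholder heuristic;
// use a real tokenizer for accurate counts).
function estimateTokens(messages: Turn[]): number {
  return Math.ceil(messages.reduce((n, m) => n + m.content.length, 0) / 4);
}

// Compact only when the history exceeds `budget` tokens; otherwise
// pass it through untouched. compactFn stands in for the API call.
async function maybeCompact(
  messages: Turn[],
  budget: number,
  compactFn: (input: Turn[]) => Promise<Turn[]>
): Promise<Turn[]> {
  return estimateTokens(messages) > budget ? compactFn(messages) : messages;
}
```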