Morph Compact
Compact is a context compaction tool. It takes long conversation histories and drops unnecessary sentences — greetings, filler, repeated information — while preserving everything else character-for-character. Unlike summarization, nothing gets rewritten.

How it works: you pass a conversation to Compact. The model scans every sentence and decides: keep or drop. Filler like “Sure!” and “Got it”, duplicated facts, and superseded information get cut. Numbers, code blocks, first mentions, corrections, and unanswered questions stay. The output looks like the input with holes cut out.
Your Agent                       Compact
    │                              │
    │  50-turn conversation        │
    │  (12K tokens of context)     │
    │ ─────────────────────────────▶
    │                              │ ← drops greetings, filler
    │                              │ ← drops duplicate explanations
    │                              │ ← drops superseded statements
    │                              │ ← keeps code, numbers, questions
    │     [compacted context]      │
    │     (4K tokens, ~100ms)      │
    │ ◀─────────────────────────────
    │                              │
Why compact instead of summarize? Summarization rewrites content — the LLM downstream sees paraphrased context, which can introduce drift and hallucination. Compaction preserves exact original phrasing for everything that remains.
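To make the difference concrete, here is a hypothetical before/after pair (illustrative only; what actually gets dropped is decided by the model at runtime). The key property is that surviving text is a verbatim substring of the input:

```typescript
// Hypothetical example of a compacted assistant turn.
// The model drops the filler sentences and keeps the question verbatim.
const before = "Sure! I'd be happy to help. Are you using sessions or JWTs?";
const after = "Are you using sessions or JWTs?";

// Unlike a summary, compacted text is character-for-character identical
// to a subset of the original, so what remains is a substring of the input.
console.log(before.includes(after)); // true
```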
Uses the OpenAI Responses API format — any existing OpenAI SDK works out of the box.
morph-compactor runs at 3,300+ tok/s with 98% verbatim accuracy. Typical compaction reduces token count by 50-70%.
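Plugging the quoted numbers in: a 12K-token conversation like the one in the diagram, compacted at the stated 50-70% reduction range, lands near the ~4K figure shown. A back-of-envelope sketch, not a guarantee for any given conversation:

```typescript
// Back-of-envelope token savings at the quoted 50-70% reduction range.
const inputTokens = 12_000;

const atLow = Math.round(inputTokens * (1 - 0.5));  // 6000 tokens remain
const atHigh = Math.round(inputTokens * (1 - 0.7)); // 3600 tokens remain

console.log(atLow, atHigh); // brackets the ~4K in the diagram
```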

Quick Start

import { MorphClient } from '@morphllm/morphsdk';

const morph = new MorphClient({ apiKey: "YOUR_API_KEY" });

const result = await morph.responses.compact({
  input: [
    { role: "user", content: "Hi, I need help with my project." },
    { role: "assistant", content: "Sure! I'd be happy to help. What are you working on?" },
    { role: "user", content: "A Node.js app. I set up the DB and frontend. Now I need JWT auth." },
    { role: "assistant", content: "I can help with that. Are you using sessions or JWTs?" },
    { role: "user", content: "JWTs. Show me the middleware setup." },
  ],
});

console.log(result.output); // Compacted — filler removed
console.log(result.usage);  // { input_tokens, output_tokens, total_tokens }

Prompt-Aware Compaction

Pass prompt to tell the compactor what the user is about to ask. The model keeps context relevant to that question and drops the rest more aggressively:
const result = await morph.responses.compact({
  input: longConversation,
  prompt: "How do I validate JWT tokens in Express middleware?",
});

// DB setup context dropped, JWT/auth context preserved
Without prompt, the model does general compaction — drops obvious filler but preserves all substantive content equally. With prompt, it biases toward the upcoming question.
prompt maps to instructions in the raw Responses API.
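Because the endpoint speaks the OpenAI Responses API format, you can also call it without the Morph SDK. A minimal sketch of that mapping (the base URL and the helper names here are assumptions, not from the SDK; check your Morph dashboard for the real endpoint):

```typescript
type Msg = { role: string; content: string };

// Build a raw Responses API payload: the SDK's `prompt` option
// becomes the `instructions` field on the wire.
function toResponsesPayload(input: Msg[], prompt?: string) {
  return {
    model: "morph-compactor",
    input,
    ...(prompt ? { instructions: prompt } : {}),
  };
}

// Illustrative only: the base URL is an assumption.
async function compactRaw(apiKey: string, input: Msg[], prompt?: string) {
  const res = await fetch("https://api.morphllm.com/v1/responses", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(toResponsesPayload(input, prompt)),
  });
  return res.json();
}
```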

Direct Usage

Use CompactClient directly when you don’t need the full MorphClient:
import { CompactClient } from '@morphllm/morphsdk';

const compact = new CompactClient({ morphApiKey: "YOUR_API_KEY" });

const result = await compact.compact({
  input: longConversation,
  prompt: "What was discussed about deployment?",
  temperature: 0.1,
  max_output_tokens: 4096,
});

Edge / Cloudflare Workers

import { CompactClient } from '@morphllm/morphsdk/edge';

export default {
  async fetch(request: Request, env: Env) {
    const compact = new CompactClient({ morphApiKey: env.MORPH_API_KEY });
    const { input, prompt } = await request.json();

    const result = await compact.compact({ input, prompt });
    return Response.json({ output: result.output, usage: result.usage });
  }
};
The edge entry point has zero Node.js dependencies.

Configuration

Option             Type             Default          Description
input              string or array  Required         Conversation to compact — plain string or {role, content}[]
prompt             string           -                User’s upcoming message. Enables prompt-aware compaction
model              string           morph-compactor  Model ID
temperature        number           -                Sampling temperature (0-1)
max_output_tokens  integer          -                Cap on output length

API

CompactInput:
{
  input: string | ResponseInputItem[],
  prompt?: string,
  model?: string,
  temperature?: number,
  max_output_tokens?: number,
}
import type { CompactInput, CompactResult, CompactConfig } from '@morphllm/morphsdk';

Error Handling

try {
  const result = await morph.responses.compact({ input: conversationHistory });
  console.log(result.output);
} catch (error) {
  // Standard OpenAI SDK errors — same as any responses.create() call.
  // In TypeScript, `error` is `unknown`, so narrow before reading .message:
  console.error(error instanceof Error ? error.message : error);
}

Use Cases

  • Agent loops — compress 50+ turns of tool calls and reasoning before the next inference call
  • Cost reduction — strip filler before sending to expensive models. Pay for signal, not noise
  • Context window pressure — fit more relevant history when approaching model limits
  • Multi-session chat — carry compacted history between sessions instead of raw transcripts
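For the agent-loop case, the compaction call usually sits behind a size check so you only pay the extra round trip when history is actually long. A sketch, assuming compact() returns the history in the same {role, content} shape; the 4-chars-per-token estimate and the threshold are heuristics, not SDK behavior, and the client is typed structurally so the same logic works with MorphClient or a stub:

```typescript
type Msg = { role: string; content: string };

// Structural type matching the compact() call from Quick Start,
// so this helper accepts MorphClient or a test stub.
type Compactor = {
  responses: {
    compact(args: { input: Msg[]; prompt?: string }): Promise<{ output: Msg[] }>;
  };
};

// Heuristic: roughly 4 characters per token. Not a real tokenizer.
function estimateTokens(history: Msg[]): number {
  return Math.ceil(history.reduce((n, m) => n + m.content.length, 0) / 4);
}

const COMPACT_THRESHOLD = 8_000; // assumed budget; tune for your model

// Compact the history only when it is worth the extra round trip.
async function maybeCompact(
  client: Compactor,
  history: Msg[],
  nextPrompt: string,
): Promise<Msg[]> {
  if (estimateTokens(history) < COMPACT_THRESHOLD) return history;
  const result = await client.responses.compact({
    input: history,
    prompt: nextPrompt, // prompt-aware: bias toward the upcoming question
  });
  return result.output;
}
```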