Generic compressors (LLMLingua, summarizers) treat all context equally, compressing by perplexity or rewriting everything into shorter form. morph-compactor is query-conditioned: pass the upcoming user question, and it keeps lines relevant to that question while dropping the rest. The same source file compressed with `query="auth middleware"` vs `query="database schema"` produces different output. Nothing gets rewritten. Every surviving line is byte-for-byte identical to your input.
| | |
| --- | --- |
| Model ID | `morph-compactor` |
| APIs | `POST /v1/responses` (OpenAI-compatible), `POST /v1/compress` (with line ranges) |
| Speed | 15,000 tok/s |
| Context window | 262K tokens |
| Typical reduction | 50-70% fewer tokens |
| Verbatim accuracy | 98% |
| Query-conditioned | Yes, optional `query`/`prompt`/`instructions` parameter |

Try the API

API reference with examples

Quick Start

```typescript
import Anthropic from "@anthropic-ai/sdk";
import { MorphClient } from "@morphllm/morphsdk";

const morph = new MorphClient({ apiKey: "YOUR_API_KEY" });
const anthropic = new Anthropic({ apiKey: "YOUR_ANTHROPIC_API_KEY" });

// chatHistory: the conversation array or string you want to shrink
const result = await morph.responses.compact({ input: chatHistory });

// compressed history -> your LLM
const response = await anthropic.messages.create({
  model: "claude-sonnet-4-5-20250929",
  max_tokens: 1024, // required by the Anthropic Messages API
  system: result.output,
  messages: [{ role: "user", content: nextUserMessage }],
});
```
Or with the standard OpenAI SDK:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.morphllm.com/v1",
});

const response = await client.responses.create({
  model: "morph-compactor",
  input: chatHistory,
});

const compressed = response.output[0].content[0].text;
```

Query-Conditioned Compression

Pass the upcoming user question as prompt. The model tokenizes the query, scores every token in the input for relevance to that query, and drops low-scoring lines. Different queries on the same input produce different output.
```typescript
// Auth question: keeps auth code, drops DB setup
const forAuth = await morph.responses.compact({
  input: chatHistory,
  prompt: "How do I validate JWT tokens in Express middleware?",
});

// DB question: keeps DB setup, drops auth code
const forDB = await morph.responses.compact({
  input: chatHistory,
  prompt: "How is the connection pool configured?",
});
```
Without `prompt`, the model auto-detects the query from the last user message.

Two Endpoints

/v1/responses returns compressed text. Works with any OpenAI SDK. /v1/compress returns compressed messages plus compacted_line_ranges per message, showing the exact 1-indexed line ranges that were filtered out. Use this when you need to track what was removed.
```typescript
const result = await morph.responses.compress({
  messages: [{ role: "user", content: codeFile }],
  compression_ratio: 0.5,
  query: "auth middleware",
});

// result.messages[0].compacted_line_ranges
// [{ start: 7, end: 12 }, { start: 15, end: 18 }]
```
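The returned ranges make it possible to reconstruct exactly which lines were dropped. A minimal sketch, assuming the 1-indexed inclusive ranges shown above (the `splitByRanges` helper is illustrative, not part of the SDK):

```typescript
type Range = { start: number; end: number }; // 1-indexed, inclusive

// Given the original text and the ranges the API reported as removed,
// partition its lines into kept and dropped sets.
function splitByRanges(
  original: string,
  removed: Range[],
): { kept: string[]; dropped: string[] } {
  const droppedIdx = new Set<number>();
  for (const { start, end } of removed) {
    for (let i = start; i <= end; i++) droppedIdx.add(i);
  }
  const kept: string[] = [];
  const dropped: string[] = [];
  original.split("\n").forEach((line, i) => {
    (droppedIdx.has(i + 1) ? dropped : kept).push(line); // i is 0-indexed
  });
  return { kept, dropped };
}
```

Useful for audit logs or diff views: you can show users precisely what was filtered out of their context.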

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `input` | string or array | required | Chat history or text |
| `prompt` / `instructions` | string | - | Upcoming question for prompt-aware mode |
| `model` | string | `morph-compactor` | Model ID |

`prompt` in the SDK maps to `instructions` in the raw API.
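When calling the raw API without the SDK, the query therefore goes in `instructions`. A hedged sketch of the request shape (the stand-in history and the `compactRaw` wrapper are illustrative, not part of the API surface):

```typescript
// Minimal stand-in history for illustration
const chatHistory = [
  { role: "user", content: "long setup transcript (placeholder)" },
];

// `prompt` in the SDK becomes `instructions` in the raw request body
const body = {
  model: "morph-compactor",
  input: chatHistory,
  instructions: "How do I validate JWT tokens in Express middleware?",
};

// Not invoked here; shows the raw call shape against /v1/responses
async function compactRaw(apiKey: string) {
  const res = await fetch("https://api.morphllm.com/v1/responses", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  return res.json();
}
```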

FAQ

**How is this different from summarization?**
Summarization rewrites. Compaction deletes. Every sentence that survives compaction is character-for-character identical to your original. No drift, no hallucinated context.

**When should I use it?**
Agent loops past 50 turns. Multi-session chat where you carry history forward. Any pipeline where you pay for tokens that are greetings and filler.

**Can I pass a plain string instead of a messages array?**
Both work. A conversation array preserves role structure through compaction.

**Does it work with the OpenAI SDK?**
Yes. Point `baseURL` at `https://api.morphllm.com/v1` and call `client.responses.create()` with `model: "morph-compactor"`.

**How fast is it?**
15,000 tok/s. 10K tokens compresses in under a second. 140K tokens in about 9 seconds.
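Those latency figures follow directly from the throughput number; a back-of-envelope check:

```typescript
// At 15,000 tok/s, compaction latency scales linearly with input size.
const TOKENS_PER_SECOND = 15_000;
const secondsFor = (tokens: number) => tokens / TOKENS_PER_SECOND;

secondsFor(10_000); // ≈ 0.67 s — under a second
secondsFor(140_000); // ≈ 9.3 s
```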