# Compact

> Drop filler from chat history at 33,000 tok/s. No rewriting, no paraphrasing.

<Frame>
  <img src="https://mintcdn.com/morph-555d6c14/syIyP5jrtLcFhUaU/images/compact.webp?fit=max&auto=format&n=syIyP5jrtLcFhUaU&q=85&s=2552095c70f6dda9738406c3f99f9880" alt="Morph Compact" width="1376" height="768" data-path="images/compact.webp" />
</Frame>

Compact removes filler from chat history and code context at **33,000 tok/s**. Typical reduction is 50-70%, and every surviving line is byte-for-byte identical to the input.

<Note>
  Compaction works by **deleting entire lines** from the input — it never rewrites or paraphrases. This means if more than \~10% of the context you feed in lives on a single line, compaction cannot selectively trim within that line and results will be poor. Split long single-line payloads (e.g., minified code or giant JSON blobs) into multiple lines before compacting.
</Note>
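
Because compaction is line-based, it can help to check how the input is distributed across lines before sending it. The sketch below is a hypothetical pre-processing step, not part of the Morph SDK: `longestLineShare` measures what fraction of the input sits on its longest line, and `prepareForCompaction` pretty-prints the input when it is a single-line JSON payload above the roughly 10% threshold. Minified code would need a language-aware formatter instead, which is out of scope here.

```typescript
// Hypothetical helpers (not part of the Morph SDK).

// Fraction of the input's characters that sit on its longest line.
function longestLineShare(input: string): number {
  if (input.length === 0) return 0;
  let longest = 0;
  for (const line of input.split("\n")) {
    longest = Math.max(longest, line.length);
  }
  return longest / input.length;
}

// If one line dominates the input and the input parses as JSON,
// re-serialize it with indentation so each field gets its own line.
function prepareForCompaction(input: string, threshold = 0.1): string {
  if (longestLineShare(input) <= threshold) return input;
  try {
    return JSON.stringify(JSON.parse(input), null, 2);
  } catch {
    return input; // not JSON: leave untouched and split it some other way
  }
}
```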

|                       |                                          |
| --------------------- | ---------------------------------------- |
| **Model**             | `morph-compactor`                        |
| **Speed**             | 33,000 tok/s                             |
| **Context window**    | 1M tokens                                |
| **Typical reduction** | 50-70% fewer tokens                      |
| **Output**            | Verbatim lines from input (no rewriting) |

<a href="https://www.morphllm.com/dashboard/playground/compact" target="_blank">
  <Button variant="primary">Try it in the Playground</Button>
</a>

## Quick Start

<Info>
  **Logged in?** Your API key auto-fills in the code blocks below. Otherwise, get it from your [dashboard](https://morphllm.com/dashboard/api-keys).
</Info>

<Tabs>
  <Tab title="Morph SDK">
    ```typescript theme={null}
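    import Anthropic from '@anthropic-ai/sdk';
    import { MorphClient } from '@morphllm/morphsdk';

    const morph = new MorphClient({ apiKey: "YOUR_API_KEY" });
    const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

    // chatHistory: your existing conversation
    const result = await morph.compact({
      input: chatHistory,
      query: "How do I validate JWT tokens?",
    });

    // Pass compressed history to your LLM
    const response = await anthropic.messages.create({
      model: "claude-sonnet-4-5-20250929",
      max_tokens: 1024, // required by the Anthropic Messages API
      messages: [
        { role: "user", content: result.output },
        { role: "user", content: "How do I validate JWT tokens?" },
      ],
    });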
    ```
  </Tab>

  <Tab title="HTTP">
    ```bash theme={null}
    curl -X POST "https://api.morphllm.com/v1/compact" \
      -H "Authorization: Bearer YOUR_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "input": "def hello():\n    return 1\n\ndef unused():\n    pass\n\ndef world():\n    return 2",
        "query": "hello function",
        "compression_ratio": 0.5,
        "preserve_recent": 0
      }'
    ```
  </Tab>

  <Tab title="OpenAI SDK (TS)">
    ```typescript theme={null}
    import OpenAI from 'openai';

    const client = new OpenAI({
      apiKey: "YOUR_API_KEY",
      baseURL: "https://api.morphllm.com/v1",
    });

    const response = await client.chat.completions.create({
      model: "morph-compactor",
      messages: [{ role: "user", content: chatHistory }],
    });

    const compressed = response.choices[0].message.content;
    ```
  </Tab>

  <Tab title="OpenAI SDK (Python)">
    ```python theme={null}
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_API_KEY",
        base_url="https://api.morphllm.com/v1",
    )

    response = client.chat.completions.create(
        model="morph-compactor",
        messages=[{"role": "user", "content": chat_history}],
    )

    compressed = response.choices[0].message.content
    ```
  </Tab>

  <Tab title="Python (requests)">
    ```python theme={null}
    import requests

    # source_code: the text you want to compact
    response = requests.post(
        "https://api.morphllm.com/v1/compact",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        json={
            "input": source_code,
            "query": "authentication",
            "compression_ratio": 0.5,
            "preserve_recent": 0,
        },
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error body

    data = response.json()
    print(data["output"])
    ```
  </Tab>
</Tabs>

## Query-Conditioned Compression

The `query` parameter tells the model what matters. The model scores every line's relevance to that query, then drops lines below the threshold.

```typescript theme={null}
// Same chat history, different queries, different output
const forAuth = await morph.compact({
  input: chatHistory,
  query: "JWT token validation",
});
// DB setup and CSS discussion dropped, auth code kept

const forDB = await morph.compact({
  input: chatHistory,
  query: "database connection pooling",
});
// Auth code dropped, DB setup kept
```

Without `query`, the model auto-detects from the last user message. Explicit queries give tighter compression.
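
To make the score-then-drop idea concrete, here is a toy illustration. It is not how Morph's model scores lines: the real scorer is a learned model, while this sketch uses plain keyword overlap, and `toyCompact` is a hypothetical name. It shows only the shape of the operation: score each line against the query, delete lines under the threshold, and return the survivors unchanged.

```typescript
// Toy illustration only: keyword overlap stands in for the real relevance model.
function toyCompact(input: string, query: string, threshold: number): string {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const kept = input.split("\n").filter((line) => {
    const lower = line.toLowerCase();
    const hits = terms.filter((t) => lower.includes(t)).length;
    const score = terms.length === 0 ? 0 : hits / terms.length;
    return score >= threshold; // lines below the threshold are deleted whole
  });
  return kept.join("\n"); // surviving lines are never edited
}
```

With the same input, a query of `"jwt token"` keeps only lines mentioning those terms, while `"pool"` keeps a different subset.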

## Line Ranges and Markers

By default, each message includes `compacted_line_ranges` (which lines were removed) and `(filtered N lines)` markers in the text. Both are configurable:

```typescript theme={null}
// Default: markers + ranges
const result = await morph.compact({
  input: codeFile,
  query: "auth middleware",
  compressionRatio: 0.5,
  preserveRecent: 0,
});

console.log(result.output);
// def authenticate():
//     ...
// (filtered 12 lines)
// def handle_request():
//     ...

for (const r of result.messages[0].compacted_line_ranges) {
  console.log(`lines ${r.start}-${r.end} removed`);
}

// No markers: empty lines instead of "(filtered N lines)"
const noMarkers = await morph.compact({ input: codeFile, includeMarkers: false });

// No line ranges: skip tracking removed ranges
const noRanges = await morph.compact({ input: codeFile, includeLineRanges: false });
// noRanges.messages[0].compacted_line_ranges is an empty array
```
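
If you keep the original text, the reported ranges can be used to recover what was dropped, for example for debugging or an "expand" control in a UI. The helper below is a hypothetical sketch. It assumes `start` and `end` are 1-indexed, inclusive line numbers in the original message content, which this page does not state explicitly, so check against a real response first.

```typescript
type LineRange = { start: number; end: number };

// Recover the removed text for each reported range.
// ASSUMPTION: `start`/`end` are 1-indexed and inclusive.
function removedLines(original: string, ranges: LineRange[]): string[][] {
  const lines = original.split("\n");
  return ranges.map((r) => lines.slice(r.start - 1, r.end));
}
```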

## Preserving Critical Context

Wrap sections you never want compressed in `<keepContext>` / `</keepContext>` tags. Tagged content survives compression verbatim regardless of the compression ratio.

```typescript theme={null}
const input = `
// Database connection setup
const pool = new Pool({ host: 'localhost', port: 5432 });

<keepContext>
// CRITICAL: Auth middleware - do not compress
function authenticate(req, res, next) {
  const token = req.headers.authorization?.split(' ')[1];
  if (!token) return res.status(401).json({ error: 'No token' });
  const decoded = jwt.verify(token, process.env.JWT_SECRET);
  req.user = decoded;
  next();
}
</keepContext>

// Logging utilities
function logRequest(req) { console.log(req.method, req.path); }
function logError(err) { console.error(err.stack); }
// ... 200 more lines of helpers
`;

const result = await morph.compact({
  input,
  query: "authentication",
  compressionRatio: 0.3,
});

// The authenticate() function is fully preserved.
// DB setup and logging helpers are compressed.
// The <keepContext> tags themselves are stripped from output.
```

**Rules:**

* Tags must be on their own line (no inline `code() <keepContext>`)
* Tags must open and close within the same message
* Kept content counts against the `compression_ratio` budget. If you keep 40% and request 0.5, the remaining 60% compresses harder to hit the target.
* Unclosed `<keepContext>` preserves everything from the tag to the end of the message
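
The budget rule can be worked through numerically. If 40% of the input is kept and the target is 0.5, only 10 percentage points of budget remain for the other 60%, so that portion is kept at roughly 0.1 / 0.6 ≈ 0.167. The function below is an illustrative sketch of that arithmetic, not the service's implementation. What the service does when kept content alone exceeds the target is not documented here, so the sketch returns 0 as a placeholder.

```typescript
// Fraction of the non-kept content that can survive so the overall output
// still meets `targetRatio`. Both arguments are fractions of the total input.
function ratioForUnkeptContent(keptFraction: number, targetRatio: number): number {
  if (keptFraction < 0 || keptFraction > 1) {
    throw new RangeError("keptFraction must be in [0, 1]");
  }
  // ASSUMPTION: when kept content alone uses the whole budget, nothing else fits.
  if (keptFraction >= targetRatio) return 0;
  return (targetRatio - keptFraction) / (1 - keptFraction);
}

console.log(ratioForUnkeptContent(0.4, 0.5).toFixed(3)); // "0.167"
```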

The response includes `kept_line_ranges` showing which lines were force-preserved:

```typescript theme={null}
for (const r of result.messages[0].kept_line_ranges) {
  console.log(`lines ${r.start}-${r.end} preserved via keepContext`);
}
```

## API Reference

### `POST /v1/compact`

The primary endpoint. Accepts string input or message arrays.

**Parameters**

| Parameter                  | Type            | Default           | Description                                                                               |
| -------------------------- | --------------- | ----------------- | ----------------------------------------------------------------------------------------- |
| `input`                    | string or array | -                 | Text or `{role, content}` array. One of `input`/`messages` required.                      |
| `messages`                 | array           | -                 | `{role, content}` messages. Takes priority over `input`.                                  |
| `query`                    | string          | auto-detected     | Focus query for relevance-based pruning                                                   |
| `compression_ratio`        | float           | `0.5`             | Fraction to keep. `0.3` = aggressive, `0.7` = light                                       |
| `preserve_recent`          | int             | `2`               | Keep last N messages uncompressed                                                         |
| `compress_system_messages` | bool            | `false`           | When `true`, system messages are also compressed. By default they are preserved verbatim. |
| `include_line_ranges`      | bool            | `true`            | Include `compacted_line_ranges` in response                                               |
| `include_markers`          | bool            | `true`            | Include `(filtered N lines)` text markers. When `false`, gaps become empty lines          |
| `model`                    | string          | `morph-compactor` | Model ID                                                                                  |

**Response**

```json theme={null}
{
  "id": "cmpr-7373faf8af65",
  "object": "compact",
  "model": "morph-compactor",
  "output": "def hello():\n    print(\"hello world\")\n(filtered 6 lines)\ndef world():\n    return 42",
  "messages": [
    {
      "role": "user",
      "content": "def hello():\n    print(\"hello world\")\n(filtered 6 lines)\ndef world():\n    return 42",
      "compacted_line_ranges": [{ "start": 5, "end": 10 }],
      "kept_line_ranges": []
    }
  ],
  "usage": {
    "input_tokens": 101,
    "output_tokens": 65,
    "compression_ratio": 0.644,
    "processing_time_ms": 109
  }
}
```
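
In the sample response, `usage.compression_ratio` matches `output_tokens / input_tokens` (65 / 101 ≈ 0.644). Assuming that relationship holds in general, a small hypothetical helper can turn the `usage` object into a savings summary for logging:

```typescript
type CompactUsage = {
  input_tokens: number;
  output_tokens: number;
  compression_ratio: number;
  processing_time_ms: number;
};

// Summarize savings from a `usage` object. ASSUMPTION: compression_ratio is
// output_tokens / input_tokens, as the sample response suggests.
function summarizeUsage(usage: CompactUsage): { tokensSaved: number; percentSaved: number } {
  const tokensSaved = usage.input_tokens - usage.output_tokens;
  const percentSaved =
    usage.input_tokens === 0 ? 0 : Math.round((tokensSaved / usage.input_tokens) * 1000) / 10;
  return { tokensSaved, percentSaved };
}

const sampleUsage: CompactUsage = {
  input_tokens: 101,
  output_tokens: 65,
  compression_ratio: 0.644,
  processing_time_ms: 109,
};
console.log(summarizeUsage(sampleUsage)); // { tokensSaved: 36, percentSaved: 35.6 }
```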

### `POST /v1/chat/completions`

OpenAI Chat Completions format. Drop-in replacement for any OpenAI-compatible client pointed at `https://api.morphllm.com/v1`. Supports streaming via `stream: true`.

| Parameter           | Type   | Required | Description                             |
| ------------------- | ------ | -------- | --------------------------------------- |
| `model`             | string | Yes      | `morph-compactor`                       |
| `messages`          | array  | Yes      | `{role, content}` message array         |
| `compression_ratio` | float  | No       | Fraction to keep (default 0.5)          |
| `query`             | string | No       | Focus query for relevance-based pruning |
| `stream`            | bool   | No       | Enable SSE streaming                    |

```json theme={null}
{
  "id": "cmpr-def456",
  "object": "chat.completion",
  "model": "morph-compactor",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "compressed text..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 4200, "completion_tokens": 1800, "total_tokens": 6000 }
}
```
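
`compression_ratio` and `query` are not standard OpenAI Chat Completions fields, so a typed OpenAI client may not accept them without extra configuration. One option is to build the request yourself. The sketch below is a hypothetical helper that returns the URL and options for `fetch`. It assumes the endpoint takes the same Bearer header shown in the `/v1/compact` curl example.

```typescript
type ChatMessage = { role: string; content: string };

// Build the URL and fetch options for POST /v1/chat/completions.
// ASSUMPTION: auth uses the same Bearer header as /v1/compact.
function buildChatCompletionsRequest(
  apiKey: string,
  messages: ChatMessage[],
  extras: { compression_ratio?: number; query?: string } = {},
): { url: string; init: { method: string; headers: Record<string, string>; body: string } } {
  return {
    url: "https://api.morphllm.com/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "morph-compactor", messages, ...extras }),
    },
  };
}

// Usage:
//   const { url, init } = buildChatCompletionsRequest(key, history, { query: "auth" });
//   const response = await fetch(url, init);
```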

### `POST /v1/responses`

OpenAI Responses API format. Works with OpenAI SDK v5+ (TS) or v1.66+ (Python) pointed at `https://api.morphllm.com/v1`.

| Parameter | Type            | Required | Description                             |
| --------- | --------------- | -------- | --------------------------------------- |
| `model`   | string          | Yes      | `morph-compactor`                       |
| `input`   | string or array | Yes      | Text or `{role, content}` array         |
| `query`   | string          | No       | Focus query for relevance-based pruning |

```json theme={null}
{
  "id": "cmpr-abc123",
  "object": "response",
  "model": "morph-compactor",
  "output": [{
    "type": "message",
    "role": "assistant",
    "content": [{ "type": "output_text", "text": "compressed text..." }]
  }],
  "usage": { "input_tokens": 4200, "output_tokens": 1800 }
}
```
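
The compressed text in this format is nested two levels deep. A minimal sketch for pulling it out of a payload shaped like the sample above follows. The type and function names are hypothetical, and any item or part types other than those in the sample are simply skipped.

```typescript
type ResponsesOutputItem = {
  type: string;
  role?: string;
  content?: Array<{ type: string; text?: string }>;
};

// Concatenate every `output_text` part from a /v1/responses payload.
function extractOutputText(response: { output: ResponsesOutputItem[] }): string {
  const parts: string[] = [];
  for (const item of response.output) {
    if (item.type !== "message") continue;
    for (const part of item.content ?? []) {
      if (part.type === "output_text" && typeof part.text === "string") {
        parts.push(part.text);
      }
    }
  }
  return parts.join("");
}
```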

### Errors

| Status | Meaning                              |
| ------ | ------------------------------------ |
| `400`  | Malformed request or input too large |
| `401`  | Invalid API key                      |
| `503`  | Model not loaded                     |
| `504`  | Request timed out                    |

## SDK Reference

**`CompactInput`**

```typescript theme={null}
{
  input?: string | Array<{ role: string, content: string }>,
  messages?: Array<{ role: string, content: string }>,
  query?: string,
  compressionRatio?: number,    // 0.05-1.0, default 0.5
  preserveRecent?: number,      // default 2
  includeLineRanges?: boolean,  // default true
  includeMarkers?: boolean,     // default true
  model?: string,
}
```

**`CompactResult`**

```typescript theme={null}
{
  id: string,
  output: string,              // all messages joined
  messages: Array<{
    role: string,
    content: string,
    compacted_line_ranges: Array<{ start: number, end: number }>,
    kept_line_ranges: Array<{ start: number, end: number }>,  // force-preserved via <keepContext>
  }>,
  usage: { input_tokens: number, output_tokens: number, compression_ratio: number, processing_time_ms: number },
  model: string,
}
```

**`CompactConfig`**

```typescript theme={null}
{
  morphApiKey?: string,     // defaults to MORPH_API_KEY env
  morphApiUrl?: string,
  timeout?: number,         // defaults to 120000 (2 min)
  retryConfig?: RetryConfig,
  debug?: boolean,
}
```

### Edge / Cloudflare Workers

```typescript theme={null}
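import { CompactClient } from '@morphllm/morphsdk/edge';

// Bindings this Worker expects
interface Env {
  MORPH_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const compact = new CompactClient({ morphApiKey: env.MORPH_API_KEY });
    // The request body is untyped JSON, so declare the shape we expect
    const { input, query } = (await request.json()) as { input: string; query?: string };

    const result = await compact.compact({ input, query });
    return Response.json({ output: result.output, usage: result.usage });
  }
};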
```

## Best Practices

<CardGroup cols={2}>
  <Card title="Keep recent messages verbatim" icon="shield">
    Set `preserve_recent` to at least **3**. Recent turns contain the user's active intent and the assistant's latest reasoning. Compacting them risks dropping context the LLM needs right now.
  </Card>

  <Card title="Always pass a query" icon="bullseye">
    Without it, the model falls back to auto-detection from the last user message. An explicit query gives tighter, more relevant compression because the model knows exactly which lines to score.
  </Card>

  <Card title="Compact before the LLM call" icon="arrow-right">
    The value is in reducing what you **send** to your LLM. Compacting a response after generation saves storage but doesn't cut inference cost.
  </Card>

  <Card title="Tune compression_ratio" icon="sliders">
    Default `0.5` is a good starting point. Agent loops past 100 turns: try `0.3`. Shorter conversations where nuance matters: try `0.7`.
  </Card>
</CardGroup>

```typescript theme={null}
const result = await morph.compact({
  input: chatHistory,
  query: "rate limiting",
  preserveRecent: 3,  // last 3 messages pass through untouched
});
```
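
The ratio guidance in the cards can be folded into a small helper. This is a sketch: the 100-turn cutoff and the three ratios come from this page, while the function name and the `nuanceSensitive` flag are hypothetical and left to the caller to decide.

```typescript
// Pick a compression ratio from the guidance above.
// `nuanceSensitive` is a hypothetical caller-supplied flag.
function suggestCompressionRatio(turnCount: number, nuanceSensitive = false): number {
  if (turnCount > 100) return 0.3; // long agent loops: compress aggressively
  if (nuanceSensitive) return 0.7; // shorter conversations where nuance matters
  return 0.5; // documented default
}
```

The result can be passed as `compressionRatio` in the SDK or `compression_ratio` over HTTP.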

## FAQ

<AccordionGroup>
  <Accordion title="How is this different from summarization?">
    Summarization rewrites. Compaction deletes. Every line that survives compaction is character-for-character identical to your original. No drift, no hallucinated context.
  </Accordion>

  <Accordion title="When should I compact?">
    Agent loops past 50 turns. Multi-session chat where you carry history forward. Any pipeline where you pay for tokens that are greetings and filler.
  </Accordion>

  <Accordion title="String input or conversation array?">
    Both work. A conversation array preserves role structure through compaction.
  </Accordion>

  <Accordion title="Can I use the standard OpenAI SDK?">
    Yes. Point `baseURL` at `https://api.morphllm.com/v1` and call `client.chat.completions.create()` with `model: "morph-compactor"`. The Responses API (`client.responses.create()`) also works.
  </Accordion>

  <Accordion title="How fast?">
    33,000 tok/s. At that rate, 100K tokens compress in about 3 seconds and 180K tokens in about 5.5 seconds.
  </Accordion>
</AccordionGroup>

## Notes for AI Coding Agents

<AccordionGroup>
  <Accordion title="Integration guide for LLMs and coding agents">
    If you are an AI coding agent integrating Morph Compact into a project, follow these steps:

    **1. Understand the existing compaction logic.** Before writing any code, read the project's current context management. This might be in pre-commit hooks, middleware, a context window manager, or a custom prompt builder. Search for keywords like `compact`, `compress`, `truncate`, `summarize`, or `context` in the codebase.

    **2. Choose the right integration method.**

    * If the project uses **TypeScript/JavaScript**, use the `@morphllm/morphsdk` package — it handles retries, types, and edge runtime support out of the box.
    * If the project uses **Python with the OpenAI SDK**, point the client at `https://api.morphllm.com/v1` and call `chat.completions.create()` with `model="morph-compactor"`.
    * For **any other language**, call `POST /v1/compact` directly with a JSON body containing `input` (or `messages`) and optionally `query`.

    **3. Follow best practices.**

    * Always pass `query` — set it to the user's current question or task description for much better compression.
    * Set `preserve_recent` to at least 3 to keep the latest turns intact.
    * Compact **before** sending to the LLM, not after — the goal is to reduce inference cost.
    * Use `compression_ratio: 0.3` for long agent loops (100+ turns), `0.7` for shorter conversations.
    * Wrap critical context in `<keepContext>` tags if certain sections must never be removed.
  </Accordion>
</AccordionGroup>
