# Tab Next Action Prediction API

> Tab Next Action Prediction API endpoints

## Base URL

```
http://192.222.50.238:8080
```

Faster proxy endpoint (in progress):

```
http://192.222.50.238:9000
```

***

## Health Check

Check server status and cache performance.

```http theme={null}
GET /health
```

<CodeGroup>
  ```bash cURL theme={null}
  curl http://192.222.50.238:8080/health
  ```

  ```python Python theme={null}
  import requests

  response = requests.get("http://192.222.50.238:8080/health")
  print(response.json())
  ```

  ```javascript JavaScript theme={null}
  const response = await fetch('http://192.222.50.238:8080/health');
  const data = await response.json();
  ```
</CodeGroup>

### Response

```json theme={null}
{
  "status": "healthy",
  "server_role": "standalone",
  "model": "morph-test",
  "gpu_available": true,
  "cache_enabled": true,
  "cache_stats": {
    "enabled": true,
    "hit_rate": 0.92,
    "num_cached_tokens": 15420
  },
  "uptime_seconds": 3847.2
}
```

<ResponseField name="status" type="string">
  Service status: `healthy` or `degraded`
</ResponseField>

<ResponseField name="server_role" type="string">
  Server role: `standalone`, `prefiller`, or `decoder`
</ResponseField>

<ResponseField name="model" type="string">
  Model name being served
</ResponseField>

<ResponseField name="gpu_available" type="boolean">
  Whether GPU is available and initialized
</ResponseField>

<ResponseField name="cache_enabled" type="boolean">
  Whether prefix caching is enabled
</ResponseField>

<ResponseField name="cache_stats" type="object">
  Cache performance statistics (populated when prefix caching is enabled)

  <Expandable title="properties">
    <ResponseField name="enabled" type="boolean">
      Whether the prefix cache is active
    </ResponseField>

    <ResponseField name="hit_rate" type="float">
      Cache hit rate (0.0 - 1.0)
    </ResponseField>

    <ResponseField name="num_cached_tokens" type="integer">
      Number of tokens currently cached
    </ResponseField>
  </Expandable>
</ResponseField>

<ResponseField name="uptime_seconds" type="float">
  Server uptime in seconds
</ResponseField>
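A client that starts alongside the server may want to wait until `/health` reports ready before sending predictions. The sketch below uses only the Python standard library and assumes that "ready" means `status` is `healthy` and `gpu_available` is `true`. That readiness rule and the helper names `fetch_health`, `is_ready`, and `wait_until_ready` are illustrative assumptions, not part of the API.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://192.222.50.238:8080"


def fetch_health(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_ready(health: dict) -> bool:
    # Assumption: "ready" = healthy status plus an initialized GPU.
    # This rule is illustrative, not documented by the API.
    return health.get("status") == "healthy" and bool(health.get("gpu_available"))


def wait_until_ready(
    get_health: Callable[[], dict] = fetch_health,
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_health() until is_ready() holds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            health = get_health()
            if is_ready(health):
                return health
        except OSError:
            # Connection errors and socket timeouts while the server starts up.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError("server did not become ready before the timeout")
        sleep(interval_s)
```

Passing `get_health` and `sleep` as parameters lets the polling loop be exercised without a running server; in normal use, call `wait_until_ready()` with the defaults.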

***

## Generate Prediction

Generate a next-action prediction from a prompt of recorded rrweb events.

```http theme={null}
POST /v1/predict
```

<CodeGroup>
  ```bash cURL theme={null}
  curl -X POST http://192.222.50.238:8080/v1/predict \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "{\"type\":3,\"data\":{\"source\":2,\"type\":6,\"id\":42,\"x\":385,\"y\":127}}\n{\"type\":3,\"data\":{\"source\":2,\"type\":2,\"id\":42,\"x\":385,\"y\":127,\"pointerType\":0}}\n{\"type\":3,\"data\":{\"source\":2,\"type\":1,\"id\":56}}\n{\"type\":3,\"data\":{\"source\":5,\"text\":\"user@example.com\",\"isChecked\":false,\"id\":56}}",
      "max_tokens": 50,
      "temperature": 0.3
    }'
  ```

  ```python Python theme={null}
  import requests

  # rrweb events as prompt
  rrweb_events = """{"type":3,"data":{"source":2,"type":6,"id":42,"x":385,"y":127}}
  {"type":3,"data":{"source":2,"type":2,"id":42,"x":385,"y":127,"pointerType":0}}
  {"type":3,"data":{"source":2,"type":1,"id":56}}
  {"type":3,"data":{"source":5,"text":"user@example.com","isChecked":false,"id":56}}"""

  response = requests.post(
      "http://192.222.50.238:8080/v1/predict",
      json={
          "prompt": rrweb_events,
          "max_tokens": 50,
          "temperature": 0.3
      }
  )
  print(response.json())
  ```

  ```javascript JavaScript theme={null}
  // rrweb events as prompt
  const rrwebEvents = `{"type":3,"data":{"source":2,"type":6,"id":42,"x":385,"y":127}}
  {"type":3,"data":{"source":2,"type":2,"id":42,"x":385,"y":127,"pointerType":0}}
  {"type":3,"data":{"source":2,"type":1,"id":56}}
  {"type":3,"data":{"source":5,"text":"user@example.com","isChecked":false,"id":56}}`;

  const response = await fetch('http://192.222.50.238:8080/v1/predict', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      prompt: rrwebEvents,
      max_tokens: 50,
      temperature: 0.3
    })
  });
  const data = await response.json();
  ```

  ```python Python (batch events) theme={null}
  import json

  import requests

  # Send a batch of rrweb events
  rrweb_batch = [
      {"type": 3, "data": {"source": 2, "type": 6, "id": 42, "x": 385, "y": 127}},
      {"type": 3, "data": {"source": 2, "type": 2, "id": 42, "x": 385, "y": 127, "pointerType": 0}},
      {"type": 3, "data": {"source": 2, "type": 1, "id": 56}},
      {"type": 3, "data": {"source": 5, "text": "user@example.com", "isChecked": False, "id": 56}}
  ]

  # Convert to a newline-delimited JSON string. Use json.dumps rather than str():
  # str() produces a Python repr (single quotes, False) that is not valid JSON.
  prompt = "\n".join(json.dumps(event, separators=(",", ":")) for event in rrweb_batch)

  response = requests.post(
      "http://192.222.50.238:8080/v1/predict",
      json={
          "prompt": prompt,
          "max_tokens": 50,
          "temperature": 0.3
      }
  )
  print(response.json())
  ```
</CodeGroup>

### Request Body

<ParamField body="prompt" type="string" required>
  rrweb event data as newline-delimited JSON: one serialized rrweb event object per line.
</ParamField>

<ParamField body="max_tokens" type="integer" default={50}>
  Maximum number of tokens to generate (range: 1-512)
</ParamField>

<ParamField body="temperature" type="float" default={0.3}>
  Sampling temperature (range: 0.0-2.0). Lower values produce more deterministic outputs
</ParamField>

<ParamField body="stream" type="boolean" default={false}>
  Enable a streaming response. Not implemented yet; omit this field or leave it `false`.
</ParamField>
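Because `prompt` is a single string, clients that hold events as objects must serialize each one as JSON and join them with newlines. A minimal sketch that does this and checks the documented `max_tokens` and `temperature` ranges before sending. `build_predict_payload` is a hypothetical helper, and client-side validation is an optional convenience, since the server reports out-of-range values as `400` errors.

```python
import json
from typing import Iterable


def build_predict_payload(
    events: Iterable[dict],
    max_tokens: int = 50,
    temperature: float = 0.3,
) -> dict:
    """Hypothetical helper: serialize rrweb events to NDJSON and check ranges."""
    if not 1 <= max_tokens <= 512:
        raise ValueError("max_tokens must be between 1 and 512")
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    # json.dumps (not str()) emits valid JSON: double quotes, lowercase false/true.
    # Compact separators match the spacing used in the request examples.
    prompt = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)
    return {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
```

The result can be sent directly as the JSON body, for example `requests.post(url, json=build_predict_payload(events))`. `stream` is left out because streaming is not implemented.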

### Response

```json theme={null}
{
  "text": "{\"type\":3,\"data\":{\"source\":2,\"type\":1,\"id\":67}}\n{\"type\":3,\"data\":{\"source\":5,\"text\":\"password123\",\"isChecked\":false,\"id\":67}}",
  "latency_ms": 287,
  "tokens_generated": 42
}
```

<ResponseField name="text" type="string">
  Generated rrweb event predictions as newline-delimited JSON
</ResponseField>

<ResponseField name="latency_ms" type="integer">
  Request processing latency in milliseconds
</ResponseField>

<ResponseField name="tokens_generated" type="integer">
  Number of tokens generated in the response
</ResponseField>
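The `text` field is itself newline-delimited JSON, so it needs a second decoding step. A sketch under the assumption that generation can stop at `max_tokens` partway through an event, which would leave a truncated final line. `parse_predicted_events` is a hypothetical helper that skips any line that does not decode to a JSON object.

```python
import json


def parse_predicted_events(text: str) -> list:
    """Hypothetical helper: decode the NDJSON `text` field into event dicts.

    Assumption: the final line may be truncated if generation hit max_tokens,
    so lines that fail to decode (or are not JSON objects) are skipped.
    """
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # incomplete or malformed line
        if isinstance(event, dict):
            events.append(event)
    return events
```

Skipping rather than raising keeps the complete predictions usable even when the last one is cut off; a stricter client could instead retry with a larger `max_tokens`.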

***

## Error Responses

All errors return JSON with a standard format:

```json theme={null}
{
  "detail": "Error message describing what went wrong"
}
```

### Status Codes

<ResponseField name="200" type="Success">
  Request completed successfully
</ResponseField>

<ResponseField name="400" type="Bad Request">
  Invalid request parameters (e.g., temperature out of range)
</ResponseField>

<ResponseField name="500" type="Internal Server Error">
  Unexpected server error during prediction
</ResponseField>

<ResponseField name="503" type="Service Unavailable">
  Model not ready or server not initialized
</ResponseField>

***

## Performance Tips

<Tip>
  **Optimize Cache Hits**: The server reuses cached prompt prefixes, so send each session's rrweb events in the same order on every request and append new events to the end. Requests that share a long leading event sequence with earlier requests achieve higher cache hit rates and lower latency.
</Tip>
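One way to follow this advice is to keep an append-only list of serialized events per session, so each request's prompt begins with the previous request's prompt. A minimal sketch; `SessionPrompt` is a hypothetical helper, and it assumes the server's prefix cache matches on identical leading prompt content.

```python
import json


class SessionPrompt:
    """Hypothetical append-only prompt builder for one rrweb session.

    Assumes prefix caching keys on identical leading prompt text, so each
    prompt() result starts with every earlier prompt() result.
    """

    def __init__(self) -> None:
        self._lines = []

    def add(self, *events: dict) -> None:
        # Fixed separators keep whitespace identical across requests;
        # dict key order is preserved exactly as the caller supplies it.
        for event in events:
            self._lines.append(json.dumps(event, separators=(",", ":")))

    def prompt(self) -> str:
        return "\n".join(self._lines)
```

If a prompt must be shortened, dropping events from the front changes the prefix, so the next request is unlikely to reuse cached work.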

<Note>
  **Typical Latency**:

  * Single-node: \~800ms (P50), \~1.5s (P99)
  * Disaggregated: \~250ms (P50), \~450ms (P99) (in progress)
  * A cache hit rate of 90% or higher substantially reduces latency for requests that share event-sequence prefixes
</Note>
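To compare your own traffic against these figures, one option is to collect `latency_ms` from each response and compute percentiles. A minimal nearest-rank sketch with a hypothetical `percentile` helper. `latency_ms` is the server's processing latency, so it does not include client-side network time, and the figures above do not state which of the two they measure.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]
```

For example, `percentile([r["latency_ms"] for r in responses], 99)` gives the P99 over a list of decoded responses.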

## rrweb Event Format

The API expects rrweb events as newline-delimited JSON strings. Common event types:

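* **Type 2 (FullSnapshot)**: A full serialization of the page DOM
* **Type 3 (IncrementalSnapshot)**: Incremental changes, identified by `data.source`:
  * `source: 0` = Mutation (DOM mutations)
  * `source: 1` = MouseMove
  * `source: 2` = MouseInteraction; `data.type` gives the interaction, e.g. `1` = MouseDown, `2` = Click, `6` = Blur
  * `source: 3` = Scroll
  * `source: 5` = Input
* **Type 4 (Meta)**: Page metadata (URL and viewport size)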

Example event (a mouse click on node 42; `data.type: 2` = Click, `pointerType: 0` = mouse):

```json theme={null}
{
  "type": 3,
  "data": {
    "source": 2,
    "type": 2,
    "id": 42,
    "x": 385,
    "y": 127,
    "pointerType": 0
  }
}
```
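The numeric codes are easier to inspect once decoded. The sketch below mirrors a subset of the enums in rrweb's type definitions (`EventType`, `IncrementalSource`, and `MouseInteractions`); the Python class names and the `describe` helper are illustrative, and codes outside the listed subset are shown as raw numbers.

```python
from enum import IntEnum


class EventType(IntEnum):
    # Values follow rrweb's EventType enum.
    DomContentLoaded = 0
    Load = 1
    FullSnapshot = 2
    IncrementalSnapshot = 3
    Meta = 4
    Custom = 5
    Plugin = 6


class IncrementalSource(IntEnum):
    # Subset of rrweb's IncrementalSource enum.
    Mutation = 0
    MouseMove = 1
    MouseInteraction = 2
    Scroll = 3
    ViewportResize = 4
    Input = 5


class MouseInteraction(IntEnum):
    # Subset of rrweb's MouseInteractions enum.
    MouseUp = 0
    MouseDown = 1
    Click = 2
    ContextMenu = 3
    DblClick = 4
    Focus = 5
    Blur = 6


def _name(enum_cls: type, value) -> str:
    """Member name, or a raw fallback for codes outside this subset."""
    try:
        return enum_cls(value).name
    except ValueError:
        return f"{enum_cls.__name__}({value})"


def describe(event: dict) -> str:
    """Illustrative helper: short human-readable label for an rrweb event."""
    if event.get("type") != EventType.IncrementalSnapshot:
        return _name(EventType, event.get("type"))
    data = event.get("data", {})
    label = f"IncrementalSnapshot/{_name(IncrementalSource, data.get('source'))}"
    if data.get("source") == IncrementalSource.MouseInteraction:
        label += f"/{_name(MouseInteraction, data.get('type'))}"
    if "id" in data:
        label += f" on node {data['id']}"
    return label
```

For example, `describe` on the example event above returns `IncrementalSnapshot/MouseInteraction/Click on node 42`.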
