# Tab Next Action Prediction API

> Tab Next Action Prediction API endpoints

## Base URL

```
http://192.222.50.238:8080
```

Faster proxy endpoint (in progress):

```
http://192.222.50.238:9000
```

***

## Health Check

Check server status and cache performance.

```http
GET /health
```

```bash cURL
curl http://192.222.50.238:8080/health
```

```python Python
import requests

response = requests.get("http://192.222.50.238:8080/health")
print(response.json())
```

```javascript JavaScript
const response = await fetch('http://192.222.50.238:8080/health');
const data = await response.json();
```

### Response

```json
{
  "status": "healthy",
  "server_role": "standalone",
  "model": "morph-test",
  "gpu_available": true,
  "cache_enabled": true,
  "cache_stats": {
    "enabled": true,
    "hit_rate": 0.92,
    "num_cached_tokens": 15420
  },
  "uptime_seconds": 3847.2
}
```

* `status`: Service status: `healthy` or `degraded`
* `server_role`: Server role: `standalone`, `prefiller`, or `decoder`
* `model`: Model name being served
* `gpu_available`: Whether GPU is available and initialized
* `cache_enabled`: Whether prefix caching is enabled
* `cache_stats`: Cache performance statistics (if caching enabled)
  * `enabled`: Cache status
  * `hit_rate`: Cache hit rate (0.0 - 1.0)
  * `num_cached_tokens`: Number of tokens currently cached
* `uptime_seconds`: Server uptime in seconds

***

## Generate Prediction

Generate a next-action prediction from a prompt.

```http
POST /v1/predict
```

```bash cURL
curl -X POST http://192.222.50.238:8080/v1/predict \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "{\"type\":3,\"data\":{\"source\":2,\"type\":6,\"id\":42,\"x\":385,\"y\":127}}\n{\"type\":3,\"data\":{\"source\":2,\"type\":2,\"id\":42,\"x\":385,\"y\":127,\"pointerType\":0}}\n{\"type\":3,\"data\":{\"source\":2,\"type\":1,\"id\":56}}\n{\"type\":3,\"data\":{\"source\":5,\"text\":\"user@example.com\",\"isChecked\":false,\"id\":56}}",
    "max_tokens": 50,
    "temperature": 0.3
  }'
```

```python Python
import requests

# rrweb events as prompt
rrweb_events = """{"type":3,"data":{"source":2,"type":6,"id":42,"x":385,"y":127}}
{"type":3,"data":{"source":2,"type":2,"id":42,"x":385,"y":127,"pointerType":0}}
{"type":3,"data":{"source":2,"type":1,"id":56}}
{"type":3,"data":{"source":5,"text":"user@example.com","isChecked":false,"id":56}}"""

response = requests.post(
    "http://192.222.50.238:8080/v1/predict",
    json={
        "prompt": rrweb_events,
        "max_tokens": 50,
        "temperature": 0.3
    }
)
print(response.json())
```

```javascript JavaScript
// rrweb events as prompt
const rrwebEvents = `{"type":3,"data":{"source":2,"type":6,"id":42,"x":385,"y":127}}
{"type":3,"data":{"source":2,"type":2,"id":42,"x":385,"y":127,"pointerType":0}}
{"type":3,"data":{"source":2,"type":1,"id":56}}
{"type":3,"data":{"source":5,"text":"user@example.com","isChecked":false,"id":56}}`;

const response = await fetch('http://192.222.50.238:8080/v1/predict', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    prompt: rrwebEvents,
    max_tokens: 50,
    temperature: 0.3
  })
});
const data = await response.json();
```

```python Python (batch events)
import json

import requests

# Send batch of rrweb events
rrweb_batch = [
    {"type": 3, "data": {"source": 2, "type": 6, "id": 42, "x": 385, "y": 127}},
    {"type": 3, "data": {"source": 2, "type": 2, "id": 42, "x": 385, "y": 127, "pointerType": 0}},
    {"type": 3, "data": {"source": 2, "type": 1, "id": 56}},
    {"type": 3, "data": {"source": 5, "text": "user@example.com", "isChecked": False, "id": 56}}
]

# Convert to newline-delimited JSON string.
# json.dumps (not str) is required so each line is valid JSON:
# double-quoted keys and lowercase false instead of Python's False.
prompt = "\n".join(json.dumps(event) for event in rrweb_batch)

response = requests.post(
    "http://192.222.50.238:8080/v1/predict",
    json={
        "prompt": prompt,
        "max_tokens": 50,
        "temperature": 0.3
    }
)
```

### Request Body

* `prompt`: rrweb event data as newline-delimited JSON. Each line should be a valid rrweb event object.
* `max_tokens`: Maximum number of tokens to generate (range: 1-512).
* `temperature`: Sampling temperature (range: 0.0-2.0). Lower values produce more deterministic outputs.
* Streaming flag: Enables a streaming response (currently not implemented).

### Response

```json
{
  "text": "{\"type\":3,\"data\":{\"source\":2,\"type\":1,\"id\":67}}\n{\"type\":3,\"data\":{\"source\":5,\"text\":\"password123\",\"isChecked\":false,\"id\":67}}",
  "latency_ms": 287,
  "tokens_generated": 42
}
```

* `text`: Generated rrweb event predictions as newline-delimited JSON
* `latency_ms`: Request processing latency in milliseconds
* `tokens_generated`: Number of tokens generated in the response

***

## Error Responses

All errors return JSON in a standard format:

```json
{
  "detail": "Error message describing what went wrong"
}
```

### Status Codes

* Request completed successfully
* Invalid request parameters (e.g., temperature out of range)
* Model not ready or server not initialized
* Unexpected server error during prediction

***

## Performance Tips

**Optimize Cache Hits**: Send rrweb events in consistent session sequences to maximize prefix cache reuse. Events from the same session, sent in the same order, achieve higher cache hit rates and lower latency.

**Typical Latency**:

* Single-node: \~800ms (P50), \~1.5s (P99)
* Disaggregated: \~250ms (P50), \~450ms (P99) (in progress)
* A cache hit rate of 90% or higher substantially reduces latency for similar event sequences

## rrweb Event Format

The API expects rrweb events as newline-delimited JSON strings.
Common event types (numbering follows rrweb's `EventType` and `IncrementalSource` enums):

* **Type 2 (FullSnapshot)**: Full DOM snapshot of the page
* **Type 3 (IncrementalSnapshot)**: User interactions (clicks, input, scroll, etc.) and DOM mutations
  * `source: 1` = MouseMove
  * `source: 2` = MouseInteraction
  * `source: 5` = Input
* **Type 4 (Meta)**: Page metadata and viewport info

Example event structure:

```json
{
  "type": 3,
  "data": {
    "source": 2,
    "type": 2,
    "id": 42,
    "x": 385,
    "y": 127,
    "pointerType": 0
  }
}
```
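To make the event structure above concrete, here is a minimal sketch that counts the type-3 events in a newline-delimited string by `data.source`. The function name `summarize_incremental` is hypothetical, and the label table deliberately covers only the two source values used in this page's request examples; every other value is reported by its number.

```python
import json
from collections import Counter

# Labels for the source values that appear in this page's examples only.
SOURCE_LABELS = {2: "MouseInteraction", 5: "Input"}


def summarize_incremental(ndjson):
    """Count type-3 events per data.source label in a newline-delimited JSON string."""
    counts = Counter()
    for line in ndjson.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("type") != 3:
            continue  # only incremental events carry data.source
        source = event.get("data", {}).get("source")
        counts[SOURCE_LABELS.get(source, f"source {source}")] += 1
    return dict(counts)


# The four events used in the request examples on this page.
sample = "\n".join([
    '{"type":3,"data":{"source":2,"type":6,"id":42,"x":385,"y":127}}',
    '{"type":3,"data":{"source":2,"type":2,"id":42,"x":385,"y":127,"pointerType":0}}',
    '{"type":3,"data":{"source":2,"type":1,"id":56}}',
    '{"type":3,"data":{"source":5,"text":"user@example.com","isChecked":false,"id":56}}',
])
print(summarize_incremental(sample))
```

A summary like this can serve as a quick sanity check on a prompt before it is sent, for example to confirm that it contains the interaction events you expect.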