Your agent runs thousands of turns a day. The ones worth reading (the jailbreak attempt, the loop it never broke out of, the user who gave up after three bad answers) are a handful of rows buried in logs you’ll never scroll through. Running LLM-as-a-judge over 100% of traffic is too slow and too expensive to leave on.

The fix is two pieces you already have: tracing ships each turn to Morph as a span, and Reflexes label a turn in ~90ms. Wire them together and every turn you wrap gets classified automatically and asynchronously, adding nothing to your latency. This guide takes you from an uninstrumented app to labeled traces you can alert on and mine for training data.

Open the Traces dashboard

Where labeled turns land. Browse conversations, run Reflexes by hand, and export the raw turns.

How the pieces fit

| Piece | What it does | Where it lives |
| --- | --- | --- |
| Tracing SDK | One morph_tracing() call instruments OpenAI / Anthropic / LangChain and exports spans to Morph. | Your app, at startup. |
| begin() / finish() | Wraps a turn so it gets a stable event_id, the join key labels attach to. | Around each turn you want classified. |
| evals | Names which Reflexes run on which role (user / assistant). Morph classifies after the span lands. | On begin(), or as a default. |
| Read-back | The dashboard and GET /v1/reflex/traces return each turn with its reflex_results. | Dashboard + API. |

The rule that drives everything below: a turn is only classified if you wrap it in begin(). Auto-instrumented LLM calls outside an interaction still get traced, but they have no event_id, so no label can attach.

Wire it up

1. Instrument your app

Install the SDK with the OpenTelemetry extra and initialize once at startup. After this, calls to instrumented SDKs are traced with no further changes.
# pip install 'morphsdk[otel]'
from morphsdk.tracing import morph_tracing

morph = morph_tracing({"api_key": "sk-..."})  # or set MORPH_API_KEY
2. Wrap the turns you want classified

Open an interaction with begin(), set the input, nest any tool calls, and close it with finish(). The event_id it mints is what every label links back to.
turn = morph.begin({"user_id": "u1", "convo_id": "c1", "event": "chat"})
turn.set_input(user_message)
answer = turn.with_tool({"name": "get_weather"}, lambda: get_weather("SF"))
turn.finish({"output": answer})
Keep convo_id stable across a conversation so the dashboard threads turns together, and user_id consistent so you can slice labels by user later.
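One way to make sure a turn is always closed is to route begin() and finish() through a single helper. The sketch below is not part of the SDK: `run_classified_turn` is a hypothetical name, `respond` stands in for your own agent call, and finishing with an empty output when the agent raises is an assumption about how you might want failed turns recorded. It relies only on the begin(), set_input(), and finish() calls shown above.

```python
from typing import Any, Callable, Dict


def run_classified_turn(
    morph: Any,
    meta: Dict[str, Any],
    user_message: str,
    respond: Callable[[str], str],
) -> str:
    """Wrap one turn in begin()/finish() so it always gets an event_id.

    `morph` is the object returned by morph_tracing(); `respond` is a
    hypothetical stand-in for your own agent call.
    """
    turn = morph.begin(meta)
    turn.set_input(user_message)
    try:
        answer = respond(user_message)
    except Exception:
        # Assumption: close the failed turn with an empty output, then re-raise.
        turn.finish({"output": ""})
        raise
    turn.finish({"output": answer})
    return answer
```

A call would look like `run_classified_turn(morph, {"user_id": "u1", "convo_id": "c1", "event": "chat"}, user_message, run_agent)`, where `run_agent` is your own function.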
3. Turn on automatic classification

Pass evals on the turn. You choose which role each Reflex reads: user for the incoming message, assistant for the agent’s output. Morph classifies each role after the span lands, off your request path.
turn = morph.begin({
    "user_id": "u1",
    "convo_id": "c1",
    "event": "chat",
    "evals": {
        "user": ["jailbreak", "guardrail", "user-frustrated"],
        "assistant": ["leaked-thinking"],
    },
})
Set a default for every turn by passing evals to morph_tracing / morphTracing; a per-begin value overrides it. Omit both and nothing runs.
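The precedence described above can be pictured as a small pure function. This is only a mental model, not SDK code: `effective_evals` is a hypothetical name, and it assumes a per-begin value replaces the default wholesale rather than merging role by role, which this guide does not spell out.

```python
from typing import Dict, List, Optional

Evals = Dict[str, List[str]]


def effective_evals(default: Optional[Evals], per_turn: Optional[Evals]) -> Evals:
    """Return the evals that would apply to one turn.

    Assumption: a per-begin value replaces the default entirely.
    An empty result means no Reflex runs on the turn.
    """
    if per_turn is not None:
        return per_turn
    return default or {}
```

If your setup depends on merging (for example, default user Reflexes plus a per-turn assistant Reflex), check the tracing reference before relying on either behavior.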
4. Read the labels back

Open the Traces dashboard to browse turns with their labels, or pull them with the API. A freshly-traced turn shows “Classifying…” for a moment, then carries its reflex_results.
Python
import requests

resp = requests.get(
    "https://api.morphllm.com/v1/reflex/traces",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    params={"convo_id": "c1", "limit": 100},
    timeout=30,
)
resp.raise_for_status()  # fail loudly on an HTTP error instead of a KeyError below
res = resp.json()

for turn in res["data"]:
    # Names of the Reflexes that fired on this turn
    fired = [r["model"] for r in turn["reflex_results"] if r["selected"]]
    if fired:
        print(turn["event_id"], turn["input_text"][:60], "→", fired)
GET /v1/reflex/traces returns LLM turns that carry text, newest first, each with the labels attached whether they ran from the SDK or by hand in the dashboard. Filter to one conversation with convo_id, page with limit / offset. See the field reference.
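To read more than one page, the limit / offset parameters can be walked in a loop. The generator below is a minimal sketch under stated assumptions: `iter_traces` and `fetch_page` are hypothetical names, the response is assumed to keep its rows under "data" as in the example above, and a page shorter than `limit` is assumed to mean there are no more rows. The HTTP call is injected so the paging logic can be tried without network access.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_traces(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    convo_id: Optional[str] = None,
    page_size: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every turn by walking limit/offset pages.

    `fetch_page` takes the query params and returns the decoded JSON body.
    Assumption: a page shorter than `limit` is the last page.
    """
    offset = 0
    while True:
        params: Dict[str, Any] = {"limit": page_size, "offset": offset}
        if convo_id is not None:
            params["convo_id"] = convo_id
        rows = fetch_page(params).get("data", [])
        yield from rows
        if len(rows) < page_size:
            return
        offset += page_size
```

With requests, `fetch_page` could be a small function that calls `requests.get(...)` on the same URL and headers as the example above, passes `params` through, and returns `.json()`.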

Which Reflex on which role

Most safety and intent classifiers read the user’s message; response-quality ones read the agent’s output. A sensible starting set for a chat or coding agent:

| Reflex | Role | Catches |
| --- | --- | --- |
| jailbreak | user | Prompt-injection and jailbreak attempts before they shape the response. |
| guardrail | user | Harassment or NSFW content in the incoming message. |
| user-frustrated | user | The user losing patience: the signal that your agent is failing in a way tests won’t show. |
| incomplete-thought | user | Truncated or underspecified prompts, so you can tell “bad answer” from “bad question.” |
| leaked-thinking | assistant | The agent spilling internal reasoning or system instructions into its reply. |
| stuck-in-a-loop | assistant | The agent repeating itself instead of trying something new. |

Start with two or three that map to a failure you actually care about, watch the dashboard for a day, then add more. Custom Reflexes you’ve trained drop into the same evals arrays by name.

Worked example: a chat agent

End to end — instrument once, then label jailbreaks and frustration on the user side and leaked thinking on the assistant side for every turn.
from morphsdk.tracing import morph_tracing

# Default evals apply to every begin() turn.
morph = morph_tracing({
    "api_key": "sk-...",
    "evals": {
        "user": ["jailbreak", "user-frustrated"],
        "assistant": ["leaked-thinking"],
    },
})

def handle_message(user_id, convo_id, user_message):
    turn = morph.begin({"user_id": user_id, "convo_id": convo_id, "event": "chat"})
    turn.set_input(user_message)
    answer = run_agent(user_message)  # your instrumented OpenAI/Anthropic call
    turn.finish({"output": answer})
    return answer
Nothing in the request path changed: no extra round-trip, no blocking on a judge. The labels appear on the trace shortly after each turn lands.

What to do with the labels

A label is only worth collecting if you act on it.
  • Alert. Poll GET /v1/reflex/traces (or wire the dashboard) and page on-call when jailbreak or guardrail fires, or when user-frustrated crosses a rate you set for a conversation.
  • Build training sets. Filter traces by label to pull the exact turns you want — every stuck-in-a-loop turn, every frustrated exchange — and feed them into evals or fine-tuning. The list endpoint returns input_text and output_text directly.
  • Track trends. Watch a label’s rate over time to know whether a prompt change actually reduced frustration or just moved it.
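The alerting rule in the list above can be sketched as a pure function over the turns of one conversation, for instance the rows returned when you filter by convo_id. The function names and the 0.3 threshold are placeholders rather than anything Morph prescribes; the only fields assumed are reflex_results with model and selected, as in the read-back example.

```python
from typing import Any, Dict, Iterable, List, Sequence


def fired_labels(turn: Dict[str, Any]) -> List[str]:
    """Names of the Reflexes that fired on one turn."""
    results = turn.get("reflex_results") or []
    return [r["model"] for r in results if r.get("selected")]


def label_rate(turns: Iterable[Dict[str, Any]], label: str) -> float:
    """Fraction of turns on which `label` fired (0.0 when there are no turns)."""
    turns = list(turns)
    if not turns:
        return 0.0
    hits = sum(1 for t in turns if label in fired_labels(t))
    return hits / len(turns)


def should_page(
    turns: Iterable[Dict[str, Any]],
    hard_labels: Sequence[str] = ("jailbreak", "guardrail"),
    rate_label: str = "user-frustrated",
    max_rate: float = 0.3,  # placeholder threshold; tune it for your traffic
) -> bool:
    """Page if any hard label fired, or if rate_label exceeds max_rate."""
    turns = list(turns)
    if any(name in hard_labels for t in turns for name in fired_labels(t)):
        return True
    return label_rate(turns, rate_label) > max_rate
```

Keeping the decision separate from the polling loop makes the threshold easy to test and change without touching the code that talks to the API.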
Classification rides the trace export, so it adds nothing to your latency, and it’s billed per event ($0.001/event realtime, stepping down past 1M/month). You pay only for the turns you put in evals, not every traced span. See Reflex pricing.
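For the training-set use above, one simple approach is to write the matching turns out as JSON Lines. This is a sketch, not a Morph utility: `write_training_rows` and the output keys (input, output, label) are illustrative choices, and the only fields read are the event_id, input_text, output_text, and reflex_results fields this guide mentions.

```python
import json
from typing import Any, Dict, Iterable, TextIO


def write_training_rows(turns: Iterable[Dict[str, Any]], label: str, out: TextIO) -> int:
    """Write one JSON line per turn on which `label` fired; return the count."""
    written = 0
    for turn in turns:
        results = turn.get("reflex_results") or []
        fired = any(r.get("model") == label and r.get("selected") for r in results)
        if not fired:
            continue
        row = {
            "event_id": turn.get("event_id"),
            "input": turn.get("input_text"),
            "output": turn.get("output_text"),
            "label": label,
        }
        out.write(json.dumps(row, ensure_ascii=False) + "\n")
        written += 1
    return written
```

For example, `with open("loops.jsonl", "w", encoding="utf-8") as f: write_training_rows(turns, "stuck-in-a-loop", f)` would collect every looping turn from a list of fetched traces.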

Backfill traces you already have

Turning on evals only labels turns going forward. To classify a backlog — every conversation from last month, scanned for jailbreaks and loops — run it from the Traces dashboard: select conversations, pick the Reflexes, and the labels land back on each trace. Under the hood that’s an asynchronous batch over the text of each turn, billed at the discounted batch rate. No code required.

Next steps

Tracing reference

Every config field, direct OTLP ingest, and the /v1/reflex/traces schema.

Reflexes overview

The nine default classifiers, response shape, and realtime /predict.

Train a Custom Reflex

When the defaults don’t match your failure modes, train one in ~30s and drop it into evals.

Batch classification

Label a backlog of up to 10,000 rows offline at the discounted rate.