Reflex Trace Pass Plan

One Sentence

Build Reflexes into Traces as a semantic batch pass: users choose signals, choose or inherit trace transforms, run a batch over historical traffic, and get labeled conversations back in the traces UI.

Why This Exists

The current traces UI answers:
  • What requests happened?
  • Which conversations errored?
  • How many tokens and models were used?
  • What was the raw input and output?
Reflexes answer a different question:
  • What did the trace mean?
  • Was the user frustrated?
  • Was the agent looping?
  • Did the assistant leak private reasoning?
  • Did a custom product-specific failure happen?
The product should not feel like a second observability dashboard. It should feel like grep for semantic events over agent traces.

Product Thesis

Do not expose Reflexes as simple “model checkboxes”. A Reflex result depends on the exact input shape used at inference time. Some models want a single user message. Some want assistant output only. Some want a rolling transcript window. Some custom signals may want user messages plus tool calls, tool errors, or every other user turn. So the core abstraction is:
Signal = Reflex model + trace transform + threshold + result placement
The UI should make this visible enough that users trust the output, without forcing every user into transform configuration.

Terminology

Reflex

A classifier model that returns labels and scores for text. Examples:
  • jailbreak
  • guardrail
  • leaked-thinking
  • stuck-in-a-loop
  • user-frustration
  • custom trained models

Signal

A configured Reflex ready to run on traces. Signals include:
  • model alias
  • display name
  • default transform
  • threshold overrides
  • enabled state for a scan or monitor
  • ownership scope

Transform

A deterministic projection from trace data into one or more text inputs for Reflex inference. Examples:
  • user message only
  • assistant message only
  • rolling 8-turn transcript
  • user plus previous assistant
  • user plus tool calls
  • tool failure window

Reflex Pass

One historical batch run over a set of traces using selected signals and transforms.

Reflex Monitor

A future continuous mode that applies selected signals to newly ingested traces.

Current Repo Context

Relevant landing repo surfaces:
  • src/app/dashboard/traces/page.tsx
  • src/app/dashboard/traces/TracesListClient.tsx
  • src/app/dashboard/traces/[convoId]/page.tsx
  • src/app/dashboard/traces/[convoId]/ConversationView.tsx
  • src/app/dashboard/traces/actions.ts
  • src/app/dashboard/reflex/page.tsx
  • src/lib/reflex-directory.ts
  • src/app/products/reflex/page.tsx
Current traces data source:
  • ClickHouse view morph.ai_spans
  • Helper: src/lib/clickhouse.ts
  • Scope helper: traceScopeWhere(scope)
Current trace viewer shape:
  • list conversations
  • filter by search, end-user id, errors only
  • detail page groups spans into turns
  • shows input, output, spans, finish reason, latency, tokens, models
Current Reflex dashboard shape:
  • square, bordered UI
  • default Reflex directory
  • custom Reflex projects
  • per-model playground and API copy
  • custom model metadata is already user/org scoped

Batch API Context

From morphllm/tab#141:

Upload

POST /v1/reflex/asynchronous_batches/upload
Request:
{
  "requests": [
    {
      "id": "row-1",
      "model": ["guardrail", "jailbreak"],
      "text": "text to classify"
    }
  ]
}
Important constraints:
  • requests is required.
  • max 10,000 requests per upload.
  • each id must be unique.
  • model is an array of one or more model names.
  • text is required.
  • max text length is 350,000 chars.
  • upload returns quickly after queueing rows.
  • async batch rows are durable in Postgres in the Reflex service.
Response:
{
  "id": "rbatch-...",
  "object": "reflex.batch",
  "status": "queued",
  "request_counts": {
    "total": 3,
    "completed": 0,
    "failed": 0
  },
  "created_at": 1780000000
}
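The upload constraints above can be checked before sending, so obviously invalid payloads never reach the API. The following is a minimal pre-flight sketch, not the service's own validation. validateUpload is a hypothetical helper; measuring text with JavaScript string length (UTF-16 code units) and rejecting an empty requests array are both assumptions.

```typescript
// One row of the upload body, as shown in the request example above.
type ReflexBatchRequest = { id: string; model: string[]; text: string };

// Limits quoted in the constraints list above.
const MAX_REQUESTS_PER_UPLOAD = 10000;
const MAX_TEXT_CHARS = 350000;

// Hypothetical helper: returns a list of problems; an empty list means the
// payload passes these local checks (the service may still reject it).
function validateUpload(requests: ReflexBatchRequest[]): string[] {
  const problems: string[] = [];
  if (requests.length === 0) {
    problems.push('requests is required and must not be empty'); // assumption
  }
  if (requests.length > MAX_REQUESTS_PER_UPLOAD) {
    problems.push('too many requests: ' + requests.length);
  }
  const seen = new Set<string>();
  for (const row of requests) {
    if (seen.has(row.id)) problems.push('duplicate id: ' + row.id);
    seen.add(row.id);
    if (row.model.length === 0) problems.push(row.id + ': model must list at least one model');
    if (row.text.length === 0) problems.push(row.id + ': text is required');
    if (row.text.length > MAX_TEXT_CHARS) problems.push(row.id + ': text is too long');
  }
  return problems;
}
```

Running this before creating the pass record keeps failed uploads from leaving half-created passes behind.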

Poll

GET /v1/reflex/asynchronous_batches/{batch_id}
Returns the same batch status shape as the upload response.

Results

GET /v1/reflex/asynchronous_batches/{batch_id}/results
Response:
{
  "id": "rbatch-...",
  "object": "reflex.batch.results",
  "status": "completed",
  "request_counts": {
    "total": 3,
    "completed": 2,
    "failed": 1
  },
  "results": [
    {
      "id": "row-1",
      "status": "completed",
      "predictions": [
        {
          "model": "guardrail",
          "mode": "single_label",
          "classes": [
            { "label": "clean", "score": 0.8, "selected": true }
          ]
        }
      ]
    },
    {
      "id": "row-2",
      "status": "pending"
    },
    {
      "id": "row-3",
      "status": "failed",
      "error": {
        "type": "input_too_long",
        "message": "too many tokens"
      }
    }
  ]
}

Core UX

Primary Action

Add selection controls to /dashboard/traces:
[ ] Select visible        0 selected        [ Run selected ]
Users can select individual conversations or every visible conversation, then run a Reflex pass on that explicit set. If any selected conversation already has Reflex results, show an in-app dialog:
Rerun existing Reflex results?

3 of 9 selected conversations already have labels.

[Cancel] [Run only new] [Rerun all]
The run action opens a right-side panel or full-height drawer.

Reflex Pass Panel

Suggested structure:
Run Reflex Pass

Scope
[ Current filters ] [ This conversation ] [ Last 24h ] [ Custom range ]

Built-in signals
[x] Jailbreak             User chat
[x] Guardrail             User chat
[x] User Frustration      User chat
[ ] Incomplete Thought    User chat
[ ] Difficulty            User chat
[ ] Ambiguity             User chat
[ ] Domain                User chat
[x] Leaked Thinking       Agent chat
[x] Stuck-in-a-loop       Agent chat

Custom signals
[ ] Bad handoff           Full trace
[ ] Escalation needed     User chat
[ ] Wrong policy          Agent chat

Estimate
9,412 trace inputs
16,423 predictions
Batch mode
Estimated cost: $0.49

[ Run pass ]

Why Group By Transform Family

The batch API lets one row include many models:
{
  "id": "user:event_123",
  "model": ["jailbreak", "guardrail", "user-frustration"],
  "text": "user text"
}
But only models that share a compatible input transform should be batched together on the same row. Do:
one user-chat row -> user-chat models
one agent-chat row -> agent-chat models
one full-trace row -> full-trace models
Do not:
one raw trace row -> every selected model
That creates train/inference mismatch and bad labels.

Default Transform Map

Initial defaults:
jailbreak             user_message
guardrail             user_message
incomplete-thought    user_message
difficulty            user_message
ambiguity             user_message
domain                user_message
user-frustration      user_message
leaked-thinking       assistant_message
stuck-in-a-loop       assistant_message
Notes:
  • user-frustration defaults to user chat for v1 so it sees the same data shape the user-state classifier was trained on.
  • stuck-in-a-loop defaults to agent chat because it is primarily about repeated agent output or action descriptions.
  • leaked-thinking should run on assistant-visible output, not tool output or hidden spans.
  • custom signals should carry their own default transform.

Transform Catalog

user_message

Product label: User chat. One row per user-authored chat message. Input:
{turn.inputText}
Best for:
  • jailbreak
  • guardrail
  • incomplete thought
  • difficulty
  • ambiguity
  • domain
  • user frustration

assistant_message

Product label: Agent chat. One row per agent-authored chat message. Input:
{turn.outputText}
Best for:
  • leaked thinking
  • stuck in a loop
  • bad response style
  • unsafe assistant output

full_conversation

Product label: Full trace. One row per complete trace or conversation. Input:
Full conversation:
...
Best for:
  • coarse conversation quality
  • support outcome
  • final resolution status
Warning:
  • can be expensive
  • may exceed context window
  • not appropriate for per-turn labeling
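The three catalog entries above can be read as one projection function from turns to classifier inputs. The sketch below assumes a Turn shape with eventId, inputText, and outputText (only the two text fields appear in this plan). The "User:" / "Assistant:" transcript layout for full_conversation is also an assumption, since the exact format is elided above.

```typescript
type TransformId = 'user_message' | 'assistant_message' | 'full_conversation';
type Turn = { eventId: string; inputText: string; outputText: string };
type TransformedInput = { eventId: string; transformId: TransformId; text: string };

// Hypothetical helper: projects a conversation's turns into Reflex inputs.
function applyTransform(turns: Turn[], transformId: TransformId): TransformedInput[] {
  if (transformId === 'full_conversation') {
    // Assumed layout; the plan only specifies the "Full conversation:" header.
    const lines: string[] = [];
    for (const turn of turns) {
      if (turn.inputText) lines.push('User: ' + turn.inputText);
      if (turn.outputText) lines.push('Assistant: ' + turn.outputText);
    }
    if (lines.length === 0) return [];
    // One row per conversation, so there is no single event to point at.
    return [{ eventId: '', transformId, text: 'Full conversation:\n' + lines.join('\n') }];
  }
  const rows: TransformedInput[] = [];
  for (const turn of turns) {
    const text = transformId === 'user_message' ? turn.inputText : turn.outputText;
    // Skip empty text, for example when trace content was not retained.
    if (text.trim().length > 0) rows.push({ eventId: turn.eventId, transformId, text });
  }
  return rows;
}
```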

Transform Definition Shape

Represent transforms as data, not scattered switch statements. Proposed TypeScript type:
type ReflexTraceTransformKind =
  | 'user_message'
  | 'assistant_message'
  | 'full_conversation';

type ReflexTraceTransform = {
  id: string;
  kind: ReflexTraceTransformKind;
  name: string;
  description: string;
  inputKind:
    | 'user_chat'
    | 'agent_chat'
    | 'full_trace';
  format: 'plain_transcript' | 'json_messages' | 'compact_trace';
  maxTokens?: number;
  includeUserMessages?: boolean;
  includeAssistantMessages?: boolean;
  includeToolCalls?: boolean;
  includeToolResults?: boolean;
};
Proposed signal config:
type TraceSignalConfig = {
  signalId: string;
  modelAlias: string;
  displayName: string;
  source: 'default' | 'custom';
  category: string;
  labels: string[];
  defaultTransformId: string;
  thresholdOverrides?: Record<string, number>;
  enabledByDefault?: boolean;
};

Custom Signals

Custom Reflexes must include transform metadata. When creating or editing a custom Reflex, ask:
What should this signal inspect?
Options:
  • User chat
  • Agent chat
  • Full trace
Store this on the custom Reflex record. Suggested fields in landing Postgres reflex_chats or a related metadata table:
trace_default_transform_id
trace_transform_config_json
trace_transform_version
trace_result_threshold_json
If modifying reflex_chats is too broad, add a landing-owned table:
reflex_trace_signal_configs
- id
- user_id
- org_id
- reflex_chat_id
- model_alias
- display_name
- transform_id
- transform_config
- threshold_config
- created_at
- updated_at
Default Reflexes can be represented in code first, then moved to DB if needed.

Batch Row IDs

Batch request IDs should encode enough information to route results back to ClickHouse rows. Suggested format:
{inputKind}:{convoId}:{eventId}:{spanId?}:{transformId}:{ordinal}
Examples:
user_chat:convo_123:event_5::user_message:0
agent_chat:convo_123:event_5::assistant_message:0
full_trace:convo_123:::full_conversation:0
For safer parsing, also keep a run-local mapping table in Postgres:
reflex_trace_pass_items
- id
- pass_id
- request_id
- convo_id
- event_id
- span_id
- input_kind
- transform_id
- source_start_time
- selected_models
This avoids relying on string parsing if IDs contain unexpected characters.
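A small builder keeps the ID format in one place, and a lookup keyed by the full request ID means results never have to be parsed back out of the string. Both function names below are hypothetical; the field order follows the suggested format above.

```typescript
type RequestIdParts = {
  inputKind: string;
  convoId: string;
  eventId?: string;
  spanId?: string;
  transformId: string;
  ordinal: number;
};

// Builds {inputKind}:{convoId}:{eventId}:{spanId?}:{transformId}:{ordinal}.
// Missing eventId or spanId become empty segments.
function buildRequestId(parts: RequestIdParts): string {
  return [
    parts.inputKind,
    parts.convoId,
    parts.eventId ?? '',
    parts.spanId ?? '',
    parts.transformId,
    String(parts.ordinal),
  ].join(':');
}

// Routes results by exact ID match instead of splitting on ':'.
function indexByRequestId<T extends { requestId: string }>(items: T[]): Map<string, T> {
  const index = new Map<string, T>();
  for (const item of items) index.set(item.requestId, item);
  return index;
}
```

For example, a full-trace row for convo_123 produces full_trace:convo_123:::full_conversation:0, matching the last example above.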

ClickHouse Storage

Create a new ClickHouse table for Reflex outputs. Do not reuse morph.signals. That table is for explicit feedback and edit signals. Reflex outputs are derived semantic labels. Proposed table:
CREATE TABLE IF NOT EXISTS morph.trace_reflex_results
(
    account_user_id String,
    account_org_id String DEFAULT '',

    run_id String,
    batch_id String,
    request_id String,

    convo_id String,
    event_id String,
    span_id String DEFAULT '',

    input_kind LowCardinality(String),
    transform_id LowCardinality(String),
    transform_version UInt16 DEFAULT 1,

    reflex_model String,
    status LowCardinality(String),

    mode LowCardinality(String) DEFAULT '',
    top_label String DEFAULT '',
    top_score Float32 DEFAULT 0,
    selected_labels Array(String) DEFAULT [],
    classes_json String DEFAULT '',

    error_type String DEFAULT '',
    error_message String DEFAULT '',

    source_start_time DateTime64(3),
    created_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (account_org_id, account_user_id, convo_id, source_start_time, reflex_model, created_at)
TTL toDateTime(created_at) + INTERVAL 365 DAY;
Notes:
  • no raw input text
  • raw text stays in ai_spans
  • classes_json keeps full score details without over-normalizing v1
  • top_label, top_score, and selected_labels support fast dashboard filters
  • include source_start_time so results can be aligned to trace rows
Optional materialized rollup later:
morph.trace_reflex_conversation_rollups
- account_user_id
- account_org_id
- convo_id
- last_result_at
- fired_count
- top_labels
- max_scores_by_model
But start with query-time aggregation unless performance requires it.

Landing Postgres State

Use Postgres for run lifecycle and UI state. Proposed tables:
reflex_trace_passes
- id uuid
- user_id text
- org_id text nullable
- batch_id text nullable
- status text
- scope_json jsonb
- selected_signals_json jsonb
- transform_plan_json jsonb
- total_inputs int
- total_predictions int
- completed_inputs int
- failed_inputs int
- error_message text nullable
- created_at timestamp
- started_at timestamp nullable
- completed_at timestamp nullable
reflex_trace_pass_items
- id uuid
- pass_id uuid
- request_id text
- convo_id text
- event_id text
- span_id text nullable
- input_kind text
- transform_id text
- source_start_time timestamp
- selected_models jsonb
Why Postgres plus ClickHouse:
  • Postgres tracks the app workflow.
  • ClickHouse stores queryable per-trace results next to spans.
  • The Reflex service already owns async batch internal state, but landing needs user-facing run state and mapping.

Server Flow

1. Build Scope

From the traces page filters:
type TracePassScope =
  | { type: 'current_filters'; filters: TraceFilters }
  | { type: 'conversation'; convoId: string }
  | { type: 'date_range'; from: string; to: string; filters?: TraceFilters }
  | { type: 'since_last_run'; filters?: TraceFilters };
Use the same resolveScope() and traceScopeWhere() patterns as src/app/dashboard/traces/actions.ts.

2. Load Candidate Trace Data

Query ai_spans, grouped into conversations and turns using the existing logic from getConversationDetail. For large historical scans, do not load unlimited rows into memory. Approach:
  • cap first version at 10,000 transformed inputs per batch
  • show “large scan will be split into N batches” later
  • build rows server-side
  • keep raw text only long enough to upload to Reflex batch API

3. Build Transform Plan

Group selected signals by transform. Example:
user_message:
  models: [jailbreak, guardrail, ambiguity]
  rows: one per user turn

assistant_message:
  models: [leaked-thinking, stuck-in-a-loop]
  rows: one per assistant turn

full_conversation:
  models: [custom-bad-handoff]
  rows: one per trace
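The grouping step in the example above can be sketched as a single pass over the selected signals. The SelectedSignal shape and the function name are assumptions for illustration; the real signal config carries more fields.

```typescript
type SelectedSignal = { modelAlias: string; transformId: string };

// Groups model aliases by transform so that each outgoing batch row only
// lists models that share the same input shape.
function groupSignalsByTransform(signals: SelectedSignal[]): Map<string, string[]> {
  const plan = new Map<string, string[]>();
  for (const signal of signals) {
    let models = plan.get(signal.transformId);
    if (models === undefined) {
      models = [];
      plan.set(signal.transformId, models);
    }
    // Ignore duplicate selections of the same model.
    if (models.indexOf(signal.modelAlias) === -1) models.push(signal.modelAlias);
  }
  return plan;
}
```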

4. Create Pass Record

Insert reflex_trace_passes. Insert reflex_trace_pass_items for each outgoing batch row.

5. Upload Batch

Call:
POST https://api.morphllm.com/v1/reflex/asynchronous_batches/upload
Use the user or org API key according to the active dashboard context. Set Idempotency-Key to:
trace-pass:{passId}
Store returned batch_id.
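The upload call can be assembled as plain data before it is handed to whatever HTTP client the server action uses. This is a sketch under assumptions: buildUploadRequest is hypothetical, sending the API key as a Bearer token is an assumption not stated in this plan, and Idempotency-Key is assumed to be an HTTP header.

```typescript
type ReflexBatchRequest = { id: string; model: string[]; text: string };
type UploadRequest = {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
};

// Hypothetical helper: describes the upload request without sending it.
function buildUploadRequest(
  passId: string,
  apiKey: string,
  requests: ReflexBatchRequest[],
): UploadRequest {
  return {
    url: 'https://api.morphllm.com/v1/reflex/asynchronous_batches/upload',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Assumption: the key is sent as a Bearer token.
      'Authorization': 'Bearer ' + apiKey,
      // One key per pass, so a retried upload does not create a second batch.
      'Idempotency-Key': 'trace-pass:' + passId,
    },
    body: JSON.stringify({ requests }),
  };
}
```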

6. Poll

Poll batch status from a route/server action:
GET /v1/reflex/asynchronous_batches/{batch_id}
Update completed_inputs and failed_inputs.

7. Fetch Results

When complete, call:
GET /v1/reflex/asynchronous_batches/{batch_id}/results
Join results to reflex_trace_pass_items by request_id. Explode:
one batch result row with multiple predictions
-> many ClickHouse trace_reflex_results rows

8. Write Results To ClickHouse

Insert into morph.trace_reflex_results. Each prediction becomes one row. Pseudo:
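for (const result of results) {
  const item = itemByRequestId[result.id];
  if (!item) continue; // result for a row this pass did not create

  // Columns shared by every row written for this pass item.
  const base = {
    runId,
    batchId,
    requestId: result.id,
    convoId: item.convoId,
    eventId: item.eventId,
    spanId: item.spanId ?? '',
    inputKind: item.inputKind,
    transformId: item.transformId,
    sourceStartTime: item.sourceStartTime,
  };

  if (result.status === 'completed') {
    for (const prediction of result.predictions) {
      if (prediction.error) {
        // One model failed inside an otherwise completed row.
        insertResult({
          ...base,
          reflexModel: prediction.model,
          status: 'failed',
          errorType: prediction.error.type,
          errorMessage: prediction.error.message,
        });
        continue;
      }
      const top = topClass(prediction);
      insertResult({
        ...base,
        reflexModel: prediction.model,
        status: 'completed',
        mode: prediction.mode,
        topLabel: top?.label ?? '',
        topScore: top?.score ?? 0,
        selectedLabels: selectedLabels(prediction),
        classesJson: JSON.stringify(prediction.classes),
      });
    }
  } else if (result.status === 'failed') {
    // The whole row failed: write one failed result per intended model.
    for (const model of item.selectedModels) {
      insertResult({
        ...base,
        reflexModel: model,
        status: 'failed',
        errorType: result.error.type,
        errorMessage: result.error.message,
      });
    }
  }
  // Pending rows are not written; a later sync picks them up.
}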
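The pseudocode above calls topClass and selectedLabels without defining them. One plausible reading, given the classes array in the results response, is sketched below. Treating the highest-scoring class as the top class is an assumption; the service may define it differently for multi-label modes.

```typescript
type ReflexClass = { label: string; score: number; selected: boolean };
type Prediction = { model: string; mode: string; classes: ReflexClass[] };

// Returns the highest-scoring class, or undefined when there are none.
function topClass(prediction: Prediction): ReflexClass | undefined {
  let best: ReflexClass | undefined;
  for (const candidate of prediction.classes) {
    if (best === undefined || candidate.score > best.score) best = candidate;
  }
  return best;
}

// Returns the labels the model marked as selected, in response order.
function selectedLabels(prediction: Prediction): string[] {
  return prediction.classes
    .filter((candidate) => candidate.selected)
    .map((candidate) => candidate.label);
}
```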

Dashboard Query Changes

Conversation List

Current list query aggregates from ai_spans. Add optional aggregation from trace_reflex_results:
per convo:
- fired_count
- top selected labels
- max score by model
- latest pass timestamp
UI row:
convo_123
14 turns | 8.2k input | 2.1k output

[frustrated 97%] [looping 88%] [leaked-thinking clean]
Add filters:
All
Flagged
Frustrated
Looping
Jailbreak
Leaked Thinking
Custom signal...

Conversation Detail

For a convo detail page, query Reflex results:
WHERE convo_id = {convoId}
Join by:
  • event_id
  • span_id when present
  • source_start_time as fallback
Render:
  • inline chips on turn cards
  • a right rail with fired results
  • a summary band near the page title
Suggested right rail:
Reflexes
3 fired

Turn 4
user-frustration 0.97

Turn 7
stuck-in-a-loop 0.88

Turn 9
leaked-thinking 0.91
Suggested inline chip:
[frustrated 97%]
Use muted chips for non-fired or clean labels only when the user expands details. Do not clutter the main trace with every negative result.

UI States

Empty

No Reflex passes yet
Run a semantic pass over these traces to find frustration, loops, jailbreaks, and custom product failures.

[Select visible] [Run selected]

Estimating

Counting trace inputs...
No spinner if possible. Use skeleton rows or a small inline status.

Ready To Run

9,412 inputs
16,423 predictions
Batch mode
Estimated cost: $0.49

Running

rbatch-...
4,120 / 9,412 inputs complete
43.8%
Use request_counts from batch status.
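The progress line can be derived directly from request_counts. In this sketch, passProgress is a hypothetical helper. Counting only completed rows toward the percentage, and treating the pass as settled once completed plus failed reaches total, are both assumptions.

```typescript
type RequestCounts = { total: number; completed: number; failed: number };

// Derives the values shown in the Running state from a batch status response.
function passProgress(counts: RequestCounts): { percentComplete: string; settled: boolean } {
  const ratio = counts.total === 0 ? 0 : counts.completed / counts.total;
  return {
    // One decimal place, as in the "43.8%" example above.
    percentComplete: (ratio * 100).toFixed(1),
    // No rows left to wait for once every row has completed or failed.
    settled: counts.completed + counts.failed >= counts.total,
  };
}
```

With the numbers from the Running example, 4,120 of 9,412 gives "43.8" and the pass is not yet settled.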

Complete

Pass complete
317 conversations flagged
1,204 selected labels
CTA:
View flagged conversations

Partial Failure

Pass complete with failed rows
8,920 completed
492 failed
CTA:
View failed inputs
Retry failed

Cost Model

Docs say async batch is priced per event, where one classification is one event. If a batch row has multiple models, cost should count predictions, not rows:
total_predictions = sum(row.model.length)
Display both:
9,412 inputs
16,423 predictions
Estimate:
estimated_cost = total_predictions * batch_rate
If the usage tier changes after 1M events/month, show:
Estimated before volume discounts
or use the billing API if available.
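The two formulas above translate into a small estimator. This is a sketch: estimatePass is a hypothetical name, and batchRate is a caller-supplied price per prediction because this plan does not state the actual rate.

```typescript
type PlannedRow = { model: string[] };

// Counts inputs and predictions, then prices predictions at a flat rate.
function estimatePass(rows: PlannedRow[], batchRate: number): {
  inputs: number;
  totalPredictions: number;
  estimatedCost: number;
} {
  let totalPredictions = 0;
  // Each model on a row is one classification, so one billable event.
  for (const row of rows) totalPredictions += row.model.length;
  return {
    inputs: rows.length,
    totalPredictions,
    // Before any volume discounts.
    estimatedCost: totalPredictions * batchRate,
  };
}
```

A row that lists three models therefore counts as one input and three predictions.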

Permissions

Use getDataContext() everywhere. Rules:
  • org context sees org traces and org custom Reflexes
  • personal context sees personal traces and personal custom Reflexes
  • org admins can run passes
  • members may view results if they can view traces
  • member ability to run passes should be a product decision
Conservative default:
  • admins can run Reflex passes
  • members can view existing results
Use requireOrgAdmin() for the initial run action if cost is meaningful.
Do not create a separate top-level “Reflex Trace Dashboard” yet. Use:
  • /dashboard/traces for trace list and pass action
  • /dashboard/traces/[convoId] for result inspection
  • /dashboard/reflex for model creation and custom signal configuration
Possible future route:
/dashboard/traces/reflex-passes
Only add once pass history needs its own page.

Landing Page Updates

Home

Add Reflexes as the fourth specialized model in SpecializedModelsPanel. Suggested copy:
Reflexes
Semantic trace classifiers
<90ms
per event
Link:
/products/reflex

Product Reflex Page

Make the product flow explicit:
Trace traffic
-> choose signals
-> transform inputs
-> batch pass
-> flagged conversations
Current page already says the right high-level thing:
Catch what your traces can't see
But it should show the actual product surface:
  • trace list with “Run selected”
  • selectable trace rows
  • rerun confirmation for conversations with existing Reflex labels
  • transform-aware signal picker
  • labeled conversations

Implementation Phases

Phase 0.5: Live Preview Transport

Before the async batch queue is ready, the dashboard can run a capped live preview through:
POST /v1/reflex/predict
This should stay server-side:
  • fetch trace details by conversation id
  • build User chat, Agent chat, or Full trace inputs
  • call the public Reflex API with the signed-in user’s Morph API key
  • return the same pass/result shape the UI will later receive from batch results
Limits for this preview:
  • cap conversations
  • cap transformed inputs per transform
  • cap total predictions
  • do not persist raw trace text
The async batch transport should replace this implementation behind the same UI action once it supports upload, queue/poll, and result download.

Phase 0: Product Data Definitions

Add local definitions:
  • transform catalog
  • default Reflex to transform mapping
  • helper to group signals by transform
  • helper to estimate inputs and predictions
Likely files:
src/lib/reflex-traces/transforms.ts
src/lib/reflex-traces/signals.ts
src/lib/reflex-traces/estimate.ts
No UI yet.

Phase 1: ClickHouse Table

Add ClickHouse schema file:
morphcli/tracing/clickhouse/reflex-results.sql
Include:
  • morph.trace_reflex_results
  • comments documenting that raw text is not stored
  • account scope columns

Phase 2: Postgres Run Tables

Add Drizzle migration:
drizzle/00XX_add_reflex_trace_passes.sql
Add the schema in the src/lib/db/schema location used by this repo. Tables:
  • reflex_trace_passes
  • reflex_trace_pass_items

Phase 3: Trace Pass Server Actions

Add actions:
src/app/dashboard/traces/reflex-actions.ts
Actions:
  • estimateReflexTracePass(scope, selectedSignals)
  • createReflexTracePass(scope, selectedSignals)
  • getReflexTracePass(passId)
  • syncReflexTracePassResults(passId)
Make sure large scans are bounded in v1.

Phase 4: Run Panel UI

Add components:
src/app/dashboard/traces/ReflexPassPanel.tsx
src/app/dashboard/traces/ReflexSignalPicker.tsx
src/app/dashboard/traces/ReflexPassProgress.tsx
Keep Reflex-owned surfaces square:
  • no rounded cards
  • bordered panels
  • dense controls
  • muted explanatory copy

Phase 5: Result Chips In Trace List

Extend TraceConversation type with:
reflexSummary?: {
  firedCount: number;
  labels: Array<{
    model: string;
    label: string;
    score: number;
  }>;
};
Show max 3 chips per row.
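The summary could be built at query time from a conversation's result rows. The sketch below makes two assumptions this plan leaves open: a result "fires" when it has at least one selected label outside a caller-supplied list of clean labels, and chips are ordered by top score. The ResultRow field names mirror the ClickHouse columns in camelCase.

```typescript
type ResultRow = {
  reflexModel: string;
  topLabel: string;
  topScore: number;
  selectedLabels: string[];
};
type ReflexSummary = {
  firedCount: number;
  labels: Array<{ model: string; label: string; score: number }>;
};

// Hypothetical helper: summarizes one conversation's results for the list row.
function buildReflexSummary(
  rows: ResultRow[],
  cleanLabels: string[] = ['clean'],
  maxChips: number = 3,
): ReflexSummary {
  // Assumed definition of "fired": any selected label that is not a clean label.
  const fired = rows.filter((row) =>
    row.selectedLabels.some((label) => cleanLabels.indexOf(label) === -1),
  );
  const labels = fired
    .slice() // copy so the sort does not reorder the filtered list
    .sort((a, b) => b.topScore - a.topScore)
    .slice(0, maxChips)
    .map((row) => ({ model: row.reflexModel, label: row.topLabel, score: row.topScore }));
  return { firedCount: fired.length, labels };
}
```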

Phase 6: Result Rail In Detail View

Extend ConversationDetail with Reflex results. Render:
  • summary near title
  • inline turn chips
  • right-side result rail on desktop
  • collapsible section on mobile

Phase 7: Custom Signal Transform Config

Add UI to custom Reflex detail page:
Trace input
[User chat v]
Advanced:
  • max tokens
  • include tool context for Full trace
This should be optional in v1. The default custom transform can be user_message until configured.

Phase 8: Monitor Mode

After historical batch pass works, add:
Enable monitor
Monitor mode applies selected signals to future traces. This likely needs ingest-time or cron processing and should not block the first batch UX.

Edge Cases

Zero Data Retention

If trace content was not retained:
  • transforms may produce no inputs
  • show “content not retained” in estimate
  • skip rows with empty text

Very Large Conversations

Rolling windows may exceed model context. Mitigations:
  • cap by max tokens
  • truncate oldest content first
  • keep the target turn
  • show “truncated” metadata in transform mapping if needed
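The mitigations above can be sketched as a window fitter that always keeps the target turn and drops the oldest turns first. The function name is hypothetical, and the default token counter (roughly four characters per token) is a crude assumption; a real implementation would use the tokenizer of the Reflex model in question.

```typescript
// Fits a rolling window into a token budget. The last element of turnTexts
// is the target turn and is always kept, even if it alone exceeds the budget.
function fitWindow(
  turnTexts: string[],
  maxTokens: number,
  countTokens: (text: string) => number = (text) => Math.ceil(text.length / 4),
): { kept: string[]; truncated: boolean } {
  if (turnTexts.length === 0) return { kept: [], truncated: false };
  const target = turnTexts[turnTexts.length - 1];
  const kept: string[] = [target];
  let used = countTokens(target);
  // Walk backwards from the newest context turn; stopping at the first turn
  // that does not fit drops the oldest content first.
  let index = turnTexts.length - 2;
  for (; index >= 0; index--) {
    const cost = countTokens(turnTexts[index]);
    if (used + cost > maxTokens) break;
    kept.unshift(turnTexts[index]);
    used += cost;
  }
  // Any turn left unvisited means older content was cut.
  return { kept, truncated: index >= 0 };
}
```

The truncated flag is what would feed the "truncated" metadata on the transform mapping.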

Tool Output Size

Tool results can be huge. For tool transforms:
  • include tool name
  • include args
  • include status
  • include error message
  • truncate result body by default

Multiple Runs

A conversation may have results from multiple passes. Default display:
  • show latest completed pass results
Advanced filter:
  • all passes
  • specific pass

Re-running Same Pass

Use idempotency key per pass creation, not per arbitrary scope. A deliberate rerun should create a new pass.

Partial Batch Results

The results endpoint can return pending rows. Do not write pending rows to ClickHouse. Only write:
  • completed prediction rows
  • failed rows with errors

Failed Rows With Multiple Models

If an entire row fails, write one failed result per intended model so the UI can show model-specific failure counts.

Per-Model Error Inside Completed Row

A completed row can contain prediction entries where one model has an error. Handle:
{
  "model": "some-model",
  "error": {
    "type": "inference_error",
    "message": "..."
  }
}
Write it as a failed result for that model.
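A type guard makes the two prediction shapes explicit before anything is written. The type and function names below are assumptions for illustration; the field names come from the results response and the error example above.

```typescript
type ReflexClass = { label: string; score: number; selected: boolean };
type PredictionOk = { model: string; mode: string; classes: ReflexClass[] };
type PredictionError = { model: string; error: { type: string; message: string } };
type PredictionEntry = PredictionOk | PredictionError;

// Narrows an entry to the error shape when it carries an error object.
function isPredictionError(entry: PredictionEntry): entry is PredictionError {
  return 'error' in entry;
}

// Splits a completed row's predictions into successes and per-model failures.
function partitionPredictions(entries: PredictionEntry[]): {
  ok: PredictionOk[];
  failed: PredictionError[];
} {
  const ok: PredictionOk[] = [];
  const failed: PredictionError[] = [];
  for (const entry of entries) {
    if (isPredictionError(entry)) failed.push(entry);
    else ok.push(entry);
  }
  return { ok, failed };
}
```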

Open Product Questions

  1. Can non-admin org members run paid batch passes, or only view results?
  2. Should default signals be enabled by default in the panel?
  3. Is user-frustration live now, or still coming soon in the directory?
  4. Should clean labels be stored and queryable, or only selected labels plus errors?
  5. Should the first version support multi-batch scans above 10,000 transformed inputs?
  6. Should transform config be exposed during custom Reflex training, or only after the model is ready?
  7. Should historical pass results be included in billing usage details separately from normal API usage?
For v1, build the smallest coherent version:
  1. Add ClickHouse trace_reflex_results.
  2. Add Postgres pass and pass item tables.
  3. Add default transform catalog in code.
  4. Add selectable rows and /dashboard/traces “Run selected” panel.
  5. Support current filters and single conversation scope.
  6. Support default Reflexes and ready custom Reflexes.
  7. For custom Reflexes, default to user_message with visible transform override in the run panel.
  8. Upload one async batch.
  9. Poll progress.
  10. Write completed results to ClickHouse.
  11. Show result chips in trace list and detail page.
This gives users the core product loop:
Pick traces
Pick signals
Run pass
See flagged conversations
Open trace
Understand what happened
After v1, add:
  • saved signal presets
  • transform config on custom Reflex detail pages
  • pass history page
  • multi-batch scans over 10,000 inputs
  • retry failed inputs
  • export labels
  • monitor mode for future traces
  • Slack/webhook alerts on selected labels

UX Copy Bank

Primary CTA:
Run selected
Panel subtitle:
Turn traces into classifier inputs, then run semantic signals in batch.
Transform helper:
Each signal runs on the input shape it was trained for.
Estimate helper:
Inputs are transformed trace snippets. Predictions are inputs multiplied by selected models.
Empty state:
Your traces show what happened. Reflexes label what it meant.
Completion:
Pass complete. Flagged conversations are ready.
No retained text:
Some traces have no retained content, so Reflexes cannot classify them.

Design Notes

Keep the Reflex trace UI:
  • square
  • bordered
  • dense
  • low motion
  • no chat metaphor
  • no big decorative hero inside dashboard
  • one primary action per surface
This should feel closer to a command runner than a BI dashboard. Good mental model:
rg --semantic "frustrated|looping|jailbreak" traces/
Bad mental model:
another analytics dashboard with charts everywhere

Final Recommendation

Make transforms first-class, but defaults strong. The default user path should be:
Select conversations
Select signals
Review estimate
Run
See labels
The advanced path should be:
Open signal
Change transform
Preview transformed inputs
Run
That gives Morph a sharp product story:
Traces tell you what happened.
Reflexes tell you what it meant.
Transforms make the classifier see the trace the right way.