Reflex Trace Pass Plan

One Sentence

Build Reflexes into Traces as a semantic batch pass: users choose signals, choose or inherit trace transforms, run a batch over historical traffic, and get labeled conversations back in the traces UI.

Why This Exists

The current traces UI answers:

What requests happened?
Which conversations errored?
How many tokens and models were used?
What was the raw input and output?

Reflexes answer a different question:

What did the trace mean?
Was the user frustrated?
Was the agent looping?
Did the assistant leak private reasoning?
Did a custom product-specific failure happen?

The product should not feel like a second observability dashboard. It should feel like grep for semantic events over agent traces.

Product Thesis

Do not expose Reflexes as simple “model checkboxes”. A Reflex result depends on the exact input shape used at inference time. Some models want a single user message. Some want assistant output only. Some want a rolling transcript window. Some custom signals may want user messages plus tool calls, tool errors, or every other user turn. So the core abstraction is:

Signal = Reflex model + trace transform + threshold + result placement

The UI should make this visible enough that users trust the output, without forcing every user into transform configuration.

Terminology

Reflex

A classifier model that returns labels and scores for text. Examples:

jailbreak
guardrail
leaked-thinking
stuck-in-a-loop
user-frustration
custom trained models

Signal

A configured Reflex ready to run on traces. Signals include:

model alias
display name
default transform
threshold overrides
enabled state for a scan or monitor
ownership scope

Transform

A deterministic projection from trace data into one or more text inputs for Reflex inference. Examples:

user message only
assistant message only
rolling 8-turn transcript
user plus previous assistant
user plus tool calls
tool failure window

Reflex Pass

One historical batch run over a set of traces using selected signals and transforms.

Reflex Monitor

A future continuous mode that applies selected signals to newly ingested traces.

Current Repo Context

Relevant landing repo surfaces:

src/app/dashboard/traces/page.tsx
src/app/dashboard/traces/TracesListClient.tsx
src/app/dashboard/traces/[convoId]/page.tsx
src/app/dashboard/traces/[convoId]/ConversationView.tsx
src/app/dashboard/traces/actions.ts
src/app/dashboard/reflex/page.tsx
src/lib/reflex-directory.ts
src/app/products/reflex/page.tsx

Current traces data source:

ClickHouse view morph.ai_spans
Helper: src/lib/clickhouse.ts
Scope helper: traceScopeWhere(scope)

Current trace viewer shape:

list conversations
filter by search, end-user id, errors only
detail page groups spans into turns
shows input, output, spans, finish reason, latency, tokens, models

Current Reflex dashboard shape:

square, bordered UI
default Reflex directory
custom Reflex projects
per-model playground and API copy
custom model metadata is already user/org scoped

Batch API Context

From morphllm/tab#141:

Upload

POST /v1/reflex/asynchronous_batches/upload

Request:

{
  "requests": [
    {
      "id": "row-1",
      "model": ["guardrail", "jailbreak"],
      "text": "text to classify"
    }
  ]
}

Important constraints:

requests is required.
max 10,000 requests per upload.
each id must be unique.
model is an array of one or more model names.
text is required.
max text length is 350,000 chars.
upload returns quickly after queueing rows.
async batch rows are durable in Postgres in the Reflex service.
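
The constraints above can be checked on the landing side before anything is uploaded. The following is a minimal sketch only: `validateBatchRows`, the `BatchRow` type, and the error strings are hypothetical names, and it assumes the 350,000 character limit is measured like a JavaScript string length.

```typescript
// Hypothetical pre-upload check; the limits mirror the constraints listed above.
type BatchRow = { id: string; model: string[]; text: string };

const MAX_ROWS_PER_UPLOAD = 10000;
const MAX_TEXT_CHARS = 350000; // assumption: counted as string length

// Returns human-readable problems; an empty list means the upload looks valid.
function validateBatchRows(rows: BatchRow[]): string[] {
  const problems: string[] = [];
  if (rows.length === 0) problems.push('requests is required');
  if (rows.length > MAX_ROWS_PER_UPLOAD) {
    problems.push('too many requests: ' + rows.length);
  }
  // Null-prototype object so ids like "constructor" are not false positives.
  const seen: Record<string, boolean> = Object.create(null);
  for (const row of rows) {
    if (seen[row.id]) problems.push('duplicate id: ' + row.id);
    seen[row.id] = true;
    if (row.model.length === 0) problems.push('no models for: ' + row.id);
    if (row.text.length === 0) problems.push('empty text for: ' + row.id);
    if (row.text.length > MAX_TEXT_CHARS) {
      problems.push('text too long for: ' + row.id);
    }
  }
  return problems;
}
```

Returning a list rather than throwing on the first problem lets the estimate step report every bad row at once.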

Response:

{
  "id": "rbatch-...",
  "object": "reflex.batch",
  "status": "queued",
  "request_counts": {
    "total": 3,
    "completed": 0,
    "failed": 0
  },
  "created_at": 1780000000
}

Poll

GET /v1/reflex/asynchronous_batches/{batch_id}

Returns the same batch status shape as the upload response.

Results

GET /v1/reflex/asynchronous_batches/{batch_id}/results

Response:

{
  "id": "rbatch-...",
  "object": "reflex.batch.results",
  "status": "completed",
  "request_counts": {
    "total": 3,
    "completed": 2,
    "failed": 1
  },
  "results": [
    {
      "id": "row-1",
      "status": "completed",
      "predictions": [
        {
          "model": "guardrail",
          "mode": "single_label",
          "classes": [
            { "label": "clean", "score": 0.8, "selected": true }
          ]
        }
      ]
    },
    {
      "id": "row-2",
      "status": "pending"
    },
    {
      "id": "row-3",
      "status": "failed",
      "error": {
        "type": "input_too_long",
        "message": "too many tokens"
      }
    }
  ]
}

Core UX

Primary Action

Add selection controls to /dashboard/traces:

[ ] Select visible        0 selected        [ Run selected ]

Users can select individual conversations or every visible conversation, then run a Reflex pass on that explicit set. If any selected conversation already has Reflex results, show an in-app dialog:

Rerun existing Reflex results?

3 of 9 selected conversations already have labels.

[Cancel] [Run only new] [Rerun all]

The run action opens a right-side panel or full-height drawer.

Reflex Pass Panel

Suggested structure:

Run Reflex Pass

Scope
[ Current filters ] [ This conversation ] [ Last 24h ] [ Custom range ]

Built-in signals
[x] Jailbreak             User chat
[x] Guardrail             User chat
[x] User Frustration      User chat
[ ] Incomplete Thought    User chat
[ ] Difficulty            User chat
[ ] Ambiguity             User chat
[ ] Domain                User chat
[x] Leaked Thinking       Agent chat
[x] Stuck-in-a-loop       Agent chat

Custom signals
[ ] Bad handoff           Full trace
[ ] Escalation needed     User chat
[ ] Wrong policy          Agent chat

Estimate
9,412 trace inputs
16,423 predictions
Batch mode
Estimated cost: $0.49

[ Run pass ]

Why Group By Transform Family

The batch API lets one row include many models:

{
  "id": "user:event_123",
  "model": ["jailbreak", "guardrail", "user-frustration"],
  "text": "user text"
}

But only models that share a compatible input transform should be batched together on the same row. Do:

one user-chat row -> user-chat models
one agent-chat row -> agent-chat models
one full-trace row -> full-trace models

Do not:

one raw trace row -> every selected model

That creates a train/inference mismatch and bad labels.

Default Transform Map

Initial defaults:

jailbreak             user_message
guardrail             user_message
incomplete-thought    user_message
difficulty            user_message
ambiguity             user_message
domain                user_message
user-frustration      user_message
leaked-thinking       assistant_message
stuck-in-a-loop       assistant_message

Notes:

user-frustration defaults to user chat for v1 so it sees the same data shape the user-state classifier was trained on.
stuck-in-a-loop defaults to agent chat because it is primarily about repeated agent output or action descriptions.
leaked-thinking should run on assistant-visible output, not tool output or hidden spans.
custom signals should carry their own default transform.

Transform Catalog

`user_message`

Product label: User chat. One row per user-authored chat message. Input:

{turn.inputText}

Best for:

jailbreak
guardrail
incomplete thought
difficulty
ambiguity
domain

`assistant_message`

Product label: Agent chat. One row per agent-authored chat message. Input:

{turn.outputText}

Best for:

leaked thinking
stuck in a loop
bad response style
unsafe assistant output

`full_conversation`

Product label: Full trace. One row per complete trace or conversation. Input:

Full conversation:
...

Best for:

coarse conversation quality
support outcome
final resolution status

Warning:

can be expensive
may exceed context window
not appropriate for per-turn labeling

Transform Definition Shape

Represent transforms as data, not scattered switch statements. Proposed TypeScript type:

type ReflexTraceTransformKind =
  | 'user_message'
  | 'assistant_message'
  | 'full_conversation';

type ReflexTraceTransform = {
  id: string;
  kind: ReflexTraceTransformKind;
  name: string;
  description: string;
  inputKind:
    | 'user_chat'
    | 'agent_chat'
    | 'full_trace';
  format: 'plain_transcript' | 'json_messages' | 'compact_trace';
  maxTokens?: number;
  includeUserMessages?: boolean;
  includeAssistantMessages?: boolean;
  includeToolCalls?: boolean;
  includeToolResults?: boolean;
};

Proposed signal config:

type TraceSignalConfig = {
  signalId: string;
  modelAlias: string;
  displayName: string;
  source: 'default' | 'custom';
  category: string;
  labels: string[];
  defaultTransformId: string;
  thresholdOverrides?: Record<string, number>;
  enabledByDefault?: boolean;
};

Custom Signals

Custom Reflexes must include transform metadata. When creating or editing a custom Reflex, ask:

What should this signal inspect?

Options:

User chat
Agent chat
Full trace

Store this on the custom Reflex record. Suggested fields in landing Postgres reflex_chats or a related metadata table:

trace_default_transform_id
trace_transform_config_json
trace_transform_version
trace_result_threshold_json

If modifying reflex_chats is too broad, add a landing-owned table:

reflex_trace_signal_configs
- id
- user_id
- org_id
- reflex_chat_id
- model_alias
- display_name
- transform_id
- transform_config
- threshold_config
- created_at
- updated_at

Default Reflexes can be represented in code first, then moved to DB if needed.

Batch Row IDs

Batch request IDs should encode enough information to route results back to ClickHouse rows. Suggested format:

{inputKind}:{convoId}:{eventId}:{spanId?}:{transformId}:{ordinal}

Examples:

user_chat:convo_123:event_5::user_message:0
agent_chat:convo_123:event_5::assistant_message:0
full_trace:convo_123:::full_conversation:0

For safer parsing, also keep a run-local mapping table in Postgres:

reflex_trace_pass_items
- id
- run_id
- request_id
- convo_id
- event_id
- span_id
- input_kind
- transform_id
- source_start_time
- selected_models

This avoids relying on string parsing if IDs contain unexpected characters.
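
For illustration, the suggested ID format could be built and parsed as below. This is a sketch under assumptions: `buildRequestId`, `parseRequestId`, and `RequestIdParts` are hypothetical names, and percent-encoding each segment is an added safeguard that is not part of the suggested format. The Postgres mapping table remains the safer source of truth.

```typescript
// Hypothetical helpers; segment order follows the suggested format above.
type RequestIdParts = {
  inputKind: string;
  convoId: string;
  eventId: string;
  spanId: string; // empty string when absent
  transformId: string;
  ordinal: number;
};

// Percent-encode each segment so a ':' inside an ID cannot shift the fields.
// (Assumption: this encoding step is an addition, not part of the plan.)
function buildRequestId(p: RequestIdParts): string {
  return [p.inputKind, p.convoId, p.eventId, p.spanId, p.transformId, String(p.ordinal)]
    .map((segment) => encodeURIComponent(segment))
    .join(':');
}

// Returns null instead of guessing when the ID does not have the expected shape.
function parseRequestId(id: string): RequestIdParts | null {
  const raw = id.split(':');
  if (raw.length !== 6) return null;
  let parts: string[];
  try {
    parts = raw.map((segment) => decodeURIComponent(segment));
  } catch {
    return null; // malformed percent-encoding
  }
  if (!/^\d+$/.test(parts[5])) return null;
  return {
    inputKind: parts[0],
    convoId: parts[1],
    eventId: parts[2],
    spanId: parts[3],
    transformId: parts[4],
    ordinal: Number(parts[5]),
  };
}
```

IDs without reserved characters come out unchanged, so `full_trace:convo_123:::full_conversation:0` round-trips as written in the examples.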

ClickHouse Storage

Create a new ClickHouse table for Reflex outputs. Do not reuse morph.signals. That table is for explicit feedback and edit signals. Reflex outputs are derived semantic labels. Proposed table:

CREATE TABLE IF NOT EXISTS morph.trace_reflex_results
(
    account_user_id String,
    account_org_id String DEFAULT '',

    run_id String,
    batch_id String,
    request_id String,

    convo_id String,
    event_id String,
    span_id String DEFAULT '',

    input_kind LowCardinality(String),
    transform_id LowCardinality(String),
    transform_version UInt16 DEFAULT 1,

    reflex_model String,
    status LowCardinality(String),

    mode LowCardinality(String) DEFAULT '',
    top_label String DEFAULT '',
    top_score Float32 DEFAULT 0,
    selected_labels Array(String) DEFAULT [],
    classes_json String DEFAULT '',

    error_type String DEFAULT '',
    error_message String DEFAULT '',

    source_start_time DateTime64(3),
    created_at DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(created_at)
ORDER BY (account_org_id, account_user_id, convo_id, source_start_time, reflex_model, created_at)
TTL toDateTime(created_at) + INTERVAL 365 DAY;

Notes:

no raw input text
raw text stays in ai_spans
classes_json keeps full score details without over-normalizing v1
top_label, top_score, and selected_labels support fast dashboard filters
include source_start_time so results can be aligned to trace rows

Optional materialized rollup later:

morph.trace_reflex_conversation_rollups
- account_user_id
- account_org_id
- convo_id
- last_result_at
- fired_count
- top_labels
- max_scores_by_model

But start with query-time aggregation unless performance requires it.

Landing Postgres State

Use Postgres for run lifecycle and UI state. Proposed tables:

reflex_trace_passes
- id uuid
- user_id text
- org_id text nullable
- batch_id text nullable
- status text
- scope_json jsonb
- selected_signals_json jsonb
- transform_plan_json jsonb
- total_inputs int
- total_predictions int
- completed_inputs int
- failed_inputs int
- error_message text nullable
- created_at timestamp
- started_at timestamp nullable
- completed_at timestamp nullable

reflex_trace_pass_items
- id uuid
- pass_id uuid
- request_id text
- convo_id text
- event_id text
- span_id text nullable
- input_kind text
- transform_id text
- source_start_time timestamp
- selected_models jsonb

Why Postgres plus ClickHouse:

Postgres tracks the app workflow.
ClickHouse stores queryable per-trace results next to spans.
The Reflex service already owns async batch internal state, but landing needs user-facing run state and mapping.

Server Flow

1. Build Scope

From the traces page filters:

type TracePassScope =
  | { type: 'current_filters'; filters: TraceFilters }
  | { type: 'conversation'; convoId: string }
  | { type: 'date_range'; from: string; to: string; filters?: TraceFilters }
  | { type: 'since_last_run'; filters?: TraceFilters };

Use the same resolveScope() and traceScopeWhere() patterns as src/app/dashboard/traces/actions.ts.

2. Load Candidate Trace Data

Query ai_spans, grouped into conversations and turns using the existing logic from getConversationDetail. For large historical scans, do not load unlimited rows into memory. Approach:

cap first version at 10,000 transformed inputs per batch
show “large scan will be split into N batches” later
build rows server-side
keep raw text only long enough to upload to Reflex batch API

3. Build Transform Plan

Group selected signals by transform. Example:

user_message:
  models: [jailbreak, guardrail, ambiguity]
  rows: one per user turn

assistant_message:
  models: [leaked-thinking, stuck-in-a-loop]
  rows: one per assistant turn

full_conversation:
  models: [custom-bad-handoff]
  rows: one per trace
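
The grouping step behind the example above can be sketched as follows. `buildTransformPlan` and the trimmed `SignalRef` shape are hypothetical; a real implementation would presumably take the full TraceSignalConfig and honor any transform override chosen in the run panel.

```typescript
// Minimal shapes for this sketch; only the fields the grouping needs.
type SignalRef = { modelAlias: string; defaultTransformId: string };
type TransformPlan = Record<string, string[]>; // transformId -> model aliases

// Groups selected signals so each batch row only carries models
// that share the same input transform.
function buildTransformPlan(signals: SignalRef[]): TransformPlan {
  const plan: TransformPlan = Object.create(null);
  for (const signal of signals) {
    const models = plan[signal.defaultTransformId] ?? [];
    // Skip duplicates so a model is never listed twice on one row.
    if (models.indexOf(signal.modelAlias) === -1) models.push(signal.modelAlias);
    plan[signal.defaultTransformId] = models;
  }
  return plan;
}
```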

4. Create Pass Record

Insert one reflex_trace_passes row, then one reflex_trace_pass_items row for each outgoing batch row.

5. Upload Batch

Call:

POST https://api.morphllm.com/v1/reflex/asynchronous_batches/upload

Use the user or org API key according to the active dashboard context. Send an Idempotency-Key derived from the pass id:

trace-pass:{passId}

Store returned batch_id.
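
As a rough sketch, the upload request could be assembled like this. `buildUploadRequest` and `UploadRequest` are hypothetical names, and two details are assumptions to confirm against the Reflex API docs: bearer-token auth, and Idempotency-Key being sent as an HTTP header.

```typescript
type BatchRow = { id: string; model: string[]; text: string };

type UploadRequest = {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
};

// Builds the request without sending it, so retries reuse the same key.
function buildUploadRequest(passId: string, rows: BatchRow[], apiKey: string): UploadRequest {
  return {
    url: 'https://api.morphllm.com/v1/reflex/asynchronous_batches/upload',
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // Assumption: bearer-token auth with the user or org API key.
      Authorization: 'Bearer ' + apiKey,
      // Same pass id -> same key, so a retried upload is not queued twice.
      'Idempotency-Key': 'trace-pass:' + passId,
    },
    body: JSON.stringify({ requests: rows }),
  };
}
```

The returned object can be passed to `fetch(req.url, req)` from a server action. Keeping the key tied to the pass id means a deliberate rerun, which creates a new pass, gets a new key.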

6. Poll

Poll batch status from a route/server action:

GET /v1/reflex/asynchronous_batches/{batch_id}

Update completed_inputs and failed_inputs.

7. Fetch Results

When complete, call:

GET /v1/reflex/asynchronous_batches/{batch_id}/results

Join results to reflex_trace_pass_items by request_id. Explode:

one batch result row with multiple predictions
-> many ClickHouse trace_reflex_results rows

8. Write Results To ClickHouse

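Insert into morph.trace_reflex_results. Each prediction becomes one row. Pseudocode:

// Sketch: results, runId, batchId, itemByRequestId, insertResult, topClass,
// and selectedLabels are assumed to come from the surrounding sync action.
for (const result of results) {
  const item = itemByRequestId[result.id];
  if (!item) continue; // unknown request id: skip rather than guess routing

  // Routing fields shared by every row written for this batch result.
  const base = {
    runId,
    batchId,
    requestId: result.id,
    convoId: item.convoId,
    eventId: item.eventId,
    spanId: item.spanId ?? '',
    inputKind: item.inputKind,
    transformId: item.transformId,
    sourceStartTime: item.sourceStartTime,
  };

  if (result.status === 'completed') {
    for (const prediction of result.predictions) {
      if ('error' in prediction) {
        // Per-model error inside a completed row: write it as a failed result.
        insertResult({
          ...base,
          reflexModel: prediction.model,
          status: 'failed',
          errorType: prediction.error.type,
          errorMessage: prediction.error.message,
        });
        continue;
      }
      insertResult({
        ...base,
        reflexModel: prediction.model,
        status: 'completed',
        mode: prediction.mode,
        topLabel: topClass(prediction)?.label ?? '',
        topScore: topClass(prediction)?.score ?? 0,
        selectedLabels: selectedLabels(prediction),
        classesJson: JSON.stringify(prediction.classes),
      });
    }
  } else if (result.status === 'failed') {
    // A failed row yields one failed result per intended model.
    for (const model of item.selectedModels) {
      insertResult({
        ...base,
        reflexModel: model,
        status: 'failed',
        errorType: result.error.type,
        errorMessage: result.error.message,
      });
    }
  }
  // Pending rows are not written.
}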
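
The two helpers named in the loop above are not defined in this plan. One plausible reading, offered as an assumption: `topClass` picks the highest-scoring class and `selectedLabels` collects labels the API marked as selected. The types mirror the results payload shown earlier.

```typescript
type ClassScore = { label: string; score: number; selected: boolean };
type Prediction = { model: string; mode: string; classes: ClassScore[] };

// Highest-scoring class, or undefined when the prediction has no classes.
function topClass(prediction: Prediction): ClassScore | undefined {
  let best: ClassScore | undefined;
  for (const candidate of prediction.classes) {
    if (best === undefined || candidate.score > best.score) best = candidate;
  }
  return best;
}

// Labels the API flagged with selected: true, in response order.
function selectedLabels(prediction: Prediction): string[] {
  return prediction.classes
    .filter((candidate) => candidate.selected)
    .map((candidate) => candidate.label);
}
```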

Dashboard Query Changes

Conversation List

Current list query aggregates from ai_spans. Add optional aggregation from trace_reflex_results:

per convo:
- fired_count
- top selected labels
- max score by model
- latest pass timestamp
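
If the aggregation is done in application code rather than in ClickHouse, it might look like the sketch below. `summarizeByConvo`, `ResultRow`, and `ConvoSummary` are hypothetical names. The plan does not define exactly when a label counts as fired, so that rule is passed in as a predicate.

```typescript
type ResultRow = {
  convoId: string;
  reflexModel: string;
  status: string;
  topLabel: string;
  topScore: number;
  createdAt: number; // epoch milliseconds
};

type ConvoSummary = {
  firedCount: number;
  maxScoreByModel: Record<string, number>;
  latestResultAt: number;
};

// isFired is supplied by the caller because "fired" is a product decision.
function summarizeByConvo(
  rows: ResultRow[],
  isFired: (row: ResultRow) => boolean,
): Record<string, ConvoSummary> {
  const out: Record<string, ConvoSummary> = Object.create(null);
  for (const row of rows) {
    if (row.status !== 'completed') continue; // failed rows carry no scores
    const summary: ConvoSummary = out[row.convoId] ?? {
      firedCount: 0,
      maxScoreByModel: Object.create(null),
      latestResultAt: 0,
    };
    if (isFired(row)) summary.firedCount += 1;
    const previous = summary.maxScoreByModel[row.reflexModel];
    if (previous === undefined || row.topScore > previous) {
      summary.maxScoreByModel[row.reflexModel] = row.topScore;
    }
    if (row.createdAt > summary.latestResultAt) summary.latestResultAt = row.createdAt;
    out[row.convoId] = summary;
  }
  return out;
}
```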

UI row:

convo_123
14 turns | 8.2k input | 2.1k output

[frustrated 97%] [looping 88%] [leaked-thinking clean]

Add filters:

All
Flagged
Frustrated
Looping
Jailbreak
Leaked Thinking
Custom signal...

Conversation Detail

For a convo detail page, query Reflex results:

WHERE convo_id = {convoId}

Join by:

event_id
span_id when present
source_start_time as fallback
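
One way to read the join order above, sketched with hypothetical names (`findTurnIndex`, `Turn`, `TurnResult`): prefer an event_id match narrowed by span_id, then any event_id match, then an exact source_start_time match. Results with no match, such as full-trace rows, would be shown at the conversation level.

```typescript
type Turn = { eventId: string; spanId: string; startTime: number };
type TurnResult = { eventId: string; spanId: string; sourceStartTime: number };

// Returns the index of the matching turn, or -1 when nothing matches.
function findTurnIndex(turns: Turn[], result: TurnResult): number {
  let eventMatch = -1;
  for (let i = 0; i < turns.length; i++) {
    const turn = turns[i];
    if (result.eventId === '' || turn.eventId !== result.eventId) continue;
    // Exact match on both event and span wins immediately.
    if (result.spanId !== '' && turn.spanId === result.spanId) return i;
    if (eventMatch === -1) eventMatch = i;
  }
  if (eventMatch !== -1) return eventMatch;
  // Fallback: align on the source start time stored with the result.
  for (let i = 0; i < turns.length; i++) {
    if (turns[i].startTime === result.sourceStartTime) return i;
  }
  return -1;
}
```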

Render:

inline chips on turn cards
a right rail with fired results
a summary band near the page title

Suggested right rail:

Reflexes
3 fired

Turn 4
user-frustration 0.97

Turn 7
stuck-in-a-loop 0.88

Turn 9
leaked-thinking 0.91

Suggested inline chip:

[frustrated 97%]

Use muted chips for non-fired or clean labels only when the user expands details. Do not clutter the main trace with every negative result.

UI States

Empty

No Reflex passes yet
Run a semantic pass over these traces to find frustration, loops, jailbreaks, and custom product failures.

[Select visible] [Run selected]

Estimating

Counting trace inputs...

No spinner if possible. Use skeleton rows or a small inline status.

Ready To Run

9,412 inputs
16,423 predictions
Batch mode
Estimated cost: $0.49

Running

rbatch-...
4,120 / 9,412 inputs complete
43.8%

Use request_counts from batch status.
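
A small sketch of the percentage shown above, computed from request_counts. `progressPercent` is a hypothetical helper, and it assumes the displayed figure counts completed rows only, with failed rows reported separately.

```typescript
type RequestCounts = { total: number; completed: number; failed: number };

// Formats completed / total as a one-decimal percentage string.
function progressPercent(counts: RequestCounts): string {
  if (counts.total <= 0) return '0.0%'; // avoid dividing by zero on empty batches
  const percent = (counts.completed / counts.total) * 100;
  return percent.toFixed(1) + '%';
}
```

With the numbers from the mock, `progressPercent({ total: 9412, completed: 4120, failed: 0 })` gives `43.8%`.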

Complete

Pass complete
317 conversations flagged
1,204 selected labels

CTA:

View flagged conversations

Partial Failure

Pass complete with failed rows
8,920 completed
492 failed

CTA:

View failed inputs
Retry failed

Cost Model

Docs say async batch is priced per event, where one classification is one event. If a batch row has multiple models, cost should count predictions, not rows:

total_predictions = sum(row.model.length)

Display both:

9,412 inputs
16,423 predictions

Estimate:

estimated_cost = total_predictions * batch_rate

If usage tier changes after 1M events/month, show:

Estimated before volume discounts

or use billing API if available.
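
The two formulas above combine into a small estimator. This is a sketch: `estimatePass` is a hypothetical name and `batchRatePerPrediction` is a caller-supplied parameter, since the actual batch rate is not stated in this plan.

```typescript
type PlannedRow = { model: string[] };

type PassEstimate = {
  totalInputs: number;
  totalPredictions: number;
  estimatedCost: number;
};

// Counts predictions (one per model per row), not rows, then applies the rate.
function estimatePass(rows: PlannedRow[], batchRatePerPrediction: number): PassEstimate {
  let totalPredictions = 0;
  for (const row of rows) totalPredictions += row.model.length;
  return {
    totalInputs: rows.length,
    totalPredictions,
    estimatedCost: totalPredictions * batchRatePerPrediction,
  };
}
```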

Permissions

Use getDataContext() everywhere. Rules:

org context sees org traces and org custom Reflexes
personal context sees personal traces and personal custom Reflexes
org admins can run passes
members may view results if they can view traces
member ability to run passes should be a product decision

Conservative default:

admins can run Reflex passes
members can view existing results

Use requireOrgAdmin() for the initial run action if cost is meaningful.

Navigation

Do not create a separate top-level “Reflex Trace Dashboard” yet. Use:

/dashboard/traces for trace list and pass action
/dashboard/traces/[convoId] for result inspection
/dashboard/reflex for model creation and custom signal configuration

Possible future route:

/dashboard/traces/reflex-passes

Only add once pass history needs its own page.

Landing Page Updates

Home

Add Reflexes as the fourth specialized model in SpecializedModelsPanel. Suggested copy:

Reflexes
Semantic trace classifiers
<90ms
per event

Link:

/products/reflex

Product Reflex Page

Make the product flow explicit:

Trace traffic
-> choose signals
-> transform inputs
-> batch pass
-> flagged conversations

Current page already says the right high-level thing:

Catch what your traces can't see

But it should show the actual product surface:

trace list with “Run selected”
selectable trace rows
rerun confirmation for conversations with existing Reflex labels
transform-aware signal picker
labeled conversations

Implementation Phases

Phase 0.5: Live Preview Transport

Before the async batch queue is ready, the dashboard can run a capped live preview through:

POST /v1/reflex/predict

This should stay server-side:

fetch trace details by conversation id
build User chat, Agent chat, or Full trace inputs
call the public Reflex API with the signed-in user’s Morph API key
return the same pass/result shape the UI will later receive from batch results

Limits for this preview:

cap conversations
cap transformed inputs per transform
cap total predictions
do not persist raw trace text

The async batch transport should replace this implementation behind the same UI action once it supports upload, queue/poll, and result download.

Phase 0: Product Data Definitions

Add local definitions:

transform catalog
default Reflex to transform mapping
helper to group signals by transform
helper to estimate inputs and predictions

Files likely:

src/lib/reflex-traces/transforms.ts
src/lib/reflex-traces/signals.ts
src/lib/reflex-traces/estimate.ts

No UI yet.

Phase 1: ClickHouse Table

Add ClickHouse schema file:

morphcli/tracing/clickhouse/reflex-results.sql

Include:

morph.trace_reflex_results
comments documenting that raw text is not stored
account scope columns

Phase 2: Postgres Run Tables

Add Drizzle migration:

drizzle/00XX_add_reflex_trace_passes.sql

Add schema to src/lib/db/schema location used by this repo. Tables:

reflex_trace_passes
reflex_trace_pass_items

Phase 3: Trace Pass Server Actions

Add actions:

src/app/dashboard/traces/reflex-actions.ts

Actions:

estimateReflexTracePass(scope, selectedSignals)
createReflexTracePass(scope, selectedSignals)
getReflexTracePass(passId)
syncReflexTracePassResults(passId)

Make sure large scans are bounded in v1.

Phase 4: Run Panel UI

Add components:

src/app/dashboard/traces/ReflexPassPanel.tsx
src/app/dashboard/traces/ReflexSignalPicker.tsx
src/app/dashboard/traces/ReflexPassProgress.tsx

Keep Reflex-owned surfaces square:

no rounded cards
bordered panels
dense controls
muted explanatory copy

Phase 5: Result Chips In Trace List

Extend TraceConversation type with:

reflexSummary?: {
  firedCount: number;
  labels: Array<{
    model: string;
    label: string;
    score: number;
  }>;
};

Show max 3 chips per row.

Phase 6: Result Rail In Detail View

Extend ConversationDetail with Reflex results. Render:

summary near title
inline turn chips
right-side result rail on desktop
collapsible section on mobile

Phase 7: Custom Signal Transform Config

Add UI to custom Reflex detail page:

Trace input
[User chat v]

Advanced:

max tokens
include tool context for Full trace

This should be optional in v1. Default custom transform can be user_message until configured.

Phase 8: Monitor Mode

After historical batch pass works, add:

Enable monitor

Monitor mode applies selected signals to future traces. This likely needs ingest-time or cron processing and should not block the first batch UX.

Edge Cases

Zero Data Retention

If trace content was not retained:

transforms may produce no inputs
show “content not retained” in estimate
skip rows with empty text

Very Large Conversations

Rolling windows may exceed model context. Mitigations:

cap by max tokens
truncate oldest content first
keep the target turn
show “truncated” metadata in transform mapping if needed
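
The mitigations above could be implemented roughly as follows. This is a sketch with hypothetical names (`buildWindow`, `estimateTokens`, `WindowTurn`), and it assumes about four characters per token, which should be replaced by the model's real tokenizer when one is available.

```typescript
type WindowTurn = { text: string };

// Assumption: ~4 characters per token as a rough budget estimate.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Builds a window ending at the target turn, dropping the oldest turns first.
function buildWindow(
  turns: WindowTurn[],
  targetIndex: number,
  maxTokens: number,
): { turns: WindowTurn[]; truncated: boolean } {
  if (targetIndex < 0 || targetIndex >= turns.length) {
    throw new RangeError('targetIndex out of range');
  }
  // The target turn is always kept, even if it alone exceeds the budget.
  const kept: WindowTurn[] = [turns[targetIndex]];
  let used = estimateTokens(turns[targetIndex].text);
  let i = targetIndex - 1;
  for (; i >= 0; i--) {
    const cost = estimateTokens(turns[i].text);
    if (used + cost > maxTokens) break; // older turns no longer fit
    kept.unshift(turns[i]);
    used += cost;
  }
  // i >= 0 means at least one older turn was dropped.
  return { turns: kept, truncated: i >= 0 };
}
```

The `truncated` flag is what the transform mapping would record so the UI can show that older context was cut.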

Tool Output Size

Tool results can be huge. For tool transforms:

include tool name
include args
include status
include error message
truncate result body by default
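
A possible rendering of one tool call under those rules is sketched below. `formatToolCall`, the `ToolCall` shape, and the line-per-field layout are all hypothetical; the actual transform format is not specified in this plan.

```typescript
type ToolCall = {
  name: string;
  args: unknown;
  status: string;
  errorMessage?: string;
  result?: string;
};

// Renders name, args, status, and error in full; the result body is capped.
function formatToolCall(call: ToolCall, maxResultChars: number): string {
  const lines = [
    'tool: ' + call.name,
    'args: ' + JSON.stringify(call.args),
    'status: ' + call.status,
  ];
  if (call.errorMessage) lines.push('error: ' + call.errorMessage);
  if (call.result !== undefined) {
    const body =
      call.result.length > maxResultChars
        ? call.result.slice(0, maxResultChars) + ' [truncated]'
        : call.result;
    lines.push('result: ' + body);
  }
  return lines.join('\n');
}
```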

Multiple Runs

A conversation may have results from multiple passes. Default display:

show latest completed pass results

Advanced filter:

all passes
specific pass

Re-running Same Pass

Use idempotency key per pass creation, not per arbitrary scope. A deliberate rerun should create a new pass.

Partial Batch Results

Results endpoint can return pending rows. Do not write pending rows to ClickHouse. Only write:

completed prediction rows
failed rows with errors

Failed Rows With Multiple Models

If an entire row fails, write one failed result per intended model so UI can show model-specific failure counts.

Per-Model Error Inside Completed Row

A completed row can contain prediction entries where one model has an error. Handle:

{
  "model": "some-model",
  "error": {
    "type": "inference_error",
    "message": "..."
  }
}

Write as failed result for that model.
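
A typed sketch of that handling, assuming prediction entries are either a scored prediction or a model-level error as shown above. `toOutcome`, `Outcome`, and the union type are hypothetical names introduced for illustration.

```typescript
type ClassScore = { label: string; score: number; selected: boolean };

// A prediction entry is either scored or carries a per-model error.
type PredictionEntry =
  | { model: string; mode: string; classes: ClassScore[] }
  | { model: string; error: { type: string; message: string } };

type Outcome = {
  model: string;
  status: 'completed' | 'failed';
  errorType: string;
  errorMessage: string;
};

// Maps one entry to the status and error columns written to ClickHouse.
function toOutcome(entry: PredictionEntry): Outcome {
  if ('error' in entry) {
    return {
      model: entry.model,
      status: 'failed',
      errorType: entry.error.type,
      errorMessage: entry.error.message,
    };
  }
  return { model: entry.model, status: 'completed', errorType: '', errorMessage: '' };
}
```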

Open Product Questions

Can non-admin org members run paid batch passes, or only view results?
Should default signals be enabled by default in the panel?
Is user-frustration live now, or still coming soon in the directory?
Should clean labels be stored and queryable, or only selected labels plus errors?
Should the first version support multi-batch scans above 10,000 transformed inputs?
Should transform config be exposed during custom Reflex training, or only after the model is ready?
Should historical pass results be included in billing usage details separately from normal API usage?

Recommended V1

Build the smallest coherent version:

Add ClickHouse trace_reflex_results.
Add Postgres pass and pass item tables.
Add default transform catalog in code.
Add selectable rows and /dashboard/traces “Run selected” panel.
Support current filters and single conversation scope.
Support default Reflexes and ready custom Reflexes.
For custom Reflexes, default to user_message with visible transform override in the run panel.
Upload one async batch.
Poll progress.
Write completed results to ClickHouse.
Show result chips in trace list and detail page.

This gives users the core product loop:

Pick traces
Pick signals
Run pass
See flagged conversations
Open trace
Understand what happened

Recommended V2

Add:

saved signal presets
transform config on custom Reflex detail pages
pass history page
multi-batch scans over 10,000 inputs
retry failed inputs
export labels
monitor mode for future traces
Slack/webhook alerts on selected labels

UX Copy Bank

Primary CTA:

Run selected

Panel subtitle:

Turn traces into classifier inputs, then run semantic signals in batch.

Transform helper:

Each signal runs on the input shape it was trained for.

Estimate helper:

Inputs are transformed trace snippets. Predictions are counted per input, one for each selected model that runs on it.

Empty state:

Your traces show what happened. Reflexes label what it meant.

Completion:

Pass complete. Flagged conversations are ready.

No retained text:

Some traces have no retained content, so Reflexes cannot classify them.

Design Notes

Keep the Reflex trace UI:

square
bordered
dense
low motion
no chat metaphor
no big decorative hero inside dashboard
one primary action per surface

This should feel closer to a command runner than a BI dashboard. Good mental model:

rg --semantic "frustrated|looping|jailbreak" traces/

Bad mental model:

another analytics dashboard with charts everywhere

Final Recommendation

Make transforms first-class, but defaults strong. The default user path should be:

Select conversations
Select signals
Review estimate
Run
See labels

The advanced path should be:

Open signal
Change transform
Preview transformed inputs
Run

That gives Morph a sharp product story:

Traces tell you what happened.
Reflexes tell you what it meant.
Transforms make the classifier see the trace the right way.