Reflex Trace Pass Plan
One Sentence
Build Reflexes into Traces as a semantic batch pass: users choose signals, choose or inherit trace transforms, run a batch over historical traffic, and get labeled conversations back in the traces UI.
Why This Exists
The current traces UI answers:
- What requests happened?
- Which conversations errored?
- How many tokens and models were used?
- What was the raw input and output?
It does not answer:
- What did the trace mean?
- Was the user frustrated?
- Was the agent looping?
- Did the assistant leak private reasoning?
- Did a custom product-specific failure happen?
In short: grep for semantic events over agent traces.
Product Thesis
Do not expose Reflexes as simple “model checkboxes”. A Reflex result depends on the exact input shape used at inference time. Some models want a single user message. Some want assistant output only. Some want a rolling transcript window. Some custom signals may want user messages plus tool calls, tool errors, or every other user turn. So the core abstraction is:
Terminology
Reflex
A classifier model that returns labels and scores for text. Examples:
- jailbreak
- guardrail
- leaked-thinking
- stuck-in-a-loop
- user-frustration
- custom trained models
Signal
A configured Reflex ready to run on traces. Signals include:
- model alias
- display name
- default transform
- threshold overrides
- enabled state for a scan or monitor
- ownership scope
Transform
A deterministic projection from trace data into one or more text inputs for Reflex inference. Examples:
- user message only
- assistant message only
- rolling 8-turn transcript
- user plus previous assistant
- user plus tool calls
- tool failure window
Reflex Pass
One historical batch run over a set of traces using selected signals and transforms.
Reflex Monitor
A future continuous mode that applies selected signals to newly ingested traces.
Current Repo Context
Relevant landing repo surfaces:
- src/app/dashboard/traces/page.tsx
- src/app/dashboard/traces/TracesListClient.tsx
- src/app/dashboard/traces/[convoId]/page.tsx
- src/app/dashboard/traces/[convoId]/ConversationView.tsx
- src/app/dashboard/traces/actions.ts
- src/app/dashboard/reflex/page.tsx
- src/lib/reflex-directory.ts
- src/app/products/reflex/page.tsx
Trace data:
- ClickHouse view: morph.ai_spans
- Helper: src/lib/clickhouse.ts
- Scope helper: traceScopeWhere(scope)
Existing traces UI:
- list conversations
- filter by search, end-user id, errors only
- detail page groups spans into turns
- shows input, output, spans, finish reason, latency, tokens, models
- square, bordered UI
Existing Reflex dashboard:
- default Reflex directory
- custom Reflex projects
- per-model playground and API copy
- custom model metadata is already user/org scoped
Batch API Context
From morphllm/tab#141:
Upload
- requests is required.
- max 10,000 requests per upload.
- each id must be unique.
- model is an array of one or more model names.
- text is required.
- max text length is 350,000 chars.
- upload returns quickly after queueing rows.
- async batch rows are durable in Postgres in the Reflex service.
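The upload limits above can be checked on the landing side before anything is sent. The following is a minimal sketch, assuming a row shape with `id`, `model`, and `text` fields as the constraints suggest; `BatchRequestRow` and `validateBatchUpload` are hypothetical names, not part of the Reflex batch API.

```typescript
// Hypothetical row shape inferred from the documented constraints.
interface BatchRequestRow {
  id: string;
  model: string[];
  text: string;
}

const MAX_REQUESTS_PER_UPLOAD = 10_000;
const MAX_TEXT_CHARS = 350_000;

// Returns human-readable problems; an empty list means the upload
// satisfies the documented limits.
function validateBatchUpload(requests: BatchRequestRow[]): string[] {
  const problems: string[] = [];
  if (requests.length === 0) {
    problems.push("requests is required");
  }
  if (requests.length > MAX_REQUESTS_PER_UPLOAD) {
    problems.push(`too many requests: ${requests.length}`);
  }
  const seen = new Set<string>();
  for (const row of requests) {
    if (seen.has(row.id)) {
      problems.push(`duplicate id: ${row.id}`);
    }
    seen.add(row.id);
    if (row.model.length === 0) {
      problems.push(`${row.id}: model must list at least one name`);
    }
    if (row.text.length === 0) {
      problems.push(`${row.id}: text is required`);
    }
    // Assumption: "chars" is approximated by JavaScript string length.
    if (row.text.length > MAX_TEXT_CHARS) {
      problems.push(`${row.id}: text exceeds ${MAX_TEXT_CHARS} chars`);
    }
  }
  return problems;
}
```

Running this before upload lets the run panel report bad rows in the estimate step instead of surfacing them as batch failures later.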
Poll
Results
Core UX
Primary Action
Add selection controls to /dashboard/traces:
Reflex Pass Panel
Suggested structure:
Why Group By Transform Family
The batch API lets one row include many models:
Default Transform Map
Initial defaults:
- user-frustration defaults to user chat for v1 so it sees the same data shape the user-state classifier was trained on.
- stuck-in-a-loop defaults to agent chat because it is primarily about repeated agent output or action descriptions.
- leaked-thinking should run on assistant-visible output, not tool output or hidden spans.
- custom signals should carry their own default transform.
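These defaults can live in code as a small lookup. This is a sketch only: the three entries mirror the defaults stated above, while `DEFAULT_TRANSFORMS`, `defaultTransformFor`, and the `user_message` fallback for unlisted aliases are assumptions made for illustration.

```typescript
type TransformId = "user_message" | "assistant_message" | "full_conversation";

// Only the defaults stated in this plan are listed here.
const DEFAULT_TRANSFORMS: Record<string, TransformId> = {
  "user-frustration": "user_message",
  "stuck-in-a-loop": "assistant_message",
  "leaked-thinking": "assistant_message",
};

// A custom signal's own default wins; unlisted aliases fall back to
// user_message (assumption, matching the custom Reflex default).
function defaultTransformFor(modelAlias: string, customDefault?: TransformId): TransformId {
  if (customDefault !== undefined) {
    return customDefault;
  }
  const known = DEFAULT_TRANSFORMS[modelAlias];
  return known !== undefined ? known : "user_message";
}
```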
Transform Catalog
user_message
Product label: User chat.
One row per user-authored chat message.
Input:
Good for:
- jailbreak
- guardrail
- incomplete thought
- difficulty
- ambiguity
- domain
assistant_message
Product label: Agent chat.
One row per agent-authored chat message.
Input:
Good for:
- leaked thinking
- stuck in a loop
- bad response style
- unsafe assistant output
full_conversation
Product label: Full trace.
One row per complete trace or conversation.
Input:
Good for:
- coarse conversation quality
- support outcome
- final resolution status
Risks:
- can be expensive
- may exceed context window
- not appropriate for per-turn labeling
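The three catalog entries above can be expressed as data with one builder per transform. The sketch below assumes a simplified turn shape (`TraceTurn`) that is not the real `ai_spans` schema; `TRANSFORM_CATALOG`, `perMessage`, and the `role: text` line format for full traces are hypothetical.

```typescript
// Hypothetical, simplified turn shape; the real trace schema may differ.
interface TraceTurn {
  eventId: string;
  role: "user" | "assistant";
  text: string;
}

interface TransformedInput {
  sourceEventIds: string[];
  text: string;
}

interface CatalogEntry {
  label: string;
  build: (turns: TraceTurn[]) => TransformedInput[];
}

// One row per non-empty message authored by the given role.
function perMessage(role: TraceTurn["role"]): (turns: TraceTurn[]) => TransformedInput[] {
  return (turns) =>
    turns
      .filter((turn) => turn.role === role && turn.text.trim() !== "")
      .map((turn) => ({ sourceEventIds: [turn.eventId], text: turn.text }));
}

const TRANSFORM_CATALOG: Record<string, CatalogEntry> = {
  user_message: { label: "User chat", build: perMessage("user") },
  assistant_message: { label: "Agent chat", build: perMessage("assistant") },
  full_conversation: {
    label: "Full trace",
    // One row per conversation; turns with no retained text are skipped.
    build: (turns) => {
      const kept = turns.filter((turn) => turn.text.trim() !== "");
      if (kept.length === 0) {
        return [];
      }
      return [
        {
          sourceEventIds: kept.map((turn) => turn.eventId),
          text: kept.map((turn) => `${turn.role}: ${turn.text}`).join("\n"),
        },
      ];
    },
  },
};
```

Keeping the builders in one table means a new transform is a new entry rather than another branch in several places.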
Transform Definition Shape
Represent transforms as data, not scattered switch statements. Proposed TypeScript type:
Custom Signals
Custom Reflexes must include transform metadata. When creating or editing a custom Reflex, ask which input it should read:
- User chat
- Agent chat
- Full trace
Store this in reflex_chats or a related metadata table:
If reflex_chats is too broad, add a landing-owned table:
Batch Row IDs
Batch request IDs should encode enough information to route results back to ClickHouse rows. Suggested format:
ClickHouse Storage
Create a new ClickHouse table for Reflex outputs. Do not reuse morph.signals. That table is for explicit feedback and edit signals. Reflex outputs are derived semantic labels.
Proposed table:
Notes:
- no raw input text
- raw text stays in ai_spans
- classes_json keeps full score details without over-normalizing v1
- top_label, top_score, and selected_labels support fast dashboard filters
- include source_start_time so results can be aligned to trace rows
Landing Postgres State
Use Postgres for run lifecycle and UI state. Proposed tables:
Rationale:
- Postgres tracks the app workflow.
- ClickHouse stores queryable per-trace results next to spans.
- The Reflex service already owns async batch internal state, but landing needs user-facing run state and mapping.
Server Flow
1. Build Scope
From the traces page filters:
Use the same resolveScope() and traceScopeWhere() patterns as src/app/dashboard/traces/actions.ts.
2. Load Candidate Trace Data
Query ai_spans, grouped into conversations and turns using the existing logic from getConversationDetail.
For large historical scans, do not load unlimited rows into memory.
Approach:
- cap first version at 10,000 transformed inputs per batch
- show “large scan will be split into N batches” later
- build rows server-side
- keep raw text only long enough to upload to Reflex batch API
3. Build Transform Plan
Group selected signals by transform. Example:
4. Create Pass Record
Insert a row into reflex_trace_passes.
Insert reflex_trace_pass_items for each outgoing batch row.
5. Upload Batch
Call:
Set Idempotency-Key:
Store the returned batch_id.
6. Poll
Poll batch status from a route/server action:
Show progress from completed_inputs and failed_inputs.
7. Fetch Results
When complete, call:
Join results to reflex_trace_pass_items by request_id.
Explode:
8. Write Results To ClickHouse
Insert into morph.trace_reflex_results.
Each prediction becomes one row.
Pseudo:
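A hedged sketch of the explode step follows. The batch result format is not shown in this plan, so `BatchResultRow`, `ModelPrediction`, `PassItem`, and the `status` and `error` output fields are hypothetical; only `event_id`, `top_label`, `top_score`, and `classes_json` come from the storage notes. It also applies the rules from the Edge Cases section: pending rows are skipped, a failed row yields one error row per intended model, and a per-model error inside a completed row yields an error row for that model only.

```typescript
// Hypothetical shapes; the real batch result format may differ.
interface ModelPrediction {
  model: string;
  classes?: { label: string; score: number }[];
  error?: string;
}

interface BatchResultRow {
  requestId: string;
  status: "pending" | "completed" | "failed";
  predictions?: ModelPrediction[];
  error?: string;
}

interface PassItem {
  requestId: string;
  eventId: string;
  models: string[]; // models this row was intended to run
}

interface ReflexResultRow {
  event_id: string;
  model: string;
  status: "ok" | "error";
  top_label: string | null;
  top_score: number | null;
  classes_json: string;
  error: string | null;
}

function errorRow(item: PassItem, model: string, message: string): ReflexResultRow {
  return {
    event_id: item.eventId,
    model,
    status: "error",
    top_label: null,
    top_score: null,
    classes_json: "[]",
    error: message,
  };
}

// Turns batch results into one output row per (input, model) prediction.
function explodeResults(results: BatchResultRow[], items: Map<string, PassItem>): ReflexResultRow[] {
  const rows: ReflexResultRow[] = [];
  for (const result of results) {
    const item = items.get(result.requestId);
    // Unknown or still-pending rows are never written.
    if (item === undefined || result.status === "pending") continue;
    if (result.status === "failed") {
      for (const model of item.models) {
        rows.push(errorRow(item, model, result.error ?? "row failed"));
      }
      continue;
    }
    for (const prediction of result.predictions ?? []) {
      const classes = prediction.classes ?? [];
      if (prediction.error !== undefined || classes.length === 0) {
        rows.push(errorRow(item, prediction.model, prediction.error ?? "no classes returned"));
        continue;
      }
      const top = classes.reduce((best, next) => (next.score > best.score ? next : best));
      rows.push({
        event_id: item.eventId,
        model: prediction.model,
        status: "ok",
        top_label: top.label,
        top_score: top.score,
        classes_json: JSON.stringify(classes),
        error: null,
      });
    }
  }
  return rows;
}
```

Threshold handling for selected_labels is left out here because it depends on per-signal threshold overrides.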
Dashboard Query Changes
Conversation List
Current list query aggregates from ai_spans.
Add optional aggregation from trace_reflex_results:
Conversation Detail
For a convo detail page, query Reflex results:
Match results to turns by:
- event_id
- span_id when present
- source_start_time as fallback
Render:
- inline chips on turn cards
- a right rail with fired results
- a summary band near the page title
UI States
Empty
Estimating
Ready To Run
Running
Show request_counts from batch status.
Complete
Partial Failure
Cost Model
Docs say async batch is priced per event, where one classification is one event. If a batch row has multiple models, cost should count predictions, not rows:
Permissions
Use getDataContext() everywhere.
Rules:
- org context sees org traces and org custom Reflexes
- personal context sees personal traces and personal custom Reflexes
- org admins can run passes
- members may view results if they can view traces
- member ability to run passes should be a product decision
Recommended for v1:
- admins can run Reflex passes
- members can view existing results
Use requireOrgAdmin() for the initial run action if cost is meaningful.
Navigation
Do not create a separate top-level “Reflex Trace Dashboard” yet. Use:
- /dashboard/traces for trace list and pass action
- /dashboard/traces/[convoId] for result inspection
- /dashboard/reflex for model creation and custom signal configuration
Landing Page Updates
Home
Add Reflexes as the fourth specialized model in SpecializedModelsPanel.
Suggested copy:
Product Reflex Page
Make the product flow explicit:
- trace list with “Run selected”
- selectable trace rows
- rerun confirmation for conversations with existing Reflex labels
- transform-aware signal picker
- labeled conversations
Implementation Phases
Phase 0.5: Live Preview Transport
Before the async batch queue is ready, the dashboard can run a capped live preview through:
- fetch trace details by conversation id
- build User chat, Agent chat, or Full trace inputs
- call the public Reflex API with the signed-in user’s Morph API key
- return the same pass/result shape the UI will later receive from batch results
Limits:
- cap conversations
- cap transformed inputs per transform
- cap total predictions
- do not persist raw trace text
Phase 0: Product Data Definitions
Add local definitions:
- transform catalog
- default Reflex to transform mapping
- helper to group signals by transform
- helper to estimate inputs and predictions
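The two helpers in this list could look roughly like the sketch below. It assumes each selected signal already carries a resolved transform id and that per-transform input counts are known from the candidate query; `SelectedSignal`, `groupSignalsByTransform`, and `estimatePass` are hypothetical names. The estimate counts predictions as inputs times models, following the cost model in this plan.

```typescript
interface SelectedSignal {
  modelAlias: string;
  transform: string;
}

// Signals that share a transform share the same input text, so they can
// be sent as multiple models on the same batch row.
function groupSignalsByTransform(signals: SelectedSignal[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const signal of signals) {
    const models = groups.get(signal.transform) ?? [];
    if (models.indexOf(signal.modelAlias) === -1) {
      models.push(signal.modelAlias);
    }
    groups.set(signal.transform, models);
  }
  return groups;
}

interface PassEstimate {
  rows: number; // batch rows to upload
  predictions: number; // billable events: one per (row, model)
}

function estimatePass(groups: Map<string, string[]>, inputCounts: Record<string, number>): PassEstimate {
  let rows = 0;
  let predictions = 0;
  groups.forEach((models, transform) => {
    const inputs = inputCounts[transform] ?? 0;
    rows += inputs;
    predictions += inputs * models.length;
  });
  return { rows, predictions };
}
```

For example, two signals on 100 user messages plus one signal on 40 agent messages gives 140 rows and 240 predictions.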
Phase 1: ClickHouse Table
Add ClickHouse schema file:
- morph.trace_reflex_results
- comments documenting that raw text is not stored
- account scope columns
- account scope columns
Phase 2: Postgres Run Tables
Add Drizzle migration:
Use the src/lib/db/schema location used by this repo.
Tables:
- reflex_trace_passes
- reflex_trace_pass_items
Phase 3: Trace Pass Server Actions
Add actions:
- estimateReflexTracePass(scope, selectedSignals)
- createReflexTracePass(scope, selectedSignals)
- getReflexTracePass(passId)
- syncReflexTracePassResults(passId)
Phase 4: Run Panel UI
Add components:
Style:
- no rounded cards
- bordered panels
- dense controls
- muted explanatory copy
Phase 5: Result Chips In Trace List
Extend TraceConversation type with:
Phase 6: Result Rail In Detail View
Extend ConversationDetail with Reflex results.
Render:
- summary near title
- inline turn chips
- right-side result rail on desktop
- collapsible section on mobile
Phase 7: Custom Signal Transform Config
Add UI to custom Reflex detail page:
- max tokens
- include tool context for Full trace
Default to user_message until configured.
Phase 8: Monitor Mode
After historical batch pass works, add:
Edge Cases
Zero Data Retention
If trace content was not retained:
- transforms may produce no inputs
- show “content not retained” in estimate
- skip rows with empty text
Very Large Conversations
Rolling windows may exceed model context. Mitigations:
- cap by max tokens
- truncate oldest content first
- keep the target turn
- show “truncated” metadata in transform mapping if needed
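The mitigations above can be sketched as a single windowing function. This is an illustration under stated assumptions: `estimateTokens` is a rough stand-in (about four characters per token) rather than a real tokenizer, and `WindowTurn` and `truncateWindow` are hypothetical names.

```typescript
interface WindowTurn {
  role: string;
  text: string;
}

interface TruncatedWindow {
  turns: WindowTurn[];
  truncated: boolean;
}

// Rough stand-in for a tokenizer (assumption: ~4 characters per token).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// `turns` is ordered oldest first; the last element is the target turn.
// The target is always kept, and older turns are dropped first.
function truncateWindow(turns: WindowTurn[], maxTokens: number): TruncatedWindow {
  const target = turns[turns.length - 1];
  if (target === undefined) {
    return { turns: [], truncated: false };
  }
  const kept: WindowTurn[] = [target];
  let used = estimateTokens(target.text);
  // Walk from newest to oldest so the most recent context survives.
  const earlier = turns.slice(0, -1).reverse();
  for (const turn of earlier) {
    const cost = estimateTokens(turn.text);
    if (used + cost > maxTokens) break;
    kept.unshift(turn);
    used += cost;
  }
  return { turns: kept, truncated: kept.length < turns.length };
}
```

The `truncated` flag is what would feed the “truncated” metadata mentioned in the last bullet. Note that the target turn is kept even when it alone exceeds the cap, so an oversized target still needs a separate text-length check before upload.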
Tool Output Size
Tool results can be huge. For tool transforms:
- include tool name
- include args
- include status
- include error message
- truncate result body by default
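As an illustration of the list above, a tool call can be rendered into transform text with the result body capped. The `ToolCall` shape, the line format, and the 500-character default are assumptions, not values taken from this plan.

```typescript
// Hypothetical tool call shape for illustration.
interface ToolCall {
  name: string;
  args: unknown;
  status: "ok" | "error";
  errorMessage?: string;
  result?: string;
}

const DEFAULT_RESULT_CHARS = 500; // assumed default cap

// Renders name, args, status, and error in full; only the result body is cut.
function formatToolCall(call: ToolCall, maxResultChars: number = DEFAULT_RESULT_CHARS): string {
  const lines: string[] = [
    `tool: ${call.name}`,
    `args: ${JSON.stringify(call.args)}`,
    `status: ${call.status}`,
  ];
  if (call.errorMessage !== undefined) {
    lines.push(`error: ${call.errorMessage}`);
  }
  if (call.result !== undefined) {
    const overflow = call.result.length - maxResultChars;
    const body =
      overflow > 0
        ? `${call.result.slice(0, maxResultChars)}... [truncated ${overflow} chars]`
        : call.result;
    lines.push(`result: ${body}`);
  }
  return lines.join("\n");
}
```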
Multiple Runs
A conversation may have results from multiple passes. Default display:
- show latest completed pass results
Allow filtering by:
- all passes
- specific pass
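Picking the default pass for display could be as simple as the sketch below. `PassSummary` and its fields are hypothetical; the real pass record lives in the landing Postgres tables and may use different names.

```typescript
// Hypothetical summary of a pass as seen by the detail page.
interface PassSummary {
  passId: string;
  status: "running" | "completed" | "failed";
  completedAt?: string; // ISO 8601 timestamp, set when the pass completes
}

// Returns the most recently completed pass, or undefined if none completed.
function latestCompletedPass(passes: PassSummary[]): PassSummary | undefined {
  let latest: PassSummary | undefined;
  let latestTime = -Infinity;
  for (const pass of passes) {
    if (pass.status !== "completed" || pass.completedAt === undefined) continue;
    const time = Date.parse(pass.completedAt);
    if (!Number.isNaN(time) && time > latestTime) {
      latest = pass;
      latestTime = time;
    }
  }
  return latest;
}
```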
Re-running Same Pass
Use idempotency key per pass creation, not per arbitrary scope. A deliberate rerun should create a new pass.
Partial Batch Results
Results endpoint can return pending rows. Do not write pending rows to ClickHouse. Only write:
- completed prediction rows
- failed rows with errors
Failed Rows With Multiple Models
If an entire row fails, write one failed result per intended model so UI can show model-specific failure counts.
Per-Model Error Inside Completed Row
A completed row can contain prediction entries where one model has an error. Handle:
Open Product Questions
- Can non-admin org members run paid batch passes, or only view results?
- Should default signals be enabled by default in the panel?
- Is user-frustration live now, or still coming soon in the directory?
- Should clean labels be stored and queryable, or only selected labels plus errors?
- Should the first version support multi-batch scans above 10,000 transformed inputs?
- Should transform config be exposed during custom Reflex training, or only after the model is ready?
- Should historical pass results be included in billing usage details separately from normal API usage?
Recommended V1
Build the smallest coherent version:
- Add ClickHouse trace_reflex_results.
- Add Postgres pass and pass item tables.
- Add default transform catalog in code.
- Add selectable rows and /dashboard/traces “Run selected” panel.
- Support current filters and single conversation scope.
- Support default Reflexes and ready custom Reflexes.
- For custom Reflexes, default to user_message with visible transform override in the run panel.
- Upload one async batch.
- Poll progress.
- Write completed results to ClickHouse.
- Show result chips in trace list and detail page.
Recommended V2
Add:
- saved signal presets
- transform config on custom Reflex detail pages
- pass history page
- multi-batch scans over 10,000 inputs
- retry failed inputs
- export labels
- monitor mode for future traces
- Slack/webhook alerts on selected labels
UX Copy Bank
Primary CTA:
Design Notes
Keep the Reflex trace UI:
- square
- bordered
- dense
- low motion
- no chat metaphor
- no big decorative hero inside dashboard
- one primary action per surface