Reflex Continual Retrain Test Plan
Automated Checks
Run from landing:
- schema tests reject raw SQL, tenant scope fields, unknown fields, bad date ranges, and bad caps;
- row-ref tests reject tamper and cross-scope replay;
- static check confirms scoped query paths and ref rehydration;
- build succeeds.
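As a rough illustration of what the schema tests exercise, the sketch below validates a hypothetical search request. The field names (source, from, to, limit), the banned key lists, and the cap of 500 are assumptions made for this example; the project's real schema is not shown in this plan.

```typescript
// Hypothetical request fields and limits; none of these names come from the plan.
const ALLOWED_KEYS = new Set(["source", "from", "to", "limit"]);
const RAW_SQL_KEYS = new Set(["sql", "rawSql"]);
const SCOPE_KEYS = new Set(["orgId", "userId", "tenantId"]);
const SOURCES = new Set(["reflex_predictions", "request_logs", "traces"]);
const MAX_LIMIT = 500; // assumed cap

// Returns a list of validation errors; an empty list means the request is accepted.
function validateSearchRequest(input: Record<string, unknown>): string[] {
  const errors: string[] = [];

  // Reject raw SQL, tenant scope fields, and anything not explicitly allowed.
  for (const key of Object.keys(input)) {
    if (RAW_SQL_KEYS.has(key)) errors.push(`raw SQL field not allowed: ${key}`);
    else if (SCOPE_KEYS.has(key)) errors.push(`tenant scope field not allowed: ${key}`);
    else if (!ALLOWED_KEYS.has(key)) errors.push(`unknown field: ${key}`);
  }

  const source = input["source"];
  if (typeof source !== "string" || !SOURCES.has(source)) errors.push("bad source");

  // Both dates must parse, and the range must not be inverted.
  const from = Date.parse(String(input["from"]));
  const to = Date.parse(String(input["to"]));
  if (Number.isNaN(from) || Number.isNaN(to) || from > to) errors.push("bad date range");

  // The cap must be a positive integer no larger than MAX_LIMIT.
  const limit = input["limit"];
  if (typeof limit !== "number" || !Number.isInteger(limit) || limit < 1 || limit > MAX_LIMIT) {
    errors.push("bad cap");
  }
  return errors;
}
```

An allow-list of keys is used here so that any field added later is rejected by default rather than silently passed through.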
Local Browser Smoke
Start local dev, then confirm as a signed-out user:
- Clerk sign-in appears.
- /api/reflex/chats returns 401.
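The 401 check can be scripted without a browser. The sketch below is one possible shape, assuming the API is served from the same local origin as the dashboard (http://localhost:3002); the helper name and the FetchLike type are hypothetical.

```typescript
// Minimal fetch-like signature so the check can run against a real server or a stub.
type FetchLike = (url: string) => Promise<{ status: number }>;

// Resolves to true when the chats endpoint rejects a request that carries no session.
async function expectUnauthorized(baseUrl: string, fetchImpl: FetchLike): Promise<boolean> {
  const res = await fetchImpl(`${baseUrl}/api/reflex/chats`);
  return res.status === 401;
}
```

Against a running dev server this could be called as `expectUnauthorized("http://localhost:3002", fetch)`, since the request sends no cookies or auth headers.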
Logged-In Product Flow
Use Chrome with an account that already has at least one ready Reflex model.
- Open http://localhost:3002/dashboard/reflex.
- Open a ready Reflex model.
- Open the model chat.
- Click Improve on data or type:
- Confirm the agent asks clarifying questions before searching:
  - date range;
  - source: Reflex predictions / request logs / traces;
  - current model;
  - useful examples: low confidence / flagged wrong / errors;
  - count and balance;
  - label set.
- After answering, confirm the agent calls grep_org_data.
- Confirm the result card shows:
  - production data search;
  - source;
  - capped returned count;
  - preview snippets;
  - no raw SQL or tenant ids.
- Ask the agent to create the retrain only after approving the final slice.
- Confirm the agent calls create_retrain_from_selection.
- Confirm a new Reflex model/job link is returned.
- Open the new model link.
- Confirm the dataset review flow appears before training.
Expected:
- the original model remains unchanged;
- the new model is warm-started from the original through the existing baseModel;
- no DB migration is required;
- no arbitrary SQL or org/user scope input is available to the model.
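The ordering constraints in this flow (clarifying questions before the search, approval before the retrain) could also be checked against a recorded chat transcript. The sketch below assumes a hypothetical event log; only the tool names grep_org_data and create_retrain_from_selection come from this plan, and the event shape is invented for illustration.

```typescript
// Hypothetical transcript event; the shape is an assumption for illustration.
type ChatEvent =
  | { kind: "agent_question" }
  | { kind: "user_approval" }
  | { kind: "tool_call"; tool: string };

// Returns a list of ordering violations; an empty list means the flow is in order.
function checkRetrainFlow(events: ChatEvent[]): string[] {
  const problems: string[] = [];
  let askedQuestion = false;
  let searched = false;
  let approved = false;

  for (const ev of events) {
    if (ev.kind === "agent_question") {
      askedQuestion = true;
    } else if (ev.kind === "user_approval") {
      // Approval only counts once there is a search result to approve.
      if (searched) approved = true;
    } else if (ev.tool === "grep_org_data") {
      if (!askedQuestion) problems.push("searched before asking clarifying questions");
      searched = true;
      approved = false; // assumption: a new search produces a slice that needs fresh approval
    } else if (ev.tool === "create_retrain_from_selection") {
      if (!searched) problems.push("retrain created without a search");
      if (!approved) problems.push("retrain created before the slice was approved");
    }
  }
  return problems;
}
```

Resetting approval on every new search is a design choice of this sketch, not something the plan specifies.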
Security Checks
Try these user prompts in the model chat:
Notes
- CLICKHOUSE_URL / CLICKHOUSE_PASSWORD are required for trace and Reflex-prediction search.
- REFLEX_ROW_REF_SECRET is recommended for dedicated row-ref signing.
- Plain Playwright cannot validate the logged-in flow unless it has a logged-in Clerk session.
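To make the row-ref tamper and cross-scope replay checks concrete, here is a minimal sketch of scope-bound signing with HMAC-SHA256 from Node's crypto module. The ref format (rowId.signature), the Scope type, and both function names are assumptions; the plan does not describe the real row-ref layout.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical scope; the real scope fields are not described in this plan.
type Scope = { orgId: string; userId: string };

// Signs a row id together with its scope, so the ref is only valid inside that scope.
function signRowRef(rowId: string, scope: Scope, secret: string): string {
  const mac = createHmac("sha256", secret)
    .update(JSON.stringify([scope.orgId, scope.userId, rowId]))
    .digest("hex");
  return `${rowId}.${mac}`;
}

// Returns the row id when the ref verifies for this scope, otherwise null.
function verifyRowRef(ref: string, scope: Scope, secret: string): string | null {
  const dot = ref.lastIndexOf(".");
  if (dot < 0) return null;
  const rowId = ref.slice(0, dot);
  // Recompute the full ref and compare in constant time.
  const expected = Buffer.from(signRowRef(rowId, scope, secret));
  const actual = Buffer.from(ref);
  if (expected.length !== actual.length) return null;
  return timingSafeEqual(expected, actual) ? rowId : null;
}
```

In this sketch the secret would be supplied by the caller, for example from REFLEX_ROW_REF_SECRET. Because the scope is part of the signed payload, a ref issued for one org or user fails verification when replayed under another.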