dr-sandbox

Author	SHA1	Message	Date
Michael Pilosov	e94d28b8fc	filenames + run names: J in sci notation (5E-3 not 0.005). Periods in filenames are avoidable, and the Prefect UI dislikes them in run names. Uses a shared sci_notation helper in main.py, mirrored in the flow. The stem regex (main + parser) now matches J<digits.Ee+-> to accept both old decimal-J and new sci-J filenames, so the two transition together. The J tag in the Prefect tag list also uses the sci form, so chip filters stay consistent. The backfill script is extended to find pre-transition (decimal-J) files on disk via a second base-stem variant, then rename them to the sci form. backfill_tags re-patches existing runs so their J tag matches the new canonical form. All 13 existing figs + runs renamed / retagged in place.	2026-04-22 17:54:46 -06:00
Michael Pilosov	c12d2cda6c	flow: hash user-supplied generator_kwargs, not the merged dict. The flow previously merged _DEFAULT_GENERATOR_KWARGS={random_state:0} and n_samples=num_points into generator_kwargs BEFORE hashing. Prefect only records the user-supplied form, so the web app's synth_output_paths disagreed with the flow's output name: a plain swiss_roll run showed 'embedding: n/a' in the runs list despite completing, because the web app looked for the hash that excluded those defaults. Now we keep the user-supplied generator_kwargs around for hashing + metadata, and use the merged dict only for the actual generator call. n_samples is already captured in the stem as 'N<n>', and random_state=0 is a flow constant; neither belongs in the semantic identity.	2026-04-22 17:04:50 -06:00
Michael Pilosov	b744c48348	stems: fold generator_kwargs into the hash; fix swiss_roll vs hole ambiguity. - run_args_hash now covers (embed_args, generator_kwargs). When gen_kwargs is empty we still hash embed_args alone, so plain generators (s_curve, plain swiss_roll) keep their stems and no existing plain-gen figs need renaming. Kwargs-bearing variants (swiss_roll_hole, blobs, gaussian_quantiles, classification) now disambiguate properly. - Flow persists generator_kwargs into metrics.json meta AND into the frames.json sidecar meta, so the label-enrichment path can find it without another lookup. - _enrich_with_labels discovers gen_kwargs in priority order: payload meta --> sibling metrics.json --> DATASET_META first-match. It matches the DATASET_META entry by (path, kwargs), so swiss_roll_hole is no longer confused with plain swiss_roll. - _cached_frames overrides meta.stem with the URL-requested stem before enrichment. After a backfill rename the sidecar's baked-in stem is stale, and we were then failing to find the sibling metrics.json. - Submit duplicate-check uses the new hash and keeps the hashless-legacy check as a safety net. - backfill_hashes.py rewritten: queries Prefect for each recent run's full params, finds the matching fig under any of the (current, legacy, hashless) names, renames it to the current scheme, and patches generator_kwargs into metrics.json.	2026-04-22 16:30:42 -06:00
Michael Pilosov	47f56b57c8	flow: name each Prefect run after its output stem (gen_emb_N_T_J_s_hash). Replaces Prefect's auto adjective-animal names with the same stem that addresses the run's figs on disk, so runs are hoverable/searchable in the Prefect UI by their identifying params. flow_run_name is a callable that reads runtime.flow_run.parameters at scheduling time.	2026-04-22 16:01:49 -06:00
Michael Pilosov	fe49565651	stems: include embed_args hash in output filename + emit frames.json sidecar. The stem grows an 8-hex sha1 digest of the (keys-sorted) embed_args dict, so runs differing only in embed_args (e.g. UMAP n_neighbors=5 vs 15) now produce distinct figs. The stem regex and parser both accept an optional _<hash> tail, so pre-hash figs still render in the runs list and compare page; the legacy filename is resolved via an on-disk fallback. The duplicate-submission check now rejects against BOTH the hashed and the legacy hashless variant, so users can't accidentally duplicate an old run either. The flow additionally writes a <stem>.frames.json sidecar next to the plotly HTML (same shape as app/web/plotly_parse returns). The server prefers the sidecar when present and falls back to parsing HTML for older runs. Sidecar emission is non-critical: any failure just logs and keeps going.	2026-04-22 15:52:39 -06:00
Michael Pilosov	a33f8f07cb	cap embedding-flow runner at 1 concurrent run	2026-04-21 22:19:55 -06:00
Michael Pilosov	3280410405	metrics stored (2x)	2026-04-21 20:41:17 -06:00
Michael Pilosov	230c3032e5	parallelism limits	2026-04-21 20:16:33 -06:00
Michael Pilosov	92069a3c91	rename snapshots -> timesteps	2026-04-21 19:55:01 -06:00
Michael Pilosov	7a6e92b31c	normalization for relative jitter	2026-04-21 18:22:51 -06:00
Michael Pilosov	708157c1ef	some minor upgrades to prefect syntax	2026-04-21 18:02:39 -06:00
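
The hashing scheme described in fe49565651, b744c48348, and c12d2cda6c can be sketched roughly as follows. This is only an illustration, not the repository's code: the function name run_args_hash comes from the commit message, but its signature, the JSON serialization, and the way the two dicts are combined are assumptions. The sketch shows an 8-hex sha1 digest over keys-sorted arguments, with embed_args hashed alone when the user supplied no generator_kwargs, so that plain-generator stems would stay unchanged.

```python
import hashlib
import json


def run_args_hash(embed_args, generator_kwargs=None):
    """Return an 8-hex sha1 digest of the run arguments (illustrative sketch)."""
    # Assumption: keys-sorted JSON is the canonical serialization.
    if generator_kwargs:
        # Hash only the user-supplied kwargs, never flow defaults merged in later.
        payload = [embed_args, generator_kwargs]
    else:
        # No generator kwargs: hash embed_args alone so existing stems keep matching.
        payload = embed_args
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]
```

Under these assumptions, key order does not affect the digest, while changing a value such as n_neighbors (5 vs 15) or adding a generator kwarg yields a different stem suffix.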

11 Commits
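
For the J naming change in e94d28b8fc, a period-free formatter and a token parser that accepts both forms might look like the sketch below. The name sci_notation appears in the commit message; its body here, the parse_j helper, and the example stem layout are hypothetical and chosen only to illustrate the idea. The character class mirrors the J<digits.Ee+-> pattern quoted in the commit.

```python
import re
from decimal import Decimal

# Assumption: the J token is delimited by underscores inside the stem.
_J_TOKEN = re.compile(r"_J([0-9.Ee+-]+)(?:_|$)")


def sci_notation(value):
    """Format a finite number without a period, e.g. 0.005 -> '5E-3' (sketch)."""
    sign, digits, exponent = Decimal(str(value)).normalize().as_tuple()
    mantissa = "".join(str(d) for d in digits)
    return f"{'-' if sign else ''}{mantissa}E{exponent}"


def parse_j(stem):
    """Read J from a stem, accepting decimal-J and sci-J forms (hypothetical helper)."""
    match = _J_TOKEN.search(stem)
    if match is None:
        raise ValueError(f"no J token in stem: {stem!r}")
    return float(match.group(1))
```

With this sketch, sci_notation(0.005) gives "5E-3", and a made-up old-style stem such as swiss_roll_umap_N1000_T10_J0.005_s0 parses to the same J value as its sci-form counterpart, which is the property that lets both filename styles coexist during a rename.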