Periods in filenames are avoidable, and the Prefect UI dislikes them in
run names, so J is now written in scientific notation. This uses a
shared sci_notation helper in main.py, mirrored in the flow. The stem
regex (main + parser) now matches J<digits.Ee+-> to accept
both old decimal-J and new sci-J filenames so the two transition
together. J tag in Prefect tag list also uses the sci form, so chip
filters stay consistent.
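
For illustration, a minimal sketch of what a period-free J formatter and a
tolerant J matcher could look like. The exact output format of the real
sci_notation helper is not stated here, so the integer-mantissa form
("1e-3"), the J_TAG pattern and the parse_j helper are all assumptions.

```python
import re
from decimal import Decimal
from typing import Optional


def sci_notation(value: float) -> str:
    """Format value as <integer mantissa>e<exponent>, so no period appears.

    Assumed format only; the real helper may differ.
    """
    d = Decimal(str(value)).normalize()
    sign, digits, exponent = d.as_tuple()
    mantissa = "".join(str(digit) for digit in digits)
    return f"{'-' if sign else ''}{mantissa}e{exponent}"


# Accepts both decimal-J (J0.001) and sci-J (J1e-3) between underscores.
# The character class follows the J<digits.Ee+-> description above.
J_TAG = re.compile(r"(?:^|_)J(?P<j>[+-]?[0-9][0-9.Ee+-]*)(?=_|$)")


def parse_j(stem: str) -> Optional[float]:
    """Return the J value embedded in a stem, or None if there is none."""
    match = J_TAG.search(stem)
    return float(match.group("j")) if match else None
```

Both spellings parse to the same float, which is what lets old and new
filenames coexist during the transition.
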
Backfill script extended to find pre-transition (decimal-J) files on
disk via a second base-stem variant, then rename them to the sci form.
backfill_tags re-patches existing runs so their J tag matches the new
canonical form.
All 13 existing figs + runs renamed / retagged in-place.
The flow previously merged _DEFAULT_GENERATOR_KWARGS={random_state:0} and
n_samples=num_points into generator_kwargs BEFORE hashing. Prefect only
records the user-supplied form, so the web app's synth_output_paths
disagreed with the flow's output name: a plain swiss_roll run showed
'embedding: n/a' in the runs list despite completing, because the web
app looked for the hash that excluded those defaults.
Now we keep the user-supplied generator_kwargs around for hashing +
metadata, and use the merged dict only for the actual generator call.
n_samples is already captured in the stem as 'N<n>', and random_state=0
is a flow constant — neither belongs in the semantic identity.
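
A rough sketch of the split described above: one dict for identity
(hashing + metadata), one merged dict for the generator call. The function
name split_generator_kwargs and the merge precedence (user values override
defaults) are assumptions, not the flow's actual code.

```python
from typing import Any, Dict, Optional, Tuple

_DEFAULT_GENERATOR_KWARGS: Dict[str, Any] = {"random_state": 0}


def split_generator_kwargs(
    user_kwargs: Optional[Dict[str, Any]], num_points: int
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (identity_kwargs, call_kwargs).

    identity_kwargs is exactly what the user supplied and is the only form
    that should feed the hash and the persisted metadata. call_kwargs adds
    the flow defaults and n_samples and is used only to call the generator.
    """
    identity_kwargs = dict(user_kwargs or {})
    call_kwargs = {
        **_DEFAULT_GENERATOR_KWARGS,
        "n_samples": num_points,
        **identity_kwargs,  # assumed precedence: user values win
    }
    return identity_kwargs, call_kwargs
```
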
- run_args_hash now covers (embed_args, generator_kwargs). When gen_kwargs
is empty we still hash embed_args alone — so plain generators (s_curve,
plain swiss_roll) keep their stems and no existing plain-gen figs need
renaming. Kwargs-bearing variants (swiss_roll_hole, blobs,
gaussian_quantiles, classification) now disambiguate properly.
- Flow persists generator_kwargs into metrics.json meta AND into the
frames.json sidecar meta, so the label-enrichment path can find it
without another lookup.
- _enrich_with_labels discovers gen_kwargs in priority order: payload
meta --> sibling metrics.json --> DATASET_META first-match. It matches the
DATASET_META entry by (path, kwargs) so swiss_roll_hole is no longer
confused with plain swiss_roll.
- _cached_frames overrides meta.stem with the URL-requested stem before
enrichment — after a backfill rename the sidecar's baked-in stem is
stale, and we were then failing to find the sibling metrics.json.
- Submit duplicate-check uses the new hash and keeps the hashless-legacy
check as a safety net.
- backfill_hashes.py rewritten: queries Prefect for each recent run's
full params, finds the matching fig under any of (current, legacy,
hashless) names, renames to the current scheme and patches
generator_kwargs into metrics.json.
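
The hashing rule in the first bullet can be sketched roughly as follows.
The 8-hex sha1 over a keys-sorted dict comes from the stem description in
this log; the JSON encoding and the way the two dicts are combined are
assumptions, so real digests will likely differ from this sketch's.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def run_args_hash(
    embed_args: Optional[Dict[str, Any]],
    generator_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """8-hex sha1 over (embed_args, generator_kwargs).

    With empty generator_kwargs only embed_args is hashed, so stems of
    plain generators stay the same as before.
    """
    embed_args = embed_args or {}
    if generator_kwargs:
        payload: Any = [embed_args, generator_kwargs]
    else:
        payload = embed_args
    # sort_keys makes the digest independent of dict insertion order.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()[:8]
```
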
Replaces Prefect's auto adjective-animal names with the same stem that
addresses the run's figs on disk, so runs are hoverable/searchable in the
Prefect UI by their identifying params. flow_run_name is a callable that
reads runtime.flow_run.parameters at scheduling time.
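
A sketch of the naming callable, assuming a simplified stem layout. The
parameter names dataset and method and the helper stem_from_params are
hypothetical; only num_points and the use of runtime.flow_run.parameters
come from this log.

```python
from typing import Any, Mapping


def stem_from_params(params: Mapping[str, Any]) -> str:
    """Build a run stem from flow parameters (hypothetical, reduced layout)."""
    return f"{params['dataset']}_{params['method']}_N{params['num_points']}"


def flow_run_name() -> str:
    """Callable handed to Prefect; evaluated when the run is created."""
    # Imported lazily so stem_from_params stays usable without Prefect.
    from prefect import runtime

    return stem_from_params(runtime.flow_run.parameters)


# Wiring, shown as a comment because it needs Prefect installed:
#
#   from prefect import flow
#
#   @flow(flow_run_name=flow_run_name)
#   def embed_flow(dataset: str, method: str, num_points: int) -> None:
#       ...
```

Keeping the stem builder free of Prefect imports means the web app and the
flow can share it, which is what keeps run names and fig names in sync.
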
Stem grows an 8-hex sha1 digest of the (keys-sorted) embed_args dict, so
runs differing only in embed_args (e.g. UMAP n_neighbors=5 vs 15) now
produce distinct figs. The stem regex and parser both accept an optional
_<hash> tail so pre-hash figs still render in the runs list and compare
page; the legacy filename is resolved via an on-disk fallback.
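
One way the optional hash tail and the on-disk fallback might look. STEM_RE,
split_stem and resolve_fig are illustrative names, and the ".html" suffix
and directory layout are assumptions.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Base stem followed by an optional _<8 lowercase hex> tail.
STEM_RE = re.compile(r"^(?P<base>.+?)(?:_(?P<hash>[0-9a-f]{8}))?$")


def split_stem(stem: str) -> Tuple[str, Optional[str]]:
    """Split a stem into (base, hash); hash is None for pre-hash stems."""
    match = STEM_RE.match(stem)
    if match is None:
        raise ValueError(f"not a valid stem: {stem!r}")
    return match.group("base"), match.group("hash")


def resolve_fig(fig_dir: Path, stem: str, suffix: str = ".html") -> Optional[Path]:
    """Prefer the hashed filename, then fall back to the legacy hashless one."""
    base, _ = split_stem(stem)
    for candidate in (stem, base):
        path = fig_dir / f"{candidate}{suffix}"
        if path.exists():
            return path
    return None
```

Note that a legacy stem whose last underscore-separated field happens to be
eight hex characters would be misread as hashed; the sketch does not guard
against that.
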
Duplicate-submission check now rejects against BOTH the hashed and the
legacy hashless variant so users can't accidentally duplicate an old run
either.
Flow additionally writes a <stem>.frames.json sidecar next to the plotly
HTML (same shape as app/web/plotly_parse returns). Server prefers the
sidecar when present; falls back to parsing HTML for older runs. Sidecar
emission is non-critical — any failure just logs and keeps going.
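
A minimal sketch of the sidecar behaviour described above: best-effort write
on the flow side, sidecar-first read on the server side. write_sidecar,
load_frames and the parse_html callback are placeholder names; the real
parser lives in app/web/plotly_parse and its signature is not shown here.

```python
import json
import logging
from pathlib import Path
from typing import Any, Callable, Dict

log = logging.getLogger(__name__)


def sidecar_path(html_path: Path) -> Path:
    """<stem>.frames.json next to the HTML; safe for stems containing periods."""
    return html_path.with_name(html_path.stem + ".frames.json")


def write_sidecar(html_path: Path, frames: Dict[str, Any]) -> None:
    """Best-effort write: any failure is logged and swallowed."""
    target = sidecar_path(html_path)
    try:
        target.write_text(json.dumps(frames), encoding="utf-8")
    except Exception:
        log.exception("could not write sidecar %s; continuing", target)


def load_frames(
    html_path: Path, parse_html: Callable[[Path], Dict[str, Any]]
) -> Dict[str, Any]:
    """Prefer the sidecar; fall back to parsing the HTML for older runs."""
    target = sidecar_path(html_path)
    if target.exists():
        return json.loads(target.read_text(encoding="utf-8"))
    return parse_html(html_path)
```
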