Commit Graph

9 Commits

Author SHA1 Message Date
Michael Pilosov
b744c48348 stems: fold generator_kwargs into the hash; fix swiss_roll vs hole ambiguity
- run_args_hash now covers (embed_args, generator_kwargs). When gen_kwargs
  is empty we still hash embed_args alone — so plain generators (s_curve,
  plain swiss_roll) keep their stems and no existing plain-gen figs need
  renaming. Kwargs-bearing variants (swiss_roll_hole, blobs,
  gaussian_quantiles, classification) now disambiguate properly.
- Flow persists generator_kwargs into metrics.json meta AND into the
  frames.json sidecar meta, so the label-enrichment path can find it
  without another lookup.
- _enrich_with_labels discovers gen_kwargs in priority: payload meta -->
  sibling metrics.json --> DATASET_META first-match. It matches the
  DATASET_META entry by (path, kwargs) so swiss_roll_hole is no longer
  confused for plain swiss_roll.
- _cached_frames overrides meta.stem with the URL-requested stem before
  enrichment — after a backfill rename the sidecar's baked-in stem is
  stale, and we were then failing to find the sibling metrics.json.
- Submit duplicate-check uses the new hash and keeps the hashless-legacy
  check as a safety net.
- backfill_hashes.py rewritten: queries Prefect for each recent run's
  full params, finds the matching fig under any of (current, legacy,
  hashless) names, renames to the current scheme and patches
  generator_kwargs into metrics.json.
2026-04-22 16:30:42 -06:00
Michael Pilosov
47f56b57c8 flow: name each Prefect run after its output stem (gen_emb_N_T_J_s_hash)
Replaces Prefect's auto adjective-animal names with the same stem that
addresses the run's figs on disk, so runs are hoverable/searchable in the
Prefect UI by their identifying params. flow_run_name is a callable that
reads runtime.flow_run.parameters at scheduling time.
2026-04-22 16:01:49 -06:00
Michael Pilosov
fe49565651 stems: include embed_args hash in output filename + emit frames.json sidecar
Stem grows an 8-hex sha1 digest of the (keys-sorted) embed_args dict, so
runs differing only in embed_args (e.g. UMAP n_neighbors=5 vs 15) now
produce distinct figs. The stem regex and parser both accept an optional
_<hash> tail so pre-hash figs still render in the runs list and compare
page; legacy filename is resolved on disk fallback.

Duplicate-submission check now rejects against BOTH the hashed and the
legacy hashless variant so users can't accidentally duplicate an old run
either.

Flow additionally writes a <stem>.frames.json sidecar next to the plotly
HTML (same shape as app/web/plotly_parse returns). Server prefers the
sidecar when present; falls back to parsing HTML for older runs. Sidecar
emission is non-critical — any failure just logs and keeps going.
2026-04-22 15:52:39 -06:00
Michael Pilosov
a33f8f07cb cap embedding-flow runner at 1 concurrent run 2026-04-21 22:19:55 -06:00
Michael Pilosov
3280410405 metrics stored (2x) 2026-04-21 20:41:17 -06:00
Michael Pilosov
230c3032e5 parallelism limits 2026-04-21 20:16:33 -06:00
Michael Pilosov
92069a3c91 rename snapshots -> timesteps 2026-04-21 19:55:01 -06:00
Michael Pilosov
7a6e92b31c normalization for relative jitter 2026-04-21 18:22:51 -06:00
Michael Pilosov
708157c1ef some minor upgrades to prefect syntax 2026-04-21 18:02:39 -06:00