Dimension Reduction Lab
A Python project exploring various dimension reduction techniques using Prefect for workflow orchestration.
Overview
This project serves as an experimental sandbox for studying dimensionality reduction and embedding algorithms within a reproducible environment. The primary goal is to evaluate and compare different techniques (like UMAP, t-SNE, PaCMAP, and TriMap) while focusing on their stability characteristics, particularly in the context of changing or drifting data distributions. By leveraging Prefect's workflow management capabilities, we can systematically analyze how these algorithms perform across arbitrary datasets, track their behavior over time, and measure their sensitivity to various hyperparameters and data perturbations.
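As an illustration of the kind of comparison described above, neighborhood preservation can be measured with scikit-learn's trustworthiness score. This is only a sketch using PCA and t-SNE on a toy manifold, not code from this repo; UMAP, PaCMAP, and TriMap would plug into the same pattern:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Generate a toy manifold dataset
X, color = make_swiss_roll(n_samples=500, random_state=0)

# Embed with two different techniques
emb_pca = PCA(n_components=2).fit_transform(X)
emb_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Trustworthiness: how well local neighborhoods survive the embedding (1.0 = perfect)
t_pca = trustworthiness(X, emb_pca, n_neighbors=10)
t_tsne = trustworthiness(X, emb_tsne, n_neighbors=10)
print(f"PCA trustworthiness:   {t_pca:.3f}")
print(f"t-SNE trustworthiness: {t_tsne:.3f}")
```

Re-running this with perturbed or drifting inputs and tracking the score over time is one simple way to quantify the stability characteristics the project is interested in.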
Requirements
The project's key dependencies are pinned in requirements-frozen.txt.
Package Management
This project uses uv, a fast Python package installer and resolver written in Rust, as its package manager. The requirements-frozen.txt file was generated with uv to ensure reproducible dependencies.
To update dependencies:
uv pip compile pyproject.toml --all-extras -o requirements-frozen.txt
Omit --all-extras, or replace it with --extra <group>, to include only a specific optional dependency group. See the pyproject.toml file for the available groups.
This project uses Prefect for workflow orchestration because of its lightweight approach to running experiments from a UI and its compatibility with single-node deployments.