embedding notebook

§ 0 introduction · scope: impact of data drift on dimensionality reduction

What this is. Dimensionality reduction is a workhorse for both exploratory visualization and downstream prediction, yet the stability of its output under small perturbations of the input is rarely examined directly. This notebook takes a narrow, empirical approach: a three-dimensional point cloud (§ 1) is perturbed by a controlled amount at each of a short sequence of timesteps, the selected reducer (§ 2) is applied independently to every snapshot, and the resulting trajectory of two-dimensional embeddings is recorded.
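A minimal sketch of the loop described above, assuming NumPy, Gaussian noise that accumulates from one timestep to the next, and PCA as a stand-in reducer. The names `pca_2d`, `embed_trajectory`, `n_steps`, and `sigma` are hypothetical; the notebook's own generators and reducers are the ones selected in § 1 and § 2.

```python
import numpy as np


def pca_2d(points):
    """Project an (n, 3) cloud to 2-D with PCA (stand-in for the selected reducer)."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def embed_trajectory(cloud, n_steps=5, sigma=0.01, reducer=pca_2d, seed=0):
    """Perturb the cloud at each timestep and reduce every snapshot independently.

    Assumption: the perturbation is additive Gaussian noise with standard
    deviation `sigma`, applied on top of the previous snapshot.
    """
    rng = np.random.default_rng(seed)
    snapshots = [cloud]
    for _ in range(n_steps - 1):
        noise = rng.normal(scale=sigma, size=cloud.shape)
        snapshots.append(snapshots[-1] + noise)
    # No state is shared between frames: the reducer is refitted per snapshot.
    return np.stack([reducer(s) for s in snapshots])


# Hypothetical input: 200 points drawn from a 3-D standard normal.
cloud = np.random.default_rng(1).normal(size=(200, 3))
trajectory = embed_trajectory(cloud)  # shape (n_steps, n_points, 2)
```

Because each frame is fitted independently, nothing ties the axes of one embedding to the next. With PCA, for example, the sign of a principal direction can flip between snapshots, and that flip appears as motion in the 2-D layout even when the input barely changed.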

What it measures. Two stability views are logged alongside each run and plotted on the metrics page (§ 5). Travel captures how much the 2-D layout moves: frame to frame as ‖ y(t) − y(t−1) ‖, and relative to the first timestep as ‖ y(t) − y(0) ‖. kNN retention captures how much of the input-space neighborhood graph survives projection, reported as the fraction of each point's 10 nearest neighbors that are preserved in 2-D. Together they separate reducers that are globally stable but locally noisy from those with the opposite failure mode.
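The travel view can be sketched as follows, assuming NumPy and an embedding trajectory stored as an array of shape (timesteps, points, 2). Averaging the per-point displacement over all points is an assumption made for illustration, and the function names are hypothetical; the notebook may aggregate differently.

```python
import numpy as np


def frame_to_frame_travel(traj):
    """Mean per-point displacement ‖y(t) − y(t−1)‖ for t = 1 .. T−1."""
    step = np.linalg.norm(traj[1:] - traj[:-1], axis=-1)  # (T-1, n)
    return step.mean(axis=1)


def vs_initial_travel(traj):
    """Mean per-point displacement ‖y(t) − y(0)‖ for t = 0 .. T−1."""
    drift = np.linalg.norm(traj - traj[0], axis=-1)  # (T, n)
    return drift.mean(axis=1)


# Tiny hand-checkable trajectory: 3 timesteps, 2 points.
toy = np.array([
    [[0.0, 0.0], [1.0, 0.0]],
    [[3.0, 4.0], [1.0, 0.0]],  # point 0 moves by 5, point 1 stays put
    [[3.0, 4.0], [1.0, 2.0]],  # point 0 stays put, point 1 moves by 2
])
print(frame_to_frame_travel(toy))  # [2.5 1. ]
print(vs_initial_travel(toy))      # [0.  2.5 3.5]
```

The two curves answer different questions: the first flags jitter between consecutive frames, the second shows whether small steps accumulate into a layout that no longer resembles the starting one.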

Why this matters. A reducer that looks well-behaved on a single snapshot is not automatically the right tool for a streaming or longitudinal setting. When the reducer is the substrate for a visualization, frame-to-frame motion reads as change the user did not request; when it is a feature-extraction step inside a classification pipeline, drift between training and inference quietly erodes accuracy. The aim here is to build intuition for those regimes before committing the reducer to either role.

§ 1 input dataset generator —

Six candidate generators for the embedding pipeline. Drag to rotate, scroll to zoom, ← → or 1 … 6 to select.

§ 4 recent runs {{ runs|length }} / 10 · refresh 3s ●

pick 2–8 embeddings → side-by-side animation in a new tab

{% include "_runs.html" with context %}

§ 5 stability metrics view travel · drift · kNN@10

frame-to-frame travel ‖ y(t) − y(t−1) ‖ · output 2-D space

vs-initial travel ‖ y(t) − y(0) ‖ · drift from first timestep

kNN@10 retention fraction of each point's 10 nearest input-space neighbors preserved in 2-D (higher = more faithful)
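A minimal sketch of kNN@10 retention as defined above, assuming NumPy, Euclidean distance in both spaces, and a brute-force neighbor search that is adequate for small clouds. The function names are hypothetical and the notebook's own implementation may differ.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbors of every row (self excluded)."""
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)   # pairwise squared distances, (n, n)
    np.fill_diagonal(d2, np.inf)    # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_retention(high, low, k=10):
    """Mean fraction of each point's k input-space neighbors kept in the embedding."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    kept = [len(set(a) & set(b)) for a, b in zip(nn_high, nn_low)]
    return float(np.mean(kept)) / k


rng = np.random.default_rng(0)
flat = rng.normal(size=(50, 2))
# A cloud that already lies in a plane loses nothing when the flat axis is dropped.
planar_3d = np.column_stack([flat, np.zeros(50)])
lossless = knn_retention(planar_3d, flat)

# A full 3-D cloud projected by dropping z generally retains only part of its neighborhoods.
cloud_3d = rng.normal(size=(50, 3))
lossy = knn_retention(cloud_3d, cloud_3d[:, :2])
```

The score lies in [0, 1]. Here `lossless` is 1.0 because the input is planar, while `lossy` depends on how much neighborhood structure lived along the discarded axis.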

No metrics to show. Dispatch a run above — sidecar JSONs appear in figs/ after the flow completes.