embedding notebook — drift & projection

§ 0 introduction · scope: impact of data drift on dimension reduction

What this is. Dimensionality reduction is a workhorse for both exploratory visualization and downstream prediction, yet the stability of its output under small perturbations of the input is rarely examined directly. This notebook takes a narrow, empirical approach: a three-dimensional point cloud (§ 1) is perturbed by a controlled amount at each of a short sequence of timesteps, the selected reducer (§ 2) is applied independently to every snapshot, and the resulting trajectory of two-dimensional embeddings is recorded.
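Sketch. The loop described above could look roughly like the following. Everything here is an assumption made for illustration: the perturbation is modeled as cumulative isotropic Gaussian jitter of scale `step`, a NumPy-only PCA stands in for whichever reducer is selected, and the names `pca_2d` and `embed_trajectory` are hypothetical rather than taken from the notebook.

```python
import numpy as np


def pca_2d(X):
    """Project an (n, 3) cloud onto its top two principal axes (stand-in reducer)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def embed_trajectory(X0, T, step, reducer=pca_2d, seed=0):
    """Perturb X0 for T steps and embed every snapshot independently.

    Returns (snapshots, embeddings) with shapes (T+1, n, 3) and (T+1, n, 2).
    """
    rng = np.random.default_rng(seed)
    snapshots = [X0]
    for _ in range(T):
        # Assumed perturbation model: Gaussian jitter added to the previous snapshot.
        snapshots.append(snapshots[-1] + step * rng.standard_normal(X0.shape))
    # Each frame is reduced on its own; no state is shared across timesteps.
    embeddings = [reducer(X) for X in snapshots]
    return np.stack(snapshots), np.stack(embeddings)


# Toy anisotropic cloud standing in for a generated dataset.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((200, 3)) * np.array([3.0, 2.0, 0.5])
X, Y = embed_trajectory(X0, T=5, step=0.05)
print(X.shape, Y.shape)
```

Because t runs from 0 to T, a run with T perturbation steps yields T + 1 snapshots and the same number of embeddings.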

What it measures. Two stability views are logged alongside each run and plotted on the metrics page. Per-timestep travel, ‖ y(t) − y(t−1) ‖, measures how far each point of the 2-D layout moves between consecutive frames. kNN retention measures how much of the input-space neighborhood graph survives projection. Together they separate reducers that are globally stable but locally noisy from those with the opposite failure mode.
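Sketch. One plausible way to compute kNN retention is to compare, for every point, its k nearest neighbors in the input space with its k nearest neighbors in the embedding. The brute-force distance matrix, the default `k=10`, and the function names below are assumptions for illustration, not the notebook's implementation.

```python
import numpy as np


def knn_indices(P, k):
    """Indices of the k nearest neighbors of every row of P, excluding the row itself."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def knn_retention(X, Y, k=10):
    """Mean fraction of each point's k input-space neighbors kept among its k embedding neighbors."""
    nx, ny = knn_indices(X, k), knn_indices(Y, k)
    kept = [len(set(a) & set(b)) for a, b in zip(nx, ny)]
    return float(np.mean(kept)) / k


# A cloud that already lies in a plane loses nothing when the flat axis is dropped.
rng = np.random.default_rng(0)
flat = np.column_stack([rng.standard_normal((40, 2)), np.zeros(40)])
print(knn_retention(flat, flat[:, :2], k=10))
```

The pairwise distance matrix costs O(n²) memory, which is fine for small clouds; a tree-based neighbor search would be the usual substitute for larger n.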

Why this matters. A reducer that looks well-behaved on a single snapshot is not automatically the right tool for a streaming or longitudinal setting. When the reducer is the substrate for a visualization, frame-to-frame motion reads as change the user did not request; when it is a feature-extraction step inside a classification pipeline, drift between training and inference quietly erodes accuracy. The aim here is to build intuition for those regimes before committing the reducer to either role.

[Diagram: snapshots Xₜ ⊂ ℝ³ for t = 0, 1, …, T → φ applied per snapshot → embedded trajectory Yₜ ⊂ ℝ²]
§ 1 input dataset generator

Six candidate generators for the embedding pipeline. Drag to rotate, scroll to zoom,   or 1 … 6 to select.

n samples
noise σ
timesteps

Dimensionality reduction applied to each snapshot. Only reducers whose Python package is importable are shown.
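Sketch. Filtering the list down to reducers whose package is importable can be done without actually importing anything, using `importlib.util.find_spec`. The registry below is hypothetical: the display names and the package each one maps to are assumptions, not the notebook's actual list.

```python
from importlib.util import find_spec

# Hypothetical registry: display name -> top-level import name of its package.
CANDIDATES = {
    "PCA": "sklearn",
    "t-SNE": "sklearn",
    "UMAP": "umap",
}


def available_reducers(candidates):
    """Return the display names whose backing package can be located."""
    return [
        name
        for name, module in candidates.items()
        if find_spec(module) is not None  # None means the package was not found
    ]


print(available_reducers(CANDIDATES))
```

The printed list depends on what is installed in the current environment, so no fixed output is shown.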

Pick 2–8 embeddings to open a side-by-side animation in a new tab.
dataset · algorithm · N · T · J
§ 5 stability metrics · view: travel · drift · kNN@10
dataset
algorithm
travel stat
frame-to-frame travel: ‖ y(t) − y(t−1) ‖, measured in the output 2-D space
vs-initial travel: ‖ y(t) − y(0) ‖, drift from the first timestep
kNN@10 retention: fraction of each point's 10 nearest input-space neighbors preserved in 2-D (higher = more faithful)
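Sketch. The two travel quantities listed above reduce to per-point Euclidean norms over a stacked trajectory. The array layout `(T+1, n, 2)`, the function names, and the choice of the mean as the summary statistic are assumptions for illustration.

```python
import numpy as np


def frame_travel(Y):
    """Per-point ||y(t) - y(t-1)|| for t = 1..T; Y is (T+1, n, 2), result is (T, n)."""
    return np.linalg.norm(Y[1:] - Y[:-1], axis=-1)


def initial_travel(Y):
    """Per-point ||y(t) - y(0)|| for t = 0..T; result is (T+1, n)."""
    return np.linalg.norm(Y - Y[0], axis=-1)


# Toy trajectory: point 0 moves by (3, 4) each step, point 1 stays put.
Y = np.array([
    [[0.0, 0.0], [1.0, 1.0]],
    [[3.0, 4.0], [1.0, 1.0]],
    [[6.0, 8.0], [1.0, 1.0]],
])
step_mean = frame_travel(Y).mean(axis=1)  # one summary value per timestep
print(step_mean)           # 2.5 at each of the two steps
print(initial_travel(Y)[-1])  # point 0 ends 10 units from its start, point 1 at 0
```

One caveat when reading these numbers: embeddings computed independently per snapshot can differ by a rotation or reflection (a PCA axis may flip sign, for example), which inflates travel even when the layout is otherwise unchanged. Whether frames are aligned before measuring is not stated here.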
embedding