KL Divergence, Wasserstein Distance, FID, and KID in Generative Models
Mathematical intuition plus practical evaluation guidance for image generation pipelines.
A research-oriented systems article on objective functions, reproducibility, and deployment transfer.
A generative model is successful only if samples from its learned distribution \(p_g(x)\) are close to the true data distribution \(p_r(x)\). In research terms, this is a two-part problem: estimate divergence in high dimensions, and preserve semantic fidelity under finite-sample evaluation.
KL divergence is asymmetric: \(D_{\mathrm{KL}}(p_r \,\|\, p_g) \neq D_{\mathrm{KL}}(p_g \,\|\, p_r)\), and the two directions induce different optimization biases during training.
Forward KL \(D_{\mathrm{KL}}(p_r \,\|\, p_g)\) is mode-covering: it blows up wherever \(p_r(x) > 0\) but \(p_g(x) \approx 0\), pushing the model to spread mass over every mode. Reverse KL \(D_{\mathrm{KL}}(p_g \,\|\, p_r)\) is mode-seeking: it penalizes mass placed where \(p_r\) is small, pushing the model to concentrate on a subset of modes. This maps directly to the coverage-versus-sharpness tradeoff in generated samples.
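The asymmetry is easy to see numerically. The sketch below (a toy discrete example; the distributions and function names are illustrative, not from any particular library) scores a "broad" and a "sharp" candidate against a bimodal target: forward KL prefers the broad, mode-covering candidate, while reverse KL prefers the sharp, mode-seeking one.

```python
import math

def kl(p, q):
    """Discrete KL divergence D_KL(p || q) in nats over matched bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Bimodal target: almost all mass sits on the two outer bins.
p_r = [0.49, 0.01, 0.01, 0.49]

broad = [0.25, 0.25, 0.25, 0.25]  # covers both modes, but blurry
sharp = [0.96, 0.02, 0.01, 0.01]  # locks onto a single mode

# Forward KL D(p_r || p_g): missing a mode of p_r is heavily penalized,
# so the mode-covering `broad` candidate scores better.
# Reverse KL D(p_g || p_r): putting mass in p_r's low-density valley is
# heavily penalized, so the mode-seeking `sharp` candidate scores better.
forward_prefers_broad = kl(p_r, broad) < kl(p_r, sharp)
reverse_prefers_sharp = kl(sharp, p_r) < kl(broad, p_r)
```

The same candidate family ranks in opposite order under the two directions, which is exactly the training-dynamics bias described above.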
When the supports of \(p_r\) and \(p_g\) barely overlap, KL-like divergences saturate or become infinite and yield no useful gradient. Wasserstein distance, which measures the cost of transporting probability mass, degrades smoothly with the gap and so provides better optimization geometry.
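A minimal sketch of this smoothness, using the 1-D special case where the optimal transport plan simply matches samples in sorted order (this closed form is specific to 1-D with equal sample sizes; higher dimensions require a solver or a critic network):

```python
def wasserstein_1d(xs, ys):
    """W1 between equal-size 1-D empirical samples: in 1-D the optimal
    coupling matches sorted order, so W1 is the mean sorted-pair gap."""
    assert len(xs) == len(ys), "this closed form assumes equal sample sizes"
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifted copies of the data have disjoint support, so a histogram-based
# KL is infinite for every shift; W1 instead shrinks smoothly to zero.
real = [0.0, 1.0, 2.0, 3.0]
distances = [wasserstein_1d(real, [x + s for x in real]) for s in (4.0, 2.0, 0.5)]
# distances == [4.0, 2.0, 0.5]: the metric tracks the shift exactly.
```

The monotone, finite distance is what gives Wasserstein-based objectives a usable training signal where KL-style losses plateau.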
FID fits a Gaussian (mean and covariance, typically of Inception-v3 feature embeddings) to each sample set and reports the Fréchet distance between the two Gaussians. KID instead estimates squared kernel MMD with a finite-sample unbiased estimator, so it carries no systematic bias at small evaluation sizes, where FID is known to be biased.
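The two estimators can be sketched compactly. Assumptions to flag: `fid_diagonal` simplifies to diagonal covariances so the matrix square root becomes elementwise (real FID uses full covariance matrices over Inception features), and `poly_kernel` is the cubic polynomial kernel commonly used for KID; all function names here are illustrative.

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """FID between two Gaussians with diagonal covariances (simplifying
    assumption): ||mu1 - mu2||^2 + sum(v1 + v2 - 2*sqrt(v1*v2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

def poly_kernel(x, y, d=3):
    """Cubic polynomial kernel (<x, y>/dim + 1)^d on feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / len(x) + 1) ** d

def kid_unbiased(X, Y, kernel):
    """Unbiased MMD^2 estimator behind KID: within-set sums exclude the
    diagonal, so the estimate has expectation 0 when X and Y match
    (and can legitimately go slightly negative on finite samples)."""
    m, n = len(X), len(Y)
    xx = sum(kernel(a, b) for i, a in enumerate(X)
             for j, b in enumerate(X) if i != j) / (m * (m - 1))
    yy = sum(kernel(a, b) for i, a in enumerate(Y)
             for j, b in enumerate(Y) if i != j) / (n * (n - 1))
    xy = sum(kernel(a, b) for a in X for b in Y) / (m * n)
    return xx + yy - 2 * xy
```

Identical feature statistics give FID 0, and well-separated feature sets give a much larger KID than matched ones, which is the qualitative behavior both metrics are used for.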
Do not optimize one metric in isolation. Use triangulation and guard against hidden regressions.
Industry system design and PhD-style systems research solve similar architecture problems, but they are optimized for different outcomes. Industry optimizes for continuity, scale, and operational efficiency. Research optimizes for explainability, novelty, and defensible evidence.
Industry pipelines must tolerate shifting data distributions and imperfect labels. Research pipelines prioritize controlled comparability and interpretable ablations.
Industry emphasizes SLAs and A/B-test outcomes. Research emphasizes confidence intervals, effect sizes, and baseline rigor.
Industry optimizes for shorter release cycles and sustained uptime. Research accepts slower cycles when needed to isolate causality and produce reproducible claims.
In research, reproducibility is first-class: results should remain stable across random seeds, different splits, and hardware variation. In industry, the same discipline improves rollback confidence and deployment trust.
Start with a specific hypothesis, build the smallest system that can falsify it, stress the design under realistic perturbations, and then update architecture using a documented failure taxonomy. This gives research depth without losing engineering relevance.