From Gradients to Hessians: How Optimization Shapes Vision & ML
Anchor text: Avriel, Nonlinear Programming: Analysis and Methods (Dover), Chapter 2.
Optimization isn’t academic trivia—it is the control room behind 3D reconstruction, optical flow, and billion-parameter networks. This note stitches together the classic conditions for extrema from Avriel with concrete computer-vision cases, including the scenarios where the Hessian turns degenerate and why that matters.
First-Order Condition: Where Extrema Can Even Happen
For a smooth function $f: \mathbb{R}^n \to \mathbb{R}$, any local extremum $x^*$ must satisfy the first-order necessary condition
\[
\nabla f(x^*) = 0.
\]
Avriel’s Theorem 2.3 states that if $x^*$ is a local minimum, then the Hessian at $x^*$ is positive semidefinite:
\[
z^\top \nabla^2 f(x^*) z \ge 0 \quad \text{for all } z.
\]
This only gives candidate points. To tell what kind of critical point you have reached, you have to look at curvature.
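To make the first-order screen concrete, here is a minimal NumPy sketch on a toy quadratic; the objective, candidate point, and finite-difference step are illustrative assumptions, not anything taken from Avriel. The numerical gradient vanishes at the critical point and nowhere else.

```python
import numpy as np

def f(v):
    # Toy smooth objective (illustrative only): an offset quadratic bowl.
    x, y = v
    return (x - 1.0) ** 2 + 2.0 * (y + 0.5) ** 2

def numerical_gradient(f, v, eps=1e-6):
    # Central differences approximate each component of the gradient.
    g = np.zeros_like(v)
    for i in range(len(v)):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2.0 * eps)
    return g

# The analytic critical point is (1, -0.5): the first-order condition holds there.
print(numerical_gradient(f, np.array([1.0, -0.5])))  # ~ [0, 0]
print(numerical_gradient(f, np.array([0.0, 0.0])))   # clearly non-zero: not a candidate
```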
Second-Order Condition: The Hessian Tells the Local Shape
Let $H = \nabla^2 f(x^*)$. The quadratic form $h^\top H h$ measures the second-order change of $f$ along direction $h$.
- Positive definite (all eigenvalues $> 0$) $\Rightarrow$ bowl $\Rightarrow$ strict local minimum.
- Negative definite $\Rightarrow$ dome $\Rightarrow$ strict local maximum.
- Indefinite (mixed signs) $\Rightarrow$ saddle.
- Semidefinite with zero eigenvalues $\Rightarrow$ flat directions / degeneracy.
Avriel’s Theorem 2.2 (sufficient condition) says: if $\nabla f(x^*) = 0$ and $H$ is positive definite, then $x^*$ is a strict local minimum. Flip the inequalities for maxima.
Why “necessary” vs “sufficient” matters: Theorem 2.2 guarantees a minimum, but its hypothesis can fail at genuine minima that have flat directions. Theorem 2.3 must hold at any minimum, but by itself it doesn’t prove you have one.
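As a sketch of the classification above (assuming a symmetric Hessian is already available as a NumPy array; the tolerance for calling an eigenvalue zero is an arbitrary choice):

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    # Eigenvalue signs of the symmetric Hessian decide the local shape at a critical point.
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "strict local minimum (positive definite)"
    if np.all(eig < -tol):
        return "strict local maximum (negative definite)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle (indefinite)"
    return "degenerate: zero eigenvalues, higher-order terms decide"

print(classify_critical_point(np.diag([2.0, 3.0])))   # bowl
print(classify_critical_point(np.diag([2.0, -2.0])))  # saddle, e.g. x^2 - y^2 at the origin
print(classify_critical_point(np.zeros((2, 2))))      # flat/degenerate, e.g. x^4 - y^4 at the origin
```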
A One-Line Family That Shows the Difference (Avriel’s Example 2.1.1)
Consider $f(x) = x^{2p}$ with $p \in \mathbb{Z}_{>0}$.
- $p = 1$: $f(x) = x^2$. We have $\nabla f(0) = 0$ and $f''(0) = 2 > 0$. Theorem 2.2 applies, so $0$ is a strict (also global) minimum.
- $p > 1$: $f(x) = x^{2p}$. We still have $\nabla f(0) = 0$, but $f''(0) = 0$. Theorem 2.2 fails (the Hessian is not positive definite), yet Theorem 2.3 holds (the Hessian is semidefinite). It is still a minimum—just flatter at the bottom.
Tip: simplify the second derivative before evaluating at $x=0$. The general formula $f''(x) = 2p(2p-1)\,x^{2p-2}$ reduces to the constant $2$ when $p=1$, so there is no $0^0$ ambiguity at the origin.
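A short symbolic check of the tip, assuming SymPy is available; it differentiates $x^{2p}$ twice and only then substitutes $x = 0$:

```python
import sympy as sp

x = sp.symbols("x")
for p in (1, 2, 3):
    f = x ** (2 * p)
    fpp = sp.diff(f, x, 2)         # simplify the second derivative symbolically first...
    print(p, fpp, fpp.subs(x, 0))  # ...then evaluate at x = 0: 2 for p = 1, and 0 for p > 1
```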
When Semidefinite Isn’t Enough: Degenerate Saddles
A point can satisfy Theorem 2.3 and still not be a minimum when higher-order terms matter.
- Strict saddle: $f(x,y) = x^2 - y^2$. The gradient vanishes at the origin, but $H = \operatorname{diag}(2, -2)$ is indefinite, so Theorem 2.3 fails—definitely not a minimum.
- Degenerate saddle: $f(x,y) = x^4 - y^4$. The gradient and Hessian are zero at the origin, so Theorem 2.3 passes, but the function still drops along the $y$-axis. Necessary doesn’t mean sufficient.
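A quick numerical look at the degenerate saddle, using an illustrative finite-difference Hessian (the step size is an arbitrary choice): the Hessian at the origin is numerically zero, yet the function drops along the $y$-axis.

```python
import numpy as np

def f(x, y):
    return x ** 4 - y ** 4

def hessian_at_origin(h=1e-3):
    # Finite-difference Hessian of f at (0, 0); every entry vanishes to leading order.
    fxx = (f(h, 0) - 2 * f(0, 0) + f(-h, 0)) / h ** 2
    fyy = (f(0, h) - 2 * f(0, 0) + f(0, -h)) / h ** 2
    fxy = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h ** 2)
    return np.array([[fxx, fxy], [fxy, fyy]])

print(hessian_at_origin())       # ~ zero matrix: the semidefinite test passes
print(f(0.0, 0.1), f(0.0, 0.2))  # negative values along the y-axis: not a minimum
```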
Where Degenerate Hessians Show Up in Vision
- Bundle adjustment (SfM/SLAM): gauge freedoms. Reprojection error is unchanged by a global translation/rotation and, in monocular setups, by global scale. The Hessian is rank-deficient (zero eigenvalues). Fixing a camera, a point, or the scale removes the degeneracy. (Triggs et al., 2000.)
- Optical flow: the aperture problem. Along a clean edge, motion parallel to the edge is unobservable, so the data-term Hessian is nearly rank-1; a small demo is sketched after this list. Smoothness priors restore the missing rank. (Horn & Schunck, AI 1981.)
- Photometric problems (shape-from-shading, photometric stereo). Certain lighting/geometry combinations create flat valleys of equally good explanations. Regularization or additional illumination disambiguates.
- Deep networks: flat minima and saddles. Over-parameterized loss surfaces have Hessian spectra dominated by near-zero eigenvalues, and saddle points vastly outnumber bad local minima in high dimensions (Dauphin et al., NeurIPS 2014); wide, flat minima are often observed to generalize better.
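The aperture-problem sketch referenced above: the synthetic edge image, window, and gradient operator are illustrative stand-ins for a real flow pipeline, but the Lucas-Kanade-style data-term Hessian (sums of outer products of image gradients over a window) has the standard form.

```python
import numpy as np

# Synthetic image with one clean vertical edge: intensity varies only along x.
img = np.zeros((32, 32))
img[:, 16:] = 1.0

# Image gradients (np.gradient returns d/drow then d/dcol, i.e. Iy then Ix).
Iy, Ix = np.gradient(img)

# Lucas-Kanade-style data-term Hessian over a window straddling the edge:
#   H = sum over window of [Ix^2, Ix*Iy; Ix*Iy, Iy^2]
win = (slice(8, 24), slice(8, 24))
H = np.array([
    [np.sum(Ix[win] ** 2),      np.sum(Ix[win] * Iy[win])],
    [np.sum(Ix[win] * Iy[win]), np.sum(Iy[win] ** 2)],
])
print(np.linalg.eigvalsh(H))  # one ~0 eigenvalue: motion parallel to the edge is unobservable
```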
Practical Compass
- Use Theorem 2.2 when you can show $H \succ 0$ to certify strict minima.
- Use Theorem 2.3 to screen candidates: any minimum must satisfy $H \succeq 0$, but check higher-order terms or structural invariances to rule out degenerate saddles (a numerical screen is sketched after this list).
- In real vision pipelines, expect degeneracy wherever there is invariance (gauge freedoms) or missing information (aperture problem).
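A numerical screen along these lines, as a hedged sketch: Cholesky success certifies (numerical) positive definiteness for Theorem 2.2, and the smallest eigenvalue flags flat directions for Theorem 2.3. The example Hessian is hypothetical.

```python
import numpy as np

def is_positive_definite(H):
    # Cholesky succeeds iff the symmetric H is numerically positive definite (Theorem 2.2 territory).
    try:
        np.linalg.cholesky(H)
        return True
    except np.linalg.LinAlgError:
        return False

def min_eigenvalue(H):
    # A near-zero smallest eigenvalue flags a flat direction (gauge freedom, aperture, ...).
    return np.linalg.eigvalsh(H)[0]

H_example = np.diag([5.0, 3.0, 0.0])  # hypothetical rank-deficient Hessian
print(is_positive_definite(H_example), min_eigenvalue(H_example))  # False 0.0
```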
Suggested Figures
- Bowl vs. saddle vs. flat-bottom surfaces: positive definite, indefinite, and semidefinite Hessians produce very different local landscapes.
- Level-set and constraint tangency for Lagrange multipliers: the level set just kisses the constraint line when the gradients align.
- Hessian spectrum with many near-zero eigenvalues, as seen in bundle adjustment, optical flow, and deep nets.
References
- M. Avriel, Nonlinear Programming: Analysis and Methods, Dover. See Theorems 2.2 (Sufficient) and 2.3 (Necessary).
- B. Triggs et al., “Bundle Adjustment—A Modern Synthesis,” Vision Algorithms: Theory and Practice, Springer LNCS, 2000.
- B. K. P. Horn & B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, 1981.
- Y. N. Dauphin et al., “Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization,” NeurIPS, 2014.
TL;DR
- Gradient zero gets you to the door.
- Hessian sign tells you what room you are in: bowl, dome, saddle, or flat.
- Vision problems breed degeneracy; handle it with gauges, priors, or extra cues.