From Gradients to Hessians: How Optimization Shapes Vision & ML
Table of Contents
- From Gradients to Hessians: How Optimization Shapes Vision & ML
- First-Order Condition: Where Extrema Can Even Happen
- Second-Order Condition: The Hessian Tells the Local Shape
- A One-Line Family That Shows the Difference (Avriel’s Example 2.1.1)
- When Semidefinite Isn’t Enough: Degenerate Saddles
- Where Degenerate Hessians Show Up in Vision
- Practical Compass
- Suggested Figures
- References
- TL;DR
Positive definite, indefinite, and semidefinite Hessians produce very different local landscapes.
Tangency captures the constrained optimum: the level set just kisses the constraint line when gradients align.
Flat directions show up as near-zero eigenvalues—ubiquitous in bundle adjustment, optical flow, and deep nets.
From Gradients to Hessians: How Optimization Shapes Vision & ML
Anchor text: Avriel, Nonlinear Programming: Analysis and Methods (Dover), Chapter 2.
Optimization isn’t academic trivia—it is the control room behind 3D reconstruction, optical flow, and billion-parameter networks. This note stitches together the classic conditions for extrema from Avriel with concrete computer-vision cases, including the scenarios where the Hessian turns degenerate and why that matters.
Related Posts:
- Matrix Determinants and Leibniz Theorem - The determinant of the Hessian matrix determines the type of critical point
- Signed Volume: Geometric Interpretation - Understanding what the sign of det(Hessian) tells us about curvature
- Bijective Functions: The Perfect Correspondence - Invertibility and bijection in the context of optimization
- Why Intersection Fails in Lagrange Multipliers - Related constrained optimization concepts
First-Order Condition: Where Extrema Can Even Happen
For a smooth function $f: \mathbb{R}^n \to \mathbb{R}$, any local extremum $x^*$ must satisfy the first-order necessary condition
\[\nabla f(x^*) = 0.\]
Avriel’s Theorem 2.3 states that if $x^*$ is a local minimum, then the Hessian at $x^*$ is positive semidefinite:
\[z^\top \nabla^2 f(x^*) z \ge 0 \quad \text{for all } z.\]
This only gives candidate points. To tell what kind of critical point you have reached, you have to look at curvature.
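Both conditions are easy to check numerically. A minimal sketch using NumPy and central finite differences (the bowl `f` below is an illustrative example, not one from Avriel):

```python
import numpy as np

def grad_hess(f, x, eps=1e-5):
    """Central finite-difference gradient and Hessian of f at x."""
    n = len(x)
    g = np.zeros(n)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return g, H

f = lambda v: v[0]**2 + 3 * v[1]**2        # a simple bowl
g, H = grad_hess(f, np.array([0.0, 0.0]))
# gradient vanishes at the origin; eigenvalues of H are about [2, 6],
# all positive, so the semidefinite (indeed definite) condition holds
```

Here the origin satisfies both the first-order condition and positive semidefiniteness, consistent with it being a minimum.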
Second-Order Condition: The Hessian Tells the Local Shape
Let $H = \nabla^2 f(x^*)$. The quadratic form $h^\top H h$ measures the second-order change of $f$ along direction $h$.
- Positive definite (all eigenvalues $> 0$) $\Rightarrow$ bowl $\Rightarrow$ strict local minimum.
- Negative definite $\Rightarrow$ dome $\Rightarrow$ strict local maximum.
- Indefinite (mixed signs) $\Rightarrow$ saddle.
- Semidefinite with zero eigenvalues $\Rightarrow$ flat directions / degeneracy.
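This eigenvalue taxonomy is easy to mechanize. A minimal sketch (the tolerance `tol` is an illustrative choice, since floating-point eigenvalues are never exactly zero):

```python
import numpy as np

def classify_critical_point(H, tol=1e-8):
    """Label a critical point from the signs of its Hessian eigenvalues."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "strict local minimum (bowl)"
    if np.all(eig < -tol):
        return "strict local maximum (dome)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle"
    return "degenerate (flat directions, zero eigenvalues)"

print(classify_critical_point(np.diag([2.0, 2.0])))   # bowl
print(classify_critical_point(np.diag([2.0, -2.0])))  # saddle
print(classify_critical_point(np.diag([2.0, 0.0])))   # degenerate
```

Note that the degenerate case is a refusal to decide: the quadratic form alone cannot distinguish a flat-bottomed minimum from a degenerate saddle.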
Avriel’s Theorem 2.2 (sufficient condition) says: if $\nabla f(x^) = 0$ and $H$ is positive definite, then $x^$ is a strict local minimum. Flip the inequalities for maxima.
Why “necessary” vs “sufficient” matters: Theorem 2.2 guarantees a minimum but can be too strong (it fails when there are flat directions). Theorem 2.3 must hold at any minimum, but by itself it doesn’t prove you have one.
A One-Line Family That Shows the Difference (Avriel’s Example 2.1.1)
Consider $f(x) = x^{2p}$ with $p \in \mathbb{Z}_{>0}$.
- $p = 1$: $f(x) = x^2$. We have $\nabla f(0) = 0$ and $f''(0) = 2 > 0$. Theorem 2.2 applies, so $0$ is a strict (also global) minimum.
- $p > 1$: $f(x) = x^{2p}$. We still have $\nabla f(0) = 0$, but $f''(0) = 0$. Theorem 2.2 fails (the Hessian is not positive definite), yet Theorem 2.3 holds (the Hessian is semidefinite). It is still a minimum, just flatter at the bottom.
Tip: simplify the second derivative before evaluating at $x=0$. For $p=1$, $f''(x) = (2p)(2p-1)x^{2p-2}$ reduces to the constant $2$; there is no $0^0$ ambiguity.
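The whole family can be checked in a few lines (a sketch; Python happens to evaluate `0.0**0` as `1.0`, which agrees with the simplified constant for $p=1$):

```python
def f(x, p):
    """f(x) = x^(2p)."""
    return x**(2 * p)

def f2(x, p):
    """Simplified second derivative: 2p(2p-1) x^(2p-2)."""
    return 2 * p * (2 * p - 1) * x**(2 * p - 2)

for p in (1, 2, 3):
    curvature = f2(0.0, p)   # 2.0 for p = 1, 0.0 for p > 1
    # 0 is still a minimum for every p: f is positive away from 0
    is_min = all(f(x, p) > f(0.0, p) for x in (-0.5, -0.1, 0.1, 0.5))
    print(p, curvature, is_min)
```

Every `p` reports a minimum, but only `p = 1` shows positive curvature: exactly the gap between the sufficient and necessary theorems.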
When Semidefinite Isn’t Enough: Degenerate Saddles
A point can satisfy Theorem 2.3 and still not be a minimum when higher-order terms matter.
- Strict saddle: $f(x,y) = x^2 - y^2$. The gradient vanishes at the origin, but $H = \operatorname{diag}(2, -2)$ is indefinite, so Theorem 2.3 fails—definitely not a minimum.
- Degenerate saddle: $f(x,y) = x^4 - y^4$. The gradient and Hessian are zero at the origin, so Theorem 2.3 passes, but the function still drops along the $y$-axis. Necessary doesn’t mean sufficient.
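The degenerate saddle is easy to make concrete (a sketch; the gradient and Hessian are written analytically rather than differenced):

```python
import numpy as np

f = lambda x, y: x**4 - y**4

# At the origin, the gradient (4x^3, -4y^3) and the Hessian
# diag(12x^2, -12y^2) are both exactly zero, so Theorem 2.3 passes.
H = np.diag([12 * 0.0**2, -12 * 0.0**2])
print(np.linalg.eigvalsh(H))   # both eigenvalues zero: semidefinite

# ...yet the origin is not a minimum: f drops along the y-axis
print(f(0.0, 0.1))             # about -1e-4, below f(0, 0) = 0
```

The second-order test is silent here; only the quartic terms reveal the saddle.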
Where Degenerate Hessians Show Up in Vision
- Bundle adjustment (SfM/SLAM): gauge freedoms. Reprojection error is unchanged by global translation/rotation and, in monocular setups, by global scale. The Hessian is rank-deficient (zero eigenvalues). Fixing a camera, point, or scale removes the degeneracy. (Triggs et al., IJCV 2000.)
- Optical flow: the aperture problem. Along a clean edge, motion parallel to the edge is unobservable, so the data-term Hessian is almost rank-1. Smoothness priors increase the rank. (Horn & Schunck, AI 1981.)
- Photometric problems (shape-from-shading, photometric stereo). Certain lighting/geometry combinations create flat valleys of equally good explanations. Regularization or additional illumination disambiguates.
- Deep networks: flat minima. Over-parameterization yields many near-zero Hessian eigenvalues; wide, flat minima often generalize better. (Dauphin et al., NeurIPS 2014.)
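The gauge-freedom bullet can be reproduced in a toy setting. A sketch, not real bundle adjustment: hypothetical 1-D cameras $c_i$ and points $p_j$ with residuals $r_{ij} = (p_j - c_i) - z_{ij}$, chosen so that a global translation is the only unobservable direction:

```python
import numpy as np

# Toy 1-D "bundle adjustment": shifting every camera c_i and point p_j
# by the same amount leaves all residuals unchanged -- a gauge freedom.
n_cam, n_pt = 2, 3
rows = []
for i in range(n_cam):
    for j in range(n_pt):
        row = np.zeros(n_cam + n_pt)
        row[i] = -1.0          # d r_ij / d c_i
        row[n_cam + j] = 1.0   # d r_ij / d p_j
        rows.append(row)
J = np.array(rows)
H = J.T @ J                    # Gauss-Newton Hessian
print(np.linalg.eigvalsh(H).min())   # about 0: one flat direction
print(np.linalg.matrix_rank(H))      # 4 of 5: rank-deficient by one
```

Fixing one camera (deleting its column of `J`) restores full rank, which is exactly the gauge-fixing trick used in practice.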
Practical Compass
- Use Theorem 2.2 when you can show $H \succ 0$ to certify strict minima.
- Use Theorem 2.3 to screen candidates: any minimum must satisfy $H \succeq 0$, but check higher-order terms or structural invariances to rule out degenerate saddles.
- In real vision pipelines, expect degeneracy wherever there is invariance (gauge freedoms) or missing information (aperture problem).
Suggested Figures
- Bowl vs. saddle vs. flat-bottom surfaces.
- Level-set and constraint tangency for Lagrange multipliers.
- Hessian spectrum with many near-zero eigenvalues.
References
- M. Avriel, Nonlinear Programming: Analysis and Methods, Dover. See Theorems 2.2 (Sufficient) and 2.3 (Necessary).
- B. Triggs et al., “Bundle Adjustment—A Modern Synthesis,” International Journal of Computer Vision, 2000.
- B. K. P. Horn & B. G. Schunck, “Determining Optical Flow,” Artificial Intelligence, 1981.
- Y. N. Dauphin et al., “Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization,” NeurIPS, 2014.
TL;DR
- Gradient zero gets you to the door.
- Hessian sign tells you what room you are in: bowl, dome, saddle, or flat.
- Vision problems breed degeneracy; handle it with gauges, priors, or extra cues.