Skip to the content.

Infinite Total Variation of Brownian Motion: Why the Path Length Diverges

This post provides a detailed mathematical analysis of total variation for Brownian motion. It builds on concepts from Mathematical Properties of Brownian Motion and connects to Itô Calculus.

Table of Contents

Introduction

One of the most striking properties of Brownian motion is its infinite total variation. Despite being continuous everywhere, a Brownian path has infinite arc length over any time interval, no matter how small.

This property is not just a mathematical curiosity—it’s the reason why:

  • Classical Riemann-Stieltjes integration fails for Brownian motion
  • We need specialized stochastic integrals (Itô, Stratonovich)
  • The $(dW)^2 = dt$ rule in stochastic calculus is fundamental

This post explores why total variation is infinite, what it means mathematically, and its implications for stochastic analysis.

What Is Total Variation?

Definition: The total variation of a function $f: [0, T] \to \mathbb{R}$ is:

\[\text{TV}(f) = \sup_{\Pi} \sum_{i=0}^{n-1} \lvert f(t_{i+1}) - f(t_i)\rvert\]

where the supremum is taken over all partitions $\Pi: 0 = t_0 < t_1 < \cdots < t_n = T$.

Intuition: Total variation measures the “total amount of change” or the “arc length” of the function’s graph.

For smooth functions: If $f$ is differentiable with bounded derivative:

\[\text{TV}(f) = \int_0^T \lvert f'(t)\rvert \, dt < \infty\]

For Brownian motion: With probability 1,

\[\text{TV}(W) = \lim_{\|\Delta\| \to 0} \sum_{i=0}^{n-1} \lvert W(t_{i+1}) - W(t_i)\rvert = \infty\]

where $|\Delta| = \max_i (t_{i+1} - t_i)$ is the mesh size of the partition.

Why Is Total Variation Infinite?

Intuitive Argument

Consider a partition of $[0, T]$ into $n$ equal intervals of size $\Delta t = T/n$.

The total variation over this partition is:

\[\text{TV}_n = \sum_{i=0}^{n-1} \lvert W(t_{i+1}) - W(t_i)\rvert = \sum_{i=0}^{n-1} \lvert \Delta W_i\rvert\]

Step 1: Distribution of each increment

Each $\Delta W_i = W(t_{i+1}) - W(t_i) \sim \mathcal{N}(0, \Delta t)$ is independent.

Step 2: Expected absolute value

For $Z \sim \mathcal{N}(0, \sigma^2)$:

\[\mathbb{E}[\lvert Z\rvert] = \sigma \sqrt{\frac{2}{\pi}}\]

So for our increment:

\[\mathbb{E}[\lvert \Delta W_i\rvert] = \sqrt{\Delta t} \cdot \sqrt{\frac{2}{\pi}} = \sqrt{\frac{2\Delta t}{\pi}}\]

Step 3: Expected total variation

By linearity of expectation:

\[\mathbb{E}[\text{TV}_n] = \sum_{i=0}^{n-1} \mathbb{E}[\lvert \Delta W_i\rvert] = n \cdot \sqrt{\frac{2\Delta t}{\pi}}\]

Substituting $\Delta t = T/n$:

\[\mathbb{E}[\text{TV}_n] = n \cdot \sqrt{\frac{2T}{n\pi}} = \sqrt{\frac{2nT}{\pi}}\]

Step 4: Limit as partition refines

As $n \to \infty$ (mesh size $\to 0$):

\[\mathbb{E}[\text{TV}_n] = \sqrt{\frac{2nT}{\pi}} \sim \sqrt{n} \to \infty\]

Conclusion: The expected total variation diverges as we refine the partition. With probability 1, the actual total variation also diverges.

Rigorous Proof Sketch

For a more rigorous argument, we need to show almost sure divergence, not just expectation.

Lemma (Lower bound): For any partition with mesh $|\Delta| \leq \delta$,

\[\mathbb{P}\left(\sum_{i=0}^{n-1} \lvert \Delta W_i\rvert \geq c\sqrt{n}\right) \geq 1 - \epsilon\]

for appropriate constants $c > 0$ and $\epsilon > 0$ depending on $\delta, T$.

Proof idea:

  1. Each $\lvert \Delta W_i\rvert$ has mean $\sqrt{\frac{2\Delta t}{\pi}}$ and variance $\Delta t(1 - \frac{2}{\pi})$
  2. By the law of large numbers, $\frac{1}{n}\sum \lvert \Delta W_i\rvert \to \sqrt{\frac{2\Delta t}{\pi}}$ in probability
  3. Therefore $\sum \lvert \Delta W_i\rvert \sim n\sqrt{\frac{2\Delta t}{\pi}} = \sqrt{\frac{2nT}{\pi}} \to \infty$

Stronger result: One can show using martingale techniques and concentration inequalities that:

\[\liminf_{n \to \infty} \frac{\text{TV}_n}{\sqrt{n}} > 0 \quad \text{(almost surely)}\]

This proves $\text{TV}_n \to \infty$ almost surely.

Contrast with Quadratic Variation

The contrast between total variation and quadratic variation is striking:

Property Formula Limit as mesh $\to 0$
Total Variation $\sum \lvert \Delta W\rvert$ $\infty$
Quadratic Variation $\sum (\Delta W)^2$ $T$ (finite!)

Why the Difference?

Scaling of increments: Each $\lvert \Delta W_i\rvert \sim \sqrt{\Delta t}$, so $(\Delta W_i)^2 \sim \Delta t$.

For a partition with $n$ intervals:

Total variation: \(\sum_{i=0}^{n-1} \lvert \Delta W_i\rvert \sim n \cdot \sqrt{\Delta t} = n \cdot \sqrt{\frac{T}{n}} = \sqrt{nT} \to \infty\)

Quadratic variation: \(\sum_{i=0}^{n-1} (\Delta W_i)^2 \sim n \cdot \Delta t = n \cdot \frac{T}{n} = T \quad \text{(converges!)}\)

The Competition

There’s a competition between:

  • Size of increments: $\lvert \Delta W\rvert \to 0$ as $\Delta t \to 0$
  • Number of increments: $n \to \infty$ as $\Delta t \to 0$

For total variation ($p = 1$):

  • Increments shrink like $(\Delta t)^{1/2}$
  • But we have $n \sim 1/\Delta t$ of them
  • Net effect: $n \cdot (\Delta t)^{1/2} \sim (\Delta t)^{-1/2} \to \infty$

For quadratic variation ($p = 2$):

  • Squared increments shrink like $\Delta t$
  • We have $n \sim 1/\Delta t$ of them
  • Net effect: $n \cdot \Delta t = 1$ (balanced!)

The p-Variation Framework

We can generalize to $p$-variation for any $p > 0$:

\[V_n^{(p)} = \sum_{i=0}^{n-1} \lvert W(t_{i+1}) - W(t_i)\rvert^p\]

Scaling Analysis

Since $\lvert \Delta W\rvert \sim (\Delta t)^{1/2}$:

\[V_n^{(p)} \sim \sum_{i=0}^{n-1} (\Delta t)^{p/2} = n \cdot (\Delta t)^{p/2}\]

Substituting $\Delta t = T/n$:

\[V_n^{(p)} \sim n \cdot \left(\frac{T}{n}\right)^{p/2} = T^{p/2} \cdot n^{1-p/2}\]

Behavior as $n \to \infty$

The exponent of $n$ is $1 - p/2$:

\[V_n^{(p)} \sim n^{1-p/2} \begin{cases} \to \infty & \text{if } 1 - p/2 > 0 \Leftrightarrow p < 2 \\ \to T^{p/2} & \text{if } 1 - p/2 = 0 \Leftrightarrow p = 2 \\ \to 0 & \text{if } 1 - p/2 < 0 \Leftrightarrow p > 2 \end{cases}\]

Summary table:

$p$ $p$-Variation Behavior
$0 < p < 2$ Infinite Diverges like $n^{1-p/2}$
$p = 2$ Finite Converges to $T$
$p > 2$ Zero Vanishes like $n^{1-p/2} \to 0$

Critical Exponent p = 2

Key observation: $p = 2$ is the unique critical exponent where Brownian motion has:

  • Finite (not infinite)
  • Non-zero (not vanishing)

variation.

This is deeply connected to:

  • Hölder continuity with exponent $1/2$
  • The $(dW)^2 = dt$ rule in Itô calculus
  • Quadratic variation being the “right” notion for rough paths

Mathematical interpretation: Brownian motion is exactly “rough enough” that its quadratic variation matters, but not so rough that higher-order variations contribute.

Comparison with Smooth Curves

Differentiable Functions

For a function $f: [0, T] \to \mathbb{R}$ with continuous derivative:

\[\text{TV}(f) = \int_0^T \lvert f'(t)\rvert \, dt < \infty\]

If $\lvert f’(t)\rvert \leq M$ for all $t$:

\[\text{TV}(f) \leq M \cdot T < \infty\]

Quadratic variation: For smooth $f$,

\[\sum (\Delta f)^2 \approx \sum [f'(t_i)]^2 (\Delta t)^2 \approx 0\]

as $\Delta t \to 0$.

Rectifiable Curves

A curve is rectifiable if it has finite arc length. This is equivalent to having bounded total variation.

Brownian motion is non-rectifiable—you cannot “straighten out” its graph to measure a finite length.

The Coastline Paradox

Physical analogy: Brownian paths exhibit the coastline paradox:

  • The more carefully you measure (finer partition), the longer the path becomes
  • There’s no well-defined “true length”
  • The measured length grows without bound as measurement precision increases

This is exactly like measuring a coastline: using a smaller ruler gives a longer measurement, with no convergence to a limiting value.

Summary Table

Property Smooth Curve Brownian Motion Discontinuous Jump
Continuous?
Differentiable?
Bounded variation? Maybe
Finite total variation? Maybe
Finite quadratic variation? ✓ (zero) ✓ (non-zero!)

Implications for Integration

Why Riemann-Stieltjes Integration Fails

The Riemann-Stieltjes integral $\int_0^T f(s) \, dg(s)$ can be defined pathwise when $g$ has bounded variation.

Construction: For a partition $\Pi$,

\[\sum_{i=0}^{n-1} f(\xi_i)[g(t_{i+1}) - g(t_i)]\]

converges as mesh $\to 0$ if $g$ has bounded variation.

Problem for Brownian motion: Since $\text{TV}(W) = \infty$, this construction fails.

Why?

  • The sum $\sum f(\xi_i) \Delta W_i$ does not converge pathwise
  • Different choices of $\xi_i \in [t_i, t_{i+1}]$ give different limits
  • There’s no well-defined pathwise integral

Example: Even for $f(s) = 1$ (constant function):

\[\sum_{i=0}^{n-1} 1 \cdot [W(t_{i+1}) - W(t_i)] = W(T) - W(0)\]

This converges, but:

\[\sum_{i=0}^{n-1} \lvert W(t_{i+1}) - W(t_i)\rvert \to \infty\]

The integral “barely” converges due to cancellation, but there’s no pathwise theory.

Need for Itô and Stratonovich Integrals

Because pathwise integration fails, we need new definitions:

Itô Integral: Defined via $L^2$ convergence, not pathwise:

\[\int_0^T f(s) \, dW(s) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f(t_i)[W(t_{i+1}) - W(t_i)]\]

where the limit is in $L^2(\Omega)$ (mean-square convergence).

Key property: The Itô integral is a martingale.

Stratonovich Integral: Uses midpoint approximation:

\[\int_0^T f(s) \circ dW(s) = \lim_{n \to \infty} \sum_{i=0}^{n-1} f\left(\frac{t_i + t_{i+1}}{2}\right)[W(t_{i+1}) - W(t_i)]\]

Key property: Ordinary chain rule applies (no Itô correction).

Relationship: The two integrals differ by a correction term:

\[\int_0^T f(s) \circ dW(s) = \int_0^T f(s) \, dW(s) + \frac{1}{2}\int_0^T f'(s) \, ds\]

For details, see Itô Calculus: Why We Need New Rules for SDEs.

Why $(dW)^2 = dt$ Matters

The fact that quadratic variation is finite (and equals $T$) is what makes stochastic integration possible.

In Itô’s lemma, the term:

\[\frac{1}{2} g^2 \frac{\partial^2 h}{\partial x^2} \, dt\]

arises precisely because $(dW)^2 = dt$ is first-order, not negligible.

If total variation were finite: We could use ordinary calculus, and this term would vanish.

If quadratic variation were infinite: Even Itô calculus wouldn’t work—we’d need rough path theory or other advanced tools.

Goldilocks principle: Brownian motion is “just rough enough” that $(dW)^2$ matters but $(dW)^p$ for $p > 2$ doesn’t.

Practical Consequences

No Well-Defined Arc Length

Physical interpretation: A Brownian particle has traveled an infinite distance in any finite time interval, even though its displacement is finite.

Mathematical: You cannot meaningfully ask “how far has the particle moved?” only “what is its net displacement?”

Simulation Artifacts

When simulating Brownian motion with time step $\Delta t$:

\[W_{\text{sim}}(t_{i+1}) = W_{\text{sim}}(t_i) + \sqrt{\Delta t} \cdot Z_i\]

where $Z_i \sim \mathcal{N}(0, 1)$.

Computed path length:

\[\text{TV}_{\text{sim}} \approx \sum \lvert W_{\text{sim}}(t_{i+1}) - W_{\text{sim}}(t_i)\rvert \sim \sqrt{n} \sim \frac{1}{\sqrt{\Delta t}}\]

Key observation:

  • Smaller $\Delta t$ gives longer path length!
  • The simulated path never converges to a well-defined length
  • This is not a bug—it’s correctly capturing the infinite variation

Implication for visualization: Any plot of Brownian motion is necessarily a “smoothed” version. The true path is infinitely more wiggly.

Accumulation of Small Changes

Even though each increment $\lvert \Delta W\rvert$ is tiny ($\sim \sqrt{\Delta t}$), their sum diverges.

Intuition:

  • Individual changes are small: $\lvert \Delta W\rvert \sim \sqrt{\Delta t} \to 0$
  • But there are infinitely many of them: $n \sim 1/\Delta t \to \infty$
  • The accumulation outpaces the shrinking: $n \cdot \sqrt{\Delta t} \sim 1/\sqrt{\Delta t} \to \infty$

This is the essence of Brownian roughness: Continuous, but with unbounded accumulated change.

The Bigger Picture

Brownian Motion’s Special Status

Brownian motion occupies a unique position in the hierarchy of functions:

Smoothness hierarchy:

  1. $C^\infty$ (smooth) → Finite total variation, zero quadratic variation
  2. $C^1$ (differentiable) → Finite total variation, zero quadratic variation
  3. Lipschitz (Hölder-1) → Finite total variation, zero quadratic variation
  4. Brownian motion (Hölder-1/2)Infinite total variation, finite non-zero quadratic variation
  5. Hölder-$\alpha$ for $\alpha < 1/2$ → More variation
  6. Jump discontinuities → Finite or infinite variation depending on jumps

Critical threshold: The boundary at Hölder-1/2 is where total variation changes from finite to infinite.

Why p = 2 Is Special

The fact that $p = 2$ is the critical exponent is deeply connected to:

1. Gaussian scaling: $\text{Var}[W(t)] = t$ gives quadratic growth

2. Central Limit Theorem: Brownian motion as a limit of random walks

3. Harmonic analysis: The Fourier transform of Brownian motion involves $\omega^{-1}$ (one derivative)

4. Energy considerations: Quadratic variation corresponds to “energy” in physics

5. Itô isometry: $\mathbb{E}\left[\left(\int f \, dW\right)^2\right] = \mathbb{E}\left[\int f^2 \, dt\right]$

Philosophical Takeaway

Infinite total variation is not a pathology—it’s the mathematical signature of true randomness accumulated continuously over time.

Key insights:

  • Continuous functions can have infinite arc length
  • Randomness creates roughness even in continuous processes
  • The “right” notion of variation for random processes is quadratic, not linear
  • Classical calculus breaks down precisely when variation becomes infinite
  • New tools (Itô calculus) are necessary when dealing with genuine randomness

Further Reading

Books on Brownian Motion:

  • Mörters & Peres, Brownian Motion (Chapter 1: path properties, includes rigorous proof of infinite variation)
  • Karatzas & Shreve, Brownian Motion and Stochastic Calculus (Section 1.6: quadratic variation)
  • Revuz & Yor, Continuous Martingales and Brownian Motion (Chapter 1: general theory)

Rough Path Theory (for $p$-variation with $p < 2$):

  • Friz & Hairer, A Course on Rough Paths
  • Lyons, Differential Equations Driven by Rough Signals

Stochastic Integration:

  • Øksendal, Stochastic Differential Equations (Chapter 3: Itô integral)
  • Protter, Stochastic Integration and Differential Equations (comprehensive treatment)

Original Papers:

  • Paley & Wiener (1934), “Fourier Transforms in the Complex Domain” (path properties)
  • Lévy (1940), “Le mouvement brownien plan” (quadratic variation)

Related Posts:


The bottom line: Infinite total variation is what distinguishes Brownian motion from ordinary smooth functions. It’s the reason we need stochastic calculus, the source of the $(dW)^2 = dt$ rule, and the mathematical manifestation of genuine continuous-time randomness. Understanding this property is essential for anyone working with stochastic processes, from pure mathematics to financial modeling to machine learning with diffusion models.

Keep Reading