Nonlinear Extensions: EKF, UKF, and Particle Filters
This is Part 7 of an 8-part series on Kalman Filtering. Part 6 explored real-world applications.
Beyond Linear-Gaussian: The Real World
The standard Kalman filter assumes linear dynamics and Gaussian noise – elegant mathematical assumptions that make optimal estimation tractable. But real-world systems are often stubbornly nonlinear:
- Robot motion: Turning involves trigonometric functions
- Satellite orbits: Governed by nonlinear gravitational dynamics
- Chemical processes: Exponential reaction rates
- Economic systems: Nonlinear feedback loops
This post explores three major approaches for handling nonlinearity while preserving the recursive, real-time nature that makes Kalman filtering so valuable.
The Challenge of Nonlinearity
When Linearity Breaks Down
Consider a robot turning with angular velocity ω:
True nonlinear dynamics:
x(k+1) = x(k) + v*cos(θ(k))*dt
y(k+1) = y(k) + v*sin(θ(k))*dt
θ(k+1) = θ(k) + ω*dt
Measurement model (bearing and range to landmark):
range = √[(x_landmark - x)² + (y_landmark - y)²]
bearing = atan2(y_landmark - y, x_landmark - x) - θ
These equations can't be expressed as the linear matrix operations x = Fx + w and z = Hx + v.
Why Standard Kalman Filter Fails
- Gaussian distributions aren't preserved through nonlinear transformations (see the quick sketch after this list)
- Covariance propagation requires linearization or approximation
- Optimality guarantees no longer hold
- Divergence can occur from accumulated linearization errors
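To make the first point concrete, here is a small numerical sketch (an illustrative snippet, not from the robot example above): push a Gaussian position estimate through the nonlinear range function and the result is visibly skewed, i.e. no longer Gaussian.

```python
import numpy as np

# Illustrative only: sample a Gaussian position estimate and transform it
# through the nonlinear range function sqrt(x^2 + y^2).
rng = np.random.default_rng(0)
positions = rng.normal(loc=[1.0, 0.0], scale=[1.0, 1.0], size=(100_000, 2))
ranges = np.linalg.norm(positions, axis=1)

mean, median = ranges.mean(), np.median(ranges)
skew = np.mean(((ranges - mean) / ranges.std()) ** 3)
print(f"mean={mean:.3f}, median={median:.3f}, skewness={skew:.3f}")
# A Gaussian would have mean ≈ median and skewness ≈ 0; here they clearly differ.
```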
Extended Kalman Filter (EKF)
The Core Idea: Linearize Locally
The EKF linearizes the nonlinear functions around the current state estimate using a first-order Taylor series approximation.
Mathematical Framework
Nonlinear system model:
x(k) = f(x(k-1), u(k-1)) + w(k-1)
z(k) = h(x(k)) + v(k)
Linearization:
F(k) = ∂f/∂x |_(x̂(k-1|k-1)) # Jacobian of state transition
H(k) = ∂h/∂x |_(x̂(k|k-1)) # Jacobian of measurement function
EKF Algorithm
Prediction Step:
```python
import numpy as np

def ekf_predict(x_est, P, u, Q, f_function, F_jacobian, dt):
    """EKF prediction step."""
    # Nonlinear state prediction
    x_pred = f_function(x_est, u, dt)
    # Linearize around the current estimate
    F = F_jacobian(x_est, u, dt)
    # Linear covariance prediction
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred
```
Update Step:
```python
def ekf_update(x_pred, P_pred, z, R, h_function, H_jacobian):
    """EKF update step."""
    # Nonlinear measurement prediction
    z_pred = h_function(x_pred)
    # Linearize around the predicted state
    H = H_jacobian(x_pred)
    # Innovation and innovation covariance
    y = z - z_pred
    S = H @ P_pred @ H.T + R
    # Kalman gain and update (same as the linear case)
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_est = x_pred + K @ y
    P_est = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_est, P_est
```
Robot Localization Example
```python
class RobotEKF:
    def __init__(self):
        # State: [x, y, θ] - position and heading
        self.state_dim = 3

    def motion_model(self, state, control, dt):
        """Nonlinear motion model"""
        x, y, theta = state[0], state[1], state[2]
        v, w = control[0], control[1]  # velocity and angular rate
        return np.array([
            x + v * np.cos(theta) * dt,
            y + v * np.sin(theta) * dt,
            theta + w * dt
        ])

    def motion_jacobian(self, state, control, dt):
        """Jacobian of motion model"""
        x, y, theta = state[0], state[1], state[2]
        v, w = control[0], control[1]
        return np.array([
            [1, 0, -v * np.sin(theta) * dt],
            [0, 1,  v * np.cos(theta) * dt],
            [0, 0, 1]
        ])

    def measurement_model(self, state, landmark_pos):
        """Range and bearing to landmark"""
        x, y, theta = state[0], state[1], state[2]
        lx, ly = landmark_pos[0], landmark_pos[1]
        dx, dy = lx - x, ly - y
        r = np.sqrt(dx**2 + dy**2)
        phi = np.arctan2(dy, dx) - theta
        return np.array([r, phi])

    def measurement_jacobian(self, state, landmark_pos):
        """Jacobian of measurement model"""
        x, y, theta = state[0], state[1], state[2]
        lx, ly = landmark_pos[0], landmark_pos[1]
        dx, dy = lx - x, ly - y
        r = np.sqrt(dx**2 + dy**2)
        return np.array([
            [-dx/r,    -dy/r,    0],
            [ dy/r**2, -dx/r**2, -1]
        ])
```
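To see how the pieces fit together, here is a hedged usage sketch that wires `RobotEKF` into `ekf_predict`/`ekf_update`. The noise levels, landmark position, and control inputs are made-up values for illustration, and the "measurement" is simply the noise-free predicted one rather than real sensor data.

```python
import numpy as np

# Illustrative wiring of RobotEKF with ekf_predict/ekf_update (assumed values).
robot = RobotEKF()
x_est = np.array([0.0, 0.0, 0.0])      # [x, y, θ]
P = np.eye(3) * 0.1
Q = np.diag([0.01, 0.01, 0.005])       # process noise (assumed)
R = np.diag([0.1, 0.02])               # range/bearing noise (assumed)
landmark = np.array([5.0, 5.0])
u = np.array([1.0, 0.1])               # constant speed and turn rate
dt = 0.1

for k in range(50):
    x_pred, P_pred = ekf_predict(x_est, P, u, Q,
                                 robot.motion_model, robot.motion_jacobian, dt)
    # Stand-in for a real sensor reading (noise-free predicted measurement)
    z = robot.measurement_model(x_pred, landmark)
    x_est, P = ekf_update(x_pred, P_pred, z, R,
                          lambda s: robot.measurement_model(s, landmark),
                          lambda s: robot.measurement_jacobian(s, landmark))
```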
EKF Advantages
- Familiar structure: Similar to linear Kalman filter
- Computational efficiency: O(n³) complexity, the same as the linear case
- Well-established: Decades of successful applications
- Good performance: Works well for mildly nonlinear systems
EKF Limitations
- Linearization errors: Can accumulate and cause divergence
- Jacobian computation: Requires analytical derivatives
- Local optimality only: Can get stuck in local minima
- Gaussian assumption: Still assumes Gaussian distributions
Unscented Kalman Filter (UKF)
The Innovation: Deterministic Sampling
Rather than linearizing nonlinear functions, the UKF uses the Unscented Transform – a deterministic sampling technique that propagates carefully chosen “sigma points” through nonlinear transformations.
Key Insight
“It’s easier to approximate a probability distribution than it is to approximate an arbitrary nonlinear function.” - Jeffrey Uhlmann
The Unscented Transform
Step 1: Generate Sigma Points
```python
def generate_sigma_points(mean, cov, alpha=1e-3, beta=2, kappa=None):
    """Generate sigma points for the unscented transform."""
    n = len(mean)
    if kappa is None:
        kappa = 3 - n
    lambda_ = alpha**2 * (n + kappa) - n

    # Matrix square root of the scaled covariance
    sqrt_cov = np.linalg.cholesky((n + lambda_) * cov)

    # Generate sigma points: mean plus/minus the *columns* of the Cholesky
    # factor L, since L @ L.T = (n + lambda_) * cov
    sigma_points = np.zeros((2*n + 1, n))
    sigma_points[0] = mean
    for i in range(n):
        sigma_points[i + 1] = mean + sqrt_cov[:, i]
        sigma_points[n + i + 1] = mean - sqrt_cov[:, i]

    # Compute weights
    wm = np.zeros(2*n + 1)  # weights for means
    wc = np.zeros(2*n + 1)  # weights for covariances
    wm[0] = lambda_ / (n + lambda_)
    wc[0] = lambda_ / (n + lambda_) + 1 - alpha**2 + beta
    for i in range(1, 2*n + 1):
        wm[i] = wc[i] = 0.5 / (n + lambda_)

    return sigma_points, wm, wc
```
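As a quick sanity check (an illustrative snippet, not from the original post), the weighted sigma points should reproduce the original mean and covariance before any nonlinear transformation is applied:

```python
import numpy as np

# Verify that the sigma points and weights recover the input statistics.
mean = np.array([1.0, 2.0])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])
sp, wm, wc = generate_sigma_points(mean, cov)

recovered_mean = np.sum(wm[:, np.newaxis] * sp, axis=0)
diff = sp - recovered_mean
recovered_cov = sum(wc[i] * np.outer(diff[i], diff[i]) for i in range(len(wc)))
print(np.allclose(recovered_mean, mean), np.allclose(recovered_cov, cov))
```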
Step 2: Transform Sigma Points
```python
def unscented_transform(sigma_points, wm, wc, noise_cov, transform_func):
    """Apply the unscented transform."""
    # Transform each sigma point
    transformed_points = np.array([transform_func(sp) for sp in sigma_points])

    # Compute the transformed mean
    transformed_mean = np.sum(wm[:, np.newaxis] * transformed_points, axis=0)

    # Compute the transformed covariance (plus additive noise)
    centered_points = transformed_points - transformed_mean
    transformed_cov = noise_cov.copy()
    for i in range(len(wc)):
        transformed_cov += wc[i] * np.outer(centered_points[i], centered_points[i])

    return transformed_mean, transformed_cov, transformed_points
```
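A small usage sketch (illustrative values only): propagate a Gaussian range/bearing estimate through the nonlinear polar-to-Cartesian conversion and recover an approximate mean and covariance in Cartesian coordinates.

```python
import numpy as np

def polar_to_cartesian(p):
    r, phi = p
    return np.array([r * np.cos(phi), r * np.sin(phi)])

mean = np.array([10.0, np.pi / 4])          # range, bearing (assumed)
cov = np.diag([0.5**2, np.radians(5)**2])   # assumed uncertainties
no_noise = np.zeros((2, 2))                 # no additive noise in this example

sp, wm, wc = generate_sigma_points(mean, cov)
xy_mean, xy_cov, _ = unscented_transform(sp, wm, wc, no_noise, polar_to_cartesian)
print("transformed mean:", xy_mean)
print("transformed covariance:\n", xy_cov)
```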
UKF Algorithm
```python
class UnscentedKalmanFilter:
    def __init__(self, state_dim, measurement_dim, alpha=1e-3, beta=2, kappa=None):
        self.n = state_dim
        self.m = measurement_dim
        self.alpha = alpha
        self.beta = beta
        self.kappa = kappa if kappa is not None else 3 - state_dim

    def predict(self, mean, cov, process_noise, process_model, dt):
        """UKF prediction step."""
        # Generate sigma points
        sigma_points, wm, wc = generate_sigma_points(mean, cov,
                                                     self.alpha, self.beta, self.kappa)
        # Propagate sigma points through the process model
        predicted_mean, predicted_cov, _ = unscented_transform(
            sigma_points, wm, wc, process_noise,
            lambda sp: process_model(sp, dt)
        )
        return predicted_mean, predicted_cov

    def update(self, predicted_mean, predicted_cov, measurement, measurement_noise, measurement_model):
        """UKF update step."""
        # Generate sigma points from the prediction
        sigma_points, wm, wc = generate_sigma_points(predicted_mean, predicted_cov,
                                                     self.alpha, self.beta, self.kappa)
        # Transform through the measurement model
        z_pred, S, transformed_points = unscented_transform(
            sigma_points, wm, wc, measurement_noise, measurement_model
        )
        # Compute the state/measurement cross-covariance
        cross_cov = np.zeros((self.n, self.m))
        for i in range(len(wc)):
            cross_cov += wc[i] * np.outer(sigma_points[i] - predicted_mean,
                                          transformed_points[i] - z_pred)
        # Kalman gain and update
        K = cross_cov @ np.linalg.inv(S)
        updated_mean = predicted_mean + K @ (measurement - z_pred)
        updated_cov = predicted_cov - K @ S @ K.T
        return updated_mean, updated_cov
```
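For comparison with the EKF example, here is a hedged sketch of one predict/update cycle of the UKF using the same robot motion and measurement models. The noise matrices, control input, and "sensor reading" are made up for illustration.

```python
import numpy as np

# Illustrative UKF cycle reusing RobotEKF's models (assumed values).
robot = RobotEKF()
ukf = UnscentedKalmanFilter(state_dim=3, measurement_dim=2)

mean = np.array([0.0, 0.0, 0.0])
cov = np.eye(3) * 0.1
Q = np.diag([0.01, 0.01, 0.005])
R = np.diag([0.1, 0.02])
landmark = np.array([5.0, 5.0])
u = np.array([1.0, 0.1])
dt = 0.1

mean, cov = ukf.predict(mean, cov, Q,
                        lambda sp, dt: robot.motion_model(sp, u, dt), dt)
z = np.array([7.0, 0.7])   # pretend range/bearing reading
mean, cov = ukf.update(mean, cov, z, R,
                       lambda sp: robot.measurement_model(sp, landmark))
```

Note that no Jacobians are passed anywhere: the sigma points carry the nonlinearity instead.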
UKF Advantages
- No Jacobians required: Uses sigma point sampling instead of derivatives
- Higher-order accuracy: Matches the true mean to 3rd order for Gaussian inputs (the EKF is only 1st-order accurate)
- Same computational complexity: O(n³) like EKF and linear KF
- More robust: Better handling of strong nonlinearities
- Easy implementation: Straightforward algorithm structure
UKF Limitations
- Still assumes Gaussian: Underlying distributions assumed Gaussian
- Parameter tuning: Alpha, beta, kappa parameters need tuning
- Numerical stability: Can have issues with ill-conditioned covariances
- Memory overhead: Requires storing multiple sigma points
Particle Filters
When Gaussians Aren’t Enough
For highly nonlinear systems with non-Gaussian noise, we need a more general approach. Particle filters use Monte Carlo methods to represent arbitrary probability distributions.
Core Concept: Weighted Particles
Instead of representing distributions with mean and covariance, use a set of weighted samples (particles):
p(x) ≈ ∑(i=1 to N) w_i * δ(x - x_i)
Where:
- x_i are the particle locations
- w_i are the particle weights
- N is the number of particles
- δ() is the Dirac delta function
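A tiny illustration of this weighted-sample idea (an assumed example, separate from the filter itself): a set of equally weighted particles can represent a bimodal distribution that no single Gaussian could capture, and any expectation becomes a weighted sum over the particles.

```python
import numpy as np

# Approximate a bimodal distribution with 1000 equally weighted particles.
rng = np.random.default_rng(1)
particles = np.concatenate([rng.normal(-2.0, 0.5, 500),
                            rng.normal(+3.0, 0.5, 500)])
weights = np.ones(particles.size) / particles.size

mean = np.sum(weights * particles)
prob_positive = np.sum(weights[particles > 0])
print(f"mean ≈ {mean:.2f}, P(x > 0) ≈ {prob_positive:.2f}")
```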
Sequential Importance Resampling (SIR)
```python
import numpy as np
from scipy.stats import multivariate_normal

class ParticleFilter:
    def __init__(self, num_particles, state_dim):
        self.N = num_particles
        self.particles = np.random.randn(num_particles, state_dim)
        self.weights = np.ones(num_particles) / num_particles

    def predict(self, process_model, process_noise, dt):
        """Prediction step: propagate particles through the process model."""
        for i in range(self.N):
            # Add process noise to each particle
            noise = np.random.multivariate_normal(np.zeros(len(self.particles[i])),
                                                  process_noise)
            self.particles[i] = process_model(self.particles[i], dt) + noise

    def update(self, measurement, measurement_model, measurement_noise):
        """Update step: reweight particles based on measurement likelihood."""
        for i in range(self.N):
            # Predicted measurement for this particle
            z_pred = measurement_model(self.particles[i])
            # Likelihood of the actual measurement (Gaussian measurement
            # noise assumed; scipy provides the pdf)
            residual = measurement - z_pred
            likelihood = multivariate_normal.pdf(residual,
                                                 mean=np.zeros(len(residual)),
                                                 cov=measurement_noise)
            self.weights[i] *= likelihood

        # Normalize weights (small constant guards against division by zero)
        self.weights += 1e-300
        self.weights /= np.sum(self.weights)

        # Resample if the effective sample size is too low
        if 1.0 / np.sum(self.weights**2) < self.N / 2:
            self.resample()

    def resample(self):
        """Systematic resampling."""
        indices = systematic_resample(self.weights)
        self.particles = self.particles[indices]
        self.weights = np.ones(self.N) / self.N

    def get_estimate(self):
        """Compute the weighted mean and covariance of the particle set."""
        mean = np.average(self.particles, weights=self.weights, axis=0)
        # Weighted covariance calculation
        diff = self.particles - mean
        cov = np.average(diff[:, :, np.newaxis] * diff[:, np.newaxis, :],
                         weights=self.weights, axis=0)
        return mean, cov

def systematic_resample(weights):
    """Systematic resampling algorithm."""
    N = len(weights)
    positions = (np.arange(N) + np.random.random()) / N
    indices = np.zeros(N, dtype=int)
    cumulative_sum = np.cumsum(weights)
    cumulative_sum[-1] = 1.0  # guard against floating-point round-off
    i, j = 0, 0
    while i < N:
        if positions[i] < cumulative_sum[j]:
            indices[i] = j
            i += 1
        else:
            j += 1
    return indices
```
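Here is a minimal usage sketch of the filter above on a scalar random-walk state observed through a nonlinear (cubic) measurement; all models and noise values are assumed for illustration.

```python
import numpy as np

pf = ParticleFilter(num_particles=1000, state_dim=1)

process_noise = np.array([[0.1]])
measurement_noise = np.array([[0.5]])

def process_model(x, dt):
    return x                      # random walk: motion is pure process noise

def measurement_model(x):
    return np.array([x[0]**3])    # nonlinear observation of the state

true_state = 1.5
for k in range(20):
    pf.predict(process_model, process_noise, dt=1.0)
    z = np.array([true_state**3 + np.random.normal(0, 0.5)])
    pf.update(z, measurement_model, measurement_noise)

mean, cov = pf.get_estimate()
print("estimated state:", mean, "true state:", true_state)
```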
Particle Filter Applications
Robot Localization in Unknown Environments:
```python
class RobotParticleFilter:
    def __init__(self, num_particles, map_data):
        self.pf = ParticleFilter(num_particles, 3)  # [x, y, θ]
        self.map = map_data

    def motion_model(self, particle, control, dt):
        """Robot motion with noise"""
        x, y, theta = particle
        v, w = control
        # Add motion noise
        v_noise = v + np.random.normal(0, 0.1)
        w_noise = w + np.random.normal(0, 0.05)
        return np.array([
            x + v_noise * np.cos(theta) * dt,
            y + v_noise * np.sin(theta) * dt,
            theta + w_noise * dt
        ])

    def sensor_model(self, particle, laser_scan):
        """Likelihood of laser scan given particle pose"""
        # Ray-cast from particle pose through map
        # Compare predicted scan with actual scan
        # Return likelihood score
        pass
```
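The `sensor_model` above is left as a stub. A minimal sketch of one common choice, comparing a few measured beam ranges with expected ranges under an independent-Gaussian error model, might look like the following. The `ray_cast_fn` argument stands in for a map-dependent ray-casting routine that is not implemented here, and the beam error `sigma` is an assumed value.

```python
import numpy as np

def laser_likelihood(particle, laser_scan, beam_angles, ray_cast_fn, sigma=0.2):
    """Sketch of a Gaussian beam-error likelihood for a laser scan."""
    likelihood = 1.0
    for angle, measured_range in zip(beam_angles, laser_scan):
        expected_range = ray_cast_fn(particle, angle)  # expected range from the map
        error = measured_range - expected_range
        likelihood *= np.exp(-0.5 * (error / sigma) ** 2)  # independent Gaussian error per beam
    return likelihood
```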
Particle Filter Advantages
- No distributional assumptions: Can handle arbitrary probability distributions
- Handles multimodality: Multiple hypotheses represented naturally
- Robust to nonlinearity: No linearization required
- Conceptually simple: Easy to understand and implement
- Flexible: Easy to incorporate complex noise models
Particle Filter Limitations
- Computational cost: O(N) where N can be large (thousands of particles)
- Particle depletion: All particles can collapse onto a single mode, losing diversity
- Curse of dimensionality: Exponentially more particles needed for high dimensions
- Parameter tuning: Number of particles and resampling thresholds need tuning
Comparison and Selection Criteria
| Method | Computational Cost | Nonlinearity Handling | Distributional Assumptions | Best Use Cases |
|---|---|---|---|---|
| Linear KF | O(n³) | None | Linear-Gaussian | Linear systems, sensor fusion |
| EKF | O(n³) | Linearization | Gaussian | Mildly nonlinear, real-time systems |
| UKF | O(n³) | Sigma points | Gaussian | Moderate nonlinearity, no derivatives |
| Particle Filter | O(N⋅n²) | Monte Carlo | None | Highly nonlinear, non-Gaussian |
Selection Guidelines
Choose Linear Kalman Filter when:
- System is truly linear (or very close)
- Gaussian noise assumptions hold
- Maximum computational efficiency needed
Choose Extended Kalman Filter when:
- Mildly nonlinear system
- Jacobians are easy to compute
- Real-time constraints are tight
- Well-understood system dynamics
Choose Unscented Kalman Filter when:
- Moderate nonlinearities present
- Jacobians are difficult/expensive to compute
- Better accuracy needed than EKF
- System remains approximately Gaussian
Choose Particle Filter when:
- Highly nonlinear system
- Non-Gaussian noise or multi-modal distributions
- Computational resources available
- Robustness more important than efficiency
Implementation Best Practices
Hybrid Approaches
Multiple Model Filtering:
```python
class MultipleModelFilter:
    def __init__(self, models):
        self.models = models  # List of different KF variants
        self.model_probabilities = np.ones(len(models)) / len(models)

    def update(self, measurement):
        # Run each model
        for model in self.models:
            model.predict()
            model.update(measurement)
        # Update model probabilities based on innovation likelihoods
        # Choose best model or blend results
```
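One way to fill in the probability-update placeholder above is to weight each model by the Gaussian likelihood of its innovation. This is a sketch only, and it assumes each model exposes its latest innovation `y` and innovation covariance `S` after its update step (attribute names are assumptions, not part of the classes above).

```python
import numpy as np
from scipy.stats import multivariate_normal

def update_model_probabilities(models, probabilities):
    """Sketch: reweight model probabilities by innovation likelihoods."""
    likelihoods = np.array([
        multivariate_normal.pdf(m.y, mean=np.zeros(len(m.y)), cov=m.S)
        for m in models
    ])
    probabilities = probabilities * likelihoods
    return probabilities / np.sum(probabilities)
```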
Switching Between Methods:
- Use EKF for normal operation
- Switch to UKF when nonlinearity increases
- Fall back to particle filter for extreme cases
Numerical Considerations
- Square Root Filtering: Use Cholesky decomposition to maintain positive definiteness
- Joseph Form Updates: Ensure numerical stability in covariance updates (see the sketch after this list)
- Regularization: Add small values to diagonal elements to prevent singular matrices
- Scaling: Normalize state variables to similar magnitudes
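As a concrete example of the Joseph-form point, the covariance update can be written so that it stays symmetric and positive semi-definite even with a slightly suboptimal gain (a minimal sketch, not tied to any specific filter above):

```python
import numpy as np

def joseph_update(P_pred, K, H, R):
    """Joseph-form covariance update: (I - KH) P (I - KH)^T + K R K^T."""
    I = np.eye(P_pred.shape[0])
    A = I - K @ H
    return A @ P_pred @ A.T + K @ R @ K.T
```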
Key Takeaways
- No Universal Solution: Each method has trade-offs between accuracy, computational cost, and robustness
- Progressive Complexity: Start with the simplest method that meets requirements
- Application-Dependent: The choice depends on system nonlinearity, noise characteristics, and computational constraints
- Hybrid Approaches: Combining methods can leverage individual strengths
- Validation Critical: Extensive testing is needed to verify performance in real conditions
Looking Forward
The nonlinear extensions we’ve explored represent the current state-of-the-art in recursive filtering. Our final post will examine advanced topics and future directions – including modern machine learning integration, distributed filtering, and emerging applications.
The evolution from linear Kalman filters to sophisticated nonlinear estimators mirrors the broader trajectory of engineering: starting with elegant mathematical foundations and extending them to handle real-world complexity while preserving practical utility.
Continue to Part 8: Advanced Topics and Future Directions