xbeta
Kalman filter

The Kalman Filter provides an efficient recursive estimator for the unobserved state of a linear discrete time dynamical system in the presence of measurement error. Kalman (1960) first introduced the method in the Engineering literature, but it can be understood in the context of Bayesian inference?.

Model

Let y t denote a vector of observed variables at time t and let s t denote the unobserved state variables of the system at time t. We wish to conduct inference about the state variables given only the observed data {y t} and the structure of a linear model consisting of a measurement equation and a transition equation.

The evolution of the observed variable depends on the state variables through a linear measurement equation

(1)y t=Fs t+ε t,ε tN(0,Ω ε).y_t = F s_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \Omega_\varepsilon).

The variable y t is observed with measurement error which follows the Normal distribution with mean zero and covariance matrix Ω ε.

The state vector s t obeys the transition equation

(2)s t=Gs t1+η t,η tN(0,Ω η)s_t = G s_{t-1} + \eta_t, \quad \eta_t \sim N(0, \Omega_\eta)

where G and Ω η are known matrices and η t captures the influence of effects that are outside the model on the state transition process. The noise terms ε t and η t are independent. In general G and F can be time-dependent but for the sake of simplicity the time subscripts are omitted here.

The Kalman Filter is similar in nature to the standard linear regression model. The state of the process s t corresponds to the regression coefficients, however the state is not constant over time, requiring the introduction of the transition equation.

Bayesian Interpretation

Let Y t=(y t,y t1,,y 1) denote the complete history of observed data at time t. Our goal is to obtain the posterior distribution? of s t given Y t. We know from Bayes' theorem that

(3)Pr(s tY t) =Pr(y ts t,Y t1)Pr(s tY t1)Pr(y tY t1) Pr(y ts t,Y t1)Pr(s tY t1).\begin{split} \Pr(s_t \vert Y_t) &= \frac{\Pr(y_t \vert s_t, Y_{t-1})\Pr(s_t \vert Y_{t-1})}{\Pr(y_t \vert Y_{t-1})} \\ &\propto \Pr(y_t \vert s_t, Y_{t-1}) \Pr(s_t \vert Y_{t-1}). \end{split}

The left-hand side is the posterior distribution? of s t. On the second line, the first term is the likelihood? of s t and the second term is the prior distribution? of s t. This equation defines a recursive Bayesian updating? relationship.

At time t1, our knowledge of the system is summarized by the the posterior distribution

(4)s t1Y t1N(s^ t1,Σ t1)s_{t-1} \vert Y_{t-1} \sim N(\hat s_{t-1}, \Sigma_{t-1})

where s^ t1 is our previous estimate about the mean of s t1. This process is initialized at time 0 by specifying s^ 0 and Σ 0.

Before observing y t, our best prediction of s t comes from (2), namely Gs t1+η t. However, combining this with (4), we have

(5)s tY t1N(Gs^ t1,R t),s_t \vert Y_{t-1} \sim N\left(G \hat s_{t-1}, R_t \right),

where

(6)R tGΣ t1G +Ω η.R_t \equiv G \Sigma_{t-1} G^\top + \Omega_\eta.

This follows directly from the properties of the multivariate Normal distribution.

After observing y t, we can update our knowledge about s t using the likelihood? Pr(y ts t,Y t1). Let e t denote the error in predicting y t,

(7)e ty ty^ t=y tFGs^ t1.e_t \equiv y_t - \hat y_t = y_t - F G \hat s_{t-1}.

Observing e t is equivalent to observing y t since F, G, and s t1 are all known. Thus, (3) becomes

(8)Pr(s ty t,Y t1)=Pr(s te t,Y t1)Pr(e ts t,Y t1)Pr(s tY t1).\Pr(s_t \vert y_t, Y_{t-1}) = \Pr(s_t \vert e_t, Y_{t-1}) \propto \Pr(e_t \vert s_t, Y_{t-1}) \Pr(s_t \vert Y_{t-1}).

Now, using the measurement equation (1), we can write e t=F(s tGs^ t1)+ε t, and therefore

(9)e ts t,Y t1N(F(s tGs^ t1),Ω ε).e_t \vert s_t, Y_{t-1} \sim N\left( F(s_t - G \hat s_{t-1}), \Omega_\varepsilon \right).

Now, from Bayes' theorem the posterior distribution of s t satisfies

(10)Pr(s ty t,Y t1)=Pr(e ts t,Y t1)Pr(s tY t1)Pr(e t,s tY t1)ds t.\Pr(s_t \vert y_t, Y_{t-1}) = \frac{\Pr(e_t \vert s_t, Y_{t-1})\Pr(s_t \vert Y_{t-1})}{\int \Pr(e_t, s_t \vert Y_{t-1})\,d\s_t}.

Once this probability is computed, we can perform another iteration of the recursion by going back to (3).

Calculating the Posterior Distribution

We can calculate the posterior distribution? (10) directly by appealing to the properties of the Normal distribution. Note that

(11)(s t e t)Y t1N[(Gs t1 0),(R R tF FR t Ω ε+FR tF )].\begin{pmatrix} s_t \\ e_t \end{pmatrix} \vert Y_{t-1} \sim N\left[ \begin{pmatrix} G s_{t-1} \\ 0 \end{pmatrix},\, \begin{pmatrix} R & R_t F^\top \\ F R_t & \Omega_\varepsilon + F R_t F^\top \end{pmatrix} \right].

where R t is given by (6). Conditional on e t, the distribution of s t is

(12)s te t,Y t1N[Gs^ t1R tF (Ω ε+FR tF ) 1e t,R tR tF (Ω ε+FR tF ) 1FR t].s_t \vert e_t, Y_{t-1} \sim N\left[ G \hat s_{t-1} R_t F^\top (\Omega_\varepsilon + F R_t F^\top )^{-1} e_t,\, R_t - R_t F^\top ( \Omega_\varepsilon + F R_t F^\top)^{-1} F R_t \right].

To summarize, the posterior distribution of s t was be calculated recursively by first choosing initial values for s 0 and Σ 0. Then at each period t, given the posterior? of s t1, with mean s^ t1 and covariance matrix Σ t1 as in (4), we form a prior for s t with mean Gs^ t1 and variance R t=GΣ t1G +Ω η as in (5), evaluate the likelihood in (9) given e t=y tFGs^ t1, and then arrive at the posterior in time t given by (12).

Algorithm

Using the theoretical derivation as a guide, we can implement the Kalman Filter as a recursive algorithm. Given initialization values s 0 and Σ 0, at time t,

  1. The posterior distribution at time t1 is Normal with mean s^ t1 and covariance matrix Σ t1.

  2. Form the covariance matrix of the prior distribution,

    R t=GΣ t1G +Ω η;R_t = G \Sigma_{t-1} G^\top + \Omega_\eta;
  3. Calculate the mean of the posterior,

    s^ t=Gs^ t1+R tF (Ω ε+FR tF ) 1e t;\hat s_t = G \hat s_{t-1} + R_t F^\top(\Omega_\varepsilon + F R_t F^\top)^{-1} e_t;
  4. Calculate the variance of the posterior:

    Σ t=R tR tF (Ω ε+FR tF ) 1FR t.\Sigma_t = R_t - R_t F^\top(\Omega_\varepsilon + F R_t F^\top)^{-1} F R_t.

References

Articles

  • The Kalman Filter by Greg Welsh and Gary Bishop, Dept. of Computer Science, University of North Carolina at Chapel Hill

  • Kalman filter Wikipedia entry

category: Statistics