Orthogonal Matching Pursuit

We are modeling a signal \(y \in \RR^M\) in a dictionary \(\Phi \in \RR^{M \times N}\) consisting of \(N\) atoms as \(y = \Phi x + r\) where \(r\) is the approximation error. Our objective is to construct a sparse model \(x \in \RR^N\). \(\Lambda = \supp(x)\) is the set of indices \(i\) for which \(x_i\) is non-zero. \(K = \| x \|_0 = | \supp(x) |\) is the so-called \(\ell_0\)-“norm” of \(x\), i.e. the number of non-zero entries in \(x\).

A sparse recovery or approximation algorithm need not provide the full vector \(x\). It can provide the positions of non-zero entries \(\Lambda\) and corresponding values \(x_{\Lambda}\) requiring \(2K\) units of storage where \(x_{\Lambda} \in \RR^{K}\) consists of entries from \(x\) indexed by \(\Lambda\). \(\Phi_{\Lambda}\) denotes the submatrix constructed by picking columns indexed by \(\Lambda\).
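As a concrete illustration, here is a minimal NumPy sketch of this compact representation; the sizes \(M\), \(N\), \(K\) and the random dictionary are arbitrary choices made only for the example.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, K = 64, 256, 4                  # arbitrary sizes for the example

    # Dictionary with unit-norm atoms (columns)
    Phi = rng.standard_normal((M, N))
    Phi /= np.linalg.norm(Phi, axis=0)

    # A K-sparse vector x and the signal y = Phi x
    x = np.zeros(N)
    support = rng.choice(N, size=K, replace=False)
    x[support] = rng.standard_normal(K)
    y = Phi @ x

    # Compact storage: the support Lambda and the K values x_Lambda
    Lambda = np.sort(support)
    x_Lambda = x[Lambda]                  # 2K numbers instead of N
    Phi_Lambda = Phi[:, Lambda]           # M x K submatrix, obtained by indexing
    assert np.allclose(Phi_Lambda @ x_Lambda, y)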

Orthogonal Matching Pursuit is presented below.

[Figure: the Orthogonal Matching Pursuit algorithm]

OMP builds the support incrementally. In each iteration, one more atom is added to the support set for \(y\). We terminate the algorithm either after a fixed number of iterations \(K\) or when the norm of the residual \(\| y - \Phi x \|_2\) falls below a specified threshold.
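Since only the figure is referenced above, here is a minimal NumPy sketch of the OMP loop, assuming unit-norm atoms. It solves the least squares step with a full ``lstsq`` call per iteration, whereas the QR and Cholesky updates analyzed below avoid this repeated solve; the function name ``omp`` is ours.

.. code-block:: python

    import numpy as np

    def omp(Phi, y, K, tol=1e-12):
        """Minimal OMP sketch: greedy atom selection plus a full least
        squares solve per iteration."""
        M, N = Phi.shape
        Lambda = []                        # support, grown by one atom per iteration
        x = np.zeros(N)
        x_sub = np.zeros(0)
        r = y.copy()                       # residual r^0 = y
        for k in range(1, K + 1):
            # Matching: correlate the residual with all atoms.
            h = Phi.T @ r
            # Identification: atom with the largest absolute correlation.
            Lambda.append(int(np.argmax(np.abs(h))))
            # Least squares over the current support (replaced by QR or
            # Cholesky updates in the analysis below).
            x_sub, *_ = np.linalg.lstsq(Phi[:, Lambda], y, rcond=None)
            # Approximation and residual update.
            r = y - Phi[:, Lambda] @ x_sub
            if np.linalg.norm(r) <= tol:   # early termination on a small residual
                break
        x[Lambda] = x_sub
        return x, Lambda, r

With the dictionary and signal from the previous snippet, ``omp(Phi, y, K)`` would typically recover the planted support.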

The following analysis assumes that the main loop of OMP runs for \(K\) iterations. The iteration counter \(k\) varies from \(1\) to \(K\) and is incremented at the beginning of each iteration. Note that \(K \leq M\).

The matching step requires the multiplication of \(\Phi^T \in \RR^{N \times M}\) with \(r^{k-1}\in \RR^{M}\) (the residual after \(k-1\) iterations). This requires at most \(2MN\) flops. OMP has the property that the residual after the \(k\)-th iteration is orthogonal to the space spanned by the atoms selected up to the \(k\)-th iteration \(\{\phi_{\lambda_1},\dots, \phi_{\lambda_k}\}\). Thus, the inner products of these atoms with the residual are zero and we can safely skip these columns. This reduces the flop count to \(2M(N-k+1)\).

The identification step requires \(2N\) flops: \(N\) flops for taking absolute values and \(N\) flops for finding the maximum.
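A small sketch of the matching and identification steps that masks out the already selected atoms before taking the maximum; the reduced count \(2M(N-k+1)\) assumes the selected columns are actually skipped in the product, masking is just the simplest way to express the idea.

.. code-block:: python

    import numpy as np

    def match_and_identify(Phi, r, Lambda):
        """Return the index of the next atom to add to the support."""
        h = Phi.T @ r            # matching: at most 2MN flops; skipping the
                                 # columns in Lambda would give 2M(N - k + 1)
        h = np.abs(h)            # N flops
        h[Lambda] = 0.0          # correlations of selected atoms are zero anyway
        return int(np.argmax(h)) # N flops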

\(\Lambda\) is easily implemented as an array whose current length is indicated by the iteration counter \(k\). A sufficiently large array (of size \(M\)) can be allocated in advance for maintaining \(\Lambda\). Thus, the support update operation requires a single flop and we will ignore it. \(\Lambda^{k}\) contains \(k\) indices.

While the algorithm shows the full sparse vector \(x\), in practice we only need to reserve space for \(x_{\Lambda}\), an array with a maximum size of \(M\) that can be preallocated. \(\Phi_{\Lambda^{k}}\) need not be stored separately; it can be obtained from \(\Phi\) by appropriate indexing operations. Its size is \(M \times k\).

Let’s skip the least squares step for updating representation for the moment.

Once \(x^{k}_{\Lambda^{k}}\) has been computed, computing the approximation \(y^{k}\) takes \(2Mk\) flops.

Updating the residual \(r^{k}\) takes \(M\) flops as both \(y\) and \(y^{k}\) belong to \(\RR^{M}\). Updating iteration counter takes 1 flop and can be ignored.

Least Squares through QR Update

Let’s come back to the least squares step. Assume that \(\Phi_{\Lambda^{k-1}}\) has a QR decomposition \(Q_{k-1} R_{k-1}\). Adding \(\phi_{\lambda^{k}}\) to form \(\Phi_{\Lambda^k}\) requires updating the QR decomposition to \(Q_{k} R_{k}\). Computing and subtracting the projection of \(\phi_{\lambda^{k}}\) onto each orthonormal column of \(Q_{k-1}\) requires \(4M-1\) flops, and this loop runs \(k-1\) times. Computing the norm of the remainder and dividing by it requires \(3M+1\) flops. The whole QR update step therefore requires \((k-1)(4M-1) + 3M + 1\) flops. We are assuming that enough space has been preallocated to maintain \(Q_k\) and \(R_k\). Solving the least squares problem requires the additional steps of updating the projection \(d = Q^T y\) (\(2M\) flops for the new entry in \(d\)) and solving \(R x = d\) by back substitution (\(k^2\) flops). Thus, the QR update based least squares solution requires \((k-1)(4M-1) + 3M + 1 + 2M + k^2\) flops.
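A minimal sketch of this QR update based least squares step, assuming \(Q\) holds the \(k-1\) orthonormal columns, \(R\) the corresponding upper triangular factor, and \(d = Q^T y\) from the previous iteration; the function name and calling convention are ours.

.. code-block:: python

    import numpy as np
    from scipy.linalg import solve_triangular

    def qr_update_and_solve(Q, R, d, phi_new, y):
        """Append one atom to the factorization Phi_Lambda = Q R and solve
        the least squares problem min ||y - Phi_Lambda x||_2.

        Q : (M, k-1) orthonormal columns, R : (k-1, k-1) upper triangular,
        d : (k-1,) vector Q^T y from the previous iteration.
        """
        # Project the new atom onto the existing orthonormal columns and
        # subtract the projections: (k-1)(4M-1) flops.
        proj = Q.T @ phi_new
        q = phi_new - Q @ proj
        # Normalize the remainder: 3M + 1 flops.
        norm_q = np.linalg.norm(q)
        q = q / norm_q
        # Grow Q and R by one column and one row.
        k = R.shape[0] + 1
        Q_new = np.column_stack([Q, q])
        R_new = np.zeros((k, k))
        R_new[:k-1, :k-1] = R
        R_new[:k-1, k-1] = proj
        R_new[k-1, k-1] = norm_q
        # Only the last entry of d = Q^T y is new: 2M flops.
        d_new = np.append(d, q @ y)
        # Back substitution R x = d: about k^2 flops.
        x = solve_triangular(R_new, d_new, lower=False)
        return Q_new, R_new, d_new, x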

The table below summarizes all the steps.

Finally, we can put together the cost of all steps in the main loop of OMP as

\[2M(N-k+1) + 2N + 2Mk + M + (k-1)(4M-1) + 3M + 1 + 2M + k^2.\]

This simplifies to \(4\,M+2\,N-k+4\,M\,k+k^2+2\,M\,N+2\). Summing over \(k \in \{1,\dots, K\}\), we obtain

\[\frac{5\, K}{3} + 2\, K^2\, M + \frac{K^3}{3} + 6\, K\, M + 2\, K\, N + 2\, K\, M\, N.\]

For a specific setting of \(K = \sqrt{M} / 2\) and \(M = N/2\), we get

\[\frac{5\,\sqrt{2}\,\sqrt{N}}{12}+\frac{121\,\sqrt{2}\,N^{3/2}}{96}+\frac{\sqrt{2}\,N^{5/2}}{4}+\frac{N^2}{8} \approx \frac{\sqrt{2}\,N^{5/2}}{4}.\]

In terms of \(M\), it will simplify to:

\[\frac{M^2}{2}+\frac{5\,\sqrt{M}}{6}+\frac{121\,M^{3/2}}{24}+2\,M^{5/2} \approx 2\,M^{5/2}.\]
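These summations and substitutions can be verified symbolically, for instance with SymPy (a verification sketch, not part of the derivation):

.. code-block:: python

    import sympy as sp

    k, K, M, N = sp.symbols('k K M N', positive=True)

    # Per-iteration cost of the OMP main loop with QR update (sum of the
    # step counts listed above).
    per_iter = (2*M*(N - k + 1) + 2*N + 2*M*k + M
                + (k - 1)*(4*M - 1) + 3*M + 1 + 2*M + k**2)

    # Total over k = 1..K; equals
    # 5K/3 + 2K^2 M + K^3/3 + 6KM + 2KN + 2KMN.
    total = sp.expand(sp.summation(per_iter, (k, 1, K)))
    print(total)

    # Specific setting K = sqrt(M)/2, M = N/2, expressed in terms of M;
    # equals M^2/2 + 5 sqrt(M)/6 + 121 M^(3/2)/24 + 2 M^(5/2).
    print(sp.expand(total.subs({K: sp.sqrt(M)/2, N: 2*M})))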

In a typical sparse approximation problem, we have \(K < M \ll N\). Thus, the flop count will be approximately \(2KMN\).

The total flop count of the matching step over all iterations is \(K\, M - K^2\, M + 2\, K\, M\, N\). The total flop count of the least squares step over all iterations is \(\frac{5\, K}{3} + 2\, K^2\, M + \frac{K^3}{3} + 3\, K\, M\). This suggests that the matching step is the dominant step of OMP.

\centering
\caption{Operations in OMP using QR update}
\begin{tabular}{c | c}
\hline
Operation & Flops \\
\hline
\(\Phi^T r\) & \(2M(N - k + 1)\)\\
Identification  & \(2N\) \\
\(y^{k} = \Phi_{\Lambda^{k}} x_{\Lambda^{k}}^{k}\) & \(2Mk\)\\
\(r^k = y - y^k\) & \(M\) \\
QR update & \((k-1)(4M-1) + 3M + 1\) \\
Update \(d = Q_k^T y\) &  \(2M\) \\
Solve \(R_k x = d\) & \(k^2\) \\
\hline
\end{tabular}

Least Squares through Cholesky Update

If the OMP least squares step is computed through Cholesky decomposition, then we maintain the Cholesky decomposition of \(G = \Phi_{\Lambda}^T \Phi_{\Lambda}\) as \(G = L L^T\). Then

\[\begin{split}\begin{aligned} &x = \Phi_{\Lambda}^{\dag} y\\ \iff & x = (\Phi_{\Lambda}^T \Phi_{\Lambda})^{-1} \Phi_{\Lambda}^T y\\ \iff & (\Phi_{\Lambda}^T \Phi_{\Lambda}) x = \Phi_{\Lambda}^T y\\ \iff & LL^T x = \Phi_{\Lambda}^T y = b \end{aligned}\end{split}\]

In each iteration, we need to update \(L^k\), compute \(b = \Phi_{\Lambda}^T y\), solve \(L u = b\) and then solve \(L^T x = u\). Now,

\[\begin{split}\Phi_{\Lambda^k}^T \Phi_{\Lambda^k} = \begin{bmatrix} \Phi_{\Lambda^{k-1}}^T \Phi_{\Lambda^{k-1}} & \Phi_{\Lambda^{k-1}}^T \phi_{\lambda^k}\\ \phi_{\lambda^k}^T \Phi_{\Lambda^{k-1}} & \phi_{\lambda^k}^T \phi_{\lambda^k} \end{bmatrix}.\end{split}\]

Define \(v = \Phi_{\Lambda^{k-1}}^T \phi_{\lambda^k}\). We have

\[\begin{split}G^k = \begin{bmatrix} G^{k - 1} & v \\ v^T & 1 \end{bmatrix}.\end{split}\]

The Cholesky update is given by:

\[\begin{split}L^k = \begin{bmatrix} L^{k - 1} & 0 \\ w^T & \sqrt{1 - w^T w} \end{bmatrix}\end{split}\]

where solving \(L^{k - 1} w = v\) gives us \(w\). For the first iteration, \(L^1 = 1\) since the atoms in \(\Phi\) are normalized.

Computing \(v\) takes \(2M(k-1)\) flops. Computing \(w\) by forward substitution takes \((k-1)^2\) flops. Computing \(\sqrt{1-w^T w}\) takes another \(2k\) flops. Thus, the Cholesky update requires \(2M(k-1) + 2k + (k-1)^2\) flops. Computing \(b = \Phi^T_{\Lambda} y\) then requires only updating the last entry in \(b\), which takes \(2M\) flops. Solving \(LL^T x = b\) requires \(2k^2\) flops.
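A minimal sketch of this Cholesky update based least squares step, assuming unit-norm atoms; \(L\) is the \((k-1)\times(k-1)\) Cholesky factor and \(b = \Phi_{\Lambda^{k-1}}^T y\) from the previous iteration. The function name and calling convention are ours.

.. code-block:: python

    import numpy as np
    from scipy.linalg import solve_triangular

    def cholesky_update_and_solve(L, b, Phi_Lambda_prev, phi_new, y):
        """Append one atom and solve (Phi_Lambda^T Phi_Lambda) x = Phi_Lambda^T y
        via the maintained Cholesky factor G = L L^T.

        L : (k-1, k-1) lower triangular Cholesky factor of G^{k-1},
        b : (k-1,) vector Phi_{Lambda^{k-1}}^T y,
        Phi_Lambda_prev : (M, k-1) previously selected atoms,
        phi_new : (M,) new unit-norm atom.
        """
        # v = Phi_{Lambda^{k-1}}^T phi_new: 2M(k-1) flops.
        v = Phi_Lambda_prev.T @ phi_new
        # Solve L w = v by forward substitution: (k-1)^2 flops.
        w = solve_triangular(L, v, lower=True) if L.size else v
        # Grow the Cholesky factor; the atoms are unit norm, so the new
        # diagonal entry is sqrt(1 - w^T w): about 2k flops.
        k = L.shape[0] + 1
        L_new = np.zeros((k, k))
        L_new[:k-1, :k-1] = L
        L_new[k-1, :k-1] = w
        L_new[k-1, k-1] = np.sqrt(1.0 - w @ w)
        # Only the last entry of b = Phi_Lambda^T y is new: 2M flops.
        b_new = np.append(b, phi_new @ y)
        # Solve L L^T x = b by forward and back substitution: 2k^2 flops.
        u = solve_triangular(L_new, b_new, lower=True)
        x = solve_triangular(L_new.T, u, lower=False)
        return L_new, b_new, x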

\centering
\caption{Operations in OMP using Cholesky update}
\begin{tabular}{c | c}
\hline
Operation & Flops \\
\hline
\(\Phi^T r\) & \(2M(N - k + 1)\)\\
Identification  & \(2N\) \\
\(y^{k} = \Phi_{\Lambda^{k}} x_{\Lambda^{k}}^{k}\) & \(2Mk\)\\
\(r^k = y - y^k\) & \(M\) \\
Cholesky update & \(2M(k-1) + 2k + (k-1)^2\) \\
Update \(b = \Phi^T_{\Lambda} y\) &  \(2M\) \\
Solve \(LL^T x = b\) & \(2k^2\) \\
\hline
\end{tabular}

We can see that for \(k \ll M\), the QR update is around \(4Mk\) flops while the Cholesky update is around \(2Mk\) flops (asymptotically).

The flop count for the main loop of OMP using the Cholesky update is

\[3\,k^2+2\,M\,k+3\,M+2\,N+2\,M\,N+1.\]

Summing over \(k \in [K]\), we get total flop count for OMP as

\[\frac{3\,K}{2}+K^2\,M+\frac{3\,K^2}{2}+K^3+4\,K\,M+2\,K\,N+2\,K\,M\,N.\]

For the specific setting of \(K = \sqrt{M} / 2\) and \(M = N/2\), expressed in terms of \(M\), this simplifies to:

\[\frac{3\,M}{8}+\frac{M^2}{4}+\frac{3\,\sqrt{M}}{4}+\frac{33\,M^{3/2}}{8}+2\,M^{5/2} \approx 2\,M^{5/2}.\]

In a typical sparse approximation problem, we have \(K < M \ll N\). Thus, the flop count will be approximately \(2KMN\), i.e., dominated by the matching step.

The Cholesky update based solution is marginally faster than the QR update based solution; the difference matters mainly for small values of \(M\), since both counts are dominated by the matching step.
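A quick numerical check of this comparison, plugging a few values of \(M\) into the two totals derived above (with \(K = \sqrt{M}/2\) and \(N = 2M\)):

.. code-block:: python

    import numpy as np

    def total_qr(M):
        """Total flops for OMP with QR update at K = sqrt(M)/2, N = 2M."""
        return M**2/2 + 5*np.sqrt(M)/6 + 121*M**1.5/24 + 2*M**2.5

    def total_chol(M):
        """Total flops for OMP with Cholesky update at K = sqrt(M)/2, N = 2M."""
        return 3*M/8 + M**2/4 + 3*np.sqrt(M)/4 + 33*M**1.5/8 + 2*M**2.5

    for M in (64, 256, 1024):
        q, c = total_qr(M), total_chol(M)
        print(f"M={M:5d}  QR={q:12.0f}  Cholesky={c:12.0f}  ratio={q/c:.3f}")

For \(M = 64\) the QR based count is roughly \(2\%\) larger; by \(M = 1024\) the difference drops to well under \(1\%\).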