Linear Algebra

Can we really say nothing about the inverse of $A+B$?

Unfortunately, there is no general formula that simplifies the computation of a matrix inverse $(A+B)^{-1}$ for arbitrary $A$ and $B$. If one of the matrices corresponds to a low-rank update (e.g., $B=CD^T$ for some $C, D\in\mathbb{R}^{n\times k}$ with $k<n$), one can use the Sherman-Morrison-Woodbury (S.M.W.) formula to great effect. In other situations, however, this formula does not simplify the calculations.
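To make the low-rank case concrete, here is a minimal numpy sketch of the S.M.W. route for a rank-$k$ update $B=CD^T$. The matrix sizes and the way $A$ is constructed are made up for illustration, and the sketch assumes $A^{-1}$ is already available.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5

# Invented, safely invertible A and a rank-k update B = C D^T.
A = 10.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
C = rng.standard_normal((n, k))
D = rng.standard_normal((n, k))

A_inv = np.linalg.inv(A)                 # assume A^{-1} was already computed

# Sherman-Morrison-Woodbury:
# (A + C D^T)^{-1} = A^{-1} - A^{-1} C (I_k + D^T A^{-1} C)^{-1} D^T A^{-1}
small = np.eye(k) + D.T @ A_inv @ C      # only a k x k system to solve
smw_inv = A_inv - A_inv @ C @ np.linalg.solve(small, D.T @ A_inv)

# Sanity check against direct inversion of the full n x n matrix.
direct = np.linalg.inv(A + C @ D.T)
print(np.allclose(smw_inv, direct))      # True (up to floating point)
```

The point of the identity is that the only new inversion happens on a $k\times k$ matrix, which is cheap when $k\ll n$.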

But all hope is not lost yet. There is another case where $(A+B)^{-1}$ can be computed relatively efficiently. To understand how and when, we need to know the Neumann Series.

The Neumann Series is a beautiful generalization of the geometric series for scalars. It is well-known (and easy to establish) that the sum
$$
\sum_{k=0}^\infty r^k = \frac{1}{1-r}=(1-r)^{-1}
$$
for any real $r$ with $|r|<1$. Moreover, the rate of convergence gets faster as the magnitude of $r$ gets smaller.

The Neumann Series generalizes the geometric series almost directly:
$$
\sum_{k=0}^{\infty} A^k = (I-A)^{-1}
$$
if and only if $\rho(A)<1$, where $\rho(A)$ is the spectral radius of $A$ (see p. 618 of Carl D. Meyer). One can see that even the condition for convergence of the Neumann series is directly analogous to the condition for scalars. Moreover, and this is the key for this article, the rate of convergence depends on how small $\rho(A)$ is. In this context, we love matrices with small spectral radii.
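As a quick sanity check, the following numpy sketch builds a matrix with $\rho(A)=0.5$ by rescaling a random matrix (the size and the checkpoints are arbitrary) and compares partial sums of the series against $(I-A)^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Rescale a random matrix so that its spectral radius is exactly 0.5.
M = rng.standard_normal((n, n))
A = 0.5 * M / np.max(np.abs(np.linalg.eigvals(M)))

target = np.linalg.inv(np.eye(n) - A)

# Partial sums I + A + A^2 + ... + A^k approach (I - A)^{-1}.
S, P = np.eye(n), np.eye(n)
for k in range(1, 60):
    P = P @ A                      # P now holds A^k
    S = S + P
    if k in (5, 20, 59):
        err = np.linalg.norm(S - target) / np.linalg.norm(target)
        print(k, err)              # error shrinks roughly like rho(A)^k
```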

We’re now in a position to see how and when the Neumann series can be used to compute the inverse of $(A+B)$. Specifically, we have that
$$
(A+B)^{-1}=(I-[-A^{-1}B])^{-1}A^{-1}=\left(\sum_{k=0}^\infty[-A^{-1}B]^k\right)A^{-1}.
$$
As you can see, $A^{-1}$ factors out on the r.h.s.; thus, if we have already computed it, we can re-use it. Moreover, $B^{-1}$ does not even appear on the r.h.s., only $B$ does. But we have a little problem: we need to evaluate an infinite sum.
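Here is a minimal numpy sketch of this formula with the series truncated after a handful of terms; the matrices below are invented so that $\rho(A^{-1}B)$ happens to be small, and the truncation length is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Invented matrices: A is well-conditioned, B is a full-rank but "small" update.
A = 5.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
B = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)

A_inv = np.linalg.inv(A)          # assume this was computed and cached earlier
T = -A_inv @ B                    # the series converges iff rho(T) < 1
print("rho(T) =", np.max(np.abs(np.linalg.eigvals(T))))

# (A + B)^{-1}  ~=  (I + T + T^2 + ... + T^m) A^{-1}
S, P = np.eye(n), np.eye(n)
for _ in range(10):               # m = 10 terms, chosen arbitrarily
    P = P @ T
    S = S + P
approx = S @ A_inv

exact = np.linalg.inv(A + B)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```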

Enter the importance of a small spectral radius. If $A^{-1}B$ has a small spectral radius, then the sum above converges fast. But even that's not enough for computational efficiency, because we have matrix multiplications of the form $(A^{-1}B)^k$. In general, matrix multiplication is as computationally demanding as inversion (p. 120); therefore, the formula above will not, in general, yield computationally efficient results even if the sum converges relatively fast.

However, we are not done yet; all hope is not lost. In fact, there is a whole field of applied linear algebra that focuses on when and how one can ensure fast convergence and efficient operation. There is still no formula that works for all cases, but, with sufficient knowledge, one can recognize when it's worth using the series approach above and when it's computationally advantageous to do so. Below we'll give a bite-sized introduction, but before we do so, it's worth contrasting the computation of $(A+B)^{-1}$ via the S.M.W. formula with that via the Neumann Series. The S.M.W. formula is useful when the update term (e.g., $B$) is of low rank; the Neumann Series does not require that, and our update matrix can very well be full-rank. Instead, the Neumann Series requires the matrix $A^{-1}B$ to have a small spectral radius: it must be less than 1, and the smaller the better. While an arbitrary matrix clearly cannot be expected to have a small spectral radius, there are ways of addressing this situation: preconditioning.

The Neumann Series formula is particularly used for solving a linear system $Ax=b$ with a class of methods called Linear Stationary Iterations. These methods first re-write the matrix $A$ via a splitting, $$A=M-N$$ (this is where the sum, or rather the difference, enters) such that $M^{-1}$ exists. Then, a matrix $H=M^{-1}N$ is defined, and a vector $d=M^{-1}b$ is obtained.
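To see why this re-writing preserves the solution, it is worth spelling out the chain of equivalences that follows directly from the definitions above:
$$
Ax=b \iff (M-N)x=b \iff Mx=Nx+b \iff x=M^{-1}Nx+M^{-1}b=Hx+d \iff (I-H)x=d.
$$
In practice one does not form $(I-H)^{-1}$ explicitly; the same series is realized as the fixed-point iteration $x_{k+1}=Hx_k+d$.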

It turns out that, if $\rho(H)<1$, the solution of $(I-H)x=d$, and hence of our original system $A_{n\times n}x=b$, can be computed efficiently. Clearly, we care about this re-writing just because the former system can be solved much faster than the latter, and by now it shouldn't be difficult to see that this is possible to the degree that $\rho(H)$ is small, because, not too surprisingly, the solution is carried out by invoking the Neumann series on $(I-H)$, as
$$
(I-H)^{-1} = I + H + H^2 + \dots
$$
and the sum above converges faster when the spectral radius of $H$ is smaller. Of course, the bigger question is how to find splittings $M-N$ such that convergence is guaranteed and is reached as fast as possible. This is where we transition to the domain of applied math, or even engineering, because there is no method that guarantees either of these under general conditions. Some methods listed by Carl D. Meyer are:

  • Jacobi’s method
  • Gauss-Seidel Method
  • Successive Over-Relaxation (SOR)

Each of these methods has a different approach to constructing the splitting. Overall, the SOR method is probably the better performer (and it has many variations), but there is no method that is universally better than the others (e.g., see Exercise 7.10.7 in Carl D. Meyer).
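As an illustration of the splitting idea, here is a minimal numpy sketch of the Jacobi iteration, which takes $M$ to be the diagonal of $A$. The test system is fabricated to be strictly diagonally dominant, which guarantees $\rho(H)<1$ for Jacobi; the sizes and iteration count are arbitrary.

```python
import numpy as np

def jacobi(A, b, iters=100):
    """Jacobi iteration: the splitting A = M - N with M = diag(A)."""
    M_inv = 1.0 / np.diag(A)           # M^{-1} for the diagonal splitting
    N = np.diag(np.diag(A)) - A        # N = M - A
    H = M_inv[:, None] * N             # H = M^{-1} N
    d = M_inv * b                      # d = M^{-1} b
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = H @ x + d                  # x_{k+1} = H x_k + d
    return x

# Fabricated, strictly diagonally dominant system.
rng = np.random.default_rng(3)
n = 200
A = rng.standard_normal((n, n)) + 2.0 * n * np.eye(n)
b = rng.standard_normal(n)

x = jacobi(A, b)
print(np.linalg.norm(A @ x - b))       # residual; should be tiny
```

Note that the loop body touches $A$ only through $H$ and $d$, and each entry of the new iterate can be computed independently of the others, which is exactly what makes this family attractive for parallel hardware.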

Frankly, if the system you solve is of small size $n$ (what counts as small $n$ depends on the technology of the day) or you don't need real-time computation, you'll never need to worry about any of these and you'll just use a standard method on $Ax=b$ (e.g., Householder transformations, Gaussian elimination with pivoting, etc.). However, if $n$ is very large or you need the computation really fast, you'll probably be having a look at these methods or at Krylov methods. Also, some of these methods lend themselves well to parallelization (on GPUs, etc.), because, while traditional methods need to process the entire matrix, Linear Stationary Iteration methods such as Jacobi's method can update each unknown independently (see p. 622 of Carl D. Meyer).