Demystifying Schur Inversion

In many textbooks of linear algebra you can see that the Schur complement (see below) can be used to invert a non-singular matrix as $$\begin{pmatrix}\mathbf A & \mathbf C\\ \mathbf R & \mathbf B \end{pmatrix}^{-1}=\begin{pmatrix}\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{CS}^{-1}\mathbf{RA}^{-1} & -\mathbf{A}^{-1}\mathbf{CS}^{-1} \\ \mathbf{-S}^{-1}\mathbf{RA}^{-1}& \mathbf{S}^{-1}\end{pmatrix},$$ where $$\mathbf{S=B-RA}^{-1}\mathbf C$$ is the Schur complement. (We are assuming that the blocks $\mathbf A$ and $\mathbf S$ are both nonsingular.)

This formula looks somewhat ugly and, while it is easy to verify (by direct multiplication) that this is indeed the inverse, it is not so obvious how we arrived at the equation above in the first place. However, it is actually very simple to arrive at this formula ourselves if we have figured out a few basics about inversion and about the point of block-partitioning. I claim that, once you read this post, arriving at the Schur decomposition will be straightforward for you.

In particular, there are three ingredients that we must have made our second nature. First, it is very easy to compute the inverse of an upper or lower triangular matrix with as diagonal blocks $\mathbf I$. Second, the general mechanism of Gauss elimination generalizes to block partitioned matrices (see below), thus we can arrive at our beloved block triangular forms that we are simple to invert. The third is the most elementary of all: $(\mathbf{AB})^{-1}=\mathbf{B}^{-1}\mathbf{A}^{-1}$.

The first of these points is trivial to see; a matrix inversion doesn’t get simpler than this: $$\begin{pmatrix}\mathbf I & \mathbf 0 \\ \mathbf L & \mathbf I\end{pmatrix}^{-1}=\begin{pmatrix}\mathbf I & \mathbf 0 \\ -\mathbf L & \mathbf I\end{pmatrix}.$$ An analogous identity can be shown for upper triangular matrix as well (and we’ll be using it below).
The second point, while surely no rocket science, is not as widely known (I think) as it should, and is clearly the key in arriving at the formula that we showed in the beginning. Anyone who took any linear algebra course probably learned that Gauss elimination on a matrix $$\begin{pmatrix} a & b \\ c & d\end{pmatrix}$$ with $a\neq 0$ leads to $$\begin{pmatrix} 1 & 0 \\ c/a & 1\end{pmatrix} \begin{pmatrix} a & b \\ 0 & d-cb/a\end{pmatrix},$$ which is the LU factorization. But did you know that you can directly extend this to block-matrices? That is, a matrix $$\begin{pmatrix} \mathbf A & \mathbf B \\ \mathbf{C} & \mathbf D\end{pmatrix}$$ with nonsingular $\mathbf A$ can be factored as $$\begin{pmatrix} \mathbf A & \mathbf B \\ \mathbf{C} & \mathbf D\end{pmatrix} = \begin{pmatrix}\mathbf I& \mathbf 0 \\ \mathbf{CA}^{-1} & \mathbf I\end{pmatrix}\begin{pmatrix}\mathbf A & \mathbf B \\ \mathbf 0 & \mathbf D-\mathbf {CA}^{-1}\mathbf B \end{pmatrix}.$$ Note that even the assumption (i.e., $\mathbf A^{-1}$ exists) is the same.

Now you have all the ingredients that you need to arrive at Schur’s inversion lemma — the rest is very simple! All you need to do is take the decomposition that we already started above one step further, and decompose the right-hand-side as $$\begin{pmatrix}\mathbf A & \mathbf B \\ \mathbf 0 & \mathbf D-\mathbf {CA}^{-1}\mathbf B \end{pmatrix} = \begin{pmatrix}\mathbf A & \mathbf B \\ \mathbf 0 & \mathbf S \end{pmatrix} = \begin{pmatrix}\mathbf A & \mathbf 0 \\ \mathbf 0 & \mathbf S\end{pmatrix} \begin{pmatrix}\mathbf I & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf 0 & \mathbf I\end{pmatrix}.$$ In other words, we have shown that $$\begin{pmatrix} \mathbf A & \mathbf B \\ \mathbf{C} & \mathbf D\end{pmatrix} = \begin{pmatrix}\mathbf I& \mathbf 0 \\ \mathbf{CA}^{-1} & \mathbf I\end{pmatrix}\begin{pmatrix}\mathbf A & \mathbf 0 \\ \mathbf 0 & \mathbf S\end{pmatrix}\begin{pmatrix}\mathbf I & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf 0 & \mathbf I\end{pmatrix}.$$ Every matrix on the right-hand-side is simple to invert, and by inverting each of these three matrices and using the third ingredient above, we arrive at the Schur inversion formula.

Why should you have few distinct eigenvalues

Demystifying the Sherman-Morrison-Woodbury formula

Why should you have few distinct eigenvalues

Demystifying the Sherman-Morrison-Woodbury formula

You May Also Like

State of the art in 3D face reconstruction may be wrong

3DI: Face Reconstruction via Inequality Constraints

SyncRef: Fast & Scalable Way to Find Synchronized Time Series