A matrix inversion formula that frequently appears in machine learning, linear algebra and other textbooks is the Sherman-Morrison-Woodbury formula, according to which a matrix sum can be inverted as

(A+CD^T)^{-1}=A^{-1}-A^{-1}C(I+D^TA^{-1}C)^{-1}D^TA^{-1}.

Your first reaction may be wondering if this formula even simplifies anything. The right-hand-side of this equation looks so complicated that one wonders if it’s not simpler to just use compute the sum $A+CD^T$ and invert it. However, this formula can significantly simplify calculations when the matrices $C, D\in \mathbb{R}^{n\times k}$ are such that $k<n$ and we already computed $A^{-1}$, say, in previous iterations, and $CD^T$ correspond to the update term. It is easiest to see this in the extreme version of the formula, called also the Sherman-Morrison Formula, according to which

(A+cd^T)^{-1} = A^{-1}-\frac{A^{-1}cd^TA^{-1}}{1+d^TA^{-1}c}.

In this case, $k=1$ and it’s clear that if we already computed $A^{-1}$, the right-hand-side is computationally very simple to obtain, whereas if we were to compute this inversion using the left-hand-side, we’d be computing the inverse of an $n\times n$ matrix from scratch. The formulae above are also important for theoretical purposes, as they allow us to investigate what the effect that perturbation on some matrix coefficients has on the inverse of the matrix, which is useful for sensitivity analysis of a system (see p125 of Carl D. Meyer).

Now that we’ve hopefully convinced you about the importance of this formula, let’s go back to the title: how the heck has it been arrived at?

If you’re familiar with the Schur component and how it can be used to invert a matrix (see, e.g., this post), then the Sherman-Morrison-Woodbury formula starts to be demystified. Specifically, the
\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{C}(\mathbf{B-RA}^{-1}\mathbf C)^{-1}\mathbf{RA}^{-1}

part that appears as the invert of the top-left of a matrix

\begin{pmatrix}\mathbf A & \mathbf C\\ \mathbf R & \mathbf B \end{pmatrix}.

In light of the above, it’s not difficult to see that if we apply the Schur inversion lemma and invert the matrix
$$\begin{pmatrix} A & C \\ D^T & -I\end{pmatrix}, \label{eq:1}$$
the top-left block of the inverse will be precisely the right-hand-side of the S.M.W. formula
$$A^{-1}-A^{-1}C(I+D^TA^{-1}C)^{-1}D^TA^{-1}$$
This is the biggest step towards the demystification–recognizing the pattern of the S.M.W. formula and matching it with the Schur inversion lemma. But we are not done yet.

The next important step is to left- and right-multiply the matrix that we listed above with some matrices,
$$\begin{pmatrix}* & * \\ * & * \end{pmatrix} \begin{pmatrix} A & C \\ D^T & -I\end{pmatrix}\begin{pmatrix}* & * \\ * & * \end{pmatrix}$$
such that the result of the multiplication above results in a matrix
* whose top-left block of size $n\times$ is exactly the l.h.s. of the S.M.W. formula, i.e., $A+CD^T$
* that is non-singular.

The first of these criteria suggest that the following should work:
$$\begin{pmatrix}* & * \\ * & * \end{pmatrix} \begin{pmatrix} A & C \\ D^T & -I\end{pmatrix}\begin{pmatrix}I & * \\ D^T & * \end{pmatrix}$$
The second of these criteria suggest each of the three matrices in this multiplication should be non-singular. Thus, we can just put

\begin{pmatrix}* & * \\ * & * \end{pmatrix}
\begin{pmatrix}
A & C \\ D^T & -I\end{pmatrix}\begin{pmatrix}I & 0 \\ D^T & I \end{pmatrix}.

Since all we are trying to do with the multiplication is to obtain a matrix like (see the two criteria above)

\begin{pmatrix}A+CD^T & * \\ * & * \end{pmatrix},

we don’t care much about the other entries as long as they lead to a nonsingular matrix. In light of these, we can populate the left-most matrix on the multiplication as:

\begin{pmatrix}I & * \\ * & I \end{pmatrix}
\begin{pmatrix}
A & C \\ D^T & -I\end{pmatrix}\begin{pmatrix}I & 0 \\ D^T & I \end{pmatrix}=
\begin{pmatrix}I & * \\ * & I \end{pmatrix}
\begin{pmatrix}A+CD^T & C \\ 0 & -I\end{pmatrix}.

The remaining *’s can possibly filled in multiple ways, but let’s go for the values that lead to the simplest non-singular form on the r.h.s., which would be block-diagonal. This would be achieved as below:

\begin{pmatrix}I & C \\ 0 & I \end{pmatrix}
\begin{pmatrix}A+CD^T & C \\ 0 & -I\end{pmatrix}.

And we are done! But let’s re-cap to consolidate we did:

• We identified the similarity between the Schur inversion lemma and the S.M.W. formula
• We realized that if we apply the Schur inversion lemma to the matrix
$$\begin{pmatrix}* & * \\ * & * \end{pmatrix} \begin{pmatrix} A & C \\ D^T & -I\end{pmatrix}\begin{pmatrix}I & * \\ D^T & * \end{pmatrix},$$
the top-left of the inverted matrix would be the r.h.s. of the S.M.W. formula that we seek
• We needed to left- and right-multiply the matrix above with some simple matrices such that the resultant matrix is nonsingular and the top-left block of the matrix is $A +CD^T$