## EBT-III matrices should be a key tool in your inventory

By reading this post you’ll be able to comprehend the basic mechanism behind the proof of LU decomposition Schur Inversion Lemma The Sherman-Morrison-Woodbury inversion formula Small perturbations can’t reduce rank (p216-217 of Meyer) Rank of a matrix equals the rank of its largest nonsingular submatrix (p214 of Meyer; see also Exercise 4.5.15) Characteristic polynomial of $AB$ and $BA$ is the same for square $A,B$ (see Exercise 7.1.19, 6.2.16 and eq. (6.2.1) of Meyer) And many more theorems and lemmas (a few…

## Can we really say nothing about the inverse of A+B?

Unfortunately a general formula that simplifies the calculations for a matrix inversion $(A+B)^{-1}$ with arbitrary $A$ and $B$ does not exist. If one of the matrices corresponds to a low-rank update (e.g., $B=CD^T$ for some $C, D\in\mathbb{R}^{n\times k}$ with $k<n$), one can use the S.M.W. formula to great effect. However, in other situations, this formula would not simplify calculations. But all hope is not lost yet. There is another case where $(A+B)^{-1}$ can be computed relatively efficiently. To understand how and…

## Demystifying the Sherman-Morrison-Woodbury formula

A matrix inversion formula that frequently appears in machine learning, linear algebra and other textbooks is the Sherman-Morrison-Woodbury formula, according to which a matrix sum can be inverted as \begin{equation}(A+CD^T)^{-1}=A^{-1}-A^{-1}C(I+D^TA^{-1}C)^{-1}D^TA^{-1}.\end{equation}Your first reaction may be wondering if this formula even simplifies anything. The right-hand-side of this equation looks so complicated that one wonders if it’s not simpler to just use compute the sum $A+CD^T$ and invert it. However, this formula can significantly simplify calculations when the matrices $C, D\in \mathbb{R}^{n\times k}$ are…

## Demystifying Schur Inversion

In many textbooks of linear algebra you can see that the Schur complement (see below) can be used to invert a non-singular matrix as $$\begin{pmatrix}\mathbf A & \mathbf C\\ \mathbf R & \mathbf B \end{pmatrix}^{-1}=\begin{pmatrix}\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{CS}^{-1}\mathbf{RA}^{-1} & -\mathbf{A}^{-1}\mathbf{CS}^{-1} \\ \mathbf{-S}^{-1}\mathbf{RA}^{-1}& \mathbf{S}^{-1}\end{pmatrix},$$ where $$\mathbf{S=B-RA}^{-1}\mathbf C$$ is the Schur complement. (We are assuming that the blocks $\mathbf A$ and $\mathbf S$ are both nonsingular.) This formula looks somewhat ugly and, while it is easy to verify (by direct multiplication) that this is indeed the…

## Why should you have few distinct eigenvalues

The minimum polynomial initially looks awfully similar to the characteristic polynomial, and it is not clear if learning it will have any practical utility. But it does. In fact, one of the most popular and efficient optimization techniques, namely conjugate gradients, relies on the Krylov sequences, which is build upon the concept of minimum polynomial. First, let’s see the how the characteristic and minimum polynomials are defined. The characteristic polynomial $p(x)$ of a matrix $\mathbf A$ is $$p(x) = \prod\limits_{i=1}^ S(x-\lambda_i)^{a_i},$$…

## Magical generalizations thanks to Eigenvalues

Imagine taking ubiquitous scalar function $f(x)$, like $\exp(x)$ or $\sin (x)$, generalizing it somehow to a matrix function $f(\mathbf A)$, and expecting some of the properties of the scalar function to hold for the matrix function as well. Sounds like a magic trick, right? But generalizations of this kind are possible, and it’s precisely the point of this article to show some of them! This article also strongly supports the statement that eigenvalues are like the chromosomes or genes of a…

## How to compute the “real” rank of a matrix?

If you fill an $n\times n$ matrix with random entries, than you’ll almost surely end up with a full-rank matrix. Also, any matrix that is constructed with real and continuous data (e.g., sensor input) will also be almost surely of full rank even if the underlying should have lead to linearly dependent columns/rows. Further, if we do not use exact arithmetic but, say, floating point arithmetic, our $\mathbf A$ will almost surely be somewhat perturbed, especially if it is a result…

## How to prove SVD ? A recipe approach

To prove the existence of SVD is no trivial task, but it turns out that it’s not too difficult either. Looks like one needs a few ingredients (hence the title), but once we know them and understand the overall idea, the proof is not too difficult. Below we list the basic ingredients needed to prove the existence of SVD. The URV decomposition $\mathbf{A} = \mathbf{URV} = \mathbf{U}\begin{pmatrix}\mathbf C & \mathbf 0 \\ \mathbf 0 & \mathbf 0\end{pmatrix}\mathbf{V}$ $||\mathbf{A}||_2 = ||\mathbf{URV}||_2 =…

## How can we compute basis for Nullspace?

There are at least two answers to this question; one of these is more educative and the other one is at least as educative (in a different and profound way) as well as practical. Method 1 The first method is a more introductory level method. It is helpful to know it and good to read it as a refresher even if one is more advanced student of the topic. A linear system $\mathbf{Ax=b}$ is homogeneous when $\mathbf b=\mathbf 0$ and nonhomogeneous…

## Since we already have SVD, do we need URV factorization?

The SVD factorization is a special case of the URV factorization that has many great properties. The latter decomposes a matrix $A_{m\times n}$ as $$A=URV^T,$$ where $U=(U_1|U_2)$ is an orthonormal matrix such that $U_1$ is a basis for $R(A)$, $U_2$ is a basis for $N(A^T)$; and $V=(V_1|V_2)$ is another orthonormal matrix such that $V_1$ is a basis for $R(A^T)$ and $V_2$ is a basis for $N(A)$. Suppose that the rank of $A$ is $r$, in which case $U_1$ has $r$ columns…