• Some matrix norms are significantly costlier than others. For example, the matrix-2 norm requires computation of eigenvalues, whereas the matrix-1 norm is simply computed by finding the column of the matrix with the largest (absolute) sum (p283 of Carl D. Meyer). Thus, the following question becomes relevant: Can we use some matrix norm in place of some other norm? Fortunately, for some applications, we can do this. First of all, for analyzing limiting behavior, it makes no difference whether we use…

• Well, it depends on how you compute it. If you compute it by using the definition of determinant directly, in fact it can take more than 10¹⁴¹ years, as we’ll see.${}^\dagger$ Let’s recall the definition first. The determinant of a matrix $\mathbf A$ is $$\text{det}(\mathbf A) = \sum\limits_p \sigma(p) a_{1p_1} a_{2p_2} \dots a_{np_n},$$ where the sum is taken over the $n!$ permutations $p=(p_1,p_2,\dots,p_n)$ of the numbers $1,2,\dots,n$. The total number of multiplications in this definition are $n! (n-1)$. Based on my…

• Reflectors are a class of matrices that are not introduced in all linear algebra textbooks. However, the book of Carl D. Meyer uses this class of matrices heavily for various fundamental results. Indeed, the book uses reflectors for theoretical reasons, such as proving the existence of fundamental transformations like SVD, QR, Triangulation, or Hessenberg decompositions (more to come below), as well as applications, such as the implementation of the QR algorithm via the Householder transformation or solution of large-scale linear systems…

• These three inequalities often tend to appear as a package in many textbooks about real analysis, signal processing or linear algebra. It is good to know the main reason that we see this package all the time, and to separate the role of each of these three fundamental inequalities. The overall reason is that, these inequalities are the key to generalize the facts about vectors in 2D/3D spaces and the Euclidean norm to higher dimensional spaces and (non-Euclidean) $p$-norms. Each of…

• The triangle inequality $$||x+y||\leq ||x|| + ||y||$$ is possibly the earliest inequality that we learn. Starting from high school, we learn that the shortest distance between two points is a single straight line, and if we want to travel the same distance by two or more straight lines, we will travel longer. We know very well that the triangle inequality that we so intuitively get is not limited only to the 2D or 3D spaces that we can visualize, but to…

• The four fundamental spaces of a matrix $A$, namely the range and the nullspace of itself $A$ or its transpose $A^T$, are the heart of linear algebra. We often find ourselves in need of computing a basis for the range or the nullspace of a matrix, for theoretical or applicational purposes. There are many ways of computing a basis for the range or nullspace for $A$ or $A^T$. Some are better for application, either due to their robustness against floating point…

• By reading this post you’ll be able to comprehend the basic mechanism behind the proof of LU decomposition Schur Inversion Lemma The Sherman-Morrison-Woodbury inversion formula Small perturbations can’t reduce rank (p216-217 of Meyer) Rank of a matrix equals the rank of its largest nonsingular submatrix (p214 of Meyer; see also Exercise 4.5.15) Characteristic polynomial of $AB$ and $BA$ is the same for square $A,B$ (see Exercise 7.1.19, 6.2.16 and eq. (6.2.1) of Meyer) And many more theorems and lemmas (a few…

• Unfortunately a general formula that simplifies the calculations for a matrix inversion $(A+B)^{-1}$ with arbitrary $A$ and $B$ does not exist. If one of the matrices corresponds to a low-rank update (e.g., $B=CD^T$ for some $C, D\in\mathbb{R}^{n\times k}$ with $k<n$), one can use the S.M.W. formula to great effect. However, in other situations, this formula would not simplify calculations. But all hope is not lost yet. There is another case where $(A+B)^{-1}$ can be computed relatively efficiently. To understand how and…

• A matrix inversion formula that frequently appears in machine learning, linear algebra and other textbooks is the Sherman-Morrison-Woodbury formula, according to which a matrix sum can be inverted as \begin{equation}(A+CD^T)^{-1}=A^{-1}-A^{-1}C(I+D^TA^{-1}C)^{-1}D^TA^{-1}.\end{equation}Your first reaction may be wondering if this formula even simplifies anything. The right-hand-side of this equation looks so complicated that one wonders if it’s not simpler to just use compute the sum $A+CD^T$ and invert it. However, this formula can significantly simplify calculations when the matrices $C, D\in \mathbb{R}^{n\times k}$ are…

• In many textbooks of linear algebra you can see that the Schur complement (see below) can be used to invert a non-singular matrix as $$\begin{pmatrix}\mathbf A & \mathbf C\\ \mathbf R & \mathbf B \end{pmatrix}^{-1}=\begin{pmatrix}\mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{CS}^{-1}\mathbf{RA}^{-1} & -\mathbf{A}^{-1}\mathbf{CS}^{-1} \\ \mathbf{-S}^{-1}\mathbf{RA}^{-1}& \mathbf{S}^{-1}\end{pmatrix},$$ where $$\mathbf{S=B-RA}^{-1}\mathbf C$$ is the Schur complement. (We are assuming that the blocks $\mathbf A$ and $\mathbf S$ are both nonsingular.) This formula looks somewhat ugly and, while it is easy to verify (by direct multiplication) that this is indeed the…