Why should you have few distinct eigenvalues?
The minimum polynomial initially looks awfully similar to the characteristic polynomial, and it is not clear whether learning it has any practical utility. But it does. In fact, one of the most popular and efficient optimization techniques, namely conjugate gradients, relies on Krylov sequences, which are built upon the concept of the minimum polynomial. First, let’s see how the characteristic and minimum polynomials are defined. The characteristic polynomial $p(x)$ of a matrix $\mathbf A$ is $$p(x) = \prod\limits_{i=1}^{s}(x-\lambda_i)^{a_i},$$…
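As a quick illustration (my own toy example, not from the article), SymPy makes the gap between the two polynomials visible on a matrix with a repeated eigenvalue:

```python
# A minimal sketch: contrast the characteristic and minimal polynomials of a
# matrix with a repeated eigenvalue, using SymPy. The matrix is just an example.
import sympy as sp

x = sp.symbols('x')
I = sp.eye(4)

# Eigenvalue 2 with algebraic multiplicity 3, eigenvalue 5 with multiplicity 1.
A = sp.Matrix([[2, 1, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 2, 0],
               [0, 0, 0, 5]])

# Characteristic polynomial: factors as (x - 2)**3 * (x - 5), degree 4.
print(sp.factor(A.charpoly(x).as_expr()))

# The minimal polynomial is the lowest-degree monic annihilating polynomial.
# Here (A - 2I)**2 (A - 5I) already vanishes, while (A - 2I)(A - 5I) does not,
# so the minimal polynomial is (x - 2)**2 (x - 5): degree 3, same distinct roots.
print(((A - 2*I)**2 * (A - 5*I)) == sp.zeros(4, 4))   # True
print(((A - 2*I) * (A - 5*I)) == sp.zeros(4, 4))      # False
```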
Magical generalizations thanks to Eigenvalues
Imagine taking a ubiquitous scalar function $f(x)$, like $\exp(x)$ or $\sin (x)$, generalizing it somehow to a matrix function $f(\mathbf A)$, and expecting some of the properties of the scalar function to hold for the matrix function as well. Sounds like a magic trick, right? But generalizations of this kind are possible, and it’s precisely the point of this article to show some of them! This article also strongly supports the statement that eigenvalues are like the chromosomes or genes of a…
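For a taste of the trick (a sketch of mine, not the article’s code): for a diagonalizable matrix, $f(\mathbf A)$ can be defined by applying $f$ to the eigenvalues, and for $f = \exp$ this agrees with SciPy’s matrix exponential:

```python
# Define f(A) by applying f to the eigenvalues of a diagonalizable A and check
# it against scipy.linalg.expm. The test matrix is arbitrary (symmetric so that
# the eigendecomposition is orthogonal and real).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = A + A.T                               # symmetric => A = V diag(w) V^T

w, V = np.linalg.eigh(A)
exp_A = V @ np.diag(np.exp(w)) @ V.T      # "apply exp to the genes of A"

print(np.allclose(exp_A, expm(A)))        # True
```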
How to compute the “real” rank of a matrix?
If you fill an $n\times n$ matrix with random entries, then you’ll almost surely end up with a full-rank matrix. Likewise, any matrix constructed from real, continuous data (e.g., sensor input) will almost surely be of full rank, even if the underlying process should have led to linearly dependent columns/rows. Further, if we do not use exact arithmetic but, say, floating-point arithmetic, our $\mathbf A$ will almost surely be somewhat perturbed, especially if it is a result…
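A small sketch of what “real” rank means in practice (my own toy example): an exactly rank-3 matrix plus a floating-point-sized perturbation, and how the answer depends on the tolerance used to truncate singular values:

```python
# Numerical rank: build an exactly rank-3 matrix, add a tiny perturbation of the
# kind floating point introduces, and count singular values above a tolerance.
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 50))  # exact rank 3
A = B + 1e-12 * rng.standard_normal((50, 50))                    # perturbed copy

s = np.linalg.svd(A, compute_uv=False)
print(s[:5])                                # 3 large values, then a sharp drop
print(np.linalg.matrix_rank(A))             # much larger than 3: noise counts
print(np.linalg.matrix_rank(A, tol=1e-8))   # 3 -- the "real" rank
```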
How to prove SVD? A recipe approach
Proving the existence of the SVD is no trivial task, but it turns out that it is not too difficult either. One needs a few ingredients (hence the title), and once we know them and understand the overall idea, the proof falls into place. Below we list the basic ingredients needed to prove the existence of the SVD.
The URV decomposition $\mathbf{A} = \mathbf{URV} = \mathbf{U}\begin{pmatrix}\mathbf C & \mathbf 0 \\ \mathbf 0 & \mathbf 0\end{pmatrix}\mathbf{V}$
$||\mathbf{A}||_2 = ||\mathbf{URV}||_2 =$…
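The second ingredient, the invariance of the 2-norm under orthogonal transformations, is easy to check numerically. A minimal sketch (my own, with random orthogonal factors standing in for the $\mathbf U$ and $\mathbf V$ of the decomposition):

```python
# The spectral norm is unchanged by orthogonal factors: ||U A V||_2 = ||A||_2.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 7))
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal U
V, _ = np.linalg.qr(rng.standard_normal((7, 7)))   # random orthogonal V

# ord=2 for a matrix is the largest singular value; both prints agree up to rounding.
print(np.linalg.norm(A, 2), np.linalg.norm(U @ A @ V, 2))
```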
How can we compute a basis for the nullspace?
There are at least two answers to this question; one of these is more educative, and the other is at least as educative (in a different and profound way) as well as practical.
Method 1
The first method is a more introductory-level one. It is helpful to know and good to read as a refresher even if one is a more advanced student of the topic. A linear system $\mathbf{Ax=b}$ is homogeneous when $\mathbf b=\mathbf 0$ and nonhomogeneous…
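As a quick sketch of what such a computation looks like in practice (my own example; whether the article’s second method is the SVD route is my guess), row reduction with SymPy versus an orthonormal nullspace basis from SciPy:

```python
# Two ways to get a basis for N(A): read free variables off the RREF (the
# introductory, Method-1-style route) or take an orthonormal basis from the SVD.
import sympy as sp
import numpy as np
from scipy.linalg import null_space

A = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 1, 1]])        # rank 2, so N(A) is one-dimensional

# Row-reduce A x = 0 and read off the free variable.
print(A.rref())                   # (RREF matrix, pivot columns)
print(A.nullspace())              # basis vectors of N(A)

# Numerical alternative: orthonormal basis of N(A) via the SVD.
print(null_space(np.array(A.tolist(), dtype=float)))
```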
Since we already have SVD, do we need URV factorization?
The SVD is a special case of the URV factorization, one with many great properties. The latter decomposes a matrix $A_{m\times n}$ as $$A=URV^T,$$ where $U=(U_1|U_2)$ is an orthogonal matrix such that the columns of $U_1$ form a basis for $R(A)$ and the columns of $U_2$ a basis for $N(A^T)$; and $V=(V_1|V_2)$ is another orthogonal matrix such that the columns of $V_1$ form a basis for $R(A^T)$ and those of $V_2$ a basis for $N(A)$. Suppose that the rank of $A$ is $r$, in which case $U_1$ has $r$ columns…
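A small numerical sketch (a toy example of mine, not from the article): any orthonormal bases of the four fundamental subspaces assemble into a URV factorization; mixing the $R(A)$ basis with a random orthogonal matrix gives a URV that is deliberately not the SVD.

```python
# Build U from bases of R(A) and N(A^T), V from bases of R(A^T) and N(A), and
# check that R = U^T A V has the block form [[C, 0], [0, 0]]. The random mix G
# makes C a full nonsingular block rather than the diagonal Sigma of the SVD.
import numpy as np
from scipy.linalg import orth, null_space

rng = np.random.default_rng(3)
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 0., 0.]])                        # rank r = 2

G, _ = np.linalg.qr(rng.standard_normal((2, 2)))    # random orthogonal 2x2 mix
U = np.hstack([orth(A) @ G, null_space(A.T)])       # columns: basis of R(A), N(A^T)
V = np.hstack([orth(A.T), null_space(A)])           # columns: basis of R(A^T), N(A)

R = U.T @ A @ V
print(np.round(R, 10))                              # [[C, 0], [0, 0]], C nonsingular
print(np.allclose(A, U @ R @ V.T))                  # A = U R V^T
```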
Drazin Inverse and what it means
The Drazin inverse and the discussion around it (p. 400 of C. D. Meyer) made me truly grasp what a generalized inverse is and how linear operators, matrices, and changes of basis are connected. The Drazin inverse is a natural consequence of the Core-Nilpotent decomposition, according to which a matrix can be decomposed as $$A=Q\begin{pmatrix}C_{r\times r} & 0 \\ 0 & N\end{pmatrix}Q^{-1},$$ where $C$ is nonsingular and $N$ is nilpotent. Here, $r$ is not the rank of…
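A hedged numerical sketch of that recipe (my own implementation of the textbook idea, not code from Meyer): find the index $k$, build $Q$ from bases of $R(A^k)$ and $N(A^k)$, and invert only the core block.

```python
# Drazin inverse via the core-nilpotent decomposition.
import numpy as np
from scipy.linalg import orth, null_space

def drazin(A, tol=1e-12):
    n = A.shape[0]
    # index k: smallest k with rank(A^k) == rank(A^(k+1))
    k, Ak = 0, np.eye(n)
    while np.linalg.matrix_rank(Ak, tol) != np.linalg.matrix_rank(Ak @ A, tol):
        k, Ak = k + 1, Ak @ A
    r = np.linalg.matrix_rank(Ak, tol)
    Q = np.hstack([orth(Ak), null_space(Ak)])   # bases of R(A^k) and N(A^k)
    B = np.linalg.inv(Q) @ A @ Q                # block form [[C, 0], [0, N]]
    core = np.zeros((n, n))
    core[:r, :r] = np.linalg.inv(B[:r, :r])     # invert C, drop the nilpotent part
    return Q @ core @ np.linalg.inv(Q)

# Defining identities: A D = D A,  D A D = D,  A^(k+1) D = A^k.
A = np.array([[1., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])                    # index 2
D = drazin(A)
print(np.allclose(A @ D, D @ A), np.allclose(D @ A @ D, D))   # True True
```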
Taylor series of an image
Taylor series is the main reason that the field of computer vision exists*. This may seem like a bold statement, but think about it. Classical computer vision problems like image alignment, optical flow, depth estimation (from stereo), or shape from motion were all first solved with Taylor series expansions. And this should not be very surprising. Most computer vision problems are just optimization problems, and it is only natural that the first computer vision researchers drew on the existing literature on optimization, which…
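To make the claim concrete with a standard example (my illustration, not a quote from the article): in optical flow one assumes brightness constancy, $I(x+u,\,y+v,\,t+1)=I(x,y,t)$, and a first-order Taylor expansion of the left-hand side, $$I(x+u,\, y+v,\, t+1) \approx I(x,y,t) + I_x\,u + I_y\,v + I_t,$$ turns the nonlinear matching problem into the linear constraint $I_x u + I_y v + I_t \approx 0$, which early alignment and flow methods solved in a least-squares sense.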
What’s the use of characteristic functions in probability?
Many books and classes in probability mention some mysterious entities called “Characteristic functions”, with little or no motivation. For students like myself, concepts without proper motivation and excitement do not register in the brain; that’s why I almost completely forgot all I had learned about characteristic functions from my first classes/books. The truth is that characteristic functions turn out to be incredibly useful in probability, and they are important for the following four reasons. 1) The characteristic function is the other side…
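To make reason 1) concrete (standard definitions, stated here for convenience): for a random variable $\mathbf x$ with density $f(x)$, the characteristic function is $$\varphi_{\mathbf x}(t) = E\{e^{it\mathbf x}\} = \int\limits_{-\infty}^{\infty} e^{itx} f(x)\, dx,$$ i.e., (up to sign convention) the Fourier transform of the density; it always exists, and when moments exist they can be recovered from derivatives at the origin via $E\{\mathbf x^n\} = i^{-n}\,\varphi_{\mathbf x}^{(n)}(0)$.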
An elegant trick to compute moments of Gaussian RV
Assume we have a zero-mean Gaussian RV $\mathbf x$ with variance $\sigma^2$, and we want to compute its moments, i.e., $E\{\mathbf x^n\}$. The moments for odd $n$ are zero because the density function of our RV, $f(x)$, is an even function. But how do we compute the moments, i.e., $$E\{\mathbf x^n\} =\int\limits_{-\infty}^{\infty} x^n f(x)\, dx =\frac{1}{\sigma \sqrt{2\pi}}\int\limits_{-\infty}^{\infty} x^n e^{-x^2/2\sigma^2}\, dx, $$ for even $n$? Of course this is not a very difficult integral (it can be computed through integration by parts, etc.), but…
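Whatever the trick turns out to be, the closed form it should reproduce is the classical one, $E\{\mathbf x^{2m}\} = \sigma^{2m}(2m-1)!!$; a quick SymPy check (my own, not from the article):

```python
# Symbolically verify the even Gaussian moments against sigma^(2m) (2m - 1)!!.
import sympy as sp

x = sp.symbols('x', real=True)
sigma = sp.symbols('sigma', positive=True)
f = sp.exp(-x**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

for n in (2, 4, 6):
    moment = sp.integrate(x**n * f, (x, -sp.oo, sp.oo))
    print(n, sp.simplify(moment))   # sigma**2, 3*sigma**4, 15*sigma**6
```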