How to compute the “real” rank of a matrix?
If you fill an $n\times n$ matrix with random entries, than you’ll almost surely end up with a full-rank matrix. Also, any matrix that is constructed with real and continuous data (e.g., sensor input) will also be almost surely of full rank even if the underlying should have lead to linearly dependent columns/rows. Further, if we do not use exact arithmetic but, say, floating point arithmetic, our $\mathbf A$ will almost surely be somewhat perturbed, especially if it is a result of other floating-point computations. But why does this happen, and how can we ameliorate it and compute the “real” rank of a matrix in these cases? These questions make it clear that knowing the following concepts well pays off:
- Matrix rank
- SVD decomposition.
First, it can be theoretically shown that small perturbations can’t reduce rank (p216 of Carl D. Meyer):
Small Perturbations Can’t Reduce Rank
If $\mathbf A$ and $\mathbf E$ are $m\times n$ matrices and if entries of $\mathbf E$ have sufficiently small magnitude*, then $$\text{rank}(\mathbf{A+E})\geq \text{rank}(\mathbf A).$$ Furthermore, the equality here almost never holds in practice (see p218), so the rank almost always increases by small perturbations.
* The term sufficiently small will soon be clarified.
The proof of the theorem above is given in p218, but more than the proof, we are interested in the implications of this theorem since it directly supports the statement in the opening of this article. To fully understand the statement “sufficient small” and to find a workaround to the question of how can we compute the “actual” rank of a matrix, we need to understand the SVD decomposition in general and the singular values in particular.
Suppose that the first $r$ singular values of the matrix $\mathbf{A}$ of rank $r$ are $\sigma_1,\dots,\sigma_r$. When we talk about real data, we never have access to the “real” $\mathbf A$, but to some version of it that is affected by some noise $\mathbf E$, namely $\mathbf{A+E}$. Thus, even though the last $n-r$ singular values must have been theoretically zero (which would make it trivial to compute the actual rank), in practice the version of the $\mathbf A$ matrix that we measure will almost never have any zero singular value. But that’s fortunately not the end of it since, as you may guess, the values that are supposed to be zero will still be very small compared to the first $r$ singular values. And this is the key to identifying the ”real” rank of a matrix.
Before proceeding, let’s first clarify the meaning of ”sufficiently small” in the Theorem stated above. In our context, $\mathbf E$ being sufficiently small means that the largest singular value of $\mathbf E$ (i.e., its norm) is smaller then $\sigma_r$; that is, $||\mathbf E||_2 < \sigma_r$ (see Exercise 5.12.4 of Carl D. Meyer).
To recap, we don’t have access to the real singular values of $\mathbf A$, which would be ordered as $$\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0 = 0 \dots 0,$$ but to some perturbed version
$$\tilde{\sigma}_1 \geq \tilde{\sigma}_2 \geq \dots \geq \tilde{\sigma}_r \geq \tilde{\sigma}_{r+1} > \dots \geq \tilde{\sigma}_1.$$ If we would have access to the noise matrix $\mathbf E$, then we could have used it to identify the real rank of $\mathbf A$, because it would have satisfied (see p422) $$\tilde{\sigma}_1 \geq \tilde{\sigma}_2 \geq \dots \geq \tilde{\sigma}_r > ||\mathbf E||_2\geq \tilde{\sigma}_{r+1} > \dots \geq \tilde{\sigma}_1.$$ But we don’t have access to $\mathbf E$. Still, we may be able to use some useful estimates of $||\mathbf E||$. If the noise $\mathbf E$ is purely due to floating point arithmetic errors, then a good estimate is $$||\mathbf E||_2\approx 5\times 10^{-t} ||\mathbf A||_2$$ when $t-$digit floating point arithmetic is used. But if noise is due to, say, imperfect sensor output, then $||\mathbf E||$ will depend on the amount of sensor noise. In this case, we can either consider measuring the variance of noise and using it to identify an estimation for $||\mathbf E||_2$; or, we can plot the $\sigma_i$ values against $i$, check if the graph has an ”elbow”, and assume that the real rank $r$ is where the ”elbow” occurs.