Linear Algebra

How to compute a basis for the range, nullspace, etc. of a matrix? 6 approaches

The four fundamental subspaces of a matrix $A$, namely the range and the nullspace of $A$ itself and of its transpose $A^T$, are at the heart of linear algebra. We often find ourselves needing to compute a basis for the range or the nullspace of a matrix, for theoretical or practical purposes.

There are many ways of computing a basis for the range or nullspace of $A$ or $A^T$. Some are better suited for applications, either because they are robust against floating point errors or because they are efficient. Others are better for theoretical purposes, either because the bases they produce are orthonormal or because they are unique. It’s good to have an overall picture of the various approaches so that we can pick the one best suited for a given purpose.

I. Gaussian elimination and Gauss-Jordan reduction

These are typically the methods that are introduced early in a linear algebra book. Later we figure out more ways to compute a basis, but Gaussian elimination and Gauss-Jordan remain useful nevertheless.

Gaussian Elimination

Gaussian elimination takes a matrix $A_{m\times n}$ and applies a series of Type I, Type II and Type III row operations to reduce it to a row echelon (upper triangular) form $U$. These operations are encoded in a nonsingular matrix $P$, and the decomposition takes the form
$$
PA = U.
$$
Let us break down the matrix $P$ as
$$
P = \begin{pmatrix}
P_1 \\ P_2
\end{pmatrix},
$$
where $P_1$ is of size $r\times m$ and $P_2$ is of size $(m-r)\times m$, with $r$ the rank of the matrix $A$. Moreover, let us define the basic columns as the columns of $A$ that correspond to the pivots during Gaussian Elimination.

Now, we are in position to define bases for all of the four fundamental subspaces (see p176 and p178 of Carl D. Meyer):

  • Basis for $R(A)$: the basic columns in $A$
  • Basis for $R(A^T)$: the nonzero rows in $U$
  • Basis for $N(A)$: the $h_i$’s in the general solution of $Ax=0$ (see my notes on Carl D. Meyer, Chapter 2 — eq (2.2) of my notes)
  • Basis for $N(A^T)$: The rows of $P_2$ above (transposed)

Of note, all the bases can be computed with just one Gaussian Elimination (see end of p175 of Carl D. Meyer).
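As a concrete illustration, here is a minimal sketch in Python using SymPy (exact arithmetic, which sidesteps the floating point issues discussed below). The matrix is a made-up rank-2 example; SymPy's `rref` returns the reduced row echelon form (the $E_A$ of the Gauss-Jordan discussion below) together with the pivot column indices, and `nullspace` returns the $h_i$'s. For $N(A^T)$, instead of extracting $P_2$, the sketch simply takes the nullspace of $A^T$, which spans the same subspace.

```python
import sympy as sp

# Hypothetical rank-2 example matrix (third row = first row + second row).
A = sp.Matrix([[1, 2, 1, 1],
               [2, 4, 0, 4],
               [3, 6, 1, 5]])

E_A, pivots = A.rref()                        # reduced row echelon form + pivot column indices
r = len(pivots)                               # rank of A

basis_R_A  = [A[:, j] for j in pivots]        # basic columns of A      -> basis for R(A)
basis_R_AT = [E_A[i, :].T for i in range(r)]  # nonzero echelon rows    -> basis for R(A^T)
basis_N_A  = A.nullspace()                    # the h_i's solving Ax=0  -> basis for N(A)
basis_N_AT = A.T.nullspace()                  # same span as rows of P_2 (transposed) -> basis for N(A^T)
```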

Gauss-Jordan

Gauss-Jordan is very similar to Gaussian elimination, and everything we summarized above for the four fundamental subspaces applies to Gauss-Jordan as well; we just replace $U$ (the row echelon form computed with Gaussian Elimination) with $E_A$, the reduced row echelon form. But there are two key differences:

  • $U$ is computed faster but is not unique
  • $E_A$ takes more time to compute (see p119 of Carl D. Meyer) but is unique

The uniqueness is always a plus for theoretical purposes (e.g. it is used in Example 3.9.2, results (3.9.7–3.9.10), Exercise 3.9.8, p633 of Carl D. Meyer).

A final note about reliability: neither standard Gaussian elimination nor Gauss-Jordan is reliable in floating point computation. However, partial pivoting remedies this issue significantly in practical work (see p28 of Carl D. Meyer).

Conclusions

Both Gaussian Elimination and Gauss-Jordan are tools that you should always keep in mind. Gaussian Elimination (with partial pivoting) is an efficient approach that can be used for real-life problems. The LU factorization, which is nothing but a roadmap of the steps of Gaussian elimination (see p312 of Carl D. Meyer), is in fact what linear algebra packages use, testifying to the efficiency of this approach.
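As a small sketch of that package-style workflow in Python with SciPy (the system below is made up): `lu_factor` performs the factorization with partial pivoting once, and `lu_solve` reuses it for any right-hand side.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical square, nonsingular system.
A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])
b = np.array([1., 2., 3.])

lu, piv = lu_factor(A)        # PA = LU with partial pivoting, stored compactly
x = lu_solve((lu, piv), b)    # forward and back substitution reusing the factorization

assert np.allclose(A @ x, b)
```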

Gauss-Jordan is useful for theoretical purposes due to the unique factorization that it provides.

II. QR Factorization

QR factorization takes a set of vectors and creates an orthonormal basis out of them. QR typically starts from a linearly independent set of vectors (i.e., a basis), but it can be extended to work with possibly linearly dependent vectors too (e.g., see Exercises 6.4 or 6.5 of David A. Harville; see also here).

Say that the columns of $A$ are linearly independent. Then you can apply any of several algorithms that compute a QR decomposition and produce a matrix $Q$ whose columns are an orthonormal basis for $R(A)$. The typical textbook algorithm is Gram-Schmidt, but it is not good for real-life computations as it is numerically unstable. The modified Gram-Schmidt algorithm is more stable, but there are still better ways of computing the QR factorization, such as Householder reflections (see my notes on Carl D. Meyer, Table 5.1), which are both stable and faster than Gram-Schmidt.

If we compute the full (complete) QR factorization and split the $Q$ matrix as $Q=(Q_1 \,|\, Q_2)$, where $Q_1$ consists of the first $n$ columns of $Q$, then we have (as in the sketch below):

  • $R(A) = R(Q_1)$
  • $N(A^T) = R(Q_2)$
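A minimal sketch with NumPy's QR (backed by LAPACK's Householder-based routines); the matrix is a made-up tall one with linearly independent columns, so $r = n$ and the split is simply $Q_1 = Q[:, :n]$, $Q_2 = Q[:, n:]$.

```python
import numpy as np

# Hypothetical tall matrix with linearly independent columns (so r = n here).
A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])
m, n = A.shape

Q, R = np.linalg.qr(A, mode='complete')   # full QR: Q is m x m orthogonal, R is m x n
Q1, Q2 = Q[:, :n], Q[:, n:]               # split Q = (Q1 | Q2)

# Columns of Q1: orthonormal basis for R(A); columns of Q2: orthonormal basis for N(A^T).
assert np.allclose(Q1 @ R[:n, :], A)      # A = Q1 R1, so R(A) = R(Q1)
assert np.allclose(A.T @ Q2, 0)           # columns of Q2 are orthogonal to R(A)
```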

Conclusions

Unlike the bases computed via Gaussian elimination or the Gauss-Jordan approach, QR gives us not just a basis but an orthonormal basis, which has added advantages, such as the trivial solution of $Qx=b$ as $x=Q^T b$, and isometry.

III. URV factorization

This is a more general category of factorization, where any matrix $A$ is factorized as
$$
A = URV^T = U \begin{pmatrix} C & 0 \\ 0 & 0\end{pmatrix} V^T
$$
where $C$ is an $r\times r$ nonsingular matrix, $r$ being the rank of $A$, and $U$ ($m\times m$) and $V$ ($n\times n$) are orthogonal matrices. In this factorization (see the SVD-based sketch after the list below):

  • The first $r$ columns of $U$ are an orthonormal basis for $R(A)$
  • The last $n-r$ columns of $V$ are an orthonormal basis for $N(A)$
  • The first $r$ columns of $V$ are an orthonormal basis for $R(A^T)$
  • The last $m-r$ columns of $U$ are an orthonormal basis for $N(A^T)$
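Here is how the four bases fall out in NumPy when the URV factorization is taken to be the SVD (the special case with diagonal $C$, discussed further below); the matrix is a made-up rank-2 example and the rank cutoff is a simple heuristic.

```python
import numpy as np

# Hypothetical rank-2 example (third row = first row + second row).
A = np.array([[1., 2., 1., 1.],
              [2., 4., 0., 4.],
              [3., 6., 1., 5.]])

U, s, Vt = np.linalg.svd(A)            # SVD: a URV factorization with diagonal C
r = int(np.sum(s > 1e-12 * s[0]))      # numerical rank (simple tolerance heuristic)

basis_R_A  = U[:, :r]                  # first r columns of U:   R(A)
basis_N_AT = U[:, r:]                  # last m-r columns of U:  N(A^T)
basis_R_AT = Vt[:r, :].T               # first r columns of V:   R(A^T)
basis_N_A  = Vt[r:, :].T               # last n-r columns of V:  N(A)
```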

The URV factorization is not unique. However, once $U$ and $V$ are fixed, $R$ is uniquely determined (as $R = U^T A V$); conversely, any orthogonal $U$ and $V$ whose columns satisfy the conditions above lead to a URV factorization of $A$. These are both useful properties; the advantage of uniqueness is rather obvious, but the freedom that comes with non-uniqueness can have advantages too.

And how can we compute a URV factorization? Two approaches are (i) to compute the SVD, which is the special case of a URV factorization in which $C$ is diagonal, and (ii) to apply the QR factorization twice (see here). If we don’t need a diagonal $C$, the SVD is unnecessarily expensive and we are probably better off using the latter approach, which is also straightforward to implement.
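Here is a sketch of the two-QR approach in Python with SciPy: a column-pivoted QR exposes the rank, and a second QR on the transposed nonzero block triangularizes $C$. The rank tolerance is a made-up heuristic, and a pivoted QR is not guaranteed to detect near rank deficiency as reliably as the SVD.

```python
import numpy as np
from scipy.linalg import qr

def urv(A, tol=1e-12):
    """Sketch of a URV-type factorization A = U @ R @ V.T via two QR factorizations."""
    m, n = A.shape
    Q1, R1, piv = qr(A, pivoting=True)         # A[:, piv] = Q1 @ R1 (column-pivoted QR)
    d = np.abs(np.diag(R1))
    r = int(np.sum(d > tol * max(d[0], 1.0)))  # numerical rank estimate (heuristic)

    B = np.zeros((r, n))
    B[:, piv] = R1[:r, :]                      # undo the column permutation: A = Q1 @ [[B], [0]]

    Q2, R2 = qr(B.T)                           # B.T = Q2 @ R2, so B = R2.T @ Q2.T
    R = np.zeros((m, n))
    R[:r, :] = R2.T                            # top-left r x r block is C (nonsingular, triangular)
    return Q1, R, Q2, r                        # A = Q1 @ R @ Q2.T (up to rounding)

A = np.array([[1., 2., 1., 1.],
              [2., 4., 0., 4.],
              [3., 6., 1., 5.]])
U, R, V, r = urv(A)
assert np.allclose(U @ R @ V.T, A)
```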

Conclusions

The URV factorization is something we should keep in our toolbox, as it has theoretical advantages and can be operationalized efficiently by applying two QR factorizations (each of which costs on the order of $2n^3/3$ operations with Householder reduction; see Table 5.1 of my notes on Carl D. Meyer). And not only does it yield bases for all of $R(A)$, $N(A)$, $R(A^T)$ and $N(A^T)$, but these bases are also orthonormal.

Moore-Penrose Inverse

The Moore-Penrose inverse $A^\dagger$ is the generalized inverse obtained from the URV factorization as

$$
A^\dagger = V\begin{pmatrix}C^{-1} & 0 \\ 0 & 0\end{pmatrix}U^T.
$$

Interestingly, even though the URV factorization is not unique, the Moore-Penrose inverse is unique (see p422 of Carl D. Meyer). While the main purpose of the Moore-Penrose inverse is not to define bases for the fundamental subspaces of $A$, it nevertheless gives us some relationships:

  • $R(A^T) =R(A^\dagger)$
  • $N(A^T) = N(A^\dagger)$

and we cannot say anything directly about the other two fundamental subspaces, $R(A)$ and $N(A)$ (see p504 of D. A. Harville or p428 of Carl D. Meyer).
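As a quick sketch in NumPy (with a made-up rank-deficient matrix): build $A^\dagger$ from the SVD, i.e. the URV factorization with diagonal $C$, compare it with `np.linalg.pinv`, and check $R(A^\dagger) = R(A^T)$ by verifying that stacking the two matrices side by side does not increase the rank.

```python
import numpy as np

# Hypothetical rank-2 matrix (third row = first row + second row).
A = np.array([[1., 2., 1., 1.],
              [2., 4., 0., 4.],
              [3., 6., 1., 5.]])

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-12 * s[0]))

# A^dagger = V @ [[C^{-1}, 0], [0, 0]] @ U.T; for the SVD, C = diag(s[:r]).
A_pinv = Vt[:r, :].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T
assert np.allclose(A_pinv, np.linalg.pinv(A))

# R(A^dagger) = R(A^T): both have rank r, and stacking them keeps the rank at r.
assert np.linalg.matrix_rank(np.hstack([A_pinv, A.T])) == r
```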

The Moore-Penrose inverse is a reflexive pseudoinverse and is uniquely characterized by a set of defining properties (p422 of Carl D. Meyer). Also, for a linear system of equations $Ax=b$, $x=A^\dagger b$ is

  • the solution with minimal Euclidean norm when the system is consistent
  • the least squares solution (of minimal norm) when it is not consistent

(see here or p423 of Carl D. Meyer.)
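A small sketch of these two facts with NumPy, using made-up random systems; `np.linalg.lstsq` also returns the minimum-norm least-squares solution, so the two should agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Consistent underdetermined system: A^dagger b is the minimum-norm solution.
A = rng.standard_normal((3, 5))
b = A @ rng.standard_normal(5)                  # b lies in R(A) by construction
x_min = np.linalg.pinv(A) @ b
assert np.allclose(A @ x_min, b)                # it solves the system
assert np.allclose(x_min, np.linalg.lstsq(A, b, rcond=None)[0])

# Inconsistent overdetermined system: A^dagger b is the least squares solution.
A = rng.standard_normal((5, 3))
b = rng.standard_normal(5)                      # generically not in R(A)
x_ls = np.linalg.pinv(A) @ b
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])
```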

The Moore-Penrose inverse is typically not used in applications, as it is extremely sensitive to small perturbations in the entries of a matrix (see p424 of Meyer). Still, this pseudoinverse is a useful theoretical and notational tool (see p424).

(You may be wondering: if least-squares or minimum-norm solutions are so common in applications, why is the Moore-Penrose inverse, which gives exactly these solutions, not useful in practice? The answer is that, in practice, a numerically more stable way of computing the least-squares/minimum-norm solution is used; e.g., see p314 of Meyer.)

Conclusions

The Moore-Penrose inverse is more of a theoretical tool than a practical one, and it is not directly aimed at computing bases for the fundamental subspaces of $A$. Still, it is important to keep in mind its relationship to the row space of $A$ and the nullspace of $A^T$ for theoretical purposes.