Linear Algebra

On the importance of the Cauchy-Bunyakovskii-Schwarz Inequality

The triangle inequality $$||x+y||\leq ||x|| + ||y||$$ is possibly the earliest inequality that we learn. As early as high school, we learn that the shortest path between two points is a single straight line, and that if we travel between the same two points along two or more straight segments, we will travel at least as far.

We know very well that the triangle inequality that we grasp so intuitively is not limited to the 2D or 3D spaces that we can visualize; it holds in any $n$-dimensional space. That is, for any $x,y \in \mathbb R^{n}$, $$||x+y||_2\leq ||x||_2 + ||y||_2.$$

This inequality is a cornerstone of many mathematical and, by extension, engineering fields. But how can we prove this inequality in spaces that we cannot visualize?

Enter the Cauchy-Bunyakovskii-Schwarz (CBS) inequality. (This is the inequality commonly known as the Cauchy-Schwarz inequality; that name drops Bunyakovskii, the Russian mathematician who, like Schwarz but well before him, did the generalization.) The CBS inequality provides the critical link between the inner product of two vectors and the concept of a norm. If we restrict ourselves to spaces of real vectors, the CBS inequality can be written as:
$$
|x^T y| \leq ||x|| ||y||.
$$
In words, the absolute value of the inner product of two vectors cannot be larger than the product of the vectors’ norms. Equality is attained if and only if one vector is a scalar multiple of the other (e.g., see Exercise 5.1.9 of Meyer). (Here we consider only the Euclidean norm, and below we discuss why this norm is so special in this case.)
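
As a quick numerical sanity check of the statement above, here is a minimal sketch in plain Python. The helper names `dot`, `norm2`, and `cbs_gap` are hypothetical, chosen only for this illustration, and the test vectors are arbitrary.

```python
import math

def dot(x, y):
    """Inner product x^T y of two equal-length real vectors."""
    return sum(a * b for a, b in zip(x, y))

def norm2(x):
    """Euclidean norm ||x||."""
    return math.sqrt(dot(x, x))

def cbs_gap(x, y):
    """Return ||x|| ||y|| - |x^T y|, which CBS says is never negative."""
    return norm2(x) * norm2(y) - abs(dot(x, y))

x = [1.0, 2.0, 3.0, 4.0]
y = [-2.0, 0.5, 1.0, 3.0]
parallel = [-3.0 * a for a in x]  # a scalar multiple of x

print(cbs_gap(x, y))         # strictly positive: x and y are not parallel
print(cbs_gap(x, parallel))  # zero up to rounding: the equality case
```

The gap vanishes only in the second call, where one vector is a scalar multiple of the other; the sign of the multiple does not matter because of the absolute value.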

Once we have the CBS inequality above, it becomes almost trivial to prove the triangle inequality:
$$
\begin{align}
||x+y||^2 &= (x+y)^T (x+y) = ||x||^2 + 2x^Ty + ||y||^2 \\
&\leq ||x||^2+2||x||\, ||y|| + ||y||^2 = (||x||+||y||)^2
\end{align}
$$

Taking square roots of both sides gives the triangle inequality; the only inequality used along the way is $x^Ty \leq |x^Ty| \leq ||x||\,||y||$. Moreover, using what we said above, we know that equality is attained if and only if one vector is a nonnegative scalar multiple of the other, which makes sense: the length of the sum of two vectors equals the sum of their lengths if and only if the vectors point in the same direction. Thus, the CBS inequality fills in the critical part of generalizing our intuition about visualizable spaces to $n$-dimensional (and actually, even infinite-dimensional) spaces.
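
The equality case of the triangle inequality can be checked numerically with a small sketch. The helper names `norm2` and `triangle_gap` are hypothetical, and the vectors are picked only so that the norms come out as whole numbers. Note that the sign of the multiple matters here: a vector pointing in the opposite direction does not give equality.

```python
import math

def norm2(x):
    """Euclidean norm ||x||."""
    return math.sqrt(sum(a * a for a in x))

def triangle_gap(x, y):
    """Return ||x|| + ||y|| - ||x + y||, which is never negative."""
    s = [a + b for a, b in zip(x, y)]
    return norm2(x) + norm2(y) - norm2(s)

x = [3.0, 4.0, 0.0, 12.0]          # ||x|| = 13
same = [2.0 * a for a in x]        # same direction as x
opposite = [-2.0 * a for a in x]   # opposite direction

print(triangle_gap(x, same))       # 0.0: equality
print(triangle_gap(x, opposite))   # 26.0: strict inequality
```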

Another critical thing that the CBS inequality allows us to do is to generalize the notion of angle from visualizable to $n$-dimensional ($n>3$) spaces. In 2D/3D geometry, the geometric definition of the dot (i.e., inner) product, $x^T y = ||x||\,||y||\cos\theta$, gives for nonzero $x$ and $y$
$$
\cos\theta = \frac{x^T y}{||x||\,|| y||}.
$$
Clearly, the cosine that we know takes values in the range $[-1, 1]$ and attains both bounds of this range. The CBS inequality allows us to generalize the notion of angle to spaces that we cannot visualize: for nonzero $x$ and $y$, $|x^T y| \leq ||x||\, ||y||$ implies that the ratio above has the cosine properties that we listed, i.e., it takes values in the range $[-1, 1]$ and attains the bounds of this range ($1$ when $x$ is a positive multiple of $y$, and $-1$ when $x$ is a negative multiple of $y$).
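
To make the generalized angle concrete, the sketch below computes it for vectors in $\mathbb R^5$. The function name `angle` is hypothetical, both vectors are assumed to be nonzero, and the clamp is there only to absorb floating-point rounding, which can push the ratio slightly outside $[-1, 1]$ even though the CBS inequality rules that out in exact arithmetic.

```python
import math

def angle(x, y):
    """Angle in radians between two nonzero real vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos_theta = dot / (nx * ny)
    # CBS guarantees |cos_theta| <= 1 exactly; clamp to guard against rounding.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

x = [1.0, 0.0, 1.0, 0.0, 1.0]
y = [0.0, 1.0, 0.0, 1.0, 0.0]

print(angle(x, y))                       # pi/2: the inner product is zero
print(angle(x, [2.0 * a for a in x]))    # 0: positive multiple
print(angle(x, [-2.0 * a for a in x]))   # pi: negative multiple
```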

The point that we highlight in the paragraph above is of critical importance, because the concept of orthogonality is fundamental in linear algebra, and we can talk about orthogonality only if we can talk about angles, since orthogonality is defined through angles (or inner products): two nonzero elements of an inner-product space are orthogonal if and only if the angle between them is $90^\circ$, that is, if and only if their inner product is zero. My summary of Chapter 5 of Meyer (e.g., see Figure 5.1 of my summary) expands this point further. In sum, the concept of orthogonality, and thereby a big part of linear algebra, rests on the CBS inequality.

Why is the Euclidean norm so special?

The CBS inequality applies to the Euclidean norm, but there is a widely known generalization to $p$-norms, namely Hölder’s inequality, according to which
$$
|x^T y| \leq ||x||_p ||y||_q,
$$
where $||\cdot ||_p$ is the $p$-norm, $||\cdot ||_q$ is the $q$-norm, and $1/p+1/q=1.$ Clearly, the only values that satisfy the latter equality together with $p=q$ are $p=q=2$. In other words, for all other values of $p$ and $q$, the right-hand side involves two different and therefore non-commensurate norms.

This suggests that the angle concept that we just outlined will not generalize to norms other than the 2-norm (i.e., the Euclidean norm), since we won’t be able to guarantee that the cosine between two vectors (as defined above) varies between $-1$ and $1$ and attains these values. Indeed, the only $p$-norm for which the cosine distance makes sense is the Euclidean norm (p. 292 of Meyer). Thus, even though Hölder’s inequality is extremely important for establishing that the triangle inequality holds for all $p$-norms (see Exercise 5.1.13 of Meyer), it also hints at the fact that if we want to be able to talk about “angles” in higher-dimensional spaces with $p$-norms, we are limited to the Euclidean norm. (Mind you, there are norms other than $p$-norms for which we can still talk about angles, such as the elliptical norm. In fact, for any norm that is generated from an inner product, we can talk about angles; see p. 288.)
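
A small numerical sketch shows what goes wrong when the Euclidean norm in the cosine formula is replaced by another $p$-norm. The names `p_norm` and `pseudo_cosine` are hypothetical, and the "cosine" is computed between a vector and itself, a case in which any sensible angle should be zero (cosine equal to $1$).

```python
def p_norm(x, p):
    """p-norm of a real vector, for finite p >= 1."""
    return sum(abs(a) ** p for a in x) ** (1.0 / p)

def pseudo_cosine(x, y, p):
    """x^T y divided by the product of the p-norms of x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (p_norm(x, p) * p_norm(y, p))

x = [1.0, 1.0]

print(pseudo_cosine(x, x, 2))  # about 1, as a cosine of a zero angle should be
print(pseudo_cosine(x, x, 3))  # about 1.26, outside the range [-1, 1]
print(pseudo_cosine(x, x, 1))  # 0.5, as if x made a 60-degree angle with itself
```

For $p=3$ the ratio escapes $[-1, 1]$, and for $p=1$ it stays inside the range but fails to reach $1$ for this vector; only $p=2$ behaves like a cosine here.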