Linear Algebra I

Iñaki García Etxebarria

18 Mar 2021

Note to the reader: This is a set of notes for the material for the second term of Linear Algebra. Linear algebra is a very old branch of mathematics, so there are many excellent textbooks available.

They often present things in a way that is slightly different to the approach being taken here.

Comments about the lecture notes are very welcome – especially if you find any mistakes and/or typos.

Sections marked with an asterisk are not examinable.

1 Eigenvectors and Diagonalization

1.1 Prologue

If \(T\colon{\mathbb R}^n\to{\mathbb R}^n\) is a linear mapping, then \(T\) may be represented as an \(n\times n\) matrix \(M\). But this involves a choice of basis on \({\mathbb R}^n\), and \(M\) changes if you choose a different basis. (You might want to go back to §7.4 in the Michaelmas Linear Algebra Notes, and check the associated video in DUO, to remind yourself of what it means to write a vector in a different basis and what it means to write a linear transformation in a different basis.) The big theme of this chapter is that a suitable choice of basis makes \(M\) diagonal, and it is then easy to do calculations.

A predator-prey model. Let \(x_n\) and \(y_n\) be the number of owls and mice, respectively, at the end of year \(n\). Suppose that the change in population from one year to the next is modelled by \({\bf v}_{n+1}=M{\bf v}_n\), where \({\bf v}_n\) is the column vector with entries \(x_n\) and \(y_n\), and \[M=\left(\begin{matrix} 0.6 & 0.4 \\ -0.1 & 1.2 \end{matrix}\right).\] So given a starting population \({\bf v}_0\), the population after \(n\) years is \(M^n{\bf v}_0\). Can we get a formula for \(M^n\)? In this simple case, we could guess one by trial and error, but for larger matrices that is not feasible. However, if we can construct a change of basis to a diagonal form \[N = P^{-1} M P = \begin{pmatrix} a & 0\\ 0& b\end{pmatrix}\] for some \(a,b\in\mathbb{R}\) and invertible \(P\in M_{2\times 2}(\mathbb{R})\) then the problem becomes very easy to solve \[M^n = (PNP^{-1})^n = P N^n P^{-1} = P\begin{pmatrix} a^n & 0\\ 0& b^n\end{pmatrix} P^{-1}\,.\] So the idea is to solve the original problem by changing basis so that \(M\) is diagonal. This process is called diagonalization, and almost all square matrices can be diagonalized.
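
The diagonalization recipe just described is easy to test numerically. The following is a minimal sketch (not part of the original notes), written in Python with numpy, which computes \(M^n\) both by repeated multiplication and via an eigen-decomposition \(M=PNP^{-1}\); the starting population \({\bf v}_0\) is made up purely for illustration.

```python
# Minimal numerical sketch: M^n via the eigen-decomposition M = P N P^{-1},
# compared against direct repeated multiplication.  Requires numpy.
import numpy as np

M = np.array([[0.6, 0.4],
              [-0.1, 1.2]])

# Columns of P are eigenvectors of M; the diagonal matrix N holds the eigenvalues.
eigvals, P = np.linalg.eig(M)

n = 20
M_power_direct = np.linalg.matrix_power(M, n)                # M multiplied by itself n times
M_power_diag = P @ np.diag(eigvals**n) @ np.linalg.inv(P)    # P N^n P^{-1}
print(np.allclose(M_power_direct, M_power_diag))             # True (up to rounding)

v0 = np.array([10.0, 100.0])    # hypothetical initial owl/mouse populations
print(M_power_diag @ v0)        # population after n years
```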

Consider the following system of ordinary differential equations for \(x(t)\) and \(y(t)\): \[\begin{aligned} \frac{dx}{dt} & = -x+4y, \\ \frac{dy}{dt} & = -2x+5y. \end{aligned}\] This can be written in matrix form as \[\frac{d}{dt} {\bf v}= M {\bf v},\] where \[{\bf v}(t) = \begin{pmatrix} x(t)\\y(t) \end{pmatrix} \qquad\text{and}\qquad M=\left(\begin{matrix} -1 & 4 \\ -2 & 5 \end{matrix}\right).\] Now let’s assume that via some change of basis \(P\) we can make \(M\) diagonal with diagonal entries 1 and 3 (this is true in this case, as we will see in example [ex:diagonalization-1] below) \[N = P^{-1}MP = \begin{pmatrix}1&0\\0&3\end{pmatrix}\, .\] Then the differential equation can be written as \[\frac{d}{dt} {\bf v}= PNP^{-1} {\bf v},\] or equivalently (taking into account that \(P\) is a constant and that derivatives are linear) \[\frac{d}{dt}(P^{-1}{\bf v}) = N (P^{-1}{\bf v})\, .\] So if we define \[\begin{pmatrix} \tilde x\\ \tilde y \end{pmatrix} \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=P^{-1}{\bf v}\] we have the much simpler system \[\begin{aligned} \frac{d\tilde x}{dt} & = \tilde x\\ \frac{d\tilde y}{dt} & = 3\tilde y \end{aligned}\] with solution \(\tilde x(t)=C e^{t}\) and \(\tilde y(t)=De^{3t}\). Therefore \[\begin{pmatrix}x(t)\\y(t)\end{pmatrix} = P\begin{pmatrix}C e^{t}\\De^{3t}\end{pmatrix}\, .\]
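
As a rough check of the same idea for differential equations (again an illustration, not something done in the notes), one can build the solution of \(\frac{d}{dt}{\bf v}=M{\bf v}\) from the eigen-decomposition of \(M\) and compare it with a numerical integrator; the initial condition below is arbitrary, and scipy's solve_ivp is used only as an independent check.

```python
# Solve dv/dt = M v via diagonalization and compare with a numerical integrator.
# Requires numpy and scipy.
import numpy as np
from scipy.integrate import solve_ivp

M = np.array([[-1.0, 4.0],
              [-2.0, 5.0]])
eigvals, P = np.linalg.eig(M)          # eigenvalues 1 and 3 (in some order)

v0 = np.array([1.0, 2.0])              # an arbitrary initial condition
w0 = np.linalg.solve(P, v0)            # tilde variables: w = P^{-1} v at t = 0

def exact(t):
    # the tilde variables decouple: w_i(t) = w_i(0) * exp(lambda_i * t)
    return P @ (w0 * np.exp(eigvals * t))

numerical = solve_ivp(lambda t, v: M @ v, (0.0, 2.0), v0, rtol=1e-9, atol=1e-12)
print(np.allclose(numerical.y[:, -1], exact(2.0)))   # True
```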

1.2 Eigenvalues and eigenvectors

We start with some simple definitions.

Let \(A\) be a square matrix (same number \(n\) of rows as columns). A non-zero \(n\)-vector \({\bf v}\) is an eigenvector of \(A\) with eigenvalue \(\lambda\in{\mathbb C}\), if \[A {\bf v} = \lambda {\bf v}.\]

(Both the Greek letter, and the hybrid German-English words, are traditional.) Here we are thinking of \({\bf v}\) as a column vector. In particular, \[{\bf 0}=A{\bf v} - \lambda{\bf v} =(A - \lambda I){\bf v}.\] By the fundamental lemma of invertible matrices (§3.6 of the Michaelmas term lecture notes) we have that existence of such a \({\bf v}\) for a given \(\lambda\) is equivalent to the matrix \(A - \lambda I\) being non-invertible. Therefore the set of eigenvalues \(\lambda\) is the set of solutions to \[\det(A - \lambda I)=0.\] This is called the characteristic equation of the matrix \(A\). Notice that \(\det(A - t I)\) is a polynomial in the variable \(t\), of degree \(n\). We define \(p_A(t)=\det(A - tI)\) to be the characteristic polynomial of \(A\); the eigenvalues are the roots of \(p_A\). Note that if \({\bf v}\) is an eigenvector, then so is \(c{\bf v}\), for any number \(c\neq0\).

If \(A\) is a diagonal matrix, then the eigenvalues of \(A\) are precisely the entries on the diagonal. For example, \[A=\left(\begin{matrix} -2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{matrix}\right).\] has characteristic polynomial \(p_A(t)=-(t+2)(t-3)(t-1)\), and so the eigenvalues are \(\lambda=-2,\,3,\,1\).

If \(A\) is an upper triangular matrix, then the eigenvalues of \(A\) are again the entries on the diagonal. For example \[A=\left(\begin{matrix} -2 & 5 & 1 \\ 0 & 3 & 2 \\ 0 & 0 & 1 \end{matrix}\right)\] has characteristic polynomial \(p_A(t)=-(t+2)(t-3)(t-1)\), and so the eigenvalues are \(\lambda=-2,\,3,\,1\), as before.

Find the eigenvalues of the matrix \[A=\left(\begin{matrix} 0 & 1 \\ 1 & 0 \end{matrix}\right).\] Solution. The characteristic polynomial is \(p_A(t)=t^2-1\), and so the eigenvalues are \(\lambda=\pm1\).

Find the eigenvalues of the matrix \[A=\left(\begin{matrix} -2 & 1 & 2 \\ 2 & 3 & -4 \\ -1 & 1 & 1 \end{matrix}\right).\] Solution. We calculate \[\begin{aligned} p_A(t) & = \det(A-tI ) \\ & = \det\left(\begin{matrix} -2-t & 1 & 2 \\ 2 & 3-t & -4 \\ -1 & 1 & 1-t \end{matrix}\right) \\ & = (-2-t)(3-t)(1-t)+4+4+4(-2-t)+2(3-t)-2(1-t) \\ & = -t^3+2t^2+t-2 \\ & = -(t-2)(t-1)(t+1). \end{aligned}\] The factorization is obtained, for example, by spotting that \(t=1\) is a root, and then doing long division by the factor \(t-1\). The eigenvalues are \(\lambda=2,\,1,\,-1\).

To find the corresponding eigenvectors, we solve \((A-\lambda I){\bf v}={\bf0}.\)

The eigenvectors in example [example1-1] are clearly \({\bf v}_1=(1,0,0)\), \({\bf v}_2=(0,1,0)\) and \({\bf v}_3=(0,0,1)\).

The eigenvectors in example [example1-2] are \({\bf v}_1=(1,0,0)\), \({\bf v}_2=(1,1,0)\) and \({\bf v}_3=(-4/3,-1,1)\).

The eigenvectors in example [example1-3] are \({\bf v}_1=(1,1)\) and \({\bf v}_2=(1,-1)\).

The eigenvectors in example [example1-4] are \({\bf v}_1=(1,2,1)\), \({\bf v}_2=(1,1,1)\) and \({\bf v}_3=(2,0,1)\).
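
As a quick numerical sanity check (not part of the notes), the eigenvalues and eigenvectors quoted for the last of these examples can be verified directly with numpy.

```python
# Check that A v = lambda v for each quoted eigenpair of the 3x3 example.  Requires numpy.
import numpy as np

A = np.array([[-2, 1, 2],
              [2, 3, -4],
              [-1, 1, 1]], dtype=float)

pairs = [(2, [1, 2, 1]), (1, [1, 1, 1]), (-1, [2, 0, 1])]
for lam, v in pairs:
    v = np.array(v, dtype=float)
    print(lam, np.allclose(A @ v, lam * v))      # True for each pair

print(np.sort(np.linalg.eigvals(A)))             # approximately [-1.  1.  2.]
```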

Multiple eigenvalues. If we have a polynomial \(p(t)\) of degree \(N\), then (by the fundamental theorem of algebra, working over \(\mathbb{C}\)) the following decomposition holds: \[p(t) =c\, \left( t - \lambda_1\right )^{k_1} \cdot \left( t - \lambda_2\right)^{k_2}\cdot \,\ldots\, \cdot\left( t - \lambda_p\right)^{k_p}\,,\] where \(c\in\mathbb{C}\) is non-zero. The roots, \(\lambda_i \in \mathbb{C}\), are all distinct, i.e. \(\lambda_i\neq \lambda_j\) when \(i\neq j\).

The algebraic multiplicity of the root \(\lambda_i\) is given by \(k_i\in\mathbb{N}\), with \(k_i\neq 0\), and \(\sum_{i=1}^p k_i = N\).

If an eigenvalue \(\lambda\) has algebraic multiplicity \(k\in\mathbb{N}\), \(k\geq 1\), it means that the characteristic polynomial takes the form \[p_A(t) = \left( t - \lambda\right)^k\, Q(t)\,,\] where the polynomial \(Q(t)\) has degree \(N-k\) and is non-vanishing at \(\lambda\), i.e. \(Q(\lambda) \neq 0\).

There are \(p\) linearly independent eigenvectors corresponding to \(\lambda\), where \(1\leq p\leq k\).

That \(p\geq 1\) follows from the definition of eigenvalue (the determinant of \(A-\lambda I\) vanishes, so there is at least one non-zero vector in its kernel). That \(p\leq k\) follows from the fact that, if you have \(p\) linearly independent eigenvectors then (as we will see below) you can do a change of basis to put \(A\) in the block upper triangular form \[\begin{pmatrix} \lambda I_{p\times p} & B \\ 0 & C \end{pmatrix}\] for \(B\) and \(C\) some matrices of the appropriate dimensions. Since changes of basis do not change the characteristic polynomial (see proposition [prop:eigenvalues-for-similar-matrices] below), \(\det(A-t I) = (\lambda-t)^p \det(C-tI)\). So the algebraic multiplicity of \(\lambda\) is always at least \(p\).

The eigenspace \(V_{\lambda}\) is the \(p\)-dimensional vector space spanned by these eigenvectors. Since the eigenspace \(V_{\lambda}\) is generated by all the eigenvectors associated with the eigenvalue \(\lambda\) we can also write it as \[V_{\lambda} = \ker \left( A-\lambda I \right)\,,\] where the kernel, \(\ker\), of a linear operator \(B\) is a subspace of the full vector space \(V\) given by \[\ker \,B = \left\lbrace {\bf v}\in V \mid B {\bf v}= 0\right\rbrace\,.\]

We call \(p=\dim V_\lambda\) the geometric multiplicity of \(\lambda\).

Some simple examples are the following:

The \(2\times2\) matrix \(A=I\) has eigenvalue \(\lambda=1\) (twice), and two independent eigenvectors: the eigenspace is 2-dimensional.

By contrast, \[A=\left(\begin{matrix} 1 & 1 \\ 0 & 1 \end{matrix}\right)\] also has eigenvalue \(\lambda=1\) (twice), but \({\bf v}=(1,0)\) is the only eigenvector (up to scalar multiplication): the eigenspace is 1-dimensional, namely \(\mathop{\mathrm{span}}({\bf v})\).

1.3 The Cayley-Hamilton theorem

[Cayley-Hamilton] Let \(B\) be a square matrix with characteristic polynomial \(p_B(t)\). Then \(p_B(B)=0\), as a matrix equation.

 Let us show this first for \(2\times2\) matrices by direct calculation. Let \[B=\left(\begin{matrix} a & b \\ c & d \end{matrix}\right).\] Then \(p_B(t)=t^2-(a+d)t+(ad-bc)\), and \[p_B(B)=\left(\begin{matrix} a & b \\ c & d \end{matrix}\right) \left(\begin{matrix} a & b \\ c & d \end{matrix}\right) - (a+d)\left(\begin{matrix} a & b \\ c & d \end{matrix}\right) + (ad-bc)\left(\begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}\right) =\left(\begin{matrix} 0 & 0 \\ 0 & 0 \end{matrix}\right).\]
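
Before turning to the general proof, here is a small numerical illustration (not part of the notes) for a randomly generated matrix. Note that np.poly returns the coefficients of \(\det(tI-B)\), which differs from \(p_B(t)\) only by an overall sign, so it too must vanish at \(B\).

```python
# Numerical illustration of Cayley-Hamilton for a random 4x4 matrix.  Requires numpy.
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))

coeffs = np.poly(B)                 # coefficients of det(tI - B), highest power first

# Evaluate the polynomial at B itself using Horner's scheme with matrix arithmetic.
P = np.zeros_like(B)
for c in coeffs:
    P = P @ B + c * np.eye(4)

print(np.allclose(P, np.zeros((4, 4))))   # True: B annihilates its own
                                          # characteristic polynomial (up to rounding)
```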

The matrix \(B\) defines a linear transformation \(T\). Let us change basis to a flag basis for \(T\), i.e. an ordered basis \(\{{\bf v}_1,...,{\bf v}_N\}\) such that \(B\) in this basis becomes upper triangular \[A=P^{-1}BP=\left(\begin{matrix} a_{11} & a_{12} & \ldots & a_{1N}\\ 0 & a_{22} & \ldots & a_{2N}\\ \vdots & \ddots & \ddots & \vdots \\ 0 & \ldots & 0 & a_{NN} \end{matrix}\right)\] where \(P\) is the change of basis matrix. We use the result (without proving it) that for any linear transformation \(T\) we can always find such a change of basis.

The characteristic polynomial does not change under changes of basis (see proposition [prop:eigenvalues-for-similar-matrices] below), so the characteristic polynomial of the matrix in the new basis, which takes the form \[p_{A}(t) = (a_{11}-t) (a_{22}-t) \ldots (a_{NN}-t)\,,\] is the same as the original characteristic polynomial \(p_B(t)\). Also, \[\begin{split} p_{A}(A) & = (a_{11}-A) (a_{22}-A) \ldots (a_{NN}-A) \\ & = (P^{-1}a_{11}P-P^{-1}BP)(P^{-1}a_{22}P-P^{-1}BP) \ldots (P^{-1}a_{NN}P-P^{-1}BP)\\ & = P^{-1} (a_{11}-B) (a_{22}-B) \ldots (a_{NN}-B) P\\ & = P^{-1} p_B(B) P \end{split}\] so \(p_{A}(A)\) will vanish if and only if \(p_B(B)\) vanishes.

We will now show that \(p_{A}(A)=0\). To prove this matrix equation we will show for all \(1\leq i\leq N\) that \(\forall\, {\bf v}\in V_i\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\mathop{\mathrm{span}}({\bf v}_1,\,...\, {\bf v}_{i})\) we have \(p_i(A) {\bf v}= 0\), with \[p_i(A) = (A-a_{11} I) (A-a_{22}I ) \ldots (A-a_{ii} I)\,.\] The case \(i=N\) is the statement that we are after, since in this case \(p_N(A)=(-1)^N p_{A}(A)\), which vanishes if and only if \(p_A(A)\) does, and \(V_N\) is the full vector space. We do this using induction on \(i\).

For \(i=1\) we have that \(V_1 = \mathop{\mathrm{span}}({\bf v}_1)\) and \(p_1(A)=A- a_{11} I\). Clearly, from the upper triangular form of \(A\), \[A{\bf v}_1 = a_{11} {\bf v}_1\] so for \(i=1\) indeed \(p_1(A){\bf v}={\bf 0}\) for all \({\bf v}\in V_1\).

Let us assume that the inductive hypothesis holds up to \(i-1\); we will show that it also holds for \(i\). To prove it we note that \(\forall\, {\bf v}\in V_i\), we can write \[{\bf v}= \alpha {\bf v}_i + {\bf w},\] with \(\alpha \in \mathbb{C}\) and \({\bf w}\in V_{i-1}\). By linearity (and noting that \(p_i(A)=p_{i-1}(A)(A-a_{ii}I)=(A-a_{ii}I)p_{i-1}(A)\)) \[p_i(A){\bf v}= \alpha p_{i-1}(A)(A-a_{ii}I) {\bf v}_i + (A-a_{ii}I)p_{i-1}(A){\bf w}\] but \(p_{i-1}(A){\bf w}={\bf 0}\) by the inductive hypothesis, so \[p_i(A){\bf v}= \alpha p_{i-1}(A)(A-a_{ii}I) {\bf v}_i\,.\] Now, because of the upper triangular form, if we apply \(A\) to \({\bf v}_i\) we obtain \[A {\bf v}_i = a_{ii} {\bf v}_i + {\bf w}_{i-1}\] with \({\bf w}_{i-1} \in V_{i-1}\). Therefore \[p_{i-1}(A)(A-a_{ii}I) {\bf v}_i = p_{i-1}(A){\bf w}_{i-1} = {\bf 0}\,,\] which concludes the proof.
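
The existence of the upper-triangular (flag basis) form used in this proof can also be observed numerically: the Schur decomposition computes such a form, with a unitary change-of-basis matrix. The sketch below (not part of the notes) uses scipy for this.

```python
# Over C every square matrix is similar to an upper-triangular one; the Schur
# decomposition B = Z T Z^* exhibits this.  Requires numpy and scipy.
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))

T, Z = schur(B, output='complex')            # T upper triangular, Z unitary
print(np.allclose(np.tril(T, -1), 0))        # True: T is upper triangular
print(np.allclose(Z @ T @ Z.conj().T, B))    # True: same linear map, different basis
```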

1.4 The similarity relation

We want to be able to identify matrices differing from one another just by a change of basis, i.e. representing the same linear transformation but in a different basis.

An equivalence relation, \(\sim\), on a set \(X\), is a binary relation with the following properties:

  • \(a\sim a\) (reflexivity)

  • \(a\sim b \Rightarrow\, b\sim a\) (symmetry)

  • If \(a\sim b\) and \(b\sim c\), then \(a\sim c\) (transitivity),

\(\forall a,b,c \in X\).

We say that two square matrices \(A\) and \(B\) (of the same size) are similar, written \(A\sim B\), if there exists an invertible matrix \(M\) such that \(A=MBM^{-1}\).

Similarity is an equivalence relation.

First, the relation is reflexive, namely \(A\sim A\) for any \(A\): just take \(M=I\). Secondly, the relation is symmetric, namely \(A\sim B\) implies \(B\sim A\): because \(A=MBM^{-1}\) implies \(B=NAN^{-1}\), where \(N=M^{-1}\). Finally, the relation is transitive, namely if \(A\sim B\) and \(B\sim C\), then \(A\sim C\): if \(A=MBM^{-1}\) and \(B=NCN^{-1}\), then \(A=M(NCN^{-1})M^{-1}=(MN)C(MN)^{-1}\). Hence similarity between matrices is indeed an equivalence relation.

So the set of all \(n\times n\) matrices gets partitioned into equivalence classes \[\left[ A \right] = \{ B \mid B\sim A,\,\mbox{equivalently} \,B= M A M^{-1}\,\mbox{for some}\,M\in GL(n,\mathbb{R}) \}.\] For example the equivalence class of the identity matrix is simply given by \(\left[I \right] = \{I\}\).

Similar matrices have the same eigenvalues.

Suppose \(A=MBM^{-1}\). The characteristic polynomial of \(A\) is \[p_A(t)=\det(A-tI)=\det[M(B-tI)M^{-1}]= \det(M)\,\det(B-tI)\,\det(M)^{-1}=p_B(t),\] so \(A\) and \(B\) have the same eigenvalues.

Note that the converse statement is not always true: two matrices with the same eigenvalues are not necessarily similar, as the following example shows.

Consider the matrices \[A=\left(\begin{matrix}1 & 0\\ 0 & 1\end{matrix}\right)=I\,, \qquad B=\left(\begin{matrix}1 & \alpha\\ 0 & 1\end{matrix}\right)\] with \(\alpha\in \mathbb{R}\) and \(\alpha \neq 0\). Clearly \(p_A(t) = p_B(t) = (t-1)^2\), so \(A\) and \(B\) have eigenvalues \(\lambda=\{1,1\}\). Despite having the same eigenvalues, \(A\) cannot possibly be similar to \(B\): the matrices similar to \(A\) form the equivalence class \(\left[A\right] = \{ I \}\), so \(B \notin [A]\), which means that \(B\) is not similar to \(A\).

1.5 Diagonalization by change of basis

A square matrix \(A\) is said to be diagonalizable if \(A\) is similar to a diagonal matrix (which will then obviously have the eigenvalues of \(A\) as its diagonal entries).

Here is a summary of some facts about similarity and diagonalizability. Note that if we allow complex roots (as we do), then the \(n\times n\) matrix \(A\) has \(n\) eigenvalues, but they may not all be distinct.

  1. If \(A\) and \(B\) have the same eigenvalues, and both are diagonalizable, then \(A\sim B\). Being diagonalizable means that \(A = M D_1 M^{-1}\) and \(B=N D_2 N^{-1}\) for some diagonal matrices \(D_1,D_2\). Since \(A\) and \(B\) have the same eigenvalues, \(D_1\) and \(D_2\) have the same diagonal entries up to reordering, and a reordering of the diagonal is achieved by conjugating with a permutation matrix, so \(D_1 \sim D_2\). By the transitivity of \(\sim\) it follows that \(A\sim B\).

  2. If \(A\) is diagonalizable but \(B\) is not, then they cannot be similar. Diagonalizability of \(A\) means \(A\sim D\) for some diagonal matrix \(D\); if \(A\) were to be similar to \(B\) this would mean, by the transitivity of \(\sim\), that \(B\sim D\), contradicting our hypothesis.

  3. Not all square matrices are diagonalizable. The matrix \(B\) in example [ex:non-similar-same-eigenvalues] is an example: if it were diagonalizable it would be similar to the identity matrix, but we showed there that it is not.

  4. If the eigenvalues of \(A\) are distinct, then \(A\) is diagonalizable. We will show this below as proposition [prop:diagonalizability-different-eigenvalues].

  5. If \(A\) is a real symmetric matrix, then \(A\) is diagonalizable. We will discuss this in a later chapter.

The \(n\times n\) matrix \(A\) is diagonalizable if and only if it has \(n\) linearly independent eigenvectors.

Suppose that \(\{{\bf v}_1,{\bf v}_2,\ldots,{\bf v}_n\}\) is a set of linearly-independent eigenvectors with eigenvalues \(\{\lambda_1,\lambda_2,\ldots,\lambda_n\}\). Assemble these column vectors to make an \(n\times n\) matrix \(M=({\bf v}_1, {\bf v}_2, \ldots, {\bf v}_n)\) (note that this matrix is invertible, by theorem 6.3.1 in the Michaelmas lecture notes), and let \(D\) denote the diagonal matrix with \(\lambda_1, \lambda_2, \ldots, \lambda_n\) down the main diagonal. Now observe that \[\begin{aligned} AM &= (\lambda_1{\bf v}_1, \lambda_2{\bf v}_2, \ldots, \lambda_n{\bf v}_n)\\ &= ({\bf v}_1, {\bf v}_2, \ldots, {\bf v}_n) \, {\rm{diag}}(\lambda_1, \lambda_2, \ldots, \lambda_n)\\ &= MD. \end{aligned}\] Thus \(M^{-1}AM=D\), and \(A\) is diagonalizable. For the converse, suppose that \(M^{-1}AM=D\). This implies that \(AM=MD\), and thus the columns of \(M\) are eigenvectors of \(A\). They form a linearly-independent set since \(M\) is invertible.

As we will see later on, eigenvectors corresponding to different eigenvalues are automatically linearly independent; this means that each eigenspace \(V_\lambda\) will be linearly independent from \(V_\mu\) whenever the eigenvalue \(\lambda\) is different from the eigenvalue \(\mu\).

In particular, we know that \(1\leq \dim\, V_\lambda \leq m_\lambda\), where \(m_\lambda\) is the algebraic multiplicity of the eigenvalue \(\lambda\). Since the degree of the characteristic polynomial is precisely the size \(n\) of our matrix, summing the multiplicities of all the eigenvalues gives \(\sum_\lambda m_\lambda = n\).

Hence, for the matrix under consideration to be diagonalizable, we must have \(\dim V_\lambda = m_\lambda\) for all the eigenspaces \(V_\lambda\), since only in this case will we have a set of \(n\) linearly independent eigenvectors. We can equivalently say (using definition [def:geometric-multiplicity]) that a matrix is diagonalizable if and only if the geometric and algebraic multiplicities coincide for every eigenvalue.

This observation also tells one how to construct \(M\). Here are some examples.

If possible, diagonalize the matrix \[A=\left(\begin{matrix} -1 & 4 \\ -2 & 5 \end{matrix}\right).\] Solution. \(p(t)=t^2-4t+3=(t-1)(t-3)\), so the eigenvalues are \(\lambda_1=1\) and \(\lambda_2=3\).

  • \(\lambda=1\) \[\left(\begin{matrix} 0 \\ 0 \end{matrix}\right)={\bf 0}=(A - I){\bf v} =\left(\begin{matrix} -2 & 4 \\ -2 & 4 \end{matrix}\right) \left(\begin{matrix} v_1 \\ v_2 \end{matrix}\right)\] gives the condition \(-2v_1+4v_2=0\), so taking \(v_2=1\) gives \(v_1=2\), and \[{\bf v}_1=\left(\begin{matrix} 2 \\ 1 \end{matrix}\right).\]

  • \(\lambda=3\) \[\left(\begin{matrix} 0 \\ 0 \end{matrix}\right)={\bf 0}=(A-3I){\bf v} =\left(\begin{matrix} -4 & 4 \\ -2 & 2 \end{matrix}\right) \left(\begin{matrix} v_1 \\ v_2 \end{matrix}\right)\] gives the condition \(-4v_1+4v_2=0\), so taking \(v_2=1\) gives \(v_1=1\), and so \[{\bf v}_2=\left(\begin{matrix} 1 \\ 1 \end{matrix}\right).\]

Therefore we set \[M=\left(\begin{matrix} 2 & 1 \\ 1 & 1 \end{matrix}\right),\quad M^{-1}=\left(\begin{matrix} 1 & -1 \\ -1 & 2 \end{matrix}\right),\] and then \[M^{-1}AM=\left(\begin{matrix} 1 & 0 \\ 0 & 3 \end{matrix}\right).\]
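
A short numerical check of this diagonalization (not part of the notes):

```python
# Verify M^{-1} A M = D, and the equivalent check A M = M D.  Requires numpy.
import numpy as np

A = np.array([[-1, 4],
              [-2, 5]], dtype=float)
M = np.array([[2, 1],
              [1, 1]], dtype=float)
D = np.diag([1.0, 3.0])

print(np.allclose(np.linalg.inv(M) @ A @ M, D))   # True
print(np.allclose(A @ M, M @ D))                  # True
```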

The construction in the previous example gives you \(M\); if you want to check your answer without computing \(M^{-1}\), just check that \(AM=MD\), where \(D\) is the diagonal matrix of eigenvalues (in the right order).

Solve the system of ODEs \[\begin{aligned} \dot{x} & = -x+4y, \\ \dot{y} & = -2x+5y\,, \end{aligned}\] where \(x=x(t)\,,\,y=y(t)\) and \(\dot{x} = \frac{d x(t)}{dt}\), similarly \(\dot{y} = \frac{d y(t)}{dt}\).

Solution. The procedure is to diagonalize the system, which amounts to the same thing as changing basis. If \({\bf v}\) denotes the column vector with entries \((x,y)\), put \({\bf w}=M^{-1}{\bf v}\). Then \({\bf w}(t)\) satisfies the system \(\dot{\bf w}=D{\bf w}\), where \(D={\rm{diag}}(1, 3)\). The solution is \(w_1(t)=c_1\exp(t)\), \(w_2(t)=c_2\exp(3t)\), with \(c_1\) and \(c_2\) constant; and then \({\bf v}=M{\bf w}\) gives \[\begin{aligned} x(t) & = 2c_1{\rm e}^t + c_2{\rm e}^{3t}, \\ y(t) & = c_1{\rm e}^t + c_2{\rm e}^{3t}. \end{aligned}\] (At this stage, you should substitute these answers back into the ODEs to check the calculation!)

If possible, diagonalize the matrix \[A=\left(\begin{matrix} 1 & 2 \\ 0 & 1 \end{matrix}\right).\] Solution. Here \(p(t)=(1-t)^2\), so there is one eigenvalue, \(\lambda=1\), with algebraic multiplicity \(2\). Now \[\left(\begin{matrix} 0 \\ 0 \end{matrix}\right)={\bf 0}=(A-I){\bf v} =\left(\begin{matrix} 0 & 2 \\ 0 & 0 \end{matrix}\right) \left(\begin{matrix} v_1 \\ v_2 \end{matrix}\right)\] gives \(v_2=0\). So the eigenspace \(V_1\) is spanned by \[{\bf v}_1=\left(\begin{matrix} 1 \\ 0 \end{matrix}\right).\] In other words, \(V_1\) is 1-dimensional. In view of our theorem, we conclude that \(A\) is not diagonalizable.
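
Numerically, the geometric multiplicity can be read off as \(\dim\ker(A-\lambda I)=n-\mathop{\mathrm{rank}}(A-\lambda I)\); a small sketch (not part of the notes):

```python
# Geometric multiplicity of lambda = 1 for the non-diagonalizable example.  Requires numpy.
import numpy as np

A = np.array([[1, 2],
              [0, 1]], dtype=float)
geometric = A.shape[0] - np.linalg.matrix_rank(A - 1.0 * np.eye(2))
print(geometric)    # 1, while the algebraic multiplicity of lambda = 1 is 2,
                    # so A is not diagonalizable
```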

Eigenvectors that correspond to distinct eigenvalues of \(A\) are linearly independent. In particular, if all the eigenvalues of \(A\) are distinct, then \(A\) is diagonalizable.

Suppose \({\bf v}_1,\,{\bf v}_2\) are eigenvectors of \(A\) with eigenvalues \(\lambda_1\neq\lambda_2\). Define \({\bf w}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=c_1{\bf v}_1+c_2{\bf v}_2\) and assume that the two eigenvectors satisfy the linear relation \({\bf w}=\bf 0\). Then expanding \(A {\bf w}-\lambda_1 \,{\bf w}=(A-\lambda_1 I)\mathbf{0}=\mathbf 0\) we have \[A {\bf w}-\lambda_1 \,{\bf w}= \lambda_1c_1{\bf v}_1+\lambda_2c_2{\bf v}_2 - \lambda_1(c_1{\bf v}_1+c_2{\bf v}_2)=(\lambda_2-\lambda_1)c_2{\bf v}_2=\bf 0,\] and hence \(c_2=0\) (since \(\lambda_2\neq\lambda_1\) and \({\bf v}_2\neq{\bf 0}\)). Similarly \(c_1=0\). Thus \(\{{\bf v}_1,{\bf v}_2\}\) are linearly independent.

This proof extends directly to \(k\) eigenvectors. Suppose \({\bf w}=c_1{\bf v}_1+\ldots+c_k{\bf v}_k={\bf0}\). Let us compute \(A {\bf w}-\lambda_k\,{\bf w}=\bf0\) \[A {\bf w}-\lambda_k\,{\bf w}= c_1 (\lambda_1-\lambda_k) {\bf v}_1+...+c_{k-1} (\lambda_{k-1}-\lambda_k) {\bf v}_{k-1} ={\bf w}'=\bf 0\,.\] Note that \({\bf v}_k\) does not appear in \({\bf w}'\), and neither does \(c_k\). Now we just iterate this procedure to eliminate all the coefficients except \(c_1\). That is, expand \(A {\bf w}'-\lambda_{k-1} {\bf w}'=\bf 0\) to get rid of \({\bf v}_{k-1}\), and so on. Eventually we end up with \[(\lambda_1-\lambda_2)\ldots(\lambda_1-\lambda_k)c_1{\bf v}_1={\bf0},\] and so \(c_1=0\), since all eigenvalues are distinct by assumption. Similarly \(c_j=0\), for all \(j\).

As mentioned before, from the above proposition we deduce that eigenspaces corresponding to different eigenvalues must be linearly independent. Since we know that the dimension of each eigenspace, \(V_\lambda\), is bounded by the algebraic multiplicity, \(m_\lambda\), of that particular eigenvalue, we deduce that the only obstruction to finding \(n\) linearly independent eigenvectors comes from the eigenspaces corresponding to eigenvalues with \(m_\lambda >1\).

Given a matrix \(A\), a necessary and sufficient condition for \(A\) to be diagonalizable is that \(\dim \,V_\lambda = \dim \,\ker\left(A-\lambda I \right) = m_\lambda\) for all the eigenvalues \(\lambda\). Only in this case can we decompose the total vector space \(V\) as a direct sum of eigenspaces \[V = V_{\lambda_1} \oplus V_{\lambda_2}\oplus...\oplus V_{\lambda_p}\,,\] so that we have precisely \(n\) linearly independent eigenvectors, since \[\dim V = \dim V_{\lambda_1}+ \dim V_{\lambda_2}+ ...+\dim V_{\lambda_p}=m_{\lambda_1}+m_{\lambda_2}+...+m_{\lambda_p}= n\,,\] where we used that the degree of the characteristic polynomial \(p_A(t)\) equals \(n\) and is also equal to the sum of the multiplicities of all the eigenvalues: \[\mbox{deg} \left(p_A(t) \right)= m_{\lambda_1}+m_{\lambda_2}+...+m_{\lambda_p}=n\,.\]

Note that the above remarks are valid only when we allow ourselves to work over the complex numbers! In particular, eigenvalues might be complex, even when the matrix is real.

Let us try to diagonalize \[A=\left(\begin{matrix} -2 & -1 \\ 5 & 2 \end{matrix}\right).\] Solution. \(p(t)=t^2+1\), so the eigenvalues are \(\lambda_1={\rm i}\) and \(\lambda_2=-{\rm i}\). Then we get \[M=\left(\begin{matrix} -2+{\rm i}& -2-{\rm i}\\ 5 & 5 \end{matrix}\right),\quad M^{-1}=\frac{1}{10} \left(\begin{matrix} -5{\rm i}& 1-2{\rm i}\\ 5{\rm i}& 1+2{\rm i}\end{matrix}\right),\] and \[M^{-1}AM=\left(\begin{matrix} {\rm i}& 0 \\ 0 & -{\rm i}\end{matrix}\right).\]
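
A quick numerical verification of this complex diagonalization (not part of the notes); numpy handles the complex arithmetic directly.

```python
# Check M^{-1} A M = diag(i, -i) for the example above.  Requires numpy.
import numpy as np

A = np.array([[-2, -1],
              [5, 2]], dtype=float)
M = np.array([[-2 + 1j, -2 - 1j],
              [5, 5]])

D = np.linalg.inv(M) @ A @ M
print(np.allclose(D, np.diag([1j, -1j])))   # True
```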

If \(A\) is a real matrix, it may be possible to diagonalize \(A\) over \({\mathbb C}\), but not over \({\mathbb R}\), as in this example. In general, we take “diagonalizable” to mean that complex numbers are allowed. But sometimes we might ask whether \(A\) is “diagonalizable when \({\mathbb R}\) is the underlying field of scalars”, or “diagonalizable over \({\mathbb R}\)”, which means that only real numbers are allowed, and is more restrictive. Note that the eigenvectors belong to \({\mathbb C}^2\), a complex vector space.

\({\mathbb C}^n\) is the vector space of \(n\)-dimensional complex vectors. The standard basis is the same as for \({\mathbb R}^n\), i.e. the \({\bf{z}}\in{\mathbb C}^n\) given by \[{\bf{z}} = \left(\begin{matrix} z_1\\ z_2\\ \vdots\\ z_n \end{matrix}\right)\] with \(z_1,...,z_n\in\mathbb{C}\) can be written as \[{\bf{z}} =z_1\,{\bf{e}}_1+z_2\,{\bf{e}}_2+...+z_n\,{\bf{e}}_n = z_1 \left(\begin{matrix} 1\\ 0\\ \vdots\\ 0 \end{matrix}\right)+z_2\left(\begin{matrix} 0\\ 1\\ 0\\ \vdots \end{matrix}\right)+...+z_n\left(\begin{matrix} 0\\ \vdots \\ 0\\ 1 \end{matrix}\right)\,.\] The only thing changing from \({\mathbb R}^n\) to \({\mathbb C}^n\) is the fact that the scalar numbers to be used are now complex numbers.

If \[A=\left(\begin{matrix} -1 & 4 \\ -2 & 5 \end{matrix}\right),\] derive a formula for \(A^n\).

Solution. Note that \(A=MDM^{-1}\), where \(M\) and \(D\) are as in example [ex:diagonalization-1]. So \[A^n=MD^nM^{-1} =\left(\begin{matrix} 2-3^n & 2\times3^n-2 \\ 1-3^n & 2\times3^n-1 \end{matrix}\right).\]
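
The closed-form expression can be checked against repeated matrix multiplication; the following sketch (not part of the notes) does this for the first few values of \(n\).

```python
# Compare A^n computed by repeated multiplication with the closed formula.  Requires numpy.
import numpy as np

A = np.array([[-1, 4],
              [-2, 5]], dtype=float)

def closed_form(n):
    return np.array([[2 - 3**n, 2 * 3**n - 2],
                     [1 - 3**n, 2 * 3**n - 1]], dtype=float)

for n in range(6):
    assert np.allclose(np.linalg.matrix_power(A, n), closed_form(n))
print("closed form agrees for n = 0, ..., 5")
```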

Mathematical models often lead to discrete-time evolution systems of the form \({\bf v}_{k+1}=A{\bf v}_k\). These can be solved by diagonalizing \(A\), as we did for differential equations. Sometimes we are interested in whether the solution stabilizes, i.e. \({\bf v}_k\to{\bf v}\) as \(k\to\infty\). Such a \({\bf v}\) has to have the property \(A{\bf v}={\bf v}\) (eigenvalue 1). Exercise for you: show that the dynamical system defined by \[A=\left(\begin{matrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{matrix}\right)\] admits a stable solution, and find it (as a unit vector).

Solution. \(p(t)=(t-0.7)(t-1)\), \({\bf v}=\pm(1,2)/\sqrt{5}\). The other eigenvector, corresponding to \(\lambda=0.7\), is \({\bf v}=\pm(1,-1)/\sqrt{2}\).
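
The convergence towards the stable solution can also be seen by simply iterating the map; a small illustration (not part of the notes), with an arbitrary starting vector:

```python
# Iterate v_{k+1} = A v_k; the direction converges to the eigenvector of eigenvalue 1.
# Requires numpy.
import numpy as np

A = np.array([[0.8, 0.1],
              [0.2, 0.9]])
v = np.array([1.0, 0.0])          # an arbitrary starting vector
for _ in range(200):
    v = A @ v

print(v / np.linalg.norm(v))                 # approximately (1, 2)/sqrt(5)
print(np.array([1.0, 2.0]) / np.sqrt(5))     # [0.4472136  0.89442719]
```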

Can we define the \(k^{th}\) root of a diagonalizable matrix \(A\), i.e. a matrix \(B =\)\(A^{1/k}\,\)”, with \(k\in \mathbb{N}\), such that \(B^k = A\) ? If \(A\) is diagonalizable, it means that \(M^{-1} A M = D= {\rm{diag}}\left(\lambda_1,...,\lambda_n\right)\) for some \(\lambda_i \in\mathbb{C}\). If we define \(B=M \tilde{D}M^{-1}\), with \(\tilde{D}={\rm{diag}}\left(\lambda_1^{1/k},...,\lambda_n^{1/k}\right)\) we can easily see that \(B^k= M{\tilde{D}}^k M^{-1}= M \,D\,M^{-1} = A\).

Note that this matrix \(B\) is not unique, for example if we take \[B_1=M {\rm{diag}}\left(\omega_1\lambda_1^{1/k},...,\omega_n\lambda_n^{1/k}\right)M^{-1},\] with \(\omega_p\) a \(k^{th}\) root of unity, i.e. \(\omega_p= e^{2\pi\,i \,p / k}\in\mathbb{C}\), \(\omega_p^k=1\), we still have \(B_1^k= A\).
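
A sketch of the \(k\)-th root construction for \(k=2\) (not part of the notes), applied to the matrix from the earlier diagonalization example; numpy's eig provides the change of basis, and the principal square root of each eigenvalue is used, so this picks out one particular root among the non-unique possibilities.

```python
# Matrix square root of a diagonalizable matrix via B = M diag(sqrt(lambda_i)) M^{-1}.
# Requires numpy.
import numpy as np

A = np.array([[-1, 4],
              [-2, 5]], dtype=float)      # diagonalizable, eigenvalues 1 and 3
eigvals, M = np.linalg.eig(A)

B = M @ np.diag(np.sqrt(eigvals)) @ np.linalg.inv(M)
print(np.allclose(B @ B, A))              # True: B is a square root of A
```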

2 Inner-product spaces

2.1 Real bilinear forms and inner products

We want to generalise the concept of dot product (or scalar product) between vectors of \({\mathbb R}^3\). As we have seen (you might want to go back to the Michaelmas Linear Algebra notes to remind yourself of what the properties of the dot product are) the scalar product in \(\mathbb{R}^3\) takes two \(3\)-dimensional vectors \({\bf{v}},{\bf{w}}\) and gives you the number \({\bf{v}}\cdot {\bf{w}}\), i.e. in mathematical language we have an operation \((\cdot)\colon {\mathbb R}^3\times {\mathbb R}^3 \mapsto {\mathbb R}\).

In particular the dot product is linear in both “entries” (we will say bilinear), i.e. \((\alpha{\bf{v}_1} +\beta{\bf{v}_2})\cdot {\bf{w}} = \alpha {\bf{v}_1} \cdot {\bf{w}} +\beta {\bf{v}_2} \cdot {\bf{w}}\) and similarly \({\bf{v}}\cdot (\alpha{\bf{w}_1} +\beta{\bf{w}_2}) = \alpha {\bf{v}}\cdot{\bf{w}_1} +\beta {\bf{v}}\cdot {\bf{w}_2}\). This first property can be easily generalised as follows:

A bilinear form \(B\) on a real vector space \(V\) is a mapping \(B:V\times V \to \mathbb{R}\) assigning to each pair of vectors \({\bf u},{\bf v}\in V\) a real number \(B({\bf u},{\bf v})\) satisfying the following for all \({\bf u},{\bf v},{\bf w}\in V\) and all \(\lambda\in{\mathbb R}\):

  1. \(B({\bf u}+{\bf v}, {\bf w})=B({\bf u}, {\bf w})+B({\bf v}, {\bf w})\,,\)

  2. \(B( {\bf w},{\bf u}+{\bf v})=B({\bf w}, {\bf u})+B({\bf w}, {\bf v})\,,\)

  3. \(B( \lambda{\bf u},{\bf v}) = B( {\bf u},\lambda{\bf v})=\lambda B( {\bf u},{\bf v})\,.\)

Going back to our initial discussion concerning the dot product over \({\mathbb R}^3\), we have to remember that, besides bilinearity, the scalar product has two other key properties, namely it is symmetric, i.e. \({\bf{v}}\cdot {\bf{w}} = {\bf{w}}\cdot {\bf{v}}\), and it is positive, in the sense that if we compute the dot product of a vector with itself we get a positive value, in fact we obtain its length squared, i.e. \({\bf{v}}\cdot {\bf{v}} = v_x^2+v_y^2+v_z^2 \geq 0\). Since we want to generalise the notion of dot product we will not quite work with the most general bilinear form defined above, because this would not have the symmetry and positivity properties that the dot product has. For these reasons we will require some extra properties and work with very special bilinear forms called inner-products. A bilinear form \(B\), which is also positive and symmetric, defines an inner product.

An inner product on a real vector space \(V\) is a map \((\cdot , \cdot) :V\times V\mapsto \mathbb{R}\) which assigns to each pair of vectors \({\bf u},{\bf v}\in V\) a real number \(({\bf u},{\bf v})\), satisfying the following: for all \({\bf u},{\bf v},{\bf w}\in V\) and all \(\lambda\in{\mathbb R}\)

  1. \(({\bf u}+{\bf v},{\bf w})=({\bf u},{\bf w})+({\bf v},{\bf w})\);

  2. \((\lambda {\bf u},{\bf v})=\lambda({\bf u},{\bf v})\);

  3. \(({\bf u},{\bf v})=({\bf v},{\bf u})\) (symmetry);

  4. \(({\bf v},{\bf v})\geq0\) with \(({\bf v},{\bf v})=0\iff {\bf v}=0\) (positivity).

The first two properties constitute linearity in the first factor; together with the third one they make the inner product symmetric and bilinear, i.e. linear in each factor separately.

A real inner-product space is a real vector space equipped with an inner product.

\(V={\mathbb R}^n\) with the standard Euclidean inner product (dot product) \[({\bf u},{\bf v})={\bf u}\cdot{\bf v}=u_1v_1+\dots+u_nv_n.\]

\(V={\mathbb R}^n\) and \(({\bf u},{\bf v})=a_1u_1v_1+\dots+a_nu_nv_n\), where \(a_1,a_2,\ldots,a_n\) are fixed positive real numbers. This is called the weighted Euclidean inner product with weights \(a_1,\dots,a_n\). Note that symmetry and bilinearity are immediate, and positivity follows since \(a_1,\ldots,a_n>0\).

On \(V={\mathbb R}^4\), the expression \(({\bf x},{\bf y})=x_1y_1+x_2y_2+x_3y_3-x_4y_4\) is not an inner product: symmetry and bilinearity hold, but positivity does not. For example, \({\bf x}=(1,0,0,1)\) is a non-zero vector with \(({\bf x},{\bf x})=0\). However, it is of great importance: from Einstein, we know that this is the pseudo-inner product on flat space-time, with \(x_4\) being the time coordinate. The vector space \(\mathbb{R}^4\) with this pseudo-inner product is called Minkowski space.

On \(V={\mathbb R}^2\), the expression \(({\bf x},{\bf y})=x_1y_1+x_1y_2+x_2y_2\) is not an inner product: it is not symmetric. E.g. \({\bf x}=(1,1)^T\) and \({\bf y}=(1,0)^T\) give \(({\bf x},{\bf y})=1\) but \(({\bf y},{\bf x})=2\).

On \(V={\mathbb R}^2\), the expression \(({\bf x},{\bf y})=x_1y_1+x_1y_2+x_2y_1+x_2y_2\) is not an inner product: symmetry and bilinearity hold, but positivity does not. To see this, note that \(({\bf x},{\bf x})=(x_1+x_2)^2\), so for example \({\bf x}=(1,-1)\) is a non-zero vector with \(({\bf x},{\bf x})=0\).

On \(V={\mathbb R}^2\), the expression \(({\bf x},{\bf y})=x_1y_1+x_1y_2+x_2y_1+2x_2y_2\) is an inner product.

On \(V={\mathbb R}^3\), consider the expression \(({\bf x},{\bf y})=2x_1y_1+x_2y_2-2x_2y_3-2x_3y_2+kx_3y_3\), where \(k\in{\mathbb{R}}\) is fixed. This satisfies symmetry and bilinearity, so the question is whether positivity holds. To investigate this, observe that \(({\bf x},{\bf x})=2x_1^2+(x_2-2x_3)^2+(k-4)x_3^2\), so it is an inner product if and only if \(k>4\). (Note that the case \(k=4\) does not define an inner product.)

\(V=C[a,b]\), the vector space of continuous functions \(f:[a,b]\to{\mathbb R}\). (We assume \(a<b\).) Recall that \(V\) is infinite-dimensional, and that the zero vector is the function \(f\) which is identically zero. Define \[(f,g)=\int_a^bf(t)\,g(t)\,dt.\] (This scalar product is precisely the scalar product inducing the so called \(L^2\) norm over \(C[a,b]\).) Note that this is symmetric and bilinear. Also \[(f,f)=\int_a^bf(t)^2\,dt\geq0\] and if \(f(t)\) is not identically zero on \([a,b]\) then \((f,f)>0\). For in this case there exists \(t_0\in[a,b]\) such that \(f\left(t_0\right)=c\not=0\) and hence, using the continuity of \(f\), there is a small neighbourhood \((t_0-\delta,t_0+\delta)\), with \(\delta >0\), around \(t_0\) where \(f(t)^2\geq\frac{c^2}{4}\), so integrating gives \((f,f)=\int_a^bf(t)^2\,dt>0\). Conclusion: this is an inner product.

(Weighted version of the previous example.) Let \(k\colon [a,b]\to{\mathbb R}\) be such that \(k(t)>0\) for all \(t\in(a,b)\), with \(k\) a continuous function also called kernel, and define \((f,g)=\int_a^bk(t)\,f(t)\,g(t)\,dt\). More on this later.

(At this point you might want to go back to the Michaelmas lecture notes and refresh your memory about the vector space of polynomials with real coefficients.) Let \({\mathbb R}[x]_n\) denote the space of polynomials in \(x\), of degree at most \(n\), with real coefficients, i.e. \({\mathbb R}[x]_n = \{a_0+...+a_n x^n\,\mbox{with}\,a_i\in\mathbb{R}\}\). This is a vector space of dimension \(n+1\). For any interval \([a,b]\), it is a subspace of the infinite-dimensional space \(C[a,b]\) above. The form \((f,g)=\int_a^bf(t)\,g(t)\,dt\) is one choice of inner product. A natural basis is \(\{1,x,\ldots,x^{n-1},x^n\}\); so if \[p(x)=p_nx^n+\ldots+p_0\] is a polynomial, then its components with respect to this basis are \({\bf p}=(p_0,\ldots,p_n)\). The expression for \(({\bf p},{\bf q})\) depends on \(a\), \(b\) and \(n\). For \(a=0\), \(b=1\) and \(n=1\) we get \[({\bf p},{\bf q})=p_0 q_0 + \frac{1}{2}p_1 q_0 + \frac{1}{2}p_0 q_1 + \frac{1}{3}p_1 q_1.\]
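
The matrix of this inner product in the basis \(\{1,x,\ldots,x^n\}\) has entries \(\int_0^1 x^{i+j}\,dx = 1/(i+j+1)\). The following sketch (not part of the notes) builds it with numpy and evaluates \(({\bf p},{\bf q})\) for two made-up polynomials.

```python
# Gram matrix of (f, g) = \int_0^1 f g dt on R[x]_n in the monomial basis.  Requires numpy.
import numpy as np

def gram_matrix(n):
    return np.array([[1.0 / (i + j + 1) for j in range(n + 1)] for i in range(n + 1)])

G = gram_matrix(1)
print(G)                          # [[1, 1/2], [1/2, 1/3]], as in the text

p = np.array([1.0, 2.0])          # p(x) = 1 + 2x, components (p0, p1)
q = np.array([3.0, -1.0])         # q(x) = 3 - x
print(p @ G @ q)                  # (p, q) = 29/6, the same as \int_0^1 p(t) q(t) dt
```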

2.1.1 Matrix version

Pick a basis \(\{{\bf v}_1,...,{\bf v}_n\}\) for \(V=\mathbb{R}^n\). Given two vectors \({\bf u}\) and \({\bf w}\), we denote their coordinates in this basis as \(x_i\) and \(y_i\). That is, \({\bf u}= x_1{\bf v}_1 + \ldots + x_n{\bf v}_n\) and \({\bf w}= y_1{\bf v}_1+\ldots+y_n{\bf v}_n\).

Now, for any real bilinear form \(B\) we have \[\begin{aligned} B({\bf u}, {\bf w}) & = B\left(\sum_{i=1}^n x_i {\bf v}_i, \sum_{j=1}^n y_j {\bf v}_j\right)\\ & = \sum_{i=1}^n\sum_{j=1}^n x_i B({\bf v}_i,{\bf v}_j) y_j\,,\end{aligned}\] If we think of the \(x_i\) and \(y_j\) as column vectors \({\bf x}\) and \({\bf y}\), and we define a matrix \(A_{ij}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=B({\bf v}_j,{\bf v}_i)\) (note the index structure), we conclude that in the \(\{{\bf v}_i\}\) basis \[\begin{equation} \label{eq:matrix-representation-of-bilinear-form} B({\bf u}, {\bf w}) = (A{\bf x})^t{\bf y}= {\bf y}^t A {\bf x}\, . \end{equation}\]

We refer to the matrix \(A_{ij}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=B({\bf v}_j,{\bf v}_i)\) as the matrix form of \(B\) in the basis \(\{{\bf v}_i\}\).

Many authors choose a different convention for the matrix form of a bilinear map, given by \(A_{ij}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=B({\bf v}_i,{\bf v}_j)\) (note the opposite ordering of the indices). The form satisfies \(B({\bf u},{\bf w})={\bf x}^t A {\bf y}\) instead. Which form you choose is a matter of conventions; we will choose the conventions in definition [def:matrix-form].
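
A minimal sketch (not part of the notes) of the convention of definition [def:matrix-form], for a made-up bilinear form on \(\mathbb{R}^2\), chosen non-symmetric so that the index ordering actually matters.

```python
# Build A_ij = B(v_j, v_i) in the standard basis and check B(u, w) = y^t A x.
# Requires numpy.
import numpy as np

def B(u, w):
    # a made-up, non-symmetric bilinear form on R^2
    return u[0]*w[0] + 2*u[0]*w[1] + 3*u[1]*w[0]

e = np.eye(2)
A = np.array([[B(e[j], e[i]) for j in range(2)] for i in range(2)])   # A_ij = B(v_j, v_i)

x = np.array([1.0, -2.0])     # components of u
y = np.array([3.0, 0.5])      # components of w
print(np.isclose(B(x, y), y @ A @ x))   # True
```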

We now want to understand how the properties of being symmetric and positive-definite translate into properties of the matrix form, and for this we need a couple of definitions.

We say that an \(n\times n\) matrix \(B\) is symmetric if \[B^t=B,\] and anti-symmetric or skew-symmetric if \[B^t=-B.\]

Every matrix \(B\) can be decomposed into its symmetric part \(B_+\) and anti-symmetric part \(B_-\) simply via \[B= \frac{B+B^t}{2} + \frac{B-B^t}{2}=B_+ +B_-\,,\] where \(B_+ \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\frac{B+B^t}{2}\) and \(B_-\mathrel{\mathop:}\mathrel{\mkern-1.2mu}= \frac{B-B^t}{2}\). Clearly \(B_+\) is a symmetric matrix and \(B_-\) is an anti-symmetric matrix.

A symmetric \(n\times n\) matrix \(B\) is positive-definite if \[{\bf v}^t B {\bf v}> 0\] for all non-zero vectors \({\bf v}\in\mathbb{R}^n\).

A symmetric matrix \(B\) is positive-definite if and only if its eigenvalues are all positive.

For the \((\implies)\) part of the proof, for each eigenvalue \(\lambda_i\) choose an eigenvector \({\bf v}_i\) with eigenvalue \(\lambda_i\). For a positive-definite matrix we then have \[0 < {\bf v}_i^t B {\bf v}_i = \lambda_i {\bf v}_i^t{\bf v}_i = \lambda_i {\bf v}_i\cdot {\bf v}_i\, .\] But \({\bf v}_i\cdot{\bf v}_i>0\), so we conclude that \(\lambda_i>0\).

For the \((\Longleftarrow)\) implication: we will see in §2.6 that given a real \(n\times n\) symmetric matrix \(B\), one can form a basis for \(\mathbb{R}^n\) consisting of eigenvectors \({\bf v}_i\) of \(B\), with eigenvalue \(\lambda_i\), and such that \({\bf v}_i\cdot {\bf v}_j=0\) for \(i\neq j\). If we express an arbitrary non-zero vector \({\bf v}\) in this basis as \({\bf v}=\sum_{i=1}^n c_i {\bf v}_i\) (with at least one \(c_i\neq 0\)) we have \[\begin{split} {\bf v}^t B {\bf v}& = \left(\sum_{i=1}^n c_i {\bf v}_i\right) \cdot \left(B\sum_{j=1}^n c_j {\bf v}_j\right)\\ & = \left(\sum_{i=1}^n c_i {\bf v}_i\right)\cdot \left(\sum_{j=1}^n c_j \lambda_j {\bf v}_j\right)\\ & = \sum_{i=1}^n c_i^2 ({\bf v}_i\cdot{\bf v}_i) \lambda_i\,, \end{split}\] where in going from the second to the third lines we have used \({\bf v}_i\cdot {\bf v}_j=0\) for \(i\neq j\). The last line is strictly positive if all the eigenvalues are positive.

(Without proof.) An alternative, equivalent, criterion for a real symmetric matrix \(B\) to be positive-definite is that all \(n\) of its upper-left square submatrices, i.e.

  • the upper left 1-by-1 corner of \(B\),

  • the upper left 2-by-2 corner of \(B\),

  • the upper left 3-by-3 corner of \(B\),

  • ...

  • \(B\) itself,

have positive determinant. This is called Sylvester’s criterion, and these subdeterminants are called the leading principal minors of \(B\).

For example [ex:inner-product-e] above, we have \[B=\left(\begin{matrix} 1 & 1 \\ 1 & 1\end{matrix}\right);\] the two subdeterminants are 1 and 0, so \(B\) is not positive-definite.

For example [ex:inner-product-g] above, we have \[B=\left(\begin{matrix} 2 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & -2 & k \end{matrix}\right);\] the three subdeterminants are \(2\), \(2\) and \(2(k-4)\), so \(B\) is positive-definite if and only if \(k-4>0\).
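
Sylvester's criterion is straightforward to implement; the sketch below (not part of the notes) checks the leading principal minors of the \(k\)-dependent matrix above and compares with the eigenvalue test, for one value of \(k\) on each side of the threshold.

```python
# Positive-definiteness via leading principal minors vs. via eigenvalues.  Requires numpy.
import numpy as np

def sylvester_positive_definite(B):
    n = B.shape[0]
    return all(np.linalg.det(B[:m, :m]) > 0 for m in range(1, n + 1))

for k in (3.0, 5.0):
    B = np.array([[2.0, 0.0, 0.0],
                  [0.0, 1.0, -2.0],
                  [0.0, -2.0, k]])
    print(k, sylvester_positive_definite(B), bool(np.all(np.linalg.eigvalsh(B) > 0)))
# k = 5 gives (True, True) and k = 3 gives (False, False), matching the condition k > 4.
```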

Consider a bilinear form \(B\) on \(\mathbb{R}^n\), and denote by \(A\) its matrix form for some given choice of basis for \(\mathbb{R}^n\), as in \(\eqref{eq:matrix-representation-of-bilinear-form}\). \(B\) defines an inner product on \(\mathbb{R}^n\) if and only if its matrix representation \(A\) is a positive-definite symmetric matrix.

The proof is just a matter of comparing the definitions for inner product and positive-definite symmetric matrix, and is left as an exercise.

2.2 Hermitian inner products

We now repeat the analysis above for complex vector spaces.

A complex (or hermitian) inner product on a complex vector space \(V\) assigns to each pair of vectors \({\bf u},{\bf v}\in V\) a complex number \(\langle{\bf u},{\bf v}\rangle\), satisfying the following for all \({\bf u},{\bf v},{\bf w}\in V\) and all \(\lambda\in{\mathbb C}\):

  1. \(\langle{\bf u}+{\bf v},{\bf w}\rangle= \langle{\bf u},{\bf w}\rangle+\langle{\bf v},{\bf w}\rangle\);

  2. \(\langle\lambda {\bf u},{\bf v}\rangle =\lambda\langle{\bf u},{\bf v}\rangle\);

  3. \(\langle{\bf u},{\bf v}\rangle= \overline{\langle{\bf v},{\bf u}\rangle}\) (hermiticity);

  4. \(\langle{\bf v},{\bf v}\rangle\geq0\) with \(\langle{\bf v},{\bf v}\rangle=0\iff {\bf v}=0\) (positivity).

In these notes \(\overline{z}=\mbox{Re}(z)-{\rm i}\,\mbox{Im}(z)\) denotes the complex conjugate of the complex number \(z=\mbox{Re}(z)+{\rm i}\,\mbox{Im}(z)\).

Note that \(\langle{\bf v},{\bf v}\rangle\) is always real, by hermiticity.

The first two properties constitute linearity in the first factor. Note that the inner product is not quite linear in the second factor: we have \(\langle{\bf u},\lambda{\bf v}\rangle =\overline{\lambda}\langle{\bf u},{\bf v}\rangle\). This property is sometimes called sesquilinearity.

We are using \((\ ,\ )\) for real inner products, and \(\langle\ ,\ \rangle\) for hermitian inner products, but this notation is not used by all authors.

Simplest case: \(V={\mathbb C}\). The standard inner product on \({\mathbb C}\) is \(\langle z,w\rangle=z\overline{w}\). Note that \(\langle z,z\rangle=|z|^2\).

The standard hermitian inner product on \({\mathbb C}^n\) is given by \(\langle{\bf z},{\bf w}\rangle=z_1\overline w_1+\dots+z_n\overline w_n\). Note that \(\langle{\bf z},{\bf z}\rangle=|z_1|^2+\ldots+|z_n|^2\).

\(V={\mathbb C}^2\) and \(\langle{\bf z},{\bf w}\rangle =3z_1\overline w_1+2z_2\overline w_2\).

\(V={\mathbb C}^2\) and \(\langle{\bf z},{\bf w}\rangle =3z_1\overline w_1+2z_2\overline w_2+{\rm i}z_1\overline w_2- {\rm i}z_2\overline w_1\). Note that hermiticity is satisfied, and the linearity properties are satisfied since the right hand side is linear in \(z_1,z_2\). For positivity, we have \[\begin{aligned} \langle{\bf z},{\bf z}\rangle &= 3z_1\overline z_1+2z_2\overline z_2 +{\rm i}z_1\overline z_2-{\rm i}z_2\overline z_1\\ &= 2|z_1|^2+|z_2|^2+|z_1-iz_2|^2\\ &\geq0,\end{aligned}\] with equality if and only if \(z_1=z_2=0\). So positivity is satisfied as well.

\(V={\mathbb C}^2\) and \(\langle{\bf z},{\bf w}\rangle =z_1\overline w_1+{\rm i}z_1\overline w_2-{\rm i}z_2\overline w_1+z_2\overline w_2\) satisfies linearity and hermiticity, but positivity fails, since \(\langle{\bf z},{\bf z}\rangle=|z_1-{\rm i}z_2|^2\) which is zero for \({\bf z}=(1,-{\rm i})\).

\(V={\mathbb C}^2\) and \(\langle{\bf z},{\bf w}\rangle =z_1\overline w_1+{\rm i}z_1\overline w_2+{\rm i}z_2\overline w_1+z_2\overline w_2\) is not hermitian.

Let \(V\) be the set of continuous complex-valued functions \(f:[a,b]\to{\mathbb C}\), with \(b>a\) and \[\langle f,g\rangle=\int_a^bf(t)\,\overline{g(t)}\,dt\] As an exercise, check that this defines an inner-product space.

Let \({\mathbb C}[x]_n\) denote the space of polynomials in \(x\), of degree at most \(n\), with complex coefficients. This is a complex vector space of complex dimension \(n+1\). For any interval \([a,b]\), it is a subspace of the infinite-dimensional space \(V\) above. A natural basis is \(\{1,x,\ldots,x^{n-1},x^n\}\); so if \[p(x)=p_nx^n+\ldots+p_0\] is a polynomial, then its components with respect to this basis are \({\bf p}=(p_0,\ldots,p_n)^T\). As for the real case \(\mathbb{R}[x]_n\), we can use \(\langle f,g\rangle=\int_a^b f(t)\,\overline{g(t)}\,dt\) as an inner product. For \(a=0\), \(b=1\) and \(n=1\) we get \[\langle{\bf p},{\bf q}\rangle=p_0 \overline{q_0} + \frac{1}{2}p_1 \overline{q_0} + \frac{1}{2}p_0 \overline{q_1} + \frac{1}{3}p_1 \overline{q_1},\] that can also be rewritten as \[\langle{\bf p},{\bf q}\rangle= (\overline{q_0},\overline{q_1}) \left( \begin{matrix} 1 & 1/2\\ 1/2 & 1/3 \end{matrix} \right) \left(\begin{matrix} p_0\\p_1 \end{matrix}\right)\,.\]

2.2.1 Matrix version of the hermitian product

As we did in the real case using bilinearity we can use the sesquilinearity to represent an hermitian inner product in matrix form after having fixed a basis. The properties of hermiticity and positive definiteness of the hermitian inner-product then translate to the requirements that its matrix form is hermitian and positive-definite. We will define these concepts below.

The Hermitian transpose (or complex-conjugate transpose) \(B^*\) of a matrix \(B\) is the matrix with components \[\left(B^*\right)_{ij} = \overline{B_{ji}}\, .\]

Note that sometimes in textbooks \(B^*\) is denoted with \(B^\dag\).

If we denote by \(\overline{B}\) the matrix such that \[\left(\overline{B}\right)_{ij} = \overline{B_{ij}}\] (that is, the matrix whose entries are the complex conjugates of those of \(B\)) then \[B^* = \left(\overline{B}\right)^t = \overline{B^t}\, .\]

If \(A\) and \(B\) are matrices, then \((AB)^*=\overline{(AB)^t}=\overline{B^tA^t}= B^*A^*\).

If \(V={\mathbb C}^n\) with basis \(\{{\bf v}_i\}\), then an inner product on \(V\) is realized by an \(n\times n\) matrix \(B\) as \[\langle{\bf u},{\bf w}\rangle={\bf y}^* B{\bf x}\] where \({\bf x}\) and \({\bf y}\) are the vectors of components of \({\bf u}\) and \({\bf w}\) in the \(\{{\bf v}_i\}\) basis (so \({\bf u}=x_1{\bf v}_1+\ldots+x_n{\bf v}_n\) and \({\bf w}=y_1{\bf v}_1+\ldots+y_n{\bf v}_n\)) and \(B\) is the matrix with components \[B_{ij} = \langle {\bf v}_j,{\bf v}_i \rangle \,.\] (Note the labels of the indices.) The matrix \(B\) is furthermore hermitian and positive-definite, properties to be defined below.

The argument is analogous to the one in the real case. That \(B\) has the components given in the proposition follows from expanding \({\bf u},{\bf w}\in V\) in the \(\{{\bf v}_i\}\) basis, and using (sesqui)linearity. That the resulting matrix \(B\) is hermitian and positive-definite follows from the definitions below, and is left as an exercise.

We say that a matrix \(B\) is hermitian if \(B^*=B\). Similarly \(B\) is an anti-hermitian matrix if \(B^*=-B\).

\(A\) is Hermitian if and only if \({\bf y}^*A{\bf x}= (A{\bf y})^*{\bf x}\) for all \({\bf x},{\bf y}\in{\mathbb C}^n\). If we choose the standard inner product on \({\mathbb C}^n\), this can be written as \(\left\langle A{\bf x},{\bf y}\right\rangle=\left\langle{\bf x},A{\bf y}\right\rangle\) for all \({\bf x},{\bf y}\in{\mathbb C}^n\).

If \(A\) is hermitian then \((A{\bf y})^*{\bf x}={\bf y}^*A^*{\bf x}={\bf y}^*A{\bf x}\), which gives one direction. For the converse, choose for \({\bf x}\) and \({\bf y}\) the standard basis vectors of \({\mathbb C}^n\). Choosing \({\bf x}=\mathbf{e}_i\) and \({\bf y}=\mathbf{e}_j\) we have \[\left\langle\mathbf{e}_i,A\mathbf{e}_j\right\rangle = \left\langle\mathbf{e}_i,\sum_{k=1}^nA_{kj}\mathbf{e}_k\right\rangle = \overline{A_{ij}}\] and similarly \(\left\langle A\mathbf{e}_i,\mathbf{e}_j\right\rangle=A_{ji}\), so equality of these two expressions for all \(i,j\) forces \(A_{ji}=\overline{A_{ij}}\), i.e. \(A^*=A\).

If \(B\) is real then it is (anti-)hermitian if and only if it is (anti-)symmetric.

As for the real case, every matrix can be decomposed as a sum of two matrices, one hermitian and the other anti-hermitian: \[B=\frac{B+B^*}{2} + \frac{B-B^*}{2}= B_+ + B_-\,,\] where \(B_+=\frac{B+B^*}{2}\) is hermitian and \(B_-=\frac{B-B^*}{2}\) is anti-hermitian.

We say that a hermitian matrix \(B\) is positive-definite if \[{\bf v}^* B {\bf v}> 0\] for all non-zero \({\bf v}\in{\mathbb C}^n\).

A hermitian matrix has real eigenvalues.

See the proof of proposition [prop:Hermitian-real-eigenvalues] below.

A hermitian matrix is positive-definite if and only if its eigenvalues are all positive.

The proof is entirely analogous to the one in the real case (proposition [prop:positivity-eigenvalues]), just by replacing \({\bf v}^t\) by \({\bf v}^*\) everywhere. Note that for a nonzero vector \({\bf v}\) we always have \({\bf v}^*{\bf v}> 0\).

(Sylvester’s criterion, without proof.) A hermitian matrix \(B\) is positive-definite if and only if all of its leading principal minors are positive.

For example [ex:hermitian-product-d] above, we have \[B=\left(\begin{matrix} 3 & -{\rm i}\\ {\rm i}& 2\end{matrix}\right)\] which is hermitian; the two subdeterminants are 3 and 5, so \(B\) is positive-definite.
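
A quick numerical check of this example (not part of the notes): hermiticity, the eigenvalues and the leading principal minors can all be computed with numpy.

```python
# Check that B is hermitian and positive-definite.  Requires numpy.
import numpy as np

B = np.array([[3, -1j],
              [1j, 2]])

print(np.allclose(B, B.conj().T))       # True: B equals its conjugate transpose
print(np.linalg.eigvalsh(B))            # both eigenvalues real and positive
print(np.linalg.det(B[:1, :1]).real,    # leading principal minors: 3 and 5
      np.linalg.det(B).real)
```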

For example [ex:hermitian-product-f] above, we have \[B=\left(\begin{matrix} 1 & {\rm i}\\ {\rm i}& 1\end{matrix}\right),\] which is not hermitian. (It is symmetric, but that is not what is needed in the complex case.)

2.3 Norm, orthogonality and the Cauchy-Schwarz inequality

As we will shortly see, once we endow a real (complex) vector space with an (hermitian) inner product we will be able to define the length of a vector with respect to that inner product. However, a sensible notion of “length” can be defined without requiring the presence of an inner product; this more general concept goes under the name of norm.

Given a vector space \(V\) over \(\mathbb{R}\) (or \(\mathbb{C}\)), a norm on \(V\) is a function usually denoted by \(||\cdot ||: V\to \mathbb{R}\) with the following properties

  1. \(|| a\,{\bf v}|| = \vert a \vert\,|| {\bf v}||\) (absolute homogeneity),

  2. \(|| {\bf v}+\bf{u} || \leq || {\bf v}|| + || \bf{u} ||\) (triangle inequality or subadditivity),

  3. If \(|| {\bf v}|| = 0\) then \({\bf v}=0\) (separation of points),

for all \(a\in \mathbb{R}\) (or \(\mathbb{C}\)) and all \({\bf v},\bf{u}\in V\).

Note that from the absolute homogeneity property we have \(|| \bf{0} || =0\) and \(|| {\bf v}||=||-{\bf v}||\), so from the triangle inequality it follows that \[|| {\bf v}|| \geq 0\qquad\forall \,{\bf v}\in V \qquad\mbox{(non-negativity)}\,.\]

The triangle inequality takes its name from the well known fact that in a triangle the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side.

2.3.1 The norm induced by a real inner product

From the definition above it is clear that in a certain sense a norm on a vector space gives us a good definition for the “length” of vectors. If we only care about defining lengths of vectors there are various examples of norms that we can put on the same vector space (see Problem Sheets). However, once we endow a vector space with an inner product we have a privileged norm, usually referred to as the norm induced by the inner product.

If \((V,(\ ,\ ))\) is a real inner product space, then we define the norm \(||{\bf v}||\) of a vector \({\bf v}\in V\) by \[||{\bf v}||\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\sqrt{({\bf v},{\bf v})}\, .\]

One could also call this the length of the vector, but using the word norm emphasizes that the inner product may not be the standard one.

A vector \({\bf v}\in V\) is a unit vector if \(||{\bf v}||=1\).

Two vectors \({\bf u}\) and \({\bf v}\) are orthogonal if \(({\bf u},{\bf v})=0\).

An orthonormal basis of \(V\) is a basis consisting of mutually orthogonal unit vectors. More on this later on.

Since we have just called \(||{\bf v}||=\sqrt{({\bf v},{\bf v})}\) a norm, we should check that it satisfies the properties outlined in the definition above. The absolute homogeneity and separation of points properties follow straightforwardly from the properties of the inner product. We are left with proving that the norm just introduced satisfies the triangle inequality.

(Pythagoras) [thm:pythagoras] If \({\bf u}\) and \({\bf v}\) are orthogonal, then \(||{\bf u}+{\bf v}||^2=||{\bf u}||^2+||{\bf v}||^2\).

The proof is a direct calculation: expand \(||{\bf u}+{\bf v}||^2\) using the linearity properties of the inner product.

(the real Cauchy-Schwarz inequality) [thm:real-Cauchy-Schwarz] We have \[({\bf u},{\bf v})^2\leq({\bf u},{\bf u})({\bf v},{\bf v})\] with equality if and only if \({\bf u}\) and \({\bf v}\) are linearly dependent. Equivalently, \(|({\bf u},{\bf v})|\leq||{\bf u}||.||{\bf v}||\).

This is trivial if \({\bf u}=0\), so suppose that \({\bf u}\neq0\). Then for all real numbers \(x\) we have \[||x{\bf u}-{\bf v}||^2=x^2||{\bf u}||^2-2x({\bf u},{\bf v})+ ||{\bf v}||^2 \geq0,\] with equality if and only if \(x{\bf u}-{\bf v}=0\).

Let us think of this expression as a quadratic expression in \(x\). If \({\bf u}\) and \({\bf v}\) are linearly independent then there is no \(x\) such that \(x{\bf u}={\bf v}\). This implies that the quadratic equation on \(x\) has no real roots. For a quadratic equation, this happens if and only if the discriminant is negative: \[({\bf u},{\bf v})^2 - ||{\bf u}||^2. ||{\bf v}||^2< 0\, .\]

On the other hand, if \({\bf u}\) and \({\bf v}\) are linearly dependent, then it follows from the linearity properties of the inner product that \[({\bf u},{\bf v})^2 - ||{\bf u}||^2. ||{\bf v}||^2 = 0\, .\]

(the triangle inequality.) For all \({\bf u},{\bf v}\in V\), we have \[||{\bf u}+{\bf v}||\leq ||{\bf u}||+||{\bf v}||\, .\]

\[\begin{aligned} \left(||{\bf u}||+||{\bf v}||\right)^2-||{\bf u}+{\bf v}||^2 &= ||{\bf u}||^2+2||{\bf u}||.||{\bf v}|| +||{\bf v}||^2-||{\bf u}||^2-2({\bf u},{\bf v})-||{\bf v}||^2\\ &= 2\left[||{\bf u}||.||{\bf v}||-({\bf u},{\bf v})\right]\\ &\geq 0,\end{aligned}\] where the final inequality follows from the Cauchy-Schwarz inequality (theorem [thm:real-Cauchy-Schwarz]).

We see that, whenever we have a scalar product \((\cdot,\cdot)\) on a real vector space \(V\), the function \(|| \cdot||:V\to \mathbb{R}\) given by \(||{\bf v}||=\sqrt{({\bf v},{\bf v})}\) defines a norm on \(V\); we say that this is the norm induced by the inner product. Note however that our definition of norm is more general and does not rely on the presence of an inner product: there are norms which are not induced by any inner product.

For a real inner-product space, the angle \(\theta\in[0,\pi]\) between \({\bf u}\) and \({\bf v}\), given by \(({\bf u},{\bf v})=||{\bf u}||.||{\bf v}||.\cos\theta\) , is well-defined.

From the Cauchy-Schwarz inequality we know that \(- ||{\bf u}||.||{\bf v}||\leq ({\bf u},{\bf v})\leq ||{\bf u}||.||{\bf v}||\), so \(\theta\) is well-defined. Additionally, if \({\bf u}\) and \({\bf v}\) are orthogonal vectors then \(({\bf u},{\bf v})=0\), so the angle between them is \(\theta = \pi/2\) as expected. If \({\bf u}\) and \({\bf v}\) are parallel to one another, i.e. linearly dependent with \({\bf v}= x\,{\bf u}\) with \(x >0\), the angle is \(\theta =0\), while if they are anti-parallel, i.e. linearly dependent with \({\bf v}= x\,{\bf u}\) with \(x <0\), the angle is \(\theta =\pi\).

\(V={\mathbb R}^2\) with the standard inner product. Take \({\bf u}=(1,0)\) and \({\bf v}=(\cos\theta,\sin\theta)\). Both are unit vectors, and \(({\bf u},{\bf v})=\cos\theta\).

\(V={\mathbb R}^2\) with inner product \(({\bf u},{\bf v})=4u_1v_1+u_1v_2+u_2v_1+u_2v_2\). Let \({\bf u}=(1,0)\) and \({\bf v}=(0,1)\). Then \(||{\bf u}||=2\), \(||{\bf v}||=1\), and \(\cos\theta=1/2\), so \(\theta=\pi/3\).

\(V=C[-1,1]\) with the inner product \((f,g)=\int_{-1}^{1}f(t)\,g(t)\,dt\). Then \(||1||=\sqrt{2}\), \(||t||=\sqrt{2/3}\), and \(1,t\) are orthogonal. However, \((1,t^2)=\int_{-1}^{1}t^2\,dt=2/3\), so \(1,t^2\) are not orthogonal.

\(V=C[0,1]\) with the inner product \((f,g)=\int_0^{1}f(t)\,g(t)\,dt\). Then \(||1||=1\), \(||t||=\sqrt{1/3}\), and \((1,t)=1/2\). If \(\theta\) is the angle between \(1\) and \(t\), then \(\cos\theta =\sqrt{3}/2\), and so \(\theta=\pi/6\).
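
The norms and the angle in this last example can be reproduced numerically; the sketch below (not part of the notes) uses scipy's quad routine for the integrals.

```python
# Inner product (f, g) = \int_0^1 f g dt on C[0, 1], computed by numerical integration.
# Requires numpy and scipy.
import numpy as np
from scipy.integrate import quad

inner = lambda f, g: quad(lambda t: f(t) * g(t), 0.0, 1.0)[0]

one = lambda t: 1.0
ident = lambda t: t

norm_one = np.sqrt(inner(one, one))       # ||1|| = 1
norm_t = np.sqrt(inner(ident, ident))     # ||t|| = sqrt(1/3)
cos_theta = inner(one, ident) / (norm_one * norm_t)

print(cos_theta, np.sqrt(3) / 2)          # both approximately 0.866
print(np.arccos(cos_theta), np.pi / 6)    # theta = pi/6
```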

2.3.2 The norm induced by a hermitian inner product

The same story can be repeated for the norm induced by an hermitian inner product \(\langle\,,\,\rangle\) on a complex vector space \(V\).

Let \((V,\langle\ ,\ \rangle)\) be an hermitian inner product space, then we define the norm \(||{\bf v}||\) of a vector \({\bf v}\in V\) by \[||{\bf v}||\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\sqrt{\langle {\bf v},{\bf v}\rangle}\, .\]

A vector \({\bf v}\in V\) is a unit vector if \(||{\bf v}||=1\).

Two vectors \({\bf u}\) and \({\bf v}\) are orthogonal if \(\langle{\bf u},{\bf v}\rangle=0\).

The only unit vectors in \({\mathbb R}\) with the standard inner product are \(\pm 1\); but in \({\mathbb C}\) with the standard inner product, \(e^{{\rm i}\theta}=\cos(\theta)+{\rm i}\sin(\theta)\) is a unit vector for any \(\theta\in[0,2\pi)\).

In \({\mathbb{C}}^2\) with the standard hermitian inner product, find all the unit vectors \({\bf v}\) which are orthogonal to \({\bf u}=({\rm i},1)\).

Solution. \(\langle{\bf v},{\bf u}\rangle=-{\rm i}v_1+v_2\), so \({\bf v}=(c,{\rm i}c)\) for some \(c\in{\mathbb{C}}\). Then \(||{\bf v}||^2=2|c|^2\), so to get a unit vector we must impose \(|c|=1/\sqrt{2}\). The solution is \({\bf v}=(1,{\rm i}){\rm e}^{{\rm i}\theta}/\sqrt{2}\) for \(\theta\in[0,2\pi)\).
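As a quick sanity check of this solution, here is a small Python sketch (not part of the notes; the particular value of \(\theta\) is an arbitrary choice) verifying that \((1,{\rm i}){\rm e}^{{\rm i}\theta}/\sqrt{2}\) is a unit vector orthogonal to \({\bf u}=({\rm i},1)\).

    import numpy as np

    u = np.array([1j, 1.0])
    theta = 0.7                              # any value of theta works
    v = np.exp(1j * theta) * np.array([1.0, 1j]) / np.sqrt(2)

    # np.vdot(u, v) = sum_k conj(u_k) v_k, which is <v, u> in the convention of these notes
    print(abs(np.vdot(u, v)))                # ~0: v is orthogonal to u
    print(np.vdot(v, v).real)                # ~1: v is a unit vector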

As in the real case seen above, for the norm induced by an hermitian product \(\langle\,,\,\rangle\) it is straightforward to check that the absolute homogeneity and separation of points properties of a norm are satisfied; they follow directly from the properties of the inner product. We are left with proving that the norm just introduced satisfies the triangle inequality.

(the complex Cauchy-Schwarz inequality) [thm:complex-Cauchy-Schwarz] Let \((V,\langle\,,\,\rangle)\) be an hermitian inner product space, then for all vectors \({\bf u},\,{\bf v}\in V\) we have \[|\langle{\bf u},{\bf v}\rangle|^2\leq||{\bf u}||^2||{\bf v}||^2\,,\] with equality if and only if \({\bf u}\) and \({\bf v}\) are linearly dependent.

Essentially the same as the real case. Let us assume \({\bf u}\neq 0\) and \(\langle{\bf u},{\bf v}\rangle \neq 0\) (otherwise the statement is trivial), and let us consider the following quadratic expression in \(x\in \mathbb{C}\) \[||x{\bf u}-{\bf v}||^2=|x|^2||{\bf u}||^2-2\mathop{\mathrm{Re}}\left( x \langle{\bf u},{\bf v}\rangle\right)+ ||{\bf v}||^2 \geq0,\] with equality if and only if \(x{\bf u}-{\bf v}=0\).

Since this expression must be non-negative for all \(x\in \mathbb{C}\) we can choose in particular \[x = \lambda\frac{|\langle {\bf u},{\bf v}\rangle|}{\langle {\bf u},{\bf v}\rangle}\,,\] with \(\lambda \in \mathbb{R}\).

The above inequality becomes \[\lambda^2||{\bf u}||^2-2\lambda | \langle{\bf u},{\bf v}\rangle| + ||{\bf v}||^2\geq 0\,,\] and at this point we can apply the same argument we used in the real Cauchy-Schwarz inequality (theorem [thm:real-Cauchy-Schwarz]).

Using this theorem we can show that the norm induced by an hermitian inner product satisfies the triangle inequality.

(the triangle inequality) For all \({\bf u},{\bf v}\in V\), we have \[||{\bf u}+{\bf v}||\leq ||{\bf u}||+||{\bf v}||\, .\]

\[\begin{aligned} \left(||{\bf u}||+||{\bf v}||\right)^2-||{\bf u}+{\bf v}||^2 &= ||{\bf u}||^2+2||{\bf u}||.||{\bf v}|| +||{\bf v}||^2+\\ &\phantom{=}-||{\bf u}||^2-\langle{\bf u},{\bf v}\rangle-\langle{\bf v},{\bf u}\rangle-||{\bf v}||^2\\ &= 2\left[\,||{\bf u}||.||{\bf v}||-\mathop{\mathrm{Re}}(\langle {\bf u},{\bf v}\rangle)\, \right]\\ & \geq 2\left[\,||{\bf u}||.||{\bf v}||-\vert \langle {\bf u},{\bf v}\rangle \vert\,\right]\\ &\geq 0, \end{aligned}\] where we used the fact that \(\langle{\bf u},{\bf v}\rangle+\langle{\bf v},{\bf u}\rangle= 2\mathop{\mathrm{Re}}(\langle{\bf u},{\bf v}\rangle)\), from the hermiticity property of the inner product, and finally \(\mathop{\mathrm{Re}}(\langle{\bf u},{\bf v}\rangle)\leq \vert \langle {\bf u},{\bf v}\rangle\vert\).

We were thus justified in calling \(\sqrt{\langle {\bf v},{\bf v}\rangle}\) the norm induced by the inner product \(\langle\,,\,\rangle\), since it satisfies the defining properties of a norm.

Finally, note that we cannot define the angle between \({\bf u},{\bf v}\) as in the real case, since \(\langle{\bf u},{\bf v}\rangle\) is complex. However, it still makes sense to say that complex vectors are orthogonal, so we can still define the notion of orthonormal basis.

An orthonormal basis of \(V\) is a basis consisting of mutually orthogonal unit vectors.

In \({\mathbb{C}}^3\) with the standard hermitian inner product, the following is an orthonormal basis: \[{\bf v}_1=(1,0,0), \quad {\bf v}_2=(0,-1,0), \quad {\bf v}_3=(0,0,{\rm i}).\]

2.4 The Gram-Schmidt procedure

We now have the notion of orthogonality between vectors and the notion of vectors with norm one, so a natural question to ask is whether, given an inner product on a vector space, we can construct a basis similar to the standard basis of \(\mathbb{R}^3\) given by the unit vectors \(\hat{x},\hat{y},\hat{z}\), i.e. three mutually orthogonal unit vectors.

It is often useful to have an orthonormal set of vectors in an inner-product space. If there are \(n\) such vectors and the vector space is \(n\)-dimensional, then they form an orthonormal basis.

There is a systematic method to produce an orthonormal set of vectors from any given set of linearly independent vectors, called the Gram-Schmidt procedure, which works as follows.

Let \({\bf v}_1,\,\ldots,\,{\bf v}_k\) be linearly independent vectors in an inner-product space \(V\). These span a \(k\)-dimensional subspace \(U\) of \(V\). Our task is to construct an orthonormal basis \({\bf u}_1,\dots,{\bf u}_k\) for \(U\). The procedure is as follows.

  1. Define \({\bf u_1}=\frac{{\bf v_1}}{||{\bf v_1}||}\). Then \({\bf u_1}\) is a unit vector, and \({\rm span}\left\{{\bf u_1}\right\}={\rm span}\left\{{\bf v_1}\right\}\).

  2. Define \({\bf \tilde v}_2 ={\bf v}_2-\left({\bf v}_2,{\bf u}_1\right){\bf u_1}\). Then \(\left({\bf u}_1,{\bf \tilde v}_2\right)=0\) and \({\bf \tilde v}_2\neq0\), since \({\bf v_2}\notin {\rm span}\left\{{\bf u_1}\right\}\). Now define \({\bf u}_2={{\bf \tilde v}_2\over||{\bf \tilde v}_2||}\). Then \({\bf u_1},{\bf u_2}\) are mutually orthogonal unit vectors, and
    \({\rm span}\left\{{\bf u}_1,{\bf u}_2\right\}= {\rm span}\left\{{\bf v}_1,{\bf v}_2\right\}\) (because clearly \(u_1,u_2\in \mathop{\mathrm{span}}(v_1,v_2)\) and also \(v_1,v_2\in \mathop{\mathrm{span}}(u_1,u_2)\)).

  3. Suppose now that mutually orthogonal unit vectors \({\bf u_1},\dots,{\bf u_r}\) have been found with \({\rm span}\left\{{\bf u_1},\dots,{\bf u_r}\right\}= {\rm span}\left\{{\bf v_1},\dots,{\bf v_r}\right\}\) for some \(r\) with \(1\leq r\leq k\). If \(r<k\), then define \[{\bf \tilde v}_{r+1}={\bf v}_{r+1}-\left({\bf v}_{r+1},{\bf u}_1\right) {\bf u}_1-\ldots- \left({\bf v}_{r+1},{\bf u}_r\right){\bf u_r}\,.\] Then \[\left({\bf \tilde v_{r+1}},{\bf u_1}\right)=\ldots =\left({\bf \tilde v_{r+1}},{\bf u_r}\right)=0,\] and \({\bf \tilde v_{r+1}}\neq0\) since \({\bf v_{r+1}}\notin {\rm span}\left\{{\bf u_1},\ldots,{\bf u_r}\right\} ={\rm span}\left\{{\bf v_1},\ldots,{\bf v_r}\right\}\).
    Define \({\bf u_{r+1}}={{\bf \tilde v_{r+1}}\over||{\bf \tilde v_{r+1}}||}\). Then \({\bf u_1},\dots,{\bf u_{r+1}}\) are mutually orthogonal unit vectors, and \({\rm span}\left\{{\bf u_1},\dots,{\bf u_{r+1}}\right\} = {\rm span}\left\{{\bf v_1},\dots,{\bf v_{r+1}}\right\}\).

By this inductive process, we construct an orthonormal basis for \(U\). In particular, if \(U=V\) with \(V\) of finite dimension, then this enables us to construct an orthonormal basis for \(V\).
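For concreteness, here is a short Python sketch of the procedure just described (an illustration, not part of the notes; the function name gram_schmidt and the tolerance are choices made here). The default inner product is the standard one on \({\mathbb R}^n\), but any real inner product can be passed in.

    import numpy as np

    def gram_schmidt(vectors, ip=np.dot):
        """Return an orthonormal list spanning the same subspace as `vectors`.

        `ip` is the (real) inner product; the default is the standard one on R^n.
        """
        ortho = []
        for v in vectors:
            # subtract the components of v along the u_i already constructed
            w = v - sum(ip(v, u) * u for u in ortho)
            norm = np.sqrt(ip(w, w))
            if norm < 1e-12:
                raise ValueError("input vectors are linearly dependent")
            ortho.append(w / norm)
        return ortho

    # e.g. the vectors (1,1,0), (1,0,1), (0,1,1) in R^3, which reappear in the first example below
    vs = [np.array([1.0, 1.0, 0.0]),
          np.array([1.0, 0.0, 1.0]),
          np.array([0.0, 1.0, 1.0])]
    for u in gram_schmidt(vs):
        print(u)

Running it on the vectors above reproduces, up to rounding, the orthonormal basis computed by hand in the first example of the next subsection.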

2.4.1 Examples

In \({\mathbb{R}}^3\) with the standard inner product, apply Gram-Schmidt to \[{\bf v_1}=(1,1,0),\quad{\bf v_2}=(1,0,1),\quad{\bf v_3}=(0,1,1).\] Solution. \[\begin{aligned} {\bf u_1} & = \frac{1}{\sqrt2}(1,1,0), \\ {\bf \tilde v_2} & = {\bf v_2}-\left({\bf v_2},{\bf u_1}\right){\bf u_1} =(1,0,1)-\frac{1}{2}(1,1,0)=\frac{1}{2}(1,-1,2), \\ {\bf u_2}& = \frac{1}{\sqrt6}(1,-1,2),\\ {\bf \tilde v_3} & = {\bf v_3}-\left({\bf v_3},{\bf u_1}\right){\bf u_1} -\left({\bf v_3},{\bf u_2}\right){\bf u_2}\\ & =(0,1,1)-\frac{1}{2}(1,1,0)-\frac{1}{6}(1,-1,2)=\frac{2}{3}(-1,1,1),\\ {\bf u_3} &= \frac{1}{\sqrt3}(-1,1,1).\end{aligned}\] Remember, there is a simple way to check your answer — always do so!

\(V={\mathbb R}^2\) with \(||{\bf{ v }}||^2=4x_1^2+2x_1x_2+x_2^2\) for \({\bf v}=(x_1,x_2)\), \({\bf v}_1=(1,0)\) and \({\bf v}_2=(0,1)\).

Solution. The first step is to deduce the inner product from the norm. This can always be done when the norm is induced by an inner product, since by expanding \(||{\bf u}+{\bf v}||^2\) we find: \[2({\bf u},{\bf v})=||{\bf u}+{\bf v}||^2 - ||{\bf u}||^2 - ||{\bf v}||^2\, .\] In this case we have that \(({\bf u},{\bf v})=4u_1v_1+u_1v_2+u_2v_1+u_2v_2\). Applying Gram-Schmidt, we have \[\begin{aligned} {\bf u_1}&= \frac{1}{2}(1,0),\\ {\bf \tilde v_2}&= {\bf v_2}-\left({\bf v_2},{\bf u_1}\right){\bf u_1} =(0,1)-\frac{1}{4}(1,0)=\frac{1}{4}(-1,4),\\ {\bf u_2} & = \frac{1}{2\sqrt3}(-1,4).\end{aligned}\]

\(V=C[-1,1]\) with the inner product \((f,g)=\int_{-1}^{1}f(t)\,g(t)\,dt\), and \(f_1=1, f_2=t, f_3=t^2\). Find orthonormal \(\{g_1,g_2,g_3\}\).

Solution. In example [ex:function-inner-product] we computed \(||f_1||^2=2\), \(||f_2||^2=2/3\) and \(\left(f_1,f_2\right)=0\). So we immediately get \(g_1=1/\sqrt{2}\) and \(g_2=\sqrt{3/2}\,t\). Then \[\begin{aligned} \left(f_3,g_1\right) &= \frac{1}{\sqrt2}\int_{-1}^1t^2\,dt={\sqrt2\over3}, \\ \left(f_3,g_2\right) & = \sqrt{3\over2}\int_{-1}^1t^3\,dt=0, \\ \tilde{f_3} & = f_3-\left(f_3,g_1\right)g_1-\left(f_3,g_2\right)g_2\\ & = t^2-{\sqrt2\over3}\frac{1}{\sqrt2}\\ & = t^2-\frac{1}{3}, \\ ||\tilde{f_3}||^2 & = \int_{-1}^1\left(t^4-\frac{2}{3}t^2+\frac{1}{9}\right)\,dt\\ & = \frac{2}{5}-\frac{2}{3}\times\frac{2}{3}+\frac{2}{9}\\ & = \frac{2}{5}-\frac{2}{9}\\ & = \frac{8}{45}, \\ g_3 & = {3\sqrt5\over2\sqrt2}\left(t^2-\frac{1}{3}\right).\end{aligned}\]
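One can check this answer symbolically as well; here is a small sympy sketch (not part of the notes) verifying that the pairwise inner products of \(g_1,g_2,g_3\) form the identity pattern.

    import sympy as sp

    t = sp.symbols('t')
    ip = lambda f, g: sp.integrate(f * g, (t, -1, 1))

    g1 = 1 / sp.sqrt(2)
    g2 = sp.sqrt(sp.Rational(3, 2)) * t
    g3 = 3 * sp.sqrt(5) / (2 * sp.sqrt(2)) * (t**2 - sp.Rational(1, 3))

    for a in (g1, g2, g3):
        # each row should contain a single 1 (the norm squared) and 0s elsewhere
        print([sp.simplify(ip(a, b)) for b in (g1, g2, g3)])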

Apply Gram-Schmidt to \({\bf v_1}=(1,0)\) and \({\bf v_2}=(0,1)\) on \({\mathbb{C}}^2\) with the inner product \(\langle{\bf z},{\bf w}\rangle =3z_1\overline w_1+2z_2\overline w_2+iz_1\overline w_2-iz_2\overline w_1\).

Solution. \[\begin{aligned} ||{\bf v_1}||^2 & = 3, \\ {\bf u_1} & = \frac{1}{\sqrt3}(1,0), \\ {\bf\tilde v_2} & = {\bf v_2}-\langle{\bf v_2},{\bf u_1}\rangle{\bf u_1}\\ & = (0,1)-\frac{1}{3}(-{\rm i})(1,0)=\frac{1}{3}({\rm i},3), \\ ||{\bf\tilde v_2}||^2 & = \frac{5}{3}, \\ {\bf u_2} & = \frac{1}{\sqrt{15}}({\rm i},3).\end{aligned}\]

Take \(V={\mathbb{R}}^3\) with the standard inner product. Find an orthonormal basis for the subspace \(U\) given by \(x_1+x_2+x_3=0\), and extend it to an orthonormal basis for \({\mathbb{R}}^3\).

Solution. A basis for \(U\) is \({\bf v}_1=(1,-1,0)\) and \({\bf v}_2=(0,1,-1)\). Apply Gram-Schmidt to this, giving \[\begin{aligned} {\bf u_1} & = \frac{1}{\sqrt{2}}(1,-1,0), \\ {\bf\tilde v_2} &= (0,1,-1)-\frac{1}{2}(-1)(1,-1,0)=\frac{1}{2}(1,1,-2), \\ {\bf u_2} & = \frac{1}{\sqrt{6}}(1,1,-2).\end{aligned}\] So \(\{{\bf u}_1,{\bf u}_2\}\) is an orthonormal basis for \(U\). To extend it, take \({\bf v}_3=(0,0,1)\), say, and apply Gram-Schmidt one more time: \[\begin{aligned} {\bf\tilde v_3} &= (0,0,1)-(0,0,0)-\frac{1}{6}(-2)(1,1,-2)= \frac{1}{3}(1,1,1), \\ {\bf u_3} & = \frac{1}{\sqrt{3}}(1,1,1).\end{aligned}\] Now \(\{{\bf u}_1,{\bf u}_2,{\bf u}_3\}\) is an orthonormal basis for \(V\).

2.5 Orthogonal complement and orthogonal projection

Let us explore a little bit more the consequences of orthogonality between vectors in a vector space \(V\).

Let \(U\) be a vector subspace of an inner product space \(V\). Then the orthogonal complement of \(U\) is the set \[U^{\bot}=\left\{{\bf v}\in V:({\bf u},{\bf v})=0\ \mbox{for all}\ {\bf u}\in U\right\},\] of vectors in \(V\) which are orthogonal to all the vectors in \(U\).

Note that \(U^{\bot}\) is a vector subspace of \(V\).

Given any vector \({\bf v}\in V\), there is a unique vector \({\bf u}\in U\) called the orthogonal projection of \({\bf v}\) onto \(U\) (at least if \(U\) is finite-dimensional), and one may find this \({\bf u}\) explicitly as follows.

If \(U\) is finite-dimensional, then there always exists a unique decomposition \({\bf v}={\bf u}+{\bf \tilde u}\), where \({\bf u}\in U\) and \({\bf \tilde u}\in U^{\bot}\). So we can write \(V = U \oplus U^\perp\).

Let \(\{{\bf u}_1,\ldots,{\bf u}_k\}\) be an orthonormal basis of \(U\), and set \[{\bf u}=\left({\bf v},{\bf u}_1\right){\bf u}_1 +\ldots+\left({\bf v},{\bf u}_k\right){\bf u}_k, \qquad{\bf \tilde u}={\bf v}-{\bf u}\,.\] Clearly \({\bf u}\in U\); and for each \({\bf u}_i\) note that \(({\bf u},{\bf u}_i) =({\bf v},{\bf u}_i).({\bf u}_i,{\bf u}_i)=({\bf v},{\bf u}_i)\) by orthonormality of the basis. So for each \({\bf u}_i\) we have \[\begin{aligned} \left({\bf \tilde u},{\bf u}_i\right) & = \left({\bf v}-{\bf u},{\bf u}_i\right)\\ & = \left({\bf v},{\bf u}_i\right)-\left({\bf u},{\bf u}_i\right)\\ & = \left({\bf v},{\bf u}_i\right)-\left({\bf v},{\bf u}_i\right)\\ & = 0.\end{aligned}\] Hence \({\bf \tilde u}\in U^{\bot}\). This shows that \(V=U+U^\perp\). To show that \(V=U\oplus U^\perp\) we need to show additionally that \(U \cap U ^\perp=\{\bf 0\}\). Take \({\bf w}=w_1{\bf u}_1+\ldots+ w_k{\bf u}_k\). Clearly \({\bf w}\in U\), and let us assume that also \({\bf w}\in U^\perp\). In this case, by the definition of \(U^\perp\), we must have that \(({\bf w},{\bf u}_i)=0\) for all \(i=1,\ldots,k\). Since the \(\{{\bf u}_1,\ldots,{\bf u}_k\}\) form an orthonormal basis, this implies (by linearity of the inner product) \(w_i=0\) for all \(i\), and therefore \({\bf w}={\bf 0}\).

To show uniqueness, assume that there is a second decomposition \({\bf v}= {\bf u}+ {\bf \tilde u}={\bf u}' + {\bf \tilde u}'\), with \({\bf u}'\in U\) and \({\bf \tilde u}'\in U^\perp\). This implies \({\bf u}' - {\bf u}= {\bf \tilde u} - {\bf \tilde u}'\). The left hand side of this equality lives in \(U\), while the right hand side lives in \(U^\perp\). Since \(U \cap U ^\perp=\{\bf 0\}\) this implies \({\bf u}' - {\bf u}= {\bf 0}\) and \({\bf \tilde u} - {\bf \tilde u}' = {\bf 0}\), so the decomposition is indeed unique.

Given a finite-dimensional vector subspace \(U\) of the inner product space \((V,\,(\,,\,))\) we can consider the orthogonal projection operator \(P\) onto \(U\), defined by \[P({\bf v})\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\left({\bf v},{\bf u}_1\right){\bf u}_1 +\ldots+\left({\bf v},{\bf u}_k\right){\bf u}_k\,,\]where \(\{{\bf u}_1,\ldots,{\bf u}_k\}\) is an orthonormal basis of \(U\).

Let us assume that the inner product space \(V\) has finite dimension \(\dim\,V=n\); then the matrix form of the projection onto \(U\) takes a very simple form if we use a smart basis. We know that \(V = U \oplus U^\perp\), so we can use Gram-Schmidt to obtain a basis \(\{{\bf u}_1,...,{\bf u}_k\}\) for \(U\), made of orthonormal vectors, with \(k=\dim \,U\), and similarly a basis \(\{{\bf v}_1,...,{\bf v}_{n-k}\}\) for \(U^\perp\), made of orthonormal vectors, with \(n-k = \dim\,U^\perp\). The set \(\{{\bf u}_1,...,{\bf u}_k,{\bf v}_1,...,{\bf v}_{n-k}\}\) forms an orthonormal basis for \(V\). The orthogonal projection onto \(U\) can be easily evaluated on these basis elements: \[P( {\bf u}_i) = {\bf u}_i\,,\qquad\forall \,i=1,...,k,\qquad\mbox{while}\qquad P({\bf v}_j) = 0\,,\qquad\forall\,j=1,...,n-k\,,\] so in this basis the matrix of \(P\) is \({\rm{diag}}( 1,...,1,0,...,0)\), where the number of \(1\)s on the diagonal is \(k = \dim U\), while the number of \(0\)s on the diagonal is \(n-k = \dim U^\perp\).
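Equivalently, with respect to the standard inner product on \({\mathbb R}^n\), if \(Q\) is the matrix whose columns are the orthonormal vectors \({\bf u}_1,\ldots,{\bf u}_k\), then the matrix of \(P\) in the standard basis is \(QQ^t\), since \(P({\bf v})=\sum_i ({\bf v},{\bf u}_i){\bf u}_i=\sum_i {\bf u}_i({\bf u}_i^t{\bf v})\). Here is a small numerical sketch (not part of the notes) using the orthonormal basis found above for the plane \(x_1+x_2+x_3=0\).

    import numpy as np

    # orthonormal basis of U = {x in R^3 : x1 + x2 + x3 = 0}, from the example above
    u1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
    u2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
    Q = np.column_stack([u1, u2])

    P = Q @ Q.T                                  # matrix of the projection onto U

    print(np.allclose(P @ P, P))                 # True: P is idempotent
    print(np.round(np.linalg.eigvalsh(P), 10))   # eigenvalues 0, 1, 1: dim U^perp = 1, dim U = 2
    print(np.round(P @ np.array([1.0, 1.0, 1.0]), 10))  # ~0: (1,1,1) lies in U^perp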

This projection operator satisfies the following properties:

  1. \(P\) is a linear operator \(P\colon V\to V\);

  2. the image of \(P\) is \(\mathop{\mathrm{Im}}P = U\), while its kernel is \(\ker P = U^\perp\);

  3. \(P\) is idempotent, i.e. \(P^2 = P\);

  4. \(P\) restricted to \(U\) is the identity operator, which we usually write as \(P\vert_U = I\), i.e. \(P({\bf u} ) = {\bf u}\) for all \(\bf u \in U\);

  5. \((I-P) P = P(I-P ) = 0\).

The proofs are left as an exercise.

The last equation is simply telling us that if we project first a vector \({\bf v}\) onto \(U\), and then take the component of this projection orthogonal to \(U\), or if we take the component of a vector \({\bf v}\) orthogonal to \(U\) and then project it onto \(U\), we always find the zero vector.

From the idempotency condition it is easy to read off the possible eigenvalues of \(P\). Let \({\bf v}\) be an eigenvector of \(P\), with eigenvalue \(\lambda\), i.e. \(P {\bf v}= \lambda {\bf v}\), then since \(P^2 = P\) we get \[\lambda {\bf v}= P {\bf v}= P^2 {\bf v}= \lambda^2 {\bf v}\,,\] so the only possible eigenvalues are \(\lambda = 0\) and \(\lambda =1\), since \({\bf v}\neq {\bf 0}\).

In a certain sense, the projection \(P({\bf v})\) of a vector \({\bf v}\) onto the vector subspace \(U\) is the vector \(P({\bf v})={\bf u}\in U\) which is “nearest” to \({\bf v}\). More precisely, we have the following proposition.

Let \(U\) be a finite-dimensional subspace of an inner product space \(V\), and let \({\bf v_0}\in V\). If \({\bf u_0}\) is the orthogonal projection of \({\bf v_0}\) onto \(U\), then for all \({\bf u}\in U\) we have \[||{\bf u}-{\bf v_0}|| \geq ||{\bf u_0}-{\bf v_0}||,\] with equality if and only if \({\bf u}={\bf u_0}\). Thus \({\bf u_0}\) gives the nearest point on \(U\) to \({\bf v_0}\).

Write \({\bf v_0}={\bf u_0}+{\bf \tilde u_0}\). Then \[\begin{aligned} ||{\bf v_0}-{\bf u}||^2 & = ||{\bf v_0}-{\bf u_0}+{\bf u_0}-{\bf u}||^2 \\ & = ||{\bf v_0}-{\bf u_0}||^2+||{\bf u_0}-{\bf u}||^2\end{aligned}\] by Pythagoras (theorem [thm:pythagoras]), since \({\bf v_0}-{\bf u_0}={\bf \tilde u_0}\in U^{\bot}\) and \({\bf u_0}-{\bf u}\in U\). Thus \[||{\bf v_0}-{\bf u}||^2\geq ||{\bf v_0}-{\bf u_0}||^2\, ,\] with equality if and only if \(||{\bf u_0}-{\bf u}||^2=0\).

In some contexts, \({\bf u_0}\) is known as a least-squares approximation to \({\bf v_0}\).

Consider \(V=\mathbb{R}^4\) equipped with the standard inner product (and the norm induced by it), and find the point in \[U={\rm span}\left\{{\bf v}_1=\left( \begin{matrix}0 \\ 2 \\ -1 \\ 2 \end{matrix}\right),\ {\bf v}_2=\left(\begin{matrix} 2 \\ 3 \\ 1 \\ 2 \end{matrix}\right)\right\}\] nearest to \[{\bf v}_0=\left(\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \end{matrix}\right)\,.\]

Solution. We first need an orthonormal basis for \(U\). Now \(\Vert{\bf v}_1\Vert^2=9\), so the first unit vector in the Gram-Schmidt process is \[{\bf u}_1 =\frac{1}{3}\left(\begin{matrix} 0 \\ 2 \\ -1 \\ 2 \end{matrix}\right).\] Then \[\begin{aligned} {\bf \tilde v}_2 & = {\bf v}_2-\left({\bf v}_2,{\bf u}_1\right){\bf u_1} = \left(\begin{matrix} 2 \\ 3 \\ 1 \\ 2 \end{matrix}\right) -(6-1+4)\frac{1}{9}\left(\begin{matrix} 0 \\ 2 \\ -1 \\ 2 \end{matrix}\right) =\left( \begin{matrix} 2 \\ 1 \\ 2 \\ 0 \end{matrix}\right).\end{aligned}\] Then \(\Vert {\bf \tilde v}_2 \Vert^2=9\), so define \[{\bf u}_2=\frac{1}{3}\left(\begin{matrix} 2 \\ 1 \\ 2 \\ 0 \end{matrix}\right).\] Then \(\{{\bf u}_1,\,{\bf u}_2\}\) is an orthonormal basis for \(U\). The orthogonal projection of \({\bf v}_0\) onto \(U\) is \[\begin{aligned} {\bf u}_0 & = \left({\bf v}_0,{\bf u}_1\right){\bf u}_1 +\left({\bf v}_0,{\bf u}_2\right){\bf u}_2 ={3\over9}\left(\begin{matrix} 0 \\ 2 \\ -1 \\ 2 \end{matrix}\right) +{5\over9}\left(\begin{matrix} 2 \\ 1 \\ 2 \\ 0 \end{matrix}\right) =\frac{1}{9}\left(\begin{matrix} 10 \\ 11 \\ 7 \\ 6 \end{matrix}\right).\end{aligned}\] Using Pythagoras’ theorem gives \[\begin{aligned} \Vert {\bf v}_0-{\bf u}_0\Vert^2 & = \Vert {\bf v}_0\Vert ^2 -\Vert {\bf u}_0\Vert^2 \\ & = ({\bf v}_0,{\bf v}_0)- ({\bf v}_0,{\bf u}_1)^2-({\bf v}_0,{\bf u}_2)^2 \\ & = 4-1-25/9 \ = \ 2/9.\end{aligned}\] Note that \[{\bf v}_0-{\bf u}_0 =\frac{1}{9}\left(\begin{matrix} -1 \\ -2 \\ 2 \\ 3 \end{matrix}\right),\] and \(\Vert {\bf v}_0-{\bf u}_0\Vert^2=(1+4+4+9)/81=2/9\). This agrees!
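Here is a quick numerical check of this example (a sketch, not part of the notes). It uses numpy's QR factorization, whose columns may differ from the Gram-Schmidt vectors by a sign but span the same subspace \(U\), so they give the same projection.

    import numpy as np

    v1 = np.array([0.0, 2.0, -1.0, 2.0])
    v2 = np.array([2.0, 3.0, 1.0, 2.0])
    v0 = np.array([1.0, 1.0, 1.0, 1.0])

    Q, _ = np.linalg.qr(np.column_stack([v1, v2]))   # orthonormal basis of U as columns of Q
    u0 = Q @ (Q.T @ v0)                              # orthogonal projection of v0 onto U

    print(np.round(9 * u0, 10))                      # [10. 11.  7.  6.], i.e. u0 = (10,11,7,6)/9
    print(np.linalg.norm(v0 - u0) ** 2)              # 0.222... = 2/9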

Let \(V=C[-1,1]\) with the inner product \[(f,g)=\int_{-1}^1 f(t)g(t)\, dt.\] Find the linear polynomial \(f\) closest to \(g(t)=t^2+t\).

Solution. The space of linear polynomials is \(U={\rm span}(1,t)\). From example [ex:orthonormal-polynomial-basis], an orthonormal basis for \(U\) is \[\left\{ f_1=\frac{1}{ \sqrt{2}},\ f_2={\sqrt{3}\over\sqrt{2}}\,t\right\}.\] We have \[\begin{aligned} (g,f_1) & = \int_{-1}^1 \frac{1}{\sqrt{2}}(t^2+t)\, dt = \frac{1}{\sqrt{2}}\left[{t^3\over 3}+{t^2\over 2}\right]_{-1}^1 = \frac{\sqrt{2}}{3}, \\ (g,f_2) & = \int_{-1}^1\frac{\sqrt{3}}{\sqrt{2}}(t^3+t^2)\, dt = \frac{\sqrt{3}}{\sqrt{2}}\left[{t^4\over 4}+{t^3\over 3}\right]_{-1}^1 = \frac{\sqrt{2}}{\sqrt{3}}.\end{aligned}\] The orthogonal projection of \(g\) onto \(U\) is \[f(t)=(g,f_1)f_1+(g,f_2)f_2 ={\sqrt{2}\over 3}\,\frac{1}{ \sqrt{2}} +{\sqrt{2}\over \sqrt{3}}\,{\sqrt{3}\over\sqrt{2}}\,t =t+\frac{1}{ 3}.\] So this is the “best” (least-squares) straight-line fit to the parabola.

We can check this. Suppose that \(f=at+b\); then we want to find \(a\) and \(b\) that minimise \(\Vert g-f\Vert=\Vert t^2+t-at-b\Vert\). Now \[\begin{aligned} \Vert t^2+t-at-b\Vert^2 & = \int_{-1}^1dt\,\left( t^4+2t^3+t^2-2at^3-2at^2-2bt^2-2bt+\right.\\ &\qquad\,\, \left.+a^2t^2+2abt+b^2\right) \\ & = \left[\frac{t^5}{5}+\frac{t^4}{2}+\frac{t^3}{3} -\frac{at^4}{2}-\frac{2(a+b)t^3}{3}-bt^2 +\frac{a^2t^3}{3}+abt^2+b^2t\right]_{-1}^1 \\ & = \frac{2}{5}+\frac{2}{3}-\frac{4a}{3}-\frac{4b}{3}+\frac{2a^2}{3}+2b^2 \\ & = \frac{8}{45}+\frac{2}{3}(a-1)^2+2\left(b-\frac{1}{3}\right)^2,\end{aligned}\] where we completed the square in the last line. This is minimised when \(a=1\) and \(b=1/3\). That is, \(f(t)=t+1/3\), as above.

As one would expect from the well known \(3\)-dimensional case of \(\mathbb{R}^3\), when we take the projection of a vector, we generically obtain a “shorter” vector. This holds in the more general context of inner product spaces discussed in this Section, where the notion of length of a vector is now replaced by the more general concept of norm of a vector.

Let \(V\) be an inner product space and \(U\) a finite-dimensional subspace. If \({\bf v}\in V\), and \({\bf u}\in U\) is the orthogonal projection of \({\bf v}\) onto \(U\), then \(||{\bf u}||^2\leq||{\bf v}||^2\). In particular if \(\{{\bf u}_1,\dots,{\bf u}_k\}\) is an orthonormal basis for \(U\) and \({\bf u}=\lambda_1{\bf u}_1+\dots+\lambda_k{\bf u}_k\) then \[\sum_{i=1}^k\lambda_i^2\leq||{\bf v}||^2.\]

Writing \({\bf v}={\bf u}+{\bf \tilde u}\) as usual, we have \[\sum_{i=1}^k\lambda_i^2 =||{\bf u}||^2\leq||{\bf u}||^2+||{\bf \tilde u}||^2=||{\bf v}||^2.\] The fact that \(||{\bf u}||^2 = ({\bf u},{\bf u}) = \sum_{i=1}^k \lambda_i^2\) follows from bilinearity of the inner product together with orthonormality of the \(\{{\bf u}_1,\ldots,{\bf u}_k\}\) basis.

2.6 Orthogonal and unitary diagonalization

We now want to understand whether, via a change of basis, we can put every real or hermitian inner product into some standard form, in the same spirit as our discussion of diagonalization of matrices. First, we introduce the important idea of orthogonal and unitary matrices.

A real \(n\times n\) matrix \(M\) is orthogonal if \[M^t M = M M^t = I\, ,\] or equivalently if \(M^{-1}=M^t\).

The set (which is actually a group as we will see later on) of \(n\times n\) orthogonal matrices is known as the orthogonal group, and is denoted by \[O(n) \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\{ M \in GL(n,\mathbb{R}) \,\,\text{such that}\,\, M^t M = M M^t = I\}.\]

Note that since \(\det(M) =\det(M^t)\), orthogonal matrices must have \(\mbox{det} M = \pm 1\).

The subset (which is actually a subgroup) of all the orthogonal matrices with determinant \(+1\) is known as the special orthogonal group, and is denoted by \[SO(n) \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\{ M \in O(n) \,\,\text{such that}\,\, \mbox{det} M = +1\}.\]

There are similar notions in the complex realm.

A complex \(n\times n\) matrix \(U\) is called unitary if \[UU^*=U^*U=I\,,\] or equivalently if \(U^{-1}=U^*\).

The set (which is actually a group as we will see later on) of \(n\times n\) unitary matrices is known as the unitary group, and is usually denoted by \[U(n) \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\{ U \in GL(n,\mathbb{C}) \,\,\text{such that}\,\, U^* U = U U^* = I\}\,.\]

Note that \(\det(U^*) = \overline{\det(U)}\), so a unitary matrix has \(\vert \det(U)\vert^2 =1\). In particular notice that \(U(1) = \{z\in\mathbb{C}\,\mbox{ such that }\, z\,z^* =z^*\,z= z\overline{z}=\vert z \vert ^2 =1\}\), so the determinant of every unitary matrix \(U\) is an element of \(U(1)\). A complex number with unit modulus, i.e. an element of \(U(1)\), is simply a point on the unit circle in the complex plane, so \(U(1) = \{ e^{i\phi}\,\mbox{with}\,\phi\in[0,2\pi)\}\), and the determinant of every unitary matrix takes the form \(\det(U)= e^{i \varphi}\) for some \(\varphi \in [0,2\pi)\).

The subset (which is actually a subgroup) of all the unitary matrices with determinant \(+1\) is known as the special unitary group, and is usually denoted by \[SU(n) \mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\{ M \in U(n) \,\,\text{such that}\,\, \det(M) = +1\}.\]

A matrix is orthogonal (or unitary) if and only if its columns form an orthonormal set of vectors, using the standard inner product on \({\mathbb R}^n\) (or \({\mathbb C}^n\)).

Let us prove the complex case (the real case follows from taking the special case in which the vectors are all real). Think of a complex matrix \(U\) as a matrix of column vectors \(({\bf u}_1,\ldots,{\bf u}_n)\). The matrix \(U^*\) is then the matrix of row vectors \(({\bf u}_1^*, \ldots, {\bf u}_n^*)\). By the usual rules of matrix multiplication we have that \((U^*U)_{ij}=\left\langle{\bf u}_j,{\bf u}_i\right\rangle\) with the standard hermitian inner product in \({\mathbb C}^n\). These are the entries of the identity matrix if and only if the \({\bf u}_i\) are orthonormal.

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}1 & 2 \\ 2 & 3\end{matrix}\right) \quad\mbox{is symmetric but not orthogonal.}\]

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}1 & -1 \\ 1 & 1\end{matrix}\right) \quad\mbox{is orthogonal but not symmetric.}\]

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}3 & 1 \\ 1 & 1\end{matrix}\right) \quad\mbox{is symmetric with $\det(M)=1$ but not orthogonal.}\]

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}1 & 1 \\ 1 & -1\end{matrix}\right) \quad\mbox{is both symmetric and orthogonal.}\]

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}1 & {\rm i}\\ -{\rm i}& 2\end{matrix} \right) \quad\mbox{is hermitian but not unitary.}\]

\[\frac{1}{\sqrt{2}}\left(\begin{matrix}1 & -{\rm i}\\ -{\rm i}& 1\end{matrix} \right) \quad\mbox{is unitary but not hermitian.}\]
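Claims like these are easy to check by machine; here is a small sketch (not part of the notes) testing the defining conditions \(M^tM=I\) and \(U^*U=I\) for the fourth and sixth matrices above.

    import numpy as np

    M = np.array([[1.0, 1.0],
                  [1.0, -1.0]]) / np.sqrt(2)
    print(np.allclose(M.T @ M, np.eye(2)))          # True: M is orthogonal
    print(np.allclose(M.T, M))                      # True: M is also symmetric

    U = np.array([[1.0, -1j],
                  [-1j, 1.0]]) / np.sqrt(2)
    print(np.allclose(U.conj().T @ U, np.eye(2)))   # True: U is unitary
    print(np.allclose(U.conj().T, U))               # False: U is not hermitian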

Let \(A\) be a complex Hermitian (or real symmetric) \(n\times n\) matrix. Then the eigenvalues of \(A\) are real.

Let \(\lambda\) be an eigenvalue of \(A\), so \(A{\bf x}=\lambda {\bf x}\) for some \({\bf x}\neq{\bf 0}\). Then we have \[\begin{aligned} {\bf x}^*A{\bf x} & = \lambda {\bf x}^*{\bf x}, \\ {\bf x}^*A^*{\bf x} & = \overline\lambda {\bf x}^*{\bf x} \quad{\rm (conjugate\ transpose)} \\ {\bf x}^* A{\bf x} & = \overline\lambda {\bf x}^*{\bf x} \quad(A=A^* \hbox{ as $A$ is hermitian}) \\ \lambda {\bf x}^* {\bf x} & = \overline{\lambda} {\bf x}^*{\bf x} \quad({\rm using\ } A{\bf x}=\lambda {\bf x}) \\ \lambda & = \overline\lambda\phantom{{\bf x}^*{\bf x}}\quad({\bf x}^*{\bf x}>0).\end{aligned}\]

Note that if \(A\) is symmetric real, then the eigenvectors can be taken to be real as well, because they satisfy \((A-\lambda I){\bf v}=0\) with \(A-\lambda I\in M_{n\times n}({\mathbb R})\), which has nonzero solutions over \({\mathbb R}\) since \(\det(A-\lambda I)=0\) (recall §3.6 of the Michaelmas lecture notes).

Let \(A\) be a complex Hermitian (or real symmetric) \(n\times n\) matrix. Eigenvectors corresponding to different eigenvalues of \(A\) are mutually orthogonal using the standard inner product in \({\mathbb C}^n\) (or \({\mathbb R}^n\)).

Suppose that \(\lambda,\mu\) are eigenvalues (which are real, as we have just shown), with \(\lambda\neq\mu\). Then for some \({\bf x},{\bf y}\) we have \[A{\bf x}=\lambda {\bf x},\ {\bf x}\neq{\bf 0}, \qquad A{\bf y}=\mu {\bf y},\ {\bf y}\neq{\bf 0}\] and \[\begin{aligned} {\bf x}^*A{\bf y} & = \mu {\bf x}^*{\bf y}, \\ {\bf y}^*A{\bf x} & = \lambda {\bf y}^*{\bf x}, \\ {\bf x}^*A{\bf y} & = \overline{{\bf y}^* A^* {\bf x}} = \overline{{\bf y}^* A {\bf x}} =\overline{{\bf y}^* \lambda {\bf x}} = \overline{\lambda} {\bf x}^*{\bf y} \ =\ \lambda {\bf x}^*{\bf y}, \\ {\rm so}\quad (\lambda-\mu){\bf x}^*{\bf y} & = 0, \\ {\bf x}^*{\bf y} & = 0 \end{aligned}\] and thus \({\bf x}\bot {\bf y}\).

Proposition [prop:orthogonality-of-eigenvectors] implies that if the eigenvalues of \(A\) are all different, then we have \(n\) orthogonal (and linearly independent, by proposition [prop:diagonalizability-different-eigenvalues]) eigenvectors, so we can diagonalize by an orthogonal \(P\), after rescaling the eigenvectors so they have unit norm (recall proposition [prop:orthonormal-change-of-basis]). This fact remains true even if two eigenvalues approach each other: the eigenvectors remain orthogonal, and one can still diagonalize via a unitary (or orthogonal, in the real case) transformation even in the limit when eigenvalues coincide, as we now show.

If \(A\) is real and symmetric, then there exists an orthogonal \(n\times n\) matrix \(P\) such that \(P^tAP=D\) with \(D\) diagonal. If \(A\) is a complex hermitian \(n\times n\) matrix, then there exists a unitary \(n\times n\) matrix \(P\) such that \(P^*AP=D\) with a \(D\) diagonal real matrix. The entries of \(D\) are the eigenvalues of \(A\). In each case, the columns of \(P\) are an orthonormal basis of eigenvectors of \(A\).

Let us prove the hermitian case, so we are working on \({\mathbb C}^n\). Pick an eigenvalue \(\lambda_1\), with associated unit norm eigenvector \({\bf u}_1\). We denote \(U_1=\mathop{\mathrm{span}}({\bf u}_1)\), and construct, using the standard inner product on \({\mathbb C}^n\), the orthogonal complement \(U_1^\perp\), so that \(\mathbb{C}^n=U_1\oplus U_1^\perp\) (recall proposition [prop:orthogonal-complement-direct-sum]).

Now note that \(A\) acting on any element \({\bf v}\in U_1^\perp\) gives another element in \(U_1^\perp\) (which recall is defined to be those elements in \({\mathbb C}^n\) orthogonal to \(U_1\)): \[\left\langle A{\bf v},{\bf u}_1\right\rangle = ({\bf u}_1)^* A {\bf v}= (A{\bf u}_1)^* {\bf v}= \overline{\lambda_1} {\bf u}_1^*{\bf v}= \overline{\lambda_1}\left\langle{\bf v},{\bf u}_1\right\rangle=0\, ,\] where in the second equality we have used hermiticity of \(A\) (to show this, expand in components, or use the result in remark [rk:hermitian-transpose-of-product]).

So we can talk about the restriction of \(A\) to \(U_1^\perp\), which we denote by \(A|_{U_1^\perp}\colon U_1^\perp\to U_1^\perp\). This restriction \(A|_{U_1^\perp}\) is a hermitian operator acting on \(U_1^\perp\cong \mathbb{C}^{n-1}\) (the existence of the isomorphism was proven in theorem 7.3.12 in the Michaelmas lecture notes), so we can iterate the process: pick an eigenvalue \(\lambda_2\) of \(A|_{U_1^\perp}\), with associated unit norm eigenvector \({\bf u}_2\), define \(U_2\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\mathop{\mathrm{span}}({\bf u}_2)\), write \(U_1^\perp=U_2\oplus U_2^\perp\), and so on, until you achieve a full decomposition \[{\mathbb C}^n = U_1\oplus \ldots \oplus U_n\] with each \(U_i=\mathop{\mathrm{span}}({\bf u}_i)\) and all \({\bf u}_i\) orthonormal. Because by construction the \({\bf u}_i\) are an orthonormal basis of \({\mathbb C}^n\), proposition \(\eqref{prop:orthonormal-change-of-basis}\) implies that the change of basis matrix \(P\) is unitary.

If \(A\) is orthogonally diagonalizable, then \(A\) is symmetric.

If \(P^tAP=D\) with \(P\) orthogonal and \(D\) diagonal, then \(A=PDP^t\) which is clearly symmetric.

If \(A\) is unitarily diagonalizable and the eigenvalues of \(A\) are all real, then \(A\) is hermitian.

If \(P^*AP=D\) with \(P\) unitary and \(D\) diagonal, then \(A=PDP^*\) which is clearly hermitian because we assumed that the eigenvalues of \(A\) were all real, i.e. \(D^*=D\).

\[A=\left(\begin{matrix} 2& -2\\ -2& 5\end{matrix}\right).\] The eigenvalues are \(\lambda=6,1\), and we get \[P=\frac{1}{\sqrt{5}}\left(\begin{matrix} 1& 2\\ -2& 1\end{matrix}\right).\]

\[A=\left(\begin{matrix} 2& -2& 0\\ -2& 1& -2\\ 0& -2& 0\end{matrix}\right)\] Characteristic polynomial: \[\begin{aligned} \det\left(\begin{matrix} t-2& 2& 0\\ 2& t-1& 2\\ 0& 2& t\end{matrix}\right) & = (t-2)[(t-1)t-4]-2\times2t\\ & = (t-2)[t^2-t-4]-4t\\ & = t^{3}-3\,t^{2}-6\,t+8\\ & = \left (t-1\right )\left (t+2\right )\left (t-4\right ).\\ \lambda & = 4,1,-2.\end{aligned}\] Eigenvectors:

  • \(\lambda=4\) \[(4I-A){\bf x} = \left(\begin{matrix} 2& 2& 0\\2& 3& 2\\ 0& 2& 4\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix}\right) = \left(\begin{matrix}2x_1&+2x_2& \\ 2x_1&+3x_2&+2x_3\\ &2x_2&+4x_3\end{matrix}\right) = \left(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}\right).\] Unit eigenvector \[\frac{1}{3}\left(\begin{matrix} 2 \\ -2 \\ 1\end{matrix}\right).\]

  • \(\lambda=1\) \[(I-A){\bf x} = \left(\begin{matrix} -1& 2& 0\\ 2& 0& 2\\ 0& 2& 1\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix}\right) = \left(\begin{matrix} -x_1& +2x_2&\\ 2x_1&& +2x_3\\ & 2x_2& +x_3\end{matrix}\right) = \left(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}\right).\] Unit eigenvector \[\frac{1}{3}\left(\begin{matrix} 2 \\ 1 \\ -2\end{matrix}\right).\]

  • \(\lambda=-2\) \[(-2I-A){\bf x} = \left(\begin{matrix} -4& 2& 0\\ 2& -3& 2\\ 0& 2& -2\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \end{matrix}\right) =\left(\begin{matrix} -4x_1& +2x_2&\\ 2x_1& -3x_2& +2x_3\\& 2x_2& -2x_3\end{matrix}\right) = \left(\begin{matrix} 0 \\ 0 \\ 0 \end{matrix}\right).\] Unit eigenvector \[\frac{1}{3}\left(\begin{matrix} 1 \\ 2 \\ 2 \end{matrix}\right).\]

Thus \[P=\frac{1}{3}\left(\begin{matrix} 2& 2& 1\\ -2& 1& 2\\ 1& -2& 2\end{matrix}\right)\qquad P^tAP=\left(\begin{matrix} 4& 0& 0\\ 0& 1& 0\\ 0& 0& -2\end{matrix}\right)\] Note that all the eigenvalues in this example are distinct, so the corresponding eigenvectors are automatically orthogonal to each other. If the eigenvalues are not distinct, then for repeated eigenvalues we must choose mutually orthogonal eigenvectors within each eigenspace (for instance via Gram-Schmidt).
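In practice such orthogonal diagonalizations are often computed numerically. As a sketch (not part of the notes), numpy's eigh routine, which is designed for symmetric and hermitian matrices, reproduces the diagonalization of the \(3\times 3\) matrix above; it lists the eigenvalues in ascending order rather than in the order chosen above.

    import numpy as np

    A = np.array([[ 2.0, -2.0,  0.0],
                  [-2.0,  1.0, -2.0],
                  [ 0.0, -2.0,  0.0]])

    eigvals, P = np.linalg.eigh(A)                # real eigenvalues, orthonormal eigenvectors
    print(eigvals)                                # [-2.  1.  4.]
    print(np.allclose(P.T @ P, np.eye(3)))        # True: P is orthogonal
    print(np.round(P.T @ A @ P, 10))              # diag(-2, 1, 4)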

\[A = \left(\begin{matrix}4 & 1 -i \\ 1 + i & 5 \end{matrix}\right).\] Then \(A\) is hermitian, and the characteristic polynomial of \(A\) is given by \[\begin{aligned} \det (\lambda I - A) & = \det \left(\begin{matrix}\lambda - 4 & -1 +i \\ -1 - i & \lambda - 5 \end{matrix}\right) \\ & = \lambda^2 - 9 \lambda + 18 \\ & = (\lambda - 3)(\lambda - 6).\end{aligned}\] Thus the eigenvalues are \(\lambda = 3, 6\).

Corresponding unit eigenvectors:

\(\lambda = 3\): \[\frac{1}{\sqrt 3}\left(\begin{matrix}-1+i \\ 1 \end{matrix}\right).\] \(\lambda = 6\): \[\frac{1}{{\sqrt 6}}\left(\begin{matrix} 1-i \\ 2 \end{matrix}\right).\] (Note that these are mutually orthogonal.) Thus \[P = \left(\begin{matrix} -\frac{{1-i}}{{\sqrt 3}} & \frac{{1-i}}{{\sqrt 6}} \\ \frac{1}{{\sqrt 3}} & \frac{2}{{\sqrt 6}} \end{matrix}\right)\] is unitary and \[P^*AP =\left(\begin{matrix} 3&0 \\ 0&6 \end{matrix}\right).\]

(Diagonalization of real inner products)

Every real inner product on \({\mathbb R}^n\) may be written as \(({\bf u},{\bf v})={\bf y}^t A {\bf x}\), where \(A\) is a symmetric \(n\times n\) matrix and \({\bf x},\,{\bf y}\) are the coordinates of the two vectors \({\bf u},{\bf v}\) in the chosen basis.

If we change basis, the coordinates representing the two vectors \({\bf u},{\bf v}\) will change to \({\bf x}=M{\bf z},\,{\bf y}=M{\bf w}\) for some invertible matrix \(M\), where \({\bf z}\) and \({\bf w}\) are the coordinates of the vectors \({\bf u},{\bf v}\) in the new basis.

In the new basis the inner product will take the form \(({\bf u},{\bf v})={\bf y}^t A {\bf x} = {\bf w}^t M^t A M{\bf z} = {\bf w}^t B {\bf z}\), where the symmetric matrix \(B=M^t A M\) represents the same inner product but in a different basis.

As we have mentioned before, every symmetric matrix \(A\) can be orthogonally diagonalized, i.e. there exists an orthogonal matrix \(P\) such that \(PAP^t=D\), for \(D\) diagonal, then we have \[({\bf u},{\bf v})={\bf y}^t P^t D P {\bf x}=(P{\bf y})^t D (P{\bf x})={\bf {w}}^t D {\bf { z}}.\] So \({\bf x}\mapsto {\bf z}=P{\bf x},\,{\bf y}\mapsto {\bf w}=P{\bf y}\) is a change of basis which diagonalizes the inner product. In particular, \[({\bf v},{\bf v})=(P{\bf x})^t D (P{\bf x})= \lambda_1(z_1)^2+\ldots+\lambda_n(z_n)^2\] is positive if and only if the \(\lambda_i\) are all positive, in other words if and only if the matrix \(A\) is positive-definite.

(Diagonalization of hermitian inner products)

Similarly, every complex inner product on \({\mathbb C}^n\) may be written as \(\langle{\bf u},{\bf v}\rangle={\bf y}^* A {\bf x}\), where \(A\) is an hermitian \(n\times n\) matrix and \({\bf x},\,{\bf y}\) are the coordinates of the two vectors \({\bf u},{\bf v}\) in the chosen basis.

If we change basis, the coordinates representing the two vectors \({\bf u},{\bf v}\) will change to \({\bf x}=M{\bf z},\,{\bf y}=M{\bf w}\) for some invertible matrix \(M\), where \({\bf z}\) and \({\bf w}\) are the coordinates of the vectors \({\bf u},{\bf v}\) in the new basis.

In the new basis the inner product will take the form \(\langle{\bf u},{\bf v}\rangle={\bf y}^* A {\bf x} = {\bf w}^* M^* A M{\bf z} = {\bf w}^* B {\bf z}\), where the hermitian matrix \(B=M^* A M\) represents the same inner product but in a different basis.

We can use the fact that every hermitian matrix can be unitarily diagonalized to write \(PAP^\ast=D\), with \(P\) unitary and \(D\) diagonal, so that we have \[\langle {\bf u},{\bf v}\rangle={\bf y}^\ast P^* D P {\bf x}=(P{\bf y})^\ast D (P{\bf x}).\] So \({\bf x}\mapsto {\bf z}=P{\bf x},\,{\bf y}\mapsto {\bf w}=P{\bf y}\) is a change of basis which diagonalizes the inner product. In particular, \[\langle{\bf v},{\bf v}\rangle=(P{\bf x})^* D (P{\bf x})= \lambda_1\vert z_1\vert^2+\ldots+\lambda_n\vert z_n\vert^2\] is positive if and only if the \(\lambda_i\) are all positive, in other words if and only if the matrix \(A\) is positive-definite.
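As a concrete illustration of the last point (a sketch, not part of the notes), positive-definiteness of a real symmetric matrix, and hence the question of whether \({\bf y}^t A {\bf x}\) really defines an inner product, can be checked by computing its eigenvalues; here this is done for the matrix of the inner product \(({\bf u},{\bf v})=4u_1v_1+u_1v_2+u_2v_1+u_2v_2\) used earlier.

    import numpy as np

    # matrix of the bilinear form (u, v) = 4 u1 v1 + u1 v2 + u2 v1 + u2 v2
    A = np.array([[4.0, 1.0],
                  [1.0, 1.0]])

    eigvals = np.linalg.eigvalsh(A)
    print(eigvals)                 # approx [0.70, 4.30]: both positive
    print(np.all(eigvals > 0))     # True: the form is positive-definite, so it is an inner product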

3 Special polynomials

Prologue. In the 18th and 19th centuries, mathematicians modelling things like vibrations encountered ODEs such as \[(1-x^2)y'' - 2xy' - \lambda y = 0,\] with \(\lambda\) constant. This particular one is called the Legendre equation. They came up again in the 20th century, for example this equation arises in atomic physics.

Unlike the case of constant coefficients, the general solution of such equations is messy. But for certain values of \(\lambda\), there are polynomial solutions, and these turn out to be the physically-relevant ones. Note that we can write the equation as \({\cal L}y=\lambda y\), where \[{\cal L}= (1-x^2)\frac{d^2}{dx^2} - 2x\frac{d}{dx},\] is a linear operator on a vector space of functions. If this function space is a space of polynomials, then the special values of \(\lambda\) are the eigenvalues of \({\cal L}\). In applications these often correspond to natural frequencies of vibration, or to spectral lines, etc.

3.1 Orthogonal polynomials

Introduction. Let \({\mathbb{R}}[x]\) denote the (infinite-dimensional) vector space of all real polynomials in \(x\), and let \({\mathbb{R}}[x]_n\) denote the \((n+1)\)-dimensional subspace of all real polynomials of degree at most \(n\). The inner product is taken to be \[(f,g)=\int_a^b f(x)\, g(x)\, k(x)\,dx,\] where \(k(x)\) is some fixed, continuous function such that \(k(x)>0\) for all \(x\in[a,b]\), with \(a,b\) fixed numbers.

Note that this is indeed an inner product over the space of continuous, real-valued functions \(C[a,b]\) over the interval \([a,b]\), and hence also on its vector subspaces \({\mathbb{R}}[x]\) and \({\mathbb{R}}[x]_n\). The proof was given in example [ex:inner-product-functions]; let us repeat it here (explained in a bit more detail) for convenience:

  1. It is symmetric and homogeneous thanks to the properties of the integral \[\begin{aligned} (\lambda f_1+ \mu f_2 , g ) &= \int_a^b (\lambda \,f_1(x)+ \mu\,f_2(x)) \, g(x)\, k(x)\,dx \\ &= \lambda \int_a^b f_1(x) \, g(x)\, k(x)\,dx +\mu \int_a^b f_2(x) \, g(x)\, k(x)\,dx \\ & = \lambda (f_1,g)+\mu (f_2,g) = \lambda( g,f_1)+\mu (g,f_2)\,,\end{aligned}\] for all constants \(\lambda,\mu \in \mathbb{R}\), and all \(f_1,f_2,g\in C[a,b]\).

  2. It is positive definite \[(f,f) = \int_a^b f(x)^2 \,k(x) \,dx \geq 0\] since \(f(x)^2 \geq 0\) and \(k(x)\) was chosen by hypothesis to be positive over the interval \([a,b]\).

  3. It is non degenerate, i.e. \((f,f) = 0\) if and only if \(f=0\), using the properties of continuous functions and the fact that \(k(x)>0\) is strictly positive. In fact if \(f\neq 0\), it means that there exists \(x_0 \in [a,b]\) such that \(f(x_0) = c \neq 0\). By continuity we know that if \(f\) is non-vanishing at \(x_0\), there must be an interval around \(x_0\) such that \(f\) is non-zero on the whole interval, i.e. there exists \(\delta >0\) such that \(| f(x) | > \epsilon\) for all \(x \in (x_0-\delta , x_0+\delta)\) for some \(\epsilon>0\) (for simplicity let us assume that this interval is entirely contained in \([a,b]\)). On the interval \((x_0-\delta , x_0+\delta)\), the function \(f^2(x)\), and \(k(x)\) as well, are strictly bigger than zero, i.e. \(f^2(x) k(x)> \tilde{\epsilon}\), for some \(\tilde{\epsilon}>0\) related to \(\epsilon, c\) and the minimum of \(k(x)\) in the interval (which is strictly bigger than zero). So \[(f,f) = \int_a^b f^2(x)\,k(x)\,dx \geq \int_{x_0-\delta}^{x_0+\delta} f^2(x) \,k(x)\,dx > 2\,\tilde{\epsilon}\, \delta>0\,.\] The crucial points are continuity and the fact that \(k(x)\) is positive definite over the interval \([a,b]\).

    [Note: This proof uses crucially properties of continuous functions so you might want to go back to your analysis notes and refresh your memory about proofs involving \(\delta\) and \(\epsilon\).]
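As a small illustration of such weighted inner products, here is a sympy sketch (not part of the notes); the weight chosen, \(k(x)=e^{-x^2}\) on \((-\infty,\infty)\), is one of those listed later in this section, and the helper ip is a name introduced here.

    import sympy as sp

    x = sp.symbols('x')

    def ip(f, g, k, a, b):
        # weighted inner product (f, g) = integral_a^b f(x) g(x) k(x) dx
        return sp.integrate(f * g * k, (x, a, b))

    k = sp.exp(-x**2)                                        # Hermite weight
    print(ip(1, x, k, -sp.oo, sp.oo))                        # 0: 1 and x are orthogonal
    print(ip(1, x**2 - sp.Rational(1, 2), k, -sp.oo, sp.oo)) # 0: 1 and x^2 - 1/2 are orthogonal
    print(ip(1, 1, k, -sp.oo, sp.oo))                        # sqrt(pi) = ||1||^2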

Suppose that we want an orthonormal basis \(\{f_0,f_1,f_2,\ldots,f_n\}\), where \(f_n\in{\mathbb{R}}[x]_n\). One way of obtaining such a basis is to start with \(\{1,x,x^2,\ldots,x^n\}\) as a “first guess”, and apply the Gram-Schmidt procedure. But in some special cases there is also a more interesting way, related to the fact that many such examples arose in the study of differential equations.

Example: Legendre polynomials. If we take \(k(x)=1\) and \((a,b)=(-1,1)\), then we get a basis of polynomials of which the first few are \[P_0(x)=\sqrt{\frac{1}{2}}, \quad P_1(x)=\sqrt{\frac{3}{2}}x, \quad P_2(x)=\sqrt{\frac{5}{8}}(3x^2-1), \quad \ldots.\] These are called the normalized Legendre polynomials. They are defined by the following requirements:

  1. \(P_n\) has degree \(n\), and is orthogonal to \(P_j\) for all \(j<n\);

  2. The norm of \(P_n\) is 1 (sometimes alternative conventions are used, where one imposes that the leading coefficient of \(P_n\) is 1, but we will use the convention where the \(P_n\) are unit vectors).

Some of their properties will appear below: in particular, in §3.2 we shall see that they arise as eigenfunctions of the Legendre differential operator.

More examples. Here is a list of historically-important examples, named after various 19th-century mathematicians, for which the same sort of story holds.

  1. Legendre: \(k(x)=1\) and \((a,b)=(-1,1)\) ;

    [Multipole expansion for the gravitational potential associated to a point mass or the Coulomb potential associated to a point charge. Schrödinger equation in spherical polar coordinates.]

  2. Chebyshev-I: \(k(x)=1/\sqrt{1-x^2}\) and \((a,b)=(-1,1)\) ;

  3. Chebyshev-II: \(k(x)=\sqrt{1-x^2}\) and \((a,b)=(-1,1)\) ;

    [Numerical solutions to PDEs and integral equations. Finite elements methods.]

  4. Hermite: \(k(x)=\exp(-x^2)\) and \((a,b)=(-\infty,\infty)\) ;

    [Eigenstates of the quantum harmonic oscillator.]

  5. Laguerre: \(k(x)=\exp(-x)\) and \((a,b)=(0,\infty)\) .

    [Radial functions for the hydrogen atom wave functions.]

We shall see in the next section that, in each of these cases, the orthogonal polynomials arise as special solutions of differential equations, and in fact as eigenfunctions of linear differential operators.

Another famous example is that of Fourier series: in this case, we have orthonormal functions \[1/\sqrt{2},\, \cos(2\pi t),\, \sin(2\pi t),\, \ldots,\, \cos(2\pi nt),\, \sin(2\pi nt),\,\ldots,\] with the inner product \((f,g)=2 \int_{0}^1f(t)g(t)\,dt\). Since these are not polynomials, they do not quite fall into the framework being discussed here, but they are closely related.

3.2 Linear differential operators

Given an inner product on a vector space \(V\) there is a very special class of linear operators called symmetric linear operators. We use the word “operators” to remind ourselves that we might be talking about vector spaces of polynomials or functions for which linear transformations might include derivatives as well.

If \(V\) is a real vector space with inner product \((\,,\,)\), and \({\cal L}:V\to V\) is a linear operator, then \({\cal L}\) is said to be symmetric with respect to that inner product if \(({\cal L}{\bf v},{\bf w})=({\bf v},{\cal L}{\bf w})\) for all \({\bf v},{\bf w}\in V\).

If \(V={\mathbb{R}}^n\) with the standard Euclidean inner product, and \({\cal L}\) is represented by an \(n\times n\) matrix \(M\), then \({\cal L}\) is symmetric (according to the above definition) if and only if \(M\) is symmetric (in the sense that \(M^t=M\)).

That symmetry of \(M\) implies symmetry of \({\cal L}\) follows from \[({\cal L}{\bf v},{\bf w})={\bf w}^t M {\bf v}=(M{\bf w})^t{\bf v}=({\bf v},{\cal L}{\bf w}).\] The converse statement can be obtained by choosing \({\bf v}\) and \({\bf w}\) to be basis vectors of \(\mathbb{R}^n\), for instance \({\bf v}=\mathbf{e}_i\) and \({\bf w}=\mathbf{e}_j\). We have \[({\cal L}\mathbf{e}_i, \mathbf{e}_j) = \mathbf{e}_j^t M \mathbf{e}_i = M_{ji}\] and similarly \[(\mathbf{e}_i, {\cal L}\mathbf{e}_j) = \mathbf{e}_j^t M^t \mathbf{e}_i = M_{ij}\, ,\] so symmetry of \({\cal L}\) implies \(M=M^t\).

Symmetric linear differential operators. Here is a list of linear operators associated with each of the five examples of the previous section, describing its action on a polynomial \(f(x)\).

  1. Legendre: \({\cal L}_Lf=(1-x^2)f''-2xf'\) ;

  2. Chebyshev-I: \({\cal L}_If=(1-x^2)f''-xf'\) ;

  3. Chebyshev-II: \({\cal L}_{II}f=(1-x^2)f''-3xf'\) ;

  4. Hermite: \({\cal L}_Hf=f''-2xf'\) ;

  5. Laguerre: \({\cal L}_lf=xf''+(1-x)f'\) .

In each case, the corresponding differential equation for \(f\) \[\mathcal{L} f =\lambda f\] says that \(f\) is an eigenfunction of the operator \({\cal L}\) with eigenvalue \(\lambda\). For example, the Legendre equation is \({\cal L}_Lf=-k(k+1)f\).

Each of these operators is symmetric with respect to the associated inner product (i.e. the inner product with the same name); here is the proof for the Laguerre case.

If \((f,g)=\int_0^\infty f(x)\, g(x)\, \exp(-x)\,dx\), then the operator \({\cal L}_l\) is symmetric.

This is just a calculation: \[\begin{aligned} ({{\cal L}_l}f,g) &= \int_0^\infty[xf''+(1-x)f']\,g(x)\,{\rm e}^{-x}\,dx \\ &= \int_0^\infty\left[x\,{\rm e}^{-x}f'\right]'g(x)\,dx \\ &= \left[x\,{\rm e}^{-x}f'g\right]_{x=0}^{x=\infty}-\int_0^\infty x\,{\rm e}^{-x}f'(x)\,g'(x)\,dx\\ & = -\int_0^\infty x\,{\rm e}^{-x}f'(x)\,g'(x)\,dx \end{aligned}\] since the first term vanishes. Repeating the same calculation for \((f,{\cal L}_l g)=({\cal L}_l g,f)\) leads to the same result (with \(f\) and \(g\) exchanged, but this makes no difference).
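The same symmetry can be confirmed symbolically on sample polynomials; here is a sympy sketch (not part of the notes; the polynomials \(x^2\) and \(x^3\) are arbitrary choices).

    import sympy as sp

    x = sp.symbols('x')
    ip = lambda f, g: sp.integrate(f * g * sp.exp(-x), (x, 0, sp.oo))
    L = lambda f: x * sp.diff(f, x, 2) + (1 - x) * sp.diff(f, x)   # Laguerre operator

    f, g = x**2, x**3          # sample polynomials; any pair would do
    print(ip(L(f), g))         # -144
    print(ip(f, L(g)))         # -144: both sides agree, as symmetry requires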

If \(\{P_0,P_1,P_2,\ldots\}\) is an orthonormal set of polynomials, with \(P_j\) of degree \(j\), and \({{\cal L}}\) is a symmetric linear operator such that the degree of \({\cal L}P\) is less than or equal to the degree of \(P\), then each \(P_j\) is an eigenfunction of \({\cal L}\).

Note that the extra condition about degrees is satisfied for the five operators listed above.

Let \(n\) be a positive integer. We need to show that \(P_n\) is an eigenfunction of \({\cal L}\). Put \(V={\mathbb{R}}[x]_n\) and \(W={\mathbb{R}}[x]_{n-1}\), so \(W\) is a subspace of \(V\). Now the orthogonal complement \(W^\perp\) has dimension \(\dim(V)-\dim(W)=(n+1)-n=1\), and the single polynomial \(P_n\) spans \(W^\perp\) (since \(P_n\) is orthogonal to \(P_0,P_1,\ldots,P_{n-1}\), and so \(P_n\in W^\perp\)). Thus any polynomial in \(W^\perp\) has the form \(\lambda P_n\) for some constant \(\lambda\). Now if \(P\in W\), then \[({{\cal L}}P_n,P)=(P_n,{{\cal L}}P)=0,\] because \({{\cal L}}\) is symmetric and \({\cal L}P\in W\). This means that \({\cal L}P_n\in W^\perp\), which in turn means that \({\cal L}P_n=\lambda P_n\) for some \(\lambda\). In other words, \(P_n\) is an eigenfunction of \({\cal L}\).

So in the above five special cases, the corresponding (Legendre, Chebyshev etc) polynomials are eigenfunctions of the corresponding (Legendre, Chebyshev etc) operator.

Let \(V={\mathbb{R}}[x]_3\). Write down the matrix \(M\) representing the operator \({\cal L}_L\) acting on \(V\), using the basis \(\{1,x,x^2,x^3\}\) of \(V\). Use this to construct the Legendre polynomials \(\{f_0,f_1,f_2,f_3\}\).

Solution. The operator \({\cal L}_L\) maps \(1\) to \(0\), \(x\) to \(-2x\), \(x^2\) to \(2-6x^2\) and \(x^3\) to \(-12x^3+6x\); so its matrix is \[M=\left[\begin{array}{cccc}0 & 0 & 2 & 0 \\ 0 & -2 & 0 & 6 \\ 0 & 0 & -6 & 0 \\ 0 & 0 & 0 & -12 \end{array}\right].\]

\(M\) had to turn out to be upper-triangular, because of the “extra condition” referred to in remark [remark:extra-condition] regarding the degrees of \({\cal L}_L P\) and \(P\), i.e. \(\mbox{deg}\,{\cal L}_L P\leq \mbox{deg} \,P\). So the eigenvalues are easy to read off. The Legendre case has \({\cal L}x^n=-n(n+1)x^n+\text{(lower order terms)}\), so \(\lambda=-n(n+1)\) for \(n=0,1,2,3,\ldots\).

The eigenvalues here are \(\lambda=0,-2,-6,-12\).

For \(\lambda=0\), the eigenvector is \({\bf v}_0=[1\,\, 0\,\, 0\,\,0]^t\), and the corresponding eigenfunction is \(f_0=1\).

For \(\lambda=-2\), the eigenvector is \({\bf v}_1=[0\,\, 1\,\, 0\,\,0]^t\), and the corresponding eigenfunction is \(f_1=x\).

For \(\lambda=-6\), the eigenvector is \({\bf v}_2=[-1\,\, 0\,\, 3\,\,0]^t\), and the corresponding eigenfunction is \(f_2=3x^2-1\).

For \(\lambda=-12\), the eigenvector is \({\bf v}_3=[0\,\, -3\,\,0 \,\,5]^t\), and the corresponding eigenfunction is \(f_3=5x^3-3x\).

The corresponding normalized functions are the first four normalized Legendre polynomials.
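This worked example can also be reproduced symbolically; here is a sympy sketch (not part of the notes; the helper coords and the basis list are choices made here).

    import sympy as sp

    x = sp.symbols('x')
    L = lambda f: (1 - x**2) * sp.diff(f, x, 2) - 2 * x * sp.diff(f, x)   # Legendre operator

    def coords(p):
        # coordinates of a polynomial of degree <= 3 in the basis {1, x, x^2, x^3}
        cs = sp.Poly(p, x).all_coeffs()[::-1]          # constant term first
        return cs + [sp.Integer(0)] * (4 - len(cs))    # pad up to degree 3

    basis = [sp.Integer(1), x, x**2, x**3]
    M = sp.Matrix([coords(L(b)) for b in basis]).T     # columns are the coordinates of L(1), ..., L(x^3)

    print(M)                # Matrix([[0, 0, 2, 0], [0, -2, 0, 6], [0, 0, -6, 0], [0, 0, 0, -12]])
    print(M.eigenvects())   # eigenvalues 0, -2, -6, -12, with eigenvectors proportional to those above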

A function \(f\), satisfying the differential equation \[\mathcal{L} f =\lambda f\] for some linear differential operator \({\cal L}\) and some number \(\lambda\) is called an eigenfunction of \({\cal L}\) corresponding to the eigenvalue \(\lambda\).

If we rewrite this eigenfunction \(f\) using some basis, e.g. the basis of polynomials \(\{ 1,x,\ldots,x^n,\ldots\}\), the vector containing the coordinates of \(f\) in that basis is an eigenvector of the associated matrix problem.

For example \(f_3=5x^3-3x\), computed above, is an eigenfunction of the Legendre operator with eigenvalue \(\lambda=-12\), while in the standard basis of \(\mathbb{R}[x]_3\) the vector \({\bf v}_3=(0,-3,0,5)\) represents the eigenfunction \(f_3\), and it is an eigenvector of the matrix representing the Legendre operator.

3.3 Complex version: hermitian operators

Let \(V\) be a complex vector space. A linear operator \({\cal L}:V\to V\) is said to be hermitian if \(\langle {\cal L}{\bf u},{\bf v}\rangle = \langle {\bf u},{\cal L}{\bf v}\rangle\). (This property is also called self-adjoint.) Similarly, we will say that an operator \({\cal L}_A:V\to V\) is anti-hermitian if \(\langle {\cal L}_A{\bf u},{\bf v}\rangle = -\langle {\bf u},{\cal L}_A{\bf v}\rangle\).

Note that if \({\cal L}\) is hermitian, then so is \({\cal L}^n\) for positive integer \(n\) (the proof is left as an exercise).

Note that, thanks to the properties of the complex inner product \(\langle\,\,,\,\rangle\), an anti-hermitian operator \({\cal L}_A\) can always be written as \({\cal L}_A = i\,{\cal L}\) with \({\cal L}\) hermitian: \[\langle {\cal L}_A{\bf u},{\bf v}\rangle = i \langle {\cal L}{\bf u},{\bf v}\rangle =i \langle {\bf u},{\cal L}{\bf v}\rangle =-\langle {\bf u},i{\cal L}{\bf v}\rangle=-\langle {\bf u},{\cal L}_A {\bf v}\rangle\,.\]

On \({\mathbb{C}}^n\) with the standard basis and the standard inner product, a linear operator represented by the matrix \(M\) is hermitian if and only if the matrix is hermitian (\(M^*=M\)).

If \({\cal L}\) is hermitian, then \(\langle {\cal L}{\bf v},{\bf v}\rangle\) is real for all \({\bf v}\in V\).

\[\langle {\cal L}{\bf v},{\bf v}\rangle=\langle{\bf v},{\cal L}{\bf v}\rangle=\overline{\langle {\cal L}{\bf v},{\bf v}\rangle},\] from the hermiticity property of the inner product.

In particular we have the following corollaries.

The eigenvalues of an hermitian operator are all real.

In fact if \({\bf v}\) is an eigenvector of an hermitian operator \({\cal L}\) with eigenvalue \(\lambda\), from the proposition above we know that \(\langle {\cal L}{\bf v},{\bf v}\rangle\in\mathbb{R}\), but since \({\cal L}{\bf v}= \lambda {\bf v}\) we deduce that \(\lambda \langle {\bf v},{\bf v}\rangle \in\mathbb{R}\), hence \(\lambda\in \mathbb{R}\).

If \({\cal L}_A\) is anti-hermitian, then \(\langle {\cal L}_A{\bf v},{\bf v}\rangle\) is purely imaginary for all \({\bf v}\in V\).

\[\langle {\cal L}_A{\bf v},{\bf v}\rangle=-\langle{\bf v},{\cal L}_A{\bf v}\rangle=-\overline{\langle {\cal L}_A{\bf v},{\bf v}\rangle},\] from the hermiticity property of the inner product. We could have also used the fact that \(i {\cal L}_A\) is an hermitian operator if \({\cal L}_A\) is anti-hermitian.

The eigenvalues of an anti-hermitian operator are all purely imaginary, i.e. if \({\cal L}_A{\bf v}= \lambda {\bf v}\) then \(\lambda = i \,c\) with \(c\in\mathbb{R}\).

Let \(\mathcal{L}\) and \(\mathcal{M}\) be hermitian linear operators on \(V\). Then \[4|\langle {\cal L}\mathcal{M}{\bf v},{\bf v}\rangle|^2 \geq |\langle({\cal L}\mathcal{M}-\mathcal{M}{\cal L}){\bf v},{\bf v}\rangle|^2.\]

Write \(c=\langle {\cal L}\mathcal{M}{\bf v},{\bf v}\rangle\). Then \[\begin{aligned} c &= \langle {\cal L}\mathcal{M}{\bf v},{\bf v}\rangle \\ &= \langle \mathcal{M}{\bf v},{\cal L}{\bf v}\rangle \quad\mbox{(since ${\cal L}$ is hermitian)}\\ &= \langle {\bf v},\mathcal{M}{\cal L}{\bf v}\rangle \quad\mbox{(since $\mathcal{M}$ is hermitian)}\\ &= \overline{\langle \mathcal{M}{\cal L}{\bf v},{\bf v}\rangle} \quad\mbox{(hermiticity property of inner product)}.\end{aligned}\] So \(\langle \mathcal{M}{\cal L}{\bf v},{\bf v}\rangle=\bar{c}\), and \(|\langle({\cal L} \mathcal{M}-\mathcal{M}{\cal L}){\bf v},{\bf v}\rangle|^2=|c-\bar{c}|^2=4[\mathop{\mathrm{Im}}(c)]^2\leq4|c|^2\).

Let \(V\) be the space of differentiable complex-valued functions \(f(x)\) on \({\mathbb{R}}\), such that \(x^nf(x)\to0\) as \(x\to\pm\infty\) for every fixed value of \(n\). Define an hermitian inner product on \(V\) by \[\langle f,g\rangle = \int_{-\infty}^{\infty} f(x)\, \overline{g(x)}\,dx.\] Some examples of hermitian linear operators on \(V\) are:

  • \(X:f(x)\mapsto xf(x)\), multiplication by \(x\);

  • \(X^2:f(x)\mapsto x^2 f(x)\), multiplication by \(x^2\);

  • \(P:f(x)\mapsto if'(x)\), differentiation plus multiplication by \(i\).

Show that \(P:f(x)\mapsto if'(x)\) is hermitian.
Solution. Integrate by parts and use \(f(x)\to0\) as \(x\to\pm\infty\): \[\begin{aligned} \langle Pf,g\rangle &= i\int_{-\infty}^\infty f'(x)\,\overline{g(x)}\,\,dx \\ &= -i\int_{-\infty}^\infty f(x)\, \overline{g'(x)}\,\,dx \\ &= \int_{-\infty}^\infty f(x)\,\overline{\left[i\frac{dg(x)}{dx}\right]}\,\,dx \\ &= \langle f,Pg\rangle. \end{aligned}\] Note that the \(i\) factor in the definition of \(P\) is essential to the hermiticity property of \(P\); without that factor the operator \(-i P = d/dx\) would be anti-hermitian.
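The integration by parts above can also be confirmed symbolically on concrete functions; here is a sympy sketch (not part of the notes; the Gaussians chosen are just convenient elements of \(V\)).

    import sympy as sp

    x = sp.symbols('x', real=True)
    f = sp.exp(-x**2)                  # sample elements of V: they decay rapidly at infinity
    g = x * sp.exp(-x**2)

    P = lambda h: sp.I * sp.diff(h, x)
    inner = lambda a, b: sp.integrate(a * sp.conjugate(b), (x, -sp.oo, sp.oo))

    lhs = inner(P(f), g)
    rhs = inner(f, P(g))
    print(sp.simplify(lhs - rhs))      # 0: <Pf, g> = <f, Pg> for these functions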

3.4 * An application: quantum mechanics

This section is not examinable.

Particle on a line. Suppose we have a particle on the \(x\)-axis. Its state at a particular time is described by a unit vector \(f\in V\), with \(V\) the function space in example [ex:self-adjoint]. The interpretation of \(f\) is that \(|f(x)|^2\) is a probability distribution: the probability of the particle being in the interval \([a,b]\) on the \(x\)-axis is \(\int_a^b |f(x)|^2\,dx\). So \(||f||=1\) just says that the particle has to be somewhere.

Now any observable quantity \(q\) corresponds to an hermitian linear operator \(Q\) on \(V\). The expected value of the quantity \(q\) in the state \(f\) is \(\langle Q\rangle\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\langle Qf,f\rangle\). The significance of \(Q\) being hermitian is that it guarantees that \(\langle Q\rangle\) is real: this follows from proposition [prop:hermitian-real-vev] in the previous section.

One example is the position operator \(X:\psi(x)\mapsto x\psi(x)\). If we prepare the same state many times, and measure the position of the particle each time, we get a distribution of answers, the mean or average of which is \(\langle X\rangle\).

What about \(X^2:\psi(x)\mapsto x^2\psi(x)\)? This is related to the variance \(\Delta^2_x\) of \(x\), which is defined to be the expected value of \((X-\langle X\rangle)^2\). The quantity \(\Delta_x\) (standard deviation) measures the uncertainty in the position, or the spread of position measurements in the state \(f\).

What about \(L:f(x)\mapsto i f'(x)\)? For historical reasons, one multiplies this by a certain real constant \(\beta\), and the observable quantity is then called momentum \(p\). So \(P:f(x)\mapsto i\beta f'(x)\). [Actually \(\beta=-h/(2\pi)\), where \(h\) is Planck’s constant.]

Variance and uncertainty. The variance in position is \(\langle A^2\rangle\) where \(A=X-\langle X\rangle\), and the variance in momentum is \(\langle B^2\rangle\) where \(B=P-\langle P\rangle\). Note that \(A\) and \(B\) are hermitian, since \(X\) and \(P\) are. Now \[\begin{aligned} \langle A^2\rangle \langle B^2\rangle &= \langle A^2f,f\rangle \langle B^2f,f\rangle \\ &= \langle Af,Af\rangle \langle Bf,Bf\rangle \quad\qquad\mbox{(since $A,B$ are hermitian)}\\ &\geq |\langle Af,Bf\rangle|^2\qquad\qquad\qquad\mbox{(Cauchy-Schwarz, theorem [thm:complex-Cauchy-Schwarz])}\\ &= |\langle BAf,f\rangle|^2\qquad\qquad\qquad\mbox{($B$ hermitian)}\\ &\geq |\langle(BA-AB)f,f\rangle|^2/4\,\,\quad\mbox{(lemma [lemma:commutator-inequality])}\end{aligned}\]

The next step is to compute \((BA-AB)f\). Almost all the terms of this expression cancel, and the only one that remains comes from the fact that \((xf)'-xf'=f\); in fact (check as an exercise), we get \((BA-AB)f=i\beta f\). Using this, together with the fact that \(\langle f,f\rangle=1\), gives \[\langle A^2\rangle \langle B^2\rangle\geq\left(\frac{\beta}{2}\right)^2,\] and hence we get the Heisenberg Uncertainty Rule \[\Delta_x \Delta_p \geq \frac{|\beta|}{2}=\frac{h}{4\pi}\,.\] So the accuracy with which both position and momentum can be determined is limited. In standard kg-m-s units, the value of Planck’s constant is very small (\(h\approx6.6\times10^{-34}\)), so this is not a limitation that we are familiar with from everyday life — but it is there.
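The commutator computation \((BA-AB)f=i\beta f\) can be checked symbolically. The sketch below (not part of the notes) uses sympy, with \(\beta\), \(x_0=\langle X\rangle\) and \(p_0=\langle P\rangle\) as free symbols; the constant shifts drop out of the commutator, as claimed.

```python
import sympy as sp

x, beta, x0, p0 = sp.symbols('x beta x0 p0', real=True)
f = sp.Function('f')

A = lambda g: (x - x0) * g                            # A = X - <X>
B = lambda g: sp.I * beta * sp.diff(g, x) - p0 * g    # B = P - <P>

# (BA - AB) applied to f(x); everything cancels except i*beta*f(x).
print(sp.expand(B(A(f(x))) - A(B(f(x)))))             # I*beta*f(x)
```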

The harmonic oscillator. Another hermitian operator on \(V\) is \(H:f\mapsto-f''(x)+U(x)f(x)\), where \(U(x)\) is some fixed real function. The corresponding observable (with suitable constants omitted here) is energy. The function \(U(x)\) is the potential energy. The eigenvalues of \(H\) are the allowable energy levels of the system.

The simplest non-trivial case is \(U(x)=x^2\), which is the harmonic oscillator. To find the eigenfunctions in this case, substitute \(f(x)=g(x)\exp(-x^2/2)\) into \(Hf=\lambda f\). This gives \[g''-2xg'+(\lambda-1)g=0.\] Now \({\cal L}:g\mapsto g''-2xg'\) is the Hermite operator, and its eigenvalues are \(0,-2,-4,-6,\ldots\). So \(\lambda=2n+1\) for \(n=0,1,2,3,\ldots\), and these are the discrete energy levels of the one-dimensional harmonic oscillator. With constants added, the energy eigenvalues are \[E_n = (2n+1)\frac{h}{4\pi}\sqrt{\frac{k}{m}},\] where \(m\) is the mass of the particle and \(k\) is the “spring constant”, ie. \(U(x)=\frac{1}{2}kx^2\).
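The energy levels \(2n+1\) can also be recovered numerically. The following sketch (not part of the notes; it assumes a finite-difference discretization of \(-f''+x^2f\) on a truncated interval, with arbitrary grid parameters) diagonalizes the resulting symmetric matrix and finds eigenvalues close to \(1,3,5,\ldots\)

```python
import numpy as np

L, n = 10.0, 1000                      # truncation length and grid size (arbitrary)
x = np.linspace(-L, L, n)
h = x[1] - x[0]
# central-difference second derivative, Dirichlet boundary conditions
D2 = (np.diag(np.ones(n - 1), 1) - 2 * np.eye(n) + np.diag(np.ones(n - 1), -1)) / h**2
H = -D2 + np.diag(x**2)                # discretized H: f -> -f'' + x^2 f
print(np.round(np.linalg.eigvalsh(H)[:5], 3))   # approximately [1. 3. 5. 7. 9.]
```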

4 Groups

During our journey we have encountered sets of objects behaving “nicely” under certain operations. For example, the set \(\mathrm{GL}(n,\mathbb{R})\) of invertible matrices behaves “nicely” under matrix multiplication, in the sense that if \(A\) and \(B\) are invertible, i.e. \(A,B\in\mathrm{GL}(n,{\mathbb R})\), then so is their product \(A\cdot B \in \mathrm{GL}(n,{\mathbb R})\). This “closure” of the set under a certain operation is one of the key defining properties of groups.

4.1 Axioms and examples

A group is a set \(G\) equipped with an operation \(\bullet:G\times G \to G\) such that

  1. if \(x,y\in G\) then \(x\bullet y\in G\) (closure);

  2. \((x\bullet y)\bullet z = x\bullet(y\bullet z)\) for all \(x,y,z\in G\) (associativity);

  3. \(G\) contains an identity element \(e\) such that \(x\bullet e=e\bullet x=x\) for all \(x\in G\);

  4. each \(x\in G\) has an inverse \(x^{-1}\in G\) such that \(x^{-1}\bullet x=x\bullet x^{-1}=e\).

A group is abelian (or commutative) if \(x\bullet y=y\bullet x\) for all \(x,y\in G\).

If \((G,\bullet)\) is a group, the identity element \(e\) is unique.

Suppose that \(e_1,e_2\in G\) are both identity elements. This means that \[\begin{aligned} e_1 \bullet x = x\bullet e_1 = x\,,\\ e_2 \bullet x = x\bullet e_2 =x\,, \end{aligned}\] for all elements \(x\in G\). In particular if we put \(x = e_2\) in the first equation, we get \[e_1 \bullet e_2 = e_2 \bullet e_1 = e_2 \,,\] while if we substitute \(x=e_1\) in the second equation we get \[e_2 \bullet e_1 = e_1 \bullet e_2 = e_1 \,,\] so \(e_1=e_2\), and the identity element is unique.

We say that \((H,\bullet)\) is a subgroup of the group \((G,\bullet)\) if \(H\) is a subset of \(G\) and \((H,\bullet)\), with the group operation inherited from \((G,\bullet)\), is a group in its own right.

Any vector space \(V\) with the operation \(+\). In this case, the zero vector \({\bf0}\) is the identity, and the inverse of \({\bf v}\) is \(-{\bf v}\). This group structure is abelian. If \(V\) is a real 1-dimensional vector space, then this is the group \({\mathbb{R}}\) of real numbers under addition (ie. the operation is \(+\)).

The group \(\mathrm{GL}(n,{\mathbb{R}})\) of \(n\times n\) real invertible matrices, with \(P\bullet Q=PQ\) (matrix multiplication). Here the identity element is the identity \(n\times n\) matrix \({\bf I}\), and the inverse of a matrix in the group sense is just its inverse in the usual matrix sense. The fact that matrix multiplication is associative was shown in the Michaelmas term (proposition 2.3.1 of the Michaelmas term lecture notes). This group \(\mathrm{GL}(n,{\mathbb{R}})\) is not abelian, except for the case \(n=1\) when it is just the group of non-zero real numbers with multiplication as the operation.

The group \(\mathrm{GL}(n,{\mathbb{C}})\) of \(n\times n\) complex invertible matrices. The details are the same as for the previous example. Note that \(\mathrm{GL}(n,{\mathbb{R}})\) is a subgroup of \(\mathrm{GL}(n,{\mathbb{C}})\).

The group \(\mathrm{SL}(n,{\mathbb{R}})\) of \(n\times n\) real matrices \(M\) having \(\det(M)=1\), is a subgroup of \(\mathrm{GL}(n,{\mathbb{R}})\):

  1. closure follows from \(\det(PQ)=\det(P)\,\det(Q)=1\);

  2. associativity is inherited from \(\mathrm{GL}(n,{\mathbb{R}})\);

  3. \({\bf I}\) is in the set, since \(\det({\bf I})=1\);

  4. if \(\det(M)=1\), then \(\det(M^{-1})=1/\det(M)=1\).

The group \(O(n)\) of \(n\times n\) real orthogonal matrices, is a subgroup of \(\mathrm{GL}(n,{\mathbb{R}})\):

  1. if \(A\) and \(B\) are orthogonal, then \((AB)(AB)^t=ABB^tA^t=AA^t={\bf I}\), so \(AB\) is orthogonal;

  2. associativity is inherited from \(\mathrm{GL}(n,{\mathbb{R}})\);

  3. \({\bf I}\) is orthogonal;

  4. if \(A\) is orthogonal, then \(A^{-1}=A^t\) by definition, so \((A^{-1})(A^{-1})^t=A^tA={\bf I}\), so \(A^{-1}\) is orthogonal.

The group \(SO(n)\) of \(n\times n\) real orthogonal matrices \(M\) having \(\det(M)=1\). This is a subgroup both of \(O(n)\) and of \(\mathrm{SL}(n,{\mathbb{R}})\). The group \(SO(2)\) is particularly simple: if \(P\in SO(2)\) we can write \[P=\left(\begin{matrix} a & b \\ c& d\end{matrix}\right)\,,\] and \[P^t=\left(\begin{matrix} a & c \\ b& d\end{matrix}\right)\,,\] with \(a,b,c,d\in {\mathbb R}\). The orthogonality condition \(P^tP=P\,P^t=I\) plus the extra condition that \(\det(P) = 1\) is equivalent to the system of equations \[\left\lbrace \begin{matrix} a^2 + c^2=1 \,,\phantom{\qquad(\mbox{det}P=+1)}\\ a b + c d=0 \,,\phantom{\qquad(\mbox{det}P=+1)}\\ b^2 + d^2=1\,,\phantom{\qquad(\mbox{det}P=+1)}\\ ad-bc =1\,. \,\,\quad(\det(P)=+1) \end{matrix}\right.\] We can easily solve this system, and the most general \(P\in SO(2)\) takes the form \[P=\left(\begin{matrix} \cos\,\theta & -\sin\,\theta \\ \sin\,\theta& \cos\,\theta\end{matrix}\right)\,,\] where the rotation angle \(\theta \in [0,2\pi)\). We say that \(SO(2)\) is isomorphic to \(S^1 = \{(a,c)\in\mathbb{R}^2 \,\mbox{such that} \,a^2+c^2=1\}\).

Try to visualize how an \(SO(2)\) matrix \(P\) transforms a generic vector \({\bf v}\in {\mathbb R}^2\) if we consider it as a linear map \(P:{\bf v}\mapsto P\,{\bf v}\). [Hint: Use the form for \(P\in SO(2)\) written above]
Solution. Consider first the vector \({\bf v}= (1,0)\): then \(P\,{\bf v}=(\cos(\theta), \sin(\theta))\), which corresponds to an anticlockwise rotation of the initial vector \({\bf v}= (1,0)\) by an angle \(\theta\). For a generic vector \({\bf v}= (x,y)\) we get \(P\,{\bf v}=(\cos(\theta)\, x-\sin(\theta)\, y , \sin(\theta)\, x+\cos(\theta)\, y)\), which still corresponds to an anticlockwise rotation of the vector \({\bf v}= (x,y)\) by an angle \(\theta\): we just have to write the vector in polar coordinates, i.e. \({\bf v}= \vert {\bf v}\vert(\cos(\alpha),\sin(\alpha))\). Compute now (exercise for you) \(P\,{\bf v}\) and you will see that \(P\,{\bf v}= \vert {\bf v}\vert (\cos(\alpha+\theta),\sin(\alpha+\theta))\), which corresponds precisely to a rotation of the vector \({\bf v}\) by an angle \(\theta\).
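A short numerical illustration of this (a sketch, not in the notes):

```python
import numpy as np

def P(theta):
    """Rotation matrix in SO(2), as derived above."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha, beta = 0.7, 1.9
print(np.allclose(P(alpha) @ P(beta), P(alpha + beta)))   # True: angles add
print(P(np.pi / 2) @ np.array([1.0, 0.0]))                # approximately [0, 1]: (1,0) rotated by 90 degrees
```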

The group \(\mathrm{SL}(n,{\mathbb{C}})\) of \(n\times n\) complex matrices \(M\) having \(\det(M)=1\); the group \(U(n)\) of \(n\times n\) complex unitary matrices; and the group \(SU(n)\) of \(n\times n\) complex unitary matrices \(M\) having \(\det(M)=1\).

Try to parametrize \(SU(2)\) in a way similar to the one obtained for \(SO(2)\).
Solution. Start with writing a generic \(2\times 2\) matrix as \[M = \left(\begin{matrix} a & b \\ c & d\end{matrix}\right)\,,\] with \(a,b,c,d\in \mathbb{C}\). For \(M\) to be in \(SU(2)\) we must have \(M^\ast M = M\, M^\ast = I\) and \(\det M=1\), with \[M^\ast = \left(\begin{matrix} \bar{a} & \bar{c} \\ \bar{b} & \bar{d}\end{matrix}\right)\,.\] The unitarity condition \(M^\ast M=M\,M^\ast=I\) plus the extra condition that \(\det(M) = 1\) is equivalent to the system of equations \[\left\lbrace \begin{matrix} \vert a\vert^2 + \vert c\vert ^2=1 \,,\,\phantom{\qquad(\mbox{det}P=+1)}\\ c\,\bar{a}+ d\,\bar{b}=0\,, \,\phantom{\qquad(\mbox{det}P=+1)}\\ \vert b\vert^2 + \vert d\vert ^2=1\,,\,\phantom{\qquad(\mbox{det}P=+1)}\\ a d-bc=1\,,\, \qquad(\det(M)=+1)\\ \end{matrix}\right.\] We can solve for \(b=-\bar{c}\) and \(d=\bar{a}\), so that the most general \(M\in SU(2)\) takes the form \[M = \left(\begin{matrix} a & -\bar{c} \\ c & \bar{a}\end{matrix}\right)\,,\] with \(\vert a\vert^2 + \vert c\vert ^2=1\).

Writing \(a=x_1+i x_2\) and \(c = x_3+i x_4\) with \(x_1,x_2,x_3,x_4\in\mathbb{R}\), the equation \(\vert a\vert^2 + \vert c\vert ^2=1\) becomes \(x_1^2 + x_2^2+x_3^2+x_4^2=1\). We say that \(SU(2)\) is isomorphic to \(S^3 =\{{\bf{x}}\in \mathbb{R}^4 \,\mbox{s.t.}\,x_1^2 + x_2^2+x_3^2+x_4^2=1\}\).
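As an illustration (a sketch, not in the notes), any point of \(S^3\) produces an \(SU(2)\) matrix via the parametrization above, as a quick numerical check confirms:

```python
import numpy as np

x = np.random.randn(4)
x /= np.linalg.norm(x)                     # a random point on S^3
a, c = x[0] + 1j * x[1], x[2] + 1j * x[3]
M = np.array([[a, -np.conj(c)],
              [c,  np.conj(a)]])
print(np.allclose(M.conj().T @ M, np.eye(2)))   # True: M is unitary
print(np.isclose(np.linalg.det(M), 1))          # True: det(M) = 1
```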

Two groups \(G\) and \(H\) are isomorphic if there is a bijection \(f\colon G\to H\) preserving the group operation, ie \(f(g_1\bullet_G g_2)=f(g_1)\bullet_H f(g_2)\) for all \(g_1,g_2\in G\). (Here “\(\bullet_G\)” and “\(\bullet_H\)” denote the group operations in \(G\) and \(H\) respectively.) In particular \(f(e_G)=e_H\) where \(e_G\) is the identity in \(G\) and \(e_H\) is the identity in \(H\).

Show that the set \(G\) of matrices of the form \[M_x=\left[\begin{array}{cc}1 & x \\ 0 & 1\end{array}\right],\] with \(x\in{\mathbb{R}}\), forms a group under matrix multiplication isomorphic to \(({\mathbb R},+)\).
Solution. \(M_x M_y=M_{x+y}\), so the map \(x\mapsto M_x\) is a bijection from \(\mathbb{R}\) to \(G\) that sends addition to matrix multiplication; in particular \(M_0=I\) is the identity and \(M_x^{-1}=M_{-x}\), so \(G\) is a group isomorphic to \(({\mathbb R},+)\). Note that if we change the form of the matrix \(M_x\) a bit, for example by putting a 2 in the upper-left-hand corner, then it would no longer work (the closure property would fail).

4.2 Finite groups

A finite group \(G\) is a group having a finite number of elements.

The order \(|G|\) of a finite group \(G\) is the number of elements in \(G\).

(Lagrange). Let \(G\) be a finite group and \(H\) a subgroup of \(G\); then the order of \(H\) divides the order of \(G\).

The integers \({\mathbb{Z}}\) with operation \(+\) forms an abelian group. If \(n\) is an integer with \(n\geq2\), then we may define an equivalence relation on \({\mathbb{Z}}\) by \(p\sim q\) if and only if \(n\) divides \(p-q\); this is also written \(p\equiv q\) (mod \(n\)). The set \({\mathbb{Z}}_n\) of equivalence classes forms a finite abelian group of order \(n\), with addition as the group operation (note that \(+\) is well-defined on equivalence classes). This group \({\mathbb{Z}}_n\) is also called the cyclic group of order \(n\). (Alternative notations for this group are \({\mathbb{Z}}/n\) and \({\mathbb{Z}}/n{\mathbb{Z}}\).) For example, the group \({\mathbb{Z}}_4\) has elements \(\{\overline{0}, \overline{1},\overline{2},\overline{3}\}\), and the group operation of addition modulo 4 is expressed in the group table

+ \(\overline{0}\) \(\overline{1}\) \(\overline{2}\) \(\overline{3}\)
\(\overline{0}\) \(\overline{0}\) \(\overline{1}\) \(\overline{2}\) \(\overline{3}\)
\(\overline{1}\) \(\overline{1}\) \(\overline{2}\) \(\overline{3}\) \(\overline{0}\)
\(\overline{2}\) \(\overline{2}\) \(\overline{3}\) \(\overline{0}\) \(\overline{1}\)
\(\overline{3}\) \(\overline{3}\) \(\overline{0}\) \(\overline{1}\) \(\overline{2}\)

There are exactly two groups of order 4 up to isomorphism, namely \({\mathbb{Z}}_4\) as above, and the Klein group \(V\) which has the table

\(\bullet\) \(e\) \(a\) \(b\) \(c\)
\(e\) \(e\) \(a\) \(b\) \(c\)
\(a\) \(a\) \(e\) \(c\) \(b\)
\(b\) \(b\) \(c\) \(e\) \(a\)
\(c\) \(c\) \(b\) \(a\) \(e\)

The fact that this group is abelian is reflected in the symmetry of its group table.

Note that the group table of \({\mathbb{Z}}_4\) has a quite different structure from that of \(V\). One visualisation (or representation) of \(V\) is as the group of rotations by \(180^\circ\) about each of the \(x\)-, \(y\)- and \(z\)-axes in \({\mathbb{R}}^3\) (ie. the element \(a\) corresponds to rotation by \(\pi\) about the \(x\)-axis, etc). As an exercise: by using a marked cube such as a die, convince yourself that \(a\bullet b=c\).

Another representation of \(V\) is as a group of reflections in \(\mathbb{R}^2\): the reflection across the \(x\)-axis, the reflection across the \(y\)-axis, and their composition (reflection across the \(x\)-axis followed by reflection across the \(y\)-axis, which is a rotation by \(180^\circ\)), together with the identity. Take \[e=\left[\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right],\quad a=\left[\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right],\quad b=\left[\begin{array}{cc}-1 & 0 \\ 0 & 1\end{array}\right],\quad c=\left[\begin{array}{cc}- 1 & 0 \\ 0 & -1\end{array}\right],\] and see that these matrices satisfy the group table of the Klein group, with matrix multiplication as the group operation.
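These products can be checked by hand, or with a few lines of code (a sketch, not part of the notes):

```python
import numpy as np

e = np.eye(2)
a = np.diag([1.0, -1.0])    # reflection across the x-axis
b = np.diag([-1.0, 1.0])    # reflection across the y-axis
c = np.diag([-1.0, -1.0])   # rotation by 180 degrees

print(np.allclose(a @ b, c), np.allclose(b @ a, c))    # True True: a*b = b*a = c
print(all(np.allclose(g @ g, e) for g in (a, b, c)))   # True: every non-identity element has order 2
```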

Let \(g\in G\). The smallest positive integer \(n\) such that \(g^n=e\), if such an \(n\) exists, is called the order of \(g\), denoted by \(\vert g \vert\). The identity element has order \(1\).

For every element \(g\) of a finite group \(G\), the order of \(g\) divides the order of the group.

Consider \(H=\{g,g^2,...,g^n\}\) where \(n=\vert g\vert\), so that \(g^n=e\); this is a subgroup of \(G\), called the subgroup generated by \(g\). The order of \(H\) is precisely equal to the order of \(g\). From the Lagrange theorem the order of \(H\) divides the order of the group, hence the order of \(g\) divides \(\vert G \vert\).

In \({\mathbb{Z}}\), no element other than the identity has finite order. But in \({\mathbb{Z}}_n\), every element has finite order. Observe that the elements \(a\), \(b\), \(c\) of the Klein group \(V\) all have order \(2\); whereas in \({\mathbb{Z}}_4\), \(\overline{2}\) has order \(2\) and \(\overline{1},\overline{3}\) have order \(4\). In a matrix group like \(\mathrm{GL}(2,{\mathbb{C}})\), most elements have infinite order, but some have finite order: eg \(A={\rm{diag}}(1,-1)\) has order \(2\).

This remark shows that the Klein group \(V\) and \({\mathbb{Z}}_4\) are not isomorphic, in fact we have the following proposition.

If the groups \(G\) and \(H\) are isomorphic, via an isomorphism \(f\colon G\to H\), then the order of every element \(g\in G\) is equal to the order of the corresponding element \(f(g) \in H\).

If the order of \(g\in G\) is \(n\) this means that \(g^n = e_G\) where \(e_G\) is the identity in \(G\). Since \(G\) and \(H\) are isomorphic we have that \(f(g)^n= f(g^n) =f(e_G)=e_H\), so \(n\geq m\mathrel{\mathop:}\mathrel{\mkern-1.2mu}= |f(g)|\). Similarly, \(g^m = (f^{-1}(f(g)))^m=f^{-1}(f(g)^m) = f^{-1}(e_H)=e_G\), so \(n\leq m\), and we conclude \(m=n\).

4.3 Direct products

At this point we can start “combining” groups together to obtain new groups.

Given two groups \(G\) and \(H\), the direct product \(G\times H\) is defined as follows:

  1. The elements of \(G\times H\) are the ordered pairs \((g,h)\) where \(g\in G\) and \(h\in H\). We can also say that the set of elements of \(G\times H\) is the Cartesian product of the sets \(G\) and \(H\);

  2. The operation \(\bullet\) is defined component by component, i.e. \[(g_1,h_1) \bullet(g_2,h_2) = (g_1 \cdot g_2, h_1 \cdot h_2)\,,\] where on the first component we use the operation \(\cdot\) defined on \(G\) and on the second component the one on \(H\).

The direct product \(G\times H\) of two groups \(G\) and \(H\) is a group. Furthermore, if \(G\) has order \(n\) and \(H\) has order \(m\), then the direct product \(G\times H\) has order \(n\cdot m\).

Associativity descends from the associativity of the operations on \(G\) and \(H\). The identity element on \(G\times H\) is simply \(e = (e_G,e_H)\), where \(e_G\) is the identity on \(G\) and \(e_H\) the identity on \(H\). The inverse of an element \((g,h) \in G\times H\) is simply \((g^{-1},h^{-1})\) where \(g^{-1}\) is the inverse of \(g\) in \(G\), and \(h^{-1}\) is the inverse of \(h\) in \(H\).

That the order of \(G\times H\) is the product of the order of \(G\) times the order of \(H\) follows from counting all the distinct pairs of elements \((g,h)\) with \(g\in G\) and \(h\in H\).

The direct product of two cyclic groups of order 2 is \(\mathbb{Z}_2\times \mathbb{Z}_2\). This group has \(4\) elements, i.e. \(\{(\overline{0},\overline{0}),(\overline{1},\overline{0}),(\overline{0},\overline{1}),(\overline{1},\overline{1})\}\). The element \((\overline{0},\overline{0})\) is the identity element in \(\mathbb{Z}_2\times \mathbb{Z}_2\) and the group table is given by

\(+\) \((\overline{0},\overline{0})\) \((\overline{1},\overline{0})\) \((\overline{0},\overline{1})\) \((\overline{1},\overline{1})\)
\((\overline{0},\overline{0})\) \((\overline{0},\overline{0})\) \((\overline{1},\overline{0})\) \((\overline{0},\overline{1})\) \((\overline{1},\overline{1})\)
\((\overline{1},\overline{0})\) \((\overline{1},\overline{0})\) \((\overline{0},\overline{0})\) \((\overline{1},\overline{1})\) \((\overline{0},\overline{1})\)
\((\overline{0},\overline{1})\) \((\overline{0},\overline{1})\) \((\overline{1},\overline{1})\) \((\overline{0},\overline{0})\) \((\overline{1},\overline{0})\)
\((\overline{1},\overline{1})\) \((\overline{1},\overline{1})\) \((\overline{0},\overline{1})\) \((\overline{1},\overline{0})\) \((\overline{0},\overline{0})\)

Comparing with example [ex:klein-group], we see that the map \((\overline{0},\overline{0})\mapsto e\), \((\overline{1},\overline{0})\mapsto a\), \((\overline{0},\overline{1})\mapsto b\), \((\overline{1},\overline{1})\mapsto c\) is an isomorphism, so \(\mathbb{Z}_2\times \mathbb{Z}_2\) is isomorphic to the Klein group.

The order of each element \((g,h)\in G\times H\) is the least common multiple of the orders of \(g\) and \(h\).

Take the direct product of two cyclic groups \(\mathbb{Z}_4\times \mathbb{Z}_3\), and consider the element \((\overline 2, \overline 2)\). The order of \(\overline 2\) in \(\mathbb{Z}_4\) is 2 while the order of \(\overline 2\) in \(\mathbb{Z}_3\) is 3 so the order of \((\overline 2, \overline 2)\) in \(\mathbb{Z}_4\times \mathbb{Z}_3\) is \(6\). And indeed we have \[\begin{aligned} &(\overline 2, \overline 2)^1 =(\overline 2, \overline 2)\qquad (\overline 2, \overline 2)^2= (\overline 0, \overline 1)\qquad (\overline 2, \overline 2)^3 = (\overline 2, \overline 0)\\ & (\overline 2, \overline 2)^4 = (\overline 0, \overline 2)\qquad (\overline 2, \overline 2)^5 = (\overline 2, \overline 1)\qquad (\overline 2, \overline 2)^6 = (\overline 0, \overline 0). \end{aligned}\]
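This computation is easy to automate; the sketch below (not in the notes, with ad hoc helper functions) checks the proposition for every element of \(\mathbb{Z}_4\times \mathbb{Z}_3\):

```python
from math import gcd

def order_mod(g, n):
    """Order of g in the additive group Z_n."""
    k = 1
    while (k * g) % n != 0:
        k += 1
    return k

def lcm(a, b):
    return a * b // gcd(a, b)

for g in range(4):
    for h in range(3):
        # order of (g, h) computed directly: the smallest k killing both components
        direct = next(k for k in range(1, 13) if (k * g) % 4 == 0 and (k * h) % 3 == 0)
        assert direct == lcm(order_mod(g, 4), order_mod(h, 3))
print(lcm(order_mod(2, 4), order_mod(2, 3)))   # 6: the order of (2, 2) in Z_4 x Z_3
```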

4.4 Orthogonal transformations, rotations and their subgroups

We work in the vector space \({\mathbb{R}}^n\) with the standard Euclidean inner product. Recall that if vectors \({\bf v}\) and \({\bf w}\) are thought of as column vectors, then the inner product is \(({\bf v},{\bf w})={\bf w}^t{\bf v}\).

Orthogonal transformations are linear transformations \({\bf v}\mapsto M{\bf v}\) which preserve lengths and angles.

The group of orthogonal transformations in \({\mathbb R}^n\) is precisely the orthogonal group \(O(n)\).

In order for \(M\) to be an orthogonal transformation for all \({\bf v},{\bf w}\) we need \[{\bf w}^t{\bf v}=(M{\bf w})^t (M{\bf v})={\bf w}^t M^t M {\bf v}.\] This requires \(M^t M={\bf I}\), ie. \(M\) has to be an orthogonal matrix. So the \(M\) form a group, which is exactly the orthogonal group \(O(n)\).

For \(n=1\) we have \(O(1)=\{1,-1\}\cong{\mathbb{Z}}_2\). The group operation is multiplication, the element 1 is the identity, and the element \(-1\) is the reflection \(x\mapsto-x\).

This group \(O(1)\) is finite and abelian, but \(O(n)\) for \(n\geq2\) is neither finite nor abelian. For instance take the following two elements of \(O(2)\): \[A=\begin{pmatrix}1&0\\0& -1\end{pmatrix} \qquad ;\qquad B = \begin{pmatrix}0&-1\\1& 0\end{pmatrix}\, .\] A short computation shows that \(AB\neq BA\), so \(O(2)\) is not abelian. It is also not finite, since \(SO(2)\) is a subgroup, and \(SO(2)\) is not finite: recall from example [eq:SO(2)] that we have an element of \(SO(2)\) for each angle \(\theta\in[0,2\pi)\). Finally, note that we can build non-commuting elements in \(O(n)\) for any \(n>2\) by taking the block diagonal matrices \(C_n={\rm{diag}}(A,I_{n-2})\), \(D_n={\rm{diag}}(B,I_{n-2})\), where \(I_{n-2}\) is the \((n-2)\times (n-2)\) identity matrix.

Recall from remark [remark:orthogonal-determinant] that any orthogonal matrix \(M\) has \(\det(M)=\pm1\). Any element \(M\) of \(O(n)\) with \(\det(M)=-1\) gives an element \(P=MN\) with \(\det(P)=+1\) when composed with another element \(N\) of \(O(n)\) with \(\det(N)=-1\). An example of such an element is reflection along the first coordinate: \[R_1 = \begin{pmatrix} -1 \\ & 1 \\ && 1 \\ &&& \ddots \\ &&&& 1 \end{pmatrix}\] which clearly has \(\det(R_1)=-1\). So we can view elements of \(O(n)\) as elements with \(\det(M)=+1\) composed with reflections. This motivates the following definition, encoding the geometric intuition that the transformations preserving inner products are rotations and reflections.

The group of rotations in \({\mathbb R}^n\) is the subgroup of \(O(n)\) with \(\det(M)=1\); or in other words \(SO(n)\).

\(SO(1)\) is trivial, \(SO(2)\) is abelian, and \(SO(n)\) for \(n\geq3\) is not abelian. That \(SO(2)\) is abelian follows from the explicit form we found in example [eq:SO(2)] (you can easily show that two matrices of that form always commute; intuitively it does not matter if you first rotate by an angle \(\alpha\) and then by \(\beta\), or in the opposite order). To show that \(SO(3)\) is not abelian we can use the block diagonal matrices \(X={\rm{diag}}(A,-1)\), \(Y={\rm{diag}}(B,1)\), where \(A\) and \(B\) are as in remark [remark:O(n)]. You can easily check that \(X,Y\in SO(3)\) and \(XY\neq YX\). You can now extend \(X\), \(Y\) to non-commuting matrices in \(SO(n)\) for any \(n>3\) as we did in remark [remark:O(n)], by constructing block diagonal matrices \({\rm{diag}}(X,I_{n-3})\), \({\rm{diag}}(Y,I_{n-3})\).
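A numerical version of this construction (a sketch, not in the notes):

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, -1.0]])
B = np.array([[0.0, -1.0], [1.0, 0.0]])
X = np.block([[A, np.zeros((2, 1))], [np.zeros((1, 2)), -np.ones((1, 1))]])   # diag(A, -1)
Y = np.block([[B, np.zeros((2, 1))], [np.zeros((1, 2)),  np.ones((1, 1))]])   # diag(B, 1)

for M in (X, Y):   # both matrices are orthogonal with determinant 1, so they lie in SO(3)
    assert np.allclose(M @ M.T, np.eye(3)) and np.isclose(np.linalg.det(M), 1)
print(np.allclose(X @ Y, Y @ X))   # False: SO(3) is not abelian
```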

Plane polygons. Consider a regular plane polygon with \(n\) sides. The rotations leaving it unchanged make up a finite abelian group of order \(n\): isomorphic to \({\mathbb{Z}}_n\), but with the matrix representation \(M_j\in{SO(2)}\) for \(j=1,2,\ldots,n\), where \[M_j=\left[\begin{array}{cc} \cos(2\pi j/n) & -\sin(2\pi j/n) \\ \sin(2\pi j/n) & \cos(2\pi j/n)\end{array}\right].\] For example, the pure rotations which preserve a square (\(n=4\)) are represented by the four special orthogonal matrices \[M_1=\left[\begin{array}{cc}0 & -1 \\ 1 & 0\end{array}\right],\quad M_2=\left[\begin{array}{cc}-1 & 0 \\ 0 & -1\end{array}\right],\quad M_3=\left[\begin{array}{cc}0 & 1 \\ -1 & 0\end{array}\right],\quad M_4={\bf I}.\] These form a finite subgroup of \(SO(2)\).

The group of all symmetries of the regular \(n\)-polygon is the dihedral group \(D_n\), which is a finite group of order \(2n\). (Due to its order, some people call it \(D_{2n}\) instead, so beware of confusion.) This includes reflections; so \(D_n\) is a finite subgroup of \(O(2)\). The name comes from “dihedron”, namely a two-sided polyhedron.

For example, the group of symmetries of a square is \(D_4\), the elements of which are represented by \(\{M_1,M_2,M_3,M_4\}\) plus the four reflections \[R_1=\left[\begin{array}{cc}0 & -1 \\ -1 & 0\end{array}\right],\quad R_2=\left[\begin{array}{cc}-1 & 0 \\ 0 & 1\end{array}\right],\quad R_3=\left[\begin{array}{cc}0 & 1 \\ 1 & 0\end{array}\right],\quad R_4=\left[\begin{array}{cc}1 & 0 \\ 0 & -1\end{array}\right].\]

You can visualize \(R_1\) as a reflection across the line \(y=-x\), \(R_2\) as a reflection across the \(y\)-axis, \(R_3\) as a reflection across the line \(y=x\) and finally \(R_4\) as a reflection across the \(x\)-axis.

Using the representation written above of the Klein group as a group of reflections across the \(x\)- and \(y\)-axes in \(\mathbb{R}^2\), you can prove that the Klein group is a subgroup of \(D_4\).

4.5 More on finite order groups

4.5.1 The symmetric group and the alternating group

A permutation \(\sigma\) of a set \(A_n=\{a_1,\ldots,a_n\}\) of \(n\) elements is a bijection from \(A_n\) to itself.

A transposition \(\tau_{ij}\colon A_n\to A_n\) is a permutation that exchanges two elements \(a_i\) and \(a_j\) of \(A_n\), and leaves every other element invariant. That is, \(\tau_{ij}(a_i)=a_j\), \(\tau_{ij}(a_j)=a_i\), and \(\tau_{ij}(a_k)=a_k\) for \(i\neq k\neq j\).

The symmetric group of degree \(n\) (we take \(n>1\)), denoted by \(S_n\), is the group whose elements are all the permutations that can be performed on \(n\) distinct symbols, and whose group operation is the composition of permutations.

The order of \(S_n\) is \(n!\): there are \(n\) choices for the image of the first symbol, \(n-1\) for the second, and so on.

Any permutation \(\sigma \in S_n\) can be written as a product of transpositions: \(\sigma=\tau_{i_1 j_1}\tau_{i_2j_2}\cdots \tau_{i_k j_k}\).

The sign \(\mathop{\mathrm{sgn}}(\sigma)\) of the permutation is defined to be \(\mathop{\mathrm{sgn}}(\sigma) = +1\), if \(\sigma\) is written as the product of an even number of transpositions, or \(\mathop{\mathrm{sgn}}(\sigma) = -1\) if \(\sigma\) is written as the product of an odd number of transpositions.

If \(\sigma=\tau_{i_1 j_1}\tau_{i_2j_2}\cdots \tau_{i_k j_k}\) and \(\sigma=\tau_{l_1 m_1}\tau_{l_2 m_2}\cdots \tau_{l_p m_p}\) are two decompositions of a given permutation \(\sigma\) into transpositions, then \(k\equiv p\) (mod 2). In particular \(\mathop{\mathrm{sgn}}(\sigma)\) is well defined.

We will say that a permutation \(\sigma \in S_n\) is even if \(\mathop{\mathrm{sgn}}(\sigma) = +1\), i.e. if it can be decomposed as the product of an even number of transpositions, or odd otherwise.

Let \(A_n=\{\sigma\in S_n \,\vert\,\mathop{\mathrm{sgn}}(\sigma)=+1\}\) be the subset of \(S_n\) containing all the even permutations. \(A_n\) is a subgroup of \(S_n\) called the alternating group of degree \(n\).

The order of \(A_n\) is \(n!/2\).

There is a bijection between the even and the odd permutations, obtained by composing with the transposition of the first two elements. So there are as many odd as even permutations, and every permutation is either odd or even, so the result follows.

Note that the subset of \(S_n\) given by the odd permutations does not define a subgroup: composing two odd permutations gives an even permutation, so closure does not hold.
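Even permutations are also easy to count by computer. The sketch below (not in the notes) computes the sign by counting inversions, which has the same parity as any decomposition into transpositions, and recovers \(|S_4|=4!\) and \(|A_4|=4!/2\):

```python
from itertools import permutations

def sign(p):
    """Sign of a permutation of (0, ..., n-1), via the number of inversions."""
    inversions = sum(1 for i in range(len(p))
                       for j in range(i + 1, len(p)) if p[i] > p[j])
    return 1 if inversions % 2 == 0 else -1

perms = list(permutations(range(4)))
even = [p for p in perms if sign(p) == 1]
print(len(perms), len(even))   # 24 12
```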

4.5.2 Multiplicative groups

Consider the set of equivalence classes of integers modulo \(3\), namely \(S=\{\overline{0}, \overline{1},\overline{2}\}\). The operation of multiplication is well-defined on \(S\), and \(\overline{1}\) is an identity. Now \(S\) is not a group, since \(\overline{0}\) has no inverse. But if we omit \(\overline{0}\), then we get the multiplicative group \({\mathbb{Z}}^{\times}_3=\{\overline{1},\overline{2}\}\). This group has order \(2\) and table

\(\times\) \(\overline{1}\) \(\overline{2}\)
\(\overline{1}\) \(\overline{1}\) \(\overline{2}\)
\(\overline{2}\) \(\overline{2}\) \(\overline{1}\)

Note that \({\mathbb{Z}}^{\times}_3\cong{\mathbb{Z}}_2\).

What about the integers modulo \(4\)? The set \(S=\{\overline{1},\overline{2},\overline{3}\}\) is not a group, since it is not closed. But if we omit \(\overline{2}\), then we get a group \({\mathbb{Z}}^{\times}_4=\{\overline{1},\overline{3}\}\) with table

\(\times\) \(\overline{1}\) \(\overline{3}\)
\(\overline{1}\) \(\overline{1}\) \(\overline{3}\)
\(\overline{3}\) \(\overline{3}\) \(\overline{1}\)

which is again isomorphic to \({\mathbb{Z}}_2\). It is not hard to see that the set \({\mathbb{Z}}^{\times}_n\), consisting of classes \(\overline{k}\) such that \(k\) and \(n\) have no common factor other than 1, forms a finite multiplicative group. If \(n\) is prime, then this group has order \(n-1\); but if \(n\) is not prime, then the order is less than \(n-1\).

\({\mathbb{Z}}^{\times}_6\) has order \(2\), and is isomorphic to \({\mathbb{Z}}_2\). First we consider the set \(S=\{\overline{1},\overline{2},\overline{3},\overline{4},\overline{5}\}\); this is not a group, since it is not closed: for instance \(\overline{2} \cdot \overline{3} = \overline{0}\) and similarly \(\overline{3} \cdot \overline{4} = \overline{0}\) in \({\mathbb{Z}}_6\). If we omit \(\overline{2},\overline{3},\overline{4}\) we get the group \({\mathbb{Z}}^{\times}_6=\{\overline{1},\overline{5}\}\) whose table is

\(\times\) \(\overline{1}\) \(\overline{5}\)
\(\overline{1}\) \(\overline{1}\) \(\overline{5}\)
\(\overline{5}\) \(\overline{5}\) \(\overline{1}\)

clearly \({\mathbb{Z}}^{\times}_6\cong{\mathbb{Z}}_2\).

\({\mathbb{Z}}^{\times}_5\) has order \(4\), and table

\(\times\) \(\overline{1}\) \(\overline{2}\) \(\overline{3}\) \(\overline{4}\)
\(\overline{1}\) \(\overline{1}\) \(\overline{2}\) \(\overline{3}\) \(\overline{4}\)
\(\overline{2}\) \(\overline{2}\) \(\overline{4}\) \(\overline{1}\) \(\overline{3}\)
\(\overline{3}\) \(\overline{3}\) \(\overline{1}\) \(\overline{4}\) \(\overline{2}\)
\(\overline{4}\) \(\overline{4}\) \(\overline{3}\) \(\overline{2}\) \(\overline{1}\)

In fact, \({\mathbb{Z}}^{\times}_5\cong{\mathbb{Z}}_4\). (Exercise for you: find the explicit isomorphism by comparing the tables.)
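A short sketch (not in the notes) that builds \({\mathbb{Z}}^{\times}_n\) from the coprimality condition and exhibits an element of order 4 in \({\mathbb{Z}}^{\times}_5\), consistent with \({\mathbb{Z}}^{\times}_5\cong{\mathbb{Z}}_4\):

```python
from math import gcd

def units(n):
    """Elements of the multiplicative group Z_n^x."""
    return [k for k in range(1, n) if gcd(k, n) == 1]

print(units(5), units(6))                     # [1, 2, 3, 4] [1, 5]
print([pow(2, k, 5) for k in range(1, 5)])    # [2, 4, 3, 1]: 2 has order 4 in Z_5^x
```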

5 * Jordan normal form and the pseudoinverse

None of the material in this section is examinable.

During this course we have focused almost entirely on what to do when matrices are diagonalizable, i.e. how to diagonalize a matrix using its eigenvectors and perhaps apply Gram-Schmidt to the eigenvectors to perform orthogonal/unitary diagonalization. However, as discussed in the first chapter, not every matrix is diagonalizable! Whenever the geometric multiplicity of an eigenspace does not equal the algebraic multiplicity of the eigenvalue we know that we cannot diagonalize the matrix. For this reason we might ask: what is the “best” form we can put a matrix into via a change of basis if it is not diagonalizable? The answer to this question goes under the name of the Jordan normal form.

5.1 * Jordan normal form

Any square matrix is similar to one in Jordan normal form (defined below), which is almost (but maybe not quite) diagonal. Let us start with the \(2\times2\) case.

Let the \(2\times2\) matrix \(A\) have repeated eigenvalue \(\lambda\). Then either \(A=\lambda I\) is diagonal, or \(A\) is similar to the matrix \[J=\left(\begin{matrix} \lambda & 1 \\ 0 & \lambda \end{matrix}\right).\]

The Cayley-Hamilton theorem tells us that \((A-\lambda I)^2=0\). If \(A-\lambda I=0\), then the first possibility holds; so suppose that \(A-\lambda I\neq0\). This implies that there is some vector \({\bf v}\) such that \({\bf w}=(A-\lambda I){\bf v}\neq0\). Clearly \((A-\lambda I){\bf w}=0\), because \((A-\lambda I)^2=0\), so \(A{\bf w}=\lambda{\bf w}\) and \(A{\bf v}=\lambda{\bf v}+{\bf w}\). The vectors \({\bf v}\) and \({\bf w}\) are linearly independent (if they are linearly dependent we would have \({\bf v}=\alpha {\bf w}\) with \(\alpha\) some non-zero constant, and then \((A-\lambda I){\bf w}=0\) would imply \((A-\lambda I){\bf v}=0\), a contradiction), so the matrix \(M=({\bf w}, {\bf v})\) is invertible. We then have \(AM=MJ\) with \(M\) invertible, which proves our result.

Solve the system of ODEs \[\begin{aligned} \dot{x} & = x+y, \\ \dot{y} & = -x+3y. \end{aligned}\] Solution. The system has the form \(\dot{\bf u}=A{\bf u}\), where \[A=\left(\begin{matrix} 1 & 1 \\ -1 & 3 \end{matrix}\right).\] The characteristic polynomial of \(A\) is \(p(t)=(t-2)^2\), so \(A\) has the double eigenvalue \(\lambda=2\). There is only one eigenvector, namely \({\bf w}=(1,1)\), so \(A\) cannot be diagonalized. To get \({\bf v}\) we solve \((A-2I){\bf v}={\bf w}\), and one solution is \({\bf v}=(1,2)\) — this is not the only solution, but that doesn’t matter. Thus \(M^{-1}AM=J\), where \[M=\left(\begin{matrix} 1 & 1 \\ 1 & 2 \end{matrix}\right), \quad J=\left(\begin{matrix} 2 & 1 \\ 0 & 2 \end{matrix}\right).\] If \({\bf s}=M^{-1}{\bf u}\), then the original system becomes \(\dot s_1=2s_1+s_2\), \(\dot s_2=2s_2\), which is easily solved to give \(s_2(t)=c_2\exp(2t)\), \(s_1(t)=(c_1+c_2t)\exp(2t)\). (Exercise for you: obtain this.) Finally, transform back to \({\bf u}=M{\bf s}\), giving \[x(t)=(c_1+c_2+c_2t)\exp(2t), \quad y(t)=(c_1+2c_2+c_2t)\exp(2t).\] (Exercise for you: check this by substitution.)
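This change of basis can be verified symbolically; the sketch below (not part of the notes) uses sympy, which also provides a built-in jordan_form method.

```python
import sympy as sp

A = sp.Matrix([[1, 1], [-1, 3]])
M = sp.Matrix([[1, 1], [1, 2]])     # columns: the eigenvector w and the vector v found above
print(M.inv() * A * M)              # Matrix([[2, 1], [0, 2]])

P, J = A.jordan_form()              # sympy's own Jordan basis and Jordan normal form
print(J)                            # Matrix([[2, 1], [0, 2]])
```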

We can generalise the previous discussion to arbitrary square matrices, as follows.

A Jordan block \(J_n^{(\lambda)}\) of size \(n\) associated with an eigenvalue \(\lambda\) is an \(n\times n\) matrix of the form \[J_n^{(\lambda)} = \begin{pmatrix} \lambda & 1 & \\ & \lambda & 1 \\ & & \ddots & \ddots\\ & & & \lambda & 1\\ & & & & \lambda \end{pmatrix}\, .\] That is, an \(n\times n\) matrix with \(\lambda\) on the diagonal, ones immediately above the diagonal, and zeros everywhere else.

Explicitly, for \(n\leq 3\) we have: \[J_1^{(\lambda)} = \begin{pmatrix} \lambda \end{pmatrix} \qquad ; \qquad J_2^{(\lambda)} = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} \qquad ; \qquad J_3^{(\lambda)} = \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix}\, .\]

An \(n\times n\) matrix \(N\) is in Jordan normal form if it is block diagonal, with Jordan blocks \(J_{n_i}^{(\lambda_i)}\) along the diagonal \[N = \begin{pmatrix} J_{n_1}^{(\lambda_1)} \\ & J_{n_2}^{(\lambda_2)} \\ & & \ddots \\ & & & J_{n_k}^{(\lambda_k)} \end{pmatrix}\, .\] Here \(i\) goes from 1 to \(k\) for some \(k\leq n\), and \(n_1+n_2+\ldots+n_k=n\). We do not require that the \(\lambda_i\) are distinct.

Here are some properties of \(N\):

  1. Since the \(J_{n_i}^{(\lambda_i)}\), and therefore \(N\), are upper triangular, we have \(p_N(t)=(\lambda_1-t)^{n_1}\cdots (\lambda_k-t)^{n_k}\). So the \(\lambda_i\) are the eigenvalues of \(N\).

  2. Fix some eigenvalue \(\lambda\) of \(N\). Then the algebraic multiplicity of \(\lambda\) is the sum of the \(n_i\) where \(\lambda_i=\lambda\).

  3. On the other hand, the geometric multiplicity of some eigenvalue \(\lambda\) is the number of Jordan blocks \(J_{n_i}^{(\lambda_i)}\) such that \(\lambda_i=\lambda\). To show this, notice (by solving \(J^{(\lambda_i)}_{n_i}{\bf v}=\lambda_i {\bf v}\)) that the geometric multiplicity associated to each Jordan block is precisely one.

  4. The two statements above imply that \(N\) will be diagonalizable if and only if all \(n_i=1\), because only in this case do the geometric and algebraic multiplicities of all eigenvalues coincide. So a matrix \(N\) in Jordan normal form is diagonalizable if and only if it is diagonal.

(Without proof.) Every square matrix \(A\) is similar to some matrix \(N\) in Jordan normal form. We refer to \(N\) as the Jordan normal form of \(A\). \(N\) is unique up to changing the ordering of the Jordan blocks.

Find the Jordan normal form of \[A=\left(\begin{matrix} 2 & 2 & -2 \\ 1 & -1 & 1 \\ 2 & 0 & 0 \end{matrix}\right).\]

Solution. We first calculate the characteristic polynomial: \[p_A(t)=\det(A-tI) =\det\left(\begin{matrix} 2-t & 2 & -2 \\ 1 & -1-t & 1 \\ 2 & 0 & -t\end{matrix}\right) =-t^3+t^2=(1-t)t^2.\] Thus the eigenvalues are \(\lambda=1\) and \(\lambda=0\) (twice).

  • \(\lambda=1\) \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right)=(A-I){\bf x} = \left(\begin{matrix} 1 & 2 & -2 \\ 1 & -2 & 1 \\ 2 & 0 & -1\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] So \(x_3=2x_1\) and \(2x_2=-x_1+2x_3=3x_1\). An eigenvector is \[{\bf u}_1=\left(\begin{matrix} 2 \\ 3 \\ 4\end{matrix}\right).\]

  • \(\lambda=0\) \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right)=(A-0I){\bf x}= A{\bf x} = \left(\begin{matrix} 2 & 2 & -2 \\ 1 & -1 & 1 \\ 2 & 0 & 0\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] So \(x_1=0\) and \(x_2=x_3\). An eigenvector is \[{\bf u}_2=\left(\begin{matrix} 0 \\ 1 \\ 1\end{matrix}\right).\] The \(0\)-eigenspace is spanned by \({\bf u}_2\). So to calculate the Jordan normal form, we must solve \(A{\bf x}={\bf u}_2\), which gives \[\left(\begin{matrix} 2 & 2 & -2 \\ 1 & -1 & 1 \\ 2 & 0 & 0\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right) = \left(\begin{matrix} 0 \\ 1 \\ 1\end{matrix}\right).\] Hence we can take \[{\bf x}={\bf u}_3=\left(\begin{matrix} 1/2 \\ -1/2 \\ 0\end{matrix}\right).\]

Let \(M\) be the matrix whose columns are \({\bf u}_1\), \({\bf u}_2\) and \({\bf u}_3\). Then \[M=\left(\begin{matrix} 2 & 0 & 1/2 \\ 3 & 1 & -1/2 \\ 4 & 1 & 0\end{matrix}\right),\qquad M^{-1}=\left(\begin{matrix} 1 & 1 & -1 \\ -4 & -4 & 5 \\ -2 & -4 & 4 \end{matrix}\right),\] and \[M^{-1}AM=\left(\begin{matrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0\end{matrix}\right).\] This is the Jordan normal form of \(A\). It has a \(1\times 1\) block associated with \(\lambda=1\), and a \(2\times 2\) block associated with \(\lambda=0\).
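A symbolic verification of this example (a sketch, not part of the notes):

```python
import sympy as sp

A = sp.Matrix([[2, 2, -2], [1, -1, 1], [2, 0, 0]])
M = sp.Matrix([[2, 0, sp.Rational(1, 2)],
               [3, 1, sp.Rational(-1, 2)],
               [4, 1, 0]])            # columns u1, u2, u3 as found above
print(M.inv() * A * M)                # Matrix([[1, 0, 0], [0, 0, 1], [0, 0, 0]])
```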

If \(\lambda\) is a triple eigenvalue, and there is only one eigenvector \({\bf w}\), then we just iterate the process. Solve \((A-\lambda I){\bf v}={\bf w}\) to get \({\bf v}\), then solve \((A-\lambda I){\bf u}={\bf v}\) to get \({\bf u}\); and take \(M\) to have the three columns \({\bf w}\), \({\bf v}\), \({\bf u}\).

Find the Jordan normal form of \[A=\left(\begin{matrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 1 & -1 & 3\end{matrix}\right).\]

Solution. We first calculate the characteristic polynomial of \(A\): \[p_A(t)=\det(A-tI) =\det\left(\begin{matrix} 1-t & 1 & 0 \\ 0 & 2-t & 1 \\ 1 & -1 & 3-t\end{matrix}\right)=(2-t)^3.\] Therefore the only eigenvalue is \(\lambda=2\), of algebraic multiplicity 3. We first solve \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right)=(A-2I){\bf x} =\left(\begin{matrix} -1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & 1\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] The solutions satisfy \(x_1=x_2\), \(x_3=0\). So we can take \[{\bf u}_1=\left(\begin{matrix} 1 \\ 1 \\ 0\end{matrix}\right).\] Now we look for a solution of \((A-2I){\bf x}= {\bf u}_1\), which gives \[{\bf x}= {\bf u}_2=\left(\begin{matrix} 0 \\ 1 \\ 1\end{matrix}\right).\] Finally, we solve \((A-2I){\bf x}={\bf u}_2\), which gives the third root vector \[{\bf x}={\bf u}_3=\left(\begin{matrix} 0 \\ 0 \\ 1\end{matrix}\right).\] Now if \(M\) is the matrix whose columns are \({\bf u}_1\), \({\bf u}_2\) and \({\bf u}_3\), we have \[M=\left(\begin{matrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1\end{matrix}\right),\qquad M^{-1}=\left(\begin{matrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 1 & -1 & 1\end{matrix}\right),\] and \[M^{-1}AM=\left(\begin{matrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2\end{matrix}\right).\] This is the Jordan normal form of \(A\). It has a single \(3\times 3\) Jordan block associated with \(\lambda=2\).

Note that these choices are not unique. We could have chosen \[{\bf u}_2=\left(\begin{matrix} 1 \\ 2 \\ 1\end{matrix}\right);\] and then we can choose \[{\bf u}_3=\left(\begin{matrix} -1 \\ 0 \\ 2\end{matrix}\right)\quad \hbox{ or }\quad \left(\begin{matrix} 0 \\ 1 \\ 2\end{matrix}\right).\]

Find the Jordan normal form of \[A=\left(\begin{matrix} -2 & -6 & 6 & 15 \\ 0 & 7 & 0 & -15 \\ -2 & -4 & 5 & 10 \\ 0 & 2 & 0 & -4 \end{matrix}\right),\] using the fact that \(p_A(t)=(t-2)^2(t-1)^2\).

Solution.

  • \(\lambda =2\) \[{\bf 0}=(A-2I){\bf x}= \left(\begin{matrix} -4 & -6 & 6 & 15 \\ 0 & 5 & 0 & -15 \\ -2 & -4 & 3 & 10 \\ 0 & 2 & 0 & -6 \end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{matrix}\right)\] has solutions \[\begin{aligned} -4x_1-6x_2+6x_3+15x_4 &= 0, \\ x_2-3x_4 &= 0,\\ -2x_1-4x_2+3x_3+10x_4 &= 0. \end{aligned}\] Subtracting twice the third equation from the first gives \(2x_2-5x_4=0\), which combined with the second equation gives \(x_2 = x_4=0\), and hence \(2x_1=3x_3\). Therefore \[\dim \ker \bigl((A-2I)\bigr)=1,\] and it is spanned by \[{\bf u}_1=\left(\begin{matrix} 3 \\ 0 \\ 2 \\ 0\end{matrix}\right).\] Next \[{\bf 0}=(A-2I)^2{\bf x}= \left(\begin{matrix} 4 & 0 & -6 & 0 \\ 0 & -5 & 0 & 15 \\ 2 & 0 & -3 & 0 \\ 0 & -2 & 0 & 6 \end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{matrix}\right)\] has solutions \[\begin{aligned} 2x_1-3x_3 &= 0,\\ x_2-3x_4 &= 0. \end{aligned}\] Therefore \[\dim \ker \bigl((A-2I)^2\bigr)=2,\] and it is spanned by \[{\bf u}_1=\left(\begin{matrix} 3 \\ 0 \\ 2 \\ 0\end{matrix}\right),\ {\bf u}_2=\left(\begin{matrix} 0 \\ 3 \\ 0 \\ 1\end{matrix}\right).\] In fact, \(A{\bf u}_2=2{\bf u}_2-{\bf u}_1\). The associated Jordan block is \[\left(\begin{matrix} 2 & 1 \\ 0 & 2 \end{matrix}\right).\]

  • \(\lambda=1\) \[{\bf 0}=(A-I){\bf x}= \left(\begin{matrix} -3 & -6 & 6 & 15 \\ 0 & 6 & 0 & -15 \\ -2 & -4 & 4 & 10 \\ 0 & 2 & 0 & -5 \end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3 \\ x_4\end{matrix}\right)\] This has solutions \[\begin{aligned} -x_1-2x_2+2x_3+5x_4 &= 0, \\ 2x_2-5x_4 &=0. \end{aligned}\] Thus \[\dim \ker \bigl((A-I)\bigr)=2,\] and it is spanned by \[{\bf u}_3=\left(\begin{matrix} 2 \\ 0 \\ 1 \\ 0\end{matrix}\right),\ {\bf u}_4=\left(\begin{matrix} 0 \\ 5 \\ 0 \\ 2\end{matrix}\right).\]

Thus we obtain two \(1\times 1\) Jordan blocks \((1)\), one for each of \({\bf u}_3\) and \({\bf u}_4\).

Hence the Jordan normal form of \(A\) is \[J=\left(\begin{matrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{matrix}\right).\] Indeed, if \(M\) is the matrix whose columns are \(-{\bf u}_1\), \({\bf u}_2\), \({\bf u}_3\), \({\bf u}_4\), then \(M^{-1}AM=J\).
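Again, the result can be verified symbolically (a sketch, not part of the notes), using the vectors found above:

```python
import sympy as sp

A = sp.Matrix([[-2, -6, 6, 15], [0, 7, 0, -15], [-2, -4, 5, 10], [0, 2, 0, -4]])
# columns: -u1, u2, u3, u4
M = sp.Matrix([[-3, 0, 2, 0],
               [ 0, 3, 0, 5],
               [-2, 0, 1, 0],
               [ 0, 1, 0, 2]])
print(M.inv() * A * M)   # Matrix([[2, 1, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
```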

Find the Jordan normal form of \[A=\left(\begin{matrix} 0 & 4 & 4 \\ -2 & 6 & 4 \\ 1 & -2 & 0\end{matrix}\right).\]

Solution. We first calculate the characteristic polynomial of \(A\): \[p_A(t)=\det(A-tI) =\det\left(\begin{matrix} -t & 4 & 4 \\ -2 & 6-t & 4 \\ 1 & -2 & -t \end{matrix}\right)=(2-t)^3.\] Next we solve \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right)=(A-2I){\bf x} =\left(\begin{matrix} -2 & 4 & 4 \\ -2 & 4 & 4 \\ 1 & -2 & -2\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] These solutions lie on the plane \(x_1-2x_2-2x_3=0\), which is 2-dimensional. We choose two independent vectors in the plane \[{\bf u}_1=\left(\begin{matrix} 2 \\ 1 \\ 0\end{matrix}\right) \qquad \text{and} \qquad {\bf u}_2\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\left(\begin{matrix} 2 \\ 2 \\ -1\end{matrix}\right)\, .\] To find a third vector, we solve \((A-2I){\bf u}_3={\bf u}_2\). This equation will have a solution since \(\dim \ker \bigl((A-2I)\bigr)=2\) and \(\dim \ker \bigl((A-2I)^2\bigr)=3\); note that we have chosen \({\bf u}_2\) proportional to the columns of \(A-2I\), so that it lies in the image of \(A-2I\) and the equation is indeed solvable.

We take \[{\bf u}_3=\left(\begin{matrix} -1 \\ 0 \\ 0\end{matrix}\right).\]

Define \[M=\left(\begin{matrix} 2 & 2 & -1 \\ 1 & 2 & 0 \\ 0 & -1 & 0 \end{matrix}\right), \qquad M^{-1}=\left(\begin{matrix} 0 & 1 & 2 \\ 0 & 0 & -1 \\ -1 & 2 & 2 \end{matrix}\right),\] and then \[M^{-1}AM=\left(\begin{matrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{matrix}\right).\]

Find the Jordan normal form of \[A=\left(\begin{matrix} 1 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2\end{matrix}\right).\]

Solution. We first calculate the characteristic polynomial of \(A\): \[p_A(t)=\det(A-tI) =\det\left(\begin{matrix} 1-t & 1 & 0 \\ 0 & 2-t & 0 \\ 0 & 0 & 2-t \end{matrix}\right)=(1-t)(2-t)^2.\] So the eigenvalues are \(\lambda=1\) and \(\lambda=2\).

  1. \(\lambda=1\) \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right) = (A-I){\bf x}= \left(\begin{matrix} 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] The solutions are \(x_2=x_3=0\), so we take \[{\bf u}_1=\left(\begin{matrix} 1 \\ 0 \\ 0\end{matrix}\right).\]

  2. \(\lambda=2\) \[\left(\begin{matrix} 0 \\ 0 \\ 0\end{matrix}\right) = (A-2I){\bf x}= \left(\begin{matrix} -1 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0\end{matrix}\right) \left(\begin{matrix} x_1 \\ x_2 \\ x_3\end{matrix}\right).\] The solutions are \(x_1=x_2\). This is 2-dimensional, so we take two linearly independent eigenvectors \[{\bf u}_2=\left(\begin{matrix} 1 \\ 1 \\ 0\end{matrix}\right), \qquad {\bf u}_3=\left(\begin{matrix} 0 \\ 0 \\ 1\end{matrix}\right).\]

Define \(M\) to be the matrix whose columns are the \({\bf u}_j\): \[M=\left(\begin{matrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}\right),\qquad M^{-1}=\left(\begin{matrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}\right).\] Then \[M^{-1}AM=\left(\begin{matrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{matrix}\right).\]

5.2 * The pseudoinverse

Given an \(n\times n\) matrix \(A\) and an \(n\)-vector \({\bf k}\), consider the equation \(A{\bf x}={\bf k}\). If \(\det(A)\neq0\), then this has a unique solution \({\bf x}=A^{-1}{\bf k}\). But what if \(\det(A)=0\)? Consider the case \({\bf k}={\bf 0}\), ie. \(A{\bf x}={\bf 0}\). Suppose that \(m\) rows of \(A\) are linearly independent with \(m<n\), so in effect we have \(m\) equations in \(n\) unknowns. The solutions form a vector space of dimension \(n-m\). In other words, there are many solutions. We say that the system is underdetermined.

If the matrix \(M\) has eigenvalue \(\lambda\), then \(A=M-\lambda{\bf I}\) has \(\det(A)=0\), and the solutions of \(A{\bf x}={\bf 0}\) are exactly the eigenvectors in the eigenspace \(V_{\lambda}\).

What if \({\bf x}\) is an \(n\)-vector, and \(A\) an \(m\times n\) matrix, with \(m>n\)? Then the system \(A{\bf x}={\bf k}\) is overdetermined, and in general it has no solutions. The question is then: what choice of \({\bf x}\) brings \(A{\bf x}\) closest to \({\bf k}\), using the standard norm in \({\mathbb R}^m\)?

Given an \(m\times n\) matrix \(A\), we have \[\text{nullspace}(A) = \text{nullspace}(A^tA)\, .\]

Clearly every vector \({\bf v}\) in the nullspace of \(A\), that is, such that \(A{\bf v}={\bf 0}\), is in the nullspace of \(A^tA\), since \((A^tA){\bf v}= A^t(A{\bf v})=A^t{\bf 0}=\bf 0\). So \(\text{nullspace}(A)\subseteq \text{nullspace}(A^tA)\).

Now take a vector \({\bf v}\) in the nullspace of \(A^t A\). We have \(A^tA {\bf v}=\bf 0\), which implies \({\bf v}^t A^t A{\bf v}=0\). On the other hand, if we choose the standard norm in \({\mathbb R}^m\), this is saying that \(||A{\bf v}||^2=0\), so by the separation of points property of norms (recall definition [def:norm]), this implies \(A{\bf v}= \bf 0\). So \(\text{nullspace}(A^tA)\subseteq \text{nullspace}(A)\) and therefore \(\text{nullspace}(A) = \text{nullspace}(A^tA)\).

If the columns of an \(m\times n\) matrix \(A\) are linearly independent then the \(n\times n\) matrix \(A^tA\) is invertible.

Given any \(n\)-vector \({\bf v}\), the \(m\)-vector \(A{\bf v}\) is a linear combination of the columns of \(A\), with coefficients the components \(v_i\) of \({\bf v}\). By assumption these columns are linearly independent, so a linear combination of them with coefficients \(v_i\) can only vanish if all \(v_i\) vanish, and therefore \({\bf v}=0\). So we learn that in this case \(\text{nullspace}(A)=\{\bf 0\}\), and by lemma [lemma:A^tA-nullspace] we then have \(\text{nullspace}(A^tA)=\{\bf 0\}\), so \(A^tA\) is invertible (lemma 3.6.1 of the Michaelmas lectures).

We define the pseudoinverse of an \(m\times n\) matrix \(A\) with linearly independent columns to be the \(n\times m\) matrix \(\psi(A)=(A^tA)^{-1}A^t\).

Note that \[A{\bf x}={\bf k}\Rightarrow A^tA{\bf x}=A^t{\bf k}\Rightarrow {\bf x}=(A^tA)^{-1}A^t{\bf k}=\psi(A){\bf k}\] which partially justifies the name “pseudoinverse” for \(\psi(A)\). Additionally, if \(m=n\), then \[\psi(A)=(A^tA)^{-1}A^t=A^{-1}(A^t)^{-1}A^t=A^{-1}\, .\]
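A quick numerical sketch (not in the notes; the matrix \(A\) below is an arbitrary example with linearly independent columns) comparing the formula for \(\psi(A)\) with numpy's built-in pseudoinverse:

```python
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3x2, independent columns
psi = np.linalg.inv(A.T @ A) @ A.T                    # psi(A) = (A^t A)^{-1} A^t
print(np.allclose(psi, np.linalg.pinv(A)))            # True
print(np.allclose(psi @ A, np.eye(2)))                # True: psi(A) is a left inverse of A
```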

Let \((x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\) be a set of \(n\) data points. Let \(y_{{\bf a}}(x)=a_0+a_1x+\ldots+a_{m-1}x^{m-1}\in{\mathbb{R}}[x]_{m-1}\) be a polynomial that is to be fitted to these data points, with \(m<n\). Define \[X\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\left(\begin{array}{cccc} 1 & x_1 & \ldots & x_1^{m-1}\\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & \ldots & x_n^{m-1} \end{array} \right), \qquad {\bf a}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\left(\begin{array}{c} a_0\\\vdots\\a_{m-1}\end{array}\right), \qquad {\bf y}\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\left(\begin{array}{c} y_1\\\vdots\\y_n\end{array}\right).\] Assume (as is generically the case) that the \(x_i\) are such that the columns of \(X\) are linearly independent. Then \(y_{{\bf a}}(x)\) with \({\bf a}=\psi(X){\bf y}\) gives the polynomial which is the least-squares fit to the data.

Note that with these definitions the polynomial \(y_{{\bf a}}(x)\) would pass through the given data points if \({\bf y}=X{\bf a}\). In general it is not possible to choose an \({\bf a}\) such that this holds, since the system is overdetermined, so we try to find the polynomial that ends up closest to the given data points.

Think of \({\bf y}\in{\mathbb{R}}^n\) with its standard Euclidean inner product. We have \(X\colon{\mathbb{R}}^m\to{\mathbb{R}}^n\), so its image \(W\mathrel{\mathop:}\mathrel{\mkern-1.2mu}=\mathrm{Image}(X)\) is a subspace of \({\mathbb{R}}^n\) of dimension \(m\). Let \({\bf y}'\) be the orthogonal projection of \({\bf y}\) onto \(W\). Since \({\bf y}'\in W\), there is some \({\bf a}\) such that \(X{\bf a}={\bf y}'\), so the points \((x_1,y'_1),(x_2,y'_2),\ldots,(x_n,y'_n)\) lie on some polynomial \(y_{{\bf a}}(x)\). By proposition [prop:least-squares], this choice of \({\bf y}'\) minimizes \[\|{\bf y}'-{\bf y}\|^2=(y_1'-y_1)^2+\ldots+(y'_n-y_n)^2;\] so the polynomial \(y_{\bf a}(x)\) is the least-squares fit.

Now a basis for \(W\) is \(\{{\bf w}_1,\ldots,{\bf w}_m\}\), where the \({\bf w}_i\) are the columns of \(X\), which are linearly independent by assumption. We can equivalently write this as \({\bf w}_i=X \mathbf{e}_i\), with \(\mathbf{e}_i\) the standard basis vectors of \(\mathbb{R}^m\). For instance, \[{\bf w}_3= \begin{pmatrix} x_1^2 \\ \vdots \\ x_n^2 \end{pmatrix} = X\mathbf{e}_3=X\left(\begin{array}{c}0\\0\\1\\0\\\vdots\end{array}\right).\] The condition for \({\bf y}'-{\bf y}\) to be orthogonal to \(W\), and therefore for \({\bf y}'\) to be the orthogonal projection of \({\bf y}\) to \(W\), is \[\begin{aligned} & {\bf w}_i^t ({\bf y}'-{\bf y}) = 0 \qquad \quad\forall i=1,\ldots,m\\ \iff & \mathbf{e}_i^t X^t ({\bf y}'-{\bf y}) = 0 \qquad \forall i=1,\ldots,m\\ \iff & X^t({\bf y}'-{\bf y})={\bf 0}\\ \iff & X^t{\bf y}'=X^t {\bf y}\\ \iff & X^tX{\bf a}=X^t{\bf y}\\ \iff & {\bf a}=(X^tX)^{-1}X^t{\bf y}= \psi(X){\bf y}. \end{aligned}\]

Find the line which best fits the data points \((1,1)\), \((2,2.4)\), \((3,3.6)\), \((4,4)\) in \({\mathbb{R}}^2\).

We want to fit the line \(y_{{\bf a}}(x)=a_0+a_1x\) to the data-points. Define, as in theorem [thm:pseudoinverse-fit], \[X=\left(\begin{matrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4\end{matrix}\right), \qquad {\bf a}=\left(\begin{matrix}a_0 \\ a_1\end{matrix}\right), \qquad {\bf y}=\left(\begin{matrix}1 \\ 2.4 \\ 3.6 \\ 4\end{matrix}\right).\] Then \[X^tX=\left(\begin{matrix}4 & 10 \\ 10 & 30\end{matrix}\right) \Rightarrow \psi(X)=\frac{1}{10} \left(\begin{matrix}10 & 5 & 0 & -5\\ -3 & -1 & 1 & 3\end{matrix}\right).\] Thus \[{\bf a}=\psi(X){\bf y}= \left(\begin{matrix}0.2 \\ 1.02\end{matrix}\right),\] and so \(y_{{\bf a}}(x)=0.2+1.02x\) is the best linear (least-squares) fit to the data points.
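The same coefficients can be reproduced numerically (a sketch, not part of the notes); a library least-squares routine gives the same answer as the pseudoinverse formula:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.4, 3.6, 4.0])
X = np.column_stack([np.ones_like(x), x])            # columns: 1, x

a = np.linalg.inv(X.T @ X) @ X.T @ y                 # psi(X) y
print(a)                                             # approximately [0.2, 1.02]
print(np.linalg.lstsq(X, y, rcond=None)[0])          # same coefficients
```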