1 Spectral theory of linear operators

Throughout, let \((X,\|\cdot\|)\) be a complex Banach space (i.e. a Banach space over the field \(\mathbb{C}\)), and let \(T:\mathcal{D}(T)\subset X\to X\) be a linear operator. Recall that \(B(X,Y)\) denotes the space of all bounded linear operators \(T:X\to Y\), and \(B(X)=B(X,X)\). In addition, let \(C(X,Y)\) denote the set of all closed linear operators \(T:\mathcal{D}(T)\subset X\to Y\), and \(C(X)=C(X,X)\). We denote by \(I\) the identity map in \(X\). Scalar multiples \(\lambda I\) of the identity map are often abbreviated to \(\lambda\).

1.1 Spectrum and resolvent set

Problem: For given \(y\in X\) and \(\lambda\in \mathbb{C}\), we want to find \(x\in\mathcal{D}(T)\) such that \[(T-\lambda I)x=y \quad \text{(inhomogeneous eqn), or} \quad (T-\lambda I)x=0 \quad \text{(homogeneous eqn)}.\]

Solve the system \(\sum_{j=1}^n a_{ij} x_j-\lambda x_i=0\) for \(i=1,\dots,n\). This can be written as an eigenvalue problem of the linear operator \(T=(a_{ij})_{i,j=1}^n\) (matrix) in \(X=\mathbb{C}^n\), with eigenvalue \(\lambda\in\mathbb{C}\) and eigenvector \(x=(x_i)_{i=1}^n\in X\).
For a given continuous function \(k:[0,1]\times [0,1]\to\mathbb{C}\), solve \(\int_0^1 k(s,t) x(t)\,\mathrm{d}t -\lambda x(s)=0\) for (almost) all \(s\in [0,1]\). Here \(T\) is an integral operator on \(X=L^1([0,1])\), acting as \((Tx)(s)=\int_0^1 k(s,t) x(t)\,\mathrm{d}t\) for \(x\in L^1([0,1])\).
For a given continuous function \(q:[0,1]\to\mathbb{C}\) solve \(-x''(t)+q(t)x(t)-\lambda x(t)=0\) for all \(t\in [0,1]\), subject to the boundary conditions \(x(0)=x(1)=0\). In \(X=L^2([0,1])\) the operator \(T\) acting as \((Tx)(t)=-x''(t)+q(t)x(t)\) on \(\mathcal{D}(T)=\{x\in C^2([0,1]):\,x(0)=x(1)=0\}\) is called the Schrödinger operator with potential \(q\). Later on we will define the operator on a larger operator domain containing functions with less regularity but such that one can still define a derivative in some sense.

Define

the resolvent set of \(T\) as \[\rho(T):=\{\lambda\in\mathbb{C}:\,T-\lambda\text{ bijective from }\mathcal{D}(T) \text{ to }X\};\]
the spectrum of \(T\) as \(\sigma(T):=\mathbb{C}\backslash\rho(T)\);
the point spectrum (\(=\)set of eigenvalues) of \(T\) as \[\sigma_p(T):=\{\lambda\in\mathbb{C}:\,T-\lambda\text{ not injective}\};\]
the continuous spectrum of \(T\) as \[\sigma_c(T):=\{\lambda\in\mathbb{C}:\,T-\lambda\text{ injective},\,\overline{\mathcal{R}(T-\lambda)}=X,\,\mathcal{R}(T-\lambda)\neq X\};\]
the residual spectrum of \(T\) as \[\sigma_r(T):=\{\lambda\in\mathbb{C}:\,T-\lambda\text{ injective},\,\overline{\mathcal{R}(T-\lambda)}\neq X\}.\]

The spectrum \(\sigma(T)\) is the disjoint union of \(\sigma_p(T)\), \(\sigma_c(T)\), \(\sigma_r(T)\).
The set \(\mathbb{C}\) is the disjoint union of \(\sigma(T)\) and \(\rho(T)\).
Assume that \(T\in C(X)\). Then the resolvent set \(\rho(T)\) is the set of all \(\lambda\in\mathbb{C}\) for which \((T-\lambda)^{-1}: X\to \mathcal{D}(T)\) is well-defined and a bounded linear operator; we say that \((T-\lambda)\) is boundedly invertible. In fact, if \(T-\lambda:\mathcal{D}(T)\to X\) is bijective, then \((T-\lambda)^{-1}: X\to \mathcal{D}(T)\) is well-defined and \(\mathcal{D}((T-\lambda)^{-1})=X\) is closed, hence the closedness of \(T\) (which implies that \((T-\lambda)^{-1}\) is closed) together with the Closed Graph Theorem imply that \((T-\lambda)^{-1}\) is bounded.
If \(\dim(X)<\infty\), then \(\sigma_c(T)=\sigma_r(T)=\emptyset\) because (recall Linear Algebra!) \(T-\lambda\) is injective if and only if \(T-\lambda\) is surjective.

For \(\lambda\in\rho(T)\) the operator \((T-\lambda)^{-1}\) is called the resolvent of \(T\);
For \(\lambda\in\sigma_p(T)\), \(\mathcal N_{\lambda}(T):=\mathcal{N}(T-\lambda)\) is the geometric eigenspace (space of eigenvectors), and \(\mathcal L_{\lambda}(T):=\{x\in X:\,(T-\lambda)^n x=0 \text{ for some }n\in\mathbb{N}\}\) is the algebraic eigenspace, which contains \(\mathcal N_{\lambda}(T)\) but may be strictly bigger.

In \(X=\ell_2(\mathbb{N})\), let \(S\) be the right-shift operator, acting on \(x\in \ell_2(\mathbb{N})\) as \(Sx=S(x_1,x_2,x_3,\dots):=(0,x_1,x_2,x_3,\dots)\). Then \(0\in\sigma_r(S)\) because \(S\) is injective (indeed \(\|Sx\|=\|x\|\)) and \(e_1=(1,0,0,\dots)\perp \mathcal{R}(S)\), i.e. \(\overline{\mathcal{R}(S-0)}\neq \ell_2(\mathbb{N})\). Finding \(\sigma(S)\) is quite difficult (try it while revising for the exam!).
In \(X=L^2([0,1])\), let \(T\) be the multiplication operator acting as \((Tx)(t)=tx(t)\) (multiplication with the independent variable). We show that \[\rho(T)=\mathbb{C}\backslash [0,1], \quad \sigma(T)=\sigma_c(T)=[0,1], \quad \sigma_p(T)=\sigma_r(T)=\emptyset.\] First, let \(\lambda\in\mathbb{C}\backslash [0,1]\). We show that \((T-\lambda):L^2([0,1])\to L^2([0,1])\) is bijective. For injectivity, assume that \((T-\lambda)x=0\), i.e. \((t-\lambda)x(t)=0\) for almost all \(t\in [0,1]\). Since \(t-\lambda\neq 0\) for all \(t\in[0,1]\), we have \(x(t)=0\) for almost all \(t\in [0,1]\), i.e. \(x=0\) (almost everywhere). To show surjectivity, let \(f\in L^2([0,1])\). Define \(x(t)=(t-\lambda)^{-1}f(t)\). Then \(x\in L^2([0,1])\) since \(\int_0^1|x(t)|^2\,\mathrm{d}t\leq \max_{t\in [0,1]}|t-\lambda|^{-2}\int_0^1|f(t)|^2\,\mathrm{d}t<\infty\).
Now let \(\lambda\in [0,1]\). We show that \(\lambda\in\sigma_c(T)\), i.e. \((T-\lambda)\) is injective, \(\overline{\mathcal{R}(T-\lambda)}=L^2([0,1])\) and \(\mathcal{R}(T-\lambda)\neq L^2([0,1])\). The injectivity follows since \((T-\lambda)x=0\) implies that, for almost all \(t\in [0,1]\), either \(t-\lambda=0\) or \(x(t)=0\); since the former is true only for \(t=\lambda\), the latter holds for almost all \(t\in [0,1]\), hence \(x=0\) (almost everywhere). To prove \(\overline{\mathcal{R}(T-\lambda)}=L^2([0,1])\), we show equivalently that \(\mathcal{R}(T-\lambda)^\perp =\{0\}\). To this end, let \(g\in \mathcal{R}(T-\lambda)^\perp\). Then, for all \(x\in L^2([0,1])\), \[0=\langle (T-\lambda)x,g\rangle=\int_0^1 (t-\lambda)x(t)\overline{g(t)}\,\mathrm{d}t.\] In particular, for \(x(t):=\overline{(t-\lambda)}g(t)\), we obtain \(0=\int_0^1 |t-\lambda|^2 |g(t)|^2\,\mathrm{d}t=\|(t-\lambda)g\|^2\), hence \((t-\lambda)g=0\), which implies \(g=0\), as in the injectivity proof. Finally, to prove \(\mathcal{R}(T-\lambda)\neq L^2([0,1])\), it suffices to find one \(f\in L^2([0,1])\) that is not in \(\mathcal{R}(T-\lambda)\); we show this for \(f\equiv 1\). In fact, if there existed \(x\in L^2([0,1])\) with \((T-\lambda)x=1\), then \(x(t)=1/(t-\lambda)\) for almost all \(t\in [0,1]\). But this function \(x\) does not belong to \(L^2([0,1])\) because the singularity at \(t=\lambda\) is too strong, \(\int_0^1 |t-\lambda|^{-2}\,\mathrm{d}t=\infty\). Thus \(f\equiv 1\) is not in \(\mathcal{R}(T-\lambda)\).

The following result is the operator analogue of a geometric series.

[Neumann series] Let \(T\in B(X)\) with \(\|T\|<1\). Then \((I-T)\) is boundedly invertible, with \[(I-T)^{-1}=\sum_{j=0}^\infty T^j, \quad \|(I-T)^{-1}\| \leq\frac{1}{1-\|T\|}.\]

First we show that the sequence of partial sums \(S_n:=\sum_{j=0}^n T^j\), \(n\in\mathbb{N}\), is convergent in \(B(X)\). To this end, it suffices to prove that the sequence \((S_n)_n\) is Cauchy: For \(m,n\in\mathbb{N}\) with \(m>n\) we have \[\|S_m-S_n\|=\left\|\sum_{j=n+1}^m T^j\right\|\leq \sum_{j=n+1}^m \|T^j\|\leq \sum_{j=n+1}^{\infty} \|T\|^j.\] This converges to \(0\) as \(n\to\infty\) (geometric series!), which proves that \((S_n)_n\subset B(X)\) is a Cauchy sequence. Since \(B(X)\) is a Banach space (see Michaelmas term), the sequence converges and its limit \(S:=\sum_{j=0}^\infty T^j\) is in \(B(X)\). Analogous to the previous calculation one obtains \[\|S\|=\lim_{n\to\infty}\|S_n\|\leq \sum_{j=0}^\infty \|T\|^j=\frac{1}{1-\|T\|}.\] Next we prove that \(S(I-T)=I\) and \((I-T)S=I\) in \(X\), which proves that \(I-T\) is bijective and \(S=(I-T)^{-1}\). In fact, for \(x\in X\), \[S(I-T)x=\lim_{n\to\infty}\sum_{j=0}^n T^j(I-T)x=\lim_{n\to\infty}\left(\sum_{j=0}^n T^j x-\sum_{j=0}^n T^{j+1}x\right)=\lim_{n\to\infty}\left(x-T^{n+1}x\right)=x,\] where the last step uses \(\|T^{n+1}x\|\leq\|T\|^{n+1}\|x\|\to 0\). Analogously \((I-T)Sx=x\), which proves the claim.
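The Neumann series can be checked numerically in finite dimensions. The following is a minimal sketch, not part of the original notes: the matrix `T`, the number of terms, and all helper names are assumptions chosen for illustration, and the norm used is the maximal absolute row sum (the operator norm induced by the max-norm on \(\mathbb{C}^n\)).

```python
# Illustrative check of the Neumann series for a hypothetical 2x2 matrix.

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(a, b):
    """Entrywise sum of two square matrices."""
    n = len(a)
    return [[a[i][j] + b[i][j] for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def op_norm_inf(a):
    """Operator norm induced by the max-norm: maximal absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def neumann_partial_sum(t, n_terms):
    """Return S_n = sum_{j=0}^{n_terms} T^j."""
    n = len(t)
    total = identity(n)
    power = identity(n)
    for _ in range(n_terms):
        power = mat_mul(power, t)
        total = mat_add(total, power)
    return total

T = [[0.2, 0.3], [0.1, 0.2]]          # example data with ||T|| = 0.5 < 1
S = neumann_partial_sum(T, 60)
I_minus_T = [[(1.0 if i == j else 0.0) - T[i][j] for j in range(2)]
             for i in range(2)]
minus_I = [[-1.0, 0.0], [0.0, -1.0]]
residual = mat_add(mat_mul(S, I_minus_T), minus_I)   # S(I - T) - I

print("||S(I-T) - I|| =", op_norm_inf(residual))
print("||S|| =", op_norm_inf(S), " bound 1/(1-||T||) =", 1 / (1 - op_norm_inf(T)))
```

The residual equals \(-T^{61}\) up to rounding, so it is tiny, and the norm of the partial sum stays below the bound \(1/(1-\|T\|)\) from the theorem.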

The previous result has a few very useful consequences, which we prove in the following.

Let \(T\in C(X)\). Then \(\rho(T)\) is an open subset of \(\mathbb{C}\), and \(\sigma(T)\) is a closed subset of \(\mathbb{C}\).

It suffices to prove that \(\rho(T)\) is open, as then its complement (the spectrum) is closed. If \(\rho(T)=\emptyset\), this is open. If \(\rho(T)\neq\emptyset\), let \(\lambda_0\in\rho(T)\) be arbitrary. Then \((T-\lambda_0)\) is boundedly invertible. For any \(\lambda\in\mathbb{C}\): \[T-\lambda=T-\lambda_0-(\lambda-\lambda_0)=(T-\lambda_0)\left(I-(\lambda-\lambda_0)(T-\lambda_0)^{-1}\right).\] We prove that, if \(|\lambda-\lambda_0|<\|(T-\lambda_0)^{-1}\|^{-1}=:\varepsilon\), then the right hand side is boundedly invertible. Indeed, in this case, the bounded operator \(T':=(\lambda-\lambda_0)(T-\lambda_0)^{-1}\) satisfies \(\|T'\|<1\), thus Theorem [thm:NS] (Neumann series) implies that \((I-T')\) is boundedly invertible. Thus \((T-\lambda)\) is boundedly invertible, with \((T-\lambda)^{-1}=(I-T')^{-1}(T-\lambda_0)^{-1}\). Thus the open disc \(B_{\varepsilon}(\lambda_0)=\{\lambda\in\mathbb{C}:\,|\lambda-\lambda_0|<\varepsilon\}\) belongs to \(\rho(T)\), which is therefore open.

In fact, the previous proof gives a lower bound on the operator norm of the resolvent in terms of the distance of the point to the spectrum.

If \(\lambda_0\in\rho(T)\), then \[\|(T-\lambda_0)^{-1}\|\geq\frac{1}{\mathrm{dist}(\lambda_0,\sigma(T))}.\]

In addition, the proof gives a Taylor expansion of the resolvent.

If \(\lambda_0\in\rho(T)\) and \(|\lambda-\lambda_0|< \|(T-\lambda_0)^{-1}\|^{-1}\), then \[(T-\lambda)^{-1}=\left(I-(\lambda-\lambda_0)(T-\lambda_0)^{-1}\right)^{-1}(T-\lambda_0)^{-1} =\sum_{j=0}^\infty(\lambda-\lambda_0)^j (T-\lambda_0)^{-(j+1)}.\]

Let \(T\in B(X)\). Then \(\sigma(T)\) is a compact subset of \(\mathbb{C}\) (closed and bounded). In fact, \[\sigma(T)\subset\{\lambda\in\mathbb{C}:\,|\lambda|\leq \|T\|\}.\]

We already know that the spectrum is closed, so it suffices to prove that it is bounded. For \(\lambda\in\mathbb{C}\) with \(|\lambda|>\|T\|\) we can write \[T-\lambda=-\lambda\left(I-\frac{1}{\lambda}T\right).\] Since \(T':=\frac{1}{\lambda}T\) satisfies \(\|T'\|<1\), Theorem [thm:NS] (Neumann series) implies that \((I-T')\) is boundedly invertible, and hence so is \(T-\lambda\). This proves that \(\lambda\in\rho(T)\).

[Spectral mapping theorem for polynomials] Let \(T\in B(X)\), and let \(p:\mathbb{C}\to\mathbb{C}\) be a polynomial. Then \(\sigma(p(T))=p(\sigma(T))\).

Let \(n=\deg p\).
\(\sigma(p(T))\subset p(\sigma(T))\): Let \(\mu\in\sigma(p(T))\). We factorise the polynomial \(p(z)-\mu=\beta(z-z_1)\cdots (z-z_n)\) with \(\beta\in\mathbb{C}\backslash\{0\}\) and \(z_1,\dots,z_n\in\mathbb{C}\) the zeros (counted with multiplicity). Since \(\mu\in\sigma(p(T))\), there exists \(i_0\in\{1,\dots,n\}\) such that \(z_{i_0}\in\sigma(T)\), since otherwise \(p(T)-\mu=\beta (T-z_1)\cdots (T-z_n)\) would be bijective, which is not possible. Thus \(\mu=p(z_{i_0})\in p(\sigma(T))\).
\(\sigma(p(T))\supset p(\sigma(T))\): Let \(\mu\in p(\sigma(T))\). Then, by definition, there exists \(\lambda\in\sigma(T)\) such that \(\mu=p(\lambda)\). Thus there exists a polynomial \(q\) with \(\deg q= n-1\) and \(p(z)-\mu=(z-\lambda)q(z)\) for \(z\in\mathbb{C}\). But then \(p(T)-\mu=(T-\lambda)q(T)=q(T) (T-\lambda)\). Note that \(\lambda\in\sigma(T)\) implies that \((T-\lambda)\) is not injective or not surjective. But then the previous equalities imply that \(p(T)-\mu\) is not injective or not surjective, respectively. Thus \(\mu\in\sigma(p(T))\).
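For matrices, the spectral mapping theorem can be observed directly. The sketch below is an illustration under stated assumptions: the matrix `T`, the polynomial \(p(z)=z^2-3z+2\) and the helper names are hypothetical choices, and eigenvalues of a \(2\times 2\) matrix are obtained from the characteristic polynomial via the quadratic formula.

```python
import cmath

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def eigenvalues_2x2(m):
    """Roots of z^2 - trace*z + det for a 2x2 matrix."""
    (a, b), (c, d) = m
    half_trace = (a + d) / 2
    root = cmath.sqrt(half_trace ** 2 - (a * d - b * c))
    return [half_trace + root, half_trace - root]

def poly_of_matrix(coeffs, m):
    """Evaluate p(m) by Horner's scheme; coeffs run from highest degree down."""
    n = len(m)
    result = [[0.0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, m)
        for i in range(n):
            result[i][i] += c
    return result

def poly_of_scalar(coeffs, z):
    """Evaluate p(z) by Horner's scheme."""
    value = 0
    for c in coeffs:
        value = value * z + c
    return value

def sort_key(z):
    return (z.real, z.imag)

coeffs = [1.0, -3.0, 2.0]             # p(z) = z^2 - 3z + 2 (example choice)
T = [[1.0, 5.0], [0.0, 4.0]]          # example matrix with spectrum {1, 4}

spectrum_of_p_T = sorted(eigenvalues_2x2(poly_of_matrix(coeffs, T)), key=sort_key)
p_of_spectrum = sorted((poly_of_scalar(coeffs, z) for z in eigenvalues_2x2(T)),
                       key=sort_key)
print(spectrum_of_p_T, p_of_spectrum)
```

Both lists contain the values \(p(1)=0\) and \(p(4)=6\), in agreement with \(\sigma(p(T))=p(\sigma(T))\).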

[First resolvent identity] Let \(\lambda,\mu\in\rho(T)\). Then \[(T-\lambda)^{-1}-(T-\mu)^{-1}=(\lambda-\mu)(T-\lambda)^{-1}(T-\mu)^{-1}.\]

A direct calculation yields \[(T-\lambda)^{-1}-(T-\mu)^{-1}=(T-\lambda)^{-1}\left((T-\mu)-(T-\lambda)\right)(T-\mu)^{-1}=(\lambda-\mu)(T-\lambda)^{-1}(T-\mu)^{-1}.\]

[Second resolvent identity] Let \(\lambda\in\rho(T)\cap\rho(S)\). If \(\mathcal{D}(S)\subset\mathcal{D}(T)\), then \[(T-\lambda)^{-1}-(S-\lambda)^{-1}=(T-\lambda)^{-1}(S-T)(S-\lambda)^{-1}.\]

A direct calculation yields \[(T-\lambda)^{-1}-(S-\lambda)^{-1}=(T-\lambda)^{-1}\left((S-\lambda)-(T-\lambda)\right)(S-\lambda)^{-1}=(T-\lambda)^{-1}(S-T)(S-\lambda)^{-1}.\]
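Both resolvent identities are purely algebraic, so they can be sanity-checked on matrices. The following sketch assumes hypothetical \(2\times 2\) matrices `T` and `S` and sample points `lam`, `mu` outside their spectra; none of these values come from the notes.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_sub(a, b):
    n = len(a)
    return [[a[i][j] - b[i][j] for j in range(n)] for i in range(n)]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def inv_2x2(m):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def resolvent(t, lam):
    """(T - lam)^{-1} for a 2x2 matrix T and lam outside its spectrum."""
    return inv_2x2([[t[0][0] - lam, t[0][1]], [t[1][0], t[1][1] - lam]])

def max_abs_entry(a):
    return max(abs(v) for row in a for v in row)

T = [[2.0, 1.0], [0.0, 3.0]]          # example data, spectrum {2, 3}
S = [[1.0, -1.0], [2.0, 0.5]]         # example data, non-real eigenvalues
lam, mu = 1j, -1 + 0.5j               # sample points in the resolvent sets

# First identity: R_T(lam) - R_T(mu) = (lam - mu) R_T(lam) R_T(mu)
first_gap = max_abs_entry(mat_sub(
    mat_sub(resolvent(T, lam), resolvent(T, mu)),
    scale(lam - mu, mat_mul(resolvent(T, lam), resolvent(T, mu)))))

# Second identity: R_T(lam) - R_S(lam) = R_T(lam) (S - T) R_S(lam)
second_gap = max_abs_entry(mat_sub(
    mat_sub(resolvent(T, lam), resolvent(S, lam)),
    mat_mul(mat_mul(resolvent(T, lam), mat_sub(S, T)), resolvent(S, lam))))

print(first_gap, second_gap)
```

Both gaps are zero up to rounding error. In finite dimensions the domain condition \(\mathcal{D}(S)\subset\mathcal{D}(T)\) is automatic, so this check says nothing about the unbounded case.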

Next we prove an analogue of the Linear Algebra result \(\sigma(A)\neq\emptyset\) for an \(n\times n\) matrix \(A\) (for which the spectrum consists entirely of eigenvalues). Here, instead, we prove the result for bounded linear operators.

Let \(X\neq\{0\}\) and \(T\in B(X)\). Then \(\sigma(T)\neq\emptyset\).

Assume, for a contradiction, that \(\sigma(T)=\emptyset\), i.e. \(\rho(T)=\mathbb{C}\). Then the resolvent \((T-\lambda)^{-1}\in B(X)\) exists for all \(\lambda\in\mathbb{C}\). If \(|\lambda|>\|T\|\), then Theorem [thm:NS] (Neumann series) implies that \[\|(T-\lambda)^{-1}\|=\left\|-\frac{1}{\lambda}\left(I-\frac{1}{\lambda}T\right)^{-1}\right\|\leq \frac{1}{|\lambda|}\frac{1}{1-\|T\|/|\lambda|}=\frac{1}{|\lambda|-\|T\|}.\] We see that the right hand side converges to zero as \(|\lambda|\to \infty\). Moreover, \(\lambda\mapsto\|(T-\lambda)^{-1}\|\) is continuous and hence bounded on the compact disc \(\{\lambda\in\mathbb{C}:\,|\lambda|\leq\|T\|+1\}\). Therefore \(\sup_{\lambda\in\mathbb{C}}\|(T-\lambda)^{-1}\|<\infty\). Take an arbitrary bounded linear functional \(f\in X^*\) and an arbitrary \(x\in X\). Define \(F:\mathbb{C}\to\mathbb{C}\) by \(F(\lambda):=f((T-\lambda)^{-1}x)\). Then Corollary [cor:Taylor] implies that \(F\) is holomorphic (analytic) in all of \(\mathbb{C}\). In addition, since \(|F(\lambda)|\leq \|f\|\sup_{\lambda\in\mathbb{C}}\|(T-\lambda)^{-1}\|\|x\|<\infty\), the function is bounded in \(\mathbb{C}\). Now Liouville’s Theorem from Complex Analysis implies that \(F\) needs to be constant. However, since \(\lim_{|\lambda|\to\infty}F(\lambda)=0\), this constant is zero. Therefore \(F(\lambda)=f((T-\lambda)^{-1}x)=0\) for all \(f\in X^*\). By a Corollary of the Hahn–Banach Theorem (see Michaelmas term), \[\|y\|=\sup_{0\neq f\in X^*}\frac{|f(y)|}{\|f\|},\] thus if \(f(y)=0\) for all \(f\in X^*\), then \(y=0\). This implies that here \((T-\lambda)^{-1}x=0\), and this is true for all \(x\in X\). But \((T-\lambda)^{-1}\) is injective and \(X\neq\{0\}\), so this is impossible. This contradiction shows that the assumption \(\sigma(T)=\emptyset\) was wrong.

If \(T\) is unbounded, then \(\sigma(T)=\emptyset\) or \(\sigma(T)=\mathbb{C}\) are possible, see the following examples.

Let \(X=C([0,1])\), the space of continuous functions on \([0,1]\), equipped with the norm \(\|x\|_{\infty}=\sup_{t\in [0,1]}|x(t)|\). Consider the differential operator \(T\) on \(\mathcal{D}(T)=\{x\in C([0,1]):\,x'\in C([0,1])\}=C^1([0,1])\), acting as \((Tx)(t)=x'(t)\). Then \(\sigma(T)=\sigma_p(T)=\mathbb{C}\) since for every \(\lambda\in\mathbb{C}\), the equation \((T-\lambda)x=x'-\lambda x=0\) has the solution \(x(t)=C\mathrm{e}^{\lambda t}\) for a constant \(C\in\mathbb{C}\). Obviously this \(x\) is in \(\mathcal{D}(T)\), so it is an eigenfunction and thus \(\lambda\in\sigma_p(T)\).
Let \(X=\{x\in C([0,1]):\,x(0)=0\}\) (check that this is a Banach space equipped with \(\|\cdot\|_\infty\)). Consider the differential operator \(T\) on \(\mathcal{D}(T)=\{x\in C^1([0,1]):\,x(0)=0,\,x'(0)=0\}\), acting as \((Tx)(t)=x'(t)\). We show that \(\sigma(T)=\emptyset\): To prove injectivity of \((T-\lambda)\), we see that the solution of \((T-\lambda)x=0\) is \(x(t)=C \mathrm{e}^{\lambda t}\) (see part (1)), but the boundary condition \(x(0)=0\) implies \(x=0\). To prove surjectivity of \((T-\lambda)\), let \(f\in X\). Then \((T-\lambda)x=f\) is satisfied for \(x(t)=\mathrm{e}^{\lambda t}\int_0^t \mathrm{e}^{-\lambda s}f(s)\,\mathrm{d}s\), which is in \(\mathcal{D}(T)\).

For \(T\in B(X)\) the spectral radius of \(T\) is \(r(T):=\max\{|\lambda|:\,\lambda\in\sigma(T)\}\).

We know that the maximum is attained because \(\lambda\mapsto |\lambda|\) is continuous and \(\sigma(T)\) is compact for bounded operators.
Since \(\sigma(T)\subset\{\lambda\in\mathbb{C}:\,|\lambda|\leq\|T\|\}\) we have \(r(T)\leq \|T\|\).

There is a useful relation between the spectral radius and the norm of powers of the operator.

If \(T\in B(X)\), then \(r(T)=\lim_{n\to\infty}\|T^n\|^{1/n}\).

By Theorem [thm:mapping] (Spectral mapping theorem for polynomials), we have \(\sigma(T^n)=[\sigma(T)]^n\), which implies \(r(T^n)=r(T)^n\). Remark [rem:specrad] part (2) applied to \(T^n\) implies \(r(T^n)\leq \|T^n\|\). Altogether, \(r(T)=r(T^n)^{1/n}\leq \|T^n\|^{1/n}\) for all \(n\in\mathbb{N}\), hence \[r(T)\leq \liminf_{n\to\infty}\|T^n\|^{1/n}\leq \limsup_{n\to\infty}\|T^n\|^{1/n}.\] It remains to prove \(\limsup_{n\to\infty}\|T^n\|^{1/n}\leq r(T)\). To this end, recall that a complex power series \(\sum_{n=1}^\infty c_n z^n\) converges absolutely for \(|z|<R\) with radius of convergence given by \(R=(\limsup_{n\to\infty} |c_n|^{1/n})^{-1}\). The same is true for a sequence of bounded linear operators [this is too advanced to prove here]: If \(A_n\in B(X)\), \(n\in\mathbb{N}\), then \(\sum_{n=1}^\infty A_n z^n\) converges in norm (i.e. defines a limit in \(B(X)\)) for \(|z|<R\) with radius of convergence given by \(R=(\limsup_{n\to\infty} \|A_n\|^{1/n})^{-1}\). We know that \(\lambda\mapsto (T-\lambda)^{-1}\) is analytic in \(\rho(T)\) and thus in \(\{\lambda\in\mathbb{C}:\,|\lambda|>r(T)\}\subset\rho(T)\). Write \(z=1/\lambda\). Then \(z\mapsto (T-1/z)^{-1}\) is analytic in the open, connected set (a punctured disc) \[\Omega:=\left\{z\in\mathbb{C}:\,0<|z|<\frac{1}{r(T)}\right\}.\] Thus [again a bit advanced] it has a unique Laurent series in \(\Omega\) about the singularity \(z=0\): \[\forall z\in\Omega:\,(T-1/z)^{-1}=\sum_{j\in\mathbb{Z}}A_j z^j,\] for some \(A_j\in B(X)\), \(j\in\mathbb{Z}\). For \(|z|<1/\|T\|\) we have, by Theorem [thm:NS] (Neumann series), \[(T-1/z)^{-1}=-z(I-zT)^{-1}=-z\sum_{n=0}^\infty T^n z^n,\] which is a Taylor series (and thus a Laurent series). By the uniqueness of the Laurent series, we have \[\forall\,z\in\Omega:\,(T-1/z)^{-1}=-z\sum_{n=0}^\infty T^n z^n.\] The convergence radius of this series is \[R=\frac{1}{\limsup_{n\to\infty}\|T^n\|^{1/n}}.\] Because the series is convergent in \(\Omega\), we have \(1/r(T)\leq R\). 
Thus \(r(T)\geq 1/R=\limsup_{n\to\infty}\|T^n\|^{1/n}\).
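The formula \(r(T)=\lim_{n\to\infty}\|T^n\|^{1/n}\) is informative even for matrices whose norm is much larger than their spectral radius. The sketch below uses a hypothetical Jordan-type matrix with \(r(T)=0.5\) but row-sum norm \(1.5\); the matrix, the chosen exponents and the helper names are assumptions made for illustration only.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def op_norm_inf(a):
    """Operator norm induced by the max-norm: maximal absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def root_of_power_norm(t, n):
    """Return ||T^n||^(1/n) in the row-sum norm, for n >= 1."""
    power = t
    for _ in range(n - 1):
        power = mat_mul(power, t)
    return op_norm_inf(power) ** (1.0 / n)

# Upper triangular example: the only eigenvalue is 0.5, so r(T) = 0.5,
# while ||T|| = 1.5 in the row-sum norm.
T = [[0.5, 1.0], [0.0, 0.5]]
values = {n: root_of_power_norm(T, n) for n in (1, 10, 100, 200)}

for n, v in values.items():
    print(n, v)
```

For this matrix \(\|T^n\|=0.5^n(1+2n)\), so the printed values decrease from \(1.5\) towards \(0.5\) while never dropping below \(r(T)\), consistent with the inequality \(r(T)\leq\|T^n\|^{1/n}\) from the proof.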

1.2 Adjoint operators

In Linear Algebra, the adjoint of an \(n\times n\) matrix \(A=(a_{i,j})_{i,j=1}^n\) is its conjugate transpose matrix \(A^*=(\overline{a_{ji}})_{i,j=1}^n\). It satisfies \(\langle A x,y\rangle=\langle x, A^*y\rangle\) where here \(\mathbb{C}^n\) is equipped with the scalar product \(\langle x,y\rangle=\sum_{i=1}^n x_i\overline{y_i}\). Now we want to introduce the adjoint of a linear operator between two Hilbert spaces. To this end, let \((H_1,\langle\cdot,\cdot\rangle_1)\), \((H_2,\langle\cdot,\cdot\rangle_2)\) be two Hilbert spaces.

Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) be densely defined, i.e. \(\overline{\mathcal{D}(T)}=H_1\). Define \[\mathcal{D}(T^*):=\{y\in H_2:\,f_y(x):=\langle Tx,y\rangle_2 \text{ defines a bounded (=continuous) linear functional on }\mathcal{D}(T)\}.\] Since \(\overline{\mathcal{D}(T)}=H_1\), \(f_y\) can be extended uniquely to a bounded linear functional on \(H_1\), i.e. \(f_y\in H_1^*\). By Riesz’s Representation Theorem (see Michaelmas term), there exists a unique \(y^*\in H_1\) such that \(f_y(x)=\langle x,y^*\rangle_1\) for all \(x\in H_1\). Define \(T^*:\mathcal{D}(T^*)\subset H_2\to H_1\) by \(T^*y:=y^*\); this operator is called the (Hilbert space) adjoint operator of \(T\).

It is easy to check that \(T^*\) defines a linear operator.
We have \(\langle Tx,y\rangle_2=\langle x,T^*y\rangle_1\) for all \(x\in\mathcal{D}(T)\) and \(y\in\mathcal{D}(T^*)\). For a fixed \(y\), one finds \(T^*y\) by solving \(\langle Tx,y\rangle_2=\langle x,T^*y\rangle_1\) for all \(x\in\mathcal{D}(T)\).

If \(T\in B(H_1,H_2)\), then \(T^*\in B(H_2,H_1)\) and \(\|T^*\|=\|T\|\).

First we show that \(\mathcal{D}(T^*)=H_2\). Indeed, for every \(y\in H_2\), \(f_y(x):=\langle Tx,y\rangle_2\) defines a bounded linear functional on \(H_1\), since \(|f_y(x)|\leq \|T x\|_2 \|y\|_2\leq \|T\| \|y\|_2 \|x\|_1\) for all \(x\in H_1\), whence \(\|f_y\|\leq \|T\| \|y\|_2\). Note that by Riesz’s Representation Theorem (see Michaelmas term), \(T^*y=y^*\) satisfies \(\|T^*y\|_1=\|f_y\|\), thus the previous inequality implies \(\|T^*y\|_1\leq \|T\| \|y\|_2\). Therefore we obtain \(\|T^*\|\leq \|T\|\).
To prove the reverse inequality (and thus equality), let \(x\in H_1\), \(x\neq 0\). By a Corollary to the Hahn–Banach Theorem (see Michaelmas term), there exists a bounded linear functional \(f\in H_2^*\) with \(\|f\|=1\) and \(f(Tx)=\|Tx\|_2\). By Riesz’s Representation Theorem (see Michaelmas term), there exists \(y\in H_2\) with \(\|y\|_2=1\) and \(f(u)=\langle u,y\rangle_2\) for all \(u\in H_2\). In particular for \(u=Tx\), we obtain \[\|Tx\|_2=|f(Tx)|=|\langle Tx,y\rangle_2|=|\langle x,T^*y\rangle_1|\leq \|x\|_1 \|T^*\| \|y\|_2.\] With \(\|y\|_2=1\) we arrive at \(\|T\|\leq \|T^*\|\).

Let \(H_1=H_2=L^2([0,1])\), and let \(k\in L^{\infty}([0,1]\times [0,1])\). Define the bounded linear operator \(T:L^2([0,1])\to L^2([0,1])\) by \[(Tx)(t):=\int_0^1 k(t,s) x(s)\,\mathrm{d}s, \quad t\in [0,1].\] Let \(y\in L^2([0,1])\). Then \[\langle Tx,y\rangle=\int_0^1\left(\int_0^1 k(t,s) x(s)\,\mathrm{d}s\right) \overline{y(t)}\,\mathrm{d}t =\int_0^1 x(s) \overline{\left(\int_0^1 \overline{k(t,s)} y(t)\,\mathrm{d}t\right)}\,\mathrm{d}s,\] where we changed the order of integration in the last equality. The right hand side shall be equal to \(\langle x,T^*y\rangle=\int_0^1 x(s) \overline{(T^* y)(s)}\,\mathrm{d}s\), so we can read off that \[(T^*y)(s)=\int_0^1 \overline{k(t,s)} y(t)\,\mathrm{d}t, \quad s\in [0,1].\]
In \(H_1=H_2=\ell_2(\mathbb{N})\) consider the right-shift operator \(S\), see Example [ex:shift]. It acts as \(Sx=S(x_1,x_2,x_3,\dots):=(0,x_1,x_2,x_3,\dots)\). Let \(y\in \ell_2(\mathbb{N})\). Then \[\langle Sx,y\rangle=\sum_{i=1}^\infty x_i \overline{y_{i+1}},\] which needs to be equal to \(\langle x,S^*y\rangle\) for all \(x\in \ell_2\). Taking \(x=e_j\), we see that \(y_{j+1}=(S^*y)_j\) for every \(j\in\mathbb{N}\), hence \(S^*y=S^*(y_1,y_2,y_3,\dots)=(y_2,y_3,\dots)\). The operator \(S^*\) is the so-called left-shift operator in \(\ell_2(\mathbb{N})\).
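The adjoint relation for the shift operators can be tested on finitely supported sequences. In the following sketch, sequences are represented as Python lists padded with zeros; the sample vectors `x`, `y` and the function names are hypothetical and serve only as an illustration.

```python
def right_shift(x):
    """S(x_1, x_2, ...) = (0, x_1, x_2, ...)."""
    return [0] + list(x)

def left_shift(y):
    """S*(y_1, y_2, y_3, ...) = (y_2, y_3, ...)."""
    return list(y[1:])

def inner(x, y):
    """<x, y> = sum_i x_i * conj(y_i); the shorter list is padded with zeros."""
    n = max(len(x), len(y))
    xs = list(x) + [0] * (n - len(x))
    ys = list(y) + [0] * (n - len(y))
    return sum(a * complex(b).conjugate() for a, b in zip(xs, ys))

x = [1 + 2j, -1j, 3]                  # example sequences with finite support
y = [2, 1 - 1j, 4j, -5]

lhs = inner(right_shift(x), y)        # <Sx, y>
rhs = inner(x, left_shift(y))         # <x, S*y>
print(lhs, rhs)

# S*S is the identity, but SS* deletes the first entry:
print(left_shift(right_shift(x)) == x, right_shift(left_shift(y)) == y)
```

The two inner products agree. The last line illustrates that \(S^*S=I\) while \(SS^*\neq I\), which matches the earlier observation that \(S\) is injective but not surjective.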

Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) and \(\tilde T:\mathcal{D}(\tilde T)\subset H_1\to H_2\) be two linear operators. If \(\mathcal{D}(T)\subset \mathcal{D}(\tilde T)\) and \(Tx=\tilde Tx\) for all \(x\in\mathcal{D}(T)\), then \(\tilde T\) is an extension of \(T\), and \(T\) is a restriction of \(\tilde T\). We write \(T\subset \tilde T\).
A linear operator \(T:\mathcal{D}(T)\subset H_1\to H_2\) is closable if its graph \(G(T)=\{(x,Tx)\in H_1\times H_2:\,x\in\mathcal{D}(T)\}\) satisfies that its closure \(\overline{G(T)}\subset H_1\times H_2\) is the graph of an extension of \(T\). This extension is called the closure of \(T\), denoted by \(\overline{T}\); so we have \(\overline{G(T)}=G(\overline{T})\).

The proof of the following result is quite technical so we omit it.

Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) be densely defined (i.e. \(\overline{\mathcal{D}(T)}=H_1\)).

The adjoint operator \(T^*\) is closed.
If \(T\) is closable, then \(T^*\) is densely defined (i.e. \(\overline{\mathcal{D}(T^*)}=H_2\)), and \(T^{**}=\overline{T}\). In particular, if \(T\) is closed, then \(T^{**}=T\).

Let \(H_1,H_2,H_3\) be Hilbert spaces.

Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) and \(\tilde T:\mathcal{D}(\tilde T)\subset H_1\to H_2\) be two densely defined linear operators. If \(T\subset \tilde T\), then \(\tilde{T}^*\subset T^*\).
Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) and \(S:\mathcal{D}(S)\subset H_1\to H_2\) be two densely defined linear operators. If \(T+S\) is densely defined (i.e. \(\overline{\mathcal{D}(T)\cap\mathcal{D}(S)}=H_1\)), then \(T^*+S^*\subset (T+S)^*\).
Let \(T:\mathcal{D}(T)\subset H_1\to H_2\) and \(S:\mathcal{D}(S)\subset H_2\to H_3\) be two densely defined linear operators. If \(ST\) is densely defined (i.e. \(\overline{\{x\in\mathcal{D}(T):\,Tx\in\mathcal{D}(S)\}}=H_1\)), then \(T^*S^*\subset (ST)^*\).

If \(S\) is bounded, then we have equality in (2) and (3).

This follows from the definitions.
Let \(y\in\mathcal{D}(T^*+S^*)=\mathcal{D}(T^*)\cap\mathcal{D}(S^*)\). Then \[f(x):=\langle (T+S)x,y\rangle=\langle Tx,y\rangle+\langle Sx,y\rangle=\langle x, (T^*+S^*)y\rangle\] defines a bounded linear functional on \(\mathcal{D}(T+S)=\mathcal{D}(T)\cap\mathcal{D}(S)\). Thus \(y\in\mathcal{D}((T+S)^*)\), and \((T+S)^*y=T^*y+S^*y\).
Let \(y\in\mathcal{D}(T^*S^*)=\{y\in\mathcal{D}(S^*):\,S^*y\in\mathcal{D}(T^*)\}\). Then \[f(x):=\langle STx,y\rangle=\langle Tx,S^*y\rangle=\langle x, T^*S^*y\rangle\] defines a bounded linear functional on \(\mathcal{D}(ST)\). Thus \(y\in\mathcal{D}((ST)^*)\), and \((ST)^*y=T^*S^*y\).

If \(S\) is bounded and densely defined, its closure \(\overline{S}\) is defined in the whole Hilbert space, and then part (1) and Proposition [prop:adjbdd] imply that \(S^*=\overline{S}^*\) is defined in the whole Hilbert space. In this case, in (2) we have \(\mathcal{D}((T+S)^*)=\mathcal{D}(T^*)=\mathcal{D}(T^*+S^*)\), and in (3) we have \(\mathcal{D}((ST)^*)=\{y\in H_3:\,S^*y\in\mathcal{D}(T^*)\}=\mathcal{D}(T^*S^*)\).

Let \(T\in C(H_1,H_2)\) be densely defined.

\(T\) is boundedly invertible if and only if \(T^*\) is boundedly invertible; in this case, \((T^*)^{-1}=(T^{-1})^*\).
If \(H_1=H_2\), then \(\sigma(T^*)=\{\overline{\lambda}:\,\lambda\in\sigma(T)\}\).

Assume that \(T\) is boundedly invertible. With \[(T^{-1})^*T^*\subset (TT^{-1})^*= I_{H_2}, \quad T^*(T^{-1})^*=(T^{-1}T)^*=I_{H_1},\] we obtain that \(T^*\) is bijective, hence boundedly invertible, with \((T^*)^{-1}=(T^{-1})^*\). Vice versa, if \(T^*\) is boundedly invertible, exchange the roles of \(T\) and \(T^*\) in the previous proof and use that \(T^{**}=T\) since \(T\) is closed.
This follows from part (1), which implies \(((T-\lambda)^{-1})^*=(T^*-\overline{\lambda})^{-1}\) for \(\lambda\in\rho(T)\).

Let \(T\in C(H_1,H_2)\) be densely defined. Then

\(\mathcal{R}(T)^{\perp}=\mathcal{N}(T^*)\);
\(\overline{\mathcal{R}(T)}=\mathcal{N}(T^*)^\perp\);
\(\mathcal{R}(T^*)^{\perp}= \mathcal{N}(T)\);
\(\overline{\mathcal{R}(T^*)}=\mathcal{N}(T)^\perp\).

It suffices to prove (1); then one obtains (2) by taking orthogonal complements on both sides of (1) and using \(M^{\perp\perp}=\overline{M}\), and (3), (4) are analogous to (1), (2) with \(T,T^*\) replaced by \(T^*, T^{**}=T\).
To prove (1), note that \(y\in\mathcal{N}(T^*)\) if and only if \(y\in\mathcal{D}(T^*)\) and \(T^*y=0\), which is equivalent to \(\langle Tx,y\rangle =\langle x, T^* y\rangle =0\) for all \(x\in\mathcal{D}(T)\), but this just means that \(y\in \mathcal{R}(T)^\perp\).

1.3 Symmetric and selfadjoint operators

In the following we consider a Hilbert space \((H,\langle\cdot,\cdot\rangle)\) and a densely defined linear operator \(T:\mathcal{D}(T)\subset H\to H\).

The operator \(T\) is called

symmetric if \(T\subset T^*\);
selfadjoint if \(T=T^*\);
essentially selfadjoint if \(\overline{T}\) is selfadjoint.

[Hellinger–Toeplitz] Let \(T\) be symmetric with \(\mathcal{D}(T)=H\). Then \(T\) is selfadjoint and bounded.

By the assumptions, \(T\subset T^*\). Since \(\mathcal{D}(T)=H\) we have \(H=\mathcal{D}(T)\subset\mathcal{D}(T^*)\subset H\), so the sets have to be identical, which implies \(T=T^*\). By Theorem [thm:adjclosed] part (1), each adjoint operator is closed, and hence so is \(T=T^*\). Together with the Closed Graph Theorem and \(\mathcal{D}(T)=H\), we conclude that \(T\) is bounded.

The following are equivalent:

\(T\) is symmetric;
\(\langle Tx,y\rangle=\langle x,Ty\rangle\) for all \(x,y\in\mathcal{D}(T)\);
\(\langle Tx,x\rangle\in\mathbb{R}\) for all \(x\in\mathcal{D}(T)\).

(1) \(\Longrightarrow\) (2): This follows immediately from the definitions of adjoint operator and symmetric operator.
(2) \(\Longrightarrow\) (1): Show that \(T\subset T^*\). To this end, let \(y\in\mathcal{D}(T)\). Then the functional \(f(x):=\langle Tx,y\rangle=\langle x,Ty\rangle\), \(x\in\mathcal{D}(T)\), is bounded since \(|f(x)|\leq \|Ty\| \|x\|\). Thus \(y\in\mathcal{D}(T^*)\). In addition, we read off that \(T^*y=Ty\).
(2) \(\Longrightarrow\) (3): From \(\langle Tx,x\rangle=\langle x,Tx\rangle=\overline{\langle Tx,x\rangle}\) we conclude that \(\langle Tx,x\rangle\in\mathbb{R}\).
(3) \(\Longrightarrow\) (2): Let \(x,y\in\mathcal{D}(T)\), and let \(\alpha\in\mathbb{C}\). Since the real numbers \(\langle T (x+\alpha y),(x+\alpha y)\rangle\), \(\langle Tx,x\rangle\), \(\langle Ty,y\rangle\) are equal to their complex conjugates, we have \[\begin{aligned} & \langle Tx,x\rangle+\alpha\langle Ty,x\rangle+\overline{\alpha}\langle Tx,y\rangle +|\alpha|^2\langle Ty,y\rangle =\langle T (x+\alpha y),(x+\alpha y)\rangle\\ &=\overline{\langle T (x+\alpha y),(x+\alpha y)\rangle}=\langle (x+\alpha y),T(x+\alpha y)\rangle\\ &=\langle x,Tx\rangle+\alpha\langle y,Tx\rangle+\overline{\alpha}\langle x,Ty\rangle +|\alpha|^2\langle y,Ty\rangle\\ &=\langle Tx,x\rangle+\alpha\langle y,Tx\rangle+\overline{\alpha}\langle x,Ty\rangle +|\alpha|^2\langle Ty,y\rangle.\end{aligned}\] Comparing the first and last lines, we obtain \[C(\alpha):=\alpha\langle Ty,x\rangle+\overline{\alpha}\langle Tx,y\rangle =\alpha\langle y,Tx\rangle+\overline{\alpha}\langle x,Ty\rangle=:\tilde C(\alpha).\] Thus \(C(1)+C(\mathrm{i})/\mathrm{i}=\tilde C(1)+\tilde C(\mathrm{i})/\mathrm{i}\), i.e. \(\langle Ty,x\rangle=\langle y,Tx\rangle\).

In \(H=L^2([0,1])\) let \(\mathcal{D}(T)=\{x\in C^2([0,1]):\,x(0)=x(1)=0\}\) and \(Tx=-x''\). Then, for \(x\in\mathcal{D}(T)\), \[\langle Tx,x\rangle=\int_0^1(-x''(t))\overline{x(t)}\,\mathrm{d}t=[-x'(t)\overline{x(t)}]_{t=0}^{t=1}+\int_0^1 |x'(t)|^2\,\mathrm{d}t.\] The boundary conditions \(x(0)=x(1)=0\) imply that \([-x'(t)\overline{x(t)}]_{t=0}^{t=1}=0\), thus \(\langle Tx,x\rangle\in\mathbb{R}\). Thus \(T\) is symmetric.

Let \(T\) be symmetric. Then

\(\sigma_p(T)\subset\mathbb{R}\);
For any \(\lambda\in\sigma_p(T)\), the geometric and algebraic eigenspaces agree, \(\mathcal{N}_\lambda(T)=\mathcal L_{\lambda}(T)\);
For any eigenvalues \(\lambda,\mu\in\sigma_p(T)\) with \(\lambda\neq\mu\), the geometric eigenspaces are orthogonal, \(\mathcal{N}_\lambda(T)\perp \mathcal{N}_\mu(T)\).

For any \(\lambda\in\sigma_p(T)\) and corresponding eigenvector \(0\neq x\in\mathcal{D}(T)\): \(Tx=\lambda x\). Thus \(\lambda\|x\|^2=\langle Tx,x\rangle\in\mathbb{R}\) since \(T\) is symmetric.
In general, \(\mathcal{N}_{\lambda}(T)\subset\mathcal L_\lambda(T)\), so it suffices to prove the reverse inclusion. Assume, for a contradiction, that there exists \(x\in\mathcal L_\lambda(T)\backslash\mathcal{N}_\lambda(T)\). Then there exists \(n\geq 2\) such that \((T-\lambda)^nx=0\) and \((T-\lambda)^{n-1}x\neq 0\). But then \[\|(T-\lambda)^{n-1}x\|^2=\langle (T-\lambda)^{n-1}x,(T-\lambda)^{n-1}x\rangle =\langle (T-\lambda)^{n}x,(T-\lambda)^{n-2}x\rangle=0,\] where we used that \(\langle u, (T-\lambda)v\rangle=\langle u,Tv\rangle-\lambda\langle u,v\rangle=\langle Tu,v\rangle-\lambda\langle u,v\rangle=\langle (T-\lambda)u,v\rangle\) since \(T\) is symmetric and \(\lambda\in\mathbb{R}\) by part (1). But now \(\|(T-\lambda)^{n-1}x\|^2=0\) implies \((T-\lambda)^{n-1}x=0\), a contradiction.
Without loss of generality assume that \(\lambda\neq 0\) (otherwise assume \(\mu\neq 0\) and proceed analogously). With eigenvectors \(x\in\mathcal{N}_\lambda(T)\), \(y\in\mathcal{N}_\mu(T)\) we have \(Tx=\lambda x\), \(Ty=\mu y\), thus, using \(\mu\in\mathbb{R}\) by part (1), \[\langle x,y\rangle=\frac{1}{\lambda}\langle Tx,y\rangle=\frac{1}{\lambda} \langle x,Ty\rangle=\frac{\mu}{\lambda}\langle x,y\rangle.\] This implies \(1=\mu/\lambda\) (impossible since \(\lambda\neq\mu\)) or \(\langle x,y\rangle=0\). Thus \(x\perp y\).

Let \(T\) be symmetric and let \(\lambda\in\mathbb{C}\backslash\mathbb{R}\). Then

\(\|(T-\lambda)x\|\geq |\mathrm{Im}(\lambda)|\|x\|\) for all \(x\in\mathcal{D}(T)\);
If \(T\in C(H)\), then \(\mathcal{R}(T-\lambda)\) is closed.

Let \(x\in\mathcal{D}(T)\). Then \[|\mathrm{Im}\langle (T-\lambda)x,x\rangle|\leq |\langle (T-\lambda)x,x\rangle|\leq \|(T-\lambda)x\| \|x\|\] by the Cauchy-Schwarz inequality. On the other hand, \[\mathrm{Im}\langle (T-\lambda) x,x\rangle=\mathrm{Im}\left(\langle Tx,x\rangle-\lambda\|x\|^2\right)=-\mathrm{Im}(\lambda)\|x\|^2\] since \(\langle Tx,x\rangle\in\mathbb{R}\). Thus \(|\mathrm{Im}(\lambda)|\|x\|^2\leq \|(T-\lambda)x\| \|x\|\) which implies the claim.
Since \(\sigma_p(T)\subset\mathbb{R}\) and \(\lambda\notin\mathbb{R}\), the operator \((T-\lambda):\mathcal{D}(T)\to\mathcal{R}(T-\lambda)\) is bijective, and its inverse \((T-\lambda)^{-1}:\mathcal{R}(T-\lambda)\to\mathcal{D}(T)\) is bounded since by part (1), \[\sup_{0\neq y\in\mathcal{R}(T-\lambda)}\frac{\|(T-\lambda)^{-1}y\|}{\|y\|}=\sup_{0\neq x\in\mathcal{D}(T)}\frac{\|x\|}{\|(T-\lambda)x\|}\leq \frac{1}{|\mathrm{Im}(\lambda)|}<\infty.\] Since \(T\) is assumed to be closed, so is \(T-\lambda\) and hence also \((T-\lambda)^{-1}\). By the Closed Graph Theorem a closed and bounded operator has closed domain, which is here \(\mathcal{D}((T-\lambda)^{-1})=\mathcal{R}(T-\lambda)\).
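The lower bound in part (1) can be checked numerically in the finite-dimensional case. The following is a minimal sketch, assuming \(H=\mathbb{C}^2\) and a Hermitian matrix `A` and a spectral parameter `lam` chosen purely for illustration (neither is taken from the text). Since \(1\) is an eigenvalue of `A` and `lam` has real part \(1\), the bound \(|\mathrm{Im}(\lambda)|\) is attained on the corresponding eigenvector.

```python
import math
import random

def apply(A, x):
    """Matrix-vector product for a square matrix given as nested lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm(x):
    """Euclidean norm of a complex vector."""
    return math.sqrt(sum(abs(c) ** 2 for c in x))

# Hypothetical Hermitian matrix (equal to its conjugate transpose);
# its eigenvalues are 1 and 4.
A = [[2.0, 1 - 1j],
     [1 + 1j, 3.0]]
lam = 1.0 + 0.25j  # non-real spectral parameter, Im(lam) = 0.25

def ratio(x):
    """Return ||(A - lam) x|| / ||x|| for a non-zero vector x."""
    Ax = apply(A, x)
    residual = [Ax[i] - lam * x[i] for i in range(len(x))]
    return norm(residual) / norm(x)

# Sample random vectors: every ratio should be at least |Im(lam)|.
random.seed(0)
samples = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2)]
           for _ in range(1000)]
min_ratio = min(ratio(x) for x in samples)

# Eigenvector of A for the eigenvalue 1: here the bound is attained.
eig_ratio = ratio([-1 + 1j, 1 + 0j])
```

Random sampling only illustrates the inequality; it does not replace the proof above.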

Let \(T\) be symmetric. Then the following are equivalent:

\(T\) is selfadjoint;
\(\mathcal{R}(T-\lambda)=H\) for all \(\lambda\in\mathbb{C}\backslash\mathbb{R}\);
there exist \(\lambda_{\pm}\in\mathbb{C}\) with \(\mathrm{Im}(\lambda_+)> 0\), \(\mathrm{Im}(\lambda_-)< 0\) and \(\mathcal{R}(T-\lambda_\pm)=H\);
\(\sigma(T)\subset\mathbb{R}\).

(1) \(\Longrightarrow\) (2): Assume there exists \(\lambda\in\mathbb{C}\backslash\mathbb{R}\) with \(\mathcal{R}(T-\lambda)\neq H\). Since \(T=T^*\) is closed, Theorem [thm:symmclosed] part (2) implies that \(\mathcal{R}(T-\lambda)\) is closed. By Theorem [thm:kerran] and the assumption \(T^*=T\), \(\mathcal{N}(T-\overline{\lambda})=\mathcal{R}(T-\lambda)^\perp \neq \{0\}\). Therefore \(\overline{\lambda}\in\sigma_p(T)\), which is a contradiction to \(\sigma_p(T)\subset\mathbb{R}\).
(2) \(\Longrightarrow\) (1): Since we already know that \(T\subset T^*\), it suffices to show that \(\mathcal{D}(T^*)\subset\mathcal{D}(T)\). Let \(\lambda\in\mathbb{C}\backslash\mathbb{R}\). Then \(\lambda\notin\sigma_p(T)\), so \(T-\lambda:\mathcal{D}(T)\to H\) is injective, and by assumption (2) also surjective. This implies \(\lambda,\overline{\lambda}\in\rho(T)\). Now let \(y\in\mathcal{D}(T^*)\) and define \(x:=(T-\lambda)^{-1}(T^*-\lambda)y\in\mathcal{D}(T)\subset\mathcal{D}(T^*)\). With \[(T^*-\lambda)(y-x)=\underbrace{(T^*-\lambda)y}_{=(T-\lambda)x}-\underbrace{(T^*-\lambda)x}_{=(T-\lambda)x}=0,\] we obtain \(y -x\in\mathcal{N}(T^*-\lambda)=\mathcal{R}(T-\overline{\lambda})^\perp=\{0\}\), thus \(y=x\in\mathcal{D}(T)\).
(2) \(\Longrightarrow\) (3): This is trivially satisfied.
(3) \(\Longrightarrow\) (4): We already know that \(\sigma_p(T)\subset\mathbb{R}\). Thus \(T-\lambda_\pm:\mathcal{D}(T)\to H\) is injective, and also surjective by (3); thus \(\lambda_{\pm}\in\rho(T)\). From Corollary [cor:Taylor] we know that if \(\lambda_0\in\rho(T)\), then \(\{\lambda\in\mathbb{C}:\,|\lambda-\lambda_0|< \|(T-\lambda_0)^{-1}\|^{-1}\}\subset\rho(T)\). Note that here, analogously as in the proof of Theorem [thm:symmclosed] part (2), we know that every non-real \(\lambda_0\in\rho(T)\) satisfies \(\|(T-\lambda_0)^{-1}\|\leq 1/|\mathrm{Im}(\lambda_0)|\), which implies \(\{\lambda\in\mathbb{C}:\,|\lambda-\lambda_0|< |\mathrm{Im}(\lambda_0)|\}\subset\rho(T)\). Apply this to \(\lambda_0=\lambda_\pm\) and iterate to obtain that all of \(\mathbb{C}\backslash\mathbb{R}\) is in \(\rho(T)\).
(4) \(\Longrightarrow\) (2): This is clear.

Let \(T\) be selfadjoint. Then \(\sigma_r(T)=\emptyset\).

Let \(\lambda\in\sigma_r(T)\). Then \(\lambda\in\sigma(T)\subset\mathbb{R}\). But then Theorem [thm:kerran] implies \[\mathcal{N}(T-\lambda)=\mathcal{R}(T-\lambda)^\perp=\overline{\mathcal{R}(T-\lambda)}^\perp\neq\{0\}.\] But then \(\lambda\in\sigma_p(T)\), which is a contradiction to \(\sigma_p(T)\cap\sigma_r(T)=\emptyset\). Thus \(\sigma_r(T)=\emptyset\).

In the proof of the following result we use the following consequence of the Hahn–Banach Theorem (see Michaelmas term) and Riesz’s Representation Theorem (see Michaelmas term): If \(T\in B(H)\), then \[\begin{equation} \label{eq:HB} \|T\|=\sup_{x,y\in H \atop \|x\|=\|y\|=1} |\langle Tx,y\rangle|. \end{equation}\]

If \(T\in B(H)\) is selfadjoint, then \(\|T\|=\sup\limits_{x\in H \atop \|x\|=1}|\langle Tx,x\rangle|\).

Let \(M:=\sup\limits_{x\in H \atop \|x\|=1}|\langle Tx,x\rangle|\). Then \(M\leq \sup\limits_{x,y\in H \atop \|x\|=\|y\|=1} |\langle Tx,y\rangle|=\|T\|\) by \(\eqref{eq:HB}\). It remains to prove \(\|T\|\leq M\). For two arbitrary \(x,y\in H\) with \(\|x\|=\|y\|=1\) choose \(\theta(x,y)\in [0,2\pi)\) such that \(\langle Tx,y\rangle=\mathrm{e}^{\mathrm{i}\theta(x,y)}|\langle Tx,y\rangle|\). Then \[|\langle Tx,y\rangle|=\mathrm{e}^{-\mathrm{i}\theta(x,y)}\langle Tx,y\rangle=\langle T(\mathrm{e}^{-\mathrm{i}\theta(x,y)}x),y\rangle.\] Taking real parts on both sides, we get \(|\langle Tx,y\rangle|= \mathrm{Re}\langle T(\mathrm{e}^{-\mathrm{i}\theta(x,y)}x),y\rangle\). Note that \(\|\mathrm{e}^{-\mathrm{i}\theta(x,y)}x\|=\|x\|=1\). It suffices to prove that, for any \(w,y\in H\) with \(\|w\|=\|y\|=1\) we have \(\mathrm{Re}\langle Tw,y\rangle\leq M\); then, by \(\eqref{eq:HB}\), \[\|T\|=\sup_{x,y\in H \atop \|x\|=\|y\|=1} |\langle Tx,y\rangle|= \sup_{x,y\in H \atop \|x\|=\|y\|=1} \mathrm{Re}\langle T(\mathrm{e}^{-\mathrm{i}\theta(x,y)}x),y\rangle \leq M.\] Let \(w,y\in H\) with \(\|w\|=\|y\|=1\). First note that, for every \(u\in H\), we have \[\begin{equation} \label{eq:M} |\langle Tu,u\rangle|\leq M\|u\|^2 \end{equation}\] (since, for \(u\neq 0\), the normalised element \(x=u/\|u\|\) satisfies \(|\langle Tx,x\rangle|\leq M\)).
Thus, with \(\langle Ty,w\rangle=\langle y,Tw\rangle\), \[\langle T(w\pm y),w\pm y\rangle=\langle Tw,w\rangle+\langle Ty,y\rangle\pm\underbrace{\left(\langle Tw,y\rangle+\langle y,Tw\rangle\right)}_{=2\mathrm{Re}\langle Tw,y\rangle}.\] This implies \[\langle T(w+y),w+y\rangle-\langle T(w-y),w-y\rangle=4\mathrm{Re}\langle Tw,y\rangle,\] whence, with \(\eqref{eq:M}\), \[\begin{aligned} 4 \mathrm{Re}\langle Tw,y\rangle&\leq |\langle T(w+y),w+y\rangle|+|\langle T(w-y),w-y\rangle| \\ &\leq M\left(\|w+y\|^2+\|w-y\|^2\right)=2M(\|w\|^2+\|y\|^2)=4M.\end{aligned}\] Therefore we arrive at the claimed inequality \(\mathrm{Re}\langle Tw,y\rangle\leq M\), which concludes the proof.
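The selfadjointness assumption cannot be dropped. As a small worked example (the matrix is chosen here purely for illustration), consider in \(H=\mathbb{C}^2\) the non-selfadjoint operator \[T=\begin{pmatrix}0&1\\0&0\end{pmatrix}, \quad Tx=(x_2,0), \quad \langle Tx,x\rangle=x_2\overline{x_1}.\] Since \(\|Tx\|=|x_2|\leq\|x\|\) with equality for \(x=(0,1)\), we have \(\|T\|=1\). On the other hand, for \(\|x\|=1\), \[|\langle Tx,x\rangle|=|x_1|\,|x_2|\leq \frac{1}{2}\left(|x_1|^2+|x_2|^2\right)=\frac{1}{2},\] with equality for \(x=(1,1)/\sqrt{2}\). Hence \(\sup_{\|x\|=1}|\langle Tx,x\rangle|=\frac{1}{2}<1=\|T\|\).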

1.4 Compact operators

We recall the definition of a compact set in a metric space.

Let \((X,d)\) be a metric space. A subset \(K\subset X\) is called

compact if every open cover of \(K\) (i.e. \(K\subset \cup_{i\in I}U_i\) for an index set \(I\) and open sets \(U_i\subset X\)) has a finite subcover (i.e. \(K\subset \cup_{j=1}^n U_{i_j}\) for some \(i_1,i_2,\dots,i_n\in I\)).
relatively compact if the closure \(\overline{K}\) is compact.

Equivalently, \(K\) is compact if every sequence \((x_n)_{n=1}^\infty\subset K\) has a convergent subsequence with limit in \(K\); this is called sequentially compact.
A (relatively) compact set \(K\) is bounded, i.e. \(\sup_{x\in K}d(x,x_0)<\infty\) for some (and hence every) \(x_0\in X\).
In a Banach space \((X,\|\cdot\|)\), the closed unit ball \(\{x\in X:\,\|x\|\leq 1\}\) is compact if and only if \(\dim(X)<\infty\); this was shown in Michaelmas term.

In the following, \(X,Y,Z\) are Banach spaces.

A linear operator \(T:X\to Y\) is called compact if for every bounded \(M\subset X\) the set \(T(M)\subset Y\) is relatively compact.

A linear operator \(T:X\to Y\) is compact if and only if for every bounded sequence \((x_n)_{n\in\mathbb{N}}\subset X\) (i.e. \(\sup_{n\in\mathbb{N}}\|x_n\|<\infty\)), the image sequence \((Tx_n)_{n\in\mathbb{N}}\subset Y\) has a convergent subsequence.
A compact linear operator \(T:X\to Y\) is bounded.
The identity operator \(I:X\to X\) is compact if and only if \(\dim(X)<\infty\).
If \(T_1,T_2:X\to Y\) are compact and \(\alpha,\beta\in\mathbb{C}\), then \(\alpha T_1+\beta T_2\) is compact.
If \(T_1: X\to Y\) and \(T_2:Y\to Z\) are bounded and (at least) one of them is compact, then \(T_2T_1:X\to Z\) is compact.
If \(T\in B(X,Y)\) satisfies \(\dim(\mathcal{R}(T))<\infty\), i.e. has finite rank, then \(T\) is compact.

First assume that \(T\) is compact and let \((x_n)_{n=1}^\infty\subset X\) be bounded. Then \(\{x_n:\,n\in\mathbb{N}\}\) is bounded and hence \(\overline{\{Tx_n:\,n\in\mathbb{N}\}}\subset Y\) is compact. It follows from Remark [rem:comp] part (1) that \((Tx_n)_{n=1}^\infty\subset Y\) has a convergent subsequence.
Conversely, assume that for every bounded sequence \((x_n)_{n\in\mathbb{N}}\subset X\), \((Tx_n)_{n\in\mathbb{N}}\subset Y\) has a convergent subsequence. Let \(M\subset X\) be bounded and take a sequence \((y_n)_{n\in\mathbb{N}}\subset\overline{T(M)}\). By definition of the closure, for every \(n\in\mathbb{N}\) there exists \(x_n\in M\) with \(\|Tx_n-y_n\|<1/n\). By assumption, \((Tx_n)_{n\in\mathbb{N}}\subset Y\) has a convergent subsequence, \(Tx_{n_k}\to y\), \(k\to\infty\), for some \(y\in Y\). Then \[\|y-y_{n_k}\|\leq \|y-Tx_{n_k}\|+\underbrace{\|Tx_{n_k}-y_{n_k}\|}_{<1/n_k}\to 0, \quad k\to\infty.\] Thus \((y_n)_{n=1}^\infty\subset\overline{T(M)}\) has a convergent subsequence, and hence \(T(M)\) is relatively compact by Remark [rem:comp] part (1). This proves that \(T\) is compact.
Obviously, the set \(\overline{B_1(0)}=\{x\in X:\,\|x\|\leq 1\}\) is bounded. Thus the compactness of \(T\) implies that \(T(\overline{B_1(0)})\) is relatively compact. It follows from Remark [rem:comp] part (2) that \(T(\overline{B_1(0)})\) is bounded. Thus \(\sup_{\|x\|\leq 1}\|Tx\|<\infty\), which implies that \(T\) is bounded.
This follows from Remark [rem:comp] part (3).
This is easy to see.
Exercise!

Let \(T_n,T\in B(X,Y)\), \(n\in\mathbb{N}\), with \(\|T_n-T\|\to 0\) as \(n\to\infty\). If all \(T_n\) are compact, then \(T\) is compact.

The proof is quite technical. It follows from a diagonal sequence argument: Let \((x_n)_{n=1}^\infty\subset X\) be a bounded sequence. We have to show that \((Tx_n)_{n=1}^\infty\subset Y\) has a convergent subsequence. Since \(T_1\) is compact, there exists a convergent subsequence \((T_1x_i)_{i\in I_1}\subset (T_1x_n)_{n=1}^\infty\). Since \(T_2\) is compact, there exists a convergent subsequence \((T_2x_i)_{i\in I_2}\subset (T_2x_i)_{i\in I_1}\), and so on. Since the index sets \(I_n\) satisfy \(I_n\subset I_{n-1}\), the \(j\)-th element \(i_j^{(n)}\) in \(I_n\) satisfies \(i_j^{(n)}\geq i_j^{(n-1)}\). Consider the diagonal sequence \(y_n:=x_{i_n^{(n)}}\), which is a subsequence of \((x_n)_{n=1}^\infty\). Then \((y_n)_{n\in\mathbb{N}}\subset X\) is bounded as well. One can show that \((Ty_n)_{n=1}^\infty\) is a Cauchy sequence in \(Y\), and thus convergent since the Banach space \(Y\) is complete. We omit the details.
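The omitted Cauchy property can be sketched by a standard \(\varepsilon/3\) estimate; the constant \(C:=\sup_{n\in\mathbb{N}}\|x_n\|\) and the indices \(k,n_0\) are introduced here only for this sketch. Let \(\varepsilon>0\) and fix \(k\in\mathbb{N}\) with \(\|T-T_k\|\leq \varepsilon/(3(C+1))\). For \(n\geq k\) the index \(i_n^{(n)}\) belongs to \(I_n\subset I_k\), so \((T_ky_n)_{n\geq k}\) is a subsequence of the convergent sequence \((T_kx_i)_{i\in I_k}\) and therefore Cauchy. Hence there exists \(n_0\geq k\) with \(\|T_ky_n-T_ky_m\|<\varepsilon/3\) for all \(n,m\geq n_0\). For such \(n,m\), \[\|Ty_n-Ty_m\|\leq \|(T-T_k)y_n\|+\|T_ky_n-T_ky_m\|+\|(T_k-T)y_m\| < \frac{2C\varepsilon}{3(C+1)}+\frac{\varepsilon}{3}<\varepsilon,\] where we used \(\|y_n\|,\|y_m\|\leq C\).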

If \(\dim(X)<\infty\), then each \(T\in B(X)\) is compact.
In a Hilbert space \((H,\langle \cdot,\cdot\rangle)\), consider finitely many \(e_i,f_i\in H\), \(i=1,\dots,n\). Then the operator \(T:H\to H\) defined by \[Tx:=\sum_{i=1}^n \langle x,f_i\rangle e_i, \quad x\in H,\] has finite rank and is therefore compact. If \(e_i=f_i\) for \(i=1,\dots,n\), and \(\{e_i:\,i=1,\dots,n\}\) are orthonormal, then \(T\) is the orthogonal projection onto \({\rm span}\{e_i:\,i=1,\dots,n\}\).
As in (2) but now \[Tx:=\sum_{i=1}^\infty \langle x,f_i\rangle e_i, \quad x\in H,\] where we assume that \(\sum_{i=1}^\infty \|e_i\| \|f_i\|<\infty\). Then \(T\) is the norm limit of operators in (2) (limit \(n\to\infty\)), and thus \(T\) is compact by Theorem [thm:limitcomp].

Recall that

a sequence \((x_n)_{n\in\mathbb{N}}\) converges weakly in \(X\) to some \(x\in X\), i.e. \(x_n\stackrel{w}{\to} x\) in \(X\), if \(f(x_n)\to f(x)\) for every \(f\in X^*\).
if \(x_n\stackrel{w}{\to}x\) in \(X\), then \(\sup_{n\in\mathbb{N}}\|x_n\|<\infty\).
weak limits are unique.
if \(x_n\to x\) in \(X\), then \(x_n\stackrel{w}{\to} x\) in \(X\).

The next result states that a compact operator sends weakly convergent sequences to (strongly) convergent sequences.

Let \(T\in B(X,Y)\) be compact. If \(x_n\stackrel{w}{\to} x\) in \(X\), then \(Tx_n\to Tx\) in \(Y\).

Let \(y_n:=Tx_n\) and \(y:=Tx\). First we show that \(y_n\stackrel{w}{\to} y\). To this end, let \(f\in Y^*\). Then \(g:X\to\mathbb{C}\) defined by \(g(u):=f(Tu)\) defines a bounded linear functional on \(X\), i.e. \(g\in X^*\). Now \(x_n\stackrel{w}{\to} x\) implies \[f(y_n)=f(Tx_n)=g(x_n)\to g(x)=f(Tx)=f(y).\] This proves \(y_n\stackrel{w}{\to} y\). The sequence \((x_n)_{n\in \mathbb{N}}\) is bounded since it is weakly convergent. The compactness of \(T\) implies that \((Tx_n)_{n\in \mathbb{N}}\) has a convergent subsequence \((Tx_{n_k})_{k\in\mathbb{N}}\), i.e. \(Tx_{n_k}\to y_0\) as \(k\to\infty\), for some \(y_0\in Y\). However, convergence in \(Y\) implies weak convergence in \(Y\). Since we know that each subsequence of \(y_n=Tx_n\), \(n\in\mathbb{N}\), converges weakly to \(y=Tx\), the uniqueness of the weak limit implies that \(y_0=y=Tx\). Thus \(Tx_{n_k}\to Tx\) as \(k\to\infty\). However, we want to prove that \(Tx_n\to Tx\), not only on a subsequence. To prove this, we assume by contradiction that there exist an infinite index set \(I\subset\mathbb{N}\) and an \(\varepsilon>0\) such that \(\|Tx_n-y\|\geq \varepsilon\) for all \(n\in I\). The sequence \((x_n)_{n\in I}\) is bounded, as a subsequence of the bounded sequence \((x_n)_{n\in\mathbb{N}}\). Now repeat the above argument, with \((x_n)_{n\in\mathbb{N}}\) replaced by \((x_n)_{n\in I}\), to prove that \((Tx_n)_{n\in I}\) has a subsequence that converges to \(y=Tx\). This is a contradiction to \(\|Tx_n-y\|\geq \varepsilon\) for all \(n\in I\), hence no such index set \(I\) can exist. Therefore, \(Tx_n\to y=Tx\) as \(n\to\infty\).
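As an illustration (a sketch under the assumption that \(X=Y=H\) is a Hilbert space with \(\dim(H)=\infty\), where by Riesz's Representation Theorem \(x_n\stackrel{w}{\to}x\) means \(\langle x_n,y\rangle\to\langle x,y\rangle\) for all \(y\in H\)), let \((e_n)_{n\in\mathbb{N}}\subset H\) be an orthonormal sequence. Bessel's inequality \(\sum_{n=1}^\infty|\langle y,e_n\rangle|^2\leq\|y\|^2\) yields \(\langle e_n,y\rangle\to 0\) for every \(y\in H\), i.e. \(e_n\stackrel{w}{\to}0\), whereas \(\|e_n-e_m\|=\sqrt{2}\) for \(n\neq m\), so \((e_n)_{n\in\mathbb{N}}\) has no convergent subsequence. For the rank-one (hence compact) operator \(Tx:=\langle x,f\rangle e\) with fixed \(e,f\in H\) we obtain \[\|Te_n\|=|\langle e_n,f\rangle|\,\|e\|\to 0, \quad n\to\infty,\] in accordance with the theorem. The identity operator, which is not compact here, maps \(e_n\) to \(e_n\), which does not converge.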

Let \(H_1,H_2\) be Hilbert spaces and let \(T\in B(H_1,H_2)\). Then \[T \text{ compact }\quad \stackrel{\rm(1)}{\Longleftrightarrow}\quad T^*T \text{ compact} \quad \stackrel{\rm(2)}{\Longleftrightarrow}\quad T^* \text{ compact}.\]

\(\Longrightarrow\): If \(T\) is compact, then it is bounded, and hence \(T^*\) is bounded. Now \(T^*T\) is compact as the product of a bounded and a compact operator, see Proposition [prop:comp] part (5).
\(\Longleftarrow\): Take a bounded sequence \((x_n)_{n=1}^\infty\subset H_1\), \(C:=\sup_{n\in\mathbb{N}}\|x_n\|<\infty\). Since \(T^*T\) is assumed to be compact, there exists a convergent subsequence \((T^*Tx_{n_k})_{k=1}^\infty\). But then \[\begin{aligned} \|Tx_{n_k}-Tx_{n_j}\|^2&=\|T(x_{n_k}-x_{n_j})\|^2=\langle T^*T(x_{n_k}-x_{n_j}),x_{n_k}-x_{n_j}\rangle\\ &\leq \|T^*T(x_{n_k}-x_{n_j})\|\underbrace{\|x_{n_k}-x_{n_j}\|}_{\leq 2C}\to 0\end{aligned}\] as \(k,j\to\infty\). Thus \((Tx_{n_k})_{k=1}^\infty\) is a Cauchy sequence and hence convergent (Hilbert spaces are complete). This proves that \(T\) is compact.
\(\Longleftarrow\): If \(T^*\) is compact, then it is bounded, and hence \(T\) is bounded. Thus \(T^*T\) is compact by Proposition [prop:comp] part (5).
\(\Longrightarrow\): Assume that \(T^*T\) is compact. By equivalence (1) we know that \(T\) is compact. Thus \(TT^*\) is compact by Proposition [prop:comp] part (5). Apply equivalence (1) to \(T^*\) and use \(T^{**}=\overline{T}=T\) (bounded operators are closed if they are defined on the whole space) to obtain that \(T^*\) is compact if and only if \(TT^*\) is compact. Since we have shown the latter already, we conclude that \(T^*\) is compact.

The following result states that every compact operator in a separable Hilbert space is the norm limit of finite rank operators.

Let \(H\) be a separable Hilbert space and let \(T\in B(H)\) be compact. Then there exist \(T_n\in B(H)\), \(n\in\mathbb{N}\), with finite rank such that \(\|T_n-T\|\to 0\) as \(n\to\infty\).

Since \(H\) is separable, there exists a countable orthonormal basis \(\{e_i:\,i\in\mathbb{N}\}\) of \(H\). For \(n\in\mathbb{N}\), define the finite rank operator \(T_n\in B(H)\) by \[T_nx:=\sum_{i=1}^n \langle x,e_i\rangle Te_i, \quad x\in H.\] We show that \(\|T_n-T\|\to 0\) as \(n\to\infty\). To this end, let \(x\in H\). Then \[\begin{aligned} (T-T_n)x &=(T-T_n)\sum_{j =1}^\infty \langle x,e_j\rangle e_j =\sum_{j=1}^\infty \langle x,e_j\rangle Te_j -\sum_{i=1}^n \sum_{j=1}^\infty \langle x,e_j\rangle \underbrace{\langle e_j, e_i\rangle}_{=\delta_{ij}} Te_i \\ &=\sum_{j=n+1}^\infty \langle x,e_j\rangle Te_j =T\left(\sum_{j=n+1}^\infty \langle x,e_j\rangle e_j\right)\end{aligned}\] where we used the Kronecker delta \(\delta_{ij}=1\) if \(i=j\) and \(\delta_{ij}=0\) otherwise. Note that \(\sum_{j=n+1}^\infty \langle x,e_j\rangle e_j\in\{e_i:\,i=1,\dots,n\}^\perp\). Thus, if we define \[\alpha_n:=\sup_{0\neq u\in \{e_i:\,i=1,\dots,n\}^\perp}\frac{\|Tu\|}{\|u\|},\] then \[\|(T-T_n)x\|\leq \alpha_n \left\|\sum_{j=n+1}^\infty \langle x,e_j\rangle e_j\right\|\leq \alpha_n \|x\|.\] This yields \(\|T-T_n\|\leq \alpha_n\). It remains to prove \(\alpha_n\to 0\) as \(n\to\infty\). Let \(n\in\mathbb{N}\). By definition of the supremum, there exists \(0\neq u_n\in \{e_i:\,i=1,\dots,n\}^\perp\) with \[\frac{\|Tu_n\|}{\|u_n\|}\in \left[\frac{\alpha_n}{2},\alpha_n\right].\] Let \(v_n:=u_n/\|u_n\|\). Then \(v_n\in \{e_i:\,i=1,\dots,n\}^\perp\), \(\|v_n\|=1\) and \(\|Tv_n\|\geq \alpha_n/2\). We claim that \(v_n\stackrel{w}{\to} 0\) as \(n\to\infty\). To prove this, let \(y\in H\). 
Then, writing \(v_n=\sum_{i=1}^\infty \langle v_n,e_i\rangle e_i=\sum_{i=n+1}^\infty \langle v_n,e_i\rangle e_i\) and \(y=\sum_{i=1}^\infty \langle y,e_i\rangle e_i\), \[\begin{aligned} |\langle v_n,y\rangle| &=\left|\sum_{i=n+1}^\infty \langle v_n,e_i\rangle \overline{\langle y,e_i\rangle}\right| \leq \sum_{i=n+1}^\infty |\langle v_n,e_i\rangle|\,|\langle y,e_i\rangle|\\ &\leq \left(\sum_{i=n+1}^\infty |\langle v_n,e_i\rangle|^2\right)^{1/2}\left(\sum_{i=n+1}^\infty|\langle y,e_i\rangle|^2\right)^{1/2}\end{aligned}\] using the Cauchy-Schwarz inequality. Now, with \(\sum_{i=n+1}^\infty |\langle v_n,e_i\rangle|^2=\|v_n\|^2=1\), we obtain \[|\langle v_n,y\rangle|^2 \leq \sum_{i=n+1}^\infty|\langle y,e_i\rangle|^2,\] and the right hand side converges to zero as \(n\to\infty\), because \(\sum_{i=1}^\infty|\langle y,e_i\rangle|^2=\|y\|^2<\infty\). Thus \(\langle v_n,y\rangle\to 0\), and this is true for all \(y\in H\), hence \(v_n\stackrel{w}{\to}0\). Since \(T\) is compact, it maps weakly convergent sequences to convergent sequences (see Theorem [thm:compconv]). Therefore, \(Tv_n\to 0\), i.e. \(\|Tv_n\|\to 0\). Finally, \(\|Tv_n\|\geq \alpha_n/2\) implies \(\alpha_n\to 0\), which concludes the proof since \(\|T-T_n\|\leq \alpha_n\).
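A worked instance of this construction (a sketch; the operator is chosen here for illustration): let \(H=\ell^2(\mathbb{N})\) with the standard ONB \(\{e_i:\,i\in\mathbb{N}\}\) and let \(T\in B(H)\) be the diagonal operator with \(Te_i=\frac{1}{i}e_i\). Then \[T_nx=\sum_{i=1}^n\frac{1}{i}\langle x,e_i\rangle e_i, \quad \|(T-T_n)x\|^2=\sum_{i=n+1}^\infty\frac{1}{i^2}|\langle x,e_i\rangle|^2\leq\frac{1}{(n+1)^2}\|x\|^2,\] with equality for \(x=e_{n+1}\). Hence \(\|T-T_n\|=\alpha_n=\frac{1}{n+1}\to 0\) as \(n\to\infty\). In this example the compactness of \(T\) itself follows from Theorem [thm:limitcomp], since \(T\) is the norm limit of the finite rank operators \(T_n\).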

Let \(H\) be a Hilbert space and \(T\in B(H)\) be selfadjoint. If \(T\) is compact, then \(\|T\|\in\sigma_p(T)\) or \(-\|T\|\in\sigma_p(T)\), and \[\|T\|=\max_{x\in H\atop \|x\|=1}|\langle Tx,x\rangle|.\]

If \(T=0\) (the zero operator), the claim is easy to prove. Now let \(\lambda:=\|T\|>0\). By Theorem [thm:sanorm] we know that \(\|T\|=\sup\limits_{x\in H \atop \|x\|=1}|\langle Tx,x\rangle|\). Thus there exists a sequence \((x_n)_{n=1}^\infty\subset H\) with \(\|x_n\|=1\) and \(0\leq \langle Tx_n,x_n\rangle\to \lambda\) or \(0\geq \langle Tx_n,x_n\rangle\to -\lambda\). First we assume that the former is true. Then \[0\leq \|(T-\lambda)x_n\|^2=(\underbrace{\|Tx_n\|}_{\leq \|T\|=\lambda})^2-2\lambda\langle T x_n,x_n\rangle+\lambda^2 \leq 2\lambda(\lambda-\langle Tx_n,x_n\rangle)\to 0,\] which implies \((T-\lambda)x_n\to 0\). Since \(T\) is assumed to be compact, there exists a convergent subsequence \(Tx_{n_k}\to y\) for some \(y\in H\). Then \[x_{n_k}=\frac{1}{\lambda}\left(Tx_{n_k}-(T-\lambda)x_{n_k}\right)\to \frac{1}{\lambda}y.\] Note that \(\|y\|=|\lambda|\,\lim_{k\to\infty}\|x_{n_k}\|=|\lambda|>0\), hence \(y\neq 0\). Since \(T\) is bounded, it is continuous. Thus \(x_{n_k}\to y/\lambda\) implies \(Tx_{n_k}\to Ty/\lambda\). However, we also know that \(Tx_{n_k}\to y\), hence \(y=Ty/\lambda\), i.e. \(Ty=\lambda y\). This proves \(\|T\|=\lambda\in\sigma_p(T)\). Analogously one can prove that \(0\geq \langle Tx_n,x_n\rangle\to -\lambda\) implies that \(-\|T\|=-\lambda\in\sigma_p(T)\). In both cases, we see that \[|\lambda|=\|T\|=\sup\limits_{x\in H \atop \|x\|=1}|\langle Tx,x\rangle|,\] and the supremum is attained at the normalised eigenvector \(y/\|y\|\), thus it is a maximum.

If we drop the assumption that \(T\) is compact, we can still conclude that \(\|T\|\in\sigma(T)\) or \(-\|T\|\in\sigma(T)\) (a point of the spectrum, but not necessarily an eigenvalue). To prove this, we proceed as in the previous proof to show that \(\lambda=\|T\|\) satisfies \((T-\lambda)x_n\to 0\). Now if, by contradiction, \(\lambda\notin\sigma(T)\), then \((T-\lambda)^{-1}\) is a bounded operator and hence \(x_n=(T-\lambda)^{-1}(T-\lambda)x_n\to 0\). The obtained contradiction to \(\|x_n\|=1\) implies \(\lambda=\|T\|\in\sigma(T)\). The other case works analogously.
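A standard example (not taken from the text above) shows that \(\|T\|\) need not be an eigenvalue without compactness. In \(H=L^2([0,1])\) consider the multiplication operator \((Tx)(t):=t\,x(t)\). It is bounded and selfadjoint, and \(\|Tx\|^2=\int_0^1t^2|x(t)|^2\,\mathrm{d}t\leq\|x\|^2\) gives \(\|T\|\leq 1\). With the characteristic function \(\chi\), the functions \(x_n:=\sqrt{n}\,\chi_{[1-1/n,1]}\) satisfy \(\|x_n\|=1\) and \[\|(T-1)x_n\|^2=n\int_{1-1/n}^1(1-t)^2\,\mathrm{d}t=\frac{1}{3n^2}\to 0, \quad n\to\infty.\] Hence \(\|Tx_n\|\to 1\), so \(\|T\|=1\), and \(T-1\) has no bounded inverse, so \(1=\|T\|\in\sigma(T)\). However, \(Tx=x\) forces \((t-1)x(t)=0\) for almost all \(t\in[0,1]\), i.e. \(x=0\), so \(1\notin\sigma_p(T)\); similarly \(-1\notin\sigma_p(T)\). In particular, \(T\) is not compact by Theorem [thm:normeval].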

1.5 Spectral theorems

Recall from Linear Algebra that for a selfadjoint \(n\times n\) matrix \(A=A^*\) there exists an orthonormal basis (ONB) \(\{e_i:\,i=1,\dots,n\}\) of \(\mathbb{C}^n\) consisting of eigenvectors of \(A\) such that with respect to this ONB the matrix is diagonal, \(A={\rm diag}(\lambda_1,\lambda_2,\dots,\lambda_n)\) where \(\lambda_i\in\mathbb{R}\), \(i=1,\dots,n\), are the eigenvalues of \(A\).

Now we study the properties of the spectrum and eigenspaces of linear operators in a Hilbert space \((H,\langle\cdot,\cdot\rangle)\).

[Spectral theorem for compact selfadjoint operators] Let \(T\in B(H)\) be compact and selfadjoint. Then there exists an orthonormal system (ONS) \(\{e_i:\,i=1,\dots,N\}\) (\(N\in\mathbb{N}\cup\{\infty\}\)) of eigenvectors of \(T\) corresponding to eigenvalues \(\lambda_i\in\sigma_p(T)\backslash\{0\}\), which we order as \(|\lambda_1|\geq |\lambda_2|\geq \cdots\), such that \[\begin{equation} \label{eq:Tx} Tx=\sum_{i=1}^N\lambda_i\langle x,e_i\rangle e_i, \quad x\in H. \end{equation}\] If \(N=\infty\), then \(\lim_{i\to\infty}\lambda_i=0\).

This is proved via iterated application of Theorem [thm:normeval]:
Step 1: Set \(H_1:=H\) and \(T_1:=T\). By Theorem [thm:normeval], there exists \(\lambda_1\in\sigma_p(T_1)\) with \(|\lambda_1|=\|T_1\|\). Let \(e_1\in H_1\) be a normalised eigenvector corresponding to \(\lambda_1\), i.e. \(T_1e_1=Te_1=\lambda_1 e_1\). Then \(T(\{e_1\}^\perp)\subset\{e_1\}^\perp\) because if \(x\in \{e_1\}^\perp\) then \[\langle Tx,e_1\rangle=\langle x,Te_1\rangle=\lambda_1\langle x,e_1\rangle=0,\] i.e. \(Tx\in\{e_1\}^\perp\).
Step 2: Set \(H_2:=\{e_1\}^\perp\) (which is a Hilbert space) and \(T_2:=T_1|_{H_2}=T|_{H_2}\) (which is compact and selfadjoint in \(H_2\)). By Theorem [thm:normeval], there exists \(\lambda_2\in\sigma_p(T_2)\) with \(|\lambda_2|=\|T_2\|\). Let \(e_2\in H_2\) be a normalised eigenvector corresponding to \(\lambda_2\), i.e. \(T_2e_2=Te_2=\lambda_2 e_2\). Note that \(e_2\perp e_1\) since \(e_2\in H_2=\{e_1\}^\perp\). In addition, \(T(\{e_1,e_2\}^\perp)\subset\{e_1,e_2\}^\perp\), analogously as before.
Step 3: Set \(H_3:=\{e_1,e_2\}^\perp\) (which is a Hilbert space) and \(T_3:=T_2|_{H_3}=T|_{H_3}\) (which is compact and selfadjoint in \(H_3\)). Proceed analogously. There are two possibilities:

There exists \(n_0\in\mathbb{N}\) such that \(T_{n_0+1}=0\) in \(H_{n_0+1}\). Then \(N=n_0\in\mathbb{N}\).
For all \(n\in\mathbb{N}\), \(T_n\) is not the zero operator in \(H_n\). Then \(N=\infty\).

Note that \(|\lambda_1|\geq |\lambda_2|\geq\cdots\), and \(\{e_i:\,i=1,\dots,N\}\) is an ONS in \(H\).

Next we show that if \(N=\infty\) then \(\lim_{i\to\infty}\lambda_i=0\). We proceed by contradiction and assume the claim is false. Since \((|\lambda_i|)_{i=1}^\infty\) is a monotonically decreasing sequence, there exists \(\varepsilon>0\) such that \(|\lambda_i|\geq \varepsilon\) for all \(i\in\mathbb{N}\). Let \(i,j\in\mathbb{N}\) with \(i\neq j\). Then, using \(e_i\perp e_j\), \[\|Te_i-Te_j\|^2=\|\lambda_i e_i-\lambda_j e_j\|^2=|\lambda_i|^2+|\lambda_j|^2\geq 2\varepsilon^2.\] This proves that \((Te_i)_{i=1}^\infty\) has no subsequence that is Cauchy (or even convergent). However, since \(\{e_i:\,i\in\mathbb{N}\}\subset H\) is a bounded set and \(T\) is compact, \((Te_i)_{i=1}^\infty\) should have a convergent subsequence. The obtained contradiction proves \(\lim_{i\to\infty}\lambda_i=0\).

It remains to prove \(\eqref{eq:Tx}\). Let \(x\in H\). Define \[x_n:=x-\sum_{i=1}^{n-1} \langle x,e_i\rangle e_i, \quad n\in\mathbb{N}\backslash\{1\}.\] Then \(x_n\in H_n=\{e_j:\,j=1,\dots,n-1\}^\perp\) because \(\langle x_n,e_j\rangle=\langle x,e_j\rangle -\langle x,e_j\rangle=0\) for all \(j=1,\dots,n-1\). Thus \[\begin{equation} \label{eq:xnorm} \|x\|^2=\left\|x_n+\sum_{i=1}^{n-1} \langle x,e_i\rangle e_i\right\|^2=\|x_n\|^2+\left\| \sum_{i=1}^{n-1} \langle x,e_i\rangle e_i\right\|^2. \end{equation}\] We treat the cases (1) and (2) above separately:
Case (1): Since \(T_{N+1}=0\) in \(H_{N+1}\), we have \[0=T_{N+1}x_{N+1}=Tx_{N+1}=Tx-\sum_{i=1}^{N}\langle x,e_i\rangle \underbrace{Te_i}_{=\lambda_i e_i}.\] Hence we obtain \(\eqref{eq:Tx}\).
Case (2): Using \(\|T_n\|=|\lambda_n|\) and \(\eqref{eq:xnorm}\) we obtain \[\begin{aligned} \left\|Tx-\sum_{i=1}^{n-1} \lambda_i\langle x,e_i\rangle e_i\right\|^2 &=\left\|T\left(x-\sum_{i=1}^{n-1} \langle x,e_i\rangle e_i\right)\right\|^2 =\|Tx_n\|^2\\ &=\|T_n x_n\|^2 \leq \|T_n\|^2 \|x_n\|^2=|\lambda_n|^2 \|x_n\|^2\\ &\leq |\lambda_n|^2 \left(\|x_n\|^2+\left\| \sum_{i=1}^{n-1} \langle x,e_i\rangle e_i\right\|^2\right) =|\lambda_n|^2\|x\|^2.\end{aligned}\] Since \(|\lambda_n|\to 0\) as \(n\to\infty\), we see that the partial sums \(\sum_{i=1}^{n-1} \lambda_i\langle x,e_i\rangle e_i\) converge to \(Tx\), which concludes the proof.
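In finite dimensions the iterative construction of the proof can be imitated numerically. The following is a minimal sketch, assuming a real symmetric matrix `A` chosen for illustration; the function names are hypothetical. Power iteration is swapped in here as a numerical stand-in for the existence statement of Theorem [thm:normeval]; it is not the method of the proof, and it can fail to converge when two eigenvalues of opposite sign have the same modulus. After each step the operator is restricted to the orthogonal complement of the eigenvectors found so far, as in Steps 1 to 3.

```python
import math

def matvec(A, x):
    """Matrix-vector product for a square matrix given as nested lists."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    """Real Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def project_out(x, basis):
    """Remove from x its components along the orthonormal vectors in basis."""
    for e in basis:
        c = dot(x, e)
        x = [xi - c * ei for xi, ei in zip(x, e)]
    return x

def dominant_eigenpair(A, basis, iters=500):
    """Power iteration for A restricted to the orthogonal complement of basis."""
    n = len(A)
    x = project_out([1.0 + i for i in range(n)], basis)  # arbitrary start vector
    for _ in range(iters):
        x = project_out(matvec(A, x), basis)
        nrm = math.sqrt(dot(x, x))
        if nrm < 1e-12:  # restricted operator is (numerically) zero: stop
            return 0.0, None
        x = [xi / nrm for xi in x]
    return dot(matvec(A, x), x), x  # Rayleigh quotient and unit vector

def spectral_decomposition(A):
    """Collect eigenpairs ordered by decreasing modulus of the eigenvalue."""
    eigenvalues, basis = [], []
    for _ in range(len(A)):
        lam, e = dominant_eigenpair(A, basis)
        if e is None:
            break
        eigenvalues.append(lam)
        basis.append(e)
    return eigenvalues, basis

# Hypothetical symmetric matrix with eigenvalues 5, 3 and 1.
A = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 0.0],
     [0.0, 0.0, 5.0]]
eigenvalues, basis = spectral_decomposition(A)
```

The returned vectors form an orthonormal system, and the eigenvalues come out ordered by decreasing modulus, mirroring \(|\lambda_1|\geq|\lambda_2|\geq\cdots\) in the theorem.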

The set of non-zero eigenvalues \(\sigma_p(T)\backslash\{0\}\) is at most countable, and each non-zero eigenvalue has finite multiplicity (otherwise we would have a contradiction to \(N<\infty\) or \(\lim_{i\to\infty}\lambda_i=0\)). The only possible accumulation point of non-zero eigenvalues is \(0\).
The eigenvectors \(\{e_i:\,i=1,\dots,N\}\) form an ONB of \(\overline{\mathcal{R}(T)}\). By Theorem [thm:kerran], \(\overline{\mathcal{R}(T)}=\mathcal{N}(T)^\perp\).
If \(P_0:H\to H\) denotes the orthogonal projection onto \(\mathcal{N}(T)\), then \[\begin{equation} \label{eq:x} x=P_0x+\sum_{i=1}^N \langle x,e_i\rangle e_i, \quad x\in H. \end{equation}\]
From \(\eqref{eq:Tx}\) and \(\eqref{eq:x}\) it follows that \(\sigma(T)\backslash\{0\}=\sigma_p(T)\backslash\{0\}\) (see also the Corollary below, which shows that for every \(0\neq \lambda\in\mathbb{C}\backslash\sigma_p(T)\) the operator \((T-\lambda)^{-1}\) exists and is bounded). In addition, for every eigenvalue \(\lambda\in\sigma_p(T)\), \(\mathcal{N}_\lambda(T)=\mathcal L_\lambda(T)\).

Let \(T\in B(H)\) be compact and selfadjoint. Then, for \(0\neq \lambda\in\rho(T)\), \[(T-\lambda)^{-1}y=-\frac{1}{\lambda}P_0 y +\sum_{i=1}^N\frac{1}{\lambda_i-\lambda}\langle y,e_i\rangle e_i, \quad y\in H.\]

Set \(x=(T-\lambda)^{-1}y\). We can write \(x=P_0x+\sum_{i=1}^N \langle x,e_i\rangle e_i\). This implies, with \(TP_0=0\), \[y=(T-\lambda)x=-\lambda P_0x+\sum_{i=1}^N (\lambda_i-\lambda) \langle x,e_i\rangle e_i.\] However, we also have \(y=P_0y+\sum_{i=1}^N \langle y,e_i\rangle e_i\). Comparing both representations of \(y\), we see that \[-\lambda P_0x=P_0 y, \quad (\lambda_i-\lambda)\langle x,e_i\rangle=\langle y,e_i\rangle.\] This implies the claim.

Next we want to study the spectral properties of (possibly unbounded) selfadjoint operators with compact resolvents.

A linear operator \(T\in C(X)\) is said to have compact resolvent if \((T-\lambda)^{-1}\) is compact for all \(\lambda\in\rho(T)\).

An operator \(T\in C(X)\) has compact resolvent if there exists \(\lambda_0\in\rho(T)\) such that \((T-\lambda_0)^{-1}\) is compact.

The first resolvent identity (see Theorem [thm:firstres]) implies that, for any \(\lambda\in\rho(T)\), \[(T-\lambda)^{-1}=(T-\lambda_0)^{-1}+(\lambda-\lambda_0)(T-\lambda)^{-1}(T-\lambda_0)^{-1}.\] Since \((T-\lambda)^{-1}\) is bounded and \((T-\lambda_0)^{-1}\) is compact, the right-hand side is compact as the sum of a compact operator and the product of a bounded and a compact operator. Hence \((T-\lambda)^{-1}\) is compact.

If \(\dim(X)=\infty\) and \(T\in C(X)\) has compact resolvent, then \(T\) is unbounded. This is because if \(T\) were bounded, then the identity operator \(I=(T-\lambda)(T-\lambda)^{-1}\) would be compact, as the product of a bounded and a compact operator, but the identity operator is compact only in finite-dimensional spaces.

[Spectral theorem for selfadjoint operators with compact resolvents] Let \(T\in C(H)\) be selfadjoint with compact resolvent.

The spectrum consists of countably many eigenvalues, \(\sigma(T)=\sigma_p(T)\subset\mathbb{R}\).
For an eigenvalue \(\lambda\in\sigma_p(T)\), \(\mathcal{N}_\lambda(T)=\mathcal L_\lambda(T)\) and the eigenspace is finite-dimensional.
There exists an ONB of \(H\) consisting of eigenvectors \(e_i\) corresponding to the eigenvalues \(\lambda_i\), which are ordered as \(0\leq |\lambda_1|\leq |\lambda_2|\leq \cdots\), with \(\lim_{i\to\infty}|\lambda_i|=\infty\).
The operator domain can be written as \[\mathcal{D}(T)=\left\{x\in H:\,\sum_{i=1}^\infty |\lambda_i|^2 |\langle x,e_i\rangle|^2<\infty\right\}.\]
For every \(x\in\mathcal{D}(T)\), \[Tx=\sum_{i=1}^\infty \lambda_i \langle x,e_i\rangle e_i.\]

Parts (1), (2), (3) follow from Theorem [thm:specthm] applied to the selfadjoint, compact operator \((T-\lambda_0)^{-1}\) for some real \(\lambda_0\in\rho(T)\). Note that \(\mathcal{N}((T-\lambda_0)^{-1})=\{0\}\), thus Remark [rem:specthm] part (2) implies that the eigenvectors form an ONB of \(H\). Also, \(x\in H\) is an eigenvector of \(T\) to the eigenvalue \(\lambda\) if and only if it is an eigenvector of \((T-\lambda_0)^{-1}\) to the eigenvalue \((\lambda-\lambda_0)^{-1}\).

It remains to prove parts (4), (5). Without loss of generality we assume that \(0\in\rho(T)\) (otherwise consider \(T'=T-\lambda_0\)). First, let \(x\in H\) with \(\sum_{i=1}^\infty |\lambda_i|^2 |\langle x,e_i\rangle|^2<\infty\). Then, for any \(m,n\in\mathbb{N}\) with \(m>n\), \[\left\|\sum_{i=1}^m \lambda_i\langle x,e_i\rangle e_i-\sum_{i=1}^n\lambda_i\langle x,e_i\rangle e_i\right\|^2 =\left\|\sum_{i=n+1}^m \lambda_i\langle x,e_i\rangle e_i\right\|^2 =\sum_{i=n+1}^m|\lambda_i|^2|\langle x,e_i\rangle|^2,\] which converges to zero as \(m,n\to\infty\), hence the partial sums form a Cauchy sequence in \(H\) and are thus convergent, \(y:=\sum_{i=1}^\infty \lambda_i\langle x,e_i\rangle e_i\in H\). Since we assume that \(0\in\rho(T)\), we have \(e_i=\lambda_iT^{-1}e_i\). The eigenvectors \(\{e_i:\,i\in\mathbb{N}\}\) form an ONB, hence \[x=\sum_{i=1}^\infty \langle x,e_i\rangle e_i=\sum_{i=1}^\infty \langle x,e_i\rangle \lambda_i T^{-1}e_i.\] The partial sums \(x_n:=\sum_{i=1}^n \langle x,e_i\rangle \lambda_i T^{-1}e_i\in\mathcal{D}(T)\) satisfy \(x_n\to x\) and \[Tx_n=\sum_{i=1}^n \langle x,e_i\rangle \lambda_i e_i\to y.\] By definition of \(T\) being closed, \(x\in\mathcal{D}(T)\) and \(Tx=y=\sum_{i=1}^\infty \lambda_i\langle x,e_i\rangle e_i\).

Conversely, let \(x\in\mathcal{D}(T)\). We have to show that \(\sum_{i=1}^\infty |\lambda_i|^2 |\langle x,e_i\rangle|^2<\infty\). Since the \(\{e_i:\,i\in\mathbb{N}\}\) form an ONB and \(Tx\in H\), \[Tx=\sum_{i=1}^\infty \langle Tx,e_i\rangle e_i=\sum_{i=1}^\infty \langle x,Te_i\rangle e_i =\sum_{i=1}^\infty \lambda_i\langle x,e_i\rangle e_i,\] where we used that \(T\) is selfadjoint and \(\lambda_i\in\mathbb{R}\). This implies \[\sum_{i=1}^\infty |\lambda_i|^2 |\langle x,e_i\rangle|^2=\left\|\sum_{i=1}^\infty \lambda_i\langle x,e_i\rangle e_i\right\|^2=\|Tx\|^2<\infty.\]

[Application: Vibrating string] We consider the one-dimensional wave equation of a vibrating string of constant density (for simplification, we set all physical constants to \(1\)): \[\frac{\partial^2 u(x,t)}{\partial t^2}= \frac{\partial^2 u(x,t)}{\partial x^2}, \quad x\in [0,1], \quad t\geq 0.\] Here \(x\in [0,1]\) is the spatial variable and \(t\geq 0\) is the time. We impose the boundary conditions \[u(0,t)=0, \quad u(1,t)=0, \quad t\geq 0,\] i.e. the string is fixed to zero at the two endpoints. In addition, at \(t=0\) we have an initial condition, \[u(x,0)=f(x), \quad x\in [0,1],\] for a given \(f\in L^2([0,1])\). We find the solution \(u\) using the Spectral Theorem for selfadjoint operators with compact resolvents. After a separation of variables ansatz \(u(x,t)=y(x)v(t)\) we see that, for some \(\lambda\in\mathbb{C}\), \[-y''(x)=\lambda y(x)\quad \text{and}\quad -v''(t)=\lambda v(t).\] We first concentrate on the first differential equation. The boundary conditions read \(y(0)=y(1)=0\). Thus we introduce the linear operator \((Ty)(x):=-y''(x)\) in the Hilbert space \(L^2([0,1])\), on the domain \(\mathcal{D}(T):=\{y\in W^{2,2}([0,1]):\,y(0)=y(1)=0\}\). The Sobolev space \[W^{2,2}([0,1]):=\{y\in L^2([0,1]):\,y,y' \text{ absolutely continuous}, \,y''\in L^2([0,1])\}\] will be studied in the next Chapter. One can show that \(T\) is closed, selfadjoint and has compact resolvent (try to show this after learning about Sobolev spaces).

Next we find all eigenvalues of \(T\). To this end, we solve \(-y''=\lambda y\). The general solution is \(y(x)=A\mathrm{e}^{\mathrm{i}\sqrt{\lambda} x}+B\mathrm{e}^{-\mathrm{i}\sqrt{\lambda} x}\), and the constants \(A,B\) are found using the boundary conditions \(y(0)=y(1)=0\) and the normalisation condition \(\|y\|^2=\int_0^1|y(x)|^2\,\mathrm{d}x=1\). This gives the eigenvalues \[\lambda_n=n^2\pi^2, \quad n\in\mathbb{N},\] with corresponding eigenfunctions \[y_n(x)=\sqrt{2}\sin(n\pi x), \quad x\in [0,1].\] By Theorem [thm:specthmres], \(\{y_n:\,n\in\mathbb{N}\}\) is an ONB of \(L^2([0,1])\). Thus we can write the initial profile as \(f=\sum_{n=1}^\infty \langle f,y_n\rangle y_n\). In addition, for every \(t\geq 0\) we can write \(u(x,t)=\sum_{n=1}^\infty \langle u(\cdot,t),y_n\rangle y_n(x)\). This solves the wave equation and the initial and boundary conditions if and only if the Fourier coefficients \(c_n(t):=\langle u(\cdot,t),y_n\rangle\) satisfy \[\sum_{n=1}^\infty c_n''(t) y_n=-\sum_{n=1}^\infty \lambda_n\langle u(\cdot,t),y_n\rangle y_n, \quad \sum_{n=1}^\infty c_n(0) y_n=\sum_{n=1}^\infty \langle f,y_n\rangle y_n.\] Comparing the representations, we see that \[-c_n''(t)=\lambda_n c_n(t), \quad c_n(0)=\langle f,y_n\rangle.\] The general solution is given by \(c_n(t)=C_n \mathrm{e}^{\mathrm{i}\sqrt{\lambda_n}t}+D_n\mathrm{e}^{-\mathrm{i}\sqrt{\lambda_n}t}\) with constants \(C_n,D_n\in\mathbb{C}\) satisfying \(C_n+D_n=\langle f,y_n\rangle\). In order to write the solution in terms of real-valued functions, we use \(\mathrm{e}^{\pm\mathrm{i}s}=\cos(s)\pm\mathrm{i}\sin(s)\) and write \(c_n(t)=\langle f,y_n\rangle \cos(\sqrt{\lambda_n} t)+E_n\sin(\sqrt{\lambda_n} t)\) with the constant \(E_n:=\mathrm{i}(C_n-D_n)\in\mathbb{C}\), which is not determined by the initial condition. Using the explicit expressions for \(\lambda_n\) and \(y_n\) we obtain \[u(x,t)=\sum_{n=1}^\infty \left(2\left(\int_0^1 f(s)\sin(n\pi s)\,\mathrm{d}s\right)\cos(n\pi t)+\sqrt{2}\,E_n\sin(n\pi t)\right)\sin(n\pi x).\] As each \(E_n\) is arbitrary, this solution is not unique. 
To make it unique, we would need to impose another initial condition, such as \(\partial_t u(x,0)=g(x)\), \(x\in [0,1]\), for a given \(g\in L^2([0,1])\), or \(u(x,t_0)=h(x)\), \(x\in [0,1]\), for given \(t_0>0\) and \(h\in L^2([0,1])\).
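The series solution can be explored numerically. The following is only a sketch under an extra assumption that is not part of the example: we impose zero initial velocity, \(\partial_t u(x,0)=0\), so that each coefficient reduces to \(c_n(t)=\langle f,y_n\rangle\cos(n\pi t)\). The function names, the midpoint quadrature and the truncation after finitely many terms are illustrative choices.

```python
import math

def sine_coefficient(f, n, steps=2000):
    """Approximate <f, y_n> = sqrt(2) * int_0^1 f(s) sin(n pi s) ds (midpoint rule)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += f(s) * math.sin(n * math.pi * s)
    return math.sqrt(2.0) * total * h

def string_solution(f, x, t, terms=20):
    """Truncated series sum_n <f, y_n> cos(n pi t) y_n(x).

    Assumption (not in the text): zero initial velocity, d/dt u(x, 0) = 0,
    which fixes the remaining freedom in the time dependence.
    """
    u = 0.0
    for n in range(1, terms + 1):
        y_n = math.sqrt(2.0) * math.sin(n * math.pi * x)
        u += sine_coefficient(f, n) * math.cos(n * math.pi * t) * y_n
    return u

def profile(x):
    """Hypothetical initial profile f(x) = sin(pi x), i.e. a multiple of y_1."""
    return math.sin(math.pi * x)
```

For this profile only the first mode is excited, so `string_solution(profile, x, t)` should reproduce \(\sin(\pi x)\cos(\pi t)\) up to quadrature error.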

2 Hilbert space methods for PDEs

2.1 Sobolev spaces

In this section we introduce weak derivatives and Sobolev spaces. Throughout, let \(U\subset\mathbb{R}^n\) denote an open subset. Denote by \(C_c^\infty(U)\) the space of infinitely differentiable functions \(\phi:U\to \mathbb{C}\) whose support \({\rm supp}(\phi):=\overline{\{x\in U:\,\phi(x)\neq 0\}}\) is a compact (i.e. closed and bounded) subset of \(U\); in particular, since \(U\) is open, this implies that the support of \(\phi\) has positive distance to the boundary of \(U\), hence \(\phi\) is zero in a neighbourhood of the boundary of \(U\). We call \(\phi\in C_c^\infty(U)\) a test function.

Let \(1\leq p<\infty\). Then the space \(C_c^\infty(U)\) is dense in \(L^p(U)\), i.e. for every \(u\in L^p(U)\) there exists a sequence \((u_j)_{j\in\mathbb{N}}\subset C_c^\infty(U)\) with \(\|u_j-u\|_{L^p(U)}\to 0\) as \(j\to\infty\).

Note that the differential operators \(T\) that we have seen in this module satisfy \(C_c^{\infty}(U)\subset\mathcal{D}(T)\subset L^p(U)\) for some \(1\leq p<\infty\) and \(U\). Since \(C_c^{\infty}(U)\) is dense in \(L^p(U)\), so is \(\mathcal{D}(T)\), which means that \(T\) is densely defined in the space \(L^p(U)\).

Now we want to study \(L^p(U)\)-functions that are differentiable in a weak sense.

Motivation for weak derivative. Let \(u\in C^1(U)\) and \(\phi\in C_c^\infty(U)\). Integration by parts yields \[\int_U u (\partial_{x_i}\phi)\,\mathrm{d}x=-\int_U (\partial_{x_i} u)\phi\,\mathrm{d}x, \quad i=1,\dots,n.\] Note that we have no boundary term because \(\phi\in C_c^\infty(U)\) has compact support in \(U\). More generally, for \(k\in\mathbb{N}\) and \(u\in C^k(U)\), let \(\alpha=(\alpha_1,\dots,\alpha_n)\in\mathbb{N}_0^n\) be a multiindex of order \(|\alpha|:=\alpha_1+\dots+\alpha_n=k\); then \[\begin{equation} \label{eq:alphader} \int_U u (D^\alpha \phi)\,\mathrm{d}x=(-1)^{|\alpha|}\int_U (D^\alpha u)\phi\,\mathrm{d}x, \end{equation}\] with notation \[D^\alpha=\partial_{x_1}^{\alpha_1}\dots\partial_{x_n}^{\alpha_n}.\] Note that \(D^0u=u\).

Let \(L_{loc}^1(U)\) denote the set of all measurable functions \(u:U\to\mathbb{C}\) such that for every compact subset \(V\subset U\) the restriction \(u|_V\) is in \(L^1(V)\).

Let \(u,v\in L_{loc}^1(U)\) and let \(\alpha\) be a multiindex. Then \(v\) is the weak \(\alpha\)-th partial derivative of \(u\), denoted by \(D^{\alpha}u\), provided \[\forall\,\phi\in C_c^\infty(U):\quad \int_U u (D^\alpha \phi)\,\mathrm{d}x=(-1)^{|\alpha|}\int_U v\phi\,\mathrm{d}x\] (compare with \(\eqref{eq:alphader}\)).

Note that the integrals exist since \(u,v\in L_{loc}^1(U)\) and \(\phi\) has compact support, hence for example \[\left|\int_U u (D^\alpha \phi)\,\mathrm{d}x\right|=\left|\int_{{\rm supp}(\phi)} u (D^\alpha \phi)\,\mathrm{d}x\right|\leq \|D^\alpha \phi\|_{\infty} \int_{{\rm supp}(\phi)}|u| \,\mathrm{d}x<\infty.\]

[Uniqueness of weak derivatives] A weak \(\alpha\)-th partial derivative is uniquely defined up to a set of measure zero.

If \(u\in C^k(U)\) and \(|\alpha|=k\) then \(D^\alpha u\) exists in the classical sense and therefore agrees with the weak derivative (up to a set of measure zero).
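The defining identity of a weak derivative can be tested numerically for a single test function. The sketch below takes \(U=(-1,1)\), \(u(x)=|x|\) and the candidate \(v(x)={\rm sgn}(x)\), and compares \(\int_U u\phi'\,\mathrm{d}x\) with \(-\int_U v\phi\,\mathrm{d}x\). The particular bump function (centre \(0.2\), radius \(0.5\)), the midpoint rule and all names are assumptions made for illustration; checking one \(\phi\) is of course no proof.

```python
import math

CENTER, RADIUS = 0.2, 0.5  # hypothetical bump with support [-0.3, 0.7] inside (-1, 1)

def bump(x):
    """Test function phi(x) = exp(-1 / (1 - s^2)) with s = (x - CENTER) / RADIUS."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def bump_derivative(x):
    """Classical derivative phi'(x), obtained from the chain rule."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * s / (1.0 - s * s) ** 2) / RADIUS

def midpoint_integral(g, a, b, steps):
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def sign(x):
    return (x > 0) - (x < 0)

def weak_derivative_sides(steps=20000):
    """Return (int u phi' dx, -int v phi dx) for u = |x| and v = sgn(x)."""
    lhs = midpoint_integral(lambda x: abs(x) * bump_derivative(x), -1.0, 1.0, steps)
    rhs = -midpoint_integral(lambda x: sign(x) * bump(x), -1.0, 1.0, steps)
    return lhs, rhs
```

An even number of steps places the kink at \(x=0\) on a cell boundary, which keeps the quadrature error small.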

In the following we let \(1\leq p\leq \infty\) and \(k\in\mathbb{N}\).

The Sobolev space \(W^{k,p}(U)\) consists of all \(u\in L_{loc}^1(U)\) such that for every multiindex \(\alpha\) with \(|\alpha|\leq k\) the weak derivative \(D^\alpha u\) exists and is in \(L^p(U)\). We equip the space \(W^{k,p}(U)\) with the norm \[\|u\|_{W^{k,p}(U)}=\begin{cases} \left(\sum_{|\alpha|\leq k}\int_U|D^{\alpha}u|^p\,\mathrm{d}x\right)^{1/p}, &1\leq p<\infty,\\ \sum_{|\alpha|\leq k} \|D^{\alpha} u\|_\infty, & p=\infty. \end{cases}\]
For \(p=2\) we write \(H^k(U):=W^{k,2}(U)\) – the letter \(H\) emphasises that (as we will see) \(H^k(U)\) is a Hilbert space with scalar product \[\langle u,v\rangle_{H^k(U)}=\sum_{|\alpha|\leq k}\langle D^\alpha u,D^\alpha v\rangle_{L^2(U)}.\]
We denote by \(W_0^{k,p}(U)\) the closure of \(C_c^\infty(U)\) in \(W^{k,p}(U)\) (i.e. with respect to the norm \(\|\cdot\|_{W^{k,p}(U)}\)). For \(p=2\) we write \(H_0^k(U):=W_0^{k,2}(U)\).

[Integration by parts] Let \(u\in H^1(U)\) and \(v\in H_0^1(U)\). Then \[\int_U u(\partial_{x_i} v)\,\mathrm{d}x=-\int_U (\partial_{x_i} u) v\,\mathrm{d}x, \quad i=1,\dots,n.\]

This is true if \(v\in C_c^\infty(U)\) (by definition of weak derivatives) and hence also for \(v\in H_0^1(U)\) since \(C_c^\infty(U)\) is dense in \(H_0^1(U)\).

For each \(k\in\mathbb{N}\) and \(1\leq p\leq \infty\) the Sobolev spaces \(W^{k,p}(U)\), \(W_0^{k,p}(U)\) are Banach spaces (i.e. complete); in particular, for \(p=2\) we have that \(H^k(U)=W^{k,2}(U)\), \(H_0^k(U)=W_0^{k,2}(U)\) are Hilbert spaces.

For the next result we recall (from Analysis IIII) that a function \(u:(a,b)\to \mathbb{C}\) is absolutely continuous if and only if the (classical) derivative \(u'\) exists almost everywhere with \(u'\in L_{loc}^1((a,b))\) and \[\begin{equation} \label{eq:AC} \forall\,x_1,x_2\in (a,b):\quad u(x_2)=u(x_1)+\int_{x_1}^{x_2} u'(t)\,\mathrm{d}t. \end{equation}\] In particular, Lipschitz continuous functions are absolutely continuous (you may use this without proof).

Consider \(u(x)=|x|\) for \(x\in\mathbb{R}\). Then \(u\) is Lipschitz continuous and hence absolutely continuous. The derivative exists almost everywhere, namely in \(\mathbb{R}\backslash\{0\}\), with \(u'(x)={\rm sgn}(x)\), which belongs to \(L_{loc}^1(\mathbb{R})\). In addition, one may check that \(\eqref{eq:AC}\) holds.
Consider \(u(x)=|x|\sin(1/x)\) for \(x\in\mathbb{R}\backslash\{0\}\) and \(u(0)=0\). Then \(u\) is continuous in \(\mathbb{R}\), and differentiable at each \(x\neq 0\), with derivative \[u'(x)={\rm sgn}(x)\sin(1/x)-\frac{|x| \cos(1/x)}{x^2}, \quad x\neq 0.\] Note that \(u'(x)\) is unbounded near \(x=0\). In fact, this singularity is so strong that \(u'\notin L_{loc}^1(\mathbb{R})\) since (for \(\varepsilon>0\) small) we have \[\int_{\varepsilon}^1 |u'(x)|\,\mathrm{d}x\geq \int_{\varepsilon}^1 \frac{|\cos(1/x)|}{x}\,\mathrm{d}x-\int_{\varepsilon}^1 |\sin(1/x)|\,\mathrm{d}x,\] and \(\lim_{\varepsilon\to 0}\int_{\varepsilon}^1 |\sin(1/x)|\,\mathrm{d}x\leq \lim_{\varepsilon\to 0}\int_{\varepsilon}^1 \,\mathrm{d}x=1\) whereas (substituting \(1/x=y\) and \(\mathrm{d}x=-y^{-2} \mathrm{d}y\)) \[\int_{\varepsilon}^1 \frac{|\cos(1/x)|}{x}\,\mathrm{d}x =\int_1^{1/\varepsilon} \frac{1}{y} |\cos(y)|\,\mathrm{d}y\] is unbounded as \(\varepsilon\to 0\). Thus \(u'\notin L^1([0,1])\) and hence \(u\) is not absolutely continuous.
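The divergence of \(\int_1^{1/\varepsilon}\frac{|\cos(y)|}{y}\,\mathrm{d}y\) in the second example can be observed numerically. The following sketch (midpoint rule, hypothetical function name) evaluates the integral for increasing upper limits; one expects roughly logarithmic growth, since \(|\cos|\) has a positive mean value over each period.

```python
import math

def cos_over_y_integral(upper, steps_per_unit=200):
    """Midpoint approximation of int_1^upper |cos(y)| / y dy."""
    steps = int((upper - 1.0) * steps_per_unit)
    h = (upper - 1.0) / steps
    total = 0.0
    for k in range(steps):
        y = 1.0 + (k + 0.5) * h
        total += abs(math.cos(y)) / y
    return total * h

# Upper limits correspond to eps = 1/10, 1/100, 1/1000 in the example.
for upper in (10.0, 100.0, 1000.0):
    print(upper, cos_over_y_integral(upper))
```

The printed values keep increasing with the upper limit, in line with the claim that the integral is unbounded as \(\varepsilon\to 0\).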

In dimension \(n=1\) let \(U=(a,b)\) be a bounded or unbounded interval.

We have \(u\in W^{1,p}((a,b))\) if and only if \(u\in L^p((a,b))\) is absolutely continuous and its derivative (which exists almost everywhere) is in \(L^p((a,b))\).
Analogously, \(u\in W^{k,p}((a,b))\) if and only if for \(j=0,\dots,k-1\) the \(j\)-th derivative \(u^{(j)}\) is absolutely continuous (hence its derivative exists almost everywhere) and \(u^{(j)}\in L^p((a,b))\) for \(j=0,\dots,k\).
If \((a,b)\) is a bounded interval, then \(u\in W_0^{k,p}((a,b))\) if and only if \(u\in W^{k,p}((a,b))\) and \[\forall j=0,\dots,k-1:\quad u^{(j)}(a)=u^{(j)}(b)=0\] (note that we have no boundary condition for the \(k\)-th derivative).

This simple characterisation only holds for \(n=1\). In higher dimensions, a function may belong to a Sobolev space and yet be discontinuous and/or unbounded.

For \(r>0\) denote \(B_r(0)=\{x\in\mathbb{R}^n:\,|x|<r\}\). Let \(U=B_1(0)\) be the open unit ball in \(\mathbb{R}^n\), and \[u(x)=\begin{cases} |x|^{-\beta}, &x\in U\backslash\{0\},\\ 0, &x=0,\end{cases}\] for \(\beta>0\) (the value at \(x=0\) is not important). We find all possible \(p\) (depending on \(\beta\) and \(n\)) such that \(u\in W^{1,p}(U)\). First note that \(u\) is \(C^1(U\backslash\{0\})\), with (classical) partial derivative \[\partial_{x_i}u(x)=\frac{-\beta}{|x|^{\beta+1}}\frac{x_i}{|x|}, \quad x\neq 0.\] Thus the gradient \(\nabla u\) satisfies \[|\nabla u(x)|=\frac{\beta}{|x|^{\beta+1}}.\] For a \(\gamma>0\) we see that \(|x|^{-\gamma}\in L^1(U)\) if and only if \(\gamma<n\) since in this case, with \(C={\rm Area}(\partial B_1(0))\), \[\int_U|x|^{-\gamma}\,\mathrm{d}x=\int_0^1 |x|^{-\gamma} C |x|^{n-1}\,\mathrm{d}|x|=C \int_0^1 |x|^{n-1-\gamma}\,\mathrm{d}|x|=\frac{C}{n-\gamma}<\infty.\] This implies \(u\in L^p(U)\) if and only if \(|x|^{-p\beta}\in L^1(U)\), which is satisfied if and only if \(p\beta<n\). Analogously, \(|\nabla u|\in L^p(U)\) if and only if \(p(\beta+1)<n\). It remains to check that \(u\) is weakly differentiable in \(U\); then \(u\in W^{1,p}(U)\) for \(p<\frac{n}{\beta+1}\).

Take a small \(\varepsilon>0\). Let \(\phi\in C_c^\infty(U)\). Then \[\int_{U\backslash B_{\varepsilon}(0)} u(\partial_{x_i} \phi)\,\mathrm{d}x=-\int_{U\backslash B_{\varepsilon}(0)}(\partial_{x_i}u)\phi\,\mathrm{d}x+\int_{\partial B_{\varepsilon}(0)}u\phi\nu_i\,\mathrm{d}S,\] where \(\nu=(\nu_1,\dots,\nu_n)\) denotes the inward pointing normal on \(\partial B_{\varepsilon}(0)\). We estimate, using \(|u|=\varepsilon^{-\beta}\) on \(\partial B_{\varepsilon}(0)\), \[\left|\int_{\partial B_{\varepsilon}(0)}u\phi\nu_i\,\mathrm{d}S\right|\leq \|\phi\|_{\infty}\varepsilon^{-\beta}\int_{\partial B_{\varepsilon}(0)}\,\mathrm{d}S =\|\phi\|_{\infty}\varepsilon^{-\beta}\varepsilon^{n-1}{\rm Area}(\partial B_1(0)).\] Note that if \(\beta+1<n\), then this converges to zero as \(\varepsilon\to 0\). Thus \[\int_U u(\partial_{x_i} \phi)\,\mathrm{d}x=-\int_U(\partial_{x_i}u)\phi\,\mathrm{d}x.\] This proves that \(u\) is weakly differentiable if \(\beta+1<n\), and \(u\in W^{1,p}(U)\) for \(1\leq p<\frac{n}{\beta+1}\).
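The bookkeeping of exponents in this example is easy to get wrong, so a small helper may be useful. The sketch below encodes the radial integral \(\int_\varepsilon^1 r^{n-1-\gamma}\,\mathrm{d}r\) in closed form (without the factor \(C\)) and the condition \(1\leq p<\frac{n}{\beta+1}\) obtained above. The function names are hypothetical, and the second function only restates the criterion derived in the example.

```python
import math

def radial_integral(n, gamma, eps):
    """Closed form of int_eps^1 r^(n - 1 - gamma) dr, without the area factor C."""
    exponent = n - gamma
    if exponent == 0:
        return -math.log(eps)          # logarithmic divergence as eps -> 0
    return (1.0 - eps ** exponent) / exponent

def in_W1p(n, beta, p):
    """Condition from the example for u(x) = |x|^(-beta) on the unit ball:
    u is in W^{1,p} if 1 <= p and p * (beta + 1) < n."""
    return p >= 1 and p * (beta + 1) < n
```

For instance, with \(n=3\) and \(\gamma=1\) the value `radial_integral(3, 1.0, eps)` stays close to \(1/2\) for small `eps`, whereas for \(\gamma\geq 3\) it grows without bound as `eps` decreases.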

Now we study whether some subspaces of Sobolev spaces are dense. Recall that \(W_0^{k,p}(U)\) is the closure of \(C_c^{\infty}(U)\) in \(W^{k,p}(U)\), i.e. \(C_c^\infty(U)\) is dense in \(W_0^{k,p}(U)\) by definition. The subscript \(0\) is important, as in general \(C_c^\infty(U)\) is not dense in \(W^{k,p}(U)\). The exception is when \(U=\mathbb{R}^n\), as the following result shows:

We have \(W^{k,p}(\mathbb{R}^n)=W_0^{k,p}(\mathbb{R}^n)\) and hence \(C_c^{\infty}(\mathbb{R}^n)\) is dense in \(W^{k,p}(\mathbb{R}^n)\).

For general bounded, open \(U\subset\mathbb{R}^n\) we can approximate each \(u\in W^{k,p}(U)\) by smooth functions:

Assume that \(U\) is bounded and \(1\leq p<\infty\) (not \(p=\infty\)). Then \(C^{\infty}(U)\cap W^{k,p}(U)\) is dense in \(W^{k,p}(U)\), i.e. for every \(u\in W^{k,p}(U)\) there exists a sequence \((u_j)_{j\in\mathbb{N}}\subset C^{\infty}(U)\cap W^{k,p}(U)\) such that \(\|u_j-u\|_{W^{k,p}(U)}\to 0\) as \(j\to\infty\).

Under the additional assumption that the boundary \(\partial U\) is \(C^1\), each \(u\in W^{k,p}(U)\) can even be approximated by functions in \(C^{\infty}(\overline{U})\), i.e. by functions that, together with all their derivatives, are continuous up to the boundary.

Assume that \(U\) is bounded and \(\partial U\) is \(C^1\), and \(1\leq p<\infty\) (not \(p=\infty\)). Then \(C^{\infty}(\overline{U})\cap W^{k,p}(U)\) is dense in \(W^{k,p}(U)\).

Next we study compact embeddings. Let \(X,Y\) be two Banach spaces such that \(X\subset Y\). We say that \(X\) is compactly embedded in \(Y\) if the embedding operator \(J:X\to Y\), \(Jx=x\) is compact.

[Rellich–Kondrachov Compactness Theorem] Assume that \(U\) is bounded and \(\partial U\) is \(C^1\). Then the space \(W^{1,p}(U)\) is compactly embedded in \(L^p(U)\).

In fact, the Rellich–Kondrachov Theorem is true in more generality, namely that \(W^{1,p}(U)\) is compactly embedded in \(L^q(U)\) if \(p,q\) satisfy certain inequalities; however, in this module we will only use the case \(p=q\).
We know that \(W_0^{1,p}(U)\subset W^{1,p}(U)\). In fact, \(W_0^{1,p}(U)\) is compactly embedded in \(L^p(U)\) even without the assumption that \(\partial U\) is \(C^1\).

A typical application is the following.

Let \([a,b]\subset\mathbb{R}\) be a bounded interval. Let \(T=-\frac{\mathrm{d}^2}{\mathrm{d}x^2}\) be the differential operator in \(L^2([a,b])\) with operator domain \[\mathcal{D}(T)=\{u\in H^2((a,b)):\,u(a)=u(b)=0\}=H^2((a,b))\cap H_0^1((a,b)).\] This operator is densely defined because \(C_c^{\infty}((a,b))\subset\mathcal{D}(T)\subset L^2([a,b])\), and \(C_c^{\infty}((a,b))\) is dense in \(L^2([a,b])\). By checking that \(\langle Tu,u\rangle\in\mathbb{R}\) for all \(u\in\mathcal{D}(T)\), we conclude that \(T\) is symmetric. One can even show that \(T\) is selfadjoint, for example by checking that \(\mathcal{R}(T\pm\mathrm{i})=L^2([a,b])\). To this end, for \(\lambda\neq 0\) we use that \(-u''-\lambda u=y\) (with \(y\in L^2([a,b])\)) is satisfied for \[u(x)=-\frac{\mathrm{e}^{\mathrm{i}\sqrt{\lambda}x}}{2\mathrm{i}\sqrt{\lambda}}\int_a^x \mathrm{e}^{-\mathrm{i}\sqrt{\lambda}t} y(t)\,\mathrm{d}t +\frac{\mathrm{e}^{-\mathrm{i}\sqrt{\lambda}x}}{2\mathrm{i}\sqrt{\lambda}}\int_a^x \mathrm{e}^{\mathrm{i}\sqrt{\lambda}t} y(t)\,\mathrm{d}t +A\mathrm{e}^{\mathrm{i}\sqrt{\lambda}x}+B\mathrm{e}^{-\mathrm{i}\sqrt{\lambda}x},\] where the constants \(A,B\) are such that \(u(a)=u(b)=0\), which is possible for \(\lambda=\pm\mathrm{i}\) (check! In fact, it turns out this is possible for all \(\lambda\neq 0\) such that \(\mathrm{e}^{2\mathrm{i}\sqrt{\lambda}(b-a)}\neq 1\), which is equivalent to \(\sqrt{\lambda}\notin \frac{\pi}{b-a}\mathbb{Z}\)). Thus indeed \(\mathcal{R}(T\pm\mathrm{i})=L^2([a,b])\), and hence \(T\) is selfadjoint.

Next we show that \(T\) has compact resolvent. To this end, we use Sheet 7, Q7, with \(X=L^2([a,b])\): If \(\rho(T)\neq \emptyset\), then \(T\) has compact resolvent if and only if the embedding operator \(J: (\mathcal{D}(T),\|\cdot\|_T)\to X\), \(Ju=u\), is compact, where \(\mathcal{D}(T)\) is equipped with the graph norm \[\|u\|_T:=\|u\|+\|Tu\|, \quad u\in\mathcal{D}(T).\] Here and in the remainder of this Example, \(\|\cdot\|\) denotes the \(L^2\)-norm. We show that this embedding \(J\) is compact. To this end, let \((u_j)_{j\in\mathbb{N}}\) be a bounded sequence in \((\mathcal{D}(T),\|\cdot\|_T)\), i.e. \(\sup_j(\|u_j\|+\|u_j''\|)<\infty\). This implies that \(\sup_j\|u_j\|<\infty\) and \(\sup_j\|u_j''\|<\infty\). Note that then also \(\sup_j\|u_j'\|<\infty\) because, integrating by parts, \[\|u_j'\|^2=-\langle u_j,u_j''\rangle\leq \|u_j\|\|u_j''\|.\] Thus \(\sup_j\|u_j\|_{H^1((a,b))}=\sup_j\sqrt{\|u_j\|^2+\|u_j'\|^2}<\infty\). Now the Rellich–Kondrachov Theorem (for \(p=2\)) implies that \((u_j)_{j\in\mathbb{N}}\subset L^2([a,b])\) has a convergent subsequence. This proves that the above operator \(J\) is compact, and hence \(T\) has compact resolvent. Now the spectral theorem for selfadjoint operators with compact resolvents applies, which implies that the spectrum of \(T\) consists of a countable set of real eigenvalues and the eigenfunctions form an orthonormal basis of \(L^2([a,b])\).
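The eigenvalues of this operator are \((k\pi/(b-a))^2\), \(k\in\mathbb{N}\), with sine eigenfunctions (compare the vibrating string example for \([a,b]=[0,1]\)). As a plausibility check, the sketch below replaces \(-\frac{\mathrm{d}^2}{\mathrm{d}x^2}\) by the standard second-difference quotient on a uniform grid and computes the Rayleigh quotient of a sampled sine mode. The discretisation and all names are assumptions for illustration and are not part of the notes.

```python
import math

def discrete_rayleigh_quotient(k, a=0.0, b=1.0, points=200):
    """Rayleigh quotient <T_h v, v> / <v, v> of the second-difference
    approximation T_h of -d^2/dx^2 for the sampled k-th sine mode."""
    h = (b - a) / points
    # Grid values of sin(k pi (x - a) / (b - a)); they vanish at both endpoints.
    v = [math.sin(k * math.pi * j / points) for j in range(points + 1)]
    numerator = 0.0
    denominator = 0.0
    for j in range(1, points):
        second_difference = (-v[j - 1] + 2.0 * v[j] - v[j + 1]) / (h * h)
        numerator += second_difference * v[j]
        denominator += v[j] * v[j]
    return numerator / denominator

def exact_eigenvalue(k, a=0.0, b=1.0):
    """Eigenvalue (k pi / (b - a))^2 of the Dirichlet operator on [a, b]."""
    return (k * math.pi / (b - a)) ** 2
```

Refining the grid (larger `points`) should bring `discrete_rayleigh_quotient(k)` closer to `exact_eigenvalue(k)` for each fixed `k`.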

One can do the same in higher dimensions: Let \(U\) be a bounded, open, connected subset of \(\mathbb{R}^n\). The Laplacian \(T=-\Delta=-\sum_{i=1}^n\partial_{x_i}^2\) with operator domain \(\mathcal{D}(T)=H^2(U)\cap H_0^1(U)\) is selfadjoint in \(L^2(U)\) and has compact resolvent. Thus the spectrum of \(T\) consists of a countable set of real eigenvalues and the eigenfunctions form an orthonormal basis of \(L^2(U)\). The eigenvalues \(\lambda\in\sigma_p(T)\) are all positive because, if \(f\) is the corresponding normalised eigenfunction, then \[\lambda=\langle \lambda f,f\rangle=\langle Tf,f\rangle=-\langle \Delta f,f\rangle=\|\nabla f\|^2>0.\] Let \(0<\lambda_1\leq \lambda_2\leq \lambda_3\leq \dots\) be the eigenvalues of \(T\), and let \(e_j\) denote the corresponding normalised eigenfunction to \(\lambda_j\). Then, for every \(f\in\mathcal{D}(T)\), the spectral theorem implies that \[Tf=\sum_{j=1}^\infty \lambda_j \langle f,e_j\rangle e_j,\] and hence \[\|\nabla f\|^2=-\langle \Delta f,f\rangle=\langle Tf,f\rangle=\sum_{j=1}^\infty \lambda_j |\langle f,e_j\rangle|^2\geq \lambda_1 \sum_{j=1}^\infty |\langle f,e_j\rangle|^2=\lambda_1 \|f\|^2.\] Thus we obtain the Poincaré inequality (appears also in the PDEs module) \[\|f\|^2\leq C \|\nabla f\|^2\] with \(C=1/\lambda_1\). It holds for all \(f\in\mathcal{D}(T)\). However, \(\mathcal{D}(T)=H^2(U)\cap H_0^1(U)\) is dense in \(H_0^1(U)\) (for example because \(C_c^\infty(U)\subset\mathcal{D}(T)\) is dense in \(H_0^1(U)\)), and thus the Poincaré inequality also holds for all \(f\in H_0^1(U)\).
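In dimension \(n=1\) with \(U=(0,1)\) the smallest eigenvalue is \(\lambda_1=\pi^2\) (see the vibrating string example), so the Poincaré constant is \(C=1/\pi^2\). The sketch below checks the inequality numerically for one hypothetical choice, \(f(x)=x(1-x)\in H_0^1((0,1))\); the quadrature and the names are illustrative assumptions.

```python
import math

def poincare_ratio(f, df, steps=20000):
    """Return ||f||^2 / ||f'||^2 on (0, 1), both norms by the midpoint rule."""
    h = 1.0 / steps
    norm_f = 0.0
    norm_df = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        norm_f += f(x) ** 2
        norm_df += df(x) ** 2
    return norm_f / norm_df

# f(x) = x (1 - x) vanishes at both endpoints; f'(x) = 1 - 2 x.
ratio = poincare_ratio(lambda x: x * (1.0 - x), lambda x: 1.0 - 2.0 * x)
best_constant = 1.0 / math.pi ** 2   # C = 1 / lambda_1 with lambda_1 = pi^2
```

Here \(\|f\|^2=1/30\) and \(\|f'\|^2=1/3\), so the ratio is \(1/10\), slightly below \(1/\pi^2\); equality would require \(f\) to be a multiple of the first eigenfunction.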

Now we introduce the dual space of \(H_0^1(U)\).

The space \(H^{-1}(U):=(H_0^1(U))^*\) is the dual space of \(H_0^1(U)\), i.e. the space of all bounded linear functionals \(f:H_0^1(U)\to \mathbb{C}\).

Since \(H_0^1(U)\) is a Hilbert space, the Riesz Representation Theorem implies that for every \(f\in H^{-1}(U)\) there exists \(u\in H_0^1(U)\) such that \[\forall v\in H_0^1(U):\quad f(v)=\langle v,u\rangle_{H_0^1(U)}=\langle v,u\rangle_{L^2(U)}+\langle \nabla v,\nabla u\rangle_{L^2(U)}.\]

2.2 Existence of weak solutions

The vital tool in the Hilbert space approach to boundary-value problems is the Lax–Milgram Theorem. The essence of the method is the interpretation of the problem in a weak sense.

Motivation. Let \(U\subset\mathbb{R}^n\) be an open, bounded subset. We study the boundary-value problem \[\begin{equation} \label{eq:BVP} \begin{cases} Tu=g &\text{in } U,\\ u=0 & \text{on }\partial U,\end{cases} \end{equation}\] where \(T\) is a (formal) differential operator and \(g:U\to \mathbb{C}\) is a given function. We want to find a solution \(u\) in a weak sense, which has less regularity than if \(u\) were in \(\mathcal{D}(T)\).

As a more concrete example let \[(Tu)(x)=-\sum_{i,j=1}^n \partial_{x_j}(a_{ij}(x)\partial_{x_i}u(x))+\sum_{i=1}^n b_i(x)\partial_{x_i}u(x)+c(x) u(x)\] with coefficient functions \(a_{ij}, b_i, c:U\to\mathbb{C}\) for \(i,j=1,\dots,n\). Assuming for the moment that \(u\) is a smooth function, take the \(L^2(U)\) scalar product of a smooth test function \(\varphi\in C_c^{\infty}(U)\) with both sides of the PDE \(Tu=g\) to find, after integration by parts, \[\begin{equation} \label{eq:weak} \int_U \left(\sum_{i,j=1}^n (\partial_{x_j}\varphi) \overline{a_{ij}(\partial_{x_i} u)}+\sum_{i=1}^n \varphi\overline{ b_i (\partial_{x_i} u)} + \varphi\overline{cu}\right)\,\mathrm{d}x =\int_U \varphi \overline{g}\,\mathrm{d}x. \end{equation}\] If the coefficient functions are “sufficiently nice”, say \(a_{ij},b_i,c\in L^\infty(U)\), then the left hand side of \(\eqref{eq:weak}\) makes sense for \(u,\varphi\in H_0^1(U)\). (We choose the space \(H_0^1(U)\) as opposed to \(H^1(U)\) to incorporate the boundary condition \(u=0\) on \(\partial U\).) The right hand side of \(\eqref{eq:weak}\) can be generalised to \(f(\varphi)\) for a bounded linear functional \(f\in H^{-1}(U)=H_0^1(U)^*\) instead of the more concrete \(f(\varphi)=\int_U \varphi \overline{g}\,\mathrm{d}x\). (Note that here \(|f(\varphi)|\leq \|g\|_{L^2}\|\varphi\|_{L^2}\leq \|g\|_{L^2}\|\varphi\|_{H^1}\), so \(f\in H^{-1}(U)\) if \(g\in L^2(U)\).) Let \(B(\varphi,u)\) denote the left hand side of \(\eqref{eq:weak}\). Then, for \(f\in H^{-1}(U)\), a function \(u\in H_0^1(U)\) is called a weak solution of the boundary-value problem \[\begin{equation} \label{eq:BVPf} \begin{cases} Tu=f &\text{in } U,\\ u=0 & \text{on }\partial U,\end{cases} \end{equation}\] if \[\forall\,\varphi\in H_0^1(U):\quad B(\varphi,u)=f(\varphi).\] We want to know under which conditions on \(a_{ij},b_i,c\) a weak solution exists.
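To see the weak formulation at work in the simplest case, one can restrict it to a finite-dimensional subspace of \(H_0^1(U)\). The sketch below does this for \(-u''=g\) on \(U=(0,1)\) with real-valued data (so no complex conjugation is needed), using piecewise linear hat functions on a uniform grid. This Galerkin-type discretisation, the lumped approximation of \(\int_U g\varphi_i\,\mathrm{d}x\) and all names are assumptions made for illustration and are not part of the notes.

```python
def solve_weak_dirichlet(g, cells=50):
    """Galerkin sketch for -u'' = g on (0, 1) with u(0) = u(1) = 0.

    Returns approximate values of u at the interior nodes x_i = i / cells.
    """
    h = 1.0 / cells
    m = cells - 1                          # number of interior hat functions
    # B(phi_i, phi_j) = int phi_i' phi_j' dx is tridiagonal: 2/h and -1/h.
    diag = [2.0 / h] * m
    off = [-1.0 / h] * (m - 1)
    # f(phi_i) = int g phi_i dx, approximated by h * g(x_i).
    rhs = [h * g((i + 1) * h) for i in range(m)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # ... followed by back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return u

nodal_values = solve_weak_dirichlet(lambda x: 1.0)
```

For \(g\equiv 1\) the classical solution is \(u(x)=x(1-x)/2\), and the computed nodal values can be compared with it, for example at \(x=1/2\) (list index 24 for 50 cells).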

In the literature, often only real-valued functions are considered and hence no complex conjugation is needed in \(\eqref{eq:weak}\). Here we want to allow for complex-valued functions, as a continuation of the previous section. We take the complex conjugate of \(u\) and not of \(\varphi\) because \(f(\varphi)\) is linear in \(\varphi\) for \(f\in H^{-1}(U)\). Thus we study mappings \(B(\varphi,u)\) that are linear in \(\varphi\) but antilinear (also called conjugate-linear) in \(u\) (note that the same linearity properties are shared by scalar products). These mappings are called sesquilinear forms.

We formulate the Lax–Milgram Theorem in the general setting of a Hilbert space \(H\) (with scalar product \(\langle \cdot,\cdot\rangle\) and norm \(\|\cdot\|\)) and a sesquilinear form \(B:H\times H\to\mathbb{C}\) satisfying, for \(z_1,z_2\in \mathbb{C}\), \[B(z_1 \varphi_1+z_2\varphi_2,u)=z_1 B(\varphi_1,u)+z_2 B(\varphi_2,u), \quad B(\varphi,z_1 u_1+z_2 u_2)=\overline{z_1}B(\varphi,u_1)+\overline{z_2}B(\varphi,u_2).\] The sesquilinear form is

bounded if there exists \(\alpha>0\) such that \[\forall \varphi,u\in H:\quad |B(\varphi,u)|\leq \alpha \|\varphi\| \|u\|.\]
coercive if there exists \(\beta>0\) such that \[\forall\,u\in H:\quad |B(u,u)|\geq \beta \|u\|^2.\]

[Lax–Milgram] Let \(B:H\times H\to\mathbb{C}\) be a sesquilinear form that is bounded and coercive. Let \(f\in H^*\) be an arbitrary bounded linear functional. Then there exists a unique \(u\in H\) such that \[\begin{equation} \label{eq:LM} \forall\,\varphi\in H:\quad B(\varphi,u)=f(\varphi). \end{equation}\]

For each fixed element \(u\), the mapping \(F_u(\varphi):=B(\varphi,u)\) is a bounded linear functional with \(\|F_u\|\leq \alpha \|u\|\). The Riesz Representation Theorem implies that there exists a unique \(h_u\in H\) such that \(F_u(\varphi)=\langle \varphi,h_u\rangle\) and \(\|F_u\|=\|h_u\|\). Now let \(A:H\to H\) be defined by \(Au=h_u\). One can check that this defines a linear operator. Since \(\|Au\|=\|h_u\|=\|F_u\|\leq \alpha \|u\|\) we obtain that \(A\) is bounded, \(\|A\|\leq \alpha\). In addition, \[\beta \|u\|^2\leq |B(u,u)|=|F_u(u)|=|\langle u,Au\rangle|\leq \|u\| \|Au\|,\] which implies (cancel one \(\|u\|\) on each side) that \(\|Au\|\geq \beta \|u\|\). We claim that this implies that \(A\) is injective and \(\mathcal{R}(A)\subset H\) is closed. Indeed, \(Au=0\) implies \(0=\|Au\|\geq \beta \|u\|\), hence \(u=0\) and \(A\) is injective. To show that \(\mathcal{R}(A)\) is closed, let \(Au_j\to v\in H\). Then \((Au_j)_{j\in\mathbb{N}}\subset H\) is a Cauchy sequence. However, \(\|Au_j-Au_m\|=\|A(u_j-u_m)\|\geq \beta \|u_j-u_m\|\), and hence \((u_j)_{j\in\mathbb{N}}\subset H\) is a Cauchy sequence and hence convergent, \(u_j\to u\). The boundedness of \(A\) implies that \(A\) is continuous and hence \(v=\lim_{j\to\infty}Au_j=Au\in\mathcal{R}(A)\). Thus, indeed, \(\mathcal{R}(A)\) is closed. Now we show that \(\mathcal{R}(A)=H\); since \(\mathcal{R}(A)\) is closed, we can equivalently show that \(\mathcal{R}(A)^\perp =\{0\}\). Let \(w\in \mathcal{R}(A)^\perp\). Since \(Aw\in\mathcal{R}(A)\) we get \[0=|\langle w, Aw\rangle|=|B(w,w)|\geq \beta \|w\|^2,\] which implies \(w=0\). Altogether, \(A:H\to H\) is bijective and hence \(A^{-1}:H\to H\) exists and is bounded. Now, again with the Riesz Representation Theorem, we know that there exists \(h_f\in H\) such that \(f(\varphi)=\langle \varphi, h_f\rangle\) for all \(\varphi\in H\). Let \(u_f:=A^{-1} h_f\). 
Then, for every \(\varphi\in H\), \[B(\varphi,u_f)=\langle \varphi, A u_f\rangle =\langle \varphi, h_f\rangle =f(\varphi).\] Thus \(u=u_f\) satisfies \(\eqref{eq:LM}\). It remains to show uniqueness of the solution to \(\eqref{eq:LM}\). Assume that \(u,\widetilde u\) both satisfy \(\eqref{eq:LM}\). Hence, for \(\varphi\in H\), \(B(\varphi,u)=f(\varphi)=B(\varphi,\widetilde u)\), which implies \(B(\varphi,u-\widetilde u)=0\) by the linearity properties of \(B(\cdot,\cdot)\). Setting \(\varphi=u-\widetilde u\) we arrive at \[0=|B(u-\widetilde u,u-\widetilde u)|\geq \beta \|u-\widetilde u\|^2.\] This implies \(u=\widetilde u\), so the solution is unique.
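The construction in the proof becomes very concrete in finite dimensions. The sketch below takes the real Hilbert space \(\mathbb{R}^2\) (a simplifying assumption; the theorem is stated for complex spaces), a hypothetical non-symmetric matrix \(A\) with \(B(\varphi,u)=\langle \varphi,Au\rangle\), and a functional \(f(\varphi)=\langle\varphi,h_f\rangle\), and then computes \(u_f=A^{-1}h_f\). All names and numbers are illustrative.

```python
def solve_2x2(A, h):
    """Solve A u = h for a real 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    u0 = (h[0] * A[1][1] - A[0][1] * h[1]) / det
    u1 = (A[0][0] * h[1] - h[0] * A[1][0]) / det
    return [u0, u1]

def form(A, phi, u):
    """B(phi, u) = <phi, A u> in the real Hilbert space R^2."""
    Au = [A[0][0] * u[0] + A[0][1] * u[1],
          A[1][0] * u[0] + A[1][1] * u[1]]
    return phi[0] * Au[0] + phi[1] * Au[1]

# Non-symmetric example: B(u, u) = 2 |u|^2, so the form is coercive with beta = 2.
A = [[2.0, 1.0], [-1.0, 2.0]]
h_f = [1.0, 3.0]                 # Riesz representative of f(phi) = <phi, h_f>
u_f = solve_2x2(A, h_f)          # the element u_f = A^{-1} h_f from the proof
```

Testing \(B(\varphi,u_f)=f(\varphi)\) on the two standard basis vectors is enough here, since both sides are linear in \(\varphi\).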

Now we return to the boundary-value problem in the motivation at the beginning of this section. Let \(a_{ij},b_i,c\in L^\infty(U)\). In the Hilbert space \(H=H_0^1(U)\) define the sesquilinear form \(B:H_0^1(U)\times H_0^1(U)\to \mathbb{C}\) by \[B(\varphi,u):=\int_U \left(\sum_{i,j=1}^n (\partial_{x_j}\varphi) \overline{a_{ij}(\partial_{x_i} u)}+\sum_{i=1}^n \varphi\overline{ b_i (\partial_{x_i} u)} + \varphi\overline{cu}\right)\,\mathrm{d}x.\] Let \(f\in H^{-1}(U)\). We want to apply the Lax–Milgram Theorem to prove the existence of a unique \(u\in H_0^1(U)\) such that \(B(\varphi,u)=f(\varphi)\) for all \(\varphi\in H_0^1(U)\). To this end, we need to prove that \(B(\cdot,\cdot)\) is bounded and coercive. This is possible only under additional assumptions on \(a_{ij},b_i,c\). We illustrate this here for the example of constant coefficient functions.

Let \(a_{ij},b_i,c\in\mathbb{C}\) be constants. We assume that for the \(n\times n\) matrix \(a=(a_{ij})_{i,j=1}^n\) there exists \(\theta>0\) with \[\begin{equation} \label{eq:elliptic} \forall \xi\in\mathbb{C}^n:\quad \mathrm{Re}\langle a \xi,\xi\rangle_{\mathbb{C}^n}\geq \theta |\xi|^2. \end{equation}\] This is satisfied for example for \(-\Delta\), i.e. \(a_{ij}=\delta_{i,j}\), in which case \(a\) is the identity matrix and we can take \(\theta=1\).

First we check that \(B(\cdot,\cdot)\) is bounded. Let \(\varphi,u\in H_0^1(U)\). Then \[\begin{aligned} |B(\varphi,u)| &\leq \sum_{i,j=1}^n |a_{ij}|\|\partial_{x_j}\varphi\|_{L^2}\|\partial_{x_i} u\|_{L^2}+\|\varphi\|_{L^2}\sum_{i=1}^n |b_i| \|\partial_{x_i}u\|_{L^2}+|c|\|\varphi\|_{L^2}\|u\|_{L^2}\\ &\leq \left(\sum_{i,j=1}^n|a_{ij}|+\sum_{i=1}^n|b_i|+|c|\right)\|\varphi\|_{H^1}\|u\|_{H^1}.\end{aligned}\] Now we check coercivity: To this end, let \(u\in H_0^1(U)\). We apply \(\eqref{eq:elliptic}\) for \(\xi(x)=\nabla u(x)\) at each \(x\in U\). Note that \(\mathrm{Re}\langle a\xi(x),\xi(x)\rangle_{\mathbb{C}^n}=\mathrm{Re}\langle\xi(x), a\xi(x)\rangle_{\mathbb{C}^n}\). Thus \[\begin{aligned} \mathrm{Re}(B(u,u))= &\int_U \left(\mathrm{Re}\langle \xi(x),a \xi(x)\rangle_{\mathbb{C}^n}+\sum_{i=1}^n \mathrm{Re}( u\overline{ b_i (\partial_{x_i} u)}) + \mathrm{Re}(\overline{c}|u|^2)\right)\,\mathrm{d}x\\ &\geq \int_U \theta |\xi(x)|^2\,\mathrm{d}x- \|u\|_{L^2}\sum_{i=1}^n |b_i| \|\partial_{x_i}u\|_{L^2}+\mathrm{Re}(c)\|u\|_{L^2}^2\\ &\geq \theta \|\nabla u\|_{L^2}^2- |b|\|u\|_{L^2} \|\nabla u\|_{L^2}+\mathrm{Re}(c)\|u\|_{L^2}^2\end{aligned}\] where \(b=(b_i)_{i=1}^n\). Using the inequality \(r s\leq \varepsilon r^2+\frac{1}{4\varepsilon} s^2\) for \(r,s>0\) and any \(\varepsilon>0\), we estimate the latter further by \[\begin{aligned} &\theta \|\nabla u\|_{L^2}^2- \varepsilon\|\nabla u\|_{L^2}^2-\frac{1}{4\varepsilon} (|b|\|u\|_{L^2}) ^2+\mathrm{Re}(c)\|u\|_{L^2}^2\\ &=(\theta-\varepsilon) \|\nabla u\|_{L^2}^2+\left(-\frac{|b|^2}{4\varepsilon}+ \mathrm{Re}(c)\right)\|u\|_{L^2}^2.\end{aligned}\] It may not be possible to choose \(\varepsilon>0\) such that both coefficients in front of the norms \(\|\nabla u\|_{L^2}^2\), \(\|u\|_{L^2}^2\) are positive. 
For example if we choose \(\varepsilon=\theta/2\), then we obtain \[\begin{aligned} |B(u,u)| &\geq \mathrm{Re}(B(u,u)) \geq \frac{\theta}{2} \|\nabla u\|_{L^2}^2+\left(-\frac{|b|^2}{2\theta} + \mathrm{Re}(c)\right)\|u\|_{L^2}^2\\ &\geq \min\left\{\frac{\theta}{2} ,-\frac{|b|^2}{2\theta}+ \mathrm{Re}(c)\right\}\|u\|_{H^1}^2=:\beta \|u\|_{H^1}^2.\end{aligned}\] This choice of \(\beta\) is positive only if \(\mathrm{Re}(c)>\frac{|b|^2}{2\theta}\). If the latter is satisfied, then the Lax–Milgram Theorem is applicable, which proves the existence of a unique weak solution of the boundary-value problem \(\eqref{eq:BVPf}\).

If we replace \(\eqref{eq:BVPf}\) by \[\begin{equation} \label{eq:BVP2} \begin{cases} (T-\lambda)u=f &\text{in } U,\\ u=0 & \text{on }\partial U,\end{cases} \end{equation}\] and introduce the corresponding sesquilinear form \[B(\varphi,u):=\langle \nabla \varphi, a \nabla u\rangle_{L^2}+\langle \varphi, b\cdot \nabla u\rangle_{L^2}+(c-\lambda)\langle \varphi,u\rangle_{L^2},\] then the above estimates imply that \(B(\cdot,\cdot)\) is bounded and is coercive for all \(\lambda\in\mathbb{C}\) with \(\mathrm{Re}(-\lambda)\) sufficiently large. Thus the Lax–Milgram Theorem proves the existence of a unique weak solution of the boundary-eigenvalue problem \(\eqref{eq:BVP2}\) for these \(\lambda\). Note that if \(\lambda\) is an eigenvalue of \(T\), then \((T-\lambda)\) is not injective and hence the classical solution of the homogeneous problem \((T-\lambda)u=0\) is not unique. For each solution of the weak problem, we can add a classical solution of \((T-\lambda)u=0\) to get another solution of the weak problem. Hence the weak solution is not unique. Thus we can conclude that if \(\mathrm{Re}(-\lambda)\) is sufficiently large, then \(\lambda\) is not an eigenvalue of \(T\).
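The coercivity condition for constant coefficients can be turned into a small check. The sketch below evaluates the constant \(\beta=\min\{\frac{\theta}{2},\mathrm{Re}(c)-\frac{|b|^2}{2\theta}\}\) from the estimate with \(\varepsilon=\theta/2\), and its shifted version with \(c\) replaced by \(c-\lambda\). The function names are hypothetical, and a non-positive value only means that this particular estimate gives no information, not that the form fails to be coercive.

```python
def coercivity_constant(theta, b, c):
    """beta = min{theta / 2, Re(c) - |b|^2 / (2 theta)} for constant coefficients."""
    b_norm_sq = sum(abs(b_i) ** 2 for b_i in b)
    return min(theta / 2.0, complex(c).real - b_norm_sq / (2.0 * theta))

def lax_milgram_applies(theta, b, c):
    """True if the estimate with eps = theta / 2 yields a positive beta."""
    return coercivity_constant(theta, b, c) > 0.0

def coercive_for_shift(theta, b, c, lam):
    """Same check for the form belonging to T - lam, i.e. with c replaced by c - lam."""
    return lax_milgram_applies(theta, b, c - lam)
```

For \(-\Delta+1\) (that is \(\theta=1\), \(b=0\), \(c=1\)) this gives \(\beta=1/2\); for \(\theta=1\), \(b=(1,1)\), \(c=0\) the check fails at \(\lambda=0\) but succeeds for \(\lambda=-2\), in line with the requirement that \(\mathrm{Re}(-\lambda)\) be sufficiently large.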

Under certain assumptions on the coefficient functions \(a_{ij}, b_i, c\) and also on \(U\) and \(g\) in \(\eqref{eq:BVP}\), one can prove that the weak solution has stronger regularity properties and is in fact a classical solution, i.e. \(u\in\mathcal{D}(T)\) satisfying \(\eqref{eq:BVP}\). Such regularity theorems are beyond the scope of this course.