Geometry of Mathematical Physics III
(Michaelmas 2023-24)

Andreas Braun
andreas.braun@durham.ac.uk Department of Mathematical Sciences, Durham University

Preliminaries

In these lectures we will explore a modern formulation of several aspects of fundamental physics which features symmetries and geometry. As we will see, making use of such mathematical concepts not only deepens our understanding of many classical topics, but also paves the way to many of the great conquests of 20th century physics. Our point of view will mostly be that of action principles and classical field theories (or sometimes just the mechanics of point particles) and we will review those aspects that are needed as we go along. Sometimes it will prove worthwhile to change perspective and consider quantum mechanics instead, but we will limit ourselves to a few crucial topics for this.

There are many more advanced topics that are outside the scope of these lectures, but that lie too close to our path to be ignored completely. The short detours into this terrain that we will undertake are meant to stimulate your intellectual curiosity, but do not form part of the examinable content. The corresponding sections will be clearly marked with a \(^{\bf \ast}\).

These lectures combine material that naturally belongs together, but can rarely be found all in one source. Furthermore, many standard texts on the subjects covered pitch their material at a somewhat higher level. While I hope you take a look at the references given, don’t get discouraged if what you find seems very difficult. These notes form the core of the material you should use in this course and the exercises given reflect the level of understanding I expect you to achieve. Only what is discussed in the lecture notes (and does not have a \(^{\bf \ast}\)) forms the examinable material.

Introduction\(^\ast\)

In (classical) theoretical physics, we typically specify which kind of physical system we are talking about by stating the physical degrees of freedom, e.g. some generalized coordinates \(q_i\) and their derivatives \(\dot{q}_i\) with respect to ‘time’, which uniquely determine the configuration of our system, together with an action \(S[q_i,\dot{q}_i]\) that encodes the dynamics. We will introduce actions formally later, so if you are unfamiliar with actions or feel a little rusty, never mind: all that matters for this discussion is that the action is a functional of \(q_i(t)\) and \(\dot{q}_i(t)\) whose extrema are the solutions of the equations of motion, so that demanding that \(S\) is extremal determines the dynamics.

Defining systems using an action \(S\), a sensible definition of what is a symmetry might be

Definition 1. Let \(g\) be an invertible map \[g:\hspace{.5cm} \begin{aligned} q_i &\rightarrow g(q_i) \\ \dot{q}_i &\rightarrow g(\dot{q}_i) \end{aligned}\] such that \[S[g(q_i),g(\dot{q}_i)] = S[q_i,\dot{q}_i]\, .\] Then \(g\) is called a symmetry of the action \(S\). Note that this must hold true no matter what \(q_i\) and \(\dot{q}_i\) are.

Given an action we can ask about the set \(G\) of all of its symmetries (and this is something we will frequently do), and our definition above has two immediate consequences for that: the identity map is always a symmetry, and if \(g\) and \(g'\) are symmetries, then so is their composition \(g \circ g'\), since \[S[g(g'(q_i)),g(g'(\dot{q}_i))] = S[g'(q_i),g'(\dot{q}_i)] = S[q_i,\dot{q}_i]\, .\]

If you consider the above overly formal, we can unpack what it means intuitively: if you can map the system using both \(g\) and \(g'\) without changing \(S\), you might as well first act with \(g'\), after which \(S\) did not change (read the above equation from right to left). Then we subsequently apply \(g\), and again the action does not change. We hence acted with both maps without changing the action.

The properties of the set of symmetries we have just uncovered are exactly what is called a group (or rather, this is why groups are defined the way they are). Recall the following from Linear Algebra I:

Definition 2. A group is a set \(G\) equipped with an operation \(\circ: G \times G \rightarrow G\) such that

  i) there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);

  ii) if \(x,y \in G\) then \(x\circ y \in G\);

  iii) \((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);

  iv) for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) such that \(x^{-1} \circ x = e\).

Proposition 1. The symmetries of a classical action form a group.

Proof: The composition \(\circ\) is just the composition of maps here and the identity \(e\) is the identity map \(g = \mathds{1}\). We have just seen that the identity is a symmetry and that the composition of symmetries is a symmetry as well, taking care of i) and ii). Composition of maps satisfies associativity (in each case we simply apply three maps after each other), so we also have iii). Finally, we have assumed that symmetries are invertible maps, so we also have iv). \(\square\)

1. Let \(\mathbb{C}\) be the complex numbers and \(\mathbb{C}^* = \mathbb{C}\setminus \{0\}\). Which of these is a group under addition? Which of these is a group under multiplication?

Note that the way we introduced groups did not depend on groups acting in a physics context. More generally, groups naturally appear as symmetries of whatever object you are interested in. Take a moment to think about how you would argue that the defining properties of a group are present for any symmetries that you can think of.

REMARK: In recent years it has become widely appreciated that one needs to rethink the notion of symmetry somewhat when discussing quantum systems with extended objects, and things become more complicated than what we discuss here. In that setting, a suitable notion of symmetry differs from ‘a transformation that leaves the action invariant’, which in turn opens up the possibility that symmetries are no longer groups, and this is exactly what happens. This is an active field of research at present; here is a popular science article about the topic .

Symmetries have remarkable consequences for the physics in that they imply conservation laws via what is called ‘Noether’s theorem’. Understanding which symmetries an action has hence immediately gives us conserved quantities which we can use in turn to constrain the dynamics. Here is a famous example without which we would not exist: the ‘Standard Model of particle physics’ classically has a symmetry which implies ‘baryon number conservation’. This in turn implies that the proton cannot decay into any of the particles lighter than itself, as this would violate the conservation law. Without this restriction, protons could decay into positrons (the electron’s antiparticle) and the world as we know it would end in a flash. So we can explain that the proton is stable by a symmetry! If you try to come up with an extension of the Standard Model you better be careful not to violate this symmetry.

Instead of investigating if a given action has some group of symmetries, we can hence turn things around and try to construct actions symmetric under a given group \(G\) if the consequences of the symmetry match experiments. This point of view is precisely what people mean when they say that ‘we understand’ fundamental physics using symmetries. Imagine we have good reasons to write down an action \[S[q] = S_1[q] + \alpha S_2[q]\] with a parameter \(\alpha\), and measurements tell us that \(\alpha = 0\). You will immediately write a publication and receive much fame if you can find a symmetry \(G\) which leaves \(S_1\) invariant, but not \(S_2\). In this case we simply add ’invariance under \(G\)’ as a fundamental requirement which forces \(\alpha=0\).

Having established the relevance of groups we of course want to study them in more detail. Two questions that immediately come to mind (and which correspond to topics 1 and 2) are ‘what are they like?’ and ‘how can they act?’. To give you a feeling about the first question, consider the

Example 1. The group \(U(1)\) is the group of complex numbers of unit modulus under multiplication. For \(\boldsymbol{\varphi} = 0.. 2 \pi\) we can write any group element as \[g = e^{i \boldsymbol{\varphi}} \, .\] However, it is not true that this group is isomorphic to the interval \([0..2\pi]\) as \(\boldsymbol{\varphi}=0\) and \(\boldsymbol{\varphi} = 2\pi\) are one and the same group element. It is isomorphic to a circle \(S^1\), see figure 1. The fact that this group is topologically non-trivial will turn out to be the reason we can have magnetic monopoles in the second half of this course! In topic 1 we will examine several classes of groups that are also non-trivial spaces (‘manifolds’) in their own right, these are called ‘Lie groups’.

Figure [fig:e8roots]: A projection of the root system of the Lie group \(E_8\), which can be used to construct its representations.

To appreciate the second question, consider a group whose elements are \(n \times n\) matrices (which ones is not really important here). If this group is supposed to act on the generalized coordinates in our theory, you might be tempted to say that they form a vector with \(n\) components and the group acts as \[\boldsymbol{q} \rightarrow M \boldsymbol{q} \, .\] However, this is not the only possibility. We might say that the generalized coordinates in our theory are \(n \times n\) matrices \(\boldsymbol{Q}\) themselves and the group acts as \[\boldsymbol{Q} \rightarrow M^{-1} \boldsymbol{Q} M \, .\] Note that the set of \(n \times n\) matrices is a vector space as well, but its dimension is \(n^2\), which is different from \(n\). The general study of how groups can linearly act on vector spaces of various dimensions (and which dimensions can occur) is called ‘representation theory’ and we will discuss this in topic 2. As linear maps on vector spaces can always be written in terms of matrices, we can also say that representation theory is the question of how abstract relations such as \[g g' = g''\] can be concretely realized using matrices. In figure 2 you can see a glimpse of the beautiful structures that emerge when asking (and answering) such questions.

2. Consider the set \(S\) of real \(n \times n\) matrices.

In Epiphany term, we will start investigating what happens when we consider group actions that vary across space-time, i.e. we let \[g = g(t,\vec{x}) \, .\] Such things are called gauge symmetries and form the underpinning for the interactions of the Standard Model of particle physics. Interestingly, demanding such symmetries forces us to include forces which are transmitted by ‘gauge bosons’ such as photons or gluons.

After spending some time formulating gauge theories as classical field theories, we will investigate the impact of the ‘gauge group’ \(G\) on the physics and will discover that the topology of \(G\) plays a central role.

1 Lie Groups and Lie Algebras

1.1 Motivating Examples

Before giving formal definitions of Lie groups and Lie algebras, let us look at some motivating examples and explore their properties a bit. These examples will serve as templates for all there is to come, so make sure you understand them well.

Example 1.1. The group \(U(1)\): three easy pieces.

1) The group \(U(1)\) derives its name from unitary complex \(1 \times 1\) matrices. Acting with \(g\) on a complex number \(c\) \[\begin{equation} \label{eq:u1_on_c} c \rightarrow g c, \,\,\, g \in U(1), \,\,\, c \in \mathbb{C} \end{equation}\] we require that the inner form \(|c|^2\) remains unchanged. As \[|c|^2 = \bar{c} c \rightarrow \bar{c} \bar{g} g c = |g|^2 |c|^2\] this implies that \(g\) is a complex number of modulus one, so that we can write \(g = e^{i \phi}\) (as before). Hence

Definition 1.1. \(U(1)\) is the group of complex numbers of unit modulus under multiplication, i.e. the group law is \[e^{i \phi } e^{i \phi'} = e^{i(\phi+\phi')}\,.\]

Let’s see quickly that this is a group indeed.

  • there is an identity element \(e\) in \(G\) such that \(x\circ e = e \circ x = x\) for all \(x \in G\);
    \(\leftrightarrow\)
    we simply use \(1\)

  • if \(x,y \in G\) then \(x\circ y \in G\);
    \(\leftrightarrow\)
    the product of any two complex numbers of unit modulus is again of unit modulus

  • \((x\circ y)\circ z = x \circ (y\circ z)\) for all \(x,y,z \in G\);
    \(\leftrightarrow\)
    multiplication of complex numbers is associative

  • for each \(x \in G\), there exists an inverse \(x^{-1}\) in \(G\) such that \(x^{-1} \circ x = e\);
    \(\leftrightarrow\)
    for \(g = e^{i \phi}\), \(g^{-1}= e^{-i \phi}\).

Note that \(gg' = g'g\) for all \(g,g' \in U(1)\). This is a special property that has a name: ‘abelian’. \(U(1)\) is an example of an abelian group.

Definition 1.2. A group \(G\) is called abelian if \(xy = yx\) for all elements \(x\) and \(y\) of \(G\).

2) Already in the introduction we realized that the map from \(\mathbb{R}\) to \(U(1)\) given by writing \(g = e^{i \phi}\) is not a one-to-one map. If we are to do calculus on \(U(1)\), we cannot simply do calculus on \(\mathbb{R}\) and map that to \(U(1)\) which is a circle. This is the easiest non-trivial example of what we call a manifold.

3) We already introduced \(U(1)\) as acting on complex numbers as in \(\eqref{eq:u1_on_c}\). We might be interested in asking what happens when we perform an infinitesimal transformation, i.e. when \(\phi\) is very close to \(0\). In this case we can approximate the exponential to linear order and get \[c \rightarrow (1+i \phi) c\, .\] The linear approximation \(e^{i\phi} \approx 1+i \phi\) is tangent to the group \(U(1)\) at \(g=1\), see figure 1.2.

We can try to reconstruct finite elements of \(U(1)\) by successive infinitesimal transformations. Let us hence look at \[(1+i \phi)^2 = 1 + 2 i \phi - \phi^2 \, .\] This fails to reproduce the expansion of the exponential \[e^{i\phi} = 1 + i\phi + \frac{(i\phi)^2}{2} + \frac{(i\phi)^3}{3!} + \cdots\] but we can easily fix this by considering instead \[(1+i \phi/2)^2 = 1 + i \phi - \frac{ \phi^2}{4} \, .\] Now we at least have the linear term right, but the quadratic term is still off. Now consider \[(1+i \phi/3)^3 = 1 + i \phi + \frac{ (i\phi)^2}{3} + \frac{(i\phi)^3}{27} \, .\] Again, we are forced to have the \(1/3\) to get the linear term right, and the quadratic term has come closer to \(1/2\). This continues as we consider \((1+ i \phi/n)^n\) for higher values of \(n\): we always get the linear term right and the subsequent terms come closer to the expansion of the complex exponential. One can also understand the need for \(1/n\) as follows: we are trying to reproduce a finite group element with phase \(\phi\) by taking \(n\) consecutive infinitesimal group elements. For these to match up, we need the ‘phase’ of the infinitesimal group elements to be \(\phi/n\).

We might hence guess that we can recover a finite map from an infinitesimal one by looking at \((1+ i \phi/n)^n\) and letting \(n\) go to infinity, which turns out to be correct.

Proposition 1.1. \(\lim_{n \rightarrow \infty }(1+ i \phi/n)^n = e^{i\phi}\)

We can expand the powers to find \[\begin{aligned} \lim_{n \rightarrow \infty }(1+ i \phi/n)^n &= \lim_{n \rightarrow \infty } \sum_{k=0}^n \frac{(i\phi)^k}{n^k} \binom{n}{k}\\ & = \lim_{n \rightarrow \infty } \sum_{k=0}^n \frac{(i\phi)^k}{k!} \frac{n!}{(n-k)!n^k} \end{aligned}\] This already looks like the series of the exponential except for the factor \[\frac{n!}{(n-k)!n^k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{n^k}\, .\] There are exactly \(k\) factors in the numerator of this fraction which all approach \(n\) when \(n\rightarrow \infty\). Hence this factor converges to \(1\), and swapping the limit with the summation (which we can do by uniform convergence) we find that \[\begin{aligned} \lim_{n \rightarrow \infty }(1+ i \phi/n)^n &= \sum_{k=0}^\infty \frac{(i\phi)^k}{k!} = e^{i\phi} \end{aligned}\] \(\square\)
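If you would like to see this convergence explicitly, here is a minimal numerical check (a sketch in Python using numpy, which is not part of the course material; the value \(\phi = 0.7\) is an arbitrary choice):

\begin{verbatim}
# Check numerically that (1 + i*phi/n)^n approaches exp(i*phi) as n grows.
import numpy as np

phi = 0.7                                   # an arbitrary angle
for n in (1, 10, 100, 10000):
    approx = (1 + 1j*phi/n)**n              # n successive 'infinitesimal' steps
    print(n, abs(approx - np.exp(1j*phi)))  # error shrinks towards zero
\end{verbatim}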

1.1.

  • By working out the derivative of \[\lim_{n \rightarrow \infty }(1+ i \phi/n)^n\] with respect to \(\phi\), show that this expression satisfies the same differential equation as \(e^{i\phi}\). You may assume that you can swap the order of the limit and taking the derivative.

    As both functions have the same value at \(\phi=0\) this implies that they are equal by the uniqueness of solutions of ordinary differential equations.

  • Consider a square matrix \(A\) and let \(g = e^{iA}\), which is defined via the Taylor series of the exponential. Show that \[\nonumber \lim_{n \rightarrow \infty }(\mathds{1}+ i A /n)^n = e^{i A}\, .\]

Let’s back up a bit and see what this example has taught us:

  1. We defined a continuous group by demanding that its action on complex numbers leaves the inner form invariant.

  2. This group does not have a one-to-one map to \(\mathbb{R}\), it has a ‘non-trivial topology’ and is isomorphic to \(S^1\). Due to it being a group, it has something \(S^1\) does not have: there is a special point, the identity element \(1\).

  3. We found that infinitesimal transformations are tangent to the group at the identity element. We can recover group elements (and in fact the whole group) by iterating infinitesimal transformations and taking a limit. This is the same as exponentiating the infinitesimal element.

Example 1.2. The group \(SU(2)\): three easy pieces revisited.

1) The group \(SU(2)\) is the group of special unitary \(2 \times 2\) matrices. Special refers to \(\det g = 1\) and unitary means they keep the inner product in \(\mathbb{C}^2\) invariant under the action \[\boldsymbol{c} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \rightarrow g \boldsymbol{c} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \, .\] The inner product \[|\boldsymbol{c}|^2 = \bar{\boldsymbol{c}} \cdot \boldsymbol{c} = \bar{c}_1 c_1 + \bar{c}_2 c_2\] transforms as \[|\boldsymbol{c}|^2 \rightarrow \bar{g} \begin{pmatrix} \bar{c}_1 \\ \bar{c}_2 \end{pmatrix} \cdot g \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}\] Let us write this out in components (using summation convention, indices appearing twice are summed over) \[\bar{g} \begin{pmatrix} \bar{c}_1 \\ \bar{c}_2 \end{pmatrix} \cdot g \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \bar{g}_{ij} \bar{c}_j \,\, g_{ik} c_k = \bar{g}_{ij} g_{ik} \bar{c}_j c_k = \bar{c}_j g^\dagger_{ji} g_{ik} c_k = \bar{\boldsymbol{c}}^T g^\dagger g \boldsymbol{c} = \boldsymbol{c}^\dagger g^\dagger g \boldsymbol{c}\, .\] Here \(^\dagger\) (‘dagger’) is a shorthand for transpose \(^T\) and complex conjugate applied at the same time, \(g^\dagger = \bar{g}^T\). The above implies that we need \(g^\dagger = g^{-1}\). We hence make the

Definition 1.3. \(SU(2)\) is the group of complex \(2 \times 2\) matrices \(g\) with \(\det g = 1\) and \(g^\dagger g = \mathds{1}\).

Let us check that all of this makes sense, i.e. that this is a group indeed:

  • the identity matrix \(\mathds{1}\) satisfies \(\mathds{1}^\dagger \mathds{1}= \mathds{1}\) and \(\det \mathds{1}= 1\), so it is in \(SU(2)\);

  • if \(g,h \in SU(2)\), then \((gh)^\dagger (gh) = h^\dagger g^\dagger g h = \mathds{1}\) and \(\det(gh) = \det g \, \det h = 1\), so \(gh \in SU(2)\);

  • matrix multiplication is associative;

  • for \(g \in SU(2)\), the inverse is \(g^{-1} = g^\dagger\), which again satisfies \((g^\dagger)^\dagger g^\dagger = g g^\dagger = \mathds{1}\) and \(\det g^\dagger = \overline{\det g} = 1\).

With the operation of matrix multiplication we hence have defined a group. Note that even though we needed to work through some equations using properties of the \(^\dagger\), we could have anticipated this by observing that we defined this group as a set of linear maps (to get associativity) that are a symmetry of the inner form on \(\mathbb{C}^2\).

2) Next, let us describe what \(SU(2)\) looks like and how we can parametrize it. For any invertible complex \(2 \times 2\) matrix \[g = \begin{pmatrix} a& b\\ c & d \end{pmatrix}\] we can write \[g^{-1} = \frac{1}{\det g} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\] As \(\det g = 1\), \(g^{-1} = g^\dagger\) hence implies \[\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} = \begin{pmatrix} \bar{a} & \bar{c} \\ \bar{b} & \bar{d} \end{pmatrix}\] i.e. the most general matrix in \(SU(2)\) can be written as \[\begin{equation} \label{eq:gen_su2_matrix} g = \begin{pmatrix} a & b \\ -\bar{b} & \bar{a} \end{pmatrix} \end{equation}\] and \(\det g = 1\) implies \(|a|^2 + |b|^2 = 1\). As \(a\) and \(b\) are complex numbers, we can write \(a=x_1 + i x_2\) as well as \(b = x_3 + ix_4\) and find \[\begin{equation} \label{eq:su2inr4} SU(2): \{\boldsymbol{x} = (x_1,x_2,x_3,x_4) \in \mathbb{R}^4 | x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\} \, . \end{equation}\] This is the defining equation of a three-sphere, i.e. \(SU(2)\) is a space which looks like the three-sphere \(S^3\) and \(a=x_1 + i x_2\), \(b = x_3 + ix_4\) define an embedding of \(S^3\) into \(\mathbb{R}^4\).
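As a quick sanity check of this parametrization, the following sketch (in Python with numpy; the random point is just for illustration) draws a random point on \(S^3\), builds the matrix \(\eqref{eq:gen_su2_matrix}\), and verifies that it is special unitary:

\begin{verbatim}
# A random point on S^3 gives a matrix of the form (a, b; -conj(b), conj(a))
# which should be unitary with unit determinant.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)
x /= np.linalg.norm(x)                          # random point on the unit three-sphere
a, b = x[0] + 1j*x[1], x[2] + 1j*x[3]
g = np.array([[a, b], [-np.conj(b), np.conj(a)]])

print(np.allclose(g.conj().T @ g, np.eye(2)))   # unitarity:  True
print(np.isclose(np.linalg.det(g), 1.0))        # det g = 1:  True
\end{verbatim}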

3) In the \(U(1)\) example we managed to write the whole group as the complex exponential of something simple, a real number, which generated the infinitesimal group elements. Let’s try something similar here and write \[g = e^{iA}\] for a matrix \(A\) and a group element \(g\) of \(SU(2)\). The exponential of a matrix is defined via the series \[e^{iA} = \sum_{k=0}^{\infty} \frac{(iA)^k}{k!} \, .\] Let us first impose unitarity \(g^\dagger = g^{-1}\) and see what this implies for \(A\). As \((A^n)^\dagger = (A^\dagger)^n\) we have \[g^\dagger = e^{-i A^\dagger}\, ,\] and as furthermore \[g^{-1} = e^{-iA}\] we find \[A^\dagger = A \, .\] Such matrices are called ‘Hermitian’ and they play an important role in quantum mechanics. You may be familiar with exponentials of Hermitian matrices giving unitary ones from there.
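The statement that exponentials of Hermitian matrices are unitary is also easy to test numerically; here is a small sketch (Python with numpy/scipy assumed; the random Hermitian matrix is an arbitrary choice):

\begin{verbatim}
# exp(iA) is unitary whenever A is Hermitian.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
B = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))
A = (B + B.conj().T) / 2                        # a random Hermitian matrix
g = expm(1j*A)                                  # matrix exponential

print(np.allclose(g.conj().T @ g, np.eye(2)))   # True: g is unitary
\end{verbatim}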

Next we investigate \[\begin{aligned} \det g &= \det e^{iA} \end{aligned}\] where we need a little help from the following:

Definition 1.4. The trace of a square matrix \(A\) is \(trA = \sum_j A_{jj}\), the sum of its diagonal elements.

1.2. Show using index notation that

  • \((gh)^\dagger = h^\dagger g^\dagger\)

  • \(tr(gh) = tr(hg)\)

  • \((g \boldsymbol{v}) \cdot (h \boldsymbol{w}) = \boldsymbol{v} \cdot (g^T h\, \boldsymbol{w})\)

  • \(\det g^\dagger = \overline{\det g}\)

where \(g\) and \(h\) are complex \(n \times n\) matrices and \(\boldsymbol{v}\) and \(\boldsymbol{w}\) are \(n\)-dimensional vectors.

Proposition 1.2. For a \(2 \times 2\) matrix \(A\), we have    \(\det e^{iA} = e^{i trA}\).

Proof: Let us first write \[\begin{aligned} \det g &= \det e^{iA} = \det \lim_{n \rightarrow \infty} (\mathds{1}+iA/n)^n = \lim_{n \rightarrow \infty} \det (\mathds{1}+iA/n)^n \\ &= \lim_{n \rightarrow \infty} \left[\det (\mathds{1}+iA/n)\right]^n \end{aligned}\] We can write the determinant explicitly as \[\det (\mathds{1}+ \frac{iA}{n}) = \left(1 + \frac{i A_{11}}{n}\right)\left(1+ \frac{i A_{22}}{n}\right) - \frac{i^2 A_{21}A_{12}}{n^2} = 1 + \frac{i trA}{n} + ... \,,\] where \(trA = \sum_j A_{jj}\) and the dots stand for terms of order \(n^{-2}\). In the limit \(n \rightarrow \infty\) these terms are subleading and we hence have \[\det g = \det e^{iA} = \lim_{n \rightarrow \infty} \left(1 + i\, trA/n\right)^n = e^{i trA}\, .\] \(\square\)
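A short numerical check of Proposition 1.2 may be reassuring (a sketch using numpy/scipy; the random matrix is arbitrary):

\begin{verbatim}
# det(exp(iA)) should equal exp(i tr A).
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
A = rng.normal(size=(2, 2)) + 1j*rng.normal(size=(2, 2))   # random 2x2 matrix

lhs = np.linalg.det(expm(1j*A))
rhs = np.exp(1j*np.trace(A))
print(np.isclose(lhs, rhs))                 # True
\end{verbatim}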

1.3. For a general \(k \times k\) matrix \(M\) show that

  1. \(\det e^{M}= e^{trM}\)  .

  2. Use this to conclude that for \(g=e^{M}\) we have \(\log \det g = tr\log g\)   . Here the \(\log\) of a matrix is defined as the inverse function of the exponential.

The requirement \(\det g = 1\) now implies \(e^{i trA} = 1\), which gives \(trA = 0\) (strictly speaking only \(trA \in 2\pi\mathbb{Z}\), but we can always choose \(trA = 0\)). If we can write \(g \in SU(2)\) as a complex exponential, we hence have to use traceless hermitian \(2 \times 2\) matrices in the exponent. For \(g \in SU(2)\) writing \(g = e^{iA}\) implies that \(A \in T\) where \[T = \left\{ A | A^\dagger = A, trA = 0 \right\}\, .\] Whenever \(A,B\) are in \(T\), then also \(aA+bB\) for \(a,b \in \mathbb{R}\) are in \(T\): we have \[\begin{aligned} tr( aA+bB ) & = a \, trA + b\, trB = 0 \\ (aA+bB)^\dagger & = \bar{a} A^\dagger + \bar{b}B^\dagger = aA +bB \, . \end{aligned}\] This means that \(T\) is a vector space over the real numbers. It is not too hard to convince yourself that \(T\) has dimension \(3\). This is a real vector space which contains (in general) complex matrices, so it really pays off to be able to think of vector spaces abstractly! We can make the following choice of basis vectors for \(T\) \[\begin{equation} \label{eq:pauli_matrices} \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\, . \end{equation}\]

These three matrices are known as the ‘Pauli matrices’. By a direct computation one can work out that

Proposition 1.3. The Pauli matrices satisfy \[[\sigma_i,\sigma_j] = 2 i \epsilon_{ijk} \sigma_k\] where \([a,b] \equiv ab-ba\) is the commutator.

1.4. Show that \[[\sigma_i,\sigma_j] = 2 i \epsilon_{ijk} \sigma_k\] where \(\sigma_i\) are the Pauli matrices.

Note in particular that different \(g \in SU(2)\) in general do not commute, i.e. in general \(g_1 g_2 \neq g_2 g_1\). Similarly, the Pauli matrices do not commute with each other, so that in general \[\begin{aligned} e^{i \alpha_1 \sigma_1} e^{i \alpha_2 \sigma_2} \neq e^{i \alpha_2 \sigma_2} e^{i \alpha_1 \sigma_1} \\ e^{i \alpha_1 \sigma_1} e^{i \alpha_2 \sigma_2} \neq e^{i \alpha_1 \sigma_1 + i \alpha_2 \sigma_2} \end{aligned}\] \(SU(2)\) is an example of a non-abelian Lie group.
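Both the commutation relation of Proposition 1.3 and the fact that the corresponding exponentials do not commute are easy to confirm numerically; here is a sketch (Python with numpy/scipy; the angles \(0.3\) and \(0.5\) are arbitrary):

\begin{verbatim}
# Verify [sigma_1, sigma_2] = 2i sigma_3 and that the one-parameter
# subgroups generated by sigma_1 and sigma_2 do not commute.
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]])
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

print(np.allclose(s1 @ s2 - s2 @ s1, 2j*s3))        # True

a1, a2 = 0.3, 0.5
g1, g2 = expm(1j*a1*s1), expm(1j*a2*s2)
print(np.allclose(g1 @ g2, g2 @ g1))                # False: non-abelian
\end{verbatim}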

We can form group elements of \(SU(2)\) by exponentiating arbitrary real linear combinations of the Pauli matrices. Let \(\boldsymbol{\alpha} = (\alpha_1,\alpha_2,\alpha_3)\) and write \[g(\boldsymbol{\alpha}) = e^{i \alpha_j \sigma_j} = e^{i \boldsymbol{\alpha} \boldsymbol{\sigma} }\,\,\,\,\, \alpha_k \in \mathbb{R}.\] As \(\alpha_k \sigma_k\) is traceless and Hermitian for any real \(\boldsymbol{\alpha}\), it follows that \(g \in SU(2)\).

We hence get a group element for every vector \(\boldsymbol{\alpha} \in \mathbb{R}^3\). This map cannot possibly be injective as \(\mathbb{R}^3\) is not the same as \(S^3\). To make contact with the earlier characterization \(SU(2) \simeq S^3\) let us try to work out what kind of coordinates the \(\alpha_k\) give us. As a warmup, let us first consider \[g((\alpha_1,0,0)) = e^{i \alpha_1 \sigma_1 } = \sum_{k=0}^\infty \frac{(i \alpha_1 \sigma_1)^k}{k!} = \sum_{k =\mbox{\scriptsize even}} \frac{(i \alpha_1 \sigma_1)^k}{k!} + \sum_{k =\mbox{\scriptsize odd}} \frac{(i \alpha_1 \sigma_1)^k}{k!}\] As \(\sigma_i^2 = \mathds{1}\), all of the matrix powers in the sum over even \(k\) are equal to \(\mathds{1}\), and all of the matrix powers in the sum over odd \(k\) are equal to \(\sigma_1\). Hence \[e^{i \alpha_1 \sigma_1 } = \sum_{k =\mbox{\scriptsize even}} \frac{(i \alpha_1)^k}{k!} \mathds{1}+ \sum_{k =\mbox{\scriptsize odd}} \frac{(i \alpha_1 )^k}{k!} \sigma_1 = \cos(\alpha_1) \mathds{1}+ i \sin(\alpha_1) \sigma_1 \, .\] You can think of this as a generalization of Euler’s formula \(e^{i\phi} = \cos \phi + i \sin \phi\). Note in particular that \(\alpha_1 + 2 \pi\) maps to the same group element as \(\alpha_1\), so we see the non-injectivity of the exponential map explicitly.
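The generalized Euler formula (and the \(2\pi\)-periodicity in \(\alpha_1\)) can be checked in the same spirit (a sketch assuming numpy/scipy; \(\alpha_1 = 1.3\) is an arbitrary value):

\begin{verbatim}
# exp(i a1 sigma_1) = cos(a1) 1 + i sin(a1) sigma_1, and a1 -> a1 + 2*pi
# gives the same group element.
import numpy as np
from scipy.linalg import expm

s1 = np.array([[0, 1], [1, 0]], dtype=complex)
a1 = 1.3
g = expm(1j*a1*s1)

print(np.allclose(g, np.cos(a1)*np.eye(2) + 1j*np.sin(a1)*s1))  # True
print(np.allclose(g, expm(1j*(a1 + 2*np.pi)*s1)))               # True
\end{verbatim}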

As \(\sigma_1\) commutes with itself, we can write \[g((\alpha_1,0,0)) g((\alpha_1',0,0)) = e^{i \alpha_1 \sigma_1 } e^{i \alpha_1' \sigma_1 } = e^{i (\alpha_1 + \alpha_1') \sigma_1 } = g((\alpha_1+\alpha_1',0,0))\] so that matrices of this form are a subgroup of \(SU(2)\):

Definition 1.5. For a group \(G\), \(H \subset G\) is called a subgroup if the elements of \(H\) form a group with the group composition of \(G\).

Let us quickly check that elements of the form \(g((\alpha_1,0,0))\) form a group in their own right (we already know all of them are in \(SU(2)\)): the composition of two such elements is again of this form (as we just computed), the identity is \(g((0,0,0)) = \mathds{1}\), the inverse of \(g((\alpha_1,0,0))\) is \(g((-\alpha_1,0,0))\), and associativity is inherited from matrix multiplication.

This subgroup ‘looks like’ or ‘works in the same way’ as \(U(1)\) parametrized as \(e^{i\phi }\) if we set \(\alpha_1 = \phi\). A more precise way to say this is to use the word group isomorphism:

Definition 1.6. For two groups \(G\) and \(H\), a group homomorphism is a map \(f: G \rightarrow H\) such that \[f(g_1 \circ_G g_2) = f(g_1) \circ_H f(g_2)\]

Note that in this definition we are using the group composition in \(G\) on the left side and the group composition in \(H\) on the right side. The map \(f\) is hence compatible with the group structures of \(G\) and \(H\). Note further that this definition does not assume that \(f\) is injective or surjective.

Example 1.3. The map \[g: x \rightarrow g(x)= e^x\] is a group homomorphism from \(\mathbb{C}\) (with composition \(\circ_+\) being addition) to \(\mathbb{C}^*\) (with composition \(\circ_\ast\) being multiplication). We can check that for every \(x,y \in \mathbb{C}\) we find \[g(x \circ_+ y) = g(x+y) = e^{x+y} = e^x e^y = g(x) g(y) = g(x) \circ_\ast g(y) \, .\] Saying that this is a homomorphism is just another way to express the ‘nice’ property of the exponential that \(e^{x+y} = e^x e^y\).

1.5. Let \(f\) be a homomorphism between two groups \(G\) and \(H\). Show that

  1. \(f(e_G) = e_H\) where \(e_G\) and \(e_H\) are the unit elements of \(G\) and \(H\), respectively.

  2. \(f(g^{-1}) = f(g)^{-1}\) for any \(g \in G\).

1.6. \(U(2)\) is the group of complex \(2 \times 2\) matrices \(g\) such that \(g^\dagger = g^{-1}\), with the group composition being matrix multiplication. Let \(F\) be the map which sends \[g \mapsto \det g \, .\] Show that \(F\) is a group homomorphism from \(U(2)\) to \(U(1)\).

Definition 1.7. For two groups \(G\) and \(H\), a group isomorphism is a map \(f: G \rightarrow H\) which is one-to-one and a group homomorphism.

If two groups are related by a group isomorphism, they are essentially the same: they have the same elements with the same group composition rule, i.e. in the present case we can identify \[e^{i \phi} \leftrightarrow g((\phi,0,0))\, ,\] using the earlier presentation of \(U(1)\). As the composition rule of the elements \(g((\alpha_1,0,0))\) is the same as that of elements of \(U(1)\) this is a group isomorphism. It is hence fair to say that there is a \(U(1)\) sitting inside of \(SU(2)\).

We can summarize the above observation as

Proposition 1.4. There is an injective group homomorphism \(U(1) \rightarrow SU(2)\), the image of which is a \(U(1)\) subgroup of \(SU(2)\).

A similar computation shows that the same works not just for \(\boldsymbol{\alpha}\) of the form \(\boldsymbol{\alpha} = (\alpha_1,0,0)\) but for any subset of \(\boldsymbol{\alpha}\)s that can be written as \(\boldsymbol{\alpha}= t \boldsymbol{\alpha}_0\), so that there are in fact infinitely many \(U(1)\) subgroups of \(SU(2)\). These deserve a special name:

Definition 1.8. A subgroup \(G_\alpha\) of \(SU(2)\) whose elements are of the form \[G_\alpha = \{ e^{i t \boldsymbol{\alpha} \boldsymbol{\sigma} } |t \in \mathbb{R}\}\] for some fixed \(\alpha\) is called the one-parameter subgroup generated by \(\alpha\).

What is nice about the parametrization in terms of exponentials of matrices is that we can easily work out all infinitesimal elements. When \(\boldsymbol{\alpha}\) is very small, we can approximate by \[g(\boldsymbol{\alpha}) \simeq \mathds{1}+ i \alpha_j \sigma_j = \mathds{1}+ i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma} \,.\] Equivalently, the space of tangent vectors at \(g = \mathds{1}\) is spanned by the \(\sigma_j\) (times \(i\)), see figure 1.3, and a general vector is written as \(i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma}\). Convince yourself that this is indeed a vector space. As before, we can get back elements of \(SU(2)\) by an infinite iteration of infinitesimal elements: \[g(\boldsymbol{\alpha}) = \lim_{n \rightarrow \infty} (\mathds{1}+ i\, \boldsymbol{\alpha} \cdot \boldsymbol{\sigma}/n)^n = e^{i \boldsymbol{\alpha} \boldsymbol{\sigma} }\, .\] What is more, we can think of recovering any infinitesimal generator \(i\boldsymbol{\alpha} \boldsymbol{\sigma}\) by considering a path \[s(t) = e^{i t \boldsymbol{\alpha} \boldsymbol{\sigma} }\] in \(SU(2)\) and taking a derivative w.r.t. \(t\) evaluated at \(t=0\) (which corresponds to \(\mathds{1}\in SU(2)\)): \[\left. \frac{\partial s(t)}{\partial t}\right|_{t=0} = \left.\frac{\partial}{\partial t}e^{i t \boldsymbol{\alpha} \boldsymbol{\sigma} }\right|_{t=0} = i \boldsymbol{\alpha} \boldsymbol{\sigma} \, .\]
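We can also check numerically that differentiating a one-parameter subgroup at \(t=0\) recovers the generator; the sketch below (numpy/scipy assumed; \(\boldsymbol{\alpha}\) and the finite-difference step are arbitrary choices) uses a simple finite difference:

\begin{verbatim}
# d/dt exp(i t alpha.sigma) at t=0 should give i alpha.sigma.
import numpy as np
from scipy.linalg import expm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

alpha = np.array([0.2, -0.4, 0.7])
X = sum(a*s for a, s in zip(alpha, sigma))       # alpha . sigma

eps = 1e-6                                       # finite-difference step
deriv = (expm(1j*eps*X) - np.eye(2)) / eps
print(np.allclose(deriv, 1j*X, atol=1e-5))       # True
\end{verbatim}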

A natural question that is hopefully on your mind is the surjectivity of the map from \(i \boldsymbol{\alpha} \boldsymbol{\sigma}\) to elements of \(SU(2)\). Can we reach any element via the exponential? We can approach this by brute force and start working out

1.7.

  1. Show that \[\begin{equation} \label{eq:cliffordpauli} \sigma_i \sigma_j + \sigma_j \sigma_i = 2 \delta_{ij} \mathds{1} \end{equation}\] where \(\sigma_i\) are the Pauli matrices.

  2. Show that \[g(\boldsymbol{\alpha}) = e^{i \boldsymbol{\alpha} \boldsymbol{\sigma}} = \begin{pmatrix} \cos(a) + i \sin(a) a_3/a & \sin(a) a_2/a + i \sin(a) a_1/a \\ -\sin(a) a_2/a + i \sin(a) a_1/a &\cos(a) - i \sin(a) a_3/a \end{pmatrix}\] where \(a = \sqrt{\alpha_1^2 + \alpha_2^2+\alpha_3^2}\). [hint: write \(\boldsymbol{\alpha} = a \boldsymbol{n}\) with \(|\boldsymbol{n}|^2=1\), i.e. \(n_j = \alpha_j/a\)]

As expressions of the type \(AB+BA\) appear frequently they are given a special name:

Definition 1.9. For two matrices \(A,B\), the anti-commutator \(\{.,.\}\) is \[\{A,B\} = AB+BA\, .\]

The remaining question is if we can choose an \(\alpha\) that maps to any element of \(SU(2)\). We will examine if every point on the \(S^3\) that is \(SU(2)\) is in the image of this map. We write a general group element as \[g = \begin{pmatrix} x_1 + i x_2 & x_3 + i x_4 \\ -x_3 + i x_4 & x_1 - i x_2 \end{pmatrix}\] where \(x_i \in \mathbb{R}\) subject to \(x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\). First note that \(x_1 \in [-1,1]\). Fixing \(x_1\) gives a slice of \(S^3\) that is a two-sphere of radius \(\sqrt{1 - x_1^2}\). Comparing to the form of \(g(\boldsymbol{\alpha})\) we can always choose \(a\) such that \(\cos(a) = x_1\). Note that this is not unique, but there always exists an \(a\) such that this is satisfied for any \(x_1\). The other variables are mapped as \[\begin{equation} \label{eq:alphavsx} \begin{pmatrix} x_2 \\ x_3 \\ x_4 \end{pmatrix} = \frac{\sin(a)}{a} \begin{pmatrix} \alpha_3 \\ \alpha_2 \\ \alpha_1 \end{pmatrix}\, . \end{equation}\] The question is now if we can always find \(\alpha\) such that this equation is satisfied for every \(x_1,x_2,x_3,x_4\) subject to \(x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\). As we have already set \(\cos(a) = x_1\), this fixes the length of \(\boldsymbol{\alpha}\) to be \[\boldsymbol{\alpha}^2 = \alpha_1^2 + \alpha_2^2 + \alpha_3^2 = a^2\] which is a two-sphere as well. The points on this two-sphere are mapped to the two-sphere \[x_2^2 + x_3^2 + x_4^2 = 1 - x_1^2\] by \(\eqref{eq:alphavsx}\), which is a one-to-one map. This is consistent as \[x_2^2 + x_3^2 + x_4^2 = 1-x_1^2 = 1- \cos(a)^2 = \sin(a)^2 = \frac{\sin(a)^2}{a^2} (\alpha_1^2 + \alpha_2^2 +\alpha_3^2)\, .\] Hence we have shown that

Theorem 1.1. Every element of the group \(SU(2)\) can be written as \[g(\boldsymbol{\alpha}) = e^{i \boldsymbol{\alpha}\cdot \boldsymbol{\sigma} }\] where \(\sigma_j\) are a basis for infinitesimal transformations (which can be chosen as the matrices \(\eqref{eq:pauli_matrices}\)) and \(\alpha_j \in \mathbb{R}\) .
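Here is a small numerical illustration of Theorem 1.1 (a sketch in Python with numpy/scipy, following the argument above): draw a random element of \(SU(2)\), reconstruct a suitable \(\boldsymbol{\alpha}\) from \(\eqref{eq:alphavsx}\), and check that exponentiating it reproduces the group element.

\begin{verbatim}
# Reconstruct alpha from a random g in SU(2) and verify g = exp(i alpha.sigma).
import numpy as np
from scipy.linalg import expm

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]

rng = np.random.default_rng(3)
x = rng.normal(size=4)
x /= np.linalg.norm(x)                           # random point on S^3
g = np.array([[x[0] + 1j*x[1], x[2] + 1j*x[3]],
              [-x[2] + 1j*x[3], x[0] - 1j*x[1]]])

a = np.arccos(x[0])                              # choose a with cos(a) = x_1
alpha = a/np.sin(a) * np.array([x[3], x[2], x[1]])   # invert eq. (alphavsx)
X = sum(c*s for c, s in zip(alpha, sigma))
print(np.allclose(expm(1j*X), g))                # True
\end{verbatim}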

1.8. Let \(G\) be the set of complex \(2 \times 2\) matrices of the form \[g = \begin{pmatrix} \alpha & \beta \\ - \bar{\beta} & \bar{\alpha} \end{pmatrix}\] for \(\alpha, \beta \in \mathbb{C}\) and \(|\alpha|^2 + |\beta|^2 \neq 0\).

  1. Show that \(G\) is a group using matrix multiplication as the group operation.

  2. Show that \(SU(2)\) is a subgroup of \(G\).

  3. Show that \(V:= \left\{\gamma | g = e^{i \gamma} \in G\right\}\) is a vector space and find a basis for \(V\).

Let’s back up again and see what this example has taught us:

  1. We defined a continuous group by demanding that its action on vectors in \(\mathbb{C}^2\) leaves the inner form invariant.

  2. This group does not have a one-to-one map to \(\mathbb{R}^3\), it has a ‘non-trivial topology’ and can be identified with \(S^3\). Again there is the special point \(\mathds{1}\) lying on this sphere.

  3. We found that infinitesimal transformations are tangent to the group at the identity element. We can recover group elements by iterating infinitesimal transformations and taking a limit. This is the same as exponentiating the infinitesimal element and gives us a surjective map to \(SU(2)\). This is quite nice, as it means we can do all of our computations using the algebra of Pauli matrices \(\sigma_i\) instead of the group \(SU(2)\).

Example 1.4. \(\mathbf{SO(3)}\) vs. \(\mathbf{SU(2)}\)

Definition 1.10. The group \(SO(3)\) is the group of real \(3 \times 3\) matrices \(S\) such that \(S^T = S^{-1}\) and \(\det S=1\).

In this example, we will explore the global structure of \(SO(3)\) by examining a clever map from \(SU(2)\) to \(SO(3)\). Take a vector \(\boldsymbol{v} = (v_1,v_2,v_3) \in \mathbb{R}^3\) and rearrange its components in a \(2 \times 2\) matrix \(M_{\boldsymbol{v}}\): \[M_{\boldsymbol{v}} = \begin{pmatrix} v_3 & v_1 - i v_2 \\ v_1 + i v_2 & -v_3 \end{pmatrix}\, .\] This is just a funny way of writing \(\mathbb{R}^3\). Note that addition of vectors in \(\mathbb{R}^3\) becomes addition of matrices \(M_v\), and multiplication of vectors in \(\mathbb{R}^3\) by a real number \(c\) becomes multiplication of \(M_v\) by \(c\).

Now consider \[F(g): M_{\boldsymbol{v}} \rightarrow g M_{\boldsymbol{v}} g^\dagger = F(g)[M_{\boldsymbol{v}}]\] for \(g \in SU(2)\). What this means is that for every \(g\) in \(SU(2)\), we get a map \(F(g)\) acting on \(\mathbb{R}^3\).

Proposition 1.5. The map \(F: g \mapsto F(g)\) is a group homomorphism from \(SU(2)\) to \(SO(3)\).

Proof: problem class 1.

REMARK: This homomorphism is not injective (it is not a group isomorphism), we have \[F(g) = F(-g) \, .\] as in both cases we act in the same way on \(\mathbb{R}^3\): \[\begin{aligned} F(g): &M_{\boldsymbol{v}} \rightarrow g M_{\boldsymbol{v}} g^\dagger \, \\ F(-g): &M_{\boldsymbol{v}} \rightarrow (-g) M_{\boldsymbol{v}} (-g)^\dagger = g M_{\boldsymbol{v}} g^\dagger \end{aligned}\] The element \(g = - \mathds{1}\) is in its kernel, \[F(-\mathds{1}): M_{\boldsymbol{v}} \rightarrow -\mathds{1}M_{\boldsymbol{v}} (-\mathds{1})^\dagger = M_{\boldsymbol{v}}\] which is \(\mathds{1}\in SO(3)\).
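Before turning to surjectivity, a quick numerical sanity check of Proposition 1.5 is possible (a sketch assuming numpy; the random element is arbitrary): assemble the \(3 \times 3\) matrix of \(F(g)\) by acting on a basis of \(\mathbb{R}^3\) and verify that it is orthogonal with unit determinant.

\begin{verbatim}
# Build the 3x3 matrix of F(g) for a random g in SU(2) and check it is in SO(3).
import numpy as np

def M(v):                                       # v in R^3  ->  matrix M_v
    return np.array([[v[2], v[0] - 1j*v[1]],
                     [v[0] + 1j*v[1], -v[2]]])

def v_of(Mv):                                   # read the vector back off M_v
    return np.array([Mv[1, 0].real, Mv[1, 0].imag, Mv[0, 0].real])

rng = np.random.default_rng(4)
x = rng.normal(size=4)
x /= np.linalg.norm(x)
g = np.array([[x[0] + 1j*x[1], x[2] + 1j*x[3]],
              [-x[2] + 1j*x[3], x[0] - 1j*x[1]]])   # random element of SU(2)

# columns of R are the images of the standard basis vectors of R^3
R = np.column_stack([v_of(g @ M(e) @ g.conj().T) for e in np.eye(3)])
print(np.allclose(R.T @ R, np.eye(3)))          # True: R is orthogonal
print(np.isclose(np.linalg.det(R), 1.0))        # True: det R = 1
\end{verbatim}

Replacing \(g\) by \(-g\) in this check produces the same matrix \(R\), in line with the remark above.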

You might wonder if the group homomorphism from \(SU(2)\) to \(SO(3)\) is surjective. Consider the following simple example of how a one-parameter subgroup of \(SU(2)\) is mapped: \[g(0,0,\theta) = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} = e^{i \theta \sigma_3}\, .\] We have \[\begin{aligned} g(0,0,\theta) M_{\boldsymbol{v}} g(0,0,\theta)^\dagger =& \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix} \begin{pmatrix} v_3 & v_1 - i v_2 \\ v_1 + i v_2 & -v_3 \end{pmatrix} \begin{pmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{pmatrix} \\ =& \begin{pmatrix} v_3 & e^{2i \theta} (v_1 - i v_2) \\ e^{-2i \theta} (v_1 + i v_2) & -v_3 \end{pmatrix} \end{aligned} \,.\] As \[\begin{aligned} e^{2i \theta} (v_1 - i v_2) &= (\cos(2\theta)+i \sin(2\theta) ) (v_1 - i v_2) \\ &= v_1 \cos(2\theta) + v_2 \sin(2\theta) -i (v_2 \cos(2\theta)- v_1 \sin(2\theta) ) \end{aligned}\] this map sends \[\begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \mapsto \begin{pmatrix} v_1 \cos (2 \theta) + v_2 \sin(2 \theta) \\ v_2 \cos(2 \theta)- v_1 \sin (2 \theta) \\ v_3 \end{pmatrix} = \begin{pmatrix} \cos (2 \theta) & \sin(2 \theta) & 0 \\ - \sin (2 \theta) & \cos(2 \theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1\\ v_2\\ v_3 \end{pmatrix} \, .\] In other words \[F(g(0,0,\theta)) = \begin{pmatrix} \cos (2 \theta) & \sin(2 \theta) & 0 \\ - \sin (2 \theta) & \cos(2 \theta) & 0 \\ 0 & 0 & 1 \end{pmatrix} \, .\] This \(g\) hence maps to a rotation around the \(v_3\) axis by \(2 \theta\). Similarly, one can show that rotations around other axes are generated by using other group elements of \(SU(2)\); e.g. for a rotation around the axis \(\boldsymbol{v}_0\) simply use \(e^{i \theta \boldsymbol{v_0} \cdot \boldsymbol{\sigma}}\).

If we can write any element of \(SO(3)\) as a composition of rotations around fixed axes, such as \(\boldsymbol{v}_1\), \(\boldsymbol{v}_2\) and \(\boldsymbol{v}_3\), we have hence proven that the homomorphism from \(SU(2)\) to \(SO(3)\) is surjective. This is indeed true as the following proposition shows:

Proposition 1.6. Every element \(R\) of \(SO(3)\) can be written as a product of three rotations \(R_{\boldsymbol{v}_i}(\phi_i)\) around fixed axes \(\boldsymbol{v}_i\) by angles \(\phi_i\): \(R = R_{\boldsymbol{v}_1}(\phi_1) R_{\boldsymbol{v}_2}(\phi_2) R_{\boldsymbol{v}_3}(\phi_3)\).

Proof: see . You can also find a proof of this in many texts on mechanics.

Theorem 1.2. There is a surjective group homomorphism from \(SU(2)\) to \(SO(3)\). It is not injective, and its kernel is the cyclic group with two elements: \(\mathbb{Z}_2 = \{\mathds{1},-\mathds{1}\} \subset SU(2)\). As there are exactly two points in \(SU(2)\) that are mapped to each point in \(SO(3)\), this map is called a (double) covering map. We can write \[SO(3) \simeq SU(2)/\mathbb{Z}_2 = S^3/\mathbb{Z}_2\] where the \(\mathbb{Z}_2\) acts by identifying antipodal points on the three-sphere.

Proof: We have shown above that it is a non-injective homomorphism with kernel \(\{\mathds{1},-\mathds{1}\}\) and that any rotation around a fixed axis is in the image. As any element in \(SO(3)\) can be written as a product of three rotations, any element of \(SO(3)\) is in the image of this homomorphism, i.e. it is surjective. We have also seen that \(SU(2)\) is isomorphic to \(S^3\). Antipodal points on \(S^3\) correspond to sending \(x_i \rightarrow -x_i\) for any solution to \[x_1^2 + x_2^2 + x_3^2 + x_4^2 =1.\] As \[g = \begin{pmatrix} x_1 + i x_2 & x_3 + i x_4 \\ -x_3 + i x_4 & x_1 - i x_2 \end{pmatrix}\] \(g\) and \(-g\) are hence antipodal points in \(SU(2)\). We have seen that \(g\) and \(-g\) are mapped to the same element of \(SO(3)\). If we hence identify \(g\) and \(-g\) in \(SU(2)\), this group homomorphism becomes a group isomorphism. As the map from \(SU(2)\) to \(SO(3)\) is surjective, we conclude that \(SO(3) \simeq SU(2)/\mathbb{Z}_2 = S^3/\mathbb{Z}_2\). \(\square\)


REMARK: \(^\ast\) One may be forgiven for thinking that the group \(SO(3)\) of rotations in \(\mathbb{R}^3\) is a two-sphere by imagining all the positions that a vector \(\boldsymbol{v}\) in \(\mathbb{R}^3\) can be rotated to. This is NOT true, however, as there are non-trivial rotations that leave the chosen vector \(\boldsymbol{v}\) invariant. The subgroup of \(SO(3)\) that leaves any \(\boldsymbol{v}\neq 0\) invariant is a \(U(1) = S^1\) which miraculously combines with \(S^2\) to form \(S^3/\mathbb{Z}_2\). Studying the same for the double cover \(SU(2)\) reveals that \(S^3\) is in fact a fibration of \(S^1\) over \(S^2\). This is called the Hopf fibration and is very pretty.

The topology of \(SU(2)\) vs. that of \(SO(3)\) \(^\ast\)

As a final observation, let us examine the topology of those two groups by considering closed loops, i.e. continuous maps \(\phi:[0,1] \rightarrow G\) such that \(\phi(0) = \phi(1)\). For \(SU(2)\) there are no non-trivial such maps: any closed loop in \(S^3\) can be shrunk to a point. Such spaces are called simply connected.

Now consider a path going from \(\mathds{1}\) to \(-\mathds{1}\) in \(SU(2)\). Under \(F\), this maps to a closed path in \(SO(3)\) that starts and ends at \(\mathds{1}\). Let us see if we can shrink this curve in \(SO(3)\). If we continuously deform this curve, it will still lift to an open curve in \(SU(2)\), although now it may go from \(g\) to \(-g\) in \(SU(2)\). But this means there is no way of shrinking it! Hence \(SO(3)\) is not simply connected. If we loop twice around any such loop in \(SO(3)\), it lifts to a closed curve in \(SU(2)\), which we already know can be collapsed. We have hence shown that the fundamental group of \(SO(3)\) contains a \(\mathbb{Z}_2\) element (in fact, this is the whole fundamental group). For a given manifold with non-trivial fundamental group, there is a unique way to find a covering space (called the universal cover) that is simply connected: \(SU(2)\) is the universal cover of \(SO(3)\).


In this example we have seen:

  1. Another continuous group defined by demanding that its action on vectors in \(\mathbb{R}^3\) leaves the inner form invariant.

  2. This group has an even more interesting topology; it is isomorphic to \(S^3/\mathbb{Z}_2\).

  3. Curiously, we found this via a surjective homomorphism from \(SU(2)\) to \(SO(3)\). This in particular gave us an action of \(SU(2)\) on vectors in \(\mathbb{R}^3\) instead of the usual action on \(\mathbb{C}^2\). This came at the price that \(g\) and \(-g\) act in the same way though.

In the motivating examples, we have mostly used a fairly pedestrian approach, but have already discovered many interesting things. In the following, these will be appropriately formalized and generalized.

1.2 Differentiable Manifolds

The groups we have investigated in our motivating examples were fundamentally different from \(\mathbb{R}^n\). Whereas we could cover parts of them using (subsets of) \(\mathbb{R}^n\), we could not find one-to-one maps from \(\mathbb{R}^n\) to these groups as a whole. As you might anticipate, such behaviour is not exclusive to continuous groups, but gives rise to the more general notion of a differentiable manifold. In this section, we will introduce these objects and give some more elementary examples. For a lively introduction to elementary topology I can recommend the book .

Let us first review the notion of open and closed sets in \(\mathbb{R}^n\). An open ball in \(\mathbb{R}^n\) centred at \(\boldsymbol{p}\) is the set \[B_r (\boldsymbol{p}) = \{ \boldsymbol{x} \in \mathbb{R}^n | \parallel\boldsymbol{x}-\boldsymbol{p}\parallel ^2 < r^2 \} \, .\] Using this, we can define open and closed sets as

Definition 1.11. A subset \(U\) of \(\mathbb{R}^n\) is open if for every point \(p\) in \(U\) there is an \(r\) such that \(B_r (\boldsymbol{p})\) is fully contained in \(U\).

Definition 1.12. A subset \(U\) of \(\mathbb{R}^n\) is closed if its complement \(\mathbb{R}^n \setminus U\) is open.

Note that not every subset of \(\mathbb{R}^n\) has to be either closed or open. Furthermore, these properties are not mutually exclusive. Take some time to think of some examples that are open but not closed, closed but not open, not open and not closed, or both open and closed. Defining what we mean by open and closed sets in the way we just did is sometimes called defining a topology.

We now want to be able to talk about spaces which are not as simple as \(\mathbb{R}^n\). If such spaces have a notion of distance (e.g. because they are sitting inside \(\mathbb{R}^n\)) we can simply repeat the above to describe which sets are open. If the space we want to talk about sits inside \(\mathbb{R}^n\), as in the examples we gave in the last section, such a notion can be inherited using the induced topology. However, we want to be clear about which structures we put on a space which starts out as merely a set of points. The following is a crucial observation.

Proposition 1.7. Arbitrary unions and finite intersections of open sets in \(\mathbb{R}^n\) are open again. For closed sets of \(\mathbb{R}^n\) it consequently works the opposite way: arbitrary intersections and finite unions are closed again.

Proof: see the following exercise.

1.9. Prove that arbitrary unions and finite intersections of open sets in \(\mathbb{R}^n\) are again open. Why is the intersection of an infinite number of open sets not open in general?

Definition 1.13. A topological space is a pair \((X,\mathcal{O})\) of a set \(X\) together with a set \(\mathcal{O}\) of subsets of \(X\) called open sets such that

  • Any union of open sets is open

  • The intersection of any two open sets is open

  • The empty set \(\emptyset\) and \(X\) itself are open

The set \(\mathcal{O}\) is also called the topology of \((X,\mathcal{O})\).

We can colloquially think of open sets that contain a point \(p\) as neighbourhoods of \(p\), so the notion of topology gives a coarse way of saying which points are close without introducing such concepts as distance. For some set of points, it is up to us to declare what these neighbourhoods are. There are extreme choices such as the trivial topology in which only \(X\) and \(\emptyset\) are open and the discrete topology in which all subsets of \(X\) are open sets.

Example 1.5. The standard topology of \(\mathbb{R}^n\). For \(\mathbb{R}^n\) there is a standard choice of topology that fits with our usual intuition about what it means for two points to be close: we declare that \[B_r(\boldsymbol{p}) \equiv \{ \boldsymbol{x} \in \mathbb{R}^n |\,\, ||\boldsymbol{x} - \boldsymbol{p}||^2 < r^2 \}\] are open. We then generate all open sets by extending this according to the axioms: arbitrary unions and finite intersections of these are again open.

We can turn any subset of \(\mathbb{R}^n\) into a topological space by inheriting the notion of open set from \(\mathbb{R}^n\), this is called the induced topology:

Definition 1.14. For a subset \(S\) of \(\mathbb{R}^n\) we define the induced topology by declaring that \(V \subset S\) is open if \(V = U \cap S\) with \(U\) open in \(\mathbb{R}^n\). Closed sets are again defined as complements of open sets.

1.10. Which of the following sets are closed? Which are open?

  1. \(\{ 0 < x < \pi \} \subset \mathbb{R}\) with coordinate \(x\)

  2. \(\{x_1 < -2 \} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)

  3. \(\{ 0 < x \leq \pi \} \subset \mathbb{R}\)

  4. \(\{ 0 < x_1 < 1 \} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)

  5. \(\mathbb{R}^n \subseteq \mathbb{R}^n\)

  6. \(\{(x_1,x_2)\subset \mathbb{R}^2 \, |\, x_1^2 \leq 42 - x_2^2\} \subset \mathbb{R}^2\) with coordinates \((x_1,x_2)\)

  7. \(\{(x_1,x_2) |x_1^2 + x_2^2 = 1\} \subset \mathbb{R}^3\)

  8. \(\{(x_1,x_2) |x_1^2 + x_2^2 = 1\} \subset \{(x_1,x_2,x_3) |x_1^2 + x_2^2 + x_3^4 = 1\}\) with the topology induced from \(\mathbb{R}^3\)

Having replaced the notion of distance with the more coarse notion of topology, we need to say when we call maps continuous.

Definition 1.15. A map \(f:X\rightarrow Y\) between topological spaces \(X\) and \(Y\) is called continuous if the set \(f^{-1}(U)\) is open in \(X\) whenever \(U\) is open in \(Y\).

For maps \(f:\mathbb{R}^n \rightarrow \mathbb{R}^m\) and using the standard topology, this agrees with the usual \(\epsilon\)-\(\delta\) definition from analysis.

Definition 1.16. A one-to-one map \(f:X\rightarrow Y\) between topological spaces \(X\) and \(Y\) is called a homeomorphism if both \(f\) and \(f^{-1}\) are continuous.

These are the maps that preserve the structure of topological spaces.
Something we are used to from \(\mathbb{R}^n\) is that different points can always be given separate neighbourhoods. Among other things, this implies that limits of convergent sequences must be unique. This does not need to hold in general, however. If you look at the definition, we could in fact get away with defining a topology where only the empty set and the whole set \(X\) are open. This is called the ‘indiscrete topology’ (the trivial topology from above) and no two points can be separated.


Example 1.6. \(^\ast\) Here is a more interesting example of a space where we can’t separate some points: consider \(\mathbb{R}^2\) and identify points by \((x,y) \simeq (lx,ly)\) for \(l \in \mathbb{R}\setminus \{0\}\). This defines a map \(\pi: \mathbb{R}^2 \rightarrow X\), where \(X\) is the quotient by the equivalence relation \(\simeq\). Declare that \(U\) in \(X\) is open if \(\pi^{-1}(U)\) is open in \(\mathbb{R}^2\) (this is called the quotient topology, which is the finest topology such that \(\pi\) is continuous). Consider a subset of \(X\) that contains the origin \((0,0)\). If it only contains the origin, its preimage \(\pi^{-1}(\{(0,0)\}) = \{(0,0)\}\) is not open in \(\mathbb{R}^2\), so this is not an open set. In fact, the only subset of \(X\) that contains the origin and is open is \(X\) itself. Hence the origin is only contained in an open set that also contains all other points. Happily, \(\mathbb{R}\mathbb{P}^1\) is defined as \((\mathbb{R}^2\setminus \{(0,0)\})/\simeq\), and does not suffer from this problem.


Definition 1.17. A topological space \((X,\mathcal{O})\) is called Hausdorff if for any two points \(p_1\) and \(p_2\) there are open sets \(U_1\) and \(U_2\) in \(\mathcal{O}\) such that \(p_1 \in U_1\), \(p_2 \in U_2\) and \(U_1 \cap U_2 = \emptyset\).

Example 1.7. We can define \(n\)-dimensional spheres as the subset of \(\mathbb{R}^{n+1}\) that obeys \[S^n \equiv \{\boldsymbol{x}\,\, | \,\,\boldsymbol{x}^2 = 1\} \, ,\] equipped with the topology inherited from \(\mathbb{R}^{n+1}\). You should convince yourself that this gives the sphere a topology that obeys the definition and that is furthermore Hausdorff.

We are now ready to define differentiable manifolds.

Definition 1.18. A topological space \((X,\mathcal{O})\) is called an \(n\)-dimensional differentiable manifold if the following conditions are met

  • \((X,\mathcal{O})\) is Hausdorff.

  • For every point \(p\) of \(X\) there is an open set \(U_i \in \mathcal{O}\) and a homeomorphism \(\phi_i\) that maps \(U_i\) to an open subset \(\phi_i(U_i)\) of \(\mathbb{R}^n\). These are called coordinate charts or patches. The collection of patches \((U_i,\phi_i)\) is called an atlas.

  • We only need countably many \(U_i\) to cover all of \(X\).

  • The coordinate changes \(\phi_i\circ \phi_j^{-1}\) and their inverses \(\phi_j\circ \phi_i^{-1}\) are \(C^\infty\) (‘smooth’) in their domains, i.e. they are continuous one-to-one maps that have infinitely many continuous derivatives.


REMARK: \(^\ast\) For a given topological space, there may in general be several incompatible ways of turning it into a differentiable manifold. This realization came as quite a shock to mathematicians in the 1950s, when John Milnor showed that there are at least \(7\) different such structures on the \(7\)-sphere. This can even happen for \(\mathbb{R}^n\), but only when \(n=4\). In this case, there is an amazing result by Taubes which says that there is a continuum of inequivalent ways in which \(\mathbb{R}^4\) is a differentiable manifold. It is fascinating that the number \(4\) is singled out by asking such an elementary question. Details of this story can be found in .


The objects we have just defined locally look like \(\mathbb{R}^n\), and the patches that provide this identification are sewn together in a smooth way as shown in figure 1.4. This means that we can translate calculus on \(\mathbb{R}^n\) to calculus on \(X\). The coordinate changes are maps \[\phi_i\circ \phi_j^{-1}: \phi_j(U_i \cap U_j) \rightarrow \phi_i(U_i \cap U_j) \, .\] Note that what we have done is to construct such spaces in such a way that we can do calculus on them, but we have not introduced any notion of length. Of course one could try to ‘import’ this from \(\mathbb{R}^n\), but this would depend on how we have decomposed our manifold into patches and will generally not be preserved by the coordinate changes.

1.11. Consider the sets of points in \(\mathbb{R}^2\) with coordinates \((x,y)\) defined implicitly by the following relations

  • \(y = x^3\)

  • \(xy = 0\)

  • \(x^2+y^4 = 0\)

  • \(x \geq y\)

  • \(y^2 + x^3 - 3x -2 = 0\)

Using the induced topology from \(\mathbb{R}^2\), decide in each case if this is a differentiable manifold.
  [hint: plot them!]

Example 1.8. Let us try to see how we can make a circle into a differentiable manifold. The first thing to notice is that we need to choose a topology \(\mathcal{O}\) on the set \(\{e^{i\psi}| \psi \in \mathbb{R}\}\). As \(U(1)\) naturally sits inside \(\mathbb{C}\simeq\mathbb{R}^2\), we can use this to define a topology. Using the induced topology, i.e. declaring that \[U \subset U(1) \,\,\, \mbox{is open if and only if}\,\, U = U(1) \cap V\] for \(V\) open in \(\mathbb{R}^2\), turns \(U(1)\) into a topological space which is Hausdorff. The properties we need to check are inherited from \(\mathbb{R}^2\) being a Hausdorff topological space.

We already saw that we can write \[g = e^{i \psi}\] which however does not give us good coordinates: \(g(0)=g(2\pi) = 1\) for example. Note that it would be a bad idea to let \(\psi \in (-\pi .. \pi]\). This would not be an open set, so we cannot use this as a coordinate patch. Here is why coordinate patches are defined that way: even though this would give a one-to-one map to \(U(1)\), we still cannot do calculus on \(U(1)\) by doing calculus on \((-\pi .. \pi]\). A smooth function on \((-\pi .. \pi]\) would not even give us a continuous function on \(U(1)\) unless the limit of its values as \(\psi\) approaches \(-\pi\) equals its value at \(\pi\).

Let us hence get rid of the multivaluedness of the coordinate \(\psi\) by restricting its range such that \(\psi \in (-\pi .. \pi)\). Now we have a one-to-one map from the open interval \((-\pi .. \pi)\) to all of \(U(1)\) except the point \(g = -1\). To make sure we can cover all of \(U(1)\) by coordinates, we hence need a second patch. Let us set \(g = e^{i\pi + i \theta}\) and again let \(\theta \in (-\pi .. \pi)\). Now we can cover all of \(U(1)\) except \(g=1\). In summary, we have \[\begin{aligned} g& = e^{i\psi} \, , \,\,\, \psi \in (-\pi .. \pi) \\ g& = e^{i\pi + i \theta} \, , \,\,\, \theta \in (-\pi .. \pi) \end{aligned}\] However, we can now describe all points except \(1,-1\) in two ways using either \(\psi\) or \(\theta\). The coordinate changes are \[\begin{aligned} \psi &= \pi + \theta \, ,\,\, \theta < 0 \\ \psi &= -\pi + \theta \, ,\,\, \theta > 0 \end{aligned}\] Now we can use the open intervals described by \(\psi\) and \(\theta\) to construct functions on \(U(1)\), if the functions we consider agree on the overlap region \(g \neq 1,-1\). If it bothers you that the overlap region is composed of two disconnected pieces, it is not difficult to introduce more patches such that the overlap between each pair is either empty or a single interval. Try to write this down clearly for 3 or 4 patches. Note in particular that the choice above in which every patch covers all but a single point is somewhat special.
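If you want to see these coordinate changes at work, here is a small numerical sketch (Python with numpy assumed; the sample values of \(\theta\) are arbitrary points in the overlap region):

\begin{verbatim}
# For points in the overlap g != 1, -1, compute psi and theta and check the
# coordinate change psi = pi + theta (theta < 0) or psi = -pi + theta (theta > 0).
import numpy as np

def psi_of(g):      # chart 1: g = exp(i psi), psi in (-pi, pi), valid for g != -1
    return np.angle(g)

def theta_of(g):    # chart 2: g = exp(i pi + i theta), theta in (-pi, pi), g != 1
    return np.angle(g * np.exp(-1j*np.pi))

for theta in (-2.0, -0.5, 0.5, 2.0):
    g = np.exp(1j*np.pi + 1j*theta)
    expected = np.pi + theta if theta < 0 else -np.pi + theta
    print(np.isclose(psi_of(g), expected), np.isclose(theta_of(g), theta))
\end{verbatim}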

Example 1.9. Stereographic projection on \(S^3\)

As observed earlier, the group \(SU(2)\) looks like a three-dimensional sphere \[x_1^2 + x_2^2 + x_3^2 + x_4^2 = 1\, .\] In this example we will try to see how we can cover \(S^3\) with coordinate patches that are mapped to open subsets of \(\mathbb{R}^3\). This can be done by using what is called stereographic projection.

The first patch \(U_-\) covers everything except the point \((1,0,0,0)\) and the second patch \(U_+\) covers everything except the point \((-1,0,0,0)\). In \(U_-\) we assign a coordinate vector \(\boldsymbol{\varphi}_- \in \mathbb{R}^3\) to every point \(\boldsymbol{p} \in S^3 \setminus \{(1,0,0,0)\}\) by connecting \(\boldsymbol{p}\) to \((1,0,0,0)\) by a line and finding the point in the hyperplane at \(x_1=0\) pierced by this line, see figure 1.5. For \(U_+ = S^3 \setminus \{(-1,0,0,0)\}\) we repeat this procedure, but now take \((-1,0,0,0)\) instead of \((1,0,0,0)\) and denote the associated coordinate vector by \(\boldsymbol{\varphi}_+\).

The line \(L_-(\boldsymbol{p})\) connecting \((1,0,0,0)\) to \(\boldsymbol{p} = (x_1,x_2,x_3,x_4)\) can be parametrized by \[L_-(\boldsymbol{p}) = \begin{pmatrix} 1 \\ 0 \\ 0 \\0 \end{pmatrix} + t \begin{pmatrix} x_1-1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}\] and reaches \(\boldsymbol{p}\) at \(t=1\) and pierces the hyperplane at \(x_1=0\) for \(t = 1/(1-x_1)\). We hence find the coordinates of \(U_-\) as \[\boldsymbol{\varphi}_- = \frac{1}{1-x_1}\begin{pmatrix} x_2 \\ x_3 \\ x_4 \end{pmatrix}\, .\] Similarly, \(L_+(\boldsymbol{p})\) is given by \[L_+(\boldsymbol{p}) = \begin{pmatrix} -1 \\ 0 \\ 0 \\0 \end{pmatrix} + t \begin{pmatrix} x_1+1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}\] which implies \[\boldsymbol{\varphi}_+ = \frac{1}{1+x_1}\begin{pmatrix} x_2 \\ x_3 \\ x_4 \end{pmatrix}\, .\] Note that these maps only make sense in the patches in which they are defined. The inverse of the maps to \(\boldsymbol{\varphi}_\pm\) is \[\begin{aligned} x_1 & = \frac{\pm(1- |\boldsymbol{\varphi}_\pm|^2)}{|\boldsymbol{\varphi}_\pm|^2+1} \\ \begin{pmatrix} x_2\\ x_3\\ x_4 \end{pmatrix} & = \frac{2}{|\boldsymbol{\varphi}_\pm|^2+1} \boldsymbol{\varphi}_\pm \end{aligned}\] which also allows us to transition from the coordinates \(\boldsymbol{\varphi}_-\) to \(\boldsymbol{\varphi}_+\).
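The formulas above lend themselves to a quick numerical sanity check. The following Python sketch (illustrative only; all names are ad hoc) projects a random point of \(S^3\) into both charts and verifies that the inverse maps stated above recover the point.

```python
import numpy as np

# Round-trip a random point of S^3 through both stereographic charts.

rng = np.random.default_rng(0)
p = rng.normal(size=4)
p /= np.linalg.norm(p)                 # random point (x1, x2, x3, x4) on S^3
x1, rest = p[0], p[1:]

phi_minus = rest / (1.0 - x1)          # coordinates in U_- (projection from (1,0,0,0))
phi_plus  = rest / (1.0 + x1)          # coordinates in U_+ (projection from (-1,0,0,0))

def from_chart(phi, sign):
    """Inverse map stated in the notes; sign = -1 for U_-, sign = +1 for U_+."""
    n2 = np.dot(phi, phi)
    return np.concatenate(([sign * (1.0 - n2) / (n2 + 1.0)],
                           2.0 * phi / (n2 + 1.0)))

assert np.allclose(from_chart(phi_minus, -1), p)
assert np.allclose(from_chart(phi_plus, +1), p)
print("both charts and their inverses round-trip the random point on S^3")
```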

What we have achieved is mapping \(U_\pm\) to \(\mathbb{R}^3\) in a one-to-one fashion. The overlap between these two patches is \(\mathbb{R}^3 \setminus \{(0,0,0)\}\) and we can think of each patch as filling in the origin. However, under the transition, points that are close to the origin in one patch are far away in the other (take some time to think about why this is true). What is the origin in one patch sits ‘at infinity’ in the other. We can hence think loosely of \(S^3\) as \(\mathbb{R}^3\) to which a single point at infinity has been added. If you have trouble imagining this, the same logic applies to spheres in any dimension, in particular the two- and one-dimensional spheres.
With the two patches \(U_\pm\) and the homeomorphisms given by stereographic projection, \(S^3\) becomes a differentiable manifold.

1.12. Consider the stereographic projection of the three-sphere \(S^3\).

  1. Show that the inverse of the map between \(\boldsymbol{\varphi}_\pm\) and the \(x_i\) is given by \[\nonumber \begin{aligned} x_1 & = \frac{\pm (1- |\boldsymbol{\varphi}_\pm|^2)}{|\boldsymbol{\varphi}_\pm|^2+1} \\ \begin{pmatrix} x_2\\ x_3\\ x_4 \end{pmatrix} & = \frac{2}{|\boldsymbol{\varphi}_\pm|^2+1} \boldsymbol{\varphi}_\pm \end{aligned}\]

  2. Consider the coordinate patches defined by stereographic projection on \(S^3\) and find the transition function from \(U_+\) to \(U_-\).


1.13. \(\ast\) The projective space \(\mathbb{C}\mathbb{P}^1\) is the space of complex lines in \(\mathbb{C}^2\) through the origin. We can also define it as follows. Take \(\mathbb{C}^2 \setminus \{(0,0)\}\), where \(\mathbb{C}^2\) has complex coordinates \(z_1,z_2\), and impose the equivalence relation \((z_1,z_2) \sim (\lambda z_1,\lambda z_2)\) for \(\lambda \in \mathbb{C}^* = \mathbb{C}\setminus \{0\}\). In other words \[\mathbb{CP}^1 = \frac{\mathbb{C}^2 \setminus \{(0,0)\}}{(z_1,z_2) \sim (\lambda z_1,\lambda z_2)} \,\,\, \mbox{for}\,\, \lambda \in \mathbb{C}^*\, .\] This space is given a Hausdorff topology by declaring that \(U \subseteq \mathbb{C}\mathbb{P}^1\) is open if it is the image of an open set in \(\mathbb{C}^2 \setminus \{(0,0)\}\) under the above quotient.

In this exercise we will show that this space is isomorphic to the two-sphere by showing that we can use the same coordinate patches and transition functions.

  • Define two patches \(U_1 : z_1 \neq 0\) and \(U_2: z_2 \neq 0\) on \(\mathbb{C}\mathbb{P}^1\). Define coordinates \(\xi_1 = z_2/z_1\) and \(\xi_2 = z_1/z_2\) on the \(U_i\). Explain why these give a one-to-one map of each \(U_i\) to \(\mathbb{C}\). What is the coordinate change \(\xi_1 = \xi_1(\xi_2)\) on the overlap \(U_1 \cap U_2\)?

  • Consider the stereographic projection of the two-sphere. Show that you end up with the same coordinate change as in a) when identifying \(\mathbb{R}^2\) and \(\mathbb{C}\) appropriately.



Example 1.10. \(^\ast\) [ex:implicitfct] Manifolds and the implicit function theorem.

We can describe a subspace \(X\) of \(\mathbb{R}^3\) given by the vanishing of a scalar function \(f(x,y,z)\): \[X: \{(x,y,z) \in \mathbb{R}^3 | f(x,y,z) = 0 \} \, .\] as a manifold as follows. By the implicit function theorem (see AMV II) we can find a function \(g(x,y)\) such that \(f(x,y,g(x,y)) = 0\) in a neighbourhood \(V \subset \mathbb{R}^3\) of a point \((x_0,y_0,z_0)\) where \(\partial f/\partial z (x_0,y_0,z_0)\neq 0\). Let us call \(\hat{U} = V \cap X\) and use \(\hat{x},\hat{y}\) as coordinates in \(\mathbb{R}^2\). For a point \(p = (x,y,z)\) in \(\hat{U}\) we set \[\hat{\phi}: (x,y,z) \rightarrow (\hat{x},\hat{y})\] If \(\partial f/\partial z (x_0,y_0,z_0)=0\) but e.g. \(\partial f/\partial x (x_0,y_0,z_0) \neq 0\) we can use the same theorem for \(x = h(y,z)\) in a patch \(\tilde{U}\): \[\tilde{\phi}: (x,y,z) \rightarrow (\tilde{y},\tilde{z})\, .\] Recalling that \(z=g(x,y)\) and \(x = h(y,z)\), the coordinate changes are given by \[\hat{\phi}\circ \tilde{\phi}^{-1} (\tilde{y},\tilde{z}) = \hat{\phi} (h(y,z),y,z) = (h(y,z),y)\] and \[\tilde{\phi}\circ \hat{\phi}^{-1} (\hat{x},\hat{y}) = \tilde{\phi} (x,y,g(x,y)) = (y,g(x,y))\, .\] A sketch of this situation is given in figure 1.6. Exercise: using the above strategy, find coordinate patches and coordinate changes on the two-sphere \(S^2\) where \(f = x^2+y^2+z^2-1\).
The only points \((x_0,y_0,z_0)\) at which this strategy fails are those where \[\begin{aligned} f(x_0,y_0,z_0) & = 0 \\ \partial f/\partial x (x_0,y_0,z_0)=\partial f/\partial y (x_0,y_0,z_0)=\partial f/\partial z (x_0,y_0,z_0) & = 0\, . \end{aligned}\] Hence \(X\) cannot be given the structure of a differentiable manifold at such points. These points are called singularities of the surface \(f(x,y,z)=0\).


Another concept which we encountered in the examples we studied was that of a tangent vector. The way we constructed these was simple and can be immediately generalized.

Definition 1.19. A path is a continuous map \(S\) from an open interval \((a,b) \subset \mathbb{R}\) to \(X\). Letting \(t \in (a,b) \subset \mathbb{R}\) we can write this as \[S: t \mapsto q(t) \in X \, .\] We demand that such a map gives a differentiable function when expressed in local coordinates, where we can write \(S\) as \((x_1(t), x_2(t), x_3(t), \cdots)\) for differentiable functions \(x_i(t)\). Note that the image of \(S\) does not need to lie entirely in a single patch, so expressed in local coordinates the parameter \(t\) need not run over all of \((a,b)\).

REMARK: As we have assumed that coordinate changes are smooth we can demand that the \(x_i(t)\) are differentiable functions in any coordinate patch.

Definition 1.20. A tangent vector at \(p\) is the derivative of a path passing through \(p\) with respect to its parameter \(t\), evaluated at \(p\). Assuming that \(t_0\) is such that \(q(t_0) = p\) we can write \[T_p(S) := \left.\frac{\partial q(t)}{\partial t}\right|_{t_0} \, .\] In local coordinates where \(p\) is mapped to a point \(x_0 \in \mathbb{R}^n\) the tangent vector has components \[\left[T_p(S)\right]_i = \left.\frac{\partial x_i(t)}{\partial t} \right|_{t_0} \, .\] For two different paths \(S\) and \(S'\) both passing through \(p\), we identify \(T_p(S)\) and \(T_p(S')\) if these vectors agree (which can easily be checked in local coordinates).

Example 1.11. Tangent vectors of \(SU(2)\) at \(p = \mathds{1}\) have the form \(i \sum_j \alpha_j \sigma_j\) with \(\sigma_j\) the Pauli matrices and \(\alpha_j \in \mathbb{R}\). This can be seen by writing down a path \[S: t \rightarrow e^{i t \alpha_j \sigma_j} \,\] which passes through \(\mathds{1}\) at \(t=0\).

We now have \[T_\mathds{1}(S) = \left. \frac{\partial }{\partial t} e^{i t \alpha_j \sigma_j} \right|_{t = 0} = i \alpha_j \sigma_j\, .\]
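As an aside, one can check this numerically. The following Python sketch (illustrative only, with ad hoc names) differentiates the path by a finite difference and confirms that points on the path indeed lie in \(SU(2)\).

```python
import numpy as np
from scipy.linalg import expm

# A central finite difference of S(t) = exp(i t alpha_j sigma_j) at t = 0
# reproduces i alpha_j sigma_j, and points on the path are unitary with
# determinant one, i.e. lie in SU(2).

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
alpha = np.array([0.3, -1.1, 0.7])
gamma = 1j * sum(a * s for a, s in zip(alpha, sigma))      # i alpha_j sigma_j

h = 1e-6
tangent = (expm(h * gamma) - expm(-h * gamma)) / (2 * h)   # d/dt S(t) at t = 0
assert np.allclose(tangent, gamma, atol=1e-6)

g = expm(0.4 * gamma)                                      # a point on the path
assert np.allclose(g.conj().T @ g, np.eye(2))
assert np.isclose(np.linalg.det(g), 1.0)
print("tangent vector at the identity is i alpha_j sigma_j; the path lies in SU(2)")
```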

Proposition 1.8. The tangent vectors \(T_p(S)\) at a point \(p\) form a real \(n\)-dimensional vector space \(T_pX\), the tangent space at \(p\). : To show this, we need to check that (i) multiples \(c\, T(S)\) for \(c \in \mathbb{R}\) and (ii) sums \(T(S)+T(S')\) also satisfy our definition of what a tangent vector is. Finally, we need to show that (iii) this vector space has a basis with \(n\) elements. We will do this in a local patch where we can choose coordinates and denote the image of \(p\) by \(\boldsymbol{x}_0\).

  • Given a path \(S: x_i(t)\) that defines a tangent vector \(\left[T_p(S)\right]_i = \partial x_i(t)/\partial t|_{t_0}\) at \(p\), i.e. \(\phi(p) = \boldsymbol{x}_0\), we can consider the path \(S_c\) defined by \(\boldsymbol{x}(c t)\). This path also runs through \(p\) and the components of its tangent vector are \[\left[T_p(S_c)\right]_i = \left.\frac{\partial x_i(ct)}{\partial t}\right|_{\boldsymbol{x}_0} = c \left.\frac{\partial x_i(t)}{\partial t}\right|_{\boldsymbol{x}_0} = c \left[T_p(S)\right]_i.\] For any tangent vector, there is hence another one with components that are rescaled by a real number \(c\).

  • Given two tangent vectors at \(p\) associated to paths \(S\) (with coords \(\boldsymbol{x}(t)\)) and \(S'\) (with coords \(\boldsymbol{x}'(t)\)), both reaching \(\boldsymbol{x}_0\) at \(t=0\), we can form the following path \(S''\) (again in local coords) \[\boldsymbol{x}''(t) = \frac{1}{2}\left( \boldsymbol{x}(2t) + \boldsymbol{x}'(2t) \right) \,\] As \(\boldsymbol{x}(t)\) and \(\boldsymbol{x}'(t)\) both pass through \(\boldsymbol{x}_0\), \(\boldsymbol{x}''(t)\) does so as well. At \(\boldsymbol{x}_0\) we can compute \[\left[T_p(S'')\right]_i = \left. \frac{\partial x_i''(t)}{\partial t}\right|_{x_0} = \frac{1}{2}\left(\left. \frac{\partial x_i(2t)}{\partial t}\right|_{x_0} + \left.\frac{\partial x_i'(2t)}{\partial t}\right|_{x_0}\right) = \left[T_p(S)\right]_i + \left[T_p(S')\right]_i\]

  • To see that there is a basis with \(n\) elements, note that we can choose paths with \(x_i(t) = (x_0)_i + t\) and all other components constant and equal to those of \(\boldsymbol{x}_0\). For such paths \[\left. \frac{\partial\boldsymbol{x}(t)}{\partial t}\right|_{x_0} = (0,\cdots, 0,1,0,\cdots,0) \,,\] with an entry only at the \(i\)th component. There are hence \(n\) linearly independent elements.

1.14. \(O(1,1)\) is the set of real \(2 \times 2\) matrices \(O\) which leave the quadratic form \(x_1^2 - x_2^2\) invariant when acting on \(\boldsymbol{x} = (x_1,x_2)\) as \[\boldsymbol{x} \rightarrow O \boldsymbol{x}\, .\]

  1. Show that \(O(1,1)\) is a group using matrix multiplication.

  2. Find the general form of elements of \(O(1,1)\).

  3. Give \(O(1,1)\) the structure of a differentiable manifold by equipping it with a suitable topology and write down coordinate charts.

  4. Find the tangent space of \(O(1,1)\) at the identity element.

1.3 Lie groups

We are now ready to formally welcome Lie groups to these lectures. The idea here is simple: Lie groups unite the structures of groups and differentiable manifolds in a compatible way:

Definition 1.21. A Lie group is a group that is also a differentiable manifold such that the group operations \[\begin{aligned} \circ :\,& G \times G \rightarrow G \hspace{1cm} &(x,y) \rightarrow x \circ y \\ ^{-1}:\,& G \rightarrow G \hspace{1cm} &x \rightarrow x^{-1} \end{aligned}\] are differentiable maps. We will always assume that our Lie groups are finite-dimensional manifolds.

Example 1.12. The group \(\mathbb{C}^* \equiv \mathbb{C} \setminus \{0\}\) is a Lie group under multiplication. The map \[(x,y) \rightarrow xy\] is a differentiable map from \(\mathbb{C}^* \times \mathbb{C}^*\) to \(\mathbb{C}^*\), and \(x \rightarrow 1/x\) is a differentiable map from \(\mathbb{C}^*\) to \(\mathbb{C}^*\). This is an example of an abelian Lie group.

Proposition 1.9. The group \(GL(n,\mathbb{R})\) of real invertible \(n \times n\) matrices is a Lie group under matrix multiplication. Here, the space of invertible matrices is given a topology by considering it as a subspace of \(\mathbb{R}^{n\cdot n}\) equipped with the standard topology.

: Recall the definition (from AMV II): a map is differentiable if it can locally be approximated by a linear map. Let us see if this is true for matrix multiplication. For two matrices \(P,Q \in GL(n,\mathbb{R})\), the group operation is the map \[(P,Q) \rightarrow PQ \, .\] To examine if this can be approximated by a linear map we change \(P\) to \(P+\epsilon\Delta_P\) and \(Q\) to \(Q+\epsilon\Delta_Q\): \[\begin{aligned} (P+\epsilon\Delta_P,Q+\epsilon\Delta_Q) \rightarrow (P+\epsilon\Delta_P) (Q + \epsilon\Delta_Q) &= PQ + P \epsilon\Delta_Q + \epsilon\Delta_P Q + \epsilon^2 \Delta_P \Delta_Q \\ &\simeq PQ + \epsilon (P \Delta_Q + \Delta_P Q) \end{aligned}\] which is manifestly linear in both \(\Delta_P\) and \(\Delta_Q\). Cramer’s rule for constructing inverse matrices similarly shows that \(P \rightarrow P^{-1}\) is differentiable \(\square\)
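The size of the quadratic remainder can also be seen numerically. The short Python sketch below (illustrative only, using random \(3\times 3\) matrices) checks that the error of the linear approximation is exactly \(\epsilon^2 \Delta_P\Delta_Q\) and hence of order \(\epsilon^2\).

```python
import numpy as np

# Illustration (not a proof) that matrix multiplication is differentiable: the
# remainder after the linear approximation is epsilon^2 * Delta_P @ Delta_Q.

rng = np.random.default_rng(1)
P, Q, dP, dQ = (rng.normal(size=(3, 3)) for _ in range(4))

for eps in [1e-2, 1e-3, 1e-4]:
    exact  = (P + eps * dP) @ (Q + eps * dQ)
    linear = P @ Q + eps * (P @ dQ + dP @ Q)
    assert np.allclose(exact - linear, eps**2 * dP @ dQ)
print("the remainder of the linear approximation is of order epsilon^2")
```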

Theorem 1.3. A closed subgroup \(H\) of \(GL(n,\mathbb{R})\) is again a Lie group.

REMARK: The word ‘closed’ does not refer to the group operation on elements being closed in \(H\) (this must be true for \(H\) to be a subgroup anyway) but is meant in the topological sense: \(H\) is a closed subset of \(GL(n,\mathbb{R})\). : An elementary proof of this result can be found in .

Definition 1.22. Lie groups that are closed subgroups of \(GL(n,\mathbb{R})\) are called matrix Lie groups.

REMARK: Not all Lie groups are of this type, but in this course we will only study those.

We can hence get our hands on a lot of examples by finding closed subgroups of \(GL(n,\mathbb{R})\). We start with

Definition 1.23. The orthogonal group \(\mathbf{O(n)}\) is the group of real \(n\times n\) matrices \(g\) such that \[g^T g = \mathds{1}\, .\] The special orthogonal group \(\mathbf{SO(n)}\) is the subgroup of matrices in \(O(n)\) that have determinant \(\det g = 1\).

REMARK: The group \(O(n)\) consists of those invertible maps acting on a real vector space \(\mathbb{R}^n\) such that the canonical inner form stays invariant: \[\boldsymbol{x} \cdot \boldsymbol{y} \rightarrow \boldsymbol{x}' \cdot \boldsymbol{y}' = \boldsymbol{x}^T \,g^T g\, \boldsymbol{y} = \boldsymbol{x} \cdot \boldsymbol{y} \, .\]

Corollary 1.1. \(O(n)\) and \(SO(n)\) are Lie groups.

: One can quickly check that these are indeed groups; they are obviously subgroups of \(GL(n,\mathbb{R})\). The defining relation \(g^Tg=\mathds{1}\) moreover cuts out a closed subset of \(GL(n,\mathbb{R})\). To be more precise, for any matrix that does not satisfy this equation, we can find a little ball in \(GL(n,\mathbb{R})\) around it such that \(g^T g \neq \mathds{1}\) for every member \(g\) of this ball. The complement of \(O(n)\) in \(GL(n,\mathbb{R})\) is hence open, which means that \(O(n)\) is closed.

For \(SO(n)\) we can repeat a similar argument.

1.15. Make the argument above precise, i.e. show that for every \(g \in GL(n,\mathbb{R}) \setminus O(n)\), i.e. \(g \in GL(n,\mathbb{R})\) such that \(g^T g \neq \mathds{1}\), there is an open set \(U_g\) containing \(g\) such that \(U_g\) is entirely contained in \(GL(n,\mathbb{R}) \setminus O(n)\).
hint: \(GL(n,\mathbb{R})\) inherits its topology from the vector space \(V_{n \times n}\) of real \(n \times n\) matrices, which is isomorphic to \(\mathbb{R}^{n^2}\): the \(n^2\) entries of such a matrix are just the components of a vector in \(\mathbb{R}^{n^2}\) from this perspective. We can hence describe the open ball of radius \(r\) around a matrix \(M\) with components \(M_{ij}\) as \[B_r(M) = \left\{ \left. N \in V_{n \times n} \right| \sum_{ij} \left( N_{ij} -M_{ij} \right)^2 < r^2 \right\} \, .\]

REMARK: For \(g \in O(n)\) it follows that \(\det ( g^T g ) = (\det g)^2 = \det \mathds{1}= 1\). As \(g\) is a real matrix, we hence have \(\det g = \pm 1\). The space of such matrices is hence disconnected, with two components; the one that contains the identity (which has \(\det \mathds{1}=1\)) is called \(SO(n)\) and is a subgroup. The other component is not a subgroup.

REMARK: Conditions such as \(g^T g = \mathds{1}\) and \(\det g = 1\) are typically called ‘closed conditions’ as the sets they define are closed sets in the vector space of all matrices.

1.16. \(GL(n,\mathbb{C})\) is the group of invertible complex \(n \times n\) matrices. Show that \(GL(n,\mathbb{C})\) is a Lie group.

Definition 1.24. The unitary group \(\mathbf{U(n)}\) is the group of complex \(n\times n\) matrices \(g\) such that \(g^\dagger g = \mathds{1}\). The special unitary group \(\mathbf{SU(n)}\) is the subgroup of matrices in \(U(n)\) that have determinant \(\det g = 1\).

REMARK: The group \(U(n)\) consists of those invertible maps acting on a complex vector space \(\mathbb{C}^n\) such that the canonical inner form stays invariant: \[\bar{\boldsymbol{x}} \cdot \boldsymbol{y} \rightarrow \bar{\boldsymbol{x}}' \cdot \boldsymbol{y}' = \bar{\boldsymbol{x}} \,\bar{g}^T g\, \boldsymbol{y} = \bar{\boldsymbol{x}} g^\dagger g \boldsymbol{y} = \bar{\boldsymbol{x}} \cdot \boldsymbol{y} \, .\]

Corollary 1.2. The unitary and special unitary groups are Lie groups.

: These are both closed subgroups of \(GL(2n,\mathbb{R})\), obtained by identifying \(\mathbb{C}^n\) with \(\mathbb{R}^{2n}\). \(\square\)

1.4 Lie algebras

The idea of a Lie algebra is to formalize the notion of infinitesimal transformation. We first define Lie algebras abstractly.

Definition 1.25. A Lie algebra \(\mathfrak{g}\) is a vector space together with a bilinear map (‘Lie bracket’) \[[\cdot,\cdot]: \mathfrak{g} \times \mathfrak{g} \rightarrow \mathfrak{g}\] that is antisymmetric \([x,y] = -[y,x]\), and satisfies the Jacobi identity: \[\begin{equation} \label{eq:Jacobi} [x,[y,z]] + [y,[z,x]] + [z,[x,y]] = 0 \, . \end{equation}\] for all \(x,y,z \in \mathfrak{g}.\)

REMARK: This definition does not say if we should think of \(\mathfrak{g}\) as a real or complex vector space, so one and the same algebra can have different ‘real’ or ‘complex’ forms (or even forms over other fields).

Theorem 1.4. Every Lie group comes equipped with a Lie algebra which is equal to its tangent space at the identity element. \[\mathfrak{g} = T_{\mathds{1}}G \, .\]

: This is already a vector space by construction. For matrix Lie groups, we can simply take the bilinear form \([,]\) to be the commutator, which clearly satisfies the Jacobi identity. We will show in Lemma 1.1 below that the commutator of two Lie algebra elements indeed returns a Lie algebra element. The general case requires some more technology we have not introduced, see for details.

Corollary 1.3. The dimension of the Lie algebra (as a vector space) is equal to the dimension of its Lie group (as a differentiable manifold).

1.17. Find the dimension of the group \(SO(n)\) by finding the dimension of its Lie algebra.

Example 1.13. The Lie algebra \(\mathfrak{u}(1)\) of \(U(1)\) is the set of purely imaginary numbers, and \([\gamma,\gamma']= 0\) for all \(\gamma,\gamma' \in \mathfrak{u}(1)\).

Example 1.14. The Lie algebra of \(\mathbb{C}^*\) is \(\mathbb{C}\) itself, and \([\gamma,\gamma']= 0\) for all \(\gamma,\gamma'\) in this Lie algebra.

Example 1.15. The Lie algebra \(\mathfrak{su}(n)\) of \(SU(n)\) was found for the case \(n=2\) in Example 1.2. It is the real vector space of complex \(n \times n\) matrices \(\gamma\) such that \(\gamma^\dagger = - \gamma\) and \(\mathrm{tr}\,\gamma = 0\). We can use \(i\) times the three Pauli matrices \(\sigma_j\) as basis vectors for \(n=2\). Note that while these have complex entries, this is a real vector space!

Example 1.16. As we have seen before, the group \(SU(2)\) is a double cover of \(SO(3)\). This means that a small neighbourhood of the identity in \(SU(2)\) is isomorphic to a small neighbourhood of the identity in \(SO(3)\), so that these two groups have isomorphic Lie algebras. You can also check this explicitly by working out the Lie algebra of \(SO(3)\), see problem classes 2 and 3. We can hence have different groups that have the same Lie algebra.

Definition 1.26. For any Lie algebra, we can choose a basis \(\{t_a\}_{a=1}^{\dim \mathfrak{g}}\) of so called generators \(t_a\). In this basis the Lie bracket reads \[\begin{equation} \label{eq:structure_constants} [t_a, t_b] = f_{ab}{}^c t_c \qquad (a,b,c=1,\dots,\dim\mathfrak{g}) \end{equation}\] where the \(f_{ab}{}^c\) are called structure constants, which express the component of the Lie bracket \([t_a,t_b]\) along the generator \(t_c\). Repeated indices are summed over.

REMARK: While there are reasons for putting one index up in the expression \(f_{ab}{}^c\), you can completely ignore this for now. Just think about \(f_{ab}{}^c\) as producing a number for any \(a,b,c\) and think about the positioning of the indices as a pure convention.

The Jacobi identity,\(\eqref{eq:Jacobi}\), implies that \[\begin{equation} \label{Jacobi2} f_{ab}{}^d f_{dc}{}^e + f_{bc}{}^d f_{da}{}^e + f_{ca}{}^d f_{db}{}^e =0 \end{equation}\] for the structure constants.

Example 1.17. A basis of the Lie algebra \(\mathfrak{su}(2)\) of \(SU(2)\) is given by \(t_a = i \sigma_a\) for \(\sigma_a\) the Pauli matrices. We can work out \[[t_a,t_b] = [i\sigma_a, i\sigma_b] = - [ \sigma_a, \sigma_b] = - 2 \epsilon_{abc}\, i \sigma_c = -2\epsilon_{abc}\, t_c\] so that we conclude that \(f_{ab}{}^c = - 2\epsilon_{abc}\) for \(\mathfrak{su}(2)\).
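If you want to double-check this, the following Python sketch (illustrative, ad hoc names) verifies that \(f_{ab}{}^c = -2\epsilon_{abc}\) reproduces all commutators of the basis \(t_a = i\sigma_a\).

```python
import numpy as np

# Numerical check of the su(2) structure constants f_ab^c = -2 eps_abc.

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [1j * s for s in sigma]                       # basis t_a = i sigma_a

eps = np.zeros((3, 3, 3))                         # Levi-Civita symbol
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

for a in range(3):
    for b in range(3):
        bracket = t[a] @ t[b] - t[b] @ t[a]
        expected = sum(-2 * eps[a, b, c] * t[c] for c in range(3))
        assert np.allclose(bracket, expected)
print("f_ab^c = -2 eps_abc reproduces all commutators of i sigma_a")
```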

Definition 1.27. The exponential map is a map \(\exp: \mathfrak{g} \rightarrow G\) sending \(\gamma \in \mathfrak{g}\) to \[e^\gamma := \sum_{k=0}^{\infty} \frac{\gamma^k}{k!}\, \in G \, .\]

REMARK: For every matrix \(\gamma \in \mathfrak{g}\), one can show that the above indeed converges and that it is indeed in \(G\) if \(\gamma\) is in \(\mathfrak{g}\), see or the non-examinable box below for a sketch of a proof.


More on the exponential map\(^\ast\)
Let us explain how the exponential map comes about by taking a slightly more geometric perspective. Elements of the Lie algebra are associated with elements of the tangent space of \(G\) at the identity and we can think of both Lie algebra elements and Lie group elements as matrices, which can be multiplied. For any \(\gamma \in T_\mathds{1}(G)\), it turns out that \[L(\gamma)|_g = g \gamma \, .\] is a tangent vector at \(g\) for any point \(g\in G\). This defines what is called a vector field \(L(\gamma)\), i.e. something that attaches a tangent vector to any point on \(G\). The vector fields we have just defined are called left-invariant vector fields and have the nice property that  \[g' L(\gamma)|_g = g'g \gamma = L(\gamma)|_{g'g} \, .\] Now what’s important about vector fields is that one can flow along with them. E.g. flowing out from the identity is done by solving the differential equation \[\frac{\partial g(t)}{\partial t} = L(\gamma)|_{g(t)} = g(t) \gamma \, .\] The solution to this flow is a path \[g(t) = e^{t\gamma} \, ,\] and the fact that we constructed this as a flow shows that the exponential actually lands in the group.

You can also understand convergence of the series we used to define the exponential by observing that any power of \(\gamma\) will produce a matrix with entries that are polynomials in the components of \(\gamma\). For \(k \rightarrow \infty\) the \(k!\) in the denominator eventually grows faster than these entries, which makes the sum converge to a finite value.
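The following Python sketch (again only an illustration, here for \(\mathfrak{su}(2)\)) makes both points concrete: the truncated series converges to the matrix exponential, and the result is unitary with unit determinant, i.e. lands in \(SU(2)\).

```python
import numpy as np
from scipy.linalg import expm

# Truncating the exponential series converges, and exp(gamma) lands in SU(2)
# for gamma anti-hermitian and traceless.

rng = np.random.default_rng(2)
a = rng.normal(size=3)
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
gamma = 1j * sum(ai * si for ai, si in zip(a, sigma))   # random element of su(2)

partial = np.eye(2, dtype=complex)                      # k = 0 term
term = np.eye(2, dtype=complex)
for k in range(1, 30):                                  # accumulate gamma^k / k!
    term = term @ gamma / k
    partial = partial + term

g = expm(gamma)
assert np.allclose(partial, g)
assert np.allclose(g.conj().T @ g, np.eye(2))
assert np.isclose(np.linalg.det(g), 1.0)
print("series converges and exp(gamma) is unitary with determinant one")
```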


1.18. Consider the set \(G\) of matrices \[G = \left\{\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}| a,b,c \in \mathbb{R}, ac \neq 0 \right\}\]

  1. Show that \(G\) is a Lie group using matrix multiplication as the group composition.

  2. Find the Lie algebra \(\mathfrak{g}\) of \(G\).

  3. Compute the exponentials of the basis elements of the Lie algebra you have found.

Lemma 1.1. Let \(G\) be a Lie group and \(\mathfrak{g}\) be its Lie algebra. We then have

  • \(g \gamma g^{-1}\, \in \, \mathfrak{g}\) for all \(\gamma \in \mathfrak{g}\) and \(g \in G\).

  • \([\gamma, \delta] \in \mathfrak{g}\) for all \(\gamma,\delta \in \mathfrak{g}\)

: To see the first part, let’s try to construct a path that gives us \(g \gamma g^{-1}\) as a tangent vector upon differentiating. We could try \[e^{t g \gamma g^{-1}} = \sum_k \frac{(g t \gamma g^{-1})^k}{k!} = g \left( \sum_k \frac{(t \gamma)^k}{k!} \right) g^{-1} = g e^{t \gamma}g^{-1}\, .\] As all of the factors on the rhs are in \(G\), it follows that \(g e^{t \gamma}g^{-1} \in G\). As \(g e^{t \gamma}g^{-1}\) is a path in \(G\) that passes through \(\mathds{1}\) at \(t=0\) and \[\left.\frac{\partial}{\partial t} g e^{t \gamma}g^{-1}\right|_{t=0} = g \gamma g^{-1}\] it follows that \(g \gamma g^{-1} \in \mathfrak{g}\). Although we are talking about matrix Lie algebras in this course where we can just multiply elements \(g \in G\) with elements \(\gamma \in \mathfrak{g}\), you might feel a little uneasy about just multiplying them. In this case, you can read the above statement as a definition of what \(g \gamma g^{-1}\) is: it is the Lie algebra element you get from the path \(g e^{t \gamma}g^{-1}\).

For the second part, consider \(e^{t\gamma } \delta e^{-t \gamma}\) for \(\delta \in \mathfrak{g}\). It follows from i) that this is in \(\mathfrak{g}\) for all \(t\). As a tangent space, the Lie algebra is in particular an \(n\)-dimensional vector space which sits inside the vector space of \(n \times n\) matrices. As such it is closed under taking limits. Hence \[\lim_{t \rightarrow 0} ( e^{t\gamma } \delta e^{-t \gamma} - \delta) /t = \gamma \delta - \delta \gamma = [\gamma,\delta]\] is in \(\mathfrak{g}\). \(\square\)
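A small numerical illustration of this limit for \(\mathfrak{su}(2)\) (a Python sketch with ad hoc names, not a proof):

```python
import numpy as np
from scipy.linalg import expm

# The difference quotient (e^{t gamma} delta e^{-t gamma} - delta)/t approaches
# the commutator [gamma, delta] as t -> 0; here for two elements of su(2).

gamma = 1j * np.array([[0, 1], [1, 0]], dtype=complex)       # i sigma_1
delta = 1j * np.array([[0, -1j], [1j, 0]])                   # i sigma_2
commutator = gamma @ delta - delta @ gamma

for t in [1e-2, 1e-4, 1e-6]:
    quotient = (expm(t * gamma) @ delta @ expm(-t * gamma) - delta) / t
    print(t, np.max(np.abs(quotient - commutator)))          # error shrinks with t
assert np.allclose(quotient, commutator, atol=1e-4)
```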

A natural question about the exponential map concerns its injectivity and surjectivity. Clearly, it cannot be injective for every group \(G\). We have already seen that it is not injective for \(U(1)\) and \(SU(2)\), and indeed it could not be: this is how these groups become topologically non-trivial.

It can also not always be surjective as the following simple counter-example shows.

Example 1.18. Elements of the Lie algebra \(\mathfrak{sl}(2,\mathbb{R})\) of \(SL(2,\mathbb{R})\) must obey \[e^{\gamma} = g\] for \(g \in SL(2,\mathbb{R})\). This implies that \(\gamma\) is real by taking complex conjugation. Furthermore \[\det e^{\gamma} = e^{tr\gamma} = 1\] implies that \(\gamma\) is traceless. Finally, \(e^{\gamma}\) always maps to \(SL(2,\mathbb{R})\) if the above conditions are met: the inverse is simply \(e^{-\gamma}\). The Lie algebra \(\mathfrak{sl}(2,\mathbb{R})\) hence consists of the traceless real \(2\times 2\) matrices. Now consider the matrix \[g = \begin{pmatrix} -4 & 0 \\ 0 & -1/4 \end{pmatrix} \in SL(2,\mathbb{R})\] We claim there is no element \(\gamma \in \mathfrak{sl}(2,\mathbb{R})\) s.t. \(e^\gamma=g\). : If such an element exists, we can immediately write down a square root of \(g\) as \(\sqrt{g} = e^{\tfrac12\gamma}\). But as we show now, no such square root (in \(SL(2,\mathbb{R})\)) exists. The eigenvalues of \(g\) are \(-4\) and \(-1/4\), so there is one eigenvalue of \(\sqrt{g}\) that is \(\pm 2i\) and another one that is \(\pm \tfrac12i\). However, for \(\sqrt{g}\) to be in \(SL(2,\mathbb{R})\) it must be a real matrix, so that the eigenvalues are given by an equation of the form \(\lambda^2 + p \lambda + q =0\) with \(p,q\) real. Hence \[\lambda_\pm = - p/2 \pm \sqrt{(p/2)^2 -q}\] so that there are two eigenvalues which are either real or complex conjugates of each other. However \(\pm 2i\) is not real and never the complex conjugate of \(\pm \tfrac12i\). Hence there is no \(\sqrt{g}\) such that \((\sqrt{g})^2 = g\). But this implies that there cannot be a \(\gamma\) with \(g = e^\gamma\), as we could write down such a \(\sqrt{g}\) otherwise. \(\square\)
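The following Python sketch (illustrative only; it samples rather than proves) is consistent with this: exponentials of random real traceless matrices have eigenvalues that are either positive or of unit modulus, so they can never be \(-4\) and \(-1/4\).

```python
import numpy as np
from scipy.linalg import expm

# For random real traceless gamma, the eigenvalues of exp(gamma) are either
# both positive real or both of unit modulus -- never {-4, -1/4}.

rng = np.random.default_rng(3)
for _ in range(1000):
    a, b, c = rng.normal(size=3)
    gamma = np.array([[a, b], [c, -a]])            # random element of sl(2,R)
    ev = np.linalg.eigvals(expm(gamma))
    assert np.all((ev.real > 0) | np.isclose(np.abs(ev), 1.0))
print("no sampled exp(gamma) has eigenvalues -4 and -1/4")
```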

Although it does not hold in general, there are favourable circumstances where the exponential map is surjective. We have already seen this for \(SU(2)\) and \(SO(3)\) (see also exercises). To spell out the general result clearly, we need one more

Definition 1.28. A subset of \(\mathbb{R}^n\) is called compact if it is closed and bounded, i.e. one can find a ball of finite size that entirely contains it.

Example 1.19. \(U(1)\) and \(SU(2)\) are both compact.

Fact 1.1. The orthogonal and unitary groups \(O(n), SO(n), U(n), SU(n)\) are all compact.

Theorem 1.5. If \(G\) is a connected, compact matrix Lie group, the exponential map for \(G\) is surjective.

: We only give a sketch, details are found in . The main idea is to observe that the exponential map is surjective for \(U(1)\) and then try to replicate this setting. The crucial step is to show for any compact matrix Lie group \(G\) that every element \(g \in G\) lies inside some \(U(1)\) subgroup of \(G\). Once this ‘torus theorem’ is established, we can simply use the generators of the \(U(1)\) to reach \(g\) by the exponential map. \(\square\)
REMARK: It is not true that the exponential map is surjective for compact groups only, e.g. \(\mathbb{C}^*\) is not compact but we can write every element in \(\mathbb{C}^*\) as \(e^{z}\) for some complex number \(z\).


Classification of compact Lie algebras\(^\ast\)

Definition 1.29. Lie algebras of compact Lie groups are called compact Lie algebras.

Definition 1.30. An ideal of a Lie algebra is a subspace \(I\subset \mathfrak{g}\) such that \([\iota,x] \in I\) for all \(\iota \in I\) and all \(x \in \mathfrak{g}\).

Definition 1.31. A simple Lie algebra is a Lie algebra that has no non-trivial ideals.

Theorem 1.6. Any compact Lie algebra can be decomposed into the direct sum of \(u(1)\) Lie algebras and of simple Lie algebras: \[\begin{equation} \label{cpct_semisimple} \mathfrak{g}= u(1) \oplus \dots \oplus u(1) \oplus \mathfrak{g}_1 \oplus \dots \oplus \mathfrak{g}_l~. \end{equation}\] : .

Simple Lie algebras were in turn classified by Killing and Cartan, and this classification was put in its definitive form by Dynkin. The classical Lie algebras form four infinite series, \(A_n=su(n+1)\), \(B_n=so(2n+1)\), \(C_n=usp(2n)\), \(D_n=so(2n)\). But there are a few more, the exceptional Lie algebras \(E_6\), \(E_7\), \(E_8\), \(F_4\), \(G_2\), which are not part of these classical series. See for a down-to-earth introduction to the subject and for a more advanced perspective. The structure of these algebras boils down to the following pictures, called Dynkin diagrams, which determine the structure constants in a suitable basis of \(\mathfrak{g}\).

[figure: the Dynkin diagrams of the classical and exceptional simple Lie algebras]


2 Representations

Although we have introduced groups in a rather concrete form as subgroups of \(GL(n,\mathbb{R})\) or \(GL(n,\mathbb{C})\), we can also drop this association and just keep their abstract structure. Taking this point of view, we will explore in this section how groups can act on vector spaces, and how their structure prefers some vector spaces over others. Such questions belong to a subject called representation theory, and this is a vast field that we will only scratch the surface of. There is a dedicated lecture course, MATH4241: Representation Theory IV, that gives a more detailed and general account of this subject. Here, we will take a practical approach and mostly only explore those aspects of direct use to us.

2.1 Generalities and Basic Examples

Definition 2.1. For a vector space \(V\), we will denote the group of invertible linear maps acting on \(V\) by \(GL(V)\). Hence for \(V = \mathbb{R}^n\), \(GL(V) = GL(n, \mathbb{R})\) and for \(V = \mathbb{C}^n\), \(GL(V) = GL(n, \mathbb{C})\).

Definition 2.2. A representation of a group \(G\) is a group homomorphism \(r: G \rightarrow GL(V)\), where \(V\) is a finite-dimensional (real or complex) vector space.

REMARK: What this means is that we ‘represent’ the group \(G\) by matrices in \(GL(V)\). Given \(r\), we can hence act with the group \(G\) on vectors in \(V\) using linear maps. This is often expressed by saying that \(V\) ‘carries’ a representation of \(G\), or that vectors in \(V\) ‘transform in’ a representation of \(G\).

Although a representation is defined as the map \(r\) which takes elements of \(G\) to elements of \(GL(V)\), it is quite common to speak about the elements in \(V\) that the image of \(G\) in \(GL(V)\) acts on as a ‘representation’ in the physics literature.

When we define a representation we have essentially two options, and we will see examples of both:

REMARK: We only ask a representation to be a homomorphism, i.e. \(r\) need not be injective. This means some aspects of \(G\) can get lost in a representation.

Given that we defined \(SU(n)\) as a group of matrices acting on \(\mathbb{C}^n\), we can just use this as an example of a representation.

Definition 2.3. The defining representation (also called fundamental representation) of \(SU(n)\) is the representation where the matrices defining this group are taken to be themselves: \(r(g)=g\). The fundamental representation of \(SU(n)\) is complex \(n\)-dimensional and is denoted by ‘the \(\mathbf{n}\) of \(SU(n)\)’.

REMARK: We may construct another representation of \(SU(n)\) that is also \(n\)-dimensional and is called ‘the \(\mathbf{\bar{n}}\) of \(SU(n)\)’ by acting with \(\bar{g}\) instead of \(g\). For \(n>2\) this representation is not isomorphic to the \(\mathbf{n}\) representation; for \(n=2\) it is.

Definition 2.4. The defining representation of \(SO(n)\) is the representation where the matrices defining this group are taken to be themselves: \(r(g)=g\). The defining representation of \(SO(n)\) is real \(n\)-dimensional and is denoted by ‘the \(\mathbf{n}\) of \(SO(n)\)’.

Even though matrix groups sit inside vector spaces, they themselves are not vector spaces, so a group action on itself is not a representation (in general). Here is an example of a non-trivial representation that exists for all Lie groups. It exploits the fact that Lie groups come equipped with an intrinsic vector space: each one of them has its own Lie algebra, which is a vector space.

Definition 2.5. The adjoint representation is a map \(Ad: G \rightarrow GL(\mathfrak{g})\) which sends a group element \(g\) to the \(GL(\mathfrak{g})\) element \(Ad(g)\), which is defined by its action on \(\mathfrak{g}\): \[Ad(g): \gamma \mapsto g \gamma g^{-1} \, .\]

REMARK: Don’t get confused, there are two maps we need to distinguish. There is the map \(Ad\) sending \(g\) to the \(GL(\mathfrak{g})\) element \(Ad(g)\), and \(Ad(g)\) is itself a map acting on the vector space \(\mathfrak{g}\). The definition above works by telling us how \(Ad(g)\) acts on any \(\gamma \in \mathfrak{g}\) for given \(g\), i.e. \(\gamma \rightarrow g \gamma g^{-1}\) is the linear map in \(GL(\mathfrak{g})\) that is the image of \(g \in G\) under the representation \(Ad\). That this is well-defined follows directly from Lemma 1.1 part i) where we showed that \(g \gamma g^{-1} \in \mathfrak{g}\). As it is also a linear invertible map, it is hence in \(GL(\mathfrak{g})\). Taken as a vector space, the Lie algebra \(\mathfrak{g}\) of a Lie group \(G\) is hence acted on by a representation of \(G\).

Example 2.1. As \(e^{i\phi} i\theta e^{-i\phi} = i \theta\) for all \(g \in U(1)\), the adjoint representation of \(U(1)\) is trivial.

Example 2.2. The adjoint representation of \(SU(2)\) is precisely the map we used to map it to \(SO(3)\). Recall that it acts on its own Lie algebra here, which is a real three-dimensional vector space, so this makes perfect sense.

Example 2.3. The adjoint representation of \(SU(3)\) acts on a real eight-dimensional vector space: the matrices in \(\mathfrak{su}(3)\) are traceless anti-hermitian \(3\times 3\) matrices. Their number of real components is \(2\) from the diagonal plus \(3 \times 2\) from off-diagonal terms. This is the reason there are eight different gluons in strong interactions.
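To make the last two examples concrete, here is a small numerical sketch (Python, ad hoc names, purely illustrative): it writes \(Ad(g)\) for a random \(g \in SU(2)\) as a \(3\times 3\) matrix in the basis \(\{i\sigma_a\}\) of \(\mathfrak{su}(2)\) and checks that this matrix lies in \(SO(3)\).

```python
import numpy as np
from scipy.linalg import expm

# Ad(g) gamma = g gamma g^{-1}, written as a 3x3 matrix in the basis i sigma_a,
# gives a real orthogonal matrix of determinant one for g in SU(2).

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
basis = [1j * s for s in sigma]

rng = np.random.default_rng(4)
g = expm(1j * sum(a * s for a, s in zip(rng.normal(size=3), sigma)))  # random g in SU(2)

M = np.zeros((3, 3))
for b, t_b in enumerate(basis):
    image = g @ t_b @ np.linalg.inv(g)             # Ad(g) t_b
    for a, t_a in enumerate(basis):
        # coefficient along i sigma_a: tr(sigma_a sigma_b) = 2 delta_ab, hence
        # the expansion coefficient is -(1/2) tr(t_a @ image)
        M[a, b] = np.real(-0.5 * np.trace(t_a @ image))

assert np.allclose(M.T @ M, np.eye(3))
assert np.isclose(np.linalg.det(M), 1.0)
print("Ad(g) acts on su(2) as an SO(3) matrix")
```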

2.1. Writing \(\mathbb{R}^3\) as \[M_{\boldsymbol{v}} = \begin{pmatrix} v_3 & v_1 - i v_2 \\ v_1 + i v_2 & -v_3 \end{pmatrix}\, .\] we considered the action of \(g \in SU(2)\) on \(\mathbb{R}^3\) defined by \[F(g): M_{\boldsymbol{v}} \mapsto g M_{\boldsymbol{v}}g^\dagger\] in the lectures. Show that this is a representation, and that this representation is the adjoint representation of \(SU(2)\).

2.2. Let \({\bf q} \in \mathbb{C}^n\) be acted on in the fundamental representation of \(SU(n)\) and \(\gamma\) in the adjoint representation of \(SU(n)\) (this is often expressed as \({\bf q}\) ‘lives’ in the fundamental and \(\gamma\) ‘lives’ in the adjoint of \(SU(n)\)).

Describe the action of \(SU(n)\) on

  • \({\bf v}= \gamma {\bf q}\)

  • \(\bar{\bf q}\)

  • A matrix \(Q\) with components \(Q_{ij} = q_i q_j\)

and decide in each case if this defines a representation.

2.3. Let \(g \in SO(3)\) be given by \[g = \begin{pmatrix} \cos \phi & \sin{\phi}& 0 \\ - \sin(\phi) & \cos(\phi) & 0 \\ 0 & 0 &1 \end{pmatrix}\, .\] Find the action of \(g\) in the adjoint representation and describe it using a basis of the vector space \(\mathfrak{so}(3)\). As \(\mathfrak{so}(3)\) is the same as \(\mathbb{R}^3\), we can describe its elements as column vectors after having chosen a basis. Using the basis you have chosen, write the adjoint action as a \(3 \times 3\) matrix acting on a column vector.

Definition 2.6. A representation is called faithful if \(r\) is injective.

Example 2.4. We have seen examples of \(SU(2)\) acting (faithfully) on \(\mathbb{C}^2\) in Example 1.2 and (non-faithfully) on \(\mathbb{R}^3\) in example [ex:SO3vsSU2].

Example 2.5. We can act with \(SU(2)\) (faithfully) on \(\mathbb{C}^{4}\) by using the block-diagonal representation \[r:g \rightarrow \begin{pmatrix} g & 0\\ 0 & g \end{pmatrix}\]

Clearly this seems a bit redundant and we want to distinguish between such cases and those that truly give us something new. One way to phrase this is in terms of invariant subspaces.

Definition 2.7. A subspace \(W \subseteq V\) is called invariant if \(r(g)w \in W\) for all \(g \in G\) and all \(w \in W\).

Example 2.6. Coming back to Example 2.5, we can decompose \(V = \mathbb{C}^2 \oplus \mathbb{C}^2\) and each of the two summands is an invariant subspace.

Definition 2.8. A representation \(r:G\rightarrow GL(V)\) is irreducible if the only invariant subspaces are \(V\) and \(\{0\}\). Otherwise it is called reducible.

2.4. Let \(G\) be a Lie group and \(H\) be a subgroup of \(G\) that is also a Lie group.

  1. Explain why any representation \(r(G)\) of \(G\) also gives us a representation \(r(H)\) of \(H\).

  2. Let’s assume \(r(G)\) is irreducible. Can you think of an example where the representation \(r(H)\) is reducible? Can you think of an example where the representation \(r(H)\) is irreducible?

2.5. Let \(P\) be a homogeneous polynomial in two complex variables \(z_1\) and \(z_2\) of degree \(d\), i.e. we can write \[P(\boldsymbol{z}) = \sum_{k=0}^d \alpha_k z_1^k z_2^{d-k}\] for complex numbers \(\alpha_k\).

There is a natural action of \(SU(2)\) on \(\boldsymbol{z}= (z_1,z_2)\), which is just \[\boldsymbol{z} \mapsto g \boldsymbol{z}\, .\]

For a polynomial \(P(\boldsymbol{z})\), we can then define an action by \(SU(2)\) as \[r_d(g): P(\boldsymbol{z}) \mapsto P(g^{-1 }\boldsymbol{z}) \, .\] Show that this defines a representation of \(SU(2)\).

[remark: in the above formula, \(g^{-1 }\) does not act on the argument of \(P\) but on \(\boldsymbol{z}\), i.e. the action on \(P(A\boldsymbol{z})\) for a \(2 \times 2\) matrix \(A\) would be \(r_d(g): P(A\boldsymbol{z}) \mapsto P(Ag^{-1 }\boldsymbol{z})\). ]

Example 2.7. Mapping all \(g\in G\) to \(\mathds{1}\in GL(V)\) is a group homomorphism, so this trivial representation always exists. This is as un-faithful as possible and reducible: every subspace of \(V\) is an invariant subspace. Objects transforming in this representation are called scalars or singlets. They are often referred to as ‘living in the \(\mathbf{1}\) of \(G\)’.

Definition 2.9. A representation \(r:G\rightarrow GL(V)\) is unitary if \(V\) has an inner form \(\langle.,.\rangle\) and \(\langle x,y\rangle = \langle r(g)x,r(g)y\rangle\) for all \(g \in G\) and all \(x,y \in V\).

Example 2.8. The fundamental representation of \(SU(n)\) is faithful, irreducible, and unitary.

Given that we have introduced most matrix groups as preserving some inner form, this seems like a natural concept. Its power lies in the following

Theorem 2.1. Let \(r:G\rightarrow GL(V)\) be a finite-dimensional unitary representation. Then it can be completely decomposed into irreducible representations \(r_i(G)\): \[r(G) = \bigoplus_i r_i(G)\,\,, \hspace{1cm} V = \bigoplus_i V_i \,\,, \hspace{1cm} r_i(G) \in GL(V_i)\, .\]

If you like, you can think of \(r(G)\) as respecting the same block-diagonal form for all \(g \in G\) in an appropriate basis of \(V\). We have already seen a reducible representation that can be decomposed into irreducible ones, see Example 2.5.

:

Let \(r(G)\) be a reducible representation (otherwise there is nothing to prove) and consider any of its invariant subspaces \(W\). The main step of the proof is to show that the orthogonal complement \[W^\perp := \{ v \in V | \langle v,w \rangle = 0\,\, \forall w \in W \}\] is an invariant subspace as well.

For any \(r(g)\) we can define its dual \(r^*(g)\) by \[\langle v, r(g) u \rangle = \langle r^*(g) v, u \rangle\, .\] for all \(v,u \in V\). It follows that \[\langle v, u \rangle = \langle r(g) v, r(g) u \rangle = \langle r^*(g)r(g) v, u \rangle\] so that \(r^*(g)r(g) =\mathds{1}\).

Now for all \(w \in W\), \(v \in W^\perp\) and all \(g \in G\) we have \[0 = \langle v,w \rangle = \langle v, r(g) w \rangle = \langle r^*(g) v, w \rangle = \langle (r(g))^{-1} v, w \rangle\] where \((r(g))^{-1}\) is the inverse of the matrix \(r(g) \in GL(V)\). As every element in \(G\) has an inverse and \((r(g))^{-1} = r(g^{-1})\) (as shown in problems), we can just write \[0 = \langle r(g) v, w \rangle\] for all \(w \in W\), \(v \in W^\perp\) and all \(g \in G\). This means that acting with any \(g\) on \(v \in W^\perp\) keeps us in \(W^\perp\). Hence \(W^\perp\) is an invariant subspace as well, which is what we wanted to show.

Now we can decompose \[r(G) = r_W(G) \oplus r_{W^\perp}(G)\,\,, \hspace{1cm} V = W \oplus W^\perp \,\,.\] as both \(W\) and \(W^\perp\) are invariant subspaces. If both \(r_W(G)\) and \(r_{W^\perp}(G)\) are irreducible we are done. Otherwise, we can simply run the same argument again to achieve a finer decomposition. This iteration must terminate as \(V\) is finite dimensional. \(\square\)

For unitary representations all is hence nice and easy. But what can we do when we do not have an inner form that is respected by \(r(g)\)? Using ‘Weyl’s unitarity trick’ we can just cook one up (if \(G\) is compact)!

Theorem 2.2. Let \(G\) be a compact Lie group and \(r(G)\) a finite-dimensional representation on a vector space with inner form \(\langle .,. \rangle\). Then there exists an inner form that is invariant under \(r(G)\) and hence the same statement as in Theorem 2.1 holds.

: Let \(\langle . , .\rangle\) be some inner form on \(V\). As \(G\) is a compact group, \(\langle r(g) v, r(g) w \rangle\) is bounded for fixed \(v,w\): if this expression diverged as \(g \rightarrow \hat{g}\), such a \(\hat{g}\) could not be in \(G\). But it follows from \(G\) being closed and bounded that any sequence of group elements \(g_i \in G\) has a convergent subsequence whose limit is also in \(G\). Hence there must be a maximal value of \(\langle r(g) v, r(g) w \rangle\) for fixed \(v\) and \(w\) and we can use that as the bound.

Furthermore, \(G\) is some bounded subspace of \(\mathbb{R}^m\) for some \(m\) for the matrix Lie groups we are treating in these lectures, and as such has a finite volume. We can then integrate a bounded function over it and receive a finite answer. In particular, we can use any realization of \(G\) as a subset of \(\mathbb{R}^m\) to define \[\langle v,w \rangle_G := \int_G \langle r(g)v,r(g)w \rangle dV \, .\] What is happening here is that we are averaging over the action of the group on \(\langle v,w \rangle\). Let’s act with a group element \(h\) on \(v\) and \(w\): \[\langle r(h)v,r(h)w \rangle_G = \int_G \langle r(g)r(h)v,r(g)r(h)w \rangle dV = \int_G \langle r(gh)v,r(gh)w \rangle dV \, ,\] where we have used that \(r\) is a group homomorphism. Now if \(g\) sweeps out the whole group, so does \(gh\) for any \(h \in G\). In particular, every group element \(g'\) can be uniquely written as \(g' = gh\) for some \(g\), just take \(g = g'h^{-1}\). Hence (choosing the volume element \(dV\) to be invariant under this substitution, which can be done for compact groups) \[\langle r(h)v,r(h)w \rangle_G = \int_G \langle r(gh)v,r(gh)w \rangle dV = \int_G \langle r(g')v,r(g')w \rangle dV = \langle v,w \rangle_G \, ,\] and we are done. \(\square\)
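To see the averaging at work, here is a minimal numerical sketch for \(G=U(1)\) (Python; the conjugating matrix \(S\) is an arbitrary non-unitary choice, and the integral is replaced by a discrete average over the circle): the standard inner form is not invariant, but the averaged one is.

```python
import numpy as np

# Weyl's unitarity trick for U(1), illustrated numerically.

S = np.array([[1.0, 2.0], [0.0, 1.0]])            # arbitrary non-unitary change of basis
S_inv = np.linalg.inv(S)

def r(phi):                                        # representation of e^{i phi} on C^2
    return S @ np.diag([np.exp(1j * phi), np.exp(-1j * phi)]) @ S_inv

phis = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)   # integration grid

def averaged_form(v, w):                           # <v, w>_G = average of <r v, r w>
    return np.mean([np.vdot(r(p) @ v, r(p) @ w) for p in phis])

rng = np.random.default_rng(5)
v = rng.normal(size=2) + 1j * rng.normal(size=2)
w = rng.normal(size=2) + 1j * rng.normal(size=2)
h = 0.73                                           # an arbitrary group element e^{ih}

print(np.vdot(r(h) @ v, r(h) @ w), np.vdot(v, w))  # the standard form: generally differ
assert np.isclose(averaged_form(r(h) @ v, r(h) @ w), averaged_form(v, w))
print("the averaged inner form is invariant under the U(1) action")
```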
An important feature of complex irreducible representations is Schur’s lemma:

Theorem 2.3. Let \(r\) be an irreducible representation of \(G\) on a finite-dimensional complex vector space \(V\), and let \(T:V \rightarrow V\) be a linear map such that \[r(g) T = T\,\, r(g)\] for all \(g \in G\). Then

  1. \(T=0\), or

  2. \(T= c \mathds{1}\) for some complex number \(c\).

: First observe that \(\ker T\) is an invariant subspace: if \(v \in \ker T\) we have \[0 = Tv = r(g)Tv = T r(g) v\] so \(r(g) v \in \ker T\) as well. As we have assumed that \(r\) is irreducible, \(\ker T = V\) or \(\ker T =\{0\}\). If \(\ker T = V\) it follows that \(T=0\), so the first case is realized and we are done.
Let us hence assume from now on that \(\ker T =\{0\}\). As a complex matrix, \(T\) has at least one eigenvalue, which is non-zero since \(\ker T = \{0\}\); let that eigenvalue be \(c\) and the associated eigenvector be \(v_c\). Now consider the map \(\hat{T} : = T - c \mathds{1}\) for which \(v_c \in \ker \hat{T}\). We have \[r(g) \hat{T} = \hat{T} r(g)\] as the identity commutes with every matrix. Now we can again observe as above that \(\ker \hat{T}\) is an invariant subspace and hence must be \(\{0\}\) or \(V\). We already know that \(\ker \hat{T} \neq \{0\}\), so it must be that \(\ker \hat{T} = V\) which implies \(\hat{T}=0\), i.e. \(T = c \mathds{1}\). \(\square\)

2.2 Representations of \(U(1)\)

The complex representations of \(U(1)\) can be found by more or less elementary considerations.

Theorem 2.4. Complex irreducible representations of \(\,U(1)\) are all unitary, take \(U(1)\) to \(GL(1,\mathbb{C})\), i.e. act on \(\mathbb{C}\), and only depend on an integer \(n\). For \(g = e^{i \, \phi}\) we can write the homomorphism \(f_n:U(1) \rightarrow GL(1,\mathbb{C})\) as \[\begin{equation} \label{eq:u1repstartingpoint} f_n(g) = g^n = e^{in\, \phi} \, . \end{equation}\]

: We first show that all complex irreducible representations of \(U(1)\) are one-dimensional. Consider such a representation \(f(g)\) of \(U(1)\) for which \(f(g) \in GL(m,\mathbb{C})\). As \(U(1)\) is an abelian group we have \[f(g)f(h) = f(gh) = f(hg) = f(h) f(g)\, .\] for all \(g,h \in U(1)\). Now let us fix \(h\) for the time being. Then we can set \(r(g) = f(g)\) in Schur’s lemma, and use \(T = f(h)\) to conclude that \(f(h)\) must be proportional to the identity map. This is true for all \(h \in U(1)\) (different \(h\) might produce different \(c\) however), which implies that the representation \(f(g)\) is one-dimensional: any subspace of \(\mathbb{C}^m\) is an invariant subspace, and the only subspaces giving irreducible representations are complex one-dimensional ones.

As \(f\) is a homomorphism we need \[f(hg) = f(h) f (g)\, .\] Differentiating both sides w.r.t. \(h\) and setting \(h=1\) we find \[g f'(g) = f'(1)f(g)\, .\] Let us denote the constant \(f'(1)\) by \(n = f'(1)\). The unique solution to the above differential equation that satisfies \(f(1) = 1\) (a consequence of \(f\) being a group homomorphism) is \[f(g) = g^n = e^{i\phi n} \, .\] Letting \(\phi = 2\pi\) yields \(g=1\), so that \(f(1) = 1\) additionally requires \(e^{2 \pi i n}=1\), i.e. \(n \in \mathbb{Z}\). For \(f(g)\) we have \(f(g)^\dagger = f(g)^{-1}\), so these are all unitary (using the standard inner form on \(\mathbb{C}\)). \(\square\)

REMARK: The integer \(n\) is often called the ‘charge’ in physics, and you will see later that it deserves this name when we study electromagnetism, but this also shows up in other contexts with a \(U(1)\) action. The realization that \(n\) is an integer has the profound consequence that charges are quantized, i.e. they are multiples of some fundamental charge (corresponding to \(n=1\)). This doesn’t explain why protons and electrons have charges of equal magnitude, but it already implies that their charges must satisfy \[\frac{q_{\mbox{\footnotesize electron}}}{q_{\mbox{\footnotesize proton}}} = \frac{n_{\mbox{\footnotesize electron}}}{n_{\mbox{\footnotesize proton}}}\,\] i.e. the ratio must be a rational number.

2.6.

  1. Describe a \(U(1)\) subgroup of \(SU(2)\). Is \(U(1) \times U(1)\) a subgroup of \(SU(2)\) as well?

  2. Let \(A\) be an element of the vector space that is acted on by the adjoint representation of \(SU(2)\). For the \(U(1)\) subgroup of \(SU(2)\) you identified above, find the action on \(A\) and use this to decompose the action of \(U(1)\) into irreducible representations.

2.7. Consider the map \(r_\kappa:U(1) \rightarrow GL(3,\mathbb{C})\) defined by \[r_\kappa(e^{i\phi}) = e^{\phi \lambda \kappa}\] where \(\kappa \in \mathbb{C}\) and \[\lambda = \begin{pmatrix} 0 & i & 0 \\ i & 0 & i \\ 0 & i & 0 \end{pmatrix}\]

For which values of \(\kappa\) is \(r_\kappa\) a representation of \(U(1)\)? [hint: think about what happens to eigenvectors of \(\lambda\)]

2.3 Representations of Lie algebras

If we have found a representation of a Lie group \(G\) on some vector space \(V\), every element \(g \in G\) is assigned an element \(r(g) \in GL(V)\). We can think the same way about representations of Lie algebras. The only difference is that we want to preserve the algebra structure.

Definition 2.10. A Lie algebra homomorphism is a linear map \(f: \mathfrak{g} \rightarrow \mathfrak{h}\) between Lie algebras \(\mathfrak{g}\) and \(\mathfrak{h}\) such that \([f(\gamma),f(\delta)] = f([\gamma,\delta])\).

Definition 2.11. A representation of a Lie algebra \(\mathfrak{g}\) is a Lie algebra homomorphism \(\rho: \mathfrak{g}\rightarrow \mathfrak{gl}(V)\) for a finite-dimensional vector space \(V\).

Definition 2.12. A representation of a Lie algebra \(\mathfrak{g}\) is called reducible if there exists an invariant subspace, i.e. there exists a \(W \subset V\) with \(W \neq \{0\}\) and \(W \neq V\) s.t. \[\rho(\gamma) w \in W\,\,\, \forall w \in W\,\, \forall \gamma \in \mathfrak{g} \, .\]

In the same way that a path \(g(t)\) passing through \(\mathds{1}\) determines an element \(\gamma\) of the Lie algebra, we can use \(r(g(t))\) to determine an associated representation \(\rho(\gamma)\): all we need to do is consider \(r(g(t))\) instead of \(g(t)\) and do the same computation.

Proposition 2.1. Given a finite-dimensional representation \(r\) of a Lie group \(G\), there is a unique associated representation \(\rho\) of its Lie algebra \(\mathfrak{g}\) such that \[\begin{equation} \label{eq:algrepfromgrouprep} r(e^{t\gamma}) = e^{t \rho(\gamma)}\, , \end{equation}\] we can compute this by working out \[\begin{equation} \label{eq:algrepfromgrouprep2} \rho(\gamma) = \left. \frac{\partial}{\partial t} r\left(e^{t \gamma}\right) \right|_{t=0}\, . \end{equation}\]

: First of all \(\eqref{eq:algrepfromgrouprep2}\) shows how to compute the map \(\rho(\gamma)\) from \(r(g)\), i.e. it is uniquely given once \(r(g)\) is fixed.

Next we check that we satisfy the definition. Consider the path \[g(t) :=e^{t\gamma} e^{t\delta}\, .\] On the one hand we have \[\left. \frac{\partial}{\partial t} r\left(g(t)\right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t\gamma} e^{t\delta} \right) \right|_{t=0} =\left. \frac{\partial}{\partial t} r\left( e^{t\gamma}\right) r \left( e^{t\delta} \right) \right|_{t=0} = \rho(\gamma) + \rho(\delta)\, .\] Now consider \[\begin{aligned} e^{t\gamma} e^{t\delta} &= \sum_{k} \frac{(t\gamma)^k}{k!}\sum_{l} \frac{(t\delta)^l}{l!} = (1 + t \gamma + t^2 \gamma^2/2 + \cdots) (1 + t \delta + t^2 \delta^2/2 + \cdots) \\ & = \left(\sum_k \frac{t^k(\gamma+\delta)^k}{k!}\right) + \frac{t^2}{2} (\gamma \delta - \delta \gamma) + t^3 (\cdots) + \cdots \, . \end{aligned}\] Such a relation can be given more concisely as what is called the ‘Baker-Campbell-Hausdorff formula’, see for details.

Hence \[\begin{aligned} \left. \frac{\partial}{\partial t} r\left( e^{t\gamma} e^{t\delta} \right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t(\gamma +\delta)}+t^2(...) \right) \right|_{t=0} = \left. \frac{\partial}{\partial t} r\left( e^{t(\gamma +\delta)} \right) \right|_{t=0} = \rho(\gamma + \delta)\, . \end{aligned}\] In the first step, we have used the relation established above, the second step is just the chain rule, and the third step uses the definition of \(\rho\), \(\eqref{eq:algrepfromgrouprep2}\).

Hence we have shown that \[\rho(\gamma) + \rho(\delta) = \rho(\gamma + \delta) \, .\] Furthermore, using the chain rule we have that \[\rho(c \gamma) = \left. \frac{\partial}{\partial t} r\left( e^{tc\gamma} \right) \right|_{t=0} = c \left. \frac{\partial}{\partial s} r\left( e^{s\gamma} \right) \right|_{s=0} = c \rho(\gamma)\] Hence we have shown that \(\rho\) is a linear map on \(\mathfrak{g}\).

Now we check that it respects the algebra \([\cdot,\cdot]\), i.e. is a Lie algebra homomorphism. Recall that we showed earlier that \[e^{tg\gamma g^{-1}} = ge^{t\gamma}g^{-1}\] so we find \[r\left(e^{tg\gamma g^{-1}} \right)= r\left(ge^{t\gamma}g^{-1}\right) = r\left(g\right) r \left( e^{t\gamma}\right) r(g^{-1}) \, .\] Taking a derivative w.r.t \(t\) on both sides and then setting \(t=0\) we get \[\rho(g \gamma g^{-1}) = r(g) \rho(\gamma) r(g^{-1})\,.\] Now comes the final trick: we set \(g=e^{t \delta}\) in the above equation, take another derivative w.r.t. \(t\) and set \(t=0\) again. The rhs becomes \[\left. \frac{\partial}{\partial t}\, r(e^{t \delta})\, \rho(\gamma)\, r(e^{-t \delta}) \right|_{t=0} = \rho(\delta)\rho(\gamma) - \rho(\gamma)\rho(\delta) = [\rho(\delta),\rho(\gamma)]\] For the lhs, recall that \(\rho\) is a linear map between vector spaces, so that \[\frac{\partial}{\partial t} \rho\left(\kappa(t)\right) = \rho\left( \frac{\partial}{\partial t} \kappa(t)\right)\, .\] The lhs hence becomes \[\left. \frac{\partial}{\partial t}\rho(e^{t \delta} \gamma e^{-t \delta}) \right|_{t=0} = \left. \rho\left( \frac{\partial}{\partial t} e^{t \delta} \gamma e^{-t \delta}\right) \right|_{t=0} = \rho([\delta,\gamma])\] Hence we have shown that \[\rho([\delta,\gamma]) = [\rho(\delta),\rho(\gamma)]\, ,\] i.e. we have defined a Lie algebra homomorphism. \(\square\)

REMARK: The converse to the above theorem is not always true, i.e. given a representation \(\rho\) of a Lie algebra, there does not need to be a group representation that relates to it via \(\eqref{eq:algrepfromgrouprep}\). We have actually encountered this before. Recall that \(SO(3)\) and \(SU(2)\) have isomorphic Lie algebras. Hence the defining representation of \(\mathfrak{su}(2)\) can also be thought of as a (Lie algebra) representation of \(\mathfrak{so}(3)\). However, if I exponentiate the Lie algebra \(\mathfrak{su}(2)\), I do not get a representation of \(SO(3)\) but just \(SU(2)\) in the defining representation. We will examine this in a little more detail later.

2.8. Consider the Lie group \(G\) of upper triangular \(2 \times 2\) matrices \[G = \left\{\begin{pmatrix} a & b \\ 0 & c \end{pmatrix}| a,b,c \in \mathbb{R}, ac \neq 0 \right\}\]

  1. Let \(\boldsymbol{v} \in \mathbb{R}^3\), \(\boldsymbol{v} = (v_1,v_2,v_3)\). Define an action of \(G\) on \(\boldsymbol{v}\) by writing \[v_m := \begin{pmatrix} v_1 & v_2 \\ 0 & v_3 \end{pmatrix}\] and letting \(g \in G\) act as \[r(g)v_m := g v_m g^{-1} \, .\] Convince yourself that this is a representation of \(G\). Write the action of \(g\) on \(\boldsymbol{v}\) defined above in terms of a \(3 \times 3\) matrix acting on \(\boldsymbol{v}\): \[r(g) \boldsymbol{v} = M(g) \boldsymbol{v}\] for a \(3 \times 3\) matrix \(M(g)\) acting on the vector \(\boldsymbol{v} \in\mathbb{R}^3\) in the usual way.

  2. Writing elements of the representation \(r(G)\) in terms of the matrices \(M(g)\), work out the associated representation \(\rho\) of the Lie algebra \(\mathfrak{g}\) of \(G\).

  3. Check that they obey the same Lie algebra as the Lie algebra \(\mathfrak{g}\) of \(G\) (see problem 20), i.e. find a bijective Lie algebra homomorphism between the Lie algebra \(\mathfrak{g}\) of \(G\) and the Lie algebra representation \(\rho(\mathfrak{g})\) associated with \(r(G)\).

Example 2.9. The adjoint representation of the Lie algebra is a map \(ad: \mathfrak{g} \rightarrow GL(\mathfrak{g})\) which maps \(\delta \in \mathfrak{g}\) to a linear map acting on \(\mathfrak{g}\) that acts on \(\gamma \in \mathfrak{g}\) as \[\begin{equation} \label{eq:adjoint_action} ad(\delta): \gamma \rightarrow [\delta,\gamma] \, , \end{equation}\] i.e. \(ad(\delta)\) is in \(GL(\mathfrak{g})\). We can work out that this representation is associated to the usual adjoint representation of the group, Definition \(\eqref{def:adjoint}\), using \(\eqref{eq:algrepfromgrouprep2}\): \[ad(\delta) (\gamma) = \left. \frac{\partial}{\partial t} Ad(e^{t\delta}) (\gamma) \right|_{t=0} = \left. \frac{\partial}{\partial t} e^{t \delta} \gamma e^{-t\delta} \right|_{t=0} = [\delta,\gamma]\, .\]
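The following is a minimal numerical check (in Python) of this last computation; the elements \(\delta, \gamma \in \mathfrak{su}(2)\) below are arbitrary choices made for this sketch, and the derivative is again taken by a finite difference.

```python
import numpy as np
from scipy.linalg import expm

# Check that the derivative of Ad(e^{t delta})(gamma) = e^{t delta} gamma e^{-t delta}
# at t = 0 reproduces the commutator [delta, gamma], for an example in su(2).
sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)
sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)
delta, gamma = 0.5j * sigma3, 0.5j * sigma1   # elements of su(2)

def Ad(g, x):
    return g @ x @ np.linalg.inv(g)

eps = 1e-6
derivative = (Ad(expm(eps * delta), gamma) - Ad(expm(-eps * delta), gamma)) / (2 * eps)
print(np.allclose(derivative, delta @ gamma - gamma @ delta, atol=1e-8))  # True
```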

Earlier, we mentioned that we can characterise Lie algebras by their structure constants \(f_{ab}{}^c\) once a basis \(\{t_a\}\) was chosen, \(\eqref{eq:structure_constants}\).

Proposition 2.2. The structure constants define a representation by setting \[\begin{equation} \label{eq:adjoint_structure_c} \left(\rho_{adj}(t_a)\right)^{b}{}_c = f_{ac}{}^b \, . \end{equation}\] This representation is the adjoint representation written in the basis \(\{t_a\}\), i.e. the adjoint action in the basis \(\{t_a\}\) is given by matrices \(\rho_{adj}(t_a)\) with components \(\rho_{adj}(t_a)^b{}_c = f_{ac}{}^b\).

Proof: see the following exercise.

2.9.

  1. Check that \(\eqref{eq:adjoint_structure_c}\) defines a representation of \(\mathfrak{g}\).

  2. Show that the adjoint action in the basis \(\{t_a\}\) is given by the matrices \(\rho_{adj}(t_a)\) with components \(f_{ac}{}^b\) by showing that \[\begin{equation} \label{ad_vs_adj} ad(t_a)(\gamma^b t_b) = \left(\rho_{adj}(t_a)\right)^{b}{}_c \gamma^c t_b~. \end{equation}\]
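Returning to Proposition 2.2, here is a small numerical sanity check for \(\mathfrak{su}(2)\) (a sketch only: the basis \(t_a = \tfrac{i}{2}\sigma_a\) and the way the structure constants are extracted are choices made here, and checking one algebra numerically does not replace the general argument of the exercise).

```python
import numpy as np

# Basis t_a = i sigma_a / 2 of su(2) (a choice made for this sketch).
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [0.5j * s for s in sigma]

# structure constants from [t_a, t_b] = f_{ab}^c t_c : expand each
# commutator in the basis by solving a small linear system
basis = np.stack([x.flatten() for x in t], axis=1)          # columns = basis vectors
f = np.zeros((3, 3, 3))
for a in range(3):
    for b in range(3):
        comm = (t[a] @ t[b] - t[b] @ t[a]).flatten()
        sol, *_ = np.linalg.lstsq(basis, comm, rcond=None)
        f[a, b] = sol.real                                   # real for su(2)

# adjoint representation: (rho_adj(t_a))^b_c = f_{ac}^b
rho_adj = [f[a].T for a in range(3)]   # rows = upper index b, columns = lower index c

# check it reproduces the same brackets: [rho(t_a), rho(t_b)] = f_{ab}^c rho(t_c)
ok = all(np.allclose(rho_adj[a] @ rho_adj[b] - rho_adj[b] @ rho_adj[a],
                     sum(f[a, b, c] * rho_adj[c] for c in range(3)))
         for a in range(3) for b in range(3))
print(ok)  # True
```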

Theorem 2.5. Let \(r\) be a complex representation of a compact Lie group \(G\) acting on \(V\), and let \(\rho\) be the associated Lie algebra representation. Writing the basis elements of the Lie algebra \(\mathfrak{g}\) of \(G\) as \(\{i t_a \}\), we can choose a basis of \(V\) such that \(\rho(t_a)\) are Hermitian matrices, \(\rho(t_a)^\dagger = \rho(t_a)\).

Proof:

As we have seen when using Weyl’s unitarity trick, we can choose an inner form \(\langle .,.\rangle\) on the complex vector space \(V\) such that \[\langle r(g) v,r(g) w\rangle =\langle v, w\rangle\, .\] As this is an inner form on a complex vector space, we have that for any \(c \in \mathbb{C}\) \[\langle v,c w\rangle = c \langle v, w\rangle \hspace{1cm} \langle c v,w\rangle = \bar{c} \langle v, w\rangle \,.\]

With this inner form, we can choose a basis \(e_i\) on \(V\) such that \[\langle e_i , e_j \rangle = \delta_{ij} \, .\] Using this basis, we can write \(r(g)\) as matrices \(r(g)_{ij}\).

For \(v = v_i e_i\) and \(w = w_i e_i\) we now work out \[\begin{aligned} \bar{v}_i w_i = \langle v , w \rangle &= \langle r(g) v_i e_i , r(g) w_j e_j \rangle = \langle r(g)_{ik} v_k e_i , r(g)_{jl} w_l e_j \rangle \\ &= \overline{r(g)_{ik} v_k} r(g)_{jl} w_l \langle e_i , e_j \rangle = \bar{v}_k r^\dagger(g)_{ki} r(g)_{jl} w_l \delta_{ij} = \bar{v}_k r^\dagger(g)_{ki} r(g)_{il} w_l \end{aligned}\] As this holds for arbitrary \(v\) and \(w\), we conclude that \(r^\dagger(g)_{ki} r(g)_{il} = \delta_{kl}\), so that \(r(g)^\dagger = r(g)^{-1}\).

As we have assumed that \(\{i t_a \}\) is a basis of \(\mathfrak{g}\), we can write \[g = e^{i t_a \gamma^a}\] for some real numbers \(\gamma^a\). By definition this implies that \[r(g) = e^{i \gamma^a \rho(t_a)}\, .\] Now \(r(g)^\dagger = r(g)^{-1}\) gives us \(\rho(\gamma)^\dagger = \rho(\gamma)\) for \(\gamma = \gamma^a t_a\), and as the \(\gamma^a\) are arbitrary this means \(\rho(t_a)^\dagger = \rho(t_a)\). \(\square\)
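To see the statement of Theorem 2.5 in a concrete case, one can check numerically that for the example representation \(r(g) = g \otimes g\) of \(SU(2)\) used in the sketch above, the \(\rho(t_a)\) are Hermitian in the standard basis and the \(r(g)\) are unitary (again just an illustrative sketch, with \(t_a = \sigma_a/2\) an assumed choice of basis).

```python
import numpy as np
from scipy.linalg import expm

# Hermitian generators and unitary r(g) for the example r(g) = g (x) g of SU(2).
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [0.5 * s for s in sigma]                      # so that {i t_a} spans su(2)

I2 = np.eye(2)
rho_t = [np.kron(x, I2) + np.kron(I2, x) for x in t]    # rho(t_a)

print(all(np.allclose(R, R.conj().T) for R in rho_t))   # Hermitian: True

rg = expm(1j * (0.3 * rho_t[0] + 1.1 * rho_t[2]))       # r(e^{i gamma^a t_a})
print(np.allclose(rg.conj().T @ rg, np.eye(4)))         # unitary: True
```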

2.4 Representations of \(SU(2)\) and \(SO(3)\)

2.4.0.1 Representations of \(SU(2)\)

Here is a neat way to explicitly construct representations of \(SU(2)\), and as we will see, it gives us all the irreducible ones. \(SU(2)\) naturally acts on \(\mathbb{C}^2\) in the fundamental representation. Given some complex polynomial \(P(\boldsymbol{z})\) in two variables \(\boldsymbol{z}= (z_1,z_2)\), we can then let \(SU(2)\) act on \(P(\boldsymbol{z})\) through its argument. This is particularly nice if \(P(\boldsymbol{z})\) is a homogeneous polynomial of degree \(d\), i.e. we can write \[P(\boldsymbol{z}) = \sum_{k=0}^d a_k z_1^k z_2^{d-k} \, .\] The space of such polynomials is a vector space \(\Pi_d\) of dimension \(d+1\). You can think of the \(a_k \in \mathbb{C}\) as the components of the vector and the monomials as the basis vectors. Letting \(SU(2)\) act on \(\mathbb{C}^2\), we have a corresponding induced action on the vector space of polynomials.

Proposition 2.3. The map \[\left(r_d(g) P\right)(\boldsymbol{z}) := P(g^{-1}\boldsymbol{z} ) \, ,\] where \(g^{-1} \in SU(2)\) acts on \(\boldsymbol{z} = (z_1,z_2)\) as \[\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} \rightarrow g^{-1} \begin{pmatrix} z_1 \\ z_2 \end{pmatrix}\, ,\] defines a representation \(r_{d}\) of \(SU(2)\) on the complex vector space \(\Pi_d\) of dimension \(d+1\).

Proof: see exercises.

We can now figure out the representations \(\rho_d\) of \(su(2)\) that are associated with the \(r_d\) described in Proposition 2.3. Let us choose \(\ell_j \equiv \frac{i}{2}\sigma_j\) as the generators of the Lie algebra \(su(2)\): \[\ell_1 = \tfrac12 \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix} \,\,, \hspace{.3cm} \ell_2 = \tfrac12 \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \,\,, \hspace{.3cm} \ell_3 = \tfrac12 \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}\, .\] Their action on the monomials \(z_1^k z_2^{d-k}\) is then (see problem class 4): \[\label{eq:ellkactionrd} \begin{align} \ell_1 &: z_1^k z_2^{d-k} \rightarrow -\frac{i}{2}\left(k z_1^{k-1}z_2^{d-k+1} +(d-k)z_1^{k+1}z_2^{d-k-1} \right)\\ \ell_2 &: z_1^k z_2^{d-k} \rightarrow \frac{1}{2}\left(-k z_1^{k-1}z_2^{d-k+1} +(d-k)z_1^{k+1}z_2^{d-k-1}\right) \\ \ell_3 &: z_1^k z_2^{d-k} \rightarrow i(d/2-k) z_1^k z_2^{d-k} \end{align}\]
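As a cross-check of these formulas (they are derived in problem class 4), here is a short symbolic computation of the \(\ell_3\) action; the degree \(d=4\) is an arbitrary choice, and the check uses that \(e^{t\ell_3}\) is diagonal since \(\ell_3\) is.

```python
import sympy as sp

# Symbolic check (a sketch) of the ell_3 action on the monomials z1^k z2^(d-k):
# r_d(e^{t ell_3}) P = P(e^{-t ell_3} z), with e^{-t ell_3} = diag(e^{-it/2}, e^{it/2}).
z1, z2, t = sp.symbols('z1 z2 t')
d = 4                                           # arbitrary choice of degree

w1 = sp.exp(-sp.I * t / 2) * z1                 # e^{-t ell_3} acting on (z1, z2)
w2 = sp.exp(sp.I * t / 2) * z2

for k in range(d + 1):
    P = z1**k * z2**(d - k)
    Pg = P.subs({z1: w1, z2: w2}, simultaneous=True)    # (r_d(e^{t ell_3}) P)(z)
    action = sp.diff(Pg, t).subs(t, 0)                   # derivative at t = 0
    expected = sp.I * (sp.Rational(d, 2) - k) * P
    assert sp.simplify(action - expected) == 0
print("ell_3 acts as i(d/2 - k) on z1^k z2^(d-k)")
```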

Theorem 2.6. For every integer \(d \geq 0\), there is a single finite-dimensional irreducible representation \(r_d\) of \(SU(2)\) on a complex vector space \(\Pi_d\) of dimension \(d+1\). These are all of the complex irreducible finite-dimensional representations of \(SU(2)\).

2.4.0.2 Representations of \(SU(2)\): proof of the main theorem \(^\ast\)

Before taking on the theorem, let me prove a little lemma that will be quite useful:

Lemma 2.1. Let \(r\) be a complex representation of \(SU(2)\) acting on \(V\). Then all eigenvalues of \(\rho(\sigma_i)\), for \(\rho\) the associated representation of \(\mathfrak{su}(2)\), are real.

Proof (of the lemma):

Let us denote \(\exp(i \rho(\sigma_j)) \equiv r_j\) and \(\rho(\sigma_j) \equiv \rho_j\), so \(\exp(i\rho_j) = r_j\). As \(SU(2)\) is compact, we can choose an inner form \(\langle .,.\rangle\) on the complex vector space \(V\) such that \[\langle r_j v,r_j v\rangle =\langle v, v\rangle\, .\] When using Weyl’s unitarity trick in Theorem Theorem 2.5, we further found that we can always choose a basis such that \(\rho_i^\dagger = \rho_i\). Now let \(v\) be an eigenvector of \(\rho_j\) with eigenvalue \(e_v\) and work out \[e_v \langle v, v \rangle = \langle v, \rho_j v \rangle = \langle \rho_j v, v \rangle = \bar{e}_v \langle v, v \rangle\, .\] so that \(e_v\) must be real. \(\square\)


Now we are ready to prove the theorem. Proof:

Let’s start by slightly enlarging the scope and study representations of the Lie algebra \(\mathfrak{sl}(2,\mathbb{C}) = \mathfrak{su}(2) \otimes \mathbb{C}= \mathfrak{su}_\mathbb{C}(2)\). As we have seen in problem class 4, irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\) are in one-to-one correspondence with irreducible representations of \(\mathfrak{su}(2)\). What this means is that we allow complex instead of only real linear combinations of the Pauli matrices in the algebra under consideration. This allows us to define \[H \equiv \frac{1}{2}\rho_d(\sigma_3)\,\, , \hspace{2cm} L_\pm = \frac{1}{2} \left(\rho_d(\sigma_1) \pm i \rho_d(\sigma_2) \right)\, .\] These obey the algebra \[[H,L_\pm] = \pm L_\pm \,\,, \hspace{2cm} [L_+,L_-]=2H\, .\] Let us start by assuming that \(w_n\) is an eigenvector of \(H\) with eigenvalue \(n\), so that \(H w_n = n w_n\). Then \[H L_+ w_n = (L_+ H + [H,L_+] ) w_n = (L_+ H + L_+) w_n = (L_+ n + L_+ ) w_n = (n+1) (L_+ w_n)\, .\] This equation means that \(L_+ w_n\) is another eigenvector of \(H\), but now the eigenvalue is \(n+1\). Hence \(L_+\) is a ‘raising operator’ that increases the eigenvalue \(n\) by one. A similar computation reveals that \(L_- w_n\) has eigenvalue \(n-1\), so \(L_-\) is a ‘lowering operator’.

As we only care about representations of \(SU(2)\) we can use the lemma above and conclude that \(H\) only has real eigenvalues. As we only consider finite-dimensional vector spaces, one of these eigenvalues must be the largest. Let us call this eigenvalue \(m\) and \(w_m\) the associated eigenvector14. Then we must have \[L_+ w_m = 0 \, .\] Otherwise \(L_+ w_m\) would be another eigenvector with eigenvalue \(m+1\), which violates the assumption that we have chosen the largest.

We can then repeatedly act with \(L_-\) to produce more eigenvectors with smaller eigenvalues. As we are looking for finite-dimensional representations, this must terminate at some point, i.e. for some non-negative integer \(d\), \((L_-)^{d+1} w_m = 0\). A basis of our representation is hence given by the vectors \[w_{m-l} \equiv (L_-)^l w_m \,\,\,\, , l = 0 \cdots d\, ,\] and its dimension is \(d+1\). To find out which values of \(m\) can appear, we introduce \[\Delta \equiv \frac{1}{4}\left( \rho_d(\sigma_1)^2 + \rho_d(\sigma_2)^2 + \rho_d(\sigma_3)^2\right) = \frac{1}{2}\left(L_+ L_- + L_- L_+\right) + H^2\,.\] We already know that \(\sigma_i^2 = \mathds{1}\) in the fundamental representation. That \(\Delta = c \mathds{1}\) here as well follows from Schur’s lemma after observing that \[[\Delta, H] = [\Delta, L_\pm] = 0 \, .\] This does not imply that \(c=3/4\) however, as we are not in the fundamental representation! To fix \(c\), observe that \[\Delta w_m = \left(\frac{1}{2}(L_+ L_- + L_- L_+) + H^2\right) w_m = \left(L_- L_+ + H(H+\mathds{1}) \right) w_m = m(m+1)w_m\, .\] Hence \(c= m(m+1)\). As \(L_- w_{m-d} = 0\) and furthermore \(L_+L_- = \Delta - H(H-\mathds{1})\) we have that \[\begin{aligned} 0 &= \left(\Delta - H (H-\mathds{1})\right) w_{m-d} = (m(m+1)-(m-d)(m-d-1)) w_{m-d} \\ &= (1 + d) (2 m - d) w_{m-d} \end{aligned}\] which implies \(2m-d = 0\). As \(d\) is a non-negative integer, this implies that \(m\) takes non-negative half-integer values. By construction, these are finite-dimensional irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\).
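As a concrete cross-check of the relations used above, one can build \(H\) and \(L_\pm\) from the explicit matrices of \(\rho_d(\ell_j)\) on the monomial basis (read off from \(\eqref{eq:ellkactionrd}\)) and verify the algebra, the value of \(\Delta\), and the relation \(2m=d\) numerically; the degree \(d=5\) below is an arbitrary choice for this sketch.

```python
import numpy as np

# Matrices of rho_d(ell_j) on the monomial basis {z1^k z2^(d-k)}, read off
# from the formulas in the notes; then check the ladder-operator relations.
d = 5
L1 = np.zeros((d + 1, d + 1), dtype=complex)
L2 = np.zeros((d + 1, d + 1), dtype=complex)
L3 = np.zeros((d + 1, d + 1), dtype=complex)
for k in range(d + 1):
    if k > 0:                        # terms lowering the power of z1
        L1[k - 1, k] += -0.5j * k
        L2[k - 1, k] += -0.5 * k
    if k < d:                        # terms raising the power of z1
        L1[k + 1, k] += -0.5j * (d - k)
        L2[k + 1, k] += 0.5 * (d - k)
    L3[k, k] = 1j * (d / 2 - k)

# sigma_j = -2i ell_j, so rho_d(sigma_j) = -2i rho_d(ell_j)
H = -1j * L3
Lp = -1j * (L1 + 1j * L2)
Lm = -1j * (L1 - 1j * L2)

comm = lambda A, B: A @ B - B @ A
print(np.allclose(comm(H, Lp), Lp),
      np.allclose(comm(H, Lm), -Lm),
      np.allclose(comm(Lp, Lm), 2 * H))                  # True True True

m = max(np.linalg.eigvals(H).real)                       # largest eigenvalue of H
Delta = 0.5 * (Lp @ Lm + Lm @ Lp) + H @ H
print(np.isclose(2 * m, d),                              # 2m - d = 0
      np.allclose(Delta, m * (m + 1) * np.eye(d + 1)))   # Delta = m(m+1) * identity
```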
We can easily restrict all of the matrices appearing in this representation to anti-hermitian ones to find a representation of \(\mathfrak{su}(2)\). As we have that \(\rho(\sigma_j)^\dagger = \rho(\sigma_j)\), we get a representation of \(\mathfrak{sl}(2,\mathbb{C})\) as \[\sum_j a_j \rho(\sigma_j)\,\,\, \mbox{for}\,\,\, a_j \in \mathbb{C}\] and a representation of \(\mathfrak{su}(2)\) as \[\sum_j i a_j \rho(\sigma_j)\,\,\, \mbox{for}\,\,\, a_j \in \mathbb{R}\, .\] In problem class 4 we have seen using the above that irreducible representations of \(\mathfrak{sl}(2,\mathbb{C})\) are in one-to-one correspondence with irreducible representations of \(\mathfrak{su}(2)\), so we can think of the representations just found as representations of \(\mathfrak{su}(2)\). 15
In order to compare this with the representations \(\rho_d\) of \(su(2)\) associated to the representations \(r_d\) of \(SU(2)\) that we already know exist, it is convenient to rescale the basis vectors \(w_j\) as follows. We define \(v_m \equiv w_m\) and \[L_- v_k = (m+k) v_{k-1}\] which implies \[\begin{aligned} L_+ v_k &= \frac{1}{m+k+1} L_+ L_- v_{k+1} = \frac{1}{m+k+1} \left(\Delta - H(H-1) \right) v_{k+1} \\ &= \frac{1}{m+k+1} \left(m(m+1) - k (k+1) \right) v_{k+1} = (m-k) v_{k+1} \, . \end{aligned}\] In this basis, the action of the \(\ell_i\) is \[\begin{aligned} \ell_1 v_{m-k} &= \frac{i}{2} (L_+ + L_-) v_{m-k} = \frac{i}{2} \left(k v_{m-k+1} + (d-k)v_{m-k-1}\right)\\ \ell_2 v_{m-k} &= \frac{1}{2} (L_+ - L_-) v_{m-k} = \frac{1}{2} \left(k v_{m-k+1} - (d-k)v_{m-k-1}\right) \\ \ell_3 v_{m-k} &= iH v_{m-k} = i (m-k) v_{m-k} \end{aligned}\] where \(m=d/2\) for an integer \(d\). Comparing with \(\eqref{eq:ellkactionrd}\) we see that these representations are identified if we associate \[z_1^k z_2^{d-k} \simeq (-1)^k v_{m-k}\, .\] This means all the representations of \(\mathfrak{su}(2)\) we have found are the associated representations of the group representations \(r_d\) we already know exist.

This representation \(\rho_d\) is irreducible as can be seen as follows. Take any non-zero invariant subspace \(V\) of \(\Pi_d\). By assumption the action of the \(\ell_k\) maps any vector of \(V\) to another vector of \(V\). As \(V\) is a complex vector space, complex linear combinations are again in \(V\). This implies that if \(0 \neq P \in V\), we also have that any linear combination of \[\ell_+^n P := \left(z_2 \frac{\partial}{\partial z_1}\right)^n P \,\, , \hspace{1cm} \ell_-^p P := \left(z_1 \frac{\partial}{\partial z_2}\right)^p P\] is in \(V\) (these are just powers of the polynomial versions of the raising and lowering operators). We can hence apply a suitable power of \(\ell_-\) to map \(P\) to a non-zero multiple of the single monomial \(z_1^d\). This monomial is hence in \(V\), which implies that any complex multiple of it is in \(V\) as well. But now we can use \(\ell_+\) to conclude the same for any other monomial. As the monomials are a basis of \(\Pi_d\), it follows that \(V = \Pi_d\). The Lie algebra representations \(\rho_d\) are hence irreducible.

This implies that \(r_d\) is irreducible as well. If \(W \subset \Pi_d\) is an invariant subspace of \(r_d\), then it must be invariant under \(e^{t \rho_d(\gamma)}\) for all \(t\) and \(\gamma \in \mathfrak{su}(2)\), so in particular under \(\left.\partial/\partial t \, e^{t \rho_d(\gamma)}\right|_{t=0} = \rho_d(\gamma)\), and hence under \(\rho_d(\mathfrak{su}(2))\). But the Lie algebra representation \(\rho_d\) is irreducible as we already know.
So now we know all irreducible representations of \(SU(2)\): if there were others, the associated Lie algebra representation would have had to show up in our analysis. \(\square\)

2.4.0.3 Representations of \(SO(3)\)

We are now ready to discuss representations of \(SO(3)\). As the Lie algebra of \(SO(3)\) is the same as the Lie algebra of \(SU(2)\), it has the same irreducible representations. Coming to the groups, recall that there is a \(2\) to \(1\) map from \(SU(2)\) to \(SO(3)\) that we investigated in problem class 1 which mapped both \(\mathds{1}\in SU(2)\) and \(-\mathds{1}\in SU(2)\) to \(\mathds{1}\in SO(3)\). We can hence construct representations of \(SO(3)\) from representations of \(SU(2)\) if \(r(-\mathds{1}) = \mathds{1}\). Let us look at the action of \(r_d(-\mathds{1})\) on a monomial \[r_d(-\mathds{1}): z_1^k z_2^{d-k} \rightarrow (-1)^d z_1^k z_2^{d-k} \, .\] This map is the identity only if \(d\) is an even integer, i.e. \(m=d/2\) is an integer. We have seen that every representation of a Lie group gives us an associated representation of its Lie algebra. The above shows that the converse is not true: the representations of \(\mathfrak{so}(3)\) where \(m\) is not an integer cannot come from any representation of \(SO(3)\). On the other hand, we can lift any finite-dimensional representation \(R\) of \(SO(3)\) to one of \(SU(2)\):

2.10. Show that any irreducible complex representation of \(SO(3)\) also defines an irreducible complex representation of \(SU(2)\).

Hence we have

Theorem 2.7. The \(r_d\) for \(d = 2m\), \(m \in \mathbb{Z}_{\geq 0}\) are all of the finite-dimensional complex irreducible representations of \(SO(3)\).

2.4.0.4 \(SO(3)\), \(SU(2)\), and Spin

In physics in \(\mathbb{R}^3\), the half-integer \(m\) is called the spin: if there is a physical object that transforms in the representation \(r_d\), we say it has spin \(m=d/2\). This applies both to field theories, where \(SO(3)\) acts on the components of a field, and to quantum mechanics, where \(SO(3)\) acts on states. If \(d=0\) we have a one-dimensional representation, e.g. a scalar field, that does not transform at all; this is the spin \(0\) case. An ordinary vector in \(\mathbb{R}^3\) transforms in the three-dimensional representation \(r_2\) of \(SU(2)\), so you would call a field \(\boldsymbol{\phi} = (\phi_1,\phi_2,\phi_3)\) transforming like a vector in \(\mathbb{R}^3\) a ‘vector field’ as well. Here \(m=1\), so this is ‘spin 1’.

The representations of \(SO(3)\) show up in most courses on quantum mechanics when treating the hydrogen atom. Using wavefunctions gives us a very concrete version of these representations: the ‘spherical harmonics’.
It is a fact of nature that there are particles of ‘spin 1/2’, e.g. the electron or quarks. You might find this irritating as we might want to classify particles according to how they transform under space-time symmetries, i.e. \(SO(3)\) for rotations, and for \(m=1/2\) we do not get a representation of \(SO(3)\), but only one of its Lie algebra. One way to explain this is that in quantum mechanics, multiplying any state vector by a non-zero complex number does not change the state we are in. Taking this into account means studying projective representations, which for \(SO(n)\) are in one-to-one correspondence with ordinary representations of the associated ‘spin groups’: \(Spin(3) = SU(2)\).

Definition 2.13. The spinor representation is the \(\mathbf{2}\) of \(SU(2)\), and objects transforming in this representation are called spinors (of \(SO(3)\)). The covering group \(SU(2)\) of \(SO(3)\) is likewise called the ‘spin group’ \(Spin(3)\).

REMARK:  We saw earlier how to map \(SU(2)\) to \(SO(3)\). For the element of \(SU(2)\) of the form \[g_{SU(2)} = \begin{pmatrix} e^{i \phi/2} & 0 \\ 0 & e^{-i\phi/2} \end{pmatrix}\] the corresponding element in \(SO(3)\) was \[g_{SO(3)} = \begin{pmatrix} \cos ( \phi) & \sin ( \phi) & 0 \\ -\sin ( \phi) & \cos ( \phi) & 0 \\ 0 & 0 & 1 \end{pmatrix}\] Let us assume we are performing a rotation by \(360^\circ\) using the (usual) rotation group \(SO(3)\) in \(\mathbb{R}^3\), i.e. we let \(\phi\) go from \(0\) to \(2 \pi\) in the above matrix \(g_{SO(3)}\). In the corresponding \(SU(2)\) matrix \(g_{SU(2)}\) this takes us from \(\mathds{1}\) to \(-\mathds{1}\), i.e. we do not come back to where we started from, and need to let \(\phi\) go from \(0\) to \(4 \pi\) to return to \(\mathds{1}\). In this sense, a spinor needs to be rotated by \(720^\circ\) for a full rotation!
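This can also be seen by simply evaluating the two matrices above numerically at \(\phi = 2\pi\) and \(\phi = 4\pi\) (a trivial check, included here as a sketch).

```python
import numpy as np

# Evaluate the explicit SU(2) and SO(3) matrices from the remark above.
def g_su2(phi):
    return np.diag([np.exp(1j * phi / 2), np.exp(-1j * phi / 2)])

def g_so3(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

print(np.allclose(g_so3(2 * np.pi), np.eye(3)))    # SO(3): back after 360 degrees
print(np.allclose(g_su2(2 * np.pi), -np.eye(2)))   # SU(2): minus the identity
print(np.allclose(g_su2(4 * np.pi), np.eye(2)))    # back only after 720 degrees
```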

2.5 Representations of \(SU(n)\) and other Lie Groups \(^\ast\)

For more general Lie groups such as \(SU(n)\), you will not be surprised to hear that there is a richer representation theory. We know already two representations: the fundamental and the adjoint. Instead of developing the general theory, we will only try to sketch how one might go about creating such representations. Given a representation of a group \(G\), we obtain representations of any subgroup \(H \subset G\) by simply restricting the homomorphism \(r:G \rightarrow GL(V)\) to \(H \subset G\). Every group \(SU(n)\) for \(n>2\) contains many copies of \(SU(2)\) as subgroups, and we have seen how we could construct representations of \(SU(2)\) using the operators \(H, L_+, L_-\). This motivates trying to lift the method used for \(SU(2)\) to \(SU(n)\) by writing the (complexified) Lie algebra \(\mathfrak{su}(n)\) in terms of number operators \(H_i\), lowering operators and raising operators. This is called a ‘Cartan-Weyl basis’ and leads to what are called ‘root systems’, which can in turn be used to classify certain classes of Lie algebras. Such a root system is shown in figure 2.

2.6 Tensor Representations \(^\ast\)

Definition 2.14. Given two vector spaces \(V\) and \(W\) we can form their tensor product \(V \otimes W\). Let \(\boldsymbol{e}_i\), \(i=1 \ldots \dim V\), be a basis of \(V\) and \(\boldsymbol{f}_j\), \(j=1 \ldots \dim W\), be a basis of \(W\). Then \(V \otimes W\) is a vector space with basis consisting of tuples \((\boldsymbol{e}_i,\boldsymbol{f}_j)\) (also written as \(\boldsymbol{e}_i \otimes \boldsymbol{f}_j\)).

REMARK: It follows from the definition that \(\dim V \otimes W = \dim V \cdot \dim W\). Computing with tensor products works almost the same as with usual products, we have \[\begin{aligned} \boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v}' \otimes \boldsymbol{w} &= (\boldsymbol{v} + \boldsymbol{v}') \otimes \boldsymbol{w} \\ \boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v} \otimes \boldsymbol{w}' &= \boldsymbol{v} \otimes (\boldsymbol{w}+ \boldsymbol{w}'), \end{aligned}\] and for \(c \in \mathbb{R}\) (or \(\mathbb{C}\)) \[c (\boldsymbol{v} \otimes \boldsymbol{w}) = (c\boldsymbol{v}) \otimes \boldsymbol{w} = \boldsymbol{v} \otimes (c\boldsymbol{w}) \, .\] However \(\boldsymbol{v} \otimes \boldsymbol{w} \neq \boldsymbol{w} \otimes \boldsymbol{v}\): the first slot is reserved for vectors from \(V\) and the second for vectors from \(W\), so writing \(\boldsymbol{w} \otimes \boldsymbol{v}\) does not even make sense if \(\boldsymbol{v} \in V\) and \(\boldsymbol{w} \in W\). Not every vector in \(V \otimes W\) can be written as a product, e.g. \(\boldsymbol{v} \otimes \boldsymbol{w} + \boldsymbol{v}' \otimes \boldsymbol{w}'\) for \(\boldsymbol{v} \neq \boldsymbol{v}'\) and \(\boldsymbol{w} \neq \boldsymbol{w}'\).

Example 2.10. Consider \(\mathbb{R}^3 \otimes \mathbb{R}^3\) and let \(e_1,e_2,e_3\) be a basis of the first \(\mathbb{R}^3\) and \(f_1,f_2,f_3\) of the second. A basis of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) is then \[\begin{aligned} e_1 \otimes f_1, e_1 \otimes f_2, e_1 \otimes f_3, \\ e_2 \otimes f_1, e_2 \otimes f_2, e_2 \otimes f_3, \\ e_3 \otimes f_1, e_3 \otimes f_2, e_3 \otimes f_3, \end{aligned}\] whereas \(\mathbb{R}^3 \oplus \mathbb{R}^3\) has a basis \[e_1,e_2,e_3, f_1,f_2,f_3\, .\] Note that whereas \(\mathbb{R}^3 \oplus \mathbb{R}^3\) is six-dimensional, \(\mathbb{R}^3 \otimes \mathbb{R}^3\) is 9-dimensional. Note that you can naturally think of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) as the vector space of real \(3 \times 3\) matrices: we can write any element of \(\mathbb{R}^3 \otimes \mathbb{R}^3\) as \[\boldsymbol{v} = \sum_{ij} a_{ij} e_i \otimes f_j \, .\]
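In components this is easy to see explicitly: the sketch below (in Python, with the standard basis of \(\mathbb{R}^3\) assumed) builds the nine basis vectors \(e_i \otimes f_j\) as Kronecker products and shows that a general element \(\sum_{ij} a_{ij}\, e_i \otimes f_j\) carries exactly the data of a \(3\times 3\) matrix.

```python
import numpy as np

# The 9 basis vectors e_i (x) f_j of R^3 (x) R^3, realised as Kronecker products,
# versus the 6 basis vectors (e_i, 0), (0, f_j) of R^3 (+) R^3.
e = np.eye(3)
tensor_basis = [np.kron(e[i], e[j]) for i in range(3) for j in range(3)]
print(len(tensor_basis), tensor_basis[0].shape)   # 9 basis vectors, each in R^9

# a general element v = sum_ij a_ij e_i (x) f_j is the same data as a 3x3 matrix
a = np.arange(9.0).reshape(3, 3)
v = sum(a[i, j] * np.kron(e[i], e[j]) for i in range(3) for j in range(3))
print(np.allclose(v, a.flatten()))                # True: v <-> the matrix a
```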

What makes tensor products interesting in the present context is that we can form new representations out of old ones by tensoring the vector spaces they act on.

Example 2.11. \(\mathbf{2} \otimes \bar{\mathbf{2}}\)[ex:2x2] Let’s say you have a vector space \(\mathbb{C}^2\) that ‘lives’ in the fundamental representation of \(SU(2)\), and one \(\mathbb{C}^2\) that lives in the anti-fundamental and you form their tensor product. The question we are asking is: how does \(SU(2)\) act on the tensor product? For a vector in \(\mathbb{C}^2\) in the fundamental we have \[\boldsymbol{z} \rightarrow g \boldsymbol{z}\] and in the antifundamental \[\boldsymbol{z} \rightarrow \bar{g} \boldsymbol{z} \, .\] This is how we would write things down using a chosen fixed basis, \(\boldsymbol{v} = (z_1,z_2)\), so we might also write this more abstractly as (for the first case): \[v = \sum_i z_i e_i \rightarrow g_{ij} z_j e_i\] We can think of this as either acting with \(g\) on \(\boldsymbol{z}\) (this is called the active interpretation) or as acting with \(g^T\) on the tuple of basis vectors (this is called the passive interpretation): \[e_j \rightarrow g_{ij} e_i = g_{ji}^T e_i\] i.e. \[\begin{pmatrix} e_1 \\ e_2 \end{pmatrix}\rightarrow g^T \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \, .\] We can use either point of view, whichever we like, and this will help us to figure out how to act on elements of \(\mathbb{C}^2 \otimes \mathbb{C}^2\). As the first copy transforms with \(g\) and the second with \(\bar{g}\) we have \[v = \sum_{ij} a_{ij}\, e_i \otimes f_j \rightarrow \sum_{ijkl} a_{ij}\,\, \left( g_{ki} e_k \right)\, \otimes \, \left( \bar{g}_{lj} f_l \right) = \sum_{ijkl} g_{ki} a_{ij} \bar{g}_{lj} \,\, e_k \otimes f_l\] so in summary the vectors in \(\mathbb{C}^2 \otimes \mathbb{C}^2\) behave as \[a_{ij} \rightarrow \sum_{kl} g_{ik} a_{kl} g^\dagger_{lj}\, ,\] i.e. if we collect the \(a_{ij}\) in a matrix \(A\) we get \[A \rightarrow g A g^\dagger \, .\]

We can repeat the same logic to find out how arbitrary representations acting on vector spaces \(V\) and \(W\) act on \(V \otimes W\):

Definition 2.15. Let \(r_V\) and \(r_W\) be representations of \(G\) on \(V\) and \(W\), and let the components of the matrices \(r_V(g)\) and \(r_W(g)\) be \(r_V(g)_{ij}\) and \(r_W(g)_{ab}\). Then the tensor product representation \(r_{V \otimes W}\) acts on a vector \(\boldsymbol{U} \in V \otimes W\) with components \(U_{ia}\) as \[U_{ia}' := r_V(g)_{ij} r_W(g)_{ab} \,\, U_{jb} \, .\]
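In matrix language (using the basis \(\boldsymbol{e}_i \otimes \boldsymbol{f}_a\) with the index ordering of the definition above) this says that \(r_{V \otimes W}(g)\) is the Kronecker product of \(r_V(g)\) and \(r_W(g)\). The sketch below checks this for the \(\mathbf{2} \otimes \bar{\mathbf{2}}\) example above, where it reproduces \(A \rightarrow g A g^\dagger\); the particular \(g\) and \(A\) are arbitrary choices made for the sketch.

```python
import numpy as np
from scipy.linalg import expm

# For r_V(g) = g and r_W(g) = g-bar, the tensor product representation acts on
# the components a_ij as the Kronecker product g (x) g-bar, i.e. as A -> g A g^dagger.
sigma2 = np.array([[0, -1j], [1j, 0]])
g = expm(0.7j * sigma2)                        # some element of SU(2)
A = np.array([[1.0 + 2j, 0.5], [-1j, 3.0]])    # an arbitrary element, written as a matrix

lhs = np.kron(g, g.conj()) @ A.flatten()       # tensor product rep acting on components
rhs = (g @ A @ g.conj().T).flatten()           # matrix form A -> g A g^dagger
print(np.allclose(lhs, rhs))                   # True
```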

Example 2.12. \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\)
Continuing example [ex:2x2] we know that we can decompose the representation acting on \(\mathbb{C}^2 \otimes \mathbb{C}^2\) into irreducible representations. But which ones? As we have seen, \(SU(2)\) acts on \(a_{ij}\) as \[A \rightarrow A'\, \hspace{1cm} a_{ij}' = g_{ik} a_{kl} (g^\dagger)_{lj}\] or in matrix notation \[A \rightarrow A' = g A g^\dagger\, .\]

The trace of \(A\) hence transforms as \[\begin{equation} \label{eq2t2bar_tr} trA \rightarrow tr\left(g A g^\dagger\right) = tr\left(g A g^{-1}\right) = trA \, , \end{equation}\] where we used \(g^\dagger = g^{-1}\) for \(g \in SU(2)\). Now what this implies is that the representation \(\mathbf{2} \otimes \bar{\mathbf{2}}\) is reducible, as we can never map matrices with a vanishing trace to ones with a non-vanishing trace. Let’s try to understand this a bit more clearly. The matrices \(A\) have the form \[A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\] and we think of the four complex components \(a_{ij}\) as components of a vector in a vector space \(V\) isomorphic to \(\mathbb{C}^4\) that we chose to write as a matrix. Within this vector space there is a complex three-dimensional vector subspace \(W\) defined by \(a_{11} + a_{22} = 0\), and as \(\eqref{eq2t2bar_tr}\) shows, the group action on \(V\) maps vectors in \(W\) again to vectors in \(W\), i.e. \(W\) is an invariant subspace. More concretely, \(W\) is the subspace of matrices of the form \[W = \left \{ A \left| A = \begin{pmatrix} z_1 & z_2 \\ z_3 & -z_1 \end{pmatrix}\right. , (z_1,z_2,z_3) \in \mathbb{C}^3\right\}\, .\] You might want to convince yourself that this is indeed a vector subspace. Similarly \(W^\perp\) is the one-dimensional subspace containing matrices of the form \[W ^\perp= \left \{ A \left| A = \begin{pmatrix} z_4 & 0 \\ 0 & z_4 \end{pmatrix}\right. , z_4 \in \mathbb{C}\right\}\, ,\] which again forms an invariant subspace under the group action \(\eqref{eq2t2bar_tr}\). The inner form with respect to which \(W^\perp\) is the orthogonal complement of \(W\) is just the standard inner form on \(\mathbb{C}^4\), which we can write as \(\langle A, A' \rangle = \sum_{i,j} \bar{a}_{ij} a_{ij}'\) using two matrices \(A,A'\). Also note that for any \(A\) we can write \[A = \begin{pmatrix} z_1 & z_2 \\ z_3 & -z_1 \end{pmatrix} + \begin{pmatrix} z_4 & 0 \\ 0 & z_4 \end{pmatrix} \, .\] The above shows that the representation \(\mathbf{2} \otimes \bar{\mathbf{2}}\) is not irreducible, but decomposes into a one-dimensional and a three-dimensional complex representation, i.e. we can write \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1}^\perp \oplus \mathbf{1}\) (with \(\mathbf{1}^\perp\) denoting the three-dimensional representation acting on \(W\)). The only remaining thing to show is hence that \(\mathbf{1}^\perp\) transforms in the \(\mathbf{3}\) of \(SU(2)\). The action here is the same as the adjoint representation of \(SU(2)\), except that we are acting on a complex vector space of dimension three instead of a real one. The irreducibility of the adjoint representation implies that there is no invariant complex subspace if we act on \(\mathbb{C}^3\) instead of \(\mathbb{R}^3\), so this is the \(\mathbf{3}\) of \(SU(2)\).
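A quick numerical check of this decomposition (a sketch, with \(g\) and \(A\) arbitrary choices): under \(A \rightarrow g A g^\dagger\) the trace is unchanged, the traceless part of \(A\) stays traceless, and multiples of the identity stay multiples of the identity.

```python
import numpy as np
from scipy.linalg import expm

# Check that W (traceless matrices) and W^perp (multiples of the identity)
# are invariant under A -> g A g^dagger for g in SU(2).
sigma1 = np.array([[0, 1], [1, 0]], dtype=complex)
sigma3 = np.array([[1, 0], [0, -1]], dtype=complex)
g = expm(1j * (0.4 * sigma1 + 0.9 * sigma3))   # some element of SU(2)

A = np.array([[1.0 + 1j, 2.0], [3j, -0.5]])    # arbitrary complex 2x2 matrix
A_traceless = A - 0.5 * np.trace(A) * np.eye(2)
A_scalar = 0.5 * np.trace(A) * np.eye(2)

act = lambda X: g @ X @ g.conj().T
print(np.isclose(np.trace(act(A)), np.trace(A)))    # trace is invariant
print(np.isclose(np.trace(act(A_traceless)), 0))    # W is mapped to W
print(np.allclose(act(A_scalar), A_scalar))         # W^perp is mapped to W^perp
```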

2.11. Consider the representation \(\mathbf{n} \otimes \bar{\mathbf{n}}\) of \(SU(n)\). Explain why this is always reducible. Can you identify the irreducible representations and invariant subspaces?

Proposition 2.4. \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\)

Proof: see the following exercise.

2.12.

  1. Find the transformation of elements of \(\mathbf{2} \otimes \mathbf{2}\).

  2. Show that the representations \(\mathbf{2}\) and \(\bar{\mathbf{2}}\) are isomorphic by showing they are related by a change of basis \[\boldsymbol{z}' = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\boldsymbol{z}\] [Note: of course, \(\bar{v}\) transforms also as \(\bar{v} \rightarrow \bar{g} \bar{v}\) if \(v \rightarrow g v\). In a complex vector space, complex conjugation is not a change of basis however!]

  3. Use the above to argue that \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\). Can you identify the invariant subspaces?

REMARK: More generally, tensor products can be decomposed into irreducible representations, i.e. we may write \[r_{V \otimes W}(G) = \oplus_i r_{V_i}(G)\] whenever we know any representation of \(G\) can be decomposed into irreducible representations. Finding the change of basis relating the natural basis of the tensor product to a basis showing the decomposition on the right hand side of the above equation is a well-known problem, and the coefficients appearing in the change of basis are called ‘Clebsch-Gordan coefficients’. Details on this can be found in most books on quantum mechanics, as well as in more mathematically minded texts.

REMARK: \(\ast\) There are a number of examples in physics in which \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\) and \(\mathbf{2} \otimes \mathbf{2} = \mathbf{1} \oplus \mathbf{3}\) play an important role in organizing the degrees of freedom of a theory. Two important ones are explained below:

spin \(\tfrac12\)
The rotation group in 3D is the group \(SO(3)\). As we have seen, \(SO(3) = SU(2)/ \mathbb{Z}_2\), and it turns out that the relevant group describing the behavior of rotations acting on fermions (such as electrons, protons, etc.) is in fact \(SU(2)\) and not \(SO(3)\). The relation between usual rotations and maps in \(SU(2)\) is exactly given by the homomorphism we constructed earlier; more on this topic will be discussed later. This means we can write down the wavefunction of a fermion as \[\psi = \begin{pmatrix} \psi_+ \\ \psi_- \end{pmatrix}\] which lives in the \(\mathbf{2}\) representation of \(SU(2)\) under rotations. If you have a system composed of two fermions \(\psi_1\) and \(\psi_2\), the total wavefunction \(\Psi\) is a tensor product of the two wavefunctions16 \[\Psi = \psi_1 \otimes \psi_2\] and hence lives in the \(\mathbf{2} \otimes \mathbf{2}\). Decomposing this into irreducible representations, we find a singlet and a triplet of wavefunctions. This is what physicists sometimes call ‘addition of angular momentum’ and it explains why a Helium atom or Positronium have a singlet (‘para-’) or triplet (‘ortho-’) behavior under rotations. Note that the triplet we found behaves just as the adjoint of \(SU(2)\), which corresponds to the usual defining (‘vector’) representation of \(SO(3)\).

quarks
The two lightest quarks are the up and the down quark. They have nearly identical masses, but differ in their electric charges. They can form bound states (called hadrons), and many different ones have been found since the 1940s. Until the discovery of quarks, people were bewildered by how many there were and how to organize them into some sort of pattern. The simplest bound states, called mesons, contain only two quarks (a quark and an anti-quark). The force binding them together is the strong nuclear force, which is a lot stronger than electromagnetism, and from the perspective of the strong force, the up and the down quark look identical if we forget the small mass difference. We can combine their wavefunctions into one \[\psi_q = \begin{pmatrix} \psi_u \\ \psi_d \end{pmatrix}\] and the statement that they are identical in strong interactions means there is an \(SU(2)\) symmetry acting in the \(\mathbf{2}\) on \(\psi_q\). Because this looks exactly like the \(SU(2)\) action of rotations on fermions, it was called ‘iso-spin’, which is a terrible name as there is no other relation to spin than this. But this means that bound states of e.g. a quark and an anti-quark transform in the \(\mathbf{2} \otimes \bar{\mathbf{2}} = \mathbf{1} \oplus \mathbf{3}\) representation. The triplet states are called ‘pions’ \((\pi^+, \pi^-, \pi^0)\).

In fact, there is a third quark called the strange quark which also has (almost) the same mass as up and down. This enhances the \(SU(2)\) to an \(SU(3)\) and we should be studying \(\mathbf{3} \otimes \bar{\mathbf{3}} = \mathbf{1} \oplus \mathbf{8}\) for mesons and \[\mathbf{3} \otimes \mathbf{3} \otimes \mathbf{3} = \mathbf{1} \oplus \mathbf{8} \oplus \mathbf{8} \oplus \mathbf{10}\] for Baryons which are made up of three quarks.

Figure [fig:mesons]: Mesons whose structure can be understood as representations of \(\mathbf{3}\otimes \bar{\mathbf{3}}\) of \(SU(3)\). As quarks are also fermions, mesons transform as \(\mathbf{2} \otimes \bar{\mathbf{2}}\) under rotations, which makes them split into a singlet (‘scalar’) and a triplet (‘vector’). The particles on the left are scalar mesons and the ones on the right are vector mesons.



Historically, this way of thinking was in fact used to motivate the existence of quarks by Murray Gell-Mann and Yuval Ne’eman in 1961, as they saw that observed particles could be fit into this pattern. They called it the ‘Eightfold way’ in a nod to Buddhism and since the adjoint of \(SU(3)\) is eight-dimensional. However, one particle in the \(\mathbf{10}\) had not been seen in experiments yet, so they predicted it. It is now called the \(\Omega^-\) and was discovered in 1964, which among other things earned Gell-Mann a Nobel prize in 1969. If you want to read more about this story, the many popular accounts of the ‘Eightfold way’ are a good starting point.


  1. We will be more precise with these things later.↩︎

  2. This implies that \(x \circ x^{-1}=e\) as well. Let \(y = x \circ x^{-1}\). Then \[y = y^{-1} \circ y \circ y = y^{-1} \circ x \circ x^{-1} \circ x \circ x^{-1} = y^{-1}\circ y = e\,.\] Note how we made use of associativity here.↩︎

  3. It turns out a tiny little bit is okay, and that there are quantum effects that in fact do this. This is good news, as this is what is needed for baryogenesis in the early universe, i.e. it is needed to explain why there is matter but no antimatter in the universe.↩︎

  4. The theoretical physicist Richard Feynman was famous for his approach of ‘example based research’: find a good example for what you want to study and understand it really well. Then develop the general theory such that the main features that ‘make it work’ are kept. Although it is not often presented like that, a lot of mathematics came about in this way.↩︎

  5. A sphere of dimension \(n\) is the set of points in \(\mathbb{R}^{n+1}\) for which \(x_1^2 + x_2^2 + ... + x_{n+1}^2=1\).↩︎

  6. We could also allow matrices with trace a multiple of \(2 \pi\) at this point, but we will ignore this possibility here. I invite you to come back to this point later to see why this is a good idea.↩︎

  7. Try to write down the most general complex \(2 \times 2\) matrix which obeys \(A^\dagger = A\) and \(trA = 0\) in terms of real numbers.↩︎

  8. They are named after Wolfgang Pauli who introduced them in the 1920s to describe the spin of electrons. Why and how that works will be explained later.↩︎

  9. For a group homomorphism, the kernel consists of those elements sent to the identity element.↩︎

  10. Recall that \(\theta\) going from \(0\) to \(2 \pi\) was a full \(U(1)\) inside \(SU(2)\). Under this map this is mapped to a rotation that goes with double speed from \(0\) to \(4 \pi\). This is how it had to be such that \(-\mathds{1}\) in \(SU(2)\), which is \(\theta=\pi\), is mapped to \(\mathds{1}\) in \(SO(3)\) and is a consequence of the map \(F\) being two-to-one.↩︎

  11. This idea goes back to antiquity, although probably not for \(S^3\) or \(SU(2)\).↩︎

  12. An inner form is a symmetric bilinear map \(\langle \cdot, \cdot \rangle: V \times V \rightarrow \mathbb{R}\) s.t. \(\langle v, v \rangle \geq 0\) for all \(v \in V\) and \(\langle v, v \rangle = 0\) if and only if \(v=0\).↩︎

  13. Think of how you would differentiate \(A \boldsymbol{v}(t)\) for a constant matrix \(A\) and \(t\) dependent vector \(\boldsymbol{v}(t)\).↩︎

  14. Here, I have made the tacit assumption that \(w_m\) is unique, i.e. there is only a single eigenvector with the maximal eigenvalue \(m\). You can try to see what will happen if you repeat the following argument without this assumption, and you will find that this results in a reducible representation.↩︎

  15. Recall that the punchline here was that these representations and their irreducibility are both determined solely by representing the Pauli matrices, i.e. fixing \(\rho(\sigma_j)\). In the case of \(\mathfrak{sl}(2,\mathbb{C})\) the represented algebra is a complex vector space with basis \(\{\rho(\sigma_j)\}\), in the case of \(\mathfrak{su}(2)\) you get a real vector space with basis \(\{i\rho(\sigma_j)\}\).↩︎

  16. In fact, it is an antisymmetric version (under exchange) if you have twice the same particle. I will ignore this in the present discussion as it does not really alter the conclusions.↩︎