Geometry of Mathematical Physics III
(Epiphany 2023-24)

Andreas Braun
andreas.braun@durham.ac.uk Department of Mathematical Sciences, Durham University

1 The Lorentz Group and its Representations

1.1 The Lorentz group and its Lie algebra

The Lorentz group is one of the most important examples of a Lie group appearing in physics. It arises in a very similar way to most of the groups we have discussed so far, as a symmetry group that respects some quadratic form, in this case the ‘invariant length’ of special relativity. A detailed account of many elementary aspects of the Lorentz group can be found in most textbooks on special relativity.

The fundamental postulate of relativity is that the speed of light is the same in all inertial frames. Let us take two points \(p\) and \(q\) in space-time through which a ray of light passes and assume that they have coordinates \(t_p,\boldsymbol{x}_p\) and \(t_q,\boldsymbol{x}_q\) in one inertial frame, and coordinates \(t_p',\boldsymbol{x}_p'\) and \(t_q',\boldsymbol{x}_q'\) in another. We hence need \[c^2 = (\boldsymbol{x}_p - \boldsymbol{x}_q)^2/(t_p-t_q)^2 = (\boldsymbol{x}_p' - \boldsymbol{x}_q')^2/(t_p'-t_q')^2\] In other words \[- c^2(t_p-t_q)^2 + (\boldsymbol{x}_p - \boldsymbol{x}_q)^2 = 0\] must be invariant under a change of frames. It is not hard to come up with coordinate transformations that satisfy this requirement, e.g. a rotation \(\in SO(3)\) acting purely on the coordinates \(\boldsymbol{x}\) works. If time is involved in our coordinate change, we need to take the relative minus sign into account. An example would be acting with the matrix \[\begin{equation} \label{eq:simp_boost} \Lambda_{01} = \begin{pmatrix} \cosh (\lambda) & -\sinh (\lambda) & 0 & 0 \\ -\sinh (\lambda) & \cosh (\lambda) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\, . \end{equation}\] This keeps \(-(ct)^2 + x_1^2\) invariant as \[\begin{aligned} -(ct)^2 + x_1^2 &\rightarrow -(ct')^2 + (x_1')^2 \\ &=- (\cosh (\lambda) \,\, ct - \sinh(\lambda)\,\, x_1 )^2 + (-\sinh (\lambda)\,\, c t + \cosh(\lambda)\,\, x_1 )^2 \\ & = -(ct)^2 (\cosh^2 (\lambda) - \sinh^2 (\lambda)) + x_1^2 (\cosh^2 (\lambda) - \sinh^2 (\lambda))\\ &= -(ct)^2 + x_1^2 \end{aligned}\] as \(\cosh^2 (\lambda) - \sinh^2 (\lambda)=1\) for any \(\lambda\) (this is the hyperbolic analogue of \(\cos^2\phi + \sin^2\phi =1\)).
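As a quick numerical sanity check (a minimal sketch in Python, assuming numpy; the rapidity and the four-vector are arbitrary choices), one can verify that \(\Lambda_{01}\) satisfies \(\Lambda_{01}^T\eta\Lambda_{01}=\eta\) and hence preserves the quadratic form:

```python
import numpy as np

lam = 0.7                                       # an arbitrary rapidity
ch, sh = np.cosh(lam), np.sinh(lam)
eta = np.diag([-1.0, 1.0, 1.0, 1.0])            # Minkowski metric
L01 = np.array([[ ch, -sh, 0, 0],
                [-sh,  ch, 0, 0],
                [  0,   0, 1, 0],
                [  0,   0, 0, 1]])              # the boost of eq. (simp_boost)

x = np.array([2.0, 1.0, -3.0, 0.5])             # some four-vector (ct, x1, x2, x3)
xp = L01 @ x
print(x @ eta @ x, xp @ eta @ xp)               # the invariant length is unchanged
print(np.allclose(L01.T @ eta @ L01, eta))      # Lambda^T eta Lambda = eta
```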

Note that the origin of the primed system at \(x_1'=0\) satisfies \[-\sinh (\lambda)\,\, c t + \cosh(\lambda)\,\, x_1 = 0\] so that it moves in the unprimed system with a velocity \[v = x_1/t = c \, \frac{\sinh(\lambda)}{\cosh(\lambda)} = c \tanh(\lambda) = c \,\, \frac{e^\lambda - e^{-\lambda}}{e^\lambda + e^{-\lambda}} < c\, .\] For this reason \(\lambda\) is called rapidity in the literature. Note that for every \(\lambda\), this speed is always less than the speed of light. Instead of using such transformations to figure out time dilation, length contraction, etc., we are going to examine the structure of the group formed by all such transformations.

Definition 1.1. The Lorentz group \(L\) is the group of linear maps on \(\mathbb{R}^4\) (with coordinates \((x^0,x^1,x^2,x^3)\)) that preserve the quadratic form \[|x|_M^2 \equiv - (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2\]

REMARK: \(\mathbb{R}^4\) with this quadratic form is also often called \(\mathbb{R}^{1,3}\) or ‘Minkowski space’. It is then appropriate to call the Lorentz group \(O(1,3)\). We have already learned that the principle of relativity is obeyed by (at least) two types of transformations: rotations in \(\mathbb{R}^3\) which leave time untouched, and boosts such as \(\eqref{eq:simp_boost}\) which mediate between relatively moving systems.

Note that \(|x|_M^2\) does not come from an inner product, as it is not positive definite.

For two coordinate systems with relative velocity \(\boldsymbol{v}\), the coordinate change is given by a boost:

Definition 1.2. A boost associated with two relatively moving inertial frames with relative velocity \(\boldsymbol{v}\) is a Lorentz transformation \(B\) with \(B(\boldsymbol{v})^0_{\,\,0} = \cosh{\lambda}\), \(B(\boldsymbol{v})^i_{\,\,0} = B(\boldsymbol{v})^0_{\,\,i} = -v^i/c \,\,\cosh{\lambda}\), and \[\begin{equation} \label{eq:boostij} B(\boldsymbol{v})^i_{\,\,k} = \delta^{i}_{\,\,k} + \frac{(\cosh \lambda)^2}{1+ \cosh \lambda} \frac{v^i v^k}{c^2}\, , \end{equation}\] where \(\tanh \lambda = |\boldsymbol{v}|/c\).

In order to facilitate the book-keeping of the minus sign in this definition, the following notation is in widespread use. Define \((x^0,x^1,x^2,x^3) = (ct,x,y,z)\) as the ‘four-vector’ of coordinates combining spatial coordinates and time. Define \[x_\mu \equiv \eta_{\mu \nu} x^\nu\] where \(\eta_{\mu \nu}\) are the components of the diagonal matrix \[\eta = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix}\,\] and we are using the summation convention. The inverse of the matrix \(\eta\) is clearly \(\eta\) again; we write its components with upper indices, as it satisfies \[x^\mu = \eta^{\mu \nu} x_\nu\] where \(\eta^{\mu \nu} = \left(\mbox{diag}(-1,1,1,1)\right)^{\mu \nu}\). Note that \[\eta_{\mu \nu} \eta^{\nu \rho} = \delta_\mu{}^\rho \, ,\] where \(\delta_\mu{}^\rho\) is the usual Kronecker delta which is \(1\) if both indices are equal and zero otherwise. We can hence write the length \(|x|_M\) of a vector in Minkowski space as \[|x|_M^2 = x^\mu x^\nu \eta_{\mu \nu} = x_{\mu} x^{\mu} = x_\mu x_\nu \eta^{\mu \nu} \, .\]

Let \(\Lambda\) have components \(\Lambda^\mu_{\,\,\nu}\) and assume \(\Lambda\) linearly maps a 4-vector \(\boldsymbol{x}\) to a 4-vector \(\boldsymbol{x}'\) \[x^\mu{}' = \Lambda^\mu_{\,\,\sigma} x^\sigma \, .\] Now if \(\Lambda\) is in the Lorentz group we need \(|x'|_M^2 = |x|_M^2\), i.e. \[|x'|_M^2 = \Lambda^\mu_{\,\,\sigma} x^\sigma \Lambda^\nu_{\,\,\rho} x^\rho \eta_{\mu \nu} = x^\sigma x^\rho \Lambda^\mu_{\,\,\sigma} \eta_{\mu \nu} \Lambda^\nu_{\,\,\rho}= x^\mu x^\nu \eta_{\mu \nu} \, .\] In other words \[\Lambda^\mu_{\,\,\sigma} \Lambda^\nu_{\,\,\rho}\eta_{\mu \nu} = \eta_{\sigma \rho}\] or in matrix notation \[\Lambda^T \eta \Lambda = \eta \, \Rightarrow \eta \,\Lambda^T \eta = \Lambda^{-1}\] Up to the insertion of \(\eta\)s, \(\Lambda^T\) is hence the same as \(\Lambda^{-1}\). Note that we have the transformation behaviour \[\begin{aligned} x^\mu & \rightarrow x'^\mu = \Lambda^\mu_{\,\, \nu} x^\nu \\ x_\mu = \eta_{ \mu \rho }x^\rho & \rightarrow x'_\mu = \eta_{ \mu \rho } x'^\rho = \eta_{ \mu \rho } \Lambda^\rho_{\,\, \nu} x^\nu = \eta_{ \mu \rho } \Lambda^\rho_{\,\, \nu} \eta^{\nu \sigma} x_\sigma = x_\sigma (\eta \Lambda^T \eta)^\sigma_{\,\,\mu} = x_\sigma (\Lambda^{-1})^\sigma_{\,\,\mu} \end{aligned}\] That's how it had to be, as we constructed Lorentz transformations in such a way that \(x_\mu x ^\mu\) is invariant!
Objects \(x^\mu\) transforming as above are called ‘Lorentz vectors’. Objects transforming like \(x_\mu\) are called ‘Lorentz covectors’. We can think of the matrix \(\eta\) as a map which sends every vector to a covector and vice-versa.
Whenever we contract upper and lower indices, we hence get something that is invariant under the Lorentz group. By extension, it is customary to put upper/lower indices on objects that have the same transformation behaviour as \(x^\mu\) and \(x_\mu\). The same rule for constructing invariants then exists there as well. The positioning of indices hence serves as a book keeping device for the transformation behaviour and consequently for the constructing of Lorentz scalars, i.e. invariant quantities.

1.1. Consider a Lorentz vector with components \(x^\mu\), which transforms under Lorentz transformations as \[x^\mu \rightarrow x'^{\mu} = \Lambda^\mu_{\,\,\, \nu} x^\nu \, .\] Note that throughout this problem we are using summation convention.

  1. Let \(f^{\mu \nu} \equiv x^\mu x^\nu\). Find the transformation behavior of \(f^{\mu \nu}\), \(f^{\mu}_{\,\,\,\, \nu}=x^\mu x_\nu\) and \(f_{\mu \nu}= x_\mu x_\nu\) under Lorentz transformations.

  2. For another Lorentz vector \(y^\mu\), find the transformation behavior of \(f^{\mu \nu} y_\mu\) under Lorentz transformations.

  3. Compute \[\sum_\mu \frac{\partial}{\partial x^\mu} x^\mu \, .\]

  4. Work out the transformation behavior of \[\frac{\partial}{\partial x^\mu}\] under Lorentz transformations. Use c) to argue for the same result.

Let us now examine the global structure of the Lorentz group \(L\). Clearly, the determinant of \(\Lambda\) is \(\pm 1\), so that we get two disconnected components \(L_\pm\), just as for \(O(3)\). The subgroup \(L_+\) of transformations with determinant \(+1\) is called the proper Lorentz group. Furthermore the \((0,0)\) component of \(\Lambda \eta \Lambda^T = \eta\) (which follows from \(\eta\, \Lambda^T \eta = \Lambda^{-1}\)) implies \[\begin{equation} \label{eq:1strowoflambda} 1 = \left(\Lambda^{0}_{\,\, 0 }\right)^2 - \left(\Lambda^{0}_{\,\, 1 }\right)^2 - \left(\Lambda^{0}_{\,\, 2 }\right)^2 - \left(\Lambda^{0}_{\,\, 3 }\right)^2 \, , \end{equation}\] so that \(\left( \Lambda^{0}_{\,\, 0}\right) ^2 \geq 1\), which again gives two possibilities: \(\Lambda^{0}_{\,\, 0} \geq 1\) (such transformations are called orthochronous and are marked with an arrow, as in \(L^\uparrow\)) or \(\Lambda^{0}_{\,\, 0} \leq -1\).

The orthochronous transformations keep the arrow of time pointing in the same direction. Altogether we hence have four components. The maps \(\Lambda_T =\mbox{diag}(-1,1,1,1)\) (time reversal) and \(\Lambda_P =\mbox{diag}(1,-1,-1,-1)\) (parity) generate the whole group together with \(L_+^\uparrow\): we can use \(\Lambda_T\), \(\Lambda_P\) and \(\Lambda_T \Lambda_P\) to map any group element to \(L_+^\uparrow\), which implies we can write any group element in \(L\) as a product of \(\Lambda \in L_+^\uparrow\) with \(\Lambda_T^a \Lambda_P^b\) for \(a,b \in \{0,1\}\).


The component of \(L\) that is continuously connected to the identity is the proper orthochronous Lorentz group \(L_+^\uparrow\). \(L_+^\uparrow\) admits the following decomposition

Theorem 1.1. \(^\ast\) Every proper orthochronous Lorentz transformation \(\Lambda \in L_+^\uparrow\) has a unique decomposition as \[\Lambda = B(\boldsymbol{v}) \begin{pmatrix} 1 & \\ & R \end{pmatrix}\] where \(B(\boldsymbol{v})\) is a boost with parameter \[v^i/c = -\Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0}\] and \(R\) is an element of \(SO(3)\) given by \[R^{ik} = \Lambda^i_{\,\, k} - \frac{1}{1+\Lambda^0_{\,\, 0}} \Lambda^i_{\,\, 0} \Lambda^0_{\,\, k}\, .\]

: First of all, \(\sum_i (\Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0})^2 < 1\), since the \((0,0)\) component of \(\Lambda^T \eta \Lambda = \eta\) (the analogue of \(\eqref{eq:1strowoflambda}\) for the first column of \(\Lambda\)) gives \[\sum_i (\Lambda^{i}_{\,\, 0} / \Lambda^{0}_{\,\, 0})^2 = \frac{(\Lambda^0_{\,\,0})^2-1}{(\Lambda^0_{\,\,0})^2} < 1 \, .\] A boost associated with the velocity \(\boldsymbol{v}\) hence makes sense. From Definition 1.2 above and \(v^i/c = -\Lambda^{i}_{\,\,0}/\Lambda^{0}_{\,\,0}\) it follows that \(B^0_{\,\, 0} (\boldsymbol{v}) = \cosh \lambda = \Lambda^0_{\,\, 0}\) and \(B^0_{\,\, i} (\boldsymbol{v}) = B^i_{\,\, 0} (\boldsymbol{v}) = -v^i/c \,\cosh \lambda = \Lambda^i_{\,\, 0}\). Hence \[B^i_{\,\, j}(\boldsymbol{v}) = \delta^i_{\,\, j} + \frac{1}{1+\Lambda^0_{\,\,0}} \Lambda^i_{\,\,0}\Lambda^j_{\,\,0}\] using \(\eqref{eq:boostij}\). We now show that \[\mathcal{R} := B(-\boldsymbol{v}) \Lambda = B^{-1}(\boldsymbol{v}) \Lambda\] is indeed of the form \(\mathcal{R} = 1 \oplus R\) with \(R\) a rotation, which finishes the proof. We work out \[\begin{aligned} \mathcal{R}^{0}_{\,\, 0} &= (\Lambda^0_{\,\,0})^2 - \sum_i (\Lambda^i_{\,\,0})^2 = 1\\ \mathcal{R}^{0}_{\,\, i} &= \Lambda^0_{\,\,0}\Lambda^0_{\,\,i} - \sum_j \Lambda^j_{\,\,0}\Lambda^j_{\,\,i} = 0 \\ \mathcal{R}^{i}_{\,\, k} &= \Lambda^i_{\,\, k} - \frac{1}{1+\Lambda^0_{\,\, 0}} \Lambda^i_{\,\, 0} \Lambda^0_{\,\, k} \end{aligned}\, ,\] and similarly \(\mathcal{R}^{i}_{\,\, 0} = 0\). Here we used \(\Lambda^T \eta \Lambda = \eta\) repeatedly. As \(\mathcal{R}\) is itself a Lorentz transformation with this block-diagonal structure, the \(3\times 3\) block \(R\) must satisfy \(R^T R = \mathds{1}\) and \(\det R = 1\), so that \(\mathcal{R}\) is a rotation with the right block-diagonal structure as claimed. \(\square\)
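The decomposition can also be checked numerically. The sketch below (assuming numpy; the velocity, angle and helper functions are arbitrary choices for illustration) builds a boost as in Definition 1.2, multiplies it by an embedded rotation, and recovers both factors from the formulas of Theorem 1.1:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def boost(v):
    """Boost of Definition 1.2 for a velocity v (in units of c, |v| < 1)."""
    v = np.asarray(v, dtype=float)
    g = 1.0 / np.sqrt(1.0 - v @ v)                   # cosh(lambda)
    B = np.eye(4)
    B[0, 0] = g
    B[0, 1:] = B[1:, 0] = -g * v
    B[1:, 1:] += g**2 / (1.0 + g) * np.outer(v, v)
    return B

def embed(R):
    """The 4x4 matrix diag(1, R) for a 3x3 rotation R."""
    M = np.eye(4)
    M[1:, 1:] = R
    return M

# build a sample proper orthochronous Lambda as boost * rotation
th = 0.4
R0 = np.array([[np.cos(th), -np.sin(th), 0],
               [np.sin(th),  np.cos(th), 0],
               [0, 0, 1]])
Lam = boost([0.3, -0.2, 0.5]) @ embed(R0)

# recover the boost parameter and the rotation via Theorem 1.1
v = -Lam[1:, 0] / Lam[0, 0]
R = Lam[1:, 1:] - np.outer(Lam[1:, 0], Lam[0, 1:]) / (1.0 + Lam[0, 0])

print(np.allclose(Lam, boost(v) @ embed(R)))       # the decomposition reproduces Lambda
print(np.allclose(R.T @ R, np.eye(3)),
      np.isclose(np.linalg.det(R), 1.0))           # and R is indeed a rotation
```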


To understand the global structure of \(L_+^\uparrow = SO(1,3)_+\), we can repeat the trick we used when describing the relationship between \(SO(3)\) and \(SU(2)\). For a 4-vector \((x^0,x^1,x^2,x^3)\) we write it as a matrix \(M_x\) with \(M_x^\dagger = M_x\): \[M_x := \begin{pmatrix} x^0 + x^3 & x^1 - ix^2 \\ x^1 + i x^2 & x^0-x^3 \end{pmatrix}\, .\] We can now formulate a map \(SL(2,\mathbb{C}) \rightarrow L\) by sending \(g \in SL(2,\mathbb{C})\) \[g \rightarrow F(g) \hspace{1cm} F(g) M_x := g M_x g^\dagger \, .\]
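A small numerical illustration of this map (a sketch assuming numpy; the sample elements \(g\) are arbitrary choices) extracts the matrix \(F(g)\) by letting \(g M_x g^\dagger\) act on basis vectors and then checks that the result lies in \(L_+^\uparrow\):

```python
import numpy as np

# sigma^mu = (1, sigma_i), so that M_x = x^mu sigma_mu is exactly the matrix above
sig = [np.eye(2, dtype=complex),
       np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]], dtype=complex),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def F(g):
    """The 4x4 matrix of x -> x' defined by M_{x'} = g M_x g^dagger."""
    cols = []
    for mu in range(4):
        Me = g @ sig[mu] @ g.conj().T                 # image of the basis vector e_mu
        cols.append([np.real(np.trace(Me @ sig[nu])) / 2 for nu in range(4)])
    return np.array(cols).T                           # uses tr(sigma_mu sigma_nu) = 2 delta_{mu nu}

eta = np.diag([-1.0, 1.0, 1.0, 1.0])
lam, th = 0.8, 0.3
g_boost = np.cosh(lam / 2) * sig[0] + np.sinh(lam / 2) * sig[1]   # det = 1
g_rot   = np.cos(th / 2) * sig[0] - 1j * np.sin(th / 2) * sig[3]  # det = 1, an SU(2) element

for g in (g_boost, g_rot, g_boost @ g_rot):
    Lam = F(g)
    print(np.allclose(Lam.T @ eta @ Lam, eta),        # F(g) is a Lorentz transformation
          np.isclose(np.linalg.det(Lam), 1.0),        # with determinant +1
          Lam[0, 0] > 0)                              # and orthochronous (Lambda^0_0 >= 1)
```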

Proposition 1.1. \(F\) is a surjective group homomorphism from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).

: see the following exercise.

1.2.

  1. Show that \(F\) is a surjective homomorphism from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).
    hint: Try to follow a similar logic as for the homomorphism from \(SU(2)\) to \(SO(3)\) studied before. You can take for granted that \(SL(2,\mathbb{C})\) is connected.

  2. For a rotation in the \(x^1,x^2\)-plane, find the element \(g \in SL(2,\mathbb{C})\) that is mapped to it by \(F\). Repeat the same for a boost along the \(x^1\) direction.

Finally, we can work out the Lie algebra of the Lorentz group. As we have seen, a general Lorentz transformation is uniquely given in terms of an element of \(SO(3)\) (which is real three-dimensional) and a boost (which is parametrized by a real three-dimensional vector \(\boldsymbol{v}\)). We hence conclude that the Lorentz group is a real six-dimensional manifold. This fits with the fact that a real \(4 \times 4\) matrix has \(16\) components and \(\Lambda^T \eta \Lambda = \eta\) imposes \(10\) independent constraints. Using rotation and boost matrices like \(\eqref{eq:simp_boost}\) with parameters gives us paths in the group, and we find that the Lie algebra is generated by the six matrices \[\begin{aligned} l^{01} = \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}\,\,\, l^{02} = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix} \,\,\, l^{03} = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ \end{pmatrix} \\ l^{12} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix}\,\,\, l^{13} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ \end{pmatrix} \,\,\, l^{23} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \\ \end{pmatrix} \end{aligned}\] These can be summarized by \[\begin{equation} \label{eq:ellcomponents} (l^{\mu\nu})^\alpha_{\,\, \beta} = \eta^{\mu \alpha} \delta^\nu_{\,\, \beta} - \eta^{\nu \alpha} \delta^\mu_{\,\, \beta} \, . \end{equation}\] Note that \(\mu\) and \(\nu\) in the equation above label different elements of the Lie algebra, and \(\alpha,\beta\) are the components of the corresponding matrix.

1.3. Verify that the matrices above are elements in the Lie algebra of the Lorentz group.

After a slightly tedious computation one finds that they obey the Lie algebra \[[l^{\mu\nu},l^{\rho\sigma}] = -\eta^{\mu \rho} l^{\nu \sigma} -\eta^{\nu \sigma} l^{\mu \rho} + \eta^{\mu \sigma} l^{\nu \rho} + \eta^{\nu \rho} l^{\mu \sigma}\]
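These commutation relations are straightforward to verify by brute force, e.g. with the following sketch (assuming numpy; it simply loops over all index combinations of \(\eqref{eq:ellcomponents}\)):

```python
import numpy as np
from itertools import product

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def l(mu, nu):
    """(l^{mu nu})^alpha_beta = eta^{mu alpha} delta^nu_beta - eta^{nu alpha} delta^mu_beta."""
    m = np.zeros((4, 4))
    m[mu, nu] += eta[mu, mu]
    m[nu, mu] -= eta[nu, nu]
    return m

comm = lambda a, b: a @ b - b @ a

ok = True
for mu, nu, rho, sig in product(range(4), repeat=4):
    rhs = (-eta[mu, rho] * l(nu, sig) - eta[nu, sig] * l(mu, rho)
           + eta[mu, sig] * l(nu, rho) + eta[nu, rho] * l(mu, sig))
    ok = ok and np.allclose(comm(l(mu, nu), l(rho, sig)), rhs)
print(ok)   # True: the six matrices close into the algebra above
```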

1.2 Representations of the Lorentz group

Let us now investigate representations of the Lorentz group. We have already seen the defining representation: \[x^\mu \rightarrow \Lambda^\mu_\nu x^\nu\] with \[\Lambda^T \eta \Lambda = \eta\] so that \[x^\mu x_\mu = x^\mu \eta_{\mu \nu} x^\nu = - (x^0)^2 + (x^1)^2 + (x^2)^2 + (x^3)^2\] stays invariant. Now we will ask about other representations of this group. Note that \(SO(3)\) is a subgroup of \(L_+^\uparrow\), and that the fundamental representation of its spin group, \(SU(2)\), had physical significance as a spinor.

As \(SO_+(1,3)= L_+^\uparrow\) has \(SL(2,\mathbb{C})\) as a double covering group (Proposition 1.1), it will not be surprising if we make the following

Definition 1.3. The group \(Spin(1,3)\) is equal to the group \(SL(2,\mathbb{C})\).

And it is again a fact of life that what matters for describing relativistic processes in the real world are representations of \(SL(2,\mathbb{C}) = Spin(1,3)\) rather than representations of \(L\).

1.2.0.1 Spinors of the Lorentz Group

For \(SO(3)\) we found irreducible representations by using the Lie algebra of \(SO(3)\), which is the same as the Lie algebra of \(SU(2)\). Not all representations of this algebra descended to representations of \(SO(3)\), but the extra representations we found were exactly the ‘spin 1/2’ spinorial representations of \(SU(2)\) of physical significance. We can use a similar strategy here, which leads us to what are called spinors of the Lorentz group. Our presentation of spinors mostly follows standard textbook treatments; note, however, that different books use somewhat different conventions.
Recall the Lorentz algebra \[\begin{equation} \label{eq:Lalgebra} [l^{\mu\nu},l^{\rho\sigma}] = -\eta^{\mu \rho} l^{\nu \sigma} -\eta^{\nu \sigma} l^{\mu \rho} + \eta^{\mu \sigma} l^{\nu \rho} + \eta^{\nu \rho} l^{\mu \sigma}\, . \end{equation}\]

Proposition 1.2. Let \(\gamma^\mu\), \(\mu=0,1,2,3\) be matrices that obey the algebra \[\{\gamma^\mu,\gamma^\nu\} := \gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2 \eta^{\mu\nu} \mathds{1}\, .\] Then we can construct a representation of the Lorentz algebra, \(\eqref{eq:Lalgebra}\), using the matrices \[S^{\mu \nu} := \tfrac14 [\gamma^\mu,\gamma^\nu]\, .\]

: we need to check that the \(S^{\mu \nu}\) satisfy the Lorentz algebra. First note that the relation \(\{\gamma^\mu,\gamma^\nu\} = 2\eta^{\mu \nu}\) implies that \[\gamma^\mu \gamma^\nu = -\gamma^\nu \gamma^\mu \hspace{1cm} \mbox{for} \,\, \mu \neq \nu\] and \[(\gamma^\mu)^2 = \eta^{\mu \mu} \mathds{1}\hspace{1cm} \mbox{(no summation)}\] We can now work out the commutator \([S^{\mu \nu},S^{\rho \sigma}]\). First note that we can take \(\mu \neq \nu\) and \(\rho \neq \sigma\), as the \(S\) otherwise vanish (as do the corresponding \(l^{\mu\nu}\)). Hence \(S^{\mu \nu} = \tfrac12 \gamma^\mu \gamma^\nu\) and \(S^{\rho \sigma} = \tfrac12 \gamma^\rho \gamma^\sigma\). Let us first assume that \(\mu,\nu,\rho,\sigma\) are all different. We get \[\begin{aligned}[] (\mu,\nu,\rho,\sigma \hspace{.2cm} \mbox{all different}):\\ [S^{\mu \nu},S^{\rho \sigma}] &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - \gamma^\rho \gamma^\sigma {\color{red}\gamma^\mu} \gamma^\nu \right) \\ & \hspace{3.7cm} {\color{red}\swarrow} \\ &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - {\color{red}\gamma^\mu}\gamma^\rho \gamma^\sigma {\color{blue}\gamma^\nu} \right) \\ & \hspace{4.1cm} {\color{blue}\swarrow} \\ &= \frac{1}{4} \left( \gamma^\mu \gamma^\nu \gamma^\rho \gamma^\sigma - \gamma^\mu {\color{blue}\gamma^\nu} \gamma^\rho \gamma^\sigma \right) = 0 \end{aligned}\] As the colours and arrows are supposed to show you, this looks more complicated than it is. All we have done in the second equality is swap \(\gamma^\mu\) past \(\gamma^\rho\) and \(\gamma^\sigma\), which produced two minus signs, hence no sign at all. In the third equality we did the same with \(\gamma^\nu\). This is the same as what \(\eqref{eq:Lalgebra}\) tells us.

Now we assume that \(\mu = \rho\) (note that there is no summation over \(\mu\) in the below expressions): \[\begin{aligned} (\mu = \rho): & \\ & [S^{\mu \nu},S^{\rho \sigma}] &=& \frac{1}{16} \left[[\gamma^\mu,\gamma^\nu],[\gamma^\rho,\gamma^\sigma] \right] = \frac{1}{16} \left[[\gamma^\mu,\gamma^\nu],[\gamma^\mu,\gamma^\sigma] \right] \\ && =& \frac{1}{16}\left[2 \gamma^\mu \gamma^\nu,2\gamma^\mu \gamma^\sigma \right] = \frac{1}{4} \left(\gamma^\mu \gamma^\nu \gamma^\mu \gamma^\sigma - \gamma^\mu \gamma^\sigma \gamma^\mu \gamma^\nu \right)\\ && =& \frac{1}{4} \left(-(\gamma^\mu)^2\gamma^\nu \gamma^\sigma + (\gamma^\mu)^2\gamma^\sigma\gamma^\nu \right) = -\eta^{\mu \mu} S^{\nu \sigma} \, . \end{aligned}\] Here we only had to swap \(\gamma^\mu\) with \(\gamma^\nu\) in the first term and with \(\gamma^\sigma\) in the second term, each giving a minus sign. The final result is exactly what we find from \(\eqref{eq:Lalgebra}\) when \(\mu = \rho\). The remaining cases can be worked out analogously. \(\square\).
REMARK:Algebras of the type \(\{\gamma^a,\gamma^b \}= 2 \eta^{ab}\) where \(\eta^{ab}\) is a symmetric diagonal matrix with entries \(\pm 1\) are called ‘Clifford algebras’. We have already seen an example when discussing the Pauli matrices: the Pauli matrices obey a Clifford algebra generated by three elements with \(\eta^{ab}= \mbox{diag}(1,1,1)\).

When trying to find explicit examples of the four \(\gamma^\mu\) for \(\mu=0,1,2,3\), the above remark is a useful hint. It turns out we need at least \(4 \times 4\) matrices, and one possible choice is

Definition 1.4. The Dirac matrices are \[\gamma^0 = \begin{pmatrix} 0 & \mathds{1}_{2 \times 2} \\ -\mathds{1}_{2 \times 2} & 0 \end{pmatrix}\, , \hspace{1cm} \gamma^i = \begin{pmatrix} 0 & \sigma_i \\ \sigma_i & 0 \end{pmatrix}\,\,\, i = 1,2,3\] where \(\mathds{1}\) is the \(2\times 2\) identity matrix and \(\sigma_i\) are the Pauli matrices \[\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} \,\,, \hspace{.3cm} \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\, .\] Note that the \(\gamma^\mu\) are \(4 \times 4\) matrices which we have written in a \(2 \times 2\) block structure using the \(2 \times 2\) Pauli matrices.

Proposition 1.3. The Dirac matrices obey \(\{\gamma^\mu,\gamma^\nu\} = 2 \eta^{\mu \nu} \mathds{1}_{4 \times 4}\)

: see the following exercise.

1.4.

  1. Show that the Dirac matrices obey \(\{\gamma^\mu,\gamma^\nu\} = 2 \eta^{\mu \nu} \mathds{1}_{4 \times 4}\).

  2. Show the ‘freshers dream’: \[\left(a_\mu \gamma^\mu\right)^2 = a_\mu a^\mu \mathds{1}_{4 \times 4}\]

REMARK:This is not the only realization one can write down (and not Dirac’s original matrices). The above version is often called the ‘Weyl’ or ‘chiral’ representation.

Proposition 1.4. Using the Dirac matrices, the algebra generators \(S^{\mu \nu}\) are \[S^{0i} = \tfrac12 \begin{pmatrix} \sigma_i & 0 \\ 0 & -\sigma_i \end{pmatrix}\,\,, \hspace{1cm} S^{jk} = \frac{i}{2} \epsilon_{jkl} \begin{pmatrix} \sigma_l & 0 \\ 0 & \sigma_l \end{pmatrix}\]

: see the following exercise.

1.5. Using the Dirac matrices, check that the algebra generators \(S^{\mu \nu} = \tfrac14 [\gamma^\mu,\gamma^\nu]\) can be written as \[S^{0i} = \frac{1}{2} \begin{pmatrix} \sigma_i & 0 \\ 0 & -\sigma_i \end{pmatrix}\,\,, \hspace{1cm} S^{jk} = \frac{i}{2} \epsilon_{jkl} \begin{pmatrix} \sigma_l & 0 \\ 0 & \sigma_l \end{pmatrix}\, .\]
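Both the Clifford algebra of Proposition 1.3 and the block form of Proposition 1.4 can be confirmed numerically, e.g. with the following sketch (assuming numpy; it simply builds the matrices of Definition 1.4 and checks the stated identities):

```python
import numpy as np
from itertools import product

I2, Z = np.eye(2, dtype=complex), np.zeros((2, 2), dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)

gamma = [np.block([[Z, I2], [-I2, Z]])] + \
        [np.block([[Z, s], [s, Z]]) for s in (s1, s2, s3)]
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

# Clifford algebra {gamma^mu, gamma^nu} = 2 eta^{mu nu} 1 (Proposition 1.3)
print(all(np.allclose(gamma[m] @ gamma[n] + gamma[n] @ gamma[m], 2 * eta[m, n] * np.eye(4))
          for m, n in product(range(4), repeat=2)))

# block structure of the generators (Proposition 1.4)
S = lambda m, n: (gamma[m] @ gamma[n] - gamma[n] @ gamma[m]) / 4
print(np.allclose(S(0, 1), 0.5 * np.block([[s1, Z], [Z, -s1]])))      # S^{01}
print(np.allclose(S(1, 2), 0.5j * np.block([[s3, Z], [Z, s3]])))      # S^{12} = (i/2) sigma_3 blocks
```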

Definition 1.5. A vector \(\Psi \in \mathbb{C}^4\) transforming under \(Spin(1,3)\) as \[\Psi \rightarrow \Psi' = e^{S^{\mu\nu} \theta_{\mu \nu}} \Psi \equiv \Lambda_{\frac{1}{2}}\Psi\,\,, \hspace{.5cm} \theta_{\mu\nu} \in \mathbb{R}\] is called a Dirac spinor.

REMARK:Note that a Dirac spinor transforms in a reducible representation, as the matrices \(S^{\mu\nu}\) are block-diagonal. The irreducible representations we find by restricting to the blocks are called

Definition 1.6. Decomposing \(\Psi = (\psi_L,\psi_R)\), the objects \(\psi_L\) and \(\psi_R\) are called left-handed, and right-handed Weyl spinors, respectively.

1.6.

For an element \(\Lambda(\theta) = e^{l^{12}\theta}\) of the Lorentz group (\(l^{12}\) is one of the generators of the Lorentz algebra introduced in the lectures) show that \(\Lambda(0) =\Lambda(2 \pi) = \mathds{1}\). Now compare this behavior to the corresponding element of the representation acting on a Dirac spinor: \(\Lambda_{1/2}(\theta) = e^{S^{12}\theta}\).

Let \(\gamma^5:= i \gamma^0\gamma^1\gamma^2\gamma^3\). What is \(\tfrac12 \left(\gamma^5 \pm \mathds{1}\right)\Psi\) for \(\Psi\) a Dirac spinor written in terms of Weyl spinors?

Having defined the ‘Dirac spinor’ representation of the (spin group of the) Lorentz group, we may ask how we can construct Lorentz scalars out of it. Let us denote the complex conjugate of \(\Psi\) by \(\Psi^*\); an obvious guess might then be \[\Psi^* \cdot \Psi = \Psi^*_I \Psi_I\] where \(\Psi_I\) are the components of \(\Psi\). It turns out it is not quite (but almost) this easy. The problem here is that \[\Lambda_{1/2}^\dagger \neq \Lambda_{1/2}^{-1}\, ,\] i.e. \(\Lambda_{1/2}\) is in general not unitary: the boost generators \(S^{0i}\) are hermitian, so that their exponentials are not unitary matrices.

Definition 1.7. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) and \(\Psi^*\) its complex conjugate, we let \[\bar{\Psi} \equiv \Psi^* \gamma^0 \hspace{1cm} \mbox{i.e.} \hspace{1cm} \bar{\Psi}_J \equiv \Psi^*_I \gamma^0_{IJ}\]

Note the slight break with the general convention that a bar signifies complex conjugation, but the above notation is almost universally used, so I will follow this as well.

Proposition 1.5. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) \[\bar{\Psi} \Psi = \Psi^*_I \gamma^0_{IJ} \Psi_J\] is a Lorentz scalar.

: A direct computation (see problems class) shows that \[\Lambda_{1/2}^\dagger \gamma^0 = \gamma^0 \Lambda_{1/2}^{-1} \, .\] Now we can work out \[\begin{aligned} \bar{\Psi} \Psi& = \Psi^*\gamma^0 \Psi \\ &\rightarrow \Psi^* \Lambda_{1/2}^\dagger \gamma^0 \Lambda_{1/2} \Psi = \Psi^* \gamma^0 \Lambda_{1/2}^{-1} \Lambda_{1/2} \Psi = \bar{\Psi} \Psi \, . \end{aligned}\] \(\square\)

Theorem 1.2. For a Dirac spinor \(\Psi\) with components \(\Psi_I\) the expression \[\bar{\Psi} \gamma^\mu \Psi = \Psi^*_I \gamma^0_{IJ} \gamma^\mu_{JK} \Psi_K\] transforms as a Lorentz vector.

Note that this means we can effectively take the \(^\mu\) index we gave the Dirac matrices seriously, which is the reason for this notation. Before showing this, we need an important lemma:

Lemma 1.1. The matrices \(\Lambda_{\frac{1}{2}} = e^{S^{\mu\nu} \theta_{\mu \nu}}\) satisfy \[\Lambda_{\frac{1}{2}}^{-1} \gamma^\mu \Lambda_{\frac{1}{2}} = \Lambda^\mu_{\,\, \nu}\gamma^\nu = \left(e^{\,l^{\,\rho \sigma} \theta_{\rho \sigma}} \right)^\mu_{\,\,\,\, \nu}\gamma^\nu\, .\]

: First we show that \[[\gamma^\mu, S^{\rho\sigma}] = (l^{\rho \sigma})^{\mu}_{\,\,\,\nu} \gamma^\nu\, .\] Don’t get confused by the rhs of this equation: \(\rho\) and \(\sigma\) label the matrices \(l\), and we are talking about the \(\mu\) and \(\nu\) components of that matrix. As observed earlier in the lectures, these can be written as \[(l^{\rho \sigma})^\mu_{\,\, \nu} = \eta^{\rho \mu} \delta^\sigma_{\,\, \nu} - \eta^{\sigma \mu} \delta^\rho_{\,\, \nu} \, .\] As before we may take \(\rho \neq \sigma\), so that \(S^{\rho\sigma} = \tfrac12 \gamma^\rho\gamma^\sigma\). Let’s first take \(\mu \neq \rho\) and \(\mu \neq \sigma\). The rhs then vanishes and we can work out the lhs as \[\tfrac12[\gamma^\mu,\gamma^\rho \gamma^\sigma] = \tfrac12 (\gamma^\mu \gamma^\rho \gamma^\sigma - \gamma^\rho \gamma^\sigma \gamma^\mu) = 0\, .\] Now we take \(\mu =\rho\neq \sigma\) and compute \[(\mu =\rho): \hspace{.5cm}[\gamma^\mu,S^{\rho\sigma}] = \tfrac12[\gamma^\mu,\gamma^\mu \gamma^\sigma] = \eta^{\mu \mu}\gamma^\sigma \,\,\, (\mbox{no summation})\] which equals the rhs of what we want to show for \(\mu =\rho\neq \sigma\). Finally, we take \(\mu =\sigma\neq \rho\) and find \[(\mu =\sigma): \hspace{.5cm}[\gamma^\mu,S^{\rho\sigma}] = \tfrac12[\gamma^\mu,\gamma^\rho \gamma^\mu] = -\eta^{\mu \mu}\gamma^\rho \,\,\, (\mbox{no summation})\] which equals the rhs of what we want to show for \(\mu =\sigma\neq \rho\).
The above is equivalent to the statement that, for very small \(\theta_{\mu\nu}\), \[\begin{equation} \label{eq:proof_wonderful_eq_Lambda} (\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}) \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma}) = \left(\delta^\mu_{\,\, \nu} + \left(\ell^{\rho \sigma} \theta_{\rho \sigma}\right)^\mu_{\,\, \nu}\right) \gamma^\nu \end{equation}\] Let’s look at this equation from the following perspective: consider the vector space of matrices spanned by the \(\gamma^\mu\). We can write any element of such a vector space as \(A := a_\mu \gamma^\mu\). The right hand side can be understood as a linear map acting on \(A\) mapping it to \[A' = a_\mu \left(\delta^\mu_{\,\, \nu} + \left( \ell^{\rho \sigma} \theta_{\rho \sigma}\right)^\mu_{\,\, \nu}\right) \gamma^\nu\] and \(\eqref{eq:proof_wonderful_eq_Lambda}\) says that (for \(\theta_{\rho \sigma}\) very small) we can also write this map as \[A' = (\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}) A (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma})\] We can apply the same map \(n\) times to find \[(\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma})^n \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma})^n = \left(\left(\mathds{1}+ \ell^{\rho \sigma} \theta_{\rho \sigma}\right)^n\right)^\mu_{\,\, \nu} \gamma^\nu\] so also \[\lim_{n\rightarrow \infty} (\mathds{1}- S^{\rho \sigma} \theta_{\rho \sigma}/n)^n \gamma^\mu (\mathds{1}+ S^{\rho \sigma} \theta_{\rho \sigma}/n)^n = \lim_{n\rightarrow \infty} \left(\left(\mathds{1}+ \ell^{\rho \sigma} \theta_{\rho \sigma}/n\right)^n\right)^\mu_{\,\, \nu} \gamma^\nu\] which shows what we wanted to show using the description of the matrix exponential established before. \(\square\)

(of the theorem): We can now work out \[\bar{\Psi} \gamma^\mu \Psi \rightarrow \Psi^\ast \gamma^0 \Lambda_{1/2}^{-1} \gamma^\mu \Lambda_{1/2} \Psi = \Psi^\ast \gamma^0 \Lambda^{\mu}_{\,\,\,\nu} \gamma^\nu \Psi = \Lambda^{\mu}_{\,\,\, \nu} \bar{\Psi} \gamma^\nu \Psi\] where we have used the identity \(\Lambda_{1/2}^{-1} \gamma^\mu \Lambda_{1/2} = \Lambda^{\mu}_{\,\,\,\nu} \gamma^\nu\) shown in the lemma above. \(\square\)
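Both Lemma 1.1 and the identity \(\Lambda_{1/2}^\dagger \gamma^0 = \gamma^0 \Lambda_{1/2}^{-1}\) used in Proposition 1.5 can be checked numerically for sample parameters. The following sketch (assuming numpy and scipy for the matrix exponential; the chosen \(\theta_{\mu\nu}\) are arbitrary) does this for a combination of a boost and a rotation:

```python
import numpy as np
from scipy.linalg import expm

I2, Z = np.eye(2, dtype=complex), np.zeros((2, 2), dtype=complex)
s = [np.array([[0, 1], [1, 0]], dtype=complex),
     np.array([[0, -1j], [1j, 0]], dtype=complex),
     np.array([[1, 0], [0, -1]], dtype=complex)]
gamma = [np.block([[Z, I2], [-I2, Z]])] + [np.block([[Z, si], [si, Z]]) for si in s]
eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def l(mu, nu):                                   # vector generators, eq. (ellcomponents)
    m = np.zeros((4, 4))
    m[mu, nu] += eta[mu, mu]
    m[nu, mu] -= eta[nu, nu]
    return m

S = lambda m, n: (gamma[m] @ gamma[n] - gamma[n] @ gamma[m]) / 4

theta = np.zeros((4, 4))
theta[0, 1], theta[1, 2] = 0.7, 0.3              # an arbitrary boost plus rotation

Lam_half = expm(sum(theta[m, n] * S(m, n) for m in range(4) for n in range(4)))
Lam      = expm(sum(theta[m, n] * l(m, n) for m in range(4) for n in range(4)))

# Lemma 1.1: Lambda_{1/2}^{-1} gamma^mu Lambda_{1/2} = Lambda^mu_nu gamma^nu
print(all(np.allclose(np.linalg.inv(Lam_half) @ gamma[m] @ Lam_half,
                      sum(Lam[m, n] * gamma[n] for n in range(4)))
          for m in range(4)))

# identity used in Proposition 1.5: Lambda_{1/2}^dagger gamma^0 = gamma^0 Lambda_{1/2}^{-1}
print(np.allclose(Lam_half.conj().T @ gamma[0], gamma[0] @ np.linalg.inv(Lam_half)))
```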

Corollary 1.1. For a Lorentz vector \(a^\mu\), \(a_\mu \bar{\Psi} \gamma^\mu \Psi \equiv \bar{\Psi} \slashed{a} \Psi\) transforms as a Lorentz scalar.

: We have already seen that \(a_\mu b^\mu\) for \(a^\mu\) and \(b^\mu\) any Lorentz vectors gives us a scalar. In the theorem above we saw that \(b^\mu = \bar{\Psi} \gamma^\mu \Psi\) is a Lorentz vector, so the statement follows.

1.7. How does \[B^{\mu \nu} \equiv \bar{\Psi} \gamma^\mu \gamma^\nu \Psi\] transform under Lorentz transformations for \(\Psi\) a Dirac spinor?

1.8. For a Dirac spinor \(\Psi\) write \[\bar{\Psi} \gamma^\mu \Psi\] in terms of Weyl spinors.


1.2.0.2 General Representation Theory \(^\ast\)

Working with the Lie algebra \(\mathfrak{so}(1,3)\) of \(L_+^\uparrow\) reveals the following. Taking this as a Lie algebra over \(\mathbb{C}\) instead of \(\mathbb{R}\) we can define \[\begin{aligned} A_1 = \tfrac{1}{2}(-\ell^{23} + i\ell^{01}) \hspace{.5cm} A_2 = \tfrac{1}{2}(\ell^{13} + i\ell^{02}) \hspace{.5cm} A_3 = \tfrac{1}{2}(-\ell^{12} + i\ell^{03})\\ B_1 = \tfrac{1}{2}(-\ell^{23} - i\ell^{01}) \hspace{.5cm} B_2 = \tfrac{1}{2}(\ell^{13} - i\ell^{02}) \hspace{.5cm} B_3 = \tfrac{1}{2}(-\ell^{12} - i\ell^{03}) \end{aligned}\] these satisfy the algebra \[\label{eq:sl2csl2c} \begin{align}[] &[A_i,B_j] = 0\,\, \forall i, j &\\ [A_i,A_j] =\epsilon_{ijk} A_k &\hspace{1cm}& [B_i,B_j] =\epsilon_{ijk} B_k \end{align}\] which is two copies of the Lie algebra \(\mathfrak{sl}(2,\mathbb{C})\). Hence

Proposition 1.6. The complexification of \(\mathfrak{so}(1,3)\) is equal to \(\mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})\): \(\mathfrak{so}(1,3) \otimes \mathbb{C}= \mathfrak{sl}(2,\mathbb{C}) \oplus \mathfrak{sl}(2,\mathbb{C})\).

: We can write \(\mathfrak{so}(1,3) \otimes \mathbb{C}\) as \(\eqref{eq:sl2csl2c}\). \(\square\)
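The commutation relations \(\eqref{eq:sl2csl2c}\) of the combinations \(A_i\) and \(B_i\) defined above can be verified directly from the matrices \(l^{\mu\nu}\), e.g. with the following sketch (assuming numpy):

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])

def l(mu, nu):
    m = np.zeros((4, 4), dtype=complex)
    m[mu, nu] += eta[mu, mu]
    m[nu, mu] -= eta[nu, nu]
    return m

A = [0.5 * (-l(2, 3) + 1j * l(0, 1)),
     0.5 * ( l(1, 3) + 1j * l(0, 2)),
     0.5 * (-l(1, 2) + 1j * l(0, 3))]
B = [0.5 * (-l(2, 3) - 1j * l(0, 1)),
     0.5 * ( l(1, 3) - 1j * l(0, 2)),
     0.5 * (-l(1, 2) - 1j * l(0, 3))]

eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[j, i, k] = 1, -1

comm = lambda x, y: x @ y - y @ x
print(all(np.allclose(comm(A[i], B[j]), 0) for i in range(3) for j in range(3)))
print(all(np.allclose(comm(A[i], A[j]), sum(eps[i, j, k] * A[k] for k in range(3)))
          for i in range(3) for j in range(3)))
print(all(np.allclose(comm(B[i], B[j]), sum(eps[i, j, k] * B[k] for k in range(3)))
          for i in range(3) for j in range(3)))
```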
We have studied representations of \(SU(2)\) in Michaelmas term, and found them to be complex \(d+1\) dimensional and labelled by an integer \(d\). Furthermore, we have seen in exercise 15 that e.g. the complex conjugate representation \(\bar{\mathbf{2}}\) of \(SU(2)\) becomes the same as \(\mathbf{2}\) after a change of basis. This is not true for \(SL(2,\mathbb{C})\): conjugation does not change the eigenvalues of a matrix, and \(g\) and \(\bar{g}\) have, in general, different eigenvalues for \(g \in SL(2,\mathbb{C})\). We hence get different representations after taking complex conjugation. At the level of the algebras we can repeat the classification of irreducible representations of \(\mathfrak{so}(1,3)\) by taking a detour via \(\mathfrak{so}(1,3) \otimes \mathbb{C}\) (just as we did for \(\mathfrak{su}(2)\otimes \mathbb{C}= \mathfrak{sl}(2,\mathbb{C})\)), and it turns out that (we will not prove this here)

Theorem 1.3. The complex irreducible representations of \(SL(2,\mathbb{C})\) are the tensor products \(r_{s_1} \otimes \bar{r}_{s_2}\) labelled by pairs \((s_1,s_2)\) where \(s_i\) take half-integer values. They act on a complex vector space of dimension \((2s_1+1)(2s_2+1)\).

For the first values of \((s_1,s_2)\) these representations have the following names

  • \((0,0)\) This does not transform at all, so this is a scalar.

  • \((\tfrac12,0)\) This is a Weyl spinor. For the same reasons we discussed for representations of \(SU(2)\) vs. \(SO(3)\), this is only a representation of \(Spin(1,3)=SL(2,\mathbb{C})\) but not of \(SO(1,3)_+\).

  • \((0,\tfrac12)\) This is another Weyl spinor.

  • \((\tfrac12,\tfrac12)\) This has dimension four and is a vector. It is the representation we have used to define the Lorentz group. Its action is exactly the one written down in Proposition 1.1 when we studied the map from \(SL(2,\mathbb{C})\) to \(L_+^\uparrow\).

  • \((\tfrac12,0) \oplus (0,\tfrac12)\) This reducible representation is a Dirac spinor.


2 Symmetries and Action Principles

In this section we will review some aspects of action principles for field theories and use these to construct field theories with symmetries. Those of you that have taken Mathematical Physics II should be familiar with many of the things we are doing here. The literature contains many good introductions to actions for both systems with finitely many degrees of freedom and field theories, and the more advanced topics treated here are covered there as well.

2.1 Actions and Symmetries for a finite number of degrees of freedom

Recall the action principle for systems with finitely many degrees of freedom. Given the action \[S[q_i,\dot{q}_i] = \int dt \,\, L(q_i,\dot{q}_i)\] the paths \(q(t)\) described by this system are those of stationary action. Let us consider paths taking us from \(q(t_0)\) to \(q(t_1)\). The stationary points are found by varying \[\begin{aligned} q_i(t) &\rightarrow q_i(t) + \delta q_i(t) \\ \dot{q}_i(t) &\rightarrow \dot{q}_i(t) + \frac{d}{dt} \delta q_i(t) = \dot{q}_i(t) + \delta \dot{q}_i(t) \end{aligned}\] where \(\delta q_i(t)\) is an arbitrary smooth function such that \(\delta q_i(t_0)= \delta q_i(t_1) = 0\). We then set \[\delta S = S[q_i+ \delta q_i,\dot{q}_i + \delta \dot{q}_i] - S[q_i,\dot{q}_i] = 0\] to find \[\begin{aligned} \delta S &= \int dt \,\, \frac{\partial}{\partial q_i} L(q_i,\dot{q}_i) \delta q_i + \frac{\partial}{\partial\dot{q}_i} L(q_i,\dot{q}_i) \delta \dot{q}_i \\ &= \int dt \,\, \frac{\partial}{\partial q_i} L(q_i,\dot{q}_i) \delta q_i - \frac{d}{dt}\left(\frac{\partial}{\partial\dot{q}_i} L(q_i,\dot{q}_i)\right) \delta q_i \\ & = \int dt \,\, \left(\frac{\partial}{\partial q_i} L(q_i,\dot{q}_i)- \frac{d}{dt}\frac{\partial}{\partial\dot{q}_i} L(q_i,\dot{q}_i)\right) \delta q_i \, , \end{aligned}\] where we have used integration by parts in the second equality. The boundary term has been discarded because \(\delta q_i\) vanishes there.

As \(\delta q_i(t)\) is an arbitrary smooth function we hence see that the paths described by the system must obey the Euler-Lagrange equation \[\frac{\partial}{\partial q_i} L(q_i,\dot{q}_i)- \frac{d}{dt}\frac{\partial}{\partial\dot{q}_i} L(q_i,\dot{q}_i) = 0\]
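For concrete Lagrangians this derivation can be automated symbolically. The sketch below (assuming SymPy and its euler_equations helper; the harmonic oscillator Lagrangian is an arbitrary choice for illustration) reproduces the corresponding Euler-Lagrange equation:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
q = sp.Function('q')

# a simple illustrative choice: the harmonic oscillator L = m/2 qdot^2 - k/2 q^2
L = sp.Rational(1, 2) * m * sp.Derivative(q(t), t)**2 - sp.Rational(1, 2) * k * q(t)**2

# returns  -k q(t) - m q''(t) = 0,  i.e. the expected equation of motion
print(euler_equations(L, [q(t)], [t]))
```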

2.1. For a relativistic point particle moving on path \(C\) through space-time, the only Lorentz invariant property of \(C\) is its length. Taking the action of a relativistic particle to be the length of \(C\) and parametrizing \(C\) as \(x^\mu(s)\) we can write this as \[S[x^\mu,\dot{x}^\mu] = -cm\int_C ds = -cm \int_C \sqrt{-\dot{x}^\mu \dot{x}_\mu} ds \, .\] for a constant \(m\) and \(c\) the speed of light and \(\dot{x}^\mu = \partial/\partial s\,\, x^\mu\). \(C\) is called the world-line of the particle.

  1. Show that this action is invariant under Lorentz transformations.

  2. Find the equations of motions and show that they are solved by straight lines in space-time.

  3. Set \(s=t\) and expand the action for slow particles to recover the action of a non-relativistic point particle.

Proposition 2.1. Adding a term \(d/d t F(q,\dot{q})\) to \(L\) does not change the equations of motion.

Definition 2.1. An invertible transformation of the generalized coordinates \[\begin{aligned} q_i &\rightarrow q_i' = f(q_i) \\ \dot{q}_i &\rightarrow \dot{q}_i' = \frac{d}{dt} f(q_i) \end{aligned}\] is called a symmetry of \(L\) if \[L' := L(q_i',\dot{q}_i') = L(q_i,\dot{q}_i) + d/d t F(q_i,\dot{q}_i)\]

Definition 2.2. If the symmetries of \(L\) contain a Lie group \(G\), then elements of the Lie algebra \(\mathfrak{g}\) of \(G\) are called infinitesimal transformations.

REMARK:In the following, we will restrict ourselves to linear group actions. This means that the \(q_i\) transform in a representation \(r\) of \(G\): \[\begin{aligned} \boldsymbol{q} \rightarrow \boldsymbol{q}' = r(g)\boldsymbol{q} \\ \dot{\boldsymbol{q}} \rightarrow \dot{\boldsymbol{q}}' = r(g)\dot{\boldsymbol{q}} \end{aligned}\] and that the infinitesimal transformations act as the associated Lie algebra representation \(\rho\) \[\begin{aligned} \boldsymbol{q} \rightarrow \boldsymbol{q}' &= \left( \mathbf{1} + \rho(\gamma) \right) \boldsymbol{q} \\ \dot{\boldsymbol{q}} \rightarrow \dot{\boldsymbol{q}}' &= \left( \mathbf{1} + \rho(\gamma) \right) \dot{\boldsymbol{q}} \end{aligned}\] for every \(\gamma \in \mathfrak{g}\).

Theorem 2.1. (Noether’s Theorem) Let \(G\) be a Lie group of symmetries of \(L\) acting linearly on the generalized coordinates in a representation \(r\). Then \[Q(\gamma) = \frac{\partial L}{\partial\dot{q}_i} \left(\rho(\gamma)\boldsymbol{q}\right)_i - F(q,\dot{q},\gamma)\] is a conserved quantity for each \(\gamma\in \mathfrak{g}\). Here \(\rho\) is the Lie algebra representation associated with the group representation \(r\).

: (see MPII notes).

REMARK:As the Lie algebra \(\mathfrak{g}\) and its representation \(\rho(\gamma)\) are vector spaces we have for \(a,b \in \mathbb{R}\) that \[a \rho(\gamma) + b \rho(\gamma') \in \rho(\mathfrak{g})\] and (as \(F\) must be a linear function of \(\rho(\gamma)\)) \[Q(a\gamma) + Q(b\gamma') = Q(a\gamma + b\gamma') \, .\] It is of course not surprising that the lhs is again conserved as any function of conserved quantities is again conserved.

Example 2.1. Consider a particle in \(n\) dimensions in a spherically symmetric potential. Then \[S = \int dt \,\, \frac{m}{2} |\dot{\boldsymbol{q}}|^2 - V(|\boldsymbol{q}|^2)\] where \(|\boldsymbol{q}|^2 = \sum_i q_i^2\). The Lagrangian is invariant under rotations in \(O(n)\) which act in the defining representation on \(\boldsymbol{q}\). Hence \[Q(\gamma ) = m\, \dot{\boldsymbol{q}} \cdot (\gamma \boldsymbol{q})\] is conserved for any element \(\gamma\) of the Lie algebra of \(O(n)\), which equals the Lie algebra of \(SO(n)\). E.g. recalling the form of the matrices in the Lie algebra of \(SO(3)\) we can write \[\gamma = \sum_i \alpha_i \ell_i\] for \(\alpha_i \in \mathbb{R}\) and matrices \(\ell_i\) with components \((\ell_i)_{jk} = \epsilon_{ijk}\). This gives the conserved quantity \[Q = \alpha_i L_i\] for any choice of \(\alpha_i \in \mathbb{R}\) and \(\boldsymbol{L} = \boldsymbol{q} \times \boldsymbol{p}\). Hence each component of the angular momentum \(\boldsymbol{L}\) is conserved. Note that the appearance of the \(\epsilon_{ijk}\) in the vector cross product is now seen to be due to the form of the matrices in the Lie algebra of \(SO(3)\).
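A quick numerical illustration of this conservation law (a sketch assuming scipy and numpy, with an arbitrary choice of central potential and initial data) integrates the equations of motion and compares \(\boldsymbol{q}\times\boldsymbol{p}\) at the start and the end of the evolution:

```python
import numpy as np
from scipy.integrate import solve_ivp

# motion in the central potential V(r) = -1/r (an arbitrary choice), with m = 1
def rhs(t, y):
    q, p = y[:3], y[3:]
    r = np.linalg.norm(q)
    return np.concatenate([p, -q / r**3])        # dq/dt = p,  dp/dt = -dV/dq

y0 = np.array([1.0, 0.0, 0.2, 0.1, 1.1, -0.3])   # arbitrary initial position and momentum
sol = solve_ivp(rhs, (0.0, 20.0), y0, rtol=1e-10, atol=1e-12)

L_start = np.cross(y0[:3], y0[3:])
L_final = np.cross(sol.y[:3, -1], sol.y[3:, -1])
print(L_start, L_final)                          # each component of q x p is conserved
```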

2.2 Actions for Field Theories

Let us now consider field theories, i.e. instead of functions \(q(t)\) we consider functions \(\phi(t,\boldsymbol{x})\) that depend on \(\boldsymbol{x}\) as well. Consequently, the equations of motion for \(\phi(t,\boldsymbol{x})\) will have to involve derivatives w.r.t. the components of \(\boldsymbol{x}\) as well.

An action for a field theory with a field \(\phi\) is written in terms of a Lagrangian density \(\mathcal{L}\) as \[S[\phi,\partial_t \phi,\partial_i \phi] = \int d^4 x\,\, \mathcal{L}(\phi,\partial_t \phi,\partial_i \phi)\,,\] where we use \(\partial_i\) as a shorthand for \(\partial/\partial x_i\) and \(\partial_t\) as a shorthand for \(\partial/\partial t\). We now vary \[\begin{aligned} \phi &\rightarrow \phi + \delta \phi \\ \partial_t \phi &\rightarrow \partial_t \phi + \delta \partial_t\phi = \partial_t \phi + \partial_t \delta \phi \\ \partial_i \phi &\rightarrow \partial_i \phi + \delta \partial_i \phi = \partial_i \phi + \partial_i \delta \phi \\ \end{aligned}\] Let us set the limits of the integral to be that of a box \(t = t_a \dots t_b\), \(x_i = a_i \dots b_i\). The variational principle now tells us that \(\delta S = 0\), where \(\delta S\) is \(S[\phi+ \delta \phi,\partial_t \phi + \delta \partial_t\phi,\partial_i \phi+ \delta \partial_i\phi] - S[\phi,\partial_t \phi,\partial_i \phi]\). Expanding \(\delta S\) to linear order in the variation of \(\phi\) gives us \[\begin{aligned} 0 & = \delta S = \int d^4 x \left(\frac{\partial}{\partial\phi} \mathcal{L}\right) \delta \phi + \left(\frac{\partial}{\partial(\partial_t \phi)} \mathcal{L}\right) \delta \partial_t \phi + \left(\frac{\partial}{\partial(\partial_i \phi)} \mathcal{L}\right) \delta \partial_i \phi \\ &= \int d^4 x \left(\frac{\partial}{\partial\phi} \mathcal{L}\right) \delta \phi + \left(\frac{\partial}{\partial(\partial_t \phi)} \mathcal{L}\right) \partial_t \delta \phi + \left(\frac{\partial}{\partial(\partial_i \phi)} \mathcal{L}\right) \partial_i \delta \phi \end{aligned}\] Similar to the treatment of systems with finitely many degrees of freedom, we now integrate the terms that involve derivatives of \(\delta \phi\) by parts to get something proportional to \(\delta \phi\). This gives \[0 = \int d^4 x \left[\left(\frac{\partial}{\partial\phi} \mathcal{L}\right) - \partial_t \left(\frac{\partial}{\partial(\partial_t \phi)} \mathcal{L}\right) - \partial_i \left(\frac{\partial}{\partial(\partial_i \phi)} \mathcal{L}\right) \right]\delta \phi + B\] where \(B\) are the boundary terms \[\label{eq:surface_term} \begin{align} B &= \int d^3x \left[\left(\frac{\partial}{\partial(\partial_t \phi)} \mathcal{L}\right) \delta \phi \right]_{t=t_a}^{t=t_b} \\ &+ \int dt dx_1 dx_2 \left[\left(\frac{\partial}{\partial(\partial_3 \phi)} \mathcal{L}\right) \delta \phi \right]_{x_3=a_3}^{x_3=b_3} \\ &+ \int dt dx_1 dx_3 \left[\left(\frac{\partial}{\partial(\partial_2 \phi)} \mathcal{L}\right) \delta \phi \right]_{x_2=a_2}^{x_2=b_2} \\ &+ \int dt dx_2 dx_3 \left[\left(\frac{\partial}{\partial(\partial_1 \phi)} \mathcal{L}\right) \delta \phi \right]_{x_1=a_1}^{x_1=b_1} \end{align}\] We will now assume that the field \(\phi\) vanishes when approaching infinity. We can then send the volume of the box to infinity, which makes all boundary terms vanish. Alternatively, we can keep \(\phi\) at the boundary of the box we have chosen fixed, so that \(\delta \phi\) vanishes there.

As \(\delta \phi\) is arbitrary, we conclude that

Theorem 2.2. The Euler-Lagrange equations for a field theory are \[\left(\frac{\partial}{\partial\phi} \mathcal{L}\right) - \partial_t \left(\frac{\partial}{\partial(\partial_t \phi)} \mathcal{L}\right) - \partial_i \left(\frac{\partial}{\partial(\partial_i \phi)} \mathcal{L}\right) =0\]

REMARK:If the action \(S\) depends on several fields and their derivatives, we get an Euler-Lagrange equation as above for every single field. I.e. indexing the fields by an index \(I\) \[S[\phi_I,\partial_t \phi_I,\partial_i \phi_I] = \int d^4 x\,\, \mathcal{L}(\phi_I,\partial_t \phi_I,\partial_i \phi_I)\,,\] we have \[\left(\frac{\partial}{\partial\phi_I} \mathcal{L}\right) - \partial_t \left(\frac{\partial}{\partial(\partial_t \phi_I)} \mathcal{L}\right) - \partial_i \left(\frac{\partial}{\partial(\partial_i \phi_I)} \mathcal{L}\right) =0\] for every \(I\).

Example 2.2. Let us consider the theory of a real scalar field \(\phi\) with action \[S = \int dt d^3x \,\,\, -(\partial_t \phi)^2 + (\partial_i \phi)^2 + m^2 \phi^2 \, .\] Then the equation of motion for \(\phi\) is \[\left( \partial_t^2 - \boldsymbol{\nabla}^2 + m^2 \right) \phi = 0\]
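The same symbolic approach as before reproduces this equation of motion (a sketch assuming SymPy's euler_equations helper):

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, x, y, z, m = sp.symbols('t x y z m')
phi = sp.Function('phi')
f = phi(t, x, y, z)

# the Lagrangian density of the action above
L = -sp.Derivative(f, t)**2 + sum(sp.Derivative(f, xi)**2 for xi in (x, y, z)) + m**2 * f**2

# up to an overall factor this is  (d_t^2 - nabla^2 + m^2) phi = 0
print(euler_equations(L, [f], [t, x, y, z]))
```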

Example 2.3. We can also use complex fields to write actions. Let us consider the theory of a complex scalar field \(\phi\) with action \[S = \int dt d^3x \,\,\, -|\partial_t \phi|^2 + |\partial_i \phi|^2 + m^2 |\phi|^2 \, .\] Then the equation of motion for \(\phi\) is again \[\begin{equation} \label{eomphilinear} \left( \partial_t^2 - \boldsymbol{\nabla}^2 + m^2 \right) \phi = 0 \, . \end{equation}\] This can be seen in two different ways:

  1. We can just write \(S\) in terms of real and imaginary parts (which are two real fields) and derive their equations of motion, which can be combined to give the above.

  2. Treating the real and imaginary parts of \(\phi\) as different fields is equivalent to (after a complex redefinition of fields) treating \(\phi\) and \(\bar{\phi}\) as independent fields. The equation of motion of \(\bar{\phi}\) is \(\eqref{eomphilinear}\).

Example 2.4. Consider a complex scalar field \(\psi\) with action \[S = \int dt d^3x \,\,\, -|\boldsymbol{\nabla} \psi|^2 + \tfrac12 i \left(\bar{\psi} \partial_t \psi - \psi \partial_t \bar{\psi} \right)\] While the first term is obviously real, taking the complex conjugate of the second term shows that the whole action is real. The equations of motion for \(\bar{\psi}\) are \[\begin{aligned} 0 = & - \partial_i \left(\frac{\partial}{\partial(\partial_i \bar{\psi})} \mathcal{L}\right) - \partial_t \left(\frac{\partial}{\partial(\partial_t \bar{\psi})} \mathcal{L}\right) + \left(\frac{\partial}{\partial\bar{\psi}} \mathcal{L}\right) \\ = & \boldsymbol{\nabla} \cdot \boldsymbol{\nabla} \psi + \tfrac12 i \partial_t \psi + \tfrac12 i \partial_t \psi \\ = & \Delta \psi + i \partial_t \psi \end{aligned}\] This is nothing but the Schrödinger equation for a free particle with \(m=1/2\) and \(\hbar=1\), but now \(\psi\) is just a classical field. The Euler-Lagrange equations for \(\psi\) give the complex conjugate of the above equations.

We can repeat the same steps above to deal with actions written in a Lorentz covariant notation (\(\partial_\mu = \partial/\partial x^\mu\)) \[S[\phi,\partial_\mu \phi] = \int d^4 x\,\, \mathcal{L}(\phi,\partial_\mu \phi)\,,\] to arrive at

Theorem 2.3. The Euler-Lagrange equations specifying stationary points of the action \(S[\phi,\partial_\mu \phi] = \int d^4 x \,\, \mathcal{L}(\phi,\partial_\mu \phi)\) of a field theory are \[\frac{\partial}{\partial\phi} \mathcal{L}(\phi,\partial_\mu \phi) - \partial_\mu \left(\frac{\partial}{\partial(\partial_\mu \phi)} \mathcal{L}(\phi,\partial_\mu \phi)\right) = 0\]

2.2. Consider the following action of a real scalar field \[S = \int d^4x \,\partial_\mu \phi \partial^\mu \phi + m^2 \phi^2 \, .\] Show that the equations of motion are \[(-\partial_\mu \partial^\mu + m^2) \phi = 0 \, .\]

2.3 Noether’s theorem

Let us now consider symmetries of field theories. Our discussion will mirror what we did in the section above, except that we will restrict ourselves to linear maps acting on the fields, and we will assume for simplicity that \(\mathcal{L}\) is invariant, i.e. the group action does not lead to a boundary term.

Definition 2.3. For a Lie group \(G\) and a representation \(r:G\rightarrow GL(V)\), a linear map \[\label{eq:globalgroupactionfields} \begin{align} \phi_I &\rightarrow \phi_I' &=& \left[ r(g) \boldsymbol{\phi} \right]_I\\ \partial_\mu \phi_I &\rightarrow \partial_\mu\phi_I' &=& \left[ r(g) \partial_\mu\boldsymbol{\phi} \right]_I \end{align}\] is called a symmetry of \(\mathcal{L}\) if \[\mathcal{L}(\phi_I,\partial_\mu \phi_I) = \mathcal{L}(\phi_I',\partial_\mu \phi_I')\]

REMARK:We could allow a total derivative here as well, but will content ourselves with this stricter definition here.

REMARK: The infinitesimal version of the map \(\eqref{eq:globalgroupactionfields}\) is (here \(g = e^\gamma\)) \[\phi_I \rightarrow \phi_I + \delta_\gamma \phi_I = \left[ \left(\mathds{1}+ \rho(\gamma) \right) \boldsymbol{\phi} \right]_I\] i.e. \[\begin{equation} \label{eq:delphi} \delta_\gamma \phi_I = \left[ \rho(\gamma) \boldsymbol{\phi} \right]_I\, , \end{equation}\] where \(\rho\) is the Lie algebra representation associated with the group representation \(r\).

Theorem 2.4. Let \(G\) be a Lie group of symmetries of \(\mathcal{L}\) acting in a representation \(r\) on the fields \(\phi_I\). Then \[j^\mu := \left[ \rho(\gamma) \boldsymbol{\phi} \right]_I \frac{\partial\mathcal{L}}{\partial\left(\partial_\mu \phi_I \right)}\] (summation convention: there is a sum over \(I\) in the above) is a conserved current: \[\partial_\mu j^\mu = 0 \, .\]

: For the associated infinitesimal transformation we have (to linear order in \(\delta \phi_I\) and using summation convention) \[\begin{aligned} 0 &= \delta \mathcal{L} = \frac{\partial\mathcal{L}}{\partial\phi_I} \delta_\gamma \phi_I + \frac{\partial\mathcal{L} }{\partial\partial_\mu \phi_I} \partial_\mu \delta_\gamma \phi_I = \partial_\mu \left(\frac{\partial\mathcal{L} }{\partial\partial_\mu \phi_I}\right) \delta_\gamma \phi_I + \frac{\partial\mathcal{L} }{\partial\partial_\mu \phi_I} \partial_\mu \delta_\gamma \phi_I \\ & = \partial_\mu \left(\frac{\partial\mathcal{L} }{\partial\partial_\mu \phi_I}\delta_\gamma \phi_I \right) \end{aligned}\] where we have used the Euler-Lagrange equations of motion. Using \(\eqref{eq:delphi}\) then shows the statement.

REMARK: As a consequence, we can write \[\frac{\partial}{\partial t} \int_V d^3x j^0 + \int_{\partial V}d \boldsymbol{A}^i j^i = 0\, .\] for any volume \(V\). If the charge inside the volume (the first term) changes, it must be due to a current leaving the volume. Letting \(V\) increase arbitrarily and recalling our assumption that fields at infinity vanish the right hand term is zero and we can write \[\frac{\partial}{\partial t} \int_{\mathbb{R}^3}d^3x j^0 = \frac{\partial}{\partial t} Q = 0 \, ,\] i.e. the total charge is unchanged.

2.4 Lorentz symmetry and field theories

The symmetries we have considered above are not symmetries of space-time, but ’internal symmetries’ acting on the fields. In relativity, we demand invariance of physics under maps \[\begin{aligned} x^\mu &\rightarrow x'^\mu = \Lambda^\mu_{\,\,\nu} x^\nu \\ \boldsymbol{x}& \rightarrow \boldsymbol{x}' = \Lambda \boldsymbol{x} \end{aligned}\] We can take the following perspective on Lorentz transformations: we map our coordinates of space-time \(\boldsymbol{x}\) to \(\Lambda \boldsymbol{x}\). If a given solution has an isolated zero at some \(\boldsymbol{x}_0\), \(\phi(\boldsymbol{x}_0) = 0\), this will map \(\phi(\boldsymbol{x})\) to a new solution \(\phi'(\boldsymbol{x})\) that has a zero at \(\Lambda \boldsymbol{x}_0\), i.e. the action of a group element \(\Lambda\) of the Lorentz group on our scalar field \(\phi\) is \[\Lambda: \phi \rightarrow \phi'(\boldsymbol{x}) = \phi(\Lambda^{-1}\boldsymbol{x}) \, .\] Note that this plays nicely with the group composition \[(\Lambda_1 \circ \Lambda_2 )\phi \rightarrow \Lambda_1 \phi(\Lambda_2^{-1}\boldsymbol{x}) \rightarrow \phi(\Lambda_2^{-1} \Lambda_1^{-1} \boldsymbol{x}) = \phi((\Lambda_1 \Lambda_2)^{-1} \boldsymbol{x}) \, .\]

Definition 2.4. A field \(\phi\) is a Lorentz scalar if its behavior under Lorentz transformations is \[\phi(\boldsymbol{x}) \rightarrow \phi(\Lambda^{-1} \boldsymbol{x}) \, .\]

In a similar fashion

Definition 2.5. A field (actually, four fields) \(A^\mu\) is a Lorentz vector if its behavior under Lorentz transformations is \[A^\mu(\boldsymbol{x}) \rightarrow \Lambda^\mu_{\,\, \nu}A^\nu(\Lambda^{-1}\boldsymbol{x}) \, .\]

Definition 2.6. A field (actually, four fields) \(A_\mu\) is a Lorentz covector if its behavior under Lorentz transformations is \[A_\mu(\boldsymbol{x}) \rightarrow \left(\Lambda^{-1}\right)^{\nu}_{\,\,\, \mu}A_\nu(\Lambda^{-1}\boldsymbol{x}) \, .\]

Definition 2.7. A field (actually, four fields) \(\Psi\) is a Dirac spinor if its behavior under Lorentz transformations is \[\Psi(\boldsymbol{x}) \rightarrow \Lambda_{1/2} \Psi (\Lambda^{-1}\boldsymbol{x}) \, .\]

For a field theory, invariance under Lorentz transformations means that if a scalar field \(\phi(\boldsymbol{x})\) is a solution to our equations of motion, then so must be \(\phi(\Lambda^{-1} \boldsymbol{x})\). What this means is that we do not want to transform the derivatives in our equations of motion, but only the arguments of the fields.

Definition 2.8. A field theory is called Lorentz invariant if for every solution \(\phi(\boldsymbol{x})\) to the equations of motion, \(\phi(\Lambda^{-1} \boldsymbol{x})\) is also a solution, for all \(\Lambda \in L_+^\uparrow\).

REMARK:You might find it surprising that we only ask for \(L_+^\uparrow\) here. The reason is that parity and time reversal are not symmetries of fundamental physics, but we still want to call such theories Lorentz invariant as they are invariant under rotations and boosts.

Proposition 2.2. Transforming only the argument of the field, but not the derivative, \(\partial_\nu \phi(\boldsymbol{x})\) is a Lorentz covector field. : Let \(\boldsymbol{y} = \Lambda^{-1} \boldsymbol{x}\). We have \[\partial_\mu \phi(\Lambda^{-1} \boldsymbol{x} ) = \frac{\partial}{\partial x^\mu} \phi(\Lambda^{-1}\boldsymbol{x})= \left(\Lambda^{-1}\right)^{\nu}{}_{\mu} \frac{\partial}{\partial y^\nu} \phi(\boldsymbol{y})\] This also implies that \(\partial^\nu \phi(\boldsymbol{x}) = \eta^{\nu \mu}\partial_\mu \phi(\boldsymbol{x})\) transforms as Lorentz vector (besides replacing \(x\) by \(y\)).

As we find our field equations from a Lagrangian \(\mathcal{L}\), \(\phi(\boldsymbol{x})\) being a solution implies that \(\phi(\Lambda^{-1}\boldsymbol{x})\) is also a solution to the equations of motion if \(\mathcal{L}\) behaves as a Lorentz scalar. Hence

Definition 2.9. An action \(S = \int d^4x \mathcal{L}\) is called Lorentz invariant if the associated Lagrangian \(\mathcal{L}\) is a Lorentz scalar for \(\Lambda \in L^\uparrow_+\): \[\Lambda:\mathcal{L}(\phi(\boldsymbol{x}),\partial_\mu \phi(\boldsymbol{x})) \rightarrow \mathcal{L}(\phi'(\boldsymbol{x}),\partial_\mu \phi'(\boldsymbol{x})) = \mathcal{L}(\phi(\Lambda^{-1}\boldsymbol{x}),\partial_\mu \phi(\Lambda^{-1}\boldsymbol{x}))\]

REMARK:

As the Lagrangian is a scalar, all that is happening is that \(\boldsymbol{x}\) is replaced by \(\boldsymbol{y}\). Furthermore, \(d^4x\) transforms with the Jacobian, which is just \(\det \Lambda = 1\) for \(\Lambda \in L^\uparrow_+\). Hence \(S\) is invariant if we integrate over all spacetime. This implies that for an extremum \(\phi(\boldsymbol{x})\) of \(S\), \(\phi(\boldsymbol{y})\) is also an extremum. As extrema are found as solutions to the equations of motion, it must hence be that for any solution \(\phi(\boldsymbol{x})\), \(\phi(\boldsymbol{y}) = \phi(\Lambda^{-1}\boldsymbol{x})\) must also be a solution.

REMARK:Above we have been carefully keeping track of the change of coordinates from \(\boldsymbol{x}\) to \(\boldsymbol{y}\). As you can see, the template is to replace \(\boldsymbol{x}\) by \(\boldsymbol{y}\) and to simultaneously transform all indices with appropriate matrices \(\Lambda\). It is common to suppress the change from \(\boldsymbol{x}\) to \(\boldsymbol{y}\) and simply summarize the transformation of scalars, their derivatives, vectors, and spinors as \[\begin{aligned} \phi &\rightarrow \phi \\ \partial^\mu \phi &\rightarrow \Lambda^\mu_{\,\, \nu} \partial^\nu \phi \\ A^\mu &\rightarrow \Lambda^\mu_{\,\, \nu} A^\nu \\ \Psi &\rightarrow \Lambda_{1/2} \Psi \end{aligned}\]

2.3. Consider the action \[S = \int d^4x \bar{\Psi} \left( \gamma^\mu \partial_\mu + m \right) \Psi \, .\]

  1. Show that it is Lorentz invariant.

  2. Find the equations of motion.

  3. Find the conserved charge associated to the \(U(1)\) symmetry \(\Psi \rightarrow e^{i\theta} \Psi\).

  4. Show that \[\left( \gamma^\mu \partial_\mu - m \right)\left( \gamma^\nu \partial_\nu + m \right) = \partial_\mu \partial^\mu - m^2\]

2.4. Consider a field \(\Phi\) transforming in the adjoint representation of the Lie group \(SU(n)\). Show that \[\nonumber S = \int d^4x \,\, tr\left( \partial_\mu \Phi \partial^\mu \Phi \right)\] is invariant under the action of \(SU(n)\) and find the associated conserved current.

3 Abelian gauge theories

In the rest of this term we will learn how to formulate gauge theories, a special subset of field theories which describe most forces in modern physics. For example, the Standard Model of elementary particles is a gauge theory based on the group \(G=SU(3)\times SU(2)\times U(1)\), and accounts for the strong, weak and electromagnetic interactions. In this chapter we will start by looking at abelian gauge theories, the formulation of which is based on an abelian Lie group, called the gauge group. The abelian restriction will allow us to get acquainted with the key concepts of gauge theory without complicating the underlying mathematics too much.

3.1 Electromagnetism as a \(U(1)\) gauge theory

We will soon delve into the abstract idea that underlies abelian gauge theories, starting from a field theory with a \(U(1)\) global symmetry and promoting the constant \(U(1)\) parameter to a local function of spacetime. But before we do that, let us take a fresh look at Maxwell’s theory of electromagnetism, and describe it as a relativistic field theory that can be based on a gauge symmetry principle. There are many excellent references for this foundational material, and the coupling of electromagnetism to field theories, which we will study later, is a standard topic in nearly all books on quantum field theory.

3.1.1 Maxwell’s equations and relativity

The Maxwell equations, which describe the electric field (\(\boldsymbol{E}\)) and magnetic field (\(\boldsymbol{B}\)) induced by the electric charge density \(\rho\) and current \(\boldsymbol{j}\), are (in natural units) \[\label{Maxwell_eqn} \begin{align} \boldsymbol{\nabla} \cdot \boldsymbol{E} & = \rho ~,&\hspace{1cm}& \boldsymbol{\nabla} \times \boldsymbol{B} -\frac{\partial\boldsymbol{E}}{\partial t} = \,\boldsymbol{j}~,\\ \boldsymbol{\nabla} \cdot \boldsymbol{B} & = 0 ~, && \boldsymbol{\nabla} \times \boldsymbol{E} + \frac{\partial\boldsymbol{B}}{\partial t}= 0~. \end{align}\] We call the equations in the first line the inhomogeneous Maxwell equations, since they have sources for the electric and magnetic fields on the right-hand side, and the equations in the second line the homogeneous Maxwell equations, since they don’t.

The behaviour of Maxwell equations under Lorentz transformations can be worked out as follows. Starting from an inertial frame with a charge distribution \(\rho\) at rest, we can perform a boost \[\Lambda = \begin{pmatrix} \cosh \lambda & \sinh \lambda & 0 & 0 \\ \sinh \lambda &\cosh \lambda & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 &0 & 0 & 1 \end{pmatrix}\] to another inertial frame moving at a relative speed \(\tanh \lambda\), in which there is now also a non-zero current \(\boldsymbol{j}\). As resting charges only source electric fields and steady currents source magnetic fields, this implies that Lorentz transformations will also mix up electric and magnetic fields.

In order to understand how to write the Maxwell equations in a manifestly Lorentz invariant way, and how the electric and magnetic field transform under Lorentz transformations, let us first focus on the sources appearing in the right-hand side of the inhomogeneous Maxwell equations. The charge density \(\rho\) and the current \(\boldsymbol{j}\) can be repackaged into a Lorentz \(4\)-vector \(J^\mu\), such that \(J^0 = \rho\) and \(J^i = j^i\). The continuity equation (or local conservation law) \[\begin{equation} \label{cont_eqn_noncovariant} \frac{\partial\rho}{\partial t}+ \boldsymbol{\nabla}\cdot \boldsymbol{j}=0 \end{equation}\] can then be written as \[\begin{equation} \label{cont_eqn_covariant} \partial_\mu J^\mu=0~. \end{equation}\] Since \(J^\mu\) is a Lorentz vector, a Lorentz transformation acts as \[\begin{equation} \label{Jmu} J^\mu (x)\mapsto J'^\mu (x)=\Lambda^\mu{}_\nu J^\nu(\Lambda^{-1}x) ~, \end{equation}\] which indeed leaves the continuity equation invariant. 4

REMARK:
In the following I may use the shorthand notation \(J^\mu \mapsto \Lambda^\mu{}_\nu J^\nu\) for the transformation law \(\eqref{Jmu}\), with the understanding that if the object in question is a field then the argument must transform appropriately.

The transformation property of \(J^\mu\) and the assumption of Lorentz symmetry (or ‘Lorentz invariance’) requires that the inhomogeneous Maxwell equations in the first line of \(\eqref{Maxwell_eqn}\) be the temporal and spatial components of a Lorentz 4-vector equation respectively. The similarity between the two rows of \(\eqref{Maxwell_eqn}\) suggests that the same should be true of the homogeneous Maxwell equations in the second line of \(\eqref{Maxwell_eqn}\).

Let’s now focus on the left-hand side of the inhomogeneous Maxwell equations, which is equal to the current 4-vector \(J^\mu\). Spacetime derivatives appear linearly, so we need a \(\partial_\nu\) on the left-hand side, with the \(\nu\) index suitably contracted with a tensor linear in the electric and magnetic field, in such a way that a \(\mu\) index stays free (that is, uncontracted). The simplest option is that the left-hand side is \(\partial^\mu X\) for a scalar field \(X\), but an equation of the form \(\partial^\mu X = J^\mu\) is immediately ruled out by counting degrees of freedom: it cannot account for the electric and magnetic fields \(\boldsymbol{E}\) and \(\boldsymbol{B}\) and hence reproduce the left-hand side of the inhomogeneous Maxwell equation. In order to match the upper index of \(J^\mu\) on the right-hand side, the derivative \(\partial_\nu\) must therefore act on a second rank Lorentz tensor \(F^{\mu\nu}\), 5 which is linear in the electric and magnetic field, with the \(\nu\) index contracted so that only the \(\mu\) index remains free.

The electric and magnetic field \(\boldsymbol{E}\) and \(\boldsymbol{B}\) have \(3+3=6\) components in total, whereas a second rank tensor 6 has \(4\cdot 4=16\) components, so there still appears to be a mismatch of degrees of freedom. This is fixed by requiring that \(F^{\mu\nu}\) be antisymmetric, that is \(F^{\mu\nu}=-F^{\nu\mu}\) : then it has \(\frac{4\cdot 3}{2}=6\) components.

To summarize, we are led to write the inhomogeneous Maxwell equations as \[\begin{equation} \label{Maxwell_inhom} \partial_\nu F^{\mu\nu} = J^\mu \end{equation}\] for a second rank antisymmetric tensor \(F^{\mu\nu}=-F^{\nu\mu}\) which is linear in \(\boldsymbol{E}\) and \(\boldsymbol{B}\). Comparing with the first line of \(\eqref{Maxwell_eqn}\) determines \[\left[F^{\mu \nu}\right] = \begin{pmatrix} 0 & E_1 & E_2 & E_3 \\ -E_1 & 0 & B_3 & -B_2 \\ -E_2 & -B_3 & 0 & B_1 \\ -E_3 & B_2 & -B_1 & 0 \end{pmatrix} ~.\]

Lowering indices to \(F_{\mu\nu}=\eta_{\mu\rho}\eta_{\nu\sigma}F^{\rho\sigma}\), we have \[\left[F_{\mu \nu}\right] = \begin{pmatrix} 0 & -E_1 & -E_2 & -E_3 \\ E_1 & 0 & B_3 & -B_2 \\ E_2 & -B_3 & 0 & B_1 \\ E_3 & B_2 & -B_1 & 0 \end{pmatrix} ~.\] In other words for \(i=1,2,3\) we have \[\begin{equation} \label{F_to_EB} F_{i0} = -F_{0i} = E_i ~,\hspace{1cm} F_{ij} = \epsilon_{ijk} B_k ~. \end{equation}\] \(F_{\mu\nu}\) is sometimes called the Faraday tensor, and most commonly the field strength tensor, because its components encode the strength of the electric and magnetic fields.

By a similar logic, it is not hard to see that the homogeneous Maxwell equations in the second line of \(\eqref{Maxwell_eqn}\) can also be written covariantly – that is, in Lorentz tensor notation – as \[\begin{equation} \label{Maxwell_hom} \epsilon^{\mu\nu\rho\sigma}\partial_\nu F_{\rho\sigma} = 0~, \end{equation}\] where \(\epsilon^{\mu \nu \rho \sigma}\) is the completely antisymmetric tensor with four indices, normalized such that \(\epsilon^{0123}=1\).

REMARKS:

  1. In practice this means that one gets a relative minus sign when swapping any two indices. E.g. \(\epsilon^{3201}=-1\) as one needs to swap indices an odd number of times to arrive there from \(\epsilon^{0123}\). One way to see that is \[\epsilon^{3201}=-\epsilon^{3021}=\epsilon^{1023}=-\epsilon^{0123}~.\]

  2. A fancier mathematical way of saying the same thing is: for any permutation \(\sigma\) of \(0,1,2,3\) we set \(\epsilon^{\sigma(0),\sigma(1),\sigma(2),\sigma(3)} = \mbox{sign}(\sigma)\), where \(\mbox{sign}(\sigma)\) is the signature of \(\sigma\). The signature of a permutation \(\sigma\) is defined to be \(+1\) (respectively \(-1\)) if the permutation is even (resp. odd), which means that \((\sigma(0),\sigma(1),\sigma(2),\sigma(3))\) is obtained from \((0,1,2,3)\) by an even (resp. odd) number of transpositions (or swaps).

  3. Note that in a situation with four indices the ‘cyclical’ vs. ‘anti-cyclical’ method useful for \(\epsilon_{ijk}\) does not work anymore.

  4. If we lower all four indices using the Minkowski metric, one of them is temporal and three of them are spatial, so we pick up a minus sign: \[\epsilon_{0123} = \eta_{00}\eta_{11}\eta_{22}\eta_{33} \epsilon^{0123}=-1~.\]

3.1. Show that using the field strength \(F_{\mu\nu}\) and the 4-current \(J^\mu\) we can write the Maxwell equations as \[\begin{equation} \label{eq:maxwell_cov} \partial_\nu F^{\mu \nu} = J^\mu~, \hspace{2cm} \epsilon^{\mu \nu \rho \sigma} \partial_\nu F_{\rho \sigma} = 0 ~ . \end{equation}\]
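If you want to cross-check the index gymnastics symbolically (a sketch assuming Python with SymPy; it is only a consistency check and not a replacement for doing the exercise by hand), the following builds \(F^{\mu\nu}\) from the matrix of \(F_{\mu\nu}\) given above and confirms that the \(\mu=0\) and \(\mu=1\) components of \(\partial_\nu F^{\mu\nu}\) are \(\boldsymbol{\nabla}\cdot\boldsymbol{E}\) and \((\boldsymbol{\nabla}\times\boldsymbol{B}-\partial_t\boldsymbol{E})_1\) respectively, so that \(\eqref{eq:maxwell_cov}\) indeed reproduces the first line of \(\eqref{Maxwell_eqn}\).

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = (t, x, y, z)
E = [sp.Function(f'E{i}')(*X) for i in (1, 2, 3)]
B = [sp.Function(f'B{i}')(*X) for i in (1, 2, 3)]

eta = sp.diag(-1, 1, 1, 1)

# F_{mu nu} as in the text: F_{i0} = E_i, F_{ij} = eps_{ijk} B_k
F_low = sp.Matrix([[0,     -E[0], -E[1], -E[2]],
                   [E[0],   0,     B[2], -B[1]],
                   [E[1],  -B[2],  0,     B[0]],
                   [E[2],   B[1], -B[0],  0]])
F_up = eta * F_low * eta     # raise both indices with eta = diag(-1,1,1,1)

# mu = 0 component of d_nu F^{mu nu}: equals div E, so the equation reads div E = rho
div_F0 = sum(sp.diff(F_up[0, nu], X[nu]) for nu in range(4))
assert sp.simplify(div_F0 - (sp.diff(E[0], x) + sp.diff(E[1], y) + sp.diff(E[2], z))) == 0

# mu = 1 component: equals (curl B - dE/dt)_1, so the equation reads curl B - dE/dt = j
div_F1 = sum(sp.diff(F_up[1, nu], X[nu]) for nu in range(4))
assert sp.simplify(div_F1 - (sp.diff(B[2], y) - sp.diff(B[1], z) - sp.diff(E[0], t))) == 0
print("mu = 0, 1 components of the covariant equation match the 3-vector Maxwell equations")
```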

The inhomogeneous Maxwell equations imply the local conservation equation for the electromagnetic current \(J^\mu\): \[\partial_\mu J^\mu = 0 \, .\] Using the inhomogeneous Maxwell equations we find \[\partial_\mu J^\mu = \partial_\mu \partial_\nu F^{\mu \nu} = 0\, .\] The first equality is just Maxwell’s equations, and the second equality follows from the antisymmetry of the field strength \(F^{\mu \nu} = - F^{\nu \mu}\), along with the commutativity of partial derivatives \(\partial_\nu \partial_\mu=\partial_\mu \partial_\nu\). 7 We have \[\partial_\mu \partial_\nu F^{\mu \nu} = - \partial_\mu \partial_\nu F^{\nu \mu}= - \partial_\nu \partial_\mu F^{\nu \mu} = - \partial_\mu \partial_\nu F^{\mu \nu}\] where we have relabelled \((\nu,\mu)\) as \((\mu,\nu)\) in the last step. As we see, this expression is equal to minus itself, so it must be zero.

3.1.2 Maxwell’s equations: variational principle

How can we write down a Lorentz invariant Lagrangian density that will give us \(\eqref{eq:maxwell_cov}\) as its Euler-Lagrange equations (or equations of motion, or EoM)? You can try playing around but you will soon realise that using \(F^{\mu \nu}\) as the dynamical field(s) will not allow you to recover Maxwell’s equations.

Let us hence try something else. The second equation of \(\eqref{eq:maxwell_cov}\) implies that we can write \[\begin{equation} \label{field_strength_EM} F_{\mu \nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \end{equation}\] in any star-shaped open subset in \(\mathbb{R}^4\). 8 We say that \(\eqref{field_strength_EM}\) holds locally. Conversely, \(\eqref{field_strength_EM}\) implies \[\epsilon^{\mu \nu \rho \sigma} \partial_\nu F_{\rho\sigma} = \epsilon^{\mu \nu \rho \sigma} \partial_\nu (\partial_\rho A_\sigma - \partial_\sigma A_\rho) = \epsilon^{\mu \nu \rho \sigma} \partial_\nu \partial_\rho A_\sigma - \epsilon^{\mu \nu \rho \sigma} \partial_\nu \partial_\sigma A_\rho = 0-0=0\] by using that each of the two terms is symmetric with respect to swapping the order of the derivatives but is contracted with an epsilon tensor, which is antisymmetric in all indices. The second equation of \(\eqref{eq:maxwell_cov}\) is hence automatic (it is called the Bianchi identity) and we need only worry about the first one.
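The fact that \(\eqref{field_strength_EM}\) makes the Bianchi identity automatic can also be confirmed symbolically for an arbitrary smooth \(A_\mu\). Here is a small sketch (assuming SymPy; the helper eps is defined only for this check):

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = (t, x, y, z)
A = [sp.Function(f'A{mu}')(*X) for mu in range(4)]

# F_{mu nu} = d_mu A_nu - d_nu A_mu
F = [[sp.diff(A[nu], X[mu]) - sp.diff(A[mu], X[nu]) for nu in range(4)] for mu in range(4)]

def eps(*idx):
    """Totally antisymmetric symbol with eps(0,1,2,3) = +1 (helper for this check only)."""
    if len(set(idx)) < 4:
        return 0
    sign, p = 1, list(idx)
    for i in range(4):            # bubble sort, flipping the sign at each swap
        for k in range(3 - i):
            if p[k] > p[k + 1]:
                p[k], p[k + 1] = p[k + 1], p[k]
                sign = -sign
    return sign

for mu in range(4):
    expr = sum(eps(mu, nu, rho, sigma) * sp.diff(F[rho][sigma], X[nu])
               for nu in range(4) for rho in range(4) for sigma in range(4))
    assert sp.simplify(expr) == 0     # the Bianchi identity holds identically
print("epsilon^{mu nu rho sigma} d_nu F_{rho sigma} = 0 for F = dA")
```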

In the theory of electromagnetism, \(A^\mu\) is called the electromagnetic 4-vector potential: its time component \(A^0=\phi\) is the electric ‘scalar potential’, and its space components \(A^i=A_i\) are the components of the magnetic ‘vector potential’ \(\boldsymbol{A}\). (In this pre-relativistic terminology, ‘scalar’ and ‘vector’ refer to spatial rotations, not to Lorentz transformations). Using \(\eqref{field_strength_EM}\) and \(\eqref{F_to_EB}\), we recover the relations between electromagnetic fields and electromagnetic potentials from the theory of electromagnetism: \[\boldsymbol{E} = - \nabla \phi - \frac{\partial\boldsymbol{A}}{\partial t}~, \qquad \qquad \boldsymbol{B} = \nabla \times \boldsymbol{A}~.\]

3.2. Show the relationship above between electric and magnetic fields and the potentials.

We now declare that \(A_\mu\) is the dynamical field, which also enables us to include \(J^\mu\) as a source in the action.

Proposition 3.1. Maxwell’s equations follow from the action 9 \[\begin{equation} \label{Maxwell_action_source} S[A_\mu] = \int d^4 x ~ \left(-\frac{1}{4} F^{\mu \nu} F_{\mu \nu} + A_\mu J^\mu \right)~. \end{equation}\]

We work out the Euler-Lagrange equations \[\frac{\partial\mathcal{L}}{\partial A_\mu} - \partial_\nu \frac{\partial\mathcal{L}}{\partial(\partial_\nu A_\mu)}=0\] for the Lagrangian density \[\mathcal{L}= -\frac{1}{4} F^{\mu \nu} F_{\mu \nu} + A_\mu J^\mu~.\] For the first term we have \[\frac{\partial\mathcal{L}}{\partial A_\mu} = \frac{\partial}{\partial A_\mu} (A_\nu J^\nu) = \frac{\partial A_\nu}{\partial A_\mu} J^\nu = \delta^\mu_\nu J^\nu = J^\mu~.\] Remember: repeated indices are summed over and are dummy. You should never use the same letter for different indices, or you will get wrong results: this is the reason why I relabelled the dummy index as \(\nu\) here. For the second term we have \[\label{deriv_Maxwell} \begin{align} \frac{\partial\mathcal{L}}{\partial(\partial_\nu A_\mu)} & = \frac{\partial F_{\alpha\beta}}{\partial(\partial_\nu A_\mu)} \frac{\partial}{\partial F_{\alpha\beta}}\left(-\frac{1}{4} F^{\rho\sigma}F_{\rho\sigma} \right) \\ & = -\frac{1}{4} \cdot 2 F^{\alpha\beta} \frac{\partial}{\partial(\partial_\nu A_\mu)} (\partial_\alpha A_\beta - \partial_\beta A_\alpha)\\ &= -\frac{1}{2} F^{\alpha\beta} (\delta^\nu_\alpha \delta^\mu_\beta - \delta^\nu_\beta \delta^\mu_\alpha) \\ &= -\frac{1}{2} (F^{\nu\mu}-F^{\mu\nu}) = F^{\mu\nu}~. \end{align}\] In deriving \(\eqref{deriv_Maxwell}\) we used the chain rule in the first line. In the second line we used the definition \(\eqref{field_strength_EM}\) of the field strength \(F_{\alpha\beta}\) in terms of derivatives of \(A_\mu\), and the identity

3.3. Show \[\frac{\partial}{\partial X_{a_1 \dots a_n}}(X^{b_1 \dots b_n} X_{b_1 \dots b_n}) = 2 X^{a_1 a_2\dots a_n}~,\] for any tensor \(X\).

In the third line of \(\eqref{deriv_Maxwell}\) we just calculated derivatives, and in the final equality in the fourth line we used the antisymmetry of the field strength.

The Euler-Lagrange equations then give \[J^\mu - \partial_\nu F^{\mu\nu} = 0~,\] which reproduces the inhomogeneous Maxwell equations \(\eqref{Maxwell_inhom}\).
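As a cross-check of the functional derivatives computed above, here is a sketch (assuming SymPy) in which the components \(A_\mu\) and \(\partial_\mu A_\nu\) are treated as independent symbols, exactly as in the Euler-Lagrange formalism, and the identities \(\partial\mathcal{L}/\partial A_\mu = J^\mu\) and \(\partial\mathcal{L}/\partial(\partial_\nu A_\mu) = F^{\mu\nu}\) are verified directly:

```python
import sympy as sp

eta = sp.diag(-1, 1, 1, 1)

# treat A_mu and the derivatives dA[mu][nu] = d_mu A_nu as independent symbols
A  = [sp.Symbol(f'A_{mu}') for mu in range(4)]
J  = [sp.Symbol(f'J_{mu}') for mu in range(4)]     # components of J^mu
dA = [[sp.Symbol(f'dA_{mu}{nu}') for nu in range(4)] for mu in range(4)]

F_low = [[dA[mu][nu] - dA[nu][mu] for nu in range(4)] for mu in range(4)]
F_up  = [[sum(eta[mu, r] * eta[nu, s] * F_low[r][s] for r in range(4) for s in range(4))
          for nu in range(4)] for mu in range(4)]

# L = -1/4 F^{mn} F_{mn} + A_m J^m
L = (-sp.Rational(1, 4) * sum(F_up[m][n] * F_low[m][n] for m in range(4) for n in range(4))
     + sum(A[m] * J[m] for m in range(4)))

for mu in range(4):
    assert sp.simplify(sp.diff(L, A[mu]) - J[mu]) == 0                  # dL/dA_mu = J^mu
    for nu in range(4):
        assert sp.simplify(sp.diff(L, dA[nu][mu]) - F_up[mu][nu]) == 0  # dL/d(d_nu A_mu) = F^{mu nu}
print("Euler-Lagrange equations: J^mu - d_nu F^{mu nu} = 0")
```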

REMARK:
It is also possible to derive the action \(\eqref{Maxwell_action_source}\) (without the source term) by using the Lorentz force to show that the energy stored in the electromagnetic fields (which equals the Hamiltonian) is \(\frac{1}{2}\int d^3 x~(\boldsymbol{E}^2 +\boldsymbol{B}^2)\), and then finding the associated Lagrangian.

3.1.3 Gauge Symmetry

The technical trick we have used has an interesting consequence: the physical fields that we can measure are the electric and magnetic field \(\boldsymbol{E}\) and \(\boldsymbol{B}\), i.e. the components of the field strength tensor \(F_{\mu \nu}\), not the dynamical field \(A_\mu\) that we use to define the action and obtain equations of motion. In fact, \(A_\mu\) is not uniquely defined: we are free to shift \(A_\mu(x)\) by a derivative of an arbitrary smooth function \(\alpha(x)\) \[\begin{equation} \label{gauge_transfo_EM} A_\mu (x) \mapsto A_\mu (x) + \partial_\mu \alpha (x) \end{equation}\] without altering the physical fields which appear in the Maxwell equations and which can be measured: \[F_{\mu \nu} = \partial_\mu A_\nu - \partial_\nu A_\mu \mapsto \partial_\mu A_\nu - \partial_\nu A_\mu + \partial_\mu \partial_\nu \alpha - \partial_\nu \partial_\mu \alpha = F_{\mu \nu}\, .\]

A symmetry for which the parameters of the transformation depend on space-time is called a gauge symmetry. 10 Equation \(\eqref{gauge_transfo_EM}\) is called the gauge transformation of \(A_\mu\). The field \(A_\mu\) is then called the gauge field (or the gauge connection). Gauge field configurations which differ by a gauge transformation are considered physically equivalent, since they give rise to the same physically observable electric and magnetic fields.

You should contrast gauge symmetries with the symmetries you studied so far: their parameters did not depend on space-time in any way. They are called global symmetries, and they relate physically inequivalent (though isomorphic) configurations.

Performing a gauge transformation \(\eqref{gauge_transfo_EM}\) has the following effect on the action \(\eqref{Maxwell_action_source}\): \[\begin{split} S[A_\mu] \mapsto S[A_\mu+\partial_\mu \alpha] &=\int d^4 x~ \left(-\frac{1}{4} F^{\mu \nu} F_{\mu \nu} + A_\mu J^\mu + (\partial_\mu \alpha) J^\mu\right) ~ \\ &= S[A_\mu] + \int d^4 x~(\partial_\mu \alpha) J^\mu \end{split}\] At first sight the action does not seem to be invariant under a gauge transformation, since \[\delta_\alpha S[A_\mu]\equiv S[A_\mu+\partial_\mu \alpha]-S[A_\mu] = \int d^4 x~(\partial_\mu \alpha) J^\mu\] does not seem to vanish. But this is too fast: we can perform a partial integration of the extra term and discard the boundary term 11 to write the gauge variation of the action as \[\delta_\alpha S[A_\mu] = - \int d^4 x~ \alpha\,( \partial_\mu J^\mu )=0~,\] which vanishes thanks to the conservation of the current \(J^\mu\) that couples to the electromagnetic gauge field \(A_\mu\).

REMARKS:

  1. We can write \[A_\mu \rightarrow A_\mu + \partial_\mu \alpha = e^{i \alpha} \left(A_\mu + i \partial_\mu \right)e^{-i \alpha}\, ,\] so we can think about our gauge transformations as being related to the group \(G=U(1)\), but now its parameter \(\alpha\) depends on where we are in space-time. \(G=U(1)\) is called the gauge group. The field \(A_\mu\) transforms in the adjoint representation, except for the derivative term. This rewriting may look silly since the adjoint representation of \(G=U(1)\) is trivial, but we will see later that this form generalizes to other gauge groups in a natural way. We will also understand the rôle and meaning of the extra derivative term.

  2. You have encountered field theories with \(U(1)\) global symmetries and conserved currents before. Can we use the currents found there to couple them to electromagnetism? If so, can we identify the \(U(1)\) global symmetry of these field theories with the \(U(1)\) gauge symmetry found above?

The answer to the previous question is yes, and we will learn how to do this systematically next. But first, let us briefly remind ourselves of the concept of \(U(1)\) global symmetry and set notation for what follows.

3.2 \(U(1)\) global symmetry

Consider (for simplicity) a complex scalar field \(\phi(x)\). 12

The action 13

\[\label{action_scalar_global_U(1)} \begin{split} S_0[\phi,\bar\phi] &= \int d^4x~ \mathcal{L}_0(\phi, \bar \phi, \partial_\mu \phi, \partial_\mu \bar\phi)~,\\ \mathcal{L}_0 &= -|\partial_\mu \phi|^2 - V(\phi, \bar\phi) = - |\partial_\mu \phi|^2 - U(|\phi|^2) \\ &= |\dot \phi|^2 - |\nabla \phi|^2-U(|\phi|^2) \end{split}\] is invariant under global \(G=U(1)\) transformations \[g: ~~ \phi(x) \mapsto e^{i\alpha} \phi(x)\] where \(\alpha\sim \alpha+2\pi\) is a constant parameter, and \(g=e^{i\alpha}\in U(1)\) is a constant group element. The requirement of \(U(1)\) invariance restricts the scalar potential \(V(\phi,\bar\phi)\) to only depend on the invariant \(|\phi|^2\). Because the scalar field \(\phi\) is multiplied by a single power of the \(U(1)\) group element \(g=e^{i\alpha}\), we say that it has charge \(1\).

REMARKS:

  1. The continuous \(U(1)\) symmetry ensures the existence of a conserved current \[\begin{equation} \label{conserved_current} \begin{split} j^\mu &= -i(\bar\phi \partial^\mu \phi - \phi \partial^\mu \bar\phi)\\ &\partial_\mu j^\mu=0 \end{split} \end{equation}\] and of a conserved charge \[\begin{equation} \label{conserved_charge} \begin{split} Q &= \int d^3 x~ j^0\\ &\frac{d}{dt}Q=0 \end{split} \end{equation}\] by Noether’s theorem.

  2. A global symmetry relates physically distinct configurations.

3.4. Consider a field theory with action \(\eqref{action_scalar_global_U(1)}\) and scalar potential \[V(\phi,\bar\phi)=\lambda (|\phi|^2-a^2)^2~,\] with parameters \(\lambda,a>0\), see figure 3.2. The energy (or “Hamiltonian”) is \[\begin{split} E&=\int d^3x ~\left(|\partial_0\phi|^2+|\partial_i\phi|^2+V(\phi,\bar\phi) \right)~\\ &=\int d^3x ~\left(|\dot\phi|^2+|\nabla\phi|^2+V(\phi,\bar\phi) \right)~. \end{split}\]

  1. Show that the configurations of least energy (“vacua”, or “ground states”) parametrize a circle in field space.

  2. Show that different vacua are related by global \(U(1)\) transformations.

The scalar potential \(V(\phi,\bar\phi)=\lambda (|\phi|^2-a^2)^2\).

3.3 \(U(1)\) gauge symmetry

To make the global symmetry local, or a gauge symmetry, we promote the constant parameter \(\alpha\) to a function of spacetime \(\alpha(x)\). For subtle reasons that we might return to later, the parameter \(\alpha(x)\) of a gauge transformation should approach \(0\) (sufficiently fast) at infinity.

If we try to write a kinetic term for \(\phi\), we immediately seem to run into trouble. Under a \(U(1)\) gauge transformation \[\begin{equation} \label{gauge_transfo_de_phi} \partial_\mu \phi \mapsto \partial_\mu\phi' \equiv \partial_\mu (e^{i\alpha}\phi) = e^{i\alpha} \left(\partial_\mu\phi + i (\partial_\mu\alpha)\phi \right) \end{equation}\] since now \(\alpha\) depends on spacetime. Therefore the naive kinetic term \(-|\partial_\mu\phi|^2\) is not invariant under a \(U(1)\) gauge transformation. We say that it is not gauge invariant.

This is a serious problem. But there is a way to fix it: we replace the derivative \(\partial_\mu \phi\) by the so called gauge covariant derivative \[\begin{equation} \label{gauge_cov_derivative_phi} D_\mu \phi := \partial_\mu\phi - i A_\mu \phi \end{equation}\] which includes a new field \(A_\mu\) (the gauge field), whose purpose is to transform under gauge transformations precisely in such a way to cancel the unwanted second term in \(\eqref{gauge_transfo_de_phi}\). This happens if under a \(U(1)\) gauge transformation \[\begin{equation} \label{gauge_trans_A} A_\mu \mapsto A_\mu' = A_\mu + \partial_\mu\alpha~, \end{equation}\] because then \[\begin{equation} \label{gauge_transfo_D_phi} \begin{split} D_\mu \phi = (\partial_\mu \phi - i A_\mu\phi)\mapsto D_\mu' \phi' &\equiv (\partial_\mu \phi' - i A'_\mu\phi')\\ &= e^{i\alpha} \left(\partial_\mu\phi + i (\partial_\mu\alpha)\phi -i A_\mu\phi- i (\partial_\mu\alpha)\phi \right) \\ &= e^{i\alpha} \left(\partial_\mu\phi -i A_\mu\phi\right) = e^{i\alpha} D_\mu \phi ~, \end{split} \end{equation}\] using \(\eqref{gauge_transfo_de_phi}\) and \(\eqref{gauge_trans_A}\). Replacing derivatives \(\partial_\mu\) by gauge covariant derivatives \(D_\mu\) makes the gauge kinetic term of \(\phi\) invariant under \(U(1)\) gauge transformations.
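The cancellation in \(\eqref{gauge_transfo_D_phi}\) can be confirmed symbolically as well. A minimal sketch (assuming SymPy), for arbitrary smooth \(\phi\), \(\alpha\) and \(A_\mu\):

```python
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
X = (t, x, y, z)

phi = sp.Function('phi')(*X)
alpha = sp.Function('alpha')(*X)
A = [sp.Function(f'A{mu}')(*X) for mu in range(4)]

phi_p = sp.exp(sp.I * alpha) * phi                         # phi' = e^{i alpha} phi
A_p = [A[mu] + sp.diff(alpha, X[mu]) for mu in range(4)]   # A'_mu = A_mu + d_mu alpha

for mu in range(4):
    D_phi = sp.diff(phi, X[mu]) - sp.I * A[mu] * phi            # D_mu phi
    D_phi_p = sp.diff(phi_p, X[mu]) - sp.I * A_p[mu] * phi_p    # D'_mu phi'
    # covariance: D'_mu phi' = e^{i alpha} D_mu phi
    assert sp.simplify(D_phi_p - sp.exp(sp.I * alpha) * D_phi) == 0
print("D_mu phi transforms covariantly with charge 1")
```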

Note that \(\eqref{gauge_trans_A}\) mimics precisely the gauge transformation \(\eqref{gauge_transfo_EM}\) of the 4-vector potential in the theory of electromagnetism. Having introduced a new \(U(1)\) gauge field \(A_\mu\), we now need to write a gauge invariant kinetic term for it. But we know how to do it: we just write the Maxwell Lagrangian from the theory of electromagnetism.

Putting everything together, we find that the action \[\begin{equation} \label{action_scalar_ED} \begin{split} S[\phi,\bar\phi,A_\mu] &= \int d^4x~ \mathcal{L}(\phi, \bar \phi,A_\nu, \partial_\mu \phi, \partial_\mu \bar\phi,\partial_\mu A_\nu)~,\\ \mathcal{L}&= \mathcal{L}_0(\phi, \bar \phi,D_\mu \phi, \overline{D_\mu \phi}) + \mathcal{L}_{\rm Maxwell}(\partial_\mu A_\nu)\\ &= -\overline{D_\mu \phi}D^\mu\phi - U(|\phi|^2) -\frac{1}{4g^2} F_{\mu\nu}F^{\mu\nu}~, \end{split} \end{equation}\] where \(A_\mu\) is a real gauge field (or mathematically, a “gauge connection”) and \[\label{covar_der_fieldstrength_U(1)} \begin{split} D_\mu\phi &:= (\partial_\mu-iA_\mu)\phi \qquad \quad ~~\text{{\bf covariant derivative} of $\phi$}\\ F_{\mu\nu}&:= \partial_\mu A_\nu - \partial_\nu A_\mu \quad\qquad \,\text{{\bf field strength} of $A_\mu$}~, \end{split}\] is invariant under \(G=U(1)\) gauge transformations \[\begin{equation} \label{gauge_transfo_U1} \begin{split} \phi(x) &\mapsto e^{i\alpha(x)} \phi(x) \\ A_\mu(x) & \mapsto A_\mu(x) + \partial_\mu \alpha(x)~. \end{split} \end{equation}\]

REMARKS:

  1. To linear order in the gauge field \(A_\mu\) \[\begin{equation} \label{minimal_coupling} \mathcal{L}= \mathcal{L}_0 + j^\mu A_\mu + \dots \end{equation}\] The scalar field is coupled (via covariant derivatives) to the gauge field \(A_\mu\), and not to the field strength \(F_{\mu\nu}\). To leading order, the gauge field \(A_\mu\) couples directly to the conserved current \(j^\mu\) of the theory with \(U(1)\) global symmetry, which is built out of the scalar field. This type of coupling is called the minimal coupling.

    A common alternative normalization to the one we use is obtained by rescaling the gauge field by one power of the gauge coupling: \(A_\mu \to g A_\mu\). In that normalization the Lagrangian density is \[\begin{aligned} \mathcal{L}&= -\left((\partial^\mu + i g A^\mu)\bar\phi\right) (\partial_\mu - i g A_\mu) \phi - U(|\phi|^2) -\frac{1}{4} F_{\mu\nu}F^{\mu\nu} \\ &= \mathcal{L}_0 + g j^\mu A_\mu + \dots \end{aligned}\] where the ellipses denote terms quadratic in the gauge field. This alternative normalization makes it clear that the gauge coupling \(g\) controls the strength of the coupling between the conserved current \(j^\mu\) of the theory with \(U(1)\) global symmetry and the gauge field \(A_\mu\). In the following we will typically stick to the convention in which the gauge coupling \(g\) appears in front of the kinetic term for the gauge field, rather than inside gauge covariant derivatives.

  2. The group of gauge transformations \[\mathcal{G}= \mathcal{U}(1) := \left. \begin{cases} \begin{tabular}{crcl} $g:$ & $\mathbb{R}^{1,3}$ & $\to$ & $G=U(1)$ \\ & $x^\mu$ & $\mapsto$ & $g(x)=e^{i\alpha(x)}$ \end{tabular} \end{cases} \right\}\] is infinite-dimensional, since it associates independent transformations \(g(x)\) for the fields at different points \(x^\mu\), and there are infinitely many points in space-time. We use calligraphic letters to distinguish the gauge group from the associated finite-dimensional (for \(G=U(1)\), one-dimensional) Lie group. Later on, once we have familiarized ourselves with this distinction, we will typically drop this notation and simply use \(G\) for the gauge group, with a common abuse of notation.

  3. A “gauge symmetry” relates physically equivalent configurations, which are to be identified. The term “gauge symmetry” is therefore a misnomer: it is not a symmetry, but rather a redundancy in our description of the theory.

    The identification of field configurations which differ by a gauge transformation 14 leads to non-trivial topological properties of gauge fields, which in turn ensure the existence of topological solitons and instantons, non-trivial gauge field configurations which are stable for topological reasons. We will study these configurations in later chapters.

    From now on we omit writing the dependence on the space-time coordinate \(x\). It is understood that all fields and all gauge transformation parameters depend on \(x\).

  4. Under a \(U(1)\) gauge transformation \(\eqref{gauge_transfo_U1}\), \[\begin{split} D_\mu \phi & \mapsto e^{i\alpha} D_\mu \phi~,\\ F_{\mu\nu} &\mapsto F_{\mu\nu} \end{split}\] We say that the covariant derivative \(D_\mu \phi\) of \(\phi\) is gauge covariant, because it transforms in a representation of \(G\) for all \(x\) (the same representation of \(\phi\), namely the charge \(1\) representation here), and that the field strength \(F_{\mu\nu}\) is gauge invariant, because it does not change under a gauge transformation (in fancy language, it transforms in the trivial, or “singlet”, representation).

  5. It is useful to think of the covariant derivative \(D_\mu=\partial_\mu-iA_\mu\) as a differential operator, which acts on everything to its right. The partial derivative \(\partial_\mu\) acts by differentiating all that appears to its right, while the gauge field \(A_\mu\), like all functions of \(x\), acts by multiplying all that appears to its right. Requiring that under a \(U(1)\) gauge transformation \[\begin{equation} \label{D_mu_abelian_gauge_transfo} D_\mu \equiv \partial_\mu - i A_\mu \mapsto D'_\mu \equiv \partial_\mu - i A'_\mu = e^{i\alpha} D_\mu e^{-i\alpha}~, \end{equation}\] so that \[D_\mu \phi \mapsto e^{i\alpha} D_\mu e^{-i\alpha} e^{i\alpha} \phi = e^{i\alpha} D_\mu \phi\] as desired, implies the gauge transformation of the gauge field \[\begin{equation} \label{A_mu_abelian_gauge_transfo} A_\mu \mapsto A'_\mu = A_\mu + \partial_\mu\alpha \end{equation}\] and vice versa.

    We have already proven the implication \(\eqref{D_mu_abelian_gauge_transfo}\) \(\Leftarrow\) \(\eqref{A_mu_abelian_gauge_transfo}\) in \(\eqref{gauge_transfo_D_phi}\). For the opposite implication \(\eqref{D_mu_abelian_gauge_transfo}\) \(\Rightarrow\) \(\eqref{A_mu_abelian_gauge_transfo}\), we expand \(\eqref{D_mu_abelian_gauge_transfo}\) and act with \(\partial_\mu\) on everything to its right. There are two options: either \(\partial_\mu\) acts on \(e^{-i\alpha}\), which produces the function \((\partial_\mu e^{-i\alpha})=-i\ e^{-i\alpha}(\partial_\mu\alpha)\), or \(\partial_\mu\) goes through \(e^{-i\alpha}\), which produces the differential operator \(e^{-i\alpha} \partial_\mu\). 15 Then we find \[\begin{split} D_\mu \equiv \partial_\mu - i A_\mu \mapsto D'_\mu &\equiv \partial_\mu - i A'_\mu = e^{i\alpha} (\partial_\mu - i A_\mu)e^{-i\alpha}\\ &= e^{i\alpha}e^{-i\alpha}(-i\partial_\mu\alpha) + e^{i\alpha}e^{-i\alpha} \partial_\mu - i e^{i\alpha}e^{-i\alpha} A_\mu \\ &=\partial_\mu - i(A_\mu+\partial_\mu\alpha)~, \end{split}\] which comparing the initial expression and the final result implies \[A_\mu \mapsto A'_\mu = A_\mu + \partial_\mu \alpha~.\]

    Furthermore, defining the commutator \([X,Y]:=XY-YX\), we have \[\label{F=[D,D]} [D_\mu, D_\nu]=-iF_{\mu\nu}~,\] so the field strength controls the non-commutativity of covariant derivatives.


    3.5. Show that \[[D_\mu, D_\nu]=-iF_{\mu\nu}~.\]

  6. The gauge field \(A_\mu\) is only defined locally, namely in a patch, which we take to be such that the Poincaré lemma applies. As we saw in the gauge theory formulation of electromagnetism, the Bianchi identity \(\epsilon^{\mu\nu\rho\sigma} \partial_\nu F_{\rho \sigma}=0\) implies \(F_{\mu\nu}=\partial_\mu A_\nu - \partial_\nu A_\mu\) only if the Poincaré lemma applies.

    What this means is the following. Consider two patches \(U^{(1)}\) and \(U^{(2)}\) with a non-trivial overlap \(U^{(1)}\cap U^{(2)}\neq \emptyset\). Then the gauge fields \(A_\mu^{(1)}\) and \(A_\mu^{(2)}\) defined in the two patches are related by a gauge transformation \[A_\mu^{(1)}= A_\mu^{(2)}+ \partial_\mu \alpha^{(12)}\] on the overlap \(U^{(1)}\cap U^{(2)}\), so that the field strengths agree: \(F_{\mu\nu}^{(1)}= F_{\mu\nu}^{(2)}\). 16 Mathematically, the gauge transformation parameter \(\alpha^{(12)}\) that relates the gauge fields in the two patches is called a “transition function”. Charged fields are also defined locally, in patches. For consistency, they also transform by a gauge transformation when we switch to another patch.

    This local definition of \(A_\mu\) is responsible for most of the topological and geometric properties of gauge theories. To give you an appetizer, consider a space-time of the form \(\mathbb{R}\times (\mathbb{R}^3 \setminus p)\), where the first factor of \(\mathbb{R}\) is parametrized by time, and the second factor is space, which is flat Euclidean space \(\mathbb{R}^3\) except that we excise the point \(p\) (we could equally excise a 3-ball). It turns out that this space-time is not contractible to a point, but only to a 2-sphere surrounding the point \(p\). (Perhaps you can picture it in your mind. If not, just trust me for now.) Last term, when you learned about stereographic projections, you saw that a 2-sphere can be covered by two patches, see figure 3.3. For instance, we can take patch \(U^{(1)}\) to cover everything north of the southern tropic, and patch \(U^{(2)}\) to cover everything south of the northern tropic. The two patches overlap in the region between the two tropics near the equator, so we need to specify how the gauge field in the northern patch and the gauge field in the southern patch are related in this region where both are defined. As we will see, this freedom allows us to define a magnetic monopole, namely a pointlike magnetic charge, sitting at point \(p\). This is very surprising, because Maxwell’s equations allow electric charge densities but not magnetic charge densities on the right-hand sides. As we will see later, we can by-pass this limitation by exploiting the topology of the gauge field.

    Two patches which cover a 2-sphere \(S^2\), and their overlap.
  7. The appearance of the covariant derivative can also be understood by studying the Lorentz force and writing down the associated Lagrangian. Crucially, for such models there is a difference between the kinematic momentum (the conserved charge following from translation invariance) and the canonical momentum associated with the coordinates in the Hamiltonian formalism. See for a detailed explanation in the context of quantum mechanics.

3.6. So far I have assumed for simplicity that the complex scalar field \(\phi\) has charge \(1\). Go through this chapter and work out how all formulae change if \(\phi\) has charge \(q \in \mathbb{Z}\) rather than charge \(1\).

3.4 Gauge redundancy and gauge fixing

A good reference for this topic is section 6 of David Tong’s QFT lecture notes .

Let us start from the equations of motion (EoM) of the theory of scalar electrodynamics, which is described by the action \(\eqref{action_scalar_ED}\). We recall here the Lagrangian density \[\mathcal{L}= -|D_\mu\phi|^2 - V(\bar\phi,\phi) - \frac{1}{4g^2} F_{\mu\nu}^2 ~,\] where \(F_{\mu\nu}^2 \equiv F_{\mu\nu} F^{\mu\nu}\) etc, and the scalar potential takes the form \(V(\bar\phi,\phi)=U(|\phi|^2)\) to ensure gauge invariance.

3.7. Show that the Euler-Lagrange equations of the above Lagrangian are \[\begin{equation} \label{scalarED_EoM} \begin{split} 1) \quad & D_\mu D^\mu \phi = \frac{\partial V}{\partial\bar\phi} \equiv U'(|\phi|^2) \phi\\ 2) \quad & ~~\partial_\nu F^{\mu\nu} = g^2 J^\mu \end{split} \end{equation}\] where \[\begin{equation} \label{cons_curr_scalarED} J_\mu = - i (\bar\phi D_\mu \phi - \phi D_\mu\bar\phi) = j_\mu - 2 A_\mu |\phi|^2 \end{equation}\] is a conserved current. The EoM for \(\bar\phi\) is the complex conjugate of the EoM for \(\phi\), so I will not write it explicitly. Note that upon gauging the global U(1) symmetry, the conserved current \(j_\mu\) \(\eqref{conserved_current}\) of the scalar field theory with global \(U(1)\) symmetry gets a correction term, due to the presence of the gauge field \(A_\mu\) in the covariant derivatives.

Let us now consider the transformation properties of the EoM \(\eqref{scalarED_EoM}\) under a \(U(1)\) gauge transformation \(\eqref{gauge_transfo_U1}\). The equations transform as \[\begin{equation} \label{scalarED_EoM_gaugetransfo} \begin{split} 1) &\mapsto e^{i\alpha} 1) \qquad (\text{gauge covariant})\\ 2) &\mapsto 2) \qquad~~~~~ (\text{gauge invariant}) \end{split} \end{equation}\] Therefore, if a field configuration \((\phi, A_\mu)\) solves the EoM \(\eqref{scalarED_EoM}\), then any gauge transformed field configuration \((\phi'=e^{i\alpha}\phi, A_\mu'=A_\mu+\partial_\mu\alpha)\) also solves the EoM \(\eqref{scalarED_EoM}\): the EoM only determine \((\phi, A_\mu)\) up to a gauge transformation.

Given some initial data \((\phi^{(0)}, A_\mu^{(0)})\) specifying the field configuration at an initial time \(t_0\), we cannot uniquely determine the field configuration \((\phi, A_\mu)\) at a later time \(t>t_0\). Indeed \((\phi'=e^{i\alpha}\phi, A_\mu'=A_\mu+\partial_\mu\alpha)\) is as good a solution of the EoM as \((\phi, A_\mu)\), and obeys the same initial condition provided that the gauge parameter \(\alpha\) obeys the conditions \(\alpha(t_0,\vec x)=0\) (mod \(2\pi\)) and \(\partial_\mu \alpha(t_0,\vec x)=0\) at the initial time \(t_0\).

We appear to be in trouble: we would like the EoM to define a well-posed initial value problem and determine uniquely physically observable fields at later times. This is not the case if we regard field configurations which differ by a gauge transformation as physically inequivalent. If instead we declare field configurations which differ by a gauge transformation to be physically equivalent, then the issue disappears and the initial value problem is well-posed. We will therefore identify field configurations related by a gauge transformation, \[\begin{equation} \label{gauge_equivalence} (\phi, A_\mu)~\sim~(\phi'=e^{i\alpha}\phi, A_\mu'=A_\mu+\partial_\mu\alpha)~. \end{equation}\] Physically observable quantities must then be gauge invariant, such as for example the field strength \(F_{\mu\nu}\), the magnitude of the scalar field \(|\phi|^2\), or the conserved current \(J_\mu\). This explains remark 3 in the previous section.

The picture to keep in mind for gauge theories is that field space \(\mathcal{F}= \{\phi(x),A_\mu(x)\}\) is foliated by gauge orbits traced by the action of the gauge group \[\mathcal{G}\cdot (\phi(x),A_\mu(x)) = \{(e^{i\alpha(x)}\phi(x),\, A_\mu(x)+\partial_\mu\alpha(x)) ~|~ \alpha(x)\sim \alpha(x)+2\pi \}~.\] In down-to-earth terms, a gauge orbit simply consists of all the field configurations which are related by a gauge transformation.

The space of all field configurations decomposes into the disjoint union of gauge orbits, each of which represents a single physical configuration. A complete gauge fixing selects a single representative for each orbit.

Then the identification \(\eqref{gauge_equivalence}\) of field configurations related by gauge transformations states the correspondence 18 \[\text{Physical configuration}~~ \longleftrightarrow~~ \text{Gauge orbit}~.\]

Rather than working with the redundant description of field space \(\mathcal{F}\) subject to the gauge symmetry \(\mathcal{G}\), it is often useful to “fix a gauge” (or pick a gauge, that is, picking a single representative for each gauge orbit). Any representative does the job – after all any two representatives of a given gauge orbit are physically equivalent – but we need to ensure that the gauge fixing cuts each orbit once and only once, as in figure 3.4. If that is not the case, and there is some leftover gauge symmetry that is not fixed, we refer to the gauge fixing as partial or incomplete, and further conditions must be specified in order to have a complete gauge fixing. The topic of gauge fixing is rather technical, and plays an important role in the quantization of gauge theories. Here we will content ourselves with giving a few standard examples of (partial) gauge fixing, which may be useful later on.

EXAMPLES:

  1. Lorenz gauge:
    This gauge is defined by imposing the constraint \[\begin{equation} \label{Lorenz_gauge} \partial_\mu A^\mu = 0 \end{equation}\] on the gauge field 4-vector \(A_\mu\). This can always be achieved. Indeed, if we are given a representative \(A_\mu\) which does not obey the Lorenz gauge condition \(\eqref{Lorenz_gauge}\), then we can find another representative \(A'_\mu=A_\mu + \partial_\mu \alpha\) in the same gauge orbit which obeys the Lorenz gauge constraint \[0 = \partial_\mu A'^\mu = \partial_\mu A^\mu + \partial_\mu \partial^\mu \alpha\] by picking \(\alpha\) to be a solution of the inhomogeneous equation \[\begin{equation} \label{Lorenz_gauge_Poisson} \partial_\mu \partial^\mu \alpha = - \partial_\mu A^\mu~, \end{equation}\] which exists.19

    Let us discuss pros and cons of the Lorenz gauge. The main advantage of the Lorenz gauge is that the constraint \(\eqref{Lorenz_gauge}\) is Lorentz invariant.20 The main disadvantage of the Lorenz gauge is that it only fixes the gauge partially. Indeed, if we are in Lorenz gauge we are free to perform gauge transformations with parameters \(\alpha\) such that \(\partial_\mu \partial^\mu \alpha =0\) and we will remain in the Lorenz gauge. (This corresponds to adding a solution of the homogeneous equation in \(\eqref{Lorenz_gauge_Poisson}\).)

  2. Coulomb gauge (or radiation gauge):
    This gauge is defined by imposing the constraint \[\begin{equation} \label{Coulomb_gauge} \nabla \cdot \vec{A} = 0 \end{equation}\] on the vector potential \(\vec{A}\), which is the spatial part of the 4-vector \(A_\mu\). This can always be achieved, by a similar reasoning to above.

    Compared to the Lorenz gauge, the Coulomb gauge has the clear drawback of not being Lorentz covariant. So this gauge fixing spoils the manifest relativistic symmetry of the formalism, which is not ideal. (The physics of the system remains Lorentz invariant, because gauge transformations are unphysical: they are just a redundancy in our description.) Another drawback, in common with the Lorenz gauge, is that the Coulomb gauge constraint \(\eqref{Coulomb_gauge}\) only fixes the gauge partially. The argument is the same as for the Lorenz gauge, except that we are using spatial indices only instead of full space-time indices.

    On the other hand, a pro of the Coulomb gauge is that the temporal component \(A_0\) of the gauge potential (aka the ‘electric scalar potential’ in electromagnetism) is determined by the charge density \(\rho=J^0\) as in electrostatics: \[\begin{equation} \label{A_0_Coulomb_gauge} A_0(t, \vec x) \propto \int d^3 x'~ \frac{\rho(t,\vec{x}')}{|\vec{x}-\vec{x}'|}~. \end{equation}\] So if the charge density \(\rho=0\), for instance for ‘pure electromagnetism’, in which there is no charged matter \(\phi\), we have \[A_0=0\] in Coulomb gauge. On the other hand, if there are charged fields and hence \(\rho\neq 0\), then \(A_0\neq 0\).

    3.8. Determine the proportionality factor in \(\eqref{A_0_Coulomb_gauge}\).

3.5 \(U(1)\) Wilson line and Wilson loop

Let us conclude this chapter with an appetizer of geometric aspects that we will hopefully return to later. A good reference for this section is section 15.1 of the book by Peskin and Schroeder .

We start by recalling that if \(\phi\) is a charged scalar (of charge \(1\) for definiteness), then its partial derivative is not gauge covariant, that is, it does not transform under a well-defined representation of the \(U(1)\) gauge group. You have seen this explicitly in the first term, when you worked out how \(\partial_\mu \phi\) transforms under a \(U(1)\) gauge transformation \(\eqref{gauge_transfo_U1}\). One can fix this problem by introducing the gauge covariant derivative \(D_\mu \phi = (\partial_\mu - i A_\mu)\phi\), which transforms covariantly as a field of charge \(1\) under the gauge transformation \(\eqref{gauge_transfo_U1}\). Hopefully this is all clear by now at a technical level. But why is this, conceptually?

To analyze all the partial derivatives in one fell swoop, let us consider the total differential of \(\phi(x)\), \[\begin{equation} \label{total_diff} d\phi(x)= \lim_{\epsilon\to 0} \frac{\phi(x+\epsilon dx)-\phi(x)}{\epsilon}= \partial_\mu\phi(x) dx^\mu~, \end{equation}\] where I have introduced an infinitesimal book-keeping parameter \(\epsilon\) in front of the line increment \(dx^\mu\), so that I could write the total differential as a limit. The final expression, which writes the total differential of \(\phi(x)\) as the 4-vector \(\partial_\mu \phi(x)\) contracted with the differential increment \(dx^\mu\), follows from Taylor expanding the numerator inside the limit and by taking the limit (see Calculus and AMV).

The reason why the total differential \(\eqref{total_diff}\) of \(\phi\) (and hence its partial derivatives) does not transform covariantly under gauge transformations is that the two terms that we are subtracting inside the limit have different gauge transformation properties \[\begin{split} \phi(x+\epsilon dx) & \mapsto e^{i\alpha(x+\epsilon dx)} \phi(x+\epsilon dx)\\ \phi(x) & \mapsto e^{i\alpha(x)} \phi(x)~, \end{split}\] because \(\alpha(x+\epsilon dx) \neq \alpha(x)\).

This problem can be fixed by introducing the ‘Wilson line’, or the mathematical notion of ‘parallel transport’.

Let \(C\) be an open curve (or a path) from point \(x_1\) to point \(x_2\), see figure 3.5. Mathematically, this is a smooth map from an interval to space-time \(\mathbb{R}^{1,3}\) \[\begin{aligned} C:\quad I=[\tau_1, \tau_2] &\mapsto ~ \mathbb{R}^{1,3}\\ \tau &\mapsto ~ x^\mu(\tau)\end{aligned}\] with \(x(\tau_1)=x_1\) and \(x(\tau_2)=x_2\) at the endpoints.

An open curve from point \(x_1\) to point \(x_2\).
A closed curve (or ‘loop’) with base-point \(x_1=x_2\).

The Wilson line (of charge \(1\)) along the path \(C\) is defined to be \[\begin{equation} \label{Wilson_line} W_C(x_2,x_1) := \exp\left[ i \int_{x_1,\,C}^{x_2} A_\mu(x) dx^\mu \right] \equiv \exp\left[i \int_{\tau_1}^{\tau_2} A_\mu(x(\tau)) \dot x^\mu(\tau) d\tau\right]~, \end{equation}\] where the first integral is the line integral from \(x_1\) to \(x_2\) along \(C\), and the second integral is its expression in the parametrization \(x^\mu(\tau)\). If \(C\) is a closed path (or a ‘loop’), namely if \(x_1=x_2\) as in figure 3.6, then \[\begin{equation} \label{Wilson_loop} W_C:= \exp\left[ i \oint_C A_\mu(x) dx^\mu \right] \end{equation}\] is called the Wilson loop (of charge \(1\)) along the curve \(C\). By standard results from multivariate calculus, the line integral \(\oint_C A_\mu(x) dx^\mu\) only depends on the curve \(C\) and not on the base-point \(x_1=x_2\).

Under a \(U(1)\) gauge transformation \(\eqref{gauge_transfo_U1}\), we claim that the Wilson line \(\eqref{Wilson_line}\) transforms as 21 \[\begin{equation} \label{Wilson_line_gauge_transfo} W_C(x_2,x_1) \mapsto e^{i \alpha(x_2)} W_C(x_2,x_1) e^{-i\alpha(x_1)}~. \end{equation}\]

\[\begin{aligned} W_C(x_2,x_1)= e^{i \int_{x_1,\,C}^{x_2} A_\mu dx^\mu} \mapsto~ & e^{ i \int_{x_1,\,C}^{x_2} (A_\mu+\partial_\mu\alpha) dx^\mu }\\ =&e^{ i \int_{x_1,\,C}^{x_2} A_\mu dx^\mu } e^{ i \int_{x_1,\,C}^{x_2} \partial_\mu\alpha dx^\mu }\\ =& W_C(x_2,x_1) e^{i(\alpha(x_2)-\alpha(x_1))}\\ =& e^{i\alpha(x_2)} W_C(x_2,x_1) e^{-i\alpha(x_1)} ~.\end{aligned}\] To go from the second to the third line, we have used the fact that \(\partial_\mu \alpha dx^\mu = d\alpha(x)\) is an exact differential, so its integral along a curve \(C\) only receives contribution from the boundary terms.

A corollary of the gauge transformation \(\eqref{Wilson_line_gauge_transfo}\) is that the U(1) Wilson loop \(\eqref{Wilson_loop}\) is gauge invariant. To see that, simply set \(x_1=x_2\), or use the fact that the integral of an exact differential along a closed curve vanishes.

REMARKS:

  1. In QM, the Wilson line \(W_C(x_2,x_1)\) is the phase picked up by the wave-function of a charged point particle slowly (more precisely, ‘adiabatically’) moving from \(x_1\) to \(x_2\) along a path \(C\) in the presence of a gauge field.

  2. The Wilson loop \(\eqref{Wilson_loop}\) is gauge invariant and therefore physically observable. It is the phase picked up by the wave-function of a charged point particle slowly moving along a loop \(C\). This phase controls the Aharonov-Bohm effect in QM, a subtle and unexpected form of quantum interference which arises because the wave-function couples directly to the gauge potential \(A_\mu\) rather than to the physical electric and magnetic fields \(\vec{E}\), \(\vec{B}\).

If the loop \(C\) is the boundary of a surface \(\Sigma\), then by a higher-dimensional version of Stokes’ theorem (see Differential Geometry III) one has \[\begin{equation} \label{Stokes_Fmunu} \begin{split} \oint_C A_\mu(x) dx^\mu &= \frac{1}{2}\int_{\Sigma} F_{\mu\nu}(x) dx^\mu\wedge dx^\nu\\ &\equiv \frac{1}{2} \int_{x^{-1}(\Sigma)} F_{\mu\nu}(x(\sigma)) \left( \frac{\partial x^\mu(\sigma)}{\partial\sigma^1}\frac{\partial x^\nu(\sigma)}{\partial\sigma^2} - \frac{\partial x^\nu(\sigma)}{\partial\sigma^1}\frac{\partial x^\mu(\sigma)}{\partial\sigma^2} \right) d\sigma^1 d\sigma^2 \end{split} \end{equation}\] where \(x^\mu(\sigma)\equiv x^\mu(\sigma^1,\sigma^2)\) is a parametrization of the surface \(\Sigma\). 22 The previous formula is a higher-dimensional analogue of Stokes’ theorem \[\oint_C \vec{A}\cdot d\vec l = \int_\Sigma (\nabla \times \vec{A}) \cdot \hat{n} ~d^2\sigma = \int_\Sigma \vec{B} \cdot \hat{n} ~d^2\sigma ~,\] which is used in electromagnetism to relate the circulation of the vector potential \(\vec{A}\) along \(C\) to the magnetic flux through a surface with boundary \(C\). The formula \(\eqref{Stokes_Fmunu}\) tells us that the field strength \(F_{\mu\nu}\) encodes the value of infinitesimal Wilson loops.
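To get a concrete feel for \(\eqref{Stokes_Fmunu}\), here is a small numerical sketch (assuming NumPy; the field configuration is an illustrative choice and not taken from these notes): for a constant magnetic field \(\boldsymbol{B}=B_0\hat{\boldsymbol{z}}\) with \(\boldsymbol{A}=\frac{1}{2}\boldsymbol{B}\times\boldsymbol{x}\), the line integral of \(\boldsymbol{A}\) around a circle of radius \(R\) in the \(xy\)-plane reproduces the flux \(B_0\pi R^2\) through the disc, so the charge-\(1\) Wilson loop is \(e^{iB_0\pi R^2}\).

```python
import numpy as np

# constant magnetic field B = B0 z-hat with vector potential A = (1/2) B x r,
# so that curl A = B (purely illustrative configuration)
B0 = 0.8
Bvec = np.array([0.0, 0.0, B0])

# discretize a circle of radius R in the xy-plane
R, N = 1.3, 20000
theta = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
points = np.stack([R * np.cos(theta), R * np.sin(theta), np.zeros(N)], axis=1)
dl = np.roll(points, -1, axis=0) - points          # discrete line elements along C

A_vals = 0.5 * np.cross(Bvec, points)              # A evaluated on the loop
line_integral = np.sum(A_vals * dl)                # approximates the closed line integral of A

flux = B0 * np.pi * R**2                           # magnetic flux through the disc
assert np.isclose(line_integral, flux, rtol=1e-5)
wilson_loop = np.exp(1j * line_integral)           # the charge-1 Wilson loop
print(line_integral, flux, wilson_loop)
```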

If the loop \(C\) is not contractible to a point, it may happen that \(A_\mu\neq 0\) and therefore \[\oint_C A_\mu dx^\mu \neq 0\] even if the field strength \(F_{\mu\nu}\) vanishes everywhere in the region probed by a quantum-mechanical particle (or by a charged scalar field). Examples of spaces which allow this phenomenon are \(\mathbb{R}^2 \setminus p\), for loops which encircle the removed point \(p\), or the torus \(T^n\), for loops that wind non-trivially around a circle direction in the torus.

Time permitting, we will return to the Aharonov-Bohm effect later. For an accessible summary, see section 10.5.3 of , up to equation (10.100) or the excellent book .

3.6 The Dirac monopole (à la Wu and Yang)

For this topic, see sections 1.9, 9.4.1 and 10.5.2 of .

In this section we will investigate the question: can we have a magnetic field localized near a point in space \(\mathbb{R}^3\)? The resulting putative configuration is called a magnetic monopole, to contrast it with the magnetic dipoles which are physically realized and observed in real world magnets and have two poles.

We can already ask the question of the mathematical existence of magnetic monopoles in pure electromagnetism. The immediate answer that comes to mind is that no, magnetic monopoles are forbidden by Maxwell’s equations \[\begin{equation} \label{Maxwell_eqns_reminder} \begin{split} \partial_\nu F^{\mu\nu} &= J^\mu \\ \partial_\nu \tilde F^{\mu\nu} &= 0 \end{split} \end{equation}\] where \(\tilde F^{\mu\nu}:= \frac{1}{2}\epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma}\) is the dual field strength which is obtained from the original field strength by the replacement \((\boldsymbol{E}, \boldsymbol{B})\to (\boldsymbol{B}, -\boldsymbol{E})\). The vacuum Maxwell equations which are obtained by setting to zero the sources for the electric and magnetic fields in the right-hand side, are invariant under the electric-magnetic duality that sends \((\boldsymbol{E}, \boldsymbol{B})\mapsto (\boldsymbol{B}, -\boldsymbol{E})\) or equivalently \(F_{\mu\nu} \mapsto \tilde F_{\mu\nu}\). But the sources break this symmetry: in the first equation of \(\eqref{Maxwell_eqns_reminder}\) we have the electric current 4-vector \(J^\mu\), but there is no analogous magnetic current 4-vector \(\tilde J^\mu\) in the second equation. It is precisely the absence of a magnetic current 4-vector in the Maxwell equations that allows us to write the field strength in terms of a gauge field. For static field configurations, we have \[\boldsymbol{B} = \nabla \times \boldsymbol{A} \qquad \Longrightarrow \qquad \nabla \cdot \boldsymbol{B} = 0~,\] with no magnetic charge density \(\tilde \rho\) in the right-hand side to source the magnetic field \(\boldsymbol{B}\).

The previous argument seems to suggest that if we accept Maxwell’s equations as the correct mathematical description of the phenomena of electromagnetism, then pointlike electric charges are allowed, but pointlike magnetic charges are not. But Dirac found a loophole in this reasoning and was able to describe a magnetic monopole, which has been dubbed the Dirac monopole ever since. Or almost... Dirac’s argument involves a so-called Dirac string, which has a localized magnetic flux inside it, much like an infinitesimally thin solenoid. The Dirac string ends at a point, from which a radial magnetic field emanates, analogously to the electric field that emanates from an electrically charged point particle. That’s the Dirac monopole. The location of the Dirac string turns out to be unphysical, as it can be moved around by performing a gauge transformation, but the endpoint of the string, which is the center of the monopole, is physical. Then by a quantum-mechanical consideration (requiring that the wave-function of a charged particle is single-valued when the particle loops around the Dirac string, which is equivalent to requiring that the Wilson line around the Dirac string is equal to 1) it follows that the magnetic charge is quantized. Note that from Dirac’s point of view there is no pointlike magnetic charge really, just the endpoint of a movable Dirac string coming in from infinity. The magnetic flux through a 2-sphere that surrounds the endpoint of the Dirac string is zero, because the magnetic flux that enters the sphere from the Dirac string is equal and opposite to the flux that exits the sphere having emanated from the endpoint of the Dirac string (or the Dirac monopole).

The explanation of the Dirac monopole with the Dirac string can be confusing. Luckily, one can improve on Dirac’s intuition, reinterpreting it in more geometric terms, to actually describe a genuine pointlike magnetic charge. This was achieved by Wu and Yang , and it’s their modern description of the Dirac monopole that we will present here. The key point that will allow us to introduce a magnetic monopole is to remove from space \(\mathbb{R}^3\) a point, the position of the monopole, which we will set to be the origin \(O\) in what follows. Then, while \(\nabla \cdot \boldsymbol{B}=0\) everywhere in \(\mathbb{R}^3 \setminus O\), we can still have a non-vanishing magnetic flux through any 2-sphere surrounding the location of the monopole, which is measured by the magnetic charge \[\begin{equation} \label{magnetic_charge} m = \frac{1}{2\pi} \int_{S^2} \boldsymbol{B} \cdot d^2\boldsymbol{\sigma }~, \end{equation}\] where \(d^2\boldsymbol{\sigma}\) is the infinitesimal area element of the sphere, see figure 3.7.

Magnetic flux produced by a magnetic monopole at the origin.

REMARK:
We could equivalently work on \(\mathbb{R}^3\) and use Gauss’ theorem to rewrite \(\nabla \cdot \boldsymbol{B}=0\) on \(\mathbb{R}^3\setminus O\) together with \(\eqref{magnetic_charge}\) as \[\begin{equation} \label{magnetic_monopole_R^3} \nabla \cdot \boldsymbol{B} = 2\pi m ~\delta^{(3)}(\boldsymbol{x}) \qquad \text{in}~~\mathbb{R}^3~, \end{equation}\] but it is preferable to work in \(\mathbb{R}^3\setminus O\), which allows us to use gauge fields.

Using polar coordinates in \(\mathbb{R}^3\), we have the identities \[\begin{equation} \label{identities_R^3} \nabla \frac{1}{r} = - \frac{\boldsymbol{x}}{r^3}~, \qquad \Delta \frac{1}{r} = -4\pi \delta^{(3)}(\boldsymbol{x})~, \end{equation}\] where \(r=|\boldsymbol{x}|\) and \(\Delta \equiv \nabla^2\) is the Laplacian. Then we can solve \(\eqref{magnetic_monopole_R^3}\) by \[\begin{equation} \label{magnetic_monopole_R^3_2} \boldsymbol{B} = \frac{m}{2} \frac{\boldsymbol{x}}{r^3} = \frac{m}{2} \frac{1}{r^2}~\hat{\boldsymbol{x}}~, \end{equation}\] similarly to how we describe pointlike electric charges.

What about the vector potential or gauge field \(\boldsymbol{A}\)? We cannot write a smooth \(\boldsymbol{A}\) which is defined everywhere in \(\mathbb{R}^3\), such that \(\boldsymbol{B} := \nabla \times \boldsymbol{A}\) obeys \(\eqref{magnetic_monopole_R^3}\), because then we would have \(\nabla \cdot (\nabla \times \boldsymbol{A}) =0\). Next, we can try to write a smooth \(\boldsymbol{A}\) which is defined everywhere in \(\mathbb{R}^3\setminus O\), such that \(\boldsymbol{B} := \nabla \times \boldsymbol{A}\) reproduces the monopole field \(\eqref{magnetic_monopole_R^3_2}\). But this fails too. Indeed, consider for instance the vector potential \(\boldsymbol{A}^+\) given by 23 \[\label{Dirac monopole_north} A^+_x = - \frac{m}{2}\frac{y}{r(r+z)}~, \quad A^+_y = + \frac{m}{2}\frac{x}{r(r+z)}~, \quad A^+_z = 0~.\] The corresponding magnetic field is

3.9. \[\begin{equation} \label{curl_Dirac_1} \nabla \times \boldsymbol{A}^+ = \frac{m}{2} \frac{\boldsymbol{x}}{r^3} ~ \end{equation}\]

as we hoped, but unfortunately this only holds where \(\eqref{Dirac monopole_north}\) is defined, namely on \(\mathbb{R}^3\) minus the origin and the negative \(z\) axis. We can try harder, but we will only be able to move the semi-infinite open path where the gauge field is ill-defined (different choices are related by singular gauge transformations).
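If you would like to let the computer grind through the derivatives in exercise 3.9 (only as a cross-check, assuming SymPy), the following sketch compares \(\nabla\times\boldsymbol{A}^+\) with the monopole field \(\eqref{magnetic_monopole_R^3_2}\), and spot-checks the result numerically at a point away from the negative \(z\)-axis:

```python
import sympy as sp

x, y, z, m = sp.symbols('x y z m', real=True)
r = sp.sqrt(x**2 + y**2 + z**2)

# the northern-patch gauge field A^+ of the Dirac monopole
Ap = sp.Matrix([-m * y / (2 * r * (r + z)),
                 m * x / (2 * r * (r + z)),
                 0])

def curl(A):
    return sp.Matrix([sp.diff(A[2], y) - sp.diff(A[1], z),
                      sp.diff(A[0], z) - sp.diff(A[2], x),
                      sp.diff(A[1], x) - sp.diff(A[0], y)])

B_expected = (m / 2) * sp.Matrix([x, y, z]) / r**3     # the monopole field (m/2) x/r^3
delta = curl(Ap) - B_expected

print(sp.simplify(delta))    # expected to simplify to the zero vector

# numerical spot-check at a point away from the negative z-axis, where A^+ is defined
pt = {x: 0.3, y: -0.7, z: 1.1, m: 2}
assert all(abs(complex(c.subs(pt).evalf())) < 1e-10 for c in delta)
```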

Spherical coordinates.

The reason why it is not possible to find a globally defined gauge field on \(\mathbb{R}^3\setminus O \cong \mathbb{R}_{>0}\times S^2\) is that in this space there is a two-sphere surrounding the origin, and the two-sphere is a differentiable manifold which requires at least two charts (or patches) with the topology of an open disc. Working in polar coordinates \((r,\theta,\varphi)\), see figure 3.8, we can take the two patches on \(S^2\) to be 24 \[\begin{equation} \label{U_+-} \begin{split} U_+ &=\{(\theta,\varphi)~|~ 0\le \theta< \frac{\pi}{2}+\epsilon\} \\ U_- &=\{(\theta,\varphi)~|~ \frac{\pi}{2}-\epsilon < \theta\le \pi\} \end{split} \end{equation}\] for a constant \(\epsilon\in (0,\frac{\pi}{2})\). The two patches overlap in a region \[\begin{equation} \label{overlap_U+-} \begin{split} U_+ \cap U_- &=\{(\theta,\varphi)~|~ \frac{\pi}{2}-\epsilon < \theta< \frac{\pi}{2}+\epsilon\} \end{split} \end{equation}\] near the equator, which has the topology of an open interval (parametrized by \(\theta\)) times a circle (parametrized by \(\varphi\)). Then we can view \(\boldsymbol{A}^+\), defined in \(\eqref{Dirac monopole_north}\) in terms of Cartesian coordinates, as a gauge field defined in the northern patch \(U_+\). We now need to define a gauge field in the southern patch \(U_-\), and to figure out how \(\boldsymbol{A}^+\) and \(\boldsymbol{A}^-\) are related on the overlap \(U_+ \cap U_-\). The key idea is that on the overlap \(U_+ \cap U_-\) the two gauge fields are allowed to differ by a gauge transformation, since field configurations which are related by a gauge transformation are physically equivalent. On the southern patch \(U_-\) we can take the gauge field to be \(\boldsymbol{A}^-\), defined by \[\label{Dirac monopole_south} A^-_x = + \frac{m}{2}\frac{y}{r(r-z)}~, \quad A^-_y = - \frac{m}{2}\frac{x}{r(r-z)}~, \quad A^-_z = 0~,\] which also has magnetic field \[\begin{equation} \label{curl_Dirac_2} \nabla \times \boldsymbol{A}^- = \frac{m}{2} \frac{\boldsymbol{x}}{r^3} ~ \end{equation}\] where it is defined.

Since the gauge fields \(\boldsymbol{A}^+\) and \(\boldsymbol{A}^-\) lead to the same gauge invariant magnetic field \(\boldsymbol{B}^+ := \nabla \times \boldsymbol{A}^+ = \nabla \times \boldsymbol{A}^- =: \boldsymbol{B}^-\) in the overlap region \(U_+ \cap U_-\) where they are both defined, we might expect them to be gauge equivalent. To see this explicitly, it is easier to switch to polar coordinates. Using differential form notation we find 25

3.10. \[\begin{equation} \label{A_cartesian_to_polar} \begin{split} A^\pm &= A^\pm_x dx + A^\pm_y dy + A^\pm_z dz = A^\pm_r dr + A^\pm_\theta d\theta + A^\pm_\varphi d\varphi\\ &= \frac{m}{2}(\pm 1 -\cos\theta)~d\varphi~. \end{split} \end{equation}\]

Then we find that on the overlap of the two patches \(U_+ \cap U_-\) the two gauge fields differ by \[\begin{equation} \label{transition_function_Dirac_monopole_1} A^+ - A^- = m~d\varphi = d(m\varphi) \equiv d \alpha_{+-} \equiv -i g_{+-}^{-1} d g_{+-} \end{equation}\] where the transition function, namely the parameter of the \(U(1)\) gauge transformation that relates the gauge fields in the two patches, is \[\begin{equation} \label{transition_function_Dirac_monopole_2} g_{+-}(\varphi) = e^{i\alpha_{+-}(\varphi)} = e^{i m \varphi} \in U(1)~. \end{equation}\] Since \(\varphi \sim \varphi + 2\pi\), \(g_{+-}(\varphi)\) is single-valued (i.e. periodic) under one lap around the \(\varphi\) circle (e.g. the equator) if and only if the magnetic charge \(m\) is an integer: \[\begin{equation} \label{Dirac_quantization} g_{+-}(\varphi+2\pi) = g_{+-}(\varphi) \quad \Longleftrightarrow \quad m \in \mathbb{Z}~. \end{equation}\] We learn that the quantization of the magnetic charge follows from carefully considering gauge fields defined locally on the two patches of \(S^2\), and gluing them consistently by \(U(1)\) gauge transformations in the overlap of the two patches. The \(U(1)\)-valued transition function \(g_{+-}\) on the overlap tells us how to relate gauge transformation parameters \(g_\pm\) on \(U_+\) and \(U_-\) along the overlap: \(g_+ = g_{+-}g_-\), or equivalently \(\alpha_+=\alpha_- + \alpha_{+-}\). Mathematically, the \(U(1)\) gauge transformation parameters define sections of a so-called principal \(U(1)\) bundle over \(S^2\); the gauge fields \(A^\pm\) are (local) connections for this principal \(U(1)\) bundle. If you want to learn about the definition of these bundles, their sections and connections, and how they provide a mathematical definition of gauge groups and gauge fields, see the bonus chapter 9.
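The change to polar coordinates in \(\eqref{A_cartesian_to_polar}\), and hence the transition function \(\eqref{transition_function_Dirac_monopole_1}\), can also be checked by pulling the one-forms \(A^\pm\) back to spherical coordinates symbolically. A minimal sketch with sympy (the symbol names are arbitrary):

```python
import sympy as sp

r, th, ph, m = sp.symbols('r theta phi m', positive=True)
x = r*sp.sin(th)*sp.cos(ph)
y = r*sp.sin(th)*sp.sin(ph)
z = r*sp.cos(th)

def A_cartesian(sign):
    # Cartesian components of A^+ (sign=+1) and A^- (sign=-1) from the text
    denom = r*(r + sign*z)
    return (-sign*m*y/(2*denom), sign*m*x/(2*denom), sp.Integer(0))

for sign in (+1, -1):
    Ax, Ay, Az = A_cartesian(sign)
    # coefficient of dphi in A_x dx + A_y dy + A_z dz
    A_phi = Ax*sp.diff(x, ph) + Ay*sp.diff(y, ph) + Az*sp.diff(z, ph)
    print(sign, sp.simplify(A_phi - m*(sign - sp.cos(th))/2))   # 0 in both cases

# hence A^+ - A^- = m dphi on the overlap, as claimed
```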

REMARK:
In this formulation we can calculate the magnetic flux through the 2-sphere surrounding the origin (the position of the magnetic monopole) as follows. Call \(U_N\) and \(U_S\) the northern and southern hemispheres respectively, which are the limits as \(\epsilon \to 0\) of \(U_\pm\), so that the overlap reduces to the equator \(S^1_{\rm eq}\). Then the contributions of the two hemispheres to the magnetic flux add up: \[\begin{equation} \label{magnetic_flux_Dirac_monopole} \begin{split} \frac{1}{2\pi} \Phi_{S^2} (\boldsymbol{B}) &= \frac{1}{2\pi} \int_{S^2} \boldsymbol{B} \cdot d^2 \boldsymbol{\sigma }= \frac{1}{2\pi} \int_{U_N} \boldsymbol{B}^+ \cdot d^2 \boldsymbol{\sigma }+ \frac{1}{2\pi} \int_{U_S} \boldsymbol{B}^-\cdot d^2 \boldsymbol{\sigma }\\ &= \frac{1}{2\pi} \int_{U_N} (\nabla \times\boldsymbol{A}^+) \cdot d^2 \boldsymbol{\sigma }+ \frac{1}{2\pi} \int_{U_S} (\nabla \times \boldsymbol{A}^-) \cdot d^2 \boldsymbol{\sigma }\\ &= \frac{1}{2\pi} \oint_{S^1_{\rm eq}} \boldsymbol{A}^+ \cdot d \boldsymbol{l} - \frac{1}{2\pi} \oint_{S^1_{\rm eq}} \boldsymbol{A}^- \cdot d \boldsymbol{l}\\ &= \frac{1}{2\pi} \oint_{S^1_{\rm eq}} (\boldsymbol{A}^+ - \boldsymbol{A}^-) \cdot d \boldsymbol{l} = \frac{1}{2\pi} \oint_{S^1_{\rm eq}} (A^+ - A^-)\\ &= \frac{1}{2\pi} \oint_{S^1_{\rm eq}} d \alpha_{+-} = \frac{m}{2\pi} \int_0^{2\pi} d\varphi = m~. \end{split} \end{equation}\] To go from the second to the third line we used Stokes’ theorem. The relative minus sign between the two terms is there because the two hemispheres have opposite orientations, so that \(\partial U_N = S^1_{\rm eq}\) but \(\partial U_S = - S^1_{\rm eq}\) (the equatorial circle with the opposite orientation), see figure \(\eqref{fig:oriented_hemispheres}\). This reproduces the desired result \(\eqref{magnetic_charge}\).

Oriented hemispheres and their oriented boundaries.

This is very nice! We can describe a static solution of Maxwell’s equations which is a pointlike magnetic charge (magnetic monopole) by excising the location of the monopole from space and exploiting the geometry and topology of gauge fields over \(\mathbb{R}^3 \setminus O\) (or equivalently of \(S^2\)). But unfortunately it is not hard to see that a Dirac monopole has infinite energy. This problem can be fixed if we embed the \(U(1)\) gauge group into a bigger nonabelian gauge group, such as \(SU(2)\).

3.11. The energy stored in electromagnetic fields is \[\tfrac{1}{2} \int d^3x ( \boldsymbol{E}^2 + \boldsymbol{B}^2) \, .\] Show that the energy of the magnetic monopole solution \(\eqref{magnetic_monopole_R^3_2}\) is infinite. How about an electric monopole?

4 Non-abelian gauge theories

In this chapter we will learn how to formulate gauge theories with a non-abelian (that is, non-commutative) gauge group. Non-abelian gauge theories are called Yang-Mills theories, after Chen-Ning Yang and Robert Mills, who developed the formalism in 1954.

The formalism of Yang and Mills became prominent in the late 1960s, and has remained central in modern physics ever since. Non-abelian gauge theories are the language of the Standard Model of Particle Physics, and have also established very fruitful interactions between Physics and Maths, which have led to numerous developments in both subjects and quite a few Nobel prizes and Fields medals.

We will focus on compact Lie groups. As is common in the physics literature, we will choose to write group elements \(g\) in terms of Lie algebra elements as \[g = \exp(i \alpha^a t_a)\] for real numbers \(\alpha^a\), i.e. we will write a basis of the Lie algebra as \(i t_a\). In such a basis the structure constants are related to the generators \(t_a\) by \[\begin{equation} \label{Lie_bracket} [t_a, t_b] = i f_{ab}{}^c t_c \qquad (a,b,c=1,\dots,\dim\mathfrak{g})\, . \end{equation}\] As we have seen in Michaelmas term, for a compact Lie group we can always assume that the representation matrices \(r(g)\) are unitary; with the above conventions this has the advantage that the \(t_a\) are Hermitian, \(t^\dagger_a = t_a\).

For simplicity of notation, we will write the representation of the \(t_a\) associated with a representation \(r\) of the Lie group as \(t_a^{({\bf r})}\).
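As a concrete sanity check of these conventions, one can verify them for \(G=SU(2)\) with \(t_a = \sigma_a/2\) (halved Pauli matrices), for which \(f_{ab}{}^c = \epsilon_{abc}\). A minimal numerical sketch (numpy; the assertions simply re-check the algebra stated above and the trace normalization used later in this chapter):

```python
import numpy as np

# su(2) generators in the fundamental representation: t_a = sigma_a / 2
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [s / 2 for s in sigma]

# structure constants f_{ab}^c = epsilon_{abc}
eps = np.zeros((3, 3, 3))
for a, b, c in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[a, b, c], eps[b, a, c] = 1.0, -1.0

for a in range(3):
    assert np.allclose(t[a], t[a].conj().T)                       # Hermitian generators
    for b in range(3):
        comm = t[a] @ t[b] - t[b] @ t[a]
        rhs = 1j * sum(eps[a, b, c] * t[c] for c in range(3))
        assert np.allclose(comm, rhs)                             # [t_a, t_b] = i f_{ab}^c t_c
        assert np.isclose(np.trace(t[a] @ t[b]), 0.5 * (a == b))  # tr(t_a t_b) = delta_ab / 2
print("su(2) conventions check out")
```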

4.1 Non-abelian gauge theories: fields

This section introduces the cast of characters which we will use in the next section to formulate actions which are invariant under non-abelian gauge transformations: charged matter fields \(\phi\) and their covariant derivatives, gauge fields \(A_\mu\), and field strengths \(F_{\mu\nu}\), together with their gauge transformations.

References for this section are section 1.8.1 of and section 2.1 of .

We will be more general later, but let us start slowly and assume that the gauge group \(G\) is a classical group (e.g. \(SU(N)\)), whose elements are matrices, and that the charged field \(\phi\) transforms in the fundamental representation \(\rm{fund}\) (that is \(\mathbf{N}\) for \(SU(N)\)). This means that the gauge transformation of the charged field \(\phi\) is \[\begin{equation} \label{gauge_transfo_phi} \phi \mapsto g \phi = e^{i \alpha^a t_a} \phi \end{equation}\] where \(\phi\) is a column vector (\(N\)-dimensional for \(G=SU(N)\), that is \(\phi=(\phi^j)_{j=1}^N \in \mathbb{C}^N\)), the Lie algebra generators \(t_a\) are matrices (\(N\times N\) hermitian traceless for \(G=SU(N)\)), and the group element \(g\) is also a matrix (\(N\times N\) unitary and with unit determinant for \(G=SU(N)\)), which acts on \(\phi\) by matrix multiplication. Recall that both the field \(\phi=\phi(x)\) and the group element \(g=g(x)\), and therefore the gauge parameter \(\alpha=\alpha(x)\), depend on the space-time point \(x\).

Given the charged field \(\phi\), we define its (gauge) covariant derivative \[\begin{equation} \label{cov_der_phi} D_\mu\phi:=\partial_\mu\phi-iA_\mu\phi \end{equation}\] where the gauge field \(A_\mu\) is now a matrix, which will turn out to be an element of the Lie algebra to ensure the consistency of its gauge transformation: \[\begin{equation} \label{A_mu_Lie_alg} A_\mu = A_\mu^a t_a~. \end{equation}\] We require that under the non-abelian gauge transformation \(\eqref{gauge_transfo_phi}\) the covariant derivative transforms in the same way as \(\phi\): \[\begin{equation} \label{gauge_transfo_Dphi} D_\mu\phi \mapsto g D_\mu\phi~. \end{equation}\] Viewing the covariant derivative 28 \[D_\mu := \mathds{1}\partial_\mu - i A_\mu\] as a matrix-valued differential operator, which in components reads \[(D_\mu)^j{}_k = \delta^j{}_k \partial_\mu - i (A_\mu)^j{}_k~,\] we require the gauge transformation \[\begin{equation} \label{gauge_transfo_D} D_\mu \mapsto g D_\mu g^{-1}~. \end{equation}\] In terms of the gauge field, the gauge transformation of the covariant derivative is \[\begin{split} \partial_\mu - i A_\mu ~~\mapsto~~ \partial_\mu-iA'_\mu &= g(\partial_\mu-iA_\mu)g^{-1}\\ &=g(\partial_\mu g^{-1}) + g g^{-1} \partial_\mu -i g A_\mu g^{-1}~. \end{split}\] Note that the gauge group element \(g\) and the gauge field \(A_\mu\) are matrices now, so they do not commute: their order matters!

Comparing the initial and final expressions, we obtain the following gauge transformation for the gauge field \(A_\mu\): \[\begin{equation} \label{gauge_transfo_A} \begin{split} A_\mu ~~\mapsto~~ A'_\mu &= gA_\mu g^{-1} + i g (\partial_\mu g^{-1})\\ &=gA_\mu g^{-1} - i (\partial_\mu g)g^{-1}~, \end{split} \end{equation}\] where I have used parentheses to make it clear that all objects are (matrix-valued) functions, 29 not differential operators. I have used the identity \[\begin{equation} \label{deriv_identity} 0 = (\partial_\mu \mathds{1}) = (\partial_\mu(g g^{-1})) = (\partial_\mu g)g^{-1} + g (\partial_\mu g^{-1})~ \end{equation}\] to go from the first line to the second line.

REMARKS:

  1. The first term in the gauge transformation \(\eqref{gauge_transfo_A}\) of the gauge field \(A_\mu\) is the adjoint action of the Lie group \(G\) on a Lie algebra element. This clarifies why \(A_\mu\) belongs to the Lie algebra \(\mathfrak{g}=\mathrm{Lie}(G)\).

  2. The second term in \(\eqref{gauge_transfo_A}\) is a correction term to the adjoint action, which involves a derivative. This is also an element of the Lie algebra \(\mathfrak{g}\), which can be seen as follows:
    consider the path \(g(t_0+t)g^{-1}(t_0)\), which passes through the identity for \(t=0\). The associated Lie algebra element is \[\left.\frac{\partial}{\partial t} g(t_0+t)g^{-1}(t_0) \right|_{t=0} = \left.\left(\frac{\partial}{\partial t_0} g(t_0+t) \right) g^{-1}(t_0) \right|_{t=0} = \left(\frac{\partial}{\partial t_0} g(t_0) \right) g^{-1}(t_0)\] For any path \(g(t)\), we hence have that \((\partial_t g(t)) g^{-1}(t) \in \mathfrak{g}\) for all \(t\). For \(g(\boldsymbol{x})\) we get such paths by setting \(t=x^\mu\) for some \(\mu\) while keeping the other components of \(\boldsymbol{x}\) fixed. Hence \[\left(\partial_\mu g(\boldsymbol{x}) \right) g^{-1}(\boldsymbol{x}) \in \mathfrak{g} \, .\] (A numerical illustration of this fact is sketched below.)
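The following is a minimal numerical sketch of the statement above for \(G=SU(2)\): for an arbitrary (made-up) smooth path \(g(\tau)=\exp(i\alpha^a(\tau)t_a)\), the combination \((\partial_\tau g)\,g^{-1}\) comes out as \(i\) times a Hermitian traceless matrix, i.e. an element of \(\mathfrak{su}(2)\) in our conventions. The choice of profile \(\alpha^a(\tau)\), the base point and the finite-difference step are all arbitrary.

```python
import numpy as np
from scipy.linalg import expm

# su(2) generators t_a = sigma_a / 2
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]
t = [s / 2 for s in sigma]

def g(tau):
    # an arbitrary smooth SU(2)-valued path (the profile alpha^a(tau) is a made-up example)
    alpha = [np.sin(tau), tau**2, np.cos(2 * tau)]
    return expm(1j * sum(a * ta for a, ta in zip(alpha, t)))

tau0, h = 0.7, 1e-6
dg = (g(tau0 + h) - g(tau0 - h)) / (2 * h)        # numerical derivative of the path
X = dg @ np.linalg.inv(g(tau0))                   # (d_tau g) g^{-1}

H = -1j * X                                       # should be Hermitian and traceless
print(np.allclose(H, H.conj().T, atol=1e-5), abs(np.trace(H)) < 1e-5)
```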

Finally, in analogy with the \(G=U(1)\) case, we define the field strength \[\begin{equation} \label{F_nonab} F_{\mu\nu}:=i[D_\mu,D_\nu]~. \end{equation}\] As in the \(U(1)\) case, in the above definition we view both sides as differential operators, except that now they are matrix-valued. As we will see shortly, despite appearances \(F_{\mu\nu}\) turns out to be a multiplicative operator, which means that it is a (matrix-valued) function that simply acts by (matrix) multiplication; no differentiations are involved.

By construction, under a gauge transformation \(\eqref{gauge_transfo_phi}\) the field strength transforms as \[\begin{equation} \label{gauge_transfo_F} F_{\mu\nu} \mapsto g F_{\mu\nu} g^{-1}~. \end{equation}\]

We simply need to use the gauge transformation property \(\eqref{gauge_transfo_D}\) and basic properties of the commutator: \[\begin{split} F_{\mu\nu} = i [D_\mu,D_\nu] \mapsto F'_{\mu\nu} &= i[g D_\mu g^{-1}, g D_\nu g^{-1}] \\ &= g [D_\mu,D_\nu]g^{-1} = g F_{\mu\nu} g^{-1}~. \end{split}\]

Calculating the commutator in \(\eqref{F_nonab}\), we find the following expression for the field strength: \[\begin{equation} \label{F_from_A_nonab} F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu - i [A_\mu,A_\nu]~. \end{equation}\] Restoring the identity matrix \(\mathds{1}\) for clarity (feel free to omit it if you are comfortable without it), \[\begin{split} -iF_{\mu\nu} &= [D_\mu,D_\nu] = [\mathds{1}\partial_\mu - i A_\mu, \mathds{1}\partial_\nu - i A_\nu] \\ &= [\mathds{1}\partial_\mu, \mathds{1}\partial_\nu] - i [\mathds{1}\partial_\mu ,A_\nu] - i [A_\mu, \mathds{1}\partial_\nu] - [A_\mu, A_\nu]\\ &=0 - i (\partial_\mu A_\nu) + i (\partial_\nu A_\mu) - [A_\mu,A_\nu]\\ &=-i \left(\partial_\mu A_\nu - \partial_\nu A_\mu -i[A_\mu, A_\nu]\right)~. \end{split}\]
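The computation above can also be cross-checked symbolically, by acting with \([D_\mu,D_\nu]\) on a test field and comparing with \(-iF_{\mu\nu}\phi\) built from \(\eqref{F_from_A_nonab}\). A minimal sketch with sympy, for a made-up \(\mathfrak{su}(2)\)-valued gauge field depending on two coordinates only (all component choices are arbitrary illustrations, not a particular physical configuration):

```python
import sympy as sp

tt, xx = sp.symbols('t x', real=True)      # two coordinates are enough for one component of F
coords = (tt, xx)

# su(2) generators t_a = sigma_a / 2
s1 = sp.Matrix([[0, 1], [1, 0]])
s2 = sp.Matrix([[0, -sp.I], [sp.I, 0]])
s3 = sp.Matrix([[1, 0], [0, -1]])
t_gen = [s / 2 for s in (s1, s2, s3)]

# a made-up Lie-algebra-valued gauge field A_mu = A_mu^a t_a
A = {0: xx * t_gen[0] + tt * t_gen[2],     # A_t
     1: sp.sin(tt) * t_gen[1]}             # A_x

phi = sp.Matrix([sp.Function('p1')(tt, xx), sp.Function('p2')(tt, xx)])

def D(mu, psi):
    return sp.diff(psi, coords[mu]) - sp.I * A[mu] * psi

lhs = D(0, D(1, phi)) - D(1, D(0, phi))                                 # [D_t, D_x] phi
F = sp.diff(A[1], tt) - sp.diff(A[0], xx) - sp.I * (A[0]*A[1] - A[1]*A[0])
print(sp.simplify(lhs + sp.I * F * phi))   # zero matrix: [D_mu, D_nu] phi = -i F_{mu nu} phi
```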

REMARK: The finite gauge transformations \(\eqref{gauge_transfo_D}\) of the covariant derivative \(D_\mu\) and \(\eqref{gauge_transfo_F}\) of the field strength \(F_{\mu\nu}\) are both given by the adjoint action of the Lie group on the Lie algebra. This means that \(D_\mu\) and \(F_{\mu\nu}\) transform in the adjoint representation \(\mathrm{adj}\) of \(G\).

4.1. By considering infinitesimal gauge transformations (\(|\alpha^a|\ll 1\)) \[\begin{equation} \label{infinites_gauge_transfos} g = e^{i \alpha^a t_a} \equiv e^{i\alpha} = 1 + i \alpha + O(\alpha^2) \end{equation}\] and Taylor expanding finite gauge transformations to leading order in \(\alpha\in \mathfrak{g} = {\rm Lie}(G)\), show that the infinitesimal gauge variations of the fields are \[\begin{equation} \label{inf} \begin{split} \delta_\alpha \phi &= i \alpha \phi\\ \delta_\alpha A_\mu &= i [\alpha,A_\mu]+\partial_\mu\alpha\\ \delta_\alpha F_{\mu\nu} &= i[\alpha,F_{\mu\nu}]~, \end{split} \end{equation}\] where \(\phi \mapsto \phi + \delta_\alpha \phi + O(\alpha^2)\) and so on.

REMARKS:

  1. The field strength \(F_{\mu\nu}\) transforms in the \(\mathrm{adj}\) rep of \(\mathfrak{g}\) under infinitesimal gauge transformations.

  2. The gauge field \(A_\mu\) doesn’t quite transform in \(\mathrm{adj}\), as the first term in its variation suggests, because of the additional derivative term, which we have already encountered when we studied \(\mathfrak{g}=u(1)\). People often say (and I might also say in the future) that \(A_\mu\) transforms in the adjoint representation \(\mathrm{adj}\), but that’s an abuse of terminology.

  3. On the other hand the covariant derivative \(D_\mu\) does transform in the \(\mathrm{adj}\) representation.

Everything that we have seen so far generalizes to an arbitrary Lie group \(G\) and a charged field \(\phi\) transforming in an \(r\)-dimensional representation \(\mathbf{r}\). Now \(\phi\) is a column vector with \(r\) components, and we simply need to replace the group element \(g\) in previous formulae by the appropriate \(r\times r\) representation matrix \[\begin{equation} \label{repr_matrix} r(g) = \exp[i \alpha^a t_a^{(\mathbf{r})}]~. \end{equation}\] For instance \[D_\mu\phi = \partial_\mu\phi - i A_\mu \phi := \left(\mathds{1}_r \partial_\mu - i A_\mu^a t_a^{(\mathbf{r})}\right) \phi~,\] and \[\begin{split} F_{\mu\nu}\phi &= i[D_\mu,D_\nu]\phi\\ &= (\partial_\mu A_\nu - \partial_\nu A_\mu - i [A_\mu,A_\nu])\phi\\ &= (\partial_\mu A_\nu^a - \partial_\nu A_\mu^a +f_{bc}{}^a A_\mu^b A_\nu^c ) t_a^{(\mathbf{r})} \phi~, \end{split}\] where it is understood that if \(\phi\) transforms in the representation \(\mathbf{r}\), then \[\label{apply_gauge_field)notation} \begin{split} A_\mu\phi&:= A_\mu^a t_a^{(\mathbf{r})} \phi\\ F_{\mu\nu}\phi&:= F_{\mu\nu}^a t_a^{(\mathbf{r})} \phi \end{split}\] etc. Similarly, I should warn you that it is customary to simply write \(g \phi\), to mean the abstract action of \(g\) on \(\phi\) in the appropriate representation, rather than the explicit multiplication \(r(g)\phi\) by the representation matrix \(r(g)\). Of course one needs to specify the representation \(\mathbf{r}\) beforehand, or it wouldn’t be clear what \(g \phi\) means.

In components, \[(A_\mu\phi)^i = A_\mu^a (t_a^{(\mathbf{r})} )^i{}_j \phi^j \qquad (i,j=1,\dots,r)\] etc.

4.2. Show that, if \(G=U(1)\), all the equations written so far in this section reduce to those introduced in chapter 3, both for the charge \(1\) representation, which is analogous to the fundamental representation, and for the more general charge \(q\) representation.

4.3. Consider a field \(\phi\) in the \(\mathrm{adj}\) representation, with components \(\phi^a\), where \(a=1,\dots,\dim~\mathfrak{g}\).

  1. Show that \[(A_\mu\phi)^a = i f_{bc}{}^a A_\mu^b \phi^c\] and similarly for \((F_{\mu\nu}\phi)^a\).
    [Hint: we worked out the matrices defining the adjoint representation in problem 29 of Michaelmas term, but wrote group elements as \(e^{\alpha^a \hat{t}_a}\) instead of the physics convention \(e^{i \alpha^a t_a}\) used here]

  2. Let \(\Phi:=\phi^a t_a\), and \(A_\mu=A_\mu^a t_a\), \(F_{\mu\nu}=F_{\mu\nu}^a t_a\) as usual. Show that \[(A_\mu \phi)^a t_a = [A_\mu,\Phi]\] and similarly for \(F_{\mu\nu}\phi\). Show that therefore \[\begin{split} D_\mu \Phi &= \partial_\mu\Phi - i [A_\mu, \Phi]\\ [D_\mu,D_\nu]\Phi &= -i [F_{\mu\nu},\Phi]~. \end{split}\] The lesson here is that the action of the adjoint representation on itself is by commutators (or Lie brackets). We have already seen that the associated Lie algebra representation of adjoint lets the Lie algebra act on itself via commutators in Michaelmas term.

4.2 Non-abelian gauge theories: action and EoM

Let us start by constructing a gauge invariant action for the (Lie algebra valued) non-abelian gauge field \(A_\mu=A_\mu^a t_a\). This is easy: since the field strength \(F_{\mu\nu}=F_{\mu\nu}^a t_a\) transforms as \[\begin{equation} \label{F_adj} F_{\mu\nu} \mapsto g F_{\mu\nu}g^{-1} \end{equation}\] under a gauge transformation, it follows immediately that \(tr(F_{\mu\nu}F^{\mu\nu})\) is gauge invariant and can therefore be used as a term in the Lagrangian density.

Under a gauge transformation, \[\begin{split} tr(F_{\mu\nu}F^{\mu\nu}) ~~\mapsto~~ tr(F'_{\mu\nu}F'^{\mu\nu}) &= tr(g F_{\mu\nu}g^{-1} gF^{\mu\nu} g^{-1}) \\ &= tr(g^{-1}g F_{\mu\nu}g^{-1} gF^{\mu\nu}) = tr(F_{\mu\nu}F^{\mu\nu}) ~, \end{split}\] where we have used the cyclic property of the trace.

We are now ready to define the Yang-Mills action \[\begin{equation} \label{YM_action} \begin{split} S_{YM}[A] &= \int d^4x ~\mathcal{L}_{YM}~,\\ \mathcal{L}_{YM} &= -\frac{1}{2g_{YM}^2} tr(F_{\mu\nu} F^{\mu\nu})~, \end{split}\, . \end{equation}\] Working in a normalization where \[\begin{equation} \label{normalization_quadratic_inv_fund} tr\,\, t_a t_b = \frac{1}{2}~\delta_{ab}, \end{equation}\] we find \[\begin{equation} \label{YM_Lagr_2} \begin{split} \mathcal{L}_{YM} &= -\frac{1}{4g_{YM}^2} F^a_{\mu\nu} F^{a~\mu\nu}~. \end{split} \end{equation}\] \(g_{YM}\) is called the Yang-Mills coupling constant 30 and controls the strength of the interactions. (To see that, it helps to rescale \(A_\mu \to g_{YM} A_\mu\).)

It turns out that there is a second gauge invariant term that one can add to the action. It is the theta term \[\begin{equation} \label{theta_term} \begin{split} S_{\theta}[A] &= \int d^4x ~\mathcal{L}_{\theta}~,\\ \mathcal{L}_{\theta} &= \frac{\theta}{16\pi^2} tr(F_{\mu\nu} \tilde F^{\mu\nu})~, \end{split} \end{equation}\] where \(\theta\) is called the theta angle, and \[\begin{equation} \label{Fdual} \tilde F^{\mu\nu} :=\frac{1}{2} \epsilon^{\mu\nu\rho\sigma} F_{\rho\sigma} \end{equation}\] is the dual field strength. In \(\eqref{Fdual}\), \(\epsilon^{\mu\nu\rho\sigma}\) is the completely antisymmetric tensor in four indices, with \(\epsilon^{0123}=1\).

To summarize, the most general gauge invariant action (with two derivatives) which contains a kinetic term for the non-abelian gauge field \(A_\mu\), as well as interaction terms, is \[\begin{equation} \label{YM+theta} \begin{split} S_{\rm gauge}[A] &=S_{YM}[A] + S_{\theta}[A]~,\\ \mathcal{L}_{\rm gauge} &= \mathcal{L}_{YM}+\mathcal{L}_{\theta} = -\frac{1}{2g_{YM}^2} tr(F_{\mu\nu} F^{\mu\nu}) + \frac{\theta}{16\pi^2} tr(F_{\mu\nu} \tilde F^{\mu\nu})~. \end{split} \end{equation}\]

4.4.

  1. Express the Lagrangian density \(\mathcal{L}_{\rm gauge}\) in terms of \(A_\mu^a\) and the structure constants \(f_{ab}{}^c\), and identify quadratic terms involving derivatives of the gauge field, and cubic and quartic terms in \(A_\mu\), which represent interactions.

  2. Show that the theta term \(\eqref{theta_term}\) can be written as a surface (or ‘boundary’) term: \[\begin{equation} \label{theta_term_tot_der} \begin{split} S_{\theta} &= \frac{\theta}{8\pi^2} \int d^4x ~\partial_\mu K^\mu~,\\ K^\mu &= \epsilon^{\mu\nu\rho\sigma} tr(A_\nu \partial_\rho A_\sigma - \frac{2i}{3} A_\nu A_\rho A_\sigma)~. \end{split} \end{equation}\]

  3. Show that the equations of motion (EoM) obtained from the action \(S_{\rm gauge}\) are \[\begin{equation} \label{EoM_nonab_gauge_field} D_\mu F^{\mu\nu} \equiv \partial_\mu F^{\mu\nu}-i[A_\mu,F^{\mu\nu}] = 0~. \end{equation}\]

  4. Show, without using the EoM, that the Bianchi identity \[D_\mu \tilde F^{\mu\nu} = 0~.\] holds.

If in addition to the gauge field \(A_\mu\) there are also charged fields \(\phi\) transforming in a representation \(\mathbf{r}\) (reducible or irreducible), then we can write a gauge invariant action for them using covariant derivatives. For instance for \(G=SU(N)\), we have \[\begin{equation} \label{S_matter} \begin{split} S_\text{matter}[\phi,\phi^\dagger,A] &= \int d^4x ~\mathcal{L}_\text{matter}~,\\ \mathcal{L}_\text{matter} &= - (D_\mu\phi)^\dagger D^\mu\phi - V(\phi, \phi^\dagger)~, \end{split} \end{equation}\] where we require the scalar potential \(V\) to be gauge invariant, that is, \(V \mapsto V\) under non-abelian gauge transformations. This generalizes to other classical groups \(G\) by using the appropriate inner product in the kinetic term.

4.5. Consider the action \[S[\phi,\bar\phi,A] = S_{YM}[A] + S_{\theta}[A] + S_{\rm matter}[\phi,\bar\phi,A] ~.\]

  1. Show that the EoM are \[\begin{equation} \label{EoM_nonab} \begin{split} D_\mu D^\mu \phi &= \frac{\partial V}{ \partial\phi^\dagger} \\ D_\nu F^{\mu\nu} &= g^2_{YM} J^\mu \end{split} \end{equation}\] for a current \(J_\mu = J_\mu^a t_a\) that you should find.

  2. Show that under a gauge transformation the current \(J^\mu\) transforms as \[\begin{equation} \label{gauge_transfo_J} J^\mu \mapsto g J^\mu g^{-1}~, \end{equation}\] and that \(J^\mu\) is covariantly conserved, namely \[\label{DJ=0} D_\mu J^\mu = 0~.\]

4.3 A brief look at the Standard Model*

The Standard Model of elementary particle physics, which has been surprisingly successful in describing elementary particle interactions ever since its inception in the 1960s, is a gauge theory with gauge group 31 \[G_{SM} = U(1) \times SU(2) \times SU(3)\,.\] The reason field theories have some relevance in particle physics is that quanta of fields are (quantum) particles. Roughly speaking, you can associate a type of particle with every field; if you want to learn more you’ll have to take a course on quantum field theory, such as AQT.

As we have seen, a gauge theory implies the existence of gauge fields which generalize electric and magnetic fields, so we can think of them as mediating a force. You can think of \(U(1) \times SU(2)\) as the gauge groups of electromagnetism and the ‘weak force’, which is responsible e.g. for \(\beta\) decay. However, it turns out that the \(U(1)\) factor is not identical to the \(U(1)\) of electromagnetism; more about this later. The \(SU(3)\) factor gives rise to a force known as the ‘strong force’, which binds quarks together into baryons such as protons and neutrons, and also binds protons and neutrons into atomic nuclei.

What makes this theory so beautiful is that all we need to do to define it is state the gauge symmetry (we did that already) and which charged matter fields we have and in which representations of \(G_{SM}\) they live. Writing down the most general Lagrangian (to lowest order) then gives the Standard Model Lagrangian, up to fixing free parameters by experiment. For the sake of simplicity we will discuss the ‘classical’ version without neutrino masses, which (it turns out) has 19 free parameters.

The charged fields are \(q_{L i}\), \(u_{R i}\), \(d_{R i}\), \(\ell_{L i}\), \(e_{R i}\) for \(i=1,2,3\), which are all (left/right-handed) Weyl fermions (the label \(i\) is called the ‘generation’), together with a single complex scalar \(H\). These transform in the following representations (please ignore the last row for now): \[\begin{array}{c||c|c|c|c|c|c} & q_{L i} & u_{R i}&d_{R i}&\ell_{L i} & e_{R i} & H \\ \hline \hline U(1)_h & \tfrac13 & \tfrac43 & -\tfrac23 & -1 & -2 & 1 \\ SU(2) & 2 & - & - & 2 & - & 2 \\ SU(3) & 3 & 3 & 3 & - & - & - \\ \hline \hline U(1)_{EM} & \begin{pmatrix} \tfrac23 \\ -\tfrac13 \end{pmatrix} & \tfrac23 & -\tfrac13 & \begin{pmatrix} 0 \\ -1 \end{pmatrix} & -1 & 0 \end{array}\, .\]

Here we have given the \(U(1)\) charge for each one of them and \(2\) and \(3\) indicate they transform under the defining representation of \(SU(2)\) or \(SU(3)\). It is standard terminology to use \(1\) to indicate a singlet under e.g. \(SU(2)\), but I find this confusing when comparing with \(U(1)\) charges and use a dash \(-\) to indicate they do not transform at all. Hence e.g. \(\ell_{L i}\) has two components, as appropriate for the defining rep. of \(SU(2)\) and \(q_{L i}\) has 6 components as it both transforms as a \(2\) under \(SU(2)\) and \(3\) under \(SU(3)\).

\(q_{L i}\), \(u_{R i}\), \(d_{R i}\) describe the six quarks: up and down for \(i=1\), strange and charm for \(i=2\), and bottom and top for \(i=3\); \(\ell_{L i}, e_{R i}\) describe the leptons: electron and electron-neutrino for \(i=1\), muon and muon-neutrino for \(i=2\), as well as tau and tau-neutrino for \(i=3\).

You’ll notice several things right away:

  1. The \(SU(2)\) only talks to left-handed Weyl spinors but not right-handed Weyl spinors. This is the origin of parity violation in nature, first demonstrated in \(\beta\) decay by Chien-Shiung Wu in 1956.

  2. Only quarks participate in the strong interactions.

  3. The \(U(1)\) charges are not all integers, which seems to contradict our statements regarding \(U(1)\) representations. However, this normalization has historical reasons and we can appropriately rescale the generator of \(U(1)\) to make these all integers.

We can now write down kinetic terms for all of the gauge fields and charged particles in the usual way. The covariant derivative of \(q_{L i }\) is e.g. \[D_\mu q_{L i} = \left(\partial_\mu - i \frac{1}{3} (A_h)_\mu - i W_\mu - i g_\mu \right) q_{L i}\] where \((A_h)_\mu\) is the gauge field of \(U(1)\), \(W_\mu\) of \(SU(2)\) (3 actually) and \(g_\mu\) of \(SU(3)\) (8 actually).

For \(H\) we have the possibility of writing down a potential term in \(\mathcal{L}\): \[V(H) = - m |H|^2 + \lambda |H|^4\] Note that \(H\) is actually two complex fields as it lives in the \(2\) of \(SU(2)\), and that \(|H|^2= \bar{H}_i H_i\). It turns out that the right physics emerges when \(m,\lambda\) are both positive. In this case the vacua of \(H\) are described by \[|H|^2 = \frac{m}{2\lambda}\] which is non-zero. The set of solutions to this equation is gauge invariant, but any given choice is not invariant under all elements of \(U(1)_h \times SU(2)\). This type of symmetry breaking, where the action is invariant under a symmetry but the vacuum (or ground state) is not, is called spontaneous symmetry breaking; \(H\) is the Higgs field.
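As a quick sanity check of where the minimum sits, one can minimise \(V\) as a function of \(|H|^2\) numerically. This is a minimal sketch for hypothetical parameter values (the names m and lam and the numbers are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

m, lam = 1.0, 0.5                               # hypothetical positive parameters
V = lambda h2: -m * h2 + lam * h2**2            # potential as a function of |H|^2
res = minimize_scalar(V, bounds=(0.0, 10.0), method='bounded')
print(res.x, m / (2 * lam))                     # both ~ 1.0: the minimum sits at |H|^2 = m/(2 lambda)
```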

If a continuous internal global symmetry is spontaneously broken, then there is a massless scalar field (called a Nambu-Goldstone boson) for each spontaneously broken symmetry generator. If the symmetry is gauged, as it is here, the would-be Nambu-Goldstone bosons are not physical, as they can be absorbed by a gauge transformation, but the gauge fields associated to the spontaneously broken gauge symmetry gain a mass, which is otherwise forbidden by gauge invariance. This is called the (Anderson-Brout-Englert-Guralnik-Hagen-) Higgs (-Kibble) mechanism.

One way to see the mass of the gauge bosons is that after fixing a background value \(|H|^2 = m/(2\lambda)\) the kinetic term for \(H\) gives (schematically) \[\overline{D_\mu H} D^\mu H \rightarrow W_\mu W^\mu\, m/\lambda\] which is not gauge invariant and in fact gives an (otherwise forbidden) mass to the particles corresponding to three out of the four gauge fields \(W_\mu^a\) and \((A_h)_\mu\).

The surviving combination is \[t_{EM} = t_3^{SU(2)} + \tfrac12 t_h\] and generates the \(U(1)\) associated with electromagnetism. Here \(t_3^{SU(2)}\) is the 3rd generator of \(SU(2)\). The charges our particles have under this \(U(1)\) are given in the last row of the table above. Because of the symmetry breaking, three of the four real degrees of freedom in \(H\) become the longitudinal components of the three massive gauge fields; the fourth is a field corresponding to a massive real scalar particle, the Higgs, which was finally found at the LHC in 2012, roughly 50 years after its prediction.

The spontaneous symmetry breaking has another effect. Recall that a mass term \(m \bar{\Psi} \Psi\) for a Dirac fermion reads \(m(\bar{\Psi}_L \Psi_R+ c.c.)\) in terms of Weyl spinors. Such terms are forbidden in the Standard Model, as they would not be gauge invariant. However, we can write terms such as \[\bar{\ell}_L H e_R + c.c.\] as the Higgs \(H\) is a \(2\) of \(SU(2)\). These are called Yukawa couplings and we can write them for all quarks and leptons. After \(H\) gets its background value the above becomes something like \[\sqrt{m/\lambda} \, \bar{e}_L e_R + c.c.\] which ends up giving the electron a mass, so that we might describe it in terms of a Dirac spinor. In fact, the most general thing we can write in the quark sector is \[\bar{q}_{L i} \tilde{H} u_{Rj} f_{u\, ij} + \bar{q}_{L i} H d_{R j} f_{d\, ij}\] (where \(\tilde{H}\) is built from the complex conjugate of \(H\)), which is all fine by gauge invariance. The fact that the matrices \(f_{u\, ij} \neq f_{d\, ij}\) leads to quark mixing, a.k.a. the CKM matrix. For three generations (and no less) this ends up causing CP violation, which is one of the necessary conditions for matter-antimatter asymmetry in the universe.

4.4 The ’t Hooft-Polyakov monopole*

In 1974 Gerard ’t Hooft and Aleksandr M. Polyakov discovered that nonabelian gauge theories with scalar fields transforming in the adjoint representation admit smooth magnetic monopoles as static finite energy solutions of their equations of motion.

The field theory of interest is the so-called Georgi-Glashow model (or \(SU(2)\) adjoint Higgs model): a field theory in three space and one time dimension, with gauge group \(G=SU(2)\) and a scalar field \(\Phi\) transforming in the (3-dimensional) adjoint representation, which we represent as a \(2\times 2\) traceless hermitian matrix. The Lagrangian density is \[\begin{equation} \label{GeorgiGlashow} \begin{split} \mathcal{L}&= -\frac{1}{2g^2_{YM}} tr(F_{\mu\nu}F^{\mu\nu}) - tr((D_\mu\Phi)(D^\mu\Phi))-V(\Phi)~,\\ V(\Phi) &= \lambda \left(\frac{1}{2}tr(\Phi^2)-v^2 \right)^2~, \end{split} \end{equation}\] where \(\lambda, v>0\) are constants and \[\begin{split} F_{\mu\nu} &= \partial_\mu A_\nu - \partial_\nu A_\mu - i [A_\mu,A_\nu]\\ D_\mu\Phi &= \partial_\mu\Phi-i [A_\mu,\Phi]~. \end{split}\]

We can calculate the Hamiltonian (or energy) density \(\mathcal{H}\) as the Legendre transform of the Lagrangian density \(\mathcal{L}\), and from it the total energy \(E=\int d^3x ~\mathcal{H}\) of the system, which is by construction gauge invariant (as should be the case for all physically observable quantities). We will be interested in static field configurations, so we can drop all time derivatives \(\partial_0\). It is then convenient to work in the temporal gauge \(A_0=0\), which we can always achieve by a suitable gauge transformation, so that we can drop all time covariant derivatives \(D_0\). In the temporal gauge, the energy of static field configurations is \[\begin{equation} \label{energy_GeorgiGlashow} E = \int d^3 x~\left[ \frac{1}{g_{YM}^2} tr(B_i B_i) +tr((D_i\Phi)(D_i\Phi)) +V(\Phi) \right]~, \end{equation}\] where \(B_i = \frac{1}{2}\epsilon_{ijk}F_{jk}\) are the components of the nonabelian magnetic field \(\boldsymbol{B}\). \(i=1,2,3\) runs over spatial Euclidean indices (which we write up or down since the spatial metric is \(\delta_{ij}\)), and as usual repeated indices are summed over.

The energy is the integral of a sum of squares, and is minimized by setting \[\begin{equation} \label{vacua_GeorgiGlashow} \boldsymbol{B} =0~, \qquad \boldsymbol{D} \Phi =0~, \qquad tr(\Phi^2)=2v^2~. \end{equation}\] The first vector equation tells us that \(F_{ij}=0\), so the vector potential \(\boldsymbol{A}=(A_1,A_2,A_3)\) is ‘pure gauge’: \(A_j = i h (\partial_j h^{-1})=-i (\partial_j h) h^{-1}\) for a function \(h(\boldsymbol{x})\) which takes values in \(SU(2)\). The second vector equation tells us that the adjoint scalar field \(\Phi\) is covariantly constant. The final scalar equation tells us that \(\Phi\) minimizes the scalar potential. By a gauge transformation we can set \(\boldsymbol{A}=0\), then the second equation sets \(\Phi\) to be constant. Letting \(\Phi=\phi^a \sigma_a\), where \((\sigma_a)\) are the Pauli matrices, we find that \[\begin{equation} \label{vacua_GeorgiGlashow2} tr(\Phi^2)=2v^2 \qquad \Leftrightarrow \qquad (\phi^1)^2 + (\phi^2)^2 + (\phi^3)^2 = v^2~, \end{equation}\] so the vacuum manifold is a 2-sphere of radius \(v\): \[\begin{equation} \label{vacuum_manifold_GG} \begin{split} \mathcal{V}&= \{ \Phi = \phi^a \sigma_a \in su(2) ~|~ tr(\Phi^2) = 2v^2\}\\ &= \{ \boldsymbol{\phi}=(\phi^1,\phi^2,\phi^3) \in \mathbb{R}^3~|~ \boldsymbol{\phi}^2 = v^2 \}\cong S^2~. \end{split} \end{equation}\] By a constant gauge transformation, we can take \[\begin{equation} \label{vacuum_z} \Phi = \begin{pmatrix} v & 0\\ 0 & -v \end{pmatrix} = v \sigma_3 \qquad \boldsymbol{\phi }= (0,0,v)~. \end{equation}\] Any choice of vacuum breaks the gauge group \(G=SU(2)\) down to a subgroup \(H=U(1)\) which leaves the vacuum invariant.
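A one-line numerical illustration of \(\eqref{vacua_GeorgiGlashow2}\): since \(tr(\sigma_a\sigma_b)=2\delta_{ab}\), one indeed has \(tr(\Phi^2)=2\,\boldsymbol{\phi}^2\) for \(\Phi=\phi^a\sigma_a\). The sketch below uses a random sample vector \(\boldsymbol{\phi}\) (the seed is arbitrary).

```python
import numpy as np

sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

rng = np.random.default_rng(0)
phi = rng.normal(size=3)                               # a random adjoint vector phi^a
Phi = sum(p * s for p, s in zip(phi, sigma))           # Phi = phi^a sigma_a
print(np.trace(Phi @ Phi).real, 2 * np.dot(phi, phi))  # equal: tr(Phi^2) = 2 |phi|^2
```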

In order for the energy \(\eqref{energy_GeorgiGlashow}\) to be finite, we demand the boundary conditions \[\begin{equation} \label{BC_GeorgiGlashow} \boldsymbol{B} \to 0~, \qquad \boldsymbol{D}\Phi\to 0~, \qquad tr(\Phi^2)\to 2v^2 \qquad \text{as}~~ |\boldsymbol{x}|\to \infty~, \end{equation}\] so the fields must tend to a vacuum at spatial infinity. Note: this can be a different vacuum for each direction. As in the abelian Higgs model, we can use the gauge redundancy to work in a radial gauge, where \(A_r=0\). Then the limits of the fields as \(r\to\infty\) with \((\theta,\varphi)\) fixed exist. In particular, the limit of the adjoint scalar field at spatial infinity defines a map \[\begin{equation} \label{Phi_infty_GG} \begin{tabular}{rccl} $\Phi_\infty$: & $S^2_\infty$ & $\to$ & $\mathcal{V}\cong S^2$\\ & $(\theta,\varphi)$ & $\mapsto$ & $\Phi_{\infty}(\theta,\varphi):=\lim\limits_{r\to\infty} \Phi(r,\theta,\varphi)$~, \end{tabular} \end{equation}\] which is characterized by an integer, the topological degree of the map, which is a generalization of the winding number for maps from \(S^1\) to \(S^1\): 32 \[\begin{equation} \label{degree_S2} \nu = \frac{1}{8\pi v^3} \int_{S^2_\infty} \epsilon_{ijk} \boldsymbol{\phi}_{\infty} \cdot(\partial_j \boldsymbol{\phi}_{\infty} \times \partial_k \boldsymbol{\phi}_{\infty})~d^2 \sigma_i~, \end{equation}\] where \(\boldsymbol{\phi}_\infty=((\phi_\infty)^1,(\phi_\infty)^2,(\phi_\infty)^3)\). Note: the prefactor of \(v^{-3}\) is there because the target (image) of \(\boldsymbol{\phi}_\infty\) is a 2-sphere of radius \(v\).

4.6. Define \[F_{\mu\nu}^{U(1)}:= \frac{1}{2v} tr(\Phi_\infty F_{\mu\nu})\] to be the field strength of the unbroken \(H=U(1)\) subgroup of the gauge group \(G=SU(2)\). Show that the magnetic charge \[\label{magnetic_charge_unbroken_U(1)} m^{U(1)}:=\frac{1}{2\pi} \int_{S^2_\infty} \boldsymbol{B}^{U(1)}\cdot d^2 \vec\sigma\] of this unbroken \(U(1)\) is proportional to the topological degree \(\nu\) of \(\Phi_\infty\), and find the proportionality factor.

As an example, the map \[\begin{equation} \label{identity_hedgehog} \Phi_\infty=v~\hat x\cdot \boldsymbol{\sigma}~, \end{equation}\] where \(\hat x = \boldsymbol{x}/|\boldsymbol{x}|= \boldsymbol{x}/r\) and \(\boldsymbol{\sigma}=(\sigma_1,\sigma_2,\sigma_3)\), has degree \(\nu=1\). This is the identity map from \(S^2\) to \(S^2\), up to an overall constant factor that takes care of the radius of the target sphere. We note incidentally that we can write \(\eqref{identity_hedgehog}\) as \[\Phi_\infty = v e^{-i\alpha} \sigma_3 e^{i\alpha}~\] with \[\alpha = \frac{\theta}{2}(-\sin \varphi ~\sigma_1 +\cos \varphi~ \sigma_2) = \frac{\theta}{2} e^{-i\varphi \sigma_3/2} ~\sigma_2~ e^{i\varphi \sigma_3/2}~.\] So \(\Phi_\infty\) reduces to the constant vacuum with \(\Phi=v \sigma_3\) in \(\eqref{vacuum_z}\), if we perform a gauge transformation with parameter \(g=e^{i\alpha}\). Note however that this gauge transformation is singular at \(\theta=\pi\), the south pole of the 2-sphere, where \(\varphi\) is ill-defined. (The gauge transformation is regular at the north pole \(\theta=0\), thanks to the \(\theta\) prefactor in \(\alpha\). This statement can be checked by switching to local coordinates which are well-defined at either pole.)
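As a numerical illustration (not a derivation) of the claim that the hedgehog map has degree 1, one can evaluate the degree in the equivalent angular form \(\nu = \frac{1}{4\pi}\int \hat{n}\cdot(\partial_\theta \hat{n}\times\partial_\varphi \hat{n})\,d\theta\,d\varphi\) for the unit vector \(\hat{n}=\boldsymbol{\phi}_\infty/v\). The finite-difference step and the quadrature routine below are arbitrary choices.

```python
import numpy as np
from scipy.integrate import dblquad

def n(th, ph):
    # unit hedgehog map: n(theta, phi) = x / |x|
    return np.array([np.sin(th)*np.cos(ph), np.sin(th)*np.sin(ph), np.cos(th)])

def integrand(th, ph, h=1e-5):
    dth = (n(th + h, ph) - n(th - h, ph)) / (2 * h)
    dph = (n(th, ph + h) - n(th, ph - h)) / (2 * h)
    return np.dot(n(th, ph), np.cross(dth, dph)) / (4 * np.pi)

# integrate over phi in [0, 2pi] and theta in [0, pi]
deg, _ = dblquad(integrand, 0, 2*np.pi, lambda ph: 0.0, lambda ph: np.pi)
print(deg)   # ~ 1: the degree of the hedgehog map
```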

We are now ready to introduce the ’t Hooft-Polyakov ‘hedgehog’ ansatz, so called because the vector field \(\boldsymbol{\phi}\) points in the radial direction and looks a bit like a hedgehog. We assume that the adjoint scalar and the gauge field (written as a matrix-valued differential \(A=A_\mu dx^\mu\)) take the form \[\begin{equation} \label{tHooft-Polyakov_ansatz} \begin{split} \Phi &= \frac{\boldsymbol{x} \cdot \boldsymbol{\sigma}}{r^2} H(vr)\\ A &= \sigma_a \epsilon_{aij} \frac{x_i dx_j}{r^2} \left[1-K(vr)\right]~. \end{split} \end{equation}\] Note that the dependence on the angular polar coordinates in space \(\mathbb{R}^3\) is correlated with the behaviour in the internal space in which the fields take values. We also assume the asymptotics \[\label{tH-P asymptotics_infty} \underline{\xi\equiv vr \to \infty:} \qquad H(\xi)-\xi \to 0~, \qquad K(\xi)\to 0~\] at spatial infinity, to satisfy the boundary conditions \(\eqref{BC_GeorgiGlashow}\) which are needed for the energy to be finite, 33 and \[\label{tH-P asymptotics_0} \underline{\xi\equiv vr \to 0:} \qquad H(\xi)= O(\xi)~, \qquad K(\xi)-1= O(\xi)~\] to ensure regularity (smoothness) at the centre of the monopole, and finiteness of the energy at short distances from the centre.

Note that the adjoint scalar field approaches \(\eqref{identity_hedgehog}\) at spatial infinity, which has topological degree 1. The magnetic field also approaches an abelian magnetic monopole for the unbroken gauge group \(H=U(1)\) at spatial infinity. Indeed, if one applies the above singular gauge transformation, the gauge field \(A_\mu^{U(1)}\) looks precisely like a Dirac monopole in the northern patch (or for \(\theta\neq \pi\)). One can find an analogous singular gauge transformation to obtain the Dirac monopole in the southern patch (or for \(\theta\neq 0\)).

One can substitute the ’t Hooft-Polyakov ansatz \(\eqref{tHooft-Polyakov_ansatz}\) in the equations of motion, to find a system of two coupled ODE’s for the functions \(H(\xi)\) and \(K(\xi)\). Together with the boundary conditions \(\eqref{tH-P asymptotics_infty}\)-\(\eqref{tH-P asymptotics_0}\), this defines a well-posed boundary value problem which can be solved numerically. This shows the existence of a finite energy static solution which describes a magnetically charged object of finite size.

We can use what is called a Bogomol’nyi-type argument to find a lower bound for the energy in each topological sector, namely for field configurations with given topological degree for the adjoint scalar, or equivalently magnetic charge for the unbroken \(U(1)\) gauge field. This is called the Bogomol’nyi-Prasad-Sommerfield (or BPS) bound . The idea is to write \[\begin{equation} \label{Bogo_bound_monopoles} \begin{split} E &= \int d^3 x~ \left[ \frac{1}{g^2_{YM}} tr(\boldsymbol{B}^2) + tr((\boldsymbol{D} \Phi)^2) + V(\Phi) \right]\\ &\ge \int d^3x~ tr\left(\left(\frac{1}{g_{YM}}\boldsymbol{B} \mp \boldsymbol{D} \Phi \right)^2 \pm \frac{2}{g_{YM}} \boldsymbol{B} \cdot \boldsymbol{D}\Phi \right)\\ &\ge \pm \frac{2}{g_{YM}} \int d^3x~ tr(\boldsymbol{B} \cdot \boldsymbol{D}\Phi) = \pm \frac{2}{g_{YM}} \int d^3x~ tr(\boldsymbol{D} \cdot (\Phi \boldsymbol{B}))\\ &= \pm \frac{2}{g_{YM}} \int d^3x~ \nabla \cdot tr(\Phi \boldsymbol{B}) = \pm \frac{2}{g_{YM}} \int_{S^2_\infty} tr(\Phi_\infty \boldsymbol{B})\cdot d^2 \boldsymbol{\sigma }\\ &= \pm \frac{4v}{g_{YM}} \int_{S^2_\infty} \boldsymbol{B}^{U(1)}\cdot d^2 \boldsymbol{\sigma }= \pm \frac{8\pi v}{g_{YM}} m^{U(1)} ~. \end{split} \end{equation}\] Going from the first to the second line, we dropped the contribution of the (non-negative) potential energy and completed a square. We then dropped the square to get to the third line, and then used the Bianchi identity \(\boldsymbol{D} \cdot \boldsymbol{B}=0\). Going to the fourth line we took the gauge covariant divergence outside the trace, and replaced it by a standard divergence since the trace is gauge invariant. Then we used Gauss’ theorem (aka divergence theorem) to rewrite the lower bound as a surface integral, which in the last line we related to the magnetic charge of the unbroken \(H=U(1)\) subgroup of the gauge group, defined in \(\eqref{magnetic_charge_unbroken_U(1)}\). We have deduced the BPS bound \[\begin{equation} \label{Bogo_bound_monopoles_2} E \ge \frac{8\pi v}{g_{YM}} |m^{U(1)}|~, \end{equation}\] which is a lower bound for the energy in terms of the magnetic charge.

The bound is saturated, that is \(E=\frac{8\pi v}{g_{YM}} |m^{U(1)}|\), if and only if \[\lambda \to 0 \quad \text{keeping~}v~ \text{fixed}~,\] which is called the BPS limit, and the fields satisfy the 1st order Bogomol’nyi equation \[\begin{equation} \label{BPS_equation} \boldsymbol{B} = \mathrm{sign}(m^{U(1)}) g_{YM} \boldsymbol{D} \Phi. \end{equation}\] Solutions to the Bogomol’nyi equations for monopoles come in infinite families, parametrized by continuous parameters also known as moduli. For \(G=SU(2)\), the moduli space of \(n\) BPS monopoles (solutions of the Bogomol’nyi equations with total magnetic charge \(m^{U(1)}=n>0\)) has \(4n\) real dimensions.

5 Bundles, connections, curvature and sections*

This is a bonus chapter that sketches some of the differential geometry that underlies gauge theories. We won’t have time in the lectures for this advanced material, which is best learned in a different module. I include it here for completeness for students who would like to learn more. This material will not be examined.

So far we have learned how to formulate gauge theories in terms of gauge invariant actions for the gauge field and (potentially) charged fields. Our goal in this chapter will be to understand how to describe gauge transformations, gauge fields, their field strengths, and charged fields geometrically. We will learn about fibre bundles, which are a consistent way of adding extra structure on top of a differentiable manifold.

I should warn you that the general formal definition is quite abstract, but I will try to build towards it slowly by successive generalizations. At the beginning I will give you a flavour of the abstract “intrinsic” approach, which defines concepts without making reference to a coordinate system. This can be hard to grasp, and this is not a course on differential geometry, so we will spend most of our time working in the “extrinsic” approach, which uses local coordinates. The extrinsic approach has the disadvantage that one needs to make sure that no definitions depend on the choice of coordinates used, but the advantage of being more explicit and accessible to beginners. This will be more than sufficient for our purposes.

This chapter is largely based on lectures 2 and 5 in Ooguri’s lecture course on Mathematics for Theoretical Physicists . Other references which cover the same material in more detail are .

5.1 The tangent bundle

Figure 5.1: The basic data of a differentiable manifold.

Recall the definition of a differentiable manifold \(M\) (of dimension \(n\)) from the first term, see figure 5.1. It consists of a countable atlas \(\{(U_i,\varphi_i)_{i \in I}\}\) of coordinate charts (or patches) \((U_i,\varphi_i)\), where \(U_i\) is an open subset of \(M\), \(\varphi_i: U_i \to \mathbb{R}^n\) is an invertible map from \(U_i\) to an open subset of \(\mathbb{R}^n\), and \(M=\bigcup_{i\in I} U_i\). Given a point \(p \in M\), its image \(\varphi_i(p)=(x^1_{(i)},\dots,x^n_{(i)})\) under \(\varphi_i\) gives the coordinates of the point \(p\) in the patch \(U_i\). We refer to these as local coordinates. If two patches \(U_i\) and \(U_j\) overlap on \(U_i \cap U_j \neq \emptyset\), then we can use two sets of coordinates. For any pair of overlapping patches, we require the transition functions \[\varphi_j \circ \varphi_i^{-1}: ~~ \varphi_i(U_i \cap U_j ) \to \varphi_j(U_i \cap U_j ) ~,\] which are invertible, to be smooth. This makes \(M\) a differentiable manifold.

Next we give the intrinsic definition of a differentiable (real) function. A function \[\begin{equation} \label{differentiable_fn} \begin{tabular}{rccl} $\hat f$: & $M$ & $\to$ & $\mathbb{R}$\\ & $p$ & $\mapsto$ & $\hat f(p)$ \end{tabular} \end{equation}\] is differentiable (or smooth) if for all charts \((U_i,\varphi_i)\), its extrinsic expression in local coordinates \[\begin{equation} \label{differentiable_fn_coords} \begin{tabular}{rccl} $f_{(i)}:= \hat f \circ \varphi_i^{-1} $: & $\varphi_i(U_i)$ & $\to$ & $\mathbb{R}$\\ & $x_{(i)}=(x^1_{(i)},\dots,x^n_{(i)})$ & $\mapsto$ & $f_{(i)}(x_{(i)})$ \end{tabular} \end{equation}\] is a differentiable (smooth) function of \(n\) real variables. The requirement that the transition functions \(\varphi_j \circ \varphi_i^{-1}\) of a differentiable manifold are smooth ensures that if \(f\) is smooth in one set of local coordinates, it is smooth in all sets of local coordinates. We denote the set of smooth functions on \(M\) by \(C^\infty(M)\).

In the following, to avoid cluttering the notation, we will drop the subscripts which label the different patches, unless they are strictly necessary. Note that we have used hats to distinguish the intrinsically defined value \(\hat f(p)\) of the function at a point in the manifold from its extrinsic description \(f(x)=(\hat f \circ \varphi^{-1})(x)\) in terms of local coordinates \(x=\varphi(p)\) in a coordinate chart \((U,\varphi)\).

Last term you defined tangent vectors to a curve \(C\) at a point \(p\) in the manifold \(M\). You saw that the set of tangent vectors to all curves passing through the point \(p\) is an \(n\)-dimensional real vector space, which is the tangent space \(T_p M\) of the manifold \(M\) at point \(p\). Next, we would like to extend this construction from a single point \(p\) to the whole manifold \(M\). Informally, we would like to define \[\begin{equation} \label{tangent_bundle_as_union} TM = \bigcup_{p\in M} T_p M~, \end{equation}\] a “bundle” of the tangent spaces at all the points in the manifold. This is called the tangent bundle \(TM\) of \(M\). The question is: how do we define this object properly? To gain intuition, it is useful to take an equivalent but complementary view of tangent vectors. (Below we will see how this is related to the definitions that you saw last term.)

We define a tangent vector field \(v\) on \(M\) as a map \[\begin{equation} \label{tangent_vector_field} \begin{tabular}{rccl} $\hat v$: & $C^\infty(M)$ & $\to$ & $C^\infty(M)$\\ & $\hat f$ & $\mapsto$ & $\hat v(\hat f)$ \end{tabular} \end{equation}\] which obeys the following two properties:

  1. linearity: \(\forall a_1, a_2 \in \mathbb{R}\), \(\forall \hat f_1, \hat f_2 \in C^\infty(M)\), \[\begin{equation} \label{linearity} \hat v(a_1 \hat f_1 + a_2 \hat f_2) = a_1 \hat v(\hat f_1)+a_2 \hat v(\hat f_2) \end{equation}\]

  2. Leibniz rule: \(\forall \hat f, \hat g \in C^\infty(M)\), \[\begin{equation} \label{Leibniz} \hat v(\hat f \hat g) = \hat v(\hat f)\hat g + \hat f \hat v(\hat g)~. \end{equation}\]

Tangent vector fields form a vector space; more about this later.

\(\ast\) EXERCISE:


Let \(\hat v, \hat w\) be tangent vector fields.

  1. Show that \(\hat w \circ \hat v\) is not a tangent vector field.

  2. Show that \([\hat w, \hat v]=\hat w \circ \hat v - \hat v \circ \hat w\) is a tangent vector field.

Given a tangent vector field \(\hat v\) on \(M\) and a point \(p\in M\), we can (re-)define a tangent vector \(\hat v_p \in T_p M\) at a point \(p\) by evaluating everything at point \(p\): 34 \[\begin{equation} \label{tangent_vector} \begin{tabular}{rccl} $\hat v_p$: & $C^\infty(M)$ & $\to$ & $\mathbb{R}$\\ & $\hat f$ & $\mapsto$ & $\hat v_p(\hat f) :=\left(\hat v(\hat f)\right)(p)$ \end{tabular} \end{equation}\]

Figure 5.2: The data needed to define a tangent vector to a curve, applied to a function.

See figure 5.2 for a depiction of the relevant data.

You may ask: how is this definition of tangent vectors related to the definition in terms of tangents to a curve, that you encountered in the first term? Given a smooth curve through \(p\), which is defined by a map from an interval \(I\) to the manifold \(M\), \[\begin{equation} \label{curve} \begin{tabular}{rccl} $c$: & $I \subseteq \mathbb{R}$ & $\to$ & $M$\\ & $\tau$ & $\mapsto$ & $c(\tau)$ \end{tabular} \end{equation}\] with \(c(0)=p\), we can define a tangent vector \(\hat v_p\) to the curve \(C=c(I)\) by \[\begin{equation} \label{tangent_to_curve} \hat v_p(\hat f) = \frac{d}{d\tau} \hat f(c(\tau))\Big|_{\tau=0}~, \end{equation}\] which is defined intrinsically for all smooth functions \(\hat f \in C^\infty(M)\). See figure 5.2. To understand what is going on, let’s express this in local coordinates \(x^\mu\) in a chart \((U,\varphi)\), where the curve is parametrized by \[(\varphi\circ c)(\tau) \equiv x(\tau) = (x^1(\tau), \dots, x^n(\tau))~,\] and the function \(\hat f(p)\) is represented as \(f(x)=(\hat f \circ \varphi^{-1})(x)\): \[\begin{equation} \label{tangent_to_curve_2} \begin{split} \hat v_p(\hat f) &= \frac{d}{d\tau} (\hat f\circ c)(\tau)\Big|_{\tau=0}= \frac{d}{d\tau} (\hat f\circ \varphi^{-1} \circ \varphi\circ c)(\tau)\Big|_{\tau=0}~\\ &= \frac{d}{d\tau} f(x(\tau))\big|_{\tau=0}= \dot x^\mu(0) \frac{\partial f(x)}{\partial x^\mu}\Big|_{x=x(0)=\varphi(p)}~, \end{split} \end{equation}\] where we used basic properties of the composition of functions, as well as the chain rule in the last equality (dots denote derivatives with respect to \(\tau\)). We recognize the result as the directional derivative of the function \(f\) along the tangent to the curve at the point \(p\), which has coordinates \(x=x(0)\).
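A tiny concrete instance of \(\eqref{tangent_to_curve_2}\), with a made-up curve and function in local coordinates on \(\mathbb{R}^2\): the intrinsic definition and the coordinate formula give the same number.

```python
import sympy as sp

tau = sp.symbols('tau', real=True)
x1, x2 = sp.symbols('x1 x2', real=True)

c = (sp.cos(tau), sp.sin(tau))        # a curve through p = (1, 0) at tau = 0 (made-up example)
f = x1**2 + 3*x2                      # extrinsic expression of a made-up function f

# intrinsic definition: v_p(f) = d/dtau f(c(tau)) at tau = 0
lhs = sp.diff(f.subs({x1: c[0], x2: c[1]}), tau).subs(tau, 0)

# coordinate formula: xdot^mu(0) * (df/dx^mu) evaluated at x(0) = (1, 0)
xdot = [sp.diff(ci, tau).subs(tau, 0) for ci in c]
rhs = sum(xd * sp.diff(f, xv) for xd, xv in zip(xdot, (x1, x2))).subs({x1: 1, x2: 0})
print(lhs, rhs)                       # both equal 3
```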

REMARKS:

  1. When you described the tangent vector to a curve at a point \(p\) using local coordinates in the first term, \(\dot x^\mu(0)\) were the components of the tangent vector.

  2. To construct a basis of the tangent space \(T_p M\), you used curves \(C_a\) which fixed all coordinates \(x^\mu\) with \(\mu \neq a\) and varied only \(x^a(\tau)=x^a(0) + \tau\). The components of the tangent vector \(e_a\) to such a curve are then \(\dot x^{\mu}(0) = \delta^\mu_a\), and we have \[e_a(\hat f) = \frac{\partial}{\partial x^a} f(x)\bigg|_{x=\varphi(p)} = \frac{\partial}{\partial x^a} \hat f(\varphi^{-1}(x))\bigg|_{x=\varphi(p)}~,\] or for short \[\begin{equation} \label{basis_tangent} e_a = (\partial_a)_p~, \end{equation}\] where \((\partial_a)_p\) is \(\frac{\partial}{\partial x^a}\) when we work in local coordinates \(x=\varphi(p)\).

In summary, we can write any tangent vector \(\hat v_p \in T_p M\) intrinsically as \[\begin{equation} \label{tangent_vector_intrins} \hat v_p = \hat v^a (\partial_a)_p~, \end{equation}\] or extrinsically (in local coordinates) as \[\begin{equation} \label{tangent_vector_extrins} v = v^a \frac{\partial}{\partial x^a}~, \end{equation}\] where the components \(\hat v^a = v^a\) are \(n\) real numbers.

Now let’s consider a collection of tangent spaces over every point on \(M\): the tangent bundle \[TM = \bigcup_{p \in M} T_p M~.\] Using the isomorphism \(T_p M \cong \mathbb{R}^n\) for all \(p \in M\), we can view the tangent bundle locally as \(U_i \times \mathbb{R}^n\), and we see that \(TM\) is naturally a manifold of dimension \(2n\). For each coordinate chart \((U_i,\varphi_i)\) on \(M\), we define coordinates \((x^\mu, v^\nu)\) on \(\bigcup_{p \in U_i} T_p M\), where \((x^\mu)\) are coordinates on \(U_i\), and we parametrize a tangent vector as \[\begin{equation} \label{tangent_vector_TM} v = v^\nu \frac{\partial}{\partial x^\nu}~. \end{equation}\] We call \(M\) the base of the tangent bundle, and \(\mathbb{R}^n \cong T_p M\) the fibre of the tangent bundle. 35

A (smooth) tangent vector field is then (in local coordinates) \[\begin{equation} \label{tangent_vector_field} v = v^\mu(x) \frac{\partial}{\partial x^\mu}~, \end{equation}\] with components \(v^\mu(x)\) which vary smoothly as \(p\) varies over \(M\).

\(\ast\) EXERCISE:


Check that the local description \(\eqref{tangent_vector_field}\) of a tangent vector field maps smooth functions to smooth functions, is linear, and obeys the Leibniz rule.

We say that a (smooth) tangent vector field \(\eqref{tangent_vector_field}\) is a (smooth) section of the tangent bundle \(TM\), and write \(v \in \Gamma(TM)\). The reason for this terminology is as follows (see figure 5.3): a tangent vector field picks, for every point \(p\) of the base \(M\), exactly one element \(v_p\) of the fibre \(T_p M\) sitting over \(p\), and so its graph cuts a ‘slice’, or section, through the total space of the bundle.

Figure 5.3: The tangent bundle and a tangent vector field.

What we have seen so far is a local description of the tangent bundle \(TM\) in a coordinate patch. When we change patch from \(U\) to \(\tilde U\) (on their overlap \(U \cap \tilde U\)) in the base \(M\), the coordinates on \(M\) change as 36 \[\begin{equation} \label{coord_change} x^\mu \mapsto \tilde x^\mu = \tilde x^\mu(x)~. \end{equation}\] In addition, we need to specify how the fibre coordinates change. We require the tangent space coordinates to change like \[\begin{equation} \label{tangent_bundle_trans_fn} v^\mu \mapsto \tilde v^\mu = \frac{\partial\tilde x^\mu}{\partial x^\nu} v^\nu~, \end{equation}\] so that \[\begin{equation} \label{tangent_vector_unchanged} v = v^\mu \frac{\partial}{\partial x^\mu} = \tilde v^\mu \frac{\partial}{\partial\tilde x^\mu} \end{equation}\] is independent of the choice of coordinates.

Using the chain rule, \[\frac{\partial}{\partial x^\mu} = \frac{\partial\tilde x^\nu}{\partial x^\mu} \frac{\partial}{\partial\tilde x^\nu}\quad \longrightarrow \quad v^\mu \frac{\partial}{\partial x^\mu} = v^\mu \frac{\partial\tilde x^\nu}{\partial x^\mu} \frac{\partial}{\partial\tilde x^\nu} = \tilde v^\nu \frac{\partial}{\partial\tilde x^\nu}~.\]
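A small symbolic example of the transformation rule \(\eqref{tangent_bundle_trans_fn}\): take \(M=\mathbb{R}^2\setminus\{0\}\) with Cartesian coordinates \((x,y)\), take polar coordinates \((r,\theta)\) as the ‘tilded’ coordinates, and consider the sample vector field \(v = -y\,\partial_x + x\,\partial_y\). All of these choices are made up for illustration.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True, positive=True)
r = sp.sqrt(x**2 + y**2)          # tilde-coordinates (r, theta) expressed through (x, y)
th = sp.atan2(y, x)

vx, vy = -y, x                    # Cartesian components of the sample vector field

# v~^mu = (d x~^mu / d x^nu) v^nu
vr  = sp.simplify(sp.diff(r, x) * vx + sp.diff(r, y) * vy)
vth = sp.simplify(sp.diff(th, x) * vx + sp.diff(th, y) * vy)
print(vr, vth)                    # 0 and 1: the same vector field is v = d/dtheta in polar coordinates
```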

Now recall that every vector space \(V\) has a dual vector space \(V^*\), which is the space of linear functionals on \(V\). Given a basis \(e_a\) of \(V\), we can choose a basis \(e^{*a}\) of the dual space \(V^*\) by requiring that \(e^{*a}(e_b) = \delta^a_b\). Then given \(v=v^a e_a \in V\) and \(w=w_a e^{*a} \in V^*\), we have \(w(v)=w_a v^a\). We can apply these ideas to the tangent space \(T_p M\), and define its dual vector space, the cotangent space \(T_p^* M\). An element \(\omega\) of the cotangent space is a linear functional on the tangent space, \[\begin{equation} \label{cotangent_vector} \begin{tabular}{rccl} $\omega$: & $T_pM$ & $\to$ & $\mathbb{R}$\\ & $v$ & $\mapsto$ & $\omega(v)$ \end{tabular} \end{equation}\] such that for all coefficients \(a_1, a_2 \in \mathbb{R}\) and for all tangent vectors \(v_1, v_2 \in T_p M\), \[\begin{equation} \label{linearity_cotangent} \omega(a_1 v_1 + a_2 v_2) = a_1 \omega(v_1) + a_2 \omega(v_2)~. \end{equation}\]

The dual basis to the basis of partial derivatives \(\left\{ \frac{\partial}{\partial x^\mu}\right\}\) for the tangent space \(T_p M\) is the basis of differentials \(\{ dx^\mu\}\) for the cotangent space \(T_p^*M\), where we require \[\label{dual_bases_TT*} dx^\mu \left( \frac{\partial}{\partial x^\nu}\right) = \delta^\mu_\nu~.\] So we can write any cotangent vector \(\omega \in T_p^*M\) as \[\begin{equation} \label{cotangent_vector} \omega = \omega_\mu dx^\mu~. \end{equation}\] Under a change of coordinates \(\eqref{coord_change}\) on \(M\), we will require that the cotangent space coordinates transform as \[\begin{equation} \label{cotangent_coord_change} \omega_\mu \mapsto \tilde \omega_\mu = \frac{\partial x^\nu}{\partial\tilde x^\mu} \omega_\nu~, \end{equation}\] so that \[\begin{equation} \label{cotangent_vector_unchanged} \omega = \omega_\mu dx^\mu = \tilde \omega_\mu d\tilde x^\mu \end{equation}\] is independent of the choice of coordinates.

\(\ast\) EXERCISE:


  1. Use the definition \(df(x)= \frac{\partial f(x)}{\partial x^\mu}dx^\mu\) of the differential of a function to show that under a coordinate change \(\eqref{coord_change}\) \[\begin{equation} \label{differential_coord_change} dx^\mu \mapsto d\tilde x^\mu = \frac{\partial\tilde x^\mu}{\partial x^\nu} dx^\nu \end{equation}\] and therefore \[\omega = \omega_\mu dx^\mu \mapsto \tilde \omega = \tilde\omega_\mu d\tilde x^\mu = \omega_\nu dx^\nu = \omega~.\]

  2. Let \(v = v^\mu \frac{\partial}{\partial x^\mu}\in T_pM\) and \(\omega = \omega_\mu d x^\mu\in T_p^*M\). Show that \[\omega(v)=\omega_\mu v^\mu\] and that it is independent of the choice of coordinates: \[\omega_\mu v^\mu = \tilde\omega_\mu \tilde v^\mu~.\]

With all these data we can construct the cotangent bundle \[\begin{equation} \label{cotangent_bundle} T^*M = \bigcup_{p \in M} T_p^* M \end{equation}\] as a collection of cotangent spaces over every point on \(M\). For each coordinate chart \((U_i,\varphi_i)\) on \(M\), we require the cotangent bundle to locally look like \(T^*U_i = \bigcup_{p \in U_i} T_p^* M \cong U_i \times \mathbb{R}^n\), with coordinates \((x^\mu, \omega_\nu)\) for the base and the fibre respectively. Under a change of coordinates \(\eqref{coord_change}\) in the base \(M\), the fibre coordinates change as in \(\eqref{cotangent_coord_change}\), so that \(\omega=\omega_\mu dx^\mu\) is coordinate independent.

A (smooth) cotangent vector field is, in local coordinates, \[\begin{equation} \label{cotangent_vector_field} \omega = \omega_\mu(x) dx^\mu~, \end{equation}\] where \(\omega_\mu(x)\) are smooth functions. It is a (smooth) section of the cotangent bundle \(T^*M\), and we write \(\omega \in \Gamma(T^*M)\). See figure 5.4.

Figure 5.4: The cotangent bundle and a cotangent vector field.

REMARKS:

  1. In Lagrangian mechanics, the generalised coordinates \(q^i\) and the generalised velocities \(v^j\) are coordinates on the tangent bundle \(TM\) of the configuration space \(M\). The generalised coordinates \(q^i\) are coordinates on the base \(M\), and the generalised velocities \(v^j\) are coordinates on the fibre \(T_p M\). Under time evolution, the trajectory of the generalized coordinates traces a curve \((q^i(t))\) in the configuration space \(M\), while the generalised velocities \((v^j(t))=(\dot q^j(t))\) are the components of the tangent vector \(v=v^j(t) \frac{\partial}{\partial q^j}\) to the trajectory.

  2. In Hamiltonian mechanics, the generalised coordinates \(q^i\) and the generalised momenta \(p_j\) are coordinates on the cotangent bundle \(T^*M\) of the configuration space \(M\), where we identify \(p_j = \frac{\partial L}{\partial v^j}= \frac{\partial L}{\partial\dot q^j}\). Now \(\theta=p_j(t) d q^j\) is a cotangent vector. The relation between the Lagrangian and the Hamiltonian can be written as \[H = \dot q^i \frac{\partial L}{\partial\dot q^i} - L = \theta(v) - L.\]
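
The identification \(p_j=\frac{\partial L}{\partial\dot q^j}\) and the relation \(H=\theta(v)-L\) can be checked symbolically for a simple one-dimensional system. Below is a minimal sketch using sympy for the example Lagrangian \(L=\tfrac{1}{2}m\dot q^2 - V(q)\) (the Lagrangian and the variable names are my choice of illustration, not taken from these notes):

```python
import sympy as sp

# Minimal sketch: Legendre transform L(q, v) -> H(q, p) for L = (1/2) m v^2 - V(q)
m, q, v, p = sp.symbols('m q v p')
V = sp.Function('V')

L = sp.Rational(1, 2) * m * v**2 - V(q)       # Lagrangian on TM, with coordinates (q, v)
p_of_v = sp.diff(L, v)                        # generalised momentum p = dL/dv = m*v
v_of_p = sp.solve(sp.Eq(p, p_of_v), v)[0]     # invert to express v in terms of p: v = p/m

# H = theta(v) - L = p*v - L, written in the coordinates (q, p) on T*M
H = sp.simplify((p * v - L).subs(v, v_of_p))
print(H)                                      # -> p**2/(2*m) + V(q)
```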

5.2 Fibre bundles

We can generalise the previous construction by replacing the tangent space \(T_pM\) or cotangent space \(T_p^*M\) by a more general fibre.

The simplest generalisation is the notion of a vector bundle \(E\), which consists of a base \(M=\bigcup_i U_i\) (of dimension \(\dim M =n\)) and a fibre \(F\), which is a fixed vector space \(V\) (of dimension \(\dim V =m\)) over every point in \(M\). Locally, the vector bundle \(E\) looks like \(U_i \times V\), with coordinates \((x,v)\).

Mathematically, a differentiable manifold \(E\) is called a (smooth) vector bundle if:

  1. There exists a projection map \[\begin{equation} \label{proj_map_VB} \pi: \quad E \to M \end{equation}\] such that \[\begin{equation} \label{proj_map_VB_2} \forall p \in M \qquad \pi^{-1}(p) \cong V~, \end{equation}\] where \(V\) is a vector space.

  2. There exist atlases of \(E\) and of \(M\) such that for every chart of \(M\), with coordinate neighbourhood \(U\), there exists a smooth map \[\begin{equation} \label{local_trivialisation} \varphi: \quad \pi^{-1}(U) \to U \times V~, \end{equation}\] which is called a local trivialisation of the vector bundle \(E\) over \(M\).

Part 1 is a way of saying that the base \(M\) is part of the total space \(E\), and that for each point in \(M\) we have a vector space \(V\). Part 2 means that we can use local coordinates \((x,v)\) for \(E\), where \(x\) is a local coordinate for a point \(p\) in the base \(M\), and \(v\) is a local coordinate of the fibre, the vector space \(\pi^{-1}(p)\) associated to the point \(p\). The structure of a vector bundle is summarized in Figure 5.5.

Figure 5.5: Schematic depiction of a vector bundle.

(To be precise, the vector bundle is the collection \((E,M,\pi,V)\) of the total space \(E\), the base \(M\), which is the image of the total space under the projection map \(\pi\), and the fibre \(V\), which is the preimage of a point in the base under the projection map.)

To fully specify the vector bundle when we work in local coordinates, we need to state what happens to the fibre coordinates when we change coordinates in the base, from a neighbourhood \(U\) with coordinates \(x\) to a neighbourhood \(\tilde U\) with coordinates \(\tilde x\). The change of coordinates in the base and the fibre is \[\begin{equation} \label{coord_change_VB} \begin{split} x^\mu & \mapsto \tilde x^\mu=\tilde x^\mu(x)\\ v &\mapsto \tilde v = t(x) v~, \end{split} \end{equation}\] where the transition function for the fibre is an \(x\)-dependent invertible linear transformation: 37 \[\begin{equation} \label{trans_func_VB} t(x) \in GL(V)\equiv GL(m,\mathbb{R}) \end{equation}\]
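
EXAMPLE (a standard one, included here for illustration): the simplest non-trivial vector bundle is a real line bundle over the circle, \(M=S^1\) and \(V=\mathbb{R}\). Cover \(S^1\) by two overlapping arcs \(U_1\) and \(U_2\), whose overlap \(U_1\cap U_2\) has two connected components. Choosing the transition function \(t=+1\) on both components gives the trivial bundle, the cylinder \(S^1\times \mathbb{R}\); choosing \(t=+1\) on one component and \(t=-1\) on the other gives the Möbius bundle, which is locally of the form \(U_i\times\mathbb{R}\) but is not globally a product. In both cases \(t \in GL(1,\mathbb{R}) = \mathbb{R}\setminus\{0\}\), as required by \(\eqref{trans_func_VB}\).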

Figure 5.6: Triple overlap and coordinates in local trivialisations of a vector bundle.

There is a consistency condition associated to triple overlaps \(U_i \cap U_j \cap U_k\), which ensures the uniqueness of the vector bundle. See Figure 5.6. Let \((x_i,v_i)\) be local coordinates in \(U_i \times V\), and likewise for \(j\) and \(k\), and \(t_{j \leftarrow i}(x_i)\) be the transition function for the fibre when we switch from the \(i\)-th trivialisation to the \(j\)-th trivialisation, and similarly for other transition functions. 38 Then there are two ways of going from the \(i\)-th trivialisation to the \(k\)-th trivialisation: we can either go from \(i\) to \(k\) directly, or go from \(i\) to \(j\) and then from \(j\) to \(k\). The results of the two processes are \[\label{compatibility_triple_overlaps} \begin{aligned} v_k &= t_{k \leftarrow i}(x_i)v_i\\ v_k &= t_{k \leftarrow j}(x_j)v_j= t_{k \leftarrow j}(x_j)t_{j \leftarrow i}(x_i) v_i~. \end{aligned}\] Demanding the compatibility of the two expressions for every vector \(v_i\) leads to the cocycle condition \[\begin{equation} \label{cocycle_cond} t_{k \leftarrow i}(x_i)=t_{k \leftarrow j}(x_j(x_i))t_{j \leftarrow i}(x_i)~. \end{equation}\] It can be proven that there are no further compatibility conditions associated to quadruple or higher overlaps.
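
As a concrete sanity check of \(\eqref{cocycle_cond}\), recall from \(\eqref{tangent_bundle_trans_fn}\) that for the tangent bundle the transition functions are the Jacobian matrices of the coordinate changes. The following minimal sketch verifies the cocycle condition symbolically with sympy, for three overlapping charts on (part of) the punctured plane chosen purely for illustration: polar coordinates \((r,\phi)\), Cartesian coordinates \((x,y)\), and the linear chart \((u,w)=(x+y,\,x-y)\).

```python
import sympy as sp

# Symbolic check of the cocycle condition for the tangent bundle in three
# specific overlapping charts (chosen for illustration only):
#   chart j: polar (r, phi),  chart i: Cartesian (x, y),  chart k: (u, w) = (x + y, x - y)
r, phi = sp.symbols('r phi', positive=True)

x, y = r * sp.cos(phi), r * sp.sin(phi)      # Cartesian coordinates as functions of (r, phi)
u, w = x + y, x - y                          # (u, w) as functions of (r, phi)

def jac(new, old):
    """Tangent-bundle transition function: the Jacobian matrix d(new)/d(old)."""
    return sp.Matrix([[sp.diff(n, o) for o in old] for n in new])

t_i_from_j = jac((x, y), (r, phi))           # polar -> Cartesian
t_k_from_j = jac((u, w), (r, phi))           # polar -> (u, w), computed directly
t_k_from_i = sp.Matrix([[1, 1], [1, -1]])    # Cartesian -> (u, w): a constant Jacobian

# cocycle condition: going polar -> Cartesian -> (u, w) agrees with going directly
diff = (t_k_from_i * t_i_from_j - t_k_from_j).applyfunc(sp.simplify)
assert diff == sp.zeros(2, 2)
print("cocycle condition verified")
```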

\(\ast\) EXERCISE:

Show that the transition functions for the tangent bundle \(TM\) and the cotangent bundle \(T^*M\) obey \(\eqref{trans_func_VB}\) and the cocycle condition \(\eqref{cocycle_cond}\).

REMARKS:

  1. Unlike for \(TM\) and \(T^*M\), the transition functions for the fibre of a general vector bundle are independent of the transition functions for the base.

  2. We could take \(\tilde x^\mu=x^\mu\), namely not change coordinates in the base, but still change coordinates in the fibre. Equations \(\eqref{trans_func_VB}\) and \(\eqref{cocycle_cond}\) must still hold.

Vocabulary: A (usually complex) vector bundle with one-dimensional fibre is called a line bundle.

We can generalise the previous structure further if we allow the fibre \(F\) to be a more general object than a vector space. We will restrict ourselves to considering fibres \(F\) which are differentiable manifolds themselves, even though this assumption can be relaxed further. Vector bundles are included as a special case, since a vector space is a differentiable manifold.

A differentiable manifold \(E\) is called a (smooth) fibre bundle if:

  1. There exists a projection map \[\begin{equation} \label{proj_map_FB} \pi: \quad E \to M \end{equation}\] such that \[\begin{equation} \label{proj_map_FB_2} \forall p \in M \qquad \pi^{-1}(p) \cong F~. \end{equation}\]

  2. There exist atlases of \(E\) and of \(M\) such that for every chart of \(M\), with coordinate neighbourhood \(U\), there exists a smooth map \[\begin{equation} \label{local_trivialisation_FB} \varphi: \quad \pi^{-1}(U) \to U \times F~, \end{equation}\] which is called a local trivialisation of the fibre bundle \(E\) over \(M\).

The interpretation is the same as for vector bundles, with the exception that the fibre need not be a vector space. In a local trivialisation, we can choose local coordinates \((x,y)\), where \(x\) is a local coordinate on the base \(M\) and \(y\) is a local coordinate on the fibre \(F\). When we change coordinates in the base, the fibre coordinates must change appropriately, and the transition functions for the fibre must obey a cocycle condition.

The transition functions for the fibre are elements of a group, which is called the structure group of the fibre bundle \(E\). 39

EXAMPLE: A principal bundle 40 is a fibre bundle where the fibre is a Lie group, \(F=G\), for example \(G=U(1)\), \(G=SU(2)\) or \(G=SO(3)\). Let \((x,h)\) be coordinates in (the image of) a local trivialisation \(U\times G\), and \((\tilde x,\tilde h)\) be coordinates in \(\tilde U\times G\), where \(h, \tilde h\) are elements of the group \(G\). We require the transition function \(t(x)\) for the fibre to be a group element itself, \(t(x)=g(x)\in G\) for all \(x\), which acts by group multiplication on the fibre coordinate: \[(x^\mu, h) ~ \mapsto ~ (\tilde x^\mu(x), \tilde h = g(x) h)~.\] So for a principal \(G\)-bundle the fibre is the Lie group \(G\), and the structure group is also \(G\).
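
A concrete case (a standard example): take \(M=S^2\), covered by two patches \(U_+\) and \(U_-\) which overlap in a band around the equator, and on the overlap choose the transition function \(g(x) = e^{in\phi} \in U(1)\), with \(\phi\) the azimuthal angle. Requiring \(g\) to be single-valued on the overlap forces \(n \in \mathbb{Z}\). For \(n=0\) this is the trivial bundle \(S^2\times U(1)\); for \(n=\pm 1\) the total space is the 3-sphere \(S^3\) (the Hopf bundle). This is essentially the bundle-theoretic version of the two-patch gauge fields \(A^\pm\) used for the magnetic monopole earlier in these notes.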

REMARKS:

  1. This is called a ‘principal’ bundle because of its importance: it controls the structure of infinitely many vector bundles. Indeed, for each representation \(\mathbf{r}\) of \(G\) we have a vector space \(V^{(\mathbf{r})}\) of dimension \(\dim\mathbf{r}\) and an action of the Lie group \(G\) on \(V^{(\mathbf{r})}\) by a representation matrix \(r(g)\). We can then define an associated vector bundle \(E\) with \[\begin{equation} \label{assoc_VB} \begin{tabular}{cc} fibre & $F=V^{(\mathbf{r})}$\\ transition functions & $t(x)=r(g(x))$ \end{tabular} \end{equation}\] so that under a change of coordinates \[\begin{equation} \label{assoc_VB_2} (x,v) \mapsto (\tilde x(x), \tilde v=r(g(x))v)~. \end{equation}\] The ‘associated vector bundle’ is associated to the principal bundle \(P\) and the representation \(\mathbf{r}\). A simple example is given below, after these remarks.

  2. We can start to observe a correspondence between Maths and Physics emerge:

    Maths | Physics
    Principal \(G\)-bundle | Gauge symmetry \(G\)
    (Section of) Associated vector bundle | Charged field

    We will complete this correspondence in the next section.
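
A simple example of an associated vector bundle (the charge below is my choice of illustration): for \(G=U(1)\) and the charge-\(q\) representation \(r(e^{i\alpha}) = e^{iq\alpha}\) acting on \(V^{(\mathbf{r})} = \mathbb{C}\), the transition functions are \(t(x) = e^{iq\alpha(x)}\), and a section transforms as \(\phi(x) \mapsto e^{iq\alpha(x)}\phi(x)\), which is precisely the transformation law of a complex scalar field of charge \(q\).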

5.3 Connection, holonomy and curvature

Figure 5.7: Schematic depiction of a vector field.

Let \(v(x)\) be a smooth section of a vector bundle over \(M\), written in local coordinates. See figure 5.7. Can we define partial derivatives of \(v\), or directional derivatives of \(v\) along a curve \(C\) in \(M\), which in local coordinates is parametrised by \(x^\mu=x^\mu(\tau)\)?

Figure 5.8: Schematic depiction of the notion of parallel transport.

We immediately run into a problem: we cannot subtract vectors defined at infinitesimally close points, as we would do to define a derivative, because these two vectors belong to two different vector spaces. In order to define a notion of directional derivative, we need a way of comparing vectors defined at different points along the curve. Let \(p_0=c(0)\) and \(p=c(\tau)\) be two points along the curve \(C\), with coordinates \(x(0)\) and \(x(\tau)\) respectively. Associated to those two points we have two distinct (though isomorphic) vector spaces, \(V_0 \equiv \pi^{-1}(p_0)=\pi^{-1}(c(0))\) and \(V_\tau \equiv \pi^{-1}(p)=\pi^{-1}(c(\tau))\). We can compare elements of \(V_0\) and elements of \(V_\tau\) by introducing a notion of parallel transport of vectors along the curve \(C\), which is realised by an invertible linear map \[\begin{equation} \label{parallel_transport} \begin{tabular}{rccl} $\Omega(\tau)$: & $V_0$ & $\to$ & $V_\tau$\\ & $v_0$ & $\mapsto$ & $\Omega(\tau)v_0$ \end{tabular} \end{equation}\] which obeys \(\Omega(0)=\mathbbm{1}\). See figure 5.8. Picking a basis of the vector space \(V\), \(\Omega(\tau)\) is a matrix in \(GL(V)\).

More generally, we can compare vectors in the fibres above any two points \(c(\tau)\) and \(c(\tau')\) along the curve \(C\) by using the map \[\begin{equation} \label{parallel_transport_2} \Omega(\tau')\Omega^{-1}(\tau): \quad V_\tau \to V_{\tau'}~. \end{equation}\]

By comparing the values of the vector field at infinitesimally close points, with coordinates \(x^\mu=x^\mu(\tau)\) and \(x^\mu(\tau+\epsilon d\tau)\), we can define the covariant derivative \(\nabla_\mu v\) by \[\begin{equation} \label{covariant_derivative_VB} \nabla v = \nabla_\mu v ~dx^\mu:= \lim_{\epsilon\to 0} \frac{v(x(\tau+\epsilon d\tau))- \Omega(\tau+\epsilon d\tau)\Omega^{-1}(\tau) v(x(\tau))}{\epsilon} \end{equation}\] where \(dx^\mu = \dot x^\mu(\tau)d\tau\) in the parametrization of the curve. The parameter \(\epsilon\) is a book-keeping device which I have introduced to keep track of infinitesimals and to define the limit.

REMARK: The definition of the covariant derivative \(\eqref{covariant_derivative_VB}\) of the vector field \(v\) depends on the local form of parallel transport \(\Omega\) in an infinitesimal neighbourhood of \(\tau\). Letting \[\begin{equation} \label{parallel_transport_infinitesimal} \Omega(\tau+\epsilon d\tau) = \Omega(\tau) - \epsilon \mathcal{A}(x(\tau)) \Omega(\tau) + \mathcal{O}(\epsilon^2)~, \end{equation}\] the equation \(\eqref{covariant_derivative_VB}\) becomes \[\begin{equation} \label{covariant_derivative_VB_connection} \nabla v(x) = dv(x) + \mathcal{A}(x)v(x)~, \end{equation}\] where \(\mathcal{A}(x)\), which is called the connection of the vector bundle, is a matrix-valued cotangent vector field (or equivalently, a matrix-valued differential form): \[\begin{equation} \label{connection} \mathcal{A}(x) = \mathcal{A}_\mu(x) dx^\mu~, \end{equation}\] with \(\mathcal{A}_\mu\) a matrix in \(gl(V)\) for each \(\mu\) and \(x\). 41

In components, the covariant derivative reads \[\begin{equation} \label{covariant_derivative_VB_components} \nabla_\mu v^\alpha(x) = \partial_\mu v^\alpha(x) + \mathcal{A}_\mu(x)^\alpha{}_\beta v^\beta(x)~. \end{equation}\] The connection \(\mathcal{A}\) encodes the infinitesimal version of parallel transport.
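
Conversely, the finite parallel transport can be reconstructed from the connection. Along the curve, \(\eqref{parallel_transport_infinitesimal}\) says that \(\Omega\) solves the ordinary differential equation \[\frac{d\Omega(\tau)}{d\tau} = - \mathcal{A}_\mu(x(\tau))\, \dot x^\mu(\tau)\, \Omega(\tau)~, \qquad \Omega(0)=\mathbbm{1}~,\] whose solution is usually written as a path-ordered exponential, \(\Omega(\tau) = \mathcal{P}\exp\left(-\int_0^\tau \mathcal{A}_\mu(x(s))\,\dot x^\mu(s)\, ds\right)\). The path ordering \(\mathcal{P}\) (later factors to the left) is needed because the matrices \(\mathcal{A}_\mu(x(s))\) at different points of the curve need not commute; for an abelian structure group it can be dropped. This is the geometric counterpart of the Wilson line appearing in the table at the end of this chapter.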

Now consider a change of coordinates in the fibre only: \[\begin{equation} \label{coord_change_fibre} (x,v) ~\mapsto~ (x, \tilde v = t(x) v)~. \end{equation}\] Being a map from \(V_0\) to \(V_\tau\), the parallel transport map \(\Omega(\tau)\) transforms like \[\Omega(\tau) ~\mapsto ~ t(x(\tau)) \Omega(\tau) t(x(0))^{-1}\] under changes of coordinates in the fibres. Using the definition \(\eqref{covariant_derivative_VB}\), it follows that \(\nabla_\mu v\) transforms like \(v\): \[\begin{equation} \label{cov_der_transfo} \nabla_\mu v(x) ~\mapsto ~ t(x) \nabla_\mu v(x)~, \end{equation}\] or in terms of differential operators \[\begin{equation} \label{cov_der_transfo_2} \nabla_\mu ~\mapsto ~ t(x) \nabla_\mu t(x)^{-1}~. \end{equation}\] This requires the connection to transform as follows: \[\begin{equation} \label{coord_change_connection} \mathcal{A}_\mu ~ \mapsto ~ \tilde \mathcal{A}_\mu = t \partial_\mu t^{-1} + t \mathcal{A}_\mu t^{-1}~. \end{equation}\]
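
The transformation law \(\eqref{coord_change_connection}\) can be verified symbolically in the simplest abelian setting: one base coordinate, a one-component fibre, and \(t(x)=e^{i\alpha(x)}\). This setting is my choice of illustration (the matrix case works the same way, it is just messier to write); a minimal sympy sketch:

```python
import sympy as sp

# Abelian sanity check of the transformation law for the connection
# (one base coordinate, one-component fibre, t(x) = exp(i*alpha(x)); my choice of setting)
x = sp.symbols('x', real=True)
alpha, A, v = sp.Function('alpha'), sp.Function('A'), sp.Function('v')

t = sp.exp(sp.I * alpha(x))                           # change of fibre coordinates

nabla_v = sp.diff(v(x), x) + A(x) * v(x)              # covariant derivative, old coordinates

A_tilde = t * sp.diff(1 / t, x) + t * A(x) / t        # transformed connection
v_tilde = t * v(x)                                    # transformed section
nabla_v_tilde = sp.diff(v_tilde, x) + A_tilde * v_tilde

# covariance: the covariant derivative must transform exactly like the section itself
assert sp.simplify(nabla_v_tilde - t * nabla_v) == 0
print("nabla v transforms like v")
```

For \(t=e^{i\alpha}\) the inhomogeneous term is \(t\,\partial_x t^{-1} = -i\,\partial_x \alpha\), which reproduces the familiar abelian gauge transformation of the gauge field once one writes \(\mathcal{A}_\mu = -i A_\mu\) (cf. the remarks below).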

REMARKS:

  1. This construction works for any vector bundle \(E\). In fact, it works for any fibre bundle, with minor adjustments which I leave as an exercise for the interested reader.

  2. When \(E\) is a vector bundle associated to a principal \(G\)-bundle \(P\) and a representation \(\mathbf{r}\), the connection is \(\mathcal{A}_\mu = -i A_\mu^{(\mathbf{r})}\), with \(A_\mu^{(\mathbf{r})}\) the gauge field, acting in the representation \(\mathbf{r}\). For a principal \(G\)-bundle, \(\mathcal{A}_\mu=-i A_\mu\), where \(A_\mu\) is the Lie algebra valued gauge field which transforms as \(A_\mu \mapsto g A_\mu g^{-1}+ i g \partial_\mu g^{-1}\). (The conversion factors of \(i\) are conventional: they are there because physicists have good reasons to like unitary and hermitian operators.)

  3. When \(E\) is the tangent bundle, \(\mathcal{A}_\mu\) is the affine connection which appears in differential geometry and general relativity; the metric-compatible and torsion-free choice is the Levi-Civita connection.

Figure 5.9: Parallel transport and the holonomy \(\Omega_C\) along the loop \(C\).

Now let’s return to the finite version of parallel transport. Consider a closed curve (or loop) \(C\) in the base manifold \(M\), starting and ending at the same point \(p_0\), which is called the base point of the loop. See figure 5.9. We can parallel transport a vector \(v_0 \in \pi^{-1}(p_0)\) along the loop \(C\). When we reach the end of the loop we obtain a new vector \(\Omega_C v_0 \in \pi^{-1}(p_0)\), which is ‘rotated’ by a transformation 42 \[\begin{equation} \label{holonomy} \Omega_C \in GL(V) \end{equation}\] compared to the original vector \(v_0\). This is called the holonomy (of the connection \(\mathcal{A}_\mu\)) along the loop \(C\).

Holonomies along loops starting and ending at the same base point \(p_0\) form a group, called the holonomy group, which is a subgroup of \(GL(V)\). This is a consequence of the definition of parallel transport and of the fact that closed paths themselves form a group, where the composition law is the concatenation of paths. A bit more explicitly: if \(C_2 \circ C_1\) denotes the loop obtained by first traversing \(C_1\) and then \(C_2\) (see figure 5.10), then \(\Omega_{C_2 \circ C_1} = \Omega_{C_2} \Omega_{C_1}\); the constant loop at \(p_0\) gives the identity element, and traversing a loop in the opposite direction gives the inverse holonomy.

Figure 5.10: Concatenation of two paths \(C_1\) and \(C_2\).

REMARK: The holonomy group is generically non-abelian: \[\Omega_{C_1} \Omega_{C_2} \neq \Omega_{C_2} \Omega_{C_1}.\] If we parallel transport first along \(C_1\) and then along \(C_2\), we’ll usually get a different result than if we parallel transported first along \(C_2\) and then along \(C_1\).43

\(\ast\) EXERCISE:

Figure 5.11: Curvature, from the holonomy along the perimeter of an infinitesimal parallelogram.

The curvature \(\mathcal{F}_{\mu\nu}\) is the holonomy along an infinitesimal loop. More precisely, consider an infinitesimal loop \(dC\) which is the perimeter of a parallelogram with vertices \[x^\mu~, \quad x^\mu + \epsilon v^\mu~, \quad x^\mu + \epsilon (v^\mu+w^\mu)~, \quad x^\mu + \epsilon w^\mu~,\] as in figure 5.11. Then \[\begin{equation} \label{curvature_0} \Omega_{dC} = \mathbbm{1}+ \epsilon^2 \mathcal{F}_{\mu\nu}(x) v^\mu w^\nu + \mathcal{O}(\epsilon^3)~, \end{equation}\] where \[\begin{equation} \label{curvature} \mathcal{F}_{\mu\nu} = \partial_\mu \mathcal{A}_\nu - \partial_\nu \mathcal{A}_\mu + [\mathcal{A}_\mu, \mathcal{A}_\nu]~. \end{equation}\]

(Deriving the expression \(\eqref{curvature}\) for \(\mathcal{F}_{\mu\nu}\) from \(\eqref{curvature_0}\) is left as an exercise.)
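
Independently of the derivation, the expansion \(\eqref{curvature_0}\) can be tested numerically: build the holonomy around the small parallelogram as a product of many parallel-transport factors \(\approx \exp(-\mathcal{A}_\mu\,\delta x^\mu)\) and compare the \(\mathcal{O}(\epsilon^2)\) part with \(\eqref{curvature}\). The sketch below does this for a \(2\times 2\) matrix connection on \(\mathbb{R}^2\) of my own choosing. Note that the orientation in which the loop is traversed only affects the sign of the \(\mathcal{O}(\epsilon^2)\) term (the reverse orientation gives the inverse holonomy); the orientation below is the one that produces the plus sign of \(\eqref{curvature_0}\) with these conventions.

```python
import numpy as np

# Numerical test of the holonomy/curvature relation for an example connection
# (my choice) on R^2 with 2x2 matrix fibre:  A_x(x, y) = y*M1,  A_y(x, y) = x*M2.
M1 = np.array([[0.0, 1.0], [0.0, 0.0]])
M2 = np.array([[0.0, 0.0], [1.0, 0.0]])

def A(point):
    """Connection components (A_x, A_y) at a point of the base."""
    x, y = point
    return y * M1, x * M2

def transport(p_start, p_end, steps=20):
    """Parallel transport along the straight segment from p_start to p_end,
    as a product of midpoint-evaluated factors exp(-A_mu dx^mu), expanded to
    second order (sufficient here); later factors act on the left."""
    omega = np.eye(2)
    for k in range(steps):
        a = p_start + (p_end - p_start) * k / steps
        b = p_start + (p_end - p_start) * (k + 1) / steps
        Ax, Ay = A((a + b) / 2)
        dx, dy = b - a
        X = Ax * dx + Ay * dy
        omega = (np.eye(2) - X + X @ X / 2) @ omega
    return omega

eps = 1e-3
p0 = np.array([0.3, 0.7])
v, w = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# traverse the perimeter of the small parallelogram; this orientation gives
# Omega = 1 + eps^2 F_{mu nu} v^mu w^nu + ..., the reverse orientation its inverse
vertices = [p0, p0 + eps * w, p0 + eps * (v + w), p0 + eps * v, p0]
holonomy = np.eye(2)
for start, end in zip(vertices[:-1], vertices[1:]):
    holonomy = transport(start, end) @ holonomy

# analytic curvature F_xy = d_x A_y - d_y A_x + [A_x, A_y] at the base point p0
x0, y0 = p0
F_xy = M2 - M1 + x0 * y0 * (M1 @ M2 - M2 @ M1)

print((holonomy - np.eye(2)) / eps**2)   # numerical estimate of F_xy
print(F_xy)                              # analytic value; they agree up to O(eps)
```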

Under a change of coordinates in the fibre \(\eqref{coord_change_fibre}\), the curvature transforms as follows: \[\mathcal{F}_{\mu\nu} ~ \mapsto ~ t \mathcal{F}_{\mu\nu} t^{-1}~.\]

Figure 5.12: Contractible and non-contractible loops on a 2-torus.

REMARKS:

  1. For a principal \(G\)-bundle, \(\mathcal{F}_{\mu\nu}=-iF_{\mu\nu}\), where \(F_{\mu\nu}\) is the field strength of \(A_\mu\). (Similarly, \(\mathcal{F}_{\mu\nu}=-iF_{\mu\nu}^{(\mathbf{r})}\) for an associated vector bundle.)

  2. Let us assume that the curvature vanishes. This does not mean that the connection vanishes, and it has the surprising consequence that the holonomy can be non-trivial (that is, \(\Omega_C \neq \mathbbm{1}\)) if the loop \(C\) is not contractible to a point. For instance, on a 2-torus \(T^2\) (the surface of a doughnut), see figure 5.12, the holonomy along the loop \(C_1\), which is not contractible, can be non-trivial, whereas the holonomy along the loop \(C_2\), which is continuously contractible to a point, can be shown to be trivial.

Vocabulary: if the curvature vanishes, \(\mathcal{F}_{\mu\nu}=0\), we say that \(\mathcal{A}_\mu\) is a flat connection, or equivalently that the bundle \(E\) is flat. The holonomy of a flat connection is called monodromy.
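
EXAMPLE (a standard one, added for illustration): take the base to be a circle \(S^1\) with angular coordinate \(\varphi\), and the abelian connection \(\mathcal{A} = -ic\, d\varphi\) with \(c\) a real constant (in the notation of the remarks above, a constant gauge field \(A_\varphi = c\)). On a one-dimensional base there is no antisymmetric pair of indices available, so \(\mathcal{F}_{\mu\nu}=0\) automatically and the connection is flat. The monodromy around the (non-contractible) circle is nevertheless \[\Omega_{S^1} = \exp\left(-\oint_{S^1} \mathcal{A}\right) = \exp\left(ic\int_0^{2\pi} d\varphi\right) = e^{2\pi i c}~,\] which is non-trivial unless \(c \in \mathbb{Z}\). This is essentially the Aharonov-Bohm phase picked up by a charged particle encircling a thin solenoid.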

There is a lot more that can be said, but this will be left to future courses. I’ll conclude this chapter by summarizing the correspondence between the geometry of fibre bundles and the formulation of gauge theories in physics:

Geometry | Physics
Principal \(G\)-bundle \(P\) | Gauge symmetry \(G\)
Connection \(\mathcal{A}_\mu\) of \(P\) | Gauge field (or vector potential) \(A_\mu\)
Curvature \(\mathcal{F}_{\mu\nu}\) of \(P\) | Field strength \(F_{\mu\nu}\)
(Section of) Associated vector bundle | Charged field
Covariant derivative \(\nabla_\mu\) | Gauge covariant derivative \(D_\mu\)
Parallel transport | Wilson line
Trace of the holonomy | Wilson loop

  1. There is a deeper meaning which is that these are in the tangent (\(x^\mu\)) and cotangent spaces (\(x_\mu\)) of space-time.↩︎

  2. For \(r_d(g)\), \(g \in SU(2)\), the eigenvalues are real or come in pairs of complex conjugates, so this does not happen.↩︎

  3. One may choose other boundary conditions as well.↩︎

  4. Recall that \(\partial_0=\frac{\partial}{\partial t}\) and \(\partial_i = \frac{\partial}{\partial x^i}=(\boldsymbol{\nabla})_i\), and that the Lorentz transformation \(x^\mu \mapsto x'^\mu =\Lambda^\mu_{\,\,\ \nu} x^\nu\) of the spacetime coordinates implies the following Lorentz transformation of the derivatives: \[\partial_\mu \mapsto \partial'_\mu = \Lambda_\mu{}^\rho \partial_\rho = (\Lambda^{-1})^\rho{}_\mu \partial_\rho~.\]↩︎

  5. Recall that by definition a Lorentz tensor with two indices transforms as \[F^{\mu \nu}(x) \mapsto F'^{\mu \nu}(x) =\Lambda^{\mu}_{\,\,\, \rho} \Lambda^{\nu}_{\,\,\, \sigma} F^{\rho \sigma}(\Lambda^{-1}x)\] under a Lorentz transformation.↩︎

  6. Vocabulary: a tensor with \(n\) indices is called an \(n\)-th rank tensor or equivalently a rank-\(n\) tensor.↩︎

  7. We assume that all fields are smooth functions, hence they have continuous second partial derivatives and Schwarz/Clairaut’s theorem applies. It turns out that this assumption is false for generic field configurations in quantum field theory, but we are only doing classical field theory here, and we’ll leave that story for another day.↩︎

  8. This is known as the Poincaré lemma, which is a generalization of the fact that \(\boldsymbol{\nabla} \times \boldsymbol{F} = 0\) implies \(\boldsymbol{F} = \nabla \phi\) locally (see AMV). An open set \(U\) is called star-shaped (or a star domain) if there exists a point \(p\in U\) such that for any \(q \in U\), the line segment from \(p\) to \(q\) is contained in \(U\).↩︎

  9. The overall minus sign is there to ensure that the Hamiltonian of the electromagnetic field is positive definite. More about this later.↩︎

  10. As I will stress later, this is a misnomer: a gauge ‘symmetry’ is not really a symmetry of a physical system. Rather, it is a redundancy in our description of the system.↩︎

  11. We assume that the fields obey boundary conditions such that this holds, e.g. that they vanish fast enough at infinity, or that they obey (along with the gauge parameter) periodic boundary conditions.↩︎

  12. Recall that mathematically, this is a map from Minkowski space-time \(\mathbb{R}^{1,3}\) to \(\mathbb{C}\), which associates a complex number to each point in space-time:

    \(\phi\): \(\mathbb{R}^{1,3}\) \(\to\) \(\mathbb{C}\)
    \(x^\mu\) \(\mapsto\) \(\phi(x)\)

    Greek indices \(\mu, \nu,\dots\) are space-time indices running from 0 to 3. (Roman indices \(i,j,\dots\) are spatial indices running from 1 to 3. Index 0 is for time.) Unless we explicitly state otherwise, we will typically assume that all fields are smooth.↩︎

  13. Recall that \(|\partial_\mu\phi|^2\) is a short-hand notation for \(\partial_\mu \bar\phi \partial^\mu \phi\), where Einstein summation convention (repeated indices are summed over) is understood. Recalling that we work with Minkowski metric \([\eta_{\mu\nu}]=\mbox{diag}(-1,+1,+1,+1)\), this means that \(|\partial_\mu\phi|^2=-|\partial_0\phi|^2+|\partial_i \phi|^2 = -|\dot\phi|^2+|\nabla \phi|^2\).↩︎

  14. See section 2.6 of if you want to read more about this.↩︎

  15. If you are confused by these statements and manipulations, act with the differential operator on any smooth test function \(f(x)\). If \(X\) and \(Y\) are two differential operators, then \(X=Y\) iff \(Xf=Yf\) for all smooth test functions. Similarly \(X \mapsto Y\) iff \(Xf \mapsto Yf\) for all smooth test functions.↩︎

  16. Naively you might want to impose the simpler identification \(A_\mu^{(1)}= A_\mu^{(2)}\), but taking into account that gauge fields are only defined modulo gauge transformations, one is led to the more general (and mathematically correct) identification in the main text. It took physicists several decades to appreciate this point.↩︎

  17. Foliation is a mathematical term, from ‘folia’, Latin for ‘leaf’. You can look up the technical definition if you are interested. For our purposes, you can take it to mean that field space is a union of disjoint orbits of the gauge group.↩︎

  18. If you are formally minded, you would say that the physical configuration space \(\mathcal{C}\) is the quotient of the field space \(\mathcal{F}\) by the gauge group \(\mathcal{G}\), \[\mathcal{C}=\mathcal{F}/\mathcal{G}~,\] namely the set of equivalence classes of field configurations under the equivalence relation \(\eqref{gauge_equivalence}\).↩︎

  19. Here the right-hand side \(- \partial_\mu A^\mu\) is given and acts as a source in a relativistic Poisson equation for \(\alpha\). Solutions can be found by the method of Green’s functions.↩︎

  20. The Lorenz gauge is due to the Danish physicist Ludvig Lorenz, not to be confused with the more famous Dutch physicist Hendrik Lorentz, who is responsible for the Lorentz transformations which leave the laws of special relativity invariant, as well as for introducing the Lorentz force which acts on relativistic particles moving in a magnetic field. Click on the names of the physicists to see who is who.↩︎

  21. For the gauge group \(G=U(1)\), which we are considering here, the Wilson line and the gauge transformations \(e^{i\alpha(x_i)}\) commute, so we could have written the gauge transformation of the Wilson line simply as \[W_C(x_2,x_1) \mapsto e^{i (\alpha(x_2)-\alpha(x_1))} W_C(x_2,x_1)~.\] I wrote the result like \(\eqref{Wilson_line_gauge_transfo}\) for comparison to the case of a non-abelian gauge group, which we will study later.↩︎

  22. \(F=\frac{1}{2}F_{\mu\nu}dx^\mu\wedge dx^\nu\) is called a differential 2-form. It can be shown that the surface integral of a differential 2-form is independent under reparametrizations of the surface that preserve its orientation, much like line integrals of a differential 1-form \(A= A_\mu dx^\mu\).↩︎

  23. The subscript \(+\) is simply a label, the reason for which will become clear later.↩︎

  24. These spherical coordinates are ill-defined near the poles, but this won’t be important for what follows. One can find a set of well-defined coordinates in the two patches, for example the stereographic coordinates that you encountered in the first term. What matters is that there is no single set of coordinates which cover the whole \(S^2\).↩︎

  25. The vanishing of \(A^+\) at the north pole and of \(A^-\) at the south pole is what ensures that they are well defined there, even if the polar coordinates are ill defined. This can be checked explicitly by switching to stereographic coordinates or to Cartesian coordinates.↩︎

  26. If the representation \(\mathbf{r}\) is irreducible we think of \(\phi\) as a single field; if representation \(\mathbf{r}\) is reducible, namely it is the direct sum of multiple irreps of \(G\), then we think of \(\phi\) as describing multiple charged fields.↩︎

  27. Note: from now on I will ignore the distinction between the Lie group \(G\) and the gauge group \(\mathcal{G}\), which consists of coordinate-dependent elements of \(G\). I will simply use \(G\) for the gauge group.↩︎

  28. Here \(\mathds{1}\) is the identity matrix of the same size as \(A_\mu\), e.g. the \(N\times N\) identity matrix for \(G=SU(N)\). It is customary to omit the identity matrix from the notation and simply write \(\partial_\mu - i A_\mu\), and I’ll follow that convention and only restore \(\mathds{1}\) when it helps to understand what is going on. If you are formally minded and want to be very precise, you might write the covariant derivative as \[D_\mu = \partial_\mu \otimes\mathds{1}- i A_\mu^a(x)\otimes t_a~,\] which acts on the tensor product \(C^\infty(U)\otimes V\) of the vector space \(C^\infty(U)\) of smooth functions defined on a patch \(U\) of space-time and of the finite-dimensional vector space \(V\) associated to the fundamental representation. We won’t need to worry about such a level of abstraction and formality.↩︎

  29. The derivative of a (matrix-valued) function is a (matrix-valued) function.↩︎

  30. It’s constant in the sense that it does not depend on space-time. In quantum field theory, \(g_{YM}\) develops a dependence on the energy scale at which we are probing the system, so ‘constant’ is a misnomer. With that in mind, even though it’s not relevant for this course, I’ll typically call \(g_{YM}\) simply the ‘Yang-Mills coupling’.↩︎

  31. Actually, we are only sure about the gauge algebra, which leaves freedom for the gauge group to be \(U(1) \times SU(2) \times SU(3)/\mathbb{Z}_N\) for any \(N \in \{ 1, 2, 3, 6\}\), see for some discussion.↩︎

  32. Mathematically, this is because \(\Pi_2(S^2)=\mathbb{Z}\).↩︎

  33. It can be shown that the solution approaches the limiting values exponentially fast, much faster than is needed for the integral \(\eqref{energy_GeorgiGlashow}\) to converge.↩︎

  34. A tangent vector \(\hat v_p\) at a point \(p\) is also linear and obeys a form of the Leibniz rule: \[\begin{split} 1. & \quad \hat v_p(a_1 \hat f_1 + a_2 \hat f_2) = a_1 \hat v_p(\hat f_1)+a_2 \hat v_p(\hat f_2) \\ 2. & \quad \hat v_p(\hat f \hat g) = \hat v_p(\hat f)\hat g(p) + \hat f(p) \hat v_p(\hat g)~, \end{split}\] as an immediate consequence of \(\eqref{linearity}\) and \(\eqref{Leibniz}\) for tangent vector fields. One can also define tangent vectors at a point more abstractly using the axioms in this footnote without making reference to tangent vector fields, and then introduce tangent vector fields from this.↩︎

  35. For the attentive reader: in order to equip \(TM\) with the structure of a differentiable manifold, we need to specify smooth transition functions for all its \(2n\) coordinates, not just for the base coordinates. We will do that shortly, in equation \(\eqref{tangent_bundle_trans_fn}\).↩︎

  36. In terms of the invertible maps \(\varphi:~~ U \to \mathbb{R}^n\) and \(\tilde \varphi:~~\tilde U \to \mathbb{R}^n\), the change of coordinates is given by the transition function \(\tilde\varphi \circ \varphi^{-1}\): \[x=\varphi(p) \mapsto \tilde x=\tilde\varphi(p) = (\tilde\varphi \circ \varphi^{-1})(x)~.\]↩︎

  37. This is for a real vector bundle, in which the fibre \(V\) is a real vector space. If \(V\) is a vector space over a field \(\mathbb{F}\), replace \(\mathbb{R}\) by \(\mathbb{F}\).↩︎

  38. Here \(i,j,k\) are labels, not vector indices. In the notation used at the beginning of this chapter, I would have written \((x_{(i)},v_{(i)})\) etc. I am omitting brackets here to avoid cluttering the notation.↩︎

  39. The transition functions for the base \(M\) are also elements of a group, the diffeomorphism group of \(M\).↩︎

  40. This is often misspelt as principle bundle. We shouldn’t change our principles as we change coordinates in the base, therefore principle bundles are not a good idea.↩︎

  41. Formally, \(\mathcal{A}_\mu\) takes value in the Lie algebra of the structure group of the vector bundle.↩︎

  42. Recall that \(\pi^{-1}(p_0)=V\).↩︎

  43. There are exceptions, for instance if the structure group of the fibre bundle is abelian, or if the connection vanishes. Hence the qualifier ‘generically’.↩︎