Notes on General Relativity

Dr A. Donos

1 Introduction

In this course we will discuss Einstein’s classical theory of gravity. The wish list for a successful theory of classical gravity has two ingredients:

1. How does gravity affect probe particles and fields?
2. How does mass (energy) density produce a gravitational field?

1.1 (Why not) Newton’s gravity

The first attempt belongs to Newton, according to whom gravitational effects are the result of a field that fills the whole of space. To answer the first question, Newton proposed the following theoretical assumptions:

• The work

$\displaystyle W=\int_{A}^{B}\vec{F}_{g}\cdot d\vec{x}\,,$ (1)

done by the gravitational force $\vec{F}_{g}$ on a particle that moves between two points is independent of the path, i.e. the force is conservative. In modern language, Stokes’ theorem implies that the vector $\vec{F}_{g}$ is proportional to the gradient of a scalar function $\Phi$, the gravitational potential of the field. The constant of proportionality defines the gravitational mass $m_{g}$, so that $\vec{F}_{g}=-m_{g}\,\vec{\nabla}\Phi$.

• The gravitational mass $m_{g}$ could in general be different from the inertial mass $m$ that enters Newton’s second law,

$\displaystyle m\,\vec{a}=-m_{g}\vec{\nabla}\Phi\,.$ (2)

• It is an experimental fact that all objects accelerate in the same way when inside the same gravitational field and therefore they must have $m=m_{g}$. This is known as the equivalence principle, and it implies

$\displaystyle\vec{a}=-\vec{\nabla}\Phi\,.$ (3)

In Newton’s language, the second item on our wish list asks how $\Phi$ is determined by a gravitational mass density $\rho$, which depends on the spatial coordinates $x^{i}$ as well as on time $t$. The answer is obtained by solving the elliptic (Poisson) equation

$\displaystyle\nabla^{2}\Phi(x^{i},t)=4\pi\,G_{N}\,\rho(x^{i},t)\,,$ (4)

where $G_{N}$ is Newton’s gravitational constant. The solution of equation (4) takes the form

$\displaystyle\Phi(\vec{x},t)=4\pi\,G_{N}\int_{\mathbb{R}^{3}}G(\vec{x},\vec{x}^{\prime})\,\rho(\vec{x}^{\prime},t)\,d^{3}\vec{x}^{\prime},$ (5)

with $G(\vec{x},\vec{x}^{\prime})$ the Green’s function of the Laplacian for the boundary value problem we need to solve. After imposing the reasonable requirement that a distribution $\rho(x^{i},t)$ of compact support should produce a gravitational potential that vanishes at infinity, we find that

$\displaystyle G(\vec{x},\vec{x}^{\prime})=-\frac{1}{4\pi}\,\frac{1}{\left|\vec{x}-\vec{x}^{\prime}\right|}\,.$ (6)

The above discussion does constitute a theory of gravity. However, by looking at equation (5) we see that a small, local change in the distribution $\rho(x^{i},t)$ will instantaneously change the gravitational potential $\Phi(x^{i},t)$ everywhere in space. In other words, the information about the change $\delta\rho(x^{i},t)$ is transmitted everywhere in space without any delay. Loosely speaking, this is the result of the absence of time derivatives from equation (4); their presence would make the gravitational potential behave more like a wave rather than something static and rigid.
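As a quick numerical illustration, the following sketch evaluates the potential of a set of point masses and checks that it satisfies the vacuum form of equation (4) away from the sources. It assumes that combining (5) and (6) with the source normalisation of (4) gives $\Phi=-G_{N}\sum_{i}m_{i}/|\vec{x}-\vec{x}_{i}|$; the units with $G_{N}=1$, the helper names and the sample point are illustrative choices, not part of the notes.

```python
import math

G_N = 1.0  # Newton's constant in illustrative units (assumption)

def distance(a, b):
    """Euclidean distance between two points of R^3."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def potential(x, masses):
    """Phi(x) = -G_N * sum_i m_i / |x - x_i| for point masses [(m_i, x_i), ...]."""
    return -G_N * sum(m / distance(x, xi) for m, xi in masses)

def laplacian(f, x, h=1e-3):
    """Second-order central finite-difference Laplacian of f at the point x."""
    total = 0.0
    for k in range(3):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        total += (f(xp) - 2.0 * f(x) + f(xm)) / h ** 2
    return total

masses = [(1.0, (0.0, 0.0, 0.0))]   # a single unit mass at the origin
point = (1.0, 2.0, 2.0)             # |point| = 3, away from the source

phi = potential(point, masses)
lap = laplacian(lambda y: potential(y, masses), point)
print(phi, lap)
```

Note that `potential` has no memory of where the masses were at earlier times: moving a mass changes the returned value at every point at once, which is the instantaneous response discussed above.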

This is very different from what we have learned from Maxwell’s theory, where a local change in the charge density $\rho_{e}(x^{i},t)$ would first result in radiation that would later settle down to static electric and magnetic fields. In the next section we will examine Maxwell’s theory and try to extract the invariants of space and time, thinking of it as a fundamental theory.

As we will see later in the course, Einstein’s resolution was a rather elegant and groundbreaking answer:

1. There is no gravitational force! Spacetime is not flat and all particles move on straight lines in a curved spacetime.
2. The curvature of spacetime is determined by the state of matter inside it.

The hope is that the two answers above will start making sense by the end of the course.

1.2 Maxwell’s Theory and Symmetries of Spacetime

In Maxwell’s theory, particles of electric charge $q$ which move at a velocity $\vec{v}$ inside an electric field $\vec{E}$ and magnetic field $\vec{B}$ experience the force,

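$\displaystyle\vec{F}_{E/M}=q\,\vec{E}+\frac{q}{c}\,\vec{v}\times\vec{B}\,.$ (7)

The first term is the electric (Coulomb) force while the second one is the magnetic part of the Lorentz force; the factor of $1/c$ matches the Gaussian units used in Maxwell’s equations below.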

On the other hand, an electric charge density $\rho_{e}$ and current density $\vec{J}$ will act as sources for the electromagnetic field. This is described by Maxwell’s equations which read,

$\displaystyle\nabla\cdot\vec{E}=4\pi\,\rho_{e},\qquad\nabla\cdot\vec{B}=0,$
$\displaystyle\nabla\times\vec{E}+\frac{1}{c}\partial_{t}\vec{B}=0,\qquad\nabla\times\vec{B}-\frac{1}{c}\partial_{t}\vec{E}=\frac{4\pi}{c}\vec{J}\,,$ (8)

with $c$ the speed of light. The above set of equations provides the second item on our wish list for a classical theory of electromagnetism. An important feature of Maxwell’s theory is that it contains time derivatives of the electric and magnetic fields. The first physical consequence is that the way time derivatives enter in (8) allows for the existence of time dependent electric and magnetic fields in the form of waves in empty space with $\rho_{e}=0$ and $\vec{J}=0$.

The second important consequence of the time derivatives is that small local changes $\delta\rho_{e}$ and $\delta\vec{J}$ in the sources are not transmitted instantaneously everywhere in space through the generated fields. At a more fundamental level, apart from spatial rotations and translations, equations (8) are also left invariant under the novel transformation,

$\displaystyle x^{\prime}=\gamma\left(x-v\,t\right),\quad y^{\prime}=y,\quad z^{\prime}=z,\quad ct^{\prime}=\gamma\left(ct-\frac{v}{c}x\right),$
$\displaystyle E_{x}^{\prime}=E_{x},\quad E_{y}^{\prime}=\gamma\left(E_{y}-\frac{v}{c}B_{z}\right),\quad E_{z}^{\prime}=\gamma\left(E_{z}+\frac{v}{c}B_{y}\right),$
$\displaystyle B_{x}^{\prime}=B_{x},\quad B_{y}^{\prime}=\gamma\left(B_{y}+\frac{v}{c}E_{z}\right),\quad B_{z}^{\prime}=\gamma\left(B_{z}-\frac{v}{c}E_{y}\right),$
$\displaystyle\gamma=\left(1-v^{2}/c^{2}\right)^{-1/2}.$ (9)

This transformation is parametrised by a velocity parameter $v$ with $|v|<c$ and represents a Lorentz boost in the $x$ direction. Similar transformations can be written for the Lorentz boosts in the $y$ and $z$ directions.

This is a rather radical transformation from Newton’s point of view, as it also transforms time as if it were a coordinate. This is exactly the lesson we are learning from electromagnetism: $t$ should be on an equal footing with the spatial coordinates $x$, $y$ and $z$.

1.3 Special Relativity

In the previous subsection we discovered Lorentz boosts as symmetries of Maxwell’s theory. Einstein appreciated this message and considered Lorentz boosts as a fundamental symmetry of empty spacetime. As we emphasised already, this promotes (or downgrades) time to yet another coordinate. We will therefore now use the notation $x^{\mu}$ for the coordinates of a point in spacetime with $\mu=0,\ldots,3$ where $x^{0}=ct$ , $x^{1}=x$ , $x^{2}=y$ and $x^{3}=z$ .

The full set of symmetries now consists of rotations, translations and Lorentz boosts, and can be written as,¹

$\displaystyle x^{\prime\mu}=\Lambda^{\mu}{}_{\nu}x^{\nu}+\beta^{\mu}\,.$ (10)

¹ We will be using the standard Einstein summation convention for repeated indices, i.e. $\alpha_{\mu}\beta^{\mu}\equiv\alpha_{0}\beta^{0}+\alpha_{1}\beta^{1}+\cdots+\alpha_{d}\beta^{d}$ in $d+1$ dimensions.

The $\left(\begin{smallmatrix}1\\ 1\end{smallmatrix}\right)$ tensor² $\Lambda^{\mu}{}_{\nu}$ parametrises the Lorentz boosts as well as the standard group of spatial rotations $SO(3)$. These transformations form the Lorentz group. The vector $\beta^{\mu}$ represents a spacetime translation; the translations together with the Lorentz group form the Poincaré group.

² If you are not familiar with this terminology, you can think of $\Lambda^{\mu}{}_{\nu}$ simply as a $4\times 4$ matrix for the time being. We will define tensors more carefully in section 2.

For the Lorentz boost of equation (9) we can write

$\displaystyle\Lambda^{\mu}{}_{\nu}=\left(\begin{array}{cccc}\cosh\phi&-\sinh\phi&0&0\\ -\sinh\phi&\cosh\phi&0&0\\ 0&0&1&0\\ 0&0&0&1\end{array}\right)\,,$ (11)

with $\phi=\tanh^{-1}(v/c)$ . A purely spatial rotation takes the form

\displaystyle\Lambda^{\mu}{}_{\nu}=\left(\begin{array}[]{cccc}1&0&0&0\\ 0&&&\\ 0&&R^{i}{}_{j}&\\ 0&&&\end{array}\right)\,,

(12)

with $R^{i}{}_{j}$ the matrix representation of a three dimensional Euclidean rotation. For a rotation in the $x$-$y$ plane by an angle $\varphi$ we have

\displaystyle\Lambda^{\mu}{}_{\nu}=\left(\begin{array}[]{cccc}1&0&0&0\\ 0&\cos\varphi&\sin\varphi&0\\ 0&-\sin\varphi&\cos\varphi&0\\ 0&0&0&1\end{array}\right)\,.

(13)

We see that in total we have three parameters for the Lorentz boosts in the different directions along with the three parameters for the Euclidean rotations. Therefore, a generic element of the Lorentz group in four dimensions is fully specified by six parameters; this is the dimension of the group. Including the four spacetime translations makes the Poincaré group a ten dimensional group.

The above shows that the “Euclidean” distance between points,

$\displaystyle\Delta s_{E}^{2}=\Delta x^{2}+\Delta y^{2}+\Delta z^{2}\,,$ (14)

is only left invariant under a subset of the Lorentz group, the standard spatial rotations. However, now that we are thinking of the full Lorentz group as a fundamental symmetry, we need to do better than that and come up with a distance between points which is independent of the frame we choose.

We are then looking for a matrix $\eta_{\mu\nu}$ such that the number

\displaystyle\Delta s^{2}=\Delta x^{\mu}\eta_{\mu\nu}\Delta x^{\nu}\,,

(15)

is invariant under the Lorentz transformations (10). This requirement gives

$\displaystyle\Delta s^{2}=\Delta x^{\rho}\eta_{\rho\nu}\Delta x^{\nu}=\Delta x^{\prime\pi}\eta_{\pi\mu}\Delta x^{\prime\mu}\;\Rightarrow$
$\displaystyle\Delta x^{\rho}\eta_{\rho\nu}\Delta x^{\nu}=\Delta x^{\rho}\Lambda^{\pi}{}_{\rho}\eta_{\pi\mu}\Lambda^{\mu}{}_{\nu}\Delta x^{\nu}\;\Rightarrow$
$\displaystyle\eta_{\rho\nu}=\Lambda^{\pi}{}_{\rho}\eta_{\pi\mu}\Lambda^{\mu}{}_{\nu}\,,$ (16)

where in the last line we used that the equality has to hold for any vector $\Delta x^{\mu}$. Demanding that the matrix $\eta_{\mu\nu}$ is such that the last equation holds for any Lorentz transformation $\Lambda^{\mu}{}_{\nu}$ fixes, up to an overall normalisation,

$\displaystyle\eta_{\mu\nu}=\left(\begin{array}{cccc}-1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\end{array}\right)\,.$ (17)

Our four dimensional spacetime, equipped with the “inner product” $\alpha\cdot\beta\equiv\eta_{\mu\nu}\alpha^{\mu}\beta^{\nu}$ is called Minkowski space.
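The invariance condition (16) can be checked numerically. The sketch below builds the boost (11) and the rotation (13) as plain nested lists, verifies $\Lambda^{T}\eta\Lambda=\eta$ for each of them and for their product, and confirms that the interval (15) of a sample displacement is unchanged. The helper names, the value $v/c=0.6$ and the sample displacement are illustrative assumptions.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def boost_x(beta):
    """Boost along x with beta = v/c, written with the rapidity phi = atanh(beta)."""
    phi = math.atanh(beta)
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh, 0, 0], [-sh, ch, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def rotation_xy(angle):
    """Rotation in the x-y plane by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0, 0], [0, c, s, 0], [0, -s, c, 0], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def preserves_eta(lam, tol=1e-12):
    """Check eta_{rho nu} = Lambda^pi_rho eta_{pi mu} Lambda^mu_nu."""
    lhs = matmul(transpose(lam), matmul(ETA, lam))
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))

def interval(dx):
    """Delta s^2 = dx^mu eta_{mu nu} dx^nu."""
    return sum(dx[m] * ETA[m][n] * dx[n] for m in range(4) for n in range(4))

lam = matmul(boost_x(0.6), rotation_xy(0.3))
dx = [1.0, 2.0, -0.5, 3.0]
dx_prime = [sum(lam[m][n] * dx[n] for n in range(4)) for m in range(4)]
print(preserves_eta(lam), interval(dx), interval(dx_prime))
```

For $v/c=0.6$ the entry `boost_x(0.6)[0][0]` equals $\cosh\phi=\gamma=1.25$, in agreement with the relation $\phi=\tanh^{-1}(v/c)$.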

The distance (14), which is invariant under Euclidean rotations, is strictly positive for non-zero vectors. However, the Lorentz invariant norm (15) can have either sign and can also be equal to zero for certain non-zero spacetime vectors. For any vector $V^{\mu}$ with a negative norm $|V|^{2}=\eta_{\mu\nu}V^{\mu}V^{\nu}=-l^{2}$ one can find a frame in which the vector has coordinates

\displaystyle V^{\mu}=\left(\begin{array}[]{c}l\\ 0\\ 0\\ 0\end{array}\right)\,,

(18)

after performing a Lorentz transformation. In other words, there is always a frame in which a negative norm vector points only in the time direction. Similarly, for any vector $V^{\mu}$ with positive norm $|V|^{2}=\eta_{\mu\nu}V^{\mu}V^{\nu}=l^{2}$, we can find a frame in which it takes the form

\displaystyle V^{\mu}=\left(\begin{array}[]{c}0\\ l\\ 0\\ 0\end{array}\right)\,,

(19)

and points in a spatial direction. Lastly, for a zero norm vector we can always find a frame in which

\displaystyle V^{\mu}=\left(\begin{array}[]{c}l\\ l\\ 0\\ 0\end{array}\right)\,,

(20)

pointing along the trajectory of a light ray. For this reason we call positive norm vectors space-like, negative norm ones time-like and non-trivial zero norm vectors light-like.
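The classification above depends only on the sign of the norm (15), so it is easy to express as a small function. This is a minimal sketch using the signature $(-,+,+,+)$ of (17); the function names and the tolerance used to decide when a norm counts as zero are assumptions made for illustration.

```python
def minkowski_norm(v):
    """eta_{mu nu} v^mu v^nu with signature (-,+,+,+); v = (v^0, v^1, v^2, v^3)."""
    return -v[0] ** 2 + v[1] ** 2 + v[2] ** 2 + v[3] ** 2

def classify(v, tol=1e-12):
    """Return the causal type of a spacetime vector."""
    if all(abs(c) <= tol for c in v):
        return "zero vector"
    n = minkowski_norm(v)
    if abs(n) <= tol:
        return "light-like"
    return "space-like" if n > 0 else "time-like"

for vec in [(2, 0, 0, 0), (0, 3, 0, 0), (1, 1, 0, 0), (5, 3, 4, 0)]:
    print(vec, classify(vec))
```

The last sample vector has non-zero components in three directions but still has vanishing norm, so a suitable Lorentz transformation brings it to the form (20).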

Now that we have constructed an invariant distance between two points, we might wonder about an appropriate definition for the invariant length $L[x^{\mu}]$ of a spacetime curve $x^{\mu}(\lambda)$ with parameter $\lambda$. In fact, it has to be a functional of the form

$\displaystyle L[x^{\mu}]=\int_{\lambda_{i}}^{\lambda_{f}}d\lambda\,g(V^{\mu}(\lambda))\,,$ (21)

built from the tangent vector of the curve,

$\displaystyle V^{\mu}=\dot{x}^{\mu}(\lambda)\equiv\frac{d}{d\lambda}x^{\mu}(\lambda)\,.$ (22)

To fix the function $g$, we impose two important geometrical restrictions on our definition of $L[x^{\mu}]$, which has to be invariant under,

• Lorentz transformations $x^{\prime\mu}(\lambda)=\Lambda^{\mu}{}_{\nu}x^{\nu}(\lambda)$,
• curve reparametrisations of the form $\lambda=f(\tau)$ with $f$ a monotonically increasing function.

The first requirement is trivially satisfied by making the function $g$ only a function of the norm of the tangent vector i.e. $g(V^{\mu})=h(\eta_{\mu\nu}V^{\mu}V^{\nu})$ . On the other hand, under a curve reparametrisation equation (21) gives

$\displaystyle L[x^{\mu}]=\int_{\tau_{i}}^{\tau_{f}}d\tau\,\dot{f}(\tau)\,h\left((\dot{f}(\tau))^{-2}\,\eta_{\mu\nu}\frac{d}{d\tau}{x}^{\mu}(f(\tau))\frac{d}{d\tau}{x}^{\nu}(f(\tau))\right)\,,$ (23)

where $\lambda_{i,f}=f(\tau_{i,f})$ . However, if the definition (21) is parametrisation independent, the above should also be equal to,

$\displaystyle L[x^{\mu}]=\int_{\tau_{i}}^{\tau_{f}}d\tau\,h\left(\eta_{\mu\nu}\frac{d}{d\tau}{x}^{\mu}(f(\tau))\frac{d}{d\tau}{x}^{\nu}(f(\tau))\right)\,.$ (24)

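Equality of the two expressions for every $f$ requires $\dot{f}\,h(\dot{f}^{-2}u)=h(u)$, i.e. $h$ has to be homogeneous of degree $1/2$. This is satisfied by the choice $h(\eta_{\mu\nu}V^{\mu}V^{\nu})=\sqrt{|\eta_{\mu\nu}V^{\mu}V^{\nu}|}$ and therefore,

$\displaystyle L[x^{\mu}]=\int_{\lambda_{i}}^{\lambda_{f}}d\lambda\,\sqrt{|\eta_{\mu\nu}V^{\mu}V^{\nu}|}\,.$ (25)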
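The reparametrisation invariance of (25) can be illustrated numerically. The following sketch approximates the integral with a midpoint rule, estimating the tangent vector by finite differences, and compares the length of one curve in two parametrisations related by $\lambda=f(\tau)=\tau^{2}$. The sample curve, the choice of $f$ and the step sizes are assumptions made purely for illustration.

```python
import math

def curve_length(x_of, lam_i, lam_f, steps=20000):
    """Midpoint-rule approximation of L = int dlam sqrt(|eta_{mu nu} V^mu V^nu|),
    with the tangent V^mu estimated by a central finite difference."""
    dlam = (lam_f - lam_i) / steps
    eps = 1e-6
    total = 0.0
    for k in range(steps):
        lam = lam_i + (k + 0.5) * dlam
        xp, xm = x_of(lam + eps), x_of(lam - eps)
        v = [(a - b) / (2 * eps) for a, b in zip(xp, xm)]
        norm = -v[0] ** 2 + sum(c ** 2 for c in v[1:])
        total += math.sqrt(abs(norm)) * dlam
    return total

def curve(lam):
    """A time-like curve: x^0 = lam, x^1 = 0.5 sin(lam)."""
    return (lam, 0.5 * math.sin(lam), 0.0, 0.0)

def reparametrised(tau):
    """The same curve with lam = f(tau) = tau^2, increasing for tau > 0."""
    return curve(tau ** 2)

length_lam = curve_length(curve, 0.5, 2.0)
length_tau = curve_length(reparametrised, math.sqrt(0.5), math.sqrt(2.0))
straight = curve_length(lambda l: (l, 0.6 * l, 0.0, 0.0), 0.0, 1.0)
print(length_lam, length_tau, straight)
```

The two parametrisations give the same length up to discretisation error, and the straight segment with constant velocity $0.6$ has length $\sqrt{1-0.36}=0.8$, as expected from (25).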

Finally, we would like to discuss the relativistic free particle, which is supposed to move on a straight line in spacetime. We can give two different definitions for what we mean by a straight line:

• A straight line is a curve $x^{\mu}(\lambda)$ whose tangent vector $\dot{x}^{\mu}(\lambda)$ remains constant³ and therefore $\ddot{x}^{\mu}(\lambda)=0$. This is a local definition.
• A straight line $x^{\mu}(\lambda)$ extremises the functional $L[x^{\mu}]$. This is a global definition.

³ In fact, we will later see that this definition is not the most general.

One can easily show that the curve $x^{\mu}(\lambda)=V^{\mu}\,\lambda+\beta^{\mu}$, with $V^{\mu}$ and $\beta^{\mu}$ constant vectors, satisfies both conditions and is therefore a straight line. This is precisely what a free relativistic particle does: it moves on a straight line.

According to Einstein’s theory of gravity, the Minkowski spacetime that we discussed in this section represents empty space. Einstein’s elegant answers to our wish list boil down to the following:

1. There is no gravitational force, all particles move on straight lines.
2. Matter changes the geometry of spacetime, changing it from Minkowski to something with curvature.

The aim of the rest of the course is to make sense of the above two statements.

2 Elements of Differential Geometry

In order to discuss curved spacetimes and the physics of fields on them, we will have to introduce certain aspects of differential geometry. In the following subsections we will discuss manifolds, which describe the spacetimes themselves. In order to describe the geometry of our manifolds and the physical fields which live on them, we will also need to introduce vectors, 1-forms and higher rank tensors. The next important ingredient to understand curvature and physical laws is differentiation in curved spacetimes. Finally, we will discuss the spacetime metric and curvature via the Riemann tensor, which is the main object in Einstein’s General Relativity.

2.1 Manifolds and functions

A manifold is the minimal structure that we will need in order to describe our spacetimes in terms of coordinate systems. Before defining what a manifold is, we will need a notion of coordinate systems, which we define according to,

Definition 2.1.

Given a topological space $M$ and an open subset $U\subset M$, a coordinate system or chart on $U$ is a pair $(U,\phi)$ with $\phi$ a one-to-one map $\phi:U\rightarrow\mathbb{R}^{n}$. The integer $n$ is called the dimension of $U$.

A coordinate system allows us to describe the points of an open set by uniquely assigning a set of numbers to each one of them, i.e. for a point $p\in U$ we can write $\phi(p)=(x^{1},x^{2},\ldots,x^{n})$. Given a function $f:U\rightarrow\mathbb{R}$, we can consider the composition $f\circ\phi^{-1}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and ask whether it is differentiable in the standard sense of analysis in many variables. The answer clearly depends on the chart and the properties of the function $\phi$. In order to make the question of differentiability more geometrical, we therefore need to impose certain restrictions on the coordinate systems we are allowed to consider, so that differentiability of functions becomes coordinate system independent. For this reason we introduce the idea of a differentiable manifold,

Definition 2.2.

A differentiable manifold of dimension $n$ is a topological space⁴ $M$ with a collection of charts $(U_{\alpha},\phi_{\alpha})$ such that:

1. The union of all the open sets in the collection of charts covers the set $M$, i.e. $\cup_{\alpha}U_{\alpha}=M$.
2. If $U_{\alpha}\cap U_{\beta}\neq\emptyset$ then the function $\phi_{\alpha}\circ\phi_{\beta}^{-1}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ is infinitely differentiable.
3. The collection of all charts $(U_{\alpha},\phi_{\alpha})$ is maximal.

⁴ Strictly speaking Hausdorff and second countable, but we won’t need too much of this information in our course.

Notice that the number of dimensions $n$ does not change as we move around our manifold. The first condition makes sure that for any point $p\in M$, we can find a coordinate system that describes it along with its immediate topological neighbourhood. The second condition gives meaning to functions which are differentiable at a point $p\in M$ without any reference to a coordinate system. To see this, consider a point $p\in M$ and suppose that the function $f:M\rightarrow\mathbb{R}$ is differentiable with respect to a specific chart $(U_{\delta},\phi_{\delta})$ with $p\in U_{\delta}$, i.e. the function $(f\circ\phi_{\delta}^{-1})(x^{\mu})$ is differentiable at $x^{\mu}=\phi_{\delta}(p)$. We can now consider a different coordinate system $(U_{\epsilon},\phi_{\epsilon})$ with $p\in U_{\epsilon}$ and wonder about the differentiability of $(f\circ\phi_{\epsilon}^{-1})(y^{\mu})$ at $y^{\mu}=\phi_{\epsilon}(p)$, which is the same point from the point of view of $M$. To answer this we can write $f\circ\phi^{-1}_{\epsilon}=(f\circ\phi_{\delta}^{-1})\circ(\phi_{\delta}\circ\phi_{\epsilon}^{-1})$ and also note that $f\circ\phi_{\delta}^{-1}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ while $\phi_{\delta}\circ\phi_{\epsilon}^{-1}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$. We therefore have the composition of two functions, of which the first is differentiable by hypothesis while the second one is differentiable by the second item in definition 2.2, so the composition is differentiable as well.

Even though the notion of differentiability is coordinate system invariant in a differentiable manifold, the partial derivatives of a function themselves are coordinate dependent. Suppose that we have two coordinate systems $(U_{1},\phi_{1})$ and $(U_{2},\phi_{2})$ and a point $p\in U_{1}\cap U_{2}$. We will call the coordinates $\phi_{1}(p)=\{x^{\mu}\}$ and the coordinates $\phi_{2}(p)=\{y^{\mu}\}$. If $f$ is a differentiable function at $p$, we can also construct the functions $\hat{f}=f\circ\phi_{1}^{-1}$ and $\tilde{f}=f\circ\phi_{2}^{-1}$. By using the identity $f\circ\phi_{2}^{-1}=f\circ\phi_{1}^{-1}\circ\phi_{1}\circ\phi_{2}^{-1}$ we can write that $\tilde{f}(y^{\mu})=\hat{f}(x^{\nu}(y^{\mu}))$, giving the relation

$\displaystyle\partial_{y^{i}}\tilde{f}(y^{\mu})=\sum_{j}\frac{\partial x^{j}}{\partial y^{i}}\partial_{x^{j}}\hat{f}(x^{\nu}(y^{\mu}))\,,$ (26)

where we used the chain rule for partial derivatives. The above is true for any differentiable function $f$ giving,

$\displaystyle\partial_{y^{i}}=\frac{\partial x^{j}}{\partial y^{i}}\,\partial_{x^{j}}\,,$ (27)

where we used Einstein’s convention for repeated indices. The transformation (27) is an important property of how partial derivatives transform under coordinate transformations and will show up frequently in our discussions.
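A concrete check of the chain rule (26) may help. The sketch below takes Cartesian coordinates $x^{j}=(x,y)$ and polar coordinates $y^{i}=(r,\theta)$ on the plane, picks the sample function $\hat{f}(x,y)=x^{2}y$, and compares finite-difference derivatives of $\tilde{f}(r,\theta)$ with the right-hand side of (26). The choice of coordinates, of function and of evaluation point are assumptions made for illustration.

```python
import math

def f_hat(x, y):
    """The function expressed in Cartesian coordinates x^j = (x, y)."""
    return x * x * y

def to_cartesian(r, theta):
    """The transition map x^j(y^i) with y^i = (r, theta)."""
    return r * math.cos(theta), r * math.sin(theta)

def f_tilde(r, theta):
    """The same function expressed in polar coordinates."""
    return f_hat(*to_cartesian(r, theta))

def partial(func, args, k, h=1e-6):
    """Central finite-difference derivative of func in its k-th argument."""
    up, dn = list(args), list(args)
    up[k] += h
    dn[k] -= h
    return (func(*up) - func(*dn)) / (2 * h)

r, theta = 1.3, 0.7
x, y = to_cartesian(r, theta)
grad_x = (2 * x * y, x * x)   # exact (d_x f_hat, d_y f_hat)
# jac[i][j] = dx^j / dy^i, rows i = (r, theta), columns j = (x, y)
jac = [[math.cos(theta), math.sin(theta)],
       [-r * math.sin(theta), r * math.cos(theta)]]

lhs = [partial(f_tilde, (r, theta), i) for i in range(2)]
rhs = [sum(jac[i][j] * grad_x[j] for j in range(2)) for i in range(2)]
print(lhs, rhs)
```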

2.2 Vectors and 1-forms

In the previous subsection we discussed functions and their partial derivatives. In this section we will introduce the notion of vectors on a general manifold. In order to define vectors we will have to think of them as being defined pointwise. The case you might be most familiar with is flat space, including Minkowski space. In flat space, vectors can be defined in a much more straightforward way since the tangent vector space at each point is isomorphic to the space itself. On a more general manifold $M$, we can define a vector $V$ at a point $p\in M$ according to:

Definition 2.3.

A vector $V$ is a linear map $V:C^{\infty}(p)\rightarrow\mathbb{R}$ which satisfies the Leibniz rule of differentiation. As such, for any two real numbers $a,b\in\mathbb{R}$ and differentiable functions $f,g:M\rightarrow\mathbb{R}$ we must have,

1. $V(a\,f+b\,g)(p)=a\,V(f)(p)+b\,V(g)(p)\,,$
2. $V(fg)(p)=f(p)V(g)(p)+g(p)V(f)(p)\,.$

Moreover, at any point $p\in M$ we can define the addition of two vectors $V$ and $W$ according to

\displaystyle(V+W)(g)=V(g)+W(g)

(28)

for any function $g$. It is relatively straightforward to check that the set of all vectors at a point $p\in M$ forms a linear space over the real numbers, the tangent space $T_{p}$.

At this point all this might seem too abstract and an example might be helpful. Given a curve $\gamma:\mathbb{R}\rightarrow M$ with parameter $\lambda\in\mathbb{R}$ and a point $p=\gamma(\lambda_{0})$, a natural object to define is its tangent vector $V_{p}$. According to definition 2.3, we need to define it by giving the image of each differentiable function $f$ at $\gamma(\lambda_{0})$ under $V_{p}$.

Example 2.1.

The tangent vector $V\in T_{p}$ of a curve $\gamma(\lambda)$ at a point $p=\gamma(\lambda_{0})$ is defined so that for any $f\in C^{1}(p)$,

\displaystyle V(f)=\frac{d}{d\lambda}(f\circ\gamma)(\lambda_{0})\,.

(29)

Notice that this definition is coordinate independent and the vector is really a geometric object. We can do a bit more if we give a coordinate system $(U,\phi)$ such that $p\in U$. We can write

$\displaystyle V(f)=\frac{d}{d\lambda}((f\circ\phi^{-1})\circ(\phi\circ\gamma))(\lambda_{0})=\left.\frac{dx^{\mu}}{d\lambda}\right|_{\lambda=\lambda_{0}}\left.\frac{\partial f}{\partial x^{\mu}}\right|_{x^{\mu}=x^{\mu}(\lambda_{0})}$
$\displaystyle\phantom{V(f)}=\left(\left.\frac{dx^{\mu}}{d\lambda}\right|_{\lambda=\lambda_{0}}\,\partial_{\mu}\right)\left.f\right|_{x^{\mu}=x^{\mu}(\lambda_{0})}\,,$ (30)

for any function $f$ . We see that, given a coordinate system, the tangent vector is expressed as a linear combination of partial derivatives,

$\displaystyle V=V^{\mu}(p)\,\partial_{\mu},\quad V^{\mu}(p)=\left.\frac{dx^{\mu}}{d\lambda}\right|_{\lambda=\lambda_{0}}\,.$ (31)

It is worth comparing this result to what we had in Minkowski space in equation (22).
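To make Example 2.1 concrete, the sketch below works directly in a coordinate chart. It compares the definition (29), the derivative of $f\circ\gamma$ along the curve, with the component expression $V^{\mu}\partial_{\mu}f$ of (30) and (31), and also checks the Leibniz rule of definition 2.3 numerically. The helix used as the curve and the sample functions `f` and `g` are illustrative assumptions.

```python
import math

def gamma(lam):
    """A curve in coordinates, x^mu(lam): a helix in a three dimensional chart."""
    return (math.cos(lam), math.sin(lam), lam)

def f(x):
    return x[0] * x[1] + x[2] ** 2

def g(x):
    return math.exp(x[0])

lam0 = 0.4
p = gamma(lam0)

def V_on(func, h=1e-6):
    """V(func) = d/dlam (func o gamma) at lam0, by a central finite difference."""
    return (func(gamma(lam0 + h)) - func(gamma(lam0 - h))) / (2 * h)

# Component form: V^mu = dx^mu/dlam at lam0, contracted with the gradient of f at p.
V = (-math.sin(lam0), math.cos(lam0), 1.0)
grad_f = (p[1], p[0], 2 * p[2])
components = sum(v * d for v, d in zip(V, grad_f))

# Leibniz rule: V(fg) = f(p) V(g) + g(p) V(f).
leibniz_lhs = V_on(lambda x: f(x) * g(x))
leibniz_rhs = f(p) * V_on(g) + g(p) * V_on(f)
print(V_on(f), components, leibniz_lhs, leibniz_rhs)
```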

This result is much more general: any linear combination of the form $V=b^{\mu}\,\partial_{\mu}$ is a vector, as it satisfies all the properties of definition 2.3. What is not obvious but very important is that, given a specific coordinate system, its partial derivatives form a basis for $T_{p}$, the coordinate basis $\{\partial_{\mu}\}$. Therefore, the dimension of $T_{p}$ is equal to the dimension of the manifold $M$, i.e. $\dim T_{p}=\dim M=n$.

Suppose that $p\in U_{1}\cap U_{2}$, with $(U_{1},\phi_{1})$ and $(U_{2},\phi_{2})$ two coordinate systems such that $\{x^{\mu}\}=\phi_{1}(p)$ and $\{y^{\mu}\}=\phi_{2}(p)$. The corresponding bases for $T_{p}$ are $\{\partial_{x^{\nu}}\}$ and $\{\partial_{y^{\mu}}\}$. Equation (27) is telling us precisely how the basis of partial derivatives $\{\partial_{\mu}\}$ transforms under a coordinate transformation

\{y^{\mu}\}=\{y^{\mu}(x^{\nu})\}\equiv(\phi_{2}\circ\phi_{1}^{-1})(\{x^{\nu}\}).

The basic requirement is that the vector $V$ is geometric and independent of the basis we write it in. This suggests that the components of the vector have to be such that

\displaystyle V=V^{i}\,\partial_{x^{i}}=V^{j\prime}\,\partial_{y^{j}}\,.

(32)

However, according to the transformation rule (27) we can write

$\displaystyle V=V^{i}\,\partial_{x^{i}}=V^{i}\,\frac{\partial y^{j}}{\partial x^{i}}\,\partial_{y^{j}}\,,$ (33)

and by comparison to the previous equation we have that the components of a vector transform according to

\displaystyle V^{j\prime}=\frac{\partial y^{j}}{\partial x^{i}}\,V^{i}\,.

(34)
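As an illustration of the rule (34), the following sketch transforms vector components on the plane from Cartesian coordinates $x^{i}=(x,y)$ to polar coordinates $y^{j}=(r,\theta)$. The choice of coordinates, the evaluation point $(3,4)$ and the two sample vectors are assumptions made for the example.

```python
import math

def jacobian_polar_wrt_cartesian(x, y):
    """Matrix dy^j/dx^i with rows y^j = (r, theta) and columns x^i = (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def transform_vector(V, x, y):
    """Apply V'^j = (dy^j/dx^i) V^i at the point with Cartesian coordinates (x, y)."""
    J = jacobian_polar_wrt_cartesian(x, y)
    return [sum(J[j][i] * V[i] for i in range(2)) for j in range(2)]

# The radial vector V = x d_x + y d_y at the point (3, 4), where r = 5.
V_polar = transform_vector([3.0, 4.0], 3.0, 4.0)
# The rotation generator W = -y d_x + x d_y at the same point.
W_polar = transform_vector([-4.0, 3.0], 3.0, 4.0)
print(V_polar, W_polar)
```

In polar coordinates the first vector becomes $r\,\partial_{r}$ and the second becomes $\partial_{\theta}$, so the same geometric objects have much simpler components in the adapted basis.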

So far, we have discussed the coordinate basis $\{\partial_{\mu}\}$ of the tangent space $T_{p}$. As for any $n$ dimensional vector space, there are more general bases $\{e_{\nu}\}$ that we can use. This is again the statement that vectors are geometric objects, independent of the basis we use to describe them.

We will now move forward with the definition of the cotangent vectors,

Definition 2.4.

The space of all linear functionals $\omega:T_{p}\rightarrow\mathbb{R}$ is called the cotangent space $T^{\ast}_{p}$ . If $V,W\in T_{p}$ and $a,b\in\mathbb{R}$ then

\displaystyle\omega(a\,V+b\,W)=a\,\omega(V)+b\,\omega(W)\,.

(35)

In linear algebra terms, the cotangent space $T^{\ast}_{p}$ is the dual space of the tangent space $T_{p}$ . The dual space is itself a vector space under the addition rule

\displaystyle(a\,\omega+b\,\psi)(V)=a\,\omega(V)+b\,\psi(V)\,,

(36)

for all $a,b\in\mathbb{R}$, vectors $V\in T_{p}$ and 1-forms $\omega,\psi\in T^{\ast}_{p}$. As a theorem from linear algebra we know that $\dim T_{p}=\dim T^{\ast}_{p}$, so a basis $\{\theta^{\mu}\}$ of $T^{\ast}_{p}$ consists of $n$ 1-forms, which we can use to write any 1-form as a linear combination $\omega=\omega_{\mu}\,\theta^{\mu}$ with $\omega_{\mu}\in\mathbb{R}$ the components of the 1-form $\omega$.

Definition 2.5.

Given a basis $\{e_{\mu}\}$ for the tangent space $T_{p}$ , there is a special $\{\theta^{\mu}\}$ basis, the dual basis, of $T^{\ast}_{p}$ that we can always choose such that $\theta^{\mu}(e_{\nu})=\delta^{\mu}_{\nu}$ , with $\delta^{\mu}_{\nu}$ being Kronecker’s delta. For the case where we have a coordinate basis and $e_{\nu}=\partial_{x^{\nu}}$ , the notation for the dual basis is $\theta^{\mu}=dx^{\mu}$ .

The dual basis is very convenient to work with since if $V=V^{\nu}e_{\nu}$ and $\omega=\omega_{\mu}\theta^{\mu}$ we have,

$\displaystyle\omega(V)=\omega(V^{\nu}e_{\nu})=V^{\nu}\,\omega(e_{\nu})=V^{\nu}\omega_{\mu}\,\theta^{\mu}(e_{\nu})=V^{\nu}\omega_{\mu}\delta^{\mu}_{\nu}=V^{\nu}\omega_{\nu}\,.$ (37)

As we have seen, the dual basis $\{dx^{\mu}\}$ is tied to a coordinate basis $\{\partial_{x^{\nu}}\}$ and the latter transforms according to (27). It is natural to ask how the dual basis, as well as the components of a 1-form, transform under coordinate transformations. This will be fixed by the requirement that the number $\omega(V)$ has to be independent of the coordinate system, for any $V\in T_{p}$. In equation (32) we have seen how to write a vector $V$ in different coordinate systems. Correspondingly for the 1-form $\omega$, we can write

\displaystyle\omega=\omega_{\nu}\,dx^{\nu}=\omega_{\mu}^{\prime}\,dy^{\mu}\,.

(38)

Based on the above expressions and also (37) we can write

$\displaystyle\omega(V)=\omega_{\nu}V^{\nu}=\omega_{\mu}^{\prime}V^{\mu\prime}=\omega_{\mu}^{\prime}\frac{\partial y^{\mu}}{\partial x^{\nu}}V^{\nu}\,,$ (39)

where we used the transformation rule (34). Since the above has to be true for any vector $V$ , we conclude that we must have,

$\displaystyle\omega_{\mu}^{\prime}=\frac{\partial x^{\nu}}{\partial y^{\mu}}\,\omega_{\nu}\,.$ (40)

Moreover, equation (38) implies that the dual basis vectors have to transform according to

\displaystyle dy^{\mu}=\frac{\partial y^{\mu}}{\partial x^{\nu}}\,dx^{\nu}\,.

(41)
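The two transformation rules (34) and (40) are designed so that the pairing $\omega(V)=\omega_{\nu}V^{\nu}$ of (37) is the same number in every coordinate system. The sketch below checks this for Cartesian and polar coordinates on the plane; the coordinate choice, the evaluation point and the sample components are assumptions made for illustration.

```python
import math

x, y = 3.0, 4.0
r, theta = math.hypot(x, y), math.atan2(y, x)

# dy_dx[m][n] = dy^m/dx^n with y^m = (r, theta) and x^n = (x, y)
dy_dx = [[x / r, y / r],
         [-y / r ** 2, x / r ** 2]]
# dx_dy[m][n] = dx^n/dy^m, rows labelled by y^m = (r, theta)
dx_dy = [[math.cos(theta), math.sin(theta)],
         [-r * math.sin(theta), r * math.cos(theta)]]

V = [1.5, -2.0]      # components V^nu in the basis {d_x, d_y}
omega = [0.7, 0.2]   # components omega_nu in the basis {dx, dy}

# Rule (34) for the vector and rule (40) for the 1-form.
V_p = [sum(dy_dx[m][n] * V[n] for n in range(2)) for m in range(2)]
omega_p = [sum(dx_dy[m][n] * omega[n] for n in range(2)) for m in range(2)]

pairing = sum(o * v for o, v in zip(omega, V))
pairing_p = sum(o * v for o, v in zip(omega_p, V_p))
print(pairing, pairing_p)
```

The agreement relies on the two Jacobian matrices being inverse to each other, $\frac{\partial x^{\nu}}{\partial y^{\mu}}\frac{\partial y^{\mu}}{\partial x^{\rho}}=\delta^{\nu}_{\rho}$.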

We will conclude this section with an observation that will allow us to define higher rank tensors. As we saw, the 1-forms are the dual vectors of the tangent vectors. One might wonder what would happen if we now tried to define the dual $T^{\ast\ast}_{p}$ of the dual vector space $T_{p}^{\ast}$. The simple answer is that $T^{\ast\ast}_{p}$ is isomorphic to $T_{p}$ and we wouldn’t gain anything from doing that.

However, this is telling us something important about what follows in the next section. To see how this works we need to discuss the isomorphism $\mathcal{F}:T_{p}\rightarrow T^{\ast\ast}_{p}$. We need to find a map $\mathcal{F}$ that takes a vector $V$ and maps it to a functional of 1-forms $\mathcal{F}(V)$. This can be simply defined through,

\displaystyle\mathcal{F}(V)(\omega)=\omega(V)\,,

(42)

for any $\omega\in T^{\ast}_{p}$ . One can show that the converse is also true, i.e. for any $Q\in T^{\ast\ast}_{p}$ one can find a $V\in T_{p}$ such that $Q=\mathcal{F}(V)$ and therefore the map $\mathcal{F}$ is bijective. From now on we will be using the notation,

\displaystyle V(\omega)\equiv\mathcal{F}(V)(\omega)\,,

(43)

showing that we can write

\displaystyle\omega(V)=V(\omega)\,,

(44)

in a meaningful way. For the basis vectors in particular we can also write

\displaystyle e_{\nu}(\theta^{\mu})=\theta^{\mu}(e_{\nu})=\delta^{\mu}_{\nu}\,.

(45)

2.3 Higher rank tensors

In the previous section we discussed vectors and 1-forms. One of the striking conclusions was that we can see 1-forms as functionals of vectors and vectors as functionals of 1-forms! We might wonder whether there are more general structures following a similar logic. The answer is that vectors and 1-forms are special kinds of tensors, of type $\left(\begin{smallmatrix}1\\ 0\end{smallmatrix}\right)$ and $\left(\begin{smallmatrix}0\\ 1\end{smallmatrix}\right)$ respectively. More generally we can define,

Definition 2.6.

An $\left(\begin{smallmatrix}r\\ s\end{smallmatrix}\right)$ tensor is a map $T:(T^{\ast}_{p})^{r}\otimes(T_{p})^{s}\rightarrow\mathbb{R}$ which is linear in all of its arguments, i.e. it is multilinear. More specifically, it takes $r$ 1-forms and $s$ vectors as arguments and returns a real number.

Given the basis $\{e_{\mu}\}$ for $T_{p}$ and $\{\theta^{\mu}\}$ for $T^{\ast}_{p}$ we can write the numbers,

$\displaystyle T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}=T(\theta^{\mu_{1}},\ldots,\theta^{\mu_{r}},e_{\nu_{1}},\ldots,e_{\nu_{s}}),$ (46)

and as we will see these will be the components of our tensor. To appreciate their importance, consider a general set of $s$ vectors $V_{i}\in T_{p}$ , $i=1,\ldots,s$ and $r$ 1-forms $\omega_{j}\in T^{\ast}_{p}$ , $j=1,\ldots,r$ written in terms of our basis,

$\displaystyle V_{i}=(V_{i})^{\mu}\,e_{\mu},\quad\omega_{j}=(\omega_{j})_{\mu}\,\theta^{\mu}\,.$ (47)

We now plug them in as arguments to our $\left(\begin{smallmatrix}r\\ s\end{smallmatrix}\right)$ tensor $T$ to obtain,

$\displaystyle T(\omega_{1},\ldots,\omega_{r},V_{1},\ldots,V_{s})=T((\omega_{1})_{\mu_{1}}\,\theta^{\mu_{1}},\ldots,(\omega_{r})_{\mu_{r}}\,\theta^{\mu_{r}},(V_{1})^{\nu_{1}}\,e_{\nu_{1}},\ldots,(V_{s})^{\nu_{s}}\,e_{\nu_{s}})$
$\displaystyle\quad=(\omega_{1})_{\mu_{1}}\cdots(\omega_{r})_{\mu_{r}}\,(V_{1})^{\nu_{1}}\cdots(V_{s})^{\nu_{s}}\,T(\theta^{\mu_{1}},\ldots,\theta^{\mu_{r}},e_{\nu_{1}},\ldots,e_{\nu_{s}})$
$\displaystyle\quad=(\omega_{1})_{\mu_{1}}\cdots(\omega_{r})_{\mu_{r}}\,(V_{1})^{\nu_{1}}\cdots(V_{s})^{\nu_{s}}\,T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}\,,$ (48)

where in the second line we used the fact that $T$ is linear in all of its arguments. The above shows that if we know the numbers $T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}$ , we know all the information about the tensor $T$ we need to evaluate it on any set of 1-forms and vectors. Using the property (45) we can write,

\displaystyle T=T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}\,e_{\mu_{1}}% \otimes\cdots\otimes e_{\mu_{r}}\otimes\theta^{\nu_{1}}\otimes\cdots\otimes% \theta^{\nu_{s}}\,,

(49)

since the above expression will give the same result as equation (48) after plugging in the arguments,

\displaystyle T(\omega_{1},\ldots,\omega_{r},V_{1},\ldots,V_{s})=T(\omega_{1}% \otimes\ldots\otimes\omega_{r}\otimes V_{1}\otimes\ldots\otimes V_{s})\,.

(50)
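The statement that the components determine the tensor can be checked on a small example. The sketch below assumes a two dimensional space and a made-up bilinear map `T` (its coefficients are arbitrary and purely illustrative); it extracts the components as in equation (46) and rebuilds the value of the tensor as in equation (48).

```python
from itertools import product

DIM = 2  # assumed dimension, chosen only to keep the example small


def T(omega, V):
    """A hypothetical (1,1) tensor: bilinear in a 1-form omega and a vector V."""
    return (2.0 * omega[0] * V[0] + 3.0 * omega[0] * V[1]
            - 1.0 * omega[1] * V[0] + 5.0 * omega[1] * V[1])


def basis(i):
    """Component list of the i-th basis element (used for both e_i and theta^i)."""
    return [1.0 if j == i else 0.0 for j in range(DIM)]


# Components T^mu_nu = T(theta^mu, e_nu), as in equation (46).
components = [[T(basis(mu), basis(nu)) for nu in range(DIM)]
              for mu in range(DIM)]


def evaluate_from_components(omega, V):
    """Right-hand side of equation (48) for r = s = 1."""
    return sum(omega[mu] * V[nu] * components[mu][nu]
               for mu, nu in product(range(DIM), repeat=2))


omega = [0.5, -2.0]
V = [3.0, 4.0]
direct = T(omega, V)                            # evaluate the map directly
rebuilt = evaluate_from_components(omega, V)    # evaluate via the components
```

Both `direct` and `rebuilt` agree, which is just multilinearity at work: no information about `T` beyond its four components is needed.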

The above considerations apply for any choice of basis $\{e_{\nu}\}$ and its dual $\{\theta^{\mu}\}$ . We will now specialise to a coordinate basis $\{\partial_{x^{\nu}}\}$ and its dual $\{dx^{\mu}\}$ and ask how the components of the tensor transform under a coordinate transformation $x^{\mu}=x^{\mu}(y^{\nu})$ . We can write,

\displaystyle\begin{aligned}T&=T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}\,\partial_{x^{\mu_{1}}}\otimes\cdots\otimes\partial_{x^{\mu_{r}}}\otimes dx^{\nu_{1}}\otimes\cdots\otimes dx^{\nu_{s}}\\ &=T^{\mu_{1}^{\prime}\cdots\mu_{r}^{\prime}}_{\nu_{1}^{\prime}\cdots\nu_{s}^{\prime}}\,\partial_{y^{\mu_{1}^{\prime}}}\otimes\cdots\otimes\partial_{y^{\mu_{r}^{\prime}}}\otimes dy^{\nu_{1}^{\prime}}\otimes\cdots\otimes dy^{\nu_{s}^{\prime}}\end{aligned}

(51)

and after using the transformation rules (27) and (41) we obtain,

\displaystyle T^{\mu_{1}^{\prime}\cdots\mu_{r}^{\prime}}_{\nu_{1}^{\prime}% \cdots\nu_{s}^{\prime}}=\prod_{i=1}^{r}\frac{\partial y^{\mu_{i}^{\prime}}}{% \partial x^{\mu_{i}}}\,\prod_{j=1}^{s}\frac{\partial x^{\nu_{j}}}{\partial y^{% \nu_{j}^{\prime}}}\,T^{\mu_{1}\cdots\mu_{r}}_{\nu_{1}\cdots\nu_{s}}\,.

(52)
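As an illustration of the rule (52) for $r=0$ and $s=2$, the sketch below transforms a symmetric $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensor from Cartesian coordinates $x^{\mu}=(x,y)$ on the plane to polar coordinates $y^{\mu}=(r,\theta)$. The choice of tensor (Cartesian components $\delta_{\mu\nu}$) and the function names are assumptions made for this example only.

```python
import math


def jacobian_x_wrt_y(r, theta):
    """J[m][a] = d x^m / d y^a for x = (r cos(theta), r sin(theta)), y = (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]


def transform_02(T, J):
    """Equation (52) with r = 0, s = 2: T'_{ab} = J[m][a] J[k][b] T_{mk}."""
    n = len(T)
    return [[sum(J[m][a] * J[k][b] * T[m][k]
                 for m in range(n) for k in range(n))
             for b in range(n)]
            for a in range(n)]


delta = [[1.0, 0.0], [0.0, 1.0]]   # Cartesian components (illustrative choice)
r, theta = 2.0, 0.7                # an arbitrary point away from the origin
T_polar = transform_02(delta, jacobian_x_wrt_y(r, theta))
```

The transformed components come out as $\mathrm{diag}(1,r^{2})$, the familiar polar form $dr^{2}+r^{2}d\theta^{2}$ of the flat line element.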

We can now introduce a couple of operations that we can do with tensors to produce new ones. The first one is the tensor product:

Definition 2.7.

Given a $\left(\begin{smallmatrix}p\\ q\end{smallmatrix}\right)$ tensor $S$ and a $\left(\begin{smallmatrix}r\\ s\end{smallmatrix}\right)$ tensor $W$ , we can define a $\left(\begin{smallmatrix}p+r\\ q+s\end{smallmatrix}\right)$ tensor $T=S\otimes W$ . Given $q+s$ vectors $V_{i}\in T_{p}$ , $i=1,\ldots,q+s$ and $p+r$ 1-forms $\omega_{j}\in T^{\ast}_{p}$ , $j=1,\ldots,p+r$ the tensor product $T$ takes the value,

\displaystyle\begin{aligned}&T(\omega_{1},\ldots,\omega_{p+r},V_{1},\ldots,V_{q+s})=\\ &\qquad\qquad S(\omega_{1},\ldots,\omega_{p},V_{1},\ldots,V_{q})\,W(\omega_{p+1},\ldots,\omega_{p+r},V_{q+1},\ldots,V_{q+s})\,.\end{aligned}

(53)

One can easily check that the higher rank tensor we construct in this way is indeed a tensor since it satisfies the definition 2.6. In terms of components, we can write

\displaystyle T^{\mu_{1},\cdots,\mu_{p+r}}_{\nu_{1},\dots,\nu_{q+s}}=S^{\mu_{1% },\cdots,\mu_{p}}_{\nu_{1},\dots,\nu_{q}}\,W^{\mu_{p+1},\cdots,\mu_{p+r}}_{\nu% _{q+1},\dots,\nu_{q+s}}\,.

(54)

The second operation we can define produces tensors of lower rank and is called a contraction,

Definition 2.8.

Given a rank $\left(\begin{smallmatrix}p\\ q\end{smallmatrix}\right)$ tensor $T$ we can construct a $\left(\begin{smallmatrix}p-1\\ q-1\end{smallmatrix}\right)$ tensor $S$ according to

\displaystyle S(\omega_{1},\ldots,\omega_{p-1},V_{1},\ldots,V_{q-1})=T(\omega_{1},\ldots,\theta^{(\mu)},\ldots,\omega_{p-1},V_{1},\ldots,e_{(\mu)},\ldots,V_{q-1})\,,

(55)

where the index $(\mu)$ is being summed over and $\{\theta^{(\mu)}\}$ is the dual basis of $\{e_{(\nu)}\}$ . Notice that we have $p\,q$ different ways of performing the contraction, producing in general $p\,q$ different tensors $S$ .

In terms of components, starting from a $\left(\begin{smallmatrix}2\\ 2\end{smallmatrix}\right)$ tensor $T^{\alpha\beta}_{\mu\nu}$ we can produce the four inequivalent $\left(\begin{smallmatrix}1\\ 1\end{smallmatrix}\right)$ tensors

\displaystyle T^{\lambda\alpha}_{\lambda\mu}\neq T^{\alpha\lambda}_{\lambda\mu% }\neq T^{\alpha\lambda}_{\mu\lambda}\neq T^{\lambda\alpha}_{\mu\lambda}\,.

(56)
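The inequivalence of the four contractions in (56) is easy to see on a concrete example. The sketch below assumes two dimensions and a made-up component function `T(a, b, m, n)` standing for $T^{ab}_{mn}$; the values are arbitrary and serve only to make the four results visibly different.

```python
N = 2  # assumed dimension


def T(a, b, m, n):
    """Hypothetical components T^{ab}_{mn}; arbitrary illustrative values."""
    return 8 * a + 4 * b + 2 * m + n


def contract(upper_slot, lower_slot):
    """Sum one upper slot against one lower slot; returns S[a][m] = S^a_m."""
    def entry(a, m):
        total = 0
        for lam in range(N):
            up = (lam, a) if upper_slot == 0 else (a, lam)
            low = (lam, m) if lower_slot == 0 else (m, lam)
            total += T(*up, *low)
        return total
    return [[entry(a, m) for m in range(N)] for a in range(N)]


S1 = contract(0, 0)   # T^{lambda a}_{lambda m}
S2 = contract(1, 0)   # T^{a lambda}_{lambda m}
S3 = contract(1, 1)   # T^{a lambda}_{m lambda}
S4 = contract(0, 1)   # T^{lambda a}_{m lambda}
```

For this choice the four $\left(\begin{smallmatrix}1\\ 1\end{smallmatrix}\right)$ arrays are all different. For tensors with extra index symmetries some of them may of course coincide.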

2.4 Differentiation

From a physics point of view, all classical physical laws are expressed as differential equations. It is crucial to understand differentiation on a differentiable manifold, in a way that is independent of coordinate systems. In this sense, a derivative should map tensors to tensors of different rank.

2.4.1 Exterior differentiation of $n$ -forms

Let’s first consider a 1-form field $V_{\mu}$ described in a particular coordinate system $\{x^{\nu}\}$ . It is indeed very tempting to try to define a tensor whose components are simply given by the partial derivatives $\partial_{x^{\nu}}V_{\mu}$ . We are certainly allowed to do this but the question is whether this object can be promoted to a $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensor. If this definition were really independent of the coordinate system, then someone using the coordinate system $\{y^{\mu}=y^{\mu}(x^{\nu})\}$ would have to write the “components” $\partial_{y^{\lambda}}V^{\prime}_{\rho}$ with $V^{\prime}_{\rho}$ the components of the original 1-form in their coordinate system.

Using the transformation rules (27) and (40) we have,

\displaystyle\partial_{y^{\lambda}}V^{\prime}_{\rho}=\frac{\partial x^{\nu}}{% \partial y^{\lambda}}\frac{\partial x^{\sigma}}{\partial y^{\rho}}\,\partial_{% x^{\nu}}V_{\sigma}+V_{\sigma}\,\frac{\partial^{2}x^{\sigma}}{\partial y^{% \lambda}\,\partial y^{\rho}}\,.

(57)

The last term in the equation above spoils the transformation rule (52) for $r=0$ and $s=2$ and therefore we don’t have a tensor. Notice that the “naughty” term is symmetric in the free indices $\lambda$ and $\rho$ . Therefore, if we instead consider the object with the two indices antisymmetrised,

\displaystyle(dV)_{\nu\mu}\equiv 2\,\partial_{\left[x^{\nu}\right.}V_{\left.% \mu\right]}=\partial_{x^{\nu}}V_{\mu}-\partial_{x^{\mu}}V_{\nu}\,,

(58)

the symmetric term in the transformation will drop out and we will be left with

\displaystyle\partial_{\left[y^{\lambda}\right.}V^{\prime}_{\left.\rho\right]}% =\frac{\partial x^{\nu}}{\partial y^{\lambda}}\frac{\partial x^{\sigma}}{% \partial y^{\rho}}\,\partial_{\left[x^{\nu}\right.}V_{\left.\sigma\right]}\,,

(59)

which is exactly what we want in order to have a $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensor. In writing the above, we have introduced the symbol $d$ which maps 1-forms to antisymmetric $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensors.

The second, somewhat simpler, object we want to examine is the partial derivative $\partial_{x^{\mu}}f$ of a function $f$ and its transformation rule. We already know from the rule (27) that under a change of coordinates

\displaystyle\partial_{y^{\mu}}f=\frac{\partial x^{\nu}}{\partial y^{\mu}}\,% \partial_{x^{\nu}}f

(60)

and this certainly agrees with the rule (52) for $r=0$ and $s=1$ . In a given coordinate system, we can therefore think of the partial derivatives of a function as the components of a $\left(\begin{smallmatrix}0\\ 1\end{smallmatrix}\right)$ tensor or 1-form. Using the coordinate basis, we can write,

\displaystyle df=\partial_{x^{\nu}}f\,dx^{\nu}\,.

(61)

We have once again used the symbol $d$ which maps functions to 1-forms.

We saw that the exterior derivative maps functions to 1-forms and 1-forms to a $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensor which is antisymmetric in its indices. As we will see shortly, antisymmetric objects are special when partial differentiation is concerned. For this reason we define,

Definition 2.9.

A $p$ -form is a tensor $w_{\mu_{1},\ldots,\mu_{p}}$ of rank $\left(\begin{smallmatrix}0\\ p\end{smallmatrix}\right)$ which is antisymmetric under the exchange of any two of its adjacent indices. For the set of all $p$ -forms defined on the differentiable manifold $M$ we use the symbol $\Lambda^{p}(M)$ .

Under this definition, the exterior derivative maps a 1-form to a 2-form. More generally, the exterior derivative of a $p$ -form $w_{\mu_{1},\ldots,\mu_{p}}$ is defined by

\displaystyle(dw)_{\mu_{1},\ldots,\mu_{p+1}}=(p+1)\,\partial_{\left[\mu_{1}% \right.}w_{\left.\mu_{2},\ldots,\mu_{p+1}\right]}\,,

(62)

which is obviously antisymmetric in all of its indices due to the antisymmetrisation and is therefore a $(p+1)$ -form. One can check that this object indeed transforms as a $\left(\begin{smallmatrix}0\\ p+1\end{smallmatrix}\right)$ tensor under coordinate transformations. The above shows that we can think of the exterior derivative as a map $d:\Lambda^{p}(M)\rightarrow\Lambda^{p+1}(M)$ . To make things unified, we can think of functions as $0$ -forms which are mapped to 1-forms by the exterior derivative.

A natural question to ask is what happens when we apply the exterior derivative on a $p$ -form $w$ twice. The definition (62) is telling us that we would have to antisymmetrise the partial derivatives which would give zero. To see why this is so, we can consider the second exterior derivative of a function,

\displaystyle(d(df))_{\mu\nu}=\partial_{\mu}(df)_{\nu}-\partial_{\nu}(df)_{\mu% }=\partial_{\mu}\partial_{\nu}f-\partial_{\nu}\partial_{\mu}f=0\,,

(63)

since the partial derivatives of a smooth function commute with each other.
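As a quick check of equations (58) and (63), the following worked example uses two dimensions with coordinates $(x,y)$. The particular 1-form and function are chosen here purely for illustration.

```latex
% Illustrative choice: V = -y\,dx + x\,dy, i.e. V_x = -y and V_y = x.
(dV)_{xy} = \partial_{x}V_{y}-\partial_{y}V_{x} = 1-(-1) = 2\,,
\qquad (dV)_{yx} = -2\,.

% Illustrative choice: f = x^{2}y, so that (df)_x = 2xy and (df)_y = x^{2}.
(d(df))_{xy} = \partial_{x}(x^{2})-\partial_{y}(2xy) = 2x-2x = 0\,.
```

The first line gives a non-vanishing 2-form, so this $V$ cannot be written as $df$ for any function $f$. The second line vanishes identically, as equation (63) requires.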

2.4.2 The covariant derivative

In the previous subsection we saw that we can define a derivative of $p$ -forms by using only their partial derivatives. This is certainly important as we did not have to introduce any additional structure. However, we would still like to be able to have a notion of differentiation for general tensors which are not necessarily forms. For this reason we will have to introduce the covariant derivative:

Definition 2.10.

A covariant derivative $\nabla$ is a map from $\left(\begin{smallmatrix}p\\ q\end{smallmatrix}\right)$ tensors to $\left(\begin{smallmatrix}p\\ q+1\end{smallmatrix}\right)$ tensors such that:

1.

$\nabla(T+S)=\nabla T+\nabla S$
2.

$\nabla(T\otimes S)=(\nabla T)\otimes S+T\otimes(\nabla S)$
3.

It commutes with contraction
4.

If $f$ is a function on $M$ then $\nabla f=df$

The above definition fixes the way that a covariant derivative acts on functions. As we will show shortly, its full action is completely determined by a set of numbers with three indices $\Gamma^{\lambda}_{\nu\mu}$ . These are called the connection coefficients and they are such that

\displaystyle\nabla e_{(\mu)}=\Gamma^{\lambda}_{\nu\mu}\,e_{(\lambda)}\otimes% \theta^{(\nu)}

(64)

telling us precisely how the covariant derivative acts on our chosen basis of the tangent space $T_{p}$ .

In order to discuss the action of the covariant derivative on a general vector field $V$ , we write it in terms of the basis vectors as $V=V^{\mu}\,e_{(\mu)}$ and we think of the components $V^{\mu}$ as functions. Using the first two properties along with the fourth one we can write

\displaystyle\nabla V=\nabla(V^{\mu}\,e_{(\mu)})=e_{(\mu)}\otimes dV^{\mu}+V^{% \mu}\otimes\nabla e_{(\mu)}\,.

(65)

In a coordinate basis we know that we can write $dV^{\mu}=\partial_{\nu}V^{\mu}\,dx^{\nu}$ and after using the definition (64) we have

\displaystyle\begin{aligned}\nabla V&=\partial_{\nu}V^{\mu}\,\partial_{\mu}\otimes dx^{\nu}+\Gamma^{\lambda}_{\nu\mu}V^{\mu}\,\partial_{\lambda}\otimes dx^{\nu}\\ &=(\partial_{\nu}V^{\mu}+\Gamma^{\mu}_{\nu\lambda}V^{\lambda})\,\partial_{\mu}\otimes dx^{\nu}\,.\end{aligned}

(66)

In the last equation we only renamed the repeated indices. From the above we can read off the components

\displaystyle\nabla_{\mu}V^{\nu}=\partial_{\mu}V^{\nu}+\Gamma^{\nu}_{\mu% \lambda}V^{\lambda}\,,

(67)

in a coordinate basis.
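To see equation (67) at work, consider the flat plane in polar coordinates $(r,\theta)$. The worked example below assumes the connection coefficients $\Gamma^{r}_{\theta\theta}=-r$ and $\Gamma^{\theta}_{r\theta}=\Gamma^{\theta}_{\theta r}=1/r$ (all others zero), which are quoted here without derivation, and takes $V$ to be the constant Cartesian field $\partial_{x}$ written in polar components.

```latex
% Assumed input: \Gamma^{r}_{\theta\theta} = -r, \Gamma^{\theta}_{r\theta} = \Gamma^{\theta}_{\theta r} = 1/r.
% Illustrative vector field: V = \partial_x, i.e. V^{r} = \cos\theta, V^{\theta} = -\sin\theta / r.
\nabla_{r}V^{r} = \partial_{r}V^{r} = 0\,,
\qquad
\nabla_{r}V^{\theta} = \partial_{r}V^{\theta}+\Gamma^{\theta}_{r\theta}V^{\theta}
 = \frac{\sin\theta}{r^{2}}-\frac{\sin\theta}{r^{2}} = 0\,,

\nabla_{\theta}V^{r} = \partial_{\theta}V^{r}+\Gamma^{r}_{\theta\theta}V^{\theta}
 = -\sin\theta+\sin\theta = 0\,,
\qquad
\nabla_{\theta}V^{\theta} = \partial_{\theta}V^{\theta}+\Gamma^{\theta}_{\theta r}V^{r}
 = -\frac{\cos\theta}{r}+\frac{\cos\theta}{r} = 0\,.
```

The partial derivatives of the components do not vanish on their own, since the polar basis vectors rotate from point to point. The connection terms compensate exactly, so that the constant field has vanishing covariant derivative.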

It might not be immediately obvious but the above properties also fix the action of the covariant derivative on 1-forms. To see this, we consider the contraction $\phi=V^{\nu}\omega_{\nu}$ which is a scalar function and we know that

\displaystyle\nabla_{\mu}\phi=\partial_{\mu}\phi=\partial_{\mu}V^{\nu}\,\omega% _{\nu}+V^{\nu}\,\partial_{\mu}\omega_{\nu}\,.

(68)

However, we know that the covariant derivative has to commute with contraction and we can also write,

\displaystyle\begin{aligned}\nabla_{\mu}\phi&=\nabla_{\mu}V^{\nu}\,\omega_{\nu}+V^{\nu}\,\nabla_{\mu}\omega_{\nu}\\ &=(\partial_{\mu}V^{\nu}+\Gamma^{\nu}_{\mu\lambda}V^{\lambda})\omega_{\nu}+V^{\nu}\,\nabla_{\mu}\omega_{\nu}\,,\end{aligned}

(69)

where we used equation (67). By simply comparing the two expressions for the covariant derivative of $\phi$ and by demanding that they hold for any vector $V$ we conclude that,

\displaystyle\nabla_{\mu}\omega_{\nu}=\partial_{\mu}\omega_{\nu}-\Gamma^{% \lambda}_{\mu\nu}\omega_{\lambda}\,.

(70)

The next thing we would like to know is how the covariant derivative acts on the dual basis 1-forms $\theta^{(\mu)}$ . To decide this, we revisit the covariant derivative of a general 1-form. This time we write it as

\displaystyle\begin{aligned}\nabla\omega&=\nabla(\omega_{\mu}\theta^{(\mu)})=\nabla\omega_{\mu}\otimes\theta^{(\mu)}+\omega_{\mu}\,\nabla\theta^{(\mu)}\\ &=\partial_{\nu}\omega_{\mu}\,\theta^{(\nu)}\otimes\theta^{(\mu)}+\omega_{\mu}\,\nabla\theta^{(\mu)}\,.\end{aligned}

(71)

By comparing the above with equation (70) we see that we must have,

\displaystyle\nabla\theta^{(\lambda)}=-\Gamma^{\lambda}_{\mu\nu}\,\theta^{(\nu% )}\otimes\theta^{(\mu)}\,.

(72)

The above result allows us to write the covariant derivative of any tensor. By writing the tensor in terms of the basis as in (49) we find that,

\displaystyle\begin{aligned}\nabla_{\lambda}T^{\mu_{1},\ldots,\mu_{p}}_{\nu_{1},\ldots,\nu_{q}}={}&\partial_{\lambda}T^{\mu_{1},\ldots,\mu_{p}}_{\nu_{1},\ldots,\nu_{q}}+\Gamma^{\mu_{1}}_{\lambda\rho}T^{\rho,\mu_{2},\ldots,\mu_{p}}_{\nu_{1},\ldots,\nu_{q}}+\cdots+\Gamma^{\mu_{p}}_{\lambda\rho}T^{\mu_{1},\ldots,\rho}_{\nu_{1},\ldots,\nu_{q}}\\ &-\Gamma^{\rho}_{\lambda\nu_{1}}T^{\mu_{1},\ldots,\mu_{p}}_{\rho,\ldots,\nu_{q}}-\cdots-\Gamma^{\rho}_{\lambda\nu_{q}}T^{\mu_{1},\ldots,\mu_{p}}_{\nu_{1},\ldots,\rho}\,.\end{aligned}

(73)

When we defined the connection coefficients in equation (64), we chose to not call the symbol $\Gamma^{\lambda}_{\nu\mu}$ a tensor, despite the fact that it carries three indices. However, it does transform under a coordinate transformation and we want to know the way this is happening. The basic requirement is that the covariant derivative of a vector transforms as a $\left(\begin{smallmatrix}1\\ 1\end{smallmatrix}\right)$ tensor. In a coordinate system different from the one used in e.g. (67) we would write something similar

\displaystyle\nabla_{\mu^{\prime}}\tilde{V}^{\nu^{\prime}}=\partial_{\mu^{\prime}}\tilde{V}^{\nu^{\prime}}+\tilde{\Gamma}^{\nu^{\prime}}_{\mu^{\prime}\lambda^{\prime}}\tilde{V}^{\lambda^{\prime}}\,.

(74)

By using the transformation rules (27) and (34) we can write

\displaystyle\partial_{\mu^{\prime}}\tilde{V}^{\nu^{\prime}}=\frac{\partial x^% {\rho}}{\partial x^{\mu^{\prime}}}\frac{\partial x^{\nu^{\prime}}}{\partial x^% {\sigma}}\partial_{\rho}V^{\sigma}+\frac{\partial x^{\rho}}{\partial x^{\mu^{% \prime}}}\frac{\partial^{2}x^{\nu^{\prime}}}{\partial x^{\rho}\,\partial x^{% \sigma}}V^{\sigma}\,.

(75)

Insisting that we should have

\displaystyle\nabla_{\mu^{\prime}}\tilde{V}^{\nu^{\prime}}=\frac{\partial x^{% \rho}}{\partial x^{\mu^{\prime}}}\frac{\partial x^{\nu^{\prime}}}{\partial x^{% \sigma}}\,\nabla_{\rho}V^{\sigma}\,,

(76)

for any vector $V$ we obtain,

\displaystyle\tilde{\Gamma}^{\nu^{\prime}}_{\mu^{\prime}\lambda^{\prime}}=\frac{\partial x^{\sigma}}{\partial x^{\lambda^{\prime}}}\frac{\partial x^{\rho}}{\partial x^{\mu^{\prime}}}\,\left[\frac{\partial x^{\nu^{\prime}}}{\partial x^{\gamma}}\Gamma^{\gamma}_{\rho\sigma}-\frac{\partial^{2}x^{\nu^{\prime}}}{\partial x^{\rho}\,\partial x^{\sigma}}\right]\,.

(77)

We see that the second term in the square bracket prevents the connection coefficients from forming a tensor but it works exactly in a way that makes the covariant derivative of a tensor another tensor.
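The inhomogeneous term in (77) can generate non-zero connection coefficients even when they vanish in the original coordinates. The worked example below assumes the flat plane with Cartesian coordinates $x^{\mu}=(x,y)$ in which $\Gamma^{\gamma}_{\rho\sigma}=0$, and transforms to polar coordinates $x^{\mu^{\prime}}=(r,\theta)$. The unit vectors $n^{i}$ and $t^{i}$ are notation introduced only for this example.

```latex
% Assumed setup: x = r\cos\theta, y = r\sin\theta, with vanishing Cartesian \Gamma.
\frac{\partial x^{i}}{\partial\theta} = r\,t^{i}\,,\quad t^{i} = (-\sin\theta,\cos\theta)\,,
\qquad
\frac{\partial^{2}r}{\partial x^{i}\,\partial x^{j}} = \frac{\delta_{ij}-n_{i}n_{j}}{r}\,,\quad
n^{i} = (\cos\theta,\sin\theta)\,,

\tilde{\Gamma}^{r}_{\theta\theta}
 = -\frac{\partial x^{i}}{\partial\theta}\frac{\partial x^{j}}{\partial\theta}\,
    \frac{\partial^{2}r}{\partial x^{i}\,\partial x^{j}}
 = -r^{2}\,t^{i}t^{j}\,\frac{\delta_{ij}-n_{i}n_{j}}{r}
 = -r\,.
```

Here we used $t^{i}t^{i}=1$ and $t^{i}n^{i}=0$. The same steps with $r$ replaced by $\theta$ in the second derivative give $\tilde{\Gamma}^{\theta}_{r\theta}=\tilde{\Gamma}^{\theta}_{\theta r}=1/r$, with all remaining coefficients vanishing.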

It is worth noticing that the “naughty” term above is independent of the connection coefficients themselves. This is telling us that if we had two different connection coefficients $\Gamma^{\lambda}_{\mu\nu}$ and $\bar{\Gamma}^{\lambda}_{\mu\nu}$ , their difference

\displaystyle S^{\lambda}_{\mu\nu}=\Gamma^{\lambda}_{\mu\nu}-\bar{\Gamma}^{% \lambda}_{\mu\nu}\,,

(78)

would be a $\left(\begin{smallmatrix}1\\ 2\end{smallmatrix}\right)$ tensor. A particular choice is $\bar{\Gamma}^{\lambda}_{\mu\nu}=\Gamma^{\lambda}_{\nu\mu}$ and then the tensor

\displaystyle T^{\lambda}_{\mu\nu}=\Gamma^{\lambda}_{\mu\nu}-\Gamma^{\lambda}_% {\nu\mu}\,,

(79)

is the torsion tensor.

We know that partial derivatives of smooth functions commute. For the covariant derivative on the other hand we have that

\displaystyle[\nabla_{\mu},\nabla_{\nu}]f=\nabla_{\mu}\nabla_{\nu}f-\nabla_{\nu}\nabla_{\mu}f=-T^{\lambda}_{\mu\nu}\nabla_{\lambda}f\,,

(80)

which is not necessarily zero. For the rest of the course we will assume that our covariant derivatives will have zero torsion.

We will now consider the commutator of derivatives on vectors and we write,

\displaystyle[\nabla_{\mu},\nabla_{\nu}]V^{\lambda}=\nabla_{\mu}\nabla_{\nu}V^% {\lambda}-\nabla_{\nu}\nabla_{\mu}V^{\lambda}\,.

(81)

Using the covariant differentiation rule (73) for $p=1$ and $q=1$ we can write,

\displaystyle\nabla_{\mu}\nabla_{\nu}V^{\lambda}=\partial_{\mu}\nabla_{\nu}V^{% \lambda}+\Gamma_{\mu\rho}^{\lambda}\nabla_{\nu}V^{\rho}-\Gamma_{\mu\nu}^{\rho}% \nabla_{\rho}V^{\lambda}\,.

(82)

After a little algebra one can show the surprising fact that the right hand side doesn’t contain any derivatives of the vector field. (Footnote 5: if we had not restricted our connection coefficients to have zero torsion, the final result would read $[\nabla_{\mu},\nabla_{\nu}]V^{\lambda}=R^{\lambda}{}_{\rho\mu\nu}V^{\rho}-T^{\rho}_{\mu\nu}\nabla_{\rho}V^{\lambda}$ .) In fact, it defines the Riemann tensor,

\displaystyle[\nabla_{\mu},\nabla_{\nu}]V^{\lambda}=R^{\lambda}{}_{\rho\mu\nu}% V^{\rho}\,,

(83)

with

\displaystyle R^{\lambda}{}_{\rho\mu\nu}=\partial_{\mu}\Gamma^{\lambda}_{\nu% \rho}-\partial_{\nu}\Gamma^{\lambda}_{\mu\rho}+\Gamma^{\lambda}_{\mu\sigma}% \Gamma^{\sigma}_{\nu\rho}-\Gamma^{\lambda}_{\nu\sigma}\Gamma^{\sigma}_{\mu\rho% }\,.

(84)

Regarding the commutator of derivatives, we can ask the same question for 1-forms giving the result

\displaystyle[\nabla_{\mu},\nabla_{\nu}]\omega_{\lambda}=-R^{\rho}{}_{\lambda% \mu\nu}\omega_{\rho}\,.

(85)

2.5 The metric

In previous sections we have discussed the tangent space $T_{p}$ which is a linear space. A natural additional structure to consider is the inner product in $T_{p}$ for all points $p\in M$ . This can be done by introducing the metric:

Definition 2.11.

The metric $g_{\mu\nu}$ is a $\left(\begin{smallmatrix}0\\ 2\end{smallmatrix}\right)$ tensor which is symmetric in its indices $g_{\mu\nu}=g_{\nu\mu}$ and is non-degenerate with $g=\det(g_{\mu\nu})\neq 0$ .

The inner product of two vectors $V,W\in T_{p}$ is then simply $g(V,W)=g(W,V)=g_{\mu\nu}V^{\nu}W^{\mu}$ . In the context of general relativity we will relax the usual positivity of the inner product. We define:

Definition 2.12.

The signature is the difference between the number of positive and negative eigenvalues of the metric. If all the eigenvalues are positive the metric is called Riemannian. If some of the eigenvalues are negative, the metric is called pseudo-Riemannian; the case relevant for relativity, with exactly one negative eigenvalue, is also called Lorentzian.

Notice that it is meaningful to discuss the signature of the spacetime globally. The fact that the metric is a non-degenerate tensor prevents any of its eigenvalues from flipping sign as we navigate through the manifold. According to relativity, spacetime will simply be a pseudo-Riemannian manifold. In the case of empty spacetime this is simply flat space equipped with the Minkowski metric (17). Similarly to Minkowski space, we call a vector $V^{\mu}$ timelike if $g(V,V)<0$ , null or lightlike if $g(V,V)=0$ and spacelike if $g(V,V)>0$ .
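The classification of vectors can be sketched in a few lines. The example below assumes that the Minkowski metric has components $\mathrm{diag}(-1,+1,+1,+1)$, which is the sign convention consistent with timelike vectors having $g(V,V)<0$. The function names and the tolerance are illustrative choices.

```python
def minkowski_norm(V):
    """g(V, V) for eta = diag(-1, +1, +1, +1) (assumed sign convention)."""
    return -V[0] ** 2 + sum(c ** 2 for c in V[1:])


def classify(V, tol=1e-12):
    """Return 'timelike', 'null' or 'spacelike' according to the sign of g(V, V)."""
    s = minkowski_norm(V)
    if abs(s) <= tol:
        return "null"
    return "timelike" if s < 0 else "spacelike"


at_rest = classify([1.0, 0.0, 0.0, 0.0])      # tangent of an observer at rest
light_ray = classify([1.0, 1.0, 0.0, 0.0])    # tangent of a light ray along x
separation = classify([0.0, 1.0, 0.0, 0.0])   # purely spatial displacement
```

The tolerance only guards against rounding in the null case. For a curved metric one would replace `minkowski_norm` by $g_{\mu\nu}V^{\mu}V^{\nu}$ evaluated at the point in question.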

In terms of notation, we will often write the components of the metric tensor in the form,

\displaystyle ds^{2}=g_{\mu\nu}(x^{\lambda})\,dx^{\mu}dx^{\nu}\,.

(86)

Moreover, since the metric is a non-degenerate tensor, we can define the inverse $\left(\begin{smallmatrix}2\\ 0\end{smallmatrix}\right)$ tensor $g^{\mu\nu}$ such that,

\displaystyle g^{\mu\lambda}g_{\lambda\nu}=\delta^{\mu}_{\nu}\,.

(87)

Now that we have the notion of length for the vectors in $T_{p}$ , we can use our experience from special relativity in section 1 to define the length $L[\gamma]$ of a curve $\gamma:\mathbb{R}\rightarrow M$ . In a general manifold our requirements for the definition of $L[\gamma]$ are that:

•

It has to be invariant under coordinate transformations
•

It has to be invariant under reparametrisations

The above are both satisfied by the definition

Definition 2.13.

The length $L[\gamma]$ of a curve $\gamma:\mathbb{R}\rightarrow M$ with parameter $\lambda$ and tangent vector $\dot{x}^{\mu}$ is,

\displaystyle L[\gamma]=\int_{\lambda_{i}}^{\lambda_{f}}\sqrt{|g_{\mu\nu}(x^{% \rho}(\lambda))\,\dot{x}^{\mu}(\lambda)\,\dot{x}^{\nu}(\lambda)|}\,d\lambda\,.

(88)

The above expression is manifestly invariant under coordinate transformations. Just like in the case of Minkowski space in equation (21), the square root is there to take care of reparametrisation invariance.
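Reparametrisation invariance of (88) can be checked numerically. The sketch below assumes the unit 2-sphere with line element $ds^{2}=d\theta^{2}+\sin^{2}\theta\,d\phi^{2}$ as an example metric, and measures a circle of constant $\theta$ using two different parametrisations. The integration scheme (midpoint rule with a finite-difference tangent) is an illustrative choice.

```python
import math


def sphere_metric(x):
    """Components g_{mu nu} of the unit 2-sphere in coordinates (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def curve_length(curve, lam_i, lam_f, metric, steps=2000, eps=1e-6):
    """Midpoint-rule approximation of equation (88)."""
    h = (lam_f - lam_i) / steps
    total = 0.0
    for k in range(steps):
        lam = lam_i + (k + 0.5) * h
        x = curve(lam)
        # Central-difference approximation of the tangent vector.
        xp, xm = curve(lam + eps), curve(lam - eps)
        xdot = [(a - b) / (2 * eps) for a, b in zip(xp, xm)]
        g = metric(x)
        speed2 = sum(g[m][n] * xdot[m] * xdot[n]
                     for m in range(2) for n in range(2))
        total += math.sqrt(abs(speed2)) * h
    return total


THETA0 = math.pi / 3


def latitude(lam):
    """Circle theta = THETA0, parametrised by phi = lam, lam in [0, 2 pi]."""
    return (THETA0, lam)


def latitude_reparam(u):
    """The same circle with phi = 2 pi u^2, u in [0, 1]."""
    return (THETA0, 2 * math.pi * u ** 2)


L1 = curve_length(latitude, 0.0, 2 * math.pi, sphere_metric)
L2 = curve_length(latitude_reparam, 0.0, 1.0, sphere_metric)
exact = 2 * math.pi * math.sin(THETA0)
```

Both parametrisations reproduce the expected circumference $2\pi\sin\theta_{0}$ up to numerical error, even though the tangent vectors $\dot{x}^{\mu}$ differ along the curve.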

The final fact we want to mention is true for any vector space with an inner product. In the context of differential geometry we refer to it as the “lowering” and “raising” of indices. The metric provides a natural isomorphism between $T_{p}$ and $T_{p}^{\ast}$ . This is simply the linear map $\phi:T_{p}\rightarrow T_{p}^{\ast}$ such that

\displaystyle(\phi(V))(W)=g(V,W)\,,

(89)

which indeed makes $\phi(V)$ a well defined linear functional acting on an arbitrary vector $W$ . In components we see that

\displaystyle\phi(V)_{\nu}=V_{\nu}\equiv g_{\nu\mu}V^{\mu}\,.

(90)

From now on we don’t need to write $\phi(V)$ , we will only write $V_{\mu}$ with its index down. Moreover, we can lower and raise the indices of any tensor e.g.

\displaystyle T^{\mu\nu\lambda}=g^{\sigma\lambda}T^{\mu\nu}{}_{\sigma},\qquad T% ^{\mu}{}_{\rho\sigma}=g_{\nu\rho}T^{\mu\nu}{}_{\sigma}\,.

(91)
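As a minimal worked example of lowering an index, take the Minkowski metric. The components $\mathrm{diag}(-1,+1,+1,+1)$ used below are an assumption about the sign convention, consistent with timelike vectors having $g(V,V)<0$.

```latex
% Assumption: \eta_{\mu\nu} = \mathrm{diag}(-1,+1,+1,+1).
V^{\mu} = (V^{0},V^{1},V^{2},V^{3})
\;\Rightarrow\;
V_{\mu} = \eta_{\mu\nu}V^{\nu} = (-V^{0},V^{1},V^{2},V^{3})\,,

g(V,V) = V_{\mu}V^{\mu} = -(V^{0})^{2}+(V^{1})^{2}+(V^{2})^{2}+(V^{3})^{2}\,.
```

Raising the index again with $\eta^{\mu\nu}$, which has the same components, returns the original $V^{\mu}$ in agreement with equation (87).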

2.6 Parallel transport and the Levi-Civita connection

An important concept in defining straight lines is that of parallel transporting a vector (or a tensor) along a particular curve. From a physics point of view, we can understand free motion as the motion in which the tangent vector (or velocity) does not change along a spacetime curve. Before moving to the physics aspects of this statement, we might want to better understand parallel transport itself.

This is particularly intuitive and technically easy to understand in Minkowski space when using Cartesian coordinates. Suppose that in that case we have a curve $x^{\mu}(\lambda)$ with parameter $\lambda$ . Suppose also that along the curve we have a vector $A^{\mu}(\lambda)$ which remains parallel to itself as we move along the curve. It is natural to write that the condition for parallel transport in this case is simply

\displaystyle\frac{d}{d\lambda}A^{\mu}(\lambda)=0\,.

(92)

In order to make further progress and be able to generalise this statement to a curved space, we consider a vector field $W^{\mu}(x^{\nu})$ such that when it is restricted on our curve, it is equal to the vector $A^{\mu}(\lambda)$ . In other words the vector field $W^{\mu}$ is such that

\displaystyle W^{\mu}(x^{\nu}(\lambda))=A^{\mu}(\lambda)\,.

(93)

The parallel transport condition then reads

\displaystyle\dot{x}^{\nu}\partial_{\nu}W^{\mu}=0\Rightarrow V^{\nu}\partial_{% \nu}W^{\mu}=0

(94)

where $V^{\nu}$ is the tangent vector to our curve. More generally, we will be frequently using the fact that along a curve $x^{\mu}(\lambda)$ we can always replace

\displaystyle V^{\mu}\partial_{\mu}\rightarrow\frac{d}{d\lambda}\,.

(95)

From the above we see that in a general manifold with connection $\nabla_{\nu}$ it makes sense to replace this condition by

\displaystyle V^{\nu}\nabla_{\nu}W^{\mu}=0\,.

(96)

For a tensor $T^{\mu_{1},\ldots,\mu_{p}}_{\nu_{1},\ldots,\nu_{q}}$ that is parallel transported along a curve $\gamma$ with tangent $V^{\mu}$ it makes sense to have,

\displaystyle V^{\lambda}\,\nabla_{\lambda}T^{\mu_{1},\ldots,\mu_{p}}_{\nu_{1}% ,\ldots,\nu_{q}}=0\,.

(97)
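Along a curve, equation (96) becomes the ordinary differential equation $\frac{d}{d\lambda}W^{\mu}+\Gamma^{\mu}_{\nu\rho}V^{\nu}W^{\rho}=0$, which can be integrated numerically. The sketch below assumes the flat plane in polar coordinates with $\Gamma^{r}_{\theta\theta}=-r$ and $\Gamma^{\theta}_{r\theta}=\Gamma^{\theta}_{\theta r}=1/r$ (quoted without derivation), transports a vector along a quarter circle, and converts the result back to Cartesian components. The RK4 integrator and all function names are illustrative choices.

```python
import math


def christoffel_polar(x):
    """Gamma[mu][nu][rho] for the flat plane in polar coordinates (r, theta)."""
    r, _theta = x
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r                      # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r    # Gamma^theta_{r theta}
    return G


def transport_rhs(x, V, W, gamma):
    """dW^mu/dlambda = -Gamma^mu_{nu rho} V^nu W^rho, from equation (96)."""
    G = gamma(x)
    n = len(W)
    return [-sum(G[m][a][b] * V[a] * W[b] for a in range(n) for b in range(n))
            for m in range(n)]


def parallel_transport(curve, tangent, W0, lam_i, lam_f, gamma, steps=2000):
    """Classical RK4 integration of the transport equation along the curve."""
    h = (lam_f - lam_i) / steps
    W = list(W0)

    def f(lam, w):
        return transport_rhs(curve(lam), tangent(lam), w, gamma)

    for k in range(steps):
        lam = lam_i + k * h
        k1 = f(lam, W)
        k2 = f(lam + h / 2, [w + h / 2 * d for w, d in zip(W, k1)])
        k3 = f(lam + h / 2, [w + h / 2 * d for w, d in zip(W, k2)])
        k4 = f(lam + h, [w + h * d for w, d in zip(W, k3)])
        W = [w + h / 6 * (a + 2 * b + 2 * c + d)
             for w, a, b, c, d in zip(W, k1, k2, k3, k4)]
    return W


R0 = 2.0                      # radius of the circle (arbitrary)
lam_end = math.pi / 2         # transport over a quarter of the circle


def circle(lam):
    return (R0, lam)


def circle_tangent(lam):
    return (0.0, 1.0)         # V^mu = dx^mu/dlambda for the curve above


# Start with the Cartesian unit vector e_x at theta = 0: (W^r, W^theta) = (1, 0).
W_end = parallel_transport(circle, circle_tangent, [1.0, 0.0],
                           0.0, lam_end, christoffel_polar)

# Back to Cartesian components at the end point.
Wx = W_end[0] * math.cos(lam_end) - R0 * W_end[1] * math.sin(lam_end)
Wy = W_end[0] * math.sin(lam_end) + R0 * W_end[1] * math.cos(lam_end)
```

The polar components change along the curve, but the Cartesian components stay at $(1,0)$, which is exactly the flat space condition (92) seen from a curvilinear coordinate system.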

In the previous section we introduced the notion of an inner product between vectors and vector fields. It is natural to ask the question of what happens with the inner product when we parallel transport two vector fields $A^{\mu}$ and $B^{\mu}$ along a curve $\gamma$ with tangent vector $V^{\mu}$ . We would certainly expect that we should have a constant inner product along $\gamma$ provided that

\displaystyle V^{\lambda}\nabla_{\lambda}A^{\mu}=0,\qquad V^{\lambda}\nabla_{% \lambda}B^{\mu}=0\,.

(98)

However, what we have instead is,

\displaystyle V^{\lambda}\nabla_{\lambda}g(A,B)=V^{\lambda}\nabla_{\lambda}(g_% {\mu\nu}A^{\mu}B^{\nu})=V^{\lambda}A^{\mu}B^{\nu}\,\nabla_{\lambda}g_{\mu\nu}\,,

(99)

which is not zero in general unless the covariant derivative of $g_{\mu\nu}$ vanishes. It is only then that we can make sense of parallel transport of vectors while maintaining their inner product.

Definition 2.14.

The metric singles out a unique torsion free covariant derivative. This is the metric compatible or Levi-Civita connection, defined by requiring

\displaystyle\nabla_{\lambda}g_{\mu\nu}=0\,.

(100)

In fact, as we will show, we can express the Levi-Civita connection $\Gamma^{\lambda}_{\mu\nu}$ in terms of the metric components. In order to do this, we write the defining condition three times with the three indices interchanged,

\displaystyle\begin{aligned}\nabla_{\lambda}g_{\mu\nu}&=\partial_{\lambda}g_{\mu\nu}-\Gamma^{\sigma}_{\lambda\mu}g_{\sigma\nu}-\Gamma^{\sigma}_{\lambda\nu}g_{\mu\sigma}=0\\ \nabla_{\mu}g_{\lambda\nu}&=\partial_{\mu}g_{\lambda\nu}-\Gamma^{\sigma}_{\lambda\mu}g_{\sigma\nu}-\Gamma^{\sigma}_{\mu\nu}g_{\lambda\sigma}=0\\ \nabla_{\nu}g_{\lambda\mu}&=\partial_{\nu}g_{\lambda\mu}-\Gamma^{\sigma}_{\lambda\nu}g_{\sigma\mu}-\Gamma^{\sigma}_{\mu\nu}g_{\lambda\sigma}=0\,.\end{aligned}

(101)

By subtracting the bottom two equations from the top one we find,

\displaystyle\Gamma^{\rho}_{\mu\nu}=\frac{1}{2}g^{\rho\lambda}\left(\partial_{% \mu}g_{\nu\lambda}+\partial_{\nu}g_{\lambda\mu}-\partial_{\lambda}g_{\mu\nu}% \right)\,,

(102)

which is a unique solution as promised. These are called the Christoffel symbols.
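Equation (102) translates directly into a small numerical routine. The sketch below assumes the unit 2-sphere metric $ds^{2}=d\theta^{2}+\sin^{2}\theta\,d\phi^{2}$ as a test case and approximates the derivatives of the metric by central differences. The function names and the step size are illustrative choices.

```python
import math


def sphere_metric(x):
    """Components g_{mu nu} of the unit 2-sphere in coordinates (theta, phi)."""
    theta, _phi = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(g):
    """Inverse of a 2x2 matrix, giving g^{mu nu} as in equation (87)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivative(metric, x, lam, eps=1e-5):
    """Central-difference approximation of d g_{mu nu} / d x^lam."""
    xp, xm = list(x), list(x)
    xp[lam] += eps
    xm[lam] -= eps
    gp, gm = metric(xp), metric(xm)
    return [[(gp[m][n] - gm[m][n]) / (2 * eps) for n in range(2)]
            for m in range(2)]


def christoffel(metric, x):
    """Gamma[rho][mu][nu] = Gamma^rho_{mu nu} from equation (102)."""
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, lam) for lam in range(2)]  # dg[a][m][n] = d_a g_{mn}
    return [[[0.5 * sum(ginv[rho][lam] * (dg[mu][nu][lam]
                                          + dg[nu][lam][mu]
                                          - dg[lam][mu][nu])
                        for lam in range(2))
              for nu in range(2)]
             for mu in range(2)]
            for rho in range(2)]


theta0 = 1.0
Gamma = christoffel(sphere_metric, (theta0, 0.3))
```

For this metric the only non-zero symbols are $\Gamma^{\theta}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}_{\theta\phi}=\Gamma^{\phi}_{\phi\theta}=\cot\theta$, and the numerical values agree with them up to the finite-difference error.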

2.7 Curvature

In this section we will examine curvature via the Riemann tensor that we defined in equation (84). We will do this for the Levi-Civita connection that we introduced in the previous section with connection coefficients given by the Christoffel symbols in equation (102). We now define $R_{\rho\sigma\mu\nu}=g_{\rho\lambda}R^{\lambda}{}_{\sigma\mu\nu}$ for which there are four algebraic properties which hold by construction:

•

$R_{\rho\sigma\mu\nu}=-R_{\sigma\rho\mu\nu}$ ,
•

$R_{\rho\sigma\mu\nu}=-R_{\rho\sigma\nu\mu}$ ,
•

$R_{\rho\left[\sigma\mu\nu\right]}=0\,,$ (or $R_{\rho\sigma\mu\nu}+R_{\rho\nu\sigma\mu}+R_{\rho\mu\nu\sigma}=0$ ),
•

$R_{\rho\sigma\mu\nu}=R_{\mu\nu\rho\sigma}$ .

It is worth noting that the last identity is not independent of the first three. The above identities significantly reduce the number of independent components of the Riemann tensor.
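The definition (84) and the first two symmetries can be checked on an explicit example. The sketch below assumes the Christoffel symbols of the unit 2-sphere, $\Gamma^{\theta}_{\phi\phi}=-\sin\theta\cos\theta$ and $\Gamma^{\phi}_{\theta\phi}=\Gamma^{\phi}_{\phi\theta}=\cot\theta$, and differentiates them numerically. The function names and the evaluation point are illustrative choices.

```python
import math


def gamma_sphere(x):
    """Gamma[lam][mu][nu] of the unit 2-sphere in coordinates (theta, phi)."""
    theta, _phi = x
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)            # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)  # Gamma^phi_{theta phi}
    return G


def d_gamma(gamma, x, mu, eps=1e-5):
    """Central-difference approximation of d_mu Gamma^lam_{a b}."""
    xp, xm = list(x), list(x)
    xp[mu] += eps
    xm[mu] -= eps
    Gp, Gm = gamma(xp), gamma(xm)
    return [[[(Gp[l][a][b] - Gm[l][a][b]) / (2 * eps) for b in range(2)]
             for a in range(2)]
            for l in range(2)]


def riemann(gamma, x):
    """R[lam][rho][mu][nu] = R^lam_{rho mu nu} from equation (84)."""
    n = 2
    G = gamma(x)
    dG = [d_gamma(gamma, x, mu) for mu in range(n)]  # dG[mu][lam][a][b]
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for lam in range(n):
        for rho in range(n):
            for mu in range(n):
                for nu in range(n):
                    val = dG[mu][lam][nu][rho] - dG[nu][lam][mu][rho]
                    for s in range(n):
                        val += (G[lam][mu][s] * G[s][nu][rho]
                                - G[lam][nu][s] * G[s][mu][rho])
                    R[lam][rho][mu][nu] = val
    return R


x0 = (1.0, 0.3)
R = riemann(gamma_sphere, x0)
g_phiphi = math.sin(x0[0]) ** 2

# Lower the first index with the diagonal metric (g_{theta theta} = 1).
R_low_0101 = 1.0 * R[0][1][0][1]        # R_{theta phi theta phi}
R_low_1001 = g_phiphi * R[1][0][0][1]   # R_{phi theta theta phi}
```

In two dimensions formula (103) predicts a single independent component, here $R_{\theta\phi\theta\phi}=\sin^{2}\theta$. The other non-zero entries follow from it through the antisymmetries listed above.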

For a generic $\left(\begin{smallmatrix}0\\ 4\end{smallmatrix}\right)$ tensor in $n$ -dimensions there are $n^{4}$ independent components as there are $n$ different choices for each index. However, the Riemann tensor satisfies three independent symmetry properties under the exchange of its indices and that significantly constrains the independent ones. From the first two properties, we see that it is antisymmetric under the exchange of the first two and the last two indices. This is telling us that for each pair of these indices there are $n(n-1)/2$ independent choices. By momentarily ignoring the third constraint this is telling us that we would have $n^{2}(n-1)^{2}/4$ independent components. Since the fourth property is not independent, we would only have to count how many constraints we need to impose due to the third property and subtract from the $n^{2}(n-1)^{2}/4$ components allowed from the first two.

In order to count these constraints we need to find how many choices we have for the first index $\rho$ , how many choices we have for the antisymmetric group of three indices $\left[\sigma\mu\nu\right]$ and multiply them. For the $\rho$ index we simply have $n$ choices. For the group of the three anti-symmetrised indices, we know that they all have to be different from each other, otherwise we would get something trivial due to the antisymmetry. Simple combinatorics then suggest that we have $\left(\begin{smallmatrix}n\\ 3\end{smallmatrix}\right)=\frac{n!}{3!\,(n-3)!}$ choices. Putting everything together, we conclude that the Riemann tensor has

\displaystyle\frac{n^{2}(n-1)^{2}}{4}-n\,\frac{n!}{3!\,(n-3)!}=\frac{1}{12}n^{% 2}(n^{2}-1)\,,

(103)

independent components.

We see that in one dimension the Riemann tensor is always trivial which is obvious from the fact that it has at least one antisymmetric pair of indices and there is no room to select different indices in one dimension. A more geometric way to see this is to notice that even if we have a non-trivial metric in one dimension $ds^{2}=f(x)\,dx^{2}$ , we can always find a coordinate transformation $y=y(x)$ to bring the metric to the flat form $ds^{2}=dy^{2}$ . We are particularly interested in the four dimensional case where we find that the Riemann tensor has $20$ independent components.
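The counting argument leading to (103) can be reproduced in a couple of lines. The sketch below follows the steps in the text: it counts the pairs of antisymmetric indices, subtracts the constraints from the cyclic identity, and compares with the closed form. The function names are illustrative.

```python
from math import comb


def riemann_components(n):
    """Independent components by the counting argument leading to (103)."""
    pairs = n * (n - 1) // 2           # choices for one antisymmetric index pair
    constraints = n * comb(n, 3)       # choices of rho times choices of [sigma mu nu]
    return pairs * pairs - constraints


def closed_form(n):
    """Right-hand side of equation (103)."""
    return n * n * (n * n - 1) // 12


counts = {n: riemann_components(n) for n in range(1, 7)}
```

The dictionary `counts` gives $0$, $1$, $6$ and $20$ components for $n=1,2,3,4$, in agreement with the closed form.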

Apart from the algebraic properties, the Riemann tensor satisfies the second Bianchi identity

\displaystyle\nabla_{\left[\lambda\right.}R_{\left.\rho\sigma\right]\mu\nu}=0\,,

(104)

which is a set of differential identities.

The Riemann tensor is going to be the object that will be important in any theory of gravity. It is built out of derivatives of the metric in a way that it forms a tensor and it is therefore geometric. In loose terms, it is going to be the kinetic term in the equations of motion of the metric which is a dynamical object itself. Even though the Riemann tensor seems to be an object which contains all the coordinate independent information, Einstein’s General Relativity is based on the Ricci tensor, which is defined by

Definition 2.15.

\displaystyle R_{\mu\nu}=R^{\lambda}{}_{\mu\lambda\nu}\,,

(105)

which is symmetric $R_{\mu\nu}=R_{\nu\mu}$ and has only $n(n+1)/2$ independent components in $n$ dimensions.

Another useful quantity we can define characterising the curvature of our geometry is the Ricci scalar,

Definition 2.16.

\displaystyle R=g^{\mu\nu}R_{\mu\nu}\,.

(106)

Finally, by using the above objects, we can define the Einstein tensor,

Definition 2.17.

\displaystyle G_{\mu\nu}=R_{\mu\nu}-\frac{1}{2}g_{\mu\nu}\,R\,,

(107)

for which the Bianchi identity (104) implies

\displaystyle\nabla^{\mu}G_{\mu\nu}=0\,.

(108)

This identity is one of the most important ones in this section. We will have the chance to appreciate it when we discuss Einstein’s equations for gravity in later sections.
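As a simple illustration of Definitions 2.15 to 2.17, consider the unit 2-sphere. The worked example below takes the line element and the two Riemann components quoted in the comments as assumed input; they are not derived in this section.

```latex
% Assumed input: ds^{2} = d\theta^{2} + \sin^{2}\theta\,d\phi^{2}, with
% R^{\theta}{}_{\phi\theta\phi} = \sin^{2}\theta and R^{\phi}{}_{\theta\phi\theta} = 1.
R_{\theta\theta} = R^{\lambda}{}_{\theta\lambda\theta} = R^{\phi}{}_{\theta\phi\theta} = 1\,,
\qquad
R_{\phi\phi} = R^{\lambda}{}_{\phi\lambda\phi} = R^{\theta}{}_{\phi\theta\phi} = \sin^{2}\theta\,,
\qquad
R_{\theta\phi} = 0\,,

R = g^{\theta\theta}R_{\theta\theta}+g^{\phi\phi}R_{\phi\phi} = 1+1 = 2\,,

G_{\mu\nu} = R_{\mu\nu}-\tfrac{1}{2}\,g_{\mu\nu}\,R = g_{\mu\nu}-g_{\mu\nu} = 0\,.
```

The Ricci tensor is proportional to the metric, $R_{\mu\nu}=g_{\mu\nu}$, and the Einstein tensor vanishes. This is a special feature of two dimensions, where $G_{\mu\nu}$ vanishes identically for any metric, so the identity (108) is trivially satisfied in this example.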

2.8 Integration on manifolds

Several quantities like the action and conserved charges in physics require a notion of integration. Something quite tempting is to define the integral of a function according to,

\displaystyle I=\int_{\phi_{1}(U)}d^{n}x\,f(x^{\mu})\,,

(109)

for a given coordinate system $(U,\phi_{1})$ with coordinate $x^{\mu}$ . Someone that prefers to work in a different coordinate system $(U,\phi_{2})$ with coordinates $y^{\nu}=y^{\nu}(x^{\mu})$ should be able to use the same definition and find the same number. In their coordinate system they would write,

\displaystyle I^{\prime}=\int_{\phi_{2}(U)}d^{n}y\,f(y^{\nu})=\int_{\phi_{1}(U% )}d^{n}x\left|\frac{\partial y^{\nu}}{\partial x^{\mu}}\right|f(y^{\nu}(x^{\mu% }))\neq I\,.

(110)

The above shows that we need to come up with something slightly more sophisticated than integrating functions in a particular coordinate system. The quantity that shows up and spoils things for us is the Jacobian determinant of the coordinate transformation.

The next thing we want to try is to integrate $n$ -forms on an $n$ dimensional manifold. Before doing that, let’s see how an $n$ -form transforms under coordinate transformations in more detail. Because an $n$ -form $\omega_{\mu_{1},\ldots,\mu_{n}}$ is totally antisymmetric, all the components are parametrised by a single number according to

\displaystyle\omega_{\mu_{1},\ldots,\mu_{n}}=\Omega\,\varepsilon_{\mu_{1},% \ldots,\mu_{n}}\,,

(111)

where $\varepsilon_{\mu_{1},\ldots,\mu_{n}}$ is the totally antisymmetric symbol with $\varepsilon_{0,\ldots,n-1}=1$ . In a different coordinate system with coordinates $y^{\mu}$ we will have that

\displaystyle\omega^{\prime}_{\mu^{\prime}_{1},\ldots,\mu^{\prime}_{n}}=\Omega% ^{\prime}\,\varepsilon_{\mu^{\prime}_{1},\ldots,\mu^{\prime}_{n}}\,.

(112)

However, we know from the transformation rules (52) for $p=0$ and $q=n$ that we must have

$\displaystyle\omega^{\prime}_{\mu^{\prime}_{1},\ldots,\mu^{\prime}_{n}}$	$\displaystyle=\frac{\partial x^{\mu_{1}}}{\partial y^{\mu^{\prime}_{1}}}\cdots% \frac{\partial x^{\mu_{n}}}{\partial y^{\mu^{\prime}_{n}}}\,\omega_{\mu_{1},% \ldots,\mu_{n}}\Rightarrow$
$\displaystyle\Omega^{\prime}\,\varepsilon_{\mu^{\prime}_{1},\ldots,\mu^{\prime% }_{n}}$	$\displaystyle=\frac{\partial x^{\mu_{1}}}{\partial y^{\mu^{\prime}_{1}}}\cdots% \frac{\partial x^{\mu_{n}}}{\partial y^{\mu^{\prime}_{n}}}\,\Omega\,% \varepsilon_{\mu_{1},\ldots,\mu_{n}}\Rightarrow$
$\displaystyle\Omega^{\prime}$	$\displaystyle=\left\|\frac{\partial x^{\mu}}{\partial y^{\nu}}\right\|\,\Omega\,.$	(113)

We see that the components of $n$ -forms transform with the inverse of the Jacobian determinant. This allows us to write the integral of an $n$ -form

\displaystyle I=\int_{M}\omega\equiv\int_{\phi(M)}d^{n}x\,\Omega(x^{\mu})\,,

(114)

and the previous discussion shows that the outcome of integration will be coordinate system independent.
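The coordinate independence can be made explicit in one line. The sketch below assumes that the change of coordinates preserves orientation, so that the Jacobian determinant is positive and coincides with its absolute value.

```latex
% Change variables from y to x as in (110), then insert the transformation (113):
\int_{\phi_{2}(U)}d^{n}y\,\Omega^{\prime}
=\int_{\phi_{1}(U)}d^{n}x\,\left|\frac{\partial y^{\nu}}{\partial x^{\mu}}\right|\,
\left\|\frac{\partial x^{\mu}}{\partial y^{\nu}}\right\|\,\Omega
=\int_{\phi_{1}(U)}d^{n}x\,\Omega\,.
% The two determinants are inverse to each other, so the Jacobian factor
% that spoiled (110) cancels.
```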

We see that even though we cannot define the integral of a scalar function, we can still define the integrals of $n$ -forms. However, we would still like to find a notion in which we can integrate functions. Before attempting that, we need to mention a universal $n$ -form which exists for all manifolds equipped with a metric $g_{\mu\nu}$ . This is the volume form $\mathrm{vol}$ with components,

\displaystyle\mathrm{vol}_{\mu_{1},\ldots,\mu_{n}}=\sqrt{|g|}\,\varepsilon_{% \mu_{1},\ldots,\mu_{n}}\,,

(115)

where $g$ is the determinant of the metric tensor. For a Lorentzian metric, which has a single negative eigenvalue, the determinant is negative and the absolute value can be removed by introducing a minus sign i.e. $|g|=-g$ .

After this definition, we can think of the integral of a scalar function $f$ as

\displaystyle I=\int_{M}f\,\mathrm{vol}=\int_{\phi(M)}d^{n}x\,\sqrt{-g(x^{\mu}% )}\,f(x^{\mu})\,.

(116)
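As a simple check of (115) and (116) in a Riemannian example (so that $\sqrt{|g|}$ appears instead of $\sqrt{-g}$ ), one can take $f=1$ on a round two-sphere. The radius $a$ and the angular coordinates $(\theta,\varphi)$ are introduced here only for illustration.

```latex
% Round two-sphere of radius a:
ds^{2}=a^{2}\left(d\theta^{2}+\sin^{2}\theta\,d\varphi^{2}\right)
\quad\Rightarrow\quad
\sqrt{|g|}=a^{2}\sin\theta\,,

% Integrating f=1 against the volume form gives the familiar area:
\int_{S^{2}}\mathrm{vol}=\int_{0}^{2\pi}d\varphi\int_{0}^{\pi}d\theta\,a^{2}\sin\theta=4\pi\,a^{2}\,.
```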

3 Free Particles and Fields

3.1 Free particles

As we mentioned in section 1, Einstein’s idea about gravity is that we experience gravity by moving on straight lines inside a non-trivial curved background. One might try to use two seemingly different definitions for when a curve $\gamma:\mathbb{R}\rightarrow M$ is a straight line. The two proposals are that a straight line:

•

extremises its length $L[\gamma]$ as defined in equation (25)
•

has its tangent vector $V^{\mu}$ parallel transported along its trajectory i.e. there is no acceleration and we have to satisfy the geodesic equation

$\displaystyle V^{\mu}\nabla_{\mu}V^{\nu}=\alpha\,V^{\nu}\,,$ (117)

with $\alpha$ some scalar function.

We might be surprised by the potentially non-trivial right hand side in equation (117). After all, when we defined parallel transport in equation (97) the right hand side was trivial. The key point is that equation (97) is invariant under reparametrisations of the curve since those do not alter the tensor that is being transported. The difference in this case is that the tensor that is parallel transported coincides with the tangent vector and that depends on the parametrisation of the curve.

To see this, suppose that we change the parameter $\lambda$ of our curve to $\tau$ according to $\lambda=f(\tau)$ . The corresponding tangent vectors are $V^{\mu}=\frac{d}{d\lambda}x^{\mu}(\lambda)$ and $Y^{\mu}=\frac{d}{d\tau}x^{\mu}(\tau)$ . By using the chain rule, we can show that these are related by,

\displaystyle Y^{\mu}=f^{\prime}\,V^{\mu}\,.

(118)

If we satisfy the geodesic equation (117) in the $\tau$ parametrisation with $\alpha=\beta$ then by using the replacement rule (95) we can write,

$\displaystyle Y^{\mu}\nabla_{\mu}Y^{\nu}$	$\displaystyle=\beta\,Y^{\nu}\Rightarrow$
$\displaystyle f^{\prime}\,V^{\mu}\nabla_{\mu}(f^{\prime\,}V^{\nu})$	$\displaystyle=\beta\,f^{\prime}\,V^{\nu}\Rightarrow$
$\displaystyle f^{\prime\prime}\,V^{\nu}+(f^{\prime})^{2}\,V^{\mu}\nabla_{\mu}V% ^{\nu}$	$\displaystyle=\beta\,f^{\prime}\,V^{\nu}\Rightarrow$	(119)
$\displaystyle V^{\mu}\nabla_{\mu}V^{\nu}$	$\displaystyle=\frac{f^{\prime}\,\beta-f^{\prime\prime}}{(f^{\prime})^{2}}\,V^{% \nu}\,,$	(120)

from where we read off that $\alpha=\frac{f^{\prime}\,\beta-f^{\prime\prime}}{(f^{\prime})^{2}}$ for the $\lambda$ parametrisation and therefore $\alpha$ depends on our parameter choice. This is telling us that we can choose it so that $\alpha=0$ in the geodesic equation (117).

Definition 3.1.

The parametrisation for which the geodesic equation (117) has $\alpha=0$ is called affine.
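An affine parameter can be constructed explicitly from any other parameter. The following sketch starts from a parametrisation $\tau$ in which the geodesic equation holds with $\alpha=\beta(\tau)$ and solves for the function $f$ of equation (118); the integration constants $C\neq 0$ and $D$ are labels chosen here.

```latex
% Setting \alpha=0 in (120) gives a first-order equation for f^{\prime}(\tau):
f^{\prime\prime}=\beta\,f^{\prime}
\quad\Rightarrow\quad
f^{\prime}(\tau)=C\,\exp\left(\int^{\tau}d\tau^{\prime}\,\beta(\tau^{\prime})\right)\,,

% and one more integration gives the affine parameter itself:
\lambda=f(\tau)=C\int^{\tau}d\tau^{\prime}\,\exp\left(\int^{\tau^{\prime}}d\tau^{\prime\prime}\,\beta(\tau^{\prime\prime})\right)+D\,.
```

In particular, if $\beta=0$ already, the remaining freedom is $\lambda=C\,\tau+D$ : affine parameters are only defined up to a constant rescaling and a shift.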

The geodesic equation (117) when written in coordinates reads,

	$\displaystyle V^{\mu}\,\partial_{\mu}V^{\nu}+\Gamma^{\nu}_{\mu\lambda}V^{\mu}V% ^{\lambda}$	$\displaystyle=\alpha\,V^{\nu}\rightarrow$
	$\displaystyle\ddot{x}^{\nu}(\lambda)+\Gamma^{\nu}_{\mu\lambda}(x^{\rho}(% \lambda))\dot{x}^{\mu}(\lambda)\dot{x}^{\lambda}(\lambda)$	$\displaystyle=\alpha\,\dot{x}^{\nu}(\lambda)\,,$		(121)

where we used the replacement rule (95). In order to show the equivalence between the two proposals for the trajectory of a free particle, we can also extremise the length (25) with respect to the coordinate $x^{\mu}(\lambda)$ of the curve to find the same equation with,

\displaystyle\alpha=\frac{d}{d\lambda}\ln\sqrt{|g_{\mu\nu}\dot{x}^{\mu}\dot{x}% ^{\nu}|}\,.

(122)

Notice that we can show the above expression for $\alpha$ starting from the geodesic equation (117) and contracting both sides with the covector $V_{\nu}=g_{\nu\mu}V^{\mu}$ .

From the above expression for $\alpha$ we see that in the affine parametrisation the norm of the tangent vector remains invariant throughout the whole motion of the particle.
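The contraction argument can be sketched as follows, assuming metric compatibility $\nabla_{\mu}g_{\nu\rho}=0$ and a tangent vector which is not null, so that the logarithm is well defined.

```latex
% Contract (117) with V_{\nu}; the left hand side is a total derivative along the curve:
V_{\nu}\,V^{\mu}\nabla_{\mu}V^{\nu}
=\frac{1}{2}\,V^{\mu}\nabla_{\mu}\left(V_{\nu}V^{\nu}\right)
=\frac{1}{2}\,\frac{d}{d\lambda}\left(g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu}\right)
=\alpha\,g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu}

% Dividing by g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu} reproduces (122):
\quad\Rightarrow\quad
\alpha=\frac{d}{d\lambda}\ln\sqrt{|g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu}|}\,.
```

Setting $\alpha=0$ then states directly that $g_{\mu\nu}\dot{x}^{\mu}\dot{x}^{\nu}$ is constant along the curve.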

A question we would like to address is whether there is an action principle for the curve $x^{\mu}(\tau)$ from which we can obtain the geodesic equation (117) in an affine parametrisation with $\alpha=0$ . One can easily check that the action

\displaystyle L^{\prime}[x^{\mu}]=\int d\tau\,g_{\mu\nu}(x^{\lambda}(\tau))\,V% ^{\mu}(\tau)\,V^{\nu}(\tau)\,,

(123)

yields the correct equations of motion. As we might have expected, this is no longer invariant under reparametrisations as the parameter has been chosen to be affine.
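The check can be sketched with the Euler-Lagrange equations, writing $\dot{x}^{\mu}=V^{\mu}$ and assuming that equation (102) is the usual expression of the Christoffel symbols in terms of first derivatives of the metric.

```latex
% Lagrangian of (123): \mathcal{L}=g_{\mu\nu}(x)\,\dot{x}^{\mu}\dot{x}^{\nu}
\frac{\partial\mathcal{L}}{\partial\dot{x}^{\sigma}}=2\,g_{\sigma\nu}\,\dot{x}^{\nu}\,,\qquad
\frac{\partial\mathcal{L}}{\partial x^{\sigma}}=\partial_{\sigma}g_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}\,,

% Euler-Lagrange equation divided by 2, with the \partial g\,\dot{x}\dot{x} term symmetrised:
g_{\sigma\nu}\,\ddot{x}^{\nu}
+\frac{1}{2}\left(\partial_{\mu}g_{\sigma\nu}+\partial_{\nu}g_{\sigma\mu}-\partial_{\sigma}g_{\mu\nu}\right)\dot{x}^{\mu}\dot{x}^{\nu}=0
\quad\Rightarrow\quad
\ddot{x}^{\lambda}+\Gamma^{\lambda}_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}=0\,.
```

This is equation (121) with $\alpha=0$ , as claimed.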

3.2 The Newtonian limit

A natural question to ask is what happens in a limit where our particles move at small velocity, with $\dot{x}^{i}\ll\dot{x}^{0}$ , in a very weak, almost static gravitational background. (For a particle which is not “moving” at all in space in our coordinate system we would have $\dot{x}^{i}=0$ and $\dot{x}^{0}=c$ .) A weak field means that the background metric is infinitesimally close to that of Minkowski spacetime, so it takes the form,

\displaystyle g_{\mu\nu}=\eta_{\mu\nu}+h_{\mu\nu}\,,

(124)

with $h_{\mu\nu}$ a small correction. Moreover, since our background is almost static, we practically assume that the time partial derivatives are much smaller than the spatial ones i.e. $\partial_{0}h_{\mu\nu}\ll\partial_{i}h_{\mu\nu}$ .

We now examine the geodesic motion of our free particle by choosing an affine parameter $\tau$ in equation (121),

$\displaystyle\ddot{x}^{i}+\Gamma^{i}_{\mu\nu}\,\dot{x}^{\mu}\dot{x}^{\nu}=$	$\displaystyle 0\Rightarrow$
$\displaystyle\ddot{x}^{i}+\Gamma^{i}_{00}\,\dot{x}^{0}\dot{x}^{0}=$	$\displaystyle 0\Rightarrow$
$\displaystyle\ddot{x}^{i}+\Gamma^{i}_{00}c^{2}=$	$\displaystyle 0\,.$	(125)

In the above equation we dropped the subleading terms which contain $\dot{x}^{i}$ and we approximated $\dot{x}^{0}\approx c$ which is true for slowly moving particles. By using equation (102) for the Christoffel symbols we can write,

	$\displaystyle\Gamma^{i}_{00}$	$\displaystyle=\frac{1}{2}g^{ij}\left(\partial_{0}g_{0j}+\partial_{0}g_{0j}-% \partial_{j}g_{00}\right)\Rightarrow$
	$\displaystyle\Gamma^{i}_{00}$	$\displaystyle\approx-\frac{1}{2}\eta^{ij}\partial_{j}h_{00}\,,$		(126)

where we dropped all the partial derivatives in time since our background is almost static.

Setting $h_{00}=-2\Phi$ , we have the equation of motion

\displaystyle\frac{d^{2}}{d\tau^{2}}x^{i}=-c^{2}\,\partial_{i}\Phi\Rightarrow% \frac{d^{2}}{dt^{2}}x^{i}=-\,\partial_{i}\Phi

(127)

which has precisely the form of Newton’s equation (3) for the motion of a massive particle in a gravitational potential $\Phi$ .

At this point we need to stress that the equivalence principle we discussed in section 1 was crucial to find this kind of agreement. If the mass of inertia were not the same as the gravitational mass, Newton’s law would give an acceleration that would depend on their ratio and agreement with Einstein’s theory would not be possible.

3.3 Classical fields and the stress tensor

In the previous section we dealt with free particles starting from our intuition in flat space and in cartesian coordinates. The key step was to replace,

\displaystyle V^{\mu}\partial_{\mu}T\rightarrow V^{\mu}\nabla_{\mu}T\,.

(128)

For a general field theory, the steps to understand it in curved spacetimes are very similar: the rule is to replace the partial derivatives of Minkowski space in cartesian coordinates by covariant derivatives,

\displaystyle\partial_{\mu}\rightarrow\nabla_{\mu}\,.

(129)

The simplest example we can think of is a scalar field $\phi$ with a potential $V(\phi)$ . In Minkowski spacetime with cartesian coordinates the equation of motion reads

\displaystyle\partial_{\mu}\partial^{\mu}\phi-V^{\prime}(\phi)=0\,.

(130)

In a more general spacetime this should be replaced by

\displaystyle\nabla_{\mu}\nabla^{\mu}\phi-V^{\prime}(\phi)=0\,.

(131)

Note that this has nothing to do with curvature; it is all about writing the correct equation of motion in a coordinate free fashion.

For the case of the Maxwell field $A_{\mu}$ , the equations of motion in the cartesian coordinates of Minkowski read,

	$\displaystyle\partial_{\mu}F^{\mu\nu}$	$\displaystyle=0\,,$
	$\displaystyle F_{\mu\nu}$	$\displaystyle=\partial_{\mu}A_{\nu}-\partial_{\nu}A_{\mu}\,.$		(132)

According to our previous discussion these should be replaced by

	$\displaystyle\nabla_{\mu}F^{\mu\nu}$	$\displaystyle=0\,,$
	$\displaystyle F_{\mu\nu}$	$\displaystyle=\nabla_{\mu}A_{\nu}-\nabla_{\nu}A_{\mu}=\partial_{\mu}A_{\nu}-% \partial_{\nu}A_{\mu}\,.$		(133)

Turning now our attention to the action for e.g. the scalar field we have that

\displaystyle S_{scalar}=\int d^{n}x\left(-\frac{1}{2}\partial_{\mu}\phi% \partial^{\mu}\phi-V(\phi)\right)\rightarrow\int d^{n}x\sqrt{-g}\left(-\frac{1% }{2}\nabla_{\mu}\phi\nabla^{\mu}\phi-V(\phi)\right)\,,

(134)

while for the Maxwell field we have

\displaystyle S_{Maxwell}=\int d^{n}x\,\left(-\frac{1}{4}F^{\mu\nu}F_{\mu\nu}% \right)\rightarrow\int d^{n}x\sqrt{-g}\,\left(-\frac{1}{4}F^{\mu\nu}F_{\mu\nu}% \right)\,.

(135)

A good exercise is to show that the extremisation of the actions (134) and (135) indeed produces the equations of motion (131) and (133) respectively.
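For the scalar field the computation can be sketched as follows, assuming that the variation $\delta\phi$ vanishes on the boundary of the integration region so that the total derivative can be dropped.

```latex
% Vary the curved-space action in (134) with respect to \phi at fixed metric:
\delta_{\phi}S_{scalar}
=\int d^{n}x\,\sqrt{-g}\left(-\nabla^{\mu}\phi\,\nabla_{\mu}\delta\phi-V^{\prime}(\phi)\,\delta\phi\right)

% Integrate the first term by parts:
=\int d^{n}x\,\sqrt{-g}\left(\nabla_{\mu}\nabla^{\mu}\phi-V^{\prime}(\phi)\right)\delta\phi
-\int d^{n}x\,\sqrt{-g}\,\nabla_{\mu}\left(\delta\phi\,\nabla^{\mu}\phi\right)\,.
```

The last integral is a boundary term, and demanding that the first one vanishes for arbitrary $\delta\phi$ gives equation (131).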

From the above we see that, for a fixed background metric, the action of e.g. a scalar field $\phi$ is also a functional of the metric $S[g_{\mu\nu},\phi]$ according to

\displaystyle S_{matter}[g_{\mu\nu},\phi]=\int d^{n}x\,\mathcal{L}(g_{\mu\nu},\phi,\nabla_{\mu}\phi)\,,

(136)

with $\mathcal{L}$ being the Lagrangian density. The equation of motion for the scalar is then given by

\displaystyle\frac{\delta S_{matter}}{\delta\phi(x^{\mu})}=0\,.

(137)

We now want to do something that might look weird.

For the time being, the metric does not satisfy any equations of motion, it is just a fixed background. We now want to vary the action with respect to both the field $\phi$ as well as the background metric to find,

\displaystyle\delta S_{matter}[g_{\mu\nu},\phi]=\int d^{n}x\,\left[\frac{% \delta S}{\delta g_{\mu\nu}(x^{\lambda})}\,\delta g_{\mu\nu}(x^{\lambda})+% \frac{\delta S}{\delta\phi(x^{\lambda})}\,\delta\phi(x^{\lambda})\right]\,.

(138)

After evaluating the above variation on a solution of the field equations of motion the second term drops out and we find,

\displaystyle\delta S_{matter}[g_{\mu\nu},\phi]=\frac{1}{2}\int d^{n}x\,\sqrt{% -g}\,T_{matter}^{\mu\nu}\delta g_{\mu\nu}\,,

(139)

where we defined the stress-energy tensor,

\displaystyle T_{matter}^{\mu\nu}(x^{\lambda})=2\frac{1}{\sqrt{-g(x^{\lambda})% }}\,\frac{\delta S_{matter}}{\delta g_{\mu\nu}(x^{\lambda})}\,,

(140)

which is symmetric in its indices. This is an important tensor and as we will show it always satisfies

\displaystyle\nabla_{\mu}T^{\mu\nu}_{matter}=0\,.

(141)

In order to show this, we consider a small change of coordinates,

\displaystyle x^{\mu\prime}=x^{\mu}-\varepsilon\,\xi^{\mu}(x^{\nu})\Rightarrow x% ^{\mu}=x^{\mu\prime}+\varepsilon\,\xi^{\mu}(x^{\nu\prime})\,,

(142)

with $0<\varepsilon\ll 1$ . We now want to write an expression for the change of the metric components $\delta g_{\mu\nu}(x^{\lambda\prime})$ up to order $\mathcal{O}(\varepsilon)$ . In order to do this we start from the general rule (52) for $p=0$ and $q=2$ :

$\displaystyle g^{\prime}_{\mu\nu}(x^{\lambda\prime})$	$\displaystyle=\frac{\partial x^{\sigma}}{\partial x^{\mu\prime}}\frac{\partial x% ^{\rho}}{\partial x^{\nu\prime}}\,g_{\rho\sigma}(x^{\lambda})$
	$\displaystyle=\left(\delta^{\sigma}_{\mu}+\varepsilon\,\partial^{\prime}_{\mu}% \xi^{\sigma}\right)\,\left(\delta^{\rho}_{\nu}+\varepsilon\,\partial^{\prime}_% {\nu}\xi^{\rho}\right)\,g_{\sigma\rho}(x^{\lambda})$
	$\displaystyle\approx g_{\mu\nu}(x^{\lambda})+\varepsilon\,\left(g_{\sigma\nu}% \partial_{\mu}\xi^{\sigma}+g_{\mu\rho}\partial_{\nu}\xi^{\rho}\right)$
	$\displaystyle=g_{\mu\nu}(x^{\lambda\prime}+\varepsilon\,\xi^{\lambda})+% \varepsilon\,\left(g_{\sigma\nu}\partial_{\mu}\xi^{\sigma}+g_{\mu\rho}\partial% _{\nu}\xi^{\rho}\right)$
	$\displaystyle\approx g_{\mu\nu}(x^{\lambda\prime})+\varepsilon\,\left(\xi^{% \sigma}\partial_{\sigma}g_{\mu\nu}+g_{\sigma\nu}\partial_{\mu}\xi^{\sigma}+g_{% \mu\rho}\partial_{\nu}\xi^{\rho}\right)\Rightarrow$
$\displaystyle\delta g_{\mu\nu}$	$\displaystyle=\varepsilon\,\left(\xi^{\sigma}\partial_{\sigma}g_{\mu\nu}+g_{% \sigma\nu}\partial_{\mu}\xi^{\sigma}+g_{\mu\rho}\partial_{\nu}\xi^{\rho}\right)$
$\displaystyle\delta g_{\mu\nu}$	$\displaystyle=\varepsilon\,2\,\nabla_{\left(\mu\right.}\xi_{\left.\nu\right)}\,.$	(143)
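The last step of (143) can be checked by expanding the covariant derivatives. The sketch below assumes the Levi-Civita connection and the convention $\nabla_{(\mu}\xi_{\nu)}=\frac{1}{2}(\nabla_{\mu}\xi_{\nu}+\nabla_{\nu}\xi_{\mu})$ for the symmetrisation brackets.

```latex
% Expand the symmetrised covariant derivative of \xi_{\nu}=g_{\nu\sigma}\xi^{\sigma}:
2\,\nabla_{(\mu}\xi_{\nu)}
=\partial_{\mu}\xi_{\nu}+\partial_{\nu}\xi_{\mu}-2\,\Gamma^{\lambda}_{\mu\nu}\,\xi_{\lambda}\,,

% with 2\,\Gamma^{\lambda}_{\mu\nu}\,\xi_{\lambda}
%   =\xi^{\sigma}\left(\partial_{\mu}g_{\sigma\nu}+\partial_{\nu}g_{\sigma\mu}-\partial_{\sigma}g_{\mu\nu}\right),
% the derivatives of the metric coming from \partial_{\mu}\xi_{\nu}+\partial_{\nu}\xi_{\mu} cancel against the first two terms:
2\,\nabla_{(\mu}\xi_{\nu)}
=g_{\sigma\nu}\,\partial_{\mu}\xi^{\sigma}+g_{\mu\sigma}\,\partial_{\nu}\xi^{\sigma}+\xi^{\sigma}\,\partial_{\sigma}g_{\mu\nu}\,.
```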

We saw that the variation of the on-shell action with respect to the metric yields the stress tensor in equation (139), independently of the variation we are performing. We will now consider the variation generated by the small change of coordinates (142). Since the action is defined through a coordinate independent integral, for this particular variation we must have,

$\displaystyle\delta S_{matter}[g_{\mu\nu},\phi]$	$\displaystyle=0\Rightarrow$
$\displaystyle\varepsilon\,\int d^{n}x\,\sqrt{-g}\,T_{matter}^{\mu\nu}\,\nabla_% {\left(\mu\right.}\xi_{\left.\nu\right)}$	$\displaystyle=0\Rightarrow$
$\displaystyle\int d^{n}x\,\sqrt{-g}\,\xi_{\nu}\,\nabla_{\mu}T_{matter}^{\mu\nu}$	$\displaystyle=0\,,$	(144)

for any smooth vector $\xi^{\mu}$ and therefore we must have (141).
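As an illustration of the definition (140), the stress tensor of the scalar field with action (134) can be sketched as follows. The computation uses the identities $\delta\sqrt{-g}=\frac{1}{2}\sqrt{-g}\,g^{\mu\nu}\delta g_{\mu\nu}$ and $\delta g^{\alpha\beta}=-g^{\alpha\mu}g^{\beta\nu}\delta g_{\mu\nu}$ , which are taken as given here.

```latex
% Vary the curved-space action in (134) with respect to the metric at fixed \phi:
\delta_{g}S_{scalar}
=\int d^{n}x\,\sqrt{-g}\left[\frac{1}{2}\,g^{\mu\nu}\left(-\frac{1}{2}\nabla_{\lambda}\phi\nabla^{\lambda}\phi-V(\phi)\right)
+\frac{1}{2}\,\nabla^{\mu}\phi\,\nabla^{\nu}\phi\right]\delta g_{\mu\nu}\,,

% Comparing with (139) and (140):
T^{\mu\nu}_{scalar}
=\nabla^{\mu}\phi\,\nabla^{\nu}\phi-g^{\mu\nu}\left(\frac{1}{2}\nabla_{\lambda}\phi\nabla^{\lambda}\phi+V(\phi)\right)\,.
```

One can check directly that $\nabla_{\mu}T^{\mu\nu}_{scalar}=\left(\nabla_{\mu}\nabla^{\mu}\phi-V^{\prime}(\phi)\right)\nabla^{\nu}\phi$ , which vanishes on solutions of (131), in agreement with (141).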

3.4 Symmetries and conservation laws

Under an infinitesimal coordinate transformation (142), the change of metric is given by (143). This naturally leads to the discussion of symmetries in general spacetimes. A small coordinate transformation is a symmetry if it leaves the metric invariant i.e. when $\delta g_{\mu\nu}=0$ . In this case the coordinate transformation is generated by a Killing vector $K^{\mu}$ such that

\displaystyle\nabla_{\left(\mu\right.}K_{\left.\nu\right)}=0\,.

(145)

Conversely, if the background metric is such that it allows the existence of a Killing vector $K^{\mu}$ satisfying (145), it is invariant under the infinitesimal coordinate transformation

\displaystyle x^{\mu\prime}=x^{\mu}-\varepsilon\,K^{\mu}(x^{\nu})\,.

(146)

In section 1 we discussed Minkowski spacetime and its symmetries which form the Poincaré group. In our current language Minkowski space in four dimensions has the metric

\displaystyle ds^{2}=\eta_{\mu\nu}\,dx^{\mu}dx^{\nu}=-(dx^{0})^{2}+(dx^{1})^{2% }+(dx^{2})^{2}+(dx^{3})^{2}\,,

(147)

admitting the Killing vectors associated to:

•

Translations, $T_{(\mu)}=\partial_{\mu}$ ,
•

Lorentz transformations, $L_{(\mu\nu)}=x_{\mu}\partial_{\nu}-x_{\nu}\partial_{\mu}$ , with $x_{\mu}=\eta_{\mu\lambda}x^{\lambda}$ .

In the notation we used above the indices in the brackets on the left hand sides label different vectors, they are not spacetime indices.
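A quick check that the Lorentz generators solve the Killing equation (145) can be sketched as follows, taking them in the form $x_{\alpha}\partial_{\beta}-x_{\beta}\partial_{\alpha}$ with $x_{\alpha}=\eta_{\alpha\lambda}x^{\lambda}$ . The labels $\alpha,\beta$ are used here for the fixed pair that names the vector, to keep them apart from the spacetime indices $\sigma,\lambda$ .

```latex
% Components of the vector L_{(\alpha\beta)} with the index lowered by \eta:
K_{\lambda}=x_{\alpha}\,\eta_{\lambda\beta}-x_{\beta}\,\eta_{\lambda\alpha}\,,

% In cartesian coordinates \nabla_{\sigma}=\partial_{\sigma} and \partial_{\sigma}x_{\alpha}=\eta_{\sigma\alpha}:
\partial_{\sigma}K_{\lambda}=\eta_{\sigma\alpha}\,\eta_{\lambda\beta}-\eta_{\sigma\beta}\,\eta_{\lambda\alpha}
=-\partial_{\lambda}K_{\sigma}
\quad\Rightarrow\quad
\partial_{(\sigma}K_{\lambda)}=0\,.
```

For the translations the components are constant, so the Killing equation is satisfied trivially.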

In order to see how the existence of spacetime symmetries implies the existence of conserved quantities we will consider a free particle moving along a geodesic $x^{\mu}(\tau)$ with affine parameter $\tau$ . If $V^{\mu}$ is the tangent vector and $K^{\mu}$ is a Killing vector, we consider the scalar quantity $Q=V^{\mu}K_{\mu}$ . In order to show that this is conserved along the motion of the particle we examine the derivative,

\displaystyle\frac{d}{d\tau}Q=V^{\nu}\nabla_{\nu}Q=V^{\nu}\nabla_{\nu}(V^{\mu}K_{\mu})=K_{\mu}(V^{\nu}\nabla_{\nu}V^{\mu})+V^{\mu}V^{\nu}\nabla_{\left(\mu\right.}K_{\left.\nu\right)}=0\,.

(148)

The first term in the last equation is zero because of the geodesic motion (117) while the second term is zero according to (145) since $K^{\mu}$ is Killing.

Therefore, depending on the Killing vectors that our background might possess we can write different conserved quantities. Following closely the terminology from classical mechanics, time translations are associated to energy while spatial translations are associated to linear momentum. Finally, spatial rotations are associated to angular momentum.

Turning our attention to classical field theory, the existence of a conserved quantity is associated with a conserved current $J^{\mu}$ which satisfies the current conservation equation

\displaystyle\nabla_{\mu}J^{\mu}=0\Rightarrow\frac{1}{\sqrt{-g}}\partial_{\mu}% (\sqrt{-g}\,J^{\mu})=0\,.

(149)

The equality between the two expressions can be shown by simply using the expression (67) for the covariant derivative of a vector. The conserved quantity can then be constructed by integrating the time component of the conserved current along a surface with constant $x^{0}$ ,

\displaystyle Q(x^{0})=\int d^{n-1}x\,\sqrt{-g}\,J^{0}\,.

(150)

To see that this is constant in $x^{0}$ we consider the time derivative

\displaystyle\frac{d}{dx^{0}}Q(x^{0})=\int d^{n-1}x\,\partial_{0}(\sqrt{-g}\,J^{0})=-\int d^{n-1}x\,\partial_{i}(\sqrt{-g}\,J^{i})=0\,,

(151)

where we used the current conservation equation (149) to express the time derivative in terms of spatial ones and the fact that the integral of a divergence reduces to a surface integral at infinity, where our fields all become trivial.
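The equality of the two forms of the conservation equation in (149) can be sketched as follows. It assumes that expression (67) is the usual $\nabla_{\mu}J^{\nu}=\partial_{\mu}J^{\nu}+\Gamma^{\nu}_{\mu\lambda}J^{\lambda}$ for the Levi-Civita connection, and uses the variation of the logarithm of a determinant, $\delta\ln\det M=\mathrm{tr}(M^{-1}\delta M)$ .

```latex
% Contracted Christoffel symbol; the two antisymmetric terms in the bracket cancel:
\Gamma^{\mu}_{\mu\lambda}
=\frac{1}{2}\,g^{\mu\rho}\,\partial_{\lambda}g_{\mu\rho}
=\frac{1}{2}\,\partial_{\lambda}\ln(-g)
=\partial_{\lambda}\ln\sqrt{-g}\,,

% Divergence of a vector:
\nabla_{\mu}J^{\mu}
=\partial_{\mu}J^{\mu}+\Gamma^{\mu}_{\mu\lambda}\,J^{\lambda}
=\partial_{\mu}J^{\mu}+J^{\lambda}\,\partial_{\lambda}\ln\sqrt{-g}
=\frac{1}{\sqrt{-g}}\,\partial_{\mu}\left(\sqrt{-g}\,J^{\mu}\right)\,.
```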

The task is now to construct a conserved current $J^{\mu}$ , given a field theory and a Killing vector $K^{\mu}$ . The statement is that for each Killing vector we can construct a conserved current given by the contraction

\displaystyle J^{\mu}=K_{\nu}T^{\mu\nu}_{matter}\,.

(152)

We now want to check its divergence,

\displaystyle\nabla_{\mu}J^{\mu}=\nabla_{\mu}(K_{\nu}T^{\mu\nu}_{matter})=K_{% \nu}\,\nabla_{\mu}T_{matter}^{\mu\nu}+T_{matter}^{\mu\nu}\,\nabla_{\left(\mu% \right.}K_{\left.\nu\right)}=0\,,

(153)

which indeed vanishes. In the derivation above we used equation (141) which is true independently of the matter content of the theory. Moreover, in order to write the second term in its symmetrised form we used that the stress tensor is symmetric. Finally we used the Killing equation (145) to put the second term equal to zero.

Once again we can associate momentum and energy to the stress energy tensor $T^{\mu\nu}_{matter}$ . It therefore makes sense to think of it as the source in the equations for the metric, playing the role that the mass density $\rho$ plays in Newton’s equation (4), while the metric itself is the analogue of the gravitational potential.

4 Einstein’s Theory

In this section we will discuss Einstein’s answer to the second item on our wish list for a theory of gravity. In the previous sections we discussed how gravity affects the motion of a particle and the state of matter fields. Here we will see how matter affects the spacetime curvature.

4.1 The equations of motion of gravity

An attempt to write an equation of motion for the metric in the presence of matter fields would be something of the form

\displaystyle W_{\mu\nu}=\kappa\,T_{\mu\nu}\,.

(154)

The tensor $W_{\mu\nu}$ will be constructed from derivatives of the metric and $\kappa$ will be a constant of proportionality that we will later fix by taking a weak gravity limit and demanding that we recover Newton’s gravity.

A first guess would be to say that $W_{\mu\nu}$ is simply the Ricci tensor that we wrote in definition 2.15. However, that would be inconsistent: the right hand side is the stress tensor, which is divergence free by (141), while the identity (108) implies $\nabla^{\mu}R_{\mu\nu}=\frac{1}{2}\nabla_{\nu}R$ , which does not vanish in general. This leads us to consider the Einstein tensor in definition 2.17, which is divergence free by (108). We therefore have that a consistent equation for gravity has the form

\displaystyle R_{\mu\nu}-\frac{1}{2}g_{\mu\nu}\,R=\kappa\,T_{\mu\nu}\,.

(155)

By contracting with the inverse metric $g^{\mu\nu}$ in four dimensions, where $g^{\mu\nu}g_{\mu\nu}=4$ , we find that

\displaystyle R-2\,R=\kappa\,g^{\mu\nu}T_{\mu\nu}=\kappa\,T^{\mu}{}_{\mu}% \Rightarrow R=-\kappa\,T^{\mu}{}_{\mu}\,.

(156)

This allows us to write Einstein’s equation in the form

\displaystyle R_{\mu\nu}=\kappa\left(T_{\mu\nu}-\frac{1}{2}g_{\mu\nu}\,T^{% \lambda}{}_{\lambda}\right)\,.

(157)
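Equations (156) and (157) are specific to four dimensions. A sketch of the same manipulation in $n>2$ dimensions, where $g^{\mu\nu}g_{\mu\nu}=n$ , reads as follows.

```latex
% Trace of (155) in n dimensions:
R-\frac{n}{2}\,R=\kappa\,T^{\mu}{}_{\mu}
\quad\Rightarrow\quad
R=-\frac{2\,\kappa}{n-2}\,T^{\mu}{}_{\mu}\,,

% Substituting back into (155):
R_{\mu\nu}=\kappa\left(T_{\mu\nu}-\frac{1}{n-2}\,g_{\mu\nu}\,T^{\lambda}{}_{\lambda}\right)\,.
```

Setting $n=4$ reproduces (156) and (157).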

In the remainder of this section we will carry out the necessary approximations to compare with Newton’s equation (4) after the identification we made in subsection 3.2 for the gravitational potential $\Phi$ . We recall that in the Newtonian limit we are close to Minkowski spacetime with a small correction according to the metric (124). For the matter fields, we will assume that they are close to being static and $T_{ij}\ll T_{00}$ . This gives us the leading order trace

\displaystyle T^{\mu}{}_{\mu}\approx-T_{00}\,.

(158)

Writing the $00$ component of Einstein’s equation (157) we obtain

\displaystyle R_{00}\approx\frac{\kappa}{2}\,T_{00}\,.

(159)

We now recall the definition 2.15 and the expression for the Riemann tensor (84) to write

	$\displaystyle R_{00}=R^{j}{}_{0j0}$	$\displaystyle=\partial_{j}\Gamma^{j}_{00}-\partial_{0}\Gamma^{j}_{j0}+\mathcal% {O}(h^{2})$
		$\displaystyle=\partial_{j}\Gamma^{j}_{00}+\mathcal{O}(\partial_{x^{0}})\approx% -\frac{1}{2}\partial_{i}\partial^{i}h_{00}=\partial_{i}\partial^{i}\Phi\,.$		(160)

In the approximations above we dropped all the terms which are higher order in the perturbation $h_{\mu\nu}$ as well as the time derivatives which are small in the Newtonian limit. In the final step we used our identification for the gravitational potential from section 3.2. The last step is to identify the energy density with the mass density in the non-relativistic limit $T_{00}=\rho$ giving,

\displaystyle\partial_{i}\partial^{i}\Phi=\frac{\kappa}{2}\,\rho\,.

(161)

This dictates the identification,

\displaystyle\kappa=8\pi G_{N}\,.

(162)

4.2 An action principle for gravity

In this subsection we will write down an action which yields Einstein’s equation of motion (155). The right hand side is the stress tensor which we know that we can get from simply varying the matter sector of the action with respect to the metric. This suggests that we can write the full action as a sum

\displaystyle S[g_{\mu\nu},\phi]=S_{gravity}[g_{\mu\nu}]+S_{matter}[g_{\mu\nu}% ,\phi]\,.

(163)

After varying with respect to $g^{\mu\nu}$ we have the equation of motion

	$\displaystyle\frac{\delta S_{gravity}}{\delta g^{\mu\nu}}+\frac{\delta S_{% matter}}{\delta g^{\mu\nu}}=0\Rightarrow$
	$\displaystyle\frac{\delta S_{gravity}}{\delta g^{\mu\nu}}-\frac{1}{2}\sqrt{-g}% \,T_{\mu\nu}=0\,,$		(164)

where we used the definition of the stress tensor in equation (140). After comparing with (155) we see that we need to find a functional $S_{gravity}[g_{\mu\nu}]$ such that when varying with respect to the inverse metric $g^{\mu\nu}$ we will find,

\displaystyle\frac{\delta S_{gravity}}{\delta g^{\mu\nu}}=\frac{1}{2\kappa}% \sqrt{-g}\,\left(R_{\mu\nu}-\frac{1}{2}g_{\mu\nu}\,R\right)\,.

(165)

The proposal we will check is that the correct gravitational action is the Einstein-Hilbert action,

\displaystyle S_{gravity}[g_{\mu\nu}]=S_{EH}=\frac{1}{2\kappa}\int d^{n}x\,% \sqrt{-g}\,R\,.

(166)

We start by performing the variation

\displaystyle\delta S_{EH}=\frac{1}{2\kappa}\int d^{n}x\,\sqrt{-g}\left[g^{\mu% \nu}R_{\mu\nu}\,\delta\ln\sqrt{-g}+R_{\mu\nu}\delta g^{\mu\nu}+g^{\mu\nu}\,% \delta R_{\mu\nu}\right]\,.

(167)

In order to treat the first term, we note the matrix identity

\displaystyle\delta\ln\det M=\sum_{\alpha\beta}(M^{-1})_{\alpha\beta}\delta M_% {\beta\alpha}\,,

(168)

which when we apply for the case of the metric we find

\displaystyle\delta\ln\sqrt{-g}=-\frac{1}{2}g_{\mu\nu}\,\delta g^{\mu\nu}\,.

(169)
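The step from (168) to (169) can be sketched as follows; the only extra input is the variation of the identity $g^{\mu\nu}g_{\mu\nu}=n$ .

```latex
% (168) with M=g_{\mu\nu} and M^{-1}=g^{\mu\nu}:
\delta\ln(-g)=g^{\mu\nu}\,\delta g_{\mu\nu}\,,

% Varying g^{\mu\nu}g_{\mu\nu}=n gives g^{\mu\nu}\,\delta g_{\mu\nu}=-g_{\mu\nu}\,\delta g^{\mu\nu}, so that
\delta\ln\sqrt{-g}=\frac{1}{2}\,\delta\ln(-g)=-\frac{1}{2}\,g_{\mu\nu}\,\delta g^{\mu\nu}\,.
```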

The final term we would like to consider is the last one containing the variation of the Ricci tensor which can be written in terms of the variation of the Riemann tensor

\displaystyle\delta R_{\mu\nu}=\delta R^{\lambda}{}_{\mu\lambda\nu}\,.

(170)

This might seem daunting but it is much easier than it seems if we don’t vary with respect to the metric yet. Instead of doing that we will take a step back and we will consider the variation of the Riemann tensor with respect to the connection coefficients, before varying those with respect to the metric. If we do that, we can write

\displaystyle\delta R^{\lambda}{}_{\mu\rho\nu}=\nabla_{\rho}\delta\Gamma^{% \lambda}_{\nu\mu}-\nabla_{\nu}\delta\Gamma^{\lambda}_{\rho\mu}\,.

(171)

This is a meaningful expression since the variation $\delta\Gamma^{\lambda}_{\mu\nu}$ can be thought of as a difference of two connections which, according to our discussion around equation (78), is a tensor. It is then meaningful to consider its covariant derivative and for the last term in the variation (167) we can write

\displaystyle g^{\mu\nu}\delta R_{\mu\nu}=\nabla_{\lambda}\left(g^{\mu\nu}% \delta\Gamma^{\lambda}_{\mu\nu}-g^{\lambda\rho}\delta\Gamma^{\beta}_{\beta\rho% }\right)\,,

(172)

showing that it is a total derivative and therefore a surface term which cannot enter the equations of motion and which we can drop. At the end of the day we have that

\displaystyle\delta S_{EH}=\frac{1}{2\kappa}\int d^{n}x\,\sqrt{-g}\left[-\frac{1}{2}g_{\mu\nu}R+R_{\mu\nu}\right]\delta g^{\mu\nu}\,,

(173)

giving the desired result.

5 Black Holes

5.1 The Schwarzschild solution

In this section we will look for the simplest solutions in Einstein’s theory in the absence of matter fields. In this case, we will have to look for Ricci flat spacetimes with,

\displaystyle R_{\mu\nu}=0\,.

(174)

More specifically we will look for spacetimes with spherical symmetry. One such spacetime we know already and which is a solution of (174) is Minkowski space which we can write in polar coordinates according to,

$\displaystyle ds^{2}$	$\displaystyle=-dt^{2}+dx^{2}+dy^{2}+dz^{2}$
	$\displaystyle=-dt^{2}+dr^{2}+r^{2}\,\left(d\theta^{2}+\sin^{2}\theta\,d\varphi% ^{2}\right)$
	$\displaystyle=-dt^{2}+dr^{2}+r^{2}\,d\Omega_{2}^{2}\,.$	(175)

In the above metric we have introduced the metric of the unit radius two dimensional sphere

\displaystyle d\Omega_{2}^{2}=d\theta^{2}+\sin^{2}\theta\,d\varphi^{2}\,,

(176)

which is invariant under three dimensional Euclidean rotations.

In order to find more general spherically symmetric solutions, we will keep this part of the metric but generalise the rest of it,

\displaystyle ds^{2}=\gamma_{tt}(t,r)\,dt^{2}+2\gamma_{tr}(t,r)\,dt\,dr+\gamma% _{rr}(t,r)\,dr^{2}+\gamma_{\Omega\Omega}(t,r)\,d\Omega_{2}^{2}\,.

(177)

The above metric is the most general we can write which preserves spherical symmetry. We can do slightly better than this to write down something simpler by exploiting coordinate transformations of the form,

	$\displaystyle t$	$\displaystyle\rightarrow t(t^{\prime},r^{\prime}),$
	$\displaystyle r$	$\displaystyle\rightarrow r(t^{\prime},r^{\prime}),$		(178)

to set $\gamma_{tr}=0$ and $\gamma_{\Omega\Omega}=r^{2}$ . We now have a simpler metric of the form,

\displaystyle ds^{2}=-e^{2\alpha(t,r)}\,dt^{2}+e^{2\beta(t,r)}\,dr^{2}+r^{2}\,% d\Omega_{2}^{2}\,,

(179)

and the task is to find the functions $\alpha(t,r)$ and $\beta(t,r)$ which solve Einstein’s equation (174).

The non-trivial components of the Ricci tensor read

$\displaystyle R_{00}$	$\displaystyle=\ddot{\beta}+\dot{\beta}^{2}-\dot{\alpha}\dot{\beta}+e^{2(\alpha% -\beta)}\,\left[\alpha^{\prime\prime}+\alpha^{\prime 2}-\alpha^{\prime}\beta^{% \prime}+\frac{2}{r}\alpha^{\prime}\right]\,,$
$\displaystyle R_{11}$	$\displaystyle=-\left[\alpha^{\prime\prime}+\alpha^{\prime 2}-\alpha^{\prime}\beta^{\prime}-\frac{2}{r}\beta^{\prime}\right]+e^{2(\beta-\alpha)}\,\left[\ddot{\beta}+\dot{\beta}^{2}-\dot{\alpha}\dot{\beta}\right]\,,$
$\displaystyle R_{01}$	$\displaystyle=\frac{2}{r}\dot{\beta}\,,\qquad R_{22}=e^{-2\beta}\,\left[r(% \beta^{\prime}-\alpha^{\prime})-1\right]+1\,,$
$\displaystyle R_{33}$	$\displaystyle=\sin^{2}\theta\,R_{22}\,.$	(180)

In the above equations we have used the notation $\dot{f}(t,r)=\partial_{t}f(t,r)$ and $f^{\prime}(t,r)=\partial_{r}f(t,r)$ .

We now examine what Einstein’s equations imply for our functions,

\displaystyle R_{01}=0\Rightarrow\dot{\beta}(t,r)=0\Rightarrow\beta=\beta(r)\,,

(181)

showing that $\beta$ is only a function of the radial coordinate $r$ . We now consider the time derivative,

\displaystyle\partial_{t}R_{22}=0\Rightarrow\partial_{t}\partial_{r}\alpha(t,r% )=0\Rightarrow\alpha(t,r)=f(r)+g(t)\,,

(182)

and therefore the function $\alpha$ can only be the sum of two functions $f$ and $g$ that depend only on the radial and time coordinates respectively. However, if we look back at our original ansatz in equation (179) we can see that we can absorb the function $g(t)$ in a time redefinition according to,

\displaystyle d\tilde{t}=e^{g(t)}\,dt=d\left(\int dt\,e^{g(t)}\right)\,.

(183)

We can therefore set $g(t)=0$ without any loss of generality. We now examine the linear combination,

\displaystyle e^{2(\beta-\alpha)}R_{00}+R_{11}=\frac{2}{r}\,\left(\alpha^{% \prime}+\beta^{\prime}\right)=0\Rightarrow\alpha(r)=-\beta(r)+c\,,

(184)

with $c$ a constant which we can also set to zero via a transformation of the time coordinate.

The final equation which we have not solved yet is,

\displaystyle R_{22}=0\Rightarrow e^{2\alpha}\left[2r\alpha^{\prime}+1\right]=% 1\Rightarrow e^{2\alpha}=1+\frac{\mu}{r}\,,

(185)

where $\mu$ is a constant of integration for the ordinary differential equation we solved. The final form of the metric is then

\displaystyle ds^{2}=-\left(1+\frac{\mu}{r}\right)\,dt^{2}+\left(1+\frac{\mu}{% r}\right)^{-1}\,dr^{2}+r^{2}\,d\Omega_{2}^{2}\,.

(186)
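The integration leading to (185) can be made explicit by noticing that the left hand side is a total derivative; this short sketch uses only $\beta=-\alpha$ from (184) with $c=0$ .

```latex
% The R_{22}=0 equation is a total derivative in r:
e^{2\alpha}\left(2r\,\alpha^{\prime}+1\right)=\partial_{r}\left(r\,e^{2\alpha}\right)=1
\quad\Rightarrow\quad
r\,e^{2\alpha}=r+\mu\,,

% so that both metric functions are fixed up to the single constant \mu:
e^{2\alpha}=1+\frac{\mu}{r}\,,\qquad
e^{2\beta}=e^{-2\alpha}=\left(1+\frac{\mu}{r}\right)^{-1}\,.
```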

What we showed above is rather striking. All we imposed in our original ansatz (179) was spherical symmetry but Einstein’s equations constrained both functions $\alpha$ and $\beta$ to be independent of time. This is also known as Birkhoff’s theorem. More precisely, any time dependence would simply be an artefact of time dependent coordinate transformations. The final result is a static spacetime which can be geometrically characterised by,

Definition 5.1.

A metric with a timelike Killing vector $K^{\mu}$ is called stationary. If, in addition, $K^{\mu}$ is hypersurface orthogonal, $K_{\left[\mu\right.}\nabla_{\nu}K_{\left.\rho\right]}=0$ , it is called static.

We now want to go back to our solution (186) and understand it better. Taking the limit $r\to\infty$ , we see that the metric approaches Minkowski spacetime in spherical coordinates as in equation (5.1). Viewed from far away, this suggests that there is a pointlike object sitting at the origin, curving the spacetime. Since its effect becomes weak at infinity, it makes sense to examine the deviation of $g_{00}$ from its Minkowski value and treat it as a Newtonian gravitational potential according to the discussion of section 3.2,

\displaystyle g_{00}\approx-1-2\Phi\Rightarrow\Phi=\frac{\mu}{2r}\,.

(187)

It is then natural to interpret the solution (186) as the spacetime created by a point-like object of mass $M$ sitting at the origin of space. Comparing with the Newtonian potential of a point mass, $\Phi=-G_{N}M/r$ , suggests the identification

\displaystyle\mu=-2G_{N}M\,.

(188)

After this, our metric takes the form,

\displaystyle ds^{2}=-\left(1-\frac{2G_{N}M}{r}\right)\,dt^{2}+\left(1-\frac{2G_{N}M}{r}\right)^{-1}\,dr^{2}+r^{2}\,d\Omega_{2}^{2}\,,

(189)

which is known as the Schwarzschild black hole solution.
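To get a feeling for the scales involved, one can restore the factors of the speed of light $c$, which appear to be set to one in these notes, so that the radius at which $g_{00}$ in (189) vanishes reads $r=2G_{N}M/c^{2}$. The sketch below evaluates it for the Sun. The rounded SI constants and the function name `schwarzschild_radius` are assumptions made for illustration only.

```python
# Rounded SI values, assumed here for illustration
G_N = 6.674e-11    # Newton's constant in m^3 kg^-1 s^-2
C = 2.998e8        # speed of light in m/s
M_SUN = 1.989e30   # solar mass in kg


def schwarzschild_radius(mass_kg):
    """Radius r = 2 G_N M / c^2 where g_00 of (189) vanishes, in metres."""
    return 2 * G_N * mass_kg / C**2


# For the Sun this comes out at roughly three kilometres
print(schwarzschild_radius(M_SUN))
```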

5.2 Motion around a black hole

In this subsection we would like to examine the motion of a free particle in the Schwarzschild black hole background of equation (189). We will do this by examining the geodesic equation (117) in its affine parametrisation with $\alpha=0$ . As we argued in section 3.1, we can equivalently study the action (123) in the background (189), where from now on we set $G_{N}=1$ ,

	$\displaystyle L^{\prime}$	$\displaystyle=\int\,d\tau\,\left[-f(r(\tau))\,\dot{t}^{2}(\tau)+f^{-1}(r(\tau))\,\dot{r}^{2}(\tau)+r^{2}(\tau)\left(\dot{\theta}^{2}(\tau)+\sin^{2}\theta(\tau)\,\dot{\phi}^{2}(\tau)\right)\right]\,,$
	$\displaystyle f(r)$	$\displaystyle=1-\frac{2M}{r}\,.$		(190)

Because of the large amount of symmetry of the background we will not need them much, but for completeness we list the equations of motion:

	$\displaystyle\frac{d}{d\tau}\left[\left(1-\frac{2M}{r}\right)\dot{t}\right]=0\,,\quad\frac{d}{d\tau}\left[r^{2}\,\sin^{2}\theta\,\dot{\phi}\right]=0\,,\quad-\frac{d}{d\tau}\left(r^{2}\dot{\theta}\right)+r^{2}\sin\theta\,\cos\theta\,\dot{\phi}^{2}=0\,,$
	$\displaystyle 2\,\frac{d}{d\tau}\left[-\left(1-\frac{2M}{r}\right)^{-1}\,\dot{r}\right]-\frac{2M}{r^{2}}\dot{t}^{2}-\left(1-\frac{2M}{r}\right)^{-2}\frac{2M}{r^{2}}\dot{r}^{2}+2r\left(\dot{\theta}^{2}+\sin^{2}\theta\dot{\phi}^{2}\right)=0\,.$		(191)

We now recall that for the Schwarzschild space we have the Killing vectors corresponding to time translations and spatial rotations:

$\displaystyle T_{(t)}$	$\displaystyle=\partial_{t}\,,$
$\displaystyle L_{(1)}$	$\displaystyle=\partial_{\phi}\,,$
$\displaystyle L_{(2)}$	$\displaystyle=-\cos\phi\,\partial_{\theta}+\sin\phi\,\cot\theta\,\partial_{\phi}\,,$
$\displaystyle L_{(3)}$	$\displaystyle=\sin\phi\,\partial_{\theta}+\cos\phi\,\cot\theta\,\partial_{\phi}\,.$	(192)
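That the four vectors in (192) are indeed Killing vectors of (189) can be verified by checking that the Lie derivative of the metric along each of them vanishes. The following is a minimal sketch of that check, assuming SymPy is available and setting $G_{N}=1$; the helper name `lie_derivative_of_metric` and the dictionary `killing_vectors` are our own illustrative choices.

```python
import sympy as sp

t, r, th, ph, M = sp.symbols("t r theta phi M", positive=True)
coords = (t, r, th, ph)

# Schwarzschild metric (189) with G_N = 1, in coordinates (t, r, theta, phi)
f = 1 - 2 * M / r
g = sp.diag(-f, 1 / f, r**2, r**2 * sp.sin(th) ** 2)

# Components K^mu of the vectors listed in (192)
cot = sp.cos(th) / sp.sin(th)
killing_vectors = {
    "T": sp.Matrix([1, 0, 0, 0]),
    "L1": sp.Matrix([0, 0, 0, 1]),
    "L2": sp.Matrix([0, 0, -sp.cos(ph), sp.sin(ph) * cot]),
    "L3": sp.Matrix([0, 0, sp.sin(ph), sp.cos(ph) * cot]),
}


def lie_derivative_of_metric(K):
    """(L_K g)_{mu nu} = K^rho d_rho g_{mu nu} + g_{rho nu} d_mu K^rho + g_{mu rho} d_nu K^rho."""
    result = sp.zeros(4, 4)
    for mu in range(4):
        for nu in range(4):
            total = sp.Integer(0)
            for rho in range(4):
                total += K[rho] * sp.diff(g[mu, nu], coords[rho])
                total += g[rho, nu] * sp.diff(K[rho], coords[mu])
                total += g[mu, rho] * sp.diff(K[rho], coords[nu])
            result[mu, nu] = sp.simplify(total)
    return result


for name, K in killing_vectors.items():
    # Each matrix should consist entirely of zeros
    print(name, lie_derivative_of_metric(K) == sp.zeros(4, 4))
```

The coordinate expression for the Lie derivative is used instead of $\nabla_{(\mu}K_{\nu)}=0$ so that no Christoffel symbols are needed.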

According to our discussion on conserved quantities in section 3.4, to each Killing vector we can associate a conserved quantity:

$\displaystyle E$	$\displaystyle=-(T_{(t)})_{\mu}\,\dot{x}^{\mu}=\left(1-\frac{2M}{r}\right)\,\dot{t}\,,$
$\displaystyle L$	$\displaystyle=(L_{(1)})_{\mu}\,\dot{x}^{\mu}=r^{2}\,\sin^{2}\theta\,\dot{\phi}\,,$
$\displaystyle L_{x}$	$\displaystyle=(L_{(2)})_{\mu}\,\dot{x}^{\mu}=-\cos\phi\,\dot{\theta}\,r^{2}+\sin\phi\,\cot\theta\,L\,,$
$\displaystyle L_{y}$	$\displaystyle=(L_{(3)})_{\mu}\,\dot{x}^{\mu}=\sin\phi\,\dot{\theta}\,r^{2}+\cos\phi\,\cot\theta\,L\,.$	(193)

The first conserved quantity is the energy of the particle, which is conserved due to the time translation symmetry of the background. The last three conserved quantities form the angular momentum vector, which is conserved both in magnitude and direction. Without loss of generality we can choose it to lie along the $z$ -axis. We can do this by choosing a frame in which we initially have $\dot{\theta}=0$ and $\theta=\pi/2$ . This gives $L_{x}=L_{y}=0$ , and their conservation tells us that they remain zero throughout the whole motion; therefore $\dot{\theta}$ and $\theta$ will not change their values and the motion stays in the equatorial plane.

The conservation laws effectively solve the first three equations listed in (191). We would therefore still need to solve the last one, which comes from varying the action (190) with respect to the radial coordinate. However, this won’t be necessary, as there is another “conservation law” related to the choice of the affine parameter. As we discussed in section 3.1, in the affine parametrisation the norm of the tangent vector of the geodesic remains constant and therefore,

$\displaystyle-\left(1-\frac{2M}{r}\right)\,\dot{t}^{2}+\left(1-\frac{2M}{r}\right)^{-1}\,\dot{r}^{2}+r^{2}\left(\dot{\theta}^{2}+\sin^{2}\theta\,\dot{\phi}^{2}\right)$	$\displaystyle=-\varepsilon\,\Rightarrow$
$\displaystyle\left(1-\frac{2M}{r}\right)^{-1}\,\left(-E^{2}+\dot{r}^{2}\right)+\frac{L^{2}}{r^{2}}$	$\displaystyle=-\varepsilon\,\Rightarrow$
$\displaystyle\frac{1}{2}E^{2}$	$\displaystyle=\frac{1}{2}\dot{r}^{2}+V_{eff}(r)\,,$	(194)

where we used the conserved quantities in equation (193) together with $\theta=\pi/2$ , and we have also defined,

\displaystyle V_{eff}(r)=\frac{1}{2}\varepsilon-\varepsilon\frac{M}{r}+\frac{L^{2}}{2r^{2}}-\frac{ML^{2}}{r^{3}}\,.

(195)
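The algebra leading from the norm condition to the energy equation (194) with the potential (195) can be checked symbolically. The sketch below, which assumes SymPy and uses illustrative symbol names (`rdot` stands for $\dot{r}$), substitutes $E^{2}=\dot{r}^{2}+2V_{eff}$ back into the second line of (194) and confirms that it is satisfied identically.

```python
import sympy as sp

r, M, L, eps, rdot = sp.symbols("r M L varepsilon rdot", real=True)

f = 1 - 2 * M / r
V_eff = eps / 2 - eps * M / r + L**2 / (2 * r**2) - M * L**2 / r**3

# Energy equation (194), solved for E^2
E_squared = rdot**2 + 2 * V_eff

# Norm condition on the equatorial plane, moved to one side:
# f^{-1} (-E^2 + rdot^2) + L^2 / r^2 + varepsilon = 0
norm_condition = (-E_squared + rdot**2) / f + L**2 / r**2 + eps

residual = sp.simplify(norm_condition)
print(residual)
```

Equivalently, $V_{eff}=\frac{1}{2}\left(1-\frac{2M}{r}\right)\left(\varepsilon+\frac{L^{2}}{r^{2}}\right)$, which expands to the four terms in (195).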

We will call this function the effective potential, since the equation determining the radial variable looks similar to that of a non-relativistic particle moving in a central effective potential $V_{eff}(r)$ . In fact, the leading terms as we take the $r\to\infty$ limit are identical to those for a Newtonian particle moving around an object of mass $M$ ,

\displaystyle V_{eff}(r)\approx\frac{1}{2}\varepsilon-\varepsilon\frac{M}{r}+\frac{L^{2}}{2r^{2}}\,,

(196)

and this is not entirely surprising given our discussion in section 3.2 and the fact that gravity becomes weak as we move to infinity. Loosely speaking, the contribution of General Relativity is the extra term in (195) that goes like $1/r^{3}$ and becomes dominant at small distances, where gravity becomes strong.

Some significant remarks about the effective potential (195) are:

•

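It is bounded from above: for $L\neq 0$ the $-ML^{2}/r^{3}$ term dominates at small $r$ and drives $V_{eff}\to-\infty$ , so the centrifugal barrier has finite height, unlike in the Newtonian case.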
•

For massless, light-like particles with $\varepsilon=0$ there is a circular orbit at $r=r_{0}$ . We have that

$\displaystyle V_{eff}^{\prime}(r_{0})=0\Rightarrow-\frac{L^{2}}{r_{0}^{3}}+\frac{3ML^{2}}{r_{0}^{4}}=0\Rightarrow r_{0}=3M\,,$ (197)

and to check its stability we compute,

$\displaystyle V_{eff}^{\prime\prime}(r_{0})=\frac{3L^{2}}{r_{0}^{4}}-\frac{12ML^{2}}{r_{0}^{5}}=-\frac{L^{2}}{3^{4}M^{4}}<0\,,$ (198)

showing that it is unstable.
•

For massive particles with $\varepsilon=1$ we have two circular orbits at $r=r_{\pm}$ with

$\displaystyle V^{\prime}_{eff}(r_{\pm})=0\Rightarrow\frac{M}{r_{\pm}^{2}}-\frac{L^{2}}{r_{\pm}^{3}}+\frac{3ML^{2}}{r_{\pm}^{4}}=0\Rightarrow$

$\displaystyle r_{\pm}=\frac{1}{2M}\left(L^{2}\pm\sqrt{L^{4}-12M^{2}L^{2}}\right)\,.$ (199)

These orbits exist only for $L^{2}\geq 12M^{2}$ . This differs from the non-relativistic motion of particles, where for a given $L$ there is a single circular orbit. By checking the sign of the second derivative of the effective potential, we can conclude that only the orbit with $r=r_{+}$ is stable.
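The remarks above can be made concrete with a small numerical sketch. It evaluates the radii in (199) and the sign of $V_{eff}^{\prime\prime}$ for one choice of parameters; the values $M=1$, $L=4$ and the function names are assumptions chosen only for illustration.

```python
import math


def v_eff_second_derivative(r, M, L, eps):
    """Second r-derivative of the effective potential (195)."""
    return -2 * eps * M / r**3 + 3 * L**2 / r**4 - 12 * M * L**2 / r**5


def massive_circular_orbits(M, L):
    """Return (r_minus, r_plus) from (199), or None when L^2 < 12 M^2."""
    disc = L**4 - 12 * M**2 * L**2
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return (L**2 - root) / (2 * M), (L**2 + root) / (2 * M)


M, L = 1.0, 4.0  # illustrative values with L^2 > 12 M^2
r_minus, r_plus = massive_circular_orbits(M, L)
for label, radius in (("r_-", r_minus), ("r_+", r_plus)):
    curvature = v_eff_second_derivative(radius, M, L, eps=1)
    print(label, radius, "stable" if curvature > 0 else "unstable")

# Light-like case: the circular orbit sits at r = 3M and is a maximum of V_eff
print("photon orbit", 3 * M, v_eff_second_derivative(3 * M, M, L, eps=0))
```

For these values the two massive orbits sit at $r_{-}=4M$ and $r_{+}=12M$, and only the outer one is a minimum of the potential.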

We will now examine the sphere at $r=2M$ , where the coordinate system we chose to construct the Schwarzschild solution (189) is singular. As we will see, this is precisely where the event horizon of our black hole is. We will do this by shooting a ray of light, with $\varepsilon=0$ in equation (194), directly towards the center of our black hole. For this motion we need zero angular momentum, $L=0$ , and therefore $\dot{\phi}=0$ .

We have that,

$\displaystyle\frac{dr}{d\tau}$ $\displaystyle=-E\,\Rightarrow$ (200)

$\displaystyle\frac{dr}{dt}\frac{dt}{d\tau}$ $\displaystyle=-E\,\Rightarrow$

$\displaystyle\frac{dr}{dt}$ $\displaystyle=-\left(1-\frac{2M}{r}\right)\,.$ (201)

In the second step we have simply used the conservation laws in (193), namely $dt/d\tau=E\,(1-2M/r)^{-1}$ . We see two strikingly different pictures depending on how we examine the trajectory of the ray of light. Equation (200) has the simple solution $r=r_{0}-E\,\tau$ , which tells us that after a finite interval of the affine parameter $\tau$ the ray of light crosses the sphere at $r=2M$ without anything dramatic happening. On the other hand, if time is measured by the $t$ coordinate of our coordinate system, equation (201) shows that the ray of light slows down as it approaches the horizon and we never see it crossing. As observers sitting on the outside, we can send information towards the event horizon but we will never actually see it enter.
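The contrast between the two descriptions can be made quantitative by integrating (201) in closed form, which gives $t-t_{0}=(r_{0}-r)+2M\ln\frac{r_{0}-2M}{r-2M}$ for a ray starting at $r_{0}>2M$. The sketch below evaluates this expression next to the affine parameter from (200). The function names and the values $M=1$, $r_{0}=10$, $E=1$ are assumptions made for illustration.

```python
import math


def coordinate_time(r, r0, M):
    """Coordinate time t for an ingoing radial light ray to travel from r0 to r.

    Obtained by integrating dr/dt = -(1 - 2M/r), valid for 2M < r <= r0.
    """
    return (r0 - r) + 2 * M * math.log((r0 - 2 * M) / (r - 2 * M))


def affine_parameter(r, r0, E):
    """Affine parameter elapsed between r0 and r, from dr/dtau = -E."""
    return (r0 - r) / E


M, r0, E = 1.0, 10.0, 1.0  # illustrative values
for distance in (1e-1, 1e-3, 1e-6, 1e-9):
    r = 2 * M + distance
    # tau approaches a finite limit while t keeps growing logarithmically
    print(distance, affine_parameter(r, r0, E), coordinate_time(r, r0, M))
```

Each time the remaining distance to $r=2M$ shrinks by a fixed factor, the coordinate time grows by the same additive amount, which is the logarithmic divergence behind the statement that the outside observer never sees the crossing.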
