Notes on General Relativity

Dr A. Donos

1 Introduction

In this course we will discuss Einstein’s classical theory of gravity. The wish list for a successful theory of classical gravity has two ingredients:

  1. How does gravity affect probe particles and fields?

  2. How does mass (energy) density produce a gravitational field?

1.1 (Why not) Newton’s gravity

The first attempt belongs to Newton, according to whom gravitational effects are the result of a field that fills the whole of space. To answer the first question, Newton proposed the following theoretical assumptions:

  • The work

    $$W=\int_A^B \mathbf{F}_g\cdot d\mathbf{x}, \tag{1}$$

    done by the gravitational force $\mathbf{F}_g$ on a particle that moves between two points $A$ and $B$ is independent of the path, i.e. the force is conservative. In modern language, Stokes’ theorem suggests that the vector $\mathbf{F}_g$ is proportional to the gradient of a scalar function $\Phi$, the gravitational potential of the field. The constant of proportionality defines the gravitational mass $m_g$ and $\mathbf{F}_g=-m_g\nabla\Phi$.

  • The gravitational mass $m_g$ would in general be different from the inertial mass $m$ that enters Newton’s second law

    $$m\,\mathbf{a}=-m_g\nabla\Phi. \tag{2}$$

  • It is an experimental fact that all objects accelerate in the same way when inside the same gravitational field and therefore they must have $m=m_g$. This is known as the equivalence principle and therefore

    $$\mathbf{a}=-\nabla\Phi. \tag{3}$$

In Newton’s language, the second item in our wish list asks how $\Phi$ is determined given a gravitational mass density $\rho$ which depends both on the spatial coordinates $x^i$ and on time $t$. The potential is found by solving the elliptic equation

$$\nabla^2\Phi(x^i,t)=4\pi G_N\,\rho(x^i,t), \tag{4}$$

where $G_N$ is Newton’s gravitational constant. The general solution of equation (4) takes the form

$$\Phi(\mathbf{x},t)=4\pi G_N\int_{\mathbb{R}^3}G(\mathbf{x},\mathbf{x}')\,\rho(\mathbf{x}',t)\,d^3x', \tag{5}$$

with $G(\mathbf{x},\mathbf{x}')$ the Green’s function of the Laplacian, $\nabla^2G(\mathbf{x},\mathbf{x}')=\delta^3(\mathbf{x}-\mathbf{x}')$, for the boundary value problem we need to solve. After imposing the reasonable requirement that a distribution $\rho(x^i,t)$ of compact support should have a vanishing gravitational potential at infinity, we find that

$$G(\mathbf{x},\mathbf{x}')=-\frac{1}{4\pi}\frac{1}{|\mathbf{x}-\mathbf{x}'|}. \tag{6}$$

The above discussion does give us a theory of gravity. However, by looking at equation (5) we see that a small, local change in the distribution $\rho(x^i,t)$ will instantaneously change the gravitational potential $\Phi(x^i,t)$ everywhere in space. In other words, the information about the change $\delta\rho(x^i,t)$ is transmitted everywhere in space without any delay. Thinking in loose terms, this is the result of the absence of time derivatives from equation (4), which would make the gravitational potential behave more like a wave rather than something static and rigid.
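The instantaneous response can be made concrete with a small numerical sketch. It assumes a set of point masses, for which equations (4) and (6) give the familiar $\Phi=-G_N\sum_a m_a/|\mathbf{x}-\mathbf{x}_a|$; the masses, positions and function name below are illustrative choices, not part of the notes.

```python
import math

G_N = 6.674e-11  # Newton's constant in SI units (m^3 kg^-1 s^-2)

def potential(x, masses):
    """Newtonian potential at point x from a list of (mass, position) pairs.

    Discrete version of the Green's function solution: each point mass
    contributes -G_N * m / |x - x'|.
    """
    phi = 0.0
    for m, xp in masses:
        phi -= G_N * m / math.dist(x, xp)
    return phi

observer = (0.0, 0.0, 0.0)

# Hypothetical configuration: a nearby mass and a distant one.
before = [(5.0e24, (1.0e7, 0.0, 0.0)), (1.0e22, (0.0, 4.0e8, 0.0))]
# Move only the distant mass. The formula has no time argument relating
# source and observer, so the potential at the observer changes at once.
after = [(5.0e24, (1.0e7, 0.0, 0.0)), (1.0e22, (0.0, 2.0e8, 0.0))]

phi_before = potential(observer, before)
phi_after = potential(observer, after)
```

Nothing in `potential` knows how far away the displaced mass is, which is exactly the feature of equation (5) that conflicts with a finite speed of signal propagation.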

This is very different from what we have learned from Maxwell’s theory, where a local change in the charge density $\rho_e(x^i,t)$ would first result in radiation that would later settle down to static electric and magnetic fields. In the next section we will examine Maxwell’s theory and try to extract the invariants of space and time, thinking of it as a fundamental theory.

As we will see later in the course, Einstein’s resolution was a rather elegant and groundbreaking answer:

  1. There is no gravitational force! Spacetime is not flat and all particles move on straight lines in a curved spacetime.

  2. The curvature of spacetime is determined by the state of matter inside it.

The hope is that the two answers above will start making sense by the end of the course.

1.2 Maxwell’s Theory and Symmetries of Spacetime

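In Maxwell’s theory, particles of electric charge $q$ which move at a velocity $\mathbf{v}$ inside an electric field $\mathbf{E}$ and magnetic field $\mathbf{B}$ experience the force,

$$\mathbf{F}_{E/M}=q\,\mathbf{E}+\frac{q}{c}\,\mathbf{v}\times\mathbf{B}. \tag{7}$$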

The first term is the Coulomb force while the second one is the Lorentz one.

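On the other hand, an electric charge density $\rho_e$ and current density $\mathbf{J}$ will act as sources for the electromagnetic field. This is described by Maxwell’s equations which read,

$$\begin{aligned}
\nabla\cdot\mathbf{E}&=4\pi\rho_e,&\nabla\cdot\mathbf{B}&=0,\\
\nabla\times\mathbf{E}+\frac{1}{c}\,\partial_t\mathbf{B}&=0,&\nabla\times\mathbf{B}-\frac{1}{c}\,\partial_t\mathbf{E}&=\frac{4\pi}{c}\,\mathbf{J},
\end{aligned} \tag{8}$$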

with $c$ the speed of light. The above set of equations provides the second item on our wish list for a classical theory of electromagnetism. An important feature of Maxwell’s theory is that it contains time derivatives of the electric and magnetic fields. The first physical consequence of this fact is that the way time derivatives enter in (8) allows for the existence of time dependent electric and magnetic fields in the form of waves in empty space with $\rho_e=0$ and $\mathbf{J}=0$.

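The second important consequence of the time derivatives is that small local changes $\delta\rho_e$ and $\delta\mathbf{J}$ in the sources are not transmitted instantaneously everywhere in space through the generated fields. At a more fundamental level, apart from spatial rotations and translations, equations (8) are also left invariant under the novel transformation,

$$\begin{aligned}
x'&=\gamma\,(x-vt),\qquad y'=y,\qquad z'=z,\qquad ct'=\gamma\left(ct-\frac{v}{c}\,x\right),\\
E'_x&=E_x,\qquad E'_y=\gamma\left(E_y-\frac{v}{c}\,B_z\right),\qquad E'_z=\gamma\left(E_z+\frac{v}{c}\,B_y\right),\\
B'_x&=B_x,\qquad B'_y=\gamma\left(B_y+\frac{v}{c}\,E_z\right),\qquad B'_z=\gamma\left(B_z-\frac{v}{c}\,E_y\right),\\
\gamma&=\left(1-v^2/c^2\right)^{-1/2}.
\end{aligned} \tag{9}$$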

This transformation is parametrised by a velocity parameter $v$ with $|v|<c$ and represents a Lorentz boost in the $x$ direction. Similar transformations can be written for the Lorentz boosts in the $y$ and $z$ directions.

This is a rather radical transformation from Newton’s point of view as it also transforms time as if it were a coordinate. This is exactly the lesson we are learning from electromagnetism: $t$ should be on an equal footing with the spatial coordinates $x$, $y$ and $z$.

1.3 Special Relativity

In the previous subsection we discovered Lorentz boosts as symmetries of Maxwell’s theory. Einstein appreciated this message and considered Lorentz boosts as a fundamental symmetry of empty spacetime. As we emphasised already, this promotes (or downgrades) time to yet another coordinate. We will therefore now use the notation $x^\mu$ for the coordinates of a point in spacetime with $\mu=0,\ldots,3$ where $x^0=ct$, $x^1=x$, $x^2=y$ and $x^3=z$.

The full set of symmetries is now rotations, translations as well as Lorentz boosts and can be written as

$$x'^\mu=\Lambda^\mu{}_\nu\,x^\nu+\beta^\mu, \tag{10}$$

where we use the standard Einstein summation convention for repeated indices, i.e. $\alpha^\mu\beta_\mu\equiv\alpha^0\beta_0+\alpha^1\beta_1+\cdots+\alpha^d\beta_d$ in $d+1$ dimensions.

The $\binom{1}{1}$ tensor $\Lambda^\mu{}_\nu$ parametrises the Lorentz boosts as well as the standard group of spatial rotations $SO(3)$. This is the Lorentz group of transformations. (If you are not familiar with this terminology, you can think of $\Lambda^\mu{}_\nu$ simply as a $4\times4$ matrix for the time being. We will define tensors more carefully in section 2.) The vector $\beta^\mu$ simply represents a spacetime translation which, together with the Lorentz group, forms the Poincaré group.

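For the Lorentz boost of equation (9) we can write

$$\Lambda^\mu{}_\nu=\begin{pmatrix}\cosh\phi&-\sinh\phi&0&0\\-\sinh\phi&\cosh\phi&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}, \tag{11}$$

with $\phi=\tanh^{-1}(v/c)$. A purely spatial rotation takes the form

$$\Lambda^\mu{}_\nu=\begin{pmatrix}1&0\\0&R^i{}_j\end{pmatrix}, \tag{12}$$

with $R^i{}_j$ the matrix representation of a three dimensional Euclidean rotation. For a rotation in the $x$-$y$ plane by an angle $\varphi$ we have

$$\Lambda^\mu{}_\nu=\begin{pmatrix}1&0&0&0\\0&\cos\varphi&\sin\varphi&0\\0&-\sin\varphi&\cos\varphi&0\\0&0&0&1\end{pmatrix}. \tag{13}$$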

We see that in total we have three parameters for the Lorentz boosts in the different directions along with the three parameters for the Euclidean rotations. Therefore, a generic element of the Lorentz group in four dimensions is fully specified by six parameters; this is the dimension of the group. Adding the four spacetime translations makes the Poincaré group a ten dimensional group.

The above shows that the “Euclidean” distance between points,

$$\Delta s_E^2=\Delta x^2+\Delta y^2+\Delta z^2, \tag{14}$$

is only left invariant under a subset of the Lorentz group, the standard spatial rotations. However, now that we are thinking of the full Lorentz group as a fundamental symmetry, we need to do better than that and come up with a distance between points which is independent of the frame we choose.

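We are then looking for a matrix $\eta_{\mu\nu}$ such that the number

$$\Delta s^2=\Delta x^\mu\,\eta_{\mu\nu}\,\Delta x^\nu, \tag{15}$$

is invariant under the Lorentz transformations (10). This requirement gives

$$\begin{aligned}
\Delta s^2=\Delta x^\rho\,\eta_{\rho\nu}\,\Delta x^\nu&=\Delta x'^\pi\,\eta_{\pi\mu}\,\Delta x'^\mu\\
\Rightarrow\quad\Delta x^\rho\,\eta_{\rho\nu}\,\Delta x^\nu&=\Delta x^\rho\,\Lambda^\pi{}_\rho\,\eta_{\pi\mu}\,\Lambda^\mu{}_\nu\,\Delta x^\nu\\
\Rightarrow\quad\eta_{\rho\nu}&=\Lambda^\pi{}_\rho\,\eta_{\pi\mu}\,\Lambda^\mu{}_\nu,
\end{aligned} \tag{16}$$

where in the last line we used that the equality has to hold for any vector $\Delta x^\mu$. Demanding that the matrix $\eta_{\mu\nu}$ is such that the last equation holds for any Lorentz transformation $\Lambda^\mu{}_\nu$ fixes, up to an overall normalisation,

$$\eta_{\mu\nu}=\begin{pmatrix}-1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}. \tag{17}$$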

Our four dimensional spacetime, equipped with the “inner product” $\alpha\cdot\beta\equiv\eta_{\mu\nu}\,\alpha^\mu\beta^\nu$, is called Minkowski space.
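The invariance condition in the last line of (16) reads $\eta=\Lambda^T\eta\,\Lambda$ in matrix form, and it can be checked numerically for the boost (11) and the rotation (13). The following is a minimal sketch in plain Python; the helper names are illustrative choices and the code works in units where velocities are given as $v/c$.

```python
import math

# Minkowski metric eta_{mu nu} = diag(-1, 1, 1, 1), as in equation (17).
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(a):
    """Transpose of a matrix stored as nested lists."""
    return [list(row) for row in zip(*a)]

def boost_x(v_over_c):
    """Boost along x, equation (11), with rapidity phi = atanh(v/c)."""
    phi = math.atanh(v_over_c)
    ch, sh = math.cosh(phi), math.sinh(phi)
    return [[ch, -sh, 0.0, 0.0],
            [-sh, ch, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_xy(angle):
    """Rotation in the x-y plane, equation (13)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, s, 0.0],
            [0.0, -s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def preserves_metric(lam, tol=1e-12):
    """Check eta = Lambda^T eta Lambda, the matrix form of (16)."""
    lhs = matmul(matmul(transpose(lam), ETA), lam)
    return all(abs(lhs[i][j] - ETA[i][j]) < tol
               for i in range(4) for j in range(4))

boost = boost_x(0.6)
rotation = rotation_xy(0.3)
combined = matmul(boost, rotation)
```

Both `boost` and `rotation` pass `preserves_metric`, and so does their product `combined`, as expected for elements of a group. A uniform rescaling of all coordinates does not.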

The distance (14) which is invariant under Euclidean rotations is strictly positive for non-zero vectors. However, the Lorentz invariant norm (15) can have either sign and can also be equal to zero for certain non-zero spacetime vectors. For any vector $V^\mu$ with a negative norm $|V|^2=\eta_{\mu\nu}V^\mu V^\nu=-l^2$ one can find a frame in which the vector has coordinates

$$V^\mu=\begin{pmatrix}l\\0\\0\\0\end{pmatrix}, \tag{18}$$

after performing a Lorentz transformation. In other words, there is always a frame in which a negative norm vector points only in the time direction. Similarly, for any vector $V^\mu$ with positive norm $|V|^2=\eta_{\mu\nu}V^\mu V^\nu=l^2$, we can find a frame in which it takes the form

$$V^\mu=\begin{pmatrix}0\\l\\0\\0\end{pmatrix}, \tag{19}$$

and points in a spatial direction. Lastly, for a zero norm vector we can always find a frame in which

$$V^\mu=\begin{pmatrix}l\\l\\0\\0\end{pmatrix}, \tag{20}$$

pointing along the trajectory of a light ray. For this reason we call positive norm vectors space-like, negative norm ones time-like and non-trivial zero norm vectors light-like.
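The classification above translates directly into code. The sketch below assumes components are given as $(V^0,V^1,V^2,V^3)$ with the metric signature of (17); the function names and the numerical tolerance are illustrative choices.

```python
def minkowski_norm(v):
    """Return eta_{mu nu} V^mu V^nu for the metric diag(-1, 1, 1, 1)."""
    return -v[0] ** 2 + v[1] ** 2 + v[2] ** 2 + v[3] ** 2

def classify(v, tol=1e-12):
    """Classify a four-vector by the sign of its Lorentz invariant norm."""
    if all(abs(component) <= tol for component in v):
        return "zero"  # the trivial vector is excluded from the light-like class
    norm = minkowski_norm(v)
    if norm < -tol:
        return "time-like"
    if norm > tol:
        return "space-like"
    return "light-like"

# The representative vectors (18), (19) and (20) with l = 2.
examples = {
    "eq18": classify((2.0, 0.0, 0.0, 0.0)),
    "eq19": classify((0.0, 2.0, 0.0, 0.0)),
    "eq20": classify((2.0, 2.0, 0.0, 0.0)),
}
```

Because the norm is Lorentz invariant, the label returned by `classify` does not depend on the frame in which the components are supplied.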

Now that we have constructed an invariant distance between two points, we might wonder about an appropriate definition for the invariant length $L[x^\mu]$ of a spacetime curve $x^\mu(\lambda)$ with parameter $\lambda$. In fact, it has to be a functional of the form

$$L[x^\mu]=\int_{\lambda_i}^{\lambda_f}d\lambda\;g(V^\mu(\lambda)), \tag{21}$$

built from its tangent vector,

$$V^\mu=\dot{x}^\mu(\lambda)\equiv\frac{d}{d\lambda}x^\mu(\lambda). \tag{22}$$

To fix the function $g$, we impose two important geometrical restrictions on our definition of $L[x^\mu]$, which has to be invariant under,

  • Lorentz transformations $x'^\mu(\lambda)=\Lambda^\mu{}_\nu\,x^\nu(\lambda)$,

  • curve reparametrisations of the form $\lambda=f(\tau)$ with $f$ a monotonically increasing function.

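The first requirement is trivially satisfied by making the function $g$ only a function of the norm of the tangent vector, i.e. $g(V^\mu)=h(\eta_{\mu\nu}V^\mu V^\nu)$. On the other hand, under a curve reparametrisation equation (21) gives

$$L[x^\mu]=\int_{\tau_i}^{\tau_f}d\tau\;\dot{f}(\tau)\,h\!\left((\dot{f}(\tau))^{-2}\,\eta_{\mu\nu}\,\frac{d}{d\tau}x^\mu(f(\tau))\,\frac{d}{d\tau}x^\nu(f(\tau))\right), \tag{23}$$

where $\lambda_{i,f}=f(\tau_{i,f})$. However, if the definition (21) is parametrisation independent, the above should also be equal to,

$$L[x^\mu]=\int_{\tau_i}^{\tau_f}d\tau\;h\!\left(\eta_{\mu\nu}\,\frac{d}{d\tau}x^\mu(f(\tau))\,\frac{d}{d\tau}x^\nu(f(\tau))\right). \tag{24}$$

This is satisfied by the choice $h(\eta_{\mu\nu}V^\mu V^\nu)=\sqrt{|\eta_{\mu\nu}V^\mu V^\nu|}$ and therefore,

$$L[x^\mu]=\int_{\lambda_i}^{\lambda_f}d\lambda\,\sqrt{|\eta_{\mu\nu}V^\mu V^\nu|}. \tag{25}$$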
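The reparametrisation invariance of (25) can be checked numerically. The sketch below is only an illustration under stated assumptions: the curve $x^\mu(\lambda)=(\lambda,\tfrac12\sin\lambda,0,0)$, the reparametrisation $f(\tau)=\tau+\tau^2$ and the simple midpoint quadrature with finite-difference tangents are all choices made here, not taken from the notes.

```python
import math

def curve(lam):
    """Illustrative time-like curve x^mu(lambda) = (lambda, sin(lambda)/2, 0, 0)."""
    return (lam, 0.5 * math.sin(lam), 0.0, 0.0)

def length(x_of, a, b, n=20000, h=1e-6):
    """Approximate equation (25) with the midpoint rule.

    The tangent vector is estimated by central finite differences of the
    curve x_of with respect to whatever parameter it is given in.
    """
    step = (b - a) / n
    total = 0.0
    for k in range(n):
        s = a + (k + 0.5) * step
        plus, minus = x_of(s + h), x_of(s - h)
        v = [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]
        norm = -v[0] ** 2 + v[1] ** 2 + v[2] ** 2 + v[3] ** 2
        total += math.sqrt(abs(norm)) * step
    return total

def f(tau):
    """Monotonically increasing reparametrisation with f(0) = 0 and f(1) = 2."""
    return tau + tau ** 2

# Same curve, described once by lambda in [0, 2] and once by tau in [0, 1].
L_lambda = length(curve, 0.0, 2.0)
L_tau = length(lambda tau: curve(f(tau)), 0.0, 1.0)
```

The two numbers agree to within the accuracy of the quadrature, even though the tangent vectors in the two parametrisations differ by the factor $\dot f(\tau)$ at every point.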

Finally, we would like to discuss the relativistic free particle which is supposed to move on a straight line in space time. We can give two different definitions for what we mean by a straight line:

  • A straight line is a curve $x^\mu(\lambda)$ whose tangent vector $\dot{x}^\mu(\lambda)$ remains constant and therefore $\ddot{x}^\mu(\lambda)=0$. This is a local definition. (In fact, we will later see that this definition is not the most general.)

  • A straight line $x^\mu(\lambda)$ extremises the functional $L[x^\mu]$. This is a global definition.

One can easily show that the curve $x^\mu(\lambda)=V^\mu\lambda+\beta^\mu$ with $V^\mu$ and $\beta^\mu$ constant vectors satisfies both conditions and is therefore a straight line. This is precisely what a free relativistic particle wants to do: it will move on a straight line.

According to Einstein’s theory of gravity, the Minkowski spacetime that we discussed in this section represents empty space. Einstein’s elegant answers to our wish list boil down to the following:

  1. There is no gravitational force; all particles move on straight lines.

  2. Matter changes the geometry of spacetime, changing it from Minkowski to something with curvature.

The aim of the rest of the course is to make sense of the above two statements.

2 Elements of Differential Geometry

In order to discuss curved spacetimes and the physics of fields on them, we will have to introduce certain aspects of differential geometry. In the following subsections we will discuss manifolds, which are relevant to the spacetimes themselves. In order to describe the geometry of our manifolds and the physical fields which live on them, we will also need to introduce vectors, 1-forms and higher rank tensors. The next important ingredient to understand curvature and physical laws is differentiation in curved spacetimes. Finally, we will discuss the spacetime metric and curvature via the Riemann tensor, which is the main object in Einstein’s General Relativity.

2.1 Manifolds and functions

A manifold is the minimal structure that we will need in order to describe our spacetimes in terms of coordinate systems. Before defining what a manifold is, we will need a notion of coordinate systems, which we define according to,

Definition 2.1.

Given a topological space $M$ and an open subset $U\subseteq M$, a coordinate system or chart on $U$ is a pair $(U,\phi)$ with $\phi$ a “1-1” map $\phi:U\to\mathbb{R}^n$. The integer $n$ is called the dimension of $U$.

A coordinate system allows us to describe the points of an open set by uniquely assigning a set of numbers to each one of them, i.e. for a point $p\in U$ we can write $\phi(p)=(x^1,x^2,\ldots,x^n)$. Given a function $f:U\to\mathbb{R}$, we can consider the composition $f\circ\phi^{-1}:\mathbb{R}^n\to\mathbb{R}$ and ask whether it is differentiable in the standard sense of analysis in many variables. This clearly depends on the chart and the properties of the function $\phi$. In order to make the question of differentiability more geometrical, we therefore need to impose certain restrictions on the coordinate systems we are allowed to consider, so that differentiability of functions becomes coordinate system independent. This leads to the idea of a differentiable manifold,

Definition 2.2.

A differentiable manifold of dimension $n$ is a topological space $M$ (strictly speaking Hausdorff and second countable, but we won’t need too much of this information in our course) with a collection of charts $(U_\alpha,\phi_\alpha)$ such that:

  1. The union of all the open sets in the collection of charts covers the set $M$, i.e. $\bigcup_\alpha U_\alpha=M$.

  2. If $U_\alpha\cap U_\beta\neq\emptyset$ then the function $\phi_\alpha\circ\phi_\beta^{-1}:\mathbb{R}^n\to\mathbb{R}^n$ is infinitely differentiable.

  3. The collection of all charts $(U_\alpha,\phi_\alpha)$ is maximal.

Notice that the number of dimensions $n$ does not change as we move around our manifold. The first condition makes sure that for any point $p\in M$, we can find a coordinate system that describes it along with its immediate topological neighbourhood. The second condition gives meaning to functions which are differentiable at a point $p\in M$ without any reference to a coordinate system. To see this, consider a point $p\in U_\delta\subset M$ and suppose that the function $f:M\to\mathbb{R}$ is differentiable with respect to a specific chart $(U_\delta,\phi_\delta)$ at $p$, i.e. the function $(f\circ\phi_\delta^{-1})(x^\mu)$ is differentiable at $x^\mu=\phi_\delta(p)$. We can now consider a different coordinate system $(U_\epsilon,\phi_\epsilon)$ with $p\in U_\epsilon$ and wonder about the differentiability of $(f\circ\phi_\epsilon^{-1})(y^\mu)$ at $y^\mu=\phi_\epsilon(p)$, which is the same point from the point of view of $M$. To answer this we can write $f\circ\phi_\epsilon^{-1}=(f\circ\phi_\delta^{-1})\circ(\phi_\delta\circ\phi_\epsilon^{-1})$ and also note that $f\circ\phi_\delta^{-1}:\mathbb{R}^n\to\mathbb{R}$ while $\phi_\delta\circ\phi_\epsilon^{-1}:\mathbb{R}^n\to\mathbb{R}^n$. We therefore have the composition of two functions, of which the first is differentiable by hypothesis while the second is differentiable by the second item in definition 2.2, so the composition is differentiable as well.

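Even though the notion of differentiability is coordinate system invariant on a differentiable manifold, the partial derivatives of a function are themselves coordinate dependent. Suppose that we have two coordinate systems $(U_1,\phi_1)$ and $(U_2,\phi_2)$ and a point $p\in U_1\cap U_2$. We will call the coordinates $\phi_1(p)=\{x^\mu\}$ and the coordinates $\phi_2(p)=\{y^\mu\}$. If $f$ is a differentiable function at $p$, we can also construct the functions $\hat{f}=f\circ\phi_1^{-1}$ and $\tilde{f}=f\circ\phi_2^{-1}$. By using the identity $f\circ\phi_2^{-1}=f\circ\phi_1^{-1}\circ\phi_1\circ\phi_2^{-1}$ we can write that $\tilde{f}(y^\mu)=\hat{f}(x^\nu(y^\mu))$, giving the relation

$$\frac{\partial}{\partial y^i}\tilde{f}(y^\mu)=\sum_j\frac{\partial x^j}{\partial y^i}\,\frac{\partial}{\partial x^j}\hat{f}(x^\nu(y^\mu)), \tag{26}$$

where we used the chain rule for partial derivatives. The above is true for any differentiable function $f$ giving,

$$\frac{\partial}{\partial y^i}=\frac{\partial x^j}{\partial y^i}\,\frac{\partial}{\partial x^j}, \tag{27}$$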

where we used Einstein’s convention for repeated indices. The transformation (27) is an important property of how partial derivatives transform under coordinate transformations and will show up frequently in our discussions.
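The transformation rule (27) is easy to test numerically. The sketch below assumes the plane with Cartesian coordinates $x^\mu=(x,y)$ and polar coordinates $y^\mu=(r,\theta)$, and an arbitrary test function $\hat f=x^2y$; these choices and the helper names are illustrative only.

```python
import math

def f_hat(x, y):
    """Test function expressed in Cartesian coordinates."""
    return x * x * y

def f_tilde(r, theta):
    """The same function expressed in polar coordinates."""
    return f_hat(r * math.cos(theta), r * math.sin(theta))

def partial(func, args, i, h=1e-6):
    """Central finite-difference estimate of the i-th partial derivative."""
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (func(*up) - func(*down)) / (2.0 * h)

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)

# dx_dy[i][j] = d x^j / d y^i with y^i = (r, theta) and x^j = (x, y).
dx_dy = [[math.cos(theta), math.sin(theta)],
         [-r * math.sin(theta), r * math.cos(theta)]]

# Left-hand side of (26): derivatives taken directly in polar coordinates.
lhs = [partial(f_tilde, (r, theta), i) for i in range(2)]

# Right-hand side of (26): Cartesian derivatives contracted with the Jacobian.
grad_x = [partial(f_hat, (x, y), j) for j in range(2)]
rhs = [sum(dx_dy[i][j] * grad_x[j] for j in range(2)) for i in range(2)]
```

The entries of `lhs` and `rhs` agree up to finite-difference error, for any choice of the point and of the test function.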

2.2 Vectors and 1-forms

In the previous subsection we discussed functions and their partial derivatives. In this section we will introduce the notion of vectors on a general manifold. In order to define vectors we will have to think of them as being defined pointwise. The case you might be most familiar with is flat space, including Minkowski space. In flat space, vectors can be defined in a much more straightforward way since the tangent vector space at each point is isomorphic to the space itself. On a more general manifold $M$, we can define a vector $V$ at a point $p\in M$ according to:

Definition 2.3.

A vector $V$ is a linear map $V:C^\infty(p)\to\mathbb{R}$ which satisfies the Leibniz rule of differentiation. As such, for any two real numbers $a,b\in\mathbb{R}$ and differentiable functions $f,g:M\to\mathbb{R}$ we must have,

  1. $V(af+bg)(p)=a\,V(f)(p)+b\,V(g)(p)$

  2. $V(fg)(p)=f(p)\,V(g)(p)+g(p)\,V(f)(p)$.

Moreover, at any point $p\in M$ we can define the addition of two vectors $V$ and $W$ according to

$$(V+W)(g)=V(g)+W(g) \tag{28}$$

for any function $g$. It is relatively straightforward to check that the set of all vectors at a point $p\in M$ forms a linear space over the real numbers, the tangent space $T_p$.

At this point all this might seem too abstract and an example might be helpful. Given a curve $\gamma:\mathbb{R}\to M$ with parameter $\lambda$ and a point $\gamma(\lambda_0)$, a natural object to define is its tangent vector $V_p$. According to definition 2.3 we need to define it by giving the image of each differentiable function $f$ at $\gamma(\lambda_0)$ under $V_p$.

Example 2.1.

The tangent vector $V\in T_p$ of a curve $\gamma(\lambda)$ at a point $p=\gamma(\lambda_0)$ is defined so that for any $f\in C^1(p)$,

$$V(f)=\frac{d}{d\lambda}(f\circ\gamma)(\lambda_0). \tag{29}$$

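Notice that this definition is coordinate independent and the vector is really a geometric object. We can do a bit more if we give a coordinate system $(U,\phi)$ such that $p\in U$ and write $x^\mu(\lambda)=(\phi\circ\gamma)(\lambda)$ for the coordinates of the curve. We can write

$$\begin{aligned}
V(f)&=\frac{d}{d\lambda}\Big((f\circ\phi^{-1})\circ(\phi\circ\gamma)\Big)(\lambda_0)=\left.\frac{dx^\mu}{d\lambda}\right|_{\lambda=\lambda_0}\left.\frac{\partial f}{\partial x^\mu}\right|_{x^\mu=x^\mu(\lambda_0)}\\
&=\left.\left(\left.\frac{dx^\mu}{d\lambda}\right|_{\lambda=\lambda_0}\partial_\mu\right)f\,\right|_{x^\mu=x^\mu(\lambda_0)},
\end{aligned} \tag{30}$$

for any function $f$. We see that, given a coordinate system, the tangent vector is expressed as a linear combination of partial derivatives,

$$V=V^\mu(p)\,\partial_\mu,\qquad V^\mu(p)=\left.\frac{dx^\mu}{d\lambda}\right|_{\lambda=\lambda_0}. \tag{31}$$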

It is worth comparing this result to what we had in Minkowski space in equation (22).

This result is much more general: any linear combination of the form $V=b^\mu\partial_\mu$ is a vector as it satisfies all the properties of definition 2.3. What is not obvious but very important is that, given a specific coordinate system, its partial derivatives form a basis for $T_p$, the coordinate basis $\{\partial_\mu\}$. Therefore, the dimension of $T_p$ is equal to the dimension of the manifold $M$, i.e. $\dim T_p=\dim M=n$.

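Suppose that $p\in U_1\cap U_2$ and that $(U_1,\phi_1)$, $(U_2,\phi_2)$ are two coordinate systems with $\{x^\mu\}=\phi_1(p)$ and $\{y^\mu\}=\phi_2(p)$. The corresponding bases for $T_p$ are $\{\partial/\partial x^\nu\}$ and $\{\partial/\partial y^\mu\}$. Equation (27) is telling us precisely how the basis of partial derivatives $\{\partial_\mu\}$ transforms under a coordinate transformation

$$\{y^\mu\}=\{y^\mu(x^\nu)\}\equiv(\phi_2\circ\phi_1^{-1})(\{x^\nu\}).$$

The basic requirement is that the vector $V$ is geometric and independent of the basis we write it in. This suggests that the components of the vector have to be such that

$$V=V^i\frac{\partial}{\partial x^i}=V'^j\frac{\partial}{\partial y^j}. \tag{32}$$

However, according to the transformation rule (27), with the roles of $x$ and $y$ exchanged, we can write

$$V=V^i\frac{\partial}{\partial x^i}=V^i\,\frac{\partial y^j}{\partial x^i}\,\frac{\partial}{\partial y^j}, \tag{33}$$

and by comparison to the previous equation we have that the components of a vector transform according to

$$V'^j=\frac{\partial y^j}{\partial x^i}\,V^i. \tag{34}$$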

So far, we have discussed the coordinate basis $\{\partial_\mu\}$ of the tangent space $T_p$. As for any $n$ dimensional vector space, there are more general bases $\{e_\nu\}$ that we can use. This is again the statement that vectors are geometric objects, independent of the basis we use to describe them.

We will now move forward with the definition of the cotangent vectors,

Definition 2.4.

The space of all linear functionals $\omega:T_p\to\mathbb{R}$ is called the cotangent space $T_p^*$. If $V,W\in T_p$ and $a,b\in\mathbb{R}$ then

$$\omega(aV+bW)=a\,\omega(V)+b\,\omega(W). \tag{35}$$

In linear algebra terms, the cotangent space $T_p^*$ is the dual space of the tangent space $T_p$. The dual space is itself a vector space under the addition rule

$$(a\,\omega+b\,\psi)(V)=a\,\omega(V)+b\,\psi(V), \tag{36}$$

for all $a,b\in\mathbb{R}$, vectors $V\in T_p$ and 1-forms $\omega,\psi\in T_p^*$. As a theorem from linear algebra we know that $\dim T_p^*=\dim T_p$, suggesting that a basis $\{\theta^\mu\}$ will consist of $n$ 1-forms which we can use to write any 1-form as a linear combination $\omega=\omega_\mu\theta^\mu$ with $\omega_\mu$ the components of the 1-form $\omega$.

Definition 2.5.

Given a basis $\{e_\mu\}$ for the tangent space $T_p$, there is a special basis $\{\theta^\mu\}$ of $T_p^*$, the dual basis, that we can always choose such that $\theta^\mu(e_\nu)=\delta^\mu_\nu$, with $\delta^\mu_\nu$ being Kronecker’s delta. For the case where we have a coordinate basis and $e_\nu=\partial/\partial x^\nu$, the notation for the dual basis is $\theta^\mu=dx^\mu$.

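The dual basis is very convenient to work with since if $V=V^\nu e_\nu$ and $\omega=\omega_\mu\theta^\mu$ we have,

$$\omega(V)=\omega(V^\nu e_\nu)=V^\nu\,\omega(e_\nu)=V^\nu\omega_\mu\,\theta^\mu(e_\nu)=V^\nu\omega_\mu\,\delta^\mu_\nu=V^\nu\omega_\nu. \tag{37}$$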

As we have seen, the dual basis $\{dx^\mu\}$ is tied to a coordinate basis $\{\partial/\partial x^\nu\}$ and the latter transforms according to (27). It is natural to ask how the dual basis transforms, as well as the components of a 1-form, under coordinate transformations. This will be fixed by the requirement that the number $\omega(V)$ has to be independent of the coordinate system, for any $V\in T_p$. In equation (32) we have seen how to write a vector $V$ in different coordinate systems. Correspondingly for the 1-form $\omega$, we can write

$$\omega=\omega_\nu\,dx^\nu=\omega'_\mu\,dy^\mu. \tag{38}$$

Based on the above expressions and also (37) we can write

$$\omega(V)=\omega_\nu V^\nu=\omega'_\mu V'^\mu=\omega'_\mu\,\frac{\partial y^\mu}{\partial x^\nu}\,V^\nu, \tag{39}$$

where we used the transformation rule (34). Since the above has to be true for any vector $V$, we conclude that we must have,

$$\omega'_\mu=\frac{\partial x^\nu}{\partial y^\mu}\,\omega_\nu. \tag{40}$$

Moreover, equation (38) implies that the dual basis vectors have to transform according to

$$dy^\mu=\frac{\partial y^\mu}{\partial x^\nu}\,dx^\nu. \tag{41}$$
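The rules (34) and (40) can be checked against each other numerically: vector components transform with the Jacobian, 1-form components with its inverse, and the pairing $\omega(V)$ of (37) is unchanged. The sketch below assumes a change from Cartesian to polar coordinates in the plane at one sample point; the component values are arbitrary illustrative numbers.

```python
import math

# Sample point in Cartesian coordinates (the x^mu of the text).
x_cart, y_cart = 3.0, 4.0
r = math.hypot(x_cart, y_cart)

# dy_dx[j][i] = d y^j / d x^i for the new coordinates y^j = (r, theta).
dy_dx = [[x_cart / r, y_cart / r],
         [-y_cart / r ** 2, x_cart / r ** 2]]

# dx_dy[n][m] = d x^n / d y^m, the inverse Jacobian.
dx_dy = [[x_cart / r, -y_cart],
         [y_cart / r, x_cart]]

V = [1.5, -2.0]      # vector components V^i in the Cartesian basis
omega = [0.5, 3.0]   # 1-form components omega_nu in the dual basis

# Equation (34): V'^j = (dy^j/dx^i) V^i.
V_new = [sum(dy_dx[j][i] * V[i] for i in range(2)) for j in range(2)]

# Equation (40): omega'_mu = (dx^nu/dy^mu) omega_nu.
omega_new = [sum(dx_dy[n][m] * omega[n] for n in range(2)) for m in range(2)]

pairing_old = sum(o * v for o, v in zip(omega, V))
pairing_new = sum(o * v for o, v in zip(omega_new, V_new))
```

The individual components change, but `pairing_old` and `pairing_new` coincide, which is the content of equation (39).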

We will conclude this section with an observation that will allow us to define higher rank tensors. As we saw, the 1-forms are the dual vectors of the tangent vectors. One might wonder what would happen if we now tried to define the dual $T_p^{**}$ of the dual vector space $T_p^*$. The simple answer is that $T_p^{**}$ is isomorphic to $T_p$ and we wouldn’t gain anything from doing that.

However, this is telling us something important about what follows in the next section. To see how this works we need to discuss the isomorphism $\mathcal{I}:T_p\to T_p^{**}$. We need to find a map that takes a vector $V$ and maps it to a functional of 1-forms $\mathcal{I}(V)$. This can be simply defined through,

$$\mathcal{I}(V)(\omega)=\omega(V), \tag{42}$$

for any $\omega\in T_p^*$. One can show that the converse is also true, i.e. for any $Q\in T_p^{**}$ one can find a $V\in T_p$ such that $Q=\mathcal{I}(V)$ and therefore the map is bijective. From now on we will be using the notation,

$$V(\omega)\equiv\mathcal{I}(V)(\omega), \tag{43}$$

showing that we can write

$$\omega(V)=V(\omega), \tag{44}$$

in a meaningful way. For the basis vectors in particular we can also write

$$e_\nu(\theta^\mu)=\theta^\mu(e_\nu)=\delta^\mu_\nu. \tag{45}$$

2.3 Higher rank tensors

In the previous section we discussed vectors and 1-forms. One of the striking conclusions was that we can see 1-forms as functionals of vectors and vectors as functionals of 1-forms! We might wonder whether there are more general structures following a similar logic. The answer is that vectors and 1-forms are special kinds of tensors, of type $\binom{1}{0}$ and $\binom{0}{1}$ respectively. More generally we can define,

Definition 2.6.

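An $\binom{r}{s}$ tensor is a map $T:(T_p^*)^r\times(T_p)^s\to\mathbb{R}$ which is linear in all of its arguments, i.e. it is multilinear. More specifically, it takes $r$ 1-forms and $s$ vectors as arguments and returns a real number.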

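Given the basis $\{e_\mu\}$ for $T_p$ and $\{\theta^\mu\}$ for $T_p^*$ we can write the numbers,

$$T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}=T(\theta^{\mu_1},\ldots,\theta^{\mu_r},e_{\nu_1},\ldots,e_{\nu_s}), \tag{46}$$

and as we will see these will be the components of our tensor. To appreciate their importance, consider a general set of $s$ vectors $V_i\in T_p$, $i=1,\ldots,s$ and $r$ 1-forms $\omega_j\in T_p^*$, $j=1,\ldots,r$ written in terms of our basis,

$$V_i=(V_i)^\mu e_\mu,\qquad\omega_j=(\omega_j)_\mu\theta^\mu. \tag{47}$$

We now plug them in as arguments to our $\binom{r}{s}$ tensor $T$ to obtain,

$$\begin{aligned}
T(\omega_1,\ldots,\omega_r,V_1,\ldots,V_s)&=T\big((\omega_1)_{\mu_1}\theta^{\mu_1},\ldots,(\omega_r)_{\mu_r}\theta^{\mu_r},(V_1)^{\nu_1}e_{\nu_1},\ldots,(V_s)^{\nu_s}e_{\nu_s}\big)\\
&=(\omega_1)_{\mu_1}\cdots(\omega_r)_{\mu_r}(V_1)^{\nu_1}\cdots(V_s)^{\nu_s}\,T(\theta^{\mu_1},\ldots,\theta^{\mu_r},e_{\nu_1},\ldots,e_{\nu_s})\\
&=(\omega_1)_{\mu_1}\cdots(\omega_r)_{\mu_r}(V_1)^{\nu_1}\cdots(V_s)^{\nu_s}\,T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s},
\end{aligned} \tag{48}$$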

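where in the second line we used the fact that $T$ is linear in all of its arguments. The above shows that if we know the numbers $T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}$, we know all the information about the tensor $T$ we need to evaluate it on any set of 1-forms and vectors. Using the property (45) we can write,

$$T=T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}\;e_{\mu_1}\otimes\cdots\otimes e_{\mu_r}\otimes\theta^{\nu_1}\otimes\cdots\otimes\theta^{\nu_s}, \tag{49}$$

since the above expression will give the same result as equation (48) after plugging in the arguments,

$$T(\omega_1,\ldots,\omega_r,V_1,\ldots,V_s)=T(\omega_1\otimes\cdots\otimes\omega_r\otimes V_1\otimes\cdots\otimes V_s). \tag{50}$$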

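The above considerations apply for any choice of basis $\{e_\nu\}$ and its dual $\{\theta^\mu\}$. We will now specialise to a coordinate basis $\{\partial/\partial x^\nu\}$ and its dual $\{dx^\mu\}$ to ask how the components of the tensor transform under a coordinate transformation $x^\mu=x^\mu(y^\nu)$. We can write,

$$\begin{aligned}
T&=T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}\;\frac{\partial}{\partial x^{\mu_1}}\otimes\cdots\otimes\frac{\partial}{\partial x^{\mu_r}}\otimes dx^{\nu_1}\otimes\cdots\otimes dx^{\nu_s}\\
&=T'^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}\;\frac{\partial}{\partial y^{\mu_1}}\otimes\cdots\otimes\frac{\partial}{\partial y^{\mu_r}}\otimes dy^{\nu_1}\otimes\cdots\otimes dy^{\nu_s}
\end{aligned} \tag{51}$$

and after using the transformation rules (27) and (41) we obtain,

$$T'^{\mu'_1\ldots\mu'_r}{}_{\nu'_1\ldots\nu'_s}=\prod_{i=1}^{r}\frac{\partial y^{\mu'_i}}{\partial x^{\mu_i}}\;\prod_{j=1}^{s}\frac{\partial x^{\nu_j}}{\partial y^{\nu'_j}}\;T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}. \tag{52}$$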

We can now introduce a couple of operations that we can do with tensors to produce new ones. The first one is the tensor product:

Definition 2.7.

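Given a $\binom{p}{q}$ tensor $S$ and an $\binom{r}{s}$ tensor $W$, we can define a $\binom{p+r}{q+s}$ tensor $T=S\otimes W$. Given $q+s$ vectors $V_i\in T_p$, $i=1,\ldots,q+s$ and $p+r$ 1-forms $\omega_j\in T_p^*$, $j=1,\ldots,p+r$ the tensor product $T$ takes the value,

$$\begin{aligned}
&T(\omega_1,\ldots,\omega_{p+r},V_1,\ldots,V_{q+s})=\\
&\qquad S(\omega_1,\ldots,\omega_p,V_1,\ldots,V_q)\;W(\omega_{p+1},\ldots,\omega_{p+r},V_{q+1},\ldots,V_{q+s}).
\end{aligned} \tag{53}$$

One can easily check that the higher rank object we construct in this way is indeed a tensor since it satisfies definition 2.6. In terms of components, we can write

$$T^{\mu_1\ldots\mu_{p+r}}{}_{\nu_1\ldots\nu_{q+s}}=S^{\mu_1\ldots\mu_p}{}_{\nu_1\ldots\nu_q}\;W^{\mu_{p+1}\ldots\mu_{p+r}}{}_{\nu_{q+1}\ldots\nu_{q+s}}. \tag{54}$$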

The second operation we can define produces tensors of lower rank and is called a contraction,

Definition 2.8.

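Given a rank $\binom{p}{q}$ tensor $T$ we can construct a $\binom{p-1}{q-1}$ tensor $S$ according to

$$S(\omega_1,\ldots,\omega_{p-1},V_1,\ldots,V_{q-1})=T(\omega_1,\ldots,\theta^{(\mu)},\ldots,\omega_{p-1},V_1,\ldots,e_{(\mu)},\ldots,V_{q-1}), \tag{55}$$

where the index $(\mu)$ is being summed over and $\{\theta^{(\mu)}\}$ is the dual basis of $\{e_{(\nu)}\}$. Notice that we have $pq$ different ways of performing the contraction, producing $pq$ different tensors $S$.

In terms of components, starting from a $\binom{2}{2}$ tensor $T^{\alpha\beta}{}_{\mu\nu}$ we can produce the four inequivalent $\binom{1}{1}$ tensors

$$T^{\lambda\alpha}{}_{\lambda\mu},\qquad T^{\alpha\lambda}{}_{\lambda\mu},\qquad T^{\alpha\lambda}{}_{\mu\lambda},\qquad T^{\lambda\alpha}{}_{\mu\lambda}. \tag{56}$$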
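The component formulas (54) and (56) can be illustrated with nested lists. The sketch below assumes two $\binom{1}{1}$ tensors in two dimensions with arbitrary illustrative components, stored as `S[a][c]` for $S^a{}_c$; the function names are choices made here.

```python
def tensor_product(S, W):
    """Components T^{ab}_{cd} = S^a_c W^b_d of the (2,2) tensor S (x) W, eq. (54).

    The result is stored as T[a][b][c][d].
    """
    n = len(S)
    return [[[[S[a][c] * W[b][d] for d in range(n)]
              for c in range(n)]
             for b in range(n)]
            for a in range(n)]

def contract_up1_down1(T):
    """The contraction T^{lambda alpha}_{lambda mu} of equation (56)."""
    n = len(T)
    return [[sum(T[l][a][l][m] for l in range(n)) for m in range(n)]
            for a in range(n)]

def contract_up2_down1(T):
    """The contraction T^{alpha lambda}_{lambda mu} of equation (56)."""
    n = len(T)
    return [[sum(T[a][l][l][m] for l in range(n)) for m in range(n)]
            for a in range(n)]

S = [[1, 2], [3, 4]]
W = [[0, 1], [1, 0]]
T = tensor_product(S, W)

first = contract_up1_down1(T)    # equals trace(S) * W
second = contract_up2_down1(T)   # equals the matrix product S W
```

For this product tensor the first contraction is the trace of `S` times `W`, while the second is the matrix product of `S` and `W`, which shows explicitly that different index choices in (55) give different tensors.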

2.4 Differentiation

From a physics point of view, all classical physical laws are expressed as differential equations. It is crucial to understand differentiation on a differentiable manifold, in a way that is independent of coordinate systems. In this sense, a derivative should map tensors to tensors of different rank.

2.4.1 Exterior differentiation of p-forms

Let’s first consider a 1-form field $V_\mu$ described in a particular coordinate system $\{x^\nu\}$. It is indeed very tempting to try and define a tensor whose components are simply given by the partial derivatives $\partial_{x^\nu}V_\mu$. We are certainly allowed to do this but the question is whether this object can be promoted to a $\binom{0}{2}$ tensor. If this definition was really independent of the coordinate system, then someone using the coordinate system $\{y^\mu=y^\mu(x^\nu)\}$ would have to write the “components” $\partial_{y^\lambda}V'_\rho$ with $V'_\rho$ the components of the original 1-form in their coordinate system.

Using the transformation rules (27) and (40) we have,

$$\partial_{y^\lambda}V'_\rho=\frac{\partial x^\nu}{\partial y^\lambda}\frac{\partial x^\sigma}{\partial y^\rho}\,\partial_{x^\nu}V_\sigma+V_\sigma\,\frac{\partial^2x^\sigma}{\partial y^\lambda\,\partial y^\rho}. \tag{57}$$

The last term in the equation above spoils the transformation rule (52) for $r=0$ and $s=2$ and therefore we don’t have a tensor. Notice that the “naughty” term is symmetric in the free indices $\lambda$ and $\rho$. Therefore, if we instead consider the object with the two indices antisymmetrised,

$$(dV)_{\nu\mu}\equiv2\,\partial_{[x^\nu}V_{\mu]}=\partial_{x^\nu}V_\mu-\partial_{x^\mu}V_\nu, \tag{58}$$

the symmetric term in the transformation will drop out and we will be left with

$$\partial_{[y^\lambda}V'_{\rho]}=\frac{\partial x^\nu}{\partial y^\lambda}\frac{\partial x^\sigma}{\partial y^\rho}\,\partial_{[x^\nu}V_{\sigma]}, \tag{59}$$

which is exactly what we want in order to have a $\binom{0}{2}$ tensor. In writing the above, we have introduced the symbol $d$ which maps 1-forms to antisymmetric $\binom{0}{2}$ tensors.

The second, somewhat simpler, object we want to examine is the transformation rule of the partial derivatives $\partial_{x^\mu}f$ of a function $f$. We already know from the rule (27) that under a change of coordinates

$$\partial_{y^\mu}f=\frac{\partial x^\nu}{\partial y^\mu}\,\partial_{x^\nu}f \tag{60}$$

and this certainly agrees with the rule (52) for $r=0$ and $s=1$. In a given coordinate system, we can therefore think of the partial derivatives of a function as the components of a $\binom{0}{1}$ tensor or 1-form. Using the coordinate basis, we can write,

$$df=\partial_{x^\nu}f\;dx^\nu. \tag{61}$$

We have once again used the symbol d which maps functions to 1-forms.

We saw that the exterior derivative maps functions to 1-forms and 1-forms to a $\binom{0}{2}$ tensor which is antisymmetric in its indices. As we will see shortly, antisymmetric objects are special when partial differentiation is concerned. For this reason we define,

Definition 2.9.

A p-form is a tensor $w_{\mu_1\ldots\mu_p}$ of rank $\binom{0}{p}$ which is antisymmetric under the exchange of any two of its adjacent indices. For the set of all p-forms defined on the differentiable manifold $M$ we use the symbol $\Lambda^p(M)$.

Under this definition, the exterior derivative maps a 1-form to a 2-form. More generally, the exterior derivative of a p-form $w_{\mu_1\ldots\mu_p}$ is defined as

$$(dw)_{\mu_1\ldots\mu_{p+1}}=(p+1)\,\partial_{[\mu_1}w_{\mu_2\ldots\mu_{p+1}]}, \tag{62}$$

which is obviously antisymmetric in all of its indices due to the antisymmetrisation and is therefore a $(p+1)$-form. One can check that this object indeed transforms as a $\binom{0}{p+1}$ tensor under coordinate transformations. The above shows that we can think of the exterior derivative as a map $d:\Lambda^p(M)\to\Lambda^{p+1}(M)$. To make things unified, we can think of functions as 0-forms which are mapped to 1-forms by the exterior derivative.

A natural question to ask is what happens when we apply the exterior derivative on a p-form $w$ twice. The definition (62) is telling us that we would have to antisymmetrise two partial derivatives, which would give zero. To see why this is so, we can consider the second exterior derivative of a function,

$$(d(df))_{\mu\nu}=\partial_\mu(df)_\nu-\partial_\nu(df)_\mu=\partial_\mu\partial_\nu f-\partial_\nu\partial_\mu f=0, \tag{63}$$

since the partial derivatives of a smooth function commute with each other.
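As a quick worked illustration of (58), (61) and (63), consider two dimensions with coordinates $(x,y)$. The function $f=x^2y$ and the 1-form $V=-y\,dx+x\,dy$ below are arbitrary choices made for this example and do not appear elsewhere in the notes.

```latex
% Illustrative choices (assumptions of this example): f = x^2 y, V = -y dx + x dy.
\begin{aligned}
df &= 2xy\,dx + x^2\,dy, \\
(d(df))_{xy} &= \partial_x (x^2) - \partial_y (2xy) = 2x - 2x = 0, \\
(dV)_{xy} &= \partial_x V_y - \partial_y V_x = 1 - (-1) = 2 .
\end{aligned}
```

The 1-form $df$ has vanishing exterior derivative as required by (63), while $V$ does not, so $V$ cannot be written as $dg$ for any smooth function $g$.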

2.4.2 The covariant derivative

In the previous subsection we saw that we can define a derivative of p-forms by using only their partial derivatives. This is certainly important as we did not have to introduce any additional structure. However, we would still like to have a notion of differentiation for general tensors which are not necessarily forms. For this reason we will have to introduce the covariant derivative:

Definition 2.10.

A covariant derivative $\nabla$ is a map from $\binom{p}{q}$ tensors to $\binom{p}{q+1}$ tensors such that:

  1. $\nabla(T+S)=\nabla T+\nabla S$

  2. $\nabla(T\otimes S)=(\nabla T)\otimes S+T\otimes(\nabla S)$

  3. It commutes with contraction

  4. If $f$ is a function on $M$ then $\nabla f=df$

The above definition fixes the way that a covariant derivative acts on functions. As we will show shortly, its full action is completely determined by a set of numbers with three indices $\Gamma^\lambda_{\nu\mu}$. These are called the connection coefficients and are such that

$$\nabla e_{(\mu)}=\Gamma^\lambda_{\nu\mu}\;e_{(\lambda)}\otimes\theta^{(\nu)} \tag{64}$$

telling us precisely how the covariant derivative acts on our chosen basis of the tangent space $T_p$.

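In order to discuss the action of the covariant derivative on a general vector field $V$, we write it in terms of the basis vectors as $V=V^\mu e_{(\mu)}$ and we think of the components $V^\mu$ as functions. Using the first two properties along with the fourth one we can write

$$\nabla V=\nabla(V^\mu e_{(\mu)})=e_{(\mu)}\otimes dV^\mu+V^\mu\,\nabla e_{(\mu)}. \tag{65}$$

In a coordinate basis we know that we can write $dV^\mu=\partial_\nu V^\mu\,dx^\nu$ and after using the definition (64) we have

$$\begin{aligned}
\nabla V&=\partial_\nu V^\mu\;\partial_\mu\otimes dx^\nu+\Gamma^\lambda_{\nu\mu}V^\mu\;\partial_\lambda\otimes dx^\nu\\
&=\left(\partial_\nu V^\mu+\Gamma^\mu_{\nu\lambda}V^\lambda\right)\partial_\mu\otimes dx^\nu.
\end{aligned} \tag{66}$$

In the last equation we only renamed the repeated indices. From the above we can read off the components

$$\nabla_\mu V^\nu=\partial_\mu V^\nu+\Gamma^\nu_{\mu\lambda}V^\lambda, \tag{67}$$

in a coordinate basis.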

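It might not be immediately obvious but the above properties also fix the action of the covariant derivative on 1-forms. To see this, we consider the contraction $\phi=V^\nu\omega_\nu$ which is a scalar function and we know that

$$\nabla_\mu\phi=\partial_\mu\phi=\partial_\mu V^\nu\,\omega_\nu+V^\nu\,\partial_\mu\omega_\nu. \tag{68}$$

However, we know that the covariant derivative obeys the Leibniz rule and has to commute with contraction, so we can also write,

$$\begin{aligned}
\nabla_\mu\phi&=\nabla_\mu V^\nu\,\omega_\nu+V^\nu\,\nabla_\mu\omega_\nu\\
&=\left(\partial_\mu V^\nu+\Gamma^\nu_{\mu\lambda}V^\lambda\right)\omega_\nu+V^\nu\,\nabla_\mu\omega_\nu,
\end{aligned} \tag{69}$$

where we used equation (67). By simply comparing the two expressions for the covariant derivative of $\phi$ and by demanding that they hold for any vector $V$ we conclude that,

$$\nabla_\mu\omega_\nu=\partial_\mu\omega_\nu-\Gamma^\lambda_{\mu\nu}\,\omega_\lambda. \tag{70}$$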

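The next thing we would like to know is how the covariant derivative acts on the dual basis vectors $\theta^{(\mu)}$. To decide this, we revisit the covariant derivative of a general 1-form. This time we write it as

$$\begin{aligned}
\nabla\omega&=\nabla(\omega_\mu\theta^{(\mu)})=\theta^{(\mu)}\otimes\nabla\omega_\mu+\omega_\mu\,\nabla\theta^{(\mu)}\\
&=\partial_\nu\omega_\mu\;\theta^{(\mu)}\otimes\theta^{(\nu)}+\omega_\mu\,\nabla\theta^{(\mu)},
\end{aligned} \tag{71}$$

where, as in (65), the slot associated with the derivative is written last. By comparing the above with equation (70) we see that we must have,

$$\nabla\theta^{(\lambda)}=-\Gamma^\lambda_{\mu\nu}\;\theta^{(\nu)}\otimes\theta^{(\mu)}. \tag{72}$$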

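The above result allows us to write the covariant derivative of any tensor. By writing the tensor in terms of the basis as in (49) we find that,

$$ \begin{aligned} \nabla_{\lambda} T^{\mu_1\dots\mu_p}{}_{\nu_1\dots\nu_q} ={}& \partial_{\lambda} T^{\mu_1\dots\mu_p}{}_{\nu_1\dots\nu_q} + \Gamma^{\mu_1}_{\lambda\rho}\, T^{\rho\mu_2\dots\mu_p}{}_{\nu_1\dots\nu_q} + \dots + \Gamma^{\mu_p}_{\lambda\rho}\, T^{\mu_1\dots\rho}{}_{\nu_1\dots\nu_q} \\ & - \Gamma^{\rho}_{\lambda\nu_1}\, T^{\mu_1\dots\mu_p}{}_{\rho\dots\nu_q} - \dots - \Gamma^{\rho}_{\lambda\nu_q}\, T^{\mu_1\dots\mu_p}{}_{\nu_1\dots\rho}. \end{aligned} \tag{73} $$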

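When we defined the connection coefficients in equation (64), we chose not to call the symbol $\Gamma^{\lambda}_{\nu\mu}$ a tensor, despite the fact that it carries three indices. However, it does transform under a coordinate transformation and we want to know how. The basic requirement is that the covariant derivative of a vector transforms as a $\binom{1}{1}$ tensor. In a coordinate system $x'^{\mu}$ different from the one used in e.g. (67), with $\partial'_{\mu} = \partial/\partial x'^{\mu}$, we would write something similar

$$ \nabla'_{\mu}\tilde V^{\nu} = \partial'_{\mu}\tilde V^{\nu} + \tilde\Gamma^{\nu}_{\mu\lambda}\tilde V^{\lambda}. \tag{74} $$

By using the transformation rules (27) and (34) we can write

$$ \partial'_{\mu}\tilde V^{\nu} = \frac{\partial x^{\rho}}{\partial x'^{\mu}}\frac{\partial x'^{\nu}}{\partial x^{\sigma}}\,\partial_{\rho}V^{\sigma} + \frac{\partial x^{\rho}}{\partial x'^{\mu}}\frac{\partial^{2} x'^{\nu}}{\partial x^{\rho}\partial x^{\sigma}}\,V^{\sigma}. \tag{75} $$

Insisting that we should have

$$ \nabla'_{\mu}\tilde V^{\nu} = \frac{\partial x^{\rho}}{\partial x'^{\mu}}\frac{\partial x'^{\nu}}{\partial x^{\sigma}}\,\nabla_{\rho}V^{\sigma}, \tag{76} $$

for any vector $V$ we obtain,

$$ \tilde\Gamma^{\nu}_{\mu\lambda} = \frac{\partial x^{\sigma}}{\partial x'^{\lambda}}\frac{\partial x^{\rho}}{\partial x'^{\mu}}\left[\frac{\partial x'^{\nu}}{\partial x^{\gamma}}\,\Gamma^{\gamma}_{\rho\sigma} - \frac{\partial^{2} x'^{\nu}}{\partial x^{\rho}\partial x^{\sigma}}\right]. \tag{77} $$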

We see that the second term in the square bracket prevents the connection coefficients from forming a tensor but it works exactly in a way that makes the covariant derivative of a tensor another tensor.
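As a hedged illustration of the inhomogeneous term, consider the flat plane with Cartesian coordinates $x^{\sigma}=(x,y)$, in which we assume all connection coefficients vanish, and primed polar coordinates $x'^{\mu}=(r,\theta)$. The worked example below is a sketch under these assumptions; it uses the equivalent form of the second-derivative term obtained by differentiating $\frac{\partial x'^{\nu}}{\partial x^{\sigma}}\frac{\partial x^{\sigma}}{\partial x'^{\lambda}}=\delta^{\nu}_{\lambda}$ with respect to $x'^{\mu}$.

```latex
% Flat plane: x = r\cos\theta, y = r\sin\theta, with \Gamma^{\gamma}_{\rho\sigma} = 0 in Cartesian coordinates.
% Differentiating (\partial x'^\nu/\partial x^\sigma)(\partial x^\sigma/\partial x'^\lambda) = \delta^\nu_\lambda gives
\begin{align*}
\tilde\Gamma^{\nu}_{\mu\lambda}
  &= -\frac{\partial x^{\sigma}}{\partial x'^{\lambda}}\frac{\partial x^{\rho}}{\partial x'^{\mu}}
      \frac{\partial^{2} x'^{\nu}}{\partial x^{\rho}\partial x^{\sigma}}
   = \frac{\partial x'^{\nu}}{\partial x^{\sigma}}
      \frac{\partial^{2} x^{\sigma}}{\partial x'^{\mu}\partial x'^{\lambda}}, \\
\tilde\Gamma^{r}_{\theta\theta}
  &= \frac{\partial r}{\partial x}\frac{\partial^{2} x}{\partial\theta^{2}}
   + \frac{\partial r}{\partial y}\frac{\partial^{2} y}{\partial\theta^{2}}
   = \cos\theta\,(-r\cos\theta) + \sin\theta\,(-r\sin\theta) = -r, \\
\tilde\Gamma^{\theta}_{r\theta}
  &= \frac{\partial\theta}{\partial x}\frac{\partial^{2} x}{\partial r\,\partial\theta}
   + \frac{\partial\theta}{\partial y}\frac{\partial^{2} y}{\partial r\,\partial\theta}
   = \left(-\frac{\sin\theta}{r}\right)(-\sin\theta) + \frac{\cos\theta}{r}\cos\theta = \frac{1}{r}.
\end{align*}
```

The non-zero polar coefficients arise entirely from the non-tensorial term, even though the coefficients vanish identically in the Cartesian system.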

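It is worth noticing that the “naughty” term above is independent of the connection coefficients themselves. This is telling us that if we had two different connection coefficients $\Gamma^{\lambda}_{\mu\nu}$ and $\bar\Gamma^{\lambda}_{\mu\nu}$, their difference

$$ S^{\lambda}{}_{\mu\nu} = \Gamma^{\lambda}_{\mu\nu} - \bar\Gamma^{\lambda}_{\mu\nu}, \tag{78} $$

would be a $\binom{1}{2}$ tensor. A particular choice is $\bar\Gamma^{\lambda}_{\mu\nu} = \Gamma^{\lambda}_{\nu\mu}$ and then the tensor

$$ T^{\lambda}{}_{\mu\nu} = \Gamma^{\lambda}_{\mu\nu} - \Gamma^{\lambda}_{\nu\mu}, \tag{79} $$

is the torsion tensor.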

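We know that partial derivatives of smooth functions commute. For the covariant derivative on the other hand, using (70) with $\omega_{\nu} = \partial_{\nu} f$, we have that

$$ [\nabla_{\mu},\nabla_{\nu}]f = \nabla_{\mu}\nabla_{\nu}f - \nabla_{\nu}\nabla_{\mu}f = -T^{\lambda}{}_{\mu\nu}\,\partial_{\lambda}f, \tag{80} $$

which is not necessarily zero. For the rest of the course we will assume that our covariant derivatives have zero torsion.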

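We will now consider the commutator of derivatives on vectors and we write,

$$ [\nabla_{\mu},\nabla_{\nu}]V^{\lambda} = \nabla_{\mu}\nabla_{\nu}V^{\lambda} - \nabla_{\nu}\nabla_{\mu}V^{\lambda}. \tag{81} $$

Using the covariant differentiation rule (73) for $p=1$ and $q=1$ we can write,

$$ \nabla_{\mu}\nabla_{\nu}V^{\lambda} = \partial_{\mu}\nabla_{\nu}V^{\lambda} + \Gamma^{\lambda}_{\mu\rho}\,\nabla_{\nu}V^{\rho} - \Gamma^{\rho}_{\mu\nu}\,\nabla_{\rho}V^{\lambda}. \tag{82} $$

After a little algebra one can show the surprising fact that the right hand side of (81) doesn’t contain any derivatives of the vector field. (If we had not restricted our connection coefficients to have zero torsion the final result would read $[\nabla_{\mu},\nabla_{\nu}]V^{\lambda} = R^{\lambda}{}_{\rho\mu\nu}V^{\rho} - T^{\rho}{}_{\mu\nu}\nabla_{\rho}V^{\lambda}$.) In fact, it defines the Riemann tensor,

$$ [\nabla_{\mu},\nabla_{\nu}]V^{\lambda} = R^{\lambda}{}_{\rho\mu\nu}\,V^{\rho}, \tag{83} $$

with

$$ R^{\lambda}{}_{\rho\mu\nu} = \partial_{\mu}\Gamma^{\lambda}_{\nu\rho} - \partial_{\nu}\Gamma^{\lambda}_{\mu\rho} + \Gamma^{\lambda}_{\mu\sigma}\Gamma^{\sigma}_{\nu\rho} - \Gamma^{\lambda}_{\nu\sigma}\Gamma^{\sigma}_{\mu\rho}. \tag{84} $$

Regarding the commutator of derivatives, we can ask the same question for 1-forms giving the result

$$ [\nabla_{\mu},\nabla_{\nu}]\omega_{\lambda} = -R^{\rho}{}_{\lambda\mu\nu}\,\omega_{\rho}. \tag{85} $$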

2.5 The metric

In previous sections we have discussed the tangent space $T_p$ which is a linear space. A natural additional structure to consider is the inner product in $T_p$ for all points $p \in M$. This can be done by introducing the metric:

Definition 2.11.

The metric $g_{\mu\nu}$ is a $\binom{0}{2}$ tensor which is symmetric in its indices, $g_{\mu\nu} = g_{\nu\mu}$, and is non-degenerate with $g = \det(g_{\mu\nu}) \neq 0$.

The inner product of two vectors $V, W \in T_p$ is then simply $g(V,W) = g(W,V) = g_{\mu\nu}V^{\nu}W^{\mu}$. In the context of general relativity we will relax the usual positivity of the inner product. We define:

Definition 2.12.

The signature is the difference between the number of positive and negative eigenvalues of the metric. If all the eigenvalues are positive the manifold is called Riemannian. If one of the eigenvalues is negative, the metric is called pseudo-Riemannian.

Notice that it is meaningful to discuss the signature of the spacetime globally. The fact that the metric is a non-degenerate tensor prevents the flip of any of the signs of its eigenvalues as we navigate through the manifold. According to relativity, spacetime will simply be a pseudo-Riemannian manifold. In the case of empty spacetime this was simply flat space equipped with the Minkowski metric (17). Similarly to Minkowski space, we call a vector $V^{\mu}$ timelike if $g(V,V)<0$, null or lightlike if $g(V,V)=0$ and spacelike if $g(V,V)>0$.

In terms of notation, we will often write the components of the metric tensor in the form,

$$ ds^{2} = g_{\mu\nu}(x^{\lambda})\,dx^{\mu}dx^{\nu}. \tag{86} $$

Moreover, since the metric is a non-degenerate tensor, we can define the inverse $\binom{2}{0}$ tensor $g^{\mu\nu}$ such that,

$$ g^{\mu\lambda}g_{\lambda\nu} = \delta^{\mu}_{\nu}. \tag{87} $$

Now that we have the notion of length for the vectors in $T_p$, we can use our experience from special relativity in section 1 to define the length $L[\gamma]$ of a curve $\gamma:\mathbb{R}\to M$. In a general manifold our requirements for the definition of $L[\gamma]$ are that:

  • It has to be invariant under coordinate transformations

  • It has to be invariant under reparametrisations

The above are both satisfied by the definition

Definition 2.13.

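The length $L[\gamma]$ of a curve $\gamma:\mathbb{R}\to M$ with parameter $\lambda$ and tangent vector $\dot x^{\mu}$ is,

$$ L[\gamma] = \int_{\lambda_i}^{\lambda_f} \sqrt{\left|g_{\mu\nu}(x^{\rho}(\lambda))\,\dot x^{\mu}(\lambda)\,\dot x^{\nu}(\lambda)\right|}\;d\lambda. \tag{88} $$

The above expression is manifestly invariant under coordinate transformations. Just like in the case of Minkowski space in equation (21), the square root is there to take care of reparametrisation invariance.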
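The reparametrisation invariance of the length can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the unit 2-sphere metric $d\theta^2+\sin^2\theta\,d\varphi^2$ as an example geometry, and the helper names (`sphere_metric`, `curve_length`, `THETA0`) are hypothetical and not part of the notes. It measures a circle of constant latitude in two different parametrisations.

```python
import math

def sphere_metric(theta, phi):
    """Components g_{mu nu} of the unit 2-sphere in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def curve_length(curve, lam_i, lam_f, metric=sphere_metric, steps=20000, h=1e-6):
    """Midpoint-rule approximation of L = int sqrt(|g_{mu nu} xdot^mu xdot^nu|) dlam."""
    dlam = (lam_f - lam_i) / steps
    total = 0.0
    for k in range(steps):
        lam = lam_i + (k + 0.5) * dlam
        x = curve(lam)
        # central finite difference for the tangent vector xdot^mu
        xp, xm = curve(lam + h), curve(lam - h)
        xdot = [(a - b) / (2 * h) for a, b in zip(xp, xm)]
        g = metric(*x)
        norm2 = sum(g[m][n] * xdot[m] * xdot[n] for m in range(2) for n in range(2))
        total += math.sqrt(abs(norm2)) * dlam
    return total

THETA0 = math.pi / 3  # hypothetical latitude chosen for the illustration

def latitude_uniform(lam):
    return (THETA0, lam)                    # phi = lam, with lam in [0, 2 pi]

def latitude_reparam(s):
    return (THETA0, 2 * math.pi * s ** 3)   # phi = 2 pi s^3, with s in [0, 1]

L1 = curve_length(latitude_uniform, 0.0, 2 * math.pi)
L2 = curve_length(latitude_reparam, 0.0, 1.0)
exact = 2 * math.pi * math.sin(THETA0)      # circumference of the latitude circle
print(L1, L2, exact)
```

Both parametrisations should agree with the circumference $2\pi\sin\theta_0$ up to discretisation error, even though the tangent vector $\dot x^{\mu}$ differs between them.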

The final fact we want to mention is true for any vector space with an inner product. In the context of differential geometry we refer to it as the “lowering” and “raising” of indices. The metric provides a natural isomorphism between $T_p$ and $T_p^{*}$. This is simply the linear map $\phi: T_p \to T_p^{*}$ such that

$$ (\phi(V))(W) = g(V,W), \tag{89} $$

which indeed makes $\phi(V)$ a well defined linear functional acting on the arbitrary vector $W$. In components we see that

$$ \phi(V)_{\nu} = V_{\nu} \equiv g_{\nu\mu}V^{\mu}. \tag{90} $$

From now on we don’t need to write $\phi(V)$, we will only write $V_{\mu}$ with its index down. Moreover, we can lower and raise the indices of any tensor e.g.

$$ T^{\mu\nu\lambda} = g^{\sigma\lambda}\,T^{\mu\nu}{}_{\sigma}, \qquad T^{\mu}{}_{\rho\sigma} = g_{\nu\rho}\,T^{\mu\nu}{}_{\sigma}. \tag{91} $$

2.6 Parallel transport and the Levi-Civita connection

An important concept in defining straight lines is that of parallel transporting a vector (or a tensor) along a particular curve. From a physics point of view, we can understand free motion as the motion in which the tangent vector (or velocity) does not change along a spacetime curve. Before moving to the physics aspects of this statement, we might want to better understand parallel transport itself.

This is particularly intuitive and technically easy to understand in Minkowski space when using cartesian coordinates. Suppose that in that case we have a curve $x^{\mu}(\lambda)$ with parameter $\lambda$. Suppose also that along the curve we have a vector $A^{\mu}(\lambda)$ which remains parallel to itself as we move along the curve. It is natural to write that the condition for parallel transport in this case is simply

$$ \frac{d}{d\lambda}A^{\mu}(\lambda) = 0. \tag{92} $$

In order to make further progress and be able to generalise this statement to a curved space, we consider a vector field $W^{\mu}(x^{\nu})$ such that when it is restricted to our curve, it is equal to the vector $A^{\mu}(\lambda)$. In other words the vector field $W^{\mu}$ is such that

$$ W^{\mu}(x^{\nu}(\lambda)) = A^{\mu}(\lambda). \tag{93} $$

The parallel transport condition then reads

$$ \dot x^{\nu}\partial_{\nu}W^{\mu} = 0 \;\Rightarrow\; V^{\nu}\partial_{\nu}W^{\mu} = 0, \tag{94} $$

where $V^{\nu}$ is the tangent vector to our curve. More generally, we will be frequently using the fact that along a curve $x^{\mu}(\lambda)$ we can always replace

$$ V^{\mu}\partial_{\mu} \to \frac{d}{d\lambda}. \tag{95} $$

From the above we see that in a general manifold with connection $\nabla_{\nu}$ it makes sense to replace this condition by

$$ V^{\nu}\nabla_{\nu}W^{\mu} = 0. \tag{96} $$

For a tensor $T^{\mu_1\dots\mu_p}{}_{\nu_1\dots\nu_q}$ that is parallel transported along a curve $\gamma$ with tangent $V^{\mu}$ it makes sense to have,

$$ V^{\lambda}\nabla_{\lambda}T^{\mu_1\dots\mu_p}{}_{\nu_1\dots\nu_q} = 0. \tag{97} $$

In the previous section we introduced the notion of an inner product between vectors and vector fields. It is natural to ask the question of what happens with the inner product when we parallel transport two vector fields $A^{\mu}$ and $B^{\mu}$ along a curve $\gamma$ with tangent vector $V^{\mu}$. We would certainly expect that we should have a constant inner product along $\gamma$ provided that

$$ V^{\lambda}\nabla_{\lambda}A^{\mu} = 0, \qquad V^{\lambda}\nabla_{\lambda}B^{\mu} = 0. \tag{98} $$

However, what we have instead is,

$$ V^{\lambda}\nabla_{\lambda}\,g(A,B) = V^{\lambda}\nabla_{\lambda}\left(g_{\mu\nu}A^{\mu}B^{\nu}\right) = V^{\lambda}A^{\mu}B^{\nu}\,\nabla_{\lambda}g_{\mu\nu}, \tag{99} $$

which is not zero in general unless the covariant derivative of $g_{\mu\nu}$ vanishes. It is only then that we can make sense of parallel transport of vectors while maintaining their inner product.

Definition 2.14.

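The metric singles out a unique torsion free covariant derivative: the metric compatible or Levi-Civita connection, defined by requiring

$$ \nabla_{\lambda}g_{\mu\nu} = 0. \tag{100} $$

In fact, as we will show, we can express the Levi-Civita connection $\Gamma^{\lambda}_{\mu\nu}$ in terms of the metric components. In order to do this, we write the defining condition three times with the three indices interchanged,

$$ \begin{aligned} \nabla_{\lambda}g_{\mu\nu} &= \partial_{\lambda}g_{\mu\nu} - \Gamma^{\sigma}_{\lambda\mu}g_{\sigma\nu} - \Gamma^{\sigma}_{\lambda\nu}g_{\mu\sigma} = 0 \\ \nabla_{\mu}g_{\lambda\nu} &= \partial_{\mu}g_{\lambda\nu} - \Gamma^{\sigma}_{\lambda\mu}g_{\sigma\nu} - \Gamma^{\sigma}_{\mu\nu}g_{\lambda\sigma} = 0 \\ \nabla_{\nu}g_{\lambda\mu} &= \partial_{\nu}g_{\lambda\mu} - \Gamma^{\sigma}_{\lambda\nu}g_{\sigma\mu} - \Gamma^{\sigma}_{\mu\nu}g_{\lambda\sigma} = 0. \end{aligned} \tag{101} $$

By subtracting the bottom two equations from the top one, and using the symmetry of the torsion free connection in its lower indices, we find,

$$ \Gamma^{\rho}_{\mu\nu} = \frac{1}{2}g^{\rho\lambda}\left(\partial_{\mu}g_{\nu\lambda} + \partial_{\nu}g_{\lambda\mu} - \partial_{\lambda}g_{\mu\nu}\right), \tag{102} $$

which is a unique solution as promised. These are called the Christoffel symbols.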
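The Christoffel formula can be evaluated numerically for a concrete metric. The following is a minimal sketch, assuming the unit 2-sphere metric as an example and approximating the partial derivatives by central finite differences; the function names (`sphere_metric`, `christoffel`, and so on) are hypothetical helpers, and the matrix inverse is hard-coded for two dimensions.

```python
import math

def sphere_metric(x):
    """g_{mu nu} for the unit 2-sphere, with x = (theta, phi)."""
    theta, _ = x
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix (this sketch is restricted to two dimensions)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def metric_derivative(metric, x, lam, h=1e-5):
    """Central finite difference for partial_lam g_{mu nu} at the point x."""
    xp, xm = list(x), list(x)
    xp[lam] += h
    xm[lam] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[m][k] - gm[m][k]) / (2 * h) for k in range(n)] for m in range(n)]

def christoffel(metric, x):
    """gamma[rho][mu][nu] = 1/2 g^{rho lam} (d_mu g_{nu lam} + d_nu g_{lam mu} - d_lam g_{mu nu})."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, lam) for lam in range(n)]  # dg[a][b][c] = d_a g_{bc}
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for rho in range(n):
        for mu in range(n):
            for nu in range(n):
                gamma[rho][mu][nu] = 0.5 * sum(
                    ginv[rho][lam] * (dg[mu][nu][lam] + dg[nu][lam][mu] - dg[lam][mu][nu])
                    for lam in range(n)
                )
    return gamma

theta0 = 1.0  # hypothetical evaluation point
G = christoffel(sphere_metric, (theta0, 0.3))
print(G[0][1][1], -math.sin(theta0) * math.cos(theta0))   # Gamma^theta_{phi phi}
print(G[1][0][1], math.cos(theta0) / math.sin(theta0))    # Gamma^phi_{theta phi}
```

For the sphere the only non-vanishing symbols are $\Gamma^{\theta}_{\varphi\varphi} = -\sin\theta\cos\theta$ and $\Gamma^{\varphi}_{\theta\varphi} = \Gamma^{\varphi}_{\varphi\theta} = \cot\theta$, which the finite-difference values should reproduce to within the discretisation error.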

2.7 Curvature

In this section we will examine curvature via the Riemann tensor that we defined in equation (84). We will do this for the Levi-Civita connection that we introduced in the previous section with connection coefficients given by the Christoffel symbols in equation (102). We now define $R_{\rho\sigma\mu\nu} = g_{\rho\lambda}R^{\lambda}{}_{\sigma\mu\nu}$ for which there are four algebraic properties which hold by construction:

  • $R_{\rho\sigma\mu\nu} = -R_{\sigma\rho\mu\nu}$,

  • $R_{\rho\sigma\mu\nu} = -R_{\rho\sigma\nu\mu}$,

  • $R_{\rho[\sigma\mu\nu]} = 0$, (or $R_{\rho\sigma\mu\nu} + R_{\rho\nu\sigma\mu} + R_{\rho\mu\nu\sigma} = 0$),

  • $R_{\rho\sigma\mu\nu} = R_{\mu\nu\rho\sigma}$.

It is worth noting that the last identity is not independent of the first three. The above identities significantly reduce the number of independent components of the Riemann tensor.

For a generic $\binom{0}{4}$ tensor in $n$ dimensions there are $n^{4}$ independent components as there are $n$ different choices for each index. However, the Riemann tensor satisfies three independent symmetry properties under the exchange of its indices and that significantly constrains the independent ones. From the first two properties, we see that it is antisymmetric under the exchange of the first two and the last two indices. This is telling us that for each pair of these indices there are $n(n-1)/2$ independent choices. By momentarily ignoring the third constraint this is telling us that we would have $n^{2}(n-1)^{2}/4$ independent components. Since the fourth property is not independent, we would only have to count how many constraints we need to impose due to the third property and subtract from the $n^{2}(n-1)^{2}/4$ components allowed from the first two.

In order to count these constraints we need to find how many choices we have for the first index $\rho$, how many choices we have for the antisymmetric group of three indices $[\sigma\mu\nu]$ and multiply them. For the $\rho$ index we simply have $n$ choices. For the group of the three anti-symmetrised indices, we know that they all have to be different from each other, otherwise we would get something trivial due to the antisymmetry. Simple combinatorics then suggest that we have $\binom{n}{3} = \frac{n!}{3!\,(n-3)!}$ choices. Putting everything together, we conclude that the Riemann tensor has

$$ \frac{n^{2}(n-1)^{2}}{4} - n\,\frac{n!}{3!\,(n-3)!} = \frac{1}{12}\,n^{2}(n^{2}-1), \tag{103} $$

independent components.
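The counting argument above can be checked for small dimensions. This is a short sketch that evaluates both sides of equation (103); the function names are hypothetical.

```python
from math import comb

def riemann_independent_components(n):
    """Count as in eq. (103): antisymmetric pairs squared minus cyclic-identity constraints."""
    pairs = n * (n - 1) // 2          # independent choices for one antisymmetric index pair
    constraints = n * comb(n, 3)      # choices for rho times choices for [sigma mu nu]
    return pairs * pairs - constraints

def closed_form(n):
    """Right hand side of eq. (103)."""
    return n * n * (n * n - 1) // 12

counts = {n: riemann_independent_components(n) for n in range(1, 7)}
print(counts)
```

The two expressions agree for each $n$ tested, giving 0, 1, 6 and 20 independent components in one, two, three and four dimensions respectively.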

We see that in one dimension the Riemann tensor is always trivial, which is obvious from the fact that it has at least one antisymmetric pair of indices and there is no room to select different indices in one dimension. A more geometric way to see this is to notice that even if we have a non-trivial metric in one dimension, $ds^{2} = f(x)\,dx^{2}$, we can always find a coordinate transformation $y = y(x)$, for instance $y = \int\sqrt{f(x)}\,dx$ for $f>0$, to bring the metric to the flat form $ds^{2} = dy^{2}$. We are particularly interested in the four dimensional case where we find that the Riemann tensor has 20 independent components.

Apart from the algebraic properties, the Riemann tensor satisfies the second Bianchi identity

$$ \nabla_{[\lambda}R_{\rho\sigma]\mu\nu} = 0, \tag{104} $$

which is a set of differential identities.

The Riemann tensor is going to be the object that will be important in any theory of gravity. It is built out of derivatives of the metric in a way that it forms a tensor and it is therefore geometric. In loose terms, it is going to be the kinetic term in the equations of motion of the metric which is a dynamical object itself. Even though the Riemann tensor seems to be an object which contains all the coordinate independent information, Einstein’s General Relativity is based on the Ricci tensor, which is defined by

Definition 2.15.
$$ R_{\mu\nu} = R^{\lambda}{}_{\mu\lambda\nu}, \tag{105} $$

which is symmetric, $R_{\mu\nu} = R_{\nu\mu}$, and has only $n(n+1)/2$ independent components in $n$ dimensions.

Another useful quantity we can define characterising the curvature of our geometry is the Ricci scalar,

Definition 2.16.
$$ R = g^{\mu\nu}R_{\mu\nu}. \tag{106} $$

Finally, by using the above objects, we can define the Einstein tensor,

Definition 2.17.
$$ G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}g_{\mu\nu}R, \tag{107} $$

for which the Bianchi identity (104) implies

$$ \nabla^{\mu}G_{\mu\nu} = 0. \tag{108} $$

This identity is one of the most important ones in this section. We will have the chance to appreciate it when we discuss Einstein’s equations for gravity in later sections.

2.8 Integration on manifolds

Several quantities like the action and conserved charges in physics require a notion of integration. Something quite tempting is to define the integral of a function according to,

$$ I = \int_{\phi_1(U)} d^{n}x\, f(x^{\mu}), \tag{109} $$

for a given coordinate system $(U,\phi_1)$ with coordinates $x^{\mu}$. Someone who prefers to work in a different coordinate system $(U,\phi_2)$ with coordinates $y^{\nu} = y^{\nu}(x^{\mu})$ should be able to use the same definition and find the same number. In their coordinate system they would write,

$$ I' = \int_{\phi_2(U)} d^{n}y\, f(y^{\nu}) = \int_{\phi_1(U)} d^{n}x \left|\frac{\partial y^{\nu}}{\partial x^{\mu}}\right| f(y^{\nu}(x^{\mu})) \neq I. \tag{110} $$

The above shows that we need to come up with something slightly more sophisticated than integrating functions in a particular coordinate system. The quantity that shows up and spoils things for us is the Jacobian determinant of the coordinate transformation.

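The next thing we want to try is to integrate $n$-forms on an $n$ dimensional manifold. Before doing that, let’s see how an $n$-form transforms under coordinate transformations in more detail. Because an $n$-form $\omega_{\mu_1\dots\mu_n}$ is totally antisymmetric, all the components are parametrised by a single number according to

$$ \omega_{\mu_1\dots\mu_n} = \Omega\,\varepsilon_{\mu_1\dots\mu_n}, \tag{111} $$

where $\varepsilon_{\mu_1\dots\mu_n}$ is the totally antisymmetric symbol with $\varepsilon_{0\dots n-1} = 1$. In a different coordinate system with coordinates $y^{\mu}$ we will have that

$$ \omega'_{\mu_1\dots\mu_n} = \Omega'\,\varepsilon_{\mu_1\dots\mu_n}. \tag{112} $$

However, we know from the transformation rules (52) for $p=0$ and $q=n$ that we must have

$$ \begin{aligned} \omega'_{\mu'_1\dots\mu'_n} &= \frac{\partial x^{\mu_1}}{\partial y^{\mu'_1}}\cdots\frac{\partial x^{\mu_n}}{\partial y^{\mu'_n}}\,\omega_{\mu_1\dots\mu_n} \\ \Rightarrow\; \Omega'\,\varepsilon_{\mu'_1\dots\mu'_n} &= \frac{\partial x^{\mu_1}}{\partial y^{\mu'_1}}\cdots\frac{\partial x^{\mu_n}}{\partial y^{\mu'_n}}\,\Omega\,\varepsilon_{\mu_1\dots\mu_n} \\ \Rightarrow\; \Omega' &= \left|\frac{\partial x^{\mu}}{\partial y^{\nu}}\right|\Omega. \end{aligned} \tag{113} $$

We see that the components of $n$-forms transform with the inverse of the Jacobian determinant. This allows us to write the integral of an $n$-form

$$ I = \int_{M}\omega \equiv \int_{\phi(M)} d^{n}x\,\Omega(x^{\mu}), \tag{114} $$

and the previous discussion shows that the outcome of integration will be coordinate system independent.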

We see that even though we cannot define the integral of a scalar function directly, we can still define the integrals of $n$-forms. However, we would still like to find a notion in which we can integrate functions. Before attempting that, we need to mention a universal $n$-form which exists for all manifolds equipped with a metric $g_{\mu\nu}$. This is the volume form $\mathrm{vol}$ with components,

$$ \mathrm{vol}_{\mu_1\dots\mu_n} = \sqrt{|g|}\,\varepsilon_{\mu_1\dots\mu_n}, \tag{115} $$

where $g$ is the determinant of the metric tensor. For a pseudo-Riemannian manifold the absolute value can be removed by introducing a minus sign i.e. $|g| = -g$.

After this definition, we can think of the integral of a scalar function $f$ as

$$ I = \int_{M} f\,\mathrm{vol} = \int_{\phi(M)} d^{n}x\,\sqrt{-g(x^{\mu})}\,f(x^{\mu}). \tag{116} $$
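As a simple numerical illustration of integrating a scalar against the volume form, the sketch below uses the unit 2-sphere as an assumed example. The sphere is Riemannian, so the measure is $\sqrt{|g|} = \sin\theta$ rather than $\sqrt{-g}$, and the function names are hypothetical.

```python
import math

def sqrt_abs_g(theta, phi):
    """sqrt|g| for the unit 2-sphere metric dtheta^2 + sin^2(theta) dphi^2."""
    return abs(math.sin(theta))

def integrate_scalar(f, n_theta=400, n_phi=400):
    """Midpoint-rule approximation of int f vol = int dtheta dphi sqrt|g| f."""
    dth = math.pi / n_theta
    dph = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * dth
        for j in range(n_phi):
            ph = (j + 0.5) * dph
            total += sqrt_abs_g(th, ph) * f(th, ph) * dth * dph
    return total

area = integrate_scalar(lambda th, ph: 1.0)                         # total area of the sphere
mean_cos2 = integrate_scalar(lambda th, ph: math.cos(th) ** 2) / area
print(area, 4 * math.pi, mean_cos2)
```

Integrating $f=1$ should reproduce the area $4\pi$, whereas the naive coordinate integral $\int d\theta\,d\varphi$ without the $\sqrt{|g|}$ factor would give $2\pi^{2}$, a number with no geometric meaning.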

3 Free Particles and Fields

3.1 Free particles

As we mentioned in section 1, Einstein’s idea about gravity is that we experience gravity by moving on straight lines inside a non-trivial curved background. One might try to use two seemingly different definitions for when a curve $\gamma:\mathbb{R}\to M$ is a straight line. The two proposals are that a straight line:

  • extremises its length $L[\gamma]$ as defined in equation (25)

  • has its tangent vector $V^{\mu}$ parallel transported along its trajectory i.e. there is no acceleration and we have to satisfy the geodesic equation

    $$ V^{\mu}\nabla_{\mu}V^{\nu} = \alpha V^{\nu}, \tag{117} $$

    with $\alpha$ some scalar function.

We might be surprised by the potentially non-trivial right hand side in equation (117). After all, when we defined parallel transport in equation (97) the right hand side was trivial. The key point is that equation (97) is invariant under reparametrisations of the curve since those do not alter the tensor that is being transported. The difference in this case is that the tensor that is parallel transported coincides with the tangent vector and that depends on the parametrisation of the curve.

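To see this, suppose that we change the parameter $\lambda$ of our curve to $\tau$ according to $\lambda = f(\tau)$. The corresponding tangent vectors are $V^{\mu} = \frac{d}{d\lambda}x^{\mu}(\lambda)$ and $Y^{\mu} = \frac{d}{d\tau}x^{\mu}(\tau)$. By using the chain rule, we can show that these are related by,

$$ Y^{\mu} = f'\,V^{\mu}. \tag{118} $$

If we satisfy the geodesic equation (117) in the $\tau$ parametrisation with $\alpha = \beta$ then by using the replacement rule (95) we can write,

$$ \begin{aligned} Y^{\mu}\nabla_{\mu}Y^{\nu} &= \beta\,Y^{\nu} \\ \Rightarrow\; f'\,V^{\mu}\nabla_{\mu}\left(f'\,V^{\nu}\right) &= \beta f'\,V^{\nu} \\ \Rightarrow\; f''\,V^{\nu} + (f')^{2}\,V^{\mu}\nabla_{\mu}V^{\nu} &= \beta f'\,V^{\nu} \end{aligned} \tag{119} $$

$$ \Rightarrow\; V^{\mu}\nabla_{\mu}V^{\nu} = \frac{f'\beta - f''}{(f')^{2}}\,V^{\nu}, \tag{120} $$

from where we read off that $\alpha = \frac{f'\beta - f''}{(f')^{2}}$ for the $\lambda$ parametrisation and therefore $\alpha$ depends on our parameter choice. This is telling us that we can choose it so that $\alpha = 0$ in the geodesic equation (117).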

Definition 3.1.

The parametrisation for which the geodesic equation (117) has α=0 is called affine.

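The geodesic equation (117) when written in coordinates reads,

$$ \begin{aligned} V^{\mu}\partial_{\mu}V^{\nu} + \Gamma^{\nu}_{\mu\lambda}V^{\mu}V^{\lambda} &= \alpha V^{\nu} \\ \Rightarrow\; \ddot x^{\nu}(\lambda) + \Gamma^{\nu}_{\mu\lambda}(x^{\rho}(\lambda))\,\dot x^{\mu}(\lambda)\,\dot x^{\lambda}(\lambda) &= \alpha\,\dot x^{\nu}(\lambda), \end{aligned} \tag{121} $$

where we used the replacement rule (95). In order to show the equivalence between the two proposals for the trajectory of a free particle, we can also extremise the length (25) with respect to the coordinates $x^{\mu}(\lambda)$ of the curve to find the same equation with,

$$ \alpha = \frac{d}{d\lambda}\ln\sqrt{\left|g_{\mu\nu}\dot x^{\mu}\dot x^{\nu}\right|}. \tag{122} $$

Notice that we can show the above expression for $\alpha$ starting from the geodesic equation (117) and contracting both sides with the vector $V_{\nu}$.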

From the above expression for α we see that in the affine parametrisation the norm of the tangent vector remains invariant throughout the whole motion of the particle.
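The contraction argument can be spelled out in a few lines. The derivation below is a sketch that assumes the Levi-Civita connection, so that $\nabla_{\lambda}g_{\mu\nu}=0$, and a non-null tangent vector, so that the logarithm is defined.

```latex
% Contract V^\mu \nabla_\mu V^\nu = \alpha V^\nu with V_\nu = g_{\nu\rho} V^\rho and use \nabla g = 0:
\begin{align*}
V_{\nu}\,V^{\mu}\nabla_{\mu}V^{\nu}
  &= \tfrac{1}{2}\,V^{\mu}\nabla_{\mu}\left(g_{\nu\rho}V^{\nu}V^{\rho}\right)
   = \tfrac{1}{2}\,\frac{d}{d\lambda}\left(g_{\nu\rho}\dot x^{\nu}\dot x^{\rho}\right), \\
\alpha\,V_{\nu}V^{\nu}
  &= \alpha\,g_{\nu\rho}\dot x^{\nu}\dot x^{\rho}, \\
\Rightarrow\quad
\alpha
  &= \tfrac{1}{2}\,\frac{d}{d\lambda}\ln\left|g_{\nu\rho}\dot x^{\nu}\dot x^{\rho}\right|
   = \frac{d}{d\lambda}\ln\sqrt{\left|g_{\nu\rho}\dot x^{\nu}\dot x^{\rho}\right|}.
\end{align*}
```

In particular $\alpha = 0$ is equivalent to the norm $g_{\mu\nu}\dot x^{\mu}\dot x^{\nu}$ being constant along the curve.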

A question we would like to address is whether there is an action principle for the curve $x^{\mu}(\tau)$ from which we can obtain the geodesic equation (117) in an affine parametrisation with $\alpha = 0$. One can easily check that the action

$$ L[x^{\mu}] = \int d\tau\, g_{\mu\nu}(x^{\lambda}(\tau))\,V^{\mu}(\tau)\,V^{\nu}(\tau), \tag{123} $$

yields the correct equations of motion. As we might have expected, this is no longer invariant under reparametrisations as the parameter has been chosen to be affine.

3.2 The Newtonian limit

A natural question to ask is what happens in a limit where our particles move at small velocity, with $\dot x^{i} \ll \dot x^{0}$, in a very weak, almost static gravitational background which is very close to being Minkowski spacetime. (For a particle which is not “moving” at all in space in our coordinate system we would have $\dot x^{i} = 0$ and $\dot x^{0} = c$.) This suggests that the background metric is infinitesimally close to that of Minkowski space. The metric then takes the form,

$$ g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \tag{124} $$

with $h_{\mu\nu}$ a small correction. Moreover, since our background is almost static, we assume that the time partial derivatives are much smaller than the spatial ones i.e. $\partial_{0}h_{\mu\nu} \ll \partial_{i}h_{\mu\nu}$.

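We now examine the geodesic motion of our free particle by choosing an affine parameter $\tau$ in equation (121),

$$ \begin{aligned} \ddot x^{i} + \Gamma^{i}_{\mu\nu}\dot x^{\mu}\dot x^{\nu} &= 0 \\ \Rightarrow\; \ddot x^{i} + \Gamma^{i}_{00}\,\dot x^{0}\dot x^{0} &= 0 \\ \Rightarrow\; \ddot x^{i} + \Gamma^{i}_{00}\,c^{2} &= 0. \end{aligned} \tag{125} $$

In the above equation we dropped the subleading terms which contain $\dot x^{i}$ and we approximated $\dot x^{0} \approx c$ which is true for slowly moving particles. By using equation (102) for the Christoffel symbols we can write,

$$ \begin{aligned} \Gamma^{i}_{00} &= \frac{1}{2}g^{ij}\left(\partial_{0}g_{0j} + \partial_{0}g_{0j} - \partial_{j}g_{00}\right) \\ \Rightarrow\; \Gamma^{i}_{00} &\approx -\frac{1}{2}\eta^{ij}\partial_{j}h_{00}, \end{aligned} \tag{126} $$

where we dropped all the partial derivatives in time since our background is almost static.

Setting $h_{00} = -2\Phi$, we have the equation of motion

$$ \frac{d^{2}}{d\tau^{2}}x^{i} = -c^{2}\,\partial_{i}\Phi \;\Rightarrow\; \frac{d^{2}}{dt^{2}}x^{i} = -\partial_{i}\Phi, \tag{127} $$

which has the same form as Newton’s equation (3) for the motion of a massive particle in a gravitational potential $\Phi$.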

At this point we need to stress that the equivalence principle we discussed in section 1 was crucial to find this kind of agreement. If the mass of inertia was not the same as the gravitational mass, Newton’s law would give an acceleration that would depend on them and agreement with Einstein’s theory would not be possible.

3.3 Classical fields and the stress tensor

In the previous section we dealt with free particles starting from our intuition in flat space and in cartesian coordinates. The key step was to replace,

$$ V^{\mu}\partial_{\mu}T \to V^{\mu}\nabla_{\mu}T. \tag{128} $$

For a general field theory, the steps to understand it in curved spacetime are very similar: the rule is to replace the partial derivatives of Minkowski space in cartesian coordinates by covariant derivatives,

$$ \partial_{\mu} \to \nabla_{\mu}. \tag{129} $$

The simplest example we can think of is a scalar field $\phi$ with a potential $V(\phi)$. In Minkowski spacetime with cartesian coordinates the equation of motion reads

$$ \partial_{\mu}\partial^{\mu}\phi - V'(\phi) = 0. \tag{130} $$

In a more general spacetime this should be replaced by

$$ \nabla_{\mu}\nabla^{\mu}\phi - V'(\phi) = 0. \tag{131} $$

Note that this has nothing to do with curvature, it is all about writing the correct equation of motion in a coordinate free fashion.

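For the case of the Maxwell field $A_{\mu}$, the equations of motion in the cartesian coordinates of Minkowski space read,

$$ \begin{aligned} \partial_{\mu}F^{\mu\nu} &= 0, \\ F_{\mu\nu} &= \partial_{\mu}A_{\nu} - \partial_{\nu}A_{\mu}. \end{aligned} \tag{132} $$

According to our previous discussion these should be replaced by

$$ \begin{aligned} \nabla_{\mu}F^{\mu\nu} &= 0, \\ F_{\mu\nu} &= \nabla_{\mu}A_{\nu} - \nabla_{\nu}A_{\mu} = \partial_{\mu}A_{\nu} - \partial_{\nu}A_{\mu}. \end{aligned} \tag{133} $$

Turning now our attention to the action for e.g. the scalar field we have that

$$ S_{\rm scalar} = \int d^{n}x\left(-\frac{1}{2}\partial_{\mu}\phi\,\partial^{\mu}\phi - V(\phi)\right) \to \int d^{n}x\,\sqrt{-g}\left(-\frac{1}{2}\nabla_{\mu}\phi\,\nabla^{\mu}\phi - V(\phi)\right), \tag{134} $$

while for the Maxwell field we have

$$ S_{\rm Maxwell} = \int d^{n}x\left(-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}\right) \to \int d^{n}x\,\sqrt{-g}\left(-\frac{1}{4}F_{\mu\nu}F^{\mu\nu}\right). \tag{135} $$

A good exercise is to show that the extremisation of the actions (134) and (135) indeed produces the equations of motion (131) and (133) respectively.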

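From the above we see that, for a fixed background metric, the action of e.g. a scalar field $\phi$ is also a functional of the metric, $S[g_{\mu\nu},\phi]$, according to

$$ S_{\rm matter}[g_{\mu\nu},\phi] = \int d^{n}x\,\mathcal{L}(g_{\mu\nu},\phi,\partial_{\mu}\phi), \tag{136} $$

with $\mathcal{L}$ being the Lagrangian density. The equation of motion for the scalar is then given by

$$ \frac{\delta S_{\rm matter}}{\delta\phi(x^{\mu})} = 0. \tag{137} $$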

We now want to do something that might look weird.

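For the time being, the metric does not satisfy any equations of motion, it is just a fixed background. We now want to vary the action with respect to both the field $\phi$ as well as the background metric to find,

$$ \delta S_{\rm matter}[g_{\mu\nu},\phi] = \int d^{n}x\left[\frac{\delta S}{\delta g_{\mu\nu}(x^{\lambda})}\,\delta g_{\mu\nu}(x^{\lambda}) + \frac{\delta S}{\delta\phi(x^{\lambda})}\,\delta\phi(x^{\lambda})\right]. \tag{138} $$

After evaluating the above variation on a solution of the field equations of motion the second term drops out and we find,

$$ \delta S_{\rm matter}[g_{\mu\nu},\phi] = \frac{1}{2}\int d^{n}x\,\sqrt{-g}\;T^{\mu\nu}_{\rm matter}\,\delta g_{\mu\nu}, \tag{139} $$

where we defined the stress-energy tensor,

$$ T^{\mu\nu}_{\rm matter}(x^{\lambda}) = \frac{2}{\sqrt{-g(x^{\lambda})}}\,\frac{\delta S_{\rm matter}}{\delta g_{\mu\nu}(x^{\lambda})}, \tag{140} $$

which is symmetric in its indices. This is an important tensor and as we will show it always satisfies

$$ \nabla_{\mu}T^{\mu\nu}_{\rm matter} = 0. \tag{141} $$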

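In order to show this, we consider a small change of coordinates,

$$ x'^{\mu} = x^{\mu} - \varepsilon\,\xi^{\mu}(x^{\nu}) \;\Rightarrow\; x^{\mu} = x'^{\mu} + \varepsilon\,\xi^{\mu}(x'^{\nu}), \tag{142} $$

with $0 < \varepsilon \ll 1$. We now want to write an expression for the change of the metric components $\delta g_{\mu\nu}(x^{\lambda})$ up to order $\mathcal{O}(\varepsilon)$. In order to do this we start from the general rule (52) for $p=0$ and $q=2$:

$$ \begin{aligned} g'_{\mu\nu}(x'^{\lambda}) &= \frac{\partial x^{\sigma}}{\partial x'^{\mu}}\frac{\partial x^{\rho}}{\partial x'^{\nu}}\,g_{\rho\sigma}(x^{\lambda}) \\ &= \left(\delta^{\sigma}_{\mu} + \varepsilon\,\partial_{\mu}\xi^{\sigma}\right)\left(\delta^{\rho}_{\nu} + \varepsilon\,\partial_{\nu}\xi^{\rho}\right)g_{\sigma\rho}(x^{\lambda}) \\ &\approx g_{\mu\nu}(x^{\lambda}) + \varepsilon\left(g_{\sigma\nu}\,\partial_{\mu}\xi^{\sigma} + g_{\mu\rho}\,\partial_{\nu}\xi^{\rho}\right) \\ &= g_{\mu\nu}(x'^{\lambda} + \varepsilon\,\xi^{\lambda}) + \varepsilon\left(g_{\sigma\nu}\,\partial_{\mu}\xi^{\sigma} + g_{\mu\rho}\,\partial_{\nu}\xi^{\rho}\right) \\ &\approx g_{\mu\nu}(x'^{\lambda}) + \varepsilon\left(\xi^{\sigma}\partial_{\sigma}g_{\mu\nu} + g_{\sigma\nu}\,\partial_{\mu}\xi^{\sigma} + g_{\mu\rho}\,\partial_{\nu}\xi^{\rho}\right) \\ \Rightarrow\; \delta g_{\mu\nu} &= \varepsilon\left(\xi^{\sigma}\partial_{\sigma}g_{\mu\nu} + g_{\sigma\nu}\,\partial_{\mu}\xi^{\sigma} + g_{\mu\rho}\,\partial_{\nu}\xi^{\rho}\right) \\ \Rightarrow\; \delta g_{\mu\nu} &= 2\,\varepsilon\,\nabla_{(\mu}\xi_{\nu)}. \end{aligned} \tag{143} $$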

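We saw that the variation of the on-shell action with respect to the metric yields the stress tensor in equation (139), independently of the variation we are performing. We will now consider the variation generated by the small change of coordinates (142). Since the action is defined through a coordinate independent integral, for this particular variation we must have,

$$ \begin{aligned} \delta S_{\rm matter}[g_{\mu\nu},\phi] &= 0 \\ \Rightarrow\; \varepsilon\int d^{n}x\,\sqrt{-g}\;T^{\mu\nu}_{\rm matter}\,\nabla_{(\mu}\xi_{\nu)} &= 0 \\ \Rightarrow\; \int d^{n}x\,\sqrt{-g}\;\xi_{\nu}\,\nabla_{\mu}T^{\mu\nu}_{\rm matter} &= 0, \end{aligned} \tag{144} $$

for any smooth vector $\xi^{\mu}$, where in the last step we integrated by parts and dropped the surface term, and therefore we must have (141).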

3.4 Symmetries and conservation laws

Under an infinitesimal coordinate transformation (142), the change of the metric is given by (143). This naturally leads to the discussion of symmetries in general spacetimes. A small coordinate transformation is a symmetry if it leaves the metric invariant i.e. when $\delta g_{\mu\nu} = 0$. In this case the coordinate transformation is generated by a Killing vector $K^{\mu}$ such that

$$ \nabla_{(\mu}K_{\nu)} = 0. \tag{145} $$

Conversely, if the background metric is such that it allows the existence of a Killing vector $K^{\mu}$ satisfying (145), it is invariant under the infinitesimal coordinate transformation

$$ x'^{\mu} = x^{\mu} - \varepsilon\,K^{\mu}(x^{\nu}). \tag{146} $$

In section 1 we discussed Minkowski spacetime and its symmetries which form the Poincaré group. In our current language Minkowski space in four dimensions has the metric

$$ ds^{2} = \eta_{\mu\nu}\,dx^{\mu}dx^{\nu} = -(dx^{0})^{2} + (dx^{1})^{2} + (dx^{2})^{2} + (dx^{3})^{2}, \tag{147} $$

admitting the Killing vectors associated to:

  • Translations, $T_{(\mu)} = \partial_{\mu}$,

  • Lorentz transformations, $L_{(\mu\nu)} = x_{\mu}\partial_{\nu} - x_{\nu}\partial_{\mu}$.

In the notation we used above the indices in the brackets on the left hand sides label different vectors, they are not spacetime indices.

In order to see how the existence of spacetime symmetries implies the existence of conserved quantities we will consider a free particle moving along a geodesic $x^{\mu}(\tau)$ with affine parameter $\tau$. If $V^{\mu}$ is the tangent vector and $K^{\mu}$ is a Killing vector, we consider the scalar quantity $Q = V^{\mu}K_{\mu}$. In order to show that this is conserved along the motion of the particle we examine the derivative,

$$ \frac{d}{d\tau}Q = V^{\nu}\partial_{\nu}Q = V^{\nu}\nabla_{\nu}\left(V^{\mu}K_{\mu}\right) = K_{\mu}\left(V^{\nu}\nabla_{\nu}V^{\mu}\right) + V^{\mu}V^{\nu}\,\nabla_{(\mu}K_{\nu)} = 0. \tag{148} $$

The first term in the last equation is zero because of the geodesic motion (117) with $\alpha = 0$, while the second term is zero according to (145) since $K^{\mu}$ is Killing.

Therefore, depending on the Killing vectors that our background might possess we can write different conserved quantities. Following closely the terminology from classical mechanics, time translations are associated to energy while spatial translations are associated to linear momentum. Finally, spatial rotations are associated to angular momentum.
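The conservation of $Q = V^{\mu}K_{\mu}$ can be observed numerically. The sketch below is an illustration under stated assumptions rather than a general solver: it takes the unit 2-sphere, whose metric $d\theta^{2}+\sin^{2}\theta\,d\varphi^{2}$ admits the Killing vector $K=\partial_{\varphi}$, integrates the affinely parametrised geodesic equation with a hand-written fourth-order Runge-Kutta step, and uses hypothetical initial data.

```python
import math

def geodesic_rhs(state):
    """Affinely parametrised geodesic equation on the unit 2-sphere."""
    th, ph, vth, vph = state
    ath = math.sin(th) * math.cos(th) * vph ** 2           # -Gamma^th_{ph ph} vph^2
    aph = -2.0 * math.cos(th) / math.sin(th) * vth * vph   # -2 Gamma^ph_{th ph} vth vph
    return (vth, vph, ath, aph)

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    def shifted(s, k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(state, k1, dt / 2))
    k3 = geodesic_rhs(shifted(state, k2, dt / 2))
    k4 = geodesic_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def killing_charge(state):
    """Q = V^mu K_mu for K = d/dphi, i.e. Q = g_{phi phi} * phidot."""
    th, _, _, vph = state
    return math.sin(th) ** 2 * vph

def norm_squared(state):
    """g_{mu nu} V^mu V^nu, constant for an affine parameter."""
    th, _, vth, vph = state
    return vth ** 2 + math.sin(th) ** 2 * vph ** 2

state = (1.0, 0.0, 0.3, 0.8)   # hypothetical initial data (theta, phi, thetadot, phidot)
q0, n0 = killing_charge(state), norm_squared(state)
for _ in range(5000):
    state = rk4_step(state, 1e-3)
q1, n1 = killing_charge(state), norm_squared(state)
print(q1 - q0, n1 - n0)
```

Both differences should vanish up to integration error: the first reflects the Killing symmetry, which here is angular momentum about the polar axis, and the second reflects the constancy of the norm in an affine parametrisation.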

Turning our attention to classical field theory, the existence of a conserved quantity is associated with conserved currents $J^{\mu}$ which satisfy the current conservation equation

$$ \nabla_{\mu}J^{\mu} = 0 \;\Rightarrow\; \frac{1}{\sqrt{-g}}\,\partial_{\mu}\left(\sqrt{-g}\,J^{\mu}\right) = 0. \tag{149} $$

The equality between the two expressions can be shown by simply using the expression (67) for the covariant derivative of a vector. The conserved quantity can then be constructed by integrating the time component of the conserved current along a surface with constant $x^{0}$,

$$ Q(x^{0}) = \int d^{n-1}x\,\sqrt{-g}\,J^{0}. \tag{150} $$

To see that this is constant in $x^{0}$ we consider the time derivative

$$ \frac{d}{dx^{0}}Q(x^{0}) = \int d^{n-1}x\,\partial_{0}\left(\sqrt{-g}\,J^{0}\right) = -\int d^{n-1}x\,\partial_{i}\left(\sqrt{-g}\,J^{i}\right) = 0, \tag{151} $$

where we used the current conservation equation (149) to express the time derivative in terms of spatial ones and the fact that the integral of a divergence reduces to a surface integral at infinity, where our fields all become trivial.

The task now is to construct a conserved current $J^{\mu}$, given a field theory and a Killing vector $K^{\mu}$. The statement is that for each Killing vector we can construct a conserved current given by the contraction

$$ J^{\mu} = K_{\nu}\,T^{\mu\nu}_{\rm matter}. \tag{152} $$

We now want to check its divergence,

$$ \nabla_{\mu}J^{\mu} = \nabla_{\mu}\left(K_{\nu}T^{\mu\nu}_{\rm matter}\right) = K_{\nu}\,\nabla_{\mu}T^{\mu\nu}_{\rm matter} + T^{\mu\nu}_{\rm matter}\,\nabla_{(\mu}K_{\nu)} = 0, \tag{153} $$

which indeed vanishes. In the derivation above we used equation (141) which is true independently of the matter content of the theory. Moreover, in order to write the second term in its symmetrised form we used that the stress tensor is symmetric. Finally we used the Killing equation (145) to put the second term equal to zero.

Once again we can associate momentum and energy to the stress energy tensor $T^{\mu\nu}_{\rm matter}$. It therefore makes sense to think of it as the potential source in the equations for the metric, with the metric playing the role that the gravitational potential plays in Newton’s language.

4 Einstein’s Theory

In this section we will discuss Einstein’s answer to the second item on our wish list for a theory of gravity. In previous sections we discussed how gravity affects the motion of a particle and the state of matter fields. Here we will see how matter affects the spacetime curvature.

4.1 The equations of motion of gravity

An attempt to write an equation of motion for the metric in the presence of matter fields would be something of the form

$$ W_{\mu\nu} = \kappa\,T_{\mu\nu}. \tag{154} $$

The tensor $W_{\mu\nu}$ will be constructed from derivatives of the metric and $\kappa$ will be a constant of proportionality that we will later fix by taking a weak gravity limit and demanding that we recover Newton’s gravity.

A first guess would be to say that $W_{\mu\nu}$ is simply the Ricci tensor that we wrote in definition 2.15. However, that would be inconsistent, as the right hand side is the stress tensor which is by construction divergence free, while the Ricci tensor is not divergence free in general. This leads us to consider the Einstein tensor in definition 2.17 which has the desired property. We therefore have that a consistent equation for gravity has the form

$$ R_{\mu\nu} - \frac{1}{2}g_{\mu\nu}R = \kappa\,T_{\mu\nu}. \tag{155} $$

By contracting with the metric in four dimensions, where $g^{\mu\nu}g_{\mu\nu} = 4$, we find that

$$ R - 2R = \kappa\,g^{\mu\nu}T_{\mu\nu} = \kappa\,T^{\mu}{}_{\mu} \;\Rightarrow\; R = -\kappa\,T^{\mu}{}_{\mu}. \tag{156} $$

This allows us to write Einstein’s equation in the form

$$ R_{\mu\nu} = \kappa\left(T_{\mu\nu} - \frac{1}{2}g_{\mu\nu}T^{\lambda}{}_{\lambda}\right). \tag{157} $$

In the remainder of this subsection we will carry out the necessary approximations to compare with Newton’s equation (4) after the identification we made in subsection 3.2 for the gravitational potential $\Phi$. We recall that in the Newtonian limit we are close to Minkowski spacetime with a small correction according to the metric (124). For the matter fields, we will assume that they are close to being static and $T_{ij} \ll T_{00}$. This gives us the leading order trace

$$ T^{\mu}{}_{\mu} \approx -T_{00}. \tag{158} $$

Writing the 00 component of Einstein’s equation (157) we obtain

$$ R_{00} \approx \frac{\kappa}{2}\,T_{00}. \tag{159} $$

We now recall the definition 2.15 and the expression for the Riemann tensor (84) to write

$$ \begin{aligned} R_{00} = R^{j}{}_{0j0} &= \partial_{j}\Gamma^{j}_{00} - \partial_{0}\Gamma^{j}_{j0} + \mathcal{O}(h^{2}) \\ &= \partial_{j}\Gamma^{j}_{00} + \mathcal{O}(\partial_{0}) \approx -\frac{1}{2}\partial_{i}\partial^{i}h_{00} = \partial_{i}\partial^{i}\Phi. \end{aligned} \tag{160} $$

In the approximations above we dropped all the terms which are higher order in the perturbation $h_{\mu\nu}$ as well as the time derivatives which are small in the Newtonian limit. In the final step we used our identification for the gravitational potential from section 3.2. The last step is to identify the energy density with the mass density in the non-relativistic limit, $T_{00} = \rho$, giving,

$$ \partial_{i}\partial^{i}\Phi = \frac{\kappa}{2}\,\rho. \tag{161} $$

Comparing with Newton’s equation (4), this dictates the identification,

$$ \kappa = 8\pi G_{N}. \tag{162} $$

4.2 An action principle for gravity

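In this subsection we will write down an action which yields Einstein’s equation of motion (155). The right hand side is the stress tensor which we know that we can get from simply varying the matter sector of the action with respect to the metric. This suggests that we can write the full action as a sum

$$ S[g_{\mu\nu},\phi] = S_{\rm gravity}[g_{\mu\nu}] + S_{\rm matter}[g_{\mu\nu},\phi]. \tag{163} $$

After varying with respect to the inverse metric $g^{\mu\nu}$ we have the equation of motion

$$ \begin{aligned} \frac{\delta S_{\rm gravity}}{\delta g^{\mu\nu}} + \frac{\delta S_{\rm matter}}{\delta g^{\mu\nu}} &= 0 \\ \Rightarrow\; \frac{\delta S_{\rm gravity}}{\delta g^{\mu\nu}} - \frac{1}{2}\sqrt{-g}\,T_{\mu\nu} &= 0, \end{aligned} \tag{164} $$

where we used the definition of the stress tensor in equation (140) together with $\delta g^{\mu\nu} = -g^{\mu\alpha}g^{\nu\beta}\delta g_{\alpha\beta}$. After comparing with (155) we see that we need to find a functional $S_{\rm gravity}[g_{\mu\nu}]$ such that when varying with respect to $g^{\mu\nu}$ we will find,

$$ \frac{\delta S_{\rm gravity}}{\delta g^{\mu\nu}} = \frac{1}{2\kappa}\sqrt{-g}\left(R_{\mu\nu} - \frac{1}{2}g_{\mu\nu}R\right). \tag{165} $$

The proposal we will check is that the correct gravitational action is the Einstein-Hilbert action,

$$ S_{\rm gravity}[g_{\mu\nu}] = S_{\rm EH} = \frac{1}{2\kappa}\int d^{n}x\,\sqrt{-g}\,R. \tag{166} $$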

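We start by performing the variation

$$ \delta S_{\rm EH} = \frac{1}{2\kappa}\int d^{n}x\,\sqrt{-g}\left[g^{\mu\nu}R_{\mu\nu}\,\delta\ln\sqrt{-g} + R_{\mu\nu}\,\delta g^{\mu\nu} + g^{\mu\nu}\,\delta R_{\mu\nu}\right]. \tag{167} $$

In order to treat the first term, we note the matrix identity

$$ \delta\ln\det M = \sum_{\alpha,\beta}\left(M^{-1}\right)_{\alpha\beta}\,\delta M_{\beta\alpha}, \tag{168} $$

which when applied to the case of the metric gives

$$ \delta\ln\sqrt{-g} = -\frac{1}{2}\,g_{\mu\nu}\,\delta g^{\mu\nu}. \tag{169} $$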
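The matrix identity (168) is easy to test numerically to first order in the variation. The following is a minimal sketch for a $2\times 2$ matrix; the matrix `M` and the variation `dM` are hypothetical values chosen only for the illustration.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

M = [[2.0, 0.5], [0.3, 1.5]]              # hypothetical invertible matrix with det > 0
dM = [[1e-6, -2e-6], [3e-6, 0.5e-6]]      # small hypothetical variation
M_new = [[M[a][b] + dM[a][b] for b in range(2)] for a in range(2)]

# Left hand side: the actual change of ln det M under M -> M + dM.
lhs = math.log(det2(M_new)) - math.log(det2(M))

# Right hand side: sum over (M^{-1})_{ab} dM_{ba}, i.e. the trace of M^{-1} dM.
Minv = inv2(M)
rhs = sum(Minv[a][b] * dM[b][a] for a in range(2) for b in range(2))
print(lhs, rhs)
```

The two sides agree up to terms of second order in `dM`, which is what the variational identity asserts.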

The final term we would like to consider is the last one, containing the variation of the Ricci tensor, which can be written in terms of the variation of the Riemann tensor

$$ \delta R_{\mu\nu} = \delta R^{\lambda}{}_{\mu\lambda\nu}. \tag{170} $$

This might seem daunting but it is much easier than it seems if we don’t vary with respect to the metric yet. Instead of doing that we will take a step back and we will consider the variation of the Riemann tensor with respect to the connection coefficients, before varying those with respect to the metric. If we do that, we can write

$$ \delta R^{\lambda}{}_{\mu\rho\nu} = \nabla_{\rho}\,\delta\Gamma^{\lambda}_{\nu\mu} - \nabla_{\nu}\,\delta\Gamma^{\lambda}_{\rho\mu}. \tag{171} $$

This is a meaningful expression since the variation $\delta\Gamma^{\lambda}_{\mu\nu}$ can be thought of as a difference of connections and, according to our discussion around equation (78), it is therefore a tensor. It is then meaningful to consider its covariant derivative and for the last term in the variation (167) we can write

$$ g^{\mu\nu}\,\delta R_{\mu\nu} = \nabla_{\lambda}\left(g^{\mu\nu}\,\delta\Gamma^{\lambda}_{\mu\nu} - g^{\lambda\rho}\,\delta\Gamma^{\beta}_{\beta\rho}\right), \tag{172} $$

showing that it is a total derivative and therefore a surface term which cannot enter the equations of motion and which we can drop. At the end of the day we have that

$$ \delta S_{\rm EH} = \frac{1}{2\kappa}\int d^{n}x\,\sqrt{-g}\left[-\frac{1}{2}g_{\mu\nu}R + R_{\mu\nu}\right]\delta g^{\mu\nu}, \tag{173} $$

giving the desired result.

5 Black Holes

5.1 The Schwarzschild solution

In this section we will look for the simplest solutions in Einstein’s theory in the absence of matter fields. In this case, we will have to look for Ricci flat spacetimes with,

Rμν=0. (174)

More specifically we will look for spacetimes with spherical symmetry. One such spacetime we know already and which is a solution of (174) is Minkowski space which we can write in polar coordinates according to,

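$$
\begin{aligned}
ds^2 &= -dt^2+dx^2+dy^2+dz^2\\
&= -dt^2+dr^2+r^2\left(d\theta^2+\sin^2\theta\,d\varphi^2\right)\\
&= -dt^2+dr^2+r^2\,d\Omega_2^2.
\end{aligned}
\tag{175}
$$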

In the above metric we have introduced the metric of the unit-radius two-dimensional sphere,

$$d\Omega_2^2=d\theta^2+\sin^2\theta\,d\varphi^2, \tag{176}$$

which is invariant under three dimensional Euclidean rotations.

In order to find more general spherically symmetric solutions, we will keep this part of the metric but generalise the rest of it,

$$ds^2=\gamma_{tt}(t,r)\,dt^2+2\gamma_{tr}(t,r)\,dt\,dr+\gamma_{rr}(t,r)\,dr^2+\gamma_{\Omega\Omega}(t,r)\,d\Omega_2^2. \tag{177}$$

The above metric is the most general we can write which preserves spherical symmetry. We can do slightly better than this to write down something simpler by exploiting coordinate transformations of the form,

$$
\begin{aligned}
t &\to t'(t,r),\\
r &\to r'(t,r),
\end{aligned}
\tag{178}
$$

to set $\gamma_{tr}=0$ and $\gamma_{\Omega\Omega}=r^2$. We now have a simpler metric of the form,

$$ds^2=-e^{2\alpha(t,r)}\,dt^2+e^{2\beta(t,r)}\,dr^2+r^2\,d\Omega_2^2, \tag{179}$$

and the task is to find the functions α(t,r) and β(t,r) which solve Einstein’s equation (174).

The non-trivial components of the Ricci tensor read

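$$
\begin{aligned}
R_{00} &= \ddot\beta+\dot\beta^2-\dot\alpha\dot\beta+e^{2(\alpha-\beta)}\left[\alpha''+\alpha'^2-\alpha'\beta'+\frac{2}{r}\alpha'\right],\\
R_{11} &= -\left[\alpha''+\alpha'^2-\alpha'\beta'-\frac{2}{r}\beta'\right]+e^{2(\beta-\alpha)}\left[\ddot\beta+\dot\beta^2-\dot\alpha\dot\beta\right],\\
R_{01} &= \frac{2}{r}\dot\beta,\qquad R_{22}=e^{-2\beta}\left[r(\beta'-\alpha')-1\right]+1,\\
R_{33} &= \sin^2\theta\,R_{22}.
\end{aligned}
\tag{180}
$$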

In the above equations we have used the notation $\dot f(t,r)=\partial_t f(t,r)$ and $f'(t,r)=\partial_r f(t,r)$.

We now examine what Einstein’s equations imply for our functions,

$$R_{01}=0\;\Rightarrow\;\dot\beta(t,r)=0\;\Rightarrow\;\beta=\beta(r), \tag{181}$$

showing that β is only a function of the radial coordinate r. We now consider the time derivative,

$$\partial_tR_{22}=0\;\Rightarrow\;\partial_t\partial_r\alpha(t,r)=0\;\Rightarrow\;\alpha(t,r)=f(r)+g(t), \tag{182}$$

and therefore the function α can only be the sum of two functions f and g that depend only on the radial and time coordinates respectively. However, if we look back at our original ansatz in equation (179) we can see that we can absorb the function g(t) in a time redefinition according to,

$$d\tilde t=e^{g(t)}\,dt=d\left(\int dt\,e^{g(t)}\right). \tag{183}$$

We can therefore set g(t)=0 without any loss of generality. We now examine the linear combination,

$$e^{2(\beta-\alpha)}R_{00}+R_{11}=\frac{2}{r}\left(\alpha'+\beta'\right)=0\;\Rightarrow\;\alpha(r)=-\beta(r)+c, \tag{184}$$

with c a constant which we can also set to zero via a transformation of the time coordinate.

The final equation which we have not solved yet is,

$$R_{22}=0\;\Rightarrow\;e^{2\alpha}\left[2r\alpha'+1\right]=1\;\Rightarrow\;e^{2\alpha}=1+\frac{\mu}{r}, \tag{185}$$

where μ is a constant of integration for the ordinary differential equation we solved. The final form of the metric is then

$$ds^2=-\left(1+\frac{\mu}{r}\right)dt^2+\left(1+\frac{\mu}{r}\right)^{-1}dr^2+r^2\,d\Omega_2^2. \tag{186}$$
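The integration step in (185) can be checked numerically. The sketch below, in plain Python, evaluates $e^{2\alpha}(2r\alpha'+1)-1$ on the candidate solution $e^{2\alpha}=1+\mu/r$, with $\alpha'$ approximated by a central finite difference. The function names, the step size `h`, the value of `mu` and the sample radii are illustrative assumptions, not part of the notes.

```python
import math

def exp_2alpha(r, mu):
    """Candidate solution of (185): e^{2 alpha} = 1 + mu / r."""
    return 1.0 + mu / r

def alpha(r, mu):
    """alpha(r) recovered from the candidate solution (needs 1 + mu/r > 0)."""
    return 0.5 * math.log(exp_2alpha(r, mu))

def r22_residual(r, mu, h=1e-5):
    """e^{2 alpha} (2 r alpha' + 1) - 1, with alpha' from a central difference."""
    d_alpha = (alpha(r + h, mu) - alpha(r - h, mu)) / (2.0 * h)
    return exp_2alpha(r, mu) * (2.0 * r * d_alpha + 1.0) - 1.0

mu = -2.0  # illustrative value, chosen so that 1 + mu/r > 0 at the sample radii
residuals = [r22_residual(r, mu) for r in (3.0, 5.0, 10.0, 50.0)]

print(residuals)
```

The residuals should vanish up to finite-difference and rounding errors, for any value of the integration constant for which the logarithm is defined.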

What we showed above is rather striking. All we imposed in our original ansatz (179) was spherical symmetry, but Einstein’s equations constrained both functions α and β to be independent of time. This is known as Birkhoff’s theorem. More precisely, any time dependence would be an artefact of time-dependent coordinate transformations. The final result is a static spacetime, a notion which can be characterised geometrically as follows.

Definition 5.1.

A metric with a timelike Killing vector $K^\mu$ is called stationary. If, in addition, $K^\mu$ is hypersurface orthogonal, $K_{[\mu}\nabla_{\nu}K_{\rho]}=0$, it is called static.

We now want to go back to our solution (186) and understand it better. By taking the limit $r\to\infty$ we see that the metric approaches Minkowski spacetime in spherical coordinates, as in equation (175). Viewed from far away, this suggests that there is a point-like object sitting at the origin, curving the spacetime. Since its effect becomes weak at infinity, it makes sense to examine the deviation of $g_{00}$ from its Minkowski value and treat it as a Newtonian gravitational potential, following the discussion of section 3.2,

$$g_{00}\approx-1-2\Phi\;\Rightarrow\;\Phi=\frac{\mu}{2r}. \tag{187}$$

It is then natural to interpret the solution (186) as the spacetime created by a point-like object of mass $M$ sitting at the origin of space. Comparing with the Newtonian potential $\Phi=-G_NM/r$ of such an object suggests the identification

$$\mu=-2G_NM. \tag{188}$$

After this, our metric takes the form,

$$ds^2=-\left(1-\frac{2G_NM}{r}\right)dt^2+\left(1-\frac{2G_NM}{r}\right)^{-1}dr^2+r^2\,d\Omega_2^2, \tag{189}$$

which is known as the Schwarzschild black hole solution.

5.2 Motion around a black hole

In this subsection we examine the motion of a free particle in the Schwarzschild black hole background of equation (189). We will do this by examining the geodesic equation (117) in its affine parametrisation with α=0. From here on we work in units with $G_N=1$, and dots denote derivatives with respect to the affine parameter $\tau$. As we argued in section 3.1, we can equivalently study the action (123) in the background (189),

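$$
\begin{aligned}
L &= \int d\tau\left[-f(r(\tau))\,\dot t^2(\tau)+f^{-1}(r(\tau))\,\dot r^2(\tau)+r^2(\tau)\left(\dot\theta^2(\tau)+\sin^2\theta(\tau)\,\dot\phi^2(\tau)\right)\right],\\
f(r) &= 1-\frac{2M}{r}.
\end{aligned}
\tag{190}
$$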

Because of the large symmetry of the background we will not need most of them, but for completeness we list the equations of motion:

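$$
\begin{aligned}
&\frac{d}{d\tau}\left[\left(1-\frac{2M}{r}\right)\dot t\right]=0,\qquad \frac{d}{d\tau}\left[r^2\sin^2\theta\,\dot\phi\right]=0,\qquad -\frac{d}{d\tau}\left(r^2\dot\theta\right)+r^2\sin\theta\cos\theta\,\dot\phi^2=0,\\
&2\frac{d}{d\tau}\left[-\left(1-\frac{2M}{r}\right)^{-1}\dot r\right]-\frac{2M}{r^2}\,\dot t^2-\left(1-\frac{2M}{r}\right)^{-2}\frac{2M}{r^2}\,\dot r^2+2r\left(\dot\theta^2+\sin^2\theta\,\dot\phi^2\right)=0.
\end{aligned}
\tag{191}
$$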

We now recall that for the Schwarzschild space we have the Killing vectors corresponding to time translations and spatial rotations:

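$$
\begin{aligned}
T_{(t)} &= \partial_t,\\
L_{(1)} &= \partial_\phi,\\
L_{(2)} &= -\cos\phi\,\partial_\theta+\sin\phi\cot\theta\,\partial_\phi,\\
L_{(3)} &= \sin\phi\,\partial_\theta+\cos\phi\cot\theta\,\partial_\phi.
\end{aligned}
\tag{192}
$$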

According to our discussion on conserved quantities in section 3.4, to each Killing vector we can associate a conserved quantity:

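$$
\begin{aligned}
E &= -\left(T_{(t)}\right)_\mu\dot x^\mu=\left(1-\frac{2M}{r}\right)\dot t,\\
L &= \left(L_{(1)}\right)_\mu\dot x^\mu=r^2\sin^2\theta\,\dot\phi,\\
L_x &= \left(L_{(2)}\right)_\mu\dot x^\mu=-\cos\phi\,\dot\theta\,r^2+\sin\phi\cot\theta\,L,\\
L_y &= \left(L_{(3)}\right)_\mu\dot x^\mu=\sin\phi\,\dot\theta\,r^2+\cos\phi\cot\theta\,L.
\end{aligned}
\tag{193}
$$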

The first conserved quantity is the energy of the particle, which is conserved due to the time-translation symmetry of the background. The last three conserved quantities form the angular momentum vector, which is conserved both in magnitude and direction. Without loss of generality we can choose it to point along the z-axis, by choosing a frame in which we initially have $\dot\theta=0$ and $\theta=\pi/2$. This gives $L_x=L_y=0$, and their conservation tells us that they remain zero throughout the motion. Therefore $\dot\theta$ and $\theta$ keep their initial values and the motion stays in the equatorial plane.

The conservation laws effectively solve the first three equations listed in (191). We would therefore still need to solve the last one, which comes from the variation of the action (190) with respect to the radial coordinate. However, this won’t be necessary as there is another “conservation law” related to the choice of the affine parameter. As we discussed in section 3.1, in the affine parametrisation the norm of the tangent vector of the geodesic remains constant and therefore,

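$$
\begin{aligned}
-\left(1-\frac{2M}{r}\right)\dot t^2+\left(1-\frac{2M}{r}\right)^{-1}\dot r^2+r^2\left(\dot\theta^2+\sin^2\theta\,\dot\phi^2\right) &= -\varepsilon,\\
\left(1-\frac{2M}{r}\right)^{-1}\left(-E^2+\dot r^2\right)+\frac{L^2}{r^2} &= -\varepsilon,\\
\frac{1}{2}\dot r^2+V_{\text{eff}}(r) &= \frac{1}{2}E^2,
\end{aligned}
\tag{194}
$$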

where we used the conserved quantities of equation (193), evaluated at $\theta=\pi/2$, and we have also defined,

$$V_{\text{eff}}(r)=\frac{1}{2}\varepsilon-\frac{\varepsilon M}{r}+\frac{L^2}{2r^2}-\frac{ML^2}{r^3}. \tag{195}$$
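The algebra leading from the second line of (194) to the definition (195) is short, and can be sketched as follows: multiply through by $\frac{1}{2}\left(1-\frac{2M}{r}\right)$ and expand the product.

```latex
\begin{aligned}
\left(1-\frac{2M}{r}\right)^{-1}\left(-E^2+\dot r^2\right)+\frac{L^2}{r^2} &= -\varepsilon\\
\Rightarrow\quad -E^2+\dot r^2 &= -\left(1-\frac{2M}{r}\right)\left(\varepsilon+\frac{L^2}{r^2}\right)\\
\Rightarrow\quad \frac{1}{2}E^2 &= \frac{1}{2}\dot r^2+\frac{1}{2}\left(1-\frac{2M}{r}\right)\left(\varepsilon+\frac{L^2}{r^2}\right)\\
&= \frac{1}{2}\dot r^2+\frac{1}{2}\varepsilon-\frac{\varepsilon M}{r}+\frac{L^2}{2r^2}-\frac{ML^2}{r^3}.
\end{aligned}
```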

We call $V_{\text{eff}}(r)$ the effective potential, as the equation determining the radial variable looks similar to that of a non-relativistic particle moving in a central effective potential $V_{\text{eff}}(r)$. In fact, the leading terms in the limit $r\to\infty$ are identical to those of a particle moving around an object of mass $M$,

$$V_{\text{eff}}(r)\approx\frac{1}{2}\varepsilon-\frac{\varepsilon M}{r}+\frac{L^2}{2r^2}, \tag{196}$$

and this is not entirely surprising given our discussion in section 3.2 and the fact that gravity becomes weak as we move to infinity. Loosely speaking, the contribution of General Relativity is the extra term in (195) that goes like $1/r^3$ and becomes dominant at small distances, where gravity becomes strong.
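To get a feeling for when the relativistic term matters, one can compare it with the centrifugal term numerically. The following is a minimal sketch in plain Python, in units with $G_N=1$; the function names and the values of `M`, `L` and `eps` are illustrative assumptions, not taken from the notes.

```python
def v_eff(r, M, L, eps):
    """Effective potential (195), in units with G_N = 1."""
    return 0.5 * eps - eps * M / r + L**2 / (2.0 * r**2) - M * L**2 / r**3

def v_newton(r, M, L, eps):
    """Large-r approximation (196): the same expression without the 1/r^3 term."""
    return 0.5 * eps - eps * M / r + L**2 / (2.0 * r**2)

M, L, eps = 1.0, 4.0, 1.0  # illustrative values, not taken from the notes
for r in (3.0, 10.0, 100.0, 1000.0):
    correction = v_eff(r, M, L, eps) - v_newton(r, M, L, eps)
    centrifugal = L**2 / (2.0 * r**2)
    # The ratio of the relativistic term to the centrifugal term is -2M/r.
    print(r, correction / centrifugal)
```

The ratio is $-2M/r$, so the correction is negligible for $r\gg 2M$ and comparable to the centrifugal barrier once $r$ is of order $2M$.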

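Some significant remarks about the effective potential are:

  • It is bounded from above, in contrast to the Newtonian effective potential, whose centrifugal barrier diverges as $r\to 0$.

  • For massless, light-like particles with $\varepsilon=0$ there is a circular orbit at $r=r_0$. We have that

    $$V_{\text{eff}}'(r_0)=0\;\Rightarrow\;-\frac{L^2}{r_0^3}+\frac{3ML^2}{r_0^4}=0\;\Rightarrow\;r_0=3M, \tag{197}$$

    and to check its stability we compute,

    $$V_{\text{eff}}''(r_0)=-\frac{L^2}{3^4M^4}<0, \tag{198}$$

    showing that it is unstable.

  • For massive particles with $\varepsilon=1$ and $L^2>12M^2$ we have two circular orbits at $r=r_\pm$ with

    $$
    \begin{aligned}
    &V_{\text{eff}}'(r_\pm)=0\;\Rightarrow\;\frac{M}{r_\pm^2}-\frac{L^2}{r_\pm^3}+\frac{3ML^2}{r_\pm^4}=0\\
    &\Rightarrow\;r_\pm=\frac{1}{2M}\left(L^2\pm\sqrt{L^4-12M^2L^2}\right).
    \end{aligned}
    \tag{199}
    $$

    This differs from the non-relativistic motion of particles, where there is a single circular orbit for a given angular momentum. By checking the sign of the second derivative of the effective potential, we can conclude that only the orbit with $r=r_+$ is stable.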
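The statements about circular orbits can be verified with a few lines of arithmetic. The sketch below, in plain Python with $G_N=1$, evaluates the first and second derivatives of the effective potential (195) at the radii given in (197) and (199). The function names and the sample values `M = 1`, `L = 4` are illustrative assumptions.

```python
import math

def dv_eff(r, M, L, eps):
    """First derivative of the effective potential (195)."""
    return eps * M / r**2 - L**2 / r**3 + 3.0 * M * L**2 / r**4

def d2v_eff(r, M, L, eps):
    """Second derivative of the effective potential (195)."""
    return -2.0 * eps * M / r**3 + 3.0 * L**2 / r**4 - 12.0 * M * L**2 / r**5

def massive_circular_orbits(M, L):
    """Radii r_- and r_+ of (199); real only when L^2 >= 12 M^2."""
    disc = L**4 - 12.0 * M**2 * L**2
    if disc < 0.0:
        raise ValueError("no circular orbits: L^2 < 12 M^2")
    root = math.sqrt(disc)
    return (L**2 - root) / (2.0 * M), (L**2 + root) / (2.0 * M)

M, L = 1.0, 4.0  # illustrative values with L^2 > 12 M^2
r_minus, r_plus = massive_circular_orbits(M, L)
r_photon = 3.0 * M  # light-like circular orbit of (197)

for name, r in (("r_-", r_minus), ("r_+", r_plus)):
    stable = d2v_eff(r, M, L, 1.0) > 0.0
    print(name, r, "stable" if stable else "unstable")
print("photon orbit", r_photon, d2v_eff(r_photon, M, L, 0.0))
```

A positive second derivative corresponds to a local minimum of the effective potential and hence a stable orbit; a negative one corresponds to a local maximum.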

We will now examine the spherical region at $r=2M$, where the coordinate system we chose to construct the Schwarzschild solution (189) is singular. As we will see, this is precisely where the event horizon of our black hole is. We will do this by shooting a ray of light, with $\varepsilon=0$ in equation (194), directly towards the centre of our black hole. For this motion we need zero angular momentum and therefore $\dot\phi=0$.

We have that,

$$\frac{dr}{d\tau}=-E, \tag{200}$$

$$\frac{dr}{dt}\frac{dt}{d\tau}=-E\;\Rightarrow\;\frac{dr}{dt}=-\left(1-\frac{2M}{r}\right). \tag{201}$$

In the second equation we have simply used the conservation laws in (193). We see two strikingly different pictures depending on how we examine the trajectory of the ray of light. Equation (200) has the simple solution $r=r_0-E\tau$, which tells us that after a finite interval of the affine parameter $\tau$ the ray of light crosses the sphere at $r=2M$ without anything dramatic happening. On the other hand, if time is measured by the $t$ coordinate of our coordinate system, the ray of light slows down as it approaches the horizon and we will never see it cross. As observers sitting on the outside, we can never send information inside the event horizon and actually see it arrive.
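The contrast between the two descriptions can be made quantitative by integrating (200) and (201) in closed form. The sketch below assumes $G_N=1$ and a ray starting at radius $r_0>2M$; integrating (201) gives $t=(r_0-r)+2M\ln\frac{r_0-2M}{r-2M}$. The function names and the numerical values are illustrative assumptions.

```python
import math

def affine_parameter(r0, r, E):
    """Affine parameter elapsed between r0 and r, from integrating (200)."""
    return (r0 - r) / E

def coordinate_time(r0, r, M):
    """Coordinate time elapsed between r0 and r > 2M, from integrating (201)."""
    return (r0 - r) + 2.0 * M * math.log((r0 - 2.0 * M) / (r - 2.0 * M))

M, E, r0 = 1.0, 1.0, 10.0  # illustrative values in units with G_N = 1
for delta in (1e-1, 1e-3, 1e-6, 1e-9):
    r = 2.0 * M + delta  # a point just outside r = 2M
    print(delta, affine_parameter(r0, r, E), coordinate_time(r0, r, M))
```

As `delta` shrinks, the affine parameter tends to the finite value $(r_0-2M)/E$, while the coordinate time grows logarithmically without bound.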