1  Scalar fields and vectors

1.1 Introduction

This term we will study how to generalise ordinary calculus to functions/maps of the form \[ f\,:\,\Real^m \to \Real^n, \quad m,n \in \mathbb{N}. \] The objects of interest are vectors, hence this subject is usually called vector calculus (or sometimes multivariable calculus).

Depending on the choice of \(m\) and \(n\), there are different applications. We will mostly stick to 3-d space, but even this gives many possibilities. Maps from \(\Real\to\Real^3\) describe curves, while those from \(\Real^2\to\Real^3\) describe surfaces. Once we have a curve or a surface, we will be able to use these as domains over which to integrate either scalar fields, which are maps \(\Real^3\to\Real\), or vector fields, which are maps \(\Real^3\to\Real^3\). We will also study differentiation of scalar and vector fields, and see how the Fundamental Theorem of Calculus generalises to higher dimensions.

The subject is interesting in its own right, but this course is also a basic foundation for many Level III and IV courses in both Pure and (especially) Applied Mathematics.

In this first topic we will review what you have already learnt about scalar fields in Calculus I, and about vectors in Linear Algebra I. We will also learn how to use index notation for simpler manipulation of vector equations.

1.2 Revision: Scalar fields

A scalar field on \(\Real^n\) is a map \(f\,:\,\Real^n\to\Real.\)

In Calculus I you saw examples on \(\Real\) and \(\Real^2\). This term we will also need scalar fields on \(\Real^3\). Here we will review the important concepts from Calculus and give some 3-d examples.

1.2.1 Visualisation

We can visualise a scalar field \(f\) on \(\Real^2\) as the height of a surface, \(z=f(x,y).\) In this case the level sets \(f(x,y)=c\) for \(c\in\Real\) are called contours of \(f.\)

Consider the example \(f(x,y) = \sin\sqrt{x^2 + y^2}\). The contours are curves given by \(f(x,y)=c\) for some (fixed) \(c\in\Real\). Here, \[ \sin\sqrt{x^2 + y^2} = c \qquad \implies x^2 + y^2 = \arcsin^2{c}, \] taking the principal value of \(\arcsin\) (since \(\sin\) is periodic, there are further solutions at larger radii). So the contours are circles (of radius \(\arcsin{c}\)), centred at \((x,y)=(0,0)\). Here is my sketch:

Sketch of contours.

For scalar fields on \(\Real^3\), we can no longer view the field itself as a surface. Moreover, the level sets \(f(x,y,z)=c\) are no longer curves but are themselves 2-d surfaces, sometimes called isosurfaces.

For the example \(f(x,y,z) = x^2 + y^2 + z^2\), the level sets are concentric spheres \(x^2 + y^2 + z^2 = c\) (of radius \(\sqrt{c}\), for \(c>0\)), centred on the origin. Notice that although each of these isosurfaces is a surface, \(z(x,y)\) is only defined implicitly, and in general is not a single-valued function of \(x, y\).

Sketch of spherical isosurface.

Calculating isosurfaces from a more complicated scalar field on a computer is a non-trivial computational problem.

1.2.2 Differentiation

The partial derivatives of \(f\) are encoded in its gradient vector, which for \(f:\Real^3\to\Real\) has the form \[\begin{eqnarray} \nabla f = \ddy{f}{x}\eb_1 + \ddy{f}{y}\eb_2 + \ddy{f}{z}\eb_3 = \left(\ddy{f}{x}, \ddy{f}{y}, \ddy{f}{z}\right)^\top. \end{eqnarray}\] Each component of \(\nabla f\) tells us the rate of change of \(f\) if we move in the corresponding coordinate direction.

If we want to know the rate of change of \(f\) in a general direction, not necessarily aligned with one of the coordinate axes, then we take the scalar product \(\hat{\nb}\cdot\nabla f\), where \(\hat{\nb}\) is a unit vector pointing in the required direction. This is called a directional derivative of \(f\).

Definition of theta.

Note that \[ \hat{\nb}\cdot\nabla f = |\hat{\nb}||\nabla f|\cos\theta = |\nabla f|\cos\theta, \] where \(\theta\) is the angle between \(\hat{\nb}\) and \(\nabla f\). The direction with largest increase of the function is therefore given by \(\theta=0\), showing that \(\nabla f\) points in the direction of fastest increase of \(f\), and its magnitude \(|\nabla f|\) is the rate of increase in that direction.
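If you want to experiment, this relationship is easy to check numerically. The following sketch (in Python with numpy; the field \(f\) here is an arbitrary choice for illustration, not one from the notes) approximates the gradient by central finite differences and compares the directional derivative \(\hat{\nb}\cdot\nabla f\) with a direct finite difference along \(\hat{\nb}\):

```python
import numpy as np

# Hypothetical field f(x, y, z) = x**2 + y*z, chosen only for illustration;
# its gradient is (2x, z, y).
def f(p):
    x, y, z = p
    return x**2 + y*z

def grad_f(p, h=1e-6):
    # Approximate each partial derivative by a central finite difference.
    return np.array([(f(p + h*e) - f(p - h*e)) / (2*h) for e in np.eye(3)])

p = np.array([1.0, 2.0, 3.0])
g = grad_f(p)                      # close to (2, 3, 2)

# The rate of change along a unit vector n_hat is the directional
# derivative n_hat . grad(f): compare with a finite difference along n_hat.
n_hat = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
t = 1e-6
fd = (f(p + t*n_hat) - f(p - t*n_hat)) / (2*t)
assert np.isclose(fd, n_hat @ g, atol=1e-5)

# No direction beats the gradient direction itself (theta = 0).
assert n_hat @ g <= np.linalg.norm(g) + 1e-9
```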

This property can be used to define \(\nabla f\) without the need to choose particular coordinates.

For \(f(x,y) = \sin\sqrt{x^2 + y^2}\), the partial derivatives are \[ \ddy{f}{x} = \frac{x\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}, \qquad \ddy{f}{y} = \frac{y\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}, \] so the gradient is \[\begin{align*} \nabla f &= \frac{x\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}\eb_1 + \frac{y\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}\eb_2\\ &= \frac{\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}(x\eb_1 + y\eb_2)\\ &= \frac{\cos\sqrt{x^2 + y^2}}{\sqrt{x^2 + y^2}}\xb. \end{align*}\] This points in the radial direction, so is orthogonal to the contours of \(f\), which we showed previously to be circles centred on the origin. We can illustrate this vector field by showing arrows at selected points:

Arrows showing gradient vector field.

In fact, \(\nabla f\) is always orthogonal to the contours (in 2-d), or isosurfaces (in 3-d). This is because the contours/isosurfaces by definition correspond to directions \(\hat{\nb}\) with zero directional derivative, so \[ |\nabla f|\cos\theta = 0\qquad \implies \theta=\pm\frac{\pi}{2}. \]
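As a sanity check (a Python/numpy sketch, not part of the notes), we can compare the gradient formula computed above against finite differences at a sample point, and confirm that the gradient is radial, hence orthogonal to the circular contours:

```python
import numpy as np

def f(x, y):
    # The scalar field from the example.
    return np.sin(np.sqrt(x**2 + y**2))

def grad_claimed(x, y):
    # The computed gradient: cos(r)/r * (x, y), with r = sqrt(x^2 + y^2).
    r = np.sqrt(x**2 + y**2)
    return (np.cos(r) / r) * np.array([x, y])

# Compare with central finite differences at an arbitrary sample point.
x0, y0, h = 0.8, -0.3, 1e-6
numeric = np.array([(f(x0 + h, y0) - f(x0 - h, y0)) / (2*h),
                    (f(x0, y0 + h) - f(x0, y0 - h)) / (2*h)])
assert np.allclose(grad_claimed(x0, y0), numeric, atol=1e-6)

# The gradient is parallel to the position vector (x, y): radial direction.
g0 = grad_claimed(x0, y0)
assert np.isclose(g0[0]*y0 - g0[1]*x0, 0.0, atol=1e-12)
```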

For \(f(x,y,z) = x^2 + y^2 + z^2\), we have \(\nabla f = 2x\eb_1 + 2y\eb_2 + 2z\eb_3 = 2\xb\). This points radially outward from the origin, so is normal to the isosurfaces, which we saw were spheres centred on the origin. Notice that the vector \(\nabla f\) points towards increasing \(f\), i.e. radially outward.

1.3 Revision: Vectors in \(\mathbb{R}^3\)

Recall from Linear Algebra I that a vector in \(\Real^3\) (or \(\Real^2\)) is a quantity with both magnitude and direction, unlike a scalar which has only magnitude.

Components of a vector.

A vector in \(\Real^3\) is specified by its three components \(a_1\), \(a_2\), \(a_3\) with respect to the standard basis \(\{\eb_1, \eb_2, \eb_3\}\). We will write \[ \ab = a_1\eb_1 + a_2\eb_2 + a_3\eb_3 \] or (when it is clear that we are dealing with the standard basis) alternatively \[ \ab = \begin{pmatrix} a_1\\ a_2\\ a_3 \end{pmatrix} = (a_1, a_2, a_3)^\top. \]

When writing by hand, boldface is unavailable so we usually denote vectors by underlining: \(\underline{a}\) (although some people prefer a horizontal arrow on top: \(\vec{a}\)). Pure mathematicians often just write \(a\) for a vector without any special indication. However, this can lead to confusion for beginners so in this course you must underline (or put arrows on) your vectors.

Of course, if we change the basis, the components will change, but they are still describing the same geometrical object.

Coordinate rotation.

For example, take \(\ab = a_1\eb_1\) and a new basis \(\{\eb_1', \eb_2', \eb_3'\}\) obtained by rotating through an angle \(\theta\) about the \(\eb_3\)-axis. We have \[ \begin{pmatrix} a_1'\\ a_2'\\a_3' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} a_1\\ 0\\0 \end{pmatrix} \] so \(\ab = a_1\cos\theta\,\eb_1' - a_1\sin\theta\,\eb_2'\).

In fact, one can define a vector by the way in which its components change under a coordinate rotation: namely \(a_i'=R_{ij}a_j\) (with an implied sum over \(j\); see the index notation of Section 1.4), where \(R\) is any rotation matrix. Although we won’t develop it further in this course, this viewpoint will be important in many later courses that generalise vectors to tensors, e.g., Differential Geometry, Fluid Mechanics, General Relativity, …
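Here is a quick numerical check of the transformation rule (a Python/numpy sketch; the angle and the component \(a_1=2\) are arbitrary choices):

```python
import numpy as np

# Rotation of components a'_i = R_ij a_j for a rotation by theta about e3.
theta = 0.5
R = np.array([[ np.cos(theta), np.sin(theta), 0.0],
              [-np.sin(theta), np.cos(theta), 0.0],
              [ 0.0,           0.0,           1.0]])

a = np.array([2.0, 0.0, 0.0])   # a = a_1 e_1, as in the example above
a_prime = R @ a

# Components transform as computed: (a_1 cos(theta), -a_1 sin(theta), 0).
assert np.allclose(a_prime, [2*np.cos(theta), -2*np.sin(theta), 0.0])
# The geometrical object is unchanged: rotations preserve length.
assert np.isclose(np.linalg.norm(a_prime), np.linalg.norm(a))
```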

We can also have a change of coordinates that doesn’t amount to a simple rotation of the Cartesian frame. Polar coordinates are the most familiar example of such an alternative basis. In this course you will meet the 3-d versions: cylindrical and spherical polar coordinates.

The magnitude/length of a vector is given in terms of its components by Pythagoras’s Theorem: \[ |\ab| = \sqrt{a_1^2 + a_2^2 + a_3^2}. \] Stretching a vector (changing the magnitude but not direction) corresponds to multiplying by a scalar: \[ \lambda\ab = \lambda a_1\eb_1 + \lambda a_2\eb_2 + \lambda a_3\eb_3. \]

We add two vectors by the “triangle rule”:

Triangle rule for addition.

Assuming both vectors are expressed in the standard basis, this just corresponds to adding each component: \[ \ab + \bb = (a_1 + b_1)\eb_1 + (a_2 + b_2)\eb_2 + (a_3 + b_3)\eb_3. \]

1.3.1 The scalar product

The simplest way to multiply two vectors in \(\Real^3\) is the scalar product (or dot product), which is defined geometrically as \[ \ab\cdot\bb = |\ab||\bb|\cos\theta, \] where \(\theta\) is the angle between the two vectors.

It doesn’t matter whether \(\theta\) is the smaller or larger angle, because \(\cos(2\pi - \theta)=\cos\theta\).

Scalar product.

Note that \(|\bb|\cos\theta\) is the component of \(\bb\) along \(\ab\), and vice versa.

We can see geometrically that the component of \(\bb+\cb\) along \(\ab\) is the sum of the components of \(\bb\) and \(\cb\) along \(\ab\):

Linearity of scalar product.

It follows that \[\begin{align} \ab\cdot\bb &= (a_1\eb_1 + a_2\eb_2 + a_3\eb_3)\cdot(b_1\eb_1 + b_2\eb_2 + b_3\eb_3)\\ &= a_1b_1\eb_1\cdot\eb_1 + a_2b_2\eb_2\cdot\eb_2 + a_3b_3\eb_3\cdot\eb_3 \quad\textrm{[using orthonormality: $\eb_i\cdot\eb_j=0$ for $i\neq j$ and $\eb_i\cdot\eb_i=1$]}\\ &= a_1b_1 + a_2b_2 + a_3b_3. \end{align}\]

So the scalar product has the same component formula in any orthonormal basis – in particular, in the orthonormal bases associated with the cylindrical and spherical polar coordinates that you will meet later.
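This invariance is easy to verify numerically; a short Python/numpy check (vectors and rotation angle chosen arbitrarily):

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 4.0])

# Any rotation matrix will do; here, rotation by 0.7 about the e3 axis.
t = 0.7
R = np.array([[ np.cos(t), np.sin(t), 0.0],
              [-np.sin(t), np.cos(t), 0.0],
              [ 0.0,       0.0,       1.0]])

# The component formula gives the same scalar in the rotated basis.
assert np.isclose(a @ b, (R @ a) @ (R @ b))
```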

Note that the component formula for the scalar product works for any \(\Real^n\), and can be used to define the notions of angle and length for \(n>3\).

1.3.2 The vector product

The other way to multiply two vectors in \(\Real^3\) (but not any other \(\Real^n\)) is the vector product (or cross product), defined as \[ \ab\times\bb = |\ab|\,|\bb|\sin\theta\,\hat{\nb}, \] where \(\hat{\nb}\) is a unit vector perpendicular to both \(\ab\) and \(\bb\), in the right-handed sense, and \(\theta\) is the angle from \(\ab\) to \(\bb\) with respect to \(\hat{\nb}\).

Vector product.

It follows from the definition that \(\bb\times\ab = -\ab\times\bb\).

Note that \(|\bb|\sin\theta\) is the component of \(\bb\) perpendicular to \(\ab\). Thus \(\ab\times\ab = \bfzero\). Also, the magnitude \(|\ab\times\bb|\) is the area of the parallelogram defined by \(\ab\) and \(\bb\).

Parallelogram from vector product.

As with the scalar product, we can show geometrically that \[ \ab\times(\bb + \cb) = \ab\times\bb + \ab\times\cb. \]

Proof. Let \(\ab\) point into the page. Then the effect of taking the cross product with \(\ab\) of any other vector, say \(\bb,\) is to:

  1. Project \(\bb\) onto the plane of the page (giving a vector of length \(|\bb|\sin\theta\)).

  2. Rotate through \(90^\circ\) clockwise (i.e. \(|\bb|\sin\theta\,\hat{\nb}\), choosing the direction \(\hat{\nb}\) to be perpendicular to \(\bb\) as well as \(\ab\)).

  3. Multiply by \(|\ab|\) (i.e. \(|\ab||\bb|\sin\theta\,\hat{\nb}\)).

Doing this to \(\bb\), \(\cb\) and \(\bb+\cb\) gives the following picture:

Linearity of vector product.

In particular, \(\ab\times(\bb+\cb)\) is the same vector as \(\ab\times\bb + \ab\times\cb\).


To find the components, consider first \(\eb_1\times\eb_2\). These two vectors have magnitude 1 and are perpendicular, so \(\sin\theta=1\). Moreover, \(\hat{\nb}=\eb_3\) because the coordinate system is right-handed. So \(\eb_1\times\eb_2=\eb_3\). Similarly \(\eb_2\times\eb_3=\eb_1\) and \(\eb_3\times\eb_1=\eb_2\). Thus we have \[\begin{align*} \ab\times\bb &= (a_1\eb_1 + a_2\eb_2 + a_3\eb_3)\times(b_1\eb_1 + b_2\eb_2 + b_3\eb_3)\\ &= a_1b_1\cancel{\eb_1\times\eb_1} + a_1b_2\eb_1\times\eb_2 + a_1b_3\eb_1\times\eb_3 \\ &\qquad + a_2b_1\eb_2\times\eb_1 + a_2b_2\cancel{\eb_2\times\eb_2} + a_2b_3\eb_2\times\eb_3\\ &\qquad\qquad + a_3b_1\eb_3\times\eb_1 + a_3b_2\eb_3\times\eb_2 + a_3b_3\cancel{\eb_3\times\eb_3}\\ &= (a_1b_2 - a_2b_1)\eb_1\times\eb_2 + (a_2b_3 - a_3b_2)\eb_2\times\eb_3 + (a_3b_1 - a_1b_3)\eb_3\times\eb_1\\ &= (a_2b_3 - a_3b_2)\eb_1 + (a_3b_1 - a_1b_3)\eb_2 + (a_1b_2 - a_2b_1)\eb_3. \end{align*}\] This is easier to remember by recognising it as a (formal) determinant \[ \ab\times\bb = \begin{vmatrix} \eb_1 & \eb_2 & \eb_3\\ a_1 & a_2 & a_3\\ b_1 & b_2 & b_3 \end{vmatrix}. \]
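If you have Python and numpy to hand, the component formula can be checked against numpy's built-in `np.cross` (the vectors here are arbitrary):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Components from the determinant expansion.
cross_components = np.array([a[1]*b[2] - a[2]*b[1],
                             a[2]*b[0] - a[0]*b[2],
                             a[0]*b[1] - a[1]*b[0]])

assert np.allclose(cross_components, np.cross(a, b))
# a x b is perpendicular to both a and b.
assert np.isclose(cross_components @ a, 0.0)
assert np.isclose(cross_components @ b, 0.0)
```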

1.4 Index notation

Remember that the vector equation \[ \cb = \ab + \bb \] is equivalent to the system of three scalar equations \[\begin{align} c_1 &= a_1 + b_1,\\ c_2 &= a_2 + b_2,\\ c_3 &= a_3 + b_3. \end{align}\] It is often surprisingly useful to write vector equations like this in so-called index notation, as \[ c_i = a_i + b_i. \] It is understood that this equation holds for \(i=1, 2\) and \(3\), and that it refers to any of the components in the standard basis. Here \(i\) is called a free index; the choice of letter is arbitrary: we could equally well write \(c_k = a_k + b_k\), as long as all terms in the equation have the same free index.

In this section we will learn the “tricks” that make index notation so useful; these mainly relate to scalar and vector products.

1.4.1 Scalar products

In index notation, we write the scalar product as \[ \ab\cdot\bb = a_jb_j, \] where the repeated index indicates that the term should be summed from \(j=1\) to \(3\) – called the (Einstein) summation convention. This repeated index is called a dummy index, and again the chosen letter doesn’t matter. Note that \(\ab\cdot\bb\) is a scalar, so there is no free index when we write it in index notation.

We write \(\ab\cdot\bb = a_jb_j\) but we have to choose a different letter for the index in \(\cb\cdot\db\), for example \(\cb\cdot\db=c_kd_k\). So \[ (\ab\cdot\bb)(\cb\cdot\db) = a_jb_jc_kd_k. \] Writing it out in full, this means \[ a_jb_jc_kd_k = \left(\sum_{j=1}^3a_jb_j\right)\left(\sum_{k=1}^3c_kd_k\right) = (a_1b_1 + a_2b_2 + a_3b_3)(c_1d_1 + c_2d_2 + c_3d_3). \] If we just wrote \(a_jb_jc_jd_j\), then we wouldn’t know which pairs of vectors were multiplied together and which not.
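Incidentally, numpy's `einsum` uses exactly this summation convention over repeated indices, which makes it handy for experimenting. A short sketch (arbitrary vectors):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([7.0, 8.0, 9.0])
d = np.array([1.0, 0.0, 2.0])

# a_j b_j c_k d_k: two separate dummy indices, each summed from 1 to 3.
result = np.einsum('j,j,k,k->', a, b, c, d)
assert np.isclose(result, (a @ b) * (c @ d))
```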

To avoid ambiguity, it is essential when using index notation that no index appears more than twice in any term.

Consider the expression \(a_jb_ic_j\). The index \(j\) is repeated, so we know that it is a dummy index (summed over). So \[ a_jb_ic_j = \left(\sum_{j=1}^3a_jc_j\right)b_i = (\ab\cdot\cb)b_i. \] The index \(i\) is a free index, so we see that \(a_jb_ic_j\) means the \(i\) component of the vector \((\ab\cdot\cb)\bb\). We could write this as \[ [(\ab\cdot\cb)\bb]_i = a_jb_ic_j. \] However, we may never write \((\ab\cdot\cb)\bb = a_jb_ic_j\), because this would be equating a vector to a scalar, which is meaningless.
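The same `einsum` experiment works for expressions with a free index (again an illustrative sketch with arbitrary vectors):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([1.0, 1.0, 0.0])

# a_j b_i c_j: j is summed (dummy), i survives (free), so the result is a vector.
v = np.einsum('j,i,j->i', a, b, c)
assert np.allclose(v, (a @ c) * b)   # the i component of (a . c) b
```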

For example, consider the vector equation \(\ub + (\ab\cdot\bb)\vb = |\ab|^2(\bb\cdot\vb)\ab\). Each side is a vector, so in index notation we should have a free index in every term, say \(i\), giving \[ u_i + (\ab\cdot\bb)v_i = |\ab|^2(\bb\cdot\vb)a_i. \] Now we introduce dummy indices for each scalar product, noting that \(|\ab|^2 = \ab\cdot\ab\), so \[ u_i + a_jb_jv_i = a_ja_jb_kv_ka_i. \] In particular, we needed two dummy indices on the right-hand side to avoid ambiguity.

Another way to think of the scalar product \(\ab\cdot\bb\) is as a quadratic form \[ \ab\cdot\bb = (a_1,a_2,a_3)\begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix} = a_1b_1 + a_2b_2 + a_3b_3. \] In index notation, this may be written \[ \ab\cdot\bb = \delta_{ij}a_ib_j, \] where the components of the identity matrix are given by the Kronecker delta, defined as \[ \delta_{ij} = \begin{cases} 1 & \textrm{if $i=j$},\\ 0 & \textrm{if $i\neq j$}. \end{cases} \] This will often be useful when manipulating expressions in index notation.

Consider \(\delta_{ij}a_j\). There is one dummy (repeated) index \(j\) and a free index \(i\). Summing over \(j\) gives \[ \delta_{ij}a_j = \delta_{i1}a_1 + \delta_{i2}a_2 + \delta_{i3}a_3. \] We can see that the result depends on the value of \(i\). If \(i=1\), then only the first term is non-zero, so \(\delta_{1j}a_j = a_1\). If \(i=2\), then only the second term is non-zero, so \(\delta_{2j}a_j = a_2\). And if \(i=3\), only the third term is non-zero, so \(\delta_{3j}a_j = a_3\). Therefore we conclude that \[ \delta_{ij}a_j = a_i. \] Note that there is still a free index \(i\), as in the starting expression.

Next consider \(\delta_{ij}\delta_{jk}\). Again there is one repeated index \(j\), so summing gives \[ \delta_{ij}\delta_{jk} = \delta_{i1}\delta_{1k} + \delta_{i2}\delta_{2k} + \delta_{i3}\delta_{3k}. \] Now the first term is non-zero only if \(i=k=1\), in which case \(\delta_{i1}\delta_{1k} = \delta_{11}\delta_{11} = 1\). Similarly the second term is non-zero only if \(i=k=2\), giving \(\delta_{22}\delta_{22} = 1\), and the last term is non-zero only for \(i=k=3\), giving \(\delta_{33}\delta_{33} = 1\). So actually, we just have \[ \delta_{ij}\delta_{jk} = \delta_{ik}. \] Again note that we still have the same free indices (\(i\) and \(k\)) as in the initial expression.

Finally, consider \(\delta_{ij}\delta_{ji}\). Here there are two dummy indices and no free indices, so the answer will be a scalar. Summing over first \(j\) then \(i\) gives \[\begin{align*} \delta_{ij}\delta_{ji} &= \delta_{ii} \quad \textrm{[from the previous example]}\\ &= \delta_{11} + \delta_{22} + \delta_{33}\\ &= 3. \end{align*}\]
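All three of these contraction rules can be checked with numpy, representing \(\delta_{ij}\) as the \(3\times 3\) identity matrix (an illustrative sketch; the vector \(\ab\) is arbitrary):

```python
import numpy as np

delta = np.eye(3)                     # the Kronecker delta as a 3x3 matrix
a = np.array([2.0, -1.0, 5.0])

# delta_ij a_j = a_i
assert np.allclose(np.einsum('ij,j->i', delta, a), a)
# delta_ij delta_jk = delta_ik
assert np.allclose(np.einsum('ij,jk->ik', delta, delta), delta)
# delta_ij delta_ji = 3
assert np.isclose(np.einsum('ij,ji->', delta, delta), 3.0)
```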

1.4.2 Vector products

We can write vector products in index notation by introducing the alternating tensor (or Levi-Civita symbol), \[ \epsilon_{ijk} = \begin{cases} 0 & \textrm{if any of $i,j,k$ are equal},\\ +1 & \textrm{if $(i,j,k)=(1,2,3), (2,3,1)$ or $(3,1,2)$},\\ -1 & \textrm{if $(i,j,k)=(1,3,2), (2,1,3)$ or $(3,2,1)$}. \end{cases} \] This object has 27 components, but only 6 of them are non-zero. You can check that \[ \epsilon_{ijk} = \epsilon_{jki} = \epsilon_{kij}, \] and that \[ \epsilon_{ijk} = -\epsilon_{jik}. \] This allows us to express the vector product in index notation as \[ [\ab\times\bb]_i = \epsilon_{ijk}a_jb_k. \] For example, the first component of the right-hand side is \[\begin{align*} \epsilon_{1jk}a_jb_k &= \cancel{\epsilon_{11k}}a_1b_k + \epsilon_{12k}a_2b_k + \epsilon_{13k}a_3b_k\\ &= \cancel{\epsilon_{121}}a_2b_1 + \cancel{\epsilon_{122}}a_2b_2 + \epsilon_{123}a_2b_3 + \cancel{\epsilon_{131}}a_3b_1 + \epsilon_{132}a_3b_2 + \cancel{\epsilon_{133}}a_3b_3\\ &= \epsilon_{123}a_2b_3 + \epsilon_{132}a_3b_2\\ &= a_2b_3 - a_3b_2. \end{align*}\]
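The component formula \([\ab\times\bb]_i = \epsilon_{ijk}a_jb_k\) can be tested by building \(\epsilon_{ijk}\) explicitly as a numpy array (note the code uses 0-based indices; the vectors are arbitrary):

```python
import numpy as np

# Build epsilon_{ijk} from its definition (0-based indices here).
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0    # even permutations of (1,2,3)
    eps[i, k, j] = -1.0   # odd permutations

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# [a x b]_i = epsilon_ijk a_j b_k
cross_index = np.einsum('ijk,j,k->i', eps, a, b)
assert np.allclose(cross_index, np.cross(a, b))
```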

1.4.3 Triple products

There are two types of product that involve three vectors:

  1. The scalar triple product, \(\ab\cdot(\bb\times\cb)\), where the result is a scalar.
  2. The vector triple product, \(\ab\times(\bb\times\cb)\), where the result is a vector.

For the scalar triple product, we have \[\begin{align*} \ab\cdot(\bb\times\cb) &= a_i\epsilon_{ijk}b_jc_k\\ &= \epsilon_{ijk}a_ib_jc_k \quad \textrm{[order doesn't matter for scalars]}\\ &= \epsilon_{kij}a_ib_jc_k\\ &= c_k\epsilon_{kij}a_ib_j\\ &= c_i\epsilon_{ijk}a_jb_k \quad \textrm{[relabelling the dummy indices $(i,j,k)\to(j,k,i)$]}. \end{align*}\] So \(\ab\cdot(\bb\times\cb) = \cb\cdot(\ab\times\bb)\): the scalar triple product is unchanged by a cyclic permutation of the three vectors. The last step wasn’t strictly necessary, just aesthetically pleasing!
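A quick numerical illustration of this cyclic property (a Python/numpy sketch with arbitrary vectors):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.0, 2.0])
c = np.array([4.0, 1.0, -2.0])

# a . (b x c) = c . (a x b) = b . (c x a)
stp = a @ np.cross(b, c)
assert np.isclose(stp, c @ np.cross(a, b))
assert np.isclose(stp, b @ np.cross(c, a))
```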

When simplifying more complicated expressions, there is a useful identity for the product of two Levi-Civita symbols that share one index in common.

Lemma 1.1 (Useful formula.) \[ \epsilon_{ijk}\epsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}. \tag{1.1}\]

Proof. The left-hand side is \[ \epsilon_{ijk}\epsilon_{klm} = \begin{cases} +1 & \textrm{if $(klm)$ is an even permutation of $(ijk)$},\\ -1 & \textrm{if $(klm)$ is an odd permutation of $(ijk)$},\\ 0 & \textrm{if $(klm)$ is not a permutation of $(ijk)$}. \end{cases} \] Encoding these six possibilities in terms of Kronecker deltas gives \[ \epsilon_{ijk}\epsilon_{klm} = \delta_{ik}\delta_{jl}\delta_{km} - \delta_{ik}\delta_{jm}\delta_{kl} + \delta_{il}\delta_{jm}\delta_{kk} - \delta_{il}\delta_{jk}\delta_{km} + \delta_{im}\delta_{jk}\delta_{kl} - \delta_{im}\delta_{jl}\delta_{kk}. \] Using \(\delta_{kk}=3\) and \(\delta_{ij}\delta_{jk}=\delta_{ik}\), we find \[\begin{align*} \epsilon_{ijk}\epsilon_{klm} &= \delta_{im}\delta_{jl} - \delta_{il}\delta_{jm} + 3\delta_{il}\delta_{jm} - \delta_{il}\delta_{jm} + \delta_{im}\delta_{jl} - 3\delta_{im}\delta_{jl}\\ &= \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}. \end{align*}\]
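Since both sides of (1.1) are finite arrays of numbers, the identity can also be checked by brute force over all index values (a Python/numpy sketch):

```python
import numpy as np

# Build epsilon_{ijk} (0-based indices) and the Kronecker delta.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k], eps[i, k, j] = 1.0, -1.0
delta = np.eye(3)

# epsilon_ijk epsilon_klm (summed over k) as a rank-4 array in (i, j, l, m).
lhs = np.einsum('ijk,klm->ijlm', eps, eps)
# delta_il delta_jm - delta_im delta_jl, with the same index ordering.
rhs = np.einsum('il,jm->ijlm', delta, delta) - np.einsum('im,jl->ijlm', delta, delta)
assert np.allclose(lhs, rhs)
```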


For the vector triple product, the \(i\) component of the left-hand side is \[\begin{align*} [\ab\times(\bb\times\cb)]_i &= \epsilon_{ijk}a_j[\bb\times\cb]_k\\ &= \epsilon_{ijk}a_j\epsilon_{klm}b_lc_m\\ &= \epsilon_{ijk}\epsilon_{klm}a_jb_lc_m\\ &= (\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl})a_jb_lc_m \quad \textrm{[using Lemma 1.1]}\\ &= \delta_{il}\delta_{jm}a_jb_lc_m - \delta_{im}\delta_{jl}a_jb_lc_m\\ &= a_j(\delta_{il}b_l)(\delta_{jm}c_m) - a_j(\delta_{jl}b_l)(\delta_{im}c_m)\\ &= a_jb_ic_j - a_jb_jc_i\\ &= (a_jc_j)b_i - (a_jb_j)c_i\\ &= (\ab\cdot\cb)b_i - (\ab\cdot\bb)c_i. \end{align*}\] Since \(i\) is a free index, this establishes the identity \(\ab\times(\bb\times\cb) = (\ab\cdot\cb)\bb - (\ab\cdot\bb)\cb\).
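The identity \(\ab\times(\bb\times\cb) = (\ab\cdot\cb)\bb - (\ab\cdot\bb)\cb\) obtained from this calculation can be spot-checked numerically (a Python/numpy sketch with arbitrary vectors):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([-1.0, 0.0, 2.0])
c = np.array([4.0, 1.0, -2.0])

# a x (b x c) = (a . c) b - (a . b) c
lhs = np.cross(a, np.cross(b, c))
rhs = (a @ c) * b - (a @ b) * c
assert np.allclose(lhs, rhs)
```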