2  Partial derivatives

2.1 Functions of several variables

Our course now builds on the calculus that you have learnt in Single Maths B. There you worked with functions of just one variable; in this part of the course we will extend the idea of differentiation and integration to functions of more than one variable. This is both great fun and of fundamental importance to many applications of mathematics.

2.1.1 Examples of functions of several variables

To orient ourselves, let’s first note that in ordinary life it is quite common to encounter functions of more than one variable.

Examples
  1. Areas and volumes:

    1. The volume \(V\) of a circular cylinder of radius \(r\) (cm) and height \(h\) (cm) is \(V(r,h)=\pi r^{2}h\), which is clearly a function of two variables, \(r\) and \(h\). The cylinder gets larger if we increase \(r\) or \(h\).

    2. A rectangular box with sides \(x,y,z\) has volume \(V(x,y,z) =xyz\), a function of three variables.

  2. Heights above a surface: The surface of the earth is a two-dimensional sphere; we call this \(S^{2}\). The position of a point on the earth can be specified by two coordinates \((\theta,\phi)\), which roughly correspond to latitude and longitude. The height above sea level can be expressed as \(h(\theta,\phi)\).

  3. Atmospheric temperature: This is again a function of \(\theta,\phi\) but also (quite obviously) time dependent \(T(t,\theta,\phi)\). You can clearly have things depending on four variables as well, for example the variation of temperature with height \(r\) would be \(T(t,r,\theta,\phi)\).
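If you like, you can play with such functions numerically. Here is a minimal Python sketch (an illustration, not part of the notes) treating the cylinder volume as an ordinary function of two arguments:

```python
import math

# V(r, h) = pi r^2 h: the cylinder volume as a function of two variables.
def cylinder_volume(r, h):
    return math.pi * r**2 * h

# The cylinder gets larger if we increase either r or h.
v0 = cylinder_volume(2.0, 5.0)
assert cylinder_volume(2.5, 5.0) > v0   # larger radius
assert cylinder_volume(2.0, 5.5) > v0   # larger height
```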

2.1.2 Graphs of functions

We understand that an equation \(y=f(x)\) describes a curve in a plane. Given \(x\) we can compute \(y\) if we know the function \(f\). The equation \(y=f(x)\) describes how \((x,y)\) moves as we vary \(x\). The relation may be expressed implicitly as \(g(x,y)=0\). Let’s discuss a few examples:

Examples
  1. \(x^{2}+y^{2}=1\) describes a circle of unit radius. \[y^{2}=1-x^{2},\] so \(y=\pm\sqrt{1-x^{2}}\) for \(|x|\leq1\). Thus in the language above, \(g(x,y) = x^2 + y^2 - 1\). We can write two equations of the form \(y=f(x)\), one for the lower half of the circle, and one for the upper half.

  2. Now let’s generalize this to higher dimensions. In three dimensional space, the equation \[x^{2}+y^{2}+z^{2}=25\] describes a sphere of radius 5. This can be inverted for \(z\) as a function of \((x,y)\). This gives \[z^{2}=25-x^{2}-y^{2}\]
    so \(z=\pm\sqrt{25-x^{2}-y^{2}}\) for \(x^{2}+y^{2}\leq25\). Note that for fixed \(y\) this describes circles in planes parallel to the \(xz\)-plane. (How big are the circles if \(y=4\), i.e. if we take a slice 4 units from the origin? With \(y=4\), we have \(z=\pm\sqrt{9-x^{2}}\), so we get a circle of radius \(3\).) Now we have two distinct equations of the form \(z=f(x,y)\). Every \((x,y)\) with \(x^{2}+y^{2}<25\) then gives us two points lying on the sphere, one for each equation.

  3. An equation of the form \(z=f(x,y)\) thus describes a surface. We can think of \(z\) as the height of the surface above the \(x,y\) plane (or the depth below if \(z\) is negative). As in the lower-dimensional case, we can often represent this surface implicitly as \(g(x,y,z) = 0\) for some choice of \(g(x,y,z)\): in the case above we have \[g(x,y,z) = x^2 + y^2 + z^2 - 25\] The function \(f\) may be defined for all \(x,y\) or a restricted set as in the sphere. (What does \(x^{2}+y^{2}>25\) correspond to?)
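To make the two branches concrete, here is a short numerical sketch (Python, for illustration only) of the sphere example, including the \(y=4\) slice discussed above:

```python
import math

# The sphere x^2 + y^2 + z^2 = 25 gives two branches z = ±sqrt(25 - x^2 - y^2),
# defined only where x^2 + y^2 <= 25.
def z_upper(x, y):
    return math.sqrt(25.0 - x**2 - y**2)

# The slice y = 4 gives z = ±sqrt(9 - x^2): a circle of radius 3.
assert abs(z_upper(0.0, 4.0) - 3.0) < 1e-12   # top of the slice circle
assert abs(z_upper(3.0, 4.0)) < 1e-6          # edge of the slice circle
```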

Thinking about curves and surfaces in higher dimensions, we have two new helpful ideas:

Definition
A level curve (or contour line) is a curve given by taking a horizontal slice of the surface \(g(x,y,z) = 0\); that is, by setting \(z\) to some fixed value. For example, if we choose \(z=c\) it is the set of all \((x,y)\) such that \(g(x,y,c) = 0\). Level curves can be viewed as curves in the \(xy\)-plane by forgetting about the \(z\)-direction.

A section curve is a curve given by instead taking a vertical slice; that is, by freezing one of the other variables. For example, setting \(x=c\) gives \(g(c,y,z)=0\), a curve in the \(yz\)-plane; freezing \(y\) instead gives a curve in the \(xz\)-plane.

Examples
Contour lines on a map represent height above sea level: lines of constant \(z = h(x,y)\). Isobars on a weather map represent lines of constant atmospheric pressure (at sea level) \(p(x,y)\).

2.1.3 Examples of graphs of functions

Using this Desmos link, explore the properties of the graphs of some of these functions:

  1. \(z=5\) for some range of \(x,y\): a flat roof.

  2. \(z=x^{2}+\sin y\): Defined for all \(x\) and \(y\). For fixed \(y\) we get a parabola, and for fixed \(x\) we have a sine curve whose height is shifted.

  3. \(z=\cos(xy)\): Again the function is defined for all \(x,y\). Here \(z\) is constant wherever \(xy\) is constant, so the contours of constant height are the hyperbolas \(xy=\text{const}\). Keeping e.g. \(y\) constant gives a cosine in \(x\) with a period determined by \(y\).

  4. \(z=\frac{\sin\sqrt{x^{2}+y^{2}}}{\sqrt{x^{2}+y^{2}}}\) the sombrero: Note the rotational symmetry about the \(z\) axis through the origin. \(z\) has the same value for all \(x^{2}+y^{2}=r^{2}\) with \(r\) constant. This is the equation for a circle in the \(x,y\) plane; the rotational symmetry is quite evident in the picture.

  5. \(z=x^{3}-3xy^{2}\) the monkey saddle: Cubic curve for fixed \(y\) and parabolic for fixed \(x\).

  6. \(z=(x^{2}+y^{2})/a^{2}\) satellite dish: This is a circular paraboloid. Rays arriving parallel to the \(z\)-axis are focussed onto the focus of the dish at \((0,0,a^{2}/4)\).

Suggested questions: Q1-2.

2.2 Partial derivatives

We now embark on a program of extending things that we know from single-variable calculus to multiple variables. The first thing we study is the partial derivative.

Definition
Suppose we have a function \(f(x,y)\) of two variables. The partial derivative of \(f\) with respect to \(x\) is the slope when we move in the \(x\) direction but keep \(y\) constant: \[\begin{aligned} \frac{\partial f(x,y)}{\partial x}\,&=\,\lim_{h\rightarrow0}\frac{f(x+h,y)-f(x,y)}{h}, \end{aligned}\] while the partial derivative of \(f\) with respect to \(y\) is the slope of the function when we move in the \(y\) direction but keep \(x\) constant: \[\begin{aligned} \frac{\partial f(x,y)}{\partial y}\,&=\,\lim_{k\rightarrow0}\frac{f(x,y+k)-f(x,y)}{k}. \end{aligned}\]

At a point \((a,b)\), the partial derivative gives the slope of the relevant section curve (i.e. either \(z=f(x,b)\) or \(z=f(a,y)\)). It’s called “partial” because we are only differentiating with respect to one of the variables (not all of them), and it is denoted with \(\partial\), not \(\mathrm{d}\). The result is simply the same as taking the derivative with respect to \(x\) (or \(y\)) and treating \(y\) (or \(x\) respectively) as a constant. For any function \(f(x,y)\) we get \[\begin{aligned} \frac{\partial f}{\partial x} & \text{ by differentiating with respect to $x$ and keeping $y$ constant,}\\ \frac{\partial f}{\partial y} & \text{ by differentiating with respect to $y$ and keeping $x$ constant.} \end{aligned}\]

Similarly for a function \(f(x_{1}, x_2, \dots, x_{n}):\mathbb{R}^{n}\rightarrow\mathbb{R}\), the partial derivative \(\frac{\partial f}{\partial x_{i}}\) is the derivative keeping all variables but \(x_{i}\) constant.

Example
The volume of a can of radius \(r\) and height \(h\) is \(V(r,h)=\pi r^{2}h\).

If we keep the height fixed what is the rate of change of \(V\) relative to the radius \(r\)? We take the partial derivative with respect to \(r\): \[\frac{\partial V}{\partial r}=2\pi r h.\]
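The limit definition can be checked directly on this example: the following Python sketch (an illustration, not from the notes) approximates \(\partial V/\partial r\) with a small finite step and compares it with \(2\pi r h\):

```python
import math

def V(r, h):
    return math.pi * r**2 * h

# Partial derivative with respect to r, keeping h fixed, straight from the
# limit definition with a small but finite step eps.
def dV_dr(r, h, eps=1e-6):
    return (V(r + eps, h) - V(r, h)) / eps

r, h = 3.0, 2.0
assert abs(dV_dr(r, h) - 2 * math.pi * r * h) < 1e-4
```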

Sometimes \(\frac{\partial f}{\partial x}\) is written \(f_{x}\) and \(\frac{\partial f}{\partial y}\) is written \(f_{y}\). Differentiating again we get \[\begin{aligned} \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) &= \frac{\partial^{2}f}{\partial x^{2}} \ \ \text{ (also written as $f_{xx}$)}; & \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) &= \frac{\partial^{2}f}{\partial y\partial x} \ \ \text{ (also written as $ f_{xy} $)} \\ \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) &= \frac{\partial^{2}f}{\partial x\partial y} \ \ \text{ (also written as $ f_{yx}$)}; & \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) &= \frac{\partial^{2}f}{\partial y^{2}} \ \ \text{ (also written as $f_{yy}$)}\,. \end{aligned}\]

This is a crucial (and rather simple) concept for the rest of the course, so before going any further let’s work out another example.

Example
Consider \(f(x,y) = x^2 y + y^{3}\).

We have \[\pdv{f}{x} = 2x y \qquad \pdv{f}{y} = x^2 + 3 y^2\] Let’s keep going and work out the second partial derivatives: \[\pdv{f}{x,y}\equiv f_{yx} = 2 x \qquad \pdv{f}{y,x} = f_{xy} = 2x\] It looks like in this example it did not matter in which order I took the partial derivatives! It turns out this is always true, provided \(f\) is a sufficiently nice function.

Clairaut’s Theorem (or Schwarz’s Theorem):1 Consider a function \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\); that is, a real-valued function \(f(x_{1},\dots, x_{n})\) depending on \(n\) variables. If the second-order partial derivatives exist and are continuous on a small open disc centred at a point \(\boldsymbol{a} = (a_1, a_2, \dots, a_n) \in \mathbb{R}^n\), then \[\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}} \left(\boldsymbol{a} \right) = \frac{\partial^{2}f}{\partial x_{j}\partial x_{i}}\left(\boldsymbol{a} \right)\,,\] for all \(i, j \in \{1, 2, \dots, n\}\).

Here, “small open disc centred at \(\boldsymbol a\)” means the subset \(\{\boldsymbol x \in \mathbb{R}^n \mid \| \boldsymbol x - \boldsymbol a\| < r\} \subseteq \mathbb{R}^n\) of all points in \(\mathbb{R}^n\) of distance (strictly) less than some fixed \(r > 0\) from \(\boldsymbol a\). The radius \(r\) of the disc can be arbitrarily small: the important thing is that some choice of radius works, not precisely which one. Moreover, the continuity condition is automatically satisfied if the second-order partial derivatives are themselves differentiable (i.e. if the third-order partial derivatives exist).

Clairaut’s Theorem holds for all familiar functions: polynomials, trigonometric functions, exponential functions, …. If you’d like to see a relatively simple function for which the second-order partial derivatives do not commute, then check out the function found by Peano here.

We sometimes express Clairaut’s Theorem in words by saying that “partial derivatives commute”: the word “commutative” means that the order in which you do the two operations (i.e. partial differentiation with respect to \(x_i\) and with respect to \(x_j\)) does not matter.
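Clairaut’s Theorem is also easy to probe numerically. The Python sketch below (an illustration, not part of the notes) approximates the mixed second partial of the earlier example \(f(x,y)=x^2y+y^3\) with a central-difference stencil; the stencil treats the \(x\)- and \(y\)-steps symmetrically, so it approximates the mixed partial taken in either order, and the answer matches \(2x\):

```python
# f(x, y) = x^2 y + y^3, the example from the notes; both mixed partials equal 2x.
def f(x, y):
    return x**2 * y + y**3

# Central-difference stencil for the mixed second-order partial derivative.
# It is symmetric in the x- and y-steps, mirroring Clairaut's Theorem.
def mixed_partial(f, x, y, h=1e-4):
    return (f(x + h, y + h) - f(x - h, y + h)
            - f(x + h, y - h) + f(x - h, y - h)) / (4 * h * h)

x0, y0 = 1.5, -0.7
assert abs(mixed_partial(f, x0, y0) - 2 * x0) < 1e-5
```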

More (Somewhat Tedious) Examples

You probably get the idea by now, but here are a few more examples:

💪 Try it out

Calculate all the first and second partial derivatives of:

  1. \(f(x,y)=x^{3}-3xy^{2}\).
  2. \(f(x,y)=\cos y+\sin(xy)\).
  3. \(f(x,y,z)=x^{2}y+yz+z^{2}x\).

Answers:

  1. We have \[\begin{aligned} f_{x} & = 3x^{2}-3y^{2}\\ f_{y} & = -6xy\\ f_{xx} & = 6x\\ f_{yy} & = -6x\\ f_{xy}=f_{yx} & = -6y. \end{aligned}\]

  2. Let \(f(x,y)=\cos y+\sin(xy)\).

Then \[\begin{aligned} f_{x} & = y\cos(xy)\\ f_{y} & = -\sin y+x\cos(xy)\\ f_{xx} & = -y^{2}\sin(xy)\\ f_{yy} & = -\cos y-x^{2}\sin(xy)\\ f_{xy}=f_{yx} & = \cos(xy)-xy\,\sin(xy). \end{aligned}\]

  3. When \(f(x,y,z)=x^{2}y+yz+z^{2}x\), we have \[\begin{aligned} f_{x} & = 2xy+z^{2}\\ f_{y} & = x^{2}+z\\ f_{z} & = y+2xz, \end{aligned}\] and the second partial derivatives are \[\begin{aligned} f_{xx} & = 2y, & f_{yy} & = 0, & f_{zz} & = 2x,\\ f_{xy}=f_{yx} & = 2x, & f_{xz}=f_{zx} & = 2z, & f_{yz}=f_{zy} & = 1. \end{aligned}\]

Suggested questions: Q3-13.

2.3 Differentials and Directional Derivatives

How can we find the rate of change of a function when moving in some arbitrary direction? When we have a single variable, the derivative gives us the infinitesimal change of the function \(\odif{f}\) resulting from an infinitesimal change \(\odif{x}\); that is, we have an equation which looks like this: \[\odif{f}=\odv{f}{x} \odif{x}. \tag{2.1}\] The objects representing small changes, \(\odif{f}\) and \(\odif{x}\), are called differentials.

Let us now generalise this to the multivariable case. When we have more than one variable we can imagine moving in an arbitrary direction. Let’s consider two variables \((x,y)\) for concreteness and imagine that we move along by \(\odif{x}\) in the \(x\) direction, and \(\odif{y}\) in the \(y\) direction: \[(x,y) \mapsto (x+\odif{x},y + \odif{y}).\]

Definition
Now if we have a function \(f(x,y)\), then the change \(\odif{f}\) is given by the sum of the contributions from each of the directions, and \(\odif{f}\) is called the total differential: \[\odif{f}\,=\,\pdv{f}{x} \odif{x}+\pdv{f}{y} \odif{y}. \tag{2.2}\]

Examples

💪 Try it out
  1. Find the total differential of \(f(x,y)=x^{2}y^{3}+\cos xy\).

Answer: We have \[\begin{aligned} f_{x} & = 2xy^{3}-y\sin xy\\ f_{y} & = 3x^{2}y^{2}-x\sin xy \end{aligned}\] so \[\begin{aligned} \odif{f} & = f_{x}\odif{x}+f_{y}\odif{y}\\ & = \left[2xy^{3}-y\sin xy\right]\odif{x}+\left[3x^{2}y^{2}-x\sin xy\right]\odif{y}. \end{aligned}\]
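The result can be double-checked numerically: for small displacements, the total differential should closely approximate the actual change in \(f\). A Python sketch (illustrative, not part of the notes) using the same function:

```python
import math

# f(x, y) = x^2 y^3 + cos(xy), the function from the exercise above.
def f(x, y):
    return x**2 * y**3 + math.cos(x * y)

# Total differential df = f_x dx + f_y dy, using the analytic partial derivatives.
def df(x, y, dx, dy):
    f_x = 2 * x * y**3 - y * math.sin(x * y)
    f_y = 3 * x**2 * y**2 - x * math.sin(x * y)
    return f_x * dx + f_y * dy

x, y, dx, dy = 1.2, 0.8, 1e-4, -2e-4
actual_change = f(x + dx, y + dy) - f(x, y)
assert abs(df(x, y, dx, dy) - actual_change) < 1e-6
```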

Example
Let’s look at an example from physics. Recall the First Law of Thermodynamics \[\odif{U}\,=\,T\odif{S}-P\odif{V}\] where the state variables \(S\) and \(V\) are the entropy and volume of a closed homogeneous system, respectively, and where \(U(S,V)\) is the internal energy, \(T(S,V)\) is the temperature and \(P(S,V)\) is the pressure. Comparing with Equation 2.2, we see that \[T\equiv \frac{\partial U}{\partial S} \qquad \text{ and } \qquad P \equiv - \frac{\partial U}{\partial V} \,.\] (Note: in thermodynamics it is often helpful to use an alternative notation for partial derivatives to make it explicit which variable is being held fixed, but we won’t do that here.)

The formal equivalence of the mixed \(S,V\) derivatives gives an interesting relation: \[\frac{\partial T}{\partial V} = \frac{\partial}{\partial V} \left(\frac{\partial U}{\partial S}\right) = \frac{\partial^{2}U}{\partial S \partial V} = \frac{\partial^{2}U}{\partial V \partial S} = \frac{\partial}{\partial S} \left(\frac{\partial U}{\partial V}\right) = - \frac{\partial P}{\partial S} \,.\] The identity \(\frac{\partial T}{\partial V} = - \frac{\partial P}{\partial S}\) is a property of any physical thermodynamic system. Those of you who are physicists will likely encounter it again: it is called a Maxwell relation.

2.3.1 Exact and Inexact differentials

Let us generalise slightly in the case of two variables: the most general differential can be written as \[\odif{f} = a(x,y) \,\odif{x}+b(x,y)\,\odif{y}. \tag{2.3}\] There is something slightly misleading about the notation \(\odif{f}\): it suggests that all differentials can be written as “\(\mathrm{d}\) of a function \(f\)”. As it turns out, this is not true, and there are actually two different kinds of differentials: exact and inexact.

Definition
A differential \(\odif{f} = a(x,y) \,\odif{x}+b(x,y)\,\odif{y}\) is said to be exact if there exists a function \(f(x,y)\) such that \(\odif f\) is the total differential of \(f\); that is, if there exists a function \(f(x,y)\) such that \[a(x,y) = f_{x} \quad \text{ and } \quad b(x,y) = f_{y} \,.\] (You should imagine that this means that the functions \(a(x,y)\) and \(b(x,y)\) above have been correctly chosen so that we can integrate to obtain \(f(x,y)\).)

If this cannot be done – i.e. if there does not exist any \(f\) that makes this possible – then \(\odif f\) is an inexact differential.

It should seem intuitively reasonable that a random choice of functions \(a(x,y)\) and \(b(x,y)\) will probably result in an inexact differential.

Examples

💪 Try it out
Show that \(\odif{f}= y\odif{x} + x\odif{y}\) is an exact differential.

Answer:

The question is whether \(f_{x}=y\) and \(f_{y}=x\) can be integrated to find \(f\). We have \[\begin{aligned} f_{x}=y & \implies f(x,y) = xy+A(y), \\ f_{y}=x & \implies f(x,y) = xy+B(x). \end{aligned}\]

Note that in the first line \(A(y)\) can be an arbitrary function of \(y\), but not of \(x\). The second line is the opposite, \(B(x)\) can be an arbitrary function of \(x\) but not of \(y\). These two statements together imply that it must just be a constant, i.e. \(A(y)=B(x) = C\). So \[f(x,y)=xy+C.\] As we have succeeded in finding \(f\), the differential \(\odif{f}\) is indeed exact. (Note that \(f\) is not uniquely defined! It will always be ambiguous up to a constant).

An alternative way to find \(f\) is the following: from \(f_x = y\) we integrate as above to find \(f(x,y) = xy + A(y)\). Now, we can use this expression to compute \(f_y\) and the outcome must agree with \(f_y = x\). That is, we have \[\begin{aligned} f_y &= x & &\text{ (from $\odif f$) } \\ \text{and } \quad f_y &= x + A'(y) & &\text{ (from differentiating $f(x,y) = xy + A(y)$) }, \end{aligned}\] so we can conclude that \(A'(y) = 0\). By integration, this means that \(A(y)\) (a function of one variable!) must be constant, so \(f(x,y)\) must be of the form \(f(x,y) = xy + C\), as before.

💪 Try it out
Show that \(\odif{f}= y\odif{x} - x\odif{y}\) is an inexact differential.

Answer:

Note that the differential \(\odif f\) differs from the previous example only in the sign of the \(\odif y\) component, so you might initially think that this differential should also be exact. Again, the question is whether \(f_{x}=y\) and \(f_{y}= - x\) can be integrated to find \(f\). Suppose that we can do this. Then we have that \[\begin{aligned} f_{x}=y & \implies f(x,y) = xy + A(y), \\ f_{y}=-x & \implies f(x,y) = - xy + B(x). \end{aligned}\] As before, note that \(A(y)\) can be an arbitrary function of \(y\), but does not depend on \(x\), while \(B(x)\) can be an arbitrary function of \(x\), but does not depend on \(y\). These two statements together imply that \(B(x) = 2xy + A(y)\), which is nonsense, since we know that the function \(B(x)\) on the left-hand side does not depend on \(y\), while equality with the right-hand side says that it does. Hence, there is no function \(f(x,y)\) whose total differential is \(\odif f\).
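There is also a more hands-on symptom of inexactness (an aside, not from the notes): for an exact differential, the line integral of \(a\,\odif x + b\,\odif y\) between two points is the same along every path (it equals the change in \(f\)), whereas an inexact differential can give path-dependent answers. The Python sketch below integrates \(y\,\odif x - x\,\odif y\) from \((0,0)\) to \((1,1)\) along two different paths and gets two different values:

```python
def line_integral(a, b, path, n=4000):
    """Midpoint-rule approximation of the integral of a dx + b dy along a
    parametrised path t -> (x(t), y(t)) for t in [0, 1]."""
    total = 0.0
    for i in range(n):
        t0, t1 = i / n, (i + 1) / n
        x0, y0 = path(t0)
        x1, y1 = path(t1)
        xm, ym = path((t0 + t1) / 2)
        total += a(xm, ym) * (x1 - x0) + b(xm, ym) * (y1 - y0)
    return total

a = lambda x, y: y     # coefficient of dx
b = lambda x, y: -x    # coefficient of dy: the inexact differential y dx - x dy

straight = lambda t: (t, t)                                 # straight line (0,0) -> (1,1)
bent = lambda t: (min(2 * t, 1.0), max(2 * t - 1.0, 0.0))   # along the x-axis, then up

assert abs(line_integral(a, b, straight) - 0.0) < 1e-3   # straight path gives 0
assert abs(line_integral(a, b, bent) - (-1.0)) < 1e-3    # bent path gives -1
```

For the exact differential \(y\,\odif x + x\,\odif y\) of the first example, both paths give the same answer, namely \(f(1,1)-f(0,0)=1\).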

💪 Try it out
Show that \(\odif{f}= 3y\odif{x} + x\odif{y}\) is an inexact differential.

Answer:

Let us imagine that \(\odif f\) is exact. Then we would have \[\begin{aligned} f_{x}=3y & \implies f=3xy+A(y),\\ f_{y}=x & \implies f=xy+B(x). \end{aligned}\] For similar reasons to the previous example, the two lines are contradictory, so there is no function \(f(x,y)\) whose total differential is \(\odif f\).

2.3.1.1 Testing for exactness

We may use Clairaut’s Theorem (i.e. the commutativity of mixed partial derivatives) to test whether a differential is exact. That is, suppose that we are given a differential \[\odif{f}=a(x,y)\odif{x}+b(x,y)\odif{y}.\] If this is exact, it means that \(a\equiv f_{x}\) and \(b\equiv f_{y}\) for some choice of \(f(x,y)\). Therefore, by Clairaut’s Theorem, \[a_{y}= f_{xy} = f_{yx}= b_{x}. \tag{2.4}\]

Thus we see that if the differential is exact, then this condition on \(a,b\) is satisfied. We will not prove the converse statement here, but it turns out to be true as well: if Equation 2.4 holds, then a function \(f(x,y)\) exists2 such that \(a = f_{x}\) and \(b = f_{y}\). It should make sense to see how this generalises to more dimensions; for example for three dimensions we would write: \[\odif{f}=a(x,y,z)\odif{x}+b(x,y,z)\odif{y}+c(x,y,z)\odif{z}\,\,,\] and then to check for exactness we need to check all pairs, \(a_{y}=b_{x}\), \(a_{z}=c_{x}\), \(b_{z}=c_{y}\). (Check that this is enough if this is not clear).
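The test in Equation 2.4 is also easy to automate: compare \(a_y\) with \(b_x\) at a handful of sample points. Here is a Python sketch (illustrative, using central differences rather than symbolic algebra) applied to the three worked examples:

```python
def is_exact(a, b, points, h=1e-5, tol=1e-6):
    """Check the exactness condition a_y = b_x (Equation 2.4) numerically,
    using central differences at each sample point."""
    for x, y in points:
        a_y = (a(x, y + h) - a(x, y - h)) / (2 * h)
        b_x = (b(x + h, y) - b(x - h, y)) / (2 * h)
        if abs(a_y - b_x) > tol:
            return False
    return True

pts = [(0.3, -1.2), (1.7, 0.4), (-0.8, 2.1)]
assert is_exact(lambda x, y: y, lambda x, y: x, pts)          # y dx + x dy: exact
assert not is_exact(lambda x, y: y, lambda x, y: -x, pts)     # y dx - x dy: inexact
assert not is_exact(lambda x, y: 3 * y, lambda x, y: x, pts)  # 3y dx + x dy: inexact
```

Note that a passing numerical test only checks the condition at the sample points; it suggests, rather than proves, exactness.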

Suggested questions: Q14-19.

2.4 The gradient of a function and a first look at vector calculus

Recall that in vector notation, a point is specified by: \(\boldsymbol{x} = x \boldsymbol{i} + y \boldsymbol{j}\). Now let’s again imagine changing the position as \((x,y)\rightarrow(x+\odif{x},\,y+\odif{y})\). Note that we can write this in a vector form as: \[\boldsymbol{x}\rightarrow\boldsymbol{x}+\mathrm{d}\boldsymbol{x}\] where \(\mathrm{d}\boldsymbol{x}\) is the infinitesimal vector \(\mathrm{d}\boldsymbol{x} = \odif{x}\;\boldsymbol{i} + \odif{y}\;\boldsymbol{j}\). Note carefully what’s happening here – \(\mathrm{d}\boldsymbol{x}\) is a vector, and we move a distance \(\odif{x}\) in the \(\boldsymbol{i}\) direction and \(\odif{y}\) in the \(\boldsymbol{j}\) direction.

Now recall we had the following expression for the change of the function \(f(x,y)\) \[\odif{f} = \pdv{f}{x}\odif{x}+\pdv{f}{y}\odif{y}\,.\] Let’s write the differential in a fancy vector notation: \[\begin{aligned} \odif{f} &= \pdv{f}{x} \odif{x}+\pdv{f}{y}\odif{y}\nonumber \\ &= \mathrm{d}\boldsymbol{x} \cdot \left( \pdv{f}{x} \boldsymbol{i} + \pdv{f}{y} \boldsymbol{j} \right) \,. \end{aligned} \tag{2.5}\] Here the dot is the usual dot product. In practice, when dealing with vectors we typically suppress the \(\boldsymbol{i}\) and \(\boldsymbol{j}\) notation, writing instead \(\boldsymbol x = (x,y)\), \(\mathrm{d}\boldsymbol{x} = (\odif{x}, \odif{y})\), \(\pdv{f}{x} \boldsymbol{i} + \pdv{f}{y} \boldsymbol{j} = \left(\pdv{f}{x},\,\pdv{f}{y}\right)\), etc.

The expression Equation 2.5 now suggests that we define a new vector object \(\boldsymbol{\nabla} f\).

Definition
The “gradient of \(f\)”, (or “grad \(f\)” or “del \(f\)” for short) is given by \[\boldsymbol{\nabla} f=\left(\frac{\partial f}{\partial x},\,\frac{\partial f}{\partial y}\right).\] It is a vector (field) whose components are the partial derivatives of \(f\). Note that the gradient can also be written in terms of basis vectors as \[\boldsymbol{\nabla} f = \frac{\partial f}{\partial x} \boldsymbol{i} + \frac{\partial f}{\partial y} \boldsymbol{j}.\]

It is also convenient to think of \(\boldsymbol{\nabla}\) as an object in its own right – it is a vector differential operator \(\boldsymbol{\nabla}=\left(\pdv{}{x},\,\pdv{}{y}\right)\) and is called del, grad, or nabla.

In terms of the gradient, we can write the expression for the differential of the function \(f\) as: \[\odif{f} = \odif{\boldsymbol{x}} \cdot \boldsymbol{\nabla} f.\]

Let’s introduce one more bit of terminology.

Definition
The directional derivative of \(f\) in the direction of a vector \(\boldsymbol{u}\) is \[\begin{aligned} \nabla_{\boldsymbol{u}} f = \frac{1}{\vert \boldsymbol{u}\vert} \ \boldsymbol{u} \cdot \boldsymbol{\nabla} f. \end{aligned}\] It is a scalar quantity, representing the rate of change of \(f\) in the direction of \(\boldsymbol{u}\).

2.4.1 What do \(\boldsymbol{\nabla} f\) and \(\nabla_{\boldsymbol{u}} f\) mean?

We now have quite a lot of formalism. Let us work out an example. Consider the “bowl” function \[f(x,y) = x^2 + y^2.\] What is \(\boldsymbol{\nabla} f\)? We have \[\boldsymbol{\nabla} f(x,y) = \left( \pdv{f}{x}, \pdv{f}{y} \right) = (2x, 2y) \qquad \text{ or } \qquad \boldsymbol{\nabla} f = 2x\;\boldsymbol{i} + 2y\;\boldsymbol{j}.\] Let us draw a picture; we see that the gradient points outwards, which is also the direction in which the function \(f(x,y)\) gets bigger and bigger. This is always true – the gradient always points in the direction of greatest increase.

Let us now prove this with equations. For any direction \(\boldsymbol{u}\), the directional derivative is \[\begin{aligned} \nabla_{\boldsymbol{u}} f & = \frac{1}{\vert \boldsymbol{u}\vert} \boldsymbol{u} \cdot \boldsymbol{\nabla} f \\ & = 1 \cdot \vert \boldsymbol{\nabla} f \vert \cos \theta, \end{aligned}\] where \(\theta\) is the angle between the two vectors \(\boldsymbol{u}\) and \(\boldsymbol{\nabla} f\). The directional derivative is therefore a maximum when \(\cos \theta = 1\); that is, when \(\theta = 0\), meaning that \(\boldsymbol{u}\) and \(\boldsymbol{\nabla} f\) point in the same direction – in other words, the gradient always points in the direction of greatest increase, as claimed. Notice, also, that the rate of change of \(f\) in the direction of \(\boldsymbol\nabla f\) (i.e. the maximal rate of change) is \[\nabla_{\boldsymbol\nabla f} f = \frac{1}{\vert \boldsymbol{\nabla} f \vert} \boldsymbol{\nabla} f \cdot \boldsymbol{\nabla} f = \frac{\vert \boldsymbol{\nabla} f \vert^2}{\vert \boldsymbol{\nabla} f \vert} = \vert \boldsymbol{\nabla} f \vert.\]

The function \(f(x,y)\) is, by definition, constant along the level curves (contours) of the surface described by \(z = f(x,y)\). This means that in directions tangent to the level curves the directional derivative should be \(0\) (as \(f(x,y)\) is not changing in this direction); that is, we should have \(\boldsymbol u \cdot \boldsymbol\nabla f = 0\) (i.e. \(\cos\theta=0\)) – in other words the level curves are at right angles to \(\boldsymbol{\nabla} f\).

Moreover, if we look at the level curves (contours) of the surface on the \(xy\)-plane, then the fact that \(\boldsymbol{\nabla} f\) is the direction of greatest increase means that it should always point “up the slope”.

Examples
Let \(f(x,y) = x^2 + y^2\) as before and consider the paraboloid or “bowl” described by \(z = x^2 + y^2\). The gradient of \(f\) is \[\boldsymbol{\nabla} f(x,y) = (2x, 2y).\] The level curves to the surface are all of the form \(x^2 + y^2 = c\). That is, if \(c > 0\) then the level curve is a circle, if \(c = 0\) then the level curve consists of a single point (the origin), while if \(c < 0\) then the level curve is empty. Viewed in the \(xy\)-plane, the level curves look like a collection of concentric circles around the origin (with \(c \geq 0\) being the square of the radius). Let’s focus on the interesting case where the level curves are circles, i.e. where \(c > 0\). As we just learned, the gradient \(\boldsymbol{\nabla} f = (2x, 2y)\) must always be perpendicular to the level curves (circles), and this is clear in this case from drawing a simple sketch. Moreover, the gradient points in the direction (outwards, perpendicular to the level curves) of greatest increase of the function \(f\); that is, the level curves are the circles \(f(x,y) = c\) and, hence, \(f\) increasing is the same as the radius of the circles increasing.

Can we write down unit vectors \(\boldsymbol{\hat{t}}\) tangent to the level curves? By inspection (which means guessing from looking at the pictures, but sounds fancier), we can see that suitable unit vectors are given by \[\boldsymbol{\hat{t}} = \frac{1}{\sqrt{x^2 + y^2}} \left(-y, x\right).\]

💪 Try it out
Find the rate of change of \(f(x,y)=y^{4}+x^{2}y^{2}+x\) at \((0,1)\) in the direction of the vector \(\boldsymbol{i}+2\boldsymbol{j}\).

Answer:

First find \(\boldsymbol{\nabla} f\). We have \[\begin{aligned} f_{x} &= 1+2xy^{2}\\ f_{y} &= 4y^{3}+2yx^{2}, \end{aligned}\] so \[\boldsymbol{\nabla} f = (1+2xy^{2})\boldsymbol{i}+(4y^{3}+2yx^{2})\boldsymbol{j}\] and, evaluating at \((0,1)\), \(\boldsymbol{\nabla} f(0,1) = \boldsymbol{i}+4\boldsymbol{j}\). Now we need the unit vector in the direction of \(\boldsymbol{i}+2\boldsymbol{j}\). This is \[\begin{aligned} \boldsymbol{\hat{n}} &= (\boldsymbol{i}+2\boldsymbol{j})/\sqrt{1^{2}+2^{2}}\\ &= \frac{1}{\sqrt{5}}(\boldsymbol{i}+2\boldsymbol{j}). \end{aligned}\] Therefore, the rate of change of \(f\) at \((0,1)\) in the direction \(\boldsymbol{i}+2\boldsymbol{j}\) is \[\boldsymbol{\hat{n}} \cdot \boldsymbol{\nabla} f(0,1) = \frac{1}{\sqrt{5}}(\boldsymbol{i}+2\boldsymbol{j})\cdot(\boldsymbol{i}+4\boldsymbol{j}) = \frac{9}{\sqrt{5}}.\]
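As a sanity check (an aside, not part of the notes), the same answer comes out of a central-difference approximation of the directional derivative:

```python
import math

# f(x, y) = y^4 + x^2 y^2 + x, the function from the exercise above.
def f(x, y):
    return y**4 + x**2 * y**2 + x

# Rate of change of f at (x, y) along the unit vector in the direction (ux, uy),
# approximated by a central difference along that direction.
def directional_derivative(f, x, y, ux, uy, h=1e-6):
    norm = math.hypot(ux, uy)
    ux, uy = ux / norm, uy / norm
    return (f(x + h * ux, y + h * uy) - f(x - h * ux, y - h * uy)) / (2 * h)

# Direction i + 2j at the point (0, 1): should give 9/sqrt(5).
val = directional_derivative(f, 0.0, 1.0, 1.0, 2.0)
assert abs(val - 9 / math.sqrt(5)) < 1e-6
```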

💪 Try it out
The temperature on a metal plate is \(T(x,y)=x^{2}e^{-y}\). At the point \((2,1)\), in what direction does the temperature increase most rapidly?

Answer:

First find \(\boldsymbol{\nabla} T\). We have \[\begin{aligned} T_{x} &= 2xe^{-y}=4/e \,,\\ T_{y} &= -x^{2}e^{-y}=-4/e \,,\\ \boldsymbol{\nabla} T = 2xe^{-y}\boldsymbol{i}-x^{2}e^{-y}\boldsymbol{j}, \end{aligned}\] so \(\boldsymbol{\nabla} T (2,1) = \frac{4}{e}(\boldsymbol{i}-\boldsymbol{j})\). Therefore, (a unit vector in) the direction of greatest increase at \((2,1)\) is \[\frac{1}{\sqrt{2}}(\boldsymbol{i}-\boldsymbol{j}),\] and the rate of increase in this direction at \((2,1)\) is \[|\boldsymbol{\nabla} T (2,1)|=\frac{4\sqrt{2}}{e} \,.\]

💪 Try it out
Find the level curve of \(f(x,y)=y^{4}+x^{2}y^{2}+x\) through the point \((0,1)\) and verify that its tangent at this point is orthogonal to \(\boldsymbol{\nabla} f\).

Answer:

The level curves are defined by \(c = f(x,y) = y^{4}+x^{2}y^{2}+x\). At \((0,1)\) you can easily verify that \(f(0,1)=1\), so we must have \(c=1\). Thus, the equation of the level curve is \(y^{4}+x^{2}y^{2}+x=1\). A tangent vector to this level curve at the point \((0,1)\) has “slope” \(\odv{y}{x}(0,1)\) in the \(xy\)-plane. Hence, differentiating we find \[\begin{aligned} \odv{y}{x}(4y^{3}+2yx^{2})+2xy^{2}+1 & = 0,\\ \text{and, thus, }\ \ \odv{y}{x} (0,1) &= -\frac{1}{4}. \end{aligned}\] So the corresponding unit vector in the \(xy\)-plane (i.e. having this slope) is \[\begin{aligned} \boldsymbol{\hat{p}}=\frac{1}{\sqrt{17}}(4,-1). \end{aligned}\] Next \(\boldsymbol{\nabla} f\) is given by \[\boldsymbol{\nabla} f=(2xy^{2}+1,\,4y^{3}+2yx^{2}),\] so \(\boldsymbol{\nabla} f(0,1) = (1,4)\) and, therefore, \(\boldsymbol{\nabla} f(0,1) \cdot \boldsymbol{\hat{p}}=0\), as desired.

2.4.2 \(\boldsymbol{\nabla} f\) in arbitrary dimensions

We’ve been working in two dimensions, but of course all of the concepts generalise to any number of dimensions. Let \(f(x_1, x_2, \dots, x_n)\) be a function depending on \(n\) variables and let \(\boldsymbol{e}_{1}, \boldsymbol{e}_{2}, \dots, \boldsymbol{e}_{n}\) be the standard basis of \(\mathbb{R}^n\) (i.e. one unit vector along each coordinate axis). Then the gradient of \(f\) is a function \(\boldsymbol{\nabla} f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}\) such that \[\boldsymbol{\nabla} f = \sum_{i=1}^n \pdv{f}{x_i} \, \boldsymbol{e}_i = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}},\dots ,\,\frac{\partial f}{\partial x_{n}}\right), \tag{2.6}\] and the directional derivative (i.e. rate of change) of \(f\) in the direction of a unit vector \(\boldsymbol{\hat{u}} = \sum_{i=1}^n u_i \, \boldsymbol{e}_i = (u_1, u_2, \dots, u_n)\) is given by \[\nabla_{\boldsymbol{\hat{u}}} f = \boldsymbol{\hat{u}} \cdot \boldsymbol{\nabla} f\,\,.\] For example, the temperature in this room can be written \(T(x,y,z)\) and we can define the rate of change of \(T\) when moving in some arbitrary direction \(\boldsymbol{\hat{u}}\) in the same fashion.

Finally, note that the gradient operator satisfies the following two properties when acting on scalar functions:

  1. Distributivity: \(\boldsymbol{\nabla}(f+g) = \boldsymbol{\nabla} f + \boldsymbol{\nabla} g\).

  2. Product rule: \(\boldsymbol{\nabla} (fg) = (\boldsymbol{\nabla} f) g + f \boldsymbol{\nabla} g\).

Both of these properties should be familiar from ordinary derivatives: they follow from the definition of the gradient in a particular basis, i.e. Equation 2.6.
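Both properties can be verified numerically too. The sketch below (illustrative Python; the helper `grad` is ours, not a standard library function) approximates gradients in three variables with central differences and checks the product rule at a sample point:

```python
def grad(F, p, h=1e-6):
    """Central-difference approximation of the gradient of F at the point p."""
    out = []
    for i in range(len(p)):
        p_plus = list(p); p_plus[i] += h
        p_minus = list(p); p_minus[i] -= h
        out.append((F(p_plus) - F(p_minus)) / (2 * h))
    return out

f = lambda p: p[0]**2 * p[1] + p[2]   # sample scalar functions of (x, y, z)
g = lambda p: p[0] + p[1] * p[2]

p = [0.5, -1.0, 2.0]
lhs = grad(lambda q: f(q) * g(q), p)                      # grad(fg)
rhs = [df_i * g(p) + f(p) * dg_i                          # (grad f) g + f (grad g)
       for df_i, dg_i in zip(grad(f, p), grad(g, p))]
assert all(abs(u - v) < 1e-5 for u, v in zip(lhs, rhs))
```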

Suggested questions: Q20-23.


  1. I use this result almost every day of my working life, and did not know until giving these lectures that it had a name.↩︎

  2. Actually this is generally only true locally, i.e. the function \(f(x,y)\) that we build might have some problems if we try to define it in all space. In fact this works only when the spaces we are considering are “simple”; if they have holes in them etc. then we cannot globally define \(f\). This is thus a connection between “topology” (i.e. global properties of spaces) and calculus. It is also a surprisingly important subject to physicists: google “de Rham cohomology” to find out more.↩︎