$$ \newcommand{\odif}[1]{\mathrm{d}#1} \newcommand{\odv}[2]{\frac{\mathrm{d}#1}{\mathrm{d}#2}} \newcommand{\pdv}[2]{\frac{\partial#1}{\partial#2}} $$
2 Partial derivatives
2.1 Functions of several variables
Our course now builds on the calculus that you have learnt in Single Maths B. There you worked with functions of just one variable; in this part of the course we will extend the ideas of differentiation and integration to functions of more than one variable. This is both great fun and of fundamental importance to many applications of mathematics.
2.1.1 Examples of functions of several variables
To orient ourselves, let’s recall that in ordinary life it is quite common to meet functions of more than one variable.
2.1.2 Graphs of functions
We understand that an equation \(y=f(x)\) describes a curve in a plane. Given \(x\) we can compute \(y\) if we know the function \(f\). The equation \(y=f(x)\) describes how \((x,y)\) moves as we vary \(x\). The relation may be expressed implicitly as \(g(x,y)=0\). Let’s discuss a few examples:
Thinking about curves and surfaces in higher dimensions, we have two new helpful ideas:
2.1.3 Examples of graphs of functions
Using this Desmos link, explore the properties of the graphs of some of these functions:
\(z=5\), for some range of \(x,y\): a flat roof (a horizontal plane at height \(5\)).
\(z=x^{2}+\sin y\): Defined for all (finite) \(x\) and \(y\). For fixed \(y\) we get a parabola in \(x\), and for fixed \(x\) we get a \(\sin\) curve in \(y\) whose height is shifted.
\(z=\cos(xy)\): Again the function is defined for all \(x,y\). We have \(z=\text{const}\) wherever \(xy=\text{const}\), so the contours of constant height are hyperbolas. Keeping e.g. \(y\) constant gives a cosine in \(x\) with a period determined by \(y\).
\(z=\frac{\sin\sqrt{x^{2}+y^{2}}}{\sqrt{x^{2}+y^{2}}}\) the sombrero: Note the rotational symmetry about the \(z\) axis through the origin. \(z\) has the same value for all \(x^{2}+y^{2}=r^{2}\) with \(r\) constant. This is the equation for a circle in the \(x,y\) plane; the rotational symmetry is quite evident in the picture.
\(z=x^{3}-3xy^{2}\) the monkey saddle: A cubic curve in \(x\) for fixed \(y\) and a parabola in \(y\) for fixed \(x\).
\(z=(x^{2}+y^{2})/a^{2}\) satellite dish: This is a circular paraboloid. Comparing with the standard form \(z=r^{2}/(4F)\), rays arriving parallel to the axis are focussed onto the focal point of the dish at \((0,0,a^{2}/4)\).
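If you’d like to reproduce any of these surfaces outside Desmos, a few lines of Python will do it; here is a minimal matplotlib sketch of the sombrero (the grid range and resolution are arbitrary choices of mine):

```python
# Plot the sombrero z = sin(sqrt(x^2 + y^2)) / sqrt(x^2 + y^2).
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 200)
y = np.linspace(-10, 10, 200)
X, Y = np.meshgrid(x, y)
R = np.sqrt(X**2 + Y**2)
Z = np.sinc(R / np.pi)   # np.sinc(t) = sin(pi t)/(pi t), i.e. sin(R)/R here

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set(xlabel="x", ylabel="y", zlabel="z")
plt.show()
```

Using `np.sinc` sidesteps the \(0/0\) at the origin, where the function’s limiting value is \(1\).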
Suggested questions: Q1-2.
2.2 Partial derivatives
We now embark on a program of extending things that we know from single-variable calculus to multiple variables. The first thing we study is the partial derivative.
At a point \((a,b)\), the partial derivative gives the slope of the relevant section curve (i.e. either \(z=f(x,b)\) or \(z=f(a,y)\)). It’s called “partial” because we are only differentiating with respect to one of the variables (not all of them), and it is denoted with \(\partial\), not \(\mathrm{d}\). The result is simply the same as taking the derivative with respect to \(x\) (or \(y\)) and treating \(y\) (or \(x\) respectively) as a constant. For any function \(f(x,y)\) we get \[\begin{aligned} \frac{\partial f}{\partial x} & \text{ by differentiating with respect to } x \text{ and keeping } y \text{ constant,}\\ \frac{\partial f}{\partial y} & \text{ by differentiating with respect to } y \text{ and keeping } x \text{ constant.} \end{aligned}\]
Similarly, for a function \(f(x_{1}, x_2, \dots, x_{n}):\mathbb{R}^{n}\rightarrow\mathbb{R}\), the partial derivative \(\frac{\partial f}{\partial x_{i}}\) is the derivative keeping all variables but \(x_{i}\) constant.
Sometimes \(\frac{\partial f}{\partial x}\) is written \(f_{x}\) and \(\frac{\partial f}{\partial y}\) is written \(f_{y}\). Differentiating again we get \[\begin{aligned} \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) &= \frac{\partial^{2}f}{\partial x^{2}} \ \ \text{ (also written as $f_{xx}$)}; & \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) &= \frac{\partial^{2}f}{\partial y\partial x} \ \ \text{ (also written as $ f_{xy} $)} \\ \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) &= \frac{\partial^{2}f}{\partial x\partial y} \ \ \text{ (also written as $ f_{yx}$)}; & \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) &= \frac{\partial^{2}f}{\partial y^{2}} \ \ \text{ (also written as $f_{yy}$)}\,. \end{aligned}\]
This is a crucial (and rather simple) concept for the rest of the course, so before going any further let’s work out another example.
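In lieu of pen and paper, here is a quick symbolic check with Python’s sympy library (the example function \(f = x^{2}\sin y\) is my own choice, not one from the course):

```python
# Partial derivatives of f = x**2 * sin(y): differentiate with respect
# to one variable while treating the other as a constant.
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 * sp.sin(y)

print(sp.diff(f, x))     # f_x  = 2*x*sin(y)   (y held constant)
print(sp.diff(f, y))     # f_y  = x**2*cos(y)  (x held constant)
print(sp.diff(f, x, x))  # f_xx = 2*sin(y)
print(sp.diff(f, x, y))  # f_xy = 2*x*cos(y)
print(sp.diff(f, y, x))  # f_yx = 2*x*cos(y), the same
```

Notice that the two mixed derivatives agree; that observation is the subject of the theorem below.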
Clairaut’s Theorem (or Schwarz’s Theorem):1 Consider a function \(f:\mathbb{R}^{n}\rightarrow\mathbb{R}\); that is, a real-valued function \(f(x_{1},\dots, x_{n})\) depending on \(n\) variables. If the second-order partial derivatives exist and are continuous on a small open disc centred at a point \(\boldsymbol{a} = (a_1, a_2, \dots, a_n) \in \mathbb{R}^n\), then \[\frac{\partial^{2}f}{\partial x_{i}\partial x_{j}} \left(\boldsymbol{a} \right) = \frac{\partial^{2}f}{\partial x_{j}\partial x_{i}}\left(\boldsymbol{a} \right)\,,\] for all \(i, j \in \{1, 2, \dots, n\}\).
Here, “small open disc centred at \(\boldsymbol a\)” means the subset \(\{\boldsymbol x \in \mathbb{R}^n \mid \| \boldsymbol x - \boldsymbol a\| < r\} \subseteq \mathbb{R}^n\) of all points in \(\mathbb{R}^n\) of distance (strictly) less than some fixed \(r > 0\) from \(\boldsymbol a\). The radius \(r\) of the disc can be arbitrarily small: the important thing is that some choice of radius works, not precisely which one. Moreover, the continuity condition is automatically satisfied if the second-order partial derivatives are themselves differentiable (i.e. if the third-order partial derivatives exist).
Clairaut’s Theorem holds for all familiar functions: polynomials, trigonometric functions, exponential functions, and so on. If you’d like to see a relatively simple function for which the second-order partial derivatives do not commute, then check out the function found by Peano here.
We sometimes express Clairaut’s Theorem in words by saying that “partial derivatives commute”: the word “commutative” means that the order in which you do the two operations (i.e. partial differentiation with respect to \(x_i\) and with respect to \(x_j\)) does not matter.
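To see this commutativity in action for a function of three variables, here is a minimal sympy sketch (the function \(f\) is an arbitrary smooth choice of mine):

```python
# Check that the mixed second-order partials of a smooth function
# agree for every pair of variables, as Clairaut's Theorem promises.
import sympy as sp

x1, x2, x3 = sp.symbols("x1 x2 x3")
f = sp.exp(x1 * x2) + x3 * sp.cos(x1)   # arbitrary smooth function

for u, v in [(x1, x2), (x1, x3), (x2, x3)]:
    assert sp.simplify(sp.diff(f, u, v) - sp.diff(f, v, u)) == 0
print("all mixed partial derivatives commute")
```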
More (Somewhat Tedious) Examples
You probably get the idea by now, but here are a few more examples:
Suggested questions: Q3-13.
2.3 Differentials and directional derivatives
How can we find the rate of change of a function when moving in some arbitrary direction? When we have a single variable, the derivative gives us the infinitesimal change \(\odif{f}\) of the function resulting from an infinitesimal change \(\odif{x}\) in \(x\); that is, we have an equation which looks like this: \[\odif{f}=\odv{f}{x} \odif{x}. \tag{2.1}\] The objects representing small changes, \(\odif{f}\) and \(\odif{x}\), are called differentials.
Let us now generalise this to the multivariable case. When we have more than one variable we can imagine moving in an arbitrary direction. Let’s consider two variables \((x,y)\) for concreteness and imagine that we move along by \(\odif{x}\) in the \(x\) direction, and \(\odif{y}\) in the \(y\) direction: \[(x,y) \mapsto (x+\odif{x},y + \odif{y}).\] To first order, \(f\) changes by \(\pdv{f}{x}\odif{x}\) due to the change in \(x\) and by \(\pdv{f}{y}\odif{y}\) due to the change in \(y\), so the total change is \[\odif{f} = \pdv{f}{x}\odif{x}+\pdv{f}{y}\odif{y}. \tag{2.2}\]
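As a numerical sanity check of Equation 2.2 (my own illustration, reusing \(f = x^{2}+\sin y\) from Section 2.1.3; the point and step sizes are arbitrary), the differential should approximate the true change in \(f\) for small displacements:

```python
# Compare the true change f(x+dx, y+dy) - f(x, y) with the
# differential df = f_x dx + f_y dy for small dx, dy.
import numpy as np

def f(x, y):
    return x**2 + np.sin(y)

x0, y0 = 1.0, 0.5
dx, dy = 1e-4, -2e-4

true_change = f(x0 + dx, y0 + dy) - f(x0, y0)
df = 2 * x0 * dx + np.cos(y0) * dy   # f_x dx + f_y dy at (x0, y0)

print(true_change, df)   # the two agree to leading order in dx, dy
```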
Examples
2.3.1 Exact and inexact differentials
Let us generalise slightly in the case of two variables: the most general differential can be written as \[\odif{f} = a(x,y) \,\odif{x}+b(x,y)\,\odif{y}. \tag{2.3}\] There is something slightly misleading about the notation \(\odif{f}\): it suggests that all differentials can be written as “\(\mathrm{d}\) of a function \(f\)”. As it turns out, this is not true, and there are actually two different kinds of differentials: exact ones, which can be written as \(\odif{f}\) for some function \(f(x,y)\), and inexact ones, which cannot.
It should seem intuitively reasonable that a random choice of functions \(a(x,y)\) and \(b(x,y)\) will probably result in an inexact differential.
Examples
2.3.1.1 Testing for exactness
We may use Clairaut’s Theorem (i.e. the commutativity of mixed partials) to test whether a differential is exact. Suppose that we are given a differential \[\odif{f}=a(x,y)\odif{x}+b(x,y)\odif{y}.\] If this is exact, it means that \(a\equiv f_{x}\) and \(b\equiv f_{y}\) for some choice of \(f(x,y)\). Therefore, by Clairaut’s Theorem, \[a_{y}= f_{xy} = f_{yx}= b_{x}. \tag{2.4}\]
Thus we see that if the differential is exact, then this condition on \(a,b\) is satisfied. We will not prove the converse statement here, but it turns out that the converse is true too: if Equation 2.4 holds, then a function \(f(x,y)\) exists2 so that \(a = f_{x}, b = f_{y}\). It is easy to see how this generalises to more dimensions; for example, in three dimensions we would write \[\odif{f}=a(x,y,z)\odif{x}+b(x,y,z)\odif{y}+c(x,y,z)\odif{z}\,\,,\] and then to check for exactness we need to check all pairs: \(a_{y}=b_{x}\), \(a_{z}=c_{x}\), \(b_{z}=c_{y}\). (Check for yourself that these three conditions suffice.)
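Equation 2.4 is easy to automate. Here is a small exactness test in sympy (the helper name `is_exact` and the two trial differentials are my own choices):

```python
# Test whether a(x,y) dx + b(x,y) dy is exact by checking a_y = b_x.
import sympy as sp

x, y = sp.symbols("x y")

def is_exact(a, b):
    return sp.simplify(sp.diff(a, y) - sp.diff(b, x)) == 0

print(is_exact(2 * x * y, x**2))   # True: this is d(x**2 * y)
print(is_exact(y, -x))             # False: y dx - x dy is inexact
```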
Suggested questions: Q14-19.
2.4 The gradient of a function and a first look at vector calculus
Recall that in vector notation, a point is specified by \(\boldsymbol{x} = x \boldsymbol{i} + y \boldsymbol{j}\). Now let’s again imagine changing the position as \((x,y)\rightarrow(x+\odif{x},\,y+\odif{y})\). Note that we can write this in vector form as \[\boldsymbol{x}\rightarrow\boldsymbol{x}+\mathrm{d}\boldsymbol{x},\] where \(\mathrm{d}\boldsymbol{x}\) is the infinitesimal vector \(\mathrm{d}\boldsymbol{x} = \odif{x}\;\boldsymbol{i} + \odif{y}\;\boldsymbol{j}\). Note carefully what’s happening here – \(\mathrm{d}\boldsymbol{x}\) is a vector, and we move a distance \(\odif{x}\) in the \(\boldsymbol{i}\) direction and \(\odif{y}\) in the \(\boldsymbol{j}\) direction.
Now recall we had the following expression for the change of the function \(f(x,y)\): \[\odif{f} = \pdv{f}{x}\odif{x}+\pdv{f}{y}\odif{y}\,.\] Let’s write the differential in a fancy vector notation: \[\begin{aligned} \odif{f} &= \pdv{f}{x} \odif{x}+\pdv{f}{y}\odif{y}\nonumber \\ &= \mathrm{d}\boldsymbol{x} \cdot \left( \pdv{f}{x} \boldsymbol{i} + \pdv{f}{y} \boldsymbol{j} \right) \,. \end{aligned} \tag{2.5}\] Here the dot is the usual dot product. In practice, when dealing with vectors we typically suppress the \(\boldsymbol{i}\) and \(\boldsymbol{j}\) notation, writing instead \(\boldsymbol x = (x,y)\), \(\mathrm{d}\boldsymbol{x} = (\odif{x}, \odif{y})\), \(\pdv{f}{x} \boldsymbol{i} + \pdv{f}{y} \boldsymbol{j} = \left(\pdv{f}{x},\,\pdv{f}{y}\right)\), etc.
The expression in Equation 2.5 now suggests that we define a new vector object, \(\boldsymbol{\nabla} f\).
It is also convenient to think of \(\boldsymbol{\nabla}\) as an object in its own right – it is a vector differential operator \(\boldsymbol{\nabla}=\left(\pdv{}{x},\,\pdv{}{y}\right)\) and is called del, grad, or nabla.
In terms of the gradient, we can write the expression for the differential of the function \(f\) as: \[\odif{f} = \odif{\boldsymbol{x}} \cdot \boldsymbol{\nabla} f.\]
Let’s introduce one more bit of terminology.
2.4.1 What do \(\boldsymbol{\nabla} f\) and \(\nabla_{\boldsymbol{u}} f\) mean?
We now have quite a lot of formalism. Let us work out an example. Consider the “bowl” function \[f(x,y) = x^2 + y^2.\] What is \(\boldsymbol{\nabla} f\)? We have \[\boldsymbol{\nabla} f(x,y) = \left( \pdv{f}{x}, \pdv{f}{y} \right) = (2x, 2y) \qquad \text{ or } \qquad \boldsymbol{\nabla} f = 2x\;\boldsymbol{i} + 2y\;\boldsymbol{j}.\] Let us draw a picture; we see that the gradient points outwards, which is also the direction in which the function \(f(x,y)\) gets bigger and bigger. This is always true – the gradient always points in the direction of greatest increase.
Let us now prove this with equations. For any direction \(\boldsymbol{u}\), the directional derivative is \[\begin{aligned} \nabla_{\boldsymbol{u}} f & = \frac{1}{\vert \boldsymbol{u}\vert} \boldsymbol{u} \cdot \boldsymbol{\nabla} f \\ & = \frac{1}{\vert \boldsymbol{u}\vert} \, \vert \boldsymbol{u}\vert \, \vert \boldsymbol{\nabla} f \vert \cos \theta = \vert \boldsymbol{\nabla} f \vert \cos \theta, \end{aligned}\] where \(\theta\) is the angle between the two vectors \(\boldsymbol{u}\) and \(\boldsymbol{\nabla} f\). The directional derivative is therefore a maximum when \(\cos \theta = 1\); that is, when \(\theta = 0\), meaning that \(\boldsymbol{u}\) and \(\boldsymbol{\nabla} f\) point in the same direction – in other words, the gradient always points in the direction of greatest increase, as claimed. Notice, also, that the rate of change of \(f\) in the direction of \(\boldsymbol\nabla f\) (i.e. the maximal rate of change) is \[\nabla_{\boldsymbol\nabla f} f = \frac{1}{\vert \boldsymbol{\nabla} f \vert} \boldsymbol{\nabla} f \cdot \boldsymbol{\nabla} f = \frac{\vert \boldsymbol{\nabla} f \vert^2}{\vert \boldsymbol{\nabla} f \vert} = \vert \boldsymbol{\nabla} f \vert.\]
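We can confirm these claims numerically for the bowl function (a sketch of my own; the evaluation point and the test direction are arbitrary choices):

```python
# For f = x^2 + y^2 we have grad f = (2x, 2y). The rate of change
# along a unit direction u is u . grad f, and it is maximised
# (equal to |grad f|) when u points along grad f.
import numpy as np

def grad_f(x, y):
    return np.array([2.0 * x, 2.0 * y])

g = grad_f(1.0, 2.0)                  # gradient at (1, 2): [2, 4]

u = np.array([1.0, 0.0])              # unit vector along the x-axis
print(u @ g)                          # rate of change along u: 2.0
print((g / np.linalg.norm(g)) @ g)    # rate along grad f: about 4.47
print(np.linalg.norm(g))              # |grad f|, the same number
```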
The function \(f(x,y)\) is, by definition, constant along the level curves (contours) of the surface described by \(z = f(x,y)\). This means that in directions tangent to the level curves the directional derivative should be \(0\) (as \(f(x,y)\) is not changing in this direction); that is, we should have \(\boldsymbol u \cdot \boldsymbol\nabla f = 0\) (i.e. \(\cos\theta=0\)) – in other words the level curves are at right angles to \(\boldsymbol{\nabla} f\).
Moreover, if we look at the level curves (contours) of the surface on the \(xy\)-plane, then the fact that \(\boldsymbol{\nabla} f\) is the direction of greatest increase means that it should always point “up the slope”.
2.4.2 \(\boldsymbol{\nabla} f\) in arbitrary dimensions
We’ve been working in two dimensions, but of course all of the concepts generalise to any number of dimensions. Let \(f(x_1, x_2, \dots, x_n)\) be a function depending on \(n\) variables and let \(\boldsymbol{e}_{1}, \boldsymbol{e}_{2}, \dots, \boldsymbol{e}_{n}\) be the standard basis of \(\mathbb{R}^n\) (i.e. one unit vector along each coordinate axis). Then the gradient of \(f\) is a function \(\boldsymbol{\nabla} f:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}\) such that \[\boldsymbol{\nabla} f = \sum_{i=1}^n \pdv{f}{x_i} \, \boldsymbol{e}_i = \left(\frac{\partial f}{\partial x_{1}}, \frac{\partial f}{\partial x_{2}},\dots ,\,\frac{\partial f}{\partial x_{n}}\right), \tag{2.6}\] and the directional derivative (i.e. rate of change) of \(f\) in the direction of a unit vector \(\boldsymbol{\hat{u}} = \sum_{i=1}^n u_i \, \boldsymbol{e}_i = (u_1, u_2, \dots, u_n)\) is given by \[\nabla_{\boldsymbol{\hat{u}}} f = \boldsymbol{\hat{u}} \cdot \boldsymbol{\nabla} f\,\,.\] For example, the temperature in this room can be written \(T(x,y,z)\) and we can define the rate of change of \(T\) when moving in some arbitrary direction \(\boldsymbol{\hat{u}}\) in the same fashion.
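For instance, here is the three-variable version in sympy, with a made-up temperature field \(T\) (the field and the direction are my own choices, purely for illustration):

```python
# Directional derivative of a temperature field T(x, y, z) along a
# unit vector u_hat, computed as u_hat . grad T.
import sympy as sp

x, y, z = sp.symbols("x y z")
T = 20 + x * y - z**2                 # hypothetical temperature field

grad_T = sp.Matrix([sp.diff(T, v) for v in (x, y, z)])   # (y, x, -2z)
u_hat = sp.Matrix([1, 2, 2]) / 3      # already a unit vector: |(1,2,2)| = 3

print(sp.simplify(u_hat.dot(grad_T)))   # 2*x/3 + y/3 - 4*z/3
```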
Finally, note that the gradient operator satisfies the following two properties when acting on scalar functions:
Distributivity: \(\boldsymbol{\nabla}(f+g) = \boldsymbol{\nabla} f + \boldsymbol{\nabla} g\).
Product rule: \(\boldsymbol{\nabla} (fg) = (\boldsymbol{\nabla} f) g + f \boldsymbol{\nabla} g\).
Both of these properties should be familiar from ordinary derivatives: they follow component by component from the definition of the gradient in a particular basis, i.e. Equation 2.6.
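Both identities can be checked symbolically; here is a short sympy verification for an arbitrarily chosen pair of functions \(f\) and \(g\) (my own example):

```python
# Verify distributivity and the product rule for the gradient.
import sympy as sp

x, y = sp.symbols("x y")
f = x * sp.sin(y)
g = x + y**2

def grad(h):
    return sp.Matrix([sp.diff(h, x), sp.diff(h, y)])

sum_rule = grad(f + g) - (grad(f) + grad(g))
product_rule = grad(f * g) - (grad(f) * g + f * grad(g))

assert all(sp.expand(c) == 0 for c in sum_rule)
assert all(sp.expand(c) == 0 for c in product_rule)
print("both identities hold")
```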
Suggested questions: Q20-23.
I use this result almost every day of my working life, and did not know until giving these lectures that it had a name.
Actually this is generally only true locally, i.e. the function \(f(x,y)\) that we build might have some problems if we try to define it on all of space. In fact this works globally only when the spaces we are considering are “simple”; if they have holes in them etc. then we cannot globally define \(f\). This is thus a connection between “topology” (i.e. global properties of spaces) and calculus. It is also a surprisingly important subject for physicists: google “de Rham cohomology” to find out more.