5  Coordinates and differentiability

5.1 Orthogonal curvilinear coordinates

We have seen that volume integrals can be evaluated by changing to a new coordinate system \(\xb(u,v,w)\). More broadly, it can be useful to define a basis for the \((u,v,w)\) coordinate system and express vector fields in this basis.

Recall that the tangent vectors in the new coordinates are given by \(\displaystyle\ddy{\xb}{u}, \ddy{\xb}{v}, \ddy{\xb}{w}\).

Each tangent vector is tangent to a curve on which the other two coordinates are constant.

Tangent vectors and coordinate surfaces.

The coordinates \((u,v,w)\) are called orthogonal curvilinear if the three tangent vectors are mutually orthogonal at every point. We can then define an orthonormal basis \(\{\eb_u, \eb_v, \eb_w\}\) by normalising: \[ \eb_u = \frac{1}{h_u}\ddy{\xb}{u}, \quad \eb_v = \frac{1}{h_v}\ddy{\xb}{v}, \quad \eb_w = \frac{1}{h_w}\ddy{\xb}{w}, \] where \[ h_u = \left|\ddy{\xb}{u}\right|, \quad h_v = \left|\ddy{\xb}{v}\right|, \quad h_w = \left|\ddy{\xb}{w}\right| \] are called the scale factors.

The scale factors tell you how the \((u,v,w)\) coordinate system is stretched in space compared to Cartesian: an infinitesimal cuboid with coordinate increments \(du\), \(dv\), \(dw\) has actual edge lengths \(h_u\,du\), \(h_v\,dv\) and \(h_w\,dw\):

Infinitesimal cube.

For Cartesian coordinates \((x,y,z)\), we have \(\xb(x,y,z) = x\eb_1 + y\eb_2 + z\eb_3\), so \[ \ddy{\xb}{x} = \eb_1, \quad \ddy{\xb}{y}=\eb_2, \quad \ddy{\xb}{z}=\eb_3. \] Clearly any pair of these tangent vectors are orthogonal. The scale factors are \[ h_x = \left|\ddy{\xb}{x}\right| = 1, \quad h_y=\left|\ddy{\xb}{y}\right| = 1, \quad h_z=\left|\ddy{\xb}{z}\right| = 1. \] (So the basis \(\{\eb_1, \eb_2, \eb_3\}\) is already orthonormal.)

For cylindrical coordinates \((r,\theta,z)\), we have \(\xb(r,\theta,z) = r\cos\theta\eb_1 + r\sin\theta\eb_2 + z\eb_3\), for \(r\geq 0\) and \(\theta\in[0,2\pi]\), so \[\begin{align*} &\ddy{\xb}{r} = \cos\theta\eb_1 + \sin\theta\eb_2,\\ &\ddy{\xb}{\theta} = -r\sin\theta\eb_1 + r\cos\theta\eb_2,\\ &\ddy{\xb}{z} = \eb_3. \end{align*}\] Check that these are orthogonal: \[\begin{align*} &\ddy{\xb}{r}\cdot\ddy{\xb}{\theta} = -r\cos\theta\sin\theta + r\sin\theta\cos\theta=0,\\ &\ddy{\xb}{r}\cdot\ddy{\xb}{z} = 0,\\ &\ddy{\xb}{\theta}\cdot\ddy{\xb}{z} = 0. \end{align*}\] The scale factors are \[\begin{align*} &h_r = \sqrt{\cos^2\theta + \sin^2\theta} = 1,\\ &h_\theta = \sqrt{r^2\cos^2\theta + r^2\sin^2\theta} = r,\\ &h_z = 1, \end{align*}\] so the unit vectors are \[\begin{align*} \eb_r &= \frac{1}{h_r}\ddy{\xb}{r} = \cos\theta\eb_1 + \sin\theta\eb_2,\\ \eb_\theta &= \frac{1}{h_\theta}\ddy{\xb}{\theta} = -\sin\theta\eb_1 + \cos\theta\eb_2,\\ \eb_z &= \frac{1}{h_z}\ddy{\xb}{z} = \eb_3. \end{align*}\]
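The orthogonality and scale-factor calculations above are easy to check symbolically. Here is a minimal sketch, assuming SymPy is available (the variable names are illustrative only):

```python
# Minimal symbolic check of the cylindrical-coordinate example (assumes SymPy).
import sympy as sp

r, theta, z = sp.symbols('r theta z', positive=True)
xb = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta), z])   # position vector x(r, theta, z)

# Tangent vectors dx/dr, dx/dtheta, dx/dz
t_r, t_th, t_z = xb.diff(r), xb.diff(theta), xb.diff(z)

# Mutual orthogonality of the tangent vectors
print(sp.simplify(t_r.dot(t_th)), t_r.dot(t_z), t_th.dot(t_z))        # expect 0 0 0

# Scale factors h_r, h_theta, h_z
print(sp.simplify(t_r.norm()), sp.simplify(t_th.norm()), t_z.norm())  # expect 1 r 1
```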

Unit vectors in cylindrical coordinates.

Although they are mutually orthogonal at every point, the basis vectors of a general orthogonal curvilinear coordinate system are not constant: their directions vary from point to point, unlike the Cartesian basis \(\{\eb_1,\eb_2,\eb_3\}\).

Any vector field can be expressed in the new basis.

Consider the vector field \(\fb(\xb) = -y\eb_1 + x\eb_2 = -r\sin\theta\,\eb_1 + r\cos\theta\,\eb_2\), which we would like to write in the cylindrical basis. From the previous example, we have that \[\begin{align*} \eb_r &= \frac{1}{h_r}\ddy{\xb}{r} = \cos\theta\eb_1 + \sin\theta\eb_2,\\ \eb_\theta &= \frac{1}{h_\theta}\ddy{\xb}{\theta} = -\sin\theta\eb_1 + \cos\theta\eb_2,\\ \eb_z &= \frac{1}{h_z}\ddy{\xb}{z} = \eb_3. \end{align*}\] However, for this problem we need to express \(\{\eb_1,\eb_2,\eb_3\}\) in terms of \(\{\eb_r,\eb_\theta,\eb_z\}\). Solving simultaneously gives \[\begin{align*} \eb_1 &= \cos\theta\eb_r - \sin\theta\eb_\theta,\\ \eb_2 &= \sin\theta\eb_r + \cos\theta\eb_\theta,\\ \eb_3 &= \eb_z, \end{align*}\] so \[ \fb(\xb) = -r\sin\theta(\cos\theta\eb_r - \sin\theta\eb_\theta) + r\cos\theta(\sin\theta\eb_r + \cos\theta\eb_\theta) = r\eb_\theta. \]
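Since the new basis is orthonormal, the components can also be read off directly as \(f_u = \fb\cdot\eb_u\), and so on. For this field, \[ \fb\cdot\eb_r = -r\sin\theta\cos\theta + r\cos\theta\sin\theta = 0, \qquad \fb\cdot\eb_\theta = r\sin^2\theta + r\cos^2\theta = r, \qquad \fb\cdot\eb_z = 0, \] which confirms \(\fb = r\eb_\theta\) without having to invert the basis change.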

We can derive expressions for the various differential operators.

Proposition 5.1 If \((u,v,w)\) are orthogonal curvilinear coordinates, with \(\phi(\xb)\) a scalar field and \(\fb(\xb) = f_u\eb_u + f_v\eb_v + f_w\eb_w\) a vector field (so that \(f_u = \fb\cdot\eb_u\), and so on), then

\((i)\;\;\) \(\displaystyle \nabla \phi = \frac{1}{h_u}\ddy{\phi}{u}\eb_u + \frac{1}{h_v}\ddy{\phi}{v}\eb_v + \frac{1}{h_w}\ddy{\phi}{w}\eb_w\),

\((ii)\;\;\) \(\displaystyle \nabla\cdot\fb = \frac{1}{h_uh_vh_w}\left[\ddy{}{u}(h_vh_wf_u) + \ddy{}{v}(h_uh_wf_v) + \ddy{}{w}(h_uh_vf_w) \right]\),

\((iii)\;\;\) \(\displaystyle \nabla\times\fb = \frac{1}{h_uh_vh_w}\begin{vmatrix}h_u\eb_u & h_v\eb_v & h_w\eb_w\\ \displaystyle\ddy{}{u} & \displaystyle\ddy{}{v} & \displaystyle\ddy{}{w}\\ h_uf_u & h_vf_v & h_wf_w\end{vmatrix}\),

\((iv)\;\;\) \(\displaystyle \nabla^2 \phi = \frac{1}{h_uh_vh_w}\left[\ddy{}{u}\left(\frac{h_vh_w}{h_u}\ddy{\phi}{u}\right) + \ddy{}{v}\left(\frac{h_uh_w}{h_v}\ddy{\phi}{v}\right) + \ddy{}{w}\left(\frac{h_uh_v}{h_w}\ddy{\phi}{w}\right)\right].\)

Proof. \((i)\) Gradient. The \(u\)-component of \(\nabla\phi\) is \[ \eb_u\cdot\nabla \phi = \left(\frac{1}{h_u}\ddy{\xb}{u}\right)\cdot\nabla \phi = \frac{1}{h_u}\ddy{x_j}{u}\ddy{\phi}{x_j} = \frac{1}{h_u}\ddy{\phi}{u}, \] where the last step used the chain rule. The other components are similar.

\((ii)\) Divergence. This is derived the same way as our original Cartesian expression – consider an infinitesimal cube aligned with \(\eb_u\), \(\eb_v\), \(\eb_w\).

Infinitesimal cube for deriving divergence.

The volume of this cube is \[\begin{align*} |V| &= \int_0^\delta\int_0^\delta\int_0^\delta\left|\ddy{\xb}{u}\cdot\left(\ddy{\xb}{v}\times\ddy{\xb}{w}\right)\right|\,du\,dv\,dw\\ &= \int_0^\delta\int_0^\delta\int_0^\delta h_uh_vh_w\Big|\eb_u\cdot(\eb_v\times\eb_w)\Big|\,du\,dv\,dw\\ &= \int_0^\delta\int_0^\delta\int_0^\delta h_uh_vh_w\Big|\eb_u\cdot(\pm\eb_u)\Big|\,du\,dv\,dw\\ &= \int_0^\delta\int_0^\delta\int_0^\delta h_uh_vh_w\,du\,dv\,dw\\ &= \delta^3 h_uh_vh_w \quad \textrm{[for $h_u, h_v, h_w$ evaluated at some point in the cube]}. \end{align*}\] Therefore \[ \nabla\cdot\fb = \lim_{|V|\to 0}\frac{1}{|V|}\oint_S\fb\cdot\dS = \lim_{\delta\to 0}\frac{1}{h_uh_vh_w\delta^3}\sum_{i=1}^6\int_{S_i}\fb\cdot\dS. \] The integral over the right-hand face gives \[ \int_{S_1}\fb\cdot\dS = \int_0^\delta\int_0^\delta\fb\cdot\eb_u \left|\ddy{\xb}{v}\times\ddy{\xb}{w}\right|\,dv\,dw = \int_0^\delta\int_0^\delta f_u h_vh_w\,dv\,dw=\delta^2h_vh_wf_u, \] evaluated at some point \(\xb_1\) on \(S_1\). Overall, we get \[\begin{align*} \nabla\cdot\fb &= \lim_{\delta\to 0}\frac{1}{h_uh_vh_w}\left( \frac{h_vh_wf_u\Big|_{\xb_1} - h_vh_wf_u\Big|_{\xb_2}}{\delta}\right.\\ &\qquad \left. + \frac{h_uh_wf_v\Big|_{\xb_3} - h_uh_wf_v\Big|_{\xb_4}}{\delta} + \frac{h_uh_vf_w\Big|_{\xb_5} - h_uh_vf_w\Big|_{\xb_6}}{\delta}\right),\\ \end{align*}\] which gives the required expression.

\((iii)\) Curl. We can derive the \(\eb_w\) component of \(\nabla\times\fb\) by considering a square in the \(uv\)-plane:

Infinitesimal loop for deriving curl.

The area inside the loop is \[\begin{align*} |A| &= \int_0^\delta\int_0^\delta\left|\ddy{\xb}{u}\times\ddy{\xb}{v}\right|\,du\,dv\\ &= \int_0^\delta\int_0^\delta h_uh_v\Big|\eb_u\times\eb_v\Big|\,du\,dv\\ &= \int_0^\delta\int_0^\delta h_uh_v\Big|\pm\eb_w\Big|\,du\,dv\\ &= \int_0^\delta\int_0^\delta h_uh_v\,du\,dv\\ &= \delta^2 h_uh_v \quad \textrm{[for $h_u,h_v$ evaluated at some point inside $C$]}. \end{align*}\]

So from the definition of curl, \[ \eb_w\cdot(\nabla\times\fb) = \lim_{|A|\to 0}\frac{1}{|A|}\oint_C\fb\cdot\,d\xb = \lim_{\delta\to 0}\frac{1}{\delta^2h_uh_v}\sum_{i=1}^4\int_{C_i}\fb\cdot\,d\xb. \] Evaluating the line integral along \(C_1\) gives \[\begin{align*} \int_{C_1}\fb\cdot\,d\xb &= \int_0^\delta\fb\cdot\ddy{\xb}{u}\,du\\ &= \int_0^\delta\fb\cdot(h_u\eb_u)\,du\\ &= \int_0^\delta h_u f_u\,du\\ &= \delta h_uf_u \quad \textrm{[evaluated at some point $\xb_1$ on $C_1$].} \end{align*}\] Overall, we get \[\begin{align*} \eb_w\cdot(\nabla\times\fb) &= \lim_{\delta\to 0}\frac{1}{h_uh_v}\left(\frac{h_uf_u\Big|_{\xb_1} - h_uf_u\Big|_{\xb_2}}{\delta} + \frac{h_vf_v\Big|_{\xb_3} - h_vf_v\Big|_{\xb_4}}{\delta} \right)\\ &= \frac{1}{h_uh_v}\left[\ddy{}{u}(h_vf_v) - \ddy{}{v}(h_uf_u)\right]. \end{align*}\] The \(\eb_u\) and \(\eb_v\) components are obtained analogously.

\((iv)\) Laplacian. [See Tutorial Sheet.]


As an example, take the field \(\fb = r\eb_\theta\) (that is, \(\fb(\xb) = -y\eb_1 + x\eb_2\) from before), so \(f_r = f_z = 0\) and \(f_\theta = r\). From earlier we have \(h_r=1\), \(h_\theta=r\), \(h_z = 1\), so from Proposition 5.1 \((iii)\) we have \[ \nabla\times\fb = \frac{1}{h_rh_\theta h_z}\begin{vmatrix} h_r\eb_r & h_\theta\eb_\theta & h_z\eb_z\\ \displaystyle\ddy{}{r} & \displaystyle\ddy{}{\theta} & \displaystyle\ddy{}{z}\\ h_rf_r & h_\theta f_\theta & h_z f_z \end{vmatrix} = \frac{1}{r}\begin{vmatrix} \eb_r & r\eb_\theta & \eb_z\\ \displaystyle\ddy{}{r} & \displaystyle\ddy{}{\theta} & \displaystyle\ddy{}{z}\\ 0 & r^2 & 0 \end{vmatrix} = \frac{1}{r}\ddy{}{r}(r^2)\eb_z = 2\eb_z. \] This agrees with what we found in Section 4.1 (with \(\omega=1\)).
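Substituting the same scale factors \(h_r=1\), \(h_\theta=r\), \(h_z=1\) into the other parts of Proposition 5.1 gives the standard cylindrical forms: \[ \nabla\phi = \ddy{\phi}{r}\eb_r + \frac{1}{r}\ddy{\phi}{\theta}\eb_\theta + \ddy{\phi}{z}\eb_z, \qquad \nabla\cdot\fb = \frac{1}{r}\ddy{}{r}(rf_r) + \frac{1}{r}\ddy{f_\theta}{\theta} + \ddy{f_z}{z}, \] \[ \nabla^2\phi = \frac{1}{r}\ddy{}{r}\left(r\ddy{\phi}{r}\right) + \frac{1}{r^2}\frac{\partial^2\phi}{\partial\theta^2} + \frac{\partial^2\phi}{\partial z^2}. \] Setting all scale factors equal to \(1\) recovers the familiar Cartesian expressions.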

5.2 Differentiability

A function \(\fb:\Real^n\to\Real^m\) is differentiable at \(\ab\in\Real^n\) if there exists an \(m\times n\) matrix \(D\fb(\ab)\) called the Jacobian matrix (or derivative) such that \[ \lim_{|\hb|\to 0}\frac{\Big(\fb(\ab+\hb) - \fb(\ab) \Big) - [D\fb(\ab)]\hb}{|\hb|} = \bfzero. \]

Think of this as saying that \[ \fb(\ab+\hb) = \fb(\ab) + [D\fb(\ab)]\hb + o(|\hb|), \] where the remainder \(o(|\hb|)\) vanishes faster than \(|\hb|\). In other words, \(\fb\) is well-approximated by a linear function near to \(\ab\).

To see the form of \(D\fb(\ab)\), start with some special cases.

\((i)\;\;n=m=1\). Then \(\fb\) is a single-variable function \(f(x)\). If \(Df(a)\in\Real\) exists, it must satisfy \[ \lim_{h\to 0}\frac{\Big(f(a+h)-f(a)\Big) - hDf(a)}{h} = 0\quad \implies Df(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h} = f'(a). \]

\((ii)\;\;m=1\). Then \(\fb\) is a scalar field \(f(\xb)\), so \(Df(\ab)\) is a \(1\times n\) matrix (a row vector). If it exists, it must satisfy \[ [Df(\ab)]\hb = (v_1 \ldots v_n)\begin{pmatrix}h_1\\\vdots\\h_n\end{pmatrix} = v_1h_1 + v_2h_2 + \ldots + v_nh_n. \] To find the first component, \(v_1\), let \(\hb=h\eb_1\) so that \[ \lim_{h\to 0}\frac{\Big(f(\ab+h\eb_1) - f(\ab)\Big) - v_1h}{h} = 0 \quad \implies v_1 = \lim_{h\to 0}\frac{f(\ab+h\eb_1) - f(\ab)}{h} = \left.\ddy{f}{x_1}\right|_{\ab}. \] It follows that when \(f\) is a scalar field, the Jacobian matrix reduces to \[ Df(\ab) = \nabla f\Big|_{\ab} = \left(\left.\ddy{f}{x_1}\right|_{\ab} \ldots \left.\ddy{f}{x_n}\right|_{\ab} \right). \]
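For example (an arbitrary illustrative field), if \(f(x,y) = x^2y\) then \[ Df(x,y) = \begin{pmatrix}\displaystyle\ddy{f}{x} & \displaystyle\ddy{f}{y}\end{pmatrix} = \begin{pmatrix}2xy & x^2\end{pmatrix}, \] which is just \(\nabla f\) written as a row vector.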

For a general function \(\fb:\Real^n\to\Real^m\), the Jacobian matrix has the form \[ D\fb(\ab) = \begin{pmatrix} \displaystyle\left.\ddy{f_1}{x_1}\right|_{\ab} & \displaystyle\left.\ddy{f_1}{x_2}\right|_{\ab} & \ldots & \displaystyle\left.\ddy{f_1}{x_n}\right|_{\ab}\\ \displaystyle\left.\ddy{f_2}{x_1}\right|_{\ab} & \displaystyle\left.\ddy{f_2}{x_2}\right|_{\ab} & \ldots & \displaystyle\left.\ddy{f_2}{x_n}\right|_{\ab}\\ \vdots & \vdots & \ddots & \vdots\\ \displaystyle\left.\ddy{f_m}{x_1}\right|_{\ab} & \displaystyle\left.\ddy{f_m}{x_2}\right|_{\ab} & \ldots & \displaystyle\left.\ddy{f_m}{x_n}\right|_{\ab} \end{pmatrix}. \]
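To make the linear-approximation interpretation concrete, here is a short numerical sketch (assuming NumPy and SymPy; the map \(\fb(x,y)=(x^2-y,\,xy)\) is an illustrative choice, not one from the notes):

```python
# Sketch: compute a Jacobian matrix symbolically and check the linear
# approximation f(a+h) ~ f(a) + [Df(a)]h numerically (assumes NumPy + SymPy).
import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Matrix([x**2 - y, x*y])     # illustrative map f: R^2 -> R^2
Df = f.jacobian([x, y])            # [[2x, -1], [y, x]]

a1, a2 = 1.0, 2.0                  # the point a
h1, h2 = 1e-4, -2e-4               # a small displacement h

f_a  = np.array(f.subs({x: a1, y: a2}), dtype=float).ravel()
Df_a = np.array(Df.subs({x: a1, y: a2}), dtype=float)
f_ah = np.array(f.subs({x: a1 + h1, y: a2 + h2}), dtype=float).ravel()

h = np.array([h1, h2])
# The remainder divided by |h| should be small, and tends to 0 as |h| -> 0.
print(np.linalg.norm(f_ah - f_a - Df_a @ h) / np.linalg.norm(h))
```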

A vector field \(\fb(\xb)\) on \(\Real^3\) corresponds to \(n=m=3\). The divergence \(\nabla\cdot\fb\) is the trace of \(D\fb(\ab)\), while the curl \(\nabla\times\fb\) is built from differences of its off-diagonal entries.

Warning: except for \(n=m=1\), the fact that the partial derivatives exist does not imply that a scalar field \(f(\xb)\) is differentiable.

From the definition of partial differentiation, we have \[ \left.\ddy{f}{x}\right|_{(0,0)} = \lim_{h\to 0}\frac{f(h,0)-f(0,0)}{h} = 0, \qquad \left.\ddy{f}{y}\right|_{(0,0)} = \lim_{h\to 0}\frac{f(0,h)-f(0,0)}{h} = 0. \] So both partial derivatives exist at \((0,0)\), and \(Df(0,0) = (0,0)\). Here is a plot of \(z=f(x,y)\):

But suppose we take \(\hb\) to be some other direction, say \(\hb=h(\eb_1+\eb_2)\) with \(h>0\). Then \[ \lim_{|\hb|\to 0}\frac{\Big(f(\bfzero + \hb) - f(\bfzero)\Big)- \bfzero\cdot\hb}{|\hb|} = \lim_{h\to 0}\frac{f(\hb)}{\sqrt{2}\,h} = \lim_{h\to 0}\frac{h^{2/3}}{\sqrt{2}\,h} = \infty, \] so the limit in the definition is not \(\bfzero\). Hence \(f\) is not differentiable at \((0,0)\), even though the partial derivatives exist there.

A function \(\fb : \Real^n\to\Real^m\) is continuously differentiable (or a \(C^1\) function) at \(\ab\in\Real^n\) if all of its partial derivatives exist and are continuous at \(\ab\).

Theorem 5.1 If a function \(\fb:\Real^n\to\Real^m\) is continuously differentiable in a neighbourhood of \(\ab\in\Real^n\), then \(\fb\) is differentiable at \(\ab\).

Proof. Omitted.


We have \[ \textrm{continuous partials} \implies \textrm{differentiable} \implies \textrm{partials exist}, \] but neither of the converse statements is true.

(i) Consider the function \(f(x,y)=y|x-2|\). It is continuously differentiable wherever \(\displaystyle\ddy{f}{x}, \ddy{f}{y}\) exist and are continuous.

For \(x>2\), we have \(f(x,y)=y(x-2)\) so \(\displaystyle\ddy{f}{x}=y, \ddy{f}{y}=x-2.\)

For \(x<2\), we have \(f(x,y)=-y(x-2)\) so \(\displaystyle\ddy{f}{x}=-y, \ddy{f}{y}=-x+2.\)

For \(x=2\), we need to be more careful. We have \[ \left.\ddy{f}{x}\right|_{x=2} = \lim_{h\to 0}\frac{f(2+h,y) - f(2,y)}{h} = \lim_{h\to 0}\frac{y|h|}{h}. \] For \(y\neq 0\), the limits for \(h<0\) and \(h>0\) are different, so the limit does not exist and this partial derivative does not exist for \(x=2\), \(y\neq 0\). If \(y=0\) then the limit exists and equals \(0\). However, \(\displaystyle\ddy{f}{x}\) is not continuous at \((2,0)\): it fails to exist at every other point of the line \(x=2\), so it is not even defined on a neighbourhood of \((2,0)\).

Therefore, \(f\) is continuously differentiable everywhere except the line \(x=2\).

(ii) By Theorem 5.1, we immediately know that \(f\) is differentiable everywhere except on the line \(x=2\). For \(x=2\) and \(y\neq 0\), it cannot be differentiable because \(\nabla f\) does not exist. But it might be differentiable at \((2,0)\). Note that \[ \left.\ddy{f}{y}\right|_{(2,0)} = \lim_{h\to 0}\frac{f(2,h)-f(2,0)}{h} = 0, \quad \textrm{so} \quad Df(2,0) = \bfzero. \] Writing \(\hb=h_1\eb_1 + h_2\eb_2\), we have \[ \lim_{|\hb|\to 0}\frac{\Big(f(2\eb_1+\hb) - f(2\eb_1)\Big)-\bfzero\cdot\hb}{|\hb|} = \lim_{|\hb|\to 0}\frac{h_2|h_1|}{|\hb|}. \] But \[ \left|\frac{h_2|h_1|}{|\hb|}\right| \leq \left|\frac{|\hb|^2}{|\hb|}\right| = |\hb| \quad \implies \lim_{|\hb|\to 0}\frac{h_2|h_1|}{|\hb|} = 0. \] Therefore \(f\) is also differentiable at \((2,0)\).

5.3 Inverse functions

A function \(\fb:\Real^n\to\Real^n\) has an inverse if there exists a function \(\fb^{-1}:\Real^n\to\Real^n\) such that \[ \fb^{-1}\big(\fb(\xb)\big) = \xb \quad \textrm{and} \quad \fb\big(\fb^{-1}(\ub)\big)=\ub \quad \textrm{for all $\xb, \ub$}. \]

Note that we need \(m=n\) here.

For a linear function \(\fb(\xb)=A\xb\), where \(A\) is an \(n\times n\) matrix, the inverse exists if and only if \(A\) is invertible, in which case \(\fb^{-1}(\xb)=A^{-1}\xb\).

If \(\fb(\xb)\) is nonlinear, then finding a global inverse is hard. However, if \(\fb\) is differentiable, then \[ \fb(\ab+\hb) = \fb(\ab) + [D\fb(\ab)]\hb + o(|\hb|), \] so \(\fb\) looks like a linear function near to \(\ab\), with matrix \(D\fb(\ab)\). This suggests that we can find a (differentiable) local inverse precisely when \(D\fb(\ab)\) is invertible.

Theorem 5.2 (Inverse Function Theorem) A continuously differentiable function \(\fb:\Real^n\to\Real^n\) has a local differentiable inverse near \(\ab\) if \(\det[D\fb(\ab)]\neq 0\).

Proof. Omitted (need to justify that the higher-order terms behave themselves!). Note that the derivative of \(\fb^{-1}\) will be the inverse matrix \([D\fb(\ab)]^{-1}\).


For \(n=1\), Theorem 5.2 says that a function \(f(x)\) has a differentiable inverse at \(a\) if \(f'(a)\neq 0.\) This makes sense graphically if you recall that \(y=f^{-1}(x)\) is the reflection of \(y=f(x)\) in the line \(y=x\).

Reflection in y=x.

A continuous inverse function \(f^{-1}(x)\) exists everywhere in this example, but \(f^{-1}\) is not differentiable at the point where \(f'(x)=0\).

You may also recall from Calculus I that, wherever \(f'\big(f^{-1}(x)\big)\neq 0\), we have \(\displaystyle\frac{\mathrm{d}}{\mathrm{d}x}f^{-1}(x) = \frac{1}{f'\big(f^{-1}(x)\big)}\).
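For instance, with \(f(x)=\mathrm{e}^x\) we have \(f^{-1}(x)=\ln x\), and indeed \[ \frac{\mathrm{d}}{\mathrm{d}x}f^{-1}(x) = \frac{1}{x} = \frac{1}{\mathrm{e}^{\ln x}} = \frac{1}{f'\big(f^{-1}(x)\big)}. \]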

Consider the map \(\fb(r,\theta) = (r\cos\theta,\, r\sin\theta)\). Note that this is just the polar coordinate parametrisation, \(\xb(r,\theta)\), mapping a semi-infinite region in the \(r\theta\)-plane to the \(xy\)-plane:

The polar coordinate mapping.

The Jacobian matrix is \[ D\fb(r,\theta) = \begin{pmatrix} \displaystyle\ddy{f_1}{r} & \displaystyle\ddy{f_1}{\theta}\\ \displaystyle\ddy{f_2}{r} & \displaystyle\ddy{f_2}{\theta} \end{pmatrix} = \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix}. \] To apply Theorem 5.2, calculate the determinant: \[ \det[D\fb(r,\theta)] = \begin{vmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{vmatrix} = r. \] This is non-zero for \(r\neq 0\), so Theorem 5.2 guarantees that there is a differentiable inverse mapping \((x,y)\to(r,\theta)\) everywhere except at the origin. Graphically, that is the point where the grid is singular and \(\theta\) is not defined.

Note that Theorem 5.2 doesn’t give us a formula for \(\fb^{-1}\). However, we do know its derivative: \[ D\fb^{-1}(x,y) = [D\fb(r,\theta)]^{-1} = \begin{pmatrix} \cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta \end{pmatrix}^{-1} = \frac{1}{r}\begin{pmatrix} r\cos\theta & r\sin\theta\\ -\sin\theta & \cos\theta \end{pmatrix}. \] Therefore \[ \ddy{r}{x} = \cos\theta = \frac{x}{r}, \quad \ddy{r}{y} = \sin\theta = \frac{y}{r}, \quad \ddy{\theta}{x} = -\frac{\sin\theta}{r} = -\frac{y}{r^2}, \quad \ddy{\theta}{y} = \frac{\cos\theta}{r} = \frac{x}{r^2}. \] In fact, the inverse function is \(\displaystyle \fb^{-1}(x,y) = (r,\theta) = \left(\sqrt{x^2+y^2}, \arctan\left(\frac{y}{x}\right)\right)\) (valid as written for \(x>0\); in the other quadrants the appropriate branch of the inverse tangent must be taken).

[You can check that the derivatives of this function agree with those we just calculated.]
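Here is a quick symbolic check of those derivatives, a sketch assuming SymPy (\(\mathrm{atan2}\) takes care of choosing the correct angle branch):

```python
# Sketch: differentiate the explicit inverse (r, theta) = (sqrt(x^2+y^2), atan2(y, x))
# and compare with the matrix [Df(r, theta)]^{-1} computed above (assumes SymPy).
import sympy as sp

x, y = sp.symbols('x y', positive=True)
r = sp.sqrt(x**2 + y**2)
theta = sp.atan2(y, x)

Dfinv = sp.Matrix([r, theta]).jacobian([x, y])
print(sp.simplify(Dfinv))
# expect [[x/r, y/r], [-y/r**2, x/r**2]] with r = sqrt(x**2 + y**2)
```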

A function \(\fb:\Real^n\to\Real^n\) is called orientation preserving if \(\det[D\fb(\xb)]>0\), or orientation reversing if \(\det[D\fb(\xb)]<0\).

We calculated \(\det[D\fb(r,\theta)]=r>0\), so \(\fb\) is orientation-preserving. What this means is that, for example, a square in the \(r\theta\)-plane maintains the order of its vertices after the mapping:

Illustration that mapping is orientation-preserving.

Let \(U,V\) be two subsets of \(\Real^n\). A differentiable function \(\fb:U\to V\) is called a diffeomorphism if (i) it is bijective and (ii) its inverse function \(\fb^{-1}:V\to U\) is also differentiable. The domains \(U,V\) are said to be diffeomorphic.

Schematic of a diffeomorphism.

Think of a diffeomorphism as a “non-folding distortion” of a grid on \(U\). In fluid mechanics, for example, this represents the flow of a fluid – reversing the motion moves all of the fluid particles back to where they started.

Here is a diffeomorphism of the unit disk (to itself) given by \[ \fb(r,\theta)= \Big(r^2 \cos\Big[\theta + 2\pi r^3(1-r)\Big], r^2\sin\Big[\theta + 2\pi r^3(1-r)\Big]\Big). \]

A diffeomorphism of the unit disk.
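Note that on the boundary \(r=1\) the angular twist vanishes, since \(2\pi r^3(1-r)=0\) there, and \(r^2=1\), so every boundary point \((\cos\theta,\sin\theta)\) is mapped to itself.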

The set of all diffeomorphisms from a domain \(U\) to itself forms a group, denoted \(\mathrm{Diff}(U)\). Those that leave points on the boundary unchanged (as in the unit disk example just above) form a subgroup.

5.4 Implicit functions

Suppose a curve is given implicitly by \(f(x,y)=0\). When can we solve for \(y=g(x)\)?

If we assume that \(y=g(x)\), then \[\begin{align*} \frac{\mathrm{d}}{\mathrm{d}x}f\big(x,g(x)\big) = 0 \quad &\iff \ddy{f}{x} + g'(x)\ddy{f}{y} = 0 \quad \textrm{[by the Chain Rule]}\\ &\iff g'(x) = -\displaystyle\ddy{f}{x}\bigg/\displaystyle\ddy{f}{y}. \end{align*}\] So there is a problem when \(\displaystyle\ddy{f}{y}=0\).

Let \(f(x,y)=x^2+y^2-1\). Then \(\displaystyle\ddy{f}{y} = 2y\). So we can describe the circle by a single-valued, differentiable function \(y=g(x)\) only for \(y\neq 0\).

Circle.

The derivative of the implicit function is \[ g'(x) = -\ddy{f}{x}\bigg/\ddy{f}{y} = -\frac{2x}{2y} = -\frac{x}{y}. \]

[For \(y>0\) we have \(g(x)=\sqrt{1-x^2}\), whereas in \(y<0\) we have \(g(x)=-\sqrt{1-x^2}\) (and you can check that \(g'(x)=-x/y\) in either case). But there is no differentiable function \(g(x)\) that works around either \((-1,0)\) or \((1,0)\).]

This idea generalises.

Theorem 5.3 (Implicit Function Theorem) Given a continuously differentiable function \(\fb:\Real^{n+\textcolor{red}{m}}\to\Real^\textcolor{red}{m}\) with \(\xb\in\Real^n\) and \(\yb\in\Real^\textcolor{red}{m}\), solutions to \(\fb(\xb,\yb)=\bfzero\) near a point \((\xb,\yb)=\ab\) satisfying \(\fb(\ab)=\bfzero\) can be realised as an implicit function \[ \yb = \gb(\xb) \quad \textrm{if} \quad \det[D_{\yb}\fb(\ab)] \neq 0. \] Moreover, this local solution is unique and differentiable with \(D_{\xb}\gb(\ab) = -[D_\yb\fb(\ab)]^{-1}D_{\xb}\fb(\ab).\)

Note: these “partial” Jacobian matrices contain only the columns corresponding to \(\xb\) or \(\yb\) respectively – for example, \[ D_{\yb}\fb(\ab) = \begin{pmatrix} \displaystyle\left.\ddy{f_{1}}{y_1}\right|_{\ab} & \ldots & \displaystyle\left.\ddy{f_{1}}{y_m}\right|_{\ab}\\ \vdots & \ddots & \vdots\\ \displaystyle\left.\ddy{f_{m}}{y_1}\right|_{\ab} & \ldots & \displaystyle\left.\ddy{f_{m}}{y_m}\right|_{\ab} \end{pmatrix}. \]

Although it tells you the derivative \(D_{\xb}\gb(\ab)\), Theorem 5.3 does not tell you the value of the function \(\gb(\ab)\) itself.

Proof. Omitted.


Consider the unit sphere \(x^2+y^2+z^2=1\): when can it be described by a function \(z=g(x,y)\)? Here we have \[ \xb=(x,y), \quad \yb=(z), \quad f(\xb,\yb) = (x^2 + y^2 + z^2-1). \] So \(n=2\) and \(m=1\). Therefore \(D_{\yb}f(\ab)\) is a \(1\times 1\) matrix (a scalar!) given by \[ D_{\yb}f(\ab) = \left.\ddy{f}{z}\right|_{\ab} = 2z \quad \implies \det{D_{\yb}f(\ab)} = 2z. \] So by Theorem 5.3, we can write the equation for the sphere as \(z=g(x,y)\) provided \(z\neq 0\). (When \(z=0\), Theorem 5.3 is inconclusive.)

For \(z\neq 0\), Theorem 5.3 also gives us the derivative of \(g(x,y)\), which is a \(1\times 2\) matrix \[ D_{\xb}g(\ab) = -[D_{\yb}f(\ab)]^{-1}D_{\xb}f(\ab) \quad \implies \begin{pmatrix}\displaystyle\ddy{g}{x} & \displaystyle\ddy{g}{y}\end{pmatrix} = -\frac{1}{2z}\begin{pmatrix} 2x & 2y \end{pmatrix} = \begin{pmatrix}\displaystyle-\frac{x}{z} & \displaystyle-\frac{y}{z}\end{pmatrix}. \]
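Indeed, for \(z>0\) we can solve explicitly: \(g(x,y)=\sqrt{1-x^2-y^2}\), and differentiating directly gives \[ \ddy{g}{x} = \frac{-x}{\sqrt{1-x^2-y^2}} = -\frac{x}{z}, \qquad \ddy{g}{y} = -\frac{y}{z}, \] in agreement with the theorem (and similarly with \(g(x,y)=-\sqrt{1-x^2-y^2}\) for \(z<0\)).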

[If we put \(y=0\) then this reduces to the circle example with \(x^2+z^2=1\). ]

Consider the pair of equations \(uv^2 + v^2w^3 + u^5w^4 = 1\) and \(u^2w + u^2v^3 + v^4w^5 = -1\) near the point \((u,v,w)=(1,1,-1)\), which satisfies both equations. Since there are two equations and three unknowns, this represents some curve in \((u,v,w)\) space. Let \[ \xb = (w), \quad \yb=\begin{pmatrix}u\\ v\end{pmatrix}, \quad \fb(\xb,\yb) = \begin{pmatrix}uv^2 + v^2w^3 + u^5w^4-1\\ u^2w+u^2v^3+v^4w^5+1\end{pmatrix}, \] so \(n=1\) and \(m=2\). Then \(D_{\yb}\fb(\ab)\) is the \(2\times 2\) matrix \[ D_{\yb}\fb(\ab) = \begin{pmatrix} \displaystyle\ddy{f_1}{u} & \displaystyle\ddy{f_1}{v}\\ \displaystyle\ddy{f_2}{u} & \displaystyle\ddy{f_2}{v} \end{pmatrix}_{\ab} = \begin{pmatrix} v^2 + 5u^4w^4 & 2uv + 2vw^3\\ 2uw +2uv^3 & 3u^2v^2 + 4v^3w^5 \end{pmatrix}_{(1,1,-1)} = \begin{pmatrix} 6 & 0\\ 0 & -1 \end{pmatrix}. \] This has determinant \(\displaystyle\det[D_{\yb}\fb(\ab)] = -6 \neq 0\), so by Theorem 5.3 we can write \(\yb=\gb(\xb)\), meaning \(u=g_1(w)\) and \(v=g_2(w)\).

Although it doesn’t give us a full expression for \(\gb(\xb)\) itself, Theorem 5.3 does let us find its derivative at \(\ab\). Note that \[ D_{\xb}\fb(\ab) = \begin{pmatrix} \displaystyle\ddy{f_1}{w}\\ \displaystyle\ddy{f_2}{w}\\ \end{pmatrix}_{\ab} = \begin{pmatrix} 3v^2w^2 + 4u^5w^3\\ u^2 + 5v^4w^4 \end{pmatrix}_{(1,1,-1)} = \begin{pmatrix} -1\\ 6 \end{pmatrix}, \] so \[ D_{\xb}\gb(\ab) = -\begin{pmatrix} 6 & 0\\ 0 & -1 \end{pmatrix}^{-1}\begin{pmatrix}-1\\ 6\end{pmatrix} = \frac{1}{6}\begin{pmatrix} -1 & 0\\ 0 & 6 \end{pmatrix}\begin{pmatrix}-1\\ 6\end{pmatrix} = \begin{pmatrix} \frac16\\ 6 \end{pmatrix}. \] This means that for \((u,v,w)\) near to \((1,1,-1)\) we have \[ \begin{pmatrix} u\\ v \end{pmatrix} \approx\begin{pmatrix} g_1(-1) + \frac16\big(w-(-1)\big)\\ g_2(-1) + 6\big(w-(-1)\big) \end{pmatrix} = \begin{pmatrix} 1 + \frac16(w+1)\\ 1 + 6(w+1) \end{pmatrix}. \]
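We can sanity-check this linearisation numerically. Here is a sketch assuming SymPy's nonlinear solver nsolve; the step \(w+1=0.01\) is an arbitrary illustrative choice:

```python
# Sketch: solve the two equations numerically for (u, v) at w slightly away from -1
# and compare with the linearisation u ~ 1 + (w+1)/6, v ~ 1 + 6(w+1). Assumes SymPy.
import sympy as sp

u, v, w = sp.symbols('u v w')
f1 = u*v**2 + v**2*w**3 + u**5*w**4 - 1
f2 = u**2*w + u**2*v**3 + v**4*w**5 + 1

w0 = -1 + 0.01                       # a point near w = -1
sol = sp.nsolve((f1.subs(w, w0), f2.subs(w, w0)), (u, v), (1, 1))
print(sol)                           # numerical (u, v) on the curve
print(1 + 0.01/6, 1 + 6*0.01)        # linearised prediction: ~(1.00167, 1.06)
```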

The idea of linearisation – approximating a nonlinear function by a linear function using its derivative – will be invaluable in future courses, e.g. Mathematical Biology III or Dynamical Systems III (among others).