Lagrangian and Hamiltonian Mechanics

Iñaki García Etxebarria

04 Dec 2023

1 Introduction

During this term we will be studying two closely connected reformulations of classical mechanics, known as “Lagrangian” and “Hamiltonian” mechanics. Anything that can be done in these frameworks can also be done using the language of Newtonian mechanics, but this does not mean that they are uninteresting. In fact, one can argue that, despite being ultimately equivalent to Newtonian mechanics, they are a “more correct” way of looking at classical mechanics. This statement can be justified by noting that Lagrangian and Hamiltonian mechanics are naturally obtained — in favourable cases at least, for systems that admit an approximate classical description — as a limit of quantum mechanics. As a consequence, many of the concepts used in the formulation of quantum mechanics already make an appearance in the Lagrangian and Hamiltonian frameworks. Additionally, understanding the classical solutions for a given quantum mechanical system can often serve as a first step towards its full quantum mechanical solution. This makes the study of this formalism for classical mechanics a natural stepping stone on our way towards quantum mechanics.

Accordingly, during this term we will reformulate classical mechanics in this new framework, focusing particularly on those aspects that still play an important role in the quantum theory.

2 The action principle

We will start with the Lagrangian formulation. The underlying physical principle behind this formulation can be traced back to the idea that for some physical processes, the natural answer to the question “what is the trajectory that a particle follows” is something like “the most efficient one”. Our goal in this section is to understand, in a precise sense, how to characterize this notion of efficiency.

A fundamental example is the free particle in flat space: its motion is along a straight path. What makes the straight path special? The answer is well known: the straight path is the one that minimizes the distance travelled between the origin and the destination of a path. This is equivalent to saying that the motion of the particle is along a trajectory that, assuming constant speed for the particle, minimizes the total time travelled.

This second formulation connects with Fermat’s principle, which states that the path that a ray of light takes, when moving through a medium, is the one that minimizes the time spent by the light beam. Or more precisely, one should impose that the time is stationary (we will define this precisely below) under small variations of the path.

These two examples suggest a natural question: is there always some quantity, in problems of classical mechanics, that is minimized along physical motion? The answer is that there is indeed such a quantity, known as the action. We will now explain how to determine the equations of motion from the action, and then determine the form of the action that reproduces classical Newtonian physics.

Our basic tool will be the “Calculus of variations”, which we now describe.

2.1 Calculus of variations

Let us start by reviewing how to find the maxima and minima of a function \(f(s)\colon \mathbb{R}\to\mathbb{R}\). As you will recall, this can be done by solving the equation \[\frac{df}{ds} = 0\] for \(s\). As an example, if our function is \(f(s)=\frac{1}{2}s^2-s\) we have \[\frac{df}{ds} = s-1\] so the function has an extremum (a minimum, in this case) at \(s=1\). An alternative way of formulating the same condition makes use of the definition of the derivative as encoding the change in the function under small changes in \(s\). For a small \(\delta s\in\mathbb{R}\), we have \[f(s+\delta s) = f(s) + \frac{df(s)}{ds}\delta s + R(s, \delta s)\] where \(R(s, \delta s)\) is an error term. It is convenient to introduce the notation \[\delta f :=f(s+\delta s) - f(s)\] so the statement above becomes \[\delta f = \frac{df(s)}{ds}\delta s + R(s, \delta s)\, .\] We note that the usual definition of the derivative implies \[\lim_{\delta s\to 0} \frac{R(s, \delta s)}{\delta s} = 0\, .\] In these cases we say that “\(\delta f\) vanishes to first order in \(\delta s\)”. The functions that we will study will almost always admit a well-behaved Taylor expansion, so this result implies that \(R(s, \delta s)\) is at least of quadratic order in \(\delta s\). Henceforth we will encode this vanishing to first order by writing \(\mathcal{O}((\delta s)^2)\) instead of \(R(s, \delta s)\).

So, finally, we can say that the extrema of \(f(s)\) are located at the points where \[\delta f = \mathcal{O}((\delta s)^2)\, .\]
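If you would like to check this small example on a computer, here is a minimal sketch using sympy (an assumed, standard Python library; the snippet is an optional aside, not part of the formalism):

```python
# Minimal sympy sketch: find the stationary point of f(s) = s^2/2 - s and
# check that delta f is O((delta s)^2) there.
import sympy as sp

s, ds = sp.symbols("s delta_s")
f = s**2 / 2 - s

print(sp.solve(sp.diff(f, s), s))  # -> [1]: the extremum is at s = 1

# delta f at s = 1: the term linear in delta_s cancels, leaving delta_s^2/2.
print(sp.expand(f.subs(s, 1 + ds) - f.subs(s, 1)))  # -> delta_s**2/2
```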

The same reasoning can be applied in the case of functions of multiple variables. Consider a function \(f(s_1, \ldots, s_N)\colon \mathbb{R}^N\to \mathbb{R}\), and introduce a small displacement \(s_i\to s_i+\delta s_i\). In this case the partial derivatives \(\partial f/\partial s_i\) are defined by \[\delta f = \sum_{i=1}^N \frac{\partial f}{\partial s_i}\delta s_i + \mathcal{O}(\delta s^2)\] where the error term includes terms vanishing faster than \(\delta s_i\) (so terms of the form \(\delta s_1^2\), \(\delta s_1\delta s_2,\ldots\)). Stationary points1 of \(f\) are located wherever \(\delta f\) vanishes to first order in \(\delta s_i\).

In fact, we need to go one step further, and work with functionals: these are maps from functions to \(\mathbb{R}\). One (heuristic, but sometimes useful) way of thinking of them is as the limit of the previous multi-variate case when the number of variables \(N\) goes to infinity. For instance, we could have a functional \(S[y(t)]\) defined by \[S[y(t)] = \int_{a}^b y(t)^2 \, dt\] for some fixed choice of \((a,b)\). I emphasize that one should think of \(S\) as the analogue of \(f\) above, and the different functions \(y(t)\) as the “points” in the domain of this functional.

We want to define what it means for a function \(y(t)\) to give an extremal value for the functional \(S\). In analogy with what happened in the finite dimensional case above, we can study the variation of \(S\) as we displace \(y(t)\) slightly. We need to be a bit careful when specifying which class of functions \(y(t)\) we are going to include in our extremization problem. In the case of interest to us, we will extremize over the set of smooth2 functions \(y(t)\) with fixed values at the endpoints \(a\) and \(b\). That is, we fix \(y(a)=y_a\) and \(y(b)=y_b\), for some fixed values of \(y_a\) and \(y_b\).

We say that a function \(y(t)\) is stationary (for the functional \(S\)) if \[\left.\frac{dS[y(t)+\epsilon z(t)]}{d\epsilon}\right|_{\epsilon=0}=0\] for all smooth \(z(t)\) such that \(z(a)=z(b)=0\).

Consider the Taylor expansion in \(\epsilon\) (which is a constant) of \(S[y(t)+\epsilon z(t)]\): \[S[y(t)+\epsilon z(t)] = S[y(t)] + \epsilon \left.\frac{d S[y(t)+\epsilon z(t)]}{d \epsilon}\right|_{\epsilon=0} + \frac{1}{2}\epsilon^2 \left.\frac{d^2 S[y(t)+\epsilon z(t)]}{d\epsilon^2}\right|_{\epsilon=0} + \ldots\] The condition for \(y(t)\) to be stationary is that the term proportional to \(\epsilon\) vanishes: \[\delta S :=S[y(t)+\epsilon z(t)] - S[y(t)] = \mathcal{O}(\epsilon^2)\, .\]

It is useful to think of the combination \(\epsilon z(t)\) as a small variation of \(y(t)\), which we denote \(\delta y(t):=\epsilon z(t)\). We define \(\mathcal{O}((\delta y(t))^n)\) to mean simply \(\mathcal{O}(\epsilon^n)\). In particular, we can rewrite the stationary condition as \[\delta S = \mathcal{O}((\delta y(t))^2)\, .\]

If you are ever confused about the expansions in \(\delta y(t)\) below, you can replace \(\delta y(t)\) with \(\epsilon z(t)\), and expand in the constant \(\epsilon\). For instance, consider the integral \[\int g(t) (\delta y(t))^n dt\] for any positive integer \(n\) and any function \(g(t)\). I claim that this is \(\mathcal{O}((\delta y(t))^n)\). The proof is as follows: replacing \(\delta y(t)\) with \(\epsilon z(t)\) we have \[\int g(t) \epsilon^n z(t)^n dt = \epsilon^n \int g(t) z(t)^n dt\, .\] (Here we have used that \(\epsilon\) is a constant.) In our \(\mathcal{O}((\delta y(t))^n)\) notation, this means that \[\int \mathcal{O}((\delta y(t))^n) dt = \mathcal{O}((\delta y(t))^n)\, .\] That is, integration does not change the order in \(\delta y(t)\) (which, once more, when talking about action functionals I define to be really just the order in \(\epsilon\)).
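To make the \(\epsilon\)-expansion completely concrete, here is a minimal sympy sketch for the functional \(S[y]=\int_0^1 y(t)^2\,dt\) introduced above; the specific choices of \(y(t)\) and \(z(t)\) are hypothetical, picked purely for illustration:

```python
# Minimal sympy sketch: expand S[y + eps*z] in eps for S[y] = int_0^1 y^2 dt,
# with a sample path y(t) and a variation z(t) vanishing at the endpoints.
import sympy as sp

t, eps = sp.symbols("t epsilon")
y = t            # sample path (illustrative choice)
z = t * (1 - t)  # sample variation with z(0) = z(1) = 0

S = sp.integrate((y + eps * z)**2, (t, 0, 1))
print(sp.expand(S))              # -> 1/3 + eps/6 + eps^2/30
print(S.diff(eps).subs(eps, 0))  # -> 1/6: nonzero, so y(t) = t is not
                                 #    a stationary path of S
```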

Because of our interest in dynamical problems, we will often refer to the functions \(y(t)\) as paths, so that the conditions above define what a “stationary path” is.

We are now in a position to introduce the action principle. Assume that we have an action functional (or simply action) \(S\colon \{\mathrm{functions}\}\to \mathbb{R}\), which takes a function and returns a real number. In the Lagrangian formalism all of the physical content of the theory can be summarized in the choice of action functional.

For now we also assume that we have a particle moving in one dimension, and we want to determine its motion. Its trajectory is given by a function \(x(t)\), with \(t\) the time coordinate. For many physical problems the equations of motion are second order in \(x(t)\), so we need data to fix two integration constants. In the Lagrangian formalism these are given by fixing the initial and final positions. That is, we will assume that we know the initial position \(x(t_0)\) of the particle at time \(t_0\), and its final position \(x(t_1)\) at time \(t_1\).

The action principle3 then states that for arbitrary smooth small deformations \(\delta x(t)\) around the “true” path \(x(t)\) (that is, the path that the particle will actually follow) we have \[\delta S :=S[x+\delta x] - S[x] = \mathcal{O}((\delta x)^2)\, .\]

Or in other words:

Action principle:
The paths described by particles are stationary paths of \(S\).

In a moment we will need an important result known as the fundamental lemma of the calculus of variations. It goes as follows:

[Fundamental lemma of the calculus of variations] Consider a function \(f(x)\) continuous in the interval \([a,b]\) (we assume \(a<b\)) such that \[\int_a^b f(x) g(x) \, dx = 0\] for all smooth functions \(g(x)\) in \([a,b]\) such that \(g(a)=g(b)=0\). Then \(f(x)=0\) for all \(x\in[a,b]\).

I will prove the result by contradiction. Assume that there is a point \(p\in (a,b)\) such that \(f(p) > 0\) (the case \(f(p) <0\) can be proven analogously). By continuity of \(f(x)\), there will be some non-vanishing interval \([p_0,p_1]\) where \(f(x)>0\). Construct \[g(x) = \begin{cases} \nu(x-p_0)\nu(p_1-x) & \text{if } x\in[p_0,p_1]\\ 0 & \text{otherwise.} \end{cases}\] with \(\nu(x)=\exp(-1/x)\). It is an interesting exercise in first year calculus to prove that this function is smooth everywhere, including at \(p_0\) and \(p_1\) (it is an example of a “bump function”, a class of functions useful in many domains). Clearly \(f(x)g(x)>0\) for \(x\in(p_0,p_1)\), and it vanishes otherwise. This implies that \[\int_a^b f(x) g(x)\, dx = \int_{p_0}^{p_1} f(x) g(x) \, dx > 0\] which is a contradiction.
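If you want to see this bump function explicitly, the following is a minimal numerical sketch (assuming numpy; the values of \(p_0\) and \(p_1\) are illustrative):

```python
# Minimal numpy sketch of the bump function g(x) = nu(x - p0) * nu(p1 - x),
# with nu(x) = exp(-1/x) for x > 0 and nu(x) = 0 otherwise.
import numpy as np

def nu(x):
    # The inner where avoids division by zero; the outer where discards it.
    return np.where(x > 0, np.exp(-1.0 / np.where(x > 0, x, 1.0)), 0.0)

def g(x, p0=0.2, p1=0.8):
    return nu(x - p0) * nu(p1 - x)

x = np.linspace(0.0, 1.0, 11)
print(g(x))  # strictly positive inside (p0, p1), exactly zero outside
```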

Now, for the systems that we will study during this term, it will be the case that \(S\) can be expressed in a particularly nice way as the time integral of a Lagrangian. That is, we will have \[S[x] = \int_{t_0}^{t_1} \! dt \, L(x(t), \dot{x}(t))\] for some function \(L(a,b)\) of two real variables, where \(\dot{x}(t):=\frac{dx}{dt}\).

Whenever a Lagrangian exists, the variational principle together with the fundamental lemma of the calculus of variations leads to a set of differential equations that determine \(x(t)\). The argument is as follows. If we Taylor expand the perturbed Lagrangian to first order in \(\delta x(t)\) we get4 \[L(x(t)+\delta x(t), \dot{x}(t)+\delta \dot{x}(t)) = L(x(t),\dot{x}(t)) + \frac{\partial L}{\partial x}\delta x(t) + \frac{\partial L}{\partial\dot x} \delta \dot{x}(t) + \ldots\] Putting this expansion of the Lagrangian into the variation of the action we have \[\begin{equation} \begin{split} \delta S &= \int_{t_0}^{t_1} \! dt \, L(x(t)+\delta x(t), \dot{x}(t)+\delta \dot{x}(t)) - L(x(t), \dot{x}(t))\\ & = \int_{t_0}^{t_1}\! dt\, \left(\frac{\partial L}{\partial x}\delta x(t) + \frac{\partial L}{\partial \dot{x}}\delta \dot{x}(t)\right) \end{split} \label{eq:Lagrangian-variation} \end{equation}\] where we have omitted terms of second order or higher in \(\delta x\).5 For notational simplicity I will often write \(\partial L/\partial x\) instead of the more precise but much more cumbersome \[\frac{\partial L(r, s)}{\partial r}\biggr|_{(r,s)=(x(t),\dot{x}(t))}\] where \((r,s)\) are names for the two arguments of the Lagrangian \(L\) (which are conventionally, but somewhat confusingly, also named \(x\) and \(\dot{x}\), a convention that I will follow most of the time… but here I want to be as clear as possible about what I mean). Similarly \[\frac{\partial L}{\partial \dot{x}}:=\frac{\partial L(r, s)}{\partial s}\biggr|_{(r,s)=(x(t),\dot{x}(t))}\, .\]

We proceed by noting that \(\delta{\dot x(t)}=\frac{d}{dt}(\delta x(t))\), so we can write the above as \[\delta S = \int_{t_0}^{t_1}\! dt\, \left(\frac{\partial L}{\partial x}\delta x(t) + \frac{\partial L}{\partial \dot{x}}\frac{d}{dt}\delta x(t)\right)\, .\] Integration by parts of the second term6 now allows us to rewrite this as \[\delta S = \int_{t_0}^{t_1}\! dt\, \left[\left(\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right)\right)\delta x(t) + \frac{d}{dt}\left(\delta x(t)\frac{\partial L}{\partial \dot{x}}\right)\right]\, .\] The last term is a total derivative, so we can integrate it trivially to give: \[\delta S = \left[\delta x(t)\frac{\partial L}{\partial \dot{x}}\right]_{t_0}^{t_1} + \int_{t_0}^{t_1}\! dt\, \left(\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right)\right)\delta x(t)\, .\] Now we have that \(\delta x(t_0)=\delta x(t_1)=0\), as the paths that we consider all start and end on the same positions. This implies that the first term vanishes, so \[\delta S = \int_{t_0}^{t_1}\! dt\, \left(\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right)\right)\delta x(t)\, .\]

Recall that the action principle demands that this variation vanishes (to first order in \(\delta x(t)\), i.e. ignoring possible terms that we have not written) for arbitrary \(\delta x(t)\). By the fundamental lemma of the calculus of variations, the only way that this could possibly be true is if the function multiplying \(\delta x(t)\) in the integral vanishes:

\[\frac{\partial L}{\partial x} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}}\right) = 0\, .\]

This is known as the Euler-Lagrange equation, in the case of one-dimensional problems.
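The whole derivation can be automated symbolically. Here is a minimal sketch (assuming sympy) that forms the Euler-Lagrange equation for the sample Lagrangian \(L=\frac{1}{2}m\dot{x}^2-V(x)\), with \(V\) left generic:

```python
# Minimal sympy sketch: form dL/dx - d/dt(dL/dxdot) = 0 for
# L = m*xdot^2/2 - V(x), treating x and xdot as independent arguments.
import sympy as sp

t, m = sp.symbols("t m", positive=True)
x = sp.Function("x")(t)
V = sp.Function("V")

L = m * x.diff(t)**2 / 2 - V(x)

eom = L.diff(x) - sp.diff(L.diff(x.diff(t)), t)
print(sp.Eq(eom, 0))  # -> -m*x'' - V'(x) = 0, i.e. Newton's second law
```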

There is a somewhat subtle point in the Lagrangian formulation that I want to make explicit. Note that the Lagrangian \(L\) is an ordinary function of two parameters, and it knows nothing about paths. (In general it is a function of \(2N\) parameters, with \(N\) the number of “generalised coordinates” that we need to describe the system, see below.) Let me emphasize this by writing \(L(r,s)\). When using the Lagrangian function to construct the action we evaluated the Lagrangian function at \((r,s)=(x(t), \dot{x}(t))\) at each instant in time, but it is important to keep in mind that the Lagrangian itself treats \(r\) and \(s\) as independent variables: they are simply the two arguments to the function.

In general, if we want to study how this function changes under small displacements of \(r\) and \(s\) we would use the chain rule: \[L(r+\delta r, s+\delta s) = L(r,s) + \frac{\partial L}{\partial r}\delta r + \frac{\partial L}{\partial s}\delta s + \ldots\] where the dots denote terms of higher order in \(\delta r\) and \(\delta s\). This is what we did above in \(\eqref{eq:Lagrangian-variation}\), again with \((r,s)=(x(t), \dot{x}(t))\).

What this all means is that the partial derivatives appearing in the Euler-Lagrange equations treat the first and second arguments of the Lagrangian function independently, leading to the somewhat funny-looking rules: \[\begin{equation} \label{eq:Lagrangian-x-xdot-partials} \frac{\partial x}{\partial \dot{x}} = \frac{\partial \dot{x}}{\partial x} = 0\, . \end{equation}\] This would probably be a little clearer if we used a different notation for \(\dot{x}\) (such as \(v\)) when writing Lagrangians, to emphasize that in the Lagrangian formalism \(\dot{x}\) should be treated as a variable which is entirely independent of \(x\) itself. But I will stick to the standard (if somewhat puzzling at first) notation, with the understanding that in the Lagrangian formalism one should impose \(\eqref{eq:Lagrangian-x-xdot-partials}\).

This also makes clear that \(\eqref{eq:Lagrangian-x-xdot-partials}\) is not something you should generically expect to hold outside the Lagrangian formalism. And indeed, when we study the Hamiltonian framework below this rule will be replaced by a different one.

2.2 Configuration space and generalized coordinates

We now want to extend the Lagrangian formalism to deal with more general situations, beyond the rather special case of a particle moving in one dimension. We start with

The set of all possible (in principle) instantaneous configurations for a given physical system is known as configuration space. We will denote it by \(\mathcal{C}\).

It is important to note that this includes positions, but it does not include velocities. One informal way of thinking about configuration space is as the space of all distinct photographs one can take of the system, at least in principle.

Additionally, in constructing configuration space we make no statement about dynamics: we need to construct configuration space before we construct a Lagrangian, which tells us about the dynamics in this configuration space.

A particle moving in \(\mathbb{R}^d\) (that is, \(d\)-dimensional euclidean space) has configuration space \(\mathbb{R}^d\). We discussed the \(d=1\) example above, where we had a particle moving in the one dimensional line \(\mathbb{R}\), which we parametrized by the coordinate \(x\).

\(N\) particles moving freely in \(\mathbb{R}^d\) have configuration space \((\mathbb{R}^{d})^N=\mathbb{R}^{dN}\). (Assuming that we can always distinguish the particles. I leave it as an amusing exercise to work out the configuration space if you cannot distinguish the particles.)

\(N\) electrically charged particles moving in \(\mathbb{R}^d\): since the particles are electrically charged, they repel or attract each other. But this is a dynamical question, and configuration space is insensitive to such matters: it is still \(\mathbb{R}^{dN}\). One way to see this is that you can always place a set of \(N\) particles at any desired positions in \(\mathbb{R}^d\) (barring some singular points where particles overlap, so that the system has infinite energy and is unphysical, but we can ignore such subtleties here). After being released, the particles will subsequently move in a manner described by the Euler-Lagrange equations, but any initial choice is permitted.

Two particles joined by a rigid rod of length \(\ell\) in \(d\) dimensions: without the rod the configuration space is \(\mathbb{R}^{2d}\), but the rod introduces the constraint that the particles are at fixed distance \(\ell\) from each other. This can be written as \[||\vec{x}_1 - \vec{x}_2||^2 = \ell^2\] where \(\vec{x}_1\) and \(\vec{x}_2\) are the positions of the two particles in \(\mathbb{R}^d\). The configuration space is \(2d-1\) dimensional, given by the surface defined by this equation inside \(\mathbb{R}^{2d}\).

Finally, consider a rigid body in \(\mathbb{R}^3\), a desk for instance. We can view it as formed by \(\sim 10^{27}\) atoms, joined by atomic forces. But for the purposes of classical mechanics we certainly do not care about the motion of the individual atoms (we are doing classical mechanics, not quantum mechanics, so we would get the answer wrong anyway, even if we could compute it!). Rather, for classical dynamics we can think about the classical configurations that the desk can take. And this is a six-dimensional space, given (for instance) by the position of the centre of mass of the desk, and three rotational angles.

Given a configuration space \(\mathcal{C}\) for a physical system \(\mathcal{S}\), we say that \(\mathcal{S}\) has \(\dim(\mathcal{C})\) degrees of freedom.

Although it is illuminating to think of configuration space abstractly, in practice we will often want to put coordinates on it, so that we can write and analyse concrete equations. I emphasize that this is a matter of convenience: any choice of coordinate system is equally valid, and the Lagrangian formalism holds regardless of the choice.

Given a configuration space \(\mathcal{C}\), any set of coordinates in this space is known as a set of generalized coordinates. Conventionally, when we want to indicate that some equation holds for arbitrary choices of generalized coordinates, we will use “\(q_i\)” for the coordinate names, with \(i\in\{1,\ldots,\dim(\mathcal{C})\}\), and “\(\mathbf{q}\)” (without indices) for the coordinate vector with components \(q_i\).

Consider the case of a particle moving on \(\mathbb{R}^2\). The configuration space is \(\mathbb{R}^2\). There are two natural sets of coordinates in this space (although I emphasize again that any choice is valid): we could choose the ordinary Cartesian coordinates \((x,y)\), or it might be more convenient to choose polar coordinates \(r,\theta\) satisfying \[\begin{split} x & = r\cos(\theta)\, ,\\ y & = r\sin(\theta)\, . \end{split}\]

Consider instead the case of a bead attached to a circular wire of unit radius on \(\mathbb{R}^2\), defined by the equation \(x^2+y^2=1\). The configuration space is the circle, \(S^1\). A possible coordinate in this space is the angular variable \(\theta\) appearing in the description of the circle in polar coordinates.

We will only be dealing with unconstrained generalized coordinates when describing configuration space. That is, we want a set of exactly \(\dim(\mathcal{C})\) coordinates (and no more) that describes, at least locally, the geometry of the configuration space \(\mathcal{C}\). So in example [ex:circle] we can take \(\theta\) as our generalized coordinate, but we do not want to consider \((x,y)\) as generalized coordinates, as they are subject to the constraint \(x^2+y^2=1\). While there is nothing wrong geometrically with such systems of coordinates, the existence of the constraint implies that we cannot vary \(x\) and \(y\) independently in our variational problem (as we will implicitly do below), and this complicates the analysis of the problem somewhat. So, for simplicity, we will just declare that henceforth we are dealing with unconstrained systems of generalized coordinates.

We can now repeat the derivation of the Euler-Lagrange equations for a general configuration space \(\mathcal{C}\). Consider a general path in configuration space given by \(\mathbf{q}(t)\in\mathcal{C}\),7 and assume the existence of a Lagrangian function, \(L(\mathbf{q},\mathbf{\dot{q}})\), such that the action for the path is given by \[S= \int_{t_0}^{t_1}\! dt\, L(\mathbf{q}(t), \mathbf{\dot{q}}(t))\, .\] The variational principle states that, if we fix the initial and final positions in configuration space, that is \(\mathbf{q}(t_0)=\mathbf{q}^{(0)}\) and \(\mathbf{q}(t_1)=\mathbf{q}^{(1)}\), the path taken by the physical system satisfies \[\delta S = 0\] to first order in \(\delta \mathbf{q}(t)\). The derivation runs parallel to the one above (here \(N:=\dim(\mathcal{C})\)): \[\begin{split} \delta S & = \int_{t_0}^{t_1} \! dt \, \sum_{i=1}^N \frac{\partial L}{\partial q_i}\delta q_i + \sum_{i=1}^N \frac{\partial L}{\partial \dot{q}_i}\delta \dot{q}_i \\ & = \int_{t_0}^{t_1} \! dt \, \sum_{i=1}^N \frac{\partial L}{\partial q_i}\delta q_i + \sum_{i=1}^N \frac{\partial L}{\partial \dot{q}_i} \frac{d}{dt}(\delta {q}_i) \\ & = \int_{t_0}^{t_1} \! dt \, \sum_{i=1}^N \left[\left(\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\right)\delta {q}_i + \frac{d}{dt}\left(\delta q_i \frac{\partial L}{\partial \dot{q}_i}\right)\right] \\ & = \left[\sum_{i=1}^N\delta q_i \frac{\partial L}{\partial \dot{q}_i}\right]_{t_0}^{t_1} + \int_{t_0}^{t_1} \! dt \, \sum_{i=1}^N\left( \frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\right)\delta {q}_i\, . \end{split}\] As mentioned above, we are dealing with unconstrained coordinates, meaning that we can vary the \(q_i\) independently in configuration space. Since there are \(\dim(\mathcal{C})\) independent coordinates, applying the fundamental lemma of the calculus of variations leads to the system of \(\dim(\mathcal{C})\) equations

\[\begin{equation} \label{eq:Euler-Lagrange} \frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) = 0 \qquad \forall i\in\{1,\ldots,\dim(\mathcal{C})\} \end{equation}\]

known as the Euler-Lagrange equations. I want to emphasize the fact that we have not made any assumptions about the specific choice of coordinate system used in deriving these equations, so the Euler-Lagrange equations are valid in any coordinate system.8

We emphasized in note [note:partial-derivative-subtlety] above that in the case of systems with one degree of freedom the Lagrangian is a function of the coordinate \(x\) (a coordinate in the one-dimensional configuration space) and \(\dot{x}\), and these should be treated as independent variables when writing down the Euler-Lagrange equations for the system.

Similarly, for \(N\)-dimensional configuration spaces, with generalized coordinates \(q_i\) with \(i\in\{1,\ldots,N\}\), we have in the Lagrangian formalism \[\frac{\partial q_i}{\partial \dot{q}_j} = \frac{\partial\dot{q}_i}{\partial q_j} = 0\ \] and \[\frac{\partial q_i}{\partial q_j} = \frac{\partial \dot{q}_i}{\partial \dot{q}_j} = \delta_{ij} = \begin{cases} 1 & \text{if } i=j\, ,\\ 0 & \text{otherwise.} \end{cases}\]

We will later on include the possibility of Lagrangians that depend on time explicitly. We indicate this as \(L(\mathbf{q},\mathbf{\dot{q}}, t)\); an example could be \(L=\frac{1}{2}m\dot{x}^2 - \frac{1}{2}t^2x^2\).

This is a mild modification of the discussion above, and it does not affect the form of the Euler-Lagrange equations, but there are a couple of things to keep in mind:

  1. When taking partial derivatives, \(t\) should be taken to be independent from \(\mathbf{q}\) and \(\mathbf{\dot{q}}\). The reasoning for this is as in note [note:partial-derivative-subtlety]: the Lagrangian is now a function of \(2\dim(\mathcal{C})+1\) arguments (the generalized coordinates, their velocities, and time), which are unrelated to each other. It is only when we use the Lagrangian to build the action that the parameters become related, but the partial derivatives that appear in the functional variation do not care about this, since they arise in computing the variation of the action under small changes in the path.

    For instance, for \(L=\frac{1}{2}m\dot{x}^2 - \frac{1}{2}t^2x^2\) we have \[\frac{\partial L}{\partial \dot{x}} = m\dot{x}\qquad ; \qquad \frac{\partial L}{\partial x} = -t^2x \qquad ; \qquad \frac{\partial L}{\partial t} = -tx^2\, .\]

  2. Since in extremizing the action we change the path, but leave the time coordinate untouched, there is no Euler-Lagrange equation associated to \(t\). In the example above there would be a single Euler-Lagrange equation, of the form \[\frac{d}{dt}\left (\frac{\partial L}{\partial {\dot x }}\right )-\frac{\partial L}{\partial x} = m\ddot{x} + t^2x = 0\, .\]

2.3 Lagrangians for classical mechanics

So far we have kept \(L(\mathbf{q},\mathbf{\dot{q}})\) unspecified. How should we choose the Lagrangian in order to reproduce the classical equations of motion? Ultimately, this needs to be decided by experiment, but in problems in classical mechanics there is a very simple prescription, which I will now state. Consider a system with kinetic energy \(T(\mathbf{q},\mathbf{\dot{q}})\) and potential energy \(V(\mathbf{q})\). Then the Lagrangian that leads to the right equations of motion is

\[L = T - V\, .\]

Let us see that this gives the right equations of motion in the simple case of a particle moving in three dimensions. The configuration space is \(\mathbb{R}^3\), and if we choose Cartesian coordinates \(x_i\) (that is, we choose \(q_i=x_i\)) we have \[T = \frac{1}{2}m(\dot{x}_1^2 + \dot{x}_2^2 + \dot{x}_3^2)\] and \(V=V(x_1,x_2,x_3)\). Note, in particular, that \(T\) depends only on \(\dot{x}_i\), and \(V\) depends on \(x_i\) only. We have three degrees of freedom, so we have three Euler-Lagrange equations, given by \[\begin{split} 0 & = \frac{\partial L}{\partial x_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{x}_i}\right) \\ & = -\frac{\partial V}{\partial x_i} - m\frac{d}{dt}(\dot{x}_i) \\ & = -\frac{\partial V}{\partial x_i} - m\ddot{x}_i \end{split}\] where we have used that \(\frac{\partial V}{\partial \dot{x}_i}=0\) and \(\frac{\partial T}{\partial x_i}=0\), since \(x_i\) and \(\dot{x}_i\) are independent variables in the Lagrangian formalism, as we explained above. We can rewrite the equations above in vector notation as \[m\frac{d^2}{dt^2}(\vec{x}) = -\vec{\nabla} V\] which is precisely Newton’s second law for a conservative force \(\vec{F}=-\vec{\nabla} V\).
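Here is a minimal sympy sketch of the same check, done for all three coordinates at once with a generic potential:

```python
# Minimal sympy sketch: the Euler-Lagrange equations of L = T - V for a
# particle in three dimensions reproduce m*xddot_i = -dV/dx_i.
import sympy as sp

t, m = sp.symbols("t m", positive=True)
xs = [sp.Function(f"x{i}")(t) for i in (1, 2, 3)]
V = sp.Function("V")(*xs)

L = m * sum(x.diff(t)**2 for x in xs) / 2 - V

for x in xs:
    eom = L.diff(x) - sp.diff(L.diff(x.diff(t)), t)
    print(sp.Eq(eom, 0))  # -> -dV/dx_i - m*x_i'' = 0 for each i
```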

The simplest example of the discussion so far is the free particle of mass \(m\) moving in \(d\) dimensions. Its configuration space is \(\mathbb{R}^d\), which we can parametrize using Cartesian coordinates \(x_i\). In these coordinates the kinetic energy is given by \[T = \frac{1}{2}m\sum_{i=1}^d \dot{x}_i^2\] and the potential energy \(V\) vanishes. This gives a Lagrangian \[L = T - V = \frac{1}{2}m\sum_{i=1}^d \dot{x}_i^2\] which leads to the \(d\) Euler-Lagrange equations of motion \[m\ddot{x}_i = 0 \quad \forall i\in\{1,\ldots,d\}\, .\] These equations are solved by the particle moving at constant velocity, \(x_i=v_i t+b_i\), with \(v_i, b_i\) constants.

Our second example will be a pendulum moving under the influence of gravity. Our conventions will be as in figure 1: we have a mass \(m\) attached by a rigid massless rod of length \(\ell\) to a fixed point at the origin. The pendulum can swing on the \((x,y)\) plane. The configuration space of the system is \(S^1\). We choose as a coordinate the angle \(\theta\) of the rod with the downward vertical axis from the origin, measured counterclockwise. The whole system is affected by gravity, which acts downwards.

Figure 1: The pendulum discussed in example [ex:pendulum].

We now need to compute the kinetic and potential energy in terms of \(\theta\). The expression of the kinetic energy in the \((x,y)\) coordinates is \(\frac{1}{2}m(\dot{x}^2 + \dot{y}^2)\). In terms of \(\theta\) we have \[x = \ell \sin(\theta)\qquad \text{and}\qquad y = -\ell \cos(\theta)\, .\] This implies \(\dot{x}=\ell\cos(\theta)\dot{\theta}\) and \(\dot{y}=\ell\sin(\theta)\dot{\theta}\), so \[T = \frac{1}{2}m \ell^2 \dot{\theta}^2\,.\] The potential energy, in turn, is (up to an irrelevant additive constant) given by \[V = mgy = -mg\ell\cos(\theta)\] leading to the Lagrangian \[L = T-V = \frac{1}{2}m\ell^2\dot{\theta}^2 + mg\ell\cos(\theta)\, .\] The corresponding Euler-Lagrange equations are \[m\ell^2 \ddot{\theta} + mg\ell\sin(\theta) = 0\] or equivalently \[\ddot{\theta} + \frac{g}{\ell}\sin(\theta) = 0\, .\] The exact solution of this system requires using something known as elliptic integrals, but as a simple check of our solution, note that for small angles \(\sin(\theta)\approx \theta\), and the Euler-Lagrange equation reduces to \[\ddot{\theta} + \frac{g}{\ell}\theta = 0\] with solution \(\theta(t)=a\sin(\omega t) + b\cos(\omega t)\), where \(\omega=\sqrt{g/\ell}\), and \(a,b\) are arbitrary constants that encode initial conditions. These are the simple oscillatory solutions that one expects close to \(\theta=0\).
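As a numerical sanity check (a sketch assuming numpy and scipy; the parameter values and the initial angle are illustrative choices), we can integrate the full equation of motion and compare against the small-angle solution:

```python
# Minimal numerical sketch: integrate thetaddot = -(g/l)*sin(theta) and
# compare with the small-angle solution theta0*cos(omega*t).
import numpy as np
from scipy.integrate import solve_ivp

g, ell = 9.81, 1.0
omega = np.sqrt(g / ell)

def rhs(t, y):
    theta, thetadot = y
    return [thetadot, -(g / ell) * np.sin(theta)]

theta0 = 0.1  # small initial angle, released from rest
ts = np.linspace(0, 5, 101)
sol = solve_ivp(rhs, (0, 5), [theta0, 0.0], t_eval=ts,
                rtol=1e-10, atol=1e-12)

print(np.max(np.abs(sol.y[0] - theta0 * np.cos(omega * ts))))
# small for theta0 << 1, as expected from the approximation sin(theta) ~ theta
```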

Consider instead a spring with a mass attached to it. The spring is attached on one end to the origin, but it is otherwise free to rotate on the \((x,y)\) plane, without friction. In this case we ignore the effect of gravity, and we assume that the spring has vanishing natural length, and spring constant \(\kappa\). The configuration is shown in figure 2.

Figure 2: The rotating spring studied in example [ex:rotating-spring].

In this case the configuration space is \(\mathbb{R}^2\). It is easiest to solve the Euler-Lagrange equations in Cartesian coordinates. We have the kinetic energy \[T = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2)\, .\] The potential energy is given by half the spring constant times the square of the extension of the spring. We are assuming that the natural length of the spring is 0, so the extension of the spring is \(\ell = \sqrt{x^2+y^2}\). So the potential energy is \[V = \frac{1}{2}\kappa \ell^2 = \frac{1}{2}\kappa (x^2+y^2) \, .\] Putting everything together, we find that \[L = T-V = \frac{1}{2}m (\dot{x}^2+\dot{y}^2) - \frac{1}{2}\kappa (x^2+ y^2)\, .\] The Euler-Lagrange equations split into independent equations for \(x\) and \(y\), given by \[\begin{split} \ddot{x} + \frac{\kappa}{m}x &= 0\, ,\\ \ddot{y} + \frac{\kappa}{m}y & =0\, . \end{split}\] The general solution is then simply \[\begin{split} x(t)&=a_x\sin(\omega t) + b_x\cos(\omega t)\, ,\\ y(t)&=a_y\sin(\omega t) + b_y\cos(\omega t)\, , \end{split}\] with \(a_x,a_y,b_x,b_y\) constants encoding the initial conditions, and \(\omega=\sqrt{\kappa/m}\).

Let us try to solve this last example in polar coordinates \(r,\theta\). These are related to Cartesian coordinates by \[\begin{split} x & = r\cos(\theta)\, ,\\ y & = r\sin(\theta)\, . \end{split}\] Taking time derivatives, and using the Chain Rule for time derivatives, we find \[\begin{split} \dot{x} & = \dot{r}\cos(\theta) - r\sin(\theta)\dot{\theta}\, ,\\ \dot{y} & = \dot{r}\sin(\theta) + r\cos(\theta)\dot{\theta}\, . \end{split}\] A little bit of algebra then shows that \[T = \frac{1}{2}m(\dot{x}^2 + \dot{y}^2) = \frac{1}{2}m(\dot r^2 + r^2\dot{\theta}^2)\, .\] On the other hand, the potential energy is simpler. We have \[V = \frac{1}{2}\kappa (x^2+y^2) = \frac{1}{2}\kappa r^2\, .\] We thus find that the Lagrangian in polar coordinates is \[L = T-V = \frac{1}{2}m(\dot r^2 + r^2\dot{\theta}^2) - \frac{1}{2}\kappa r^2\, .\] Let us write the Euler-Lagrange equations. For the coordinate \(r\) we have \[\frac{d}{dt}\left (\frac{\partial L}{\partial {\dot r }}\right )-\frac{\partial L}{\partial r} = m\ddot{r} - mr\dot{\theta}^2 + \kappa r = 0\] while for the \(\theta\) coordinate we have \[\frac{d}{dt}\left (\frac{\partial L}{\partial {\dot \theta }}\right )-\frac{\partial L}{\partial \theta} = \frac{d}{dt}\left(mr^2\dot{\theta}\right) = 0\, .\] This equation is quite remarkable: it tells us that there is a conserved quantity in this system, given by \(mr^2\dot{\theta}\). This was not obvious at all in the Cartesian formulation of the problem,9 but it follows immediately in polar coordinates, since the Lagrangian does not depend on \(\theta\), only on \(\dot{\theta}\), and accordingly \(\partial L/\partial \theta=0\). We can use this knowledge to simplify the problem. Define \[J :=mr^2\dot{\theta}\, .\] This is a constant of motion, so on any given classical trajectory it is simply a real number fixed by initial conditions. We can use this knowledge to simplify the Euler-Lagrange equation for \(r\), which after replacing \(\dot{\theta}=J/mr^2\) becomes an equation purely in terms of \(r\): \[m\ddot{r} - mr\left(\frac{J}{mr^2}\right)^2 + \kappa r = m\ddot{r} - \frac{J^2}{mr^3} + \kappa r = 0\, .\]
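Here is a minimal numerical sketch (assuming numpy and scipy, with illustrative parameters) verifying that \(J=mr^2\dot{\theta}=m(x\dot{y}-y\dot{x})\) is indeed constant along solutions of the Cartesian equations of motion:

```python
# Minimal numerical sketch: m*r^2*thetadot = m*(x*ydot - y*xdot) should be
# constant along solutions of xddot = -(kappa/m)*x, yddot = -(kappa/m)*y.
import numpy as np
from scipy.integrate import solve_ivp

m, kappa = 1.0, 2.0

def rhs(t, s):
    x, y, vx, vy = s
    return [vx, vy, -(kappa / m) * x, -(kappa / m) * y]

sol = solve_ivp(rhs, (0, 10), [1.0, 0.0, 0.3, 0.7],
                t_eval=np.linspace(0, 10, 5), rtol=1e-10, atol=1e-12)
x, y, vx, vy = sol.y
print(m * (x * vy - y * vx))  # -> [0.7 0.7 0.7 0.7 0.7]
```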

2.4 Ignorable coordinates and conservation of generalised momenta

It is useful to formalize what we just saw happen in example [ex:rotating-spring].

Given a set \(\{q_1,\ldots,q_N\}\) of generalized coordinates, we say that a specific coordinate \(q_i\) in the set is ignorable if the Lagrangian function, expressed in these generalised coordinates, does not depend on \(q_i\). That is, a coordinate is ignorable iff \[\frac{\partial L(q_1,\ldots,q_N,\dot{q}_1,\ldots,\dot{q}_N)}{\partial q_i} = 0\, .\]

The generalized momentum \(p_i\) associated to a generalized coordinate is \[p_i:=\frac{\partial L}{\partial \dot{q}_i}\, .\]

With these two definitions in place we have

The generalized momentum associated to an ignorable coordinate is conserved.

This follows immediately from the Euler-Lagrange equation for the ignorable coordinate. Denoting the ignorable coordinate \(q_i\) and its associated generalized momentum \(p_i\), we have \[\frac{d}{dt}\left (\frac{\partial L}{\partial {\dot q_i }}\right )-\frac{\partial L}{\partial q_i} = \frac{dp_i}{dt} - 0 = \frac{dp_i}{dt} = 0\, .\]

We already found an ignorable coordinate in example [ex:rotating-spring]: there, \(\theta\) was ignorable, and its associated generalized momentum is \[p_\theta = \frac{\partial L}{\partial\dot{\theta}} = mr^2\dot{\theta} \, .\]

An even simpler example is the free particle moving in \(d\) dimensions. In Cartesian coordinates we have \[L = T-V = \frac{1}{2} m \sum_{i=1}^d \dot{x}_i^2\, ,\] so every coordinate is ignorable. The associated generalized momenta are \[p_i = \frac{\partial L}{\partial \dot{x}_i} = m\dot{x}_i\, .\] In this case conservation of generalized momenta is simply conservation of linear momentum.

Let us look again at the free particle, but this time in two dimensions (\(d=2\)), and in polar coordinates. We have \[L = T - V = \frac{1}{2}m(\dot{r}^2 + r^2\dot{\theta}^2)\, .\] We have that \(\theta\) is ignorable. The associated conserved generalized momentum is \[p_\theta = \frac{\partial L}{\partial \dot{\theta}} = mr^2\dot{\theta}\, .\] You might recognize this as the angular momentum of the particle (that is, the cross product of the position vector and the linear momentum), which should indeed be conserved for the free particle.
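A minimal sympy sketch of this computation:

```python
# Minimal sympy sketch: theta is ignorable for the free particle in polar
# coordinates, and its generalized momentum is m*r^2*thetadot.
import sympy as sp

t, m = sp.symbols("t m", positive=True)
r = sp.Function("r")(t)
theta = sp.Function("theta")(t)

L = m * (r.diff(t)**2 + r**2 * theta.diff(t)**2) / 2

print(L.diff(theta))          # -> 0, so theta is ignorable
print(L.diff(theta.diff(t)))  # -> m*r(t)^2*thetadot: the conserved momentum
```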

3 Symmetries, Noether’s theorem and conservation laws

3.1 Ordinary symmetries

Our discussion of ignorable coordinates hints at a connection between symmetries and conservation laws: the fact that the Lagrangian does not depend on \(q_i\) can be rephrased as the statement that the Lagrangian is invariant under the transformation \(q_i\to q_i + \epsilon a_i\), with \(\epsilon a_i\) an arbitrary constant shift. (We will define all these concepts more carefully momentarily.) And we saw that whenever this happens, there is a conserved quantity, the generalized momentum \(p_i\).

This result is somewhat unsatisfactory, in that we can only understand the appearance of the conserved charges in carefully chosen coordinate systems. And, as we saw in the example of the free particle above, we might need to patch together results in different coordinate systems in order to access all the conserved charges in the system.

Noether’s theorem fixes these deficiencies, providing a coordinate-independent connection between symmetries and conservation laws. Before we get to the theorem itself, we will need some preliminary results and definitions.

Consider a uniparametric family of smooth maps \(\varphi(\epsilon)\colon \mathcal{C}\to\mathcal{C}\) from configuration space to itself, with the property that \(\varphi(0)\) is the identity map. We call this family of maps a transformation depending on \(\epsilon\). In any given coordinate system we can write the transformation as \[q_i \to \phi_i(q_1, \ldots, q_N,\epsilon)\] with \(\phi_i\) a set of \(N:=\dim(\mathcal{C})\) functions representing the finite transformation in the given coordinate system. We take the change in velocities to be \[\dot{q}_i \to \frac{d}{dt}\phi_i\, .\]

At the level of the Lagrangian we treat \(q_i\) and \(\dot{q}_i\) as independent variables, so it is not automatic that the transformation of the velocities \(\dot{q}_i\) is as given. One should take the prescription \(\dot{q}_i \to \frac{d}{dt}\phi_i\) as part of the definition above.

A word on notation: when it is clear from the context which transformation we are talking about, we often write \(q_i'\) instead of \(\phi_i(\mathbf{q}, \epsilon)\). That is, we often write \[q_i \to q_i' = \ldots\] where the omitted terms are some function of \(q_i\) and \(\epsilon\).

The generator of \(\varphi\) is \[\frac{d\varphi(\epsilon)}{d\epsilon}\Big|_{\epsilon=0} :=\lim_{\epsilon\to 0} \frac{\varphi(\epsilon)-\varphi(0)}{\epsilon}\, .\] In any given coordinate system we have \[q_i\to \phi_i(\mathbf{q},\epsilon) = q_i + \epsilon a_i(\mathbf{q}) + \mathcal{O}(\epsilon^2)\] where \[a_i = \frac{\partial\phi_i(\mathbf{q},\epsilon)}{\partial\epsilon}\Big|_{\epsilon=0}\] is a function of the generalized coordinates. So, in coordinates, the generator of the transformation is \(a_i\). Similarly, for the velocities we have \[\dot{q}_i \to \dot{q}_i + \epsilon \dot{a}_i(q_1,\ldots,q_N,\dot{q}_1,\ldots,\dot{q}_N) + \mathcal{O}(\epsilon^2)\] generated by \(\dot{a}_i\).

A particle moving in \(\mathbb{R}^d\) can be described in Cartesian coordinates \(x_i\). The transformation associated to translations of the origin of coordinates in the first direction is \(x_1\to x_1 + \epsilon\), with the other coordinates constant. So we have that shifts of the coordinate system in the \(x_1\) direction are generated by \[a_i = \begin{cases} 1 & \text{for } i=1\, ,\\ 0 & \text{otherwise} \end{cases}\] and \(\dot{a}_i=0\).

Say that we have a particle moving in two dimensions, and we want to consider the finite transformations given by rotations around the origin. In Cartesian coordinates we have \[\begin{split} x & \to x\cos(\epsilon) - y\sin(\epsilon)\\ y & \to x\sin(\epsilon) + y\cos(\epsilon)\, . \end{split}\] In order to find the generators, we can derive the associated infinitesimal transformations by using the expansions \(\sin(\epsilon) = \epsilon + \mathcal{O}(\epsilon^3)\) and \(\cos(\epsilon) = 1 + \mathcal{O}(\epsilon^2)\). We find \[\begin{split} x & \to x - y\epsilon + \mathcal{O}(\epsilon^2)\\ y & \to y + x\epsilon + \mathcal{O}(\epsilon^2)\, . \end{split}\] This implies that the transformation is generated in Cartesian coordinates by \[a_x = -y \qquad ; \qquad a_y = x \qquad; \qquad \dot{a}_x = -\dot{y} \qquad ; \qquad \dot{a}_y = \dot{x}\, .\]
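Here is a minimal sympy sketch that extracts these generators by expanding the finite rotation in \(\epsilon\):

```python
# Minimal sympy sketch: expand the finite rotation to first order in epsilon
# and read off the generators a_x and a_y.
import sympy as sp

x, y, eps = sp.symbols("x y epsilon")

xp = x * sp.cos(eps) - y * sp.sin(eps)
yp = x * sp.sin(eps) + y * sp.cos(eps)

a_x = xp.series(eps, 0, 2).removeO().coeff(eps)
a_y = yp.series(eps, 0, 2).removeO().coeff(eps)
print(a_x, a_y)  # -> -y x
```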

The equations of motion do not change if we modify the Lagrangian by addition of a total derivative of a function of coordinates and time. That is, \[L \to L + \frac{dF(q_1,\ldots,q_N,t)}{dt}\] does not affect the equations of motion.

Since the term that we add is a total time derivative, the effect on the action is \[\begin{equation} \label{eq:S-symmetry-transform} S = \int_{t_0}^{t_1} \!dt\, L \to S' = S + F(q_1(t_1), \ldots, q_N(t_1), t_1) - F(q_1(t_0), \ldots, q_N(t_0), t_0)\, . \end{equation}\] Now, recall that the variational principle tells us that the equations of motion are obtained by imposing that \(\delta S\) vanishes to first order in \(\delta q_i(t)\), keeping the \(q_i\) fixed at the endpoints of the path. This implies that in the variational problem both \(F(q_1(t_0), \ldots, q_N(t_0), t_0)\) and \(F(q_1(t_1), \ldots, q_N(t_1), t_1)\) are kept fixed. So \[\begin{split} \delta S' & = S'[\mathbf{q}+\delta \mathbf{q}] - S'[\mathbf{q}]\\ & = S[\mathbf{q}+\delta \mathbf{q}] + F(q_1(t_1), \ldots, q_N(t_1), t_1) - F(q_1(t_0), \ldots, q_N(t_0), t_0) \\ & \phantom{=} - \left(S[\mathbf{q}] + F(q_1(t_1), \ldots, q_N(t_1), t_1) - F(q_1(t_0), \ldots, q_N(t_0), t_0)\right) \\ & = S[\mathbf{q}+\delta \mathbf{q}] - S[\mathbf{q}] = \delta S\, . \end{split}\] We learn that the addition of \(\frac{dF}{dt}\) to the Lagrangian does not affect the variation of the action in the variational problem, so it cannot affect the equations of motion.

This result motivates the following definition:

A transformation \(\varphi(\epsilon)\) is a symmetry if, to first order in \(\epsilon\), there exists some function \(F(\mathbf{q},t)\) such that the change in the Lagrangian is a total time derivative of \(F(\mathbf{q}, t)\): \[L \to L' = L\left(\phi_1(\mathbf{q},\epsilon), \ldots, \phi_N(\mathbf{q},\epsilon), \frac{d\phi_1}{dt}, \ldots, \frac{d\phi_N}{dt}\right) = L + \epsilon \frac{dF(q_1,\ldots,q_N,t)}{dt} + \mathcal{O}(\epsilon^2)\, .\]

I emphasize that \(F(\mathbf{q},t)\) is only defined up to a constant: if some \(F(\mathbf{q},t)\) exists such that \[L' = L + \epsilon \frac{dF(\mathbf{q},t)}{dt} + \mathcal{O}(\epsilon^2)\] any other \(F'(\mathbf{q},t)=F(\mathbf{q},t)+c\) with \(c\) a constant will also satisfy the same equation. The specific choice of \(c\) is arbitrary, and any choice will lead to correct results. In what follows I will simply pick a convenient representative \(F(\mathbf{q},t)\) — for instance \(F(\mathbf{q},t)=0\) whenever this is possible.

Whenever we have an ignorable coordinate \(q_i\), the transformation that shifts it by a constant, \(q_i\to q_i + \epsilon\), is clearly a symmetry, since by definition the coordinate does not appear in the Lagrangian, and \(\dot{q}_i\) stays invariant. So in this case \(F\) can be chosen to be 0.

As an example, consider the rotating spring discussed in example [ex:rotating-spring]. In polar coordinates \((r,\theta)\), we have \[L = \frac{1}{2}m(\dot{r}^2 + r^2\dot{\theta}^2) - \frac{1}{2}\kappa r^2\, .\] In this case the \(\theta\) coordinate is ignorable, so the associated shift \(\theta\to\theta+\epsilon\) is a symmetry. The generators of the symmetry are \[a_r = 0 \qquad ; \qquad a_\theta = 1 \qquad ; \qquad \dot{a}_r = 0 \qquad ; \qquad \dot{a}_\theta = 0\, .\]

Let us study the same system as in the previous example, but now in Cartesian coordinates. We have \[L = \frac{1}{2}m (\dot{x}^2 + \dot{y}^2) - \frac{1}{2}\kappa (x^2+y^2)\, .\] The transformation \(\theta\to\theta+\epsilon\) is a rotation around the origin. Whenever \(\epsilon\ll 1\), we have \[\begin{split} x & \to x' = x - \epsilon y + \mathcal{O}(\epsilon^2)\\ y & \to y' = y + \epsilon x + \mathcal{O}(\epsilon^2)\, . \end{split}\] as we argued in example [ex:2d-in-polars]. And accordingly, for the time derivatives we have \[\begin{split} \dot{x} & \to \dot{x}' = \dot{x} - \dot{y}\epsilon + \mathcal{O}(\epsilon^2)\\ \dot{y} & \to \dot{y}' = \dot{y} + \dot{x}\epsilon + \mathcal{O}(\epsilon^2)\, . \end{split}\] Note that these transformations imply that \[x^2+y^2 \to {x}'^2+{y'}^2=(x-\epsilon y)^2+(y+\epsilon x)^2 = x^2+y^2 + \mathcal{O}(\epsilon^2)\] and similarly that \[{\dot x}^2+{\dot y}^2\to {\dot x}'^2+{\dot y}'^2={\dot x}^2+{\dot y}^2+\mathcal{O}(\epsilon^2)\, .\] The action of the symmetry on the Lagrangian is then, to first order in \(\epsilon\): \[L \to L' = L(x', y', \dot{x}', \dot{y}') = L + \mathcal{O}(\epsilon^2)\] so we also see in this coordinate system that the rotation is a symmetry.

Note that this argument generalizes straightforwardly for any Lagrangian of the form \[L = \frac{1}{2}m(\dot{x}^2+\dot{y}^2) - V(x^2 + y^2)\] with \(V(r)\) an analytic function of \(r\), since in this case \[V(x^2+y^2+\mathcal{O}(\epsilon^2)) = V(x^2+y^2) + \mathcal{O}(\epsilon^2)\, .\]
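Here is a minimal sympy sketch (for the rotating-spring Lagrangian, with all quantities treated as plain symbols) checking that the first-order change indeed cancels:

```python
# Minimal sympy sketch: the rotating-spring Lagrangian is invariant under an
# infinitesimal rotation up to terms of order epsilon^2.
import sympy as sp

m, kappa, eps = sp.symbols("m kappa epsilon")
x, y, vx, vy = sp.symbols("x y xdot ydot")

def L(x, y, vx, vy):
    return m * (vx**2 + vy**2) / 2 - kappa * (x**2 + y**2) / 2

dL = L(x - eps * y, y + eps * x, vx - eps * vy, vy + eps * vx) - L(x, y, vx, vy)
print(sp.expand(dL).coeff(eps, 1))  # -> 0: no first-order change
```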

Consider a system with Lagrangian \[L=\frac{m}{2}\left ({\dot x}^2+{\dot y}^2\right ) -y{\dot x}-\frac{1}{2}x^2\, ,\] and a transformation generated by \[\begin{aligned} x &\to & x'=x\, ,\\ y &\to & y'=y+\epsilon\, . \end{aligned}\] Then \({\dot x}'={\dot x}\) and \({\dot y}'={\dot y}\) and \[\delta L = L(x',y',{\dot x}',{\dot y}')-L(x,y,{\dot x},{\dot y})= -y'{\dot x'}+y{\dot x}=-\epsilon{\dot x}\, .\] So this is also a symmetry, this time with \(F=-x\).

It is important to notice that the definition of symmetry above does not involve the equations of motion: the Lagrangian must stay invariant (up to a total derivative) without using the equations of motion. That is, the Lagrangian must be invariant also for those paths in configuration space that do not extremize \(S\).

We are finally in a position to state and prove Noether’s theorem.

[Noether] Consider a transformation generated by \(a_i(q_1,\ldots,q_N)\) (in a given set of generalized coordinates), such that \[L\to L + \epsilon \frac{dF(q_1,\ldots,q_N,t)}{dt} + \mathcal{O}(\epsilon^2)\, ,\] so that it is a symmetry. Then

\[Q :=\sum_{i=1}^N a_i\frac{\partial L}{\partial \dot{q}_i} - F\]

is conserved (that is, \(\frac{dQ}{dt}=0\)). The conserved quantity \(Q\) is known as the Noether charge.

I will start by giving the intuitive idea behind the proof. Recall that physical trajectories \(q_i(t)\) are those that satisfy \(\delta S=0\) to first order in \(\delta q_i(t)\), keeping the endpoints \(q_i(t_0)\) and \(q_i(t_1)\) fixed. A general transformation acts as \(q_i(t)\to q_i(t)+\epsilon a_i(\mathbf{q})\), but crucially it does not necessarily keep the endpoints \(q_i(t_0)\) and \(q_i(t_1)\) fixed. So the action of a physical path can change to first order in \(\epsilon\) under a generic transformation. But it does so in a fairly localised way: only the behaviour near the endpoints of the path, at \(t_0\) and \(t_1\), can contribute to \(\delta S\). If the transformation is furthermore a symmetry, we can compute \(\delta S\) (to first order in \(\epsilon\)) in a second way, as a function of quantities at \(t_0\) and \(t_1\) only, using our result \(\eqref{eq:S-symmetry-transform}\) above. Equating the result of both approaches leads to Noether’s theorem.

In detail, this goes as follows. We want to understand the variation of the action under the transformation \[q_i \to q_i+\delta q_i = q_i + \epsilon a_i\] in two different ways. On one hand, as for any other variation of the path, we can Taylor expand to obtain \[\delta S = \int_{t_0}^{t_1} \!dt\, \sum_{i=1}^N\left(\epsilon a_i\frac{\partial L}{\partial q_i} + \epsilon \dot{a}_i\frac{\partial L}{\partial \dot{q}_i}\right) + \mathcal{O}(\epsilon^2)\] which becomes, using the Euler-Lagrange equations \[\begin{split} \delta S & = \int_{t_0}^{t_1} \!dt\, \sum_{i=1}^N\left(\epsilon a_i\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) + \epsilon\dot{a}_i\frac{\partial L}{\partial \dot{q}_i}\right) + \mathcal{O}(\epsilon^2)\\ & = \int_{t_0}^{t_1}\!dt\, \epsilon \frac{d}{dt}\left(\sum_{i=1}^Na_i\frac{\partial L}{\partial \dot{q}_i}\right) + \mathcal{O}(\epsilon^2)\\ & = \epsilon\left[\sum_{i=1}^Na_i\frac{\partial L}{\partial \dot{q}_i}\right]_{t_0}^{t_1} + \mathcal{O}(\epsilon^2)\, . \end{split}\] Note that we have used the Euler-Lagrange equations of motion in going from the first to the second line, so the result will only be valid along the path that satisfies the equations of motion.

On the other hand, using the fact that the variation is a symmetry, we have \[\begin{split} \delta S & = S[\mathbf{q}+\delta \mathbf{q}] - S[\mathbf{q}] \\ & = \int_{t_0}^{t_1} \! dt\, \left(\left(L + \epsilon \frac{dF}{dt} + \mathcal{O}(\epsilon^2)\right) - L\right)\\ & = \epsilon \left[F\right]_{t_0}^{t_1} + \mathcal{O}(\epsilon^2)\, . \end{split}\] Equating both results, we immediately obtain that \(Q(t_1)=Q(t_0)\). Since the choice of \(t_0\) and \(t_1\) is arbitrary, the result now follows easily: choose \(t_1=t_0+\delta t\) for small \(\delta t\). We have \[Q(t_1) - Q(t_0) = Q(t_0 + \delta t) - Q(t_0) = \delta t \frac{dQ}{dt} + \mathcal{O}((\delta t)^2) = 0\] so \(\frac{dQ}{dt}=0\).

Whenever the coordinate \(q_i\) is ignorable, we have a symmetry (with \(F=0\)) generated by \(q_i\to q_i+\epsilon\), leaving the other coordinates constant. That is, \[a_k = \delta_{ik} :=\begin{cases} 1 & \text{if } i = k\, ,\\ 0 & \text{otherwise.} \end{cases}\] The corresponding Noether charge is then \[Q = \sum_{k=1}^N a_k\frac{\partial L}{\partial \dot{q}_k} = \sum_{k=1}^N \delta_{ki}\frac{\partial L}{\partial \dot{q}_k} = \frac{\partial L}{\partial \dot{q}_i}\] as expected.

Let us come back to the conservation of angular momentum in rotationally symmetric systems, expressed in Cartesian coordinates. Assume that we have a system with Lagrangian \[L = \frac{1}{2}m(\dot{x}^2+\dot{y}^2) - V(x^2+y^2)\, .\] We saw in example [ex:rotation-Cartesian] that rotations around the origin, which are generated by \[a_x = -y \qquad ; \qquad a_y = x\, ,\] are a symmetry of the system with \(F=0\).

Noether’s theorem then tells us that the associated charge is \[Q = a_x \frac{\partial L}{\partial \dot{x}} + a_y \frac{\partial L}{\partial \dot{y}} = m(-y\dot{x} + x\dot{y})\, .\] It is a simple exercise to show that this is indeed equal to \(mr^2\dot{\theta}\).

Finally, let us revisit example [ex:nonzero-F-symmetry]. We have a Lagrangian \[L=\frac{m}{2}\left ({\dot x}^2+{\dot y}^2\right ) -y{\dot x}-\frac{1}{2}x^2\, ,\] and a transformation generated by \[\begin{aligned} x &\to & x'=x\, ,\\ y &\to & y'=y+\epsilon\, . \end{aligned}\] That is, \(a_x=0\) and \(a_y=1\). We found in example [ex:nonzero-F-symmetry] that this transformation is a symmetry with \(F=-x\). The associated Noether charge is \[Q = a_x \frac{\partial L}{\partial \dot{x}} + a_y \frac{\partial L}{\partial \dot{y}} - F = m\dot{y} + x\, .\] We can check that this is conserved from the equations of motion, which are \[\begin{split} \frac{d}{dt}\left (\frac{\partial L}{\partial {\dot x }}\right )-\frac{\partial L}{\partial x} & = m\ddot{x} - \dot{y} + x = 0\, ,\\ \frac{d}{dt}\left (\frac{\partial L}{\partial {\dot y }}\right )-\frac{\partial L}{\partial y} & = m\ddot{y} + \dot{x} = 0\, . \end{split}\] Note in particular that the second equation is precisely \(\frac{dQ}{dt}=0\).
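As a final check, here is a minimal numerical sketch (assuming numpy and scipy; the initial conditions are illustrative) verifying that \(Q=m\dot{y}+x\) stays constant along solutions of these equations of motion:

```python
# Minimal numerical sketch: integrate m*xddot = ydot - x, m*yddot = -xdot
# and monitor the Noether charge Q = m*ydot + x.
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0

def rhs(t, s):
    x, y, vx, vy = s
    return [vx, vy, (vy - x) / m, -vx / m]

sol = solve_ivp(rhs, (0, 10), [0.5, -0.2, 0.1, 0.4],
                t_eval=np.linspace(0, 10, 5), rtol=1e-10, atol=1e-12)
x, y, vx, vy = sol.y
print(m * vy + x)  # -> [0.9 0.9 0.9 0.9 0.9]
```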

3.2 Energy conservation

Conservation of energy can be understood in a way quite similar to what we have seen: energy can be defined as the Noether charge associated with time translations. The derivation is quite similar to the one above, but with some small (but crucial) differences needed in order to take into account the fact that the time coordinate “\(t\)” is treated specially in the Lagrangian formalism.

Let us consider the possibility that the Lagrangian depends explicitly on time. That is, we promote the Lagrangian \(L\) to a function of the generalized coordinates \(q_i(t)\), their associated velocities \(\dot{q}_i(t)\), and time itself. We write this as \(L(\mathbf{q},\mathbf{\dot{q}},t)\). The expression of the action is now \[S = \int_{t_0}^{t_1} L(q_1(t), \ldots, q_N(t), \dot{q}_1(t), \ldots, \dot{q}_N(t), t)\, dt\, .\] It is not difficult to see that the Euler-Lagrange equations do not change if we do this.10

Given a Lagrangian \(L(\mathbf{q},\mathbf{\dot{q}},t)\), we define the energy to be \[E :=\left(\sum_{i=1}^N \dot{q}_i \frac{\partial L}{\partial \dot{q}_i}\right) - L\, .\]

Along a path \(\mathbf{q}(t)\) satisfying the equations of motion, we have \[\frac{dE}{dt} = -\frac{\partial L}{\partial t}\, .\] In particular, the energy is conserved if and only if the Lagrangian does not depend explicitly on time.

In this theorem \(\frac{\partial L}{\partial t}\) denotes taking the derivative of the Lagrangian with respect to time, keeping \(\mathbf{q}\) and \(\mathbf{\dot{q}}\) fixed. See note [note:time-dependent-Lagrangians] for a further discussion of this point.

[Elementary proof] It is easy to verify directly, by taking the time derivative of the definition of energy, that the theorem holds. The calculation goes as follows. If we take the time derivative of the energy, we have (from definition [def:energy]) \[\begin{split} \frac{dE}{dt} & = \frac{d}{dt}\left(\left(\sum_{i=1}^N \dot{q}_i \frac{\partial L}{\partial \dot{q}_i} \right) - L\right)\\ & = \left(\sum_{i=1}^N \ddot{q}_i \frac{\partial L}{\partial \dot{q}_i} + \dot{q}_i\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\right) - \frac{dL}{dt} \end{split}\] Using the Euler-Lagrange equations, this becomes \[\frac{dE}{dt} = \left(\sum_{i=1}^N \ddot{q}_i \frac{\partial L}{\partial \dot{q}_i} + \dot{q}_i\frac{\partial L}{\partial q_i}\right) - \frac{dL}{dt}\, .\] On the other hand, from the Chain Rule, we have \[\frac{dL}{dt} = \left(\sum_{i=1}^N \frac{\partial L}{\partial \dot{q}_i}\ddot{q}_i + \frac{\partial L}{\partial q_i}\dot{q}_i\right) + \frac{\partial L}{\partial t}\, .\] The result now follows from substitution.

[Alternative proof]

Here I will present a less straightforward but (in my opinion) more illuminating proof, closer in spirit to the one we used in proving Noether’s theorem.

Imagine that we take a path \(\mathbf{q}(t)\) satisfying the equations of motion, and we displace it to a new path \(\mathbf{q}'(t)=\mathbf{q}(t-\epsilon)\). That is, we move the whole path slightly forward in time, keeping its shape. We have \[\begin{split} S' & = \int_{t_0}^{t_1} \!dt\, L(q'_1(t), \ldots, q'_N(t), \dot{q}'_1(t), \ldots, \dot{q}'_N(t), t) \\ & = \int_{t_0}^{t_1} \!dt\, L(q_1(t-\epsilon), \ldots, q_N(t-\epsilon), \dot{q}_1(t-\epsilon), \ldots, \dot{q}_N(t-\epsilon), t)\, . \end{split}\]

We can compute this expression in two different ways. First, by the Chain Rule, we have that \[\begin{gathered} L(q_1(t-\epsilon), \ldots, q_N(t-\epsilon), \dot{q}_1(t-\epsilon), \ldots, \dot{q}_N(t-\epsilon), t) = \\ L(q_1(t), \ldots, q_N(t), \dot{q}_1(t), \ldots, \dot{q}_N(t), t) - \epsilon \left(\sum_{i=1}^N \frac{\partial L}{\partial q_i} \dot{q}_i + \frac{\partial L}{\partial \dot{q}_i} \ddot{q}_i \right) + \mathcal{O}(\epsilon^2)\, . \end{gathered}\] Using the Euler-Lagrange equations of motion, we can write this as \[\begin{split} \sum_{i=1}^N \frac{\partial L}{\partial q_i} \dot{q}_i + \frac{\partial L}{\partial \dot{q}_i} \ddot{q}_i & = \sum_{i=1}^N \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\dot{q_i} + \frac{\partial L}{\partial \dot{q}_i} \ddot{q}_i\\ & = \frac{d}{dt}\left(\sum_{i=1}^N \dot{q}_i \frac{\partial L}{\partial \dot{q}_i}\right)\, . \end{split}\] Substituting these expressions into the action, we have just proven that \[S' = S-\epsilon \left[\sum_{i=1}^N \dot{q}_i \frac{\partial L}{\partial \dot{q}_i}\right]_{t_0}^{t_1} + \mathcal{O}(\epsilon^2)\, .\] On the other hand, introducing a new variable \(t'=t-\epsilon\), we can write \[\begin{split} S' & = \int_{t_0}^{t_1} \!dt\, L(q_1(t-\epsilon), \ldots, q_N(t-\epsilon), \dot{q}_1(t-\epsilon), \ldots, \dot{q}_N(t-\epsilon), t)\\ & = \int_{t_0-\epsilon}^{t_1-\epsilon} \!dt'\, L(q_1(t'), \ldots, q_N(t'), \dot{q}_1(t'), \ldots, \dot{q}_N(t'), t'+\epsilon)\, . \end{split}\] We can expand this as a series in \(\epsilon\) using Leibniz’s rule (see equation \(\eqref{eq:Leibniz-rule}\) in the appendix for a reminder), to get: \[\begin{split} S' & = S + \epsilon \frac{dS'}{d\epsilon}\Big|_{\epsilon=0} + \mathcal{O}(\epsilon^2)\\ & = S - \epsilon L(q_1(t_1),\ldots,q_N(t_1), \dot{q}_1(t_1),\ldots, \dot{q}_N(t_1), t_1) \\ & \phantom{= S} + \epsilon L(q_1(t_0),\ldots,q_N(t_0), \dot{q}_1(t_0),\ldots, \dot{q}_N(t_0), t_0) \\ & \phantom{=S} + \epsilon \left[\int_{t_0-\epsilon}^{t_1-\epsilon}\!dt'\, \frac{\partial L(q_1(t'), \ldots, q_N(t'), \dot{q}_1(t'), \ldots, \dot{q}_N(t'), t'+\epsilon)}{\partial \epsilon}\right]_{\epsilon=0}\\ & \phantom{=S} + \mathcal{O}(\epsilon^2) \end{split}\] Now we note that, by the Chain Rule, we have \[\frac{\partial L(q_1(t'), \ldots, q_N(t'), \dot{q}_1(t'), \ldots, \dot{q}_N(t'), t'+\epsilon)}{\partial \epsilon} = \frac{\partial L(q_1(t'), \ldots, q_N(t'), \dot{q}_1(t'), \ldots, \dot{q}_N(t'), t'+\epsilon)}{\partial t'}\] so \[\begin{gathered} \left[\int_{t_0-\epsilon}^{t_1-\epsilon}\!dt'\, \frac{\partial L(q_1(t'), \ldots, q_N(t'), \dot{q}_1(t'), \ldots, \dot{q}_N(t'), t'+\epsilon)}{\partial \epsilon}\right]_{\epsilon=0} \\= \int_{t_0}^{t_1}\!dt\, \frac{\partial L(q_1(t), \ldots, q_N(t), \dot{q}_1(t), \ldots, \dot{q}_N(t), t)}{\partial t}\, . \end{gathered}\] The theorem now follows from equating the two expressions for \(S'\) that we found.

It is not obvious that the quantity \(E\) that is conserved if \(\frac{\partial L}{\partial t}=0\) is what is usually known as “energy” in classical mechanics. But this is easy to verify. Assume that we have a particle with Lagrangian \[L = T(\dot{x}_1,\ldots,\dot{x}_N) - V(x_1,\ldots,x_N)\] with \(T(\dot{x}_1,\ldots,\dot{x}_N) = \frac{1}{2}m(\dot{x}_1^2+\ldots+\dot{x}_N^2)\), as we often do in classical mechanics. Then applying the definition [def:energy] above one easily finds the expected relation \[E = T+V\, .\] The result holds more generally. Consider a Lagrangian of the form \[L = \underbrace{\left(\sum_{i,j=1}^{N} K_{ij}(q_1,\ldots,q_N) \dot{q}_i\dot{q}_j\right)}_{T(q,\dot{q})} - V(q)\] with the \(K_{ij}(q)\) and \(V(q)\) arbitrary functions on configuration space \(\mathcal{C}\). Then it is easy to verify that \[E = T + V\, .\]
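
This more general claim is easy to check symbolically. Here is a small sketch (using sympy; the restriction to two coordinates and the names `K11`, `K12`, `K22`, `V` are our own illustrative choices) verifying that \(\sum_i \dot{q}_i\, \partial L/\partial \dot{q}_i - L\) indeed equals \(T+V\) for arbitrary \(K_{ij}(q)\) and \(V(q)\):

```python
import sympy as sp

t = sp.symbols('t')
q1, q2 = sp.Function('q1')(t), sp.Function('q2')(t)
qd = [sp.diff(q1, t), sp.diff(q2, t)]

# Arbitrary position-dependent kinetic matrix K_ij(q) and potential V(q).
K11, K12, K22, V = [sp.Function(n)(q1, q2) for n in ('K11', 'K12', 'K22', 'V')]
T = K11*qd[0]**2 + 2*K12*qd[0]*qd[1] + K22*qd[1]**2  # T = sum_ij K_ij qd_i qd_j
L = T - V

# E := sum_i qdot_i dL/dqdot_i - L, as in the definition of the energy.
E = sum(v*sp.diff(L, v) for v in qd) - L
print(sp.simplify(E - (T + V)))  # prints 0
```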

Say that we have a spring that becomes weaker with time, with a spring constant \(\kappa(t)=e^{-t}\). A mass attached to the spring can then be described by a Lagrangian \[L = \frac{1}{2} m \dot{x}^2 - \frac{1}{2}\kappa(t) x^2\, .\] The resulting equation of motion is \[m\ddot{x} + \kappa(t) x = 0 \, .\] The energy of the system is \[E = \frac{1}{2} m \dot{x}^2 + \frac{1}{2}\kappa(t) x^2\, .\] Since the Lagrangian depends explicitly on time, we expect that energy is not conserved. And indeed: \[\begin{split} \frac{dE}{dt} & = m\dot{x}\ddot{x} + \kappa(t)x \dot{x} + \frac{1}{2}x^2 \frac{d\kappa(t)}{dt} \\ & = \dot{x}(m\ddot{x} + \kappa(t)x) + \frac{1}{2}x^2 \frac{d\kappa(t)}{dt} \\ & = \frac{1}{2}x^2 \frac{d\kappa(t)}{dt} \end{split}\] where in the last step we have used the equation of motion. On the other hand \[\frac{\partial L}{\partial t} = - \frac{1}{2}x^2 \frac{d\kappa(t)}{dt}\] since time appears explicitly only in \(\kappa(t)\). So we have verified that \[\frac{dE}{dt} = - \frac{\partial L}{\partial t}\, .\]
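
We can also check this numerically. The following sketch (assuming, for illustration, \(m=1\) and initial conditions \(x(0)=1\), \(\dot{x}(0)=0\)) integrates the equation of motion with scipy and compares the numerical \(dE/dt\) with \(-\partial L/\partial t = \frac{1}{2}x^2\, \frac{d\kappa}{dt}\) along the trajectory:

```python
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
kappa = lambda t: np.exp(-t)  # weakening spring constant

# m xdd + kappa(t) x = 0, written as a first order system for (x, v).
rhs = lambda t, y: [y[1], -kappa(t)*y[0]/m]
sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], dense_output=True,
                rtol=1e-10, atol=1e-12)

t = np.linspace(0.0, 10.0, 400)
x, v = sol.sol(t)
E = 0.5*m*v**2 + 0.5*kappa(t)*x**2

dEdt = np.gradient(E, t)                 # numerical dE/dt
predicted = 0.5*x**2*(-np.exp(-t))       # -dL/dt|_explicit, since dkappa/dt = -e^{-t}
print(np.max(np.abs(dEdt - predicted)))  # small (limited by the finite differencing)
```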

Note that our definition [def:energy] for the energy does not require the Lagrangian to have the specific form \(L=T-V\). Consider for instance the Lagrangian \[L=-m\left(\sqrt{1-\dot{x}^{2}-\dot{y}^{2}-\dot{z}^{2}}\right)\, .\] (This specific Lagrangian is in fact fairly important, as it describes the motion of a particle in \(\mathbb{R}^3\) in special relativity.) Definition [def:energy] gives \[E =\dot{x}\frac{\partial L}{\partial\dot{x}}+\dot{y}\frac{\partial L}{\partial\dot{y}}+\dot{z}\frac{\partial L}{\partial\dot{z}}-L\, .\] We have \[\dot{x}\frac{\partial L}{\partial\dot{x}}=\frac{m\dot{x}^{2}}{\sqrt{1-\dot{x}^{2}-\dot{y}^{2}-\dot{z}^{2}}},\] and similarly for \(\dot{y}\) and \(\dot{z}\). Putting everything together we find \[\begin{split} E&=\frac{m(\dot{x}^2+\dot{y}^{2}+\dot{z}^{2})}{\sqrt{1-\dot{x}^{2}-\dot{y}^{2}-\dot{z}^{2}}}+m\sqrt{1-\dot{x}^{2}-\dot{y}^{2}-\dot{z}^{2}}\\ & =\frac{m}{\sqrt{1-\dot{x}^{2}-\dot{y}^{2}-\dot{z}^{2}}}\, . \end{split}\]
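
This computation is also a pleasant exercise for a computer algebra system; the following sympy sketch (with symbol names of our choosing) reproduces the result:

```python
import sympy as sp

m = sp.symbols('m', positive=True)
xd, yd, zd = sp.symbols('xdot ydot zdot', real=True)

L = -m*sp.sqrt(1 - xd**2 - yd**2 - zd**2)
E = sum(v*sp.diff(L, v) for v in (xd, yd, zd)) - L
print(sp.simplify(E))  # -> m/sqrt(1 - xdot**2 - ydot**2 - zdot**2)
```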

4 Normal modes

So far we have studied the Euler-Lagrange equations abstractly, but we have not spent much effort actually trying to solve them, except for some fairly elementary examples. The reason for this is simple: in most cases we cannot solve the equations in closed form. Even when we can, it is rarely the case that the answer can be written in terms of elementary functions. Recall, for instance, example [ex:pendulum] above, where we discussed the pendulum. We found that the Euler-Lagrange equations of motion were of the form \[\ddot{\theta} + \frac{g}{\ell}\sin(\theta) = 0\, .\] This equation can be solved in closed form, in terms of a class of special functions known as “elliptic functions”, but the solution is relatively involved, and not particularly illuminating for our current purposes. Rather than insisting on solving the problem exactly from the outset, it is often illuminating to instead try to understand what the system does for small displacements away from equilibrium, that is, for small values of \(\theta\). In this regime we have \(\sin(\theta)\approx\theta\), and the equation of motion becomes \[\ddot{\theta} + \frac{g}{\ell}\theta = 0\] which can be solved straightforwardly to give \[\theta = a \cos(\omega t) + b \sin(\omega t)\] with \(\omega = \sqrt{\frac{g}{\ell}}\) and \(a,b\) constants that depend on the initial conditions.

The technology of normal modes, which we introduce in this section, is a way of formalizing this observation, and applying it systematically to more complicated systems.

4.1 Canonical kinetic terms

Let us restrict ourselves to the neighbourhood of minima of the potential. Assume, to start with, that we have a Lagrangian \[\begin{equation} \label{eq:normal-mode-Lagrangian-simple} L = \frac{1}{2}\sum_{i=1}^n \dot{q}_i^2 - V(\mathbf{q})\, . \end{equation}\] This particularly simple form for the kinetic term \(T = \frac{1}{2} \sum_{i=1}^n \dot{q}_i^2\) is known as a canonical kinetic term.

Assume that there is a stationary point of \(V(\mathbf{q})\) at \(\mathbf{q}=0\), that is \[\frac{\partial V}{\partial q_i}\Big|_{\mathbf{q}=0}=0 \quad \forall i\, .\] If the stationary point we are interested in is at some other position \(\mathbf{q}=(a_1,\ldots,a_n)\), we can simply introduce new variables \(q'_i=q_i-a_i\) such that the stationary point is now at \(\mathbf{q}'=0\). Clearly in doing this the form of equation \(\eqref{eq:normal-mode-Lagrangian-simple}\) is preserved, so for simplicity we will assume henceforth that the stationary point we are studying is indeed at \(\mathbf{q}=0\).

We can write an approximate Lagrangian, describing the dynamics around this extremum, by expanding \(V(\mathbf{q})\) to second order in \(\mathbf{q}\) \[L_{\text{approx}} = \frac{1}{2}\sum_{i=1}^n \dot{q}_i^2 - \frac{1}{2} \sum_{i,j} \mathsf{A}_{ij} q_i q_j\] with \[\mathsf{A}_{ij} = \frac{\partial^2 V}{\partial q_i \partial q_j}\biggr|_{q=0}\, .\] (The constant term \(V(0)\) can be discarded as usual, and the linear term is absent precisely because we are expanding around a stationary point.) The equations of motion arising from the approximate Lagrangian are given in matrix notation by \[\mathbf{\ddot{q}}+ \mathsf{A}{\bf q} = 0\, .\] The approximate equations of motion are linear, since they can be written as \[D_{\mathsf{A}}{\bf q} :=\left(\frac{d^2}{dt^2} + \mathsf{A}\right) {\bf q} = 0\, ,\] where \(D_{\mathsf{A}}:=\frac{d^2}{dt^2} + \mathsf{A}\) is a linear operator, meaning that given any two vectors \(\bf a\) and \(\bf b\) we have \(D_\mathsf{A}({\bf a}+{\bf b}) = D_{\mathsf{A}}{\bf a} + D_\mathsf{A}{\bf b}\), and also for any \(c\in \mathbb{R}\) and vector \({\bf a}\) we have \(D_\mathsf{A}(c{\bf a}) = c D_\mathsf{A}{\bf a}\). We have \(n\) equations, and the equations are of second order and linear, so we expect to be able to express any solution of the approximate equations of motion as a linear superposition of \(2n\) basic solutions.

To find these solutions, let us start by noticing that the \(n\times n\) matrix \(\mathsf{A}\) is real and symmetric (for any potential whose second partial derivatives are continuous, which will be the case during this course), so it has real eigenvalues and a basis of real eigenvectors. We denote the set of eigenvalues of \(\mathsf{A}\) by \(\lambda^{(i)}\), and the \(n\) corresponding eigenvectors by \({\bf v}^{(i)}\), so that \[\mathsf{A}{\bf v}^{(i)} = \lambda^{(i)}{\bf v}^{(i)}\, .\] Let us now take an ansatz \[\begin{equation} \label{eq:normal-mode-ansatz} \mathbf{q}^{(i)}(t) = f^{(i)}(t) {\bf v}^{(i)} \end{equation}\] for some function \(f^{(i)}(t)\) that we will determine. Since \({\bf v}^{(i)}\) is an eigenvector with eigenvalue \(\lambda^{(i)}\), we have that \[\begin{split} \left(\frac{d^2}{dt^2} + \mathsf{A}\right)\mathbf{q}^{(i)}(t) & = \left(\frac{d^2}{dt^2} + \mathsf{A}\right)f^{(i)}(t) {\bf v}^{(i)} \\ & = {\bf v}^{(i)} \left(\frac{d^2}{dt^2} + \lambda^{(i)}\right)f^{(i)}(t)\\ & = 0\, . \end{split}\] Since \({\bf v}^{(i)}\neq 0\), this implies that \[\left(\frac{d^2}{dt^2} + \lambda^{(i)}\right)f^{(i)}(t) = 0\, .\] Solving this equation is elementary, but the form of the solution depends on the sign of \(\lambda^{(i)}\). We have \[f^{(i)}(t) = \begin{cases} \alpha^{(i)}\cos(\sqrt{\lambda^{(i)}}t) + \beta^{(i)}\sin(\sqrt{\lambda^{(i)}}t) & \text{if } \lambda^{(i)}>0\\ C^{(i)}t + D^{(i)} & \text{if } \lambda^{(i)}=0\\ \alpha^{(i)}\cosh(\sqrt{-\lambda^{(i)}}t) + \beta^{(i)}\sinh(\sqrt{-\lambda^{(i)}}t) & \text{if } \lambda^{(i)}<0 \end{cases}\] where the \(\alpha^{(i)}\), \(\beta^{(i)}\), \(C^{(i)}\) and \(D^{(i)}\) are constants to be fixed by initial conditions. Note that whatever the value of \(\lambda^{(i)}\), each eigenvector leads to a two-dimensional space of solutions. Since the eigenvectors span the \(n\)-dimensional space, our ansatz gives us the full \(2n\)-dimensional space of solutions to the linear equation. So we can write the general solution of the system in terms of the ansatz \(\eqref{eq:normal-mode-ansatz}\) as \[\mathbf{q}(t) = \sum_{i=1}^n {\bf v}^{(i)} f^{(i)}(t)\] with the \(f^{(i)}\) as above.
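
The following numpy sketch makes this construction concrete (the \(2\times 2\) matrix \(\mathsf{A}\) and the constants are our own illustrative inputs): it diagonalises \(\mathsf{A}\), builds each \(f^{(i)}(t)\) according to the sign of \(\lambda^{(i)}\), and checks that the superposition solves \(\mathbf{\ddot{q}} + \mathsf{A}\mathbf{q} = 0\):

```python
import numpy as np

def mode_solution(lam, t, alpha, beta):
    """Solution of f'' + lam f = 0, branching on the sign of lam."""
    if lam > 0:
        w = np.sqrt(lam)
        return alpha*np.cos(w*t) + beta*np.sin(w*t)   # normal mode
    if lam == 0:
        return alpha*t + beta                          # zero mode
    w = np.sqrt(-lam)
    return alpha*np.cosh(w*t) + beta*np.sinh(w*t)      # instability

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])            # example real symmetric matrix
lams, vecs = np.linalg.eigh(A)         # columns of vecs are the v^(i)

t = np.linspace(0.0, 10.0, 1001)
q = sum(np.outer(vecs[:, i], mode_solution(lams[i], t, 1.0, 0.5))
        for i in range(len(lams)))

# Numerical check that qdd + A q = 0 (away from the endpoints, where
# the finite-difference second derivative is inaccurate).
qdd = np.gradient(np.gradient(q, t, axis=1), t, axis=1)
print(np.max(np.abs(qdd + A @ q)[:, 2:-2]))  # small
```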

The qualitative behaviour of the solution depends on the sign of the eigenvalues \(\lambda^{(i)}\). If the \(\lambda^{(i)}\) are all positive we are at a local minimum, and we have oscillatory behaviour around the minimum. If we have a negative eigenvalue we instead have exponential behaviour away from the stationary point. This agrees with expectations: if we are at a maximum along some direction, small perturbations away from the point will quickly grow, and we are trying to expand around an unstable solution. Finally, zero eigenvalues are associated with motion with constant velocity, displaying no oscillatory behaviour.

Each basic solution \[\mathbf{q}(t) = {\bf v}^{(i)}\left(\alpha^{(i)}\cos(\sqrt{\lambda^{(i)}}t) + \beta^{(i)}\sin(\sqrt{\lambda^{(i)}}t)\right)\] associated with an eigenvalue \(\lambda^{(i)}>0\) is a normal mode.

Each basic solution \[\mathbf{q}(t) = {\bf v}^{(i)}\left(C^{(i)}t + D^{(i)}\right)\] associated with a zero eigenvalue \(\lambda^{(i)}=0\) is a zero mode.

Each basic solution \[\mathbf{q}(t) = {\bf v}^{(i)}\left(\alpha^{(i)}\cosh(\sqrt{-\lambda^{(i)}}t) + \beta^{(i)}\sinh(\sqrt{-\lambda^{(i)}}t)\right)\] associated with an eigenvalue \(\lambda^{(i)}<0\) is an instability.

The general solution in the absence of instabilities is the superposition of the normal modes for the positive eigenvalues and the zero modes.

Let me emphasize that the existence of zero modes is fairly brittle: if we slightly deform our starting potential \(V(\mathbf{q})\) in a generic way, then the eigenvalues of \(\mathsf{A}\) will generically change slightly, and the zero eigenvalues will generically become either positive or negative. So whenever we find a zero mode in a real physical system this tells us very valuable information: we expect to be able to find some principle that restricts the possible deformations of \(V(\mathbf{q})\)!

As an example, imagine that we have two particles with the same mass moving in one dimension, located at \(x_1\) and \(x_2\). Assume that the physics is independent of the choice of origin of coordinates, or equivalently that there is a symmetry \[\begin{split} {x}_1 & \to {x}_1 + \epsilon {a}\\ {x}_2 & \to {x}_2 + \epsilon {a}\\ \end{split}\] for any constant \({a}\). Then the potential can only depend on the difference \({x}_1-{x}_2\), and we have \[L = \frac{1}{2}m(\dot{x}_1^2 + \dot{x}_2^2) - V(x_1-x_2)\, .\] This symmetry will then always lead to the existence of a zero mode, associated with translation of the centre of mass of the system. We can see this explicitly if we introduce new coordinates \(x_+:=\frac{1}{\sqrt{2}}(x_1+x_2)\), \(x_-:=\frac{1}{\sqrt{2}}(x_1-x_2)\). Then our Lagrangian can be written as \[L = \frac{1}{2}m(\dot{x}_+^2 + \dot{x}_-^2) - V(\sqrt{2}x_-)\] which clearly leads to a zero mode for \(x_+\), no matter the specific form of \(V\). So in this case we find that the existence of the zero mode is ultimately protected by the translation symmetry!
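
For a concrete check, take the assumed quadratic potential \(V(x_1-x_2)=\frac{1}{2}k(x_1-x_2)^2\): the matrix of second derivatives then indeed has a zero eigenvalue, with eigenvector along the translation direction \((1,1)\):

```python
import numpy as np

k = 3.0
# V = (k/2)(x1 - x2)^2  =>  A_ij = d^2 V / dx_i dx_j
A = k*np.array([[1.0, -1.0],
                [-1.0, 1.0]])

lams, vecs = np.linalg.eigh(A)
print(lams)        # [0, 2k]: one zero mode and one normal mode
print(vecs[:, 0])  # proportional to (1, 1): the centre-of-mass direction
```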

Consider two pendula, each of length one and mass one, suspended a distance \(d\) apart. Connecting the masses is a spring of constant \(\kappa\), also of natural length \(d\).

[Figure: two pendula suspended a distance \(d\) apart, with their masses connected by a spring.]

The position of the left hand mass is \((\sin(\theta_1), -\cos(\theta_1))\), so its speed squared is simply \((\cos(\theta_1){\dot \theta}_1)^2+(\sin(\theta_1){\dot \theta}_1)^2={\dot \theta}_1^2\). We get a similar result for the right hand mass, so the total kinetic energy \(T\) is \[T=\frac{1}{2}\left ({\dot \theta}_1^2+{\dot \theta}_2^2\right ).\] The potential comes from gravity, which gives a contribution \(g(-\cos(\theta_1)-\cos(\theta_2))\), and from the spring. For a spring of constant \(\kappa\), its potential energy is given by \(\kappa(l-d)^2/2\), where \(l-d\) is the extension of the spring. The length \(l\) of the spring is given by Pythagoras’ theorem as \[l=\sqrt{(\sin(\theta_1)-\sin(\theta_2)+d)^2+(\cos(\theta_1)-\cos(\theta_2))^2}.\] Thus the Lagrangian for the system is given by \[\begin{split} L= {} & \frac{1}{2}\left ({\dot \theta}_1^2+{\dot \theta}_2^2\right )+g(\cos(\theta_1)+\cos(\theta_2))\\ & -\frac{\kappa}{2}\left (\sqrt{(\sin(\theta_1)-\sin(\theta_2)+d)^2+(\cos(\theta_1)-\cos(\theta_2))^2}-d\right )^2. \end{split}\] Finding the exact solution to the equations of motion resulting from this Lagrangian seems hopeless. However, it is clear that the system would be happy to sit at \(\theta_1=\theta_2=0\), as this configuration minimises both the gravitational potential energy and the spring energy, since the spring would be at its natural unextended length \(d\). Let us now try to find an approximate Lagrangian which describes the system when \(\theta_i\ll 1\).

Approximating the gravitational potential is easy: \(\cos(\theta) = 1-\theta^2/2+O(\theta^4)\), so we can take \[-g(\cos(\theta_1)+\cos(\theta_2))\approx-g\left (2-\frac{\theta_1^2}{2}-\frac{\theta_2^2}{2}\right ).\] The constant term \(-2g\) can be discarded for the usual reason: adding a constant to the potential (or the Lagrangian) has no effect on the equations of motion. The spring potential looks more tricky to deal with, but note that to calculate \(\kappa(l-d)^2/2\) to quadratic order in the small \(\theta_i\) we only need to calculate \(l-d\) to first order, since \(l-d\) starts at linear order in the \(\theta_i\): \[\begin{split} l-d &=\sqrt{(\sin(\theta_1)-\sin(\theta_2)+d)^2+(\cos(\theta_1)-\cos(\theta_2))^2}-d\\ &=\sqrt{(\sin(\theta_1)-\sin(\theta_2)+d)^2}-d +O(\theta^2)\\ &=\theta_1-\theta_2+O(\theta^2)\, .\end{split}\] Finally we can write the approximate Lagrangian as \[L_{\text{approx}}=\frac{1}{2}\left ({\dot \theta}_1^2+{\dot \theta}_2^2\right )-\frac{g}{2}\left (\theta_1^2+\theta_2^2\right )-\frac{\kappa}{2}\left (\theta_1-\theta_2\right )^2.\] The equations which follow from this are \[\begin{split} {\ddot \theta}_1 +(g+\kappa)\theta_1 - \kappa \theta_2 &= 0\\ {\ddot \theta}_2 - \kappa \theta_1 + (g+\kappa) \theta_2 &=0\, .\end{split}\] If one arranges the equations of motion in this way, so that all the terms proportional to \(\theta_1\) and those proportional to \(\theta_2\) appear in columns, then it is straightforward to read off the elements of the matrix \(\mathsf{A}\) from the equations as \[\mathsf{A}=\begin{pmatrix} g+\kappa & -\kappa \\ -\kappa & g+\kappa \end{pmatrix}.\] Solving for the eigenvalues of \(\mathsf{A}\) we find that \(\lambda = g\) or \(g+2\kappa\), with eigenvectors \((1,1)\) or \((1,-1)\) respectively. So we can write the normal modes as \[\begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix} = \begin{pmatrix}1\\1 \end{pmatrix} e^{\pm i\sqrt{g} t} \quad {\rm or} \quad \begin{pmatrix}\theta_1\\ \theta_2\end{pmatrix} = \begin{pmatrix}1\\-1 \end{pmatrix} e^{\pm i\sqrt{g+2\kappa} t}\, .\] The first of these has \(\theta_1=\theta_2\) whilst the second has \(\theta_1=-\theta_2\). These two normal modes can be pictured as follows:

[Figure: the two normal modes: the pendula swinging in phase (\(\theta_1=\theta_2\)) and in antiphase (\(\theta_1=-\theta_2\)).]

For the normal mode which has \(\theta_1=\theta_2\), the spring always remains exactly of length \(d\), and therefore stays unextended and exerts no force. As a result, the angular frequency of this normal mode is \(\sqrt{g}\), which does not involve the spring constant \(\kappa\). On the other hand, for the second normal mode the pendula move in opposite directions, and in this case the spring stretches and contracts, enhancing the effect of gravity. This results in an angular frequency \(\sqrt{g+2\kappa}\), greater than that of the first normal mode, in which only gravity plays a role.

The general solution of the system is thus given by \[\begin{split} \begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} & = \begin{pmatrix}1 \\ 1\end{pmatrix} \left[ \alpha^{(1)} \cos(t\sqrt{g}) + \beta^{(1)}\sin(t\sqrt{g}) \right] \\ & \phantom{=} + \begin{pmatrix}1 \\ -1\end{pmatrix} \left[ \alpha^{(2)} \cos(t\sqrt{g+2\kappa}) + \beta^{(2)}\sin(t\sqrt{g+2\kappa}) \right] \end{split}\] with \(\alpha^{(i)}\) and \(\beta^{(i)}\) arbitrary constants. To see how the general solution helps in practice, let us use it to study what happens if we release the two masses from rest at \(t=0\), with \(\theta_1=-\theta_2=\delta\). Setting \(\theta_1=-\theta_2=\delta\) at \(t=0\) we find \[\begin{pmatrix} \delta \\ -\delta \end{pmatrix} = \begin{pmatrix} \alpha^{(1)} + \alpha^{(2)} \\ \alpha^{(1)} - \alpha^{(2)} \end{pmatrix}\] so \(\alpha^{(1)}=0\) and \(\alpha^{(2)}=\delta\). Similarly, the condition that the masses are released from rest is encoded in \[\begin{pmatrix} \dot{\theta}_1(t=0) \\ \dot{\theta}_2(t=0) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}\] which, taking derivatives of our general solution, is easily shown to lead to \[\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \beta^{(1)} + \beta^{(2)} \\ \beta^{(1)} - \beta^{(2)} \end{pmatrix}\] which implies \(\beta^{(1)}=\beta^{(2)}=0\). So we find that the motion is given by \[\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} \delta \\ -\delta \end{pmatrix} \cos(t\sqrt{g+2\kappa})\] which is an oscillatory motion in which the masses move oppositely, without changing the centre of mass, as one might have guessed.
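
As a cross-check of this whole example, the following sketch (with parameter values of our own choosing) diagonalises \(\mathsf{A}\) numerically and compares the closed-form motion above against a direct integration of \(\ddot{\boldsymbol{\theta}} = -\mathsf{A}\boldsymbol{\theta}\):

```python
import numpy as np
from scipy.integrate import solve_ivp

g, kappa, delta = 9.8, 2.0, 0.1
A = np.array([[g + kappa, -kappa],
              [-kappa, g + kappa]])

print(np.linalg.eigvalsh(A))  # -> [g, g + 2*kappa]

# Closed-form motion released from rest at theta = (delta, -delta).
t = np.linspace(0.0, 5.0, 500)
theta = np.array([delta, -delta])[:, None]*np.cos(np.sqrt(g + 2*kappa)*t)

# Direct integration of thetadd = -A theta from the same initial data.
rhs = lambda t, y: np.concatenate([y[2:], -A @ y[:2]])
sol = solve_ivp(rhs, (0.0, 5.0), [delta, -delta, 0.0, 0.0],
                t_eval=t, rtol=1e-10, atol=1e-12)
print(np.max(np.abs(sol.y[:2] - theta)))  # ~ 0
```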

4.2 Non-canonical kinetic terms

Finally, we consider configurations with non-canonical kinetic terms of the form \[L = \frac{1}{2}\sum_{i,j} \mathsf{B}_{ij}(q)\dot{q}_i\dot{q}_j - V(q)\, .\] We still obtain a linear differential operator if we restrict \(\mathsf{B}(q)\to \mathsf{B}(0)\). Physically, this corresponds to considering oscillations with not too much kinetic energy, which makes sense if we want to stay close to the minimum. The resulting equations of motion are \[\mathsf{B}\mathbf{\ddot{q}}+ \mathsf{A}{\bf q} = 0\] where we have defined \(\mathsf{B}\equiv\mathsf{B}(0)\), a constant matrix. \(\mathsf{B}\) generally does not have zero eigenvalues, since these would correspond to generalised coordinates without a kinetic term, so we will assume that it has none. This implies that \(\det(\mathsf{B})\neq 0\), and so \(\mathsf{B}^{-1}\) exists. We then have an equivalent set of equations \[\mathbf{\ddot{q}}+ \mathsf{B}^{-1}\mathsf{A}{\bf q} = 0\] which reduces to the case we have already studied if we define \(\mathsf{C}:=\mathsf{B}^{-1}\mathsf{A}\).

There is one small subtlety that needs to be mentioned here: the fact that \(\mathsf{A}\) was symmetric was quite important in our discussion above, since it ensured that its eigenvalues were real, but in general \(\mathsf{B}^{-1}\mathsf{A}\) will not be symmetric, even if both \(\mathsf{B}^{-1}\) and \(\mathsf{A}\) separately are. Let us assume, for simplicity, that \(\mathsf{A}\) is positive definite: that is, all its eigenvalues are strictly positive. Then there exists a symmetric matrix \(\mathsf{A}^{\frac{1}{2}}\) such that \((\mathsf{A}^{\frac{1}{2}})^2 = \mathsf{A}\), with a symmetric inverse \(\mathsf{A}^{-\frac{1}{2}}\). We can use this matrix to rewrite \[\mathsf{C}:=\mathsf{B}^{-1}\mathsf{A}= \mathsf{A}^{-\frac{1}{2}}\left(\mathsf{A}^{\frac{1}{2}}\mathsf{B}^{-1}\mathsf{A}^{\frac{1}{2}}\right)\mathsf{A}^{\frac{1}{2}}\] so we find that \(\mathsf{C}\) is similar (in the sense of similarity transformations of matrices) to \(\mathsf{A}^{\frac{1}{2}}\mathsf{B}^{-1}\mathsf{A}^{\frac{1}{2}}\). This matrix is manifestly symmetric, so its eigenvalues are real. Since similar matrices have the same eigenvalues, the eigenvalues of \(\mathsf{C}\) will be real too. It is straightforward to check that if \({\mathbf v}\) is an eigenvector of \(\mathsf{A}^{\frac{1}{2}}\mathsf{B}^{-1}\mathsf{A}^{\frac{1}{2}}\) with eigenvalue \(\lambda\), then \(\mathsf{A}^{-\frac{1}{2}}{\mathbf v}\) will be an eigenvector of \(\mathsf{C}\) with the same eigenvalue. So, in practice, we can simply compute the eigenvalues and eigenvectors of \(\mathsf{C}\), and proceed as we did above.
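
Numerically one does not even need the square-root trick: scipy can solve the generalised symmetric eigenproblem \(\mathsf{A}\mathbf{v} = \lambda \mathsf{B}\mathbf{v}\) directly, which is equivalent to diagonalising \(\mathsf{C}=\mathsf{B}^{-1}\mathsf{A}\). A small sketch, with randomly generated matrices as our own illustrative inputs:

```python
import numpy as np
from scipy.linalg import eigh  # solves A v = lam B v for symmetric A, SPD B

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
B = M @ M.T + 3*np.eye(3)   # positive definite kinetic matrix B(0)
N = rng.normal(size=(3, 3))
A = N @ N.T                 # positive semi-definite potential Hessian

# Eigenvalues of C = B^{-1} A, a generally non-symmetric matrix...
lams_C = np.linalg.eigvals(np.linalg.inv(B) @ A)
print(np.max(np.abs(lams_C.imag)))  # ~ 0: the eigenvalues are real

# ...agree with the generalised eigenvalues of A v = lam B v.
lams_gen = eigh(A, B, eigvals_only=True)
print(np.allclose(np.sort(lams_C.real), np.sort(lams_gen)))  # True
```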

5 Fields and the Wave Equation

5.1 Variational Principle for Continuous Systems

In the next section we will derive the equations of motion for the string. Before going into the details of that particular system, we will derive in general how to deduce the Euler-Lagrange equations for fields, which is a simple generalisation of what we did in the case of systems with a finite number of degrees of freedom.

Assume that we can express the action \(S\) in terms of some Lagrangian density \(\mathcal{L}\) (we will determine \(\mathcal{L}\) for the string in the next section) \[S = \int\!\! dt \!\!\int\!\! dx \,\, \mathcal{L}(u, u_t, u_x, x, t)\] where we have introduced for convenience the notation \[u_x :=\frac{\partial u}{\partial x} \qquad ; \qquad u_t :=\frac{\partial u}{\partial t}\, .\]

I emphasize that the “\(x\)” coordinate plays a significantly different role in field theory than it did for the point particle: in field theory “\(x\)” is a coordinate that fields depend on, and it is on the same footing as “\(t\)”. They are, in particular, independent variables, and they are not generalized coordinates.

On the other hand, for the point particle we often denoted by “\(x(t)\)” the position of the particle, which was a generalized coordinate that for any given path was a function of time. In field theory the closest thing to this “\(x\)” is the field value “\(u(x,t)\)”.

There is an important notational point that I want to clarify: say that we have a Lagrangian density \(\mathcal{L}(u,u_x,u_t,x,t)\) depending on the field, its first derivatives, and \(x\) and \(t\) themselves. Then we have two notions of “derivative of \(\mathcal{L}\) with respect to \(t\)” (the following discussion generalizes straightforwardly to \(x\), so I will not consider this case separately). We might mean either:

  1. The derivative with respect to any explicit appearances of \(t\), keeping \(u\), \(u_x\), \(u_t\) and \(x\) fixed.

  2. The derivative of \(\mathcal{L}\) with respect to \(t\), taking into account that \(u\), \(u_x\) and \(u_t\) are functions of \(t\), so we need to use the chain rule.

In the context of the point particle, we denoted the first derivative “\(\partial/\partial t\)” and the second “\(d/dt\)”.

In the context of field theory it is more common and useful to switch conventions, and denote the second option by \(\partial\mathcal{L}/\partial t\). That is, we define: \[\begin{split} \frac{\partial \mathcal{L}(u,u_x,u_t,x,t)}{\partial t} :=\lim_{h\to 0} \frac{1}{h}(&\mathcal{L}(u(x,t+h), u_x(x,t+h), u_t(x,t+h), x, t+h) \\ & - \mathcal{L}(u(x,t), u_x(x,t), u_t(x,t), x, t)) \end{split}\] We will simply never need to consider the first notion of partial derivative in the context of fields during this course, so this leads to no ambiguity.

The main reason to switch conventions is that this reproduces the natural definition: \[u_t :=\frac{\partial u(x,t)}{\partial t}\] that we gave above, since the meaning of the derivative here is the usual one: we are varying \(t\) keeping \(x\) fixed.

In this case we expect to be able to derive the equations of motion for the system by making use of the variational principle we discussed in previous sections. To see how this goes, consider a solution of the equations of motion \(u_s(x,t)\), and a small variation \(\delta u(x,t)\) around it: \[u(x,t) = u_s(x,t) + \delta u(x,t)\, .\] If \(u_s\) is indeed a stationary function for the action, we expect the first order change in \(S\) to vanish: \[\delta S = S[u_s+\delta u] - S[u_s] = \int\!\! dt \!\!\int\!\! dx \,\, \left(\delta u \frac{\partial \mathcal{L}}{\partial u} + \delta u_x \frac{\partial \mathcal{L}}{\partial u_x} + \delta u_t \frac{\partial \mathcal{L}}{\partial u_t} \right) + O((\delta u)^2)\]

We will work to first order, and drop the \(O((\delta u)^2)\) terms henceforth. Now, for our variations we have \[\delta u_x = \delta \left(\frac{\partial u}{\partial x}\right) = \frac{\partial}{\partial x}(\delta u) \qquad ; \qquad \delta u_t = \delta \left(\frac{\partial u}{\partial t}\right) = \frac{\partial}{\partial t}(\delta u)\] which allows us to integrate \(\delta S\) by parts, in order to obtain \[\delta S = \int\!\! dt \!\!\int\!\! dx \,\, \delta u \left(\frac{\partial \mathcal{L}}{\partial u} - \frac{\partial}{\partial x} \left(\frac{\partial \mathcal{L}}{\partial u_x}\right) - \frac{\partial}{\partial t} \left(\frac{\partial \mathcal{L}}{\partial u_t}\right) \right) + \int\!\!dt \left[\delta u \frac{\partial \mathcal{L}}{\partial u_x}\right]_{x_i}^{x_f} + \int\!\!dx \left[\delta u \frac{\partial \mathcal{L}}{\partial u_t}\right]_{t_i}^{t_f}\] If we assume that we hold \(u\) fixed at the endpoints both in \(x\) and \(t\), the last two terms on the right vanish. Imposing \(\delta S = 0\) for arbitrary \(\delta u\) then implies, by the fundamental lemma of the calculus of variations, the generalised Euler-Lagrange equation for fields \[\frac{\partial \mathcal{L}}{\partial u} - \frac{\partial}{\partial x} \left(\frac{\partial \mathcal{L}}{\partial u_x}\right) - \frac{\partial}{\partial t} \left(\frac{\partial \mathcal{L}}{\partial u_t}\right) = 0\, .\]

Here are some easy generalisations. Clearly, if we have \(n\) fields \(u^{(i)}\) we end up with \(n\) generalised equations of motion: \[\frac{\partial \mathcal{L}}{\partial u^{(i)}} - \frac{\partial}{\partial x} \left(\frac{\partial \mathcal{L}}{\partial u^{(i)}_x}\right) - \frac{\partial}{\partial t} \left(\frac{\partial \mathcal{L}}{\partial u^{(i)}_t}\right) = 0 \qquad \text{for all } i\, .\] Another easy generalisation is to consider fields that depend on more than two coordinates. If we replace \((t,x)\) by a set of \(d\) coordinates \(x_k\) we have \[\frac{\partial \mathcal{L}}{\partial u^{(i)}} - \sum_{k=1}^d\frac{\partial}{\partial x_k} \left(\frac{\partial \mathcal{L}}{\partial u^{(i)}_k}\right) = 0 \qquad \text{for all } i\] where we have defined \(u_k^{(i)}:=\frac{\partial u^{(i)}}{\partial x_k}\).

5.2 Example: the wave equation from the Lagrangian for a string

Our main example will be a Lagrangian density that can be thought of as the Lagrangian density for a one-dimensional string oscillating in one dimension. The standard name for this Lagrangian is the “massless scalar field” Lagrangian.

The massless scalar field Lagrangian is \[\mathcal{L}:=\frac{1}{2} \rho u_t^2 - \frac{1}{2}\tau u_x^2\, .\] We refer to the constants \(\rho\) and \(\tau\) as the density and tension, respectively. The field “\(u\)” in this expression is the massless scalar.

It is in fact possible, and we do this in section 5.2.1 below, to derive this Lagrangian density from the physics of an idealized string in the limit in which the oscillations are small. This explains the origin of the labels “density” and “tension” above. I emphasize that the uses of this Lagrangian in Mathematical Physics go well beyond explaining vibrating strings.

The Euler-Lagrange equations for fields immediately imply the equation of motion \[\rho u_{tt} - \tau u_{xx} = 0\] for the massless scalar \(u\), where \[u_{tt}=\frac{\partial u_t}{\partial t} = \frac{\partial^2 u}{\partial t^2}\] and similarly for \(u_{xx}\). Introducing for convenience \(c^2=\tau/\rho\) (both the tension and the density are assumed to be positive, so \(c\) is real), the equation of motion for the massless scalar becomes: \[u_{tt} = c^2 u_{xx}\, .\] We will refer to this equation as the wave equation. More precisely, what we are describing here is known as the wave equation in one spatial dimension.

5.2.1 Derivation of the massless scalar Lagrangian from a physical system

We will now derive the massless scalar Lagrangian from the dynamics of a string vibrating in one dimension, in the approximation where the displacements are small. Similarly to the case of point particles, the Lagrangian density can be constructed in terms of the kinetic and potential energy densities. That is, if we have \[T(u,u_x,u_t,x,t) = \int\!\! dx \, \mathcal{T}(u,u_x,u_t,x,t)\] and \[V(u,u_x,u_t,x,t) = \int\!\! dx \, \mathcal{V}(u,u_x,u_t,x,t)\] for the total kinetic energy \(T\) and total potential energy \(V\) of the string, then we call \(\mathcal{T}\) and \(\mathcal{V}\) the corresponding densities of kinetic and potential energy, and we have \[\mathcal{L}= \mathcal{T}-\mathcal{V}\] So we need to find expressions for the kinetic and potential energy densities. We will work to leading (that is, quadratic) order in \(u_x\) and \(u_t\). This is the regime in which the oscillations are neither too large nor too fast. We do this because it leads to much simpler equations, while still being quite useful for modelling many systems in Nature. Similarly, we will assume that the string is only displaced vertically, without any horizontal displacement.

The kinetic energy can be obtained relatively straightforwardly by subdividing the string into small pieces. Consider the small piece lying between \(x\) and \(x+\delta x\). If the segment is small enough its behaviour will be approximately point-like; therefore its kinetic energy will be of the form \(\frac{m}{2}v^2\). The mass of the small segment of string is given by \[m=\rho\, ds \approx \rho \sqrt{1+(u_x)^2}\, \delta x \approx \rho\, \delta x.\] Here \(\rho\) is the density of the string (which we take to be constant), and \(ds\) the arc-length of the string segment. The final approximation follows from taking \(u_x \ll 1\). Since \(u(x,t)\) denotes the vertical displacement of the string it is clear that the vertical velocity is \(u_t\). The contribution to the kinetic energy from the small piece of string that we are considering is then \(\frac{1}{2}(u_t)^2\rho \, \delta x\). We then immediately obtain the kinetic energy of the whole string by integrating over all the segments to find that the kinetic energy is given by \[T = \frac{\rho}{2}\int_{-\infty}^\infty \!\! dx \left ( u_t \right )^2\] so the kinetic energy density is \[\mathcal{T}=\frac{\rho}{2}(u_t)^2 \, .\]

Obtaining the potential energy is a little bit more subtle. We know that the tension in the string is a constant, which we call \(\tau\). It follows that the work done in extending the string’s length by a distance \(\delta l\) will be \(\tau \delta l\). If we imagine extruding the entire length of the string from a point we reach the conclusion that the potential energy of the string is \(\tau\) times its length. Of course, our string is infinitely long, so that this may initially be a concern, until we recall that adding a constant to the potential energy makes no difference. We are not really interested in the absolute value of the potential energy, but rather the differences in potential energy between string in various configurations. Therefore we will take the potential energy of a string in some configuration \(u(x,t)\) to be defined as \(\tau\) times the difference in length between the string with shape \(u(x,t)\) and the length of the undisturbed string lying along the \(x\)-axis for which \(u(x,t)=0\). To be more precise we have \[\begin{aligned} V &=& \tau\left ( \int_{-\infty}^{\infty} ds - \int_{-\infty}^{\infty}dx\right )\\ &=& \tau\left ( \int_{-\infty}^{\infty}( \sqrt{1+(u_x)^2} - 1)dx\right )\\ &\approx& \tau \left ( \int_{-\infty}^{\infty}(1+\frac{(u_x)^2}{2} - 1)dx\right )\\ &=&\frac{\tau}{2} \int_{-\infty}^{\infty}(u_x)^2 \, dx\end{aligned}\] again to leading order in oscillations. From here we obtain the potential energy density \[\mathcal{V}= \frac{\tau}{2}(u_x)^2\] and thus the Lagrangian density \[\mathcal{L}= \mathcal{T}-\mathcal{V}= \frac{\rho}{2}(u_t)^2 - \frac{\tau}{2} (u_x)^2\, .\]

5.3 D’Alembert’s Solution to the Wave Equation

The general solution to the wave equation in one spatial dimension was given by D’Alembert, and it is simply \[u(x,t)=f(x-ct)+g(x+ct)\] where \(f\) and \(g\) are arbitrary functions. The part of the solution \(f(x-ct)\) corresponds to a wave moving to the right with speed \(c\), whilst the remaining part \(g(x+ct)\) corresponds to a wave moving to the left with speed \(c\).

D’Alembert’s solution \(u(x,t)=f(x-ct)+g(x+ct)\) is the general solution to the wave equation.

We introduce new variables \(x_+=x+ct\) and \(x_-=x-ct\), or equivalently \(x=\frac{1}{2}(x_++x_-)\) and \(t=\frac{1}{2c}(x_+-x_-)\). By the Chain Rule: \[\begin{split} \frac{\partial u}{\partial x} & = \frac{\partial u}{\partial x_+}\frac{\partial x_+}{\partial x} + \frac{\partial u}{\partial x_-}\frac{\partial x_-}{\partial x} = \frac{\partial u}{\partial x_+} + \frac{\partial u}{\partial x_-}\,, \\ \frac{\partial u}{\partial t} & = \frac{\partial u}{\partial x_+}\frac{\partial x_+}{\partial t} + \frac{\partial u}{\partial x_-}\frac{\partial x_-}{\partial t} = c\left(\frac{\partial u}{\partial x_+} - \frac{\partial u}{\partial x_-}\right)\, . \end{split}\] Taking derivatives again, once more using the Chain Rule: \[\begin{split} \frac{\partial^2 u}{\partial x^2} & = \frac{\partial}{\partial x}\left(\frac{\partial u}{\partial x_+} + \frac{\partial u}{\partial x_-}\right) = \frac{\partial}{\partial x_+}\left(\frac{\partial u}{\partial x_+} + \frac{\partial u}{\partial x_-}\right) + \frac{\partial}{\partial x_-}\left(\frac{\partial u}{\partial x_+} + \frac{\partial u}{\partial x_-}\right) \\ & = \frac{\partial^2u}{\partial x_+^2} + \frac{\partial^2u}{\partial x_-^2} + 2 \frac{\partial^2u}{\partial x_+\partial x_-}\,,\\ \frac{\partial^2 u}{\partial t^2} & = c\frac{\partial}{\partial t}\left(\frac{\partial u}{\partial x_+} - \frac{\partial u}{\partial x_-}\right) = c^2\left[\frac{\partial}{\partial x_+}\left(\frac{\partial u}{\partial x_+} - \frac{\partial u}{\partial x_-}\right) - \frac{\partial}{\partial x_-}\left(\frac{\partial u}{\partial x_+} - \frac{\partial u}{\partial x_-}\right)\right] \\ & = c^2\left(\frac{\partial^2u}{\partial x_+^2} + \frac{\partial^2u}{\partial x_-^2} - 2 \frac{\partial^2u}{\partial x_+\partial x_-}\right)\, . \end{split}\] As usual, we have used the assumption that partial derivatives commute. We see that in these variables the wave equation \(u_{tt}=c^2u_{xx}\) becomes \[u_{tt} - c^2u_{xx} = -4c^2 \frac{\partial^2u(x_+,x_-)}{\partial x_+ \partial x_-} = 0\, .\] The general solution of this equation is indeed \[u(x_+, x_-) = f(x_-) + g(x_+)\, .\]
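
If you prefer to let a computer do the bookkeeping, the following sympy sketch checks directly that any \(u=f(x-ct)+g(x+ct)\) satisfies the wave equation:

```python
import sympy as sp

x, t, c = sp.symbols('x t c')
f, g = sp.Function('f'), sp.Function('g')

u = f(x - c*t) + g(x + c*t)
print(sp.simplify(sp.diff(u, t, 2) - c**2*sp.diff(u, x, 2)))  # prints 0
```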

In practice, we are often interested in understanding what happens if we release a string from a given configuration. How does the string evolve? This is an initial value problem, which D’Alembert also solved in general. Assume that we are told that at \(t=0\) the string has profile \(\varphi(x)\), that is \[u(x,0) = \varphi(x)\] and in addition we know with which speed the string is moving at that instant: \[u_t(x,0) = \psi(x)\, .\] In terms of \(f\) and \(g\), which parametrise the general form of the solution, these equations are \[f(x) + g(x) = \varphi(x)\] and \[-c f'(x) + c g'(x) = \psi(x)\, .\] This last equation can be integrated (formally) to give \[g(x) - f(x) = d + \frac{1}{c}\int_{-\infty}^x \!ds\, \psi(s)\] with \(d\) some unknown constant. We now have two equations for two unknowns, so solving for \(f\) and \(g\) we find \[\begin{aligned} f(x) & = \frac{1}{2}\left(\varphi(x) - d - \frac{1}{c}\int_{-\infty}^x \!ds\, \psi(s)\right) \\ g(x) & = \frac{1}{2}\left(\varphi(x) + d + \frac{1}{c}\int_{-\infty}^x \!ds\, \psi(s)\right)\end{aligned}\] so we finally find \[\begin{split} u(x,t) & =f(x-ct) + g(x+ct) \\ & = \frac{\varphi(x-ct)+\varphi(x+ct)}{2} + \frac{1}{2c}\int_{x-ct}^{x+ct} \!ds\, \psi(s)\, . \end{split}\]
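
The final formula is easy to turn into a small program. The sketch below (the Gaussian initial profile and zero initial velocity are our own illustrative choices) evaluates D’Alembert’s solution at arbitrary \((x,t)\):

```python
import numpy as np
from scipy.integrate import quad

c = 1.0
phi = lambda x: np.exp(-x**2)   # initial profile u(x, 0)
psi = lambda x: 0.0*x           # initial velocity u_t(x, 0): released from rest

def u(x, t):
    """D'Alembert's solution for initial data (phi, psi)."""
    integral, _ = quad(psi, x - c*t, x + c*t)
    return 0.5*(phi(x - c*t) + phi(x + c*t)) + integral/(2.0*c)

print(u(0.0, 0.0))  # = phi(0) = 1
print(u(2.0, 2.0))  # ~ 0.5: half of the initial bump has travelled to x = 2
```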

5.4 Noether’s theorem for fields

The Lagrangian density that we found for the string does not involve \(u\) explicitly, which only enters through its derivatives. This is a situation analogous to that of having an ignorable coordinate in the case of point particles. So we should expect that there is a symmetry associated to this fact, generated by the infinitesimal transformation \(u\to u' = u + \epsilon\), and associated to this symmetry some conserved quantity, by some analogue for fields of Noether’s theorem. This analogue does exist, as we now describe.

It is illuminating to do this more generally, for \(d\) spatial dimensions, and arbitrary symmetries. So let us introduce coordinates \(x_0,\ldots,x_d\). The case \(d=1\) would have \(x_0=t\), \(x_1=x\). Our field \(u(x_0,\ldots,x_d)\) is a map from \(\mathbb{R}^{d+1}\to\mathbb{R}\). For convenience, we introduce the notation \[u_i :=\frac{\partial u}{\partial x_i}\, .\]

A symmetry (in the context of field theory) is a transformation \[u \to u' = u + \epsilon a(u)\] such that \[\delta \mathcal{L}= O(\epsilon^2)\] without having to use the equations of motion.

We could include a total derivative, or more precisely, a divergence, on the right hand side of the variation \(\delta\mathcal{L}\), as we did in the case of the point particle, but we ignore this possibility for simplicity.

We define the generalised momentum vector \[{\bf \Pi} :=\left(\frac{\partial\mathcal{L}}{\partial u_0}, \ldots, \frac{\partial\mathcal{L}}{\partial u_d}\right)\, .\]

Given a transformation generated by \(a\), we define the Noether current associated to the transformation by \[{\bf J} :=a {\bf \Pi}\, ,\] or in components \[J_i :=a \frac{\partial \mathcal{L}}{\partial u_i}\, .\]

[Noether’s theorem for fields] [thm:Noether-for-fields] If \(\bf J\) is the Noether current associated to a symmetry, then \[\begin{equation} \label{eq:current-conservation} \sum_{i=0}^d \frac{\partial J_i}{\partial x_i} = 0\, . \end{equation}\]

We can proceed analogously to what we did when proving Noether’s theorem for discrete systems. Under a generic transformation we have \[\delta\mathcal{L}= \epsilon a \frac{\partial \mathcal{L}}{\partial u} + \epsilon \sum_{i=0}^d \frac{\partial a}{\partial x_i} \frac{\partial \mathcal{L}}{\partial u_i} + O(\epsilon^2)\, .\] Using the Euler-Lagrange equations, this becomes \[\delta\mathcal{L}= \epsilon \sum_{i=0}^d \frac{\partial}{\partial x_i}\left(a\frac{\partial \mathcal{L}}{\partial u_i}\right) + O(\epsilon^2)\] which, equating with the explicit action of the symmetry on \(\mathcal{L}\), leads to \[\sum_{i=0}^d \frac{\partial}{\partial x_i} \left(a\frac{\partial \mathcal{L}}{\partial u_i} \right) = 0\, .\]

Given a Noether current \(\bf J\) associated to a transformation, we define the (Noether) charge density \[\mathcal{Q}:=J_0\, .\] Furthermore, in the \(d=1\) case (one spatial dimension) we define the charge contained in an interval \((a,b)\) to be \[Q_{(a,b)} :=\int_a^b \mathcal{Q}\, dx = \int_a^b J_0\, dx\, .\]

[prop:local-charge-conservation] Assume \(d=1\). Then \[\frac{dQ_{(a,b)}}{dt} = J_1(a) - J_1(b)\, .\]

Taking the derivative inside the integral, we have \[\begin{aligned} \frac{dQ_{(a,b)}}{dt} & = \frac{d}{dt}\int_a^b J_0 \, dx\\ & = \int_a^b \frac{\partial J_0}{\partial t} dx\, . \end{aligned}\] Now, in our \(d=1\) case the conservation equation \(\eqref{eq:current-conservation}\) becomes \[\frac{\partial J_0}{\partial t} + \frac{\partial J_1}{\partial x} = 0\] so replacing \(\frac{\partial J_0}{\partial t}\) by \(-\frac{\partial J_1}{\partial x}\) inside the integral above we have \[\frac{dQ_{(a,b)}}{dt} = - \int_a^b \frac{\partial J_1}{\partial x} dx = J_1(a) - J_1(b)\, .\]

The way to interpret proposition [prop:local-charge-conservation] is that it is telling us that the charge within some region changes only due to charge leaving or entering through the boundaries of the region. The current \(J_1\) measures how much charge is leaving or entering by unit time on a given boundary component.

Given a Noether current \(\bf J\) associated to a transformation, we define the Noether charge to be the total charge over all space. In the case of one spatial dimension (\(d=1\)) this is \[Q :=Q_{(-\infty, \infty)} = \int_{-\infty}^\infty J_0 \, dx\, .\]

Assume that \(d=1\), and \(\lim_{x\to \pm\infty} J_1=0\). Then \[\frac{dQ}{dt} = 0\] for the Noether charge associated to a symmetry.

This follows immediately from proposition [prop:local-charge-conservation], since we assume \(J_1(\pm \infty)=0\).

Let us apply all this abstract discussion to our guiding example, the one-dimensional string, and the symmetry arising from \(u\) being ignorable, namely \(u\to u+\epsilon\). In this case we have \(a=1\), so the Noether current is simply given by \[{\bf J} = {\bf \Pi} = \left(\frac{\partial\mathcal{L}}{\partial u_t}, \frac{\partial\mathcal{L}}{\partial u_x}\right) = (\rho u_t, -\tau u_x) \, .\] From here, we conclude that the Noether charge \[Q = \int \!dx \, {\bf J}_{0} = \rho \int \!dx \, u_{t}\] is conserved in time, assuming that \(J_1=-\tau u_x\) vanishes at infinity (in this case, since \(\tau\) is a non-zero constant, this is equivalent to \(u_x\) vanishing at infinity). Indeed \[\frac{dQ}{dt} = \rho \int\! dx \, u_{tt} = \tau \int\! dx \, u_{xx} = \tau \bigl[u_x \bigr]_{-\infty}^{+\infty} = 0\, ,\] where in the middle step we have used the wave equation \(\rho u_{tt}=\tau u_{xx}\) for the string.

5.5 The Energy-Momentum Tensor

In addition to the conservation laws for transformations of the field itself, we also expect conservation laws associated to transformations of \(x\) and \(t\). This is analogous to the fact that for systems with discrete degrees of freedom, we could construct an energy that satisfied \[\frac{dE}{dt} = -\frac{\partial L}{\partial t}\, .\] Since \(t\) does not appear explicitly in the Lagrangian density for the string, we would expect energy to be conserved for oscillations of the string too. And indeed, it will prove quite easy to show that the total energy of the string is conserved. But the situation for the string is more interesting than that for the point particle. The string’s energy is distributed along its length; some places may have no energy, whilst other parts of the string may be very energetic. As a wave packet travels, regions that had no energy may energise for some time, and then come back to having no energy. So we should not expect the energy density at any given point to be conserved. Additionally, in the case of fields the \(t\) and \(x\) directions are treated on equal footing, so there should be some generalised notion that treats the \(x\) variable the same as the \(t\) variable.

The energy-momentum tensor is \[T_{ij} :=\frac{\partial \mathcal{L}}{\partial u_j}\frac{\partial u}{\partial x_i} - \delta_{ij}\,\mathcal{L}\, .\]

The energy density \(\mathcal{E}\) is defined to be equal to \(T_{00}\).

As for the case of the point particle, you can convince yourself that this definition of the energy density agrees with the ordinary one whenever the Lagrangian density is of the form \(\mathcal{L}=\frac{1}{2}\rho u_t^2-\frac{1}{2}\tau u_x^2 - \mathcal{V}(u)\); that is, a kinetic energy density minus a potential energy contribution (which in this case contains a possible contribution from the string tension, plus an additional term \(\mathcal{V}(u)\) containing arbitrary extra contributions to the potential energy). See for instance example [example:energy-momentum-sensor-1d-string] below. In cases where the Lagrangian density is not of this form we can still define the energy-momentum tensor, and we simply define the energy density to be the \(T_{00}\) component.

The conservation laws for the energy-momentum tensor are: \[\sum_{j=0}^d \frac{\partial T_{ij}}{\partial x_j} = 0 \qquad \text{for each } i\, .\]

Consider the variation of the Lagrangian density \(\mathcal{L}(u,u_0,\ldots,u_d)\) as we move in the \(x_i\) direction. By the Chain Rule, this is given by \[\frac{\partial\mathcal{L}}{\partial x_i} = \frac{\partial\mathcal{L}}{\partial u} \frac{\partial u}{\partial x_i} + \sum_{j=0}^d \frac{\partial\mathcal{L}}{\partial u_j} \frac{\partial^2 u}{\partial x_i \partial x_j}\] Using the Euler-Lagrange equations for the field, we can rewrite this as \[\begin{split} \frac{\partial\mathcal{L}}{\partial x_i} & = \left(\sum_{j=0}^d\frac{\partial}{\partial x_j}\left(\frac{\partial \mathcal{L}}{\partial u_j}\right) \right) \frac{\partial u}{\partial x_i} + \sum_{j=0}^d \frac{\partial\mathcal{L}}{\partial u_j} \frac{\partial^2 u}{\partial x_i \partial x_j} \\ & = \sum_{j=0}^d\frac{\partial}{\partial x_j}\left(\frac{\partial \mathcal{L}}{\partial u_j}\frac{\partial u}{\partial x_i}\right) \end{split}\] or equivalently \[\sum_{j=0}^d \frac{\partial}{\partial x_j}\left(\frac{\partial \mathcal{L}}{\partial u_j}\frac{\partial u}{\partial x_i} - \delta_{ij}\mathcal{L}\right) = 0\, .\]

Note that we have \(d+1\) conservation equations for the energy-momentum tensor, one for each choice of “\(i\)”.

This may look a little complicated, but it is not hard to evaluate in practice. For instance, for our string we have \[T_{tt} = u_t\frac{\partial\mathcal{L}}{\partial u_t} - \mathcal{L}= \frac{\rho}{2} (u_{t})^2 + \frac{\tau}{2} (u_{x})^2\] which is indeed the energy density for the string. The rest of the components can be computed similarly, with the result \[T = \begin{pmatrix} \frac{\rho}{2} (u_{t})^2 + \frac{\tau}{2} (u_{x})^2 & -\tau u_t u_x \\ \rho u_t u_x & - \frac{\rho}{2} (u_{t})^2 - \frac{\tau}{2} (u_{x})^2 \end{pmatrix}\, .\] The conservation laws in the case of the string are then: \[\frac{\partial T_{tt}}{\partial t} + \frac{\partial T_{tx}}{\partial x} = 0\] and similarly \[\frac{\partial T_{xt}}{\partial t} + \frac{\partial T_{xx}}{\partial x} = 0\, .\] In order to see what these laws mean physically, let us denote the energy in the piece of string lying between \(x=a\) and \(x=b\) by \(E_{(a,b)}(t)\). Since we had that the energy density is given by \(T_{tt}\), we have that \[E_{(a,b)}=\int_{a}^{b} T_{tt}\, dx.\] The energy in this piece of string will not be conserved. It might be at rest at one time, then a few seconds later acquire energy as a wave passes between \(x=a\) and \(x=b\), and then later lose all its energy as the wave passes on. How the energy in this portion of the string varies is given by \[\begin{split} \frac{d}{dt}(E_{(a,b)}(t)) & = \frac{d}{dt} \int_a^b \! T_{tt}\, dx\\ & = \int_{a}^{b} \frac{\partial T_{tt}}{\partial t} \, dx \\ & = - \int_{a}^{b} \frac{\partial T_{tx}}{\partial x}\, dx \\ & = -\left[T_{tx}\right]_{a}^{b}\\ & = (T_{tx})_{x=a}-(T_{tx})_{x=b} \end{split}\] where in going from the second to the third line we have used the conservation law. In this way, the rate of change of the energy in the interval \((a,b)\) can be expressed in terms of the difference of a function evaluated at \(x=a\) and \(x=b\). If we interpret \(T_{tx}=-\tau u_t u_x\) as the flux of energy moving from left to right, then our formula says that the rate of change of the energy of the string in the interval \((a,b)\) equals the flux of energy coming into the segment from the left at \(x=a\), minus the flux of energy leaving the segment to the right at \(x=b\).
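
As a check, the following sympy sketch verifies the first of these conservation laws on an arbitrary D’Alembert solution, with \(c^2=\tau/\rho\):

```python
import sympy as sp

x, t = sp.symbols('x t')
rho, tau = sp.symbols('rho tau', positive=True)
c = sp.sqrt(tau/rho)

f, g = sp.Function('f'), sp.Function('g')
u = f(x - c*t) + g(x + c*t)      # an arbitrary solution of the wave equation
ut, ux = sp.diff(u, t), sp.diff(u, x)

Ttt = rho*ut**2/2 + tau*ux**2/2  # energy density
Ttx = -tau*ut*ux                 # energy flux
print(sp.simplify(sp.diff(Ttt, t) + sp.diff(Ttx, x)))  # prints 0
```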

Note that the rate of change of \(E\), the total energy on the whole string, is given by \[\frac{dE}{dt}=\frac{d}{dt}\left (E_{(-\infty,\infty)}\right )= \tau \left [ u_t u_x \right ]_{-\infty}^\infty.\] This rate of change vanishes, so that the total energy is conserved, provided that \(u_t u_x\to 0\) as \(|x|\to \infty\). In other words, the energy is conserved provided none of it leaks away at infinity. If we disturb the string at \(t=0\) near \(x=0\), it will take an infinite amount of time before the disturbance propagates out to infinity, so indeed energy will be conserved.

5.6 Monochromatic Waves

We have already seen that we can write down a general solution to the wave equation, which is solvable as a result of its linearity. Below we will analyse what happens to waves in the presence of boundaries and junctions. This analysis is often simplified if, rather than considering what happens to an arbitrary wave on the string, we ‘decompose’ the wave into its various constituent wavelengths and consider what happens to each wavelength separately. Using the linearity of the wave equation, the full answer can then be reconstructed by superposing the solution for the constituent wavelengths. A physical analogy would be to imagine the wave to be a light wave. One finds out how red, orange, yellow, green, blue, indigo and violet light behave, and then deduces how a general light wave behaves by mixing the colours together. More mathematically, one is simply Fourier analysing the signal. For example, a right moving wave can be written as a sum, or more precisely an integral, over waves with different frequencies as follows: \[u(x,t)=f(x-ct)=\int_{-\infty}^{\infty} dk\, A(k) e^{ik(x-ct)}.\] The solutions with a definite frequency, or monochromatic waves, are \(A(k) e^{ik(x-ct)}\). We have chosen to work with complex exponentials rather than cosines and sines, as this makes life easier, but if we need to recover a real solution we can take instead \[u_{k}=\Re \left (A(k) e^{ik(x-ct)}\right )=\Re \left (|A| e^{i\theta} e^{ik(x-ct)}\right )=|A|\cos(k(x-ct)+\theta).\] The graph of \(u_k\) shows that \(|A|\) is the amplitude of the wave and that the wavelength is \(2\pi/k\).

[Figure: graph of \(u_k\), showing the amplitude \(|A|\) and the wavelength \(2\pi/k\).]

A monochromatic wave moving to the left is given by \(u(x,t)=A(k) e^{-ik(x+ct)}\), or we can again take the real part of this to obtain a real solution.

Let us calculate the energy flux of a monochromatic wave. The expression \(T_{tx}=-\tau u_t u_x\) we derived measures the flux of energy carried by a solution past a point, moving from left to right, so we should expect the answer to be positive for a right moving wave. Taking our solution to be \(u(x,t)=u_k\) defined above we see that the flux is given by \[\begin{split} T_{tx} = -\tau (u_k)_t (u_k)_x &= - \tau \left (kc|A|\sin(k(x-ct)+\theta)\right )\left (-k|A|\sin(k(x-ct)+\theta)\right )\\ &= \tau c k^2|A|^2 \sin^2(k(x-ct)+\theta)\end{split}\] which is clearly positive, although it fluctuates with time. If we average over a whole period we see that the average energy passing a point per unit time is given by \[\frac{kc}{2\pi}\int_0^{\frac{2\pi}{kc}} \tau c k^2|A|^2 \sin^2(k(x-ct)+\theta)\, dt =\frac{\tau c k^2|A|^2 }{2}.\] Note that the energy flux proved to be positive, as we had predicted. If we had performed the same calculation on a left moving wave \(u=\Re \left (A(k) e^{-ik(x+ct)}\right )\) we would find the average flux to be \(-\tau c k^2|A|^2 /2\).
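
The time average above is another computation easily delegated to sympy; a minimal sketch (writing `A` for \(|A|\)):

```python
import sympy as sp

x, t, k, c, tau, A, theta = sp.symbols('x t k c tau A theta', positive=True)
u = A*sp.cos(k*(x - c*t) + theta)        # the real monochromatic wave u_k

flux = -tau*sp.diff(u, t)*sp.diff(u, x)  # T_tx = -tau u_t u_x
period = 2*sp.pi/(k*c)
avg = sp.integrate(flux, (t, 0, period))/period
print(sp.simplify(avg))                  # -> A**2*c*k**2*tau/2
```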

5.7 Strings with Boundaries

Now that we know how to deal with infinitely long strings which run from \(x=-\infty\) to \(x=\infty\), let us complicate the situation a bit by introducing a boundary, or end, to our string at \(x=0\). The string is still infinitely long but now runs from \(x=-\infty\) to \(x=0\). In such a situation it is necessary to specify a boundary condition at \(x=0\), describing how the string interacts with the boundary. The most natural thing that we can impose is that no energy flows into the boundary. This is what one should expect if the string is attached to a rigid boundary of infinite mass: in this (idealised) case the vibrations of the string do not affect the boundary at all, and in particular there is no energy flow into the boundary.

We have seen above that the right-moving energy flux for the string is \(T_{tx}=-\tau u_x u_t\). So the condition that no energy flows into the boundary is \[\lim_{x\to 0^-} T_{tx}(x,t) = -\lim_{x\to 0^-} \tau u_x(x,t) u_t(x,t) = 0\, .\] There are two natural solutions to this equation: \(\lim_{x\to 0^-} u_t(x,t)=0\) and \(\lim_{x\to 0^-} u_x(x,t)=0\). For convenience, at the cost of some slight imprecision, we will refer to these conditions as \(u_t(0,t)=0\) and \(u_x(0,t)=0\). We study them in turn.

5.7.1 Dirichlet boundary condition

The first case, \(u_t(0,t)=0\), is perhaps the most natural: it enforces that the endpoint of the string at \(x=0\) does not change with time, or in other words that \(u(0,t)\) is a constant. This is what you get if you simply tie a string to a wall. Given that there is a shift symmetry for \(u\), let us simply assume that the condition is \(u(0,t)=0\). This is called a Dirichlet boundary condition.

[Figure: a string attached to a wall at \(x=0\).]

It is quite straightforward to find the general solution in this case. We know that \(u(x,t)\) satisfies the wave equation for \(x<0\), so the solution must be of D’Alembert’s form \[u(x,t)=f(x-ct)+g(x+ct)=f(x-ct)+h(-x-ct)\] where for convenience we have introduced a function \(h(\xi)=g(-\xi)\). The boundary condition tells us that \[u(0,t)=0=f(-ct)+h(-ct),\] from which it follows that \(h(\xi)=-f(\xi)\), and hence \(u(x,t)=f(x-ct)-f(-x-ct)\).

To understand this solution a little better, note that, considered as a function on the whole of the \(x\)-axis, \(u(x,t)\) is an odd function in \(x\); that is \(u(x,t)=-u(-x,t)\).

[Figure: the solution \(u(x,t)\) extended to the whole \(x\)-axis as an odd function, with the incoming wave in the physical region and its inverted mirror image shown dotted.]

The figure shows the solution \(u(x,t)\) for all \(x\). In the physical region there is a wave moving towards the boundary. The dotted line represents a mirror image of the physical string. This mirror image moves to the left, and after some time will pass the line \(x=0\), emerging into the physical region \(x<0\) as the reflected wave. At later times the solution will look like the figure below.

[Figure: the solution at later times, with the reflected wave moving away from the boundary, inverted.]

So we see from this that waves reflect off the boundary and are turned upside down by this boundary condition.

5.7.2 Neumann boundary condition

The other classic boundary condition for a string is the Neumann (sometimes called free) boundary condition \(u_x(0,t)=0\). Again the flux of energy into the boundary vanishes, so that energy is conserved on the string. Once more we can deduce the general solution from D’Alembert’s solution \(u(x,t)=f(x-c t)+h(-x-ct)\). Demanding that \(u_x(0,t)=0\) gives us that \[u_x(0,t)=f'(0-ct)-h'(0-ct)=0\] from which we deduce that it is possible to take \(f(\xi)=h(\xi)\) (up to a constant shift of \(u\)), so that \[u(x,t)=f(x-c t)+f(-x-ct).\] In this case, the function \(u(x,t)\) considered over the whole line is an even function.

[Figure: the even extension of the solution for the Neumann boundary condition, with the mirror image the same way up as the incoming wave.]

As before, given enough time the mirror image of the incoming wave emerges from behind the boundary \(x=0\) as the reflected wave, but in this case since \(u(x,t)\) is even rather than odd it will emerge the same way up as the incoming wave.

[Figure: the solution at later times, with the reflected wave emerging the same way up as the incoming wave.]

5.8 Junctions

Junctions or defects afford another possible way of introducing boundary conditions. We shall explain the idea of junctions through an example.

[Figure 3: a string with a spring of constant \(\kappa\) attached at \(x=0\).]

Consider a setup in which we attach at \(x=0\) a spring, with constant \(\kappa\) and zero natural length, to the string, as in figure 3. We can view this system as two strings, one on the right and another on the left, joined at a junction at \(x=0\). Away from the junction at \(x=0\) we have a vanilla string, so we expect the monochromatic wave to be a good solution there. We want to understand what happens to such a monochromatic wave coming from the left as it hits the junction. Physically, we expect that part of the wave will be transmitted across the junction, and part will be reflected.

In order to solve the problem, it is essential to introduce junction conditions, describing which conditions \(u\) should satisfy as we cross the junction. The first condition is straightforward, namely that \(u\) is continuous at \(x=0\): \[\begin{equation} \label{eq:junction-continuity} \lim_{x\to 0^-} u(x,t) = \lim_{x\to 0^+} u(x,t)\, . \end{equation}\]

The second condition is energy conservation across the junction. In order to formulate this, note that on an infinitesimal neighbourhood \([-\epsilon, \epsilon]\) of \(x=0\) we have the energy \[E_{(-\epsilon,\epsilon)} = \frac{1}{2}\kappa\, u(0,t)^2 + \int_{-\epsilon}^{+\epsilon} dx \left(\frac{1}{2}\rho (u_t)^2 + \frac{1}{2}\tau (u_x)^2\right)\, .\] That is, there is a contribution coming from the vibrating string between \(-\epsilon\) and \(\epsilon\), and a contribution from the extended spring at \(x=0\). We will assume that \[\lim_{\epsilon\to 0} \int_{-\epsilon}^{+\epsilon} dx \left(\frac{1}{2}\rho (u_t)^2 + \frac{1}{2}\tau (u_x)^2\right) = 0\] so that the only contribution to the total energy of the small interval in the limit \(\epsilon\to 0\) is the one coming from the extension of the spring. Conservation of energy then tells us that any change in this energy must be accounted for by the flux of energy into the junction: \[\begin{equation} \label{eq:junction-energy-conservation} \frac{d}{dt}\left(\lim_{\epsilon\to 0}E_{(-\epsilon,\epsilon)}\right) = \lim_{\epsilon\to 0}\left[(T_{tx})_{x=-\epsilon} - (T_{tx})_{x=+\epsilon}\right]\, . \end{equation}\]

As an example, suppose that we send in a monochromatic wave of unit amplitude. We expect that upon encountering the spring, this will be partially reflected into a left moving wave on the left side of the string, and partially transmitted to a right moving wave on the right side of the string. Putting this together, our ansatz is \[u(x,t) = \begin{cases} \Re\left(\left (e^{ipx}+Re^{-ipx}\right )e^{-ipct}\right) & \text{for } x \le 0\\ \Re\left(Te^{ip(x-c t)}\right) & \text{for } x > 0 \end{cases}\] where \(R\) and \(T\) give the amplitude/phase of the reflected and transmitted waves, respectively. Away from \(x=0\) we have monochromatic waves, which satisfy the wave equation. All that remains is to ensure that the ansatz also satisfies the junction conditions, by adjusting \(R\) and \(T\). Continuity of \(u(x,t)\) at \(x=0\) — that is, equation \(\eqref{eq:junction-continuity}\) — implies \[\Re((1+R)e^{-ipct}) = \Re(Te^{-ipct})\, .\] This will hold for all \(t\) if and only if \[1+R=T\, .\] This is our first junction condition in this case.

In order to study energy conservation, as given by equation \(\eqref{eq:junction-energy-conservation}\), it is convenient to note that for our monochromatic wave solution continuity of \(u(x,t)\) at \(x=0\) (or equivalently \(1+R=T\), as we just showed) implies \[\lim_{x\to 0^-} u_t(x,t) = \lim_{x\to 0^+} u_t(x,t)\, .\] In other words, \(u_t(x,t)\) is continuous at \(x=0\), and \(u_t(0,t)\) is well defined.

We have computed above that \(T_{tx}=-\tau u_xu_t\), and we have that \(\lim_{\epsilon\to 0} E(-\epsilon,\epsilon)=\frac{1}{2}\kappa\, u(0,t)^2\), so in the current case energy conservation across the junction becomes \[\begin{equation} \label{eq:junction-example-energy-conservation} \kappa \, u(0,t) u_t(0,t) = \lim_{\epsilon\to 0}\tau \biggl[u_tu_x\biggr]_{-\epsilon}^{+\epsilon}\, . \end{equation}\] This is our second junction condition. This equation can be simplified, since there is a factor of \(u_t(0,t)\) on both sides that we can divide out, to obtain: \[\kappa u(0,t) = \tau \left[\lim_{x\to 0^+} u_x(x,t) - \lim_{x\to 0^-} u_x(x,t)\right]\, .\] Plugging in our candidate monochromatic solution, this is \[\kappa \Re((1+R)e^{-ipct}) = \tau\Re\left(ip(T-(1-R))e^{-ipct}\right)\] which holds for all \(t\) if and only if \[\kappa (1+R) = i\tau p (R+T-1)\, .\] Solving this equation together with the continuity equation \(1+R=T\) we find that \[\begin{aligned} R & = \frac{\kappa}{2ip\tau -\kappa}\\ T & = \frac{2ip\tau }{2ip\tau -\kappa}\end{aligned}\]

To get some intuition for these formulas, first assume that we make the spring very stiff by sending \(\kappa\to\infty\). Then \(R\to -1\) and \(T\to 0\). This is as we expect; if the spring becomes very stiff, then the left hand piece of string has its end effectively pinned so it has a Dirichlet boundary condition, and nothing gets through to the right hand side. On the other hand if we send \(\kappa\to 0\), then we are effectively removing the spring, and the two pieces of string will become as one. In this case we can explicitly see that as \(\kappa\to 0\), \(R\to 0\) and \(T\to 1\).

Alternatively, we can think of fixing \(\kappa\) and considering the effect on waves with different values of \(p\). The energy flux associated with the incoming wave, whose amplitude is fixed at one, is given by \[\frac{\tau c p^2|A|^2 }{2}=\frac{\tau c p^2 }{2}\, .\] If \(p\) is very small, that is to say the wavelength is very long, then the energy flux is very small, and it is hard for the wave to excite the spring since it does not have enough energy, so effectively we have Dirichlet boundary conditions. Indeed, in the limit \(p\to 0\) we find \(R\to -1\) and \(T\to 0\), the values for Dirichlet boundary conditions. On the other hand, if \(p\) is very large, the energy of the incoming wave is so large that the spring has little effect, and indeed \(R\to 0\), \(T\to 1\) in this limit.
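Since it is easy to make sign errors in this kind of computation, here is a minimal numerical check of these formulas (the values of \(\tau\) and \(\kappa\) below are arbitrary illustrative choices): continuity \(1+R=T\) holds, the reflected and transmitted energy fluxes add up to the incoming one (equivalently \(|R|^2+|T|^2=1\)), and the limits \(p\to 0\) and \(p\to\infty\) behave as described above.

```python
import numpy as np

# Arbitrary illustrative parameters.
tau, kappa = 1.0, 2.0

for p in [0.01, 1.0, 100.0]:
    R = kappa / (2j * p * tau - kappa)
    T = 2j * p * tau / (2j * p * tau - kappa)
    assert np.isclose(1 + R, T)                    # continuity at x = 0
    assert np.isclose(abs(R)**2 + abs(T)**2, 1.0)  # energy flux conservation
    print(f"p = {p:6.2f}:  |R| = {abs(R):.3f},  |T| = {abs(T):.3f}")
# p -> 0 gives R -> -1, T -> 0 (Dirichlet); p -> oo gives R -> 0, T -> 1.
```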

6 The Hamiltonian formalism

6.1 Phase space

So far we have discussed the Lagrangian formalism, in which the evolution of the system is determined by the Euler-Lagrange equations. Given a set of initial conditions, these equations determine the time evolution of the system in configuration space. Recall from §2.2 that this is the space described by the generalized coordinates \(\mathbf{q}\), without including the information about the velocities \(\mathbf{\dot{q}}\).

The Hamiltonian formalism is closely related to the Lagrangian formalism that we have been studying so far, but it starts from a slightly different perspective: instead of considering configuration space, we now want to consider the space of all states of our physical system. This space is known as phase space. I now define these notions.

The state of a classical system at a given instant in time is a complete set of data that fully fixes the future evolution of the system.

The Euler-Lagrange equations are second order differential equations on \(\mathbf{q}(t)\) (assuming that the Lagrangian depends on \(\mathbf{q}(t)\) and \(\mathbf{\dot{q}}(t)\) only, and not higher derivatives of \(\mathbf{q}(t)\)). We can fix the integration constants that appear in solving these equations by giving the positions \(\mathbf{q}(t_0)\) and velocities \(\mathbf{\dot{q}}(t_0)\) at any chosen time \(t_0\), for some convenient choice of generalised coordinates and velocities. Once we have fixed these constants we know the behaviour of the system for all future times, so in this case we can parametrize the state at a given time \(t\) by giving \(\mathbf{q}(t)\) and \(\mathbf{\dot{q}}(t)\).

The parametrization of the physical state in terms of \(\mathbf{q}(t)\) and \(\mathbf{\dot{q}}(t)\) is not the only possible one: any parametrisation that allows us to fully fix future evolution is valid. We will see an example of a different parametrisation momentarily.

The phase (or state) space \(\mathscr{P}\) of a classical system is the space of all possible states that the system can be in at a given instant in time.

This definition for phase space sounds rather similar to the definition of configuration space (this was definition [def:configuration-space]). But note that phase space has twice the dimension of configuration space: while configuration space encodes the (generalised) position of the system at a time \(t\), phase space encodes the generalised positions and the velocities.

Consider a particle moving in one dimension. Phase space in this case is the two dimensional plane \(\mathbb{R}^2\): one coordinate for \(x\) and one coordinate for \(\dot{x}\). Every possible point in this plane is a possible state for the particle. For instance, the point \((x,\dot{x})=(0,10)\) parametrizes a particle at the origin, moving toward positive values of \(x\). Similarly the particle moving in \(d\) dimensions has a phase space \(\mathbb{R}^{2d}\). Note that the precise form of the Lagrangian does not enter in our definition of phase space: given a point in phase space the Lagrangian will determine future evolution, but any point in phase space is acceptable as an initial condition (by definition).

The Hamiltonian formalism studies dynamics on phase space, parametrized by generalised coordinates \(\mathbf{q}(t)\) and their associated generalised momenta \(\mathbf{p}(t)\).

The fundamental step in going from the Lagrangian to the Hamiltonian formalism is to invert the defining equations for the generalised momenta: \[p_i :=\frac{\partial L(\mathbf{q},\mathbf{\dot{q}},t)}{\partial \dot{q}_i}\, .\] The right hand sides of these equations are a set of functions of \(\mathbf{q}\), \(\mathbf{\dot{q}}\) and \(t\). We can often17 invert these equations to express \(\mathbf{\dot{q}}\) in terms of \(\mathbf{q}\), \(\mathbf{p}\) and \(t\). Once we do this, we can express any function on phase space (the Lagrangian, for instance) in terms of \(\mathbf{q}\), \(\mathbf{p}\) and \(t\) only.

Consider a particle of mass \(m\) moving in one dimension, expressed in Cartesian coordinates. Its Lagrangian is \[L(x,\dot{x}) = \frac{1}{2}m\dot{x}^2\, ,\] so its associated momentum is \[p = \frac{\partial L}{\partial \dot{x}} = m\dot{x}\, .\] We can trivially solve this equation to find \(\dot{x} = p/m\). We find that the Lagrangian for this system is thus \[L(x,p) = \frac{p^2}{2m}\] in the Hamiltonian formalism.

Let us now take a particle moving in two dimensions, expressed in polar coordinates. Its Lagrangian is \[L(\mathbf{q},\mathbf{\dot{q}}) = \frac{1}{2}m(\dot{r}^2 + r^2\dot{\theta}^2)\] so its generalised momenta are \[p_r = m\dot{r} \qquad ; \qquad p_\theta = mr^2\dot{\theta}\, .\] We can easily invert these equations, to find \(\dot{r}=p_r/m\) and \(\dot{\theta}=p_\theta/(mr^2)\). In this way we can express any function of phase space in terms of the \(\mathbf{q}\) and \(\mathbf{p}\). For instance, for the Lagrangian itself we have \[L(\mathbf{q},\mathbf{p}) = \frac{1}{2m}\left(p_r^2 + \frac{1}{r^2}p_\theta^2\right)\, .\]
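This inversion is mechanical enough to automate. The following sympy sketch (the symbol names are my own) recomputes the momenta for the polar example and re-expresses the Lagrangian on phase space:

```python
import sympy as sp

m, r = sp.symbols('m r', positive=True)
rdot, thetadot, p_r, p_theta = sp.symbols('rdot thetadot p_r p_theta')

# Lagrangian of a free particle in two dimensions, in polar coordinates.
L = sp.Rational(1, 2) * m * (rdot**2 + r**2 * thetadot**2)

# Generalised momenta p_i = dL/d(qdot_i), inverted for the velocities.
momenta = [sp.Eq(p_r, sp.diff(L, rdot)), sp.Eq(p_theta, sp.diff(L, thetadot))]
vel = sp.solve(momenta, [rdot, thetadot], dict=True)[0]
print(vel)                       # {rdot: p_r/m, thetadot: p_theta/(m*r**2)}

# The Lagrangian as a function on phase space.
print(sp.simplify(L.subs(vel)))  # p_r**2/(2*m) + p_theta**2/(2*m*r**2)
```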

6.2 The Poisson bracket and Hamiltonian flows

We still need to understand how a given state evolves in time in this new formalism. That is, if we know which point in phase space describes a system at a given time, which trajectory in phase space will describe subsequent motion of the system?

In fact, there would be little point in doing this if all we gained was a description of the dynamics in a different set of variables. After all, the Lagrangian formalism will do the job of giving the equations of motion for the system perfectly well.18 The advantage of switching to the Hamiltonian formalism is that we will be able to exhibit a rather deep and beautiful geometric structure to classical dynamics, in which we will obtain (in a sense) a converse of Noether’s theorem! Recall that Noether’s theorem states that every symmetry has an associated conserved charge. We will see below that in the Hamiltonian formalism the conserved charge generates the symmetry: if we know the form of the conserved charge for a symmetry we will be able to reconstruct systematically the infinitesimal form of the symmetry transformation.

The fundamental object that allows us to think of charges as generating transformations is the Poisson bracket:

The Poisson bracket between two functions \(f(\mathbf{q},\mathbf{p},t)\) and \(g(\mathbf{q},\mathbf{p},t)\) on phase space is the function on phase space defined by \[\{f,g\} :=\sum_{i=1}^n\left(\frac{\partial f}{\partial q_i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q_i}\right)\] where \(n\) is the dimension of configuration space (so half the dimension of phase space).

Note that in the definition of the Poisson bracket the position and momenta are independent coordinates in phase space, and are treated as independent variables when taking partial derivatives: \[\frac{\partial q_i}{\partial p_j} = \frac{\partial p_i}{\partial q_j} = 0 \qquad ; \qquad \frac{\partial q_i}{\partial q_j} = \frac{\partial p_i}{\partial p_j} = \delta_{ij}\, .\]

The simplest functions in phase space that we can construct are those that give the coordinates of a point in a given basis. From the definition of the Poisson bracket, we have the fundamental brackets \[\{q_i, q_j\} = \{p_i, p_j\} = 0 \qquad ; \qquad \{q_i, p_j\} = \delta_{ij}\, .\]

The Poisson bracket has a number of interesting properties, which I now list. The proof of these properties is straightforward, and can be found in the problem sheet for week 10:

The Poisson bracket is antisymmetric: \[\{f,g\} = -\{g,f\}\, .\]

The Poisson bracket is linear: \[\{\alpha f + \beta g, h\} = \alpha \{f,h\} + \beta \{g, h\}\] for \(\alpha,\beta\in\mathbb{R}\). Note that together with antisymmetry this implies \[\{h, \alpha f + \beta g\} = \alpha \{h,f\} + \beta \{h,g\}\] so the Poisson bracket is in fact bilinear (that is, linear on both terms).

The Poisson bracket obeys the Leibniz identity: \[\{fg,h\} = f\{g,h\} + g\{f,h\}\, .\]

The Poisson bracket obeys the Jacobi identity for the sum of the cyclic permutations: \[\{\{f,g\}, h\} + \{\{h,f\},g\} + \{\{g,h\},f\} = 0\, .\]
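All of these properties follow by direct computation from the definition. If you want to see them verified mechanically, here is a small sympy sketch (the bracket implementation and the test functions are my own choices) checking the fundamental brackets together with antisymmetry, the Leibniz identity and the Jacobi identity on a four-dimensional phase space:

```python
import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
qs, ps = [q1, q2], [p1, p2]

def poisson(f, g):
    """Poisson bracket {f, g} for two degrees of freedom."""
    return sum(sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)
               for q, p in zip(qs, ps))

# Fundamental brackets: {q_i, p_j} = delta_ij, all others vanish.
assert poisson(q1, p1) == 1 and poisson(q1, p2) == 0
assert poisson(q1, q2) == 0 and poisson(p1, p2) == 0

# Arbitrary test functions.
f = q1**2 * p2 + sp.sin(q2)
g = p1 * p2 + q2
h = sp.exp(q1) * p1

assert sp.simplify(poisson(f, g) + poisson(g, f)) == 0            # antisymmetry
assert sp.simplify(poisson(f * g, h)
                   - f * poisson(g, h) - g * poisson(f, h)) == 0  # Leibniz
assert sp.simplify(poisson(poisson(f, g), h) + poisson(poisson(h, f), g)
                   + poisson(poisson(g, h), f)) == 0              # Jacobi
```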

Denote by \(\mathscr{F}\) the space of all functions from phase space \(\mathscr{P}\) to \(\mathbb{R}\). Given any function \(f\in\mathscr{F}\), we can define an operator \(\Phi_f\) that generates infinitesimal transformations on \(\mathscr{F}\) using the Poisson bracket.

The Hamiltonian flow defined by \(f\colon\mathscr{P}\to\mathbb{R}\) is the infinitesimal transformation on \(\mathscr{F}\) defined by \[\begin{split} \Phi_f^{(\epsilon)}&\colon \mathscr{F}\to \mathscr{F}\\ \Phi_f^{(\epsilon)}(g) &= g + \epsilon \{g,f\} + \mathcal{O}(\epsilon^2)\, . \end{split}\]

I am taking a small liberty with the language here to avoid having to introduce some additional formalism: what I have just introduced is the infinitesimal version of what is commonly known as “Hamiltonian flow” in the literature, which is typically defined for finite (that is, non-infinitesimal) transformations. The finite version of the transformation is obtained by exponentiation: \[\Phi^{(a)}_f(g) = e^{a{\{\cdot,f\}}}g :=g + a\{g,f\} + \frac{a^2}{2!}\{\{g,f\},f\} + \frac{a^3}{3!}\{\{\{g,f\},f\},f\} + \ldots\]

By studying the action of \(\Phi_f^{(\epsilon)}\) on the coordinates \(\mathbf{q}\), \(\mathbf{p}\) of phase space, we can also understand \(\Phi_f^{(\epsilon)}\) as the generator of a map from phase space to itself. We have \[\begin{split} \Phi_f^{(\epsilon)}(q_i) &= q_i + \epsilon\{q_i, f\} + O(\epsilon^2) = q_i + \epsilon \frac{\partial f}{\partial p_i} + O(\epsilon^2)\\ \Phi_f^{(\epsilon)}(p_i) &= p_i + \epsilon\{p_i,f\} + O(\epsilon^2) = p_i - \epsilon \frac{\partial f}{\partial q_i} + \mathcal{O}(\epsilon^2)\, . \end{split}\] The two definitions are compatible: \[\begin{split} \Phi_f^{(\epsilon)}(g) & = g(q_1+\epsilon\{q_1,f\},\ldots,q_n+\epsilon\{q_n,f\},p_1+\epsilon\{p_1,f\},\ldots,p_n+\epsilon\{p_n,f\})\\ & = g(q_1,\ldots,q_n,p_1,\ldots,p_n) + \epsilon\sum_{i=1}^n\left(\frac{\partial g}{\partial q_i}\{q_i,f\} + \frac{\partial g}{\partial p_i}\{p_i,f\}\right) \\ & = g(q_1,\ldots,q_n,p_1,\ldots,p_n) + \epsilon\sum_{i=1}^n\left(\frac{\partial g}{\partial q_i}\frac{\partial f}{\partial p_i} - \frac{\partial g}{\partial p_i}\frac{\partial f}{\partial q_i}\right)\\ & = g + \epsilon\{g,f\} \end{split}\] where in the second line we have done a Taylor expansion, and we have omitted higher order terms in \(\epsilon\) throughout for notational simplicity.

As a simple example, consider a particle moving in one dimension. The Hamiltonian flow \(\Phi_p\) associated to the canonical momentum \(p\) acts on phase space functions as: \[\Phi_p^{(\epsilon)}(g(q,p)) = g(q,p) + \epsilon \frac{\partial g}{\partial q} + \mathcal{O}(\epsilon^2)\, .\] Alternatively, \(\Phi_p^{(\epsilon)}\) acts on the coordinate \(q\) as \(q \to q+\epsilon\), so the effect of \(\Phi_p^{(\epsilon)}\) on phase space is a uniform shift in the \(q\) direction:

[Figure: the flow \(\Phi_p^{(\epsilon)}\) shifts every point of phase space by \(\epsilon\) in the \(q\) direction.]

We can reproduce the effect on arbitrary functions of \(q\) from this viewpoint by doing a Taylor expansion: \[g(q+\epsilon,p) = g(q,p) + \epsilon\frac{\partial g}{\partial q} + \mathcal{O}(\epsilon^2)\, .\] (You might also find it interesting to reproduce the full form of the Taylor expansion of \(f(x+a)\) around \(x\) using the exponentiated version in remark [remark:finite-flow].)
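The exercise in parentheses can also be done symbolically. Here is a sympy sketch (the test function is an arbitrary choice of mine) that builds the partial sums of \(e^{a\{\cdot,p\}}g\) and checks that they agree with the Taylor expansion of \(g(q+a,p)\) to the order kept:

```python
import sympy as sp

q, p, a = sp.symbols('q p a')

def poisson(f, g):
    return sp.diff(f, q) * sp.diff(g, p) - sp.diff(f, p) * sp.diff(g, q)

g = sp.sin(q) * p  # an arbitrary test function on phase space

# Partial sums of exp(a {., p}) g: repeated brackets give d^k g / dq^k.
term, flow = g, g
for k in range(1, 8):
    term = poisson(term, p)
    flow += a**k / sp.factorial(k) * term

# The flow agrees with g(q + a, p) to the order we kept.
target = g.subs(q, q + a)
assert sp.simplify(sp.series(flow - target, a, 0, 8).removeO()) == 0
```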

As a second example, consider a particle of unit mass moving in two dimensions, expressed in Cartesian coordinates, which we call \(q_1\) and \(q_2\). We choose the Lagrangian to be of the form \[L = \frac{1}{2}(\dot{q}_1^2 + \dot{q}_2^2) - V(q_1,q_2)\, .\] For the function generating the flow we will choose \(J=q_1\dot{q}_2 - q_2\dot{q}_1\). (Recall from example [ex:rotations-2d-Cartesian] that this function is angular momentum, which Noether’s theorem associated with rotations around the origin.) From the Lagrangian we have \(p_1=\dot{q}_1\) and \(p_2=\dot{q}_2\), so in terms of standard \((\mathbf{q},\mathbf{p})\) coordinates of phase space we have \(J(\mathbf{q},\mathbf{p})=q_1p_2-q_2p_1\). The Hamiltonian flow \(\Phi_J^{(\epsilon)}\) then acts on phase space as \[\begin{split} \Phi_{J}^{(\epsilon)}(q_1) &=q_1+\epsilon\{q_1,J\}=q_1+\epsilon\frac{\partial J}{\partial p_1}=q_1-\epsilon q_2\, ,\\ \Phi_{J}^{(\epsilon)}(q_2) &=q_2+\epsilon\{q_2,J\}=q_2+\epsilon\frac{\partial J}{\partial p_2}=q_2+\epsilon q_1\, ,\\ \Phi_{J}^{(\epsilon)}(p_1) &=p_1+\epsilon\{p_1,J\}=p_1-\epsilon\frac{\partial J}{\partial q_1}=p_1-\epsilon p_2\, ,\\ \Phi_{J}^{(\epsilon)}(p_2) &=p_2+\epsilon\{p_2,J\}=p_2-\epsilon\frac{\partial J}{\partial q_2}=p_2+\epsilon p_1\, . \end{split}\] omitting higher orders in \(\epsilon\). So the effect of \(J\) on the coordinates can be written as an infinitesimal rotation on the \(\mathbf{q}\) and the \(\mathbf{p}\) (independently) \[\begin{aligned} \Phi_J^{(\epsilon)}\begin{pmatrix}q_1\\q_2\end{pmatrix}&=\begin{pmatrix}1 & -\epsilon\\ \epsilon & 1\end{pmatrix}\begin{pmatrix}q_1\\q_2\end{pmatrix}\, ,\\ \Phi_J^{(\epsilon)}\begin{pmatrix}p_1\\p_2\end{pmatrix}&=\begin{pmatrix}1 & -\epsilon\\ \epsilon & 1\end{pmatrix}\begin{pmatrix}p_1\\p_2\end{pmatrix}\, . \end{aligned}\] For instance, the action of \(\Phi_J^{(\epsilon)}\) on the \((q_1,q_2)\) slice of phase space (which in this case has four dimensions) is as in the following picture:

[Figure: the flow \(\Phi_J^{(\epsilon)}\) rotates the \((q_1,q_2)\) plane by an infinitesimal angle \(\epsilon\).]

6.2.1 Flows for conserved charges

We have just seen that linear momentum \(p\) generates spatial translations, and angular momentum generates rotations. This is in fact general: assume that we have a transformation acting as \(q_i\to q_i+\epsilon a_i(\mathbf{q})+O(\epsilon^2)\) on the generalised coordinates. Noether’s theorem assigns a charge to this transformation given, in the Lagrangian framework, by \[Q(\mathbf{q},\mathbf{\dot{q}},t) = \left(\sum_{i=1}^n a_i(\mathbf{q}) \frac{\partial L(\mathbf{q},\mathbf{\dot{q}},t)}{\partial \dot{q}_i}\right) - F(\mathbf{q}, t)\, .\] This charge can be written in the Hamiltonian framework in terms of generalised coordinates and generalised momenta as \[Q(\mathbf{q},\mathbf{p},t) = \left(\sum_{i=1}^n a_i(\mathbf{q}) p_i\right) - F(\mathbf{q}, t)\, .\] If we now compute the Hamiltonian flow associated to this charge on the generalised coordinates we find \[\Phi_Q^{(\epsilon)}(q_i) = q_i + \epsilon \{q_i,Q\} + O(\epsilon^2) = q_i + \epsilon a_i + \mathcal{O}(\epsilon^2)\, .\]

This is a very important result: Noether’s theorem told us that symmetries imply the existence of conserved quantities. We have just seen that we can go in the other direction too: conserved quantities generate the corresponding symmetry transformations, via the associated Hamiltonian flow.

6.3 The Hamiltonian and Hamilton’s equations

We have just proven that conserved quantities generate the corresponding symmetries. It is natural to guess at this point that energy will generate time evolution, via Hamiltonian flow. This is indeed the case.

The Hamiltonian \(H\) of a physical system is the energy expressed in terms of generalised coordinates and generalised momenta. That is: \[H :=\left(\sum_{i=1}^n p_i \dot{q}_i(\mathbf{q},\mathbf{p},t)\right) - L(\mathbf{q},\mathbf{\dot{q}}(\mathbf{q}, \mathbf{p},t), t)\, .\]

Consider the harmonic oscillator in one dimension, with Lagrangian \[L = \frac{1}{2}m\dot{x}^2 - \frac{1}{2}\kappa x^2\, .\] The generalised momentum is \(p=m\dot{x}\), so the Hamiltonian for this system is \[H = \frac{1}{2m}p^2 + \frac{1}{2}\kappa x^2\, .\]

The time evolution of the generalised coordinates and momenta is given by the Hamiltonian flow \(\Phi_H\): \[\Phi_H^{(\epsilon)}(q_i) = q_i(t+\epsilon) + O(\epsilon^2) \qquad ; \qquad \Phi_H^{(\epsilon)}(p_i) = p_i(t+\epsilon) + O(\epsilon^2)\, .\] Equivalently (expanding \(q_i(t+\epsilon)=q_i(t)+\epsilon \dot{q}_i(t)+\ldots\), and similarly for \(p_i\)): \[\dot{q}_i = \{q_i, H\} = \frac{\partial H}{\partial p_i} \qquad ; \qquad \dot{p}_i = \{p_i, H\} = -\frac{\partial H}{\partial q_i}\, .\]

These equations are known as Hamilton’s equations of motion.

The first thing to do is to note that when we write the partial derivative \(\frac{\partial A}{\partial q_j}\) in the Hamiltonian picture we mean: differentiate \(A\) with respect to \(q_j\) keeping the other \(q\)’s, any explicit time dependence in \(A\), and the \(p\)’s fixed. This should be contrasted with the Lagrangian picture, where differentiating with respect to \(q_j\) involved keeping the other \(q\)’s, time, and the \({\dot q}\)’s fixed. To highlight this point, in this proof I will write \(\frac{\partial A}{\partial q_j}|_\mathbf{p}\) or \(\frac{\partial A}{\partial q_j}|_{\mathbf{\dot{q}}}\) to clarify which set of variables is being held fixed when taking partial derivatives.19

Given this, let us calculate the derivatives of \(H\) with respect to \(q_j\) and \(p_j\). We have \[\begin{aligned} \left . \frac{\partial H}{\partial q_j}\right |_{\mathbf{p}} & = \left . \frac{\partial}{\partial q_j} \left (\sum_i p_i {\dot q}_i (q,p,t)-L(q,{\dot q}(q,p,t),t)\right )\right |_\mathbf{p}\\ &= \sum _i p_i\left . \frac{\partial {\dot q}_i}{\partial q_j}\right |_\mathbf{p}-\sum_i\left .\frac{\partial L}{\partial q_i}\right |_{\mathbf{\dot{q}}}\left.\frac{\partial q_i}{\partial q_j}\right|_\mathbf{p}-\sum_i\left .\frac{\partial L}{\partial {\dot q}_i}\right |_\mathbf{q} \left. \frac{\partial {\dot q}_i}{\partial q_j}\right |_\mathbf{p}\\ &= \sum _i p_i\left . \frac{\partial {\dot q}_i}{\partial q_j}\right |_\mathbf{p}-\left .\frac{\partial L}{\partial q_j}\right |_{\mathbf{\dot{q}}} -\sum_i\left .\frac{\partial L}{\partial {\dot q}_i}\right |_\mathbf{q} \left. \frac{\partial {\dot q}_i}{\partial q_j}\right |_\mathbf{p}\\ &=\sum_i \left (p_i - \left .\frac{\partial L}{\partial {\dot q}_i}\right |_\mathbf{q}\right )\left. \frac{\partial {\dot q}_i}{\partial q_j}\right |_\mathbf{p}-\left .\frac{\partial L}{\partial q_j}\right |_{\mathbf{\dot{q}}}\, . \end{aligned}\] The first bracket in this expression is zero by the definition of \(p_i\). Using the Euler-Lagrange equations we conclude that along a physical path \[\left . \frac{\partial H}{\partial q_j}\right |_{\mathbf{p}} = -\left .\frac{\partial L}{\partial q_j}\right |_{\mathbf{\dot{q}}}= -\frac{d}{dt}\left (\left.\frac{\partial L}{\partial {\dot q}_j}\right|_\mathbf{q}\right )=-{\dot p}_j.\] Similarly, calculating \(\frac{\partial H}{\partial p_j}\) we find \[\begin{aligned} \left . \frac{\partial H}{\partial p_j}\right |_{\mathbf{q}} &= \left . \frac{\partial}{\partial p_j} \left (\sum_i p_i {\dot q}_i (q,p,t)-L(q,{\dot q}(q,p,t),t)\right )\right |_\mathbf{q}\\ &= \sum_i \frac{\partial p_i}{\partial p_j} {\dot q}_i + \sum _i p_i\left . \frac{\partial {\dot q}_i}{\partial p_j}\right |_\mathbf{q} -\sum_i\left .\frac{\partial L}{\partial {\dot q}_i}\right |_\mathbf{q}\left. \frac{\partial {\dot q}_i}{\partial p_j}\right |_\mathbf{q}\\ &= \sum_i \delta_{ij}{\dot q}_i +\sum_i \left (p_i - \left .\frac{\partial L}{\partial {\dot q}_i}\right |_\mathbf{q}\right )\left. \frac{\partial {\dot q}_i}{\partial p_j}\right |_\mathbf{q}\\ &= {\dot q}_j \end{aligned}\] again using the definition of \(p_i\) to show the last term vanishes. Note that we did not need to use the Euler-Lagrange equations to derive this last equation. Accordingly, in practice this equation generally just reproduces the result of inverting the definition of the generalised momentum in the Lagrangian formalism to express \(\mathbf{\dot{q}}\) in terms of \(\mathbf{q}\), \(\mathbf{p}\) and \(t\).
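In practice Hamilton’s equations are often integrated numerically. As a minimal sketch (step size, parameters and initial conditions are arbitrary choices of mine), here is a symplectic Euler integration of the harmonic oscillator of the example above, which checks that the energy drifts only slightly:

```python
# Harmonic oscillator: H = p**2/(2m) + kappa x**2/2, arbitrary parameters.
m, kappa = 1.0, 4.0
dt, steps = 1e-3, 10_000

def H(x, p):
    return p**2 / (2 * m) + kappa * x**2 / 2

x, p = 1.0, 0.0  # initial state in phase space
E0 = H(x, p)
for _ in range(steps):
    # Hamilton's equations: xdot = dH/dp = p/m, pdot = -dH/dx = -kappa x.
    # Symplectic Euler: update p first, then x with the updated p.
    p -= dt * kappa * x
    x += dt * p / m

print(abs(H(x, p) - E0))  # small: the energy is conserved up to O(dt)
```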

The time evolution of any function \(f(\mathbf{q},\mathbf{p})\) on phase space is generated by \(\Phi_H\): \[\begin{aligned} \frac{df}{dt} & = \{f,H\} \, .\\ \end{aligned}\]In the case that \(f\) depends explicitly on time, we have \[\begin{aligned} \frac{df}{dt} &= \frac{\partial f}{\partial t} + \{f,H\}\, . \end{aligned}\]

The function \(f\) will depend on time through its explicit dependence on \(t\), if any, and implicitly via \(\mathbf{q}\) and \(\mathbf{p}\), which are themselves functions of time. Using the Chain Rule we find \[\begin{split} \frac{df}{dt} & = \frac{\partial f}{\partial t} + \sum_{i=1}^n\left(\frac{\partial f}{\partial q_i}\dot{q}_i + \frac{\partial f}{\partial p_i}\dot{p}_i\right) \\ & = \frac{\partial f}{\partial t} + \sum_{i=1}^n\left(\frac{\partial f}{\partial q_i}\frac{\partial H}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial H}{\partial q_i}\right) \\ & = \frac{\partial f}{\partial t} + \{f,H\}\, , \end{split}\] where we have used Hamilton’s equations in going to the second line.

We can apply this corollary to give a very neat proof of conservation of energy: the energy, in the Hamiltonian formalism, is equal to the Hamiltonian itself. So we have that \[\begin{equation} \label{eq:energy-conservation-Hamiltonian} \frac{dH}{dt} = \frac{\partial H}{\partial t} + \{H,H\} = \frac{\partial H}{\partial t} \end{equation}\] using the fact that the Poisson bracket is antisymmetric. So, if time does not appear explicitly in the expression for the Hamiltonian, then the Hamiltonian is conserved.

A small variation of this last equation is sometimes included as part of Hamilton’s equations. From the definition of the Hamiltonian we have that \[\begin{split} \frac{\partial H(\mathbf{q},\mathbf{p},t)}{\partial t} & = \frac{\partial}{\partial t}\left(\left(\sum_{i=1}^n \dot{q}_i(\mathbf{q},\mathbf{p},t) p_i\right) - L(\mathbf{q},\mathbf{\dot{q}}(\mathbf{q},\mathbf{p},t),t)\right)\\ & = \left(\sum_{i=1}^n p_i \frac{\partial \dot{q}_i(\mathbf{q},\mathbf{p},t)}{\partial t}\right) - \frac{\partial L(\mathbf{q}, \mathbf{\dot{q}}(\mathbf{q},\mathbf{p},t), t)}{\partial t}\, . \end{split}\] Now note that \(L\) can have an additional dependence on \(t\) through \(\mathbf{\dot{q}}\), if \(\mathbf{\dot{q}}(\mathbf{q},\mathbf{p},t)\) depends explicitly on time. Using the Chain Rule: \[\frac{\partial L(\mathbf{q}, \mathbf{\dot{q}}(\mathbf{q},\mathbf{p},t), t)}{\partial t} = \left.\frac{\partial L(\mathbf{q}, \mathbf{\dot{q}}, t)}{\partial t}\right|_{\mathbf{q},\mathbf{\dot{q}}} + \left(\sum_{i=1}^n \left.\frac{\partial L(\mathbf{q},\mathbf{\dot{q}},t)}{\partial \dot{q}_i}\right|_\mathbf{q}\frac{\partial \dot{q}_i(\mathbf{q},\mathbf{p},t)}{\partial t}\right)\] so \[\frac{\partial H(\mathbf{q},\mathbf{p},t)}{\partial t} = -\left.\frac{\partial L(\mathbf{q}, \mathbf{\dot{q}}, t)}{\partial t}\right|_{\mathbf{q},\mathbf{\dot{q}}} + \left(\sum_{i=1}^n \left(p_i - \left.\frac{\partial L(\mathbf{q},\mathbf{\dot{q}},t)}{\partial \dot{q}_i}\right|_\mathbf{q}\right) \frac{\partial \dot{q}_i(\mathbf{q},\mathbf{p},t)}{\partial t}\right)\, .\] The second term vanishes due to the definition of the generalised momentum, so we conclude that \[\frac{\partial H(\mathbf{q},\mathbf{p},t)}{\partial t} = -\left.\frac{\partial L(\mathbf{q}, \mathbf{\dot{q}}, t)}{\partial t}\right|_{\mathbf{q},\mathbf{\dot{q}}}\, .\]

In particular, this makes \(\eqref{eq:energy-conservation-Hamiltonian}\) compatible with theorem [thm:energy-conservation-Lagrangian].

More generally, assume that we have a function \(Q(\mathbf{q},\mathbf{p}, t)\) on phase space. We have that \(Q\) is conserved if \[\frac{dQ}{dt} = \{Q,H\} + \frac{\partial Q}{\partial t}= 0\, .\] In particular, if \(Q\) does not depend explicitly on time, we have that \(Q\) is conserved if and only if \(\{Q,H\}=0\). By antisymmetry of the Poisson bracket we can also read this condition as \[\{H,Q\} = 0\] which can be interpreted as saying that the Hamiltonian is left invariant, to first order in \(\epsilon\), by the transformation generated by \(Q\): \[\Phi_Q(H) = H + \epsilon \{Q,H\} + O(\epsilon^2) = H + O(\epsilon^2) \, .\]

Consider a system whose Lagrangian is given by \[L=\frac{1}{2}\left ( {\dot r}^2 + r^2{\dot \theta}^2 \right )-\frac{r^2}{2}.\] We define the momenta to be \[\begin{aligned} p_r&=\frac{\partial L}{\partial {\dot r}} = {\dot r}\\ p_\theta&=\frac{\partial L}{\partial {\dot \theta}} = r^2 {\dot \theta} \end{aligned}\] so that \[\begin{aligned} {\dot r} &= p_r\\ {\dot \theta}&= \frac{p_\theta}{r^2} \end{aligned}\] The Hamiltonian is given by \[\begin{aligned} H &= p_r{\dot r}+p_\theta{\dot \theta}-\left (\frac{1}{2}\left ( {\dot r}^2 + r^2{\dot \theta}^2 \right )-\frac{r^2}{2}\right )\\ &= p_r^2 + p_\theta\left (\frac{p_\theta}{r^2}\right ) -\frac{1}{2}\left ( {p_r}^2 + r^2\left (\frac{p_\theta}{r^2}\right )^2\right )+\frac{r^2}{2}\\ &= \frac{1}{2}\left (p_r^2 +\frac{p_\theta^2}{r^2}\right )+\frac{r^2}{2}. \end{aligned}\] Hamilton’s Equations of Motion tell us that \[\begin{aligned} {\dot r} &= \frac{\partial H}{\partial p_r}=p_r\\ {\dot \theta} &= \frac{\partial H}{\partial p_\theta}=\frac{p_\theta}{r^2}\\ {\dot p_r} &= -\frac{\partial H}{\partial r} =\frac{p_\theta^2}{r^3}-r\\ {\dot p_\theta} &= -\frac{\partial H}{\partial \theta}=0. \end{aligned}\] Note that the first two equations here simply reproduce the results of expressing the \(\mathbf{\dot{q}}\) in terms of the \(\mathbf{p}\)’s. This is always the case when we derive the Hamiltonian system from a Lagrangian system as above. The last equation shows that \(p_\theta\) is conserved as a result of the Hamiltonian being independent of \(\theta\). The concept of an ignorable coordinate goes over completely from the Lagrangian picture to the Hamiltonian picture. The real ‘meat’ of the dynamics is in the remaining equation for \({\dot p_r}\). Given that \(p_\theta\) is a constant and that \(p_r={\dot r}\) it can be read as \[{\ddot r} = \frac{p_\theta^2}{r^3}-r.\]
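The Legendre transform in this example is easy to reproduce symbolically; the following sympy sketch (symbol names are mine) recomputes \(H\) and Hamilton’s equations:

```python
import sympy as sp

r = sp.symbols('r', positive=True)
theta, rdot, thetadot, p_r, p_theta = sp.symbols('theta rdot thetadot p_r p_theta')

L = sp.Rational(1, 2) * (rdot**2 + r**2 * thetadot**2) - r**2 / 2

# Invert the momenta and Legendre-transform to obtain H(q, p).
vel = sp.solve([sp.Eq(p_r, sp.diff(L, rdot)),
                sp.Eq(p_theta, sp.diff(L, thetadot))],
               [rdot, thetadot], dict=True)[0]
H = sp.simplify((p_r * rdot + p_theta * thetadot - L).subs(vel))
print(H)  # p_r**2/2 + p_theta**2/(2*r**2) + r**2/2

# Hamilton's equations.
print(sp.diff(H, p_r), sp.diff(H, p_theta))  # rdot and thetadot
print(-sp.diff(H, r), -sp.diff(H, theta))    # p_theta**2/r**3 - r, and 0
```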

Suppose we start instead with a Hamiltonian: \[H=\frac{p^2}{2}+xp.\] Hamilton’s equations are \[\begin{aligned} {\dot x} &= \frac{\partial H}{\partial p}=p+x\\ {\dot p} &= -\frac{\partial H}{\partial x}= - p. \end{aligned}\] Solving the second equation, we have that \(p=Ae^{-t}\). Substituting this into the first equation we find \[{\dot x} - x = Ae^{-t}\] which is a linear first order differential equation. Multiplying through by the integrating factor we find \[\frac{d}{dt}\left (xe^{-t}\right ) = Ae^{-2t}\] which can be integrated to give \(x=Ce^t -Ae^{-t}/2\).
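The same integration can be left to sympy’s ODE solver (a quick check; the naming of the integration constants in the output depends on the sympy version):

```python
import sympy as sp

t = sp.symbols('t')
x, p = sp.Function('x'), sp.Function('p')

# Hamilton's equations for H = p**2/2 + x*p.
eqs = [sp.Eq(x(t).diff(t), p(t) + x(t)),
       sp.Eq(p(t).diff(t), -p(t))]
print(sp.dsolve(eqs))
# Expect p = A*exp(-t) and x = C*exp(t) - A*exp(-t)/2, up to constant naming.
```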

The following is a Hamiltonian for the damped harmonic oscillator: \[H=\frac{e^{-bt}p^2}{2} + \frac{e^{bt}w^2 x^2}{2}.\] Notice that \(H\) explicitly depends on time; this implies that it is not conserved, as we would expect for the damped harmonic oscillator, whose motion dies away to nothing. Hamilton’s equations of motion are \[\begin{aligned} {\dot x} &= \frac{\partial H}{\partial p} = e^{-bt}p\\ {\dot p} &= -\frac{\partial H}{\partial x} = -w^2e^{bt}x. \end{aligned}\] Differentiating the first equation with respect to \(t\), we see that \[\begin{aligned} {\ddot x} &= -be^{-bt}p+e^{-bt}{\dot p}\\ &= -b{\dot x}-w^2 x \end{aligned}\] which is indeed the equation of motion for the damped harmonic oscillator.

6.4 There and back again

Let me finish by closing the circle of ideas that we have been developing. We have seen in §3.1 that Noether’s theorem associates a conserved Noether charge \(Q\) to every symmetry, and we have also shown in §6.2.1 that the Noether charge associated to a symmetry generates the right transformations on the generalized coordinates. It is natural to ask at this point: does any conserved charge generate a symmetry transformation? This is indeed the case, as we now show for the class of conserved charges that we have been discussing.

Assume that we have a function \(Q(\mathbf{q},\mathbf{p},t)\) of the form \[Q(\mathbf{q},\mathbf{p},t) = \left(\sum_{i=1}^n a_i(\mathbf{q}) p_i\right) - F(\mathbf{q},t)\] such that \[\frac{dQ}{dt} = \{Q,H\} + \frac{\partial Q}{\partial t} = 0\, .\] Then \(\Phi_Q^{(\epsilon)}(L)=L+\epsilon \frac{dF}{dt}+O(\epsilon^2)\), so \(Q\) generates a symmetry, whose Noether charge is \(Q\).

Let me start by proving some simple auxiliary results. Note first that \[\begin{equation} \label{eq:qQ} \{q_i, Q\} = \frac{\partial Q}{\partial p_i} = a_i(\mathbf{q}) \end{equation}\] which implies, in particular, that \[\frac{\partial \{q_i, Q\}}{\partial t} = \frac{\partial a_i(\mathbf{q})}{\partial t} = 0\, .\] Note also that since \(Q\) is conserved, and the only explicit time dependence of \(Q\) on time is via \(F\), we have \[\{Q,H\} = \frac{dQ}{dt} - \frac{\partial Q}{\partial t} = -\frac{\partial Q}{\partial t} = \frac{\partial F}{\partial t}\, ,\] so \[- \{\{Q,H\}, q_i\} = \{q_i, \{Q,H\}\} = \frac{\partial \{Q,H\}}{\partial p_i} = \frac{\partial}{\partial p_i}\left(\frac{\partial F(\mathbf{q}, t)}{\partial t}\right) = 0\, .\] Using these two results we find that \[\begin{aligned} \frac{d\{q_i,Q\}}{dt} & = \frac{\partial \{q_i, Q\}}{\partial t} + \{\{q_i,Q\}, H\} \\ & = \{\{q_i,Q\}, H\}\\ & = -\{\{H,q_i\}, Q\} - \{\{Q,H\}, q_i\}\\ & = \{\{q_i,H\}, Q\} \, . \end{aligned}\] where on the third line we have used the Jacobi identity in proposition [prop:Poisson-Jacobi]. Hamilton’s equations then imply that \[\begin{equation} \label{eq:derivative-of-Poisson-bracket} \frac{d\{q_i,Q\}}{dt} = \{\dot{q}_i, Q\}\, . \end{equation}\]

Using these results it is straightforward to compute the change in the Lagrangian due to \(Q\). The Lagrangian in Hamiltonian coordinates is \[L(\mathbf{q},\mathbf{p}, t) = \left(\sum_{i=1}^n \dot{q}_i(\mathbf{q},\mathbf{p}) p_i\right) - H(\mathbf{q},\mathbf{p},t)\, .\] Since \(Q\) is conserved, we have \[\begin{split} \{L, Q\} & = \left[\sum_{i=1}^n \left(\{\dot{q}_i,Q\} p_i + \dot{q}_i\{p_i,Q\}\right)\right] + \{Q,H\}\\ & = \left[\sum_{i=1}^n \left(\{\dot{q}_i,Q\} p_i + \dot{q}_i\{p_i,Q\}\right)\right] - \frac{\partial Q}{\partial t}\, , \end{split}\] which becomes, using the results above: \[\begin{split} \{L,Q\} & = \left[\sum_{i=1}^n \left(p_i\frac{d}{dt}\{q_i,Q\} + \dot{q}_i\{p_i,Q\}\right)\right] - \frac{\partial Q}{\partial t} \\ & = \left[\sum_{i=1}^n \left(\frac{d}{dt}\biggl(\{q_i,Q\} p_i\biggr) - \dot{p}_i\{q_i,Q\} + \dot{q}_i\{p_i,Q\}\right)\right] - \frac{\partial Q}{\partial t} \\ & = \left[\sum_{i=1}^n \left(\frac{d}{dt}\biggl(\{q_i,Q\} p_i\biggr) - \dot{p}_i\frac{\partial Q}{\partial p_i} - \dot{q}_i\frac{\partial Q}{\partial q_i}\right)\right] - \frac{\partial Q}{\partial t}\\ & = \frac{d}{dt}\left(\sum_{i=1}^n \{q_i,Q\} p_i\right) - \frac{d Q}{d t}\, . \end{split}\] where on the first line we have used \(\eqref{eq:derivative-of-Poisson-bracket}\), and in going from the third to the fourth line the Chain Rule. Finally, using \(\eqref{eq:qQ}\) we find that \[\sum_{i=1}^n \{q_i,Q\} p_i = \sum_{i=1}^n a_i p_i = Q + F\] which implies \[\{L, Q\} = \frac{d(Q+F-Q)}{dt} = \frac{dF}{dt}\, .\] Recalling the definition of the Hamiltonian flow operator, this gives \[\Phi^{(\epsilon)}_Q(L) = L + \epsilon \{L, Q\} + \mathcal{O}(\epsilon^2) = L + \epsilon \frac{dF}{dt} + \mathcal{O}(\epsilon^2)\, .\]
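As a concrete check of this theorem, take the rotationally invariant two-dimensional system of the earlier example, with \(Q=J=q_1p_2-q_2p_1\) and \(F=0\); the sympy sketch below (with an arbitrary rotationally invariant potential of my choosing) verifies that \(\{L,Q\}=\frac{dF}{dt}=0\):

```python
import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')

def poisson(f, g):
    return (sp.diff(f, q1) * sp.diff(g, p1) - sp.diff(f, p1) * sp.diff(g, q1)
            + sp.diff(f, q2) * sp.diff(g, p2) - sp.diff(f, p2) * sp.diff(g, q2))

# Rotationally invariant system: V depends only on q1**2 + q2**2.
V = sp.Function('V')
L = (p1**2 + p2**2) / 2 - V(q1**2 + q2**2)  # the Lagrangian on phase space
Q = q1 * p2 - q2 * p1                       # angular momentum, with F = 0

assert sp.simplify(poisson(L, Q)) == 0      # {L, Q} = dF/dt = 0
```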

7 A review of some results in calculus

I include here a brief review of some basic results in many variable calculus that will appear often during the course.

Derivatives and partial derivatives

Let me introduce the notation \[\delta f(x)=f(x+\delta x)-f(x)\] for the variation of a function \(f\) as we change its argument. We typically want to make \(\delta x\) small, and understand how \(\delta f\) depends on \(\delta x\). The answer is that \[\delta f = \frac{df}{dx}\delta x + \mathcal{O}((\delta x)^2)\] where the last term is a “correction term”, satisfying \[\lim_{\delta x\to 0} \frac{\mathcal{O}((\delta x)^2)}{\delta x} = 0\, .\] Notice that this is just a restatement of the usual definition of the derivative \[\frac{df}{dx} = \lim_{\delta x\to 0} \frac{f(x+\delta x) - f(x)}{(x+\delta x) - x}\] in a form which is more convenient for our applications.

This definition extends straightforwardly to functions of several variables. We define the partial derivatives of \(f(x_1, \ldots, x_n)\) by \[\frac{\partial f}{\partial x_i}=\lim_{\delta x_i\rightarrow 0}\frac{f(x_1, \ldots, x_{i-1}, x_i+\delta x_i, x_{i+1}, \ldots, x_n)-f(x_1,\ldots,x_n)}{\delta x_i}\, .\] Note that in the case of functions of a single variable the definitions of the partial and ordinary derivatives coincide.

We can now express the change in \(f(\vec{x})\) under small changes \(\delta\vec{x}\) of \(\vec{x}\) as \[\delta f(\vec{x})=f(\vec{x}+\delta{\vec{x}})-f(\vec{x})= \sum_{i=1}^n\delta x_i\, \frac{\partial f}{\partial x_i} + \mathcal{O}(\delta \vec{x}^2)=\delta {\vec{x}}\cdot \vec{\nabla} f + \mathcal{O}(\delta \vec{x}^2)\] where we have defined the vector of derivatives \[(\vec{\nabla}f)_i :=\frac{\partial f(\vec{x})}{\partial x_i}\, .\] When the variation is infinitesimal we write \(\delta x_i\to dx_i\), and we have \[df(\vec{x}) = \sum_{i=1}^n dx_i\, \frac{\partial f}{\partial x_i} = d\vec{x}\cdot \vec{\nabla} f \, .\]

The chain rule and commuting derivatives

Assume now that the vector \(\vec{x}\) is a function of time, which we denote as \(\vec{x}(t)\), and that we have a function \(f(\vec{x}(t), t)\) as above (where we have included a possible explicit dependence on the time coordinate). Note that \(f\) is now implicitly a function of \(t\) via its dependence on \(\vec{x}(t)\), in addition to any possible explicit dependence on \(t\) it might have. The variation of this function as \(t\) varies is given by the chain rule \[\frac{d f}{dt}=\left(\sum_{i=1}^n \frac{\partial f}{\partial x_i} \frac{dx_i}{dt}\right) + \frac{\partial f}{\partial t} = \vec{\nabla} f\cdot \frac{d{\vec{x}}}{dt} + \frac{\partial f}{\partial t}\,.\] There is a version of this rule for the case of multiple variables. Say that you have a set of variables \((x_1, \ldots, x_n)\) that depend on other variables \((y_1, \ldots, y_m)\). Then: \[\frac{\partial f}{\partial y_j}=\sum_{i=1}^n\frac{\partial f}{\partial x_i}\frac{\partial x_i}{\partial y_j} \,.\]

Another theorem that we will use later is that if we partially differentiate a function first with respect to \(x_i\) and then with respect to \(x_j\) we obtain the same as if we differentiated in the opposite order provided that the result is continuous: \[\frac{\partial }{\partial x_i}\left(\frac{\partial f}{\partial x_j}\right) = \frac{\partial }{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)\, .\] This result is known as Schwarz’s theorem. During this course all second derivatives will be continuous, so we will apply this result freely.

As it is a relatively common mistake, let me note that in general partial derivatives associated to variables belonging to different coordinate systems do not commute. Denoting by \(x_i\) the first set of coordinates, and \(u_i\) a second set, we have \[\frac{\partial }{\partial x_i}\left(\frac{\partial f}{\partial u_j}\right) \neq \frac{\partial }{\partial u_j}\left(\frac{\partial f}{\partial x_i}\right)\, .\] As a simple example, consider the function \(f\colon \mathbb{R}^2\to\mathbb{R}\) that tells us how far a point is from the vertical axis. We choose as \(x_i\) the Cartesian coordinates \((x, y)\), and as \(u_j\) the polar coordinates \((r,\theta)\). In Cartesian coordinates we have simply \(f(x,y)=x\), while in polar coordinates we have \(f(r,\theta)=r\cos\theta\). We have \[\frac{\partial}{\partial r} \left(\frac{\partial f}{\partial x}\right) = \frac{\partial}{\partial r} \left(1\right) = 0\] while \[\begin{split} \frac{\partial}{\partial x} \left(\frac{\partial f}{\partial r}\right) & = \frac{\partial}{\partial x}\left(\cos\theta\right) = \frac{\partial}{\partial x} \left(\frac{x}{\sqrt{x^2 + y^2}}\right)\\ & = \frac{1}{\sqrt{x^2 + y^2}} - \frac{x^2}{(x^2+y^2)^{3/2}}\\ & \neq 0 \, . \end{split}\]
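The nonzero mixed derivative can be confirmed symbolically (a minimal check of the computation above):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# f = x in Cartesians; df/dx at fixed y is 1, so d/dr (df/dx) = 0 trivially.
# df/dr at fixed theta is cos(theta) = x / sqrt(x**2 + y**2) in Cartesians:
df_dr = x / sp.sqrt(x**2 + y**2)

print(sp.simplify(sp.diff(df_dr, x)))  # y**2/(x**2 + y**2)**(3/2), nonzero
```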

Leibniz’s rule

Assume that you have a function of \(x\) expressed in integral form: \[f(x) = \int_{a(x)}^{b(x)}\! dt \, g(x, t)\, .\] Then \[\begin{equation} \label{eq:Leibniz-rule} \frac{df}{dx} = g(x, b(x)) \frac{db}{dx} - g(x, a(x)) \frac{da}{dx} + \int_{a(x)}^{b(x)} \!dt\, \frac{\partial g(x,t)}{\partial x}\, . \end{equation}\]
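A quick symbolic check of \(\eqref{eq:Leibniz-rule}\), with an integrand and limits chosen arbitrarily for illustration:

```python
import sympy as sp

x, t = sp.symbols('x t', positive=True)

g = sp.exp(-x * t)  # arbitrary test integrand
a, b = x, x**2      # arbitrary variable limits

f = sp.integrate(g, (t, a, b))
lhs = sp.diff(f, x)
rhs = (g.subs(t, b) * sp.diff(b, x) - g.subs(t, a) * sp.diff(a, x)
       + sp.integrate(sp.diff(g, x), (t, a, b)))
assert sp.simplify(lhs - rhs) == 0
```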

Notation for time derivatives

Finally, an additional piece of notation: the time coordinate \(t\) will play a special role during this course, so for convenience we introduce special notation for derivatives with respect to \(t\). Given a function \(x(t)\) we will write \[\dot x \equiv \frac{dx}{dt}\] and similarly for higher order time derivatives. For instance, \[\ddot x \equiv \frac{d^2x}{dt^2}\, .\]

7.1 Two useful lemmas for coordinate changes

Consider two sets of generalised coordinates \(\{u_i\}\) and \(\{q_i\}\) related by \(u_i=u_i(q_1,q_2,...q_n,t)\). Note that we allow for the change of coordinates to depend on time.20 Such a transformation is known as a point transformation. We also note that the \(\{u_i\}\) coordinates depend on \(\{q_i\}\) (and possibly \(t\)), but not on \(\{\dot{q}_i\}\). This is no longer true if we take a time derivative: generically \(\dot{u}_i\) will depend on \(\{q_i,\dot{q}_i,t\}\). We start by proving the following two simple lemmas:

Lemma 1 (A).

If \(u_i=u_i(q_1,q_2,...q_n,t)\) then \[\frac{\partial u_i}{\partial q_j} = \frac{\partial \dot{u}_i}{\partial {\dot q}_j}.\]

By the Chain Rule \[\dot u_i = \sum_{k=1}^n\frac{\partial u_i}{\partial q_k} {\dot q}_k + \frac{\partial u_i}{\partial t}\, .\] Differentiating with respect to \({\dot q}_j\) just picks out the coefficient of \({\dot q}_j\) (since \(u_i\) does not depend on the \(\dot{q}\)’s, its derivative \({\partial u_i}/{\partial q_k}\) does not either), giving the advertised result \[\frac{\partial}{\partial {\dot q}_j} \left ( \dot u_i \right ) = \frac{\partial}{\partial {\dot q}_j} \left ( \sum_{k=1}^n\frac{\partial u_i}{\partial q_k} {\dot q}_k + \frac{\partial u_i}{\partial t} \right ) = \frac{\partial u_i}{\partial q_j}\, .\]

Lemma 2 (B).

If \(u_i=u_i(q_1,q_2,...q_n,t)\) then \[\frac{\partial {\dot u_i}}{\partial q_j}=\frac{d}{dt} \left ( \frac{\partial u_i}{\partial q_j} \right ).\]

We again use the Chain Rule, and the fact that partial derivatives on the same set of coordinates commute (if the result is continuous): \[\frac{d}{dt} \left ( \frac{\partial u_i}{\partial q_j} \right ) = \sum_{k=1}^n \frac{\partial ^2 u_i}{\partial q_k \partial q_j} {\dot q}_k + \frac{\partial^2 u_i}{\partial t\partial q_j} = \frac{\partial}{\partial q_j}\left (\sum_{k=1}^n \frac {\partial u_i}{\partial q_k} {\dot q}_k + \frac{\partial u_i}{\partial t}\right ) = \frac{\partial}{\partial q_j} \left ( {\dot u}_i \right ) \, .\]
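Both lemmas can be checked mechanically for a concrete point transformation. Here is a sympy sketch using the “expanding” coordinates \(u=e^{t}q\) of footnote 20; note that sympy treats \(q(t)\) and \(\dot{q}(t)\) as independent variables when differentiating, which is exactly the convention used above:

```python
import sympy as sp

t = sp.symbols('t')
q = sp.Function('q')
qdot = q(t).diff(t)

# A time-dependent point transformation: u = exp(t) * q.
u = sp.exp(t) * q(t)
udot = u.diff(t)

# Lemma A: du/dq == d(udot)/d(qdot).
assert sp.simplify(u.diff(q(t)) - udot.diff(qdot)) == 0

# Lemma B: d(udot)/dq == d/dt (du/dq).
assert sp.simplify(udot.diff(q(t)) - u.diff(q(t)).diff(t)) == 0
```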

Example: invariance of the Euler-Lagrange equations under coordinate changes

As an example of how the theorems above are useful, let us prove explicitly that the choice of generalized coordinates does not affect the form of the Euler-Lagrange equations.

Theorem 1. Assume that we have two sets of generalized coordinates \(\{u_1,\ldots,u_n\}\) and \(\{q_1,\ldots,q_n\}\) related by an invertible change of coordinates \(u_i=u_i(q_1,\ldots,q_n,t)\). Then the Euler-Lagrange equations \[\frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) = 0 \quad \forall i\in \{1,\ldots,n\}\] are equivalent to \[\frac{\partial L}{\partial u_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{u}_k}\right) = 0 \quad \forall k\in \{1,\ldots,n\}\]

We will prove the result by repeated application of the Chain Rule. For the \(\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\) term in the Euler-Lagrange equations we get \[\begin{aligned} \frac{d}{dt}\left(\frac{\partial L}{\partial {\dot q}_i}\right) & = \frac{d}{dt}\left(\sum_{k=1}^n \frac{\partial L}{\partial u_k} \underbrace{\frac{\partial u_k}{\partial {\dot q}_i}}_{=0} + \sum_{k=1}^n \frac{\partial L}{\partial {\dot u}_k} \frac{\partial {\dot u}_k}{\partial {\dot q}_i} + \frac{\partial L}{\partial t}\underbrace{\frac{\partial t}{\partial {\dot q}_i}}_{=0}\right)\\ \end{aligned}\]which using Lemma (A) becomes\[\begin{aligned} & = \frac{d}{dt}\left(\sum_{k=1}^n \frac{\partial L}{\partial {\dot u}_k} \frac{\partial {u}_k}{\partial {q}_i}\right)\\ & = \sum_{k=1}^n \left[\frac{d}{dt}\left(\frac{\partial L}{\partial {\dot u}_k} \right)\right] \frac{\partial {u}_k}{\partial {q}_i} + \sum_{k=1}^n \left(\frac{\partial L}{\partial {\dot u}_k}\right) \frac{d}{dt}\left(\frac{\partial {u}_k}{\partial {q}_i}\right)\\ \end{aligned}\]which is now, using Lemma (B)\[\begin{aligned} & = \sum_{k=1}^n \left[\frac{d}{dt}\left(\frac{\partial L}{\partial {\dot u}_k} \right)\right] \frac{\partial {u}_k}{\partial {q}_i} + \sum_{k=1}^n \frac{\partial L}{\partial {\dot u}_k} \frac{\partial {\dot{u}}_k}{\partial {q}_i} \, .\end{aligned}\] The \(\frac{\partial L}{\partial q_i}\) term in the Euler-Lagrange equations is easier. Again using the Chain Rule: \[\frac{\partial L}{\partial q_i} = \sum_{k=1}^n \frac{\partial L}{\partial u_k} \frac{\partial u_k}{\partial q_i} + \sum_{k=1}^n \frac{\partial L}{\partial {\dot u}_k} \frac{\partial {\dot u}_k}{\partial q_i} + \frac{\partial L}{\partial t}\underbrace{\frac{\partial t}{\partial q_i}}_{=0}\, .\] Taking the difference of both equations we get \[\frac{d}{dt}\left(\frac{\partial L}{\partial {\dot q}_i}\right) - \frac{\partial L}{\partial q_i} = \sum_{k=1}^n \left(\frac{d}{dt}\left(\frac{\partial L}{\partial {\dot u}_k}\right) - \frac{\partial L}{\partial u_k} \right) \frac{\partial u_k}{\partial q_i}\, .\]

We are almost there. In order to exhibit the rest of the argument most clearly, we will switch to matrix notation. Denote the matrix associated to the change of variables by \[\mathsf{J}_{ik}:=\frac{\partial u_k}{\partial q_i}\, .\] This matrix (known as the “Jacobian matrix”) is invertible, since by assumption the change of coordinates is invertible. Denote the vector of Euler-Lagrange equations on the \(q\) coordinates by \[\mathsf{E}^{(q)}_i = \frac{\partial L}{\partial q_i} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right)\] and similarly for the \(u\) coordinates \[\mathsf{E}^{(u)}_k = \frac{\partial L}{\partial u_k} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{u}_k}\right)\, .\] Using these definitions we can rewrite the Euler-Lagrange equations as the vector equations \(\vec{\mathsf{E}}^{(q)}=0\) and \(\vec{\mathsf{E}}^{(u)}=0\), and we have just shown that \[\vec{\mathsf{E}}^{(q)} = \mathsf{J}\, \vec{\mathsf{E}}^{(u)}\] with \(\mathsf{J}\) invertible, so \(\vec{\mathsf{E}}^{(q)}=0\) iff \(\vec{\mathsf{E}}^{(u)}=0\), which is what we wanted to show.


  1. You might want to remind yourself of section 1.9 of the Calculus I Epiphany notes.

  2. My conventions are that smooth functions are those which have continuous derivatives to all orders.

  3. This goes under various names in the literature. Common ones are action principle, least action principle, extremal action principle and (less precisely) variational principle. I will mostly use “action principle”, which has the advantage of being concise.

  4. To bring the main point to light here: note that from the point of view of the Lagrangian \(x(t)\), \(\delta x(t)\), \(\dot{x}(t)\) and \(\delta \dot{x}(t)\) are simply numbers, not functions. Let me call them \(a\), \(\epsilon \alpha\), \(b\) and \(\epsilon \beta\), respectively, to emphasize this point, where \(a,\alpha,b,\beta\in\mathbb{R}\), and \(\epsilon\in\mathbb{R}\) is as in definition [def:stationary-function]. Then all we are doing here is taking the first order in the Taylor expansion of the Lagrangian in \(\epsilon\): \[L(a+\epsilon\alpha, b+\epsilon\beta) = L(a,b) + \epsilon\alpha \frac{\partial L(r,s)}{\partial r}\biggr|_{(r,s)=(a,b)} + \epsilon\beta \frac{\partial L(r,s)}{\partial s}\biggr|_{(r,s)=(a,b)} + \ldots\]

  5. Recall from definition [def:stationary-function] that any time we talk about expanding on \(\delta x\) we are really expanding on a small parameter \(\epsilon\) inside \(\delta x(t)=\epsilon z(t)\) (where \(z(t)\) is as in definition [def:stationary-function]). The variation \(\delta \dot{x}(t)=\epsilon \dot{z}(t)\) clearly has the same dependence on \(\epsilon\), since \(\epsilon\) is just a constant that does not depend on time. We therefore have that “\(\delta \dot{x}(t)\) is first order in \(\delta x(t)\)”, at least for the purposes of counting degrees when expanding.

  6. Recall that \[\frac{d(f(t)g(t))}{dt} = \frac{df(t)}{dt}g(t) + f(t)\frac{dg(t)}{dt}\] or equivalently \[f(t)\frac{dg(t)}{dt} = \frac{d(f(t)g(t))}{dt} - \frac{df(t)}{dt}g(t)\, .\] In the text we have \[f(t) = \frac{\partial L}{\partial \dot x} \qquad ; \qquad g(t) = \delta x(t)\, .\]

  7. We know that \(\mathbf{q}\) lives in \(\mathcal{C}\), by definition. Where does \(\mathbf{\dot{q}}\) live? Imagine that at each point in \(\mathcal{C}\) we attach a tangent space \(T(\mathbf{q})\), the space of all tangent vectors at that point. The vector \(\mathbf{\dot{q}}\) is a velocity, so it is a vector in \(T(\mathbf{q})\). The total space of all such tangent spaces over all points in \(\mathcal{C}\) is known as \(T\mathcal{C}\) (the “tangent bundle”). So, if I wanted to be fully precise, I would say that \(L\colon T\mathcal{C}\to\mathbb{R}\). While this is the true geometric nature of the Lagrangian function, and the resulting geometric ideas are beautiful to explore, during the course we will take the more pedestrian approach of looking at things locally in \(T\mathcal{C}\), where \(T\mathcal{C}\approx \mathbb{R}^{\dim(\mathcal{C})}\times \mathbb{R}^{\dim(\mathcal{C})}\). The Lagrangian is then \(L\colon \mathbb{R}^{\dim(\mathcal{C})}\times \mathbb{R}^{\dim(\mathcal{C})}\to \mathbb{R}\), that is, a function of two vectors, which we call \(\mathbf{q}\) and \(\mathbf{\dot{q}}\).

  8. Alternatively, you can derive the Euler-Lagrange equations in any fixed coordinate system, and check that they stay invariant when you change to a different coordinate system, as done in the appendix.

  9. Once we know that the conserved charge is there, it is not difficult to find its expression in Cartesian coordinates: we have \(mr^2\dot{\theta} = m(x\dot{y}-y\dot{x})\).

  10. I leave this as an exercise. All you need to do is to convince yourself that our derivation of the Euler-Lagrange equations, above equation \(\eqref{eq:Euler-Lagrange}\), is not modified if the Lagrangian includes an explicit dependence on time.

  11. An ansatz is an assumed form for the solution of the problem. We test the assumption by inserting the ansatz into the equation, and verifying that it does provide a solution for an appropriate choice of \(f(t)\).

  12. The simplest way to prove this is to note that \(\mathsf{A}\) is real symmetric, and thus diagonalisable by an orthogonal transformation \(O\) as \(\mathsf{A}=O D O^t\). We can then define \(\mathsf{A}^{\frac{1}{2}} = O D^{\frac{1}{2}} O^t\).

  13. The proof that we gave above for the fundamental lemma of the calculus of variations was for functions of a single variable. I leave it as a small exercise to generalise the proof to the case of functions of multiple variables.

  14. The \(d>1\) case can be treated similarly, with the total charge in some region being the integral of the function \(\mathcal{Q}\) over that region.

  15. We could consider more general cases, in which the Lagrangian density also depends explicitly on the space and time coordinates \(t,x_0,\ldots,x_d\). I leave the generalization of the discussion to this case as an (optional) exercise.

  16. To see this, choose for instance \(t=0\) and \(t=\pi/(2pc)\).

  17. This will be true in the examples that we discuss during this course, at any rate. There are interesting situations in which this inversion cannot be done, but we will not study them during this course. I encourage those of you who are curious to search for material on “Dirac brackets” if you want to see how our story below generalises to these more complicated cases. A good reference is the book “Quantization of Gauge Systems”, by Henneaux and Teitelboim.

  18. Or Newton’s formalism, for that matter! We went through all this trouble during the past weeks not because we wanted to find more efficient methods of solving the dynamics of classical systems (although that is sometimes a useful byproduct of switching perspectives), but rather because we wanted to understand better the structure of classical mechanics — important ideas like the action principle or the relation between symmetries and conserved charges become much more transparent in the Lagrangian and Hamiltonian formalisms.

  19. Not to overload notation too much, I will leave implicit the fact that we are also keeping fixed any explicit time parameters in \(A\), unless explicitly stated otherwise.

  20. For instance, we could have \(u_i=e^{t} q_i\), giving a sort of “expanding” set of coordinates. Such things appear fairly naturally when one is studying cosmology, for example.